[jira] [Updated] (HIVE-14450) Vectorization: StringExpr::truncate() can assume 1 byte per-char minimum
[ https://issues.apache.org/jira/browse/HIVE-14450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-14450:
------------------------------
    Attachment: HIVE-14450.1.patch

> Vectorization: StringExpr::truncate() can assume 1 byte per-char minimum
> ------------------------------------------------------------------------
>
>                 Key: HIVE-14450
>                 URL: https://issues.apache.org/jira/browse/HIVE-14450
>             Project: Hive
>          Issue Type: Improvement
>          Components: Vectorization
>    Affects Versions: 2.2.0
>            Reporter: Gopal V
>            Assignee: Teddy Choi
>         Attachments: HIVE-14450.1.patch
>
>
> {code}
>   public static int truncate(byte[] bytes, int start, int length, int maxLength) {
>     int end = start + length;
>     // count characters forward
>     int j = start;
>     int charCount = 0;
>     while(j < end) {
>       // UTF-8 continuation bytes have 2 high bits equal to 0x80.
>       if ((bytes[j] & 0xc0) != 0x80) {
>         if (charCount == maxLength) {
>           break;
>         }
>         ++charCount;
>       }
>       j++;
>     }
>     return (j - start);
>   }
> {code}
> Should not read the bytes if the maxLength is 4096 and the input string has 256 bytes.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
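The improvement the issue title asks for can be sketched as follows. This is a minimal illustration and not the contents of HIVE-14450.1.patch: the class name `TruncateSketch` and the helper `truncateByScan` are hypothetical names introduced here. The reasoning is that every UTF-8 character occupies at least one byte, so an input of `length` bytes holds at most `length` characters. When `length <= maxLength` nothing can be truncated, and the method can return without reading a single byte.

```java
final class TruncateSketch {
    private TruncateSketch() {}

    /**
     * Returns the byte length of the longest prefix of
     * bytes[start, start + length) that holds at most maxLength characters.
     */
    static int truncate(byte[] bytes, int start, int length, int maxLength) {
        // A UTF-8 character is at least one byte long, so the character
        // count can never exceed the byte count. If the byte count already
        // fits within maxLength, return early without touching the array.
        if (length <= maxLength) {
            return length;
        }
        return truncateByScan(bytes, start, length, maxLength);
    }

    /** The per-byte scan quoted in the issue description, kept as the slow path. */
    static int truncateByScan(byte[] bytes, int start, int length, int maxLength) {
        int end = start + length;
        int j = start;
        int charCount = 0;
        while (j < end) {
            // UTF-8 continuation bytes have the bit pattern 10xxxxxx;
            // every other byte starts a new character.
            if ((bytes[j] & 0xc0) != 0x80) {
                if (charCount == maxLength) {
                    break;
                }
                ++charCount;
            }
            j++;
        }
        return j - start;
    }
}
```

With the example from the description (maxLength 4096, a 256-byte input), the early return is taken and the loop never runs. Inputs longer than maxLength bytes still go through the original scan, so the result is unchanged in every case.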
[jira] [Updated] (HIVE-14450) Vectorization: StringExpr::truncate() can assume 1 byte per-char minimum
[ https://issues.apache.org/jira/browse/HIVE-14450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Teddy Choi updated HIVE-14450:
------------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-14450) Vectorization: StringExpr::truncate() can assume 1 byte per-char minimum
[ https://issues.apache.org/jira/browse/HIVE-14450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-14450:
---------------------------
    Affects Version/s: 2.2.0

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-14450) Vectorization: StringExpr::truncate() can assume 1 byte per-char minimum
[ https://issues.apache.org/jira/browse/HIVE-14450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-14450:
---------------------------
    Description:
{code}
  public static int truncate(byte[] bytes, int start, int length, int maxLength) {
    int end = start + length;
    // count characters forward
    int j = start;
    int charCount = 0;
    while(j < end) {
      // UTF-8 continuation bytes have 2 high bits equal to 0x80.
      if ((bytes[j] & 0xc0) != 0x80) {
        if (charCount == maxLength) {
          break;
        }
        ++charCount;
      }
      j++;
    }
    return (j - start);
  }
{code}
Should not read the bytes if the maxLength is 4096 and the input string has 256 bytes.

  was:
{code}
  public static int truncate(byte[] bytes, int start, int length, int maxLength) {
    int end = start + length;
    // count characters forward
    int j = start;
    int charCount = 0;
    while(j < end) {
      // UTF-8 continuation bytes have 2 high bits equal to 0x80.
      if ((bytes[j] & 0xc0) != 0x80) {
        if (charCount == maxLength) {
          break;
        }
        ++charCount;
      }
      j++;
    }
    return (j - start);
  }
{code}
Should not dirty the L1 cache if the maxLength is 4096 and the input string has 256 bytes.
[jira] [Updated] (HIVE-14450) Vectorization: StringExpr::truncate() can assume 1 byte per-char minimum
[ https://issues.apache.org/jira/browse/HIVE-14450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-14450:
---------------------------
    Component/s: Vectorization