[jira] [Updated] (HIVE-14450) Vectorization: StringExpr::truncate() can assume 1 byte per-char minimum

2017-02-06 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-14450:
--
Attachment: HIVE-14450.1.patch

> Vectorization: StringExpr::truncate() can assume 1 byte per-char minimum
> 
>
>                 Key: HIVE-14450
>                 URL: https://issues.apache.org/jira/browse/HIVE-14450
>             Project: Hive
>          Issue Type: Improvement
>          Components: Vectorization
>    Affects Versions: 2.2.0
>            Reporter: Gopal V
>            Assignee: Teddy Choi
>         Attachments: HIVE-14450.1.patch
>
>
> {code}
>   public static int truncate(byte[] bytes, int start, int length, int maxLength) {
>     int end = start + length;
>     // count characters forward
>     int j = start;
>     int charCount = 0;
>     while(j < end) {
>       // UTF-8 continuation bytes have 2 high bits equal to 0x80.
>       if ((bytes[j] & 0xc0) != 0x80) {
>         if (charCount == maxLength) {
>           break;
>         }
>         ++charCount;
>       }
>       j++;
>     }
>     return (j - start);
>   }
> {code}
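> truncate() should not need to read the bytes at all when maxLength is 4096 and
> the input string has only 256 bytes: at a minimum of 1 byte per character, such
> a string cannot contain more than maxLength characters.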
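
A minimal sketch of the shortcut the description asks for, assuming the only change is an early return placed before the existing loop. This is an illustration, not the contents of HIVE-14450.1.patch; the class name TruncateSketch and the main method are hypothetical. Because each UTF-8 character occupies at least one byte, a string of `length` bytes holds at most `length` characters, so when length <= maxLength the loop could never reach its break and the result is always `length`.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical holder class for illustration; in Hive the method lives in StringExpr.
final class TruncateSketch {

  /**
   * Returns the byte length of the longest prefix of the UTF-8 string stored in
   * bytes[start, start + length) that contains at most maxLength characters.
   */
  static int truncate(byte[] bytes, int start, int length, int maxLength) {
    // Assumed fast path: at least 1 byte per character means the string has at
    // most `length` characters, so nothing needs to be read or cut here.
    if (length <= maxLength) {
      return length;
    }

    int end = start + length;
    // Count characters forward, as in the original loop.
    int j = start;
    int charCount = 0;
    while (j < end) {
      // Continuation bytes look like 10xxxxxx; any other byte starts a character.
      if ((bytes[j] & 0xc0) != 0x80) {
        if (charCount == maxLength) {
          break;
        }
        ++charCount;
      }
      j++;
    }
    return j - start;
  }

  public static void main(String[] args) {
    byte[] ascii = "hello".getBytes(StandardCharsets.UTF_8);
    // "h\u00e9llo" is 5 characters but 6 bytes, since the accented letter takes 2 bytes.
    byte[] accented = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);

    // Fast path: 5 bytes <= 4096 characters, no byte is inspected.
    System.out.println(truncate(ascii, 0, ascii.length, 4096));
    // Slow path: keep 2 characters, which span 3 bytes.
    System.out.println(truncate(accented, 0, accented.length, 2));
  }
}
```

The early return leaves the result unchanged for every input, since the loop's break requires charCount to reach maxLength, which cannot happen before `length` bytes have been scanned when length <= maxLength.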



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-14450) Vectorization: StringExpr::truncate() can assume 1 byte per-char minimum

2017-02-06 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi updated HIVE-14450:
--
Status: Patch Available  (was: Open)



[jira] [Updated] (HIVE-14450) Vectorization: StringExpr::truncate() can assume 1 byte per-char minimum

2016-08-05 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-14450:
---
Affects Version/s: 2.2.0



[jira] [Updated] (HIVE-14450) Vectorization: StringExpr::truncate() can assume 1 byte per-char minimum

2016-08-05 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-14450:
---
Description: 
{code}
  public static int truncate(byte[] bytes, int start, int length, int maxLength) {
    int end = start + length;

    // count characters forward
    int j = start;
    int charCount = 0;
    while(j < end) {
      // UTF-8 continuation bytes have 2 high bits equal to 0x80.
      if ((bytes[j] & 0xc0) != 0x80) {
        if (charCount == maxLength) {
          break;
        }
        ++charCount;
      }
      j++;
    }
    return (j - start);
  }
{code}

Should not read the bytes if the maxLength is 4096 and the input string has 256 
bytes.

  was:
{code}
  public static int truncate(byte[] bytes, int start, int length, int maxLength) {
    int end = start + length;

    // count characters forward
    int j = start;
    int charCount = 0;
    while(j < end) {
      // UTF-8 continuation bytes have 2 high bits equal to 0x80.
      if ((bytes[j] & 0xc0) != 0x80) {
        if (charCount == maxLength) {
          break;
        }
        ++charCount;
      }
      j++;
    }
    return (j - start);
  }
{code}

Should not dirty the L1 cache if the maxLength is 4096 and the input string has 
256 bytes.




[jira] [Updated] (HIVE-14450) Vectorization: StringExpr::truncate() can assume 1 byte per-char minimum

2016-08-05 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-14450:
---
Component/s: Vectorization
