[jira] [Created] (HADOOP-14525) org.apache.hadoop.io.Text Truncate

BELUGA BEHR (JIRA) Tue, 13 Jun 2017 09:53:27 -0700

BELUGA BEHR created HADOOP-14525:
------------------------------------

             Summary: org.apache.hadoop.io.Text Truncate
                 Key: HADOOP-14525
                 URL: https://issues.apache.org/jira/browse/HADOOP-14525
             Project: Hadoop Common
          Issue Type: Improvement
          Components: io
    Affects Versions: 2.8.1
            Reporter: BELUGA BEHR



In Apache Hive, VARCHAR fields are much slower than STRING fields when a 
precision (maximum string length) is specified.  Keep in mind that this 
precision counts the characters (Unicode code points) in the UTF-8 encoded 
string, not the number of bytes.

The general procedure is:

# Load an entire byte buffer into a {{Text}} object
# Convert it to a {{String}}
# Count off the first N character code points
# Substring the {{String}} at the correct place
# Convert the {{String}} back into a byte array and populate the {{Text}} object
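
For illustration, the steps above can be sketched in plain Java roughly as follows. This is not the actual Hive code: the class and method names are hypothetical, a plain byte array stands in for the contents of the {{Text}} object, and the cap is assumed to be non-negative.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; a byte[] stands in for the contents of a Text object.
final class StringRoundTripTruncate {

    // Truncates UTF-8 bytes to at most maxCodePoints code points (assumed >= 0).
    static byte[] truncate(byte[] utf8, int maxCodePoints) {
        // Decode the whole buffer into a String (first copy of the data).
        String s = new String(utf8, StandardCharsets.UTF_8);

        // Count code points; nothing to do if the value already fits.
        if (s.codePointCount(0, s.length()) <= maxCodePoints) {
            return utf8;
        }

        // Find the char index at which the first maxCodePoints code points end.
        int end = s.offsetByCodePoints(0, maxCodePoints);

        // Substring and re-encode to UTF-8 (second and third copies).
        return s.substring(0, end).getBytes(StandardCharsets.UTF_8);
    }
}
```

Every value that exceeds the cap is decoded, copied into a substring, and re-encoded, and even values that already fit are decoded once just to be counted.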

It would be great if the {{Text}} object could offer a truncate/substring 
method, based on character count, that did not require copying data around.
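
One possible shape for such a method, as a minimal sketch only: scan the UTF-8 bytes in place and return the byte length covered by the first N code points, so that only the valid length needs to shrink. The class and method names here are hypothetical and not part of {{Text}}; the sketch assumes the input is well-formed UTF-8 and operates on a raw byte array and length such as {{Text.getBytes()}} and {{Text.getLength()}} provide.

```java
// Hypothetical helper, not an existing Hadoop API. Assumes well-formed UTF-8.
final class Utf8Truncate {

    // Returns the number of leading bytes that hold at most maxCodePoints
    // code points, looking only at the first `length` bytes of `utf8`.
    static int truncatedLength(byte[] utf8, int length, int maxCodePoints) {
        int seen = 0;
        for (int i = 0; i < length; i++) {
            // Continuation bytes have the form 10xxxxxx; any other byte
            // starts a new code point.
            if ((utf8[i] & 0xC0) != 0x80) {
                if (seen == maxCodePoints) {
                    return i; // byte i begins the first code point past the cap
                }
                seen++;
            }
        }
        return length; // the value already fits within the cap
    }
}
```

No {{String}} is created and no bytes are copied; the scan is a single pass that stops as soon as the cap is reached.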



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org
