BELUGA BEHR created HADOOP-14525:
------------------------------------

Summary: org.apache.hadoop.io.Text Truncate
Key: HADOOP-14525
URL: https://issues.apache.org/jira/browse/HADOOP-14525
Project: Hadoop Common
Issue Type: Improvement
Components: io
Affects Versions: 2.8.1
Reporter: BELUGA BEHR
In Apache Hive, VARCHAR fields are much slower than STRING fields when a precision (a cap on string length) is specified. Keep in mind that this precision counts characters (Unicode code points) in the string, not UTF-8 bytes. The general procedure is:

# Load an entire byte buffer into a {{Text}} object
# Convert it to a {{String}}
# Count N character code points
# Take a substring of the {{String}} at the correct position
# Convert the {{String}} back into a byte array and populate the {{Text}} object with it

It would be great if the {{Text}} object offered a truncate/substring method, based on character count, that did not require copying data around.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
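For illustration, here is a minimal Java sketch of the character-counting step done directly on the UTF-8 bytes, assuming the buffer holds well-formed UTF-8. It finds how many bytes the first N code points occupy without decoding to a {{String}} or re-encoding. The class and method names ({{Utf8Truncate}}, {{prefixByteLength}}, {{truncateViaString}}) are hypothetical and are not part of Hadoop's {{Text}} API; {{truncateViaString}} reproduces the String round trip described above for comparison.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch (not a Hadoop API): counts code points directly in a
 * UTF-8 byte buffer so a caller can truncate without a String round trip.
 * Assumes the input is well-formed UTF-8.
 */
public final class Utf8Truncate {

  private Utf8Truncate() {}

  /** True for UTF-8 continuation bytes (bit pattern 10xxxxxx). */
  private static boolean isContinuation(byte b) {
    return (b & 0xC0) == 0x80;
  }

  /**
   * Returns the number of bytes occupied by the first maxChars code points
   * of utf8[0, length), or length if the buffer holds fewer code points.
   */
  public static int prefixByteLength(byte[] utf8, int length, int maxChars) {
    if (maxChars < 0) {
      throw new IllegalArgumentException("maxChars must be >= 0");
    }
    int codePoints = 0;
    for (int i = 0; i < length; i++) {
      // Every code point has exactly one non-continuation (lead) byte.
      if (!isContinuation(utf8[i])) {
        if (codePoints == maxChars) {
          return i; // utf8[i] starts code point number maxChars + 1
        }
        codePoints++;
      }
    }
    return length;
  }

  /** The decode / substring / re-encode round trip, for comparison. */
  public static byte[] truncateViaString(byte[] utf8, int length, int maxChars) {
    String s = new String(utf8, 0, length, StandardCharsets.UTF_8);
    if (s.codePointCount(0, s.length()) > maxChars) {
      s = s.substring(0, s.offsetByCodePoints(0, maxChars));
    }
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // "héllo", an emoji (4 UTF-8 bytes, 2 Java chars), then "wörld";
    // written with escapes so the source encoding does not matter.
    byte[] bytes = "h\u00e9llo\uD83D\uDE00w\u00f6rld".getBytes(StandardCharsets.UTF_8);
    int n = prefixByteLength(bytes, bytes.length, 6);
    // prints "10 bytes; matches String path: true"
    System.out.println(n + " bytes; matches String path: "
        + (n == truncateViaString(bytes, bytes.length, 6).length));
  }
}
```

Because counting lead bytes gives the same result as {{String.codePointCount}} on valid UTF-8, a character-based truncate on {{Text}} could, in principle, keep its existing buffer and only shorten its recorded length to the returned byte count, which is the copy-free behaviour the issue asks for.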