[ https://issues.apache.org/jira/browse/HIVE-16889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Janaki Lahorani reassigned HIVE-16889:
--------------------------------------

    Assignee: Peter Vary  (was: Janaki Lahorani)

> Improve Performance Of VARCHAR
> ------------------------------
>
>                 Key: HIVE-16889
>                 URL: https://issues.apache.org/jira/browse/HIVE-16889
>             Project: Hive
>          Issue Type: Improvement
>          Components: Types
>    Affects Versions: 2.1.1, 3.0.0
>            Reporter: BELUGA BEHR
>            Assignee: Peter Vary
>            Priority: Major
>
> Oftentimes, organizations use tools that create table schemas on the fly 
> and specify VARCHAR columns with an explicit precision.  In these scenarios, 
> performance suffers, even though one would expect it to be better: the 
> declared precision is pre-existing knowledge about the size of the data, so 
> buffers could be set up more efficiently than in the case where no such 
> knowledge exists.
> Most of the performance cost appears to come from reading a STRING from a 
> file into a byte buffer, checking the length of the STRING, truncating the 
> STRING if needed, and then serializing the STRING back into bytes again.
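> As an illustration, here is a minimal Java sketch of that round trip; the 
> class and method names are made up for the example and are not the actual 
> Hive code paths:
> {code:java}
> import java.nio.charset.StandardCharsets;
>
> public class VarcharRoundTripSketch {
>   // Simulates enforcing a VARCHAR(maxLength) bound on raw bytes read from
>   // a file: decode, check, truncate, then re-encode.
>   static byte[] enforceMaxLength(byte[] raw, int maxLength) {
>     // 1. Decode the whole byte buffer into a String (allocates a char[]).
>     String s = new String(raw, StandardCharsets.UTF_8);
>     // 2. Check the character length against the declared precision.
>     if (s.length() > maxLength) {
>       // 3. Truncate at the character level.
>       s = s.substring(0, maxLength);
>     }
>     // 4. Serialize the (possibly unchanged) String back into bytes.
>     return s.getBytes(StandardCharsets.UTF_8);
>   }
>
>   public static void main(String[] args) {
>     byte[] out = enforceMaxLength("hello world".getBytes(StandardCharsets.UTF_8), 5);
>     System.out.println(new String(out, StandardCharsets.UTF_8)); // hello
>   }
> }
> {code}
> Note that the decode and the re-encode both happen even when the value 
> already fits within the declared precision, which is the common case.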
> From the code, I have identified several areas where developers left notes 
> about later improvements (a sketch of one such improvement follows the list):
> # org.apache.hadoop.hive.serde2.io.HiveVarcharWritable.enforceMaxLength(int)
> # org.apache.hadoop.hive.serde2.lazy.LazyHiveVarchar.init(ByteArrayRef, int, int)
> # org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorUtils.getHiveVarchar(Object, PrimitiveObjectInspector)
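> One direction those notes point at, assuming the length check happens on 
> UTF-8 encoded bytes, is to count characters directly in the byte buffer and 
> truncate it in place, skipping the String decode/encode entirely.  This is 
> an illustrative sketch under that assumption, not a patch:
> {code:java}
> import java.nio.charset.StandardCharsets;
>
> public class Utf8TruncateSketch {
>   // Returns the byte offset just past the first maxChars UTF-8 characters
>   // (code points), or raw.length if the buffer holds fewer characters.
>   // Assumes well-formed UTF-8 input.
>   static int byteLengthForChars(byte[] raw, int maxChars) {
>     int chars = 0;
>     int i = 0;
>     while (i < raw.length && chars < maxChars) {
>       int b = raw[i] & 0xFF;
>       if (b < 0x80) {        // 1-byte sequence (ASCII)
>         i += 1;
>       } else if (b < 0xE0) { // 2-byte sequence
>         i += 2;
>       } else if (b < 0xF0) { // 3-byte sequence
>         i += 3;
>       } else {               // 4-byte sequence
>         i += 4;
>       }
>       chars++;
>     }
>     return Math.min(i, raw.length);
>   }
>
>   public static void main(String[] args) {
>     byte[] raw = "héllo wörld".getBytes(StandardCharsets.UTF_8);
>     int len = byteLengthForChars(raw, 5);
>     System.out.println(new String(raw, 0, len, StandardCharsets.UTF_8)); // héllo
>   }
> }
> {code}
> With something like this, a value already within the declared precision is 
> scanned once and nothing is allocated.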


