[ 
https://issues.apache.org/jira/browse/HIVE-17593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542455#comment-16542455
 ] 

Junjie Chen commented on HIVE-17593:
------------------------------------

The previous unit test failure (vectorized_parquet_types.q) occurred because 
different length UDFs are used for CHAR columns.  

In non-vectorized mode, GenericUDFLength calculates the length of the column. 
It converts the primitive value to a string using 
PrimitiveObjectInspectorUtils.getString, which ignores trailing spaces for the 
CHAR type.
In vectorized mode, however, StringLength calculates the length of the column. 
It treats the column as a byte array and does not consider the column type. 
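
The mismatch described above can be illustrated with a minimal, self-contained 
sketch. The class and method names below are hypothetical and are NOT the Hive 
implementations. The sketch also assumes that the byte array seen by the 
vectorized path still contains the CHAR padding spaces.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; none of these helpers are actual Hive code.
class CharLengthSketch {

    // Mimics the non-vectorized behaviour described above: convert to a
    // string, ignore trailing spaces, then count characters.
    static int lengthIgnoringPadding(String charValue) {
        int end = charValue.length();
        while (end > 0 && charValue.charAt(end - 1) == ' ') {
            end--;
        }
        return charValue.substring(0, end).codePointCount(0, end);
    }

    // Mimics the vectorized behaviour described above: count UTF-8
    // characters in a raw byte array with no knowledge of the column type.
    static int lengthOfRawBytes(byte[] bytes) {
        int count = 0;
        for (byte b : bytes) {
            // UTF-8 continuation bytes look like 10xxxxxx; skip them.
            if ((b & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String padded = "abc  "; // assumed CHAR(5) value holding "abc"
        byte[] raw = padded.getBytes(StandardCharsets.UTF_8);
        System.out.println(lengthIgnoringPadding(padded)); // prints 3
        System.out.println(lengthOfRawBytes(raw));         // prints 5
    }
}
```

Under these assumptions the same CHAR value yields two different lengths, which 
is the kind of difference that makes the vectorized and non-vectorized query 
results disagree.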

> DataWritableWriter strip spaces for CHAR type before writing, but predicate 
> generator doesn't do same thing.
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-17593
>                 URL: https://issues.apache.org/jira/browse/HIVE-17593
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.3.0, 3.0.0
>            Reporter: Junjie Chen
>            Assignee: Junjie Chen
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-17593.2.patch, HIVE-17593.3.patch, HIVE-17593.patch
>
>
> DataWritableWriter strips trailing spaces from CHAR values before writing. 
> However, the predicate generator does NOT do the same stripping, which should 
> cause missing data!
> In the current version, no data goes missing because predicates are not 
> properly pushed down to Parquet due to HIVE-17261.
> Please see ConvertAstToSearchArg.java: getTypes treats CHAR and STRING as the 
> same, which builds a predicate with trailing spaces.
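
To make the issue description concrete, here is a rough sketch of how a 
stripped stored value and a padded predicate literal can fail to match. All 
names are hypothetical and do not come from Hive or Parquet. It assumes the 
pushed-down predicate is evaluated as plain string equality against the stored 
value.

```java
// Hypothetical sketch; not DataWritableWriter or ConvertAstToSearchArg code.
class CharPredicateSketch {

    // Removes trailing spaces, as the writer is described to do for CHAR.
    static String stripTrailingSpaces(String s) {
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == ' ') {
            end--;
        }
        return s.substring(0, end);
    }

    // Writer side: the value that ends up in the file (assumed stripped).
    static String writtenValue(String charValue) {
        return stripTrailingSpaces(charValue);
    }

    // Reader side: assumed equality predicate against the stored value.
    static boolean matches(String stored, String predicateLiteral) {
        return stored.equals(predicateLiteral);
    }

    public static void main(String[] args) {
        String charValue = "abc  "; // assumed CHAR(5) value holding "abc"
        String stored = writtenValue(charValue);

        // Literal built as if CHAR were STRING: padding is kept.
        System.out.println(matches(stored, charValue)); // prints false

        // Literal stripped the same way as the writer.
        System.out.println(matches(stored, stripTrailingSpaces(charValue))); // prints true
    }
}
```

In this sketch the row is only found when the predicate literal is stripped the 
same way as the written value, which is the consistency the issue asks for.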



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
