[jira] [Comment Edited] (HIVE-18422) Vectorized input format should not be used when input format is excluded and row.serde is enabled

Matt McCline (JIRA) Thu, 18 Jan 2018 07:19:33 -0800

    [ https://issues.apache.org/jira/browse/HIVE-18422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16330611#comment-16330611 ]


Matt McCline edited comment on HIVE-18422 at 1/18/18 3:18 PM:
--------------------------------------------------------------

I'm looking at it – it doesn't seem quite right.

There is a precedence order in evaluating the three variables that control 
vectorization of input formats:

1. hive.vectorized.use.vectorized.input.format
2. hive.vectorized.use.vector.serde.deserialize
3. hive.vectorized.use.row.serde.deserialize

If #1 is true and the input format implements VectorizedInputFormatInterface, 
then we vectorize using the vectorized input format.

Otherwise, look at #2. If #2 is true and the input format is TextInputFormat or 
SequenceFileInputFormat, then vectorize using vector serde.

Finally, look at #3. If #3 is true and the input format is not excluded, then 
vectorize using row serde.
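The three-step precedence above can be sketched in Java roughly as follows. This is a minimal model under stated assumptions, not Hive's actual Vectorizer code: the stand-in classes (OrcLikeInputFormat, TextLikeInputFormat, OtherInputFormat), the Mode enum and the choose() method are hypothetical, and the exclude list is modeled, following the comment, as being consulted only in step #3.

```java
import java.util.Set;

// Hypothetical model of the precedence described in the comment; NOT Hive's Vectorizer.
public class VectorizationPrecedenceSketch {

    // Stand-ins for the real Hive/Hadoop types (assumptions for illustration only).
    interface VectorizedInputFormatInterface {}
    static class OrcLikeInputFormat implements VectorizedInputFormatInterface {}
    static class TextLikeInputFormat {}
    static class OtherInputFormat {}

    enum Mode { VECTORIZED_INPUT_FORMAT, VECTOR_SERDE, ROW_SERDE, NONE }

    // Stand-in for "TextInputFormat or SequenceFileInputFormat".
    static final Set<Class<?>> VECTOR_SERDE_FORMATS = Set.of(TextLikeInputFormat.class);

    static Mode choose(Class<?> inputFormat,
                       boolean useVectorizedInputFormat,   // #1
                       boolean useVectorSerdeDeserialize,  // #2
                       boolean useRowSerdeDeserialize,     // #3
                       Set<Class<?>> rowSerdeExcludes) {
        // #1: only if the input format class implements the vectorized interface.
        if (useVectorizedInputFormat
                && VectorizedInputFormatInterface.class.isAssignableFrom(inputFormat)) {
            return Mode.VECTORIZED_INPUT_FORMAT;
        }
        // #2: vector serde, limited to the text/sequence-file stand-ins.
        if (useVectorSerdeDeserialize && VECTOR_SERDE_FORMATS.contains(inputFormat)) {
            return Mode.VECTOR_SERDE;
        }
        // #3: row serde; in this model the exclude list is checked only here.
        if (useRowSerdeDeserialize && !rowSerdeExcludes.contains(inputFormat)) {
            return Mode.ROW_SERDE;
        }
        return Mode.NONE;
    }

    public static void main(String[] args) {
        Set<Class<?>> none = Set.of();
        System.out.println(choose(OrcLikeInputFormat.class, true, true, true, none));  // VECTORIZED_INPUT_FORMAT
        System.out.println(choose(TextLikeInputFormat.class, true, true, true, none)); // VECTOR_SERDE
        System.out.println(choose(OtherInputFormat.class, true, true, true, none));    // ROW_SERDE
        System.out.println(choose(OtherInputFormat.class, true, true, true,
                Set.of(OtherInputFormat.class)));                                      // NONE
    }
}
```

Because the checks run in order and each one returns early, an excluded format that passes step #1 is never tested against the exclude list.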

So it seems that what is missing from the repro steps is setting 
hive.vectorized.use.vectorized.input.format to false. If the customer has 
hive.vectorized.use.vectorized.input.format set to true, the exclude is 
ignored, since it only applies to row serde vectorization, and the vertex is 
vectorized. So the change in this JIRA is not needed, except perhaps to fix a 
Q file.
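Applying that precedence to the repro's ORC table makes the argument concrete. The sketch below hard-codes the flag values from the repro steps and assumes that OrcInputFormat implements VectorizedInputFormatInterface and that the exclude list is only consulted for row serde, as the comment describes. The class and method names are hypothetical.

```java
// Hypothetical walkthrough of the repro's ORC table under the comment's precedence.
public class ReproWalkthroughSketch {

    // Values taken from the repro steps.
    static final boolean USE_VECTOR_SERDE = true; // hive.vectorized.use.vector.serde.deserialize
    static final boolean USE_ROW_SERDE = true;    // hive.vectorized.use.row.serde.deserialize
    static final boolean ORC_EXCLUDED = true;     // OrcInputFormat is in hive.vectorized.input.format.excludes

    // Assumption: OrcInputFormat implements VectorizedInputFormatInterface.
    static final boolean ORC_IMPLEMENTS_VECTORIZED_INTERFACE = true;
    // ORC is neither TextInputFormat nor SequenceFileInputFormat, so step #2 cannot apply.
    static final boolean ORC_IS_TEXT_OR_SEQUENCE_FILE = false;

    static String outcome(boolean useVectorizedInputFormat) {
        if (useVectorizedInputFormat && ORC_IMPLEMENTS_VECTORIZED_INTERFACE) {
            return "vectorized (vectorized input format)"; // exclude list not consulted
        }
        if (USE_VECTOR_SERDE && ORC_IS_TEXT_OR_SEQUENCE_FILE) {
            return "vectorized (vector serde)";
        }
        if (USE_ROW_SERDE && !ORC_EXCLUDED) {
            return "vectorized (row serde)";
        }
        return "not vectorized";
    }

    public static void main(String[] args) {
        System.out.println("use.vectorized.input.format=true  -> " + outcome(true));
        System.out.println("use.vectorized.input.format=false -> " + outcome(false));
    }
}
```

With the flag left at true, step #1 wins and the exclude has no effect. With it set to false, the ORC table falls through to step #3, where the exclude disables vectorization.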



> Vectorized input format should not be used when input format is excluded and 
> row.serde is enabled
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-18422
>                 URL: https://issues.apache.org/jira/browse/HIVE-18422
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 3.0.0, 2.4.0
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Minor
>         Attachments: HIVE-18422.01.patch, HIVE-18422.02.patch
>
>
> HIVE-17534 introduced a config that makes it possible to exclude certain 
> input formats from vectorized execution without affecting other input formats. 
> If an input format is excluded and row.serde is enabled at the same time, the 
> vectorizer still sets {{useVectorizedInputFormat}} to true, which causes 
> vectorized readers to be used in row.serde mode.
> In order to reproduce:
> {noformat}
> set hive.fetch.task.conversion=none;
> set hive.vectorized.use.row.serde.deserialize=true;
> set hive.vectorized.use.vector.serde.deserialize=true;
> set hive.vectorized.execution.enabled=true;
> set hive.vectorized.execution.reduce.enabled=true;
> set hive.vectorized.row.serde.inputformat.excludes=;
> -- SORT_QUERY_RESULTS
> -- exclude MapredParquetInputFormat and OrcInputFormat from vectorization;
> -- this should cause mapwork vectorization to be disabled
> set hive.vectorized.input.format.excludes=org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat,org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
> set hive.vectorized.use.vectorized.input.format=true;
> create table orcTbl (t1 tinyint, t2 tinyint)
> stored as orc;
> insert into orcTbl values (54, 9), (-104, 25), (-112, 24);
> explain vectorization select t1, t2, (t1+t2) from orcTbl where (t1+t2) > 10;
> select t1, t2, (t1+t2) from orcTbl where (t1+t2) > 10;
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
