[jira] [Commented] (HIVE-18422) Vectorized input format should not be used when vectorized input format is excluded and row.serde is enabled

Vihang Karajgaonkar (JIRA) Thu, 18 Jan 2018 14:01:30 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-18422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16331308#comment-16331308
 ]


Vihang Karajgaonkar commented on HIVE-18422:
--------------------------------------------

Thanks for taking a look [~mmccline]. The exclude above is for excluding the 
vectorized input format, not the row.serde. The use case I am trying to fix 
is below:

The user has enabled both the vectorized input format 
({{hive.vectorized.use.vectorized.input.format}}) and the vectorized row serde 
({{hive.vectorized.use.row.serde.deserialize}}).
The user has also excluded one file format using 
{{hive.vectorized.input.format.excludes}} to make sure it does not use the 
vectorized input format, e.g. the user does not want to use the Parquet 
vectorized input format. In such a case ORC should still be able to use the 
vectorized input format and Parquet should fall back to row.serde. But it turns 
out that in such a case Parquet still instantiates a vectorized input format 
because the MapWork tells it to use the vectorized input format. Hope that 
makes it a bit less confusing.
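
For illustration, the settings for this scenario would look roughly like the 
following. This is only a sketch that reuses the property names above and the 
Parquet class name from the issue description; unlike the repro in the 
description, only Parquet is excluded here.

{noformat}
-- enable vectorized execution
set hive.vectorized.execution.enabled=true;
-- enable both the vectorized input format and the vectorized row serde
set hive.vectorized.use.vectorized.input.format=true;
set hive.vectorized.use.row.serde.deserialize=true;
-- exclude only the Parquet input format from the vectorized input format path
set hive.vectorized.input.format.excludes=org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat;
{noformat}

With these settings the expectation is that ORC tables still use the 
vectorized input format, while Parquet tables fall back to row.serde.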





> Vectorized input format should not be used when vectorized input format is 
> excluded and row.serde is enabled
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-18422
>                 URL: https://issues.apache.org/jira/browse/HIVE-18422
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 3.0.0, 2.4.0
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Minor
>         Attachments: HIVE-18422.01.patch, HIVE-18422.02.patch
>
>
> HIVE-17534 introduced a config that makes it possible to exclude certain 
> input formats from vectorized execution without affecting other input formats. 
> If an input format is excluded and row.serde is enabled at the same time, the 
> vectorizer still sets {{useVectorizedInputFormat}} to true, which causes 
> vectorized readers to be used in row.serde mode.
> To reproduce:
> {noformat}
> set hive.fetch.task.conversion=none;
> set hive.vectorized.use.row.serde.deserialize=true;
> set hive.vectorized.use.vector.serde.deserialize=true;
> set hive.vectorized.execution.enabled=true;
> set hive.vectorized.execution.reduce.enabled=true;
> set hive.vectorized.row.serde.inputformat.excludes=;
> -- SORT_QUERY_RESULTS
> -- exclude MapredParquetInputFormat from vectorization, this should cause mapwork vectorization to be disabled
> set hive.vectorized.input.format.excludes=org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat,org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
> set hive.vectorized.use.vectorized.input.format=true;
> create table orcTbl (t1 tinyint, t2 tinyint)
> stored as orc;
> insert into orcTbl values (54, 9), (-104, 25), (-112, 24);
> explain vectorization select t1, t2, (t1+t2) from orcTbl where (t1+t2) > 10;
> select t1, t2, (t1+t2) from orcTbl where (t1+t2) > 10;
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
