[ https://issues.apache.org/jira/browse/HIVE-18422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16330611#comment-16330611 ]
Matt McCline edited comment on HIVE-18422 at 1/18/18 3:18 PM:
--------------------------------------------------------------

I'm looking at it – it doesn't seem quite right.

There is a precedence order in evaluating the three variables that control vectorization of input formats:

1. hive.vectorized.use.vectorized.input.format
2. hive.vectorized.use.vector.serde.deserialize
3. hive.vectorized.use.row.serde.deserialize

If #1 is true and the input format is assignable to VectorizedInputFormatInterface, then we vectorize.

Otherwise, look at #2. If #2 is true and the input format is TextInputFormat or SequenceFileInputFormat, then vectorize using vector serde.

Finally, look at #3. If #3 is true and the input format is not excluded, then vectorize using row serde.

So, it seems like what is missing in the repro steps is setting hive.vectorized.use.vectorized.input.format to false.

In that way, if the customer does have hive.vectorized.use.vectorized.input.format set to true, then it will ignore the exclude, since that only applies to vectorizing with row serde, and it will vectorize the vertex.

And the change in this JIRA is not needed, except perhaps for fixing a Q file.

> Vectorized input format should not be used when input format is excluded and row.serde is enabled
> -------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-18422
>                 URL: https://issues.apache.org/jira/browse/HIVE-18422
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>   Affects Versions: 3.0.0, 2.4.0
>            Reporter: Vihang Karajgaonkar
>            Assignee: Vihang Karajgaonkar
>            Priority: Minor
>         Attachments: HIVE-18422.01.patch, HIVE-18422.02.patch
>
>
> HIVE-17534 introduced a config that makes it possible to exclude certain
> input formats from vectorized execution without affecting other input formats.
> If an input format is excluded and row.serde is enabled at the same time, the
> vectorizer still sets {{useVectorizedInputFormat}} to true, which causes
> vectorized readers to be used in row.serde mode.
> In order to reproduce:
> {noformat}
> set hive.fetch.task.conversion=none;
> set hive.vectorized.use.row.serde.deserialize=true;
> set hive.vectorized.use.vector.serde.deserialize=true;
> set hive.vectorized.execution.enabled=true;
> set hive.vectorized.execution.reduce.enabled=true;
> set hive.vectorized.row.serde.inputformat.excludes=;
> -- SORT_QUERY_RESULTS
> -- exclude MapredParquetInputFormat from vectorization, this should cause mapwork vectorization to be disabled
> set hive.vectorized.input.format.excludes=org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat,org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
> set hive.vectorized.use.vectorized.input.format=true;
> create table orcTbl (t1 tinyint, t2 tinyint)
> stored as orc;
> insert into orcTbl values (54, 9), (-104, 25), (-112, 24);
> explain vectorization select t1, t2, (t1+t2) from orcTbl where (t1+t2) > 10;
> select t1, t2, (t1+t2) from orcTbl where (t1+t2) > 10;
> {noformat}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
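
For reference, the evaluation order described in the comment above can be sketched as a small decision function. This is only an illustration of the three steps as the comment states them, not Hive's actual Vectorizer code: the function name `choose_mode`, the `Mode` enum, the short class-name strings, and the `implements_vectorized_interface` flag (standing in for the VectorizedInputFormatInterface check) are all hypothetical. It also assumes a single exclude set, as the comment does.

```python
from enum import Enum


class Mode(Enum):
    VECTORIZED_INPUT_FORMAT = "vectorized input format"
    VECTOR_SERDE = "vector serde deserialize"
    ROW_SERDE = "row serde deserialize"
    NOT_VECTORIZED = "not vectorized"


# Hypothetical short names standing in for the Java classes named in the comment.
TEXT_OR_SEQUENCE_FILE = {"TextInputFormat", "SequenceFileInputFormat"}


def choose_mode(conf, input_format, implements_vectorized_interface, excludes):
    """Walk the three settings in the order given in the comment."""
    # Step 1: the vectorized input format wins if enabled and supported.
    # Note that the exclude list is not consulted here.
    if (conf.get("hive.vectorized.use.vectorized.input.format")
            and implements_vectorized_interface):
        return Mode.VECTORIZED_INPUT_FORMAT
    # Step 2: vector serde applies only to text and sequence files.
    if (conf.get("hive.vectorized.use.vector.serde.deserialize")
            and input_format in TEXT_OR_SEQUENCE_FILE):
        return Mode.VECTOR_SERDE
    # Step 3: row serde is the only step that honors the exclude list.
    if (conf.get("hive.vectorized.use.row.serde.deserialize")
            and input_format not in excludes):
        return Mode.ROW_SERDE
    return Mode.NOT_VECTORIZED


# Settings mirroring the repro steps (all three flags true, ORC excluded).
repro = {
    "hive.vectorized.use.vectorized.input.format": True,
    "hive.vectorized.use.vector.serde.deserialize": True,
    "hive.vectorized.use.row.serde.deserialize": True,
}
excludes = {"OrcInputFormat", "MapredParquetInputFormat"}

as_reported = choose_mode(repro, "OrcInputFormat", True, excludes)

# Same settings, but with setting 1 switched off as the comment suggests.
suggested = dict(repro)
suggested["hive.vectorized.use.vectorized.input.format"] = False
with_fix = choose_mode(suggested, "OrcInputFormat", True, excludes)

print(as_reported, with_fix)
```

Under this model, the repro settings pick the vectorized input format at step 1 and never reach the exclude check. With setting 1 switched off, the ORC table falls through to step 3 and is left unvectorized because it is excluded.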