[ https://issues.apache.org/jira/browse/HIVE-6642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Hari Sankar Sivarama Subramaniyan updated HIVE-6642:
----------------------------------------------------
    Fix Version/s: 0.13.0

> Query fails to vectorize when a non string partition column is part of the query expression
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-6642
>                 URL: https://issues.apache.org/jira/browse/HIVE-6642
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Hari Sankar Sivarama Subramaniyan
>            Assignee: Hari Sankar Sivarama Subramaniyan
>             Fix For: 0.13.0
>
>         Attachments: HIVE-6642-2.patch, HIVE-6642-3.patch, HIVE-6642-4.patch, HIVE-6642.1.patch
>
>
> drop table if exists alltypesorc_part;
> CREATE TABLE alltypesorc_part (
> ctinyint tinyint,
> csmallint smallint,
> cint int,
> cbigint bigint,
> cfloat float,
> cdouble double,
> cstring1 string,
> cstring2 string,
> ctimestamp1 timestamp,
> ctimestamp2 timestamp,
> cboolean1 boolean,
> cboolean2 boolean) partitioned by (ds int) STORED AS ORC;
> insert overwrite table alltypesorc_part partition (ds=2011) select * from alltypesorc limit 100;
> insert overwrite table alltypesorc_part partition (ds=2012) select * from alltypesorc limit 200;
> explain select *
> from (select ds from alltypesorc_part) t1,
> alltypesorc t2
> where t1.ds = t2.cint
> order by t2.ctimestamp1
> limit 100;
>
> The above query fails to vectorize because the partition column ds returned by
> (select ds from alltypesorc_part) t1 is treated as a string column during
> vectorization, even though it is declared as int, while the join equality on t2
> is performed on an int column (cint).
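The report does not show the session settings used for the repro. As a minimal sketch, assuming vectorization was switched on through the standard `hive.vectorized.execution.enabled` property (the `set` line is an assumption, not part of the original report), the plan can be requested like this:

```sql
-- Assumption: vectorized execution is enabled per session with this property.
set hive.vectorized.execution.enabled=true;

-- Same query as in the report; only the setting above is added.
explain select *
from (select ds from alltypesorc_part) t1,
     alltypesorc t2
where t1.ds = t2.cint
order by t2.ctimestamp1
limit 100;
```

When the map work has been vectorized, the plan reports `Execution mode: vectorized` for that stage; if the line is missing, the Vectorizer most likely rejected one of the operators.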
> The correct output when vectorization is turned on should be:
>
> STAGE DEPENDENCIES:
>   Stage-5 is a root stage
>   Stage-2 depends on stages: Stage-5
>   Stage-0 is a root stage
>
> STAGE PLANS:
>   Stage: Stage-5
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         t1:alltypesorc_part
>           Fetch Operator
>             limit: -1
>       Alias -> Map Local Operator Tree:
>         t1:alltypesorc_part
>           TableScan
>             alias: alltypesorc_part
>             Statistics: Num rows: 300 Data size: 62328 Basic stats: COMPLETE Column stats: COMPLETE
>             Select Operator
>               expressions: ds (type: int)
>               outputColumnNames: _col0
>               Statistics: Num rows: 300 Data size: 1200 Basic stats: COMPLETE Column stats: COMPLETE
>               HashTable Sink Operator
>                 condition expressions:
>                   0 {_col0}
>                   1 {ctinyint} {csmallint} {cint} {cbigint} {cfloat} {cdouble} {cstring1} {cstring2} {ctimestamp1} {ctimestamp2} {cboolean1} {cboolean2}
>                 keys:
>                   0 _col0 (type: int)
>                   1 cint (type: int)
>
>   Stage: Stage-2
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: t2
>             Statistics: Num rows: 3536 Data size: 1131711 Basic stats: COMPLETE Column stats: NONE
>             Map Join Operator
>               condition map:
>                    Inner Join 0 to 1
>               condition expressions:
>                 0 {_col0}
>                 1 {ctinyint} {csmallint} {cint} {cbigint} {cfloat} {cdouble} {cstring1} {cstring2} {ctimestamp1} {ctimestamp2} {cboolean1} {cboolean2}
>               keys:
>                 0 _col0 (type: int)
>                 1 cint (type: int)
>               outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12
>               Statistics: Num rows: 3889 Data size: 1244882 Basic stats: COMPLETE Column stats: NONE
>               Filter Operator
>                 predicate: (_col0 = _col3) (type: boolean)
>                 Statistics: Num rows: 1944 Data size: 622280 Basic stats: COMPLETE Column stats: NONE
>                 Select Operator
>                   expressions: _col0 (type: int), _col1 (type: tinyint), _col2 (type: smallint), _col3 (type: int), _col4 (type: bigint), _col5 (type: float), _col6 (type: double), _col7 (type: string), _col8 (type: string), _col9 (type: timestamp), _col10 (type: timestamp), _col11 (type: boolean), _col12 (type: boolean)
>                   outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12
>                   Statistics: Num rows: 1944 Data size: 622280 Basic stats: COMPLETE Column stats: NONE
>                   Reduce Output Operator
>                     key expressions: _col9 (type: timestamp)
>                     sort order: +
>                     Statistics: Num rows: 1944 Data size: 622280 Basic stats: COMPLETE Column stats: NONE
>                     value expressions: _col0 (type: int), _col1 (type: tinyint), _col2 (type: smallint), _col3 (type: int), _col4 (type: bigint), _col5 (type: float), _col6 (type: double), _col7 (type: string), _col8 (type: string), _col9 (type: timestamp), _col10 (type: timestamp), _col11 (type: boolean), _col12 (type: boolean)
>       Local Work:
>         Map Reduce Local Work
>       Execution mode: vectorized
>       Reduce Operator Tree:
>         Extract
>           Statistics: Num rows: 1944 Data size: 622280 Basic stats: COMPLETE Column stats: NONE
>           Limit
>             Number of rows: 100
>             Statistics: Num rows: 100 Data size: 32000 Basic stats: COMPLETE Column stats: NONE
>             File Output Operator
>               compressed: false
>               Statistics: Num rows: 100 Data size: 32000 Basic stats: COMPLETE Column stats: NONE
>               table:
>                   input format: org.apache.hadoop.mapred.TextInputFormat
>                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>
>   Stage: Stage-0
>     Fetch Operator
>       limit: 100
>
> whereas with the current code, vectorization fails to take place because of
> the following exception:
>
> 14/03/12 14:43:19 DEBUG vector.VectorizationContext: No vector udf found for GenericUDFOPEqual, descriptor: Argument Count = 2, mode = FILTER, Argument Types = {STRING,LONG}, Input Expression Types = {COLUMN,COLUMN}
> 14/03/12 14:43:19 DEBUG physical.Vectorizer: Failed to vectorize
> org.apache.hadoop.hive.ql.metadata.HiveException: Udf: GenericUDFOPEqual, is not supported
>   at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:854)
>   at org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:300)
>   at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateExprNodeDesc(Vectorizer.java:682)
>   at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateFilterOperator(Vectorizer.java:606)
>   at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateOperator(Vectorizer.java:537)
>   at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$ValidationNodeProcessor.process(Vectorizer.java:367)
>   at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
>   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
>   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132)
>   at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
>   at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateMapWork(Vectorizer.java:314)
>   at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(Vectorizer.java:283)
>   at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Vectorizer.java:270)
>   at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
>   at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194)
>   at org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139)
>   at org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(Vectorizer.java:519)
>   at org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:100)
>   at org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:290)
>   at org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:216)
>   at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9286)
>   at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>   at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64)
>   at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:398)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:294)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:948)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:996)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:884)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:874)
>   at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:359)
>   at org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:457)
>   at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:467)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:125)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
>   at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:687)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:160)

--
This message was sent by Atlassian JIRA (v6.2#6252)