[jira] [Updated] (HIVE-6642) Query fails to vectorize when a non string partition column is part of the query expression

Hari Sankar Sivarama Subramaniyan (JIRA) Wed, 26 Mar 2014 19:15:23 -0700

     [ 
https://issues.apache.org/jira/browse/HIVE-6642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Hari Sankar Sivarama Subramaniyan updated HIVE-6642:
----------------------------------------------------

    Fix Version/s: 0.13.0

> Query fails to vectorize when a non string partition column is part of the 
> query expression
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-6642
>                 URL: https://issues.apache.org/jira/browse/HIVE-6642
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Hari Sankar Sivarama Subramaniyan
>            Assignee: Hari Sankar Sivarama Subramaniyan
>             Fix For: 0.13.0
>
>         Attachments: HIVE-6642-2.patch, HIVE-6642-3.patch, HIVE-6642-4.patch, 
> HIVE-6642.1.patch
>
>
> drop table if exists alltypesorc_part;
> CREATE TABLE alltypesorc_part (
> ctinyint tinyint,
> csmallint smallint,
> cint int,
> cbigint bigint,
> cfloat float,
> cdouble double,
> cstring1 string,
> cstring2 string,
> ctimestamp1 timestamp,
> ctimestamp2 timestamp,
> cboolean1 boolean,
> cboolean2 boolean) partitioned by (ds int) STORED AS ORC;
> insert overwrite table alltypesorc_part partition (ds=2011) select * from 
> alltypesorc limit 100;
> insert overwrite table alltypesorc_part partition (ds=2012) select * from 
> alltypesorc limit 200;
> explain select *
> from (select ds from alltypesorc_part) t1,
>      alltypesorc t2
> where t1.ds = t2.cint
> order by t2.ctimestamp1
> limit 100;
> The above query fails to vectorize because the vectorizer treats the 
> partition column ds, returned by (select ds from alltypesorc_part) t1, as a 
> string column, while the join equality compares it with the int column 
> t2.cint. The correct output when vectorization is turned on should be:
> STAGE DEPENDENCIES:
>   Stage-5 is a root stage
>   Stage-2 depends on stages: Stage-5
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-5
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         t1:alltypesorc_part
>           Fetch Operator
>             limit: -1
>       Alias -> Map Local Operator Tree:
>         t1:alltypesorc_part
>           TableScan
>             alias: alltypesorc_part
>             Statistics: Num rows: 300 Data size: 62328 Basic stats: COMPLETE 
> Column stats: COMPLETE
>             Select Operator
>               expressions: ds (type: int)
>               outputColumnNames: _col0
>               Statistics: Num rows: 300 Data size: 1200 Basic stats: COMPLETE 
> Column stats: COMPLETE
>               HashTable Sink Operator
>                 condition expressions:
>                   0 {_col0}
>                   1 {ctinyint} {csmallint} {cint} {cbigint} {cfloat} 
> {cdouble} {cstring1} {cstring2} {ctimestamp1} {ctimestamp2} {cboolean1} 
> {cboolean2}
>                 keys:
>                   0 _col0 (type: int)
>                   1 cint (type: int)
>   Stage: Stage-2
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: t2
>             Statistics: Num rows: 3536 Data size: 1131711 Basic stats: 
> COMPLETE Column stats: NONE
>             Map Join Operator
>               condition map:
>                    Inner Join 0 to 1
>               condition expressions:
>                 0 {_col0}
>                 1 {ctinyint} {csmallint} {cint} {cbigint} {cfloat} {cdouble} 
> {cstring1} {cstring2} {ctimestamp1} {ctimestamp2} {cboolean1} {cboolean2}
>               keys:
>                 0 _col0 (type: int)
>                 1 cint (type: int)
>               outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, 
> _col6, _col7, _col8, _col9, _col10, _col11, _col12
>               Statistics: Num rows: 3889 Data size: 1244882 Basic stats: 
> COMPLETE Column stats: NONE
>               Filter Operator
>                 predicate: (_col0 = _col3) (type: boolean)
>                 Statistics: Num rows: 1944 Data size: 622280 Basic stats: 
> COMPLETE Column stats: NONE
>                 Select Operator
>                   expressions: _col0 (type: int), _col1 (type: tinyint), 
> _col2 (type: smallint), _col3 (type: int), _col4 (type: bigint), _col5 (type: 
> float), _col6 (type: double), _col7 (type: string), _col8 (type: string), 
> _col9 (type: timestamp), _col10 (type: timestamp), _col11 (type: boolean), 
> _col12 (type: boolean)
>                   outputColumnNames: _col0, _col1, _col2, _col3, _col4, 
> _col5, _col6, _col7, _col8, _col9, _col10, _col11, _col12
>                   Statistics: Num rows: 1944 Data size: 622280 Basic stats: 
> COMPLETE Column stats: NONE
>                   Reduce Output Operator
>                     key expressions: _col9 (type: timestamp)
>                     sort order: +
>                     Statistics: Num rows: 1944 Data size: 622280 Basic stats: 
> COMPLETE Column stats: NONE
>                     value expressions: _col0 (type: int), _col1 (type: 
> tinyint), _col2 (type: smallint), _col3 (type: int), _col4 (type: bigint), 
> _col5 (type: float), _col6 (type: double), _col7 (type: string), _col8 (type: 
> string), _col9 (type: timestamp), _col10 (type: timestamp), _col11 (type: 
> boolean), _col12 (type: boolean)
>       Local Work:
>         Map Reduce Local Work
>       Execution mode: vectorized
>       Reduce Operator Tree:
>         Extract
>           Statistics: Num rows: 1944 Data size: 622280 Basic stats: COMPLETE 
> Column stats: NONE
>           Limit
>             Number of rows: 100
>             Statistics: Num rows: 100 Data size: 32000 Basic stats: COMPLETE 
> Column stats: NONE
>             File Output Operator
>               compressed: false
>               Statistics: Num rows: 100 Data size: 32000 Basic stats: 
> COMPLETE Column stats: NONE
>               table:
>                   input format: org.apache.hadoop.mapred.TextInputFormat
>                   output format: 
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: 100
> With the current code, however, vectorization does not take place because of 
> the following exception:
> 14/03/12 14:43:19 DEBUG vector.VectorizationContext: No vector udf found for 
> GenericUDFOPEqual, descriptor: Argument Count = 2, mode = FILTER, Argument 
> Types = {STRING,LONG}, Input Expression Types = {COLUMN,COLUMN}
> 14/03/12 14:43:19 DEBUG physical.Vectorizer: Failed to vectorize
> org.apache.hadoop.hive.ql.metadata.HiveException: Udf: GenericUDFOPEqual, is 
> not supported
>       at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getGenericUdfVectorExpression(VectorizationContext.java:854)
>       at 
> org.apache.hadoop.hive.ql.exec.vector.VectorizationContext.getVectorExpression(VectorizationContext.java:300)
>       at 
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateExprNodeDesc(Vectorizer.java:682)
>       at 
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateFilterOperator(Vectorizer.java:606)
>       at 
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.validateOperator(Vectorizer.java:537)
>       at 
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$ValidationNodeProcessor.process(Vectorizer.java:367)
>       at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
>       at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
>       at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
>       at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:132)
>       at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
>       at 
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.validateMapWork(Vectorizer.java:314)
>       at 
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.convertMapWork(Vectorizer.java:283)
>       at 
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer$VectorizationDispatcher.dispatch(Vectorizer.java:270)
>       at 
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.dispatch(TaskGraphWalker.java:111)
>       at 
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.walk(TaskGraphWalker.java:194)
>       at 
> org.apache.hadoop.hive.ql.lib.TaskGraphWalker.startWalking(TaskGraphWalker.java:139)
>       at 
> org.apache.hadoop.hive.ql.optimizer.physical.Vectorizer.resolve(Vectorizer.java:519)
>       at 
> org.apache.hadoop.hive.ql.optimizer.physical.PhysicalOptimizer.optimize(PhysicalOptimizer.java:100)
>       at 
> org.apache.hadoop.hive.ql.parse.MapReduceCompiler.optimizeTaskPlan(MapReduceCompiler.java:290)
>       at 
> org.apache.hadoop.hive.ql.parse.TaskCompiler.compile(TaskCompiler.java:216)
>       at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9286)
>       at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>       at 
> org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:64)
>       at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:327)
>       at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:398)
>       at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:294)
>       at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:948)
>       at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:996)
>       at org.apache.hadoop.hive.ql.Driver.run(Driver.java:884)
>       at org.apache.hadoop.hive.ql.Driver.run(Driver.java:874)
>       at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:268)
>       at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:220)
>       at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
>       at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:359)
>       at 
> org.apache.hadoop.hive.cli.CliDriver.processReader(CliDriver.java:457)
>       at org.apache.hadoop.hive.cli.CliDriver.processFile(CliDriver.java:467)
>       at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:125)
>       at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:424)
>       at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:793)
>       at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:687)
>       at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
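
To check whether a given build is affected, one approach is to enable vectorized execution before running the EXPLAIN from the description and then look for the "Execution mode: vectorized" line under Stage-2. The snippet below is a minimal sketch that assumes the alltypesorc and alltypesorc_part tables from the description already exist. hive.vectorized.execution.enabled and hive.auto.convert.join are standard Hive settings, but the description does not say which session settings were in effect, so using these two is an assumption.

```sql
-- Assumed session settings; the report does not list the ones actually used.
set hive.vectorized.execution.enabled=true;
set hive.auto.convert.join=true;

-- Same query as in the description: the int partition column ds is compared
-- with the int column cint.
explain select *
from (select ds from alltypesorc_part) t1,
     alltypesorc t2
where t1.ds = t2.cint
order by t2.ctimestamp1
limit 100;
```

If the Stage-2 map work in the resulting plan has no "Execution mode: vectorized" line, the query was not vectorized.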



--
This message was sent by Atlassian JIRA
(v6.2#6252)
