[jira] [Updated] (HIVE-12369) Native Vector GroupBy
[ https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-12369: Description: Implement Native Vector GroupBy using fast hash table technology developed for Native Vector MapJoin and vector key handling developed for recent HIVE-12290 Native Vector ReduceSink JIRA. (was: Implement fast Vector GroupBy using fast hash table technology developed for Native Vector MapJoin and vector key handling developed for recent HIVE-12290 Native Vector ReduceSink JIRA. (Patch also includes making Native Vector MapJoin use Hybrid Grace -- but that can be separated out)) > Native Vector GroupBy > - > > Key: HIVE-12369 > URL: https://issues.apache.org/jira/browse/HIVE-12369 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch > > > Implement Native Vector GroupBy using fast hash table technology developed > for Native Vector MapJoin and vector key handling developed for recent > HIVE-12290 Native Vector ReduceSink JIRA. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-12369) Native Vector GroupBy
[ https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-12369: Description: Implement Native Vector GroupBy using fast hash table technology developed for Native Vector MapJoin, etc. (was: Implement Native Vector GroupBy using fast hash table technology developed for Native Vector MapJoin and vector key handling developed for recent HIVE-12290 Native Vector ReduceSink JIRA.) > Native Vector GroupBy > - > > Key: HIVE-12369 > URL: https://issues.apache.org/jira/browse/HIVE-12369 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch > > > Implement Native Vector GroupBy using fast hash table technology developed > for Native Vector MapJoin, etc.
[jira] [Updated] (HIVE-12369) Native Vector GroupBy
[ https://issues.apache.org/jira/browse/HIVE-12369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-12369: Summary: Native Vector GroupBy (was: Faster Vector GroupBy) > Native Vector GroupBy > - > > Key: HIVE-12369 > URL: https://issues.apache.org/jira/browse/HIVE-12369 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-12369.01.patch, HIVE-12369.02.patch > > > Implement fast Vector GroupBy using fast hash table technology developed for > Native Vector MapJoin and vector key handling developed for recent HIVE-12290 > Native Vector ReduceSink JIRA. > (Patch also includes making Native Vector MapJoin use Hybrid Grace -- but > that can be separated out)
[jira] [Commented] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099513#comment-16099513 ] Lefty Leverenz commented on HIVE-16998: --- I suggest adding a second sentence to the parameter description: "If hive.spark.dynamic.partition.pruning is set to false, this parameter value is ignored." > Add config to enable HoS DPP only for map-joins > --- > > Key: HIVE-16998 > URL: https://issues.apache.org/jira/browse/HIVE-16998 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer, Spark >Reporter: Sahil Takiar >Assignee: Janaki Lahorani > Attachments: HIVE16998.1.patch > > > HoS DPP will split a given operator tree in two under the following > conditions: it has detected that the query can benefit from DPP, and the > filter is not a map-join (see SplitOpTreeForDPP). > This can hurt performance if the non-partitioned side of the join > involves a complex operator tree - e.g. the query {{select count(*) from > srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all > select min(srcpart.ds) from srcpart)}} will require running the subquery > twice, once in each Spark job. > Queries with map-joins don't get split into two operator trees and thus don't > suffer from this drawback. Thus, it would be nice to have a config key that > just enables DPP on HoS for map-joins.
[jira] [Commented] (HIVE-15758) Allow correlated scalar subqueries with aggregates which has non-equi join predicates
[ https://issues.apache.org/jira/browse/HIVE-15758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099510#comment-16099510 ] Lefty Leverenz commented on HIVE-15758: --- I added a TODOC3.0 label but I'm not sure where this should be documented -- subqueries, joins, or aggregates? * [Subqueries | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries] * [Joins | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins] * [UDAFs | https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-Built-inAggregateFunctions(UDAF)] > Allow correlated scalar subqueries with aggregates which has non-equi join > predicates > - > > Key: HIVE-15758 > URL: https://issues.apache.org/jira/browse/HIVE-15758 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer >Reporter: Vineet Garg >Assignee: Vineet Garg > Labels: TODOC3.0, sub-query > Fix For: 3.0.0 > > Attachments: HIVE-15758.1.patch, HIVE-15758.2.patch, > HIVE-15758.3.patch > > > Queries such as > {code} select * from part where p_size <> (select count(p_size) from part pp > where part.p_type <> pp.p_type); {code} are currently not allowed since Hive > doesn't know how to rewrite such queries to preserve correctness for > cases when there are zero rows
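A note on the zero-row case mentioned in the HIVE-15758 description above: the sketch below is one reading of the problem, not code from the patch. It assumes the difficulty is the usual one with count() in a correlated subquery, where the subquery returns 0 when nothing matches, while a plain join plus group-by rewrite produces no group at all for that outer row. The class name, the in-memory Part rows, and the sample data are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hypothetical in-memory stand-in for the `part` table.
class CountBugSketch {
    static final class Part {
        final String pType;
        final int pSize;
        Part(String pType, int pSize) { this.pType = pType; this.pSize = pSize; }
    }

    // Made-up sample data: every row has the same p_type, so the predicate
    // part.p_type <> pp.p_type matches zero rows for each outer row.
    static List<Part> sample() {
        List<Part> rows = new ArrayList<>();
        rows.add(new Part("A", 1));
        rows.add(new Part("A", 0));
        return rows;
    }

    // Reference semantics: evaluate the correlated subquery once per outer row.
    // count() over zero matching rows is 0, so the outer row is still compared.
    static List<Part> correlated(List<Part> part) {
        List<Part> out = new ArrayList<>();
        for (Part p : part) {
            int count = 0;
            for (Part pp : part) {
                if (!p.pType.equals(pp.pType)) count++;
            }
            if (p.pSize != count) out.add(p);
        }
        return out;
    }

    // Naive rewrite (assumed here for illustration): inner join on the non-equi
    // predicate, then group by the outer row. An outer row with no join partner
    // forms no group, so it disappears before the comparison is made.
    static List<Part> naiveJoinRewrite(List<Part> part) {
        List<Part> out = new ArrayList<>();
        for (Part p : part) {
            int count = 0;
            boolean hasGroup = false;
            for (Part pp : part) {
                if (!p.pType.equals(pp.pType)) {
                    hasGroup = true;
                    count++;
                }
            }
            if (hasGroup && p.pSize != count) out.add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Part> part = sample();
        System.out.println("correlated: " + correlated(part).size() + " row(s)");
        System.out.println("naive join rewrite: " + naiveJoinRewrite(part).size() + " row(s)");
    }
}
```

With this sample data the per-row evaluation keeps the row with p_size = 1 (since 1 <> 0), while the naive rewrite returns nothing, which is the kind of mismatch a correct rewrite has to avoid.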
[jira] [Work started] (HIVE-16791) Tez engine giving inaccurate results on SMB Map joins while map-join and shuffle join gets correct results
[ https://issues.apache.org/jira/browse/HIVE-16791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-16791 started by Deepak Jaiswal. - > Tez engine giving inaccurate results on SMB Map joins while map-join and > shuffle join gets correct results > -- > > Key: HIVE-16791 > URL: https://issues.apache.org/jira/browse/HIVE-16791 > Project: Hive > Issue Type: Bug > Components: Hive, HiveServer2 >Reporter: Saumil Mayani >Assignee: Deepak Jaiswal > Attachments: sample-data-query.txt, sample-data.tar.gz-aa, > sample-data.tar.gz-ab, sample-data.tar.gz-ac, sample-data.tar.gz-ad > > > SMB Join gives incorrect results. > {code} > SMB-Join > set hive.execution.engine=tez; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=50; > OK > 2016 1 11999639 > 2016 2 18955110 > 2017 2 22217437 > Time taken: 92.647 seconds, Fetched: 3 row(s) > {code} > {code} > MAP-JOIN > set hive.execution.engine=tez; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=5000; > OK > 2016 1 26586093 > 2016 2 17724062 > 2017 2 8862031 > Time taken: 17.49 seconds, Fetched: 3 row(s) > {code} > {code} > Shuffle Join > set hive.execution.engine=tez; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=false; > set hive.auto.convert.join=false; > set hive.auto.convert.join.noconditionaltask.size=5000; > OK > 2016 1 26586093 > 2016 2 17724062 > 2017 2 8862031 > Time taken: 38.575 seconds, Fetched: 3 row(s) > {code}
[jira] [Updated] (HIVE-16791) Tez engine giving inaccurate results on SMB Map joins while map-join and shuffle join gets correct results
[ https://issues.apache.org/jira/browse/HIVE-16791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-16791: -- Status: Open (was: Patch Available) > Tez engine giving inaccurate results on SMB Map joins while map-join and > shuffle join gets correct results > -- > > Key: HIVE-16791 > URL: https://issues.apache.org/jira/browse/HIVE-16791 > Project: Hive > Issue Type: Bug > Components: Hive, HiveServer2 >Reporter: Saumil Mayani >Assignee: Deepak Jaiswal > Attachments: sample-data-query.txt, sample-data.tar.gz-aa, > sample-data.tar.gz-ab, sample-data.tar.gz-ac, sample-data.tar.gz-ad > > > SMB Join gives incorrect results. > {code} > SMB-Join > set hive.execution.engine=tez; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=50; > OK > 2016 1 11999639 > 2016 2 18955110 > 2017 2 22217437 > Time taken: 92.647 seconds, Fetched: 3 row(s) > {code} > {code} > MAP-JOIN > set hive.execution.engine=tez; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=5000; > OK > 2016 1 26586093 > 2016 2 17724062 > 2017 2 8862031 > Time taken: 17.49 seconds, Fetched: 3 row(s) > {code} > {code} > Shuffle Join > set hive.execution.engine=tez; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=false; > set hive.auto.convert.join=false; > set hive.auto.convert.join.noconditionaltask.size=5000; > OK > 2016 1 26586093 > 2016 2 17724062 > 2017 2 8862031 > Time taken: 38.575 seconds, Fetched: 
3 row(s) > {code}
[jira] [Updated] (HIVE-15758) Allow correlated scalar subqueries with aggregates which has non-equi join predicates
[ https://issues.apache.org/jira/browse/HIVE-15758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-15758: -- Labels: TODOC3.0 sub-query (was: sub-query) > Allow correlated scalar subqueries with aggregates which has non-equi join > predicates > - > > Key: HIVE-15758 > URL: https://issues.apache.org/jira/browse/HIVE-15758 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer >Reporter: Vineet Garg >Assignee: Vineet Garg > Labels: TODOC3.0, sub-query > Fix For: 3.0.0 > > Attachments: HIVE-15758.1.patch, HIVE-15758.2.patch, > HIVE-15758.3.patch > > > Queries such as > {code} select * from part where p_size <> (select count(p_size) from part pp > where part.p_type <> pp.p_type); {code} are currently not allowed since Hive > doesn't know how to rewrite such queries to preserve correctness for > cases when there are zero rows
[jira] [Commented] (HIVE-16791) Tez engine giving inaccurate results on SMB Map joins while map-join and shuffle join gets correct results
[ https://issues.apache.org/jira/browse/HIVE-16791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099486#comment-16099486 ] Hive QA commented on HIVE-16791: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12871673/sample-data-query.txt {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6124/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6124/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6124/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2017-07-25 04:40:11.356 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-6124/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! 
-d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2017-07-25 04:40:11.359 + cd apache-github-source-source + git fetch origin >From https://github.com/apache/hive 9a85331..88da238 master -> origin/master 2c7c92a..c37fdf9 branch-2 -> origin/branch-2 + git reset --hard HEAD HEAD is now at 9a85331 HIVE-17114: HoS: Possible skew in shuffling when data is not really skewed (Rui reviewed by Chao) + git clean -f -d Removing ql/src/test/queries/clientpositive/smb_join1.q Removing ql/src/test/results/clientpositive/llap/smb_join1.q.out + git checkout master Already on 'master' Your branch is behind 'origin/master' by 4 commits, and can be fast-forwarded. (use "git pull" to update your local branch) + git reset --hard origin/master HEAD is now at 88da238 HIVE-16222 : add a setting to disable row.serde for specific formats; enable for others (Sergey Shelukhin, reviewed by Matt McCline) + git merge --ff-only origin/master Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2017-07-25 04:40:18.375 + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch patch: Only garbage was found in the patch input. patch: Only garbage was found in the patch input. patch: Only garbage was found in the patch input. fatal: unrecognized input The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. 
ATTACHMENT ID: 12871673 - PreCommit-HIVE-Build > Tez engine giving inaccurate results on SMB Map joins while map-join and > shuffle join gets correct results > -- > > Key: HIVE-16791 > URL: https://issues.apache.org/jira/browse/HIVE-16791 > Project: Hive > Issue Type: Bug > Components: Hive, HiveServer2 >Reporter: Saumil Mayani >Assignee: Deepak Jaiswal > Attachments: sample-data-query.txt, sample-data.tar.gz-aa, > sample-data.tar.gz-ab, sample-data.tar.gz-ac, sample-data.tar.gz-ad > > > SMB Join gives incorrect results. > {code} > SMB-Join > set hive.execution.engine=tez; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=50; > OK > 2016 1 11999639 > 2016 2 18955110 > 2017 2 22217437 > Time taken: 92.647 seconds, Fetched: 3 row(s) > {code} > {code} > MAP-JOIN > set hive.execution.engine=tez; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set
[jira] [Commented] (HIVE-17087) Remove unnecessary HoS DPP trees during map-join conversion
[ https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099485#comment-16099485 ] Sahil Takiar commented on HIVE-17087: - Hey [~kellyzly]. Yes, opRules is a LinkedHashMap, so they should be run in order. After looking at HIVE-10559 in more detail, I think there is a simpler fix that can be made to avoid the NPE. I've left some comments in RB about it. > Remove unnecessary HoS DPP trees during map-join conversion > --- > > Key: HIVE-17087 > URL: https://issues.apache.org/jira/browse/HIVE-17087 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17087.1.patch, HIVE-17087.2.patch, > HIVE-17087.3.patch > > > Ran the following query in the {{TestSparkCliDriver}}: > {code:sql} > set hive.spark.dynamic.partition.pruning=true; > set hive.auto.convert.join=true; > create table partitioned_table1 (col int) partitioned by (part_col int); > create table partitioned_table2 (col int) partitioned by (part_col int); > create table regular_table (col int); > insert into table regular_table values (1); > alter table partitioned_table1 add partition (part_col = 1); > insert into table partitioned_table1 partition (part_col = 1) values (1), > (2), (3), (4), (5), (6), (7), (8), (9), (10); > alter table partitioned_table2 add partition (part_col = 1); > insert into table partitioned_table2 partition (part_col = 1) values (1), > (2), (3), (4), (5), (6), (7), (8), (9), (10); > explain select * from partitioned_table1, partitioned_table2 where > partitioned_table1.part_col = partitioned_table2.part_col; > {code} > and got the following explain plan: > {code} > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-3 depends on stages: Stage-2 > Stage-1 depends on stages: Stage-3 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 > Spark > A masked pattern was here > Vertices: > Map 3 > Map Operator Tree: > TableScan > alias: 
partitioned_table1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: _col1 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Spark Partition Pruning Sink Operator > partition key expr: part_col > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > target column name: part_col > target work: Map 2 > Stage: Stage-3 > Spark > A masked pattern was here > Vertices: > Map 2 > Map Operator Tree: > TableScan > alias: partitioned_table2 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Spark HashTable Sink Operator > keys: > 0 _col1 (type: int) > 1 _col1 (type: int) > Local Work: > Map Reduce Local Work > Stage: Stage-1 > Spark > A masked pattern was here > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: partitioned_table1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Map Join Operator >
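On the point in the HIVE-17087 comment above that opRules is a LinkedHashMap and so the rules should run in order: the sketch below only demonstrates the java.util.LinkedHashMap guarantee that iteration follows insertion order. The class, the method, and the rule names "R3", "R1", "R2" are hypothetical and are not taken from the Hive source.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: shows that a LinkedHashMap iterates in insertion order,
// which is the property the comment relies on for rule ordering.
class RuleOrderSketch {
    // Registers the given (hypothetical) rule names in order and returns the
    // order in which iterating over the map would visit them.
    static List<String> iterationOrder(String... ruleNames) {
        Map<String, String> opRules = new LinkedHashMap<>();
        for (String name : ruleNames) {
            opRules.put(name, "processor for " + name);
        }
        return new ArrayList<>(opRules.keySet());
    }

    public static void main(String[] args) {
        List<String> order = iterationOrder("R3", "R1", "R2");
        // The names come back in the order they were inserted, not sorted.
        System.out.println(order.equals(Arrays.asList("R3", "R1", "R2")));
    }
}
```

A plain HashMap makes no such ordering guarantee, so the argument about rules running in order depends on the map type staying a LinkedHashMap.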
[jira] [Commented] (HIVE-12878) Support Vectorization for TEXTFILE and other formats
[ https://issues.apache.org/jira/browse/HIVE-12878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099480#comment-16099480 ] Lefty Leverenz commented on HIVE-12878: --- Doc note: HIVE-16222 changes the default value of *hive.vectorized.use.row.serde.deserialize* to true in release 3.0.0. > Support Vectorization for TEXTFILE and other formats > > > Key: HIVE-12878 > URL: https://issues.apache.org/jira/browse/HIVE-12878 > Project: Hive > Issue Type: New Feature > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Fix For: 2.1.0 > > Attachments: HIVE-12878.01.patch, HIVE-12878.02.patch, > HIVE-12878.03.patch, HIVE-12878.04.patch, HIVE-12878.05.patch, > HIVE-12878.06.patch, HIVE-12878.07.patch, HIVE-12878.08.patch, > HIVE-12878.091.patch, HIVE-12878.092.patch, HIVE-12878.093.patch, > HIVE-12878.09.patch > > > Support vectorizing when the input format is TEXTFILE and other formats for > better Map Vertex performance.
[jira] [Commented] (HIVE-16222) add a setting to disable row.serde for specific formats; enable for others
[ https://issues.apache.org/jira/browse/HIVE-16222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099476#comment-16099476 ] Lefty Leverenz commented on HIVE-16222: --- Doc note: This adds *hive.vectorized.row.serde.inputformat.excludes* to HiveConf.java and changes the default value of *hive.vectorized.use.row.serde.deserialize* to true, so the wiki needs to be updated. * [Configuration Properties -- Vectorization | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Vectorization] ** [hive.vectorized.use.row.serde.deserialize | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.vectorized.use.row.serde.deserialize] Added a TODOC3.0 label. (Welcome back, Sergey.) > add a setting to disable row.serde for specific formats; enable for others > -- > > Key: HIVE-16222 > URL: https://issues.apache.org/jira/browse/HIVE-16222 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: TODOC3.0 > Fix For: 3.0.0 > > Attachments: HIVE-16222.01.patch, HIVE-16222.02.patch, > HIVE-16222.03.patch, HIVE-16222.04.patch, HIVE-16222.05.patch, > HIVE-16222.patch > > > Per [~gopalv] > {quote} > row.serde = true ... breaks Parquet (they expect to get the same object back, > which means you can't buffer 1024 rows). > {quote} > We want to enable this and vector.serde for text vectorization. Need to turn > it off for specific formats. >
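To make the HIVE-16222 idea concrete (row.serde on by default, but turned off for specific input formats), here is a rough sketch of how such an excludes check could work. It is not the Hive implementation: the method name, the assumption that the excludes value is a comma-separated list of input format class names, and exact-match comparison are all assumptions made for illustration. Only the two property names come from the doc note above.

```java
// Illustrative only: a hypothetical decision helper, not code from HiveConf
// or the vectorizer.
class RowSerdeExcludesSketch {
    // useRowSerdeDeserialize models hive.vectorized.use.row.serde.deserialize.
    // excludes models hive.vectorized.row.serde.inputformat.excludes, assumed
    // here to be a comma-separated list of input format class names.
    static boolean useRowSerde(boolean useRowSerdeDeserialize,
                               String excludes,
                               String inputFormatClass) {
        if (!useRowSerdeDeserialize) {
            return false; // feature switched off entirely
        }
        for (String name : excludes.split(",")) {
            if (name.trim().equals(inputFormatClass)) {
                return false; // this format is explicitly excluded
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String excludes = "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat";
        // Text input is not on the excludes list, so row serde would be used.
        System.out.println(useRowSerde(true, excludes, "org.apache.hadoop.mapred.TextInputFormat"));
        // Parquet input is on the list, so row serde would be skipped.
        System.out.println(useRowSerde(true, excludes,
                "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat"));
    }
}
```

The Parquet class name is used as the example exclusion because the issue description says row.serde = true breaks Parquet; whether it appears in the shipped default is not stated in this thread.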
[jira] [Updated] (HIVE-16222) add a setting to disable row.serde for specific formats; enable for others
[ https://issues.apache.org/jira/browse/HIVE-16222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lefty Leverenz updated HIVE-16222: -- Labels: TODOC3.0 (was: ) > add a setting to disable row.serde for specific formats; enable for others > -- > > Key: HIVE-16222 > URL: https://issues.apache.org/jira/browse/HIVE-16222 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: TODOC3.0 > Fix For: 3.0.0 > > Attachments: HIVE-16222.01.patch, HIVE-16222.02.patch, > HIVE-16222.03.patch, HIVE-16222.04.patch, HIVE-16222.05.patch, > HIVE-16222.patch > > > Per [~gopalv] > {quote} > row.serde = true ... breaks Parquet (they expect to get the same object back, > which means you can't buffer 1024 rows). > {quote} > We want to enable this and vector.serde for text vectorization. Need to turn > it off for specific formats. >
[jira] [Updated] (HIVE-16948) Invalid explain when running dynamic partition pruning query in HOS
[ https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyunzhang_intel updated HIVE-16948: Description: in [union_subquery.q|https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/spark_dynamic_partition_pruning.q#L107] in spark_dynamic_partition_pruning.q {code} set hive.optimize.ppd=true; set hive.ppd.remove.duplicatefilters=true; set hive.spark.dynamic.partition.pruning=true; set hive.optimize.metadataonly=false; set hive.optimize.index.filter=true; set hive.strict.checks.cartesian.product=false; explain select ds from (select distinct(ds) as ds from srcpart union all select distinct(ds) as ds from srcpart) s where s.ds in (select max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart); {code} explain {code} STAGE DEPENDENCIES: Stage-2 is a root stage Stage-1 depends on stages: Stage-2 Stage-0 depends on stages: Stage-1 STAGE PLANS: Stage: Stage-2 Spark Edges: Reducer 11 <- Map 10 (GROUP, 1) Reducer 13 <- Map 12 (GROUP, 1) DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2 Vertices: Map 10 Map Operator Tree: TableScan alias: srcpart Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL Column stats: NONE Select Operator expressions: ds (type: string) outputColumnNames: ds Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL Column stats: NONE Group By Operator aggregations: max(ds) mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: string) Map 12 Map Operator Tree: TableScan alias: srcpart Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL Column stats: NONE Select Operator expressions: ds (type: string) outputColumnNames: ds Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL Column stats: 
NONE Group By Operator aggregations: min(ds) mode: hash outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE Reduce Output Operator sort order: Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE value expressions: _col0 (type: string) Reducer 11 Reduce Operator Tree: Group By Operator aggregations: max(VALUE._col0) mode: mergepartial outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE Filter Operator predicate: _col0 is not null (type: boolean) Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE Group By Operator keys: _col0 (type: string) mode: hash outputColumnNames: _col0 Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: _col0 (type: string) outputColumnNames: _col0 Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE Group By Operator keys: _col0 (type: string) mode: hash outputColumnNames: _col0 Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE Spark Partition Pruning Sink Operator partition key expr: ds Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE target column name: ds target work: Map 1 Select Operator expressions: _col0 (type: string) outputColumnNames: _col0 Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE Group By
[jira] [Updated] (HIVE-16948) Invalid explain when running dynamic partition pruning query in HOS
[ https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyunzhang_intel updated HIVE-16948: Status: Patch Available (was: Open) > Invalid explain when running dynamic partition pruning query in HOS > --- > > Key: HIVE-16948 > URL: https://issues.apache.org/jira/browse/HIVE-16948 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-16948.patch > > > union_subquery.q > {code} > set hive.optimize.ppd=true; > set hive.ppd.remove.duplicatefilters=true; > set hive.spark.dynamic.partition.pruning=true; > set hive.optimize.metadataonly=false; > set hive.optimize.index.filter=true; > set hive.strict.checks.cartesian.product=false; > explain select ds from (select distinct(ds) as ds from srcpart union all > select distinct(ds) as ds from srcpart) s where s.ds in (select > max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart); > {code} > explain > {code} > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-1 depends on stages: Stage-2 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 > Spark > Edges: > Reducer 11 <- Map 10 (GROUP, 1) > Reducer 13 <- Map 12 (GROUP, 1) > DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2 > Vertices: > Map 10 > Map Operator Tree: > TableScan > alias: srcpart > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Select Operator > expressions: ds (type: string) > outputColumnNames: ds > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Group By Operator > aggregations: max(ds) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col0 (type: string) > Map 12 > Map Operator Tree: > TableScan 
> alias: srcpart > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Select Operator > expressions: ds (type: string) > outputColumnNames: ds > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Group By Operator > aggregations: min(ds) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col0 (type: string) > Reducer 11 > Reduce Operator Tree: > Group By Operator > aggregations: max(VALUE._col0) > mode: mergepartial > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE > Column stats: NONE > Filter Operator > predicate: _col0 is not null (type: boolean) > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: string) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 2 Data size: 368 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: _col0 (type: string) > outputColumnNames: _col0 > Statistics: Num rows: 2 Data size: 368 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: string) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 2 Data size: 368 Basic stats: > COMPLETE Column stats: NONE > Spark Partition Pruning Sink Operator >
[jira] [Updated] (HIVE-16948) Invalid explain when running dynamic partition pruning query in HOS
[ https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] liyunzhang_intel updated HIVE-16948: Attachment: HIVE-16948.patch > Invalid explain when running dynamic partition pruning query in HOS > --- > > Key: HIVE-16948 > URL: https://issues.apache.org/jira/browse/HIVE-16948 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-16948.patch > > > union_subquery.q > {code} > set hive.optimize.ppd=true; > set hive.ppd.remove.duplicatefilters=true; > set hive.spark.dynamic.partition.pruning=true; > set hive.optimize.metadataonly=false; > set hive.optimize.index.filter=true; > set hive.strict.checks.cartesian.product=false; > explain select ds from (select distinct(ds) as ds from srcpart union all > select distinct(ds) as ds from srcpart) s where s.ds in (select > max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart); > {code} > explain > {code} > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-1 depends on stages: Stage-2 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 > Spark > Edges: > Reducer 11 <- Map 10 (GROUP, 1) > Reducer 13 <- Map 12 (GROUP, 1) > DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2 > Vertices: > Map 10 > Map Operator Tree: > TableScan > alias: srcpart > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Select Operator > expressions: ds (type: string) > outputColumnNames: ds > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Group By Operator > aggregations: max(ds) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col0 (type: string) > Map 12 > Map Operator Tree: > TableScan > 
alias: srcpart > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Select Operator > expressions: ds (type: string) > outputColumnNames: ds > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Group By Operator > aggregations: min(ds) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col0 (type: string) > Reducer 11 > Reduce Operator Tree: > Group By Operator > aggregations: max(VALUE._col0) > mode: mergepartial > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE > Column stats: NONE > Filter Operator > predicate: _col0 is not null (type: boolean) > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: string) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 2 Data size: 368 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: _col0 (type: string) > outputColumnNames: _col0 > Statistics: Num rows: 2 Data size: 368 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: string) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 2 Data size: 368 Basic stats: > COMPLETE Column stats: NONE > Spark Partition Pruning Sink Operator > partition
[jira] [Commented] (HIVE-16948) Invalid explain when running dynamic partition pruning query in HOS
[ https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099424#comment-16099424 ] liyunzhang_intel commented on HIVE-16948: - The reason Map4 does not appear in the explain output is [CombineEquivalentWorkResolver|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java]. Before the CombineEquivalentWorkResolver optimization is applied, Map4 exists in the explain. CombineEquivalentWorkResolver then finds and combines equivalent works, and Map4 is deleted because it is equivalent to Map1. So we need to remove the Spark dynamic partition pruning sink branch in Reducer 11 and Reducer 13 in Stage-2. [~lirui], [~stakiar], [~csun] please help review, thanks! > Invalid explain when running dynamic partition pruning query in HOS > --- > > Key: HIVE-16948 > URL: https://issues.apache.org/jira/browse/HIVE-16948 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > > union_subquery.q > {code} > set hive.optimize.ppd=true; > set hive.ppd.remove.duplicatefilters=true; > set hive.spark.dynamic.partition.pruning=true; > set hive.optimize.metadataonly=false; > set hive.optimize.index.filter=true; > set hive.strict.checks.cartesian.product=false; > explain select ds from (select distinct(ds) as ds from srcpart union all > select distinct(ds) as ds from srcpart) s where s.ds in (select > max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart); > {code} > explain > {code} > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-1 depends on stages: Stage-2 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 > Spark > Edges: > Reducer 11 <- Map 10 (GROUP, 1) > Reducer 13 <- Map 12 (GROUP, 1) > DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2 > Vertices: > Map 10 > Map Operator Tree: > TableScan > alias: srcpart > Statistics: Num rows: 1 Data size: 23248 Basic 
stats: > PARTIAL Column stats: NONE > Select Operator > expressions: ds (type: string) > outputColumnNames: ds > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Group By Operator > aggregations: max(ds) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col0 (type: string) > Map 12 > Map Operator Tree: > TableScan > alias: srcpart > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Select Operator > expressions: ds (type: string) > outputColumnNames: ds > Statistics: Num rows: 1 Data size: 23248 Basic stats: > PARTIAL Column stats: NONE > Group By Operator > aggregations: min(ds) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > value expressions: _col0 (type: string) > Reducer 11 > Reduce Operator Tree: > Group By Operator > aggregations: max(VALUE._col0) > mode: mergepartial > outputColumnNames: _col0 > Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE > Column stats: NONE > Filter Operator > predicate: _col0 is not null (type: boolean) > Statistics: Num rows: 1 Data size: 184 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: string) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 2 Data size: 368 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions:
[jira] [Updated] (HIVE-17139) Conditional expressions optimization: skip the expression evaluation if the condition is not satisfied for vectorization engine.
[ https://issues.apache.org/jira/browse/HIVE-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ke Jia updated HIVE-17139: -- Attachment: HIVE-17139.2.patch > Conditional expressions optimization: skip the expression evaluation if the > condition is not satisfied for vectorization engine. > > > Key: HIVE-17139 > URL: https://issues.apache.org/jira/browse/HIVE-17139 > Project: Hive > Issue Type: Improvement >Reporter: Ke Jia >Assignee: Ke Jia > Attachments: HIVE-17139.1.patch, HIVE-17139.2.patch > > > Execution of CASE WHEN and IF statements in Hive vectorization is not > optimal: the current implementation evaluates all the conditional and else > expressions. The optimized approach is to update the selected array of the > batch parameter after the conditional expression is executed. The else > expression is then evaluated only for the selected rows instead of all rows. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
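To illustrate the selected-array idea described in HIVE-17139, here is a minimal sketch in plain Java. It does not use Hive's vectorization classes and is not taken from the attached patches; ConditionalSelectedSketch, evaluateIf, applyToSelected and the branchEvaluations counter are hypothetical names made up for this example. The sketch splits the row indices by the outcome of the condition and evaluates each branch expression only on its own rows.

```java
import java.util.function.LongPredicate;
import java.util.function.LongUnaryOperator;

public class ConditionalSelectedSketch {
    /** Counts how many times a branch expression was evaluated (illustration only). */
    static int branchEvaluations = 0;

    /**
     * Evaluates IF(cond, thenExpr, elseExpr) over one column of a batch.
     * Hypothetical helper: the "batch" here is just a long[] column.
     */
    static long[] evaluateIf(long[] col, LongPredicate cond,
                             LongUnaryOperator thenExpr, LongUnaryOperator elseExpr) {
        int n = col.length;
        // Two selected arrays: row indices where the condition held, and where it did not.
        int[] thenSelected = new int[n];
        int[] elseSelected = new int[n];
        int thenSize = 0;
        int elseSize = 0;
        for (int i = 0; i < n; i++) {
            if (cond.test(col[i])) {
                thenSelected[thenSize++] = i;
            } else {
                elseSelected[elseSize++] = i;
            }
        }
        long[] out = new long[n];
        // Each branch only visits the rows listed in its own selected array.
        applyToSelected(col, out, thenSelected, thenSize, thenExpr);
        applyToSelected(col, out, elseSelected, elseSize, elseExpr);
        return out;
    }

    /** Applies expr to the rows listed in selected[0..size) and writes the results in place. */
    static void applyToSelected(long[] in, long[] out, int[] selected, int size,
                                LongUnaryOperator expr) {
        for (int j = 0; j < size; j++) {
            int row = selected[j];
            out[row] = expr.applyAsLong(in[row]);
            branchEvaluations++;
        }
    }
}
```

With four input rows, the two branches together run four times in this sketch, whereas evaluating both branches on every row would take eight evaluations.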
[jira] [Commented] (HIVE-16997) Extend object store to store bit vectors
[ https://issues.apache.org/jira/browse/HIVE-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099386#comment-16099386 ] Ashutosh Chauhan commented on HIVE-16997: - +1 pending test. Let's tackle a) removing base64 encoding and b) displaying only 2 chars of bit vector in desc output in a follow-up. > Extend object store to store bit vectors > > > Key: HIVE-16997 > URL: https://issues.apache.org/jira/browse/HIVE-16997 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-16997.01.patch, HIVE-16997.02.patch, > HIVE-16997.03.patch, HIVE-16997.04.patch, HIVE-16997.05.patch, > HIVE-16997.06.patch > > > This patch includes: (1) a new serde for FMSketch (2) change of schema for > derby and mysql (3) support for date type (4) refactoring the extrapolation > and merge code -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16954) LLAP IO: better debugging
[ https://issues.apache.org/jira/browse/HIVE-16954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099385#comment-16099385 ] Gopal V commented on HIVE-16954: Meant to +1 - this hasn't been perf tested, so will return to this if I find these functions in my profile runs. > LLAP IO: better debugging > - > > Key: HIVE-16954 > URL: https://issues.apache.org/jira/browse/HIVE-16954 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-16954-branch-2.patch, HIVE-16954.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17087) Remove unnecessary HoS DPP trees during map-join conversion
[ https://issues.apache.org/jira/browse/HIVE-17087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-17087: Attachment: HIVE-17087.3.patch > Remove unnecessary HoS DPP trees during map-join conversion > --- > > Key: HIVE-17087 > URL: https://issues.apache.org/jira/browse/HIVE-17087 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17087.1.patch, HIVE-17087.2.patch, > HIVE-17087.3.patch > > > Ran the following query in the {{TestSparkCliDriver}}: > {code:sql} > set hive.spark.dynamic.partition.pruning=true; > set hive.auto.convert.join=true; > create table partitioned_table1 (col int) partitioned by (part_col int); > create table partitioned_table2 (col int) partitioned by (part_col int); > create table regular_table (col int); > insert into table regular_table values (1); > alter table partitioned_table1 add partition (part_col = 1); > insert into table partitioned_table1 partition (part_col = 1) values (1), > (2), (3), (4), (5), (6), (7), (8), (9), (10); > alter table partitioned_table2 add partition (part_col = 1); > insert into table partitioned_table2 partition (part_col = 1) values (1), > (2), (3), (4), (5), (6), (7), (8), (9), (10); > explain select * from partitioned_table1, partitioned_table2 where > partitioned_table1.part_col = partitioned_table2.part_col; > {code} > and got the following explain plan: > {code} > STAGE DEPENDENCIES: > Stage-2 is a root stage > Stage-3 depends on stages: Stage-2 > Stage-1 depends on stages: Stage-3 > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-2 > Spark > A masked pattern was here > Vertices: > Map 3 > Map Operator Tree: > TableScan > alias: partitioned_table1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 
11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: _col1 (type: int) > outputColumnNames: _col0 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Group By Operator > keys: _col0 (type: int) > mode: hash > outputColumnNames: _col0 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Spark Partition Pruning Sink Operator > partition key expr: part_col > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > target column name: part_col > target work: Map 2 > Stage: Stage-3 > Spark > A masked pattern was here > Vertices: > Map 2 > Map Operator Tree: > TableScan > alias: partitioned_table2 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Spark HashTable Sink Operator > keys: > 0 _col1 (type: int) > 1 _col1 (type: int) > Local Work: > Map Reduce Local Work > Stage: Stage-1 > Spark > A masked pattern was here > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: partitioned_table1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Select Operator > expressions: col (type: int), part_col (type: int) > outputColumnNames: _col0, _col1 > Statistics: Num rows: 10 Data size: 11 Basic stats: > COMPLETE Column stats: NONE > Map Join Operator > condition map: >Inner Join 0 to 1 > keys: > 0 _col1 (type: int) > 1 _col1 (type: int) > outputColumnNames:
[jira] [Commented] (HIVE-16954) LLAP IO: better debugging
[ https://issues.apache.org/jira/browse/HIVE-16954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099383#comment-16099383 ] Sergey Shelukhin commented on HIVE-16954: - [~gopalv] ping? > LLAP IO: better debugging > - > > Key: HIVE-16954 > URL: https://issues.apache.org/jira/browse/HIVE-16954 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-16954-branch-2.patch, HIVE-16954.patch > > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17006) LLAP: Parquet caching
[ https://issues.apache.org/jira/browse/HIVE-17006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099361#comment-16099361 ] Sergey Shelukhin commented on HIVE-17006: - Sorry I was on vacation. Will do, after cleaning up the patch and testing more. > LLAP: Parquet caching > - > > Key: HIVE-17006 > URL: https://issues.apache.org/jira/browse/HIVE-17006 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-17006.WIP.patch > > > There are multiple options to do Parquet caching in LLAP: > 1) Full elevator (too intrusive for now). > 2) Page based cache like ORC (requires some changes to Parquet or > copy-pasted). > 3) Cache disk data on column chunk level as is. > Given that Parquet reads at column chunk granularity, (2) is not as useful as > for ORC, but still a good idea. I messaged the dev list about it but didn't > get a response, we may follow up later. > For now, do (3). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16844) Fix Connection leak in ObjectStore when new Conf object is used
[ https://issues.apache.org/jira/browse/HIVE-16844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099357#comment-16099357 ] Sunitha Beeram commented on HIVE-16844: --- [~mithun] Do you have further input on this? > Fix Connection leak in ObjectStore when new Conf object is used > --- > > Key: HIVE-16844 > URL: https://issues.apache.org/jira/browse/HIVE-16844 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Sunitha Beeram >Assignee: Sunitha Beeram > Fix For: 3.0.0 > > Attachments: HIVE-16844.1.patch > > > The code path in ObjectStore.java currently leaks BoneCP (or Hikari) > connection pools when a new configuration object is passed in. The code needs > to ensure that the persistence-factory is closed before it is nullified. > The relevant code is > [here|https://github.com/apache/hive/blob/master/metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java#L290]. > Note that pmf is set to null, but the underlying connection pool is not > closed. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
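The fix described in HIVE-16844, closing the factory before dropping the reference to it, can be sketched roughly as follows. This is not the ObjectStore code and not the attached patch: FakeFactory is a hypothetical stand-in for the persistence manager factory and its connection pool, the configuration is reduced to a string, and the openPools counter exists only to make the leak visible.

```java
import java.util.Objects;

public class CloseBeforeNullSketch {
    /** Hypothetical stand-in for a factory that owns a connection pool. */
    static class FakeFactory implements AutoCloseable {
        /** Number of pools currently open; a leak shows up as a value above 1. */
        static int openPools = 0;
        final String config;

        FakeFactory(String config) {
            this.config = config;
            openPools++;
        }

        @Override
        public void close() {
            openPools--;
        }
    }

    private static FakeFactory pmf;
    private static String currentConfig;

    /** Returns the cached factory, rebuilding it when the configuration changes. */
    static synchronized FakeFactory getFactory(String config) {
        if (pmf != null && !Objects.equals(config, currentConfig)) {
            // Close first, then drop the reference. Setting pmf to null
            // without close() would leave the old pool open.
            pmf.close();
            pmf = null;
        }
        if (pmf == null) {
            pmf = new FakeFactory(config);
            currentConfig = config;
        }
        return pmf;
    }
}
```

Calling getFactory with a second configuration leaves exactly one pool open in this sketch. Dropping the close() call would leave two.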
[jira] [Commented] (HIVE-12631) LLAP: support ORC ACID tables
[ https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099349#comment-16099349 ] Sergey Shelukhin commented on HIVE-12631: - Left some feedback on the new implementation. Looks like [~ekoifman] supports this approach. > LLAP: support ORC ACID tables > - > > Key: HIVE-12631 > URL: https://issues.apache.org/jira/browse/HIVE-12631 > Project: Hive > Issue Type: Bug > Components: llap, Transactions >Reporter: Sergey Shelukhin >Assignee: Teddy Choi > Attachments: HIVE-12631.10.patch, HIVE-12631.10.patch, > HIVE-12631.11.patch, HIVE-12631.11.patch, HIVE-12631.12.patch, > HIVE-12631.13.patch, HIVE-12631.15.patch, HIVE-12631.16.patch, > HIVE-12631.17.patch, HIVE-12631.18.patch, HIVE-12631.19.patch, > HIVE-12631.1.patch, HIVE-12631.20.patch, HIVE-12631.21.patch, > HIVE-12631.22.patch, HIVE-12631.23.patch, HIVE-12631.2.patch, > HIVE-12631.3.patch, HIVE-12631.4.patch, HIVE-12631.5.patch, > HIVE-12631.6.patch, HIVE-12631.7.patch, HIVE-12631.8.patch, > HIVE-12631.8.patch, HIVE-12631.9.patch > > > LLAP uses a completely separate read path in ORC to allow for caching and > parallelization of reads and processing. This path does not support ACID. As > far as I remember ACID logic is embedded inside ORC format; we need to > refactor it to be on top of some interface, if practical; or just port it to > LLAP read path. > Another consideration is how the logic will work with cache. The cache is > currently low-level (CB-level in ORC), so we could just use it to read bases > and deltas (deltas should be cached with higher priority) and merge as usual. > We could also cache merged representation in future. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16997) Extend object store to store bit vectors
[ https://issues.apache.org/jira/browse/HIVE-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16997: --- Attachment: HIVE-16997.06.patch > Extend object store to store bit vectors > > > Key: HIVE-16997 > URL: https://issues.apache.org/jira/browse/HIVE-16997 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-16997.01.patch, HIVE-16997.02.patch, > HIVE-16997.03.patch, HIVE-16997.04.patch, HIVE-16997.05.patch, > HIVE-16997.06.patch > > > This patch includes: (1) a new serde for FMSketch (2) change of schema for > derby and mysql (3) support for date type (4) refactoring the extrapolation > and merge code -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16997) Extend object store to store bit vectors
[ https://issues.apache.org/jira/browse/HIVE-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16997: --- Status: Open (was: Patch Available) > Extend object store to store bit vectors > > > Key: HIVE-16997 > URL: https://issues.apache.org/jira/browse/HIVE-16997 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-16997.01.patch, HIVE-16997.02.patch, > HIVE-16997.03.patch, HIVE-16997.04.patch, HIVE-16997.05.patch, > HIVE-16997.06.patch > > > This patch includes: (1) a new serde for FMSketch (2) change of schema for > derby and mysql (3) support for date type (4) refactoring the extrapolation > and merge code -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16997) Extend object store to store bit vectors
[ https://issues.apache.org/jira/browse/HIVE-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16997: --- Status: Patch Available (was: Open) > Extend object store to store bit vectors > > > Key: HIVE-16997 > URL: https://issues.apache.org/jira/browse/HIVE-16997 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-16997.01.patch, HIVE-16997.02.patch, > HIVE-16997.03.patch, HIVE-16997.04.patch, HIVE-16997.05.patch, > HIVE-16997.06.patch > > > This patch includes: (1) a new serde for FMSketch (2) change of schema for > derby and mysql (3) support for date type (4) refactoring the extrapolation > and merge code -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17131) Add InterfaceAudience and InterfaceStability annotations for SerDe APIs
[ https://issues.apache.org/jira/browse/HIVE-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099332#comment-16099332 ] Ashutosh Chauhan commented on HIVE-17131: - This patch is for branch-2. I think we don't want to make any changes to those interfaces there. +1 > Add InterfaceAudience and InterfaceStability annotations for SerDe APIs > --- > > Key: HIVE-17131 > URL: https://issues.apache.org/jira/browse/HIVE-17131 > Project: Hive > Issue Type: Sub-task > Components: Serializers/Deserializers >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17131.1.branch-2.patch, HIVE-17131.1.patch > > > Adding InterfaceAudience and InterfaceStability annotations for the core > SerDe APIs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Moved] (HIVE-17162) get rid of "skipCorrupt" flag
[ https://issues.apache.org/jira/browse/HIVE-17162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin moved HADOOP-14684 to HIVE-17162: -- Key: HIVE-17162 (was: HADOOP-14684) Project: Hive (was: Hadoop Common) > get rid of "skipCorrupt" flag > - > > Key: HIVE-17162 > URL: https://issues.apache.org/jira/browse/HIVE-17162 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > > The error that caused the issue was a long time ago and it's probably ok to > get rid of this flag. > Perhaps we should provide a small tool to overwrite these files without the > corrupt values. > cc [~prasanth_j] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17162) get rid of "skipCorrupt" flag
[ https://issues.apache.org/jira/browse/HIVE-17162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-17162: Description: The error that caused the issue was a long time ago and it's probably ok to get rid of this flag. Perhaps we should provide a small tool to overwrite these files without the corrupted values. cc [~prasanth_j] was: The error that caused the issue was a long time ago and it's probably ok to get rid of this flag. Perhaps we should provide a small tool to overwrite these files without the corrupt values. cc [~prasanth_j] > get rid of "skipCorrupt" flag > - > > Key: HIVE-17162 > URL: https://issues.apache.org/jira/browse/HIVE-17162 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > > The error that caused the issue was a long time ago and it's probably ok to > get rid of this flag. > Perhaps we should provide a small tool to overwrite these files without the > corrupted values. > cc [~prasanth_j] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17133) NoSuchMethodError in Hadoop FileStatus.compareTo
[ https://issues.apache.org/jira/browse/HIVE-17133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099305#comment-16099305 ] Sergey Shelukhin commented on HIVE-17133: - Filed HADOOP-14683. Let's see what the response is, and go from there. > NoSuchMethodError in Hadoop FileStatus.compareTo > > > Key: HIVE-17133 > URL: https://issues.apache.org/jira/browse/HIVE-17133 > Project: Hive > Issue Type: Bug >Reporter: Rui Li > > The stack trace is: > {noformat} > Caused by: java.lang.NoSuchMethodError: > org.apache.hadoop.fs.FileStatus.compareTo(Lorg/apache/hadoop/fs/FileStatus;)I > at > org.apache.hadoop.hive.ql.io.AcidUtils.lambda$getAcidState$0(AcidUtils.java:931) > at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355) > at java.util.TimSort.sort(TimSort.java:234) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1454) > at java.util.Collections.sort(Collections.java:175) > at > org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:929) > {noformat} > I'm on Hive master and using Hadoop 2.7.2. The method signature in Hadoop > 2.7.2 is: > https://github.com/apache/hadoop/blob/release-2.7.2-RC2/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileStatus.java#L336 > In Hadoop 2.8.0 it becomes: > https://github.com/apache/hadoop/blob/release-2.8.0-RC3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileStatus.java#L332 > I think that breaks binary compatibility. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
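The NoSuchMethodError in HIVE-17133 names the descriptor compareTo(Lorg/apache/hadoop/fs/FileStatus;)I, so the call site was compiled against a compareTo that takes a FileStatus, and the class loaded at runtime does not have one. The sketch below reproduces that situation with stand-in classes rather than Hadoop itself. It assumes the older class shape declares compareTo(Object) and the newer one compareTo(T). OldStatus, NewStatus, hasCompareTo and compareViaInterface are hypothetical names used only for illustration.

```java
import java.lang.reflect.Method;

public class CompareToDescriptorSketch {
    /** Stand-in for the assumed older shape: raw Comparable, compareTo(Object). */
    @SuppressWarnings("rawtypes")
    static class OldStatus implements Comparable {
        final String path;

        OldStatus(String path) {
            this.path = path;
        }

        @Override
        public int compareTo(Object o) {
            return path.compareTo(((OldStatus) o).path);
        }
    }

    /** Stand-in for the assumed newer shape: Comparable<T>, compareTo(T). */
    static class NewStatus implements Comparable<NewStatus> {
        final String path;

        NewStatus(String path) {
            this.path = path;
        }

        @Override
        public int compareTo(NewStatus o) {
            return path.compareTo(o.path);
        }
    }

    /** True if cls has a public compareTo whose parameter type is exactly param. */
    static boolean hasCompareTo(Class<?> cls, Class<?> param) {
        try {
            Method m = cls.getMethod("compareTo", param);
            return m != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    /**
     * Compares through the Comparable interface, so the compiled call site
     * refers to compareTo(Object), which both class shapes provide.
     */
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int compareViaInterface(Comparable a, Object b) {
        return a.compareTo(b);
    }
}
```

In this sketch the old shape has no compareTo(OldStatus) method, which is the kind of lookup that fails at link time. The new shape still exposes compareTo(Object) through the compiler-generated bridge method, so a shim that compares through the interface would work against either shape. Whether that is the right fix for Hive is a separate question.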
[jira] [Commented] (HIVE-17161) BeeLine up arrow key should show the last sql instead of last line
[ https://issues.apache.org/jira/browse/HIVE-17161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099303#comment-16099303 ] Gopal V commented on HIVE-17161: AFAIK, you need JLine 3 for this. https://github.com/jline/jline3/blob/master/reader/src/main/java/org/jline/reader/Parser.java#L25 > BeeLine up arrow key should show the last sql instead of last line > -- > > Key: HIVE-17161 > URL: https://issues.apache.org/jira/browse/HIVE-17161 > Project: Hive > Issue Type: Improvement >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > > Currently, when you press Up arrow on beeline prompt, it shows the last line > of the previous sql. It is hard to execute the previous sql if it was > executed using multiple lines. It would be good to improve this experience by > fetching the last command executed instead of just the last line of the > previous command. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17161) BeeLine up arrow key should show the last sql instead of last line
[ https://issues.apache.org/jira/browse/HIVE-17161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar reassigned HIVE-17161: -- > BeeLine up arrow key should show the last sql instead of last line > -- > > Key: HIVE-17161 > URL: https://issues.apache.org/jira/browse/HIVE-17161 > Project: Hive > Issue Type: Improvement >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > > Currently, when you press Up arrow on beeline prompt, it shows the last line > of the previous sql. It is hard to execute the previous sql if it was > executed using multiple lines. It would be good to improve this experience by > fetching the last command executed instead of just the last line of the > previous command. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16997) Extend object store to store bit vectors
[ https://issues.apache.org/jira/browse/HIVE-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16997: --- Status: Open (was: Patch Available) > Extend object store to store bit vectors > > > Key: HIVE-16997 > URL: https://issues.apache.org/jira/browse/HIVE-16997 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-16997.01.patch, HIVE-16997.02.patch, > HIVE-16997.03.patch, HIVE-16997.04.patch, HIVE-16997.05.patch > > > This patch includes: (1) a new serde for FMSketch (2) change of schema for > derby and mysql (3) support for date type (4) refactoring the extrapolation > and merge code -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16997) Extend object store to store bit vectors
[ https://issues.apache.org/jira/browse/HIVE-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16997: --- Status: Patch Available (was: Open) > Extend object store to store bit vectors > > > Key: HIVE-16997 > URL: https://issues.apache.org/jira/browse/HIVE-16997 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-16997.01.patch, HIVE-16997.02.patch, > HIVE-16997.03.patch, HIVE-16997.04.patch, HIVE-16997.05.patch > > > This patch includes: (1) a new serde for FMSketch (2) change of schema for > derby and mysql (3) support for date type (4) refactoring the extrapolation > and merge code -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16997) Extend object store to store bit vectors
[ https://issues.apache.org/jira/browse/HIVE-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16997: --- Attachment: HIVE-16997.05.patch > Extend object store to store bit vectors > > > Key: HIVE-16997 > URL: https://issues.apache.org/jira/browse/HIVE-16997 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-16997.01.patch, HIVE-16997.02.patch, > HIVE-16997.03.patch, HIVE-16997.04.patch, HIVE-16997.05.patch > > > This patch includes: (1) a new serde for FMSketch (2) change of schema for > derby and mysql (3) support for date type (4) refactoring the extrapolation > and merge code -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (HIVE-16222) add a setting to disable row.serde for specific formats; enable for others
[ https://issues.apache.org/jira/browse/HIVE-16222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HIVE-16222. - Resolution: Fixed > add a setting to disable row.serde for specific formats; enable for others > -- > > Key: HIVE-16222 > URL: https://issues.apache.org/jira/browse/HIVE-16222 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 3.0.0 > > Attachments: HIVE-16222.01.patch, HIVE-16222.02.patch, > HIVE-16222.03.patch, HIVE-16222.04.patch, HIVE-16222.05.patch, > HIVE-16222.patch > > > Per [~gopalv] > {quote} > row.serde = true ... breaks Parquet (they expect to get the same object back, > which means you can't buffer 1024 rows). > {quote} > We want to enable this and vector.serde for text vectorization. Need to turn > it off for specific formats. > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
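One common way a "same object back" constraint conflicts with buffering rows is object reuse. The sketch below assumes a reader that hands out a single mutable row object on every call. This is an assumption made for illustration, not a description of the Parquet reader or of the HIVE-16222 patch. ReusingReader, bufferByReference and bufferByCopy are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class ReusedRowBufferSketch {
    /** Hypothetical reader that returns the same mutable row object every time. */
    static class ReusingReader {
        private final int[] row = new int[1];
        private int next = 0;

        int[] read() {
            row[0] = next++;
            return row; // same array instance on every call
        }
    }

    /** Buffers n rows by keeping references; every entry aliases the same object. */
    static List<int[]> bufferByReference(int n) {
        ReusingReader reader = new ReusingReader();
        List<int[]> buffer = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            buffer.add(reader.read());
        }
        return buffer;
    }

    /** Buffers n rows by copying each one, so earlier rows keep their values. */
    static List<int[]> bufferByCopy(int n) {
        ReusingReader reader = new ReusingReader();
        List<int[]> buffer = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            buffer.add(reader.read().clone());
        }
        return buffer;
    }
}
```

Under that assumption, buffering by reference leaves every buffered entry showing the last row read. Buffering by copy preserves each row, at the cost of an allocation per row.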
[jira] [Updated] (HIVE-17149) Hdfs directory is not cleared if partition creation failed on HMS
[ https://issues.apache.org/jira/browse/HIVE-17149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-17149: Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Pushed to master. Thanks [~zsombor.klara] for the work. > Hdfs directory is not cleared if partition creation failed on HMS > - > > Key: HIVE-17149 > URL: https://issues.apache.org/jira/browse/HIVE-17149 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 3.0.0 >Reporter: Barna Zsombor Klara >Assignee: Barna Zsombor Klara > Fix For: 3.0.0 > > Attachments: HIVE-17149.01.patch > > > Hive#loadPartition will load a directory into a Hive Table Partition. It will > alter the existing content of > the partition with the new contents and create a new partition if one does > not exist. > The file move is performed before the partition creation and if the creation > fails, the moved files are not cleared. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
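The ordering problem in HIVE-17149 (files are moved first, then the partition is created, and a failed creation leaves the files behind) can be sketched with the local file system. This is not Hive#loadPartition and not the committed patch: LoadPartitionCleanupSketch, PartitionCreator, loadPartition and partitionDirExistsAfterLoad are hypothetical names, and java.nio.file stands in for HDFS.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LoadPartitionCleanupSketch {
    /** Hypothetical stand-in for the metastore call that registers the partition. */
    interface PartitionCreator {
        void create(Path partitionDir) throws Exception;
    }

    /** Moves the file into the partition directory, then clears it if creation fails. */
    static boolean loadPartition(Path sourceFile, Path partitionDir, PartitionCreator creator) {
        try {
            boolean dirCreatedHere = Files.notExists(partitionDir);
            Files.createDirectories(partitionDir);
            Path moved = partitionDir.resolve(sourceFile.getFileName());
            Files.move(sourceFile, moved); // the move happens before partition creation
            try {
                creator.create(partitionDir);
                return true;
            } catch (Exception e) {
                // Creation failed: remove what this call moved in, and the
                // directory too if this call created it.
                Files.deleteIfExists(moved);
                if (dirCreatedHere) {
                    Files.deleteIfExists(partitionDir);
                }
                return false;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs one load in a temp directory and reports whether the partition directory remains. */
    static boolean partitionDirExistsAfterLoad(boolean failCreation) {
        try {
            Path root = Files.createTempDirectory("load-partition-sketch");
            Path source = Files.createFile(root.resolve("data.txt"));
            Path partitionDir = root.resolve("part_col=1");
            loadPartition(source, partitionDir, dir -> {
                if (failCreation) {
                    throw new IllegalStateException("simulated metastore failure");
                }
            });
            return Files.exists(partitionDir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The sketch only deletes a directory that the same call created. For a partition that already existed, it removes just the newly moved file, because the issue text says existing partition content is altered rather than replaced wholesale.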
[jira] [Commented] (HIVE-16222) add a setting to disable row.serde for specific formats; enable for others
[ https://issues.apache.org/jira/browse/HIVE-16222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099261#comment-16099261 ] Sergey Shelukhin commented on HIVE-16222: - Looks like I forgot to push this > add a setting to disable row.serde for specific formats; enable for others > -- > > Key: HIVE-16222 > URL: https://issues.apache.org/jira/browse/HIVE-16222 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 3.0.0 > > Attachments: HIVE-16222.01.patch, HIVE-16222.02.patch, > HIVE-16222.03.patch, HIVE-16222.04.patch, HIVE-16222.05.patch, > HIVE-16222.patch > > > Per [~gopalv] > {quote} > row.serde = true ... breaks Parquet (they expect to get the same object back, > which means you can't buffer 1024 rows). > {quote} > We want to enable this and vector.serde for text vectorization. Need to turn > it off for specific formats. > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17133) NoSuchMethodError in Hadoop FileStatus.compareTo
[ https://issues.apache.org/jira/browse/HIVE-17133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099228#comment-16099228 ] Sergey Shelukhin commented on HIVE-17133: - I think we do... we should still be able to run on it. We'd probably need a shim. Or we can ask to re-add the method in 2.8.1 (or whatever) and not support 2.8.0. > NoSuchMethodError in Hadoop FileStatus.compareTo > > > Key: HIVE-17133 > URL: https://issues.apache.org/jira/browse/HIVE-17133 > Project: Hive > Issue Type: Bug >Reporter: Rui Li > > The stack trace is: > {noformat} > Caused by: java.lang.NoSuchMethodError: > org.apache.hadoop.fs.FileStatus.compareTo(Lorg/apache/hadoop/fs/FileStatus;)I > at > org.apache.hadoop.hive.ql.io.AcidUtils.lambda$getAcidState$0(AcidUtils.java:931) > at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355) > at java.util.TimSort.sort(TimSort.java:234) > at java.util.Arrays.sort(Arrays.java:1512) > at java.util.ArrayList.sort(ArrayList.java:1454) > at java.util.Collections.sort(Collections.java:175) > at > org.apache.hadoop.hive.ql.io.AcidUtils.getAcidState(AcidUtils.java:929) > {noformat} > I'm on Hive master and using Hadoop 2.7.2. The method signature in Hadoop > 2.7.2 is: > https://github.com/apache/hadoop/blob/release-2.7.2-RC2/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileStatus.java#L336 > In Hadoop 2.8.0 it becomes: > https://github.com/apache/hadoop/blob/release-2.8.0-RC3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileStatus.java#L332 > I think that breaks binary compatibility. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16965) SMB join may produce incorrect results
[ https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099220#comment-16099220 ] Sergey Shelukhin commented on HIVE-16965: - +1 pending tests. llap_smb test might now start passing > SMB join may produce incorrect results > -- > > Key: HIVE-16965 > URL: https://issues.apache.org/jira/browse/HIVE-16965 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Deepak Jaiswal > Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, > HIVE-16965.3.patch > > > Running the following on MiniTez > {noformat} > set hive.mapred.mode=nonstrict; > SET hive.vectorized.execution.enabled=true; > SET hive.exec.orc.default.buffer.size=32768; > SET hive.exec.orc.default.row.index.stride=1000; > SET hive.optimize.index.filter=true; > set hive.fetch.task.conversion=none; > set hive.exec.dynamic.partition.mode=nonstrict; > DROP TABLE orc_a; > DROP TABLE orc_b; > CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q > smallint) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > CREATE TABLE orc_b (id bigint, cfloat float) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > insert into table orc_a partition (y=2000, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_a partition (y=2001, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_b > select cbigint, cfloat from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc limit 200; > set hive.cbo.enable=false; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set 
hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=10; > explain > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > DROP TABLE orc_a; > DROP TABLE orc_b; > {noformat} > Produces different results for the two selects. The SMB one looks incorrect. > cc [~djaiswal] [~hagleitn] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16998) Add config to enable HoS DPP only for map-joins
[ https://issues.apache.org/jira/browse/HIVE-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Janaki Lahorani updated HIVE-16998: --- Attachment: HIVE16998.1.patch Introduced parameter hive.spark.dynamic.partition.pruning.map.join.only, with a default value of false. If hive.spark.dynamic.partition.pruning is set to false, this parameter value is ignored. If hive.spark.dynamic.partition.pruning is set to true, then DPP is enabled only for queries that run with map joins when hive.spark.dynamic.partition.pruning.map.join.only is true, and for all queries when it is false. > Add config to enable HoS DPP only for map-joins > --- > > Key: HIVE-16998 > URL: https://issues.apache.org/jira/browse/HIVE-16998 > Project: Hive > Issue Type: Sub-task > Components: Logical Optimizer, Spark >Reporter: Sahil Takiar >Assignee: Janaki Lahorani > Attachments: HIVE16998.1.patch > > > HoS DPP will split a given operator tree in two under the following > conditions: it has detected that the query can benefit from DPP, and the > filter is not a map-join (see SplitOpTreeForDPP). > This can hurt performance if the non-partitioned side of the join > involves a complex operator tree - e.g. the query {{select count(*) from > srcpart where srcpart.ds in (select max(srcpart.ds) from srcpart union all > select min(srcpart.ds) from srcpart)}} will require running the subquery > twice, once in each Spark job. > Queries with map-joins don't get split into two operator trees and thus don't > suffer from this drawback. Thus, it would be nice to have a config key that > just enables DPP on HoS for map-joins. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
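The interaction of the two settings described in the comment above can be written as a small truth function. This is a sketch of the described behaviour only, not code from the patch; the class and method names (DppDecision, isDppEnabled) and the isMapJoin input are assumptions made for illustration.

```java
// Hypothetical helper mirroring the described semantics of
// hive.spark.dynamic.partition.pruning (dpp) and
// hive.spark.dynamic.partition.pruning.map.join.only (mapJoinOnly).
final class DppDecision {
    static boolean isDppEnabled(boolean dpp, boolean mapJoinOnly, boolean isMapJoin) {
        if (!dpp) {
            // With DPP off, the map-join-only flag is ignored.
            return false;
        }
        // With DPP on, the flag restricts DPP to map-join queries.
        return !mapJoinOnly || isMapJoin;
    }

    public static void main(String[] args) {
        System.out.println(isDppEnabled(true, true, false));  // DPP on, map-join only, not a map join
        System.out.println(isDppEnabled(true, false, false)); // DPP on for all queries
    }
}
```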
[jira] [Commented] (HIVE-16965) SMB join may produce incorrect results
[ https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099204#comment-16099204 ] Deepak Jaiswal commented on HIVE-16965: --- https://reviews.apache.org/r/61087 RB link > SMB join may produce incorrect results > -- > > Key: HIVE-16965 > URL: https://issues.apache.org/jira/browse/HIVE-16965 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Deepak Jaiswal > Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, > HIVE-16965.3.patch > > > Running the following on MiniTez > {noformat} > set hive.mapred.mode=nonstrict; > SET hive.vectorized.execution.enabled=true; > SET hive.exec.orc.default.buffer.size=32768; > SET hive.exec.orc.default.row.index.stride=1000; > SET hive.optimize.index.filter=true; > set hive.fetch.task.conversion=none; > set hive.exec.dynamic.partition.mode=nonstrict; > DROP TABLE orc_a; > DROP TABLE orc_b; > CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q > smallint) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > CREATE TABLE orc_b (id bigint, cfloat float) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > insert into table orc_a partition (y=2000, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_a partition (y=2001, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_b > select cbigint, cfloat from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc limit 200; > set hive.cbo.enable=false; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set 
hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=10; > explain > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > DROP TABLE orc_a; > DROP TABLE orc_b; > {noformat} > Produces different results for the two selects. The SMB one looks incorrect. > cc [~djaiswal] [~hagleitn] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17150) CREATE INDEX execute HMS out-of-transaction listener calls inside a transaction
[ https://issues.apache.org/jira/browse/HIVE-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099191#comment-16099191 ] Sergio Peña commented on HIVE-17150: It creates more than one notification per operation. One for CREATE_TABLE and another for CREATE_INDEX. These 2 notifications are not duplicated as they're in the same transaction. However, if another thread writes a new notification, then the EVENT_ID will be duplicated if it grabs the ID before committing the CREATE_INDEX operation. Currently, only CREATE_INDEX calls a CREATE_TABLE inside the same transaction. > CREATE INDEX execute HMS out-of-transaction listener calls inside a > transaction > --- > > Key: HIVE-17150 > URL: https://issues.apache.org/jira/browse/HIVE-17150 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.3.0 >Reporter: Sergio Peña >Assignee: Sergio Peña > Fix For: 3.0.0, 2.4.0 > > Attachments: HIVE-17150.1.patch, HIVE-17150.2.patch > > > The problem with CREATE INDEX is that it calls a CREATE TABLE operation > inside the same CREATE INDEX transaction. During listener calls, there are > some listeners that should run in an out-of-transaction context, for > instance, Sentry blocks the HMS operation until the DB log notification is > processed, but if the transaction has not finished, then the > out-of-transaction listener will block forever (or until a read timeout > happens). > A fix would be to add a parameter to the out-of-transaction listener that > alerts the listener if HMS is in an active transaction. If so, then it is up to > the listener plugin to return immediately and avoid blocking the HMS > operation. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
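The fix proposed in the issue description could be sketched roughly as follows. The interface and class names here are hypothetical and are not the HMS listener API; the sketch only illustrates passing an "in active transaction" flag so that a plugin which would normally block can return immediately.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener contract: the caller says whether it is still
// inside an open transaction when the listener is invoked.
interface OutOfTransactionListener {
    void onEvent(String eventName, boolean inActiveTransaction);
}

// Hypothetical plugin that normally waits for the notification to be processed.
final class BlockingSyncListener implements OutOfTransactionListener {
    final List<String> waitedFor = new ArrayList<>();
    final List<String> skipped = new ArrayList<>();

    @Override
    public void onEvent(String eventName, boolean inActiveTransaction) {
        if (inActiveTransaction) {
            // The notification cannot be visible before commit, so waiting
            // here would never finish; return immediately instead.
            skipped.add(eventName);
            return;
        }
        // A real plugin would block here until the notification is processed;
        // this sketch only records that it would have waited.
        waitedFor.add(eventName);
    }
}
```

With this shape, the nested CREATE TABLE inside CREATE INDEX would be reported with the flag set, and only the call made after commit would wait.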
[jira] [Commented] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration
[ https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099189#comment-16099189 ] slim bouguerra commented on HIVE-17160: --- [~sershe] and [~sseth] can you please look at the Tez part. > Adding kerberos Authorization to the Druid hive integration > --- > > Key: HIVE-17160 > URL: https://issues.apache.org/jira/browse/HIVE-17160 > Project: Hive > Issue Type: New Feature > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: HIVE-17160.patch > > > The goal of this feature is to allow Hive to query a secured Druid cluster > using Kerberos credentials. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16759) Add table type information to HMS log notifications
[ https://issues.apache.org/jira/browse/HIVE-16759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Janaki Lahorani updated HIVE-16759: --- Attachment: HIVE16759.3.patch > Add table type information to HMS log notifications > --- > > Key: HIVE-16759 > URL: https://issues.apache.org/jira/browse/HIVE-16759 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 2.1.1 >Reporter: Sergio Peña >Assignee: Janaki Lahorani > Attachments: HIVE16759.1.patch, HIVE16759.2.patch, HIVE16759.3.patch > > > The DB notifications used by HiveMetaStore should include the table type for > all notifications that include table events, such as create, drop and alter > table. > This would be useful for consumers to identify views vs tables. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration
[ https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-17160: -- Attachment: HIVE-17160.patch > Adding kerberos Authorization to the Druid hive integration > --- > > Key: HIVE-17160 > URL: https://issues.apache.org/jira/browse/HIVE-17160 > Project: Hive > Issue Type: New Feature > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: HIVE-17160.patch > > > The goal of this feature is to allow Hive to query a secured Druid cluster > using Kerberos credentials. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16965) SMB join may produce incorrect results
[ https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099186#comment-16099186 ] Sergey Shelukhin commented on HIVE-16965: - Can you link to RB? I saw it but I cannot find the email anymore :) > SMB join may produce incorrect results > -- > > Key: HIVE-16965 > URL: https://issues.apache.org/jira/browse/HIVE-16965 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Deepak Jaiswal > Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, > HIVE-16965.3.patch > > > Running the following on MiniTez > {noformat} > set hive.mapred.mode=nonstrict; > SET hive.vectorized.execution.enabled=true; > SET hive.exec.orc.default.buffer.size=32768; > SET hive.exec.orc.default.row.index.stride=1000; > SET hive.optimize.index.filter=true; > set hive.fetch.task.conversion=none; > set hive.exec.dynamic.partition.mode=nonstrict; > DROP TABLE orc_a; > DROP TABLE orc_b; > CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q > smallint) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > CREATE TABLE orc_b (id bigint, cfloat float) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > insert into table orc_a partition (y=2000, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_a partition (y=2001, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_b > select cbigint, cfloat from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc limit 200; > set hive.cbo.enable=false; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set 
hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=10; > explain > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > DROP TABLE orc_a; > DROP TABLE orc_b; > {noformat} > Produces different results for the two selects. The SMB one looks incorrect. > cc [~djaiswal] [~hagleitn] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17150) CREATE INDEX execute HMS out-of-transaction listener calls inside a transaction
[ https://issues.apache.org/jira/browse/HIVE-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099185#comment-16099185 ] Alexander Kolbasov commented on HIVE-17150: --- [~spena] When you have such nested operations, how do you handle the notification ID - you can't store more than a single ID in an event, but such nested operations may try to create more than a single notification ID in the transactional part. > CREATE INDEX execute HMS out-of-transaction listener calls inside a > transaction > --- > > Key: HIVE-17150 > URL: https://issues.apache.org/jira/browse/HIVE-17150 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.3.0 >Reporter: Sergio Peña >Assignee: Sergio Peña > Fix For: 3.0.0, 2.4.0 > > Attachments: HIVE-17150.1.patch, HIVE-17150.2.patch > > > The problem with CREATE INDEX is that it calls a CREATE TABLE operation > inside the same CREATE INDEX transaction. During listener calls, there are > some listeners that should run in an out-of-transaction context, for > instance, Sentry blocks the HMS operation until the DB log notification is > processed, but if the transaction has not finished, then the > out-of-transaction listener will block forever (or until a read timeout > happens). > A fix would be to add a parameter to the out-of-transaction listener that > alerts the listener if HMS is in an active transaction. If so, then it is up to > the listener plugin to return immediately and avoid blocking the HMS > operation. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration
[ https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-17160: -- Status: Patch Available (was: Open) > Adding kerberos Authorization to the Druid hive integration > --- > > Key: HIVE-17160 > URL: https://issues.apache.org/jira/browse/HIVE-17160 > Project: Hive > Issue Type: New Feature > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > > The goal of this feature is to allow Hive to query a secured Druid cluster > using Kerberos credentials. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration
[ https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099181#comment-16099181 ] slim bouguerra commented on HIVE-17160: --- Druid supports SPNEGO Kerberos authorization. http://druid.io/docs/latest/development/extensions-core/druid-kerberos.html This PR adds decorated HTTP clients with a Kerberos token and cookie manager. This will work only when LLAP is enabled, since containers do not have any Kerberos credentials. Users need to set the following configuration on the Hive side: {code}hive.llap.task.principal{code} and {code}hive.llap.task.keytab.file{code} > Adding kerberos Authorization to the Druid hive integration > --- > > Key: HIVE-17160 > URL: https://issues.apache.org/jira/browse/HIVE-17160 > Project: Hive > Issue Type: New Feature > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > > The goal of this feature is to allow Hive to query a secured Druid cluster > using Kerberos credentials. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17150) CREATE INDEX execute HMS out-of-transaction listener calls inside a transaction
[ https://issues.apache.org/jira/browse/HIVE-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-17150: --- Fix Version/s: 2.4.0 > CREATE INDEX execute HMS out-of-transaction listener calls inside a > transaction > --- > > Key: HIVE-17150 > URL: https://issues.apache.org/jira/browse/HIVE-17150 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.3.0 >Reporter: Sergio Peña >Assignee: Sergio Peña > Fix For: 3.0.0, 2.4.0 > > Attachments: HIVE-17150.1.patch, HIVE-17150.2.patch > > > The problem with CREATE INDEX is that it calls a CREATE TABLE operation > inside the same CREATE INDEX transaction. During listener calls, there are > some listeners that should run in an out-of-transaction context, for > instance, Sentry blocks the HMS operation until the DB log notification is > processed, but if the transaction has not finished, then the > out-of-transaction listener will block forever (or until a read timeout > happens). > A fix would be to add a parameter to the out-of-transaction listener that > alerts the listener if HMS is in an active transaction. If so, then it is up to > the listener plugin to return immediately and avoid blocking the HMS > operation. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17150) CREATE INDEX execute HMS out-of-transaction listener calls inside a transaction
[ https://issues.apache.org/jira/browse/HIVE-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-17150: --- Resolution: Fixed Fix Version/s: 30 Status: Resolved (was: Patch Available) > CREATE INDEX execute HMS out-of-transaction listener calls inside a > transaction > --- > > Key: HIVE-17150 > URL: https://issues.apache.org/jira/browse/HIVE-17150 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.3.0 >Reporter: Sergio Peña >Assignee: Sergio Peña > Fix For: 30 > > Attachments: HIVE-17150.1.patch, HIVE-17150.2.patch > > > The problem with CREATE INDEX is that it calls a CREATE TABLE operation > inside the same CREATE INDEX transaction. During listener calls, there are > some listeners that should run in an out-of-transaction context, for > instance, Sentry blocks the HMS operation until the DB log notification is > processed, but if the transaction has not finished, then the > out-of-transaction listener will block forever (or until a read timeout > happens). > A fix would be to add a parameter to the out-of-transaction listener that > alerts the listener if HMS is in an active transaction. If so, then it is up to > the listener plugin to return immediately and avoid blocking the HMS > operation. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17150) CREATE INDEX execute HMS out-of-transaction listener calls inside a transaction
[ https://issues.apache.org/jira/browse/HIVE-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099171#comment-16099171 ] Sergio Peña commented on HIVE-17150: Thanks [~vihangk1]. I committed to master. > CREATE INDEX execute HMS out-of-transaction listener calls inside a > transaction > --- > > Key: HIVE-17150 > URL: https://issues.apache.org/jira/browse/HIVE-17150 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.3.0 >Reporter: Sergio Peña >Assignee: Sergio Peña > Fix For: 3.0.0 > > Attachments: HIVE-17150.1.patch, HIVE-17150.2.patch > > > The problem with CREATE INDEX is that it calls a CREATE TABLE operation > inside the same CREATE INDEX transaction. During listener calls, there are > some listeners that should run in an out-of-transaction context, for > instance, Sentry blocks the HMS operation until the DB log notification is > processed, but if the transaction has not finished, then the > out-of-transaction listener will block forever (or until a read timeout > happens). > A fix would be to add a parameter to the out-of-transaction listener that > alerts the listener if HMS is in an active transaction. If so, then it is up to > the listener plugin to return immediately and avoid blocking the HMS > operation. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17150) CREATE INDEX execute HMS out-of-transaction listener calls inside a transaction
[ https://issues.apache.org/jira/browse/HIVE-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-17150: --- Fix Version/s: (was: 30) 3.0.0 > CREATE INDEX execute HMS out-of-transaction listener calls inside a > transaction > --- > > Key: HIVE-17150 > URL: https://issues.apache.org/jira/browse/HIVE-17150 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.3.0 >Reporter: Sergio Peña >Assignee: Sergio Peña > Fix For: 3.0.0 > > Attachments: HIVE-17150.1.patch, HIVE-17150.2.patch > > > The problem with CREATE INDEX is that it calls a CREATE TABLE operation > inside the same CREATE INDEX transaction. During listener calls, there are > some listeners that should run in an out-of-transaction context, for > instance, Sentry blocks the HMS operation until the DB log notification is > processed, but if the transaction has not finished, then the > out-of-transaction listener will block forever (or until a read timeout > happens). > A fix would be to add a parameter to the out-of-transaction listener that > alerts the listener if HMS is in an active transaction. If so, then it is up to > the listener plugin to return immediately and avoid blocking the HMS > operation. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16640) The ASF Headers have some errors in some class
[ https://issues.apache.org/jira/browse/HIVE-16640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-16640: --- Fix Version/s: 2.3.0 > The ASF Headers have some errors in some class > -- > > Key: HIVE-16640 > URL: https://issues.apache.org/jira/browse/HIVE-16640 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin >Priority: Minor > Fix For: 2.3.0, 3.0.0 > > Attachments: HIVE-16640.1.patch > > > I found that some classes have the license header placed in an incorrect location, and some classes > are missing the license header. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16965) SMB join may produce incorrect results
[ https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-16965: -- Attachment: HIVE-16965.3.patch Implemented comments from [~sershe] and [~gopalv] > SMB join may produce incorrect results > -- > > Key: HIVE-16965 > URL: https://issues.apache.org/jira/browse/HIVE-16965 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Deepak Jaiswal > Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch, > HIVE-16965.3.patch > > > Running the following on MiniTez > {noformat} > set hive.mapred.mode=nonstrict; > SET hive.vectorized.execution.enabled=true; > SET hive.exec.orc.default.buffer.size=32768; > SET hive.exec.orc.default.row.index.stride=1000; > SET hive.optimize.index.filter=true; > set hive.fetch.task.conversion=none; > set hive.exec.dynamic.partition.mode=nonstrict; > DROP TABLE orc_a; > DROP TABLE orc_b; > CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q > smallint) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > CREATE TABLE orc_b (id bigint, cfloat float) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > insert into table orc_a partition (y=2000, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_a partition (y=2001, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_b > select cbigint, cfloat from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc limit 200; > set hive.cbo.enable=false; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set 
hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=10; > explain > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > DROP TABLE orc_a; > DROP TABLE orc_b; > {noformat} > Produces different results for the two selects. The SMB one looks incorrect. > cc [~djaiswal] [~hagleitn] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17019) Add support to download debugging information as an archive.
[ https://issues.apache.org/jira/browse/HIVE-17019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099157#comment-16099157 ] Siddharth Seth commented on HIVE-17019: --- Re-looked at the patch. Mostly looks good. Some comments and questions. - How is the context set up for LogDownloadServlet, e.g. CONF_LOG_DOWNLODER_NUM_EXECUTORS? The config should likely be set up in HiveConf in some way. - init for the servlet will happen once at startup? So if there are multiple requests to download, and the limit is hit, all webserver threads will block? Should we just return an error if there are too many parallel downloads, so that other parts of the UI continue to be functional? - In terms of the security - this becomes interesting. Essentially says that the feature will only work if authentication is enabled on secure clusters. - Timeout for the downloads as a separate jira? - Are any credentials required on the HttpClient created to download artifacts from various end points? - For constants like TIMELINE_PATH_PREFIX - any chance YARN has a helper method? Otherwise we should file a jira to ask YARN to expose such utilities. - Both dagId and queryId cannot be specified at the same time? > Add support to download debugging information as an archive. > > > Key: HIVE-17019 > URL: https://issues.apache.org/jira/browse/HIVE-17019 > Project: Hive > Issue Type: Bug >Reporter: Harish Jaiprakash >Assignee: Harish Jaiprakash > Attachments: HIVE-17019.01.patch, HIVE-17019.02.patch, > HIVE-17019.03.patch > > > Given a queryId or dagId, get all information related to it: like, tez am, > task logs, hive ats data, tez ats data, slider am status, etc. Package it > into an archive. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
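The suggestion in the review above, to fail fast when too many downloads are already running instead of blocking webserver threads, might look roughly like the sketch below. DownloadLimiter and its status codes are assumptions for illustration and are not taken from the patch; the only library call relied on is java.util.concurrent.Semaphore.tryAcquire(), which returns immediately rather than waiting.

```java
import java.util.concurrent.Semaphore;

// Hypothetical limiter: at most maxParallel downloads run at once,
// and any extra request is rejected right away instead of queueing.
final class DownloadLimiter {
    private final Semaphore permits;

    DownloadLimiter(int maxParallel) {
        this.permits = new Semaphore(maxParallel);
    }

    // Returns 200 if the download ran, 503 if the parallel limit was hit.
    int handle(Runnable download) {
        if (!permits.tryAcquire()) {
            return 503; // do not block the calling thread
        }
        try {
            download.run();
            return 200;
        } finally {
            permits.release();
        }
    }
}
```

A servlet using this shape would map the rejected case to an HTTP error response, leaving the remaining request threads free to serve the rest of the UI.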
[jira] [Commented] (HIVE-17150) CREATE INDEX execute HMS out-of-transaction listener calls inside a transaction
[ https://issues.apache.org/jira/browse/HIVE-17150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099145#comment-16099145 ] Vihang Karajgaonkar commented on HIVE-17150: Hi [~spena] the change looks good to me. The failures are unrelated. +1 > CREATE INDEX execute HMS out-of-transaction listener calls inside a > transaction > --- > > Key: HIVE-17150 > URL: https://issues.apache.org/jira/browse/HIVE-17150 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.3.0 >Reporter: Sergio Peña >Assignee: Sergio Peña > Attachments: HIVE-17150.1.patch, HIVE-17150.2.patch > > > The problem with CREATE INDEX is that it calls a CREATE TABLE operation > inside the same CREATE INDEX transaction. During listener calls, there are > some listeners that should run in an out-of-transaction context, for > instance, Sentry blocks the HMS operation until the DB log notification is > processed, but if the transaction has not finished, then the > out-of-transaction listener will block forever (or until a read timeout > happens). > A fix would be to add a parameter to the out-of-transaction listener that > alerts the listener if HMS is in an active transaction. If so, then it is up to > the listener plugin to return immediately and avoid blocking the HMS > operation. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17159) Make metastore a separately releasable module
[ https://issues.apache.org/jira/browse/HIVE-17159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099143#comment-16099143 ] Alan Gates commented on HIVE-17159: --- Thanks! As a first step I will put together a 1 pager with a proposed plan so people can give feedback. Hopefully I'll have that done today or tomorrow. Then I agree subtasks of this JIRA are the way to go. > Make metastore a separately releasable module > - > > Key: HIVE-17159 > URL: https://issues.apache.org/jira/browse/HIVE-17159 > Project: Hive > Issue Type: New Feature > Components: Metastore >Reporter: Alan Gates >Assignee: Alan Gates > > As proposed in this > [thread|https://lists.apache.org/thread.html/5e75f45d60f0b819510814a126cfd3809dd24b1c7035a1c8c41b0c5c@%3Cdev.hive.apache.org%3E] > on the dev list, we should move the metastore into a separately releasable > module. This is a POC of, and a potential first step towards, separating out the > metastore as a separate Apache TLP. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17159) Make metastore a separately releasable module
[ https://issues.apache.org/jira/browse/HIVE-17159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099138#comment-16099138 ] Vihang Karajgaonkar commented on HIVE-17159: Hi [~alangates] Thanks for creating this. I can help work on this as well. Are you planning to create sub-tasks to break up the work? > Make metastore a separately releasable module > - > > Key: HIVE-17159 > URL: https://issues.apache.org/jira/browse/HIVE-17159 > Project: Hive > Issue Type: New Feature > Components: Metastore >Reporter: Alan Gates >Assignee: Alan Gates > > As proposed in this > [thread|https://lists.apache.org/thread.html/5e75f45d60f0b819510814a126cfd3809dd24b1c7035a1c8c41b0c5c@%3Cdev.hive.apache.org%3E] > on the dev list, we should move the metastore into a separately releasable > module. This is a POC of, and a potential first step towards, separating out the > metastore as a separate Apache TLP. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16965) SMB join may produce incorrect results
[ https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099115#comment-16099115 ] Deepak Jaiswal commented on HIVE-16965: --- [~sershe] Thanks for the comments. Somehow lost the assert while making the code pretty. Applying all your comments in a patch coming in shortly. > SMB join may produce incorrect results > -- > > Key: HIVE-16965 > URL: https://issues.apache.org/jira/browse/HIVE-16965 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Deepak Jaiswal > Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch > > > Running the following on MiniTez > {noformat} > set hive.mapred.mode=nonstrict; > SET hive.vectorized.execution.enabled=true; > SET hive.exec.orc.default.buffer.size=32768; > SET hive.exec.orc.default.row.index.stride=1000; > SET hive.optimize.index.filter=true; > set hive.fetch.task.conversion=none; > set hive.exec.dynamic.partition.mode=nonstrict; > DROP TABLE orc_a; > DROP TABLE orc_b; > CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q > smallint) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > CREATE TABLE orc_b (id bigint, cfloat float) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > insert into table orc_a partition (y=2000, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_a partition (y=2001, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_b > select cbigint, cfloat from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc limit 200; > set hive.cbo.enable=false; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set 
hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=10; > explain > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > DROP TABLE orc_a; > DROP TABLE orc_b; > {noformat} > Produces different results for the two selects. The SMB one looks incorrect. > cc [~djaiswal] [~hagleitn] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17160) Adding kerberos Authorization to the Druid hive integration
[ https://issues.apache.org/jira/browse/HIVE-17160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra reassigned HIVE-17160: - Assignee: slim bouguerra > Adding kerberos Authorization to the Druid hive integration > --- > > Key: HIVE-17160 > URL: https://issues.apache.org/jira/browse/HIVE-17160 > Project: Hive > Issue Type: New Feature > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > > The goal of this feature is to allow Hive to query a secured Druid cluster > using Kerberos credentials.
[jira] [Commented] (HIVE-17132) Add InterfaceAudience and InterfaceStability annotations for UDF APIs
[ https://issues.apache.org/jira/browse/HIVE-17132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099099#comment-16099099 ] Zoltan Haindrich commented on HIVE-17132: - I have always felt that {{GenericUDF}} and {{UDF}} are somewhat the same, but they don't share a common parent. It seems to me that every {{UDF}} is repackaged into a generic one via {{GenericUDFBridge}}; I assume this is for evolutionary reasons. I think it would be better to leave it behind and try to remove the old {{UDF}} later. > Add InterfaceAudience and InterfaceStability annotations for UDF APIs > - > > Key: HIVE-17132 > URL: https://issues.apache.org/jira/browse/HIVE-17132 > Project: Hive > Issue Type: Sub-task > Components: UDF >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17132.1.patch > > > Add InterfaceAudience and InterfaceStability annotations for UDF APIs. UDFs > are a useful plugin point for Hive users, and there are a number of external > UDF libraries, such as hivemall.
[jira] [Commented] (HIVE-16965) SMB join may produce incorrect results
[ https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099093#comment-16099093 ] Hive QA commented on HIVE-16965: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12878673/HIVE-16965.2.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 11093 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=144) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[smb_join1] (batchId=157) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_scalar] (batchId=153) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning] (batchId=168) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=235) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=235) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=179) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=179) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6123/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6123/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6123/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 
10 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12878673 - PreCommit-HIVE-Build > SMB join may produce incorrect results > -- > > Key: HIVE-16965 > URL: https://issues.apache.org/jira/browse/HIVE-16965 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Deepak Jaiswal > Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch > > > Running the following on MiniTez > {noformat} > set hive.mapred.mode=nonstrict; > SET hive.vectorized.execution.enabled=true; > SET hive.exec.orc.default.buffer.size=32768; > SET hive.exec.orc.default.row.index.stride=1000; > SET hive.optimize.index.filter=true; > set hive.fetch.task.conversion=none; > set hive.exec.dynamic.partition.mode=nonstrict; > DROP TABLE orc_a; > DROP TABLE orc_b; > CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q > smallint) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > CREATE TABLE orc_b (id bigint, cfloat float) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > insert into table orc_a partition (y=2000, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_a partition (y=2001, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_b > select cbigint, cfloat from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc limit 200; > set hive.cbo.enable=false; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=10; > explain > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by 
y,q; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > DROP TABLE orc_a; > DROP TABLE orc_b; > {noformat} > Produces different results for the two selects. The SMB one looks incorrect. > cc [~djaiswal] [~hagleitn] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17152) Improve security of random generator for HS2 cookies
[ https://issues.apache.org/jira/browse/HIVE-17152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099081#comment-16099081 ] Tao Li commented on HIVE-17152: --- [~susanths], [~vgumashta] Can you please take a look at this change? > Improve security of random generator for HS2 cookies > > > Key: HIVE-17152 > URL: https://issues.apache.org/jira/browse/HIVE-17152 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: Tao Li >Assignee: Tao Li > Attachments: HIVE-17152.1.patch > > > The generated random number is used as a secret that is appended to a sequence and > hashed with SHA to implement a CookieSigner. If this is attackable, then it's possible > for an attacker to sign a cookie as if we had. We should fix this and use > SecureRandom as a stronger random source. > HTTPAuthUtils has a similar issue. If that is attackable, an attacker might > be able to create a similar cookie. Paired with the above issue with the > CookieSigner, the attacker could reasonably spoof an HS2 cookie.
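The idea in the description above (draw the signing secret from SecureRandom, append it to the cookie value, and hash with SHA) can be sketched roughly as follows. This is a minimal illustration, not Hive's actual CookieSigner: the class name SketchCookieSigner, the "&s=" separator, and the choice of SHA-256 are assumptions made for the example. An HMAC is the more standard construction for this kind of signature; plain hashing is used here only because it mirrors the description.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Base64;

/** Hypothetical signer for illustration only; not Hive's CookieSigner. */
final class SketchCookieSigner {
    // Assumed separator between the cookie value and its signature.
    private static final String SEPARATOR = "&s=";
    private final byte[] secret;

    SketchCookieSigner(byte[] secret) {
        this.secret = secret.clone();
    }

    /** Draws the secret from SecureRandom rather than java.util.Random. */
    static byte[] newSecret(int numBytes) {
        byte[] bytes = new byte[numBytes];
        new SecureRandom().nextBytes(bytes);
        return bytes;
    }

    /** Returns the value with its signature appended. */
    String sign(String value) {
        return value + SEPARATOR + signature(value);
    }

    /** Returns the original value if the signature matches, otherwise throws. */
    String verifyAndExtract(String signed) {
        int idx = signed.lastIndexOf(SEPARATOR);
        if (idx < 0) {
            throw new IllegalArgumentException("Missing signature");
        }
        String value = signed.substring(0, idx);
        String given = signed.substring(idx + SEPARATOR.length());
        byte[] expected = signature(value).getBytes(StandardCharsets.UTF_8);
        // Constant-time comparison of the expected and supplied signatures.
        if (!MessageDigest.isEqual(expected, given.getBytes(StandardCharsets.UTF_8))) {
            throw new IllegalArgumentException("Invalid signature");
        }
        return value;
    }

    /** Hashes the value followed by the secret and Base64-encodes the digest. */
    private String signature(String value) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(value.getBytes(StandardCharsets.UTF_8));
            md.update(secret);
            return Base64.getEncoder().encodeToString(md.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A caller would create the signer once with `new SketchCookieSigner(SketchCookieSigner.newSecret(32))`, call `sign` when issuing a cookie, and call `verifyAndExtract` on each incoming cookie. If the secret were predictable, anyone could recompute the signature, which is the attack the issue describes.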
[jira] [Commented] (HIVE-17039) Implement optimization rewritings that rely on database SQL constraints
[ https://issues.apache.org/jira/browse/HIVE-17039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099080#comment-16099080 ] Sergey Shelukhin commented on HIVE-17039: - Hmm... I thought the Hive SQL constraints are not enforced. > Implement optimization rewritings that rely on database SQL constraints > --- > > Key: HIVE-17039 > URL: https://issues.apache.org/jira/browse/HIVE-17039 > Project: Hive > Issue Type: New Feature > Components: Logical Optimizer >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez > > Hive already has support to declare multiple SQL constraints (PRIMARY KEY, > FOREIGN KEY, UNIQUE, and NOT NULL). Although these constraints cannot be > currently enforced on the data, they can be made available to the optimizer > by using the 'RELY' keyword. > This ticket is an umbrella for all the rewriting optimizations based on SQL > constraints that we will be including in Hive. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17159) Make metastore a separately releasable module
[ https://issues.apache.org/jira/browse/HIVE-17159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reassigned HIVE-17159: - > Make metastore a separately releasable module > - > > Key: HIVE-17159 > URL: https://issues.apache.org/jira/browse/HIVE-17159 > Project: Hive > Issue Type: New Feature > Components: Metastore >Reporter: Alan Gates >Assignee: Alan Gates > > As proposed in this > [thread|https://lists.apache.org/thread.html/5e75f45d60f0b819510814a126cfd3809dd24b1c7035a1c8c41b0c5c@%3Cdev.hive.apache.org%3E] > on the dev list, we should move the metastore into a separately releasable > module. This is a POC of and potential first step towards separating out the > metastore as a separate Apache TLP. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17060) unix_timestamp(void) is deprecated message is printed twice
[ https://issues.apache.org/jira/browse/HIVE-17060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099073#comment-16099073 ] Sergey Shelukhin commented on HIVE-17060: - Well, we can remove the overload and the warning in 3.0; not sure about 2.4. There isn't really a good way to de-dup it if the UDF is being created twice. > unix_timestamp(void) is deprecated message is printed twice > --- > > Key: HIVE-17060 > URL: https://issues.apache.org/jira/browse/HIVE-17060 > Project: Hive > Issue Type: Bug > Components: CBO, UDF >Affects Versions: 1.3.0, 2.0.0 >Reporter: Peter Vary >Priority: Trivial > > HIVE-10728 added a warning message when unix_timestamp is used without > parameters. > When CBO is used, this message is printed twice. > Minimal steps to reproduce: > {code} > set hive.cbo.enable = true; > create table timestamp_test(s string); > select unix_timestamp() from timestamp_test; > {code} > This duplication is even enforced by the golden files in the commit :) > https://github.com/apache/hive/commit/24d3307be79d35d3a34c49014dfdd597112f9106#diff-bf6c9f3549aaeb2b40b8b1eab9254c4aR73
[jira] [Commented] (HIVE-17117) Metalisteners are not notified when threadlocal metaconf is cleanup
[ https://issues.apache.org/jira/browse/HIVE-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099064#comment-16099064 ] Chao Sun commented on HIVE-17117: - Test failures are not related. Committed this to master. Thanks [~pgolash] for the patch, and [~mohitsabharwal] and [~zshao] for the review. > Metalisteners are not notified when threadlocal metaconf is cleanup > > > Key: HIVE-17117 > URL: https://issues.apache.org/jira/browse/HIVE-17117 > Project: Hive > Issue Type: Bug > Components: Metastore > Environment: Tested on master branch (Applicable for downlevel > versions as well) >Reporter: PRASHANT GOLASH >Assignee: PRASHANT GOLASH >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-17117.1.patch, HIVE-17117.patch > > > Meta listeners are not notified of meta-conf cleanup. This could potentially > leave stale values on listener objects. For example: > Request 1 > a. HS2 -> HMS : HMSHandler#setMetaConf > MetaListeners are notified of the ConfigChangeEvent. > b. HS2 -> HMS : HMSHandler#shutdown / HiveMetaStore#deleteContext (if > shutdown is not invoked) > MetaConf is cleaned up in HiveMetaStore#cleanupRawStore, but meta > listeners are not notified > Request 2 > 3. HS2->HMS : AlterPartition > MetaListeners are notified of AlterPartitionEvent. If any listener has > taken a dependency on the meta conf value, it will still hold the stale value > from Request 1 and could misbehave. > The correct behavior should be to notify meta listeners on cleanup as well.
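The behavior requested in the description above (listeners hear about the cleanup of thread-local overrides, not just about the set) might look roughly like the sketch below. Everything here is hypothetical: MetaConfSketch, ConfChangeListener, and the method names are invented for illustration and do not correspond to the real HMSHandler or metastore listener API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of thread-local conf overrides with listener notification. */
final class MetaConfSketch {
    /** Hypothetical listener interface; Hive's real listener API differs. */
    interface ConfChangeListener {
        void onConfigChange(String key, String oldValue, String newValue);
    }

    private final Map<String, String> defaults;
    private final List<ConfChangeListener> listeners = new ArrayList<>();
    // Per-thread overrides, standing in for the thread-local meta conf.
    private final ThreadLocal<Map<String, String>> overrides =
        ThreadLocal.withInitial(HashMap::new);

    MetaConfSketch(Map<String, String> defaults) {
        this.defaults = new HashMap<>(defaults);
    }

    void addListener(ConfChangeListener listener) {
        listeners.add(listener);
    }

    /** Sets an override for the current thread and notifies listeners. */
    void setMetaConf(String key, String value) {
        Map<String, String> current = overrides.get();
        String old = current.getOrDefault(key, defaults.get(key));
        current.put(key, value);
        notifyListeners(key, old, value);
    }

    /** Drops this thread's overrides and tells listeners each key is back to its default. */
    void cleanup() {
        for (Map.Entry<String, String> e : overrides.get().entrySet()) {
            notifyListeners(e.getKey(), e.getValue(), defaults.get(e.getKey()));
        }
        overrides.remove();
    }

    private void notifyListeners(String key, String oldValue, String newValue) {
        for (ConfChangeListener listener : listeners) {
            listener.onConfigChange(key, oldValue, newValue);
        }
    }
}
```

With the notification in `cleanup()`, a listener that cached the override from Request 1 receives a second event resetting it before Request 2 arrives on the same thread.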
[jira] [Updated] (HIVE-17117) Metalisteners are not notified when threadlocal metaconf is cleanup
[ https://issues.apache.org/jira/browse/HIVE-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-17117: Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) > Metalisteners are not notified when threadlocal metaconf is cleanup > > > Key: HIVE-17117 > URL: https://issues.apache.org/jira/browse/HIVE-17117 > Project: Hive > Issue Type: Bug > Components: Metastore > Environment: Tested on master branch (Applicable for downlevel > versions as well) >Reporter: PRASHANT GOLASH >Assignee: PRASHANT GOLASH >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-17117.1.patch, HIVE-17117.patch > > > Meta listeners are not notified of meta-conf cleanup. This could potentially > leave stale values on listener objects. For example: > Request 1 > a. HS2 -> HMS : HMSHandler#setMetaConf > MetaListeners are notified of the ConfigChangeEvent. > b. HS2 -> HMS : HMSHandler#shutdown / HiveMetaStore#deleteContext (if > shutdown is not invoked) > MetaConf is cleaned up in HiveMetaStore#cleanupRawStore, but meta > listeners are not notified > Request 2 > 3. HS2->HMS : AlterPartition > MetaListeners are notified of AlterPartitionEvent. If any listener has > taken a dependency on the meta conf value, it will still hold the stale value > from Request 1 and could misbehave. > The correct behavior should be to notify meta listeners on cleanup as well.
[jira] [Commented] (HIVE-16640) The ASF Headers have some errors in some class
[ https://issues.apache.org/jira/browse/HIVE-16640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099062#comment-16099062 ] Lefty Leverenz commented on HIVE-16640: --- Nudging [~pxiong] to add 2.3.0 to the fix versions. (See last comment.) > The ASF Headers have some errors in some class > -- > > Key: HIVE-16640 > URL: https://issues.apache.org/jira/browse/HIVE-16640 > Project: Hive > Issue Type: Bug >Affects Versions: 3.0.0 >Reporter: ZhangBing Lin >Assignee: ZhangBing Lin >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-16640.1.patch > > > I found that some classes have the license header placed in an incorrect location, and some classes > are missing the license header.
[jira] [Commented] (HIVE-17117) Metalisteners are not notified when threadlocal metaconf is cleanup
[ https://issues.apache.org/jira/browse/HIVE-17117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099059#comment-16099059 ] Zheng Shao commented on HIVE-17117: --- +1 > Metalisteners are not notified when threadlocal metaconf is cleanup > > > Key: HIVE-17117 > URL: https://issues.apache.org/jira/browse/HIVE-17117 > Project: Hive > Issue Type: Bug > Components: Metastore > Environment: Tested on master branch (Applicable for downlevel > versions as well) >Reporter: PRASHANT GOLASH >Assignee: PRASHANT GOLASH >Priority: Minor > Attachments: HIVE-17117.1.patch, HIVE-17117.patch > > > Meta listeners are not notified of meta-conf cleanup. This could potentially > leave stale values on listener objects. For example: > Request 1 > a. HS2 -> HMS : HMSHandler#setMetaConf > MetaListeners are notified of the ConfigChangeEvent. > b. HS2 -> HMS : HMSHandler#shutdown / HiveMetaStore#deleteContext (if > shutdown is not invoked) > MetaConf is cleaned up in HiveMetaStore#cleanupRawStore, but meta > listeners are not notified > Request 2 > 3. HS2->HMS : AlterPartition > MetaListeners are notified of AlterPartitionEvent. If any listener has > taken a dependency on the meta conf value, it will still hold the stale value > from Request 1 and could misbehave. > The correct behavior should be to notify meta listeners on cleanup as well.
[jira] [Commented] (HIVE-17126) Hive Metastore is incompatible with MariaDB 10.x
[ https://issues.apache.org/jira/browse/HIVE-17126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099060#comment-16099060 ] Sergey Shelukhin commented on HIVE-17126: - Sounds like a DataNucleus issue to me (assuming it supports MariaDB), or at least a lack of functionality (assuming it doesn't). You might want to find out which one :) Also, there's an ORM-free, SQL-based implementation of the entire metastore in the works in HIVE-14870. That is Oracle-specific, but some (or at least I ;) ) believe it should be database agnostic. It might also be of use, if there isn't an easy solution with DataNucleus. cc [~cdrome] > Hive Metastore is incompatible with MariaDB 10.x > > > Key: HIVE-17126 > URL: https://issues.apache.org/jira/browse/HIVE-17126 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 1.2.0, 1.1.0, 2.0.0 >Reporter: Eric Yang > > MariaDB 10.x is commonly used for cheap RDBMS high availability. Hive's use > of DataNucleus currently prevents the Hive Metastore from using MariaDB 10.x as > a highly available metastore. DataNucleus generates SQL statements that cannot be > parsed by MariaDB 10.x when dropping a Hive table or database schema. > Even without a MariaDB HA setup, the SQL statement problem exists for metastore > interaction with MariaDB 10.x.
[jira] [Commented] (HIVE-17077) Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD len character's value is negative number
[ https://issues.apache.org/jira/browse/HIVE-17077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099054#comment-16099054 ] Ashutosh Chauhan commented on HIVE-17077: - The general principle in such cases is to throw an exception at query compile time if we can detect an illegal argument in the UDF, but to return null at runtime, since there it likely depends on the data and we don't fail a query for malformed rows. > Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD len > character's value is negative number > - > > Key: HIVE-17077 > URL: https://issues.apache.org/jira/browse/HIVE-17077 > Project: Hive > Issue Type: Bug >Reporter: Lingang Deng >Assignee: Lingang Deng >Priority: Minor > > lpad (and rpad) throws an exception when the second argument is a negative number, as > follows: > {code:java} > hive> select lpad("hello", -1 ,"h"); > FAILED: StringIndexOutOfBoundsException String index out of range: -1 > hive> select rpad("hello", -1 ,"h"); > FAILED: StringIndexOutOfBoundsException String index out of range: -1 > {code} > Maybe we should return a friendlier result, as MySQL does. > {code:java} > mysql> select lpad("hello", -1 ,"h"); > +--+ > | lpad("hello", -1 ,"h") | > +--+ > | NULL | > +--+ > 1 row in set (0.00 sec) > mysql> select rpad("hello", -1 ,"h"); > +--+ > | rpad("hello", -1 ,"h") | > +--+ > | NULL | > +--+ > 1 row in set (0.00 sec) > {code}
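The principle stated in the comment above (reject a bad constant argument at compile time, return null for bad per-row values at runtime) can be illustrated with a small standalone sketch. This is not Hive's GenericUDFLpad/GenericUDFRpad code: PadSketch and its methods are hypothetical, and returning null for an empty pad string is an assumption made here only to keep the example total.

```java
/** Hypothetical padding helpers for illustration; not Hive's UDF implementation. */
final class PadSketch {
    /** Compile-time style check: a constant length known to be illegal fails fast. */
    static void checkConstantLength(int len) {
        if (len < 0) {
            throw new IllegalArgumentException("pad length must be non-negative, got " + len);
        }
    }

    /** Runtime style behavior: a negative length yields null instead of an exception. */
    static String lpad(String s, int len, String pad) {
        if (s == null || pad == null || len < 0) {
            return null;
        }
        if (len <= s.length()) {
            return s.substring(0, len); // shorten to len characters
        }
        if (pad.isEmpty()) {
            return null; // assumption of this sketch: nothing to pad with
        }
        return fill(len - s.length(), pad) + s;
    }

    /** Same contract as lpad, but pads on the right. */
    static String rpad(String s, int len, String pad) {
        if (s == null || pad == null || len < 0) {
            return null;
        }
        if (len <= s.length()) {
            return s.substring(0, len);
        }
        if (pad.isEmpty()) {
            return null;
        }
        return s + fill(len - s.length(), pad);
    }

    /** Repeats pad cyclically until count characters are produced. */
    private static String fill(int count, String pad) {
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append(pad.charAt(i % pad.length()));
        }
        return sb.toString();
    }
}
```

Under this split, `select lpad("hello", -1, "h")` with a literal -1 would be rejected with a clear invalid-argument message, while a negative length coming from a column value would produce NULL for that row, matching the MySQL output quoted in the issue.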
[jira] [Updated] (HIVE-16077) UPDATE/DELETE fails with numBuckets > numReducers
[ https://issues.apache.org/jira/browse/HIVE-16077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16077: -- Description: don't think we have such tests for Acid path check if they exist for non-acid path way to record expected files on disk in ptest/qfile https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25 dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/; was: don't think we have such tests for Acid path check if they exist for non-acid path > UPDATE/DELETE fails with numBuckets > numReducers > - > > Key: HIVE-16077 > URL: https://issues.apache.org/jira/browse/HIVE-16077 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.1.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16077.01.patch, HIVE-16077.02.patch, > HIVE-16077.03.patch > > > don't think we have such tests for Acid path > check if they exist for non-acid path > way to record expected files on disk in ptest/qfile > https://github.com/apache/hive/blob/master/ql/src/test/queries/clientpositive/orc_merge3.q#L25 > dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/orcfile_merge3b/; -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17088) HS2 WebUI throws a NullPointerException when opened
[ https://issues.apache.org/jira/browse/HIVE-17088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-17088: --- Attachment: HIVE-17088.addendum1.patch [~aihuaxu] Here's another patch to fix this issue. > HS2 WebUI throws a NullPointerException when opened > --- > > Key: HIVE-17088 > URL: https://issues.apache.org/jira/browse/HIVE-17088 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: Sergio Peña >Assignee: Sergio Peña > Fix For: 3.0.0 > > Attachments: HIVE-17088.1.patch, HIVE-17088.addendum1.patch > > > After bumping the Jetty version to 9.3 and excluding several other > dependencies in HIVE-16049, the HS2 webui stopped working and now throws an NPE > error. > {noformat} > HTTP ERROR 500 > Problem accessing /hiveserver2.jsp. Reason: > Server Error > Caused by: > java.lang.NullPointerException > at > org.apache.hive.generated.hiveserver2.hiveserver2_jsp._jspService(hiveserver2_jsp.java:181) > at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:840) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > at org.eclipse.jetty.server.Server.handle(Server.java:534) > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) > at > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:240) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) > at java.lang.Thread.run(Thread.java:748) > Powered by Jetty:// 9.3.19.v20170502 > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17088) HS2 WebUI throws a NullPointerException when opened
[ https://issues.apache.org/jira/browse/HIVE-17088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-17088: --- Status: Patch Available (was: Reopened) > HS2 WebUI throws a NullPointerException when opened > --- > > Key: HIVE-17088 > URL: https://issues.apache.org/jira/browse/HIVE-17088 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 3.0.0 >Reporter: Sergio Peña >Assignee: Sergio Peña > Fix For: 3.0.0 > > Attachments: HIVE-17088.1.patch, HIVE-17088.addendum1.patch > > > After bumping the Jetty version to 9.3 and excluding several other > dependencies in HIVE-16049, the HS2 webui stopped working and now throws an NPE > error. > {noformat} > HTTP ERROR 500 > Problem accessing /hiveserver2.jsp. Reason: > Server Error > Caused by: > java.lang.NullPointerException > at > org.apache.hive.generated.hiveserver2.hiveserver2_jsp._jspService(hiveserver2_jsp.java:181) > at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:790) > at > org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:840) > at > org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:584) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) > at > org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548) > at > org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226) > at > org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180) > at > org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512) > at > org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185) > at > org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112) > at > org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) > at > 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213) > at > org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134) > at org.eclipse.jetty.server.Server.handle(Server.java:534) > at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320) > at > org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:240) > at > org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283) > at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108) > at > org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148) > at > org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136) > at > org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671) > at > org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589) > at java.lang.Thread.run(Thread.java:748) > Powered by Jetty:// 9.3.19.v20170502 > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17077) Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD len character's value is negative number
[ https://issues.apache.org/jira/browse/HIVE-17077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16099006#comment-16099006 ] Sergey Shelukhin commented on HIVE-17077: - cc [~ashutoshc] what's the default behavior in Hive for these things? I think throwing a clearer exception (invalid argument) would be better, but we do return null for some invalid type conversions, IIRC. > Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD len > character's value is negative number > - > > Key: HIVE-17077 > URL: https://issues.apache.org/jira/browse/HIVE-17077 > Project: Hive > Issue Type: Bug >Reporter: Lingang Deng >Assignee: Lingang Deng >Priority: Minor > > lpad (and rpad) throws an exception when the second argument is a negative number, as > follows: > {code:java} > hive> select lpad("hello", -1 ,"h"); > FAILED: StringIndexOutOfBoundsException String index out of range: -1 > hive> select rpad("hello", -1 ,"h"); > FAILED: StringIndexOutOfBoundsException String index out of range: -1 > {code} > Maybe we should return a friendlier result, as MySQL does. > {code:java} > mysql> select lpad("hello", -1 ,"h"); > +--+ > | lpad("hello", -1 ,"h") | > +--+ > | NULL | > +--+ > 1 row in set (0.00 sec) > mysql> select rpad("hello", -1 ,"h"); > +--+ > | rpad("hello", -1 ,"h") | > +--+ > | NULL | > +--+ > 1 row in set (0.00 sec) > {code}
[jira] [Commented] (HIVE-17155) findConfFile() in HiveConf.java has some issues with the conf path
[ https://issues.apache.org/jira/browse/HIVE-17155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098988#comment-16098988 ] Yongzhi Chen commented on HIVE-17155: - The change looks good. +1 > findConfFile() in HiveConf.java has some issues with the conf path > -- > > Key: HIVE-17155 > URL: https://issues.apache.org/jira/browse/HIVE-17155 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 3.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu >Priority: Minor > Attachments: HIVE-17155.1.patch > > > The findConfFile() function in HiveConf.java has some issues. > File.pathSeparator, which is ":", is used as the separator rather than "/". new > File(jarUri).getParentFile() returns the "$hive_home/lib" folder, but > we actually want "$hive_home".
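The two problems named in the description above can be shown in a few lines. The sketch below is not the HIVE-17155 patch: ConfPathSketch and confFileFor are hypothetical names, and the "conf" directory name is an assumption for the example. It walks up two levels from the jar (lib, then home) and builds the child path with the File(parent, child) constructor, which avoids choosing a separator by hand.

```java
import java.io.File;

/** Hypothetical helper for illustration; not the actual HiveConf code. */
final class ConfPathSketch {
    /**
     * Given a jar located at <home>/lib/<jar>, returns <home>/conf/<name>,
     * or null when the jar path is too shallow to have a home directory.
     */
    static File confFileFor(File jarFile, String name) {
        File libDir = jarFile.getParentFile();                          // <home>/lib
        File homeDir = libDir == null ? null : libDir.getParentFile();  // <home>
        if (homeDir == null) {
            return null;
        }
        // File(parent, child) inserts File.separator ("/" on Unix), whereas
        // File.pathSeparator (":" on Unix) is for lists of paths such as PATH.
        return new File(new File(homeDir, "conf"), name);
    }
}
```

For example, `ConfPathSketch.confFileFor(new File("/opt/hive/lib/hive-common.jar"), "hive-site.xml")` resolves under `/opt/hive/conf`, whereas stopping at the first `getParentFile()` would look under `/opt/hive/lib`.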
[jira] [Updated] (HIVE-16965) SMB join may produce incorrect results
[ https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-16965: -- Attachment: HIVE-16965.2.patch > SMB join may produce incorrect results > -- > > Key: HIVE-16965 > URL: https://issues.apache.org/jira/browse/HIVE-16965 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Deepak Jaiswal > Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch > > > Running the following on MiniTez > {noformat} > set hive.mapred.mode=nonstrict; > SET hive.vectorized.execution.enabled=true; > SET hive.exec.orc.default.buffer.size=32768; > SET hive.exec.orc.default.row.index.stride=1000; > SET hive.optimize.index.filter=true; > set hive.fetch.task.conversion=none; > set hive.exec.dynamic.partition.mode=nonstrict; > DROP TABLE orc_a; > DROP TABLE orc_b; > CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q > smallint) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > CREATE TABLE orc_b (id bigint, cfloat float) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > insert into table orc_a partition (y=2000, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_a partition (y=2001, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_b > select cbigint, cfloat from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc limit 200; > set hive.cbo.enable=false; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=10; > explain 
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > DROP TABLE orc_a; > DROP TABLE orc_b; > {noformat} > Produces different results for the two selects. The SMB one looks incorrect. > cc [~djaiswal] [~hagleitn] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16965) SMB join may produce incorrect results
[ https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-16965: -- Attachment: (was: HIVE-16965.2.patch) > SMB join may produce incorrect results > -- > > Key: HIVE-16965 > URL: https://issues.apache.org/jira/browse/HIVE-16965 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Deepak Jaiswal > Attachments: HIVE-16965.1.patch > > > Running the following on MiniTez > {noformat} > set hive.mapred.mode=nonstrict; > SET hive.vectorized.execution.enabled=true; > SET hive.exec.orc.default.buffer.size=32768; > SET hive.exec.orc.default.row.index.stride=1000; > SET hive.optimize.index.filter=true; > set hive.fetch.task.conversion=none; > set hive.exec.dynamic.partition.mode=nonstrict; > DROP TABLE orc_a; > DROP TABLE orc_b; > CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q > smallint) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > CREATE TABLE orc_b (id bigint, cfloat float) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > insert into table orc_a partition (y=2000, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_a partition (y=2001, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_b > select cbigint, cfloat from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc limit 200; > set hive.cbo.enable=false; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=10; > explain > select 
y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > DROP TABLE orc_a; > DROP TABLE orc_b; > {noformat} > Produces different results for the two selects. The SMB one looks incorrect. > cc [~djaiswal] [~hagleitn] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16614) Support "set local time zone" statement
[ https://issues.apache.org/jira/browse/HIVE-16614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-16614: --- Status: Patch Available (was: Open) > Support "set local time zone" statement > --- > > Key: HIVE-16614 > URL: https://issues.apache.org/jira/browse/HIVE-16614 > Project: Hive > Issue Type: Improvement >Reporter: Carter Shanklin >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-16614.patch > > > HIVE-14412 introduces a timezone-aware timestamp. > SQL has a concept of default time zone displacements, which are transparently > applied when converting between timezone-unaware types and timezone-aware > types and, in Hive's case, are also used to shift a timezone aware type to a > different time zone, depending on configuration. > SQL also provides that the default time zone displacement be settable at a > session level, so that clients can access a database simultaneously from > different time zones and see time values in their own time zone. > Currently the time zone displacement is fixed and is set based on the system > time zone where the Hive client runs (HiveServer2 or Hive CLI). It will be > more convenient for users if they have the ability to set their time zone of > choice. > SQL defines "set time zone" with 2 ways of specifying the time zone, first > using an interval and second using the special keyword LOCAL. > Examples: > • set time zone '-8:00'; > • set time zone LOCAL; > LOCAL means to set the current default time zone displacement to the > session's original default time zone displacement. > Reference: SQL:2011 section 19.4 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-16614) Support "set local time zone" statement
[ https://issues.apache.org/jira/browse/HIVE-16614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez reassigned HIVE-16614: -- Assignee: Jesus Camacho Rodriguez (was: Bing Li) > Support "set local time zone" statement > --- > > Key: HIVE-16614 > URL: https://issues.apache.org/jira/browse/HIVE-16614 > Project: Hive > Issue Type: Improvement >Reporter: Carter Shanklin >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-16614.patch > > > HIVE-14412 introduces a timezone-aware timestamp. > SQL has a concept of default time zone displacements, which are transparently > applied when converting between timezone-unaware types and timezone-aware > types and, in Hive's case, are also used to shift a timezone aware type to a > different time zone, depending on configuration. > SQL also provides that the default time zone displacement be settable at a > session level, so that clients can access a database simultaneously from > different time zones and see time values in their own time zone. > Currently the time zone displacement is fixed and is set based on the system > time zone where the Hive client runs (HiveServer2 or Hive CLI). It will be > more convenient for users if they have the ability to set their time zone of > choice. > SQL defines "set time zone" with 2 ways of specifying the time zone, first > using an interval and second using the special keyword LOCAL. > Examples: > • set time zone '-8:00'; > • set time zone LOCAL; > LOCAL means to set the current default time zone displacement to the > session's original default time zone displacement. > Reference: SQL:2011 section 19.4 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16614) Support "set local time zone" statement
[ https://issues.apache.org/jira/browse/HIVE-16614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-16614: --- Attachment: HIVE-16614.patch > Support "set local time zone" statement > --- > > Key: HIVE-16614 > URL: https://issues.apache.org/jira/browse/HIVE-16614 > Project: Hive > Issue Type: Improvement >Reporter: Carter Shanklin >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-16614.patch > > > HIVE-14412 introduces a timezone-aware timestamp. > SQL has a concept of default time zone displacements, which are transparently > applied when converting between timezone-unaware types and timezone-aware > types and, in Hive's case, are also used to shift a timezone aware type to a > different time zone, depending on configuration. > SQL also provides that the default time zone displacement be settable at a > session level, so that clients can access a database simultaneously from > different time zones and see time values in their own time zone. > Currently the time zone displacement is fixed and is set based on the system > time zone where the Hive client runs (HiveServer2 or Hive CLI). It will be > more convenient for users if they have the ability to set their time zone of > choice. > SQL defines "set time zone" with 2 ways of specifying the time zone, first > using an interval and second using the special keyword LOCAL. > Examples: > • set time zone '-8:00'; > • set time zone LOCAL; > LOCAL means to set the current default time zone displacement to the > session's original default time zone displacement. > Reference: SQL:2011 section 19.4 -- This message was sent by Atlassian JIRA (v6.4.14#64029)
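The session-level behaviour described in HIVE-16614 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in HIVE-16614.patch: the class `SessionTimeZoneSketch` and its methods are hypothetical names, and the displacement is parsed from a plain string such as '-8:00' rather than from a real SQL interval literal. The sketch shows the two forms from the description: an explicit displacement, and LOCAL, which restores the session's original default.

```java
import java.time.ZoneOffset;

// Hypothetical sketch of a session that tracks a default time zone displacement.
final class SessionTimeZoneSketch {
    private final ZoneOffset original; // displacement the session started with
    private ZoneOffset current;        // displacement currently in effect

    SessionTimeZoneSketch(ZoneOffset systemDefault) {
        this.original = systemDefault;
        this.current = systemDefault;
    }

    // Mirrors "set time zone <value>": LOCAL resets, anything else is parsed.
    void setTimeZone(String value) {
        if ("LOCAL".equalsIgnoreCase(value.trim())) {
            current = original;
        } else {
            current = parseDisplacement(value);
        }
    }

    ZoneOffset current() {
        return current;
    }

    // Parses "[+-]H:MM" by hand, since single-digit hours such as "-8:00"
    // are not among the formats documented for ZoneOffset.of.
    static ZoneOffset parseDisplacement(String text) {
        String t = text.trim();
        int sign = 1;
        if (t.startsWith("-")) {
            sign = -1;
            t = t.substring(1);
        } else if (t.startsWith("+")) {
            t = t.substring(1);
        }
        String[] parts = t.split(":");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected [+-]H:MM, got: " + text);
        }
        int hours = Integer.parseInt(parts[0]);
        int minutes = Integer.parseInt(parts[1]);
        // ofHoursMinutes requires hours and minutes to carry the same sign.
        return ZoneOffset.ofHoursMinutes(sign * hours, sign * minutes);
    }
}
```

With a session created at UTC, `setTimeZone("-8:00")` moves the current displacement to -08:00 and `setTimeZone("LOCAL")` moves it back to UTC.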
[jira] [Updated] (HIVE-16965) SMB join may produce incorrect results
[ https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-16965: -- Attachment: HIVE-16965.2.patch Forgot to attach test. > SMB join may produce incorrect results > -- > > Key: HIVE-16965 > URL: https://issues.apache.org/jira/browse/HIVE-16965 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Deepak Jaiswal > Attachments: HIVE-16965.1.patch, HIVE-16965.2.patch > > > Running the following on MiniTez > {noformat} > set hive.mapred.mode=nonstrict; > SET hive.vectorized.execution.enabled=true; > SET hive.exec.orc.default.buffer.size=32768; > SET hive.exec.orc.default.row.index.stride=1000; > SET hive.optimize.index.filter=true; > set hive.fetch.task.conversion=none; > set hive.exec.dynamic.partition.mode=nonstrict; > DROP TABLE orc_a; > DROP TABLE orc_b; > CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q > smallint) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > CREATE TABLE orc_b (id bigint, cfloat float) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > insert into table orc_a partition (y=2000, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_a partition (y=2001, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_b > select cbigint, cfloat from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc limit 200; > set hive.cbo.enable=false; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set 
hive.auto.convert.join.noconditionaltask.size=10; > explain > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > DROP TABLE orc_a; > DROP TABLE orc_b; > {noformat} > Produces different results for the two selects. The SMB one looks incorrect. > cc [~djaiswal] [~hagleitn] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17131) Add InterfaceAudience and InterfaceStability annotations for SerDe APIs
[ https://issues.apache.org/jira/browse/HIVE-17131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098960#comment-16098960 ] Sergey Shelukhin commented on HIVE-17131: - Is it stable? esp. the ObjectInspector. cc [~ashutoshc] > Add InterfaceAudience and InterfaceStability annotations for SerDe APIs > --- > > Key: HIVE-17131 > URL: https://issues.apache.org/jira/browse/HIVE-17131 > Project: Hive > Issue Type: Sub-task > Components: Serializers/Deserializers >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-17131.1.branch-2.patch, HIVE-17131.1.patch > > > Adding InterfaceAudience and InterfaceStability annotations for the core > SerDe APIs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17115) MetaStoreUtils.getDeserializer doesn't catch the java.lang.ClassNotFoundException
[ https://issues.apache.org/jira/browse/HIVE-17115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik.fang updated HIVE-17115: - Attachment: HIVE-17115.1.patch Sorry for the late reply, [~daijy]. I tried to write a test case but could not reproduce the NoClassDefFoundError. It is thrown when the code compiles successfully but the JVM fails to find the class at runtime, so I'm afraid it is hard to trigger in TestHiveMetaStore. I changed the patch to catch Throwable. [~vihangk1] Yes, the error moved up the stack and the thread died, and the client waited for the response until it timed out. I have uploaded the patch, which catches Throwable and is rebased against branch-1.2. > MetaStoreUtils.getDeserializer doesn't catch the > java.lang.ClassNotFoundException > - > > Key: HIVE-17115 > URL: https://issues.apache.org/jira/browse/HIVE-17115 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 1.2.1 >Reporter: Erik.fang >Assignee: Erik.fang > Attachments: HIVE-17115.1.patch, HIVE-17115.patch > > > Suppose we create a table with a custom SerDe and then call > HiveMetaStoreClient.getSchema(String db, String tableName) to extract the > metadata from the HiveMetaStore service. > The Thrift client hangs, and the HiveMetaStore service log shows an exception > such as > {code:java} > Exception in thread "pool-5-thread-129" java.lang.NoClassDefFoundError: > org/apache/hadoop/hbase/util/Bytes > at > org.apache.hadoop.hive.hbase.HBaseSerDe.parseColumnsMapping(HBaseSerDe.java:184) > at > org.apache.hadoop.hive.hbase.HBaseSerDeParameters.<init>(HBaseSerDeParameters.java:73) > at > org.apache.hadoop.hive.hbase.HBaseSerDe.initialize(HBaseSerDe.java:117) > at > org.apache.hadoop.hive.serde2.AbstractSerDe.initialize(AbstractSerDe.java:53) > at > org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:521) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:401) > at >
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_fields_with_environment_context(HiveMetaStore.java:3556) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_schema_with_environment_context(HiveMetaStore.java:3636) > at sun.reflect.GeneratedMethodAccessor104.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107) > at com.sun.proxy.$Proxy4.get_schema_with_environment_context(Unknown > Source) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema_with_environment_context.getResult(ThriftHiveMetastore.java:9146) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_schema_with_environment_context.getResult(ThriftHiveMetastore.java:9130) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) > at > org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:551) > at > org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:546) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709) > at > org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:546) > at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused 
by: java.lang.ClassNotFoundException: > org.apache.hadoop.hbase.util.Bytes > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
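Why catching Throwable matters here can be shown with a small sketch. It is not the code from HIVE-17115.1.patch: `DeserializerLoadSketch` and `initializeSerDe` are hypothetical stand-ins, and the missing dependency is simulated by throwing the error directly. The point is only that NoClassDefFoundError is a java.lang.Error, so a `catch (Exception e)` block does not intercept it and it propagates up the stack, which matches the thread dying in the trace above.

```java
// Hypothetical sketch; not the actual MetaStoreUtils code.
final class DeserializerLoadSketch {

    // Stand-in for SerDe initialization that hits a class missing at runtime.
    static void initializeSerDe() {
        throw new NoClassDefFoundError("org/apache/hadoop/hbase/util/Bytes");
    }

    // Catches Exception only: an Error passes straight through this handler.
    static String loadCatchingException() {
        try {
            initializeSerDe();
            return "ok";
        } catch (Exception e) {
            return "wrapped: " + e.getClass().getSimpleName();
        }
    }

    // Catches Throwable: the Error is intercepted and can be reported to the caller.
    static String loadCatchingThrowable() {
        try {
            initializeSerDe();
            return "ok";
        } catch (Throwable t) {
            return "wrapped: " + t.getClass().getSimpleName();
        }
    }

    // Returns true if the Error escaped the Exception-only handler.
    static boolean errorEscapes() {
        try {
            loadCatchingException();
            return false;
        } catch (NoClassDefFoundError e) {
            return true;
        }
    }
}
```

In a real service the Throwable handler would presumably convert the failure into an error the Thrift client can receive instead of a string; that conversion is left out of the sketch.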
[jira] [Commented] (HIVE-16965) SMB join may produce incorrect results
[ https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098921#comment-16098921 ] Sergey Shelukhin commented on HIVE-16965: - Map by KV reader looks a little suspicious, what is the hashcode/equals of that? Is it valid and also acceptable in terms of perf? Should it be identity hash map? (HiveInputFormat.HiveInputSplit) splits.get(0) - assumes one element, add an assert? Why is path updated in IO context if we already set a specific one per input, perhaps a more detailed comment could be added > SMB join may produce incorrect results > -- > > Key: HIVE-16965 > URL: https://issues.apache.org/jira/browse/HIVE-16965 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Deepak Jaiswal > Attachments: HIVE-16965.1.patch > > > Running the following on MiniTez > {noformat} > set hive.mapred.mode=nonstrict; > SET hive.vectorized.execution.enabled=true; > SET hive.exec.orc.default.buffer.size=32768; > SET hive.exec.orc.default.row.index.stride=1000; > SET hive.optimize.index.filter=true; > set hive.fetch.task.conversion=none; > set hive.exec.dynamic.partition.mode=nonstrict; > DROP TABLE orc_a; > DROP TABLE orc_b; > CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q > smallint) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > CREATE TABLE orc_b (id bigint, cfloat float) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > insert into table orc_a partition (y=2000, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_a partition (y=2001, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_b > select cbigint, cfloat from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc limit 200; > set hive.cbo.enable=false; > 
select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=10; > explain > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > DROP TABLE orc_a; > DROP TABLE orc_b; > {noformat} > Produces different results for the two selects. The SMB one looks incorrect. > cc [~djaiswal] [~hagleitn] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
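The question about hashCode/equals versus an identity hash map can be illustrated with a small sketch. The `Reader` class below is a hypothetical key type with value-based equality, not Hive's KV reader, whose actual hashCode/equals behaviour is exactly what the review comment asks about. The sketch only shows how the choice of map changes whether two distinct reader objects share one entry.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical sketch; Reader is not a Hive class.
final class ReaderKeySketch {

    // Key type whose equals/hashCode depend on the path, not on object identity.
    static final class Reader {
        final String path;

        Reader(String path) {
            this.path = path;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Reader && ((Reader) o).path.equals(path);
        }

        @Override
        public int hashCode() {
            return path.hashCode();
        }
    }

    // Inserts two distinct Reader objects with the same path and reports the map size.
    static int distinctEntries(Map<Reader, String> map) {
        map.put(new Reader("bucket_0"), "context A");
        map.put(new Reader("bucket_0"), "context B");
        return map.size();
    }
}
```

With a HashMap the second put overwrites the first because the keys are equal; with an IdentityHashMap each object gets its own entry and no hashCode/equals call is made on the key.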
[jira] [Comment Edited] (HIVE-16965) SMB join may produce incorrect results
[ https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098913#comment-16098913 ] Deepak Jaiswal edited comment on HIVE-16965 at 7/24/17 6:06 PM: Initial patch. Fixes the algorithm to provide correct IOContext for a given input. In SMB, the inputs keep switching compared to traditional joins where inputs are read sequentially. [~gopalv][~jdere][~hagleitn][~sershe] can you please review? was (Author: djaiswal): Initial patch. Fixes the algorithm to provide correct IOContext for a given input. In SMB, the inputs keep switching compared to traditional joins where inputs are read sequentially. > SMB join may produce incorrect results > -- > > Key: HIVE-16965 > URL: https://issues.apache.org/jira/browse/HIVE-16965 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Deepak Jaiswal > Attachments: HIVE-16965.1.patch > > > Running the following on MiniTez > {noformat} > set hive.mapred.mode=nonstrict; > SET hive.vectorized.execution.enabled=true; > SET hive.exec.orc.default.buffer.size=32768; > SET hive.exec.orc.default.row.index.stride=1000; > SET hive.optimize.index.filter=true; > set hive.fetch.task.conversion=none; > set hive.exec.dynamic.partition.mode=nonstrict; > DROP TABLE orc_a; > DROP TABLE orc_b; > CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q > smallint) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > CREATE TABLE orc_b (id bigint, cfloat float) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > insert into table orc_a partition (y=2000, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_a partition (y=2001, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_b > select cbigint, cfloat from alltypesorc > where 
cbigint is not null and csmallint > 0 order by cbigint asc limit 200; > set hive.cbo.enable=false; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=10; > explain > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > DROP TABLE orc_a; > DROP TABLE orc_b; > {noformat} > Produces different results for the two selects. The SMB one looks incorrect. > cc [~djaiswal] [~hagleitn] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16965) SMB join may produce incorrect results
[ https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-16965: -- Attachment: HIVE-16965.1.patch Initial patch. Fixes the algorithm to provide correct IOContext for a given input. In SMB, the inputs keep switching compared to traditional joins where inputs are read sequentially. > SMB join may produce incorrect results > -- > > Key: HIVE-16965 > URL: https://issues.apache.org/jira/browse/HIVE-16965 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Deepak Jaiswal > Attachments: HIVE-16965.1.patch > > > Running the following on MiniTez > {noformat} > set hive.mapred.mode=nonstrict; > SET hive.vectorized.execution.enabled=true; > SET hive.exec.orc.default.buffer.size=32768; > SET hive.exec.orc.default.row.index.stride=1000; > SET hive.optimize.index.filter=true; > set hive.fetch.task.conversion=none; > set hive.exec.dynamic.partition.mode=nonstrict; > DROP TABLE orc_a; > DROP TABLE orc_b; > CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q > smallint) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > CREATE TABLE orc_b (id bigint, cfloat float) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > insert into table orc_a partition (y=2000, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_a partition (y=2001, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_b > select cbigint, cfloat from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc limit 200; > set hive.cbo.enable=false; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set 
hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=10; > explain > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > DROP TABLE orc_a; > DROP TABLE orc_b; > {noformat} > Produces different results for the two selects. The SMB one looks incorrect. > cc [~djaiswal] [~hagleitn] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Work started] (HIVE-16791) Tez engine giving inaccurate results on SMB Map joins while map-join and shuffle join gets correct results
[ https://issues.apache.org/jira/browse/HIVE-16791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-16791 started by Deepak Jaiswal. - > Tez engine giving inaccurate results on SMB Map joins while map-join and > shuffle join gets correct results > -- > > Key: HIVE-16791 > URL: https://issues.apache.org/jira/browse/HIVE-16791 > Project: Hive > Issue Type: Bug > Components: Hive, HiveServer2 >Reporter: Saumil Mayani >Assignee: Deepak Jaiswal > Attachments: sample-data-query.txt, sample-data.tar.gz-aa, > sample-data.tar.gz-ab, sample-data.tar.gz-ac, sample-data.tar.gz-ad > > > SMB Join gives incorrect results. > {code} > SMB-Join > set hive.execution.engine=tez; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=50; > OK > 2016 1 11999639 > 2016 2 18955110 > 2017 2 22217437 > Time taken: 92.647 seconds, Fetched: 3 row(s) > {code} > {code} > MAP-JOIN > set hive.execution.engine=tez; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=5000; > OK > 2016 1 26586093 > 2016 2 17724062 > 2017 2 8862031 > Time taken: 17.49 seconds, Fetched: 3 row(s) > {code} > {code} > Shuffle Join > set hive.execution.engine=tez; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=false; > set hive.auto.convert.join=false; > set hive.auto.convert.join.noconditionaltask.size=5000; > OK > 2016 1 26586093 > 2016 2 17724062 > 2017 2 8862031 > Time taken: 38.575 seconds, Fetched: 3 row(s) > {code} -- This 
message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16791) Tez engine giving inaccurate results on SMB Map joins while map-join and shuffle join gets correct results
[ https://issues.apache.org/jira/browse/HIVE-16791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-16791: -- Status: Patch Available (was: In Progress) > Tez engine giving inaccurate results on SMB Map joins while map-join and > shuffle join gets correct results > -- > > Key: HIVE-16791 > URL: https://issues.apache.org/jira/browse/HIVE-16791 > Project: Hive > Issue Type: Bug > Components: Hive, HiveServer2 >Reporter: Saumil Mayani >Assignee: Deepak Jaiswal > Attachments: sample-data-query.txt, sample-data.tar.gz-aa, > sample-data.tar.gz-ab, sample-data.tar.gz-ac, sample-data.tar.gz-ad > > > SMB Join gives incorrect results. > {code} > SMB-Join > set hive.execution.engine=tez; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=50; > OK > 2016 1 11999639 > 2016 2 18955110 > 2017 2 22217437 > Time taken: 92.647 seconds, Fetched: 3 row(s) > {code} > {code} > MAP-JOIN > set hive.execution.engine=tez; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=5000; > OK > 2016 1 26586093 > 2016 2 17724062 > 2017 2 8862031 > Time taken: 17.49 seconds, Fetched: 3 row(s) > {code} > {code} > Shuffle Join > set hive.execution.engine=tez; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=false; > set hive.auto.convert.join=false; > set hive.auto.convert.join.noconditionaltask.size=5000; > OK > 2016 1 26586093 > 2016 2 17724062 > 2017 2 8862031 > Time taken: 38.575 seconds, 
Fetched: 3 row(s) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16965) SMB join may produce incorrect results
[ https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Deepak Jaiswal updated HIVE-16965: -- Status: Patch Available (was: In Progress) > SMB join may produce incorrect results > -- > > Key: HIVE-16965 > URL: https://issues.apache.org/jira/browse/HIVE-16965 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Deepak Jaiswal > > Running the following on MiniTez > {noformat} > set hive.mapred.mode=nonstrict; > SET hive.vectorized.execution.enabled=true; > SET hive.exec.orc.default.buffer.size=32768; > SET hive.exec.orc.default.row.index.stride=1000; > SET hive.optimize.index.filter=true; > set hive.fetch.task.conversion=none; > set hive.exec.dynamic.partition.mode=nonstrict; > DROP TABLE orc_a; > DROP TABLE orc_b; > CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q > smallint) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > CREATE TABLE orc_b (id bigint, cfloat float) > CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc; > insert into table orc_a partition (y=2000, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_a partition (y=2001, q) > select cbigint, cdouble, csmallint % 10 from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc; > insert into table orc_b > select cbigint, cfloat from alltypesorc > where cbigint is not null and csmallint > 0 order by cbigint asc limit 200; > set hive.cbo.enable=false; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > set hive.enforce.sortmergebucketmapjoin=false; > set hive.optimize.bucketmapjoin=true; > set hive.optimize.bucketmapjoin.sortedmerge=true; > set hive.auto.convert.sortmerge.join=true; > set hive.auto.convert.join=true; > set hive.auto.convert.join.noconditionaltask.size=10; > explain > select y,q,count(*) from orc_a a join 
orc_b b on a.id=b.id group by y,q; > select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q; > DROP TABLE orc_a; > DROP TABLE orc_b; > {noformat} > Produces different results for the two selects. The SMB one looks incorrect. > cc [~djaiswal] [~hagleitn] -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Work started] (HIVE-16965) SMB join may produce incorrect results
[ https://issues.apache.org/jira/browse/HIVE-16965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-16965 started by Deepak Jaiswal.
---------------------------------------------
> SMB join may produce incorrect results
> --------------------------------------
>
> Key: HIVE-16965
> URL: https://issues.apache.org/jira/browse/HIVE-16965
> Project: Hive
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Deepak Jaiswal
>
> Running the following on MiniTez
> {noformat}
> set hive.mapred.mode=nonstrict;
> SET hive.vectorized.execution.enabled=true;
> SET hive.exec.orc.default.buffer.size=32768;
> SET hive.exec.orc.default.row.index.stride=1000;
> SET hive.optimize.index.filter=true;
> set hive.fetch.task.conversion=none;
> set hive.exec.dynamic.partition.mode=nonstrict;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> CREATE TABLE orc_a (id bigint, cdouble double) partitioned by (y int, q smallint)
> CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> CREATE TABLE orc_b (id bigint, cfloat float)
> CLUSTERED BY (id) SORTED BY (id) INTO 2 BUCKETS stored as orc;
> insert into table orc_a partition (y=2000, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
> where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_a partition (y=2001, q)
> select cbigint, cdouble, csmallint % 10 from alltypesorc
> where cbigint is not null and csmallint > 0 order by cbigint asc;
> insert into table orc_b
> select cbigint, cfloat from alltypesorc
> where cbigint is not null and csmallint > 0 order by cbigint asc limit 200;
> set hive.cbo.enable=false;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> set hive.enforce.sortmergebucketmapjoin=false;
> set hive.optimize.bucketmapjoin=true;
> set hive.optimize.bucketmapjoin.sortedmerge=true;
> set hive.auto.convert.sortmerge.join=true;
> set hive.auto.convert.join=true;
> set hive.auto.convert.join.noconditionaltask.size=10;
> explain
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> select y,q,count(*) from orc_a a join orc_b b on a.id=b.id group by y,q;
> DROP TABLE orc_a;
> DROP TABLE orc_b;
> {noformat}
> Produces different results for the two selects. The SMB one looks incorrect.
> cc [~djaiswal] [~hagleitn]

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (HIVE-16997) Extend object store to store bit vectors
[ https://issues.apache.org/jira/browse/HIVE-16997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098551#comment-16098551 ]

Ashutosh Chauhan commented on HIVE-16997:
-----------------------------------------
The latest patch looks good. I see you have used the blob type for MySQL. Is there a reason for still using varchar for the others? Also, can you update RB with the latest patch?

> Extend object store to store bit vectors
> ----------------------------------------
>
> Key: HIVE-16997
> URL: https://issues.apache.org/jira/browse/HIVE-16997
> Project: Hive
> Issue Type: Sub-task
> Reporter: Pengcheng Xiong
> Assignee: Pengcheng Xiong
> Attachments: HIVE-16997.01.patch, HIVE-16997.02.patch,
> HIVE-16997.03.patch, HIVE-16997.04.patch
>
> This patch includes: (1) a new serde for FMSketch (2) change of schema for
> derby and mysql (3) support for date type (4) refactoring the extrapolation
> and merge code

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Resolved] (HIVE-17158) BeeLine Query Log and Query Result print order is not defined
[ https://issues.apache.org/jira/browse/HIVE-17158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Vary resolved HIVE-17158.
-------------------------------
Resolution: Cannot Reproduce

> BeeLine Query Log and Query Result print order is not defined
> -------------------------------------------------------------
>
> Key: HIVE-17158
> URL: https://issues.apache.org/jira/browse/HIVE-17158
> Project: Hive
> Issue Type: Bug
> Components: Beeline, Testing Infrastructure
> Affects Versions: 3.0.0
> Reporter: Peter Vary
> Assignee: Peter Vary
>
> The output of the BeeLine tests is sometimes flaky, especially if the query
> is a fast one.
> The output is sometimes this:
> {code}
> PREHOOK: query: select explode(array('a', 'b'))
> PREHOOK: type: QUERY
> PREHOOK: Input: _dummy_database@_dummy_table
> A masked pattern was here
> POSTHOOK: query: select explode(array('a', 'b'))
> POSTHOOK: type: QUERY
> POSTHOOK: Input: _dummy_database@_dummy_table
> A masked pattern was here
> a
> b
> {code}
> Sometimes this:
> {code}
> a
> b
> PREHOOK: query: select explode(array('a', 'b'))
> PREHOOK: type: QUERY
> PREHOOK: Input: _dummy_database@_dummy_table
> A masked pattern was here
> POSTHOOK: query: select explode(array('a', 'b'))
> POSTHOOK: type: QUERY
> POSTHOOK: Input: _dummy_database@_dummy_table
> A masked pattern was here
> {code}
> Note that the actual query result is printed either before or after the
> output of the hooks.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (HIVE-17158) BeeLine Query Log and Query Result print order is not defined
[ https://issues.apache.org/jira/browse/HIVE-17158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098408#comment-16098408 ]

Peter Vary commented on HIVE-17158:
-----------------------------------
Ok. I was looking at old code. This was solved by HIVE-15473.

> BeeLine Query Log and Query Result print order is not defined
> -------------------------------------------------------------
>
> Key: HIVE-17158
> URL: https://issues.apache.org/jira/browse/HIVE-17158
> Project: Hive
> Issue Type: Bug
> Components: Beeline, Testing Infrastructure
> Affects Versions: 3.0.0
> Reporter: Peter Vary
> Assignee: Peter Vary
>
> The output of the BeeLine tests is sometimes flaky, especially if the query
> is a fast one.
> The output is sometimes this:
> {code}
> PREHOOK: query: select explode(array('a', 'b'))
> PREHOOK: type: QUERY
> PREHOOK: Input: _dummy_database@_dummy_table
> A masked pattern was here
> POSTHOOK: query: select explode(array('a', 'b'))
> POSTHOOK: type: QUERY
> POSTHOOK: Input: _dummy_database@_dummy_table
> A masked pattern was here
> a
> b
> {code}
> Sometimes this:
> {code}
> a
> b
> PREHOOK: query: select explode(array('a', 'b'))
> PREHOOK: type: QUERY
> PREHOOK: Input: _dummy_database@_dummy_table
> A masked pattern was here
> POSTHOOK: query: select explode(array('a', 'b'))
> POSTHOOK: type: QUERY
> POSTHOOK: Input: _dummy_database@_dummy_table
> A masked pattern was here
> {code}
> Note that the actual query result is printed either before or after the
> output of the hooks.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (HIVE-17158) BeeLine Query Log and Query Result print order is not defined
[ https://issues.apache.org/jira/browse/HIVE-17158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16098296#comment-16098296 ]

Peter Vary commented on HIVE-17158:
-----------------------------------
The root cause of the issue is that if the LogRunnable is interrupted before it has started, no InterruptedException is thrown and {{showRemainingLogsIfAny}} is not called.
{code}
  private Runnable createLogRunnable(final Statement statement) {
    if (statement instanceof HiveStatement) {
      final HiveStatement hiveStatement = (HiveStatement) statement;

      Runnable runnable = new Runnable() {
        @Override
        public void run() {
          while (hiveStatement.hasMoreLogs() && !Thread.currentThread().isInterrupted()) {
            try {
              // fetch the log periodically and output to beeline console
              for (String log : hiveStatement.getQueryLog()) {
                if (!beeLine.isTestMode()) {
                  beeLine.info(log);
                } else {
                  // In test mode print the logs to the output
                  beeLine.output(log);
                }
              }
              Thread.sleep(DEFAULT_QUERY_PROGRESS_INTERVAL);
            } catch (SQLException e) {
              beeLine.error(new SQLWarning(e));
              return;
            } catch (InterruptedException e) {
              beeLine.debug("Getting log thread is interrupted, since query is done!");
              // <-- We expect the logs to be printed here, but if there is no exception, there are no logs
              showRemainingLogsIfAny(statement);
              return;
            }
          }
        }
      };
      return runnable;
    } else {
      [..]
    }
  }
{code}
The log is printed when the ResultSet is queried, or in the finally block.
{code}
      do {
        ResultSet rs = stmnt.getResultSet();
        try {
          int count = beeLine.print(rs);
          long end = System.currentTimeMillis();

          beeLine.info(
              beeLine.loc("rows-selected", count) + " " + beeLine.locElapsedTime(end - start));
        } finally {
          if (logThread != null) {
            logThread.join(DEFAULT_QUERY_PROGRESS_THREAD_TIMEOUT);
            showRemainingLogsIfAny(stmnt);
            logThread = null;
          }
          rs.close();
        }
      } while (BeeLine.getMoreResults(stmnt));
{code}
{code}
    } finally {
      if (logThread != null) {
        if (!logThread.isInterrupted()) {
          logThread.interrupt();
        }
        logThread.join(DEFAULT_QUERY_PROGRESS_THREAD_TIMEOUT);
        showRemainingLogsIfAny(stmnt);
      }
      if (stmnt != null) {
        stmnt.close();
      }
    }
{code}

> BeeLine Query Log and Query Result print order is not defined
> -------------------------------------------------------------
>
> Key: HIVE-17158
> URL: https://issues.apache.org/jira/browse/HIVE-17158
> Project: Hive
> Issue Type: Bug
> Components: Beeline, Testing Infrastructure
> Affects Versions: 3.0.0
> Reporter: Peter Vary
> Assignee: Peter Vary
>
> The output of the BeeLine tests is sometimes flaky, especially if the query
> is a fast one.
> The output is sometimes this:
> {code}
> PREHOOK: query: select explode(array('a', 'b'))
> PREHOOK: type: QUERY
> PREHOOK: Input: _dummy_database@_dummy_table
> A masked pattern was here
> POSTHOOK: query: select explode(array('a', 'b'))
> POSTHOOK: type: QUERY
> POSTHOOK: Input: _dummy_database@_dummy_table
> A masked pattern was here
> a
> b
> {code}
> Sometimes this:
> {code}
> a
> b
> PREHOOK: query: select explode(array('a', 'b'))
> PREHOOK: type: QUERY
> PREHOOK: Input: _dummy_database@_dummy_table
> A masked pattern was here
> POSTHOOK: query: select explode(array('a', 'b'))
> POSTHOOK: type: QUERY
> POSTHOOK: Input: _dummy_database@_dummy_table
> A masked pattern was here
> {code}
> Note that the actual query result is printed either before or after the
> output of the hooks.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
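
The interrupt race described in the HIVE-17158 root-cause comment above can be reproduced in isolation. What follows is a minimal sketch, not BeeLine code: the class LogFlushSketch, the methods pollLogs and runInterruptedEarly, and the event strings are all hypothetical names introduced for illustration. It assumes the interrupt flag is already set when the polling loop is first evaluated, which is simulated here by the worker interrupting itself. In that case the loop body never runs, Thread.sleep never throws, and the flush in the catch block is skipped. The flushAfterLoop variant shows one possible remedy (flushing after the loop exits); it is not claimed to be what HIVE-15473 did.

```java
import java.util.ArrayList;
import java.util.List;

public class LogFlushSketch {

    // Hypothetical stand-in for the polling loop in createLogRunnable.
    // "flush-in-catch" marks where showRemainingLogsIfAny would be called.
    static void pollLogs(List<String> events, boolean flushAfterLoop) {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                events.add("poll");
                Thread.sleep(10);
            } catch (InterruptedException e) {
                // Only reached if the interrupt arrives during sleep.
                events.add("flush-in-catch");
                return;
            }
        }
        // Assumed remedy: flush when the loop exits without an exception.
        if (flushAfterLoop) {
            events.add("flush-after-loop");
        }
    }

    // Runs the loop on a worker whose interrupt flag is set before the
    // loop condition is evaluated for the first time.
    static List<String> runInterruptedEarly(boolean flushAfterLoop) {
        List<String> events = new ArrayList<>();
        Thread worker = new Thread(() -> {
            // Simulate the interrupt arriving before the loop starts.
            Thread.currentThread().interrupt();
            pollLogs(events, flushAfterLoop);
        });
        worker.start();
        try {
            // join() makes the worker's writes to events visible here.
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        System.out.println(runInterruptedEarly(false)); // prints []
        System.out.println(runInterruptedEarly(true));  // prints [flush-after-loop]
    }
}
```

With flushAfterLoop set to false the event list stays empty, which corresponds to the missing log output in the flaky test runs; with it set to true the flush happens exactly once even though no InterruptedException was thrown.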