[jira] [Commented] (HIVE-16730) Vectorization: Schema Evolution for Text Vectorization / Complex Types
[ https://issues.apache.org/jira/browse/HIVE-16730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083503#comment-16083503 ] Matt McCline commented on HIVE-16730: - Committed to master. > Vectorization: Schema Evolution for Text Vectorization / Complex Types > -- > > Key: HIVE-16730 > URL: https://issues.apache.org/jira/browse/HIVE-16730 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Teddy Choi >Priority: Critical > Fix For: 3.0.0 > > Attachments: HIVE-16730.1.patch, HIVE-16730.2.patch, > HIVE-16730.3.patch > > > With HIVE-16589: "Vectorization: Support Complex Types and GroupBy modes > PARTIAL2, FINAL, and COMPLETE for AVG" change, the tests > schema_evol_text_vec_part_all_complex.q and > schema_evol_text_vecrow_part_all_complex.q fail. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (HIVE-16977) Vectorization: Vectorize expressions in THEN/ELSE branches of IF/CASE WHEN
[ https://issues.apache.org/jira/browse/HIVE-16977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline resolved HIVE-16977. - Resolution: Not A Problem > Vectorization: Vectorize expressions in THEN/ELSE branches of IF/CASE WHEN > -- > > Key: HIVE-16977 > URL: https://issues.apache.org/jira/browse/HIVE-16977 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Teddy Choi >Priority: Critical > > VectorUDFAdaptor(CASE WHEN ((_col2 > 0)) THEN ((UDFToDouble(_col3) / > UDFToDouble(_col2)) BETWEEN 0. AND 1.5) ... > The expression in the THEN is not permitted. Only columns or constants are > vectorized. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17076) typo in itests/src/test/resources/testconfiguration.properties
[ https://issues.apache.org/jira/browse/HIVE-17076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083540#comment-16083540 ] Hive QA commented on HIVE-17076: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876715/HIVE-17076.01.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 10719 tests executed *Failed tests:* {noformat} TestCleaner2 - did not produce a TEST-*.xml file (likely timed out) (batchId=257) TestConvertAstToSearchArg - did not produce a TEST-*.xml file (likely timed out) (batchId=257) TestIOContextMap - did not produce a TEST-*.xml file (likely timed out) (batchId=257) TestInitiator - did not produce a TEST-*.xml file (likely timed out) (batchId=257) TestRecordIdentifier - did not produce a TEST-*.xml file (likely timed out) (batchId=257) TestSearchArgumentImpl - did not produce a TEST-*.xml file (likely timed out) (batchId=257) TestWorker - did not produce a TEST-*.xml file (likely timed out) (batchId=257) TestWorker2 - did not produce a TEST-*.xml file (likely timed out) (batchId=257) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table] (batchId=54) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver (batchId=239) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) 
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5973/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5973/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5973/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 18 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12876715 - PreCommit-HIVE-Build > typo in itests/src/test/resources/testconfiguration.properties > -- > > Key: HIVE-17076 > URL: https://issues.apache.org/jira/browse/HIVE-17076 > Project: Hive > Issue Type: Bug >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-17076.01.patch > > > it has > {noformat} > minillap.shared.query.files=insert_into1.q,\ > insert_into2.q,\ > insert_values_orig_table.,\ > llapdecider.q,\ > {noformat} > "insert_values_orig_table.,\" is a typo which causes these to be run with > TestCliDriver > Note that there are 2 .q files that start with insert_values_orig_table -- This message was sent by Atlassian JIRA (v6.4.14#64029)
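For reference, a corrected fragment of the properties entry could look like the sketch below. This assumes the intended entry was the `.q` suffix; the description notes there are two .q files starting with insert_values_orig_table, and which of them were meant to be listed is not stated, so only the obviously truncated name is shown fixed here.

```properties
# Sketch of the corrected fragment of
# itests/src/test/resources/testconfiguration.properties.
# "insert_values_orig_table.," was missing the "q" of the ".q" suffix,
# so the test fell through to TestCliDriver instead of MiniLlap.
minillap.shared.query.files=insert_into1.q,\
  insert_into2.q,\
  insert_values_orig_table.q,\
  llapdecider.q,\
```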
[jira] [Updated] (HIVE-16100) Dynamic Sorted Partition optimizer loses sibling operators
[ https://issues.apache.org/jira/browse/HIVE-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-16100: --- Attachment: HIVE-16100.4.patch > Dynamic Sorted Partition optimizer loses sibling operators > -- > > Key: HIVE-16100 > URL: https://issues.apache.org/jira/browse/HIVE-16100 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 1.2.1, 2.1.1, 2.2.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: HIVE-16100.1.patch, HIVE-16100.2.patch, > HIVE-16100.2.patch, HIVE-16100.3.patch, HIVE-16100.4.patch > > > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java#L173 > {code} > // unlink connection between FS and its parent > fsParent = fsOp.getParentOperators().get(0); > fsParent.getChildOperators().clear(); > {code} > The optimizer discards any cases where the fsParent has another SEL child -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16989) Fix some issues identified by lgtm.com
[ https://issues.apache.org/jira/browse/HIVE-16989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Malcolm Taylor updated HIVE-16989: -- Status: In Progress (was: Patch Available) > Fix some issues identified by lgtm.com > -- > > Key: HIVE-16989 > URL: https://issues.apache.org/jira/browse/HIVE-16989 > Project: Hive > Issue Type: Improvement >Reporter: Malcolm Taylor >Assignee: Malcolm Taylor > Attachments: HIVE-16989.2.patch, HIVE-16989.3.patch, HIVE-16989.patch > > > [lgtm.com|https://lgtm.com] has identified a number of issues where there may > be scope for improvement. The plan is to address some of the alerts found at > [https://lgtm.com/projects/g/apache/hive/]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16989) Fix some issues identified by lgtm.com
[ https://issues.apache.org/jira/browse/HIVE-16989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Malcolm Taylor updated HIVE-16989: -- Attachment: HIVE-16989.4.patch Rebased and dropped changes to .reviewboardrc > Fix some issues identified by lgtm.com > -- > > Key: HIVE-16989 > URL: https://issues.apache.org/jira/browse/HIVE-16989 > Project: Hive > Issue Type: Improvement >Reporter: Malcolm Taylor >Assignee: Malcolm Taylor > Attachments: HIVE-16989.2.patch, HIVE-16989.3.patch, > HIVE-16989.4.patch, HIVE-16989.patch > > > [lgtm.com|https://lgtm.com] has identified a number of issues where there may > be scope for improvement. The plan is to address some of the alerts found at > [https://lgtm.com/projects/g/apache/hive/]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16100) Dynamic Sorted Partition optimizer loses sibling operators
[ https://issues.apache.org/jira/browse/HIVE-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083560#comment-16083560 ] Gopal V commented on HIVE-16100: Need to revisit an assumption in the optimizer about parent being a SEL operator, expect another refresh. > Dynamic Sorted Partition optimizer loses sibling operators > -- > > Key: HIVE-16100 > URL: https://issues.apache.org/jira/browse/HIVE-16100 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 1.2.1, 2.1.1, 2.2.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: HIVE-16100.1.patch, HIVE-16100.2.patch, > HIVE-16100.2.patch, HIVE-16100.3.patch, HIVE-16100.4.patch > > > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java#L173 > {code} > // unlink connection between FS and its parent > fsParent = fsOp.getParentOperators().get(0); > fsParent.getChildOperators().clear(); > {code} > The optimizer discards any cases where the fsParent has another SEL child -- This message was sent by Atlassian JIRA (v6.4.14#64029)
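The sibling loss described in the quoted snippet can be sketched with a toy operator tree. The `Op` class below is a hypothetical stand-in, not Hive's actual `Operator` API: calling `clear()` on the parent's child list drops every sibling of the FileSink, while removing only the FS edge preserves them.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for Hive's operator DAG; class and method names are
// illustrative only, not Hive's actual Operator API.
public class SiblingUnlinkDemo {
    static class Op {
        final String name;
        final List<Op> children = new ArrayList<>();
        Op(String name) { this.name = name; }
    }

    private static Op buildParent() {
        Op parent = new Op("SEL-parent");
        parent.children.add(new Op("FS"));
        parent.children.add(new Op("SEL-sibling"));
        return parent;
    }

    // Mimics the quoted code: clears ALL children, losing FS's siblings.
    public static int childrenAfterClear() {
        Op parent = buildParent();
        parent.children.clear();
        return parent.children.size();
    }

    // A sibling-preserving unlink: detach only the FS operator itself.
    public static int childrenAfterSafeUnlink() {
        Op parent = buildParent();
        parent.children.removeIf(c -> c.name.equals("FS"));
        return parent.children.size();
    }

    public static void main(String[] args) {
        System.out.println(childrenAfterClear());      // sibling lost
        System.out.println(childrenAfterSafeUnlink()); // sibling kept
    }
}
```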
[jira] [Commented] (HIVE-15051) Test framework integration with findbugs, rat checks etc.
[ https://issues.apache.org/jira/browse/HIVE-15051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083573#comment-16083573 ] Peter Vary commented on HIVE-15051: --- I am happy with the current wording! Thanks [~leftylev]! > Test framework integration with findbugs, rat checks etc. > - > > Key: HIVE-15051 > URL: https://issues.apache.org/jira/browse/HIVE-15051 > Project: Hive > Issue Type: Sub-task > Components: Testing Infrastructure >Reporter: Peter Vary >Assignee: Peter Vary > Fix For: 3.0.0 > > Attachments: beeline.out, HIVE-15051.02.patch, HIVE-15051.patch, > Interim.patch, ql.out > > > Find a way to integrate code analysis tools such as findbugs and rat checks into the > PreCommit tests, thus removing from reviewers the burden of checking code > style and other issues that can be verified automatically. > It might be worth taking a look at Yetus, but keep in mind that Hive has its own > parallel test framework. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16989) Fix some issues identified by lgtm.com
[ https://issues.apache.org/jira/browse/HIVE-16989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Malcolm Taylor updated HIVE-16989: -- Status: Patch Available (was: In Progress) > Fix some issues identified by lgtm.com > -- > > Key: HIVE-16989 > URL: https://issues.apache.org/jira/browse/HIVE-16989 > Project: Hive > Issue Type: Improvement >Reporter: Malcolm Taylor >Assignee: Malcolm Taylor > Attachments: HIVE-16989.2.patch, HIVE-16989.3.patch, > HIVE-16989.4.patch, HIVE-16989.patch > > > [lgtm.com|https://lgtm.com] has identified a number of issues where there may > be scope for improvement. The plan is to address some of the alerts found at > [https://lgtm.com/projects/g/apache/hive/]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17018) Small table is converted to map join even the total size of small tables exceeds the threshold(hive.auto.convert.join.noconditionaltask.size)
[ https://issues.apache.org/jira/browse/HIVE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083610#comment-16083610 ] Chao Sun commented on HIVE-17018: - [~kellyzly] Yes. I don't think we need to change the existing behavior. I'm just suggesting that we might need a HoS-specific config to replace {{hive.auto.convert.join.noconditionaltask.size}}, so that it is less confusing. > Small table is converted to map join even the total size of small tables > exceeds the threshold(hive.auto.convert.join.noconditionaltask.size) > - > > Key: HIVE-17018 > URL: https://issues.apache.org/jira/browse/HIVE-17018 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-17018_data_init.q, HIVE-17018.q, t3.txt > > > We use "hive.auto.convert.join.noconditionaltask.size" as the threshold: if > the sum of the sizes of n-1 of the tables/partitions in an n-way join is > smaller than this value, the join is converted to a map join. For example, take A join B > join C join D join E, where the big table is A(100M) and the small tables are > B(10M), C(10M), D(10M), E(10M), and set > hive.auto.convert.join.noconditionaltask.size=20M. In the current code, E, D, and B > are converted to map joins but C is not. In my > understanding, because hive.auto.convert.join.noconditionaltask.size can only > accommodate E and D, neither C nor B should be converted to a map join. > Let's explain in more detail why B can still be converted to a map join. > In the current code, > [SparkMapJoinOptimizer#getConnectedMapJoinSize|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L364] > sums all the map joins in the parent and child paths. The search > stops when encountering a [UnionOperator or > ReduceOperator|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L381]. 
> Because C is not converted to a map join ({{(connectedMapJoinSize + > totalSize) > maxSize}}, [see > code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L330]), the > RS before the join of C remains. When calculating whether B will be > converted to a map join, {{getConnectedMapJoinSize}} returns 0 on encountering that > [RS > |https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#409], > which causes {{(connectedMapJoinSize + totalSize) < maxSize}} to match. > [~xuefuz] or [~jxiang]: can you help determine whether this is a bug, as you > are more familiar with SparkJoinOptimizer? -- This message was sent by Atlassian JIRA (v6.4.14#64029)
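The size accounting the description expects can be sketched numerically. This is a simplified model, not SparkMapJoinOptimizer's actual traversal: each small table converts only while the running total of already-converted small tables plus the candidate stays within the threshold.

```java
public class MapJoinThresholdSketch {
    // Simplified model of the intended rule: convert a small table only if
    // the sizes of all connected, already-converted small tables plus this
    // candidate stay within the noconditionaltask.size threshold.
    public static int countConvertible(long[] smallTableSizes, long maxSize) {
        long connected = 0; // running total of converted small-table sizes
        int converted = 0;
        for (long size : smallTableSizes) {
            if (connected + size <= maxSize) {
                connected += size;
                converted++;
            }
        }
        return converted;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // B, C, D, E at 10M each with a 20M threshold: only two should
        // convert, matching the expectation in the description above.
        System.out.println(countConvertible(
                new long[]{10 * mb, 10 * mb, 10 * mb, 10 * mb}, 20 * mb));
    }
}
```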
[jira] [Commented] (HIVE-17077) Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD len character's value is negative number
[ https://issues.apache.org/jira/browse/HIVE-17077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083611#comment-16083611 ] ASF GitHub Bot commented on HIVE-17077: --- GitHub user chitin opened a pull request: https://github.com/apache/hive/pull/203 HIVE-17077 Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD len character's value is negative number [HIVE-17077] Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD len character's value is negative number - return null when len character's value is negative number https://issues.apache.org/jira/browse/HIVE-17077 You can merge this pull request into a Git repository by running: $ git pull https://github.com/chitin/hive hive17077 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/203.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #203 commit 8715915839f76f6d60d6b40400d3bb826cbeb8b5 Author: chitin Date: 2017-07-12T07:56:47Z HIVE-17077 Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD len character's value is negative number - Add len judgment logic > Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD len > character's value is negative number > - > > Key: HIVE-17077 > URL: https://issues.apache.org/jira/browse/HIVE-17077 > Project: Hive > Issue Type: Bug >Reporter: Lingang Deng >Assignee: Lingang Deng >Priority: Minor > > lpad(rpad) throws an exception when the second argument is a negative number, as > follows: > {code:java} > hive> select lpad("hello", -1 ,"h"); > FAILED: StringIndexOutOfBoundsException String index out of range: -1 > hive> select rpad("hello", -1 ,"h"); > FAILED: StringIndexOutOfBoundsException String index out of range: -1 > {code} > Maybe we should return a friendly result as MySQL does. 
> {code:java} > mysql> select lpad("hello", -1 ,"h"); > +------------------------+ > | lpad("hello", -1 ,"h") | > +------------------------+ > | NULL                   | > +------------------------+ > 1 row in set (0.00 sec) > mysql> select rpad("hello", -1 ,"h"); > +------------------------+ > | rpad("hello", -1 ,"h") | > +------------------------+ > | NULL                   | > +------------------------+ > 1 row in set (0.00 sec) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
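A MySQL-compatible behavior could be sketched as below. This is illustrative code, not Hive's actual UDF implementation: a negative target length returns null instead of letting `substring` throw.

```java
public class PadSketch {
    // Illustrative lpad with MySQL-style semantics: a negative len yields
    // null rather than a StringIndexOutOfBoundsException. Assumes a
    // non-empty pad string; not Hive's actual GenericUDF code.
    public static String lpad(String s, int len, String pad) {
        if (len < 0) {
            return null;                 // MySQL returns NULL for negative len
        }
        if (len <= s.length()) {
            return s.substring(0, len);  // truncate, as lpad normally does
        }
        StringBuilder sb = new StringBuilder();
        while (sb.length() < len - s.length()) {
            sb.append(pad);              // repeat pad until enough chars
        }
        return sb.substring(0, len - s.length()) + s;
    }

    public static void main(String[] args) {
        System.out.println(lpad("hello", -1, "h")); // null, not an exception
        System.out.println(lpad("hello", 7, "h"));  // left-padded result
    }
}
```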
[jira] [Commented] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083605#comment-16083605 ] Hive QA commented on HIVE-17066: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876771/HIVE-17066.5.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10839 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=157) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout (batchId=226) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5974/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5974/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5974/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876771 - PreCommit-HIVE-Build > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, > HIVE-17066.3.patch, HIVE-17066.4.patch, HIVE-17066.5.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
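The collapse to a single row can be seen numerically. Below is a sketch of a less aggressive IS NULL estimate after an outer join; this is a generic heuristic for illustration, not Hive's actual stats-annotation logic, and the unmatched fraction is an assumed parameter.

```java
public class NullFilterEstimateSketch {
    // Generic heuristic, not Hive's stats code: estimate rows surviving an
    // "IS NULL" filter on the right-side key after a LEFT OUTER JOIN by
    // scaling the join output by an assumed unmatched fraction, floor 1.
    public static long estimateIsNullRows(long joinRows, double unmatchedFraction) {
        return Math.max(1L, (long) (joinRows * unmatchedFraction));
    }

    public static void main(String[] args) {
        // With ~71.7B join rows, even a 1% unmatched fraction keeps the
        // estimate far above the single row shown in the quoted plan.
        System.out.println(estimateIsNullRows(71_676_270_660L, 0.01));
    }
}
```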
[jira] [Updated] (HIVE-17077) Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD len character's value is negative number
[ https://issues.apache.org/jira/browse/HIVE-17077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lingang Deng updated HIVE-17077: Summary: Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD len character's value is negative number (was: lpad(rpad) should return a value but not throw a exception) > Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD len > character's value is negative number > - > > Key: HIVE-17077 > URL: https://issues.apache.org/jira/browse/HIVE-17077 > Project: Hive > Issue Type: Bug >Reporter: Lingang Deng >Assignee: Lingang Deng >Priority: Minor > > lpad(rpad) throw a exception when the second argument a negative number, as > follows, > {code:java} > hive> select lpad("hello", -1 ,"h"); > FAILED: StringIndexOutOfBoundsException String index out of range: -1 > hive> select rpad("hello", -1 ,"h"); > FAILED: StringIndexOutOfBoundsException String index out of range: -1 > {code} > Maybe we should return friendly result such as mysql. > {code:java} > mysql> select lpad("hello", -1 ,"h"); > +--+ > | lpad("hello", -1 ,"h") | > +--+ > | NULL | > +--+ > 1 row in set (0.00 sec) > mysql> select rpad("hello", -1 ,"h"); > +--+ > | rpad("hello", -1 ,"h") | > +--+ > | NULL | > +--+ > 1 row in set (0.00 sec) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17077) lpad(rpad) should return a value but not throw a exception
[ https://issues.apache.org/jira/browse/HIVE-17077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lingang Deng reassigned HIVE-17077: --- > lpad(rpad) should return a value but not throw a exception > -- > > Key: HIVE-17077 > URL: https://issues.apache.org/jira/browse/HIVE-17077 > Project: Hive > Issue Type: Bug >Reporter: Lingang Deng >Assignee: Lingang Deng >Priority: Minor > > lpad(rpad) throw a exception when the second argument a negative number, as > follows, > {code:java} > hive> select lpad("hello", -1 ,"h"); > FAILED: StringIndexOutOfBoundsException String index out of range: -1 > hive> select rpad("hello", -1 ,"h"); > FAILED: StringIndexOutOfBoundsException String index out of range: -1 > {code} > Maybe we should return friendly result such as mysql. > {code:java} > mysql> select lpad("hello", -1 ,"h"); > +--+ > | lpad("hello", -1 ,"h") | > +--+ > | NULL | > +--+ > 1 row in set (0.00 sec) > mysql> select rpad("hello", -1 ,"h"); > +--+ > | rpad("hello", -1 ,"h") | > +--+ > | NULL | > +--+ > 1 row in set (0.00 sec) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16975) Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is now used
[ https://issues.apache.org/jira/browse/HIVE-16975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083662#comment-16083662 ] Hive QA commented on HIVE-16975: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876748/HIVE-16975.1.patch {color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10840 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=237) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5975/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5975/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5975/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876748 - PreCommit-HIVE-Build > Vectorization: Fully vectorize CAST date as TIMESTAMP so VectorUDFAdaptor is > now used > - > > Key: HIVE-16975 > URL: https://issues.apache.org/jira/browse/HIVE-16975 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Teddy Choi >Priority: Critical > Attachments: HIVE-16975.1.patch > > > Fix VectorUDFAdaptor(CAST(d_date as TIMESTAMP)) to be native. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17077) Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD len character's value is negative number
[ https://issues.apache.org/jira/browse/HIVE-17077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083673#comment-16083673 ] Lingang Deng commented on HIVE-17077: - CC [~sershe] > Hive should raise StringIndexOutOfBoundsException when LPAD/RPAD len > character's value is negative number > - > > Key: HIVE-17077 > URL: https://issues.apache.org/jira/browse/HIVE-17077 > Project: Hive > Issue Type: Bug >Reporter: Lingang Deng >Assignee: Lingang Deng >Priority: Minor > > lpad(rpad) throw a exception when the second argument a negative number, as > follows, > {code:java} > hive> select lpad("hello", -1 ,"h"); > FAILED: StringIndexOutOfBoundsException String index out of range: -1 > hive> select rpad("hello", -1 ,"h"); > FAILED: StringIndexOutOfBoundsException String index out of range: -1 > {code} > Maybe we should return friendly result such as mysql. > {code:java} > mysql> select lpad("hello", -1 ,"h"); > +--+ > | lpad("hello", -1 ,"h") | > +--+ > | NULL | > +--+ > 1 row in set (0.00 sec) > mysql> select rpad("hello", -1 ,"h"); > +--+ > | rpad("hello", -1 ,"h") | > +--+ > | NULL | > +--+ > 1 row in set (0.00 sec) > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16917) HiveServer2 guard rails - Limit concurrent connections from user
[ https://issues.apache.org/jira/browse/HIVE-16917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083729#comment-16083729 ] Aengus Rooney commented on HIVE-16917: -- Thanks Thejas, I discussed this with the team and it was agreed that a User+IP combination would also be beneficial, since multiple applications can share the same user. The default thresholds are acceptable, so long as they are configurable. > HiveServer2 guard rails - Limit concurrent connections from user > > > Key: HIVE-16917 > URL: https://issues.apache.org/jira/browse/HIVE-16917 > Project: Hive > Issue Type: New Feature > Components: HiveServer2 >Reporter: Thejas M Nair > > Rogue applications can make HS2 unusable for others by making too many > connections at a time. > HS2 should start rejecting new connections from a user once the number of > connections has reached a configurable threshold. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
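The guard rail being discussed can be sketched as a per-key connection counter. The class name, key scheme, and threshold handling below are hypothetical, not HiveServer2's actual code; the key can be the user alone or the User+IP combination suggested above.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hedged sketch of a per-user (or user+IP) connection limit; names and
// structure are hypothetical, not HiveServer2's implementation.
public class ConnectionGuard {
    private final int maxPerKey;
    private final ConcurrentHashMap<String, AtomicInteger> counts =
            new ConcurrentHashMap<>();

    public ConnectionGuard(int maxPerKey) {
        this.maxPerKey = maxPerKey;
    }

    // Key can be the user alone, or user + "/" + clientIp for the
    // User+IP combination discussed in the comment above.
    public boolean tryAcquire(String key) {
        AtomicInteger c = counts.computeIfAbsent(key, k -> new AtomicInteger());
        if (c.incrementAndGet() > maxPerKey) {
            c.decrementAndGet(); // over the threshold: reject this connection
            return false;
        }
        return true;
    }

    public void release(String key) {
        AtomicInteger c = counts.get(key);
        if (c != null) {
            c.decrementAndGet();
        }
    }

    public static void main(String[] args) {
        ConnectionGuard guard = new ConnectionGuard(2);
        guard.tryAcquire("alice/10.0.0.1");
        guard.tryAcquire("alice/10.0.0.1");
        System.out.println(guard.tryAcquire("alice/10.0.0.1")); // rejected
    }
}
```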
[jira] [Commented] (HIVE-17019) Add support to download debugging information as an archive.
[ https://issues.apache.org/jira/browse/HIVE-17019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083774#comment-16083774 ] Harish Jaiprakash commented on HIVE-17019: -- Thanks [~sseth]. - Change the top level package from llap-debug to tez-debug? (Works with both I believe) [~ashutoshc], [~thejas] - any recommendations on whether the code gets a top level module, or goes under an existing module. This allows downloading of various debug artifacts for a tez job - logs, metrics for llap, hiveserver2 logs (soon), tez am logs, ATS data for the query (hive and tez). Will change the directory. - In the new pom.xml, dependency on hive-llap-server. 1) Is it required?, 2) Will need to exclude some dependent artifacts. See service/pom.xml llap-server dependency handling The llap status is fetched using LlapStatusServiceDriver which is part of hive-llap-server. - LogDownloadServlet - Should this throw an error as soon as the filename pattern validation fails? The filename check is to prevent any injection attack into the file name/http header, not to validate the id. - LogDownloadServlet - change to dagId/queryId validation instead Can do, but it will be sensitive to changes to the id format. Currently its passed down to ATS and nothing will be retrieved for it. - LogDownloadServlet - thread being created inside of the request handler? This should be limited outside of the request? so that only a controlled number of parallel artifact downloads can run. Creating a shared executor, does it make sense to use Guava's direct executor, which will schedule task in current thread. - LogDownloadServlet - what happens in case of aggregator failure? Exception back to the user? Jetty will handle the exception, returning 500 to the user. Not sure if exception trace is part of it. Will try and see. - LogDownloadServlet - seems to be generating the file to disk and then streaming it over. Can this be streamed over directly instead. 
Otherwise there's the possibility of leaking files. (Artifact.downloadIntoStream or some such?) Guessing this is complicated further by the multi-threaded artifact downloader. Alternately need to have a cleanup mechanism. For streaming directly, it would not be possible because of multithreading. If its single threaded then I can use a ZipOutputStream and add entry one at a time. Oops, sorry the finally got moved down since aggregator had to be closed before streaming the file. I'll handle it using a try finally to cleanup. - Timeout on the tests Setting timeouts on tests. - Apache header needs to be added to files where it is missing. Sorry, will add the licence header to all files. - Main - Please rename to something more indicative of what the tool does. I was planning to remove this and integrate with hive cli, --service . This does not work without lot of classpath fixes, or I'll have to create a script to add hive jars. - Main - Likely a follow up jira - parse using a standard library, instead of trying to parse the arguments to main directly. Will check a few libs, apache commons OptionBuilder uses a static instance in its builder. Should be ok, for a cli based invoke once app, but will look at something better on lines of python argparse. - Server - Enabling the artifact should be controlled via a config. Does not always need to be hosted in HS2 (Default disabled, at least till security can be sorted out) I'll add a config. - Is it possible to support a timeout on the downloads? (Can be a follow up jira) Sure, will do. Global or per download or both? - ArtifactAggregator - I believe this does 2 stages of dependent artifacts / downloads? Stage1 - download whatever it can. Information from this should should be adequate for stage2 downloads ? It could be more stages: Ex: given dag_id stage 1: will fetch tez ats info which is used to extract hive id, task container/node list. stage 2: will fetch hive ats info, tez container log list. 
stage 3: llap containers log list, tez task logs. stage 4: llap container logs. The aggregator iterates through the list of sources and finds those which can download using the info in the params. It schedules those sources, waits for all of them to complete, and then repeats, stopping when no new sources could download or all sources are exhausted. - For the ones not implemented yet (DummyArtifact) - think it's better to just comment out the code, instead of invoking the DummyArtifacts downloader Sorry, will do. - Security - ACL enforcement required on secure clusters to make sure users can only download what they have access to. This is a must fix before this can be enabled by default. Working on this. - Security - this can work around yarn restrictions on log downloads, since the files are being accessed by the hive user. Yes this should work. Could you please add some details on cluster testing? I'll add ano
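The single-threaded streaming alternative mentioned in the review can be sketched with `java.util.zip`: write each artifact as a `ZipEntry` directly to the output stream (e.g. the servlet response), so no temporary archive file is created or leaked. Artifact names and the aggregation loop here are simplified placeholders, not the patch's actual classes.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Sketch of streaming a debug-artifact archive without a temp file.
public class StreamingZipSketch {
    public static void writeZip(Map<String, byte[]> artifacts, OutputStream out) {
        try (ZipOutputStream zos = new ZipOutputStream(out)) {
            for (Map.Entry<String, byte[]> e : artifacts.entrySet()) {
                zos.putNextEntry(new ZipEntry(e.getKey()));
                zos.write(e.getValue());
                zos.closeEntry();
            }
        } catch (IOException ioe) {
            throw new UncheckedIOException(ioe);
        }
    }

    // Convenience wrapper: zip one named text artifact to a byte array.
    public static byte[] zipSingle(String name, String content) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        Map<String, byte[]> one = new LinkedHashMap<>();
        one.put(name, content.getBytes(StandardCharsets.UTF_8));
        writeZip(one, bos);
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] zip = zipSingle("hive/hiveserver2.log", "sample log line");
        // A zip stream begins with the local-file-header magic "PK".
        System.out.println(zip.length + " bytes");
    }
}
```

The trade-off noted in the thread still applies: a single `ZipOutputStream` is inherently single-threaded, so multi-threaded artifact download would need per-artifact buffering before entries are appended in order.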
[jira] [Commented] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083727#comment-16083727 ] Hive QA commented on HIVE-17066: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876771/HIVE-17066.5.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10839 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5976/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5976/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5976/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876771 - PreCommit-HIVE-Build > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, > HIVE-17066.3.patch, HIVE-17066.4.patch, HIVE-17066.5.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17038) invalid result when CAST-ing to DATE
[ https://issues.apache.org/jira/browse/HIVE-17038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083775#comment-16083775 ] ASF GitHub Bot commented on HIVE-17038: --- GitHub user mlorek opened a pull request: https://github.com/apache/hive/pull/204 HIVE-17038 - DateParser fix You can merge this pull request into a Git repository by running: $ git pull https://github.com/mlorek/hive master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/hive/pull/204.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #204 commit 1e4eb6a870af2ce3ec1e5116a81e6453f9fc990d Author: Michael Lorek Date: 2017-07-12T10:30:22Z HIVE-17038 - DateParser fix > invalid result when CAST-ing to DATE > > > Key: HIVE-17038 > URL: https://issues.apache.org/jira/browse/HIVE-17038 > Project: Hive > Issue Type: Bug > Components: CLI, Hive >Affects Versions: 1.2.1 >Reporter: Jim Hopper > > when casting incorrect date literals to DATE data type hive returns wrong > values instead of NULL. > {code} > SELECT CAST('2017-02-31' AS DATE); > SELECT CAST('2017-04-31' AS DATE); > {code} > Some examples below where it really can produce weird results: > {code} > select * > from ( > select cast('2017-07-01' as date) as dt > ) as t > where t.dt = '2017-06-31'; > select * > from ( > select cast('2017-07-01' as date) as dt > ) as t > where t.dt = cast('2017-06-31' as date); > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
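The semantics the report asks for — NULL instead of a rolled-over date for impossible inputs like '2017-02-31' — can be illustrated with java.time, whose ISO parser validates day-of-month strictly. This is an illustrative sketch of the desired behavior, not the actual DateParser fix in pull request #204.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Illustrates strict date validation: impossible dates such as
// 2017-02-31 are rejected instead of being rolled over to a
// neighboring valid date (the behavior the bug report describes).
public class StrictDateCast {

    // Returns the parsed date, or null for invalid input -- the
    // NULL-on-bad-cast semantics the report asks for.
    static LocalDate castToDate(String s) {
        try {
            return LocalDate.parse(s); // ISO format, strict resolution
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(castToDate("2017-07-01")); // 2017-07-01
        System.out.println(castToDate("2017-02-31")); // null: Feb has no 31st
        System.out.println(castToDate("2017-06-31")); // null: June has 30 days
    }
}
```

With NULL semantics, both WHERE clauses in the quoted queries compare against NULL and return no rows, instead of silently matching a rolled-over date.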
[jira] [Updated] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer
[ https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-17073: --- Attachment: HIVE-17073.03.patch > Incorrect result with vectorization and SharedWorkOptimizer > --- > > Key: HIVE-17073 > URL: https://issues.apache.org/jira/browse/HIVE-17073 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-17073.01.patch, HIVE-17073.02.patch, > HIVE-17073.03.patch, HIVE-17073.patch > > > We get incorrect result with vectorization and multi-output Select operator > created by SharedWorkOptimizer. It can be reproduced in the following way. > {code:title=Correct} > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278"; > OK > 2 > {code} > {code:title=Correct} > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255"; > OK > 2 > {code} > {code:title=Incorrect} > select * from ( > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278") s1 > join ( > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255") s2; > OK > 2 0 > {code} > Problem seems to be that some ds in the batch row need to be re-initialized > after they have been forwarded to each output. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
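The re-initialization problem described in the last sentence can be reduced to a toy example: one shared row batch is forwarded to several outputs, and a child that rewrites the batch's selection state changes what the next child sees unless that state is restored between forwards. The Batch class below is a simplified, hypothetical stand-in for Hive's VectorizedRowBatch, not the actual fix.

```java
// Toy illustration of the multi-output batch-reuse hazard: a filtering
// child mutates the shared batch's 'selected'/'size' state, so a sibling
// output sees the filtered batch unless the state is re-initialized.
public class BatchReuseSketch {

    // Simplified stand-in for a vectorized row batch.
    static class Batch {
        int size;
        int[] values;
        boolean selectedInUse;
        int[] selected;
    }

    // A filtering output: keeps only even values by rewriting 'selected'
    // and shrinking 'size' -- mutations that survive into the next child.
    static int forwardThroughEvenFilter(Batch b) {
        int n = 0;
        for (int i = 0; i < b.size; i++) {
            if (b.values[i] % 2 == 0) {
                b.selected[n++] = i;
            }
        }
        b.selectedInUse = true;
        b.size = n;
        return n;
    }

    public static void main(String[] args) {
        Batch b = new Batch();
        b.values = new int[] {1, 2, 3, 4};
        b.size = 4;
        b.selected = new int[4];

        int savedSize = b.size;
        boolean savedSelectedInUse = b.selectedInUse;

        forwardThroughEvenFilter(b);
        System.out.println("second output sees " + b.size + " rows"); // 2, not 4

        // fix direction: re-initialize the shared state before each forward
        b.size = savedSize;
        b.selectedInUse = savedSelectedInUse;
        System.out.println("after reset: " + b.size + " rows"); // 4
    }
}
```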
[jira] [Commented] (HIVE-16100) Dynamic Sorted Partition optimizer loses sibling operators
[ https://issues.apache.org/jira/browse/HIVE-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083799#comment-16083799 ] Hive QA commented on HIVE-16100: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876773/HIVE-16100.4.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 10839 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[multi_insert_move_tasks_share_dependencies] (batchId=52) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[reducesink_dedup] (batchId=22) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype] (batchId=157) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5977/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5977/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5977/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing 
org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12876773 - PreCommit-HIVE-Build > Dynamic Sorted Partition optimizer loses sibling operators > -- > > Key: HIVE-16100 > URL: https://issues.apache.org/jira/browse/HIVE-16100 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 1.2.1, 2.1.1, 2.2.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: HIVE-16100.1.patch, HIVE-16100.2.patch, > HIVE-16100.2.patch, HIVE-16100.3.patch, HIVE-16100.4.patch > > > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java#L173 > {code} > // unlink connection between FS and its parent > fsParent = fsOp.getParentOperators().get(0); > fsParent.getChildOperators().clear(); > {code} > The optimizer discards any cases where the fsParent has another SEL child -- This message was sent by Atlassian JIRA (v6.4.14#64029)
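The quoted snippet's problem can be shown with a toy operator tree: clear() unlinks every child of fsParent, silently dropping any sibling SEL, whereas removing just fsOp preserves it. The Op class is a hypothetical stand-in for Hive's Operator, and the remove() call illustrates one possible fix direction rather than the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the bug in the quoted snippet: clearing the parent's child
// list unlinks *every* child, losing siblings of the FileSink operator;
// removing only the one operator keeps them.
public class UnlinkSketch {

    static class Op {
        final String name;
        final List<Op> children = new ArrayList<>();
        Op(String name) { this.name = name; }
    }

    public static void main(String[] args) {
        Op fsParent = new Op("SEL");
        Op fsOp = new Op("FS");
        Op siblingSel = new Op("SEL_sibling");
        fsParent.children.add(fsOp);
        fsParent.children.add(siblingSel);

        // buggy unlink: fsParent.children.clear() would drop the sibling too

        // sibling-preserving unlink: detach only the FS operator
        fsParent.children.remove(fsOp);

        System.out.println(fsParent.children.size()); // 1: the sibling survives
    }
}
```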
[jira] [Updated] (HIVE-12631) LLAP: support ORC ACID tables
[ https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-12631: -- Attachment: HIVE-12631.16.patch > LLAP: support ORC ACID tables > - > > Key: HIVE-12631 > URL: https://issues.apache.org/jira/browse/HIVE-12631 > Project: Hive > Issue Type: Bug > Components: llap, Transactions >Reporter: Sergey Shelukhin >Assignee: Teddy Choi > Attachments: HIVE-12631.10.patch, HIVE-12631.10.patch, > HIVE-12631.11.patch, HIVE-12631.11.patch, HIVE-12631.12.patch, > HIVE-12631.13.patch, HIVE-12631.15.patch, HIVE-12631.16.patch, > HIVE-12631.1.patch, HIVE-12631.2.patch, HIVE-12631.3.patch, > HIVE-12631.4.patch, HIVE-12631.5.patch, HIVE-12631.6.patch, > HIVE-12631.7.patch, HIVE-12631.8.patch, HIVE-12631.8.patch, HIVE-12631.9.patch > > > LLAP uses a completely separate read path in ORC to allow for caching and > parallelization of reads and processing. This path does not support ACID. As > far as I remember ACID logic is embedded inside ORC format; we need to > refactor it to be on top of some interface, if practical; or just port it to > LLAP read path. > Another consideration is how the logic will work with cache. The cache is > currently low-level (CB-level in ORC), so we could just use it to read bases > and deltas (deltas should be cached with higher priority) and merge as usual. > We could also cache merged representation in future. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16989) Fix some issues identified by lgtm.com
[ https://issues.apache.org/jira/browse/HIVE-16989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083863#comment-16083863 ] Hive QA commented on HIVE-16989: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876774/HIVE-16989.4.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10841 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232) org.apache.hadoop.hive.llap.security.TestLlapSignerImpl.testSigning (batchId=289) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) org.apache.hive.jdbc.TestJdbcWithMiniHS2.testConcurrentStatements (batchId=226) org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout (batchId=226) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5978/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5978/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5978/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 10 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876774 - PreCommit-HIVE-Build > Fix some issues identified by lgtm.com > -- > > Key: HIVE-16989 > URL: https://issues.apache.org/jira/browse/HIVE-16989 > Project: Hive > Issue Type: Improvement >Reporter: Malcolm Taylor >Assignee: Malcolm Taylor > Attachments: HIVE-16989.2.patch, HIVE-16989.3.patch, > HIVE-16989.4.patch, HIVE-16989.patch > > > [lgtm.com|https://lgtm.com] has identified a number of issues where there may > be scope for improvement. The plan is to address some of the alerts found at > [https://lgtm.com/projects/g/apache/hive/]. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer
[ https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16083925#comment-16083925 ] Hive QA commented on HIVE-17073: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876812/HIVE-17073.03.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10840 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.ql.exec.vector.TestVectorSelectOperator.testSelectOperator (batchId=272) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5979/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5979/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5979/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876812 - PreCommit-HIVE-Build > Incorrect result with vectorization and SharedWorkOptimizer > --- > > Key: HIVE-17073 > URL: https://issues.apache.org/jira/browse/HIVE-17073 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-17073.01.patch, HIVE-17073.02.patch, > HIVE-17073.03.patch, HIVE-17073.patch > > > We get incorrect result with vectorization and multi-output Select operator > created by SharedWorkOptimizer. It can be reproduced in the following way. > {code:title=Correct} > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278"; > OK > 2 > {code} > {code:title=Correct} > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255"; > OK > 2 > {code} > {code:title=Incorrect} > select * from ( > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278") s1 > join ( > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255") s2; > OK > 2 0 > {code} > Problem seems to be that some ds in the batch row need to be re-initialized > after they have been forwarded to each output. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17078) Add more logs to MapredLocalTask
[ https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi reassigned HIVE-17078: - Assignee: Yibing Shi > Add more logs to MapredLocalTask > > > Key: HIVE-17078 > URL: https://issues.apache.org/jira/browse/HIVE-17078 > Project: Hive > Issue Type: Improvement >Reporter: Yibing Shi >Assignee: Yibing Shi >Priority: Minor > > By default, {{MapredLocalTask}} is executed in a child process of Hive, in > case the local task uses too much resources that may affect Hive. Currently, > the stdout and stderr information of the child process is printed in Hive's > stdout/stderr log, which doesn't have a timestamp information, and is > separated from Hive service logs. This makes it hard to troubleshoot problems > in MapredLocalTasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17078) Add more logs to MapredLocalTask
[ https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-17078: -- Status: Patch Available (was: Open) > Add more logs to MapredLocalTask > > > Key: HIVE-17078 > URL: https://issues.apache.org/jira/browse/HIVE-17078 > Project: Hive > Issue Type: Improvement >Reporter: Yibing Shi >Assignee: Yibing Shi >Priority: Minor > Attachments: HIVE-17078.1.patch > > > By default, {{MapredLocalTask}} is executed in a child process of Hive, in > case the local task uses too much resources that may affect Hive. Currently, > the stdout and stderr information of the child process is printed in Hive's > stdout/stderr log, which doesn't have a timestamp information, and is > separated from Hive service logs. This makes it hard to troubleshoot problems > in MapredLocalTasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17078) Add more logs to MapredLocalTask
[ https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-17078: -- Attachment: HIVE-17078.1.patch Attached a quick patch. No tests are added, because this feature does not seem to be testable in a mini cluster. > Add more logs to MapredLocalTask > > > Key: HIVE-17078 > URL: https://issues.apache.org/jira/browse/HIVE-17078 > Project: Hive > Issue Type: Improvement >Reporter: Yibing Shi >Assignee: Yibing Shi >Priority: Minor > Attachments: HIVE-17078.1.patch > > > By default, {{MapredLocalTask}} is executed in a child process of Hive, in > case the local task uses too much resources that may affect Hive. Currently, > the stdout and stderr information of the child process is printed in Hive's > stdout/stderr log, which doesn't have a timestamp information, and is > separated from Hive service logs. This makes it hard to troubleshoot problems > in MapredLocalTasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
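The improvement the description motivates — child-process stdout/stderr lacking timestamps — amounts to draining the child's streams through a logger. Below is a hedged sketch, not the attached patch: the label and printf format are illustrative stand-ins for a real LOG.info() call, and the 'echo' command assumes a Unix-like environment.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

// Sketch: drain a child process's output line by line and re-emit each
// line with a timestamp, so it lands in the service log with timing
// information instead of going straight to un-timestamped stdout/stderr.
public class ChildProcessLogger {

    // Re-emits every line with an ISO-style timestamp; returns the
    // number of lines seen. The printf stands in for a real logger call.
    static int logStream(Reader in, String label) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        int count = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.printf("%tFT%<tT [%s] %s%n",
                    System.currentTimeMillis(), label, line);
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // assumes a Unix-like environment where 'echo' exists
        Process child = new ProcessBuilder("echo", "hello from child")
                .redirectErrorStream(true) // merge stderr into stdout
                .start();
        logStream(new InputStreamReader(child.getInputStream()), "child");
        child.waitFor();
    }
}
```

In a real task the drain would run on its own thread so the child cannot block on a full pipe buffer.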
[jira] [Commented] (HIVE-17078) Add more logs to MapredLocalTask
[ https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084016#comment-16084016 ] Hive QA commented on HIVE-17078: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876842/HIVE-17078.1.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5981/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5981/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5981/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ date '+%Y-%m-%d %T.%3N' 2017-07-12 13:54:51.943 + [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]] + export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64 + export PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'MAVEN_OPTS=-Xmx1g ' + MAVEN_OPTS='-Xmx1g ' + cd /data/hiveptest/working/ + tee /data/hiveptest/logs/PreCommit-HIVE-Build-5981/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ git = \s\v\n ]] + [[ git = \g\i\t ]] + [[ -z master ]] + [[ -d apache-github-source-source ]] + [[ ! -d apache-github-source-source/.git ]] + [[ ! 
-d apache-github-source-source ]] + date '+%Y-%m-%d %T.%3N' 2017-07-12 13:54:51.946 + cd apache-github-source-source + git fetch origin + git reset --hard HEAD HEAD is now at 26d6de7 HIVE-16730: Vectorization: Schema Evolution for Text Vectorization / Complex Types + git clean -f -d Removing ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderAdaptor.java Removing ql/src/java/org/apache/hadoop/hive/ql/io/orc/VectorizedOrcAcidRowBatchReader.java.orig Removing ql/src/test/queries/clientpositive/llap_acid_fast.q Removing ql/src/test/results/clientpositive/llap/llap_acid.q.out Removing ql/src/test/results/clientpositive/llap/llap_acid_fast.q.out Removing ql/src/test/results/clientpositive/llap_acid_fast.q.out + git checkout master Already on 'master' Your branch is up-to-date with 'origin/master'. + git reset --hard origin/master HEAD is now at 26d6de7 HIVE-16730: Vectorization: Schema Evolution for Text Vectorization / Complex Types + git merge --ff-only origin/master Already up-to-date. + date '+%Y-%m-%d %T.%3N' 2017-07-12 13:54:57.818 + patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hiveptest/working/scratch/build.patch + [[ -f /data/hiveptest/working/scratch/build.patch ]] + chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh + /data/hiveptest/working/scratch/smart-apply-patch.sh /data/hiveptest/working/scratch/build.patch error: patch failed: ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java:36 error: ql/src/java/org/apache/hadoop/hive/ql/exec/mr/MapredLocalTask.java: patch does not apply The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876842 - PreCommit-HIVE-Build > Add more logs to MapredLocalTask > > > Key: HIVE-17078 > URL: https://issues.apache.org/jira/browse/HIVE-17078 > Project: Hive > Issue Type: Improvement >Reporter: Yibing Shi >Assignee: Yibing Shi >Priority: Minor > Attachments: HIVE-17078.1.patch > > > By default, {{MapredLocalTask}} is executed in a child process of Hive, in > case the local task uses too much resources that may affect Hive. Currently, > the stdout and stderr information of the child process is printed in Hive's > stdout/stderr log, which doesn't have a timestamp information, and is > separated from Hive service logs. This makes it hard to troubleshoot problems > in MapredLocalTasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-12631) LLAP: support ORC ACID tables
[ https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084008#comment-16084008 ] Hive QA commented on HIVE-12631: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876820/HIVE-12631.16.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 35 failed/errored test(s), 10842 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed] (batchId=237) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_vectorization] (batchId=61) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_acid_fast] (batchId=38) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_reader] (batchId=7) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[llap_uncompressed] (batchId=56) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters1] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_llap_counters] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_schema_evol_3a] (batchId=141) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast] (batchId=151) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testACIDwithSchemaEvolutionAndCompaction (batchId=277) org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testAlterTable (batchId=277) 
org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testBucketizedInputFormat (batchId=277) org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testDeleteIn (batchId=277) org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testETLSplitStrategyForACID (batchId=277) org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testMerge2 (batchId=277) org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testMerge3 (batchId=277) org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testMultiInsertStatement (batchId=277) org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testNoHistory (batchId=277) org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testNonAcidToAcidConversion1 (batchId=277) org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testNonAcidToAcidConversion2 (batchId=277) org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testNonAcidToAcidConversion3 (batchId=277) org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testOrcNoPPD (batchId=277) org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testOrcPPD (batchId=277) org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.testUpdateMixedCase (batchId=277) org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.updateDeletePartitioned (batchId=277) org.apache.hadoop.hive.ql.TestTxnCommands2WithSplitUpdateAndVectorization.writeBetweenWorkerAndCleaner (batchId=277) org.apache.hadoop.hive.ql.io.orc.TestVectorizedOrcAcidRowBatchReader.testVectorizedOrcAcidRowBatchReader (batchId=260) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) 
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout (batchId=226) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5980/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5980/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5980/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 35 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12876820 - PreCommit-HIVE-Build > LLAP: support ORC ACID tables > - > > Key: HIVE-12631 > URL: https://issues.apache.org/jira/browse/HIVE-12631 > Project: Hive >
[jira] [Updated] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Szita updated HIVE-8838: - Attachment: HIVE-8838.4.patch > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: Bug >Reporter: Brock Noland >Assignee: Adam Szita > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, > HIVE-8838.3.patch, HIVE-8838.4.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Szita updated HIVE-8838: - Status: In Progress (was: Patch Available) > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: Bug >Reporter: Brock Noland >Assignee: Adam Szita > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, > HIVE-8838.3.patch, HIVE-8838.4.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adam Szita updated HIVE-8838: - Status: Patch Available (was: In Progress) > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: Bug >Reporter: Brock Noland >Assignee: Adam Szita > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, > HIVE-8838.3.patch, HIVE-8838.4.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084074#comment-16084074 ] Adam Szita commented on HIVE-8838: -- [~spena] I've addressed your comments in [^HIVE-8838.4.patch] > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: Bug >Reporter: Brock Noland >Assignee: Adam Szita > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, > HIVE-8838.3.patch, HIVE-8838.4.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16831) Add unit tests for NPE fixes in HIVE-12054
[ https://issues.apache.org/jira/browse/HIVE-16831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Voros updated HIVE-16831: Issue Type: Test (was: Bug) > Add unit tests for NPE fixes in HIVE-12054 > -- > > Key: HIVE-16831 > URL: https://issues.apache.org/jira/browse/HIVE-16831 > Project: Hive > Issue Type: Test > Components: Hive >Reporter: Sunitha Beeram >Assignee: Sunitha Beeram > Fix For: 3.0.0 > > Attachments: HIVE-16831.1.patch, HIVE-16831.2.patch > > > HIVE-12054 fixed NPE issues related to ObjectInspector which get triggered > when an empty ORC table/partition is read. > This work adds tests that trigger that path. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17078) Add more logs to MapredLocalTask
[ https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yibing Shi updated HIVE-17078: -- Attachment: HIVE-17078.2.patch Recreated the patch > Add more logs to MapredLocalTask > > > Key: HIVE-17078 > URL: https://issues.apache.org/jira/browse/HIVE-17078 > Project: Hive > Issue Type: Improvement >Reporter: Yibing Shi >Assignee: Yibing Shi >Priority: Minor > Attachments: HIVE-17078.1.patch, HIVE-17078.2.patch > > > By default, {{MapredLocalTask}} is executed in a child process of Hive, in > case the local task uses too many resources and affects Hive. Currently, > the stdout and stderr output of the child process is printed to Hive's > stdout/stderr log, which doesn't include timestamps and is > separated from the Hive service logs. This makes it hard to troubleshoot problems > in MapredLocalTasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
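One low-risk way to get timestamps onto the child process output is to drain the child's streams line by line and prefix each line before it reaches the service log. This is only a sketch of the idea (the class and prefix below are illustrative, not from the actual patch, which may route through log4j instead):

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;

public class TimestampedStreamReader {
    // Drain a child process stream and prefix each line with a timestamp,
    // so the output can be forwarded to the service log instead of being
    // written bare to stdout/stderr.
    static List<String> drain(InputStream in) throws Exception {
        List<String> out = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.add(LocalDateTime.now() + " [local-task] " + line);
            }
        }
        return out;
    }
}
```

In practice the stream would come from `Process.getInputStream()` / `Process.getErrorStream()` of the spawned local task.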
[jira] [Commented] (HIVE-16880) Remove ArrayList Instantiation For Empty Arrays
[ https://issues.apache.org/jira/browse/HIVE-16880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084108#comment-16084108 ] Daniel Voros commented on HIVE-16880: - [~belugabehr] are you sure that using immutable lists won't affect the behavior in these cases? For example [here|https://github.com/apache/hive/commit/8f004997025f032242a5b2db4c6baf9256e0ecbd#diff-6b5f3b952d1387946d488fc2d4432ee1R1228] in the constructor of AggrStats the list will be stored as a field and we might try to add to it later. > Remove ArrayList Instantiation For Empty Arrays > --- > > Key: HIVE-16880 > URL: https://issues.apache.org/jira/browse/HIVE-16880 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 2.1.1, 3.0.0 >Reporter: BELUGA BEHR >Assignee: BELUGA BEHR >Priority: Trivial > Fix For: 3.0.0 > > Attachments: HIVE-16880.1.patch, HIVE-16880.2.patch > > > Class {{org.apache.hadoop.hive.metastore.MetaStoreDirectSql}} uses a lot of > empty arrays in the code. Please replace with a static empty array instead > of all the instantiation. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
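Daniel's concern can be reproduced in isolation: `Collections.emptyList()` returns a shared immutable instance, so any code path that later calls `add()` on it throws, whereas a fresh `ArrayList` is safe to mutate. A minimal, self-contained sketch (class and method names are illustrative, not from the Hive patch):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class EmptyListDemo {
    // Returning the shared immutable empty list avoids an allocation,
    // but any later add() throws UnsupportedOperationException.
    static List<String> emptyShared() {
        return Collections.emptyList();
    }

    // A fresh ArrayList costs an allocation but is safe to mutate.
    static List<String> emptyFresh() {
        return new ArrayList<>();
    }

    public static void main(String[] args) {
        List<String> fresh = emptyFresh();
        fresh.add("ok"); // fine

        List<String> shared = emptyShared();
        try {
            shared.add("boom"); // immutable: throws
        } catch (UnsupportedOperationException e) {
            System.out.println("shared empty list is immutable");
        }
    }
}
```

So the substitution is only safe where the returned list is never mutated afterwards, which is exactly the case Daniel flags for the `AggrStats` constructor.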
[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084144#comment-16084144 ] Sergio Peña commented on HIVE-8838: --- Thanks, LGTM +1 > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: Bug >Reporter: Brock Noland >Assignee: Adam Szita > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, > HIVE-8838.3.patch, HIVE-8838.4.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-4577: -- Attachment: HIVE-4577.6.patch > hive CLI can't handle hadoop dfs command with space and quotes. > > > Key: HIVE-4577 > URL: https://issues.apache.org/jira/browse/HIVE-4577 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0 >Reporter: Bing Li >Assignee: Bing Li > Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, > HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch, HIVE-4577.6.patch > > > As design, hive could support hadoop dfs command in hive shell, like > hive> dfs -mkdir /user/biadmin/mydir; > but has different behavior with hadoop if the path contains space and quotes > hive> dfs -mkdir "hello"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 > /user/biadmin/"hello" > hive> dfs -mkdir 'world'; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 > /user/biadmin/'world' > hive> dfs -mkdir "bei jing"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/"bei > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/jing" -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16402) Upgrade to Hadoop 2.8.0
[ https://issues.apache.org/jira/browse/HIVE-16402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-16402: Fix Version/s: 2.2.0 > Upgrade to Hadoop 2.8.0 > --- > > Key: HIVE-16402 > URL: https://issues.apache.org/jira/browse/HIVE-16402 > Project: Hive > Issue Type: Bug >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Fix For: 2.2.0, 3.0.0 > > Attachments: HIVE-16402.1.patch, HIVE-16402.2.patch, > HIVE-16402.3.patch, HIVE-16402.4.patch, HIVE-16402.5.patch, > HIVE-16402.6.patch, HIVE-16402.7.patch > > > Hadoop 2.8.0 has been out since March, we should upgrade to it. Release notes > for Hadoop 2.8.x are here: http://hadoop.apache.org/docs/r2.8.0/index.html > It has a number of useful features, improvements for S3 support, ADLS > support, etc. along with a bunch of other fixes. This should also help us on > our way to upgrading to Hadoop 3.x (HIVE-15016). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17072) Make the parallelized timeout configurable in BeeLine tests
[ https://issues.apache.org/jira/browse/HIVE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marta Kuczora updated HIVE-17072: - Attachment: HIVE-17072.1.patch > Make the parallelized timeout configurable in BeeLine tests > --- > > Key: HIVE-17072 > URL: https://issues.apache.org/jira/browse/HIVE-17072 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Minor > Attachments: HIVE-17072.1.patch > > > When running the BeeLine tests parallel, the timeout is hardcoded in the > Parallelized.java: > {noformat} > @Override > public void finished() { > executor.shutdown(); > try { > executor.awaitTermination(10, TimeUnit.MINUTES); > } catch (InterruptedException exc) { > throw new RuntimeException(exc); > } > } > {noformat} > It would be better to make it configurable. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
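One common way to make such a timeout configurable is to read it from a system property with the old hardcoded value as the default. This is a sketch only; the property name below is hypothetical, not the key chosen in the actual patch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelizedTimeout {
    // Hypothetical property name -- the real key would be picked in the patch.
    static final String TIMEOUT_PROP = "test.beeline.parallel.timeout.min";

    // Read the timeout from a system property, falling back to the old
    // hardcoded 10 minutes when the property is unset.
    static long timeoutMinutes() {
        return Long.getLong(TIMEOUT_PROP, 10L);
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        executor.shutdown();
        // Same call as in Parallelized.finished(), but now configurable.
        executor.awaitTermination(timeoutMinutes(), TimeUnit.MINUTES);
    }
}
```

The property could then be set per run, e.g. `-Dtest.beeline.parallel.timeout.min=30`.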
[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084174#comment-16084174 ] Bing Li commented on HIVE-4577: --- Thank you, [~vgumashta]. I could reproduce TestPerfCliDriver [query14] in my env and have updated its golden file. The failures of TestMiniLlapLocalCliDriver[vector_if_expr] and TestBeeLineDriver[materialized_view_create_rewrite] should not be caused by this patch. > hive CLI can't handle hadoop dfs command with space and quotes. > > > Key: HIVE-4577 > URL: https://issues.apache.org/jira/browse/HIVE-4577 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0 >Reporter: Bing Li >Assignee: Bing Li > Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, > HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch, HIVE-4577.6.patch > > > As design, hive could support hadoop dfs command in hive shell, like > hive> dfs -mkdir /user/biadmin/mydir; > but has different behavior with hadoop if the path contains space and quotes > hive> dfs -mkdir "hello"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 > /user/biadmin/"hello" > hive> dfs -mkdir 'world'; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 > /user/biadmin/'world' > hive> dfs -mkdir "bei jing"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/"bei > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/jing" -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
[ https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-16922: --- Attachment: (was: HIVE-16922.2.patch) > Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim" > --- > > Key: HIVE-16922 > URL: https://issues.apache.org/jira/browse/HIVE-16922 > Project: Hive > Issue Type: Bug > Components: Thrift API >Reporter: Dudu Markovitz >Assignee: Bing Li > Attachments: HIVE-16922.1.patch > > > https://github.com/apache/hive/blob/master/serde/if/serde.thrift > Typo in serde.thrift: > COLLECTION_DELIM = "colelction.delim" > (*colelction* instead of *collection*) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
[ https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-16922: --- Attachment: HIVE-16922.2.patch > Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim" > --- > > Key: HIVE-16922 > URL: https://issues.apache.org/jira/browse/HIVE-16922 > Project: Hive > Issue Type: Bug > Components: Thrift API >Reporter: Dudu Markovitz >Assignee: Bing Li > Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch > > > https://github.com/apache/hive/blob/master/serde/if/serde.thrift > Typo in serde.thrift: > COLLECTION_DELIM = "colelction.delim" > (*colelction* instead of *collection*) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
[ https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084183#comment-16084183 ] Bing Li commented on HIVE-16922: Thank you, [~lirui]. It seems the result page has expired. I've re-submitted the patch to check. > Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim" > --- > > Key: HIVE-16922 > URL: https://issues.apache.org/jira/browse/HIVE-16922 > Project: Hive > Issue Type: Bug > Components: Thrift API >Reporter: Dudu Markovitz >Assignee: Bing Li > Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch > > > https://github.com/apache/hive/blob/master/serde/if/serde.thrift > Typo in serde.thrift: > COLLECTION_DELIM = "colelction.delim" > (*colelction* instead of *collection*) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17072) Make the parallelized timeout configurable in BeeLine tests
[ https://issues.apache.org/jira/browse/HIVE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marta Kuczora updated HIVE-17072: - Status: Patch Available (was: Open) > Make the parallelized timeout configurable in BeeLine tests > --- > > Key: HIVE-17072 > URL: https://issues.apache.org/jira/browse/HIVE-17072 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Minor > Attachments: HIVE-17072.1.patch > > > When running the BeeLine tests parallel, the timeout is hardcoded in the > Parallelized.java: > {noformat} > @Override > public void finished() { > executor.shutdown(); > try { > executor.awaitTermination(10, TimeUnit.MINUTES); > } catch (InterruptedException exc) { > throw new RuntimeException(exc); > } > } > {noformat} > It would be better to make it configurable. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17073) Incorrect result with vectorization and SharedWorkOptimizer
[ https://issues.apache.org/jira/browse/HIVE-17073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-17073: --- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Fixed TestVectorSelectOperator and pushed to master, thanks for reviewing [~mmccline]! > Incorrect result with vectorization and SharedWorkOptimizer > --- > > Key: HIVE-17073 > URL: https://issues.apache.org/jira/browse/HIVE-17073 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 3.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Fix For: 3.0.0 > > Attachments: HIVE-17073.01.patch, HIVE-17073.02.patch, > HIVE-17073.03.patch, HIVE-17073.patch > > > We get incorrect result with vectorization and multi-output Select operator > created by SharedWorkOptimizer. It can be reproduced in the following way. > {code:title=Correct} > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278"; > OK > 2 > {code} > {code:title=Correct} > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255"; > OK > 2 > {code} > {code:title=Incorrect} > select * from ( > select count(*) as h8_30_to_9 > from src > join src1 on src.key = src1.key > where src1.value = "val_278") s1 > join ( > select count(*) as h9_to_9_30 > from src > join src1 on src.key = src1.key > where src1.value = "val_255") s2; > OK > 2 0 > {code} > Problem seems to be that some ds in the batch row need to be re-initialized > after they have been forwarded to each output. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084197#comment-16084197 ] Hive QA commented on HIVE-8838: --- Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876860/HIVE-8838.4.patch {color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10873 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5982/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5982/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5982/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876860 - PreCommit-HIVE-Build > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: Bug >Reporter: Brock Noland >Assignee: Adam Szita > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, > HIVE-8838.3.patch, HIVE-8838.4.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084226#comment-16084226 ] Adam Szita commented on HIVE-8838: -- Test results above are irrelevant again - I think this is ready for commit > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: Bug >Reporter: Brock Noland >Assignee: Adam Szita > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, > HIVE-8838.3.patch, HIVE-8838.4.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-13384) Failed to create HiveMetaStoreClient object with proxy user when Kerberos enabled
[ https://issues.apache.org/jira/browse/HIVE-13384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li updated HIVE-13384: --- Description: I wrote a Java client to talk with HiveMetaStore (Hive 1.2.0), but found that it can't instantiate a HiveMetaStoreClient object successfully via a proxy user in a Kerberos env. === 15/10/13 00:14:38 ERROR transport.TSaslTransport: SASL negotiation failure javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) == While debugging Hive, I found that the error came from the open() method of the HiveMetaStoreClient class. Around line 406, transport = UserGroupInformation.getCurrentUser().doAs(new PrivilegedExceptionAction() { //FAILED, because the current user doesn't have the credential But it works if I change the above line to transport = UserGroupInformation.getCurrentUser().getRealUser().doAs(new PrivilegedExceptionAction() { //PASS I found DRILL-3413 fixes this error on the Drill side as a workaround, but if I submit a mapreduce job via Pig/HCatalog, it runs into the same issue again when initializing the object via HCatalog. It would be better to fix this issue on the Hive side. was: I wrote a Java client to talk with HiveMetaStore (Hive 1.2.0), but found that it can't instantiate a HiveMetaStoreClient object successfully via a proxy user in a Kerberos env. 
=== 15/10/13 00:14:38 ERROR transport.TSaslTransport: SASL negotiation failure javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)] at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) == While debugging Hive, I found that the error came from the open() method of the HiveMetaStoreClient class. Around line 406, transport = UserGroupInformation.getCurrentUser().doAs(new PrivilegedExceptionAction() { //FAILED, because the current user doesn't have the credential But it works if I change the above line to transport = UserGroupInformation.getCurrentUser().getRealUser().doAs(new PrivilegedExceptionAction() { //PASS I found DRILL-3413 fixes this error on the Drill side as a workaround, but if I submit a mapreduce job via Pig/HCatalog, it runs into the same issue again when initializing the object via HCatalog. It would be better to fix this issue on the Hive side. > Failed to create HiveMetaStoreClient object with proxy user when Kerberos > enabled > - > > Key: HIVE-13384 > URL: https://issues.apache.org/jira/browse/HIVE-13384 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 1.2.0, 1.2.1 >Reporter: Bing Li >Assignee: Bing Li > > I wrote a Java client to talk with HiveMetaStore (Hive 1.2.0), but found that it > can't instantiate a HiveMetaStoreClient object successfully via a proxy user in a Kerberos env. 
> === > 15/10/13 00:14:38 ERROR transport.TSaslTransport: SASL negotiation failure > javax.security.sasl.SaslException: GSS initiate failed [Caused by > GSSException: No valid credentials provided (Mechanism level: Failed to find > any Kerberos tgt)] > at > com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211) > at > org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94) > at > org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:271) > == > While debugging Hive, I found that the error came from the open() method of the > HiveMetaStoreClient class. > Around line 406, > transport = UserGroupInformation.getCurrentUser().doAs(new > PrivilegedExceptionAction() { //FAILED, because the current user > doesn't have the credential > But it works if I change the above line to > transport = UserGroupInformation.getCurrentUser().getRealUser().doAs(new > PrivilegedExceptionAction() { //PASS > I found DRILL-3413 fixes this error on the Drill side as a workaround, but if I > submit a mapreduce job via Pig/HCatalog, it runs into the same issue again > when initializing the object via HCatalog. It would be better to fix this issue on the Hive side.
[jira] [Commented] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote
[ https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084245#comment-16084245 ] Bing Li commented on HIVE-16907: [~pxiong] and [~lirui], thank you for your comments. I tried the CREATE TABLE statement in MySQL and found that it treats `db.tbl` as the table name; dots are allowed in table names. e.g.
{code:java}
mysql> create table xxx (col int);
mysql> create table test.yyy (col int);
mysql> create table `test.zzz` (col int);
mysql> create table `test.test.tbl` (col int);
mysql> show tables;
+----------------+
| Tables_in_test |
+----------------+
| test.test.tbl  |
| test.zzz       |
| xxx            |
| yyy            |
+----------------+
{code}
Back to Hive: if we want it to behave the same way as MySQL, we need to change how the name is parsed. My previous patch is NOT enough and can't handle `db.db.tbl` either. > "INSERT INTO" overwrite old data when destination table encapsulated by > backquote > > > Key: HIVE-16907 > URL: https://issues.apache.org/jira/browse/HIVE-16907 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 1.1.0, 2.1.1 >Reporter: Nemon Lou >Assignee: Bing Li > Attachments: HIVE-16907.1.patch > > > A way to reproduce: > {noformat} > create database tdb; > use tdb; > create table t1(id int); > create table t2(id int); > explain insert into `tdb.t1` select * from t2; > {noformat} > {noformat}
> +---+
> | Explain |
> +---+
> | STAGE DEPENDENCIES: |
> | Stage-1 is a root stage |
> | Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, Stage-4 |
> | Stage-3 |
> | Stage-0 depends on stages: Stage-3, Stage-2, Stage-5 |
> | Stage-2 |
> | Stage-4 |
> | Stage-5 depends on stages: Stage-4 |
> | |
> | STAGE PLANS: |
> | Stage: Stage-1 |
> | Map Reduce |
> | Map Operator Tree: |
> | TableScan |
> | alias: t2 |
> | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: NONE |
> | Select Operator |
> | expressions: id (type: int) |
> | outputColumnNames: _col0
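The parsing question above can be illustrated with a toy splitter (this is not Hive's actual grammar, just a sketch of the MySQL-like semantics: a fully backquoted name is a single identifier, dots included, while an unquoted dot separates database from table):

```java
public class NameParser {
    // Toy splitter, not Hive's parser: a fully backquoted name is one
    // identifier (dots and all); otherwise the first dot splits the
    // database from the table.
    static String[] parse(String name) {
        if (name.length() >= 2 && name.startsWith("`") && name.endsWith("`")) {
            return new String[] { name.substring(1, name.length() - 1) };
        }
        int dot = name.indexOf('.');
        if (dot >= 0) {
            return new String[] { name.substring(0, dot), name.substring(dot + 1) };
        }
        return new String[] { name };
    }
}
```

Under these semantics, `` `tdb.t1` `` names a single table called "tdb.t1", while `tdb.t1` names table `t1` in database `tdb` - which is why treating the backquoted form as `db.tbl` overwrites the wrong table.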
[jira] [Assigned] (HIVE-16999) Performance bottleneck in the ADD FILE/ARCHIVE commands for an HDFS resource
[ https://issues.apache.org/jira/browse/HIVE-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bing Li reassigned HIVE-16999: -- Assignee: Bing Li > Performance bottleneck in the ADD FILE/ARCHIVE commands for an HDFS resource > > > Key: HIVE-16999 > URL: https://issues.apache.org/jira/browse/HIVE-16999 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Sailee Jain >Assignee: Bing Li >Priority: Critical > > Performance bottleneck is found in adding resource[which is lying on HDFS] to > the distributed cache. > Commands used are :- > {code:java} > 1. ADD ARCHIVE "hdfs://some_dir/archive.tar" > 2. ADD FILE "hdfs://some_dir/file.txt" > {code} > Here is the log corresponding to the archive adding operation:- > {noformat} > converting to local hdfs://some_dir/archive.tar > Added resources: [hdfs://some_dir/archive.tar > {noformat} > Hive is downloading the resource to the local filesystem [shown in log by > "converting to local"]. > {color:#d04437}Ideally there is no need to bring the file to the local > filesystem when this operation is all about copying the file from one > location on HDFS to other location on HDFS[distributed cache].{color} > This adds lot of performance bottleneck when the the resource is a big file > and all commands need the same resource. > After debugging around the impacted piece of code is found to be :- > {code:java} > public List add_resources(ResourceType t, Collection values, > boolean convertToUnix) > throws RuntimeException { > Set resourceSet = resourceMaps.getResourceSet(t); > Map> resourcePathMap = > resourceMaps.getResourcePathMap(t); > Map> reverseResourcePathMap = > resourceMaps.getReverseResourcePathMap(t); > List localized = new ArrayList(); > try { > for (String value : values) { > String key; > {color:#d04437}//get the local path of downloaded jars{color} > List downloadedURLs = resolveAndDownload(t, value, > convertToUnix); > ; > . 
> {code} > {code:java} > List resolveAndDownload(ResourceType t, String value, boolean > convertToUnix) throws URISyntaxException, > IOException { > URI uri = createURI(value); > if (getURLType(value).equals("file")) { > return Arrays.asList(uri); > } else if (getURLType(value).equals("ivy")) { > return dependencyResolver.downloadDependencies(uri); > } else { // goes here for HDFS > return Arrays.asList(createURI(downloadResource(value, > convertToUnix))); // Here when the resource is not local it will download it > to the local machine. > } > } > {code} > Here, the function resolveAndDownload() always calls the downloadResource() > api in case of external filesystem. It should take into consideration the > fact that - when the resource is on same HDFS then bringing it on local > machine is not a needed step and can be skipped for better performance. > Thanks, > Sailee -- This message was sent by Atlassian JIRA (v6.4.14#64029)
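The proposed optimization can be sketched with a small locality check (illustrative only; the real fix would live in `resolveAndDownload()` and compare against the cluster's configured default filesystem): a resource needs to be copied to the local disk only when it does not already live on the same filesystem as the distributed cache.

```java
import java.net.URI;
import java.util.Objects;

public class ResourceLocality {
    // Sketch: decide whether an ADD FILE/ARCHIVE resource must be localized.
    // If the resource already lives on the default filesystem (same scheme
    // and authority), it can go HDFS-to-HDFS without a local round trip.
    static boolean needsLocalCopy(URI resource, URI defaultFs) {
        String scheme = resource.getScheme();
        if (scheme == null || "file".equals(scheme)) {
            return false; // already on the local filesystem
        }
        boolean sameFs = scheme.equals(defaultFs.getScheme())
            && Objects.equals(resource.getAuthority(), defaultFs.getAuthority());
        return !sameFs; // copy only when it is on a different filesystem
    }
}
```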
[jira] [Commented] (HIVE-17078) Add more logs to MapredLocalTask
[ https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084287#comment-16084287 ] Hive QA commented on HIVE-17078: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876861/HIVE-17078.2.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10840 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) org.apache.hive.jdbc.TestJdbcWithMiniHS2.testConcurrentStatements (batchId=226) org.apache.hive.jdbc.TestSSL.testMetastoreWithSSL (batchId=223) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5983/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5983/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5983/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876861 - PreCommit-HIVE-Build > Add more logs to MapredLocalTask > > > Key: HIVE-17078 > URL: https://issues.apache.org/jira/browse/HIVE-17078 > Project: Hive > Issue Type: Improvement >Reporter: Yibing Shi >Assignee: Yibing Shi >Priority: Minor > Attachments: HIVE-17078.1.patch, HIVE-17078.2.patch > > > By default, {{MapredLocalTask}} is executed in a child process of Hive, in > case the local task uses too much resources that may affect Hive. Currently, > the stdout and stderr information of the child process is printed in Hive's > stdout/stderr log, which doesn't have a timestamp information, and is > separated from Hive service logs. This makes it hard to troubleshoot problems > in MapredLocalTasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16732) Transactional tables should block LOAD DATA
[ https://issues.apache.org/jira/browse/HIVE-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16732: -- Attachment: HIVE-16732.03-branch-2.patch > Transactional tables should block LOAD DATA > > > Key: HIVE-16732 > URL: https://issues.apache.org/jira/browse/HIVE-16732 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-16732.01.patch, HIVE-16732.02.patch, > HIVE-16732.03-branch-2.patch, HIVE-16732.03.patch > > > This has always been the design. > see LoadSemanticAnalyzer.analyzeInternal() > StrictChecks.checkBucketing(conf); > Some examples (this is exposed by HIVE-16177) > insert_values_orig_table.q > insert_orig_table.q > insert_values_orig_table_use_metadata.q -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-4577) hive CLI can't handle hadoop dfs command with space and quotes.
[ https://issues.apache.org/jira/browse/HIVE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084376#comment-16084376 ] Hive QA commented on HIVE-4577: --- Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876871/HIVE-4577.6.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10841 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=237) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_12] (batchId=237) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dfscmd] (batchId=33) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[mergejoin] (batchId=156) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5984/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5984/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5984/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 10 tests failed {noformat} 
This message is automatically generated. ATTACHMENT ID: 12876871 - PreCommit-HIVE-Build > hive CLI can't handle hadoop dfs command with space and quotes. > > > Key: HIVE-4577 > URL: https://issues.apache.org/jira/browse/HIVE-4577 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 0.9.0, 0.10.0, 0.14.0, 0.13.1, 1.2.0, 1.1.0 >Reporter: Bing Li >Assignee: Bing Li > Attachments: HIVE-4577.1.patch, HIVE-4577.2.patch, > HIVE-4577.3.patch.txt, HIVE-4577.4.patch, HIVE-4577.5.patch, HIVE-4577.6.patch > > > By design, Hive supports hadoop dfs commands in the hive shell, like > hive> dfs -mkdir /user/biadmin/mydir; > but it behaves differently from hadoop if the path contains spaces and quotes > hive> dfs -mkdir "hello"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:40 > /user/biadmin/"hello" > hive> dfs -mkdir 'world'; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:43 > /user/biadmin/'world' > hive> dfs -mkdir "bei jing"; > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/"bei > drwxr-xr-x - biadmin supergroup 0 2013-04-23 09:44 > /user/biadmin/jing" -- This message was sent by Atlassian JIRA (v6.4.14#64029)
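The HIVE-4577 report above boils down to quote-aware argument splitting: `"bei jing"` should reach HDFS as one path with the quotes stripped, not two tokens with embedded quote characters. A minimal, self-contained sketch of such a tokenizer (illustrative only — not the code the HIVE-4577 patches actually change):

```java
import java.util.ArrayList;
import java.util.List;

public class DfsCommandTokenizer {
    // Split a dfs command line into arguments, treating quoted segments as a
    // single argument and stripping the quote characters themselves.
    public static List<String> tokenize(String command) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0; // 0 = not inside a quoted segment
        for (char c : command.toCharArray()) {
            if (quote != 0) {
                if (c == quote) quote = 0;     // closing quote: drop it
                else current.append(c);
            } else if (c == '"' || c == '\'') {
                quote = c;                     // opening quote: drop it
            } else if (Character.isWhitespace(c)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) tokens.add(current.toString());
        return tokens;
    }

    public static void main(String[] args) {
        // "bei jing" stays one argument and no quote characters leak into the path
        System.out.println(tokenize("-mkdir \"bei jing\""));
    }
}
```

With this behavior, `dfs -mkdir "bei jing";` would create a single directory named `bei jing`, matching what the hadoop shell does.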
[jira] [Updated] (HIVE-16812) VectorizedOrcAcidRowBatchReader doesn't filter delete events
[ https://issues.apache.org/jira/browse/HIVE-16812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16812: -- Priority: Critical (was: Major) > VectorizedOrcAcidRowBatchReader doesn't filter delete events > > > Key: HIVE-16812 > URL: https://issues.apache.org/jira/browse/HIVE-16812 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 2.3.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > > the c'tor of VectorizedOrcAcidRowBatchReader has > {noformat} > // Clone readerOptions for deleteEvents. > Reader.Options deleteEventReaderOptions = readerOptions.clone(); > // Set the range on the deleteEventReaderOptions to 0 to INTEGER_MAX > because > // we always want to read all the delete delta files. > deleteEventReaderOptions.range(0, Long.MAX_VALUE); > {noformat} > This is suboptimal since base and deltas are sorted by ROW__ID. So for each > split of the base we can find the min/max ROW__ID and only load events from the delta that > are in the [min,max] range. This will reduce the number of delete events we load > in memory (to no more than there are in the split). > When we support sorting on PK, the same should apply but we'd need to make > sure to store PKs in the ORC index > See OrcRawRecordMerger.discoverKeyBounds() -- This message was sent by Atlassian JIRA (v6.4.14#64029)
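The optimization HIVE-16812 describes above can be sketched as a simple range filter over delete events keyed by ROW__ID. This is an illustrative model only — the `RowId` class and `filterForSplit` method are assumptions for exposition, not Hive's actual `VectorizedOrcAcidRowBatchReader` code:

```java
import java.util.ArrayList;
import java.util.List;

public class DeleteEventFilter {
    // Simplified ROW__ID key (transactionId, bucketId, rowId), ordered lexicographically.
    static final class RowId implements Comparable<RowId> {
        final long txnId; final int bucketId; final long rowId;
        RowId(long txnId, int bucketId, long rowId) {
            this.txnId = txnId; this.bucketId = bucketId; this.rowId = rowId;
        }
        public int compareTo(RowId o) {
            int c = Long.compare(txnId, o.txnId);
            if (c == 0) c = Integer.compare(bucketId, o.bucketId);
            if (c == 0) c = Long.compare(rowId, o.rowId);
            return c;
        }
    }

    // Keep only the delete events whose key falls inside the split's
    // [min, max] ROW__ID range, instead of loading every delete event.
    static List<RowId> filterForSplit(List<RowId> deleteEvents, RowId min, RowId max) {
        List<RowId> kept = new ArrayList<>();
        for (RowId e : deleteEvents) {
            if (e.compareTo(min) >= 0 && e.compareTo(max) <= 0) kept.add(e);
        }
        return kept;
    }
}
```

Because base and delta files are sorted by ROW__ID, the min/max bounds per split are cheap to discover (cf. `OrcRawRecordMerger.discoverKeyBounds()`), so the filter bounds memory to the events that can actually apply to the split.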
[jira] [Commented] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files
[ https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084360#comment-16084360 ] Eugene Koifman commented on HIVE-16177: --- HIVE-16177.20-branch-2.patch committed to branch-2 (2.x) > non Acid to acid conversion doesn't handle _copy_N files > > > Key: HIVE-16177 > URL: https://issues.apache.org/jira/browse/HIVE-16177 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 0.14.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Fix For: 3.0.0, 2.4.0 > > Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, > HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, > HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, > HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, > HIVE-16177.17.patch, HIVE-16177.18-branch-2.patch, HIVE-16177.18.patch, > HIVE-16177.19-branch-2.patch, HIVE-16177.20-branch-2.patch > > > {noformat} > create table T(a int, b int) clustered by (a) into 2 buckets stored as orc > TBLPROPERTIES('transactional'='false') > insert into T(a,b) values(1,2) > insert into T(a,b) values(1,3) > alter table T SET TBLPROPERTIES ('transactional'='true') > {noformat} > //we should now have bucket files 01_0 and 01_0_copy_1 > but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can > be copy_N files and numbers rows in each bucket from 0 thus generating > duplicate IDs > {noformat} > select ROW__ID, INPUT__FILE__NAME, a, b from T > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2 > {"transactionid\":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3 > {noformat} > [~owen.omalley], do you have any thoughts on a good way to 
handle this? > attached patch has a few changes to make Acid even recognize copy_N but this > is just a prerequisite. The new UT demonstrates the issue. > Furthermore, > {noformat} > alter table T compact 'major' > select ROW__ID, INPUT__FILE__NAME, a, b from T order by b > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0} > file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1 > 1 2 > {noformat} > HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() > demonstrating this > This is because compactor doesn't handle copy_N files either (skips them) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
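The first step HIVE-16177 mentions above — making ACID "even recognize copy_N" — amounts to parsing the copy number out of bucket file names so that rows in `000001_0_copy_1` can be numbered after the rows of `000001_0` instead of both restarting at rowid 0. A hedged sketch of that recognition (names and regex are illustrative, not Hive's actual parser):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BucketFileName {
    // Matches original bucket files like "000001_0" and copies like "000001_0_copy_1".
    private static final Pattern NAME =
            Pattern.compile("^(\\d+)_\\d+(?:_copy_(\\d+))?$");

    // Returns {bucketId, copyNumber}, or null if the name is not a bucket file.
    static int[] parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) return null;
        int bucketId = Integer.parseInt(m.group(1));
        int copyNumber = m.group(2) == null ? 0 : Integer.parseInt(m.group(2));
        return new int[]{bucketId, copyNumber};
    }
}
```

Once the copy number is known, both the row-id assignment in `OrcRawRecordMerger.OriginalReaderPair` and the compactor can order the files per bucket and assign a continuous, collision-free rowid sequence.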
[jira] [Updated] (HIVE-16177) non Acid to acid conversion doesn't handle _copy_N files
[ https://issues.apache.org/jira/browse/HIVE-16177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16177: -- Resolution: Fixed Fix Version/s: 2.4.0 3.0.0 Status: Resolved (was: Patch Available) > non Acid to acid conversion doesn't handle _copy_N files > > > Key: HIVE-16177 > URL: https://issues.apache.org/jira/browse/HIVE-16177 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 0.14.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Fix For: 3.0.0, 2.4.0 > > Attachments: HIVE-16177.01.patch, HIVE-16177.02.patch, > HIVE-16177.04.patch, HIVE-16177.07.patch, HIVE-16177.08.patch, > HIVE-16177.09.patch, HIVE-16177.10.patch, HIVE-16177.11.patch, > HIVE-16177.14.patch, HIVE-16177.15.patch, HIVE-16177.16.patch, > HIVE-16177.17.patch, HIVE-16177.18-branch-2.patch, HIVE-16177.18.patch, > HIVE-16177.19-branch-2.patch, HIVE-16177.20-branch-2.patch > > > {noformat} > create table T(a int, b int) clustered by (a) into 2 buckets stored as orc > TBLPROPERTIES('transactional'='false') > insert into T(a,b) values(1,2) > insert into T(a,b) values(1,3) > alter table T SET TBLPROPERTIES ('transactional'='true') > {noformat} > //we should now have bucket files 01_0 and 01_0_copy_1 > but OrcRawRecordMerger.OriginalReaderPair.next() doesn't know that there can > be copy_N files and numbers rows in each bucket from 0 thus generating > duplicate IDs > {noformat} > select ROW__ID, INPUT__FILE__NAME, a, b from T > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0,1,2 > {"transactionid\":0,"bucketid":1,"rowid":0},file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommands.../warehouse/nonacidorctbl/01_0_copy_1,1,3 > {noformat} > [~owen.omalley], do you have any thoughts on a good way to handle this? 
> attached patch has a few changes to make Acid even recognize copy_N but this > is just a prerequisite. The new UT demonstrates the issue. > Furthermore, > {noformat} > alter table T compact 'major' > select ROW__ID, INPUT__FILE__NAME, a, b from T order by b > {noformat} > produces > {noformat} > {"transactionid":0,"bucketid":1,"rowid":0} > file:/Users/ekoifman/dev/hiverwgit/ql/target/tmp/org.apache.hadoop.hive.ql.TestTxnCommandswarehouse/nonacidorctbl/base_-9223372036854775808/bucket_1 > 1 2 > {noformat} > HIVE-16177.04.patch has TestTxnCommands.testNonAcidToAcidConversion0() > demonstrating this > This is because compactor doesn't handle copy_N files either (skips them) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8838: -- Issue Type: New Feature (was: Bug) > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: New Feature >Reporter: Brock Noland >Assignee: Adam Szita > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, > HIVE-8838.3.patch, HIVE-8838.4.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (HIVE-17079) LLAP: Use FQDN by default for work submission
[ https://issues.apache.org/jira/browse/HIVE-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran reassigned HIVE-17079: > LLAP: Use FQDN by default for work submission > - > > Key: HIVE-17079 > URL: https://issues.apache.org/jira/browse/HIVE-17079 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > > HIVE-14624 added FQDN for work submission. We should enable it by default to > avoid DNS issues. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084432#comment-16084432 ] Vineet Garg commented on HIVE-17066: Pushed to master > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, > HIVE-17066.3.patch, HIVE-17066.4.patch, HIVE-17066.5.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
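The HIVE-17066 plan above shows the core problem: a generic IS NULL selectivity collapses 71B join-output rows to an estimate of 1. After a LEFT OUTER JOIN, `rightCol IS NULL` selects exactly the unmatched outer rows, so a better estimate follows from the match counts. This is a back-of-the-envelope model of that reasoning, not Hive's actual stats-annotation code:

```java
public class OuterJoinNullEstimate {
    // After a LEFT OUTER JOIN, "rightCol IS NULL" holds for the outer rows
    // that found no match, so estimate that count directly rather than
    // applying a generic IS NULL selectivity (which can collapse to 1 row).
    // Floors at 1 because cardinality estimators avoid zero-row estimates.
    static long isNullRows(long outerRows, long matchedOuterRows) {
        return Math.max(outerRows - matchedOuterRows, 1);
    }
}
```

For example, 100 outer rows of which 60 matched yields an estimate of 40 surviving rows for the IS NULL filter, rather than 1.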
[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084434#comment-16084434 ] Sushanth Sowmyan commented on HIVE-8838: ([~spena], I just pushed to master too, hopefully our pushes don't conflict :D ) > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: New Feature >Reporter: Brock Noland >Assignee: Adam Szita > Fix For: 3.0.0 > > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, > HIVE-8838.3.patch, HIVE-8838.4.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-17066: --- Resolution: Fixed Status: Resolved (was: Patch Available) > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Fix For: 3.0.0 > > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, > HIVE-17066.3.patch, HIVE-17066.4.patch, HIVE-17066.5.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084425#comment-16084425 ] Vineet Garg commented on HIVE-17066: Failures are not reproducible/un-related > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, > HIVE-17066.3.patch, HIVE-17066.4.patch, HIVE-17066.5.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-8838: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Thanks [~szita] for your contribution. I committed to master. > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: New Feature >Reporter: Brock Noland >Assignee: Adam Szita > Fix For: 3.0.0 > > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, > HIVE-8838.3.patch, HIVE-8838.4.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17066) Query78 filter wrong estimatation is generating bad plan
[ https://issues.apache.org/jira/browse/HIVE-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-17066: --- Fix Version/s: 3.0.0 > Query78 filter wrong estimatation is generating bad plan > > > Key: HIVE-17066 > URL: https://issues.apache.org/jira/browse/HIVE-17066 > Project: Hive > Issue Type: Bug >Reporter: Vineet Garg >Assignee: Vineet Garg > Fix For: 3.0.0 > > Attachments: HIVE-17066.1.patch, HIVE-17066.2.patch, > HIVE-17066.3.patch, HIVE-17066.4.patch, HIVE-17066.5.patch > > > Filter operator is estimating 1 row following a left outer join causing bad > estimates > {noformat} > Reducer 12 > Execution mode: vectorized, llap > Reduce Operator Tree: > Map Join Operator > condition map: > Left Outer Join0 to 1 > keys: > 0 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > 1 KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 > (type: bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, _col6, > _col8 > input vertices: > 1 Map 14 > Statistics: Num rows: 71676270660 Data size: 3727166074320 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: _col8 is null (type: boolean) > Statistics: Num rows: 1 Data size: 52 Basic stats: COMPLETE > Column stats: COMPLETE > Select Operator > expressions: _col0 (type: bigint), _col1 (type: bigint), > _col3 (type: int), _col4 (type: double), _col5 (type: double), _col6 (type: > bigint) > outputColumnNames: _col0, _col1, _col3, _col4, _col5, > _col6 > Statistics: Num rows: 1 Data size: 52 Basic stats: > COMPLETE Column stats: COMPLETE > {noformat} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table
[ https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16832: -- Attachment: HIVE-16832.22.patch > duplicate ROW__ID possible in multi insert into transactional table > --- > > Key: HIVE-16832 > URL: https://issues.apache.org/jira/browse/HIVE-16832 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, > HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, > HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch, > HIVE-16832.11.patch, HIVE-16832.14.patch, HIVE-16832.15.patch, > HIVE-16832.16.patch, HIVE-16832.17.patch, HIVE-16832.18.patch, > HIVE-16832.19.patch, HIVE-16832.20.patch, HIVE-16832.20.patch, > HIVE-16832.21.patch, HIVE-16832.22.patch > > > {noformat} > create table AcidTablePart(a int, b int) partitioned by (p string) clustered > by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true'); > create temporary table if not exists data1 (x int); > insert into data1 values (1); > from data1 >insert into AcidTablePart partition(p) select 0, 0, 'p' || x >insert into AcidTablePart partition(p='p1') select 0, 1 > {noformat} > Each branch of this multi-insert creates a row in partition p1/bucket0 with > ROW__ID=(1,0,0). > The same can happen when running a SQL Merge (HIVE-10924) statement that has > both Insert and Update clauses when the target table has > _'transactional'='true','transactional_properties'='default'_ (see > HIVE-14035). This is so because Merge is internally run as a multi-insert > statement. > The solution relies on the statement ID introduced in HIVE-11030. Each Insert > clause of a multi-insert gets a unique ID. > The ROW__ID.bucketId now becomes a bit-packed triplet (format version, > bucketId, statementId). 
> (Since ORC stores field names in the data file we can't rename > ROW__ID.bucketId). > This ensures that there are no collisions and retains the desired sort properties > of ROW__ID. > In particular _SortedDynPartitionOptimizer_ works w/o any changes even in > cases where there are fewer reducers than buckets. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
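The bit-packed triplet HIVE-16832 describes above can be sketched as a small codec over the existing `ROW__ID.bucketId` int. The field widths below are assumptions chosen for illustration — the JIRA does not spell out the layout Hive actually adopted:

```java
public class BucketCodecSketch {
    // Assumed layout, high bit first:
    // [3 bits format version][1 bit reserved][12 bits bucketId][4 bits reserved][12 bits statementId]
    static int encode(int version, int bucketId, int statementId) {
        return (version << 29) | ((bucketId & 0xFFF) << 16) | (statementId & 0xFFF);
    }

    static int version(int encoded)     { return (encoded >>> 29) & 0x7; }
    static int bucketId(int encoded)    { return (encoded >>> 16) & 0xFFF; }
    static int statementId(int encoded) { return encoded & 0xFFF; }
}
```

Packing the statement ID into `bucketId` keeps the on-disk ORC field name unchanged while making `(transactionId, bucketId, rowId)` unique across the Insert clauses of a multi-insert, so the sort order of ROW__ID is preserved.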
[jira] [Updated] (HIVE-17079) LLAP: Use FQDN by default for work submission
[ https://issues.apache.org/jira/browse/HIVE-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-17079: - Status: Patch Available (was: Open) > LLAP: Use FQDN by default for work submission > - > > Key: HIVE-17079 > URL: https://issues.apache.org/jira/browse/HIVE-17079 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17079.1.patch > > > HIVE-14624 added FQDN for work submission. We should enable it by default to > avoid DNS issues. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17079) LLAP: Use FQDN by default for work submission
[ https://issues.apache.org/jira/browse/HIVE-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-17079: - Attachment: HIVE-17079.1.patch [~sseth] can you please take a look? small change > LLAP: Use FQDN by default for work submission > - > > Key: HIVE-17079 > URL: https://issues.apache.org/jira/browse/HIVE-17079 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17079.1.patch > > > HIVE-14624 added FQDN for work submission. We should enable it by default to > avoid DNS issues. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17079) LLAP: Use FQDN by default for work submission
[ https://issues.apache.org/jira/browse/HIVE-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084438#comment-16084438 ] Gopal V commented on HIVE-17079: LGTM - +1 > LLAP: Use FQDN by default for work submission > - > > Key: HIVE-17079 > URL: https://issues.apache.org/jira/browse/HIVE-17079 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17079.1.patch > > > HIVE-14624 added FQDN for work submission. We should enable it by default to > avoid DNS issues. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16821) Vectorization: support Explain Analyze in vectorized mode
[ https://issues.apache.org/jira/browse/HIVE-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-16821: --- Status: Open (was: Patch Available) Temporarily obsoleted by HIVE-17073 > Vectorization: support Explain Analyze in vectorized mode > - > > Key: HIVE-16821 > URL: https://issues.apache.org/jira/browse/HIVE-16821 > Project: Hive > Issue Type: Bug > Components: Diagnosability, Vectorization >Affects Versions: 2.1.1, 3.0.0 >Reporter: Gopal V >Assignee: Gopal V >Priority: Minor > Attachments: HIVE-16821.1.patch, HIVE-16821.2.patch, > HIVE-16821.2.patch, HIVE-16821.3.patch > > > Currently, to avoid a branch in the operator inner loop - the runtime stats > are only available in non-vector mode. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-12631) LLAP: support ORC ACID tables
[ https://issues.apache.org/jira/browse/HIVE-12631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Teddy Choi updated HIVE-12631: -- Attachment: HIVE-12631.17.patch > LLAP: support ORC ACID tables > - > > Key: HIVE-12631 > URL: https://issues.apache.org/jira/browse/HIVE-12631 > Project: Hive > Issue Type: Bug > Components: llap, Transactions >Reporter: Sergey Shelukhin >Assignee: Teddy Choi > Attachments: HIVE-12631.10.patch, HIVE-12631.10.patch, > HIVE-12631.11.patch, HIVE-12631.11.patch, HIVE-12631.12.patch, > HIVE-12631.13.patch, HIVE-12631.15.patch, HIVE-12631.16.patch, > HIVE-12631.17.patch, HIVE-12631.1.patch, HIVE-12631.2.patch, > HIVE-12631.3.patch, HIVE-12631.4.patch, HIVE-12631.5.patch, > HIVE-12631.6.patch, HIVE-12631.7.patch, HIVE-12631.8.patch, > HIVE-12631.8.patch, HIVE-12631.9.patch > > > LLAP uses a completely separate read path in ORC to allow for caching and > parallelization of reads and processing. This path does not support ACID. As > far as I remember ACID logic is embedded inside ORC format; we need to > refactor it to be on top of some interface, if practical; or just port it to > LLAP read path. > Another consideration is how the logic will work with cache. The cache is > currently low-level (CB-level in ORC), so we could just use it to read bases > and deltas (deltas should be cached with higher priority) and merge as usual. > We could also cache merged representation in future. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17078) Add more logs to MapredLocalTask
[ https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084462#comment-16084462 ] Sahil Takiar commented on HIVE-17078: - If we are printing the child stdout / stderr to the Hive logs then do we need to also print them to Hive stdout / stderr too? > Add more logs to MapredLocalTask > > > Key: HIVE-17078 > URL: https://issues.apache.org/jira/browse/HIVE-17078 > Project: Hive > Issue Type: Improvement >Reporter: Yibing Shi >Assignee: Yibing Shi >Priority: Minor > Attachments: HIVE-17078.1.patch, HIVE-17078.2.patch > > > By default, {{MapredLocalTask}} is executed in a child process of Hive, in > case the local task uses too much resources that may affect Hive. Currently, > the stdout and stderr information of the child process is printed in Hive's > stdout/stderr log, which doesn't have a timestamp information, and is > separated from Hive service logs. This makes it hard to troubleshoot problems > in MapredLocalTasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
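The HIVE-17078 discussion above is about routing the child process's stdout/stderr through the service log so each line picks up a timestamp. A minimal sketch of the pumping side, with the log sink abstracted as a `Consumer<String>` (the class and method names are illustrative, not the ones the patch introduces):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

public class ChildStreamPump {
    // Read every line the child process writes and hand it to a sink such as
    // a logger, so the output gets the log framework's timestamps instead of
    // landing untimestamped in Hive's own stdout/stderr.
    static void pump(InputStream childOutput, Consumer<String> logSink) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(childOutput, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                logSink.accept(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        InputStream fakeChild = new ByteArrayInputStream(
                "map 100%\nreduce 100%\n".getBytes(StandardCharsets.UTF_8));
        pump(fakeChild, line -> System.out.println("[child] " + line));
    }
}
```

In a real task each of the child's two streams would be drained on its own thread to avoid blocking the process, which also answers Sahil's question: once the sink is the log, echoing the same lines to Hive's stdout/stderr is redundant unless a consumer depends on it.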
[jira] [Commented] (HIVE-17072) Make the parallelized timeout configurable in BeeLine tests
[ https://issues.apache.org/jira/browse/HIVE-17072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084479#comment-16084479 ] Hive QA commented on HIVE-17072: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876874/HIVE-17072.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10840 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237) org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_queries] (batchId=94) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout (batchId=226) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5985/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5985/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5985/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876874 - PreCommit-HIVE-Build > Make the parallelized timeout configurable in BeeLine tests > --- > > Key: HIVE-17072 > URL: https://issues.apache.org/jira/browse/HIVE-17072 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Reporter: Marta Kuczora >Assignee: Marta Kuczora >Priority: Minor > Attachments: HIVE-17072.1.patch > > > When running the BeeLine tests parallel, the timeout is hardcoded in the > Parallelized.java: > {noformat} > @Override > public void finished() { > executor.shutdown(); > try { > executor.awaitTermination(10, TimeUnit.MINUTES); > } catch (InterruptedException exc) { > throw new RuntimeException(exc); > } > } > {noformat} > It would be better to make it configurable. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
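One way to make the hardcoded 10-minute timeout in `Parallelized.finished()` (quoted in HIVE-17072 above) configurable is to read it from a system property with the old value as the fallback. The property name here is an assumption for illustration, not necessarily the one the patch adds:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

public class ParallelizedTimeout {
    // Assumed property name, for illustration only.
    static final String PROP = "test.beeline.parallel.timeout.minutes";

    static long timeoutMinutes() {
        // Long.getLong reads the system property; falls back to the old hardcoded 10.
        return Long.getLong(PROP, 10L);
    }

    static void finished(ExecutorService executor) {
        executor.shutdown();
        try {
            executor.awaitTermination(timeoutMinutes(), TimeUnit.MINUTES);
        } catch (InterruptedException exc) {
            throw new RuntimeException(exc);
        }
    }
}
```

A slow test run could then be accommodated from the build command line, e.g. `mvn test -Dtest.beeline.parallel.timeout.minutes=30`, without touching the runner code.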
[jira] [Commented] (HIVE-16922) Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim"
[ https://issues.apache.org/jira/browse/HIVE-16922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084566#comment-16084566 ] Hive QA commented on HIVE-16922: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876876/HIVE-16922.2.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10874 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5986/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5986/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5986/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876876 - PreCommit-HIVE-Build > Typo in serde.thrift: COLLECTION_DELIM = "colelction.delim" > --- > > Key: HIVE-16922 > URL: https://issues.apache.org/jira/browse/HIVE-16922 > Project: Hive > Issue Type: Bug > Components: Thrift API >Reporter: Dudu Markovitz >Assignee: Bing Li > Attachments: HIVE-16922.1.patch, HIVE-16922.2.patch > > > https://github.com/apache/hive/blob/master/serde/if/serde.thrift > Typo in serde.thrift: > COLLECTION_DELIM = "colelction.delim" > (*colelction* instead of *collection*) -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084585#comment-16084585 ] Sergio Peña commented on HIVE-8838: --- Aaa, that's what happened haha, I tried to push it when I got an error that I had to update my local repo, and when I updated it, I saw the patch was already there, then I got confused. Anyway, no worries, thanks for the heads up. > Support Parquet through HCatalog > > > Key: HIVE-8838 > URL: https://issues.apache.org/jira/browse/HIVE-8838 > Project: Hive > Issue Type: New Feature >Reporter: Brock Noland >Assignee: Adam Szita > Fix For: 3.0.0 > > Attachments: HIVE-8838.0.patch, HIVE-8838.1.patch, HIVE-8838.2.patch, > HIVE-8838.3.patch, HIVE-8838.4.patch > > > Similar to HIVE-8687 for Avro we need to fix Parquet with HCatalog. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote
[ https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084614#comment-16084614 ] Pengcheng Xiong commented on HIVE-16907: That is exactly what I am worrying about : Hive may not well support table name with ".". Could u evaluate the work that we need to do if we want to support this? Thanks. > "INSERT INTO" overwrite old data when destination table encapsulated by > backquote > > > Key: HIVE-16907 > URL: https://issues.apache.org/jira/browse/HIVE-16907 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 1.1.0, 2.1.1 >Reporter: Nemon Lou >Assignee: Bing Li > Attachments: HIVE-16907.1.patch > > > A way to reproduce: > {noformat} > create database tdb; > use tdb; > create table t1(id int); > create table t2(id int); > explain insert into `tdb.t1` select * from t2; > {noformat} > {noformat} > +---+ > | > Explain | > +---+ > | STAGE DEPENDENCIES: > | > | Stage-1 is a root stage > | > | Stage-6 depends on stages: Stage-1 , consists of Stage-3, Stage-2, > Stage-4 | > | Stage-3 > | > | Stage-0 depends on stages: Stage-3, Stage-2, Stage-5 > | > | Stage-2 > | > | Stage-4 > | > | Stage-5 depends on stages: Stage-4 > | > | > | > | STAGE PLANS: > | > | Stage: Stage-1 > | > | Map Reduce > | > | Map Operator Tree: > | > | TableScan > | > | alias: t2 > | > | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE | > | Select Operator > | > | expressions: id (type: int) > | > | outputColumnNames: _col0 > | > | Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column > stats: NONE | > | File Output Operator > | > | compressed: false >
[jira] [Comment Edited] (HIVE-16907) "INSERT INTO" overwrite old data when destination table encapsulated by backquote
[ https://issues.apache.org/jira/browse/HIVE-16907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084614#comment-16084614 ] Pengcheng Xiong edited comment on HIVE-16907 at 7/12/17 8:18 PM: - That is exactly what I am worrying about : Hive may not well support table name with ".". Could u estimate the work that we need to do if we want to support this? Thanks. was (Author: pxiong): That is exactly what I am worrying about : Hive may not well support table name with ".". Could u evaluate the work that we need to do if we want to support this? Thanks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
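The ambiguity behind the overwrite is whether a backquoted `tdb.t1` names table t1 in database tdb, or a single table literally named "tdb.t1" in the current database. A minimal sketch of the two readings (Hive's actual parser logic differs; the helper functions are hypothetical):

```python
# Two plausible resolutions of a qualified name like `tdb.t1`.
# Illustrates the ambiguity discussed in the comments, nothing more.

def resolve_unquoted(name, current_db):
    """Unquoted db.table splits on the first dot."""
    if "." in name:
        db, table = name.split(".", 1)
        return (db, table)
    return (current_db, name)

def resolve_quoted_literal(name, current_db):
    """Treating the whole quoted string as one identifier keeps the dot
    inside the table name - the reading that causes the overwrite."""
    return (current_db, name)

print(resolve_unquoted("tdb.t1", "default"))        # -> ('tdb', 't1')
print(resolve_quoted_literal("tdb.t1", "default"))  # -> ('default', 'tdb.t1')
```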
[jira] [Commented] (HIVE-16732) Transactional tables should block LOAD DATA
[ https://issues.apache.org/jira/browse/HIVE-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084644#comment-16084644 ] Hive QA commented on HIVE-16732: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876901/HIVE-16732.03-branch-2.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10585 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[comments] (batchId=35) org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[explaindenpendencydiffengs] (batchId=38) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=142) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[orc_ppd_basic] (batchId=139) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[explaindenpendencydiffengs] (batchId=115) org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_ptf] (batchId=125) org.apache.hadoop.hive.ql.security.TestExtendedAcls.testPartition (batchId=228) org.apache.hadoop.hive.ql.security.TestFolderPermissions.testPartition (batchId=217) org.apache.hive.hcatalog.api.TestHCatClient.testTransportFailure (batchId=176) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5987/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5987/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5987/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876901 - PreCommit-HIVE-Build > Transactional tables should block LOAD DATA > > > Key: HIVE-16732 > URL: https://issues.apache.org/jira/browse/HIVE-16732 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-16732.01.patch, HIVE-16732.02.patch, > HIVE-16732.03-branch-2.patch, HIVE-16732.03.patch > > > This has always been the design. > see LoadSemanticAnalyzer.analyzeInternal() > StrictChecks.checkBucketing(conf); > Some examples (this is exposed by HIVE-16177) > insert_values_orig_table.q > insert_orig_table.q > insert_values_orig_table_use_metadata.q -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16732) Transactional tables should block LOAD DATA
[ https://issues.apache.org/jira/browse/HIVE-16732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16732: -- Resolution: Fixed Fix Version/s: 2.4.0 3.0.0 Status: Resolved (was: Patch Available) HIVE-16732.03-branch-2.patch committed to branch-2 (2.x) thanks Wei for the review
[jira] [Commented] (HIVE-8838) Support Parquet through HCatalog
[ https://issues.apache.org/jira/browse/HIVE-8838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084708#comment-16084708 ] Adam Szita commented on HIVE-8838: -- Thanks for reviewing [~spena], [~sushanth], [~aihuaxu] and committing!
[jira] [Updated] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant
[ https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-16793: --- Status: Patch Available (was: Open) > Scalar sub-query: sq_count_check not required if gby keys are constant > -- > > Key: HIVE-16793 > URL: https://issues.apache.org/jira/browse/HIVE-16793 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0 >Reporter: Gopal V >Assignee: Vineet Garg > Attachments: HIVE-16793.1.patch, HIVE-16793.2.patch, > HIVE-16793.3.patch, HIVE-16793.4.patch, HIVE-16793.5.patch > > > This query has an sq_count check, though is useless on a constant key. > {code} > hive> explain select * from part where p_size > (select max(p_size) from part > where p_type = '1' group by p_type); > Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product > Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product > OK > Plan optimized by CBO. > Vertex dependency in root stage > Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE) > Reducer 3 <- Map 2 (SIMPLE_EDGE) > Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE) > Reducer 6 <- Map 5 (SIMPLE_EDGE) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Map 1 vectorized, llap > File Output Operator [FS_64] > Select Operator [SEL_63] (rows= width=621) > > Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > Filter Operator [FIL_62] (rows= width=625) > predicate:(_col5 > _col10) > Map Join Operator [MAPJOIN_61] (rows=2 width=625) > > Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"] > <-Reducer 6 [BROADCAST_EDGE] vectorized, llap > BROADCAST [RS_58] > Select Operator [SEL_57] (rows=1 width=4) > Output:["_col0"] > Group By Operator [GBY_56] (rows=1 width=89) > > Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0 > <-Map 5 [SIMPLE_EDGE] vectorized, llap > SHUFFLE [RS_55] > PartitionCols:_col0 > 
Group By Operator [GBY_54] (rows=86 width=89) > > Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1' > Select Operator [SEL_53] (rows=1212121 width=109) > Output:["_col1"] > Filter Operator [FIL_52] (rows=1212121 width=109) > predicate:(p_type = '1') > TableScan [TS_17] (rows=2 width=109) > > tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"] > <-Map Join Operator [MAPJOIN_60] (rows=2 width=621) > > Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"] > <-Reducer 4 [BROADCAST_EDGE] vectorized, llap > BROADCAST [RS_51] > Select Operator [SEL_50] (rows=1 width=8) > Filter Operator [FIL_49] (rows=1 width=8) > predicate:(sq_count_check(_col0) <= 1) > Group By Operator [GBY_48] (rows=1 width=8) > Output:["_col0"],aggregations:["count(VALUE._col0)"] > <-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap > PARTITION_ONLY_SHUFFLE [RS_47] > Group By Operator [GBY_46] (rows=1 width=8) > Output:["_col0"],aggregations:["count()"] > Select Operator [SEL_45] (rows=1 width=85) > Group By Operator [GBY_44] (rows=1 width=85) > Output:["_col0"],keys:KEY._col0 > <-Map 2 [SIMPLE_EDGE] vectorized, llap > SHUFFLE [RS_43] > PartitionCols:_col0 > Group By Operator [GBY_42] (rows=83 > width=85) > Output:["_col0"],keys:'1' > Select Operator [SEL_41] (rows=1212121 > width=105) > Filter Operator [FIL_40] (rows=1212121 > width=105) > predicate:(p_type = '1') > TableScan [TS_2] (rows=2 > width=105) > > tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type"] >
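The reason the sq_count_check guard is unnecessary here is simple: grouping rows on a constant key can produce at most one group, so the scalar subquery can never return more than one row and the runtime count check has nothing to catch. A small sketch of that fact (illustrative only, not Hive's implementation):

```python
# Group rows by a key function and count the resulting groups.
# With a constant key (as after the p_type = '1' filter plus
# GROUP BY p_type), there is at most one group.
from collections import defaultdict

def group_count(rows, key_fn):
    groups = defaultdict(list)
    for row in rows:
        groups[key_fn(row)].append(row)
    return len(groups)

rows = [{"p_type": "1", "p_size": s} for s in (3, 7, 5)]
constant_key = lambda row: "1"  # GROUP BY on a literal
assert group_count(rows, constant_key) <= 1
print(group_count(rows, constant_key))  # -> 1
```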
[jira] [Updated] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant
[ https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-16793: --- Attachment: HIVE-16793.5.patch Latest patch adds a config param {{hive.optimize.remove.sq_count_check}} to enable this optimization. Since this optimization caters to a very specific case but could have adverse effects (join reordering, joins not merging) we have decided to disable this optimization by default
[jira] [Updated] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant
[ https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vineet Garg updated HIVE-16793: --- Status: Open (was: Patch Available)
[jira] [Commented] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant
[ https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084760#comment-16084760 ] Gopal V commented on HIVE-16793: Does enabling this optimization remove the cross-products triggered by the scalar sub-query?
[jira] [Commented] (HIVE-17079) LLAP: Use FQDN by default for work submission
[ https://issues.apache.org/jira/browse/HIVE-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084767#comment-16084767 ] Hive QA commented on HIVE-17079: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876908/HIVE-17079.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10874 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=145) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout (batchId=226) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5988/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5988/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5988/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12876908 - PreCommit-HIVE-Build > LLAP: Use FQDN by default for work submission > - > > Key: HIVE-17079 > URL: https://issues.apache.org/jira/browse/HIVE-17079 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-17079.1.patch > > > HIVE-14624 added FQDN for work submission. We should enable it by default to > avoid DNS issues. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17018) Small table is converted to map join even the total size of small tables exceeds the threshold(hive.auto.convert.join.noconditionaltask.size)
[ https://issues.apache.org/jira/browse/HIVE-17018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084774#comment-16084774 ] liyunzhang_intel commented on HIVE-17018: - [~csun]: {quote} Yes. I think we don't need to change the existing behavior. I'm just suggesting that we might need a HoS specific config to replace hive.auto.convert.join.noconditionaltask.size {quote} rename {{hive.auto.convert.join.noconditionaltask.size}} to {{hive.auto.convert.join.within.sparktask.size}}? and the description of the configuration {noformat} is changed from the sum of size for n-1 of the tables/partitions for a n-way join is smaller than it {noformat} to {noformat} the sum of size for n-1 of the tables/partitions for a n-way join is smaller than it in 1 MapTask or ReduceTask {noformat} Can you give some suggestions? > Small table is converted to map join even the total size of small tables > exceeds the threshold(hive.auto.convert.join.noconditionaltask.size) > - > > Key: HIVE-17018 > URL: https://issues.apache.org/jira/browse/HIVE-17018 > Project: Hive > Issue Type: Bug >Reporter: liyunzhang_intel >Assignee: liyunzhang_intel > Attachments: HIVE-17018_data_init.q, HIVE-17018.q, t3.txt > > > we use "hive.auto.convert.join.noconditionaltask.size" as the threshold. it > means the sum of size for n-1 of the tables/partitions for a n-way join is > smaller than it, it will be converted to a map join. for example, A join B > join C join D join E. Big table is A(100M), small tables are > B(10M),C(10M),D(10M),E(10M). If we set > hive.auto.convert.join.noconditionaltask.size=20M. In current code, E,D,B > will be converted to map join but C will not be converted to map join. In my > understanding, because hive.auto.convert.join.noconditionaltask.size can only > contain E and D, so C and B should not be converted to map join. > Let's explain more why E can be converted to map join. 
> in current code, > [SparkMapJoinOptimizer#getConnectedMapJoinSize|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L364] > calculates all the mapjoins in the parent path and child path. The search > stops when encountering [UnionOperator or > ReduceOperator|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L381]. > Because C is not converted to map join because {{connectedMapJoinSize + > totalSize) > maxSize}} [see > code|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#L330].The > RS before the join of C remains. When calculating whether B will be > converted to map join, {{getConnectedMapJoinSize}} returns 0 as encountering > [RS > |https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkMapJoinOptimizer.java#409] > and causes {{connectedMapJoinSize + totalSize) < maxSize}} matches. > [~xuefuz] or [~jxiang]: can you help see whether this is a bug or not as you > are more familiar with SparkJoinOptimizer. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
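The threshold semantics the reporter expects can be sketched as a running-budget check over the small tables: a table converts to a map join only while the cumulative size of already-admitted tables plus its own stays within the budget. This is an illustrative sketch, not SparkMapJoinOptimizer's actual code (sizes in MB, names hypothetical):

```python
# Sketch of hive.auto.convert.join.noconditionaltask.size semantics as
# described in the issue: admit small tables to the map join only while
# their cumulative size fits the budget. Illustrative only.

def admit_map_join_tables(small_tables, max_size):
    """Return the tables that fit the cumulative budget, in order."""
    admitted, running = [], 0
    for name, size in small_tables:
        if running + size <= max_size:
            admitted.append(name)
            running += size
    return admitted

# B, C, D, E are 10MB each; with a 20MB budget only two should convert,
# whereas the reported bug converts E, D, and B.
small = [("E", 10), ("D", 10), ("B", 10), ("C", 10)]
print(admit_map_join_tables(small, 20))  # -> ['E', 'D']
```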
[jira] [Commented] (HIVE-16793) Scalar sub-query: sq_count_check not required if gby keys are constant
[ https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084770#comment-16084770 ] Vineet Garg commented on HIVE-16793: It does if gby keys are constant
[jira] [Updated] (HIVE-16100) Dynamic Sorted Partition optimizer loses sibling operators
[ https://issues.apache.org/jira/browse/HIVE-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-16100: --- Attachment: HIVE-16100.5.patch > Dynamic Sorted Partition optimizer loses sibling operators > -- > > Key: HIVE-16100 > URL: https://issues.apache.org/jira/browse/HIVE-16100 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 1.2.1, 2.1.1, 2.2.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: HIVE-16100.1.patch, HIVE-16100.2.patch, > HIVE-16100.2.patch, HIVE-16100.3.patch, HIVE-16100.4.patch, HIVE-16100.5.patch > > > https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SortedDynPartitionOptimizer.java#L173 > {code} > // unlink connection between FS and its parent > fsParent = fsOp.getParentOperators().get(0); > fsParent.getChildOperators().clear(); > {code} > The optimizer discards any cases where the fsParent has another SEL child -- This message was sent by Atlassian JIRA (v6.4.14#64029)
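The bug above comes from `fsParent.getChildOperators().clear()` dropping every child of the parent, including siblings of the FileSink branch. The fix can be illustrated with a minimal stand-in for an operator DAG (the `Op` class and method names below are a hypothetical simplification for illustration, not Hive's actual `Operator` API):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in for an operator DAG node: tracks only its children.
class Op {
    final String name;
    final List<Op> children = new ArrayList<>();
    Op(String name) { this.name = name; }
}

public class UnlinkDemo {
    // Buggy variant, mirroring the snippet above: clears ALL children of the
    // parent, silently discarding sibling branches such as a second SEL.
    static void unlinkAll(Op fsParent) {
        fsParent.children.clear();
    }

    // Sibling-preserving variant: detach only the FileSink branch being moved.
    static void unlinkOne(Op fsParent, Op fsOp) {
        fsParent.children.remove(fsOp);
    }

    public static void main(String[] args) {
        Op fsParent = new Op("SEL");     // parent of the FileSink
        Op fsOp = new Op("FS");          // FileSink branch being re-planned
        Op sibling = new Op("SEL_2");    // sibling that must survive
        fsParent.children.add(fsOp);
        fsParent.children.add(sibling);

        unlinkOne(fsParent, fsOp);
        System.out.println(fsParent.children.size());        // sibling kept
        System.out.println(fsParent.children.get(0).name);
    }
}
```

With `unlinkAll`, the sibling `SEL_2` would vanish from the plan; `unlinkOne` leaves it attached.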
[jira] [Commented] (HIVE-16926) LlapTaskUmbilicalExternalClient should not start new umbilical server for every fragment request
[ https://issues.apache.org/jira/browse/HIVE-16926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084817#comment-16084817 ] Siddharth Seth commented on HIVE-16926: --- bq. Is there any action needed on this part? I don't think there is, unless you see this as a problem for the running spark task. The number of threads created etc is quite small afaik. bq. Maybe I can just replace pendingClients/registeredClients with a single list and the RequestInfo can keep a state to show if the request is pending/running/etc. That'll work as well. Think there's still 2 places which have similar code related to heartbeats - heartbeat / nodePinged. > LlapTaskUmbilicalExternalClient should not start new umbilical server for > every fragment request > > > Key: HIVE-16926 > URL: https://issues.apache.org/jira/browse/HIVE-16926 > Project: Hive > Issue Type: Sub-task > Components: llap >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-16926.1.patch, HIVE-16926.2.patch, > HIVE-16926.3.patch, HIVE-16926.4.patch > > > Followup task from [~sseth] and [~sershe] after HIVE-16777. > LlapTaskUmbilicalExternalClient currently creates a new umbilical server for > every fragment request, but this is not necessary and the umbilical can be > shared. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17079) LLAP: Use FQDN by default for work submission
[ https://issues.apache.org/jira/browse/HIVE-17079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-17079: - Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Committed to master. Thanks for the review! > LLAP: Use FQDN by default for work submission > - > > Key: HIVE-17079 > URL: https://issues.apache.org/jira/browse/HIVE-17079 > Project: Hive > Issue Type: Bug > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 3.0.0 > > Attachments: HIVE-17079.1.patch > > > HIVE-14624 added FDQN for work submission. We should enable it by default to > avoid DNS issues. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table
[ https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084858#comment-16084858 ] Hive QA commented on HIVE-16832: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12876914/HIVE-16832.22.patch {color:green}SUCCESS:{color} +1 due to 12 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10888 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] (batchId=237) org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite] (batchId=237) org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver[hbase_queries] (batchId=94) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] (batchId=143) org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] (batchId=99) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] (batchId=232) org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] (batchId=232) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema (batchId=177) org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation (batchId=177) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/5989/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/5989/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-5989/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 10 tests failed 
{noformat} This message is automatically generated. ATTACHMENT ID: 12876914 - PreCommit-HIVE-Build > duplicate ROW__ID possible in multi insert into transactional table > --- > > Key: HIVE-16832 > URL: https://issues.apache.org/jira/browse/HIVE-16832 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, > HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, > HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch, > HIVE-16832.11.patch, HIVE-16832.14.patch, HIVE-16832.15.patch, > HIVE-16832.16.patch, HIVE-16832.17.patch, HIVE-16832.18.patch, > HIVE-16832.19.patch, HIVE-16832.20.patch, HIVE-16832.20.patch, > HIVE-16832.21.patch, HIVE-16832.22.patch > > > {noformat} > create table AcidTablePart(a int, b int) partitioned by (p string) clustered > by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true'); > create temporary table if not exists data1 (x int); > insert into data1 values (1); > from data1 >insert into AcidTablePart partition(p) select 0, 0, 'p' || x >insert into AcidTablePart partition(p='p1') select 0, 1 > {noformat} > Each branch of this multi-insert creates a row in partition p1/bucket0 with > ROW__ID=(1,0,0). > The same can happen when running a SQL Merge (HIVE-10924) statement that has > both Insert and Update clauses when the target table has > _'transactional'='true','transactional_properties'='default'_ (see > HIVE-14035). This is so because Merge is internally run as a multi-insert > statement. > The solution relies on the statement ID introduced in HIVE-11030. Each Insert > clause of a multi-insert gets a unique ID. > The ROW__ID.bucketId now becomes a bit-packed triplet (format version, > bucketId, statementId). > (Since ORC stores field names in the data file we can't rename > ROW__ID.bucketId).
> This ensures that there are no collisions and retains desired sort properties > of ROW__ID. > In particular _SortedDynPartitionOptimizer_ works w/o any changes even in > cases where there are fewer reducers than buckets. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
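The bit-packed ROW__ID.bucketId triplet described above can be sketched in Java. The field widths below are illustrative assumptions for the sake of the example; Hive's actual codec may allocate bits differently:

```java
// Illustrative packing of (format version, bucketId, statementId) into one
// int-valued ROW__ID.bucketId field. Widths are hypothetical, not Hive's
// actual layout; the point is that two insert clauses writing to the same
// bucket get distinct packed values because their statement IDs differ.
public final class BucketTriplet {
    static final int VERSION_BITS = 3;   // format version
    static final int BUCKET_BITS = 12;   // bucket number
    static final int STMT_BITS = 12;     // statement id within a multi-insert

    static int encode(int version, int bucketId, int stmtId) {
        return (version << (BUCKET_BITS + STMT_BITS))
             | (bucketId << STMT_BITS)
             | stmtId;
    }

    static int version(int packed) {
        return (packed >>> (BUCKET_BITS + STMT_BITS)) & ((1 << VERSION_BITS) - 1);
    }
    static int bucketId(int packed) {
        return (packed >>> STMT_BITS) & ((1 << BUCKET_BITS) - 1);
    }
    static int stmtId(int packed) {
        return packed & ((1 << STMT_BITS) - 1);
    }

    public static void main(String[] args) {
        // Two branches of a multi-insert, same bucket 0, different statements:
        System.out.println(encode(1, 0, 0) == encode(1, 0, 1)); // no collision
    }
}
```

Packing the statement ID into the low bits also preserves the existing sort order on (version, bucketId) for rows from the same statement, which is why downstream consumers of ROW__ID keep working unchanged.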
[jira] [Commented] (HIVE-16979) Cache UGI for metastore
[ https://issues.apache.org/jira/browse/HIVE-16979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084860#comment-16084860 ] Tao Li commented on HIVE-16979: --- [~gopalv] Can you please take a look at the patch? Thanks! > Cache UGI for metastore > --- > > Key: HIVE-16979 > URL: https://issues.apache.org/jira/browse/HIVE-16979 > Project: Hive > Issue Type: Improvement >Reporter: Tao Li >Assignee: Tao Li > Attachments: HIVE-16979.1.patch, HIVE-16979.2.patch, > HIVE-16979.3.patch > > > FileSystem.closeAllForUGI is called per request against the metastore to dispose > of the UGI, which involves talking to the HDFS name node and is time consuming. So the > perf improvement would be caching and reusing the UGI. > Each FileSystem.closeAllForUGI call could take up to 20 ms of E2E latency > against HDFS. Usually a Hive query results in several calls against the > metastore, so we can save up to 50-100 ms per Hive query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
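The caching idea above amounts to creating at most one UGI per user and reusing it across requests, rather than creating and disposing one per metastore call. A minimal sketch, where `Ugi` is a hypothetical stand-in for Hadoop's `UserGroupInformation` (the expensive-to-tear-down object):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of per-user UGI caching. `Ugi` stands in for Hadoop's
// UserGroupInformation; the real patch would also need an eviction /
// invalidation story (e.g. on credential change), which is omitted here.
public class UgiCache {
    static final class Ugi {
        final String user;
        Ugi(String user) { this.user = user; }
    }

    // Counts how many UGIs were actually constructed, to show reuse.
    static final AtomicInteger creations = new AtomicInteger();
    private static final ConcurrentMap<String, Ugi> CACHE = new ConcurrentHashMap<>();

    static Ugi forUser(String user) {
        // computeIfAbsent constructs the UGI at most once per user,
        // even under concurrent metastore requests.
        return CACHE.computeIfAbsent(user, u -> {
            creations.incrementAndGet();
            return new Ugi(u);
        });
    }
}
```

Repeated requests for the same user then return the identical cached instance, so the ~20 ms `closeAllForUGI` teardown is paid zero times per request instead of once.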
[jira] [Updated] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table
[ https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16832: -- Resolution: Fixed Fix Version/s: 3.0.0 Target Version/s: 3.0.0 (was: 3.0.0, 2.4.0) Status: Resolved (was: Patch Available) > duplicate ROW__ID possible in multi insert into transactional table > --- > > Key: HIVE-16832 > URL: https://issues.apache.org/jira/browse/HIVE-16832 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Fix For: 3.0.0 > > Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, > HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, > HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch, > HIVE-16832.11.patch, HIVE-16832.14.patch, HIVE-16832.15.patch, > HIVE-16832.16.patch, HIVE-16832.17.patch, HIVE-16832.18.patch, > HIVE-16832.19.patch, HIVE-16832.20.patch, HIVE-16832.20.patch, > HIVE-16832.21.patch, HIVE-16832.22.patch > > > {noformat} > create table AcidTablePart(a int, b int) partitioned by (p string) clustered > by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true'); > create temporary table if not exists data1 (x int); > insert into data1 values (1); > from data1 >insert into AcidTablePart partition(p) select 0, 0, 'p' || x >insert into AcidTablePart partition(p='p1') select 0, 1 > {noformat} > Each branch of this multi-insert creates a row in partition p1/bucket0 with > ROW__ID=(1,0,0). > The same can happen when running a SQL Merge (HIVE-10924) statement that has > both Insert and Update clauses when the target table has > _'transactional'='true','transactional_properties'='default'_ (see > HIVE-14035). This is so because Merge is internally run as a multi-insert > statement. > The solution relies on the statement ID introduced in HIVE-11030. Each Insert > clause of a multi-insert gets a unique ID.
> The ROW__ID.bucketId now becomes a bit-packed triplet (format version, > bucketId, statementId). > (Since ORC stores field names in the data file we can't rename > ROW__ID.bucketId). > This ensures that there are no collisions and retains desired sort properties > of ROW__ID. > In particular _SortedDynPartitionOptimizer_ works w/o any changes even in > cases where there are fewer reducers than buckets. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-14947) Add support for Acid 2 in Merge
[ https://issues.apache.org/jira/browse/HIVE-14947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084884#comment-16084884 ] Eugene Koifman commented on HIVE-14947: --- fixed via HIVE-16832 > Add support for Acid 2 in Merge > --- > > Key: HIVE-14947 > URL: https://issues.apache.org/jira/browse/HIVE-14947 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Fix For: 3.0.0 > > > HIVE-14035 etc introduced a more efficient data layout for acid tables > Additional work is needed to support Merge for these tables > Need to make sure we generate unique ROW__IDs in each branch of the > multi-insert statement. StatementId was introduced in HIVE-11030 but it's > not surfaced from storage layer. It needs to be made part of ROW__ID to > ensure unique ROW__ID from concurrent writes from the same query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (HIVE-14947) Add support for Acid 2 in Merge
[ https://issues.apache.org/jira/browse/HIVE-14947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman resolved HIVE-14947. --- Resolution: Fixed Fix Version/s: 3.0.0 > Add support for Acid 2 in Merge > --- > > Key: HIVE-14947 > URL: https://issues.apache.org/jira/browse/HIVE-14947 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Fix For: 3.0.0 > > > HIVE-14035 etc introduced a more efficient data layout for acid tables > Additional work is needed to support Merge for these tables > Need to make sure we generate unique ROW__IDs in each branch of the > multi-insert statement. StatementId was introduced in HIVE-11030 but it's > not surfaced from storage layer. It needs to be made part of ROW__ID to > ensure unique ROW__ID from concurrent writes from the same query. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-16953) OrcRawRecordMerger.discoverOriginalKeyBounds issue if both split start and end are in the same stripe
[ https://issues.apache.org/jira/browse/HIVE-16953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-16953: -- Summary: OrcRawRecordMerger.discoverOriginalKeyBounds issue if both split start and end are in the same stripe (was: OrcRawRecordMerger.discoverOriginalKeyBounds) > OrcRawRecordMerger.discoverOriginalKeyBounds issue if both split start and > end are in the same stripe > - > > Key: HIVE-16953 > URL: https://issues.apache.org/jira/browse/HIVE-16953 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman > > if getOffset() and getMaxOffset() are inside > * the same stripe - in this case we have minKey & isTail=false but > rowLength is never set. > Don't know if we can ever have a split like that. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HIVE-17078) Add more logs to MapredLocalTask
[ https://issues.apache.org/jira/browse/HIVE-17078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084891#comment-16084891 ] Yibing Shi commented on HIVE-17078: --- I am trying to keep the current behaviour. With Hive CLI, by default Hive logs are not printed, and some users may rely on the stdout/stderr information; I don't want to surprise them. If you still think it is unnecessary to print the child's stdout/stderr to Hive's stdout/stderr, I can remove the corresponding code. > Add more logs to MapredLocalTask > > > Key: HIVE-17078 > URL: https://issues.apache.org/jira/browse/HIVE-17078 > Project: Hive > Issue Type: Improvement >Reporter: Yibing Shi >Assignee: Yibing Shi >Priority: Minor > Attachments: HIVE-17078.1.patch, HIVE-17078.2.patch > > > By default, {{MapredLocalTask}} is executed in a child process of Hive, in > case the local task uses too many resources, which could affect Hive. Currently, > the stdout and stderr output of the child process is printed to Hive's > stdout/stderr log, which carries no timestamp information and is > separated from the Hive service logs. This makes it hard to troubleshoot problems > in MapredLocalTasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
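The improvement being discussed, capturing a child process's output and re-emitting it with timestamps instead of letting it interleave untimestamped on the parent's raw stdout, can be sketched as follows. This is a generic illustration using `ProcessBuilder`, not the actual MapredLocalTask code, and the `echo` command stands in for the child JVM launch:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.time.LocalDateTime;

// Sketch: pump a child process's merged stdout/stderr into the parent's log,
// prefixing each line with a timestamp the way a log appender would.
public class ChildLogDemo {
    // Reads the child's output line by line; returns the number of lines pumped.
    static int pump(Process child) throws IOException {
        int lines = 0;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(child.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                System.out.println(LocalDateTime.now() + " [child] " + line);
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        Process p = new ProcessBuilder("echo", "hello from child")
                .redirectErrorStream(true)   // merge stderr into stdout
                .start();
        pump(p);
        p.waitFor();
    }
}
```

In a real service the pump loop would run on its own thread per stream and write through the logging framework, so the child's output lands in the service log with timestamps rather than in a separate, unstamped stdout file.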
[jira] [Commented] (HIVE-16832) duplicate ROW__ID possible in multi insert into transactional table
[ https://issues.apache.org/jira/browse/HIVE-16832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16084878#comment-16084878 ] Eugene Koifman commented on HIVE-16832: --- No related failures (see builds 5985 and 5984 for the same failures). HIVE-16832.22.patch committed to master (3.0). Thanks Gopal for the review. > duplicate ROW__ID possible in multi insert into transactional table > --- > > Key: HIVE-16832 > URL: https://issues.apache.org/jira/browse/HIVE-16832 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.2.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-16832.01.patch, HIVE-16832.03.patch, > HIVE-16832.04.patch, HIVE-16832.05.patch, HIVE-16832.06.patch, > HIVE-16832.08.patch, HIVE-16832.09.patch, HIVE-16832.10.patch, > HIVE-16832.11.patch, HIVE-16832.14.patch, HIVE-16832.15.patch, > HIVE-16832.16.patch, HIVE-16832.17.patch, HIVE-16832.18.patch, > HIVE-16832.19.patch, HIVE-16832.20.patch, HIVE-16832.20.patch, > HIVE-16832.21.patch, HIVE-16832.22.patch > > > {noformat} > create table AcidTablePart(a int, b int) partitioned by (p string) clustered > by (a) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true'); > create temporary table if not exists data1 (x int); > insert into data1 values (1); > from data1 >insert into AcidTablePart partition(p) select 0, 0, 'p' || x >insert into AcidTablePart partition(p='p1') select 0, 1 > {noformat} > Each branch of this multi-insert creates a row in partition p1/bucket0 with > ROW__ID=(1,0,0). > The same can happen when running a SQL Merge (HIVE-10924) statement that has > both Insert and Update clauses when the target table has > _'transactional'='true','transactional_properties'='default'_ (see > HIVE-14035). This is so because Merge is internally run as a multi-insert > statement. > The solution relies on the statement ID introduced in HIVE-11030. Each Insert > clause of a multi-insert gets a unique ID.
> The ROW__ID.bucketId now becomes a bit-packed triplet (format version, > bucketId, statementId). > (Since ORC stores field names in the data file we can't rename > ROW__ID.bucketId). > This ensures that there are no collisions and retains desired sort properties > of ROW__ID. > In particular _SortedDynPartitionOptimizer_ works w/o any changes even in > cases where there are fewer reducers than buckets. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (HIVE-17013) Delete request with a subquery based on select over a view
[ https://issues.apache.org/jira/browse/HIVE-17013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-17013: -- Component/s: Transactions > Delete request with a subquery based on select over a view > -- > > Key: HIVE-17013 > URL: https://issues.apache.org/jira/browse/HIVE-17013 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Frédéric ESCANDELL >Priority: Blocker > > Hi, > I based my DDL on this example: > https://fr.hortonworks.com/tutorial/using-hive-acid-transactions-to-insert-update-and-delete-data/. > In a delete request, the use of a view in a subquery throws an exception: > FAILED: IllegalStateException Expected 'insert into table default.mydim > select ROW__ID from default.mydim sort by ROW__ID' to be in sub-query or set > operation. > {code:sql} > drop table if exists mydim; > create table mydim (key int, name string, zip string, is_current boolean) > clustered by(key) into 3 buckets > stored as orc tblproperties ('transactional'='true'); > insert into mydim values > (1, 'bob', '95136', true), > (2, 'joe', '70068', true), > (3, 'steve', '22150', true); > drop table if exists updates_staging_table; > create table updates_staging_table (key int, newzip string); > insert into updates_staging_table values (1, 87102), (3, 45220); > drop view if exists updates_staging_view; > create view updates_staging_view (key, newzip) as select key, newzip from > updates_staging_table; > delete from mydim > where mydim.key in (select key from updates_staging_view); > FAILED: IllegalStateException Expected 'insert into table default.mydim > select ROW__ID from default.mydim sort by ROW__ID' to be in sub-query or set > operation. > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)