[jira] [Commented] (HIVE-12274) Increase width of columns used for general configuration in the metastore.
[ https://issues.apache.org/jira/browse/HIVE-12274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098471#comment-15098471 ] Carter Shanklin commented on HIVE-12274: [~teabot] can you post a sample DDL? What serde are you using to read? Is it Hive-JSON-Serde? > Increase width of columns used for general configuration in the metastore. > -- > > Key: HIVE-12274 > URL: https://issues.apache.org/jira/browse/HIVE-12274 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 2.0.0 >Reporter: Elliot West >Assignee: Sushanth Sowmyan > Labels: metastore > > This issue is very similar in principle to HIVE-1364. We are hitting a limit > when processing JSON data that has a large nested schema. The struct > definition is truncated when inserted into the metastore database column > {{COLUMNS_V2.TYPE_NAME}} as it is greater than 4000 characters in length. > Given that the purpose of these columns is to hold very loosely defined > configuration values it seems rather limiting to impose such a relatively low > length bound. One can imagine that valid use cases will arise where > reasonable parameter/property values exceed the current limit. Can these > columns not use CLOB-like types as, for example, used by > {{TBLS.VIEW_EXPANDED_TEXT}}? It would seem that suitable type equivalents > exist for all targeted database platforms: > * MySQL: {{mediumtext}} > * Postgres: {{text}} > * Oracle: {{CLOB}} > * Derby: {{LONG VARCHAR}} > I'd suggest that the candidates for type change are: > * {{COLUMNS_V2.TYPE_NAME}} > * {{TABLE_PARAMS.PARAM_VALUE}} > * {{SERDE_PARAMS.PARAM_VALUE}} > * {{SD_PARAMS.PARAM_VALUE}} > Finally, will this limitation persist in the work resulting from HIVE-9452? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12847) ORC file footer cache should be memory sensitive
[ https://issues.apache.org/jira/browse/HIVE-12847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097852#comment-15097852 ] Hive QA commented on HIVE-12847: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12782173/HIVE-12847.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10016 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6620/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6620/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6620/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12782173 - PreCommit-HIVE-TRUNK-Build > ORC file footer cache should be memory sensitive > > > Key: HIVE-12847 > URL: https://issues.apache.org/jira/browse/HIVE-12847 > Project: Hive > Issue Type: Improvement > Components: File Formats, ORC >Affects Versions: 1.2.1 >Reporter: Nemon Lou >Assignee: Nemon Lou > Attachments: HIVE-12847.patch > > > The size based footer cache can not control memory usage properly. > Having seen a HiveServer2 hang due to ORC file footer cache taking up too > much heap memory. > A simple query like "select * from orc_table limit 1" can make HiveServer2 > hang. > The input table has about 1000 ORC files and each ORC file owns about 2500 > stripes. > {noformat} > num #instances #bytes class name > -- >1: 21465360125758432120 > org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics >3: 122233301 8800797672 > org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics >5: 89439001 6439608072 > org.apache.hadoop.hive.ql.io.orc.OrcProto$IntegerStatistics >7: 2981300 262354400 > org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeInformation >9: 2981300 143102400 > org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics > 12: 2983691 71608584 > org.apache.hadoop.hive.ql.io.orc.ReaderImpl$StripeInformationImpl > 15: 809297121752 > org.apache.hadoop.hive.ql.io.orc.OrcProto$Type > 17:1032825783792 > org.apache.hadoop.mapreduce.lib.input.FileSplit > 20: 516413305024 > org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit > 21: 516413305024 org.apache.hadoop.hive.ql.io.orc.OrcSplit > 31: 1 413152 > [Lorg.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit; > 100: 1122 26928 org.apache.hadoop.hive.ql.io.orc.Metadata > > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
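The issue above argues that a cache bounded by entry count cannot protect the heap when individual ORC footers are huge. One common remedy is to bound the cache by estimated bytes instead, evicting in LRU order. The sketch below is a minimal, hypothetical illustration of that idea (the `ByteBoundedCache` class and its names are ours, not Hive's actual implementation):

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a memory-sensitive (byte-weighted) cache, as an
// alternative to the entry-count-based footer cache discussed above.
// Entries are evicted in least-recently-used order until the total
// estimated byte size fits the budget, so a few huge footers cannot
// exhaust the heap.
public class ByteBoundedCache<K, V> {
    public interface Weigher<V> { long weigh(V value); }

    private final long maxBytes;
    private final Weigher<V> weigher;
    private long currentBytes = 0;
    // access-order = true makes iteration order least-recently-used first.
    private final LinkedHashMap<K, V> map = new LinkedHashMap<>(16, 0.75f, true);

    public ByteBoundedCache(long maxBytes, Weigher<V> weigher) {
        this.maxBytes = maxBytes;
        this.weigher = weigher;
    }

    public synchronized void put(K key, V value) {
        V old = map.remove(key);
        if (old != null) currentBytes -= weigher.weigh(old);
        map.put(key, value);
        currentBytes += weigher.weigh(value);
        // Evict LRU entries until the byte budget is satisfied,
        // never evicting the entry just inserted.
        Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
        while (currentBytes > maxBytes && it.hasNext()) {
            Map.Entry<K, V> eldest = it.next();
            if (eldest.getKey().equals(key)) continue;
            currentBytes -= weigher.weigh(eldest.getValue());
            it.remove();
        }
    }

    public synchronized V get(K key) { return map.get(key); }
    public synchronized int size() { return map.size(); }
}
```

With a weigher that returns each footer's serialized size, the 20+ GB of `ColumnStatistics` objects in the histogram above would be capped at a configured byte budget rather than a file count.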
[jira] [Updated] (HIVE-12864) StackOverflowError parsing queries with very large predicates
[ https://issues.apache.org/jira/browse/HIVE-12864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-12864: --- Attachment: HIVE-12864.01.patch > StackOverflowError parsing queries with very large predicates > - > > Key: HIVE-12864 > URL: https://issues.apache.org/jira/browse/HIVE-12864 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 2.0.0, 2.1.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-12864.01.patch, HIVE-12864.patch > > > We have seen that queries with very large predicates might fail with the > following stacktrace: > {noformat} > 016-01-12 05:47:36,516|beaver.machine|INFO|552|5072|Thread-22|Exception in > thread "main" java.lang.StackOverflowError > 2016-01-12 05:47:36,517|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:145) > 2016-01-12 05:47:36,517|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,517|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,517|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,517|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) 
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 
05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,520|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,520|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,520|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,520|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12
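The repeated `setUnknownTokenBoundaries` frames above come from ANTLR recursing down a parse tree that is as deep as the predicate is long. A generic remedy for this class of failure (a sketch of the technique only, not the attached HIVE-12864 patch) is to walk the tree with an explicit heap-allocated stack instead of the call stack:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch of replacing deep recursion over a parse tree with an explicit
// stack, the usual remedy for StackOverflowError on degenerate (very deep)
// trees such as long predicate chains. Node is a stand-in for ANTLR's
// CommonTree; this is not Hive's actual fix.
public class IterativeTraversal {
    static class Node {
        final List<Node> children = new ArrayList<>();
    }

    // Visits every node without consuming a Java call-stack frame per tree
    // level, so depth is bounded by heap rather than by -Xss.
    static int countNodes(Node root) {
        int count = 0;
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            count++;
            for (Node child : n.children) {
                stack.push(child);
            }
        }
        return count;
    }

    // Builds a degenerate chain like the tree produced by a predicate with
    // very many ANDed clauses.
    static Node deepChain(int depth) {
        Node root = new Node();
        Node cur = root;
        for (int i = 0; i < depth; i++) {
            Node child = new Node();
            cur.children.add(child);
            cur = child;
        }
        return root;
    }
}
```

A recursive visit of a chain this deep would overflow a default-sized thread stack; the iterative version completes regardless of depth.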
[jira] [Updated] (HIVE-12874) dynamic partition insert project wrong column
[ https://issues.apache.org/jira/browse/HIVE-12874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] bin wang updated HIVE-12874: Affects Version/s: (was: 0.14.1) 1.1.0 > dynamic partition insert project wrong column > - > > Key: HIVE-12874 > URL: https://issues.apache.org/jira/browse/HIVE-12874 > Project: Hive > Issue Type: Bug > Components: SQL >Affects Versions: 1.1.0 > Environment: hive 1.1.0-cdh5.4.8 >Reporter: bin wang >Assignee: Alan Gates > > We have two tables as below: > create table test ( > id bigint comment 'id' > ) > PARTITIONED BY(etl_dt string) > STORED AS ORC; > create table test1 ( > id bigint, > start_time int > ) > PARTITIONED BY(etl_dt string) > STORED AS ORC; > we use SQL like below to import rows from test1 to test: > insert overwrite table test PARTITION(etl_dt) > select id > ,from_unixtime(start_time,'yyyy-MM-dd') as etl_dt > from test1 > where test1.etl_dt='2016-01-12'; > but it behaves incorrectly: it uses test1.etl_dt as test's partition value, not > the 'etl_dt' in the select. > We think it's a bug; could anyone fix it? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12810) Hive select fails - java.lang.IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/HIVE-12810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097909#comment-15097909 ] Matjaz Skerjanec commented on HIVE-12810: - Hello, Thank you. Yes, I tried that and many other possible options yesterday with no success. Since my HDFS is still empty I decided to reinstall everything - I have to get the system up and running asap. Will go for the latest update now, HDP 2.3.4.0, with the hope that the problem with select will be solved. Will come back later with results... > Hive select fails - java.lang.IndexOutOfBoundsException > --- > > Key: HIVE-12810 > URL: https://issues.apache.org/jira/browse/HIVE-12810 > Project: Hive > Issue Type: Bug > Components: Beeline, CLI >Affects Versions: 1.2.1 > Environment: HDP 2.3.0 >Reporter: Matjaz Skerjanec > > Hadoop HDP 2.3 (Hadoop 2.7.1.2.3.0.0-2557) > Hive 1.2.1.2.3.0.0-2557 > We are loading ORC tables in Hive with Sqoop from a HANA DB. > Everything works fine, count and select with e.g. 16.000.000 entries in the > table, but when we load 34.000.000 entries the select query no longer works > and we get the following error (select count(*) works in both cases): > {code} > select count(*) from tablename; > INFO : Session is already open > INFO : > INFO : Status: Running (Executing on YARN cluster with App id > application_1452091205505_0032) > INFO : Map 1: -/- Reducer 2: 0/1 > INFO : Map 1: 0/96 Reducer 2: 0/1 > . > . > . > INFO : Map 1: 96/96 Reducer 2: 0(+1)/1 > INFO : Map 1: 96/96 Reducer 2: 1/1 > +---+--+ > |_c0| > +---+--+ > | 34146816 | > +---+--+ > 1 row selected (45.455 seconds) > {code} > {code} > "select originalxml from tablename where messageid = > 'd0b3c872-435d-499b-a65c-619d9e732bbb' > 0: jdbc:hive2://10.4.zz.xx:1/default> select originalxml from tablename > where messageid = 'd0b3c872-435d-499b-a65c-619d9e732bbb'; > INFO : Session is already open > INFO : Tez session was closed. Reopening... > INFO : Session re-established. 
> INFO : > INFO : Status: Running (Executing on YARN cluster with App id > application_1452091205505_0032) > INFO : Map 1: -/- > ERROR : Status: Failed > ERROR : Vertex failed, vertexName=Map 1, > vertexId=vertex_1452091205505_0032_1_00, diagnostics=[Vertex > vertex_1452091205505_0032_1_00 [Map 1] killed/failed due > to:ROOT_INPUT_INIT_FAILURE, Vertex Input: tablename initializer failed, > vertex=vertex_1452091205505_0032_1_00 [Map 1], java.lang.RuntimeException: > serious problem > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408) > at > org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239) > at > org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: 
java.util.concurrent.ExecutionException: > java.lang.IndexOutOfBoundsException: Index: 0 > at java.util.concurrent.FutureTask.report(FutureTask.java:122) > at java.util.concurrent.FutureTask.get(FutureTask.java:192) > at > org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1016) > ... 15 more > Caused by: java.lang.IndexOutOfBoundsException: Index: 0 > at java.util.Collections$EmptyList.get(Collections.java:4454) > at > org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSubtypes(OrcProto.java:12240) > at >
[jira] [Commented] (HIVE-12808) Logical PPD: Push filter clauses through PTF(Windowing) into TS
[ https://issues.apache.org/jira/browse/HIVE-12808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098017#comment-15098017 ] Hive QA commented on HIVE-12808: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12782179/HIVE-12808.01.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 10005 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file TestMiniTezCliDriver-tez_joins_explain.q-vector_decimal_aggregate.q-vector_groupby_mapjoin.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_windowing1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_windowing2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptfgroupbyjoin org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqualcolumnrefs org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_subquery_in org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query70 org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_in org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse org.apache.hive.jdbc.TestSSL.testSSLVersion org.apache.hive.spark.client.rpc.TestRpc.testClientTimeout {noformat} Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6621/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6621/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6621/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 18 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12782179 - PreCommit-HIVE-TRUNK-Build > Logical PPD: Push filter clauses through PTF(Windowing) into TS > --- > > Key: HIVE-12808 > URL: https://issues.apache.org/jira/browse/HIVE-12808 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Affects Versions: 1.2.1, 2.0.0 >Reporter: Gopal V >Assignee: Laljo John Pullokkaran > Attachments: HIVE-12808.01.patch > > > Simplified repro case of [HCC > #8880|https://community.hortonworks.com/questions/8880/hive-on-tez-pushdown-predicate-doesnt-work-in-part.html], > with the slow query showing the push-down miss. > And the manually rewritten query to indicate the expected one. > Part of the problem could be the window range not being split apart for PPD, > but the FIL is not pushed down even if the rownum filter is removed. 
> {code} > create temporary table positions (regionid string, id bigint, deviceid > string, ts string); > insert into positions values('1d6a0be1-6366-4692-9597-ebd5cd0f01d1', > 1422792010, '6c5d1a30-2331-448b-a726-a380d6b3a432', '2016-01-01'), > ('1d6a0be1-6366-4692-9597-ebd5cd0f01d1', 1422792010, > '6c5d1a30-2331-448b-a726-a380d6b3a432', '2016-01-01'), > ('1d6a0be1-6366-4692-9597-ebd5cd0f01d1', 1422792010, > '6c5d1a30-2331-448b-a726-a380d6b3a432', '2016-01-02'), > ('1d6a0be1-6366-4692-9597-ebd5cd0f01d1', 1422792010, > '6c5d1a30-2331-448b-a726-a380d6b3a432', '2016-01-02'); > -- slow query > explain > WITH t1 AS > ( > SELECT *, > Row_number() over ( PARTITION BY regionid, id, deviceid > ORDER BY ts DESC) AS rownos > FROM positions ), > latestposition as ( > SELECT * > FROM t1 > WHERE rownos = 1) > SELECT * > FROM latestposition > WHERE regionid='1d6a0be1-6366-4692-9597-ebd5cd0f01d1' > AND id=1422792010 > AND deviceid='6c5d1a30-2331-448b-a726-a380d6b3a432'; > -- fast query > explain > WITH t1 AS > ( > SELECT *, > Row_number() over ( PARTITION BY regionid, id, deviceid > ORDER BY ts DESC) AS rownos > FROM positions > WHERE regionid='1d6a0be1-6366-4692-9597-ebd5cd0f01d1' > AND id=1422792010 > AND
[jira] [Commented] (HIVE-12853) LLAP: localize permanent UDF jars to daemon
[ https://issues.apache.org/jira/browse/HIVE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098345#comment-15098345 ] Hive QA commented on HIVE-12853: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12782196/HIVE-12853.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10018 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_jar_pfile org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_remote_script org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_root_dir_external_table org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_remote_script org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_udf_using org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6623/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6623/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6623/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing 
org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 12 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12782196 - PreCommit-HIVE-TRUNK-Build > LLAP: localize permanent UDF jars to daemon > --- > > Key: HIVE-12853 > URL: https://issues.apache.org/jira/browse/HIVE-12853 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-12853.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12828) Update Spark version to 1.6
[ https://issues.apache.org/jira/browse/HIVE-12828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-12828: --- Attachment: HIVE-12828.2-spark.patch Hi [~lirui], the file is updated and I verified that the test passed. Reattached patch #2 to give it another run. Thanks. > Update Spark version to 1.6 > --- > > Key: HIVE-12828 > URL: https://issues.apache.org/jira/browse/HIVE-12828 > Project: Hive > Issue Type: Task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Rui Li > Attachments: HIVE-12828.1-spark.patch, HIVE-12828.2-spark.patch, > HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, mem.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-12827) Vectorization: VectorCopyRow/VectorAssignRow/VectorDeserializeRow assign needs explicit isNull[offset] modification
[ https://issues.apache.org/jira/browse/HIVE-12827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V reassigned HIVE-12827: -- Assignee: Gopal V > Vectorization: VectorCopyRow/VectorAssignRow/VectorDeserializeRow assign > needs explicit isNull[offset] modification > --- > > Key: HIVE-12827 > URL: https://issues.apache.org/jira/browse/HIVE-12827 > Project: Hive > Issue Type: Bug >Reporter: Gopal V >Assignee: Gopal V > > Some scenarios do set Double.NaN instead of isNull=true, but all types aren't > consistent. > Examples of un-set isNull for the valid values are > {code} > private class FloatReader extends AbstractDoubleReader { > FloatReader(int columnIndex) { > super(columnIndex); > } > @Override > void apply(VectorizedRowBatch batch, int batchIndex) throws IOException { > DoubleColumnVector colVector = (DoubleColumnVector) > batch.cols[columnIndex]; > if (deserializeRead.readCheckNull()) { > VectorizedBatchUtil.setNullColIsNullValue(colVector, batchIndex); > } else { > float value = deserializeRead.readFloat(); > colVector.vector[batchIndex] = (double) value; > } > } > } > {code} > {code} > private class DoubleCopyRow extends CopyRow { > DoubleCopyRow(int inColumnIndex, int outColumnIndex) { > super(inColumnIndex, outColumnIndex); > } > @Override > void copy(VectorizedRowBatch inBatch, int inBatchIndex, > VectorizedRowBatch outBatch, int outBatchIndex) { > DoubleColumnVector inColVector = (DoubleColumnVector) > inBatch.cols[inColumnIndex]; > DoubleColumnVector outColVector = (DoubleColumnVector) > outBatch.cols[outColumnIndex]; > if (inColVector.isRepeating) { > if (inColVector.noNulls || !inColVector.isNull[0]) { > outColVector.vector[outBatchIndex] = inColVector.vector[0]; > } else { > VectorizedBatchUtil.setNullColIsNullValue(outColVector, > outBatchIndex); > } > } else { > if (inColVector.noNulls || !inColVector.isNull[inBatchIndex]) { > outColVector.vector[outBatchIndex] = > inColVector.vector[inBatchIndex]; > } else { > 
VectorizedBatchUtil.setNullColIsNullValue(outColVector, > outBatchIndex); > } > } > } > } > {code} > {code} > private static abstract class VectorDoubleColumnAssign > extends VectorColumnAssignVectorBase { > protected void assignDouble(double value, int destIndex) { > outCol.vector[destIndex] = value; > } > } > {code} > The pattern to imitate would be the earlier code from VectorBatchUtil > {code} > case DOUBLE: { > DoubleColumnVector dcv = (DoubleColumnVector) batch.cols[offset + > colIndex]; > if (writableCol != null) { > dcv.vector[rowIndex] = ((DoubleWritable) writableCol).get(); > dcv.isNull[rowIndex] = false; > } else { > dcv.vector[rowIndex] = Double.NaN; > setNullColIsNullValue(dcv, rowIndex); > } > } > break; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
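The assign pattern HIVE-12827 recommends can be shown in isolation: writing `vector[i]` without also clearing `isNull[i]` leaves a stale null flag behind when the slot previously held a NULL (for example, in a recycled batch). The `Vec` class below is a hypothetical stand-in for Hive's `DoubleColumnVector`, not the actual patch:

```java
// Stand-in illustrating HIVE-12827's point: an assign path that writes
// vector[i] but never clears isNull[i] leaves a stale "null" flag behind
// when the slot held a NULL earlier (e.g. a reused batch).
public class AssignNullFlag {
    static class Vec {
        boolean noNulls;
        boolean[] isNull;
        double[] vector;
        Vec(int n) { isNull = new boolean[n]; vector = new double[n]; }
    }

    // Buggy pattern, mirroring the VectorDoubleColumnAssign snippet above:
    // the value is written, but the null flag is left untouched.
    static void assignBuggy(Vec v, int i, double value) {
        v.vector[i] = value;
    }

    // Fixed pattern, mirroring the VectorizedBatchUtil snippet above:
    // set the value AND explicitly clear isNull[i].
    static void assignFixed(Vec v, int i, double value) {
        v.vector[i] = value;
        v.isNull[i] = false;
    }

    static boolean isNullAt(Vec v, int i) {
        return !v.noNulls && v.isNull[i];
    }
}
```

After `assignBuggy`, a reader that honors the null flags still sees the slot as NULL even though a valid value was just written; `assignFixed` makes the value visible.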
[jira] [Assigned] (HIVE-12826) Vectorization: VectorUDAF* suspect isNull checks
[ https://issues.apache.org/jira/browse/HIVE-12826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V reassigned HIVE-12826: -- Assignee: Gopal V (was: Matt McCline) > Vectorization: VectorUDAF* suspect isNull checks > > > Key: HIVE-12826 > URL: https://issues.apache.org/jira/browse/HIVE-12826 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 1.3.0, 2.0.0, 2.1.0 >Reporter: Gopal V >Assignee: Gopal V > > for isRepeating=true, checking isNull[selected[i]] might return incorrect > results (without a heavy array fill of isNull). > VectorUDAFSum/Min/Max/Avg and SumDecimal impls need to be reviewed for this > pattern. > {code} > private void iterateHasNullsRepeatingSelectionWithAggregationSelection( > VectorAggregationBufferRow[] aggregationBufferSets, > int aggregateIndex, >value, > int batchSize, > int[] selection, > boolean[] isNull) { > > for (int i=0; i < batchSize; ++i) { > if (!isNull[selection[i]]) { > Aggregation myagg = getCurrentAggregationBuffer( > aggregationBufferSets, > aggregateIndex, > i); > myagg.sumValue(value); > } > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
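The hazard described in the ticket can be shown with a toy example: when `isRepeating=true`, only `isNull[0]` is authoritative, and indexing through the selection vector can read entries that were never filled in. The types and method names below are illustrative stand-ins, not the actual VectorUDAF code.

```java
// Sketch (assumed simplified shapes) of the HIVE-12826 pattern: with
// isRepeating=true, isNull[selected[i]] may read stale array slots.
public class RepeatingIsNullSketch {
    // Suspect check: indexes isNull through the selection vector.
    static int countNonNullBuggy(boolean[] isNull, int[] selection, int batchSize) {
        int count = 0;
        for (int i = 0; i < batchSize; ++i) {
            if (!isNull[selection[i]]) {
                count++;
            }
        }
        return count;
    }

    // Safer check: for a repeating vector only entry 0 is meaningful.
    static int countNonNullFixed(boolean isRepeating, boolean[] isNull,
                                 int[] selection, int batchSize) {
        if (isRepeating) {
            return isNull[0] ? 0 : batchSize;
        }
        return countNonNullBuggy(isNull, selection, batchSize);
    }

    public static void main(String[] args) {
        // Repeating null vector: only isNull[0] was set; slots 1..3 are stale.
        boolean[] isNull = {true, false, false, false};
        int[] selection = {1, 2, 3};
        System.out.println("buggy=" + countNonNullBuggy(isNull, selection, 3));
        System.out.println("fixed=" + countNonNullFixed(true, isNull, selection, 3));
    }
}
```

The buggy path counts three non-null rows out of an all-null repeating batch, which is exactly the "incorrect results (without a heavy array fill of isNull)" the ticket warns about.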
[jira] [Commented] (HIVE-12826) Vectorization: VectorUDAF* suspect isNull checks
[ https://issues.apache.org/jira/browse/HIVE-12826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101120#comment-15101120 ] Gopal V commented on HIVE-12826: [~mmccline]: this forces isRepeating=true to always check isNull[0] in all UDAFs in the patch. Please review. > Vectorization: VectorUDAF* suspect isNull checks > > > Key: HIVE-12826 > URL: https://issues.apache.org/jira/browse/HIVE-12826 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 1.3.0, 2.0.0, 2.1.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: HIVE-12826.1.patch > > > for isRepeating=true, checking isNull[selected[i]] might return incorrect > results (without a heavy array fill of isNull). > VectorUDAFSum/Min/Max/Avg and SumDecimal impls need to be reviewed for this > pattern. > {code} > private void iterateHasNullsRepeatingSelectionWithAggregationSelection( > VectorAggregationBufferRow[] aggregationBufferSets, > int aggregateIndex, >value, > int batchSize, > int[] selection, > boolean[] isNull) { > > for (int i=0; i < batchSize; ++i) { > if (!isNull[selection[i]]) { > Aggregation myagg = getCurrentAggregationBuffer( > aggregationBufferSets, > aggregateIndex, > i); > myagg.sumValue(value); > } > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12758) Parallel compilation: Operator::resetId() is not thread-safe
[ https://issues.apache.org/jira/browse/HIVE-12758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-12758: Attachment: HIVE-12758.03.patch Fixing a couple more spots where the context was null, updating q files > Parallel compilation: Operator::resetId() is not thread-safe > > > Key: HIVE-12758 > URL: https://issues.apache.org/jira/browse/HIVE-12758 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 2.0.0, 2.1.0 >Reporter: Gopal V >Assignee: Sergey Shelukhin > Attachments: HIVE-12758.01.patch, HIVE-12758.02.patch, > HIVE-12758.03.patch, HIVE-12758.03.patch, HIVE-12758.patch > > > {code} > private static AtomicInteger seqId; > ... > public Operator() { > this(String.valueOf(seqId.getAndIncrement())); > } > public static void resetId() { > seqId.set(0); > } > {code} > Potential race-condition. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
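One way to see the race in the snippet above: two queries compiling in parallel share `seqId`, so a `resetId()` from one can hand duplicate operator ids to the other. A sketch of one possible remedy is to scope the counter to a compilation context instead of resetting a shared static; this is illustrative and not necessarily how the attached patch resolves it.

```java
// Hedged sketch: per-compilation operator id counter instead of a shared
// static counter with a global reset. Names are illustrative, not Hive's.
import java.util.concurrent.atomic.AtomicInteger;

public class PerContextIdSketch {
    // Racy shape from the ticket: a global counter plus a global reset means
    // resetId() in one thread can collide with getAndIncrement() in another.
    static final AtomicInteger globalSeq = new AtomicInteger(0);
    static void resetId() { globalSeq.set(0); }
    static int nextGlobalId() { return globalSeq.getAndIncrement(); }

    // Safer shape: a counter owned by one compilation; nothing is ever reset.
    static final class CompilationContext {
        private final AtomicInteger seq = new AtomicInteger(0);
        int nextOperatorId() { return seq.getAndIncrement(); }
    }

    public static void main(String[] args) {
        CompilationContext a = new CompilationContext();
        CompilationContext b = new CompilationContext();
        // Two "parallel" compilations each see 0, 1, 2... independently.
        System.out.println(a.nextOperatorId() + "," + a.nextOperatorId()
                + " / " + b.nextOperatorId());
    }
}
```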
[jira] [Updated] (HIVE-12220) LLAP: Usability issues with hive.llap.io.cache.orc.size
[ https://issues.apache.org/jira/browse/HIVE-12220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-12220: Description: In the llap-daemon site you need to set, among other things, llap.daemon.memory.per.instance.mb and hive.llap.io.cache.orc.size The use of hive.llap.io.cache.orc.size caused me some unnecessary problems, initially I entered the value in MB rather than in bytes. Operator error you could say but I look at this as a fraction of the other value which is in mb. Second, is this really tied to ORC? E.g. when we have the vectorized text reader will this data be cached as well? Or might it be in the future? I would like to propose instead using hive.llap.io.cache.size.mb for this setting. was: In the llap-daemon site you need to set, among other things, llap.daemon.memory.per.instance.mb and hive.llap.io.cache.orc.size The use of hive.llap.io.cache.orc.size caused me some unnecessary problems, initially I entered the value in MB rather than in bytes. Operator error you could say but I look at this as a fraction of the other value which is in mb. Second, is this really tied to ORC? E.g. when we have the vectorized text reader will this data be cached as well? Or might it be in the future? I would like to propose instead using hive.llap.io.cache.size.mb for this setting. NO PRECOMMIT TESTS > LLAP: Usability issues with hive.llap.io.cache.orc.size > --- > > Key: HIVE-12220 > URL: https://issues.apache.org/jira/browse/HIVE-12220 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Carter Shanklin >Assignee: Sergey Shelukhin > Attachments: HIVE-12220.01.patch, HIVE-12220.02.patch, > HIVE-12220.patch, HIVE-12220.tmp.patch > > > In the llap-daemon site you need to set, among other things, > llap.daemon.memory.per.instance.mb > and > hive.llap.io.cache.orc.size > The use of hive.llap.io.cache.orc.size caused me some unnecessary problems, > initially I entered the value in MB rather than in bytes. 
Operator error you > could say but I look at this as a fraction of the other value which is in mb. > Second, is this really tied to ORC? E.g. when we have the vectorized text > reader will this data be cached as well? Or might it be in the future? > I would like to propose instead using hive.llap.io.cache.size.mb for this > setting. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
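The MB-versus-bytes confusion described above is a classic argument for size settings that carry an explicit unit. As a hedged illustration (Hive's configuration machinery has its own size handling; this helper and its suffix set are invented for the example), a parser that accepts `k`/`m`/`g` suffixes makes the operator's intent unambiguous:

```java
// Illustrative sketch only: a size parser with explicit unit suffixes, so a
// value like "512m" cannot be silently misread as 512 bytes.
public class SizeSuffixSketch {
    static long parseSizeBytes(String value) {
        String v = value.trim().toLowerCase();
        long multiplier = 1L;
        if (v.endsWith("g")) {
            multiplier = 1024L * 1024 * 1024;
            v = v.substring(0, v.length() - 1);
        } else if (v.endsWith("m")) {
            multiplier = 1024L * 1024;
            v = v.substring(0, v.length() - 1);
        } else if (v.endsWith("k")) {
            multiplier = 1024L;
            v = v.substring(0, v.length() - 1);
        }
        // A bare number still means bytes, matching the old behavior.
        return Long.parseLong(v.trim()) * multiplier;
    }

    public static void main(String[] args) {
        System.out.println(parseSizeBytes("512m"));
        System.out.println(parseSizeBytes("2g"));
        System.out.println(parseSizeBytes("4096"));
    }
}
```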
[jira] [Commented] (HIVE-12353) When Compactor fails it calls CompactionTxnHandler.markedCleaned(). it should not.
[ https://issues.apache.org/jira/browse/HIVE-12353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099215#comment-15099215 ] Alan Gates commented on HIVE-12353: --- With the commit of HIVE-12832 to master and branch-2.0 I think all of the schema changes needed for this are in. > When Compactor fails it calls CompactionTxnHandler.markedCleaned(). it > should not. > --- > > Key: HIVE-12353 > URL: https://issues.apache.org/jira/browse/HIVE-12353 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-12353.2.patch, HIVE-12353.3.patch, HIVE-12353.patch > > > One of the things that this method does is delete entries from TXN_COMPONENTS > for the partition that it was trying to compact. > This causes Aborted transactions in TXNS to become empty according to > CompactionTxnHandler.cleanEmptyAbortedTxns(), which means they can now be > deleted. > Once they are deleted, data that belongs to these txns is deemed committed... > We should extend COMPACTION_QUEUE state with 'f' and 's' (failed, success) > states. We should also not delete the entry from markedCleaned(). > We'll have a separate process that cleans 'f' and 's' records after X minutes > (or after > N records for a given partition exist). > This allows SHOW COMPACTIONS to show some history info and how many times > compaction failed on a given partition (subject to retention interval), so > that we don't have to call markCleaned() on Compactor failures, at the same > time preventing the Compactor from constantly getting stuck on the same bad > partition/table. > Ideally we'd want to include an END_TIME field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-12672) Record last updated time for partition and table
[ https://issues.apache.org/jira/browse/HIVE-12672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved HIVE-12672. --- Resolution: Won't Fix This was needed for HIVE-12669. As that JIRA is resolved won't fix, this won't be fixed either. > Record last updated time for partition and table > > > Key: HIVE-12672 > URL: https://issues.apache.org/jira/browse/HIVE-12672 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Alan Gates >Assignee: Alan Gates > > Currently tables and partitions do not record when they were last updated. > This makes it hard for the system to know where to look for recently changed > tables and partitions that may need to be analyzed or otherwise processed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-12669) Need a way to analyze tables in the background
[ https://issues.apache.org/jira/browse/HIVE-12669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates resolved HIVE-12669. --- Resolution: Won't Fix Given the work that's going on in HIVE-11160 and HIVE-12763 I don't think it makes sense to continue down this path. These JIRAs will lay the groundwork for auto-gathering stats on data as it is inserted rather than having a background process do the work. > Need a way to analyze tables in the background > -- > > Key: HIVE-12669 > URL: https://issues.apache.org/jira/browse/HIVE-12669 > Project: Hive > Issue Type: Improvement > Components: Metastore >Reporter: Alan Gates >Assignee: Alan Gates > > Currently analyze must be run by users manually. It would be useful to have > an option for certain or all tables to be automatically analyzed on a regular > basis. The system can do this in the background as a metastore thread > (similar to the compactor threads). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12429) Switch default Hive authorization to SQLStandardAuth in 2.0
[ https://issues.apache.org/jira/browse/HIVE-12429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101046#comment-15101046 ] Hive QA commented on HIVE-12429: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12782383/HIVE-12429.17.patch {color:green}SUCCESS:{color} +1 due to 54 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10016 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6629/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6629/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6629/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12782383 - PreCommit-HIVE-TRUNK-Build > Switch default Hive authorization to SQLStandardAuth in 2.0 > --- > > Key: HIVE-12429 > URL: https://issues.apache.org/jira/browse/HIVE-12429 > Project: Hive > Issue Type: Task > Components: Authorization, Security >Affects Versions: 2.0.0 >Reporter: Alan Gates >Assignee: Daniel Dai > Attachments: HIVE-12429.1.patch, HIVE-12429.10.patch, > HIVE-12429.11.patch, HIVE-12429.12.patch, HIVE-12429.13.patch, > HIVE-12429.14.patch, HIVE-12429.15.patch, HIVE-12429.16.patch, > HIVE-12429.17.patch, HIVE-12429.2.patch, HIVE-12429.3.patch, > HIVE-12429.4.patch, HIVE-12429.5.patch, HIVE-12429.6.patch, > HIVE-12429.7.patch, HIVE-12429.8.patch, HIVE-12429.9.patch > > > Hive's default authorization is not real security, as it does not secure a > number of features and anyone can grant access to any object to any user. We > should switch the default to SQLStandardAuth, which provides real > authorization. > As this is a backwards-incompatible change this was hard to do previously, > but 2.0 gives us a place to do this type of change. > By default authorization will still be off, as there are a few other things > to set when turning on authorization (such as the list of admin users). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12853) LLAP: localize permanent UDF jars to daemon
[ https://issues.apache.org/jira/browse/HIVE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101086#comment-15101086 ] Sergey Shelukhin commented on HIVE-12853: - I don't think get method is actually needed... looks like FunctionRegistry is only used at compile time, so on the daemon only adding and removing will need to be tracked. > LLAP: localize permanent UDF jars to daemon > --- > > Key: HIVE-12853 > URL: https://issues.apache.org/jira/browse/HIVE-12853 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-12853.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-8065) Support HDFS encryption functionality on Hive
[ https://issues.apache.org/jira/browse/HIVE-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu resolved HIVE-8065. Resolution: Fixed Closing this issue since it has already been merged into the upstream. > Support HDFS encryption functionality on Hive > - > > Key: HIVE-8065 > URL: https://issues.apache.org/jira/browse/HIVE-8065 > Project: Hive > Issue Type: Improvement >Affects Versions: 0.13.1 >Reporter: Sergio Peña >Assignee: Sergio Peña > > The new encryption support on HDFS makes Hive incompatible and unusable when > this feature is used. > HDFS encryption is designed so that a user can configure different > encryption zones (or directories) for multi-tenant environments. An > encryption zone has an exclusive encryption key, such as AES-128 or AES-256. > Because of security compliance, HDFS does not allow moving/renaming files > between encryption zones. Renames are allowed only inside the same encryption > zone. A copy is allowed between encryption zones. > See HDFS-6134 for more details about the HDFS encryption design. > Hive currently uses a scratch directory (like /tmp/$user/$random). This > scratch directory is used for the output of intermediate data (between MR > jobs) and for the final output of the Hive query, which is later moved to the > table directory location. > If Hive tables are in different encryption zones than the scratch directory, > then Hive won't be able to rename those files/directories, and it will make > Hive unusable. > To handle this problem, we can change the scratch directory of the > query/statement to be inside the same encryption zone as the table directory > location. This way, the renaming process will be successful. > Also, for statements that move files between encryption zones (i.e. LOAD > DATA), a copy may be executed instead of a rename. This will cause an > overhead when copying large data files, but it won't break the encryption on > Hive. 
> Another security thing to consider is when using joins in selects. If Hive joins > different tables with different encryption key strengths, then the results of > the select might break the security compliance of the tables. Let's say two > tables with 128-bit and 256-bit encryption are joined; then the temporary > results might be stored in the 128-bit encryption zone, which temporarily conflicts > with the compliance of the table encrypted with 256 bits. > To fix this, Hive should be able to select the scratch directory that is more > secured/encrypted in order to save the intermediate data temporarily with no > compliance issues. > For instance: > {noformat} > SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id; > {noformat} > - This should use a scratch directory (or staging directory) inside the > table-aes256 table location. > {noformat} > INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1; > {noformat} > - This should use a scratch directory inside the table-aes1 location. > {noformat} > FROM table-unencrypted > INSERT OVERWRITE TABLE table-aes128 SELECT id, name > INSERT OVERWRITE TABLE table-aes256 SELECT id, name > {noformat} > - This should use a scratch directory in each of the tables' locations. > - The first SELECT will have its scratch directory on the table-aes128 directory. > - The second SELECT will have its scratch directory on the table-aes256 directory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
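The "pick the most secured scratch directory" rule described above can be sketched as choosing, among the tables a statement touches, the location with the strongest encryption key, so intermediate results are never written under a weaker key than any source table's. The types, key sizes, and paths below are illustrative stand-ins; the real logic would consult the HDFS encryption-zone API.

```java
// Hedged sketch of scratch-directory selection by key strength (illustrative
// types only, not Hive's or HDFS's actual API).
import java.util.Arrays;

public class ScratchZoneSketch {
    static final class TableZone {
        final String location;
        final int keyBits; // 0 for an unencrypted table
        TableZone(String location, int keyBits) {
            this.location = location;
            this.keyBits = keyBits;
        }
    }

    // Stage intermediate data under the strongest key among the inputs so no
    // ciphertext ever lands in a weaker zone than its source.
    static String chooseScratchLocation(TableZone[] tables) {
        return Arrays.stream(tables)
                .max((a, b) -> Integer.compare(a.keyBits, b.keyBits))
                .map(t -> t.location)
                .orElseThrow(IllegalArgumentException::new);
    }

    public static void main(String[] args) {
        TableZone[] joined = {
            new TableZone("/warehouse/table-aes128", 128),
            new TableZone("/warehouse/table-aes256", 256),
        };
        System.out.println(chooseScratchLocation(joined));
    }
}
```

For the AES-128/AES-256 join example in the ticket, this rule selects the table-aes256 location, matching the behavior the description calls for.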
[jira] [Commented] (HIVE-12828) Update Spark version to 1.6
[ https://issues.apache.org/jira/browse/HIVE-12828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101125#comment-15101125 ] Rui Li commented on HIVE-12828: --- Looked at the log and error is {noformat} 2016-01-14T14:38:11,889 - 16/01/14 14:38:11 WARN TaskSetManager: Lost task 0.0 in stage 136.0 (TID 238, ip-10-233-128-9.us-west-1.compute.internal): java.io.IOException: java.lang.reflect.InvocationTargetException 2016-01-14T14:38:11,889 - at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97) 2016-01-14T14:38:11,889 - at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57) 2016-01-14T14:38:11,890 - at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:269) 2016-01-14T14:38:11,890 - at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.(HadoopShimsSecure.java:216) 2016-01-14T14:38:11,890 - at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:343) 2016-01-14T14:38:11,890 - at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:680) 2016-01-14T14:38:11,890 - at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:237) 2016-01-14T14:38:11,890 - at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208) 2016-01-14T14:38:11,890 - at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101) 2016-01-14T14:38:11,890 - at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) 2016-01-14T14:38:11,890 - at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) 2016-01-14T14:38:11,890 - at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) 2016-01-14T14:38:11,890 - at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) 2016-01-14T14:38:11,890 - at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:270) 2016-01-14T14:38:11,890 - at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87) 2016-01-14T14:38:11,890 - at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) 2016-01-14T14:38:11,890 - at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) 2016-01-14T14:38:11,890 - at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73) 2016-01-14T14:38:11,890 - at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) 2016-01-14T14:38:11,890 - at org.apache.spark.scheduler.Task.run(Task.scala:89) 2016-01-14T14:38:11,890 - at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213) 2016-01-14T14:38:11,890 - at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 2016-01-14T14:38:11,890 - at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 2016-01-14T14:38:11,890 - at java.lang.Thread.run(Thread.java:744) 2016-01-14T14:38:11,890 - Caused by: java.lang.reflect.InvocationTargetException 2016-01-14T14:38:11,890 - at sun.reflect.GeneratedConstructorAccessor29.newInstance(Unknown Source) 2016-01-14T14:38:11,890 - at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) 2016-01-14T14:38:11,890 - at java.lang.reflect.Constructor.newInstance(Constructor.java:526) 2016-01-14T14:38:11,890 - at org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:255) 2016-01-14T14:38:11,890 - ... 
21 more 2016-01-14T14:38:11,891 - Caused by: java.lang.NoSuchMethodError: org.apache.parquet.schema.Types$MessageTypeBuilder.addFields([Lorg/apache/parquet/schema/Type;)Lorg/apache/parquet/schema/Types$BaseGroupBuilder; 2016-01-14T14:38:11,891 - at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getSchemaByName(DataWritableReadSupport.java:160) 2016-01-14T14:38:11,891 - at org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:223) 2016-01-14T14:38:11,891 - at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:248) 2016-01-14T14:38:11,891 - at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:94) 2016-01-14T14:38:11,891 - at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.(ParquetRecordReaderWrapper.java:80) 2016-01-14T14:38:11,891 - at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72) 2016-01-14T14:38:11,891 - at org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.(CombineHiveRecordReader.java:67)
[jira] [Commented] (HIVE-12352) CompactionTxnHandler.markCleaned() may delete too much
[ https://issues.apache.org/jira/browse/HIVE-12352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101122#comment-15101122 ] Eugene Koifman commented on HIVE-12352: --- committed to 1.3 and 2.0 > CompactionTxnHandler.markCleaned() may delete too much > -- > > Key: HIVE-12352 > URL: https://issues.apache.org/jira/browse/HIVE-12352 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Fix For: 1.3.0, 2.0.0 > > Attachments: HIVE-12352.2.patch, HIVE-12352.3.patch, HIVE-12352.patch > > >Worker will start with DB in state X (wrt this partition). >while it's working more txns will happen, against partition it's > compacting. >then this will delete state up to X and since then. There may be new > delta files created >between compaction starting and cleaning. These will not be compacted > until more >transactions happen. So this ideally should only delete >up to TXN_ID that was compacted (i.e. HWM in Worker?) Then this can also > run >at READ_COMMITTED. So this means we'd want to store HWM in > COMPACTION_QUEUE when >Worker picks up the job. > Actually the problem is even worse (but also solved using HWM as above): > Suppose some transactions (against same partition) have started and aborted > since the time Worker ran compaction job. > That means there are never-compacted delta files with data that belongs to > these aborted txns. > Following will pick up these aborted txns. > s = "select txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and > txn_state = '" + > TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and > tc_table = '" + > info.tableName + "'"; > if (info.partName != null) s += " and tc_partition = '" + > info.partName + "'"; > The logic after that will delete relevant data from TXN_COMPONENTS and if one > of these txns becomes empty, it will be picked up by cleanEmptyAbortedTxns(). 
> At that point any metadata about an Aborted txn is gone and the system will > think it's committed. > HWM in this case would be (in ValidCompactorTxnList) > if(minOpenTxn > 0) > min(highWaterMark, minOpenTxn) > else > highWaterMark -- This message was sent by Atlassian JIRA (v6.3.4#6332)
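The high-water-mark rule quoted at the end of the ticket can be written out as a small helper so the two branches are explicit. This mirrors the pseudocode above (attributed to ValidCompactorTxnList); the method name is invented for illustration and is not the actual Hive code.

```java
// Sketch of the ticket's HWM rule: the cleaner must not delete past the
// minimum open transaction, if one exists.
public class CompactorHwmSketch {
    // Convention assumed here: minOpenTxn <= 0 means "no open transactions".
    static long compactionHighWaterMark(long highWaterMark, long minOpenTxn) {
        if (minOpenTxn > 0) {
            return Math.min(highWaterMark, minOpenTxn);
        }
        return highWaterMark;
    }

    public static void main(String[] args) {
        // With an open txn below the global HWM, cleaning must stop there.
        System.out.println(compactionHighWaterMark(100, 42));
        // With no open txns, the global HWM applies.
        System.out.println(compactionHighWaterMark(100, -1));
    }
}
```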
[jira] [Updated] (HIVE-12802) CBO: Calcite Operator To Hive Operator (Calcite Return Path): MiniTezCliDriver.vector_join_filters.q failure
[ https://issues.apache.org/jira/browse/HIVE-12802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-12802: - Description: Discovered as part of running : mvn test -Dtest=TestMiniTezCliDriver -Dqfile_regex=vector.* -Dhive.cbo.returnpath.hiveop=true -Dtest.output.overwrite=true {code} SELECT sum(hash(a.key,a.value,b.key,b.value)) from myinput1 a LEFT OUTER JOIN myinput1 b ON (a.value=b.value AND a.key > 40 AND a.value > 50 AND a.key = a.value AND b.key > 40 AND b.value > 50 AND b.key = b.value) RIGHT OUTER JOIN myinput1 c ON (b.value=c.value AND c.key > 40 AND c.value > 50 AND c.key = c.value AND b.key > 40 AND b.value > 50 AND b.key = b.value) {code} {code} 2016-01-07T11:16:06,198 ERROR [657fd759-7643-467b-9bd0-17cb4958cb69 main[]]: parse.CalcitePlanner (CalcitePlanner.java:genOPTree(309)) - CBO failed, skipping CBO. java.lang.IndexOutOfBoundsException: index (10) must be less than size (6) at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:305) ~[guava-14.0.1.jar:?] at com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:284) ~[guava-14.0.1.jar:?] at com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:81) ~[guava-14.0.1.jar:?] 
at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ExprNodeConverter.visitInputRef(ExprNodeConverter.java:109) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ExprNodeConverter.visitInputRef(ExprNodeConverter.java:79) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.calcite.rex.RexInputRef.accept(RexInputRef.java:112) ~[calcite-core-1.5.0.jar:1.5.0] at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ExprNodeConverter.visitCall(ExprNodeConverter.java:128) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.calcite.translator.ExprNodeConverter.visitCall(ExprNodeConverter.java:79) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.calcite.rex.RexCall.accept(RexCall.java:107) ~[calcite-core-1.5.0.jar:1.5.0] at org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveOpConverter.convertToExprNode(HiveOpConverter.java:1153) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveOpConverter.translateJoin(HiveOpConverter.java:381) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveOpConverter.visit(HiveOpConverter.java:313) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveOpConverter.dispatch(HiveOpConverter.java:164) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveOpConverter.visit(HiveOpConverter.java:268) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveOpConverter.dispatch(HiveOpConverter.java:162) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveOpConverter.visit(HiveOpConverter.java:397) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at 
org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveOpConverter.dispatch(HiveOpConverter.java:181) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveOpConverter.convert(HiveOpConverter.java:154) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedHiveOPDag(CalcitePlanner.java:688) ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:266) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10094) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:231) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:237) [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT] at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:471) [hive-exec-2.1.0-SNAPSHOT.jar:?] {code} was: Discovered as part of running : mvn test -Dtest=TestMiniTezCliDriver -Dqfile_regex=vector.* -Dhive.cbo.returnpath.hiveop=true -Dtest.output.overwrite=true {code} 2016-01-07T11:16:06,198 ERROR [657fd759-7643-467b-9bd0-17cb4958cb69 main[]]: parse.CalcitePlanner (CalcitePlanner.java:genOPTree(309)) - CBO failed, skipping CBO.
[jira] [Updated] (HIVE-12826) Vectorization: VectorUDAF* suspect isNull checks
[ https://issues.apache.org/jira/browse/HIVE-12826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-12826: --- Attachment: HIVE-12826.1.patch > Vectorization: VectorUDAF* suspect isNull checks > > > Key: HIVE-12826 > URL: https://issues.apache.org/jira/browse/HIVE-12826 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 1.3.0, 2.0.0, 2.1.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: HIVE-12826.1.patch > > > for isRepeating=true, checking isNull[selected[i]] might return incorrect > results (without a heavy array fill of isNull). > VectorUDAFSum/Min/Max/Avg and SumDecimal impls need to be reviewed for this > pattern. > {code} > private void iterateHasNullsRepeatingSelectionWithAggregationSelection( > VectorAggregationBufferRow[] aggregationBufferSets, > int aggregateIndex, >value, > int batchSize, > int[] selection, > boolean[] isNull) { > > for (int i=0; i < batchSize; ++i) { > if (!isNull[selection[i]]) { > Aggregation myagg = getCurrentAggregationBuffer( > aggregationBufferSets, > aggregateIndex, > i); > myagg.sumValue(value); > } > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12828) Update Spark version to 1.6
[ https://issues.apache.org/jira/browse/HIVE-12828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101152#comment-15101152 ] Xuefu Zhang commented on HIVE-12828: I'm not sure if the test cleans up the /thirdparty dir. I tested and it worked for me. Could you give it a try? > Update Spark version to 1.6 > --- > > Key: HIVE-12828 > URL: https://issues.apache.org/jira/browse/HIVE-12828 > Project: Hive > Issue Type: Task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Rui Li > Attachments: HIVE-12828.1-spark.patch, HIVE-12828.2-spark.patch, > HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, > mem.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9862) Vectorized execution corrupts timestamp values
[ https://issues.apache.org/jira/browse/HIVE-9862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-9862: --- Attachment: HIVE-9862.06.patch > Vectorized execution corrupts timestamp values > -- > > Key: HIVE-9862 > URL: https://issues.apache.org/jira/browse/HIVE-9862 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 1.0.0 >Reporter: Nathan Howell >Assignee: Matt McCline > Attachments: HIVE-9862.01.patch, HIVE-9862.02.patch, > HIVE-9862.03.patch, HIVE-9862.04.patch, HIVE-9862.05.patch, HIVE-9862.06.patch > > > Timestamps in the future (year 2250?) and before ~1700 are silently corrupted > in vectorized execution mode. Simple repro: > {code} > hive> DROP TABLE IF EXISTS test; > hive> CREATE TABLE test(ts TIMESTAMP) STORED AS ORC; > hive> INSERT INTO TABLE test VALUES ('-12-31 23:59:59'); > hive> SET hive.vectorized.execution.enabled = false; > hive> SELECT MAX(ts) FROM test; > -12-31 23:59:59 > hive> SET hive.vectorized.execution.enabled = true; > hive> SELECT MAX(ts) FROM test; > 1816-03-30 05:56:07.066277376 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12828) Update Spark version to 1.6
[ https://issues.apache.org/jira/browse/HIVE-12828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099092#comment-15099092 ] Hive QA commented on HIVE-12828: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12782359/HIVE-12828.2-spark.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9866 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parquet_join org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/1030/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/1030/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-1030/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12782359 - PreCommit-HIVE-SPARK-Build > Update Spark version to 1.6 > --- > > Key: HIVE-12828 > URL: https://issues.apache.org/jira/browse/HIVE-12828 > Project: Hive > Issue Type: Task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Rui Li > Attachments: HIVE-12828.1-spark.patch, HIVE-12828.2-spark.patch, > HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, > mem.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12657) selectDistinctStar.q results differ with jdk 1.7 vs jdk 1.8
[ https://issues.apache.org/jira/browse/HIVE-12657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-12657: Attachment: HIVE-12657.patch Simple patch. The relevant change is one-line, changing RR to LinkedHashMap for tables; the rest is comments, some cleanup, and out file updates. [~prasanth_j] [~pxiong] can you take a look? > selectDistinctStar.q results differ with jdk 1.7 vs jdk 1.8 > --- > > Key: HIVE-12657 > URL: https://issues.apache.org/jira/browse/HIVE-12657 > Project: Hive > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Prasanth Jayachandran >Assignee: Sergey Shelukhin > Attachments: HIVE-12657.patch > > > Encountered this issue when analysing test failures of HIVE-12609. > selectDistinctStar.q produces the following diff when I ran with java version > "1.7.0_55" and java version "1.8.0_60" > {code} > < 128 val_128 128 > --- > > 128 128 val_128 > 1770c1770 > < 224 val_224 224 > --- > > 224 224 val_224 > 1776c1776 > < 369 val_369 369 > --- > > 369 369 val_369 > 1799,1810c1799,1810 > < 146 val_146 146 val_146 146 val_146 2008-04-08 11 > < 150 val_150 150 val_150 150 val_150 2008-04-08 11 > < 213 val_213 213 val_213 213 val_213 2008-04-08 11 > < 238 val_238 238 val_238 238 val_238 2008-04-08 11 > < 255 val_255 255 val_255 255 val_255 2008-04-08 11 > < 273 val_273 273 val_273 273 val_273 2008-04-08 11 > < 278 val_278 278 val_278 278 val_278 2008-04-08 11 > < 311 val_311 311 val_311 311 val_311 2008-04-08 11 > < 401 val_401 401 val_401 401 val_401 2008-04-08 11 > < 406 val_406 406 val_406 406 val_406 2008-04-08 11 > < 66val_66 66 val_66 66 val_66 2008-04-08 11 > < 98val_98 98 val_98 98 val_98 2008-04-08 11 > --- > > 146 val_146 2008-04-08 11 146 val_146 146 val_146 > > 150 val_150 2008-04-08 11 150 val_150 150 val_150 > > 213 val_213 2008-04-08 11 213 val_213 213 val_213 > > 238 val_238 2008-04-08 11 238 val_238 238 val_238 > > 255 val_255 2008-04-08 11 255 val_255 255 val_255 > > 273 val_273 2008-04-08 11 
273 val_273 273 val_273 > > 278 val_278 2008-04-08 11 278 val_278 278 val_278 > > 311 val_311 2008-04-08 11 311 val_311 311 val_311 > > 401 val_401 2008-04-08 11 401 val_401 401 val_401 > > 406 val_406 2008-04-08 11 406 val_406 406 val_406 > > 66val_66 2008-04-08 11 66 val_66 66 val_66 > > 98val_98 2008-04-08 11 98 val_98 98 val_98 > 4212c4212 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
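The one-line fix mentioned in the update, switching the table map to LinkedHashMap, works because LinkedHashMap guarantees insertion-order iteration on every JDK, while HashMap's iteration order is unspecified and changed between 1.7 and 1.8. A minimal illustration in plain Java (not Hive's actual RowResolver code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class IterationOrder {
    // Returns the key set rendered in iteration order.
    public static String linkedOrder() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("key", 0);
        m.put("value", 1);
        m.put("ds", 2);
        m.put("hr", 3);
        // LinkedHashMap always iterates in insertion order.
        return m.keySet().toString();
    }

    public static void main(String[] args) {
        // A plain HashMap offers no such guarantee; its order is an
        // implementation detail that differed between JDK 1.7 and 1.8,
        // which is how the column order in the .q output could drift.
        System.out.println(linkedOrder()); // [key, value, ds, hr]
    }
}
```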
[jira] [Updated] (HIVE-12039) Fix TestSSL#testSSLVersion
[ https://issues.apache.org/jira/browse/HIVE-12039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-12039: Attachment: HIVE-12039.2.patch Another try. Test passes locally, let's see if it passes precommit. > Fix TestSSL#testSSLVersion > --- > > Key: HIVE-12039 > URL: https://issues.apache.org/jira/browse/HIVE-12039 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 1.3.0, 2.0.0 >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta > Attachments: HIVE-12039.1.patch, HIVE-12039.2.patch > > > Looks like it's only run on Linux and failing after HIVE-11720. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12832) RDBMS schema changes for HIVE-11388
[ https://issues.apache.org/jira/browse/HIVE-12832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-12832: -- Attachment: HIVE-12832.uber.2.branch-1.patch Attaching patch used for branch-1. Note that this does update branch-1 to thrift 0.9.3. It had previously been 0.9.2. This brings the thrift version up to date with master. Per http://www.mail-archive.com/user%40thrift.apache.org/msg01282.html this should not cause backward incompatibility issues. > RDBMS schema changes for HIVE-11388 > --- > > Key: HIVE-12832 > URL: https://issues.apache.org/jira/browse/HIVE-12832 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Affects Versions: 1.0.0 >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-12382.patch, HIVE-12832.3.patch, > HIVE-12832.uber.2.branch-1.patch, HIVE-12832.uber.2.branch-2.0.patch, > HIVE-12832.uber.2.patch, HIVE-12832.uber.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-12854) LLAP: register permanent UDFs in the executors to make them usable, from localized jars
[ https://issues.apache.org/jira/browse/HIVE-12854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-12854: --- Assignee: Sergey Shelukhin > LLAP: register permanent UDFs in the executors to make them usable, from > localized jars > --- > > Key: HIVE-12854 > URL: https://issues.apache.org/jira/browse/HIVE-12854 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12875) Verify sem.getInputs() and sem.getOutputs()
[ https://issues.apache.org/jira/browse/HIVE-12875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-12875: Description: For every partition entity object present in sem.getInputs() and sem.getOutputs(), we must verify the appropriate Table in the list of Entities. (was: For every partition entity object present in sem.getInputs() and sem.getOutputs(), we must ensure that the appropriate Table is also added to the list of entities.) > Verify sem.getInputs() and sem.getOutputs() > --- > > Key: HIVE-12875 > URL: https://issues.apache.org/jira/browse/HIVE-12875 > Project: Hive > Issue Type: Bug >Reporter: Sushanth Sowmyan >Assignee: Sushanth Sowmyan > > For every partition entity object present in sem.getInputs() and > sem.getOutputs(), we must verify the appropriate Table in the list of > Entities. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12868) Fix empty operation-pool metrics
[ https://issues.apache.org/jira/browse/HIVE-12868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099102#comment-15099102 ] Jimmy Xiang commented on HIVE-12868: +1 > Fix empty operation-pool metrics > > > Key: HIVE-12868 > URL: https://issues.apache.org/jira/browse/HIVE-12868 > Project: Hive > Issue Type: Sub-task > Components: Diagnosability >Reporter: Szehon Ho >Assignee: Szehon Ho > Attachments: HIVE-12868.2.patch, HIVE-12868.patch > > > The newly-added operation pool metrics (thread-pool size, queue size) are > empty because metrics system is initialized too late. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12724) ACID: Major compaction fails to include the original bucket files into MR job
[ https://issues.apache.org/jira/browse/HIVE-12724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-12724: - Attachment: HIVE-12724.4.patch > ACID: Major compaction fails to include the original bucket files into MR job > - > > Key: HIVE-12724 > URL: https://issues.apache.org/jira/browse/HIVE-12724 > Project: Hive > Issue Type: Bug > Components: Hive, Transactions >Affects Versions: 1.3.0, 2.0.0 >Reporter: Wei Zheng >Assignee: Wei Zheng >Priority: Blocker > Attachments: HIVE-12724.1.patch, HIVE-12724.2.patch, > HIVE-12724.3.patch, HIVE-12724.4.patch, HIVE-12724.ADDENDUM.1.patch, > HIVE-12724.branch-1.2.patch, HIVE-12724.branch-1.patch > > > How the problem happens: > * Create a non-ACID table > * Before non-ACID to ACID table conversion, we inserted row one > * After non-ACID to ACID table conversion, we inserted row two > * Both rows can be retrieved before MAJOR compaction > * After MAJOR compaction, row one is lost > {code} > hive> USE acidtest; > OK > Time taken: 0.77 seconds > hive> CREATE TABLE t1 (nationkey INT, name STRING, regionkey INT, comment > STRING) > > CLUSTERED BY (regionkey) INTO 2 BUCKETS > > STORED AS ORC; > OK > Time taken: 0.179 seconds > hive> DESC FORMATTED t1; > OK > # col_namedata_type comment > nationkey int > name string > regionkey int > comment string > # Detailed Table Information > Database: acidtest > Owner:wzheng > CreateTime: Mon Dec 14 15:50:40 PST 2015 > LastAccessTime: UNKNOWN > Retention:0 > Location: file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1 > Table Type: MANAGED_TABLE > Table Parameters: > transient_lastDdlTime 1450137040 > # Storage Information > SerDe Library:org.apache.hadoop.hive.ql.io.orc.OrcSerde > InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat > OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat > Compressed: No > Num Buckets: 2 > Bucket Columns: [regionkey] > Sort Columns: [] > Storage Desc Params: > serialization.format1 > Time taken: 
0.198 seconds, Fetched: 28 row(s) > hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db; > Found 1 items > drwxr-xr-x - wzheng staff 68 2015-12-14 15:50 > /Users/wzheng/hivetmp/warehouse/acidtest.db/t1 > hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1; > hive> INSERT INTO TABLE t1 VALUES (1, 'USA', 1, 'united states'); > WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the > future versions. Consider using a different execution engine (i.e. tez, > spark) or using Hive 1.X releases. > Query ID = wzheng_20151214155028_630098c6-605f-4e7e-a797-6b49fb48360d > Total jobs = 1 > Launching Job 1 out of 1 > Number of reduce tasks determined at compile time: 2 > In order to change the average load for a reducer (in bytes): > set hive.exec.reducers.bytes.per.reducer= > In order to limit the maximum number of reducers: > set hive.exec.reducers.max= > In order to set a constant number of reducers: > set mapreduce.job.reduces= > Job running in-process (local Hadoop) > 2015-12-14 15:51:58,070 Stage-1 map = 100%, reduce = 100% > Ended Job = job_local73977356_0001 > Loading data to table acidtest.t1 > MapReduce Jobs Launched: > Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS > Total MapReduce CPU Time Spent: 0 msec > OK > Time taken: 2.825 seconds > hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1; > Found 2 items > -rwxr-xr-x 1 wzheng staff112 2015-12-14 15:51 > /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/00_0 > -rwxr-xr-x 1 wzheng staff472 2015-12-14 15:51 > /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/01_0 > hive> SELECT * FROM t1; > OK > 1 USA 1 united states > Time taken: 0.434 seconds, Fetched: 1 row(s) > hive> ALTER TABLE t1 SET TBLPROPERTIES ('transactional' = 'true'); > OK > Time taken: 0.071 seconds > hive> DESC FORMATTED t1; > OK > # col_namedata_type comment > nationkey int > name string > regionkey int > comment string > # Detailed Table Information > Database: acidtest > Owner:wzheng > CreateTime: Mon Dec 14 
15:50:40 PST 2015 > LastAccessTime: UNKNOWN > Retention:0 > Location: file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1 > Table Type: MANAGED_TABLE > Table Parameters: > COLUMN_STATS_ACCURATE false > last_modified_bywzheng > last_modified_time 1450137141
[jira] [Updated] (HIVE-12724) ACID: Major compaction fails to include the original bucket files into MR job
[ https://issues.apache.org/jira/browse/HIVE-12724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-12724: - Attachment: (was: HIVE-12724.4.patch) > ACID: Major compaction fails to include the original bucket files into MR job > - > > Key: HIVE-12724 > URL: https://issues.apache.org/jira/browse/HIVE-12724 > Project: Hive > Issue Type: Bug > Components: Hive, Transactions >Affects Versions: 1.3.0, 2.0.0 >Reporter: Wei Zheng >Assignee: Wei Zheng >Priority: Blocker > Attachments: HIVE-12724.1.patch, HIVE-12724.2.patch, > HIVE-12724.3.patch, HIVE-12724.4.patch, HIVE-12724.ADDENDUM.1.patch, > HIVE-12724.branch-1.2.patch, HIVE-12724.branch-1.patch > > > How the problem happens: > * Create a non-ACID table > * Before non-ACID to ACID table conversion, we inserted row one > * After non-ACID to ACID table conversion, we inserted row two > * Both rows can be retrieved before MAJOR compaction > * After MAJOR compaction, row one is lost > {code} > hive> USE acidtest; > OK > Time taken: 0.77 seconds > hive> CREATE TABLE t1 (nationkey INT, name STRING, regionkey INT, comment > STRING) > > CLUSTERED BY (regionkey) INTO 2 BUCKETS > > STORED AS ORC; > OK > Time taken: 0.179 seconds > hive> DESC FORMATTED t1; > OK > # col_namedata_type comment > nationkey int > name string > regionkey int > comment string > # Detailed Table Information > Database: acidtest > Owner:wzheng > CreateTime: Mon Dec 14 15:50:40 PST 2015 > LastAccessTime: UNKNOWN > Retention:0 > Location: file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1 > Table Type: MANAGED_TABLE > Table Parameters: > transient_lastDdlTime 1450137040 > # Storage Information > SerDe Library:org.apache.hadoop.hive.ql.io.orc.OrcSerde > InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat > OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat > Compressed: No > Num Buckets: 2 > Bucket Columns: [regionkey] > Sort Columns: [] > Storage Desc Params: > serialization.format1 > Time 
taken: 0.198 seconds, Fetched: 28 row(s) > hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db; > Found 1 items > drwxr-xr-x - wzheng staff 68 2015-12-14 15:50 > /Users/wzheng/hivetmp/warehouse/acidtest.db/t1 > hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1; > hive> INSERT INTO TABLE t1 VALUES (1, 'USA', 1, 'united states'); > WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the > future versions. Consider using a different execution engine (i.e. tez, > spark) or using Hive 1.X releases. > Query ID = wzheng_20151214155028_630098c6-605f-4e7e-a797-6b49fb48360d > Total jobs = 1 > Launching Job 1 out of 1 > Number of reduce tasks determined at compile time: 2 > In order to change the average load for a reducer (in bytes): > set hive.exec.reducers.bytes.per.reducer= > In order to limit the maximum number of reducers: > set hive.exec.reducers.max= > In order to set a constant number of reducers: > set mapreduce.job.reduces= > Job running in-process (local Hadoop) > 2015-12-14 15:51:58,070 Stage-1 map = 100%, reduce = 100% > Ended Job = job_local73977356_0001 > Loading data to table acidtest.t1 > MapReduce Jobs Launched: > Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS > Total MapReduce CPU Time Spent: 0 msec > OK > Time taken: 2.825 seconds > hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1; > Found 2 items > -rwxr-xr-x 1 wzheng staff112 2015-12-14 15:51 > /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/00_0 > -rwxr-xr-x 1 wzheng staff472 2015-12-14 15:51 > /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/01_0 > hive> SELECT * FROM t1; > OK > 1 USA 1 united states > Time taken: 0.434 seconds, Fetched: 1 row(s) > hive> ALTER TABLE t1 SET TBLPROPERTIES ('transactional' = 'true'); > OK > Time taken: 0.071 seconds > hive> DESC FORMATTED t1; > OK > # col_namedata_type comment > nationkey int > name string > regionkey int > comment string > # Detailed Table Information > Database: acidtest > Owner:wzheng > CreateTime: Mon 
Dec 14 15:50:40 PST 2015 > LastAccessTime: UNKNOWN > Retention:0 > Location: file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1 > Table Type: MANAGED_TABLE > Table Parameters: > COLUMN_STATS_ACCURATE false > last_modified_bywzheng > last_modified_time
[jira] [Updated] (HIVE-12220) LLAP: Usability issues with hive.llap.io.cache.orc.size
[ https://issues.apache.org/jira/browse/HIVE-12220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-12220: Attachment: HIVE-12220.02.patch This patch should allow specifying units for these settings (e.g. "4Mb"). This can later be expanded to other settings as needed. > LLAP: Usability issues with hive.llap.io.cache.orc.size > --- > > Key: HIVE-12220 > URL: https://issues.apache.org/jira/browse/HIVE-12220 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Carter Shanklin >Assignee: Sergey Shelukhin > Attachments: HIVE-12220.01.patch, HIVE-12220.02.patch, > HIVE-12220.patch, HIVE-12220.tmp.patch > > > In the llap-daemon site you need to set, among other things, > llap.daemon.memory.per.instance.mb > and > hive.llap.io.cache.orc.size > The use of hive.llap.io.cache.orc.size caused me some unnecessary problems, > initially I entered the value in MB rather than in bytes. Operator error you > could say but I look at this as a fraction of the other value which is in mb. > Second, is this really tied to ORC? E.g. when we have the vectorized text > reader will this data be cached as well? Or might it be in the future? > I would like to propose instead using hive.llap.io.cache.size.mb for this > setting. > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
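Accepting unit suffixes like "4Mb" takes only a small parser. The sketch below is illustrative: the method name and the exact set of accepted suffixes are my assumptions, not the actual HIVE-12220 implementation.

```java
// Hypothetical size-with-units parser, not Hive's real config validator.
public class SizeValidator {
    // Accepts values like "4Mb", "512kb", "2gb", "100b", or a bare
    // byte count; case-insensitive, binary (1024-based) multipliers.
    public static long toBytes(String value) {
        String v = value.trim().toLowerCase();
        long mult = 1L;
        if (v.endsWith("kb"))      { mult = 1L << 10; v = v.substring(0, v.length() - 2); }
        else if (v.endsWith("mb")) { mult = 1L << 20; v = v.substring(0, v.length() - 2); }
        else if (v.endsWith("gb")) { mult = 1L << 30; v = v.substring(0, v.length() - 2); }
        else if (v.endsWith("b"))  { v = v.substring(0, v.length() - 1); }
        return Long.parseLong(v.trim()) * mult;
    }

    public static void main(String[] args) {
        System.out.println(toBytes("4Mb"));  // 4194304
        System.out.println(toBytes("1024")); // 1024
    }
}
```

With a parser like this, entering "4Mb" and entering "4194304" become equivalent, removing exactly the bytes-versus-megabytes operator error described in the report.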
[jira] [Commented] (HIVE-12352) CompactionTxnHandler.markCleaned() may delete too much
[ https://issues.apache.org/jira/browse/HIVE-12352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099182#comment-15099182 ] Hive QA commented on HIVE-12352: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12782361/HIVE-12352.3.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10019 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6627/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6627/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6627/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12782361 - PreCommit-HIVE-TRUNK-Build > CompactionTxnHandler.markCleaned() may delete too much > -- > > Key: HIVE-12352 > URL: https://issues.apache.org/jira/browse/HIVE-12352 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-12352.2.patch, HIVE-12352.3.patch, HIVE-12352.patch > > >Worker will start with DB in state X (wrt this partition). >while it's working more txns will happen, against partition it's > compacting. >then this will delete state up to X and since then. There may be new > delta files created >between compaction starting and cleaning. These will not be compacted > until more >transactions happen. So this ideally should only delete >up to TXN_ID that was compacted (i.e. HWM in Worker?) Then this can also > run >at READ_COMMITTED. So this means we'd want to store HWM in > COMPACTION_QUEUE when >Worker picks up the job. > Actually the problem is even worse (but also solved using HWM as above): > Suppose some transactions (against same partition) have started and aborted > since the time Worker ran compaction job. > That means there are never-compacted delta files with data that belongs to > these aborted txns. > Following will pick up these aborted txns. > s = "select txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and > txn_state = '" + > TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and > tc_table = '" + > info.tableName + "'"; > if (info.partName != null) s += " and tc_partition = '" + > info.partName + "'"; > The logic after that will delete relevant data from TXN_COMPONENTS and if one > of these txns becomes empty, it will be picked up by cleanEmptyAbortedTxns(). > At that point any metadata about an Aborted txn is gone and the system will > think it's committed. 
> HWM in this case would be (in ValidCompactorTxnList) > if(minOpenTxn > 0) > min(highWaterMark, minOpenTxn) > else > highWaterMark -- This message was sent by Atlassian JIRA (v6.3.4#6332)
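The high-water-mark rule spelled out at the end of the description is compact enough to state directly in code. This is a sketch: the method name is mine, and the real field names in ValidCompactorTxnList may differ.

```java
// Sketch of the HWM rule from HIVE-12352: never clean past the oldest
// still-open transaction, otherwise fall back to the plain high water mark.
public class CompactorHwmSketch {
    // In this sketch, minOpenTxn <= 0 encodes "no open transactions".
    public static long effectiveHwm(long highWaterMark, long minOpenTxn) {
        return minOpenTxn > 0 ? Math.min(highWaterMark, minOpenTxn)
                              : highWaterMark;
    }

    public static void main(String[] args) {
        // An open txn below the HWM caps how far cleaning may proceed:
        System.out.println(effectiveHwm(100, 40)); // 40
        // With no open txns, the plain high water mark applies:
        System.out.println(effectiveHwm(100, 0));  // 100
    }
}
```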
[jira] [Commented] (HIVE-12353) When Compactor fails it calls CompactionTxnHandler.markedCleaned(). it should not.
[ https://issues.apache.org/jira/browse/HIVE-12353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099181#comment-15099181 ] Sergey Shelukhin commented on HIVE-12353: - Is it possible to include schema changes into a patch for HiveQA here? This separate thrift change development process makes no sense whatsoever and will delay commits because of HiveQA > When Compactor fails it calls CompactionTxnHandler.markedCleaned(). it > should not. > --- > > Key: HIVE-12353 > URL: https://issues.apache.org/jira/browse/HIVE-12353 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-12353.2.patch, HIVE-12353.3.patch, HIVE-12353.patch > > > One of the things that this method does is delete entries from TXN_COMPONENTS > for partition that it was trying to compact. > This causes Aborted transactions in TXNS to become empty according to > CompactionTxnHandler.cleanEmptyAbortedTxns() which means they can now be > deleted. > Once they are deleted, data that belongs to these txns is deemed committed... > We should extend COMPACTION_QUEUE state with 'f' and 's' (failed, success) > states. We should also not delete then entry from markedCleaned() > We'll have separate process that cleans 'f' and 's' records after X minutes > (or after > N records for a given partition exist). > This allows SHOW COMPACTIONS to show some history info and how many times > compaction failed on a given partition (subject to retention interval) so > that we don't have to call markCleaned() on Compactor failures at the same > time preventing Compactor to constantly getting stuck on the same bad > partition/table. > Ideally we'd want to include END_TIME field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12352) CompactionTxnHandler.markCleaned() may delete too much
[ https://issues.apache.org/jira/browse/HIVE-12352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099205#comment-15099205 ] Eugene Koifman commented on HIVE-12352: --- committed to master https://github.com/apache/hive/commit/4935cfda78577bd63f1c4ae04a26dc307e640b6f > CompactionTxnHandler.markCleaned() may delete too much > -- > > Key: HIVE-12352 > URL: https://issues.apache.org/jira/browse/HIVE-12352 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-12352.2.patch, HIVE-12352.3.patch, HIVE-12352.patch > > >Worker will start with DB in state X (wrt this partition). >while it's working more txns will happen, against partition it's > compacting. >then this will delete state up to X and since then. There may be new > delta files created >between compaction starting and cleaning. These will not be compacted > until more >transactions happen. So this ideally should only delete >up to TXN_ID that was compacted (i.e. HWM in Worker?) Then this can also > run >at READ_COMMITTED. So this means we'd want to store HWM in > COMPACTION_QUEUE when >Worker picks up the job. > Actually the problem is even worse (but also solved using HWM as above): > Suppose some transactions (against same partition) have started and aborted > since the time Worker ran compaction job. > That means there are never-compacted delta files with data that belongs to > these aborted txns. > Following will pick up these aborted txns. 
> s = "select txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and > txn_state = '" + > TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and > tc_table = '" + > info.tableName + "'"; > if (info.partName != null) s += " and tc_partition = '" + > info.partName + "'"; > The logic after that will delete relevant data from TXN_COMPONENTS and if one > of these txns becomes empty, it will be picked up by cleanEmptyAbortedTxns(). > At that point any metadata about an Aborted txn is gone and the system will > think it's committed. > HWM in this case would be (in ValidCompactorTxnList) > if(minOpenTxn > 0) > min(highWaterMark, minOpenTxn) > else > highWaterMark -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12366) Refactor Heartbeater logic for transaction
[ https://issues.apache.org/jira/browse/HIVE-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-12366: - Attachment: HIVE-12366.14.patch > Refactor Heartbeater logic for transaction > -- > > Key: HIVE-12366 > URL: https://issues.apache.org/jira/browse/HIVE-12366 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-12366.1.patch, HIVE-12366.11.patch, > HIVE-12366.12.patch, HIVE-12366.13.patch, HIVE-12366.14.patch, > HIVE-12366.2.patch, HIVE-12366.3.patch, HIVE-12366.4.patch, > HIVE-12366.5.patch, HIVE-12366.6.patch, HIVE-12366.7.patch, > HIVE-12366.8.patch, HIVE-12366.9.patch > > > Currently there is a gap between the time locks acquisition and the first > heartbeat being sent out. Normally the gap is negligible, but when it's big > it will cause query fail since the locks are timed out by the time the > heartbeat is sent. > Need to remove this gap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
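The fix direction the description implies, removing the window between lock acquisition and the first heartbeat, can be sketched with a scheduler whose initial delay is zero, so the first heartbeat fires the moment the locks are taken. Names here (start, sendHeartbeat) are mine, not Hive's actual Heartbeater API.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative heartbeat scheduler, not the HIVE-12366 patch itself.
public class HeartbeatSketch implements AutoCloseable {
    private final ScheduledExecutorService pool =
            Executors.newSingleThreadScheduledExecutor();

    // initialDelay = 0 means the first heartbeat runs immediately, so
    // there is no gap in which freshly acquired locks can time out.
    public void start(Runnable sendHeartbeat, long intervalMs) {
        pool.scheduleAtFixedRate(sendHeartbeat, 0, intervalMs,
                TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        pool.shutdownNow();
    }
}
```

Calling start() from the same code path that acquires the locks keeps the "time to first heartbeat" bounded by scheduler latency rather than by whenever the query's first periodic tick happens to fire.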
[jira] [Updated] (HIVE-12366) Refactor Heartbeater logic for transaction
[ https://issues.apache.org/jira/browse/HIVE-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-12366: - Attachment: (was: HIVE-12366.14.patch) > Refactor Heartbeater logic for transaction > -- > > Key: HIVE-12366 > URL: https://issues.apache.org/jira/browse/HIVE-12366 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-12366.1.patch, HIVE-12366.11.patch, > HIVE-12366.12.patch, HIVE-12366.13.patch, HIVE-12366.14.patch, > HIVE-12366.2.patch, HIVE-12366.3.patch, HIVE-12366.4.patch, > HIVE-12366.5.patch, HIVE-12366.6.patch, HIVE-12366.7.patch, > HIVE-12366.8.patch, HIVE-12366.9.patch > > > Currently there is a gap between the time locks acquisition and the first > heartbeat being sent out. Normally the gap is negligible, but when it's big > it will cause query fail since the locks are timed out by the time the > heartbeat is sent. > Need to remove this gap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12805) CBO: Calcite Operator To Hive Operator (Calcite Return Path): MiniTezCliDriver skewjoin.q failure
[ https://issues.apache.org/jira/browse/HIVE-12805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-12805: - Attachment: HIVE-12805.2.patch > CBO: Calcite Operator To Hive Operator (Calcite Return Path): > MiniTezCliDriver skewjoin.q failure > - > > Key: HIVE-12805 > URL: https://issues.apache.org/jira/browse/HIVE-12805 > Project: Hive > Issue Type: Sub-task > Components: CBO >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-12805.1.patch, HIVE-12805.2.patch > > > Set hive.cbo.returnpath.hiveop=true > {code} > FROM T1 a FULL OUTER JOIN T2 c ON c.key+1=a.key SELECT /*+ STREAMTABLE(a) */ > sum(hash(a.key)), sum(hash(a.val)), sum(hash(c.key)) > {code} > The stack trace: > {code} > java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 > at java.util.ArrayList.rangeCheck(ArrayList.java:635) > at java.util.ArrayList.get(ArrayList.java:411) > at > org.apache.hadoop.hive.ql.ppd.SyntheticJoinPredicate$JoinSynthetic.process(SyntheticJoinPredicate.java:183) > at > org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90) > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105) > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89) > at > org.apache.hadoop.hive.ql.lib.PreOrderOnceWalker.walk(PreOrderOnceWalker.java:43) > at > org.apache.hadoop.hive.ql.lib.PreOrderOnceWalker.walk(PreOrderOnceWalker.java:54) > at > org.apache.hadoop.hive.ql.lib.PreOrderOnceWalker.walk(PreOrderOnceWalker.java:54) > at > org.apache.hadoop.hive.ql.lib.PreOrderOnceWalker.walk(PreOrderOnceWalker.java:54) > at > org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120) > at > org.apache.hadoop.hive.ql.ppd.SyntheticJoinPredicate.transform(SyntheticJoinPredicate.java:100) > at > 
org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:236) > at > org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10170) > at > org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:231) > at > org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:237) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:471) > {code} > Same error happens in auto_sortmerge_join_6.q.out for > {code} > select count(*) FROM tbl1 a JOIN tbl2 b ON a.key = b.key join src h on > h.value = a.value > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12868) Fix empty operation-pool metrics
[ https://issues.apache.org/jira/browse/HIVE-12868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-12868: - Attachment: HIVE-12868.2.patch Thanks for the suggestion, it is a better fix. I fixed TestHs2, and also moved the code so it gets executed in the same place by both the HiveServer2 main method and MiniHs2.start() > Fix empty operation-pool metrics > > > Key: HIVE-12868 > URL: https://issues.apache.org/jira/browse/HIVE-12868 > Project: Hive > Issue Type: Sub-task > Components: Diagnosability >Reporter: Szehon Ho >Assignee: Szehon Ho > Attachments: HIVE-12868.2.patch, HIVE-12868.patch > > > The newly-added operation pool metrics (thread-pool size, queue size) are > empty because the metrics system is initialized too late. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12657) selectDistinctStar.q results differ with jdk 1.7 vs jdk 1.8
[ https://issues.apache.org/jira/browse/HIVE-12657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099100#comment-15099100 ] Pengcheng Xiong commented on HIVE-12657: [~sershe], thanks for your prompt action. Patch looks good to me +1 pending QA run. Just one question. Shall we also make invRslvMap a LinkedHashMap too? > selectDistinctStar.q results differ with jdk 1.7 vs jdk 1.8 > --- > > Key: HIVE-12657 > URL: https://issues.apache.org/jira/browse/HIVE-12657 > Project: Hive > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Prasanth Jayachandran >Assignee: Sergey Shelukhin > Attachments: HIVE-12657.patch > > > Encountered this issue when analysing test failures of HIVE-12609. > selectDistinctStar.q produces the following diff when I ran with java version > "1.7.0_55" and java version "1.8.0_60" > {code} > < 128 val_128 128 > --- > > 128 128 val_128 > 1770c1770 > < 224 val_224 224 > --- > > 224 224 val_224 > 1776c1776 > < 369 val_369 369 > --- > > 369 369 val_369 > 1799,1810c1799,1810 > < 146 val_146 146 val_146 146 val_146 2008-04-08 11 > < 150 val_150 150 val_150 150 val_150 2008-04-08 11 > < 213 val_213 213 val_213 213 val_213 2008-04-08 11 > < 238 val_238 238 val_238 238 val_238 2008-04-08 11 > < 255 val_255 255 val_255 255 val_255 2008-04-08 11 > < 273 val_273 273 val_273 273 val_273 2008-04-08 11 > < 278 val_278 278 val_278 278 val_278 2008-04-08 11 > < 311 val_311 311 val_311 311 val_311 2008-04-08 11 > < 401 val_401 401 val_401 401 val_401 2008-04-08 11 > < 406 val_406 406 val_406 406 val_406 2008-04-08 11 > < 66val_66 66 val_66 66 val_66 2008-04-08 11 > < 98val_98 98 val_98 98 val_98 2008-04-08 11 > --- > > 146 val_146 2008-04-08 11 146 val_146 146 val_146 > > 150 val_150 2008-04-08 11 150 val_150 150 val_150 > > 213 val_213 2008-04-08 11 213 val_213 213 val_213 > > 238 val_238 2008-04-08 11 238 val_238 238 val_238 > > 255 val_255 2008-04-08 11 255 val_255 255 val_255 > > 273 val_273 2008-04-08 11 273 val_273 273 val_273 > > 278 
val_278 2008-04-08 11 278 val_278 278 val_278 > > 311 val_311 2008-04-08 11 311 val_311 311 val_311 > > 401 val_401 2008-04-08 11 401 val_401 401 val_401 > > 406 val_406 2008-04-08 11 406 val_406 406 val_406 > > 66val_66 2008-04-08 11 66 val_66 66 val_66 > > 98val_98 2008-04-08 11 98 val_98 98 val_98 > 4212c4212 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
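The column reordering in the diff above is characteristic of map iteration order: JDK 8 changed HashMap's internal hashing, so any resolver map iterated during plan generation can emit columns in a different order than under JDK 7, while LinkedHashMap (the fix proposed here, and the question about invRslvMap) pins iteration to insertion order on every JDK. A minimal standalone illustration of the difference, unrelated to Hive's actual resolver classes:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class IterationOrderDemo {
    // Insert three column-like keys and report the map's iteration order.
    static List<String> keyOrder(Map<String, String> m) {
        m.put("key", "128");
        m.put("value", "val_128");
        m.put("ds", "2008-04-08");
        return new ArrayList<>(m.keySet());
    }

    public static void main(String[] args) {
        // HashMap order is a function of hash buckets and may differ across JDK versions.
        System.out.println("HashMap:       " + keyOrder(new HashMap<>()));
        // LinkedHashMap always yields insertion order, independent of the JDK.
        System.out.println("LinkedHashMap: " + keyOrder(new LinkedHashMap<>()));
    }
}
```

Golden-file tests only stay stable across JDKs when every map whose order leaks into output is a LinkedHashMap (or is explicitly sorted).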
[jira] [Updated] (HIVE-12366) Refactor Heartbeater logic for transaction
[ https://issues.apache.org/jira/browse/HIVE-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-12366: - Attachment: HIVE-12366.14.patch patch 14 for test > Refactor Heartbeater logic for transaction > -- > > Key: HIVE-12366 > URL: https://issues.apache.org/jira/browse/HIVE-12366 > Project: Hive > Issue Type: Bug > Components: Transactions >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-12366.1.patch, HIVE-12366.11.patch, > HIVE-12366.12.patch, HIVE-12366.13.patch, HIVE-12366.14.patch, > HIVE-12366.2.patch, HIVE-12366.3.patch, HIVE-12366.4.patch, > HIVE-12366.5.patch, HIVE-12366.6.patch, HIVE-12366.7.patch, > HIVE-12366.8.patch, HIVE-12366.9.patch > > > Currently there is a gap between the time locks acquisition and the first > heartbeat being sent out. Normally the gap is negligible, but when it's big > it will cause query fail since the locks are timed out by the time the > heartbeat is sent. > Need to remove this gap. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
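The gap described in HIVE-12366 — locks acquired, but the first heartbeat only sent later — can be closed by firing a heartbeat at (or before) acquisition time and only then falling back to the periodic schedule. A hypothetical sketch of that shape; the class and method names are illustrative, not Hive's actual Heartbeater API:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class HeartbeaterSketch {
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor();
    private ScheduledFuture<?> task;

    // Send the first heartbeat synchronously, before the (possibly slow) lock
    // acquisition completes, so the locks can never time out in the gap.
    public void acquireLocksWithHeartbeat(Runnable acquireLocks,
                                          Runnable sendHeartbeat,
                                          long intervalMs) {
        sendHeartbeat.run();
        task = scheduler.scheduleAtFixedRate(
            sendHeartbeat, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
        acquireLocks.run();
    }

    public void stop() {
        if (task != null) {
            task.cancel(false);
        }
        scheduler.shutdown();
    }
}
```

The design point is that heartbeat scheduling becomes part of acquisition rather than a separate step a caller might start late.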
[jira] [Updated] (HIVE-12724) ACID: Major compaction fails to include the original bucket files into MR job
[ https://issues.apache.org/jira/browse/HIVE-12724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-12724: - Attachment: HIVE-12724.ADDENDUM.1.patch > ACID: Major compaction fails to include the original bucket files into MR job > - > > Key: HIVE-12724 > URL: https://issues.apache.org/jira/browse/HIVE-12724 > Project: Hive > Issue Type: Bug > Components: Hive, Transactions >Affects Versions: 2.0.0, 2.1.0 >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-12724.1.patch, HIVE-12724.2.patch, > HIVE-12724.3.patch, HIVE-12724.ADDENDUM.1.patch, HIVE-12724.branch-1.patch > > > How the problem happens: > * Create a non-ACID table > * Before non-ACID to ACID table conversion, we inserted row one > * After non-ACID to ACID table conversion, we inserted row two > * Both rows can be retrieved before MAJOR compaction > * After MAJOR compaction, row one is lost > {code} > hive> USE acidtest; > OK > Time taken: 0.77 seconds > hive> CREATE TABLE t1 (nationkey INT, name STRING, regionkey INT, comment > STRING) > > CLUSTERED BY (regionkey) INTO 2 BUCKETS > > STORED AS ORC; > OK > Time taken: 0.179 seconds > hive> DESC FORMATTED t1; > OK > # col_namedata_type comment > nationkey int > name string > regionkey int > comment string > # Detailed Table Information > Database: acidtest > Owner:wzheng > CreateTime: Mon Dec 14 15:50:40 PST 2015 > LastAccessTime: UNKNOWN > Retention:0 > Location: file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1 > Table Type: MANAGED_TABLE > Table Parameters: > transient_lastDdlTime 1450137040 > # Storage Information > SerDe Library:org.apache.hadoop.hive.ql.io.orc.OrcSerde > InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat > OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat > Compressed: No > Num Buckets: 2 > Bucket Columns: [regionkey] > Sort Columns: [] > Storage Desc Params: > serialization.format1 > Time taken: 0.198 seconds, Fetched: 28 row(s) > hive> dfs -ls 
/Users/wzheng/hivetmp/warehouse/acidtest.db; > Found 1 items > drwxr-xr-x - wzheng staff 68 2015-12-14 15:50 > /Users/wzheng/hivetmp/warehouse/acidtest.db/t1 > hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1; > hive> INSERT INTO TABLE t1 VALUES (1, 'USA', 1, 'united states'); > WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the > future versions. Consider using a different execution engine (i.e. tez, > spark) or using Hive 1.X releases. > Query ID = wzheng_20151214155028_630098c6-605f-4e7e-a797-6b49fb48360d > Total jobs = 1 > Launching Job 1 out of 1 > Number of reduce tasks determined at compile time: 2 > In order to change the average load for a reducer (in bytes): > set hive.exec.reducers.bytes.per.reducer= > In order to limit the maximum number of reducers: > set hive.exec.reducers.max= > In order to set a constant number of reducers: > set mapreduce.job.reduces= > Job running in-process (local Hadoop) > 2015-12-14 15:51:58,070 Stage-1 map = 100%, reduce = 100% > Ended Job = job_local73977356_0001 > Loading data to table acidtest.t1 > MapReduce Jobs Launched: > Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS > Total MapReduce CPU Time Spent: 0 msec > OK > Time taken: 2.825 seconds > hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1; > Found 2 items > -rwxr-xr-x 1 wzheng staff112 2015-12-14 15:51 > /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/00_0 > -rwxr-xr-x 1 wzheng staff472 2015-12-14 15:51 > /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/01_0 > hive> SELECT * FROM t1; > OK > 1 USA 1 united states > Time taken: 0.434 seconds, Fetched: 1 row(s) > hive> ALTER TABLE t1 SET TBLPROPERTIES ('transactional' = 'true'); > OK > Time taken: 0.071 seconds > hive> DESC FORMATTED t1; > OK > # col_namedata_type comment > nationkey int > name string > regionkey int > comment string > # Detailed Table Information > Database: acidtest > Owner:wzheng > CreateTime: Mon Dec 14 15:50:40 PST 2015 > LastAccessTime: UNKNOWN > 
Retention:0 > Location: file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1 > Table Type: MANAGED_TABLE > Table Parameters: > COLUMN_STATS_ACCURATE false > last_modified_bywzheng > last_modified_time 1450137141 > numFiles2 > numRows -1 >
[jira] [Commented] (HIVE-12832) RDBMS schema changes for HIVE-11388
[ https://issues.apache.org/jira/browse/HIVE-12832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098727#comment-15098727 ] Alan Gates commented on HIVE-12832: --- Committed uber.2 patch to master. Will also commit to branch-2.0 and branch-1 shortly. > RDBMS schema changes for HIVE-11388 > --- > > Key: HIVE-12832 > URL: https://issues.apache.org/jira/browse/HIVE-12832 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Affects Versions: 1.0.0 >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-12382.patch, HIVE-12832.3.patch, > HIVE-12832.uber.2.patch, HIVE-12832.uber.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12783) fix the unit test failures in TestSparkClient and TestSparkSessionManagerImpl
[ https://issues.apache.org/jira/browse/HIVE-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098687#comment-15098687 ] Sergey Shelukhin commented on HIVE-12783: - [~owen.omalley] Should this be committed? As far as I see HiveQA has run on the latest patch and there's a +1 > fix the unit test failures in TestSparkClient and TestSparkSessionManagerImpl > - > > Key: HIVE-12783 > URL: https://issues.apache.org/jira/browse/HIVE-12783 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.0.0 >Reporter: Pengcheng Xiong >Assignee: Owen O'Malley >Priority: Blocker > Attachments: HIVE-12783.patch, HIVE-12783.patch, HIVE-12783.patch > > > This includes > {code} > org.apache.hive.spark.client.TestSparkClient.testSyncRpc > org.apache.hive.spark.client.TestSparkClient.testJobSubmission > org.apache.hive.spark.client.TestSparkClient.testMetricsCollection > org.apache.hive.spark.client.TestSparkClient.testCounters > org.apache.hive.spark.client.TestSparkClient.testRemoteClient > org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles > org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob > org.apache.hive.spark.client.TestSparkClient.testErrorJob > org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse > org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse > {code} > all of them passed on my laptop. cc'ing [~szehon], [~xuefuz], could you > please take a look? Shall we ignore them? Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12724) ACID: Major compaction fails to include the original bucket files into MR job
[ https://issues.apache.org/jira/browse/HIVE-12724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098704#comment-15098704 ] Wei Zheng commented on HIVE-12724: -- [~ekoifman] Can you review the two patches below? Addendum of patch 3 for master: HIVE-12724.ADDENDUM.1.patch Complete patch for branch-1: HIVE-12724.branch-1.2.patch > ACID: Major compaction fails to include the original bucket files into MR job > - > > Key: HIVE-12724 > URL: https://issues.apache.org/jira/browse/HIVE-12724 > Project: Hive > Issue Type: Bug > Components: Hive, Transactions >Affects Versions: 2.0.0, 2.1.0 >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-12724.1.patch, HIVE-12724.2.patch, > HIVE-12724.3.patch, HIVE-12724.4.patch, HIVE-12724.ADDENDUM.1.patch, > HIVE-12724.branch-1.2.patch, HIVE-12724.branch-1.patch > > > How the problem happens: > * Create a non-ACID table > * Before non-ACID to ACID table conversion, we inserted row one > * After non-ACID to ACID table conversion, we inserted row two > * Both rows can be retrieved before MAJOR compaction > * After MAJOR compaction, row one is lost > {code} > hive> USE acidtest; > OK > Time taken: 0.77 seconds > hive> CREATE TABLE t1 (nationkey INT, name STRING, regionkey INT, comment > STRING) > > CLUSTERED BY (regionkey) INTO 2 BUCKETS > > STORED AS ORC; > OK > Time taken: 0.179 seconds > hive> DESC FORMATTED t1; > OK > # col_namedata_type comment > nationkey int > name string > regionkey int > comment string > # Detailed Table Information > Database: acidtest > Owner:wzheng > CreateTime: Mon Dec 14 15:50:40 PST 2015 > LastAccessTime: UNKNOWN > Retention:0 > Location: file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1 > Table Type: MANAGED_TABLE > Table Parameters: > transient_lastDdlTime 1450137040 > # Storage Information > SerDe Library:org.apache.hadoop.hive.ql.io.orc.OrcSerde > InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat > OutputFormat: 
org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat > Compressed: No > Num Buckets: 2 > Bucket Columns: [regionkey] > Sort Columns: [] > Storage Desc Params: > serialization.format1 > Time taken: 0.198 seconds, Fetched: 28 row(s) > hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db; > Found 1 items > drwxr-xr-x - wzheng staff 68 2015-12-14 15:50 > /Users/wzheng/hivetmp/warehouse/acidtest.db/t1 > hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1; > hive> INSERT INTO TABLE t1 VALUES (1, 'USA', 1, 'united states'); > WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the > future versions. Consider using a different execution engine (i.e. tez, > spark) or using Hive 1.X releases. > Query ID = wzheng_20151214155028_630098c6-605f-4e7e-a797-6b49fb48360d > Total jobs = 1 > Launching Job 1 out of 1 > Number of reduce tasks determined at compile time: 2 > In order to change the average load for a reducer (in bytes): > set hive.exec.reducers.bytes.per.reducer= > In order to limit the maximum number of reducers: > set hive.exec.reducers.max= > In order to set a constant number of reducers: > set mapreduce.job.reduces= > Job running in-process (local Hadoop) > 2015-12-14 15:51:58,070 Stage-1 map = 100%, reduce = 100% > Ended Job = job_local73977356_0001 > Loading data to table acidtest.t1 > MapReduce Jobs Launched: > Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS > Total MapReduce CPU Time Spent: 0 msec > OK > Time taken: 2.825 seconds > hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1; > Found 2 items > -rwxr-xr-x 1 wzheng staff112 2015-12-14 15:51 > /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/00_0 > -rwxr-xr-x 1 wzheng staff472 2015-12-14 15:51 > /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/01_0 > hive> SELECT * FROM t1; > OK > 1 USA 1 united states > Time taken: 0.434 seconds, Fetched: 1 row(s) > hive> ALTER TABLE t1 SET TBLPROPERTIES ('transactional' = 'true'); > OK > Time taken: 0.071 seconds > hive> DESC FORMATTED 
t1; > OK > # col_namedata_type comment > nationkey int > name string > regionkey int > comment string > # Detailed Table Information > Database: acidtest > Owner:wzheng > CreateTime: Mon Dec 14 15:50:40 PST 2015 > LastAccessTime: UNKNOWN > Retention:0 > Location: file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1 > Table Type:
[jira] [Commented] (HIVE-12695) LLAP: use somebody else's cluster
[ https://issues.apache.org/jira/browse/HIVE-12695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098731#comment-15098731 ] Sergey Shelukhin commented on HIVE-12695: - After some discussion we have decided against pursuing this model. It may interfere with a future, improved security model, and we don't want to have to support it in the future. > LLAP: use somebody else's cluster > - > > Key: HIVE-12695 > URL: https://issues.apache.org/jira/browse/HIVE-12695 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Takahiko Saito >Assignee: Sergey Shelukhin > Attachments: HIVE-12695.patch > > > For non-HS2 case cluster sharing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12352) CompactionTxnHandler.markCleaned() may delete too much
[ https://issues.apache.org/jira/browse/HIVE-12352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098733#comment-15098733 ] Eugene Koifman commented on HIVE-12352: --- [~sershe] Alan will commit HIVE-12832 later today and I can commit this right after. I would really like to get HIVE-12353 into 2.0 as well - I should have a patch tomorrow > CompactionTxnHandler.markCleaned() may delete too much > -- > > Key: HIVE-12352 > URL: https://issues.apache.org/jira/browse/HIVE-12352 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-12352.2.patch, HIVE-12352.patch > > >Worker will start with DB in state X (wrt this partition). >while it's working more txns will happen, against partition it's > compacting. >then this will delete state up to X and since then. There may be new > delta files created >between compaction starting and cleaning. These will not be compacted > until more >transactions happen. So this ideally should only delete >up to TXN_ID that was compacted (i.e. HWM in Worker?) Then this can also > run >at READ_COMMITTED. So this means we'd want to store HWM in > COMPACTION_QUEUE when >Worker picks up the job. > Actually the problem is even worse (but also solved using HWM as above): > Suppose some transactions (against same partition) have started and aborted > since the time Worker ran compaction job. > That means there are never-compacted delta files with data that belongs to > these aborted txns. > Following will pick up these aborted txns. 
> s = "select txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and > txn_state = '" + > TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and > tc_table = '" + > info.tableName + "'"; > if (info.partName != null) s += " and tc_partition = '" + > info.partName + "'"; > The logic after that will delete relevant data from TXN_COMPONENTS and if one > of these txns becomes empty, it will be picked up by cleanEmptyAbortedTxns(). > At that point any metadata about an Aborted txn is gone and the system will > think it's committed. > HWM in this case would be (in ValidCompactorTxnList) > if(minOpenTxn > 0) > min(highWaterMark, minOpenTxn) > else > highWaterMark -- This message was sent by Atlassian JIRA (v6.3.4#6332)
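The high-water-mark rule quoted at the end of the description translates directly into code. A sketch following the description's own formula (the method and parameter names here are assumed for illustration, mirroring but not reproducing ValidCompactorTxnList):

```java
public class CompactorHwm {
    // Highest txn id the Cleaner may treat as compacted: never beyond the
    // smallest open transaction, when one exists. A non-positive minOpenTxn
    // is taken to mean "no open transactions".
    static long effectiveHighWaterMark(long highWaterMark, long minOpenTxn) {
        if (minOpenTxn > 0) {
            return Math.min(highWaterMark, minOpenTxn);
        }
        return highWaterMark;
    }

    public static void main(String[] args) {
        // Open txn 42 caps the HWM even though compaction saw up to txn 100.
        System.out.println(effectiveHighWaterMark(100, 42));
        // No open txns: the compaction-time HWM stands.
        System.out.println(effectiveHighWaterMark(100, -1));
    }
}
```

With this bound stored in COMPACTION_QUEUE when the Worker picks up the job, markCleaned() can restrict its deletes to entries at or below the recorded HWM instead of everything "up to X and since then".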
[jira] [Updated] (HIVE-12523) display Hive query name in explain plan
[ https://issues.apache.org/jira/browse/HIVE-12523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-12523: Description: Query name is being added by HIVE-12357 NO PRECOMMIT TESTS was: Query name is being added by HIVE-12357 > display Hive query name in explain plan > --- > > Key: HIVE-12523 > URL: https://issues.apache.org/jira/browse/HIVE-12523 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-12523.01.patch, HIVE-12523.patch > > > Query name is being added by HIVE-12357 > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12523) display Hive query name in explain plan
[ https://issues.apache.org/jira/browse/HIVE-12523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-12523: Description: Query name is being added by HIVE-12357 was: Query name is being added by HIVE-12357 NO PRECOMMIT TESTS > display Hive query name in explain plan > --- > > Key: HIVE-12523 > URL: https://issues.apache.org/jira/browse/HIVE-12523 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-12523.01.patch, HIVE-12523.patch > > > Query name is being added by HIVE-12357 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12352) CompactionTxnHandler.markCleaned() may delete too much
[ https://issues.apache.org/jira/browse/HIVE-12352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098741#comment-15098741 ] Sergey Shelukhin commented on HIVE-12352: - How about a HiveQA run for this? > CompactionTxnHandler.markCleaned() may delete too much > -- > > Key: HIVE-12352 > URL: https://issues.apache.org/jira/browse/HIVE-12352 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-12352.2.patch, HIVE-12352.patch > > >Worker will start with DB in state X (wrt this partition). >while it's working more txns will happen, against partition it's > compacting. >then this will delete state up to X and since then. There may be new > delta files created >between compaction starting and cleaning. These will not be compacted > until more >transactions happen. So this ideally should only delete >up to TXN_ID that was compacted (i.e. HWM in Worker?) Then this can also > run >at READ_COMMITTED. So this means we'd want to store HWM in > COMPACTION_QUEUE when >Worker picks up the job. > Actually the problem is even worse (but also solved using HWM as above): > Suppose some transactions (against same partition) have started and aborted > since the time Worker ran compaction job. > That means there are never-compacted delta files with data that belongs to > these aborted txns. > Following will pick up these aborted txns. > s = "select txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and > txn_state = '" + > TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and > tc_table = '" + > info.tableName + "'"; > if (info.partName != null) s += " and tc_partition = '" + > info.partName + "'"; > The logic after that will delete relevant data from TXN_COMPONENTS and if one > of these txns becomes empty, it will be picked up by cleanEmptyAbortedTxns(). 
> At that point any metadata about an Aborted txn is gone and the system will > think it's committed. > HWM in this case would be (in ValidCompactorTxnList) > if(minOpenTxn > 0) > min(highWaterMark, minOpenTxn) > else > highWaterMark -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12794) LLAP cannot run queries against HBase due to missing HBase jars
[ https://issues.apache.org/jira/browse/HIVE-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098701#comment-15098701 ] Sergey Shelukhin commented on HIVE-12794: - Posted RB. https://reviews.apache.org/r/42318/diff/1-2/ are the changes that are not yet +1-d > LLAP cannot run queries against HBase due to missing HBase jars > --- > > Key: HIVE-12794 > URL: https://issues.apache.org/jira/browse/HIVE-12794 > Project: Hive > Issue Type: Bug >Reporter: Takahiko Saito >Assignee: Sergey Shelukhin > Attachments: HIVE-12794.01.patch, HIVE-12794.02.patch, > HIVE-12794.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12794) LLAP cannot run queries against HBase due to missing HBase jars
[ https://issues.apache.org/jira/browse/HIVE-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098715#comment-15098715 ] Gunther Hagleitner commented on HIVE-12794: --- +1 to .02 (latest) > LLAP cannot run queries against HBase due to missing HBase jars > --- > > Key: HIVE-12794 > URL: https://issues.apache.org/jira/browse/HIVE-12794 > Project: Hive > Issue Type: Bug >Reporter: Takahiko Saito >Assignee: Sergey Shelukhin > Attachments: HIVE-12794.01.patch, HIVE-12794.02.patch, > HIVE-12794.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12523) display Hive query name in explain plan
[ https://issues.apache.org/jira/browse/HIVE-12523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-12523: Target Version/s: 2.0.0 Description: Query name is being added by HIVE-12357 NO PRECOMMIT TESTS Let's see if we can get these in for 2.0 was: Query name is being added by HIVE-12357 NO PRECOMMIT TESTS > display Hive query name in explain plan > --- > > Key: HIVE-12523 > URL: https://issues.apache.org/jira/browse/HIVE-12523 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-12523.01.patch, HIVE-12523.patch > > > Query name is being added by HIVE-12357 > NO PRECOMMIT TESTS > Let's see if we can get these in for 2.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12352) CompactionTxnHandler.markCleaned() may delete too much
[ https://issues.apache.org/jira/browse/HIVE-12352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098747#comment-15098747 ] Eugene Koifman commented on HIVE-12352: --- I'll rebase and get it going > CompactionTxnHandler.markCleaned() may delete too much > -- > > Key: HIVE-12352 > URL: https://issues.apache.org/jira/browse/HIVE-12352 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-12352.2.patch, HIVE-12352.patch > > >Worker will start with DB in state X (wrt this partition). >while it's working more txns will happen, against partition it's > compacting. >then this will delete state up to X and since then. There may be new > delta files created >between compaction starting and cleaning. These will not be compacted > until more >transactions happen. So this ideally should only delete >up to TXN_ID that was compacted (i.e. HWM in Worker?) Then this can also > run >at READ_COMMITTED. So this means we'd want to store HWM in > COMPACTION_QUEUE when >Worker picks up the job. > Actually the problem is even worse (but also solved using HWM as above): > Suppose some transactions (against same partition) have started and aborted > since the time Worker ran compaction job. > That means there are never-compacted delta files with data that belongs to > these aborted txns. > Following will pick up these aborted txns. > s = "select txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and > txn_state = '" + > TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and > tc_table = '" + > info.tableName + "'"; > if (info.partName != null) s += " and tc_partition = '" + > info.partName + "'"; > The logic after that will delete relevant data from TXN_COMPONENTS and if one > of these txns becomes empty, it will be picked up by cleanEmptyAbortedTxns(). 
> At that point any metadata about an Aborted txn is gone and the system will > think it's committed. > HWM in this case would be (in ValidCompactorTxnList) > if(minOpenTxn > 0) > min(highWaterMark, minOpenTxn) > else > highWaterMark -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12828) Update Spark version to 1.6
[ https://issues.apache.org/jira/browse/HIVE-12828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101220#comment-15101220 ] Xuefu Zhang commented on HIVE-12828: I think we are okay. Please feel free to commit the patch. We will address the env issue tomorrow. > Update Spark version to 1.6 > --- > > Key: HIVE-12828 > URL: https://issues.apache.org/jira/browse/HIVE-12828 > Project: Hive > Issue Type: Task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Rui Li > Attachments: HIVE-12828.1-spark.patch, HIVE-12828.2-spark.patch, > HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, > mem.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12828) Update Spark version to 1.6
[ https://issues.apache.org/jira/browse/HIVE-12828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101183#comment-15101183 ] Xuefu Zhang commented on HIVE-12828: +1 > Update Spark version to 1.6 > --- > > Key: HIVE-12828 > URL: https://issues.apache.org/jira/browse/HIVE-12828 > Project: Hive > Issue Type: Task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Rui Li > Attachments: HIVE-12828.1-spark.patch, HIVE-12828.2-spark.patch, > HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, > mem.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12661) StatsSetupConst.COLUMN_STATS_ACCURATE is not used correctly
[ https://issues.apache.org/jira/browse/HIVE-12661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101228#comment-15101228 ] Ashutosh Chauhan commented on HIVE-12661: - +1 pending QA run > StatsSetupConst.COLUMN_STATS_ACCURATE is not used correctly > --- > > Key: HIVE-12661 > URL: https://issues.apache.org/jira/browse/HIVE-12661 > Project: Hive > Issue Type: Bug >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-12661.01.patch, HIVE-12661.02.patch, > HIVE-12661.03.patch, HIVE-12661.04.patch, HIVE-12661.05.patch, > HIVE-12661.06.patch, HIVE-12661.07.patch, HIVE-12661.08.patch, > HIVE-12661.09.patch, HIVE-12661.10.patch, HIVE-12661.11.patch > > > PROBLEM: > Hive stats are autogathered properly till an 'analyze table [tablename] > compute statistics for columns' is run. Then it does not auto-update the > stats till the command is run again. repo: > {code} > set hive.stats.autogather=true; > set hive.stats.atomic=false ; > set hive.stats.collect.rawdatasize=true ; > set hive.stats.collect.scancols=false ; > set hive.stats.collect.tablekeys=false ; > set hive.stats.fetch.column.stats=true; > set hive.stats.fetch.partition.stats=true ; > set hive.stats.reliable=false ; > set hive.compute.query.using.stats=true; > CREATE TABLE `default`.`calendar` (`year` int) ROW FORMAT SERDE > 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' TBLPROPERTIES ( > 'orc.compress'='NONE') ; > insert into calendar values (2010), (2011), (2012); > select * from calendar; > ++--+ > | calendar.year | > ++--+ > | 2010 | > | 2011 | > | 2012 | > ++--+ > select max(year) from calendar; > | 2012 | > insert into calendar values (2013); > select * from calendar; > ++--+ > | calendar.year | > ++--+ > | 2010 | > | 2011 | > | 2012 | > | 2013 | > ++--+ > select max(year) from calendar; > | 2013 | > insert into calendar values 
(2014); > select max(year) from calendar; > | 2014 | > analyze table calendar compute statistics for columns; > insert into calendar values (2015); > select max(year) from calendar; > | 2014 | > insert into calendar values (2016), (2017), (2018); > select max(year) from calendar; > | 2014 | > analyze table calendar compute statistics for columns; > select max(year) from calendar; > | 2018 | > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9862) Vectorized execution corrupts timestamp values
[ https://issues.apache.org/jira/browse/HIVE-9862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101203#comment-15101203 ] Hive QA commented on HIVE-9862: --- Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12782388/HIVE-9862.06.patch {color:green}SUCCESS:{color} +1 due to 14 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 9718 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorTypeCasts.testCastTimestampToDouble org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark.testSparkQuery org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark.testTempTable org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6630/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6630/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6630/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: 
TestsFailedException: 11 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12782388 - PreCommit-HIVE-TRUNK-Build > Vectorized execution corrupts timestamp values > -- > > Key: HIVE-9862 > URL: https://issues.apache.org/jira/browse/HIVE-9862 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 1.0.0 >Reporter: Nathan Howell >Assignee: Matt McCline > Attachments: HIVE-9862.01.patch, HIVE-9862.02.patch, > HIVE-9862.03.patch, HIVE-9862.04.patch, HIVE-9862.05.patch, HIVE-9862.06.patch > > > Timestamps in the future (year 2250?) and before ~1700 are silently corrupted > in vectorized execution mode. Simple repro: > {code} > hive> DROP TABLE IF EXISTS test; > hive> CREATE TABLE test(ts TIMESTAMP) STORED AS ORC; > hive> INSERT INTO TABLE test VALUES ('-12-31 23:59:59'); > hive> SET hive.vectorized.execution.enabled = false; > hive> SELECT MAX(ts) FROM test; > -12-31 23:59:59 > hive> SET hive.vectorized.execution.enabled = true; > hive> SELECT MAX(ts) FROM test; > 1816-03-30 05:56:07.066277376 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
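A plausible mechanism for the corruption described above (my assumption; not confirmed anywhere in this thread) is that the vectorized path represents a timestamp as nanoseconds since the epoch in a signed 64-bit long, which can only span roughly 1677-09-21 through 2262-04-11, so dates outside that window overflow silently. The sketch below uses 9999-12-31 purely as an illustrative extreme date:

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Assumed-cause sketch: a signed 64-bit nanoseconds-since-epoch counter covers
// only ~1677..2262, so an extreme timestamp overflows rather than fitting.
public class TimestampOverflowSketch {
    public static void main(String[] args) {
        long seconds = LocalDateTime.of(9999, 12, 31, 23, 59, 59)
                .toEpochSecond(ZoneOffset.UTC);
        try {
            // Converting seconds to nanoseconds exceeds Long.MAX_VALUE here.
            Math.multiplyExact(seconds, 1_000_000_000L);
            System.out.println("fits in a long");
        } catch (ArithmeticException e) {
            System.out.println("overflows a signed 64-bit long");
        }
    }
}
```

With plain `*` instead of `multiplyExact`, the product would wrap around instead of throwing, which would explain a silently corrupted value rather than an error.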
[jira] [Commented] (HIVE-12828) Update Spark version to 1.6
[ https://issues.apache.org/jira/browse/HIVE-12828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101204#comment-15101204 ] Rui Li commented on HIVE-12828: --- [~xuefuz], do we need to make parquet_join pass here? > Update Spark version to 1.6 > --- > > Key: HIVE-12828 > URL: https://issues.apache.org/jira/browse/HIVE-12828 > Project: Hive > Issue Type: Task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Rui Li > Attachments: HIVE-12828.1-spark.patch, HIVE-12828.2-spark.patch, > HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, > mem.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12826) Vectorization: VectorUDAF* suspect isNull checks
[ https://issues.apache.org/jira/browse/HIVE-12826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101201#comment-15101201 ] Matt McCline commented on HIVE-12826: - +1 LGTM > Vectorization: VectorUDAF* suspect isNull checks > > > Key: HIVE-12826 > URL: https://issues.apache.org/jira/browse/HIVE-12826 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 1.3.0, 2.0.0, 2.1.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: HIVE-12826.1.patch > > > for isRepeating=true, checking isNull[selected[i]] might return incorrect > results (without a heavy array fill of isNull). > VectorUDAFSum/Min/Max/Avg and SumDecimal impls need to be reviewed for this > pattern. > {code} > private void iterateHasNullsRepeatingSelectionWithAggregationSelection( > VectorAggregationBufferRow[] aggregationBufferSets, > int aggregateIndex, >value, > int batchSize, > int[] selection, > boolean[] isNull) { > > for (int i=0; i < batchSize; ++i) { > if (!isNull[selection[i]]) { > Aggregation myagg = getCurrentAggregationBuffer( > aggregationBufferSets, > aggregateIndex, > i); > myagg.sumValue(value); > } > } > } > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
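To make the suspect pattern concrete, here is a minimal self-contained sketch (simplified types and method names; this is not the actual VectorUDAF code): when a batch has isRepeating == true, only slot 0 of its arrays is meaningful, so indexing isNull through selection[i] can read stale flags left over from a previous batch and silently drop rows.

```java
// Illustration of the HIVE-12826 pattern with invented, simplified names.
public class RepeatingNullSketch {
    // Buggy variant: consults isNull[selection[i]] even though the batch repeats,
    // so stale null flags in the selected slots suppress valid rows.
    static long sumBuggy(long value, int batchSize, int[] selection, boolean[] isNull) {
        long sum = 0;
        for (int i = 0; i < batchSize; ++i) {
            if (!isNull[selection[i]]) {
                sum += value;
            }
        }
        return sum;
    }

    // Fixed variant: for a repeating batch the only valid null flag is isNull[0].
    static long sumFixed(long value, int batchSize, boolean[] isNull) {
        return isNull[0] ? 0 : value * (long) batchSize;
    }

    public static void main(String[] args) {
        // A repeating, logically non-null batch of the value 5, but isNull still
        // carries stale 'true' flags in the selected slots.
        int[] selection = {2, 3};
        boolean[] isNull = {false, false, true, true};
        System.out.println(sumBuggy(5, 2, selection, isNull)); // prints 0 (rows dropped)
        System.out.println(sumFixed(5, 2, isNull));            // prints 10 (correct)
    }
}
```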
[jira] [Updated] (HIVE-12863) fix test failure for TestMiniTezCliDriver.testCliDriver_tez_union
[ https://issues.apache.org/jira/browse/HIVE-12863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-12863: --- Attachment: HIVE-12863.01.patch > fix test failure for TestMiniTezCliDriver.testCliDriver_tez_union > - > > Key: HIVE-12863 > URL: https://issues.apache.org/jira/browse/HIVE-12863 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-12863.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12863) fix test failure for TestMiniTezCliDriver.testCliDriver_tez_union
[ https://issues.apache.org/jira/browse/HIVE-12863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101243#comment-15101243 ] Pengcheng Xiong commented on HIVE-12863: [~alangates] and [~daijy], could you take a look at the patch? I think it is related to the HBaseStore. Thanks. > fix test failure for TestMiniTezCliDriver.testCliDriver_tez_union > - > > Key: HIVE-12863 > URL: https://issues.apache.org/jira/browse/HIVE-12863 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-12863.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12863) fix test failure for TestMiniTezCliDriver.testCliDriver_tez_union
[ https://issues.apache.org/jira/browse/HIVE-12863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-12863: --- Attachment: HIVE-12863.01.patch > fix test failure for TestMiniTezCliDriver.testCliDriver_tez_union > - > > Key: HIVE-12863 > URL: https://issues.apache.org/jira/browse/HIVE-12863 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-12863.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12863) fix test failure for TestMiniTezCliDriver.testCliDriver_tez_union
[ https://issues.apache.org/jira/browse/HIVE-12863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-12863: --- Attachment: (was: HIVE-12863.01.patch) > fix test failure for TestMiniTezCliDriver.testCliDriver_tez_union > - > > Key: HIVE-12863 > URL: https://issues.apache.org/jira/browse/HIVE-12863 > Project: Hive > Issue Type: Sub-task >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-12863.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9862) Vectorized execution corrupts timestamp values
[ https://issues.apache.org/jira/browse/HIVE-9862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099038#comment-15099038 ] Hive QA commented on HIVE-9862: --- Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12782279/HIVE-9862.05.patch {color:red}ERROR:{color} -1 due to build exiting with an error Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6626/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6626/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6626/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Tests exited with: ExecutionException: java.util.concurrent.ExecutionException: org.apache.hive.ptest.execution.ssh.SSHExecutionException: RSyncResult [localFile=/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-6626/succeeded/TestAcidUtils, remoteFile=/home/hiveptest/174.129.104.177-hiveptest-2/logs/, getExitCode()=12, getException()=null, getUser()=hiveptest, getHost()=174.129.104.177, getInstance()=2]: 'ssh: connect to host 174.129.104.177 port 22: Connection timed out rsync: connection unexpectedly closed (0 bytes received so far) [receiver] rsync error: unexplained error (code 255) at io.c(600) [receiver=3.0.6] ssh: connect to host 174.129.104.177 port 22: Connection timed out rsync: connection unexpectedly closed (0 bytes received so far) [receiver] rsync error: error in rsync protocol data stream (code 12) at io.c(600) [receiver=3.0.6] ssh: connect to host 174.129.104.177 port 22: Connection timed out rsync: connection unexpectedly closed (0 bytes received so far) [receiver] rsync error: error in rsync protocol data stream (code 12) at io.c(600) [receiver=3.0.6] ssh: connect 
to host 174.129.104.177 port 22: Connection timed out rsync: connection unexpectedly closed (0 bytes received so far) [receiver] rsync error: error in rsync protocol data stream (code 12) at io.c(600) [receiver=3.0.6] ssh: connect to host 174.129.104.177 port 22: Connection timed out rsync: connection unexpectedly closed (0 bytes received so far) [receiver] rsync error: error in rsync protocol data stream (code 12) at io.c(600) [receiver=3.0.6] ' {noformat} This message is automatically generated. ATTACHMENT ID: 12782279 - PreCommit-HIVE-TRUNK-Build > Vectorized execution corrupts timestamp values > -- > > Key: HIVE-9862 > URL: https://issues.apache.org/jira/browse/HIVE-9862 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 1.0.0 >Reporter: Nathan Howell >Assignee: Matt McCline > Attachments: HIVE-9862.01.patch, HIVE-9862.02.patch, > HIVE-9862.03.patch, HIVE-9862.04.patch, HIVE-9862.05.patch > > > Timestamps in the future (year 2250?) and before ~1700 are silently corrupted > in vectorized execution mode. Simple repro: > {code} > hive> DROP TABLE IF EXISTS test; > hive> CREATE TABLE test(ts TIMESTAMP) STORED AS ORC; > hive> INSERT INTO TABLE test VALUES ('-12-31 23:59:59'); > hive> SET hive.vectorized.execution.enabled = false; > hive> SELECT MAX(ts) FROM test; > -12-31 23:59:59 > hive> SET hive.vectorized.execution.enabled = true; > hive> SELECT MAX(ts) FROM test; > 1816-03-30 05:56:07.066277376 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12875) Verify sem.getInputs() and sem.getOutputs()
[ https://issues.apache.org/jira/browse/HIVE-12875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099144#comment-15099144 ] Sushanth Sowmyan commented on HIVE-12875: - (Will upload patch shortly) > Verify sem.getInputs() and sem.getOutputs() > --- > > Key: HIVE-12875 > URL: https://issues.apache.org/jira/browse/HIVE-12875 > Project: Hive > Issue Type: Bug >Reporter: Sushanth Sowmyan >Assignee: Sushanth Sowmyan > > For every partition entity object present in sem.getInputs() and > sem.getOutputs(), we must ensure that the appropriate Table is also added to > the list of entities. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12661) StatsSetupConst.COLUMN_STATS_ACCURATE is not used correctly
[ https://issues.apache.org/jira/browse/HIVE-12661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-12661: --- Attachment: HIVE-12661.11.patch address [~ashutoshc]'s comments. > StatsSetupConst.COLUMN_STATS_ACCURATE is not used correctly > --- > > Key: HIVE-12661 > URL: https://issues.apache.org/jira/browse/HIVE-12661 > Project: Hive > Issue Type: Bug >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-12661.01.patch, HIVE-12661.02.patch, > HIVE-12661.03.patch, HIVE-12661.04.patch, HIVE-12661.05.patch, > HIVE-12661.06.patch, HIVE-12661.07.patch, HIVE-12661.08.patch, > HIVE-12661.09.patch, HIVE-12661.10.patch, HIVE-12661.11.patch > > > PROBLEM: > Hive stats are autogathered properly till an 'analyze table [tablename] > compute statistics for columns' is run. Then it does not auto-update the > stats till the command is run again. repo: > {code} > set hive.stats.autogather=true; > set hive.stats.atomic=false ; > set hive.stats.collect.rawdatasize=true ; > set hive.stats.collect.scancols=false ; > set hive.stats.collect.tablekeys=false ; > set hive.stats.fetch.column.stats=true; > set hive.stats.fetch.partition.stats=true ; > set hive.stats.reliable=false ; > set hive.compute.query.using.stats=true; > CREATE TABLE `default`.`calendar` (`year` int) ROW FORMAT SERDE > 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' TBLPROPERTIES ( > 'orc.compress'='NONE') ; > insert into calendar values (2010), (2011), (2012); > select * from calendar; > ++--+ > | calendar.year | > ++--+ > | 2010 | > | 2011 | > | 2012 | > ++--+ > select max(year) from calendar; > | 2012 | > insert into calendar values (2013); > select * from calendar; > ++--+ > | calendar.year | > ++--+ > | 2010 | > | 2011 | > | 2012 | > | 2013 | > ++--+ > select max(year) from calendar; > | 2013 | > insert into 
calendar values (2014); > select max(year) from calendar; > | 2014 | > analyze table calendar compute statistics for columns; > insert into calendar values (2015); > select max(year) from calendar; > | 2014 | > insert into calendar values (2016), (2017), (2018); > select max(year) from calendar; > | 2014 | > analyze table calendar compute statistics for columns; > select max(year) from calendar; > | 2018 | > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12832) RDBMS schema changes for HIVE-11388
[ https://issues.apache.org/jira/browse/HIVE-12832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-12832: -- Attachment: HIVE-12832.uber.2.branch-2.0.patch Attaching the patch I used for branch-2.0 > RDBMS schema changes for HIVE-11388 > --- > > Key: HIVE-12832 > URL: https://issues.apache.org/jira/browse/HIVE-12832 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Affects Versions: 1.0.0 >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-12382.patch, HIVE-12832.3.patch, > HIVE-12832.uber.2.branch-2.0.patch, HIVE-12832.uber.2.patch, > HIVE-12832.uber.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10632) Make sure TXN_COMPONENTS gets cleaned up if table is dropped before compaction.
[ https://issues.apache.org/jira/browse/HIVE-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099194#comment-15099194 ] Eugene Koifman commented on HIVE-10632: --- Also, note that insert into T values(...) generates an auxiliary values_tmp_table_N which ends up in ACID metastore tables. This is a temp table so it gets cleaned up but it still causes "noise" in ACID subsystem. > Make sure TXN_COMPONENTS gets cleaned up if table is dropped before > compaction. > --- > > Key: HIVE-10632 > URL: https://issues.apache.org/jira/browse/HIVE-10632 > Project: Hive > Issue Type: Bug > Components: Metastore, Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > > The compaction process will clean up entries in TXNS, > COMPLETED_TXN_COMPONENTS, TXN_COMPONENTS. If the table/partition is dropped > before compaction is complete there will be data left in these tables. Need > to investigate if there are other situations where this may happen and > address it. > see HIVE-10595 for additional info -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12876) OpenCSV Serde treats everything as strings.
[ https://issues.apache.org/jira/browse/HIVE-12876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-12876: --- Affects Version/s: 1.2.1 > OpenCSV Serde treats everything as strings. > --- > > Key: HIVE-12876 > URL: https://issues.apache.org/jira/browse/HIVE-12876 > Project: Hive > Issue Type: Bug > Components: Hive, Serializers/Deserializers >Affects Versions: 1.2.1 >Reporter: Carter Shanklin > > This one caught me by surprise after some wrong results. I'm filing this as > Brock suggested in HIVE-. Back to Ctrl-A for me it seems. > To repro: > {code} > drop table int_table; > create table int_table( > x int > ) > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'; > show create table int_table; > {code} > And note that x is a string. > Applicable to Hive 1.2.1. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12877) Hive queries that use an index will lose some data if the query file is compressed.
[ https://issues.apache.org/jira/browse/HIVE-12877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yangfang updated HIVE-12877: Description: Hive creates the index using the extracted (uncompressed) file length when the file is compressed, but when MapReduce divides the data into splits, Hive compares the on-disk file length with the extracted file length; if the two lengths do not match, it filters out the file, so the query loses some data. was: Hive creates the index using the extracted (uncompressed) file length when the file is compressed, but when MapReduce divides the data into splits, Hive compares the on-disk file length with the extracted file length; if the two lengths do not match, it filters out the file, so the query loses some data > Hive queries that use an index will lose some data if the query file is > compressed. > --- > > Key: HIVE-12877 > URL: https://issues.apache.org/jira/browse/HIVE-12877 > Project: Hive > Issue Type: Bug > Components: Indexing >Affects Versions: 1.2.1 > Environment: This problem exists in all Hive versions, no matter what > platform >Reporter: yangfang > Attachments: HIVE-12877.patch > > > Hive creates the index using the extracted (uncompressed) file length when the file is > compressed, > but when MapReduce divides the data into splits, Hive compares the on-disk file length > with the extracted file length; > if the two lengths do not match, it filters out the > file, so the query loses some data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12839) Upgrade Hive to Calcite 1.6
[ https://issues.apache.org/jira/browse/HIVE-12839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-12839: --- Attachment: HIVE-12839.02.patch > Upgrade Hive to Calcite 1.6 > --- > > Key: HIVE-12839 > URL: https://issues.apache.org/jira/browse/HIVE-12839 > Project: Hive > Issue Type: Improvement >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-12839.01.patch, HIVE-12839.02.patch > > > CLEAR LIBRARY CACHE > Upgrade Hive to Calcite 1.6.0-incubating. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12877) Hive queries that use an index will lose some data if the query file is compressed.
[ https://issues.apache.org/jira/browse/HIVE-12877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yangfang updated HIVE-12877: Description: Hive creates the index using the extracted (uncompressed) file length when the file is compressed, but when MapReduce divides the data into splits, Hive compares the on-disk file length with the extracted file length; if the two lengths do not match, it filters out the file, so the query loses some data. I modified the source code so that the Hive index can be used when the files are compressed; please test it. was: Hive creates the index using the extracted (uncompressed) file length when the file is compressed, but when MapReduce divides the data into splits, Hive compares the on-disk file length with the extracted file length; if the two lengths do not match, it filters out the file, so the query loses some data. > Hive queries that use an index will lose some data if the query file is > compressed. > --- > > Key: HIVE-12877 > URL: https://issues.apache.org/jira/browse/HIVE-12877 > Project: Hive > Issue Type: Bug > Components: Indexing >Affects Versions: 1.2.1 > Environment: This problem exists in all Hive versions, no matter what > platform >Reporter: yangfang > Attachments: HIVE-12877.patch > > > Hive creates the index using the extracted (uncompressed) file length when the file is > compressed, > but when MapReduce divides the data into splits, Hive compares the on-disk file length > with the extracted file length; > if the two lengths do not match, it filters out the > file, so the query loses some data. > I modified the source code so that the Hive index can be used when the files are > compressed; please test it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
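A hypothetical illustration of the mismatch described in HIVE-12877 (all names here are invented for illustration; this is not Hive's split logic): if the index records the uncompressed length but the split phase compares against the compressed on-disk length, the equality check never holds for compressed files and the file is wrongly skipped.

```java
// Invented-name sketch of a length-equality filter that drops compressed files.
public class SplitFilterSketch {
    // Keep the file only when the on-disk length matches the length the index recorded.
    static boolean keepFile(long onDiskLength, long indexedLength) {
        return onDiskLength == indexedLength;
    }

    public static void main(String[] args) {
        // Uncompressed file: both lengths agree, file is kept.
        System.out.println(keepFile(4_000, 4_000)); // prints true
        // Compressed file: index stored the extracted length (4000) but the file
        // is 1000 bytes on disk, so the file is filtered out and its rows vanish.
        System.out.println(keepFile(1_000, 4_000)); // prints false
    }
}
```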
[jira] [Commented] (HIVE-12828) Update Spark version to 1.6
[ https://issues.apache.org/jira/browse/HIVE-12828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098777#comment-15098777 ] Hive QA commented on HIVE-12828: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12782315/HIVE-12828.2-spark.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 45 failed/errored test(s), 9312 tests executed *Failed tests:* {noformat} TestCliDriver-auto_sortmerge_join_7.q-exim_04_evolved_parts.q-query_with_semi.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-bool_literal.q-authorization_cli_createtab.q-explain_ddl.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-bucketsortoptimize_insert_7.q-list_bucket_query_multiskew_1.q-skewjoin_noskew.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-cp_mj_rc.q-decimal_2.q-union32.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-groupby_map_ppr_multi_distinct.q-vectorization_16.q-union_remove_15.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-join39.q-exim_07_all_part_over_nonoverlap.q-cbo_windowing.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-join9.q-insert_values_partitioned.q-progress_1.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-join_cond_pushdown_unqual4.q-udf_var_samp.q-load_dyn_part2.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-metadata_export_drop.q-udf_sin.q-udf_reverse.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-orc_split_elimination.q-udf_xpath_string.q-partition_wise_fileformat.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-ptf_general_queries.q-unionDistinct_1.q-groupby1_noskew.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-push_or.q-infer_bucket_sort_list_bucket.q-vector_interval_2.q-and-12-more - did not produce a TEST-*.xml file 
TestCliDriver-script_pipe.q-auto_join24.q-cast1.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-show_conf.q-udaf_covar_samp.q-udf_md5.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-skewjoin_mapjoin4.q-groupby6_map.q-cbo_rp_union.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-smb_mapjoin_4.q-udf_asin.q-udf_to_unix_timestamp.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-stats13.q-join_parse.q-sort_merge_join_desc_2.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-timestamp_lazy.q-union29.q-groupby_ppd.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-timestamp_literal.q-inputddl8.q-runtime_skewjoin_mapjoin_spark.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-udf_current_user.q-join44.q-union2.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-udf_nvl.q-alter_char1.q-serde_reported_schema.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-union36.q-acid_join.q-part_inherit_tbl_props_empty.q-and-6-more - did not produce a TEST-*.xml file TestCliDriver-varchar_nested_types.q-leadlag.q-semicolon.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-vector_distinct_2.q-nullscript.q-vector_char_mapjoin1.q-and-12-more - did not produce a TEST-*.xml file TestCliDriver-vectorization_limit.q-union19.q-groupby_grouping_sets6.q-and-12-more - did not produce a TEST-*.xml file TestHWISessionManager - did not produce a TEST-*.xml file TestMiniTezCliDriver-auto_join30.q-vector_decimal_10_0.q-schema_evol_orc_acidvec_mapwork_part.q-and-12-more - did not produce a TEST-*.xml file TestMiniTezCliDriver-vector_grouping_sets.q-mapjoin_mapjoin.q-update_all_partitioned.q-and-12-more - did not produce a TEST-*.xml file TestMiniTezCliDriver-vector_interval_2.q-constprog_dpp.q-dynamic_partition_pruning.q-and-12-more - did not produce a TEST-*.xml file 
TestMiniTezCliDriver-vector_non_string_partition.q-delete_where_non_partitioned.q-auto_sortmerge_join_16.q-and-12-more - did not produce a TEST-*.xml file TestMiniTezCliDriver-vectorization_13.q-tez_bmj_schema_evolution.q-bucket3.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-auto_join30.q-join9.q-input17.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-avro_joins.q-join36.q-join4.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-bucketmapjoin3.q-enforce_order.q-union11.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby6_map.q-join13.q-join_reorder3.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-groupby_grouping_id2.q-vectorization_13.q-auto_sortmerge_join_13.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-input1_limit.q-groupby8_map.q-varchar_join1.q-and-12-more - did not produce a TEST-*.xml file TestSparkCliDriver-join_cond_pushdown_3.q-groupby7.q-auto_join17.q-and-12-more - did not produce a TEST-*.xml
[jira] [Updated] (HIVE-12352) CompactionTxnHandler.markCleaned() may delete too much
[ https://issues.apache.org/jira/browse/HIVE-12352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-12352: -- Attachment: HIVE-12352.3.patch > CompactionTxnHandler.markCleaned() may delete too much > -- > > Key: HIVE-12352 > URL: https://issues.apache.org/jira/browse/HIVE-12352 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-12352.2.patch, HIVE-12352.3.patch, HIVE-12352.patch > > >Worker will start with DB in state X (wrt this partition). >while it's working more txns will happen, against partition it's > compacting. >then this will delete state up to X and since then. There may be new > delta files created >between compaction starting and cleaning. These will not be compacted > until more >transactions happen. So this ideally should only delete >up to TXN_ID that was compacted (i.e. HWM in Worker?) Then this can also > run >at READ_COMMITTED. So this means we'd want to store HWM in > COMPACTION_QUEUE when >Worker picks up the job. > Actually the problem is even worse (but also solved using HWM as above): > Suppose some transactions (against same partition) have started and aborted > since the time Worker ran compaction job. > That means there are never-compacted delta files with data that belongs to > these aborted txns. > Following will pick up these aborted txns. > s = "select txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and > txn_state = '" + > TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and > tc_table = '" + > info.tableName + "'"; > if (info.partName != null) s += " and tc_partition = '" + > info.partName + "'"; > The logic after that will delete relevant data from TXN_COMPONENTS and if one > of these txns becomes empty, it will be picked up by cleanEmptyAbortedTxns(). 
> At that point any metadata about an Aborted txn is gone and the system will > think it's committed. > HWM in this case would be (in ValidCompactorTxnList) > if(minOpenTxn > 0) > min(highWaterMark, minOpenTxn) > else > highWaterMark -- This message was sent by Atlassian JIRA (v6.3.4#6332)
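The high-water-mark rule quoted at the end of the HIVE-12352 description can be sketched as a one-liner (field names assumed from the description, not taken from Hive's actual `ValidCompactorTxnList`): cap the cleanable range at the lowest still-open transaction so data newer than the compaction is never deleted.

```java
// Sketch of the HWM rule from the issue text: min(highWaterMark, minOpenTxn)
// when there is an open transaction, otherwise just highWaterMark.
public class CompactionHwmSketch {
    static long cleanableHighWaterMark(long highWaterMark, long minOpenTxn) {
        return minOpenTxn > 0 ? Math.min(highWaterMark, minOpenTxn) : minOpenTxn == 0 ? highWaterMark : highWaterMark;
    }

    public static void main(String[] args) {
        // An open txn at 40 caps cleanup below the compaction HWM of 100.
        System.out.println(cleanableHighWaterMark(100, 40)); // prints 40
        // No open txns: the full compacted range up to 100 is cleanable.
        System.out.println(cleanableHighWaterMark(100, 0));  // prints 100
    }
}
```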
[jira] [Commented] (HIVE-12868) Fix empty operation-pool metrics
[ https://issues.apache.org/jira/browse/HIVE-12868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098866#comment-15098866 ] Szehon Ho commented on HIVE-12868: -- [~jxiang] could you help me take a quick look at this patch? Thanks > Fix empty operation-pool metrics > > > Key: HIVE-12868 > URL: https://issues.apache.org/jira/browse/HIVE-12868 > Project: Hive > Issue Type: Sub-task > Components: Diagnosability >Reporter: Szehon Ho >Assignee: Szehon Ho > Attachments: HIVE-12868.patch > > > The newly-added operation pool metrics (thread-pool size, queue size) are > empty because metrics system is initialized too late. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-12657) selectDistinctStar.q results differ with jdk 1.7 vs jdk 1.8
[ https://issues.apache.org/jira/browse/HIVE-12657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-12657: --- Assignee: Sergey Shelukhin (was: Pengcheng Xiong) > selectDistinctStar.q results differ with jdk 1.7 vs jdk 1.8 > --- > > Key: HIVE-12657 > URL: https://issues.apache.org/jira/browse/HIVE-12657 > Project: Hive > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Prasanth Jayachandran >Assignee: Sergey Shelukhin > > Encountered this issue when analysing test failures of HIVE-12609. > selectDistinctStar.q produces the following diff when I ran with java version > "1.7.0_55" and java version "1.8.0_60" > {code} > < 128 val_128 128 > --- > > 128 128 val_128 > 1770c1770 > < 224 val_224 224 > --- > > 224 224 val_224 > 1776c1776 > < 369 val_369 369 > --- > > 369 369 val_369 > 1799,1810c1799,1810 > < 146 val_146 146 val_146 146 val_146 2008-04-08 11 > < 150 val_150 150 val_150 150 val_150 2008-04-08 11 > < 213 val_213 213 val_213 213 val_213 2008-04-08 11 > < 238 val_238 238 val_238 238 val_238 2008-04-08 11 > < 255 val_255 255 val_255 255 val_255 2008-04-08 11 > < 273 val_273 273 val_273 273 val_273 2008-04-08 11 > < 278 val_278 278 val_278 278 val_278 2008-04-08 11 > < 311 val_311 311 val_311 311 val_311 2008-04-08 11 > < 401 val_401 401 val_401 401 val_401 2008-04-08 11 > < 406 val_406 406 val_406 406 val_406 2008-04-08 11 > < 66val_66 66 val_66 66 val_66 2008-04-08 11 > < 98val_98 98 val_98 98 val_98 2008-04-08 11 > --- > > 146 val_146 2008-04-08 11 146 val_146 146 val_146 > > 150 val_150 2008-04-08 11 150 val_150 150 val_150 > > 213 val_213 2008-04-08 11 213 val_213 213 val_213 > > 238 val_238 2008-04-08 11 238 val_238 238 val_238 > > 255 val_255 2008-04-08 11 255 val_255 255 val_255 > > 273 val_273 2008-04-08 11 273 val_273 273 val_273 > > 278 val_278 2008-04-08 11 278 val_278 278 val_278 > > 311 val_311 2008-04-08 11 311 val_311 311 val_311 > > 401 val_401 2008-04-08 11 401 val_401 401 val_401 > > 406 
val_406 2008-04-08 11 406 val_406 406 val_406 > > 66val_66 2008-04-08 11 66 val_66 66 val_66 > > 98val_98 2008-04-08 11 98 val_98 98 val_98 > 4212c4212 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12864) StackOverflowError parsing queries with very large predicates
[ https://issues.apache.org/jira/browse/HIVE-12864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098885#comment-15098885 ] Hive QA commented on HIVE-12864: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12782246/HIVE-12864.01.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10018 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6625/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6625/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6625/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12782246 - PreCommit-HIVE-TRUNK-Build > StackOverflowError parsing queries with very large predicates > - > > Key: HIVE-12864 > URL: https://issues.apache.org/jira/browse/HIVE-12864 > Project: Hive > Issue Type: Bug > Components: Parser >Affects Versions: 2.0.0, 2.1.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-12864.01.patch, HIVE-12864.patch > > > We have seen that queries with very large predicates might fail with the > following stacktrace: > {noformat} > 016-01-12 05:47:36,516|beaver.machine|INFO|552|5072|Thread-22|Exception in > thread "main" java.lang.StackOverflowError > 2016-01-12 05:47:36,517|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:145) > 2016-01-12 05:47:36,517|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,517|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,517|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,517|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > 
org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at > org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146) > 2016-01-12
[jira] [Commented] (HIVE-12864) StackOverflowError parsing queries with very large predicates
[ https://issues.apache.org/jira/browse/HIVE-12864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098893#comment-15098893 ] Jesus Camacho Rodriguez commented on HIVE-12864: Clean QA. [~ashutoshc]/[~jpullokkaran], could you take a look? Thanks > StackOverflowError parsing queries with very large predicates > Key: HIVE-12864 > URL: https://issues.apache.org/jira/browse/HIVE-12864 > Project: Hive > Issue Type: Bug > Components: Parser > Affects Versions: 2.0.0, 2.1.0 > Reporter: Jesus Camacho Rodriguez > Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-12864.01.patch, HIVE-12864.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
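The repeated `setUnknownTokenBoundaries` frames in the trace above show ANTLR's `CommonTree` recursing once per nested predicate node, so a sufficiently deep expression tree exhausts the call stack. A minimal illustration of the usual remedy (not the actual HIVE-12864 patch): walk the tree with an explicit stack instead of recursion, so a chain-shaped tree of any depth can be traversed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class IterativeWalk {
    static class Node {
        final List<Node> children = new ArrayList<>();
        void add(Node c) { children.add(c); }
    }

    // Count nodes with an explicit stack; a recursive walk of a degenerate
    // (list-shaped) tree of this depth would overflow the JVM call stack.
    static int countNodes(Node root) {
        int count = 0;
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            count++;
            for (Node c : n.children) {
                stack.push(c);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Build a chain 100,000 deep, like a giant left-leaning AND/OR
        // predicate tree; far beyond the default ~10k recursion frames.
        Node root = new Node();
        Node cur = root;
        for (int i = 0; i < 100_000; i++) {
            Node next = new Node();
            cur.add(next);
            cur = next;
        }
        System.out.println(countNodes(root)); // prints 100001
    }
}
```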
[jira] [Commented] (HIVE-12868) Fix empty operation-pool metrics
[ https://issues.apache.org/jira/browse/HIVE-12868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098928#comment-15098928 ] Jimmy Xiang commented on HIVE-12868: This patch is ok. +1 Are you going to have a follow up patch to fix miniHs2/test? > Fix empty operation-pool metrics > > > Key: HIVE-12868 > URL: https://issues.apache.org/jira/browse/HIVE-12868 > Project: Hive > Issue Type: Sub-task > Components: Diagnosability >Reporter: Szehon Ho >Assignee: Szehon Ho > Attachments: HIVE-12868.patch > > > The newly-added operation pool metrics (thread-pool size, queue size) are > empty because metrics system is initialized too late. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12724) ACID: Major compaction fails to include the original bucket files into MR job
[ https://issues.apache.org/jira/browse/HIVE-12724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-12724: - Attachment: HIVE-12724.branch-1.2.patch > ACID: Major compaction fails to include the original bucket files into MR job > - > > Key: HIVE-12724 > URL: https://issues.apache.org/jira/browse/HIVE-12724 > Project: Hive > Issue Type: Bug > Components: Hive, Transactions >Affects Versions: 2.0.0, 2.1.0 >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-12724.1.patch, HIVE-12724.2.patch, > HIVE-12724.3.patch, HIVE-12724.4.patch, HIVE-12724.ADDENDUM.1.patch, > HIVE-12724.branch-1.2.patch, HIVE-12724.branch-1.patch > > > How the problem happens: > * Create a non-ACID table > * Before non-ACID to ACID table conversion, we inserted row one > * After non-ACID to ACID table conversion, we inserted row two > * Both rows can be retrieved before MAJOR compaction > * After MAJOR compaction, row one is lost > {code} > hive> USE acidtest; > OK > Time taken: 0.77 seconds > hive> CREATE TABLE t1 (nationkey INT, name STRING, regionkey INT, comment > STRING) > > CLUSTERED BY (regionkey) INTO 2 BUCKETS > > STORED AS ORC; > OK > Time taken: 0.179 seconds > hive> DESC FORMATTED t1; > OK > # col_namedata_type comment > nationkey int > name string > regionkey int > comment string > # Detailed Table Information > Database: acidtest > Owner:wzheng > CreateTime: Mon Dec 14 15:50:40 PST 2015 > LastAccessTime: UNKNOWN > Retention:0 > Location: file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1 > Table Type: MANAGED_TABLE > Table Parameters: > transient_lastDdlTime 1450137040 > # Storage Information > SerDe Library:org.apache.hadoop.hive.ql.io.orc.OrcSerde > InputFormat: org.apache.hadoop.hive.ql.io.orc.OrcInputFormat > OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat > Compressed: No > Num Buckets: 2 > Bucket Columns: [regionkey] > Sort Columns: [] > Storage Desc Params: > serialization.format1 > Time taken: 0.198 seconds, 
Fetched: 28 row(s) > hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db; > Found 1 items > drwxr-xr-x - wzheng staff 68 2015-12-14 15:50 > /Users/wzheng/hivetmp/warehouse/acidtest.db/t1 > hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1; > hive> INSERT INTO TABLE t1 VALUES (1, 'USA', 1, 'united states'); > WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the > future versions. Consider using a different execution engine (i.e. tez, > spark) or using Hive 1.X releases. > Query ID = wzheng_20151214155028_630098c6-605f-4e7e-a797-6b49fb48360d > Total jobs = 1 > Launching Job 1 out of 1 > Number of reduce tasks determined at compile time: 2 > In order to change the average load for a reducer (in bytes): > set hive.exec.reducers.bytes.per.reducer= > In order to limit the maximum number of reducers: > set hive.exec.reducers.max= > In order to set a constant number of reducers: > set mapreduce.job.reduces= > Job running in-process (local Hadoop) > 2015-12-14 15:51:58,070 Stage-1 map = 100%, reduce = 100% > Ended Job = job_local73977356_0001 > Loading data to table acidtest.t1 > MapReduce Jobs Launched: > Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS > Total MapReduce CPU Time Spent: 0 msec > OK > Time taken: 2.825 seconds > hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1; > Found 2 items > -rwxr-xr-x 1 wzheng staff112 2015-12-14 15:51 > /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/00_0 > -rwxr-xr-x 1 wzheng staff472 2015-12-14 15:51 > /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/01_0 > hive> SELECT * FROM t1; > OK > 1 USA 1 united states > Time taken: 0.434 seconds, Fetched: 1 row(s) > hive> ALTER TABLE t1 SET TBLPROPERTIES ('transactional' = 'true'); > OK > Time taken: 0.071 seconds > hive> DESC FORMATTED t1; > OK > # col_namedata_type comment > nationkey int > name string > regionkey int > comment string > # Detailed Table Information > Database: acidtest > Owner:wzheng > CreateTime: Mon Dec 14 15:50:40 PST 
2015 > LastAccessTime: UNKNOWN > Retention:0 > Location: file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1 > Table Type: MANAGED_TABLE > Table Parameters: > COLUMN_STATS_ACCURATE false > last_modified_bywzheng > last_modified_time 1450137141 > numFiles
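The listing in the repro shows that a table converted to ACID can hold both pre-conversion bucket files and, after later writes, ACID base/delta directories. A hedged sketch of the file selection a major compaction must make (hypothetical helper, not the Hive compactor code; `000000_0` is the conventional MR bucket-file name): the reported bug amounts to dropping the second branch, which loses every row written before the conversion.

```java
import java.util.ArrayList;
import java.util.List;

public class CompactionInputs {
    // Gather everything a major compaction must read: ACID base/delta
    // directories plus any "original" bucket files left from the pre-ACID
    // life of the table.
    static List<String> inputsForMajorCompaction(List<String> names) {
        List<String> inputs = new ArrayList<>();
        for (String n : names) {
            if (n.startsWith("base_") || n.startsWith("delta_")) {
                inputs.add(n); // written after the table became ACID
            } else if (n.matches("\\d+_\\d+")) {
                inputs.add(n); // original bucket file, e.g. 000000_0
            }
        }
        return inputs;
    }
}
```

Omitting the `matches("\\d+_\\d+")` branch reproduces the symptom in the report: rows inserted before `transactional = 'true'` vanish after the first major compaction rewrites the table from base/delta inputs only.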
[jira] [Updated] (HIVE-12724) ACID: Major compaction fails to include the original bucket files into MR job
[ https://issues.apache.org/jira/browse/HIVE-12724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-12724: - Attachment: HIVE-12724.4.patch patch 4 is exactly the same as ADDENDUM 1, attached just to trigger the QA run. > ACID: Major compaction fails to include the original bucket files into MR job > Key: HIVE-12724 > URL: https://issues.apache.org/jira/browse/HIVE-12724 > Project: Hive > Issue Type: Bug > Components: Hive, Transactions > Affects Versions: 2.0.0, 2.1.0 > Reporter: Wei Zheng > Assignee: Wei Zheng > Attachments: HIVE-12724.1.patch, HIVE-12724.2.patch, HIVE-12724.3.patch, HIVE-12724.4.patch, HIVE-12724.ADDENDUM.1.patch, HIVE-12724.branch-1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12777) Add capability to restore session
[ https://issues.apache.org/jira/browse/HIVE-12777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098714#comment-15098714 ] Hive QA commented on HIVE-12777: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12782213/HIVE-12777.13.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10009 tests executed *Failed tests:* {noformat} TestEmbeddedThriftBinaryCLIService - did not produce a TEST-*.xml file TestHWISessionManager - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse org.apache.hive.jdbc.TestSSL.testSSLVersion org.apache.hive.jdbc.authorization.TestJdbcMetadataApiAuth.org.apache.hive.jdbc.authorization.TestJdbcMetadataApiAuth org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthUDFBlacklist.testBlackListedUdfUsage org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization.testAllowedCommands org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization.testAuthorization1 org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization.testBlackListedUdfUsage org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization.testConfigWhiteList org.apache.hive.minikdc.TestJdbcWithMiniKdcSQLAuthBinary.testAuthorization1 org.apache.hive.minikdc.TestJdbcWithMiniKdcSQLAuthHttp.testAuthorization1 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6624/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6624/console Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6624/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 15 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12782213 - PreCommit-HIVE-TRUNK-Build > Add capability to restore session > - > > Key: HIVE-12777 > URL: https://issues.apache.org/jira/browse/HIVE-12777 > Project: Hive > Issue Type: Improvement >Reporter: Rajat Khandelwal >Assignee: Rajat Khandelwal > Attachments: HIVE-12777.04.patch, HIVE-12777.08.patch, > HIVE-12777.09.patch, HIVE-12777.11.patch, HIVE-12777.12.patch, > HIVE-12777.13.patch > > > Extensions using Hive session handles should be able to restore the hive > session from the handle. > Apache Lens depends on a fork of hive and that fork has such a capability. > Relevant commit: > https://github.com/InMobi/hive/commit/931fe9116161a18952c082c14223ad6745fefe00#diff-0acb35f7cab7492f522b0c40ce3ce1be -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12353) When Compactor fails it calls CompactionTxnHandler.markedCleaned(). it should not.
[ https://issues.apache.org/jira/browse/HIVE-12353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-12353: -- Target Version/s: 1.3.0, 2.0.0 (was: 1.3.0) > When Compactor fails it calls CompactionTxnHandler.markedCleaned(). it > should not. > --- > > Key: HIVE-12353 > URL: https://issues.apache.org/jira/browse/HIVE-12353 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Blocker > Attachments: HIVE-12353.2.patch, HIVE-12353.3.patch, HIVE-12353.patch > > > One of the things that this method does is delete entries from TXN_COMPONENTS > for partition that it was trying to compact. > This causes Aborted transactions in TXNS to become empty according to > CompactionTxnHandler.cleanEmptyAbortedTxns() which means they can now be > deleted. > Once they are deleted, data that belongs to these txns is deemed committed... > We should extend COMPACTION_QUEUE state with 'f' and 's' (failed, success) > states. We should also not delete then entry from markedCleaned() > We'll have separate process that cleans 'f' and 's' records after X minutes > (or after > N records for a given partition exist). > This allows SHOW COMPACTIONS to show some history info and how many times > compaction failed on a given partition (subject to retention interval) so > that we don't have to call markCleaned() on Compactor failures at the same > time preventing Compactor to constantly getting stuck on the same bad > partition/table. > Ideally we'd want to include END_TIME field. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
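The proposal above — keep failed ('f') and succeeded ('s') entries in COMPACTION_QUEUE and purge them by a separate process after a retention interval, instead of calling markCleaned() on failure — can be sketched roughly as follows (hypothetical enum and method names, not the eventual Hive schema):

```java
public class CompactionHistory {
    // Proposed terminal states 'f' (failed) and 's' (succeeded), retained
    // so SHOW COMPACTIONS can report recent history per partition.
    enum State { INITIATED, WORKING, READY_FOR_CLEANING, FAILED, SUCCEEDED }

    // A separate cleaner deletes only terminal records older than the
    // retention interval; aborted-txn metadata is never deleted as a side
    // effect of a failed compaction.
    static boolean purgeable(State s, long ageMinutes, long retentionMinutes) {
        return (s == State.FAILED || s == State.SUCCEEDED)
                && ageMinutes > retentionMinutes;
    }
}
```

Keeping the record also lets the initiator skip a partition that has failed compaction repeatedly, rather than getting stuck retrying it.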
[jira] [Updated] (HIVE-12828) Update Spark version to 1.6
[ https://issues.apache.org/jira/browse/HIVE-12828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-12828: --- Attachment: HIVE-12828.2-spark.patch > Update Spark version to 1.6 > --- > > Key: HIVE-12828 > URL: https://issues.apache.org/jira/browse/HIVE-12828 > Project: Hive > Issue Type: Task > Components: Spark >Reporter: Xuefu Zhang >Assignee: Rui Li > Attachments: HIVE-12828.1-spark.patch, HIVE-12828.2-spark.patch, > HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, > mem.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12783) fix the unit test failures in TestSparkClient and TestSparkSessionManagerImpl
[ https://issues.apache.org/jira/browse/HIVE-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098690#comment-15098690 ] Sergey Shelukhin commented on HIVE-12783: - {quote} The real fix is for spark to either remove their dependence on the eclipse version or to shroud it so that it doesn't leak through to all of their users. {quote} Should there be a Spark JIRA for this? > fix the unit test failures in TestSparkClient and TestSparkSessionManagerImpl > - > > Key: HIVE-12783 > URL: https://issues.apache.org/jira/browse/HIVE-12783 > Project: Hive > Issue Type: Sub-task > Components: Test >Affects Versions: 2.0.0 >Reporter: Pengcheng Xiong >Assignee: Owen O'Malley >Priority: Blocker > Attachments: HIVE-12783.patch, HIVE-12783.patch, HIVE-12783.patch > > > This includes > {code} > org.apache.hive.spark.client.TestSparkClient.testSyncRpc > org.apache.hive.spark.client.TestSparkClient.testJobSubmission > org.apache.hive.spark.client.TestSparkClient.testMetricsCollection > org.apache.hive.spark.client.TestSparkClient.testCounters > org.apache.hive.spark.client.TestSparkClient.testRemoteClient > org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles > org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob > org.apache.hive.spark.client.TestSparkClient.testErrorJob > org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse > org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse > {code} > all of them passed on my laptop. cc'ing [~szehon], [~xuefuz], could you > please take a look? Shall we ignore them? Thanks. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9862) Vectorized execution corrupts timestamp values
[ https://issues.apache.org/jira/browse/HIVE-9862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-9862: --- Attachment: HIVE-9862.05.patch > Vectorized execution corrupts timestamp values > -- > > Key: HIVE-9862 > URL: https://issues.apache.org/jira/browse/HIVE-9862 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 1.0.0 >Reporter: Nathan Howell >Assignee: Matt McCline > Attachments: HIVE-9862.01.patch, HIVE-9862.02.patch, > HIVE-9862.03.patch, HIVE-9862.04.patch, HIVE-9862.05.patch > > > Timestamps in the future (year 2250?) and before ~1700 are silently corrupted > in vectorized execution mode. Simple repro: > {code} > hive> DROP TABLE IF EXISTS test; > hive> CREATE TABLE test(ts TIMESTAMP) STORED AS ORC; > hive> INSERT INTO TABLE test VALUES ('-12-31 23:59:59'); > hive> SET hive.vectorized.execution.enabled = false; > hive> SELECT MAX(ts) FROM test; > -12-31 23:59:59 > hive> SET hive.vectorized.execution.enabled = true; > hive> SELECT MAX(ts) FROM test; > 1816-03-30 05:56:07.066277376 > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
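The corruption window reported above (after roughly year 2250, before roughly 1700) matches what you get if the vectorized path stores each timestamp as a single signed 64-bit count of nanoseconds since the epoch — a plausible explanation sketched below, not a quote from the patch. Long.MAX_VALUE nanoseconds is about 292 years, so representable values run from about 1678 to 2262, and anything outside that range wraps around (hence the 1816 result in the repro).

```java
public class TimestampNanosRange {
    public static void main(String[] args) {
        // Approximate nanoseconds per 365-day year.
        long nanosPerYear = 365L * 24 * 60 * 60 * 1_000_000_000L;
        // How many years of nanoseconds fit in a signed 64-bit long.
        long years = Long.MAX_VALUE / nanosPerYear; // 292
        System.out.println("representable: ~" + (1970 - years)
                + " .. ~" + (1970 + years)); // prints representable: ~1678 .. ~2262
    }
}
```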
[jira] [Commented] (HIVE-12832) RDBMS schema changes for HIVE-11388
[ https://issues.apache.org/jira/browse/HIVE-12832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098197#comment-15098197 ] Hive QA commented on HIVE-12832: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12782181/HIVE-12832.3.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10003 tests executed *Failed tests:* {noformat} TestHWISessionManager - did not produce a TEST-*.xml file TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_char_simple org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6622/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6622/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6622/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 8 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12782181 - PreCommit-HIVE-TRUNK-Build > RDBMS schema changes for HIVE-11388 > --- > > Key: HIVE-12832 > URL: https://issues.apache.org/jira/browse/HIVE-12832 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Affects Versions: 1.0.0 >Reporter: Alan Gates >Assignee: Alan Gates > Attachments: HIVE-12382.patch, HIVE-12832.3.patch, > HIVE-12832.uber.2.patch, HIVE-12832.uber.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12270) Add DBTokenStore support to HS2 delegation token
[ https://issues.apache.org/jira/browse/HIVE-12270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098946#comment-15098946 ] Robert Kanter commented on HIVE-12270: -- Oozie needs this to work 100% of the time with secure HS2 HA. Otherwise, the Oozie server can get a delegation token from one HS2 server, but the actual query might run against another HS2 server, which won't recognize the HS2 delegation token. > Add DBTokenStore support to HS2 delegation token > > > Key: HIVE-12270 > URL: https://issues.apache.org/jira/browse/HIVE-12270 > Project: Hive > Issue Type: New Feature >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > > DBTokenStore was initially introduced by HIVE-3255 in Hive-0.12 and it is > mainly for HMS delegation token. Later in Hive-0.13, the HS2 delegation token > support was introduced by HIVE-5155 but it used MemoryTokenStore as token > store. That the HIVE-9622 uses the shared RawStore (or HMSHandler) to access > the token/keys information in HMS DB directly from HS2 seems not the right > approach to support DBTokenStore in HS2. I think we should use > HiveMetaStoreClient in HS2 instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12724) ACID: Major compaction fails to include the original bucket files into MR job
[ https://issues.apache.org/jira/browse/HIVE-12724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098967#comment-15098967 ] Eugene Koifman commented on HIVE-12724: --- +1 pending tests > ACID: Major compaction fails to include the original bucket files into MR job > Key: HIVE-12724 > URL: https://issues.apache.org/jira/browse/HIVE-12724 > Project: Hive > Issue Type: Bug > Components: Hive, Transactions > Affects Versions: 2.0.0, 2.1.0 > Reporter: Wei Zheng > Assignee: Wei Zheng > Attachments: HIVE-12724.1.patch, HIVE-12724.2.patch, HIVE-12724.3.patch, HIVE-12724.4.patch, HIVE-12724.ADDENDUM.1.patch, HIVE-12724.branch-1.2.patch, HIVE-12724.branch-1.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12724) ACID: Major compaction fails to include the original bucket files into MR job
[ https://issues.apache.org/jira/browse/HIVE-12724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-12724:
----------------------------------
    Priority: Blocker  (was: Major)

> ACID: Major compaction fails to include the original bucket files into MR job
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-12724
>                 URL: https://issues.apache.org/jira/browse/HIVE-12724
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive, Transactions
>    Affects Versions: 1.3.0, 2.0.0
>            Reporter: Wei Zheng
>            Assignee: Wei Zheng
>            Priority: Blocker
>         Attachments: HIVE-12724.1.patch, HIVE-12724.2.patch, HIVE-12724.3.patch, HIVE-12724.4.patch, HIVE-12724.ADDENDUM.1.patch, HIVE-12724.branch-1.2.patch, HIVE-12724.branch-1.patch
>
> How the problem happens:
> * Create a non-ACID table
> * Before non-ACID to ACID table conversion, we inserted row one
> * After non-ACID to ACID table conversion, we inserted row two
> * Both rows can be retrieved before MAJOR compaction
> * After MAJOR compaction, row one is lost
> {code}
> hive> USE acidtest;
> OK
> Time taken: 0.77 seconds
> hive> CREATE TABLE t1 (nationkey INT, name STRING, regionkey INT, comment STRING)
>     > CLUSTERED BY (regionkey) INTO 2 BUCKETS
>     > STORED AS ORC;
> OK
> Time taken: 0.179 seconds
> hive> DESC FORMATTED t1;
> OK
> # col_name            data_type           comment
> nationkey             int
> name                  string
> regionkey             int
> comment               string
>
> # Detailed Table Information
> Database:             acidtest
> Owner:                wzheng
> CreateTime:           Mon Dec 14 15:50:40 PST 2015
> LastAccessTime:       UNKNOWN
> Retention:            0
> Location:             file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> Table Type:           MANAGED_TABLE
> Table Parameters:
>       transient_lastDdlTime   1450137040
>
> # Storage Information
> SerDe Library:        org.apache.hadoop.hive.ql.io.orc.OrcSerde
> InputFormat:          org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> OutputFormat:         org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> Compressed:           No
> Num Buckets:          2
> Bucket Columns:       [regionkey]
> Sort Columns:         []
> Storage Desc Params:
>       serialization.format    1
> Time taken: 0.198 seconds, Fetched: 28 row(s)
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db;
> Found 1 items
> drwxr-xr-x   - wzheng staff        68 2015-12-14 15:50 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
> hive> INSERT INTO TABLE t1 VALUES (1, 'USA', 1, 'united states');
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.
> Query ID = wzheng_20151214155028_630098c6-605f-4e7e-a797-6b49fb48360d
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Job running in-process (local Hadoop)
> 2015-12-14 15:51:58,070 Stage-1 map = 100%, reduce = 100%
> Ended Job = job_local73977356_0001
> Loading data to table acidtest.t1
> MapReduce Jobs Launched:
> Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> OK
> Time taken: 2.825 seconds
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
> Found 2 items
> -rwxr-xr-x   1 wzheng staff       112 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/00_0
> -rwxr-xr-x   1 wzheng staff       472 2015-12-14 15:51 /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/01_0
> hive> SELECT * FROM t1;
> OK
> 1     USA     1     united states
> Time taken: 0.434 seconds, Fetched: 1 row(s)
> hive> ALTER TABLE t1 SET TBLPROPERTIES ('transactional' = 'true');
> OK
> Time taken: 0.071 seconds
> hive> DESC FORMATTED t1;
> OK
> # col_name            data_type           comment
> nationkey             int
> name                  string
> regionkey             int
> comment               string
>
> # Detailed Table Information
> Database:             acidtest
> Owner:                wzheng
> CreateTime:           Mon Dec 14 15:50:40 PST 2015
> LastAccessTime:       UNKNOWN
> Retention:            0
> Location:             file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> Table Type:           MANAGED_TABLE
> Table Parameters:
>       COLUMN_STATS_ACCURATE   false
>       last_modified_by        wzheng
>       last_modified_time
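Distilled from the quoted transcript, the reproduction reduces to the HiveQL sketch below. Note two assumptions: the quoted session is truncated before the second insert and the compaction step, so the values shown for "row two" are hypothetical, and the MAJOR compaction trigger is inferred from the issue description rather than copied from the transcript.

```sql
-- Hedged repro sketch for HIVE-12724 (local metastore assumed).
USE acidtest;

-- 1. Non-ACID bucketed ORC table
CREATE TABLE t1 (nationkey INT, name STRING, regionkey INT, comment STRING)
CLUSTERED BY (regionkey) INTO 2 BUCKETS
STORED AS ORC;

-- 2. "Row one" lands in the original, pre-ACID bucket files
INSERT INTO TABLE t1 VALUES (1, 'USA', 1, 'united states');

-- 3. Convert the table to ACID
ALTER TABLE t1 SET TBLPROPERTIES ('transactional' = 'true');

-- 4. "Row two" lands in an ACID delta directory (hypothetical values;
--    this statement is not visible in the truncated transcript)
INSERT INTO TABLE t1 VALUES (2, 'CANADA', 2, 'canada');

-- 5. Trigger MAJOR compaction; per the report, the rewritten base
--    omits the original bucket files, so row one is lost
ALTER TABLE t1 COMPACT 'major';

SELECT * FROM t1;  -- both rows expected; with the bug, only row two survives
```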
[jira] [Updated] (HIVE-12724) ACID: Major compaction fails to include the original bucket files into MR job
[ https://issues.apache.org/jira/browse/HIVE-12724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-12724:
----------------------------------
    Affects Version/s:     (was: 2.1.0)
                           1.3.0
[jira] [Commented] (HIVE-12724) ACID: Major compaction fails to include the original bucket files into MR job
[ https://issues.apache.org/jira/browse/HIVE-12724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098971#comment-15098971 ]

Eugene Koifman commented on HIVE-12724:
---------------------------------------
made a Blocker since this is data loss