[jira] [Commented] (HIVE-12274) Increase width of columns used for general configuration in the metastore.

2016-01-14 Thread Carter Shanklin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098471#comment-15098471
 ] 

Carter Shanklin commented on HIVE-12274:


[~teabot] can you post a sample DDL? What serde are you using to read? Is it 
Hive-JSON-Serde?

> Increase width of columns used for general configuration in the metastore.
> --
>
> Key: HIVE-12274
> URL: https://issues.apache.org/jira/browse/HIVE-12274
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.0.0
>Reporter: Elliot West
>Assignee: Sushanth Sowmyan
>  Labels: metastore
>
> This issue is very similar in principle to HIVE-1364. We are hitting a limit 
> when processing JSON data that has a large nested schema. The struct 
> definition is truncated when inserted into the metastore database column 
> {{COLUMNS_V2.TYPE_NAME}} as it is greater than 4000 characters in length.
> Given that the purpose of these columns is to hold very loosely defined 
> configuration values it seems rather limiting to impose such a relatively low 
> length bound. One can imagine that valid use cases will arise where 
> reasonable parameter/property values exceed the current limit. Can these 
> columns not use CLOB-like types, as used for example by 
> {{TBLS.VIEW_EXPANDED_TEXT}}? It would seem that suitable type equivalents 
> exist for all targeted database platforms:
> * MySQL: {{mediumtext}}
> * Postgres: {{text}}
> * Oracle: {{CLOB}}
> * Derby: {{LONG VARCHAR}}
> I'd suggest that the candidates for type change are:
> * {{COLUMNS_V2.TYPE_NAME}}
> * {{TABLE_PARAMS.PARAM_VALUE}}
> * {{SERDE_PARAMS.PARAM_VALUE}}
> * {{SD_PARAMS.PARAM_VALUE}}
> Finally, will this limitation persist in the work resulting from HIVE-9452?
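
A hedged illustration of the change the description asks for (sketches only - the column names come from the ticket, but the official metastore upgrade scripts remain authoritative for any real migration):

{code}
-- MySQL: widen the candidate columns to MEDIUMTEXT
ALTER TABLE COLUMNS_V2   MODIFY TYPE_NAME   MEDIUMTEXT;
ALTER TABLE TABLE_PARAMS MODIFY PARAM_VALUE MEDIUMTEXT;
ALTER TABLE SERDE_PARAMS MODIFY PARAM_VALUE MEDIUMTEXT;
ALTER TABLE SD_PARAMS    MODIFY PARAM_VALUE MEDIUMTEXT;

-- Postgres: the equivalent unbounded type is text
ALTER TABLE "COLUMNS_V2" ALTER COLUMN "TYPE_NAME" TYPE text;

-- Oracle (CLOB) and Derby (LONG VARCHAR) need analogous, platform-specific DDL
{code}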



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12847) ORC file footer cache should be memory sensitive

2016-01-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097852#comment-15097852
 ] 

Hive QA commented on HIVE-12847:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782173/HIVE-12847.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10016 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler.org.apache.hive.hcatalog.hbase.TestPigHBaseStorageHandler
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6620/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6620/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6620/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12782173 - PreCommit-HIVE-TRUNK-Build

> ORC file footer cache should be memory sensitive
> 
>
> Key: HIVE-12847
> URL: https://issues.apache.org/jira/browse/HIVE-12847
> Project: Hive
>  Issue Type: Improvement
>  Components: File Formats, ORC
>Affects Versions: 1.2.1
>Reporter: Nemon Lou
>Assignee: Nemon Lou
> Attachments: HIVE-12847.patch
>
>
> The size-based footer cache cannot control memory usage properly.
> We have seen a HiveServer2 hang due to the ORC file footer cache taking up too 
> much heap memory.
> A simple query like "select * from orc_table limit 1" can make HiveServer2 
> hang.
> The input table has about 1000 ORC files and each ORC file has about 2500 
> stripes.
> {noformat}
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:     214653601    25758432120  org.apache.hadoop.hive.ql.io.orc.OrcProto$ColumnStatistics
>    3:     122233301     8800797672  org.apache.hadoop.hive.ql.io.orc.OrcProto$StringStatistics
>    5:      89439001     6439608072  org.apache.hadoop.hive.ql.io.orc.OrcProto$IntegerStatistics
>    7:       2981300      262354400  org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeInformation
>    9:       2981300      143102400  org.apache.hadoop.hive.ql.io.orc.OrcProto$StripeStatistics
>   12:       2983691       71608584  org.apache.hadoop.hive.ql.io.orc.ReaderImpl$StripeInformationImpl
>   15:         80929        7121752  org.apache.hadoop.hive.ql.io.orc.OrcProto$Type
>   17:        103282        5783792  org.apache.hadoop.mapreduce.lib.input.FileSplit
>   20:         51641        3305024  org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit
>   21:         51641        3305024  org.apache.hadoop.hive.ql.io.orc.OrcSplit
>   31:             1         413152  [Lorg.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit;
>  100:          1122          26928  org.apache.hadoop.hive.ql.io.orc.Metadata
> {noformat}
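
Until the cache is made memory-sensitive, a hedged mitigation sketch: the footer cache is bounded by entry count, not bytes, so shrinking (or disabling) it caps the heap retained by cached stripe statistics. The setting below should be checked against your build; the values are illustrative only:

{code}
-- reduce the entry-count bound of the ORC footer cache
set hive.orc.cache.stripe.details.size=1000;
-- or disable footer caching entirely
set hive.orc.cache.stripe.details.size=-1;
{code}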





[jira] [Updated] (HIVE-12864) StackOverflowError parsing queries with very large predicates

2016-01-14 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-12864:
---
Attachment: HIVE-12864.01.patch

> StackOverflowError parsing queries with very large predicates
> -
>
> Key: HIVE-12864
> URL: https://issues.apache.org/jira/browse/HIVE-12864
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-12864.01.patch, HIVE-12864.patch
>
>
> We have seen that queries with very large predicates might fail with the 
> following stacktrace:
> {noformat}
> 2016-01-12 05:47:36,516|beaver.machine|INFO|552|5072|Thread-22|Exception in 
> thread "main" java.lang.StackOverflowError
> 2016-01-12 05:47:36,517|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:145)
> 2016-01-12 05:47:36,517|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> [... the setUnknownTokenBoundaries(CommonTree.java:146) frame repeats for the 
> remainder of the overflowed stack ...]
> {noformat}


[jira] [Updated] (HIVE-12874) dynamic partition insert project wrong column

2016-01-14 Thread bin wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

bin wang updated HIVE-12874:

Affects Version/s: 1.1.0  (was: 0.14.1)

> dynamic partition insert project wrong column
> -
>
> Key: HIVE-12874
> URL: https://issues.apache.org/jira/browse/HIVE-12874
> Project: Hive
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
> Environment: hive 1.1.0-cdh5.4.8
>Reporter: bin wang
>Assignee: Alan Gates
>
> We have two tables as below:
> create table test (
>   id bigint comment 'id'
> )
> PARTITIONED BY (etl_dt string)
> STORED AS ORC;
> create table test1 (
>   id bigint,
>   start_time int
> )
> PARTITIONED BY (etl_dt string)
> STORED AS ORC;
> We use sql like the below to import rows from test1 into test:
> insert overwrite table test PARTITION(etl_dt)
> select id
>       ,from_unixtime(start_time,'yyyy-MM-dd') as etl_dt
> from test1
> where test1.etl_dt='2016-01-12';
> but it behaves incorrectly: it uses test1.etl_dt as the partition value for 
> test, not the 'etl_dt' computed in the select.
> We think this is a bug; can anyone fix it?
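
If the report is accurate, a hedged workaround sketch (untested against this CDH build; the inner alias new_dt is hypothetical): compute the partition value in a subquery so only one etl_dt column is in scope when the dynamic partition is resolved:

{code}
insert overwrite table test PARTITION(etl_dt)
select id, new_dt as etl_dt
from (
  select id, from_unixtime(start_time,'yyyy-MM-dd') as new_dt
  from test1
  where etl_dt='2016-01-12'
) t;
{code}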





[jira] [Commented] (HIVE-12810) Hive select fails - java.lang.IndexOutOfBoundsException

2016-01-14 Thread Matjaz Skerjanec (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15097909#comment-15097909
 ] 

Matjaz Skerjanec commented on HIVE-12810:
-

Hello,
Thank you.
Yes, I tried that and many other possible options yesterday with no success. 
Since my HDFS is still empty I decided to reinstall everything - I have to get 
the system up and running ASAP.

I will go for the latest update now (HDP 2.3.4.0) in the hope that the problem 
with select will be solved.

Will come back later with results...



> Hive select fails - java.lang.IndexOutOfBoundsException
> ---
>
> Key: HIVE-12810
> URL: https://issues.apache.org/jira/browse/HIVE-12810
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline, CLI
>Affects Versions: 1.2.1
> Environment: HDP 2.3.0
>Reporter: Matjaz Skerjanec
>
> Hadoop HDP 2.3 (Hadoop 2.7.1.2.3.0.0-2557)
> Hive 1.2.1.2.3.0.0-2557
> We are loading ORC tables in Hive with Sqoop from a HANA DB.
> Everything works fine - count and select with e.g. 16,000,000 entries in the 
> table - but when we load 34,000,000 entries the select query no longer works 
> and we get the following error (select count(*) works in both cases):
> {code}
> select count(*) from tablename;
> INFO  : Session is already open
> INFO  :
> INFO  : Status: Running (Executing on YARN cluster with App id 
> application_1452091205505_0032)
> INFO  : Map 1: -/-  Reducer 2: 0/1
> INFO  : Map 1: 0/96 Reducer 2: 0/1
> .
> .
> .
> INFO  : Map 1: 96/96    Reducer 2: 0(+1)/1
> INFO  : Map 1: 96/96    Reducer 2: 1/1
> +-----------+--+
> |    _c0    |
> +-----------+--+
> | 34146816  |
> +-----------+--+
> 1 row selected (45.455 seconds)
> {code}
> {code}
> "select originalxml from tablename where messageid = 
> 'd0b3c872-435d-499b-a65c-619d9e732bbb'
> 0: jdbc:hive2://10.4.zz.xx:1/default> select originalxml from tablename 
> where messageid = 'd0b3c872-435d-499b-a65c-619d9e732bbb';
> INFO  : Session is already open
> INFO  : Tez session was closed. Reopening...
> INFO  : Session re-established.
> INFO  :
> INFO  : Status: Running (Executing on YARN cluster with App id 
> application_1452091205505_0032)
> INFO  : Map 1: -/-
> ERROR : Status: Failed
> ERROR : Vertex failed, vertexName=Map 1, 
> vertexId=vertex_1452091205505_0032_1_00, diagnostics=[Vertex 
> vertex_1452091205505_0032_1_00 [Map 1] killed/failed due 
> to:ROOT_INPUT_INIT_FAILURE, Vertex Input: tablename initializer failed, 
> vertex=vertex_1452091205505_0032_1_00 [Map 1], java.lang.RuntimeException: 
> serious problem
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:306)
> at 
> org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:408)
> at 
> org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
> at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:245)
> at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:239)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:239)
> at 
> org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:226)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.IndexOutOfBoundsException: Index: 0
> at java.util.concurrent.FutureTask.report(FutureTask.java:122)
> at java.util.concurrent.FutureTask.get(FutureTask.java:192)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1016)
> ... 15 more
> Caused by: java.lang.IndexOutOfBoundsException: Index: 0
> at java.util.Collections$EmptyList.get(Collections.java:4454)
> at 
> org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSubtypes(OrcProto.java:12240)
> at ...
> {code}

[jira] [Commented] (HIVE-12808) Logical PPD: Push filter clauses through PTF(Windowing) into TS

2016-01-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098017#comment-15098017
 ] 

Hive QA commented on HIVE-12808:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782179/HIVE-12808.01.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 10005 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestMiniTezCliDriver-tez_joins_explain.q-vector_decimal_aggregate.q-vector_groupby_mapjoin.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lineage3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_windowing1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ppd_windowing2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ptfgroupbyjoin
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_unqualcolumnrefs
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestPerfCliDriver.testPerfCliDriver_query70
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_subquery_in
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.spark.client.rpc.TestRpc.testClientTimeout
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6621/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6621/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6621/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 18 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12782179 - PreCommit-HIVE-TRUNK-Build

> Logical PPD: Push filter clauses through PTF(Windowing) into TS
> ---
>
> Key: HIVE-12808
> URL: https://issues.apache.org/jira/browse/HIVE-12808
> Project: Hive
>  Issue Type: Bug
>  Components: Logical Optimizer
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Gopal V
>Assignee: Laljo John Pullokkaran
> Attachments: HIVE-12808.01.patch
>
>
> Simplified repro case of [HCC 
> #8880|https://community.hortonworks.com/questions/8880/hive-on-tez-pushdown-predicate-doesnt-work-in-part.html],
>  with the slow query showing the push-down miss, and the manually rewritten 
> query indicating the expected plan.
> Part of the problem could be the window range not being split apart for PPD, 
> but the FIL is not pushed down even if the rownum filter is removed.
> {code}
> create temporary table positions (regionid string, id bigint, deviceid 
> string, ts string);
> insert into positions values('1d6a0be1-6366-4692-9597-ebd5cd0f01d1', 
> 1422792010, '6c5d1a30-2331-448b-a726-a380d6b3a432', '2016-01-01'),
> ('1d6a0be1-6366-4692-9597-ebd5cd0f01d1', 1422792010, 
> '6c5d1a30-2331-448b-a726-a380d6b3a432', '2016-01-01'),
> ('1d6a0be1-6366-4692-9597-ebd5cd0f01d1', 1422792010, 
> '6c5d1a30-2331-448b-a726-a380d6b3a432', '2016-01-02'),
> ('1d6a0be1-6366-4692-9597-ebd5cd0f01d1', 1422792010, 
> '6c5d1a30-2331-448b-a726-a380d6b3a432', '2016-01-02');
> -- slow query
> explain
> WITH t1 AS 
> ( 
>  SELECT   *, 
>   Row_number() over ( PARTITION BY regionid, id, deviceid 
> ORDER BY ts DESC) AS rownos
>  FROM positions ), 
> latestposition as ( 
>SELECT * 
>FROM   t1 
>WHERE  rownos = 1) 
> SELECT * 
> FROM   latestposition 
> WHERE  regionid='1d6a0be1-6366-4692-9597-ebd5cd0f01d1' 
> ANDid=1422792010 
> ANDdeviceid='6c5d1a30-2331-448b-a726-a380d6b3a432';
> -- fast query
> explain
> WITH t1 AS 
> ( 
>  SELECT   *, 
>   Row_number() over ( PARTITION BY regionid, id, deviceid 
> ORDER BY ts DESC) AS rownos
>  FROM positions 
>  WHERE  regionid='1d6a0be1-6366-4692-9597-ebd5cd0f01d1' 
>  ANDid=1422792010 
>  AND ...
> {code}

[jira] [Commented] (HIVE-12853) LLAP: localize permanent UDF jars to daemon

2016-01-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098345#comment-15098345
 ] 

Hive QA commented on HIVE-12853:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782196/HIVE-12853.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 10018 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_add_jar_pfile
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_remote_script
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_root_dir_external_table
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_remote_script
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_udf_using
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestMultiSessionsHS2WithLocalClusterSpark.testSparkQuery
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6623/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6623/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6623/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12782196 - PreCommit-HIVE-TRUNK-Build

> LLAP: localize permanent UDF jars to daemon
> ---
>
> Key: HIVE-12853
> URL: https://issues.apache.org/jira/browse/HIVE-12853
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12853.patch
>
>






[jira] [Updated] (HIVE-12828) Update Spark version to 1.6

2016-01-14 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12828:
---
Attachment: HIVE-12828.2-spark.patch

Hi [~lirui], the file is updated and I verified that the test passed. 
Reattached patch #2 to give it another run. Thanks.

> Update Spark version to 1.6
> ---
>
> Key: HIVE-12828
> URL: https://issues.apache.org/jira/browse/HIVE-12828
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
> Attachments: HIVE-12828.1-spark.patch, HIVE-12828.2-spark.patch, 
> HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, mem.patch
>
>






[jira] [Assigned] (HIVE-12827) Vectorization: VectorCopyRow/VectorAssignRow/VectorDeserializeRow assign needs explicit isNull[offset] modification

2016-01-14 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned HIVE-12827:
--

Assignee: Gopal V

> Vectorization: VectorCopyRow/VectorAssignRow/VectorDeserializeRow assign 
> needs explicit isNull[offset] modification
> ---
>
> Key: HIVE-12827
> URL: https://issues.apache.org/jira/browse/HIVE-12827
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Gopal V
>
> Some scenarios set Double.NaN instead of isNull=true, but not all types are 
> consistent.
> Examples of valid values where isNull is left un-set are:
> {code}
>   private class FloatReader extends AbstractDoubleReader {
> FloatReader(int columnIndex) {
>   super(columnIndex);
> }
> @Override
> void apply(VectorizedRowBatch batch, int batchIndex) throws IOException {
>   DoubleColumnVector colVector = (DoubleColumnVector) 
> batch.cols[columnIndex];
>   if (deserializeRead.readCheckNull()) {
> VectorizedBatchUtil.setNullColIsNullValue(colVector, batchIndex);
>   } else {
> float value = deserializeRead.readFloat();
> colVector.vector[batchIndex] = (double) value;
>   }
> }
>   }
> {code}
> {code}
>   private class DoubleCopyRow extends CopyRow {
> DoubleCopyRow(int inColumnIndex, int outColumnIndex) {
>   super(inColumnIndex, outColumnIndex);
> }
> @Override
> void copy(VectorizedRowBatch inBatch, int inBatchIndex, 
> VectorizedRowBatch outBatch, int outBatchIndex) {
>   DoubleColumnVector inColVector = (DoubleColumnVector) 
> inBatch.cols[inColumnIndex];
>   DoubleColumnVector outColVector = (DoubleColumnVector) 
> outBatch.cols[outColumnIndex];
>   if (inColVector.isRepeating) {
> if (inColVector.noNulls || !inColVector.isNull[0]) {
>   outColVector.vector[outBatchIndex] = inColVector.vector[0];
> } else {
>   VectorizedBatchUtil.setNullColIsNullValue(outColVector, 
> outBatchIndex);
> }
>   } else {
> if (inColVector.noNulls || !inColVector.isNull[inBatchIndex]) {
>   outColVector.vector[outBatchIndex] = 
> inColVector.vector[inBatchIndex];
> } else {
>   VectorizedBatchUtil.setNullColIsNullValue(outColVector, 
> outBatchIndex);
> }
>   }
> }
>   }
> {code}
> {code}
>  private static abstract class VectorDoubleColumnAssign
> extends VectorColumnAssignVectorBase<DoubleColumnVector> {
> protected void assignDouble(double value, int destIndex) {
>   outCol.vector[destIndex] = value;
> }
>   }
> {code}
> The pattern to imitate would be the earlier code from VectorBatchUtil
> {code}
> case DOUBLE: {
>   DoubleColumnVector dcv = (DoubleColumnVector) batch.cols[offset + 
> colIndex];
>   if (writableCol != null) {
> dcv.vector[rowIndex] = ((DoubleWritable) writableCol).get();
> dcv.isNull[rowIndex] = false;
>   } else {
> dcv.vector[rowIndex] = Double.NaN;
> setNullColIsNullValue(dcv, rowIndex);
>   }
> }
>   break;
> {code}
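The explicit-assignment pattern to imitate can be exercised in isolation. The sketch below uses simplified stand-ins (DoubleColVector and assign are illustrative names, not the real Hive classes): every write path touches isNull[rowIndex], so a stale null flag left in a reused batch slot can never leak through.

```java
// Simplified stand-in for Hive's DoubleColumnVector, for illustration only.
class DoubleColVector {
    final double[] vector;
    final boolean[] isNull;
    boolean noNulls = true;

    DoubleColVector(int size) {
        vector = new double[size];
        isNull = new boolean[size];
    }

    // Mirrors the explicit-assignment pattern from VectorizedBatchUtil:
    // BOTH branches write isNull[rowIndex], never leaving stale state behind.
    void assign(int rowIndex, Double value) {
        if (value != null) {
            vector[rowIndex] = value;
            isNull[rowIndex] = false;      // explicit, even on the non-null path
        } else {
            vector[rowIndex] = Double.NaN; // sentinel only; isNull is authoritative
            isNull[rowIndex] = true;
            noNulls = false;
        }
    }
}

public class ExplicitIsNullDemo {
    public static void main(String[] args) {
        DoubleColVector col = new DoubleColVector(4);
        col.assign(0, 1.5);
        col.assign(0, null);  // slot 0 becomes null
        col.assign(0, 2.5);   // reusing the slot must clear the stale null flag
        System.out.println(col.isNull[0] + " " + col.vector[0]); // false 2.5
    }
}
```

If the non-null branch skipped the `isNull[rowIndex] = false` write, the third assignment above would leave a valid value flagged as null.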



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-12826) Vectorization: VectorUDAF* suspect isNull checks

2016-01-14 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V reassigned HIVE-12826:
--

Assignee: Gopal V  (was: Matt McCline)

> Vectorization: VectorUDAF* suspect isNull checks
> 
>
> Key: HIVE-12826
> URL: https://issues.apache.org/jira/browse/HIVE-12826
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 2.0.0, 2.1.0
>Reporter: Gopal V
>Assignee: Gopal V
>
> for isRepeating=true, checking isNull[selected[i]] might return incorrect 
> results (without a heavy array fill of isNull).
> VectorUDAFSum/Min/Max/Avg and SumDecimal impls need to be reviewed for this 
> pattern.
> {code}
> private void iterateHasNullsRepeatingSelectionWithAggregationSelection(
>   VectorAggregationBufferRow[] aggregationBufferSets,
>   int aggregateIndex,
>   <ValueType> value,
>   int batchSize,
>   int[] selection,
>   boolean[] isNull) {
>   
>   for (int i=0; i < batchSize; ++i) {
> if (!isNull[selection[i]]) {
>   Aggregation myagg = getCurrentAggregationBuffer(
> aggregationBufferSets, 
> aggregateIndex,
> i);
>   myagg.sumValue(value);
> }
>   }
> }
> {code}
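A minimal sketch of why the check is suspect (all names here are illustrative, not the actual Hive classes): with isRepeating=true, only slot 0 of the vector and of isNull is guaranteed meaningful, so indexing isNull by selected[i] can read stale entries from a previous batch.

```java
public class RepeatingIsNullDemo {
    public static void main(String[] args) {
        // With isRepeating=true, only slot 0 is meaningful; the rest of
        // isNull may hold garbage left over from an earlier batch.
        boolean[] isNull = {false, true, true, true}; // slots 1..3 are stale
        int[] selected = {2, 3};

        long suspectSum = 0, correctSum = 0;
        long repeatedValue = 7;

        for (int i = 0; i < selected.length; i++) {
            if (!isNull[selected[i]]) {  // suspect: reads stale slots, drops rows
                suspectSum += repeatedValue;
            }
            if (!isNull[0]) {            // repeating case: only slot 0 is valid
                correctSum += repeatedValue;
            }
        }
        System.out.println(suspectSum + " " + correctSum); // 0 14
    }
}
```

Here the suspect check silently aggregates nothing, while checking isNull[0] counts both selected rows of the repeated value.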



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12826) Vectorization: VectorUDAF* suspect isNull checks

2016-01-14 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101120#comment-15101120
 ] 

Gopal V commented on HIVE-12826:


[~mmccline]: this forces isRepeating=true to always check isNull[0] in all 
UDAFs in the patch. Please review.

> Vectorization: VectorUDAF* suspect isNull checks
> 
>
> Key: HIVE-12826
> URL: https://issues.apache.org/jira/browse/HIVE-12826
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 2.0.0, 2.1.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-12826.1.patch
>
>
> for isRepeating=true, checking isNull[selected[i]] might return incorrect 
> results (without a heavy array fill of isNull).
> VectorUDAFSum/Min/Max/Avg and SumDecimal impls need to be reviewed for this 
> pattern.
> {code}
> private void iterateHasNullsRepeatingSelectionWithAggregationSelection(
>   VectorAggregationBufferRow[] aggregationBufferSets,
>   int aggregateIndex,
>   <ValueType> value,
>   int batchSize,
>   int[] selection,
>   boolean[] isNull) {
>   
>   for (int i=0; i < batchSize; ++i) {
> if (!isNull[selection[i]]) {
>   Aggregation myagg = getCurrentAggregationBuffer(
> aggregationBufferSets, 
> aggregateIndex,
> i);
>   myagg.sumValue(value);
> }
>   }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12758) Parallel compilation: Operator::resetId() is not thread-safe

2016-01-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12758:

Attachment: HIVE-12758.03.patch

Fixing a couple more spots where the context was null, updating q files

> Parallel compilation: Operator::resetId() is not thread-safe
> 
>
> Key: HIVE-12758
> URL: https://issues.apache.org/jira/browse/HIVE-12758
> Project: Hive
>  Issue Type: Bug
>  Components: Physical Optimizer
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12758.01.patch, HIVE-12758.02.patch, 
> HIVE-12758.03.patch, HIVE-12758.03.patch, HIVE-12758.patch
>
>
> {code}
>   private static AtomicInteger seqId;
> ...
>   public Operator() {
> this(String.valueOf(seqId.getAndIncrement()));
>   }
>   public static void resetId() {
> seqId.set(0);
>   }
> {code}
> Potential race-condition.
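A hedged, single-threaded sketch of the hazard (a simplified stand-in, not the real Operator class): getAndIncrement() alone is atomic, but a resetId() interleaved between two id grabs hands out duplicate operator ids, which is exactly the interleaving parallel compilation can produce.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ResetIdRace {
    // Shared, process-wide counter, as in Operator.
    private static final AtomicInteger seqId = new AtomicInteger(0);

    static int nextId() { return seqId.getAndIncrement(); }
    static void resetId() { seqId.set(0); }

    public static void main(String[] args) {
        // Deterministic replay of the race: compile thread A takes an id,
        // thread B calls resetId(), A's next operator reuses an issued id.
        int a1 = nextId();  // 0
        resetId();          // another session resets the shared counter
        int a2 = nextId();  // 0 again -> duplicate operator id
        System.out.println(a1 + " " + a2); // 0 0
    }
}
```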



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12220) LLAP: Usability issues with hive.llap.io.cache.orc.size

2016-01-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12220:

Description: 
In the llap-daemon site you need to set, among other things,

llap.daemon.memory.per.instance.mb
and
hive.llap.io.cache.orc.size

The use of hive.llap.io.cache.orc.size caused me some unnecessary problems: 
initially I entered the value in MB rather than in bytes. You could call that 
operator error, but I think of this value as a fraction of the other one, which 
is in MB.

Second, is this really tied to ORC? E.g. when we have the vectorized text 
reader will this data be cached as well? Or might it be in the future?

I would like to propose instead using hive.llap.io.cache.size.mb for this 
setting.


  was:
In the llap-daemon site you need to set, among other things,

llap.daemon.memory.per.instance.mb
and
hive.llap.io.cache.orc.size

The use of hive.llap.io.cache.orc.size caused me some unnecessary problems, 
initially I entered the value in MB rather than in bytes. Operator error you 
could say but I look at this as a fraction of the other value which is in mb.

Second, is this really tied to ORC? E.g. when we have the vectorized text 
reader will this data be cached as well? Or might it be in the future?

I would like to propose instead using hive.llap.io.cache.size.mb for this 
setting.

NO PRECOMMIT TESTS


> LLAP: Usability issues with hive.llap.io.cache.orc.size
> ---
>
> Key: HIVE-12220
> URL: https://issues.apache.org/jira/browse/HIVE-12220
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Carter Shanklin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12220.01.patch, HIVE-12220.02.patch, 
> HIVE-12220.patch, HIVE-12220.tmp.patch
>
>
> In the llap-daemon site you need to set, among other things,
> llap.daemon.memory.per.instance.mb
> and
> hive.llap.io.cache.orc.size
> The use of hive.llap.io.cache.orc.size caused me some unnecessary problems: 
> initially I entered the value in MB rather than in bytes. You could call that 
> operator error, but I think of this value as a fraction of the other one, 
> which is in MB.
> Second, is this really tied to ORC? E.g. when we have the vectorized text 
> reader will this data be cached as well? Or might it be in the future?
> I would like to propose instead using hive.llap.io.cache.size.mb for this 
> setting.
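For illustration, the unit mismatch the reporter describes looks like this in a daemon config (the property names come from the issue; the values are hypothetical):

```xml
<!-- Illustrative values only: one setting is in megabytes, the other in bytes. -->
<property>
  <name>llap.daemon.memory.per.instance.mb</name>
  <value>4096</value>          <!-- megabytes -->
</property>
<property>
  <name>hive.llap.io.cache.orc.size</name>
  <value>2147483648</value>    <!-- bytes (2 GB), easy to mistype as MB -->
</property>
```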



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12353) When Compactor fails it calls CompactionTxnHandler.markedCleaned(). it should not.

2016-01-14 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099215#comment-15099215
 ] 

Alan Gates commented on HIVE-12353:
---

With the commit of HIVE-12832 to master and branch-2.0 I think all of the 
schema changes needed for this are in.

> When Compactor fails it calls CompactionTxnHandler.markedCleaned().  it 
> should not.
> ---
>
> Key: HIVE-12353
> URL: https://issues.apache.org/jira/browse/HIVE-12353
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-12353.2.patch, HIVE-12353.3.patch, HIVE-12353.patch
>
>
> One of the things that this method does is delete entries from TXN_COMPONENTS 
> for partition that it was trying to compact.
> This causes Aborted transactions in TXNS to become empty according to
> CompactionTxnHandler.cleanEmptyAbortedTxns() which means they can now be 
> deleted.  
> Once they are deleted, data that belongs to these txns is deemed committed...
> We should extend COMPACTION_QUEUE state with 'f' and 's' (failed, success) 
> states.  We should also not delete the entry from markedCleaned().
> We'll have a separate process that cleans 'f' and 's' records after X minutes 
> (or after > N records exist for a given partition).
> This allows SHOW COMPACTIONS to show some history info and how many times 
> compaction failed on a given partition (subject to the retention interval), so 
> we don't have to call markCleaned() on Compactor failures while still 
> preventing the Compactor from constantly getting stuck on the same bad 
> partition/table.
> Ideally we'd want to include END_TIME field.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-12672) Record last updated time for partition and table

2016-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved HIVE-12672.
---
Resolution: Won't Fix

This was needed for HIVE-12669.  As that JIRA is resolved Won't Fix, this 
won't be fixed either.

> Record last updated time for partition and table
> 
>
> Key: HIVE-12672
> URL: https://issues.apache.org/jira/browse/HIVE-12672
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
>
> Currently tables and partitions do not record when they were last updated.  
> This makes it hard for the system to know where to look for recently changed 
> tables and partitions that may need to be analyzed or otherwise processed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-12669) Need a way to analyze tables in the background

2016-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates resolved HIVE-12669.
---
Resolution: Won't Fix

Given the work that's going on in HIVE-11160 and HIVE-12763 I don't think it 
makes sense to continue down this path.  These JIRAs will lay the groundwork 
for auto-gathering stats on data as it is inserted rather than having a 
background process do the work.

> Need a way to analyze tables in the background
> --
>
> Key: HIVE-12669
> URL: https://issues.apache.org/jira/browse/HIVE-12669
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
>
> Currently analyze must be run by users manually.  It would be useful to have 
> an option for certain or all tables to be automatically analyzed on a regular 
> basis.  The system can do this in the background as a metastore thread 
> (similar to the compactor threads).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12429) Switch default Hive authorization to SQLStandardAuth in 2.0

2016-01-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101046#comment-15101046
 ] 

Hive QA commented on HIVE-12429:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782383/HIVE-12429.17.patch

{color:green}SUCCESS:{color} +1 due to 54 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10016 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6629/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6629/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6629/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12782383 - PreCommit-HIVE-TRUNK-Build

> Switch default Hive authorization to SQLStandardAuth in 2.0
> ---
>
> Key: HIVE-12429
> URL: https://issues.apache.org/jira/browse/HIVE-12429
> Project: Hive
>  Issue Type: Task
>  Components: Authorization, Security
>Affects Versions: 2.0.0
>Reporter: Alan Gates
>Assignee: Daniel Dai
> Attachments: HIVE-12429.1.patch, HIVE-12429.10.patch, 
> HIVE-12429.11.patch, HIVE-12429.12.patch, HIVE-12429.13.patch, 
> HIVE-12429.14.patch, HIVE-12429.15.patch, HIVE-12429.16.patch, 
> HIVE-12429.17.patch, HIVE-12429.2.patch, HIVE-12429.3.patch, 
> HIVE-12429.4.patch, HIVE-12429.5.patch, HIVE-12429.6.patch, 
> HIVE-12429.7.patch, HIVE-12429.8.patch, HIVE-12429.9.patch
>
>
> Hive's default authorization is not real security, as it does not secure a 
> number of features and anyone can grant access to any object to any user.  We 
> should switch the default to SQLStandardAuth, which provides real 
> authorization.
> As this is a backwards incompatible change this was hard to do previously, 
> but 2.0 gives us a place to do this type of change.
> By default authorization will still be off, as there are a few other things 
> to set when turning on authorization (such as the list of admin users).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12853) LLAP: localize permanent UDF jars to daemon

2016-01-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101086#comment-15101086
 ] 

Sergey Shelukhin commented on HIVE-12853:
-

I don't think get method is actually needed... looks like FunctionRegistry is 
only used at compile time, so on the daemon only adding and removing will need 
to be tracked.

> LLAP: localize permanent UDF jars to daemon
> ---
>
> Key: HIVE-12853
> URL: https://issues.apache.org/jira/browse/HIVE-12853
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12853.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-8065) Support HDFS encryption functionality on Hive

2016-01-14 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-8065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu resolved HIVE-8065.

Resolution: Fixed

Closing this issue since it has already been merged upstream.

> Support HDFS encryption functionality on Hive
> -
>
> Key: HIVE-8065
> URL: https://issues.apache.org/jira/browse/HIVE-8065
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 0.13.1
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>
> The new encryption support on HDFS makes Hive incompatible and unusable when 
> this feature is used.
> HDFS encryption is designed so that a user can configure different 
> encryption zones (or directories) for multi-tenant environments. An 
> encryption zone has an exclusive encryption key, such as AES-128 or AES-256. 
> For security compliance, HDFS does not allow moving/renaming files 
> between encryption zones; renames are allowed only within the same encryption 
> zone. A copy between encryption zones is allowed.
> See HDFS-6134 for more details about HDFS encryption design.
> Hive currently uses a scratch directory (like /tmp/$user/$random). This 
> scratch directory is used for the output of intermediate data (between MR 
> jobs) and for the final output of the hive query which is later moved to the 
> table directory location.
> If Hive tables are in different encryption zones than the scratch directory, 
> then Hive won't be able to rename those files/directories, which makes 
> Hive unusable.
> To handle this problem, we can change the scratch directory of the 
> query/statement to be inside the same encryption zone of the table directory 
> location. This way, the renaming process will be successful. 
> Also, for statements that move files between encryption zones (i.e. LOAD 
> DATA), a copy may be executed instead of a rename. This will cause an 
> overhead when copying large data files, but it won't break the encryption on 
> Hive.
> Another security aspect to consider is joins in SELECT statements. If Hive 
> joins tables with different encryption key strengths, the results of the 
> select might break the security compliance of the tables. Say two tables 
> with 128-bit and 256-bit encryption are joined; the temporary results might 
> be stored in the 128-bit encryption zone, which conflicts with the 
> compliance requirements of the 256-bit table.
> To fix this, Hive should select the most secured/encrypted scratch directory 
> so that intermediate data is stored temporarily with no compliance issues.
> For instance:
> {noformat}
> SELECT * FROM table-aes128 t1 JOIN table-aes256 t2 WHERE t1.id == t2.id;
> {noformat}
> - This should use a scratch directory (or staging directory) inside the 
> table-aes256 table location.
> {noformat}
> INSERT OVERWRITE TABLE table-unencrypted SELECT * FROM table-aes1;
> {noformat}
> - This should use a scratch directory inside the table-aes1 location.
> {noformat}
> FROM table-unencrypted
> INSERT OVERWRITE TABLE table-aes128 SELECT id, name
> INSERT OVERWRITE TABLE table-aes256 SELECT id, name
> {noformat}
> - This should use a scratch directory on each of the tables locations.
> - The first SELECT will have its scratch directory on table-aes128 directory.
> - The second SELECT will have its scratch directory on table-aes256 directory.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12828) Update Spark version to 1.6

2016-01-14 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101125#comment-15101125
 ] 

Rui Li commented on HIVE-12828:
---

Looked at the log and error is
{noformat}
2016-01-14T14:38:11,889   - 16/01/14 14:38:11 WARN TaskSetManager: Lost task 
0.0 in stage 136.0 (TID 238, ip-10-233-128-9.us-west-1.compute.internal): 
java.io.IOException: java.lang.reflect.InvocationTargetException
2016-01-14T14:38:11,889   - at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
2016-01-14T14:38:11,889   - at 
org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
2016-01-14T14:38:11,890   - at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:269)
2016-01-14T14:38:11,890   - at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.<init>(HadoopShimsSecure.java:216)
2016-01-14T14:38:11,890   - at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileInputFormatShim.getRecordReader(HadoopShimsSecure.java:343)
2016-01-14T14:38:11,890   - at 
org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getRecordReader(CombineHiveInputFormat.java:680)
2016-01-14T14:38:11,890   - at 
org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:237)
2016-01-14T14:38:11,890   - at 
org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:208)
2016-01-14T14:38:11,890   - at 
org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:101)
2016-01-14T14:38:11,890   - at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
2016-01-14T14:38:11,890   - at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
2016-01-14T14:38:11,890   - at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
2016-01-14T14:38:11,890   - at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
2016-01-14T14:38:11,890   - at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
2016-01-14T14:38:11,890   - at 
org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:87)
2016-01-14T14:38:11,890   - at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
2016-01-14T14:38:11,890   - at 
org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
2016-01-14T14:38:11,890   - at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
2016-01-14T14:38:11,890   - at 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
2016-01-14T14:38:11,890   - at 
org.apache.spark.scheduler.Task.run(Task.scala:89)
2016-01-14T14:38:11,890   - at 
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
2016-01-14T14:38:11,890   - at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
2016-01-14T14:38:11,890   - at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
2016-01-14T14:38:11,890   - at java.lang.Thread.run(Thread.java:744)
2016-01-14T14:38:11,890   - Caused by: 
java.lang.reflect.InvocationTargetException
2016-01-14T14:38:11,890   - at 
sun.reflect.GeneratedConstructorAccessor29.newInstance(Unknown Source)
2016-01-14T14:38:11,890   - at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
2016-01-14T14:38:11,890   - at 
java.lang.reflect.Constructor.newInstance(Constructor.java:526)
2016-01-14T14:38:11,890   - at 
org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.initNextRecordReader(HadoopShimsSecure.java:255)
2016-01-14T14:38:11,890   - ... 21 more
2016-01-14T14:38:11,891   - Caused by: java.lang.NoSuchMethodError: 
org.apache.parquet.schema.Types$MessageTypeBuilder.addFields([Lorg/apache/parquet/schema/Type;)Lorg/apache/parquet/schema/Types$BaseGroupBuilder;
2016-01-14T14:38:11,891   - at 
org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.getSchemaByName(DataWritableReadSupport.java:160)
2016-01-14T14:38:11,891   - at 
org.apache.hadoop.hive.ql.io.parquet.read.DataWritableReadSupport.init(DataWritableReadSupport.java:223)
2016-01-14T14:38:11,891   - at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:248)
2016-01-14T14:38:11,891   - at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:94)
2016-01-14T14:38:11,891   - at 
org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:80)
2016-01-14T14:38:11,891   - at 
org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:72)
2016-01-14T14:38:11,891   - at 
org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.<init>(CombineHiveRecordReader.java:67)

[jira] [Commented] (HIVE-12352) CompactionTxnHandler.markCleaned() may delete too much

2016-01-14 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101122#comment-15101122
 ] 

Eugene Koifman commented on HIVE-12352:
---

committed to 1.3 and 2.0

> CompactionTxnHandler.markCleaned() may delete too much
> --
>
> Key: HIVE-12352
> URL: https://issues.apache.org/jira/browse/HIVE-12352
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Fix For: 1.3.0, 2.0.0
>
> Attachments: HIVE-12352.2.patch, HIVE-12352.3.patch, HIVE-12352.patch
>
>
>Worker will start with DB in state X (wrt this partition).
>while it's working more txns will happen, against partition it's 
> compacting.
>then this will delete state up to X and since then.  There may be new 
> delta files created
>between compaction starting and cleaning.  These will not be compacted 
> until more
>transactions happen.  So this ideally should only delete
>up to TXN_ID that was compacted (i.e. HWM in Worker?)  Then this can also 
> run
>at READ_COMMITTED.  So this means we'd want to store HWM in 
> COMPACTION_QUEUE when
>Worker picks up the job.
> Actually the problem is even worse (but also solved using HWM as above):
> Suppose some transactions (against same partition) have started and aborted 
> since the time Worker ran compaction job.
> That means there are never-compacted delta files with data that belongs to 
> these aborted txns.
> The following will pick up these aborted txns.
> s = "select txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and 
> txn_state = '" +
>   TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and 
> tc_table = '" +
>   info.tableName + "'";
> if (info.partName != null) s += " and tc_partition = '" + 
> info.partName + "'";
> The logic after that will delete relevant data from TXN_COMPONENTS and if one 
> of these txns becomes empty, it will be picked up by cleanEmptyAbortedTxns(). 
>  At that point any metadata about an Aborted txn is gone and the system will 
> think it's committed.
> HWM in this case would be (in ValidCompactorTxnList)
> if(minOpenTxn > 0)
> min(highWaterMark, minOpenTxn) 
> else 
> highWaterMark
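The HWM rule quoted above can be written as a small helper (a hypothetical function for illustration, not actual Hive code):

```java
public class CompactorHwm {
    // Mirrors the rule sketched in the issue: if any transaction is still
    // open below the high-water mark, the effective HWM must not pass it,
    // so the Cleaner never deletes deltas the compaction did not cover.
    static long effectiveHwm(long highWaterMark, long minOpenTxn) {
        return (minOpenTxn > 0) ? Math.min(highWaterMark, minOpenTxn)
                                : highWaterMark;
    }

    public static void main(String[] args) {
        System.out.println(effectiveHwm(100, 42)); // open txn caps the HWM
        System.out.println(effectiveHwm(100, 0));  // no open txns: full HWM
    }
}
```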



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12802) CBO: Calcite Operator To Hive Operator (Calcite Return Path): MiniTezCliDriver.vector_join_filters.q failure

2016-01-14 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-12802:
-
Description: 
Discovered as part of running :
mvn test -Dtest=TestMiniTezCliDriver -Dqfile_regex=vector.* 
-Dhive.cbo.returnpath.hiveop=true -Dtest.output.overwrite=true

{code}
SELECT sum(hash(a.key,a.value,b.key,b.value)) from myinput1 a LEFT OUTER JOIN 
myinput1 b ON (a.value=b.value AND a.key > 40 AND a.value > 50 AND a.key = 
a.value AND b.key > 40 AND b.value > 50 AND b.key = b.value) RIGHT OUTER JOIN 
myinput1 c ON (b.value=c.value AND c.key > 40 AND c.value > 50 AND c.key = 
c.value AND b.key > 40 AND b.value > 50 AND b.key = b.value)
{code}

{code}
2016-01-07T11:16:06,198 ERROR [657fd759-7643-467b-9bd0-17cb4958cb69 main[]]: 
parse.CalcitePlanner (CalcitePlanner.java:genOPTree(309)) - CBO failed, 
skipping CBO.
java.lang.IndexOutOfBoundsException: index (10) must be less than size (6)
at 
com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:305) 
~[guava-14.0.1.jar:?]
at 
com.google.common.base.Preconditions.checkElementIndex(Preconditions.java:284) 
~[guava-14.0.1.jar:?]
at 
com.google.common.collect.RegularImmutableList.get(RegularImmutableList.java:81)
 ~[guava-14.0.1.jar:?]
at 
org.apache.hadoop.hive.ql.optimizer.calcite.translator.ExprNodeConverter.visitInputRef(ExprNodeConverter.java:109)
 ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.optimizer.calcite.translator.ExprNodeConverter.visitInputRef(ExprNodeConverter.java:79)
 ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at org.apache.calcite.rex.RexInputRef.accept(RexInputRef.java:112) 
~[calcite-core-1.5.0.jar:1.5.0]
at 
org.apache.hadoop.hive.ql.optimizer.calcite.translator.ExprNodeConverter.visitCall(ExprNodeConverter.java:128)
 ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.optimizer.calcite.translator.ExprNodeConverter.visitCall(ExprNodeConverter.java:79)
 ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at org.apache.calcite.rex.RexCall.accept(RexCall.java:107) 
~[calcite-core-1.5.0.jar:1.5.0]
at 
org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveOpConverter.convertToExprNode(HiveOpConverter.java:1153)
 ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveOpConverter.translateJoin(HiveOpConverter.java:381)
 ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveOpConverter.visit(HiveOpConverter.java:313)
 ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveOpConverter.dispatch(HiveOpConverter.java:164)
 ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveOpConverter.visit(HiveOpConverter.java:268)
 ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveOpConverter.dispatch(HiveOpConverter.java:162)
 ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveOpConverter.visit(HiveOpConverter.java:397)
 ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveOpConverter.dispatch(HiveOpConverter.java:181)
 ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.optimizer.calcite.translator.HiveOpConverter.convert(HiveOpConverter.java:154)
 ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.getOptimizedHiveOPDag(CalcitePlanner.java:688)
 ~[hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.genOPTree(CalcitePlanner.java:266)
 [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10094)
 [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:231)
 [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at 
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:237)
 [hive-exec-2.1.0-SNAPSHOT.jar:2.1.0-SNAPSHOT]
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:471) 
[hive-exec-2.1.0-SNAPSHOT.jar:?]
{code}

  was:
Discovered as part of running :
mvn test -Dtest=TestMiniTezCliDriver -Dqfile_regex=vector.* 
-Dhive.cbo.returnpath.hiveop=true -Dtest.output.overwrite=true

{code}
2016-01-07T11:16:06,198 ERROR [657fd759-7643-467b-9bd0-17cb4958cb69 main[]]: 
parse.CalcitePlanner (CalcitePlanner.java:genOPTree(309)) - CBO failed, 
skipping CBO.

[jira] [Updated] (HIVE-12826) Vectorization: VectorUDAF* suspect isNull checks

2016-01-14 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-12826:
---
Attachment: HIVE-12826.1.patch

> Vectorization: VectorUDAF* suspect isNull checks
> 
>
> Key: HIVE-12826
> URL: https://issues.apache.org/jira/browse/HIVE-12826
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 2.0.0, 2.1.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-12826.1.patch
>
>
> for isRepeating=true, checking isNull[selected[i]] might return incorrect 
> results (without a heavy array fill of isNull).
> VectorUDAFSum/Min/Max/Avg and SumDecimal impls need to be reviewed for this 
> pattern.
> {code}
> private void iterateHasNullsRepeatingSelectionWithAggregationSelection(
>   VectorAggregationBufferRow[] aggregationBufferSets,
>   int aggregateIndex,
>value,
>   int batchSize,
>   int[] selection,
>   boolean[] isNull) {
>   
>   for (int i=0; i < batchSize; ++i) {
> if (!isNull[selection[i]]) {
>   Aggregation myagg = getCurrentAggregationBuffer(
> aggregationBufferSets, 
> aggregateIndex,
> i);
>   myagg.sumValue(value);
> }
>   }
> }
> {code}
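A minimal sketch of the safe pattern the report asks for, assuming the usual vectorized-batch convention that when {{isRepeating}} is true only element 0 of the column (and of {{isNull}}) is meaningful. Names here are illustrative, not the actual VectorUDAF code:

```java
public class RepeatingNullCheck {
    // Sums a long column for one batch. When isRepeating is true, indexing
    // isNull via selected[i] may read stale entries, so only isNull[0] is
    // checked and the single value is applied batchSize times.
    public static long sumColumn(long[] vector, boolean[] isNull,
                                 boolean isRepeating, int[] selected, int batchSize) {
        long sum = 0;
        if (isRepeating) {
            // One check of element 0 covers the whole batch
            if (!isNull[0]) sum = (long) batchSize * vector[0];
        } else {
            for (int i = 0; i < batchSize; i++) {
                int row = selected[i];
                if (!isNull[row]) sum += vector[row];
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] v = {7, 0, 0};
        // Only index 0 is meaningful when the column repeats
        boolean[] nulls = {false, true, true};
        System.out.println(sumColumn(v, nulls, true, new int[]{2, 1}, 2)); // 14
    }
}
```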



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12828) Update Spark version to 1.6

2016-01-14 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101152#comment-15101152
 ] 

Xuefu Zhang commented on HIVE-12828:


I'm not sure whether the test run cleans up the /thirdparty dir. It worked for 
me when I tested. Could you give it a try?

> Update Spark version to 1.6
> ---
>
> Key: HIVE-12828
> URL: https://issues.apache.org/jira/browse/HIVE-12828
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
> Attachments: HIVE-12828.1-spark.patch, HIVE-12828.2-spark.patch, 
> HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, 
> mem.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9862) Vectorized execution corrupts timestamp values

2016-01-14 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-9862:
---
Attachment: HIVE-9862.06.patch

> Vectorized execution corrupts timestamp values
> --
>
> Key: HIVE-9862
> URL: https://issues.apache.org/jira/browse/HIVE-9862
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.0.0
>Reporter: Nathan Howell
>Assignee: Matt McCline
> Attachments: HIVE-9862.01.patch, HIVE-9862.02.patch, 
> HIVE-9862.03.patch, HIVE-9862.04.patch, HIVE-9862.05.patch, HIVE-9862.06.patch
>
>
> Timestamps in the future (year 2250?) and before ~1700 are silently corrupted 
> in vectorized execution mode. Simple repro:
> {code}
> hive> DROP TABLE IF EXISTS test;
> hive> CREATE TABLE test(ts TIMESTAMP) STORED AS ORC;
> hive> INSERT INTO TABLE test VALUES ('-12-31 23:59:59');
> hive> SET hive.vectorized.execution.enabled = false;
> hive> SELECT MAX(ts) FROM test;
> -12-31 23:59:59
> hive> SET hive.vectorized.execution.enabled = true;
> hive> SELECT MAX(ts) FROM test;
> 1816-03-30 05:56:07.066277376
> {code}
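The wrap-around into 1816 is consistent with signed 64-bit overflow when a far-future timestamp is converted to nanoseconds since the epoch. This is a hedged illustration of the arithmetic only, not a claim about the exact code path in the vectorized reader:

```java
public class TimestampOverflowDemo {
    public static void main(String[] args) {
        // Seconds from the epoch to 9999-12-31 23:59:59 UTC
        long seconds = 253_402_300_799L;
        // Converting to nanoseconds needs ~2.53e20, but a signed 64-bit long
        // tops out near 9.22e18, so the multiplication silently wraps around
        long nanos = seconds * 1_000_000_000L;
        System.out.println(nanos < 0); // true: silent wrap-around
    }
}
```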



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12828) Update Spark version to 1.6

2016-01-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099092#comment-15099092
 ] 

Hive QA commented on HIVE-12828:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782359/HIVE-12828.2-spark.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9866 tests executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver_parquet_join
org.apache.hadoop.hive.metastore.TestHiveMetaStorePartitionSpecs.testGetPartitionSpecs_WithAndWithoutPartitionGrouping
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/1030/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/1030/console
Test logs: 
http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-1030/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12782359 - PreCommit-HIVE-SPARK-Build

> Update Spark version to 1.6
> ---
>
> Key: HIVE-12828
> URL: https://issues.apache.org/jira/browse/HIVE-12828
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
> Attachments: HIVE-12828.1-spark.patch, HIVE-12828.2-spark.patch, 
> HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, 
> mem.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12657) selectDistinctStar.q results differ with jdk 1.7 vs jdk 1.8

2016-01-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12657:

Attachment: HIVE-12657.patch

Simple patch. The relevant change is one-line, changing RR to LinkedHashMap for 
tables; the rest is comments, some cleanup, and out file updates.

[~prasanth_j] [~pxiong] can you take a look?
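The one-line change relies on a general JDK guarantee: HashMap iteration order is unspecified and shifted between JDK 7 and JDK 8, while LinkedHashMap iterates in insertion order on every JDK. A small standalone illustration (not Hive's RowResolver code):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class IterationOrderDemo {
    public static void main(String[] args) {
        Map<String, Integer> linked = new LinkedHashMap<>();
        linked.put("key", 1);
        linked.put("value", 2);
        linked.put("ds", 3);
        // LinkedHashMap iterates in insertion order on every JDK
        System.out.println(new ArrayList<>(linked.keySet())); // [key, value, ds]

        // HashMap order depends on internal hash layout and changed
        // between JDK versions, which is what broke the .q output
        Map<String, Integer> plain = new HashMap<>(linked);
        System.out.println(plain.keySet()); // order unspecified
    }
}
```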

> selectDistinctStar.q results differ with jdk 1.7 vs jdk 1.8
> ---
>
> Key: HIVE-12657
> URL: https://issues.apache.org/jira/browse/HIVE-12657
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12657.patch
>
>
> Encountered this issue when analysing test failures of HIVE-12609. 
> selectDistinctStar.q produces the following diff when I ran with java version 
> "1.7.0_55" and java version "1.8.0_60"
> {code}
> < 128   val_128 128 
> ---
> > 128   128 val_128
> 1770c1770
> < 224   val_224 224 
> ---
> > 224   224 val_224
> 1776c1776
> < 369   val_369 369 
> ---
> > 369   369 val_369
> 1799,1810c1799,1810
> < 146   val_146 146 val_146 146 val_146 2008-04-08  11
> < 150   val_150 150 val_150 150 val_150 2008-04-08  11
> < 213   val_213 213 val_213 213 val_213 2008-04-08  11
> < 238   val_238 238 val_238 238 val_238 2008-04-08  11
> < 255   val_255 255 val_255 255 val_255 2008-04-08  11
> < 273   val_273 273 val_273 273 val_273 2008-04-08  11
> < 278   val_278 278 val_278 278 val_278 2008-04-08  11
> < 311   val_311 311 val_311 311 val_311 2008-04-08  11
> < 401   val_401 401 val_401 401 val_401 2008-04-08  11
> < 406   val_406 406 val_406 406 val_406 2008-04-08  11
> < 66    val_66  66  val_66  66  val_66  2008-04-08  11
> < 98    val_98  98  val_98  98  val_98  2008-04-08  11
> ---
> > 146   val_146 2008-04-08  11  146 val_146 146 val_146
> > 150   val_150 2008-04-08  11  150 val_150 150 val_150
> > 213   val_213 2008-04-08  11  213 val_213 213 val_213
> > 238   val_238 2008-04-08  11  238 val_238 238 val_238
> > 255   val_255 2008-04-08  11  255 val_255 255 val_255
> > 273   val_273 2008-04-08  11  273 val_273 273 val_273
> > 278   val_278 2008-04-08  11  278 val_278 278 val_278
> > 311   val_311 2008-04-08  11  311 val_311 311 val_311
> > 401   val_401 2008-04-08  11  401 val_401 401 val_401
> > 406   val_406 2008-04-08  11  406 val_406 406 val_406
> > 66    val_66  2008-04-08  11  66  val_66  66  val_66
> > 98    val_98  2008-04-08  11  98  val_98  98  val_98
> 4212c4212
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12039) Fix TestSSL#testSSLVersion

2016-01-14 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-12039:

Attachment: HIVE-12039.2.patch

Another try. Test passes locally, let's see if it passes precommit.

> Fix TestSSL#testSSLVersion 
> ---
>
> Key: HIVE-12039
> URL: https://issues.apache.org/jira/browse/HIVE-12039
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
> Attachments: HIVE-12039.1.patch, HIVE-12039.2.patch
>
>
> Looks like it's only run on Linux and failing after HIVE-11720.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12832) RDBMS schema changes for HIVE-11388

2016-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-12832:
--
Attachment: HIVE-12832.uber.2.branch-1.patch

Attaching patch used for branch-1.  Note that this does update branch-1 to 
thrift 0.9.3.  It had previously been 0.9.2.  This brings the thrift version up 
to date with master.  Per 
http://www.mail-archive.com/user%40thrift.apache.org/msg01282.html this should 
not cause backward incompatibility issues.

> RDBMS schema changes for HIVE-11388
> ---
>
> Key: HIVE-12832
> URL: https://issues.apache.org/jira/browse/HIVE-12832
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Affects Versions: 1.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-12382.patch, HIVE-12832.3.patch, 
> HIVE-12832.uber.2.branch-1.patch, HIVE-12832.uber.2.branch-2.0.patch, 
> HIVE-12832.uber.2.patch, HIVE-12832.uber.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-12854) LLAP: register permanent UDFs in the executors to make them usable, from localized jars

2016-01-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-12854:
---

Assignee: Sergey Shelukhin

> LLAP: register permanent UDFs in the executors to make them usable, from 
> localized jars
> ---
>
> Key: HIVE-12854
> URL: https://issues.apache.org/jira/browse/HIVE-12854
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12875) Verify sem.getInputs() and sem.getOutputs()

2016-01-14 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-12875:

Description: For every partition entity object present in sem.getInputs() 
and sem.getOutputs(), we must verify the appropriate Table in the list of 
Entities.  (was: For every partition entity object present in sem.getInputs() 
and sem.getOutputs(), we must ensure that the appropriate Table is also added 
to the list of entities.)

> Verify sem.getInputs() and sem.getOutputs()
> ---
>
> Key: HIVE-12875
> URL: https://issues.apache.org/jira/browse/HIVE-12875
> Project: Hive
>  Issue Type: Bug
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
>
> For every partition entity object present in sem.getInputs() and 
> sem.getOutputs(), we must verify the appropriate Table in the list of 
> Entities.
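The verification the description calls for can be sketched as a pure check over the entity sets. The types below are illustrative placeholders; Hive's real ReadEntity/WriteEntity classes differ:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class EntityVerifier {
    // Placeholder for a partition entity carrying its owning table's name
    static final class PartitionEntity {
        final String tableName;
        PartitionEntity(String tableName) { this.tableName = tableName; }
    }

    // Returns the tables referenced by partition entities but absent from
    // the table-entity set; an empty result means the invariant holds
    static Set<String> missingTables(Collection<PartitionEntity> partitions,
                                     Set<String> tableEntities) {
        Set<String> missing = new LinkedHashSet<>();
        for (PartitionEntity p : partitions) {
            if (!tableEntities.contains(p.tableName)) {
                missing.add(p.tableName);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<PartitionEntity> parts =
            Arrays.asList(new PartitionEntity("t1"), new PartitionEntity("t2"));
        // t2's Table entity is absent, so it should be flagged
        System.out.println(missingTables(parts, new HashSet<>(Arrays.asList("t1")))); // [t2]
    }
}
```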



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12868) Fix empty operation-pool metrics

2016-01-14 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099102#comment-15099102
 ] 

Jimmy Xiang commented on HIVE-12868:


+1

> Fix empty operation-pool metrics
> 
>
> Key: HIVE-12868
> URL: https://issues.apache.org/jira/browse/HIVE-12868
> Project: Hive
>  Issue Type: Sub-task
>  Components: Diagnosability
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-12868.2.patch, HIVE-12868.patch
>
>
> The newly-added operation pool metrics (thread-pool size, queue size) are 
> empty because metrics system is initialized too late.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12724) ACID: Major compaction fails to include the original bucket files into MR job

2016-01-14 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-12724:
-
Attachment: HIVE-12724.4.patch

> ACID: Major compaction fails to include the original bucket files into MR job
> -
>
> Key: HIVE-12724
> URL: https://issues.apache.org/jira/browse/HIVE-12724
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>Priority: Blocker
> Attachments: HIVE-12724.1.patch, HIVE-12724.2.patch, 
> HIVE-12724.3.patch, HIVE-12724.4.patch, HIVE-12724.ADDENDUM.1.patch, 
> HIVE-12724.branch-1.2.patch, HIVE-12724.branch-1.patch
>
>
> How the problem happens:
> * Create a non-ACID table
> * Before non-ACID to ACID table conversion, we inserted row one
> * After non-ACID to ACID table conversion, we inserted row two
> * Both rows can be retrieved before MAJOR compaction
> * After MAJOR compaction, row one is lost
> {code}
> hive> USE acidtest;
> OK
> Time taken: 0.77 seconds
> hive> CREATE TABLE t1 (nationkey INT, name STRING, regionkey INT, comment 
> STRING)
> > CLUSTERED BY (regionkey) INTO 2 BUCKETS
> > STORED AS ORC;
> OK
> Time taken: 0.179 seconds
> hive> DESC FORMATTED t1;
> OK
> # col_name             data_type           comment
> nationkey int
> name  string
> regionkey int
> comment   string
> # Detailed Table Information
> Database: acidtest
> Owner:wzheng
> CreateTime:   Mon Dec 14 15:50:40 PST 2015
> LastAccessTime:   UNKNOWN
> Retention:0
> Location: file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> Table Type:   MANAGED_TABLE
> Table Parameters:
>   transient_lastDdlTime   1450137040
> # Storage Information
> SerDe Library:org.apache.hadoop.hive.ql.io.orc.OrcSerde
> InputFormat:  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> Compressed:   No
> Num Buckets:  2
> Bucket Columns:   [regionkey]
> Sort Columns: []
> Storage Desc Params:
>   serialization.format1
> Time taken: 0.198 seconds, Fetched: 28 row(s)
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db;
> Found 1 items
> drwxr-xr-x   - wzheng staff 68 2015-12-14 15:50 
> /Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
> hive> INSERT INTO TABLE t1 VALUES (1, 'USA', 1, 'united states');
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
> future versions. Consider using a different execution engine (i.e. tez, 
> spark) or using Hive 1.X releases.
> Query ID = wzheng_20151214155028_630098c6-605f-4e7e-a797-6b49fb48360d
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=
> Job running in-process (local Hadoop)
> 2015-12-14 15:51:58,070 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_local73977356_0001
> Loading data to table acidtest.t1
> MapReduce Jobs Launched:
> Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> OK
> Time taken: 2.825 seconds
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
> Found 2 items
> -rwxr-xr-x   1 wzheng staff112 2015-12-14 15:51 
> /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/00_0
> -rwxr-xr-x   1 wzheng staff472 2015-12-14 15:51 
> /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/01_0
> hive> SELECT * FROM t1;
> OK
> 1 USA 1   united states
> Time taken: 0.434 seconds, Fetched: 1 row(s)
> hive> ALTER TABLE t1 SET TBLPROPERTIES ('transactional' = 'true');
> OK
> Time taken: 0.071 seconds
> hive> DESC FORMATTED t1;
> OK
> # col_name             data_type           comment
> nationkey int
> name  string
> regionkey int
> comment   string
> # Detailed Table Information
> Database: acidtest
> Owner:wzheng
> CreateTime:   Mon Dec 14 15:50:40 PST 2015
> LastAccessTime:   UNKNOWN
> Retention:0
> Location: file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> Table Type:   MANAGED_TABLE
> Table Parameters:
>   COLUMN_STATS_ACCURATE   false
>   last_modified_by        wzheng
>   last_modified_time  1450137141

[jira] [Updated] (HIVE-12724) ACID: Major compaction fails to include the original bucket files into MR job

2016-01-14 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-12724:
-
Attachment: (was: HIVE-12724.4.patch)

> ACID: Major compaction fails to include the original bucket files into MR job
> -
>
> Key: HIVE-12724
> URL: https://issues.apache.org/jira/browse/HIVE-12724
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>Priority: Blocker
> Attachments: HIVE-12724.1.patch, HIVE-12724.2.patch, 
> HIVE-12724.3.patch, HIVE-12724.4.patch, HIVE-12724.ADDENDUM.1.patch, 
> HIVE-12724.branch-1.2.patch, HIVE-12724.branch-1.patch
>
>
> How the problem happens:
> * Create a non-ACID table
> * Before non-ACID to ACID table conversion, we inserted row one
> * After non-ACID to ACID table conversion, we inserted row two
> * Both rows can be retrieved before MAJOR compaction
> * After MAJOR compaction, row one is lost
> {code}
> hive> USE acidtest;
> OK
> Time taken: 0.77 seconds
> hive> CREATE TABLE t1 (nationkey INT, name STRING, regionkey INT, comment 
> STRING)
> > CLUSTERED BY (regionkey) INTO 2 BUCKETS
> > STORED AS ORC;
> OK
> Time taken: 0.179 seconds
> hive> DESC FORMATTED t1;
> OK
> # col_name             data_type           comment
> nationkey int
> name  string
> regionkey int
> comment   string
> # Detailed Table Information
> Database: acidtest
> Owner:wzheng
> CreateTime:   Mon Dec 14 15:50:40 PST 2015
> LastAccessTime:   UNKNOWN
> Retention:0
> Location: file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> Table Type:   MANAGED_TABLE
> Table Parameters:
>   transient_lastDdlTime   1450137040
> # Storage Information
> SerDe Library:org.apache.hadoop.hive.ql.io.orc.OrcSerde
> InputFormat:  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> Compressed:   No
> Num Buckets:  2
> Bucket Columns:   [regionkey]
> Sort Columns: []
> Storage Desc Params:
>   serialization.format1
> Time taken: 0.198 seconds, Fetched: 28 row(s)
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db;
> Found 1 items
> drwxr-xr-x   - wzheng staff 68 2015-12-14 15:50 
> /Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
> hive> INSERT INTO TABLE t1 VALUES (1, 'USA', 1, 'united states');
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
> future versions. Consider using a different execution engine (i.e. tez, 
> spark) or using Hive 1.X releases.
> Query ID = wzheng_20151214155028_630098c6-605f-4e7e-a797-6b49fb48360d
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=
> Job running in-process (local Hadoop)
> 2015-12-14 15:51:58,070 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_local73977356_0001
> Loading data to table acidtest.t1
> MapReduce Jobs Launched:
> Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> OK
> Time taken: 2.825 seconds
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
> Found 2 items
> -rwxr-xr-x   1 wzheng staff112 2015-12-14 15:51 
> /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/00_0
> -rwxr-xr-x   1 wzheng staff472 2015-12-14 15:51 
> /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/01_0
> hive> SELECT * FROM t1;
> OK
> 1 USA 1   united states
> Time taken: 0.434 seconds, Fetched: 1 row(s)
> hive> ALTER TABLE t1 SET TBLPROPERTIES ('transactional' = 'true');
> OK
> Time taken: 0.071 seconds
> hive> DESC FORMATTED t1;
> OK
> # col_name             data_type           comment
> nationkey int
> name  string
> regionkey int
> comment   string
> # Detailed Table Information
> Database: acidtest
> Owner:wzheng
> CreateTime:   Mon Dec 14 15:50:40 PST 2015
> LastAccessTime:   UNKNOWN
> Retention:0
> Location: file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> Table Type:   MANAGED_TABLE
> Table Parameters:
>   COLUMN_STATS_ACCURATE   false
>   last_modified_by        wzheng
>   last_modified_time  

[jira] [Updated] (HIVE-12220) LLAP: Usability issues with hive.llap.io.cache.orc.size

2016-01-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12220:

Attachment: HIVE-12220.02.patch

This patch should allow specifying units for these settings (e.g. "4Mb"). This 
can later be expanded to other settings as needed.
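The idea of unit-suffixed sizes can be sketched with a small parser. This is a hedged illustration of the approach, not Hive's actual implementation; the method name and accepted suffixes are assumptions:

```java
public class SizeParser {
    // Parses a size string such as "4Mb" or "100" into bytes.
    // Suffix matching is case-insensitive; a bare number means bytes.
    static long parseBytes(String raw) {
        String s = raw.trim().toLowerCase();
        long mult = 1L;
        if (s.endsWith("kb"))      { mult = 1024L;               s = s.substring(0, s.length() - 2); }
        else if (s.endsWith("mb")) { mult = 1024L * 1024;        s = s.substring(0, s.length() - 2); }
        else if (s.endsWith("gb")) { mult = 1024L * 1024 * 1024; s = s.substring(0, s.length() - 2); }
        else if (s.endsWith("b"))  {                             s = s.substring(0, s.length() - 1); }
        return Long.parseLong(s.trim()) * mult;
    }

    public static void main(String[] args) {
        System.out.println(parseBytes("4Mb")); // 4194304
        System.out.println(parseBytes("100")); // 100 (plain bytes)
    }
}
```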

> LLAP: Usability issues with hive.llap.io.cache.orc.size
> ---
>
> Key: HIVE-12220
> URL: https://issues.apache.org/jira/browse/HIVE-12220
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Carter Shanklin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12220.01.patch, HIVE-12220.02.patch, 
> HIVE-12220.patch, HIVE-12220.tmp.patch
>
>
> In the llap-daemon site you need to set, among other things,
> llap.daemon.memory.per.instance.mb
> and
> hive.llap.io.cache.orc.size
> The use of hive.llap.io.cache.orc.size caused me some unnecessary problems: 
> initially I entered the value in MB rather than in bytes. Operator error, you 
> could say, but I see this setting as a fraction of the other value, which is in MB.
> Second, is this really tied to ORC? E.g. when we have the vectorized text 
> reader will this data be cached as well? Or might it be in the future?
> I would like to propose instead using hive.llap.io.cache.size.mb for this 
> setting.
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12352) CompactionTxnHandler.markCleaned() may delete too much

2016-01-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099182#comment-15099182
 ] 

Hive QA commented on HIVE-12352:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782361/HIVE-12352.3.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10019 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6627/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6627/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6627/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12782361 - PreCommit-HIVE-TRUNK-Build

> CompactionTxnHandler.markCleaned() may delete too much
> --
>
> Key: HIVE-12352
> URL: https://issues.apache.org/jira/browse/HIVE-12352
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-12352.2.patch, HIVE-12352.3.patch, HIVE-12352.patch
>
>
>Worker will start with the DB in state X (wrt this partition).
>While it's working, more txns will happen against the partition it's 
> compacting.
>Then markCleaned() will delete state up to X and everything created since 
> then.  There may be new delta files created
>between compaction starting and cleaning.  These will not be compacted 
> until more
>transactions happen.  So this ideally should only delete
>up to TXN_ID that was compacted (i.e. HWM in Worker?)  Then this can also 
> run
>at READ_COMMITTED.  So this means we'd want to store HWM in 
> COMPACTION_QUEUE when
>Worker picks up the job.
> Actually the problem is even worse (but also solved using HWM as above):
> Suppose some transactions (against same partition) have started and aborted 
> since the time Worker ran compaction job.
> That means there are never-compacted delta files with data that belongs to 
> these aborted txns.
> Following will pick up these aborted txns.
> s = "select txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and 
> txn_state = '" +
>   TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and 
> tc_table = '" +
>   info.tableName + "'";
> if (info.partName != null) s += " and tc_partition = '" + 
> info.partName + "'";
> The logic after that will delete relevant data from TXN_COMPONENTS and if one 
> of these txns becomes empty, it will be picked up by cleanEmptyAbortedTxns(). 
>  At that point any metadata about an Aborted txn is gone and the system will 
> think it's committed.
> HWM in this case would be (in ValidCompactorTxnList)
> if(minOpenTxn > 0)
> min(highWaterMark, minOpenTxn) 
> else 
> highWaterMark
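The high-water-mark rule quoted at the end of the description can be sketched as a pure function: the cleaner must not remove state past the smallest still-open transaction. A hedged sketch, assuming `minOpenTxn <= 0` encodes "no open transactions":

```java
public class CompactionHwm {
    // Effective high water mark for cleaning: capped by the smallest
    // still-open txn so its in-flight state is never deleted
    static long effectiveHwm(long highWaterMark, long minOpenTxn) {
        return (minOpenTxn > 0) ? Math.min(highWaterMark, minOpenTxn)
                                : highWaterMark;
    }

    public static void main(String[] args) {
        System.out.println(effectiveHwm(100, 50)); // 50: open txn caps the HWM
        System.out.println(effectiveHwm(100, -1)); // 100: nothing open
    }
}
```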



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12353) When Compactor fails it calls CompactionTxnHandler.markedCleaned(). it should not.

2016-01-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099181#comment-15099181
 ] 

Sergey Shelukhin commented on HIVE-12353:
-

Is it possible to include schema changes in a patch for HiveQA here? This 
separate thrift change development process makes no sense whatsoever and will 
delay commits because of HiveQA.

> When Compactor fails it calls CompactionTxnHandler.markedCleaned().  it 
> should not.
> ---
>
> Key: HIVE-12353
> URL: https://issues.apache.org/jira/browse/HIVE-12353
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-12353.2.patch, HIVE-12353.3.patch, HIVE-12353.patch
>
>
> One of the things that this method does is delete entries from TXN_COMPONENTS 
> for partition that it was trying to compact.
> This causes Aborted transactions in TXNS to become empty according to
> CompactionTxnHandler.cleanEmptyAbortedTxns() which means they can now be 
> deleted.  
> Once they are deleted, data that belongs to these txns is deemed committed...
> We should extend COMPACTION_QUEUE state with 'f' and 's' (failed, success) 
> states.  We should also not delete the entry from markedCleaned()
> We'll have separate process that cleans 'f' and 's' records after X minutes 
> (or after > N records for a given partition exist).
> This allows SHOW COMPACTIONS to show some history info and how many times 
> compaction failed on a given partition (subject to retention interval) so 
> that we don't have to call markCleaned() on Compactor failures at the same 
> time preventing Compactor to constantly getting stuck on the same bad 
> partition/table.
> Ideally we'd want to include END_TIME field.
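The proposed lifecycle with terminal 'f'/'s' states kept for history can be sketched as an enum. The non-terminal state names and codes below are assumptions for illustration; only 'f' (failed) and 's' (success) come from the description:

```java
public class CompactionState {
    enum State {
        INITIATED('i'), WORKING('w'), READY_FOR_CLEANING('r'),
        FAILED('f'), SUCCEEDED('s');

        final char code; // single-char code as stored in COMPACTION_QUEUE
        State(char code) { this.code = code; }
    }

    // Terminal rows stay in the queue for SHOW COMPACTIONS history and are
    // only purged later by a retention-based cleaner
    static boolean isTerminal(State s) {
        return s == State.FAILED || s == State.SUCCEEDED;
    }

    public static void main(String[] args) {
        System.out.println(isTerminal(State.FAILED));  // true
        System.out.println(isTerminal(State.WORKING)); // false
    }
}
```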



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12352) CompactionTxnHandler.markCleaned() may delete too much

2016-01-14 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099205#comment-15099205
 ] 

Eugene Koifman commented on HIVE-12352:
---

committed to master 
https://github.com/apache/hive/commit/4935cfda78577bd63f1c4ae04a26dc307e640b6f

> CompactionTxnHandler.markCleaned() may delete too much
> --
>
> Key: HIVE-12352
> URL: https://issues.apache.org/jira/browse/HIVE-12352
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-12352.2.patch, HIVE-12352.3.patch, HIVE-12352.patch
>
>
>Worker will start with the DB in state X (wrt this partition).
>While it's working, more txns will happen against the partition it's 
> compacting.
>Then markCleaned() will delete state up to X and everything created since 
> then.  There may be new delta files created
>between compaction starting and cleaning.  These will not be compacted 
> until more
>transactions happen.  So this ideally should only delete
>up to TXN_ID that was compacted (i.e. HWM in Worker?)  Then this can also 
> run
>at READ_COMMITTED.  So this means we'd want to store HWM in 
> COMPACTION_QUEUE when
>Worker picks up the job.
> Actually the problem is even worse (but also solved using HWM as above):
> Suppose some transactions (against same partition) have started and aborted 
> since the time Worker ran compaction job.
> That means there are never-compacted delta files with data that belongs to 
> these aborted txns.
> Following will pick up these aborted txns.
> s = "select txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and 
> txn_state = '" +
>   TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and 
> tc_table = '" +
>   info.tableName + "'";
> if (info.partName != null) s += " and tc_partition = '" + 
> info.partName + "'";
> The logic after that will delete relevant data from TXN_COMPONENTS and if one 
> of these txns becomes empty, it will be picked up by cleanEmptyAbortedTxns(). 
>  At that point any metadata about an Aborted txn is gone and the system will 
> think it's committed.
> HWM in this case would be (in ValidCompactorTxnList)
> if(minOpenTxn > 0)
> min(highWaterMark, minOpenTxn) 
> else 
> highWaterMark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12366) Refactor Heartbeater logic for transaction

2016-01-14 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-12366:
-
Attachment: HIVE-12366.14.patch

> Refactor Heartbeater logic for transaction
> --
>
> Key: HIVE-12366
> URL: https://issues.apache.org/jira/browse/HIVE-12366
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-12366.1.patch, HIVE-12366.11.patch, 
> HIVE-12366.12.patch, HIVE-12366.13.patch, HIVE-12366.14.patch, 
> HIVE-12366.2.patch, HIVE-12366.3.patch, HIVE-12366.4.patch, 
> HIVE-12366.5.patch, HIVE-12366.6.patch, HIVE-12366.7.patch, 
> HIVE-12366.8.patch, HIVE-12366.9.patch
>
>
> Currently there is a gap between the time locks acquisition and the first 
> heartbeat being sent out. Normally the gap is negligible, but when it's big 
> it will cause query fail since the locks are timed out by the time the 
> heartbeat is sent.
> Need to remove this gap.
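One way to remove the gap is to schedule the first heartbeat with zero initial delay as soon as the locks are acquired. The names below are hypothetical, not Hive's actual Heartbeater API; this only illustrates the scheduling idea:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class HeartbeaterSketch {
    // initialDelay = 0 means the first beat fires immediately after lock
    // acquisition instead of one full interval later
    public static ScheduledFuture<?> startHeartbeat(ScheduledExecutorService pool,
                                                    Runnable sendHeartbeat,
                                                    long intervalMs) {
        return pool.scheduleAtFixedRate(sendHeartbeat, 0, intervalMs,
                                        TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        CountDownLatch firstBeat = new CountDownLatch(1);
        ScheduledFuture<?> task = startHeartbeat(pool, firstBeat::countDown, 500);
        // The first beat arrives well before one interval elapses
        System.out.println(firstBeat.await(200, TimeUnit.MILLISECONDS)); // true
        task.cancel(true);
        pool.shutdownNow();
    }
}
```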



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12366) Refactor Heartbeater logic for transaction

2016-01-14 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-12366:
-
Attachment: (was: HIVE-12366.14.patch)

> Refactor Heartbeater logic for transaction
> --
>
> Key: HIVE-12366
> URL: https://issues.apache.org/jira/browse/HIVE-12366
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-12366.1.patch, HIVE-12366.11.patch, 
> HIVE-12366.12.patch, HIVE-12366.13.patch, HIVE-12366.14.patch, 
> HIVE-12366.2.patch, HIVE-12366.3.patch, HIVE-12366.4.patch, 
> HIVE-12366.5.patch, HIVE-12366.6.patch, HIVE-12366.7.patch, 
> HIVE-12366.8.patch, HIVE-12366.9.patch
>
>
> Currently there is a gap between the time of lock acquisition and the first 
> heartbeat being sent out. Normally the gap is negligible, but when it is 
> large it will cause the query to fail, since the locks have timed out by the 
> time the heartbeat is sent.
> This gap needs to be removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12805) CBO: Calcite Operator To Hive Operator (Calcite Return Path): MiniTezCliDriver skewjoin.q failure

2016-01-14 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-12805:
-
Attachment: HIVE-12805.2.patch

> CBO: Calcite Operator To Hive Operator (Calcite Return Path): 
> MiniTezCliDriver skewjoin.q failure
> -
>
> Key: HIVE-12805
> URL: https://issues.apache.org/jira/browse/HIVE-12805
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Reporter: Hari Sankar Sivarama Subramaniyan
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-12805.1.patch, HIVE-12805.2.patch
>
>
> Set hive.cbo.returnpath.hiveop=true
> {code}
> FROM T1 a FULL OUTER JOIN T2 c ON c.key+1=a.key SELECT /*+ STREAMTABLE(a) */ 
> sum(hash(a.key)), sum(hash(a.val)), sum(hash(c.key))
> {code}
> The stack trace:
> {code}
> java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
> at java.util.ArrayList.rangeCheck(ArrayList.java:635)
> at java.util.ArrayList.get(ArrayList.java:411)
> at 
> org.apache.hadoop.hive.ql.ppd.SyntheticJoinPredicate$JoinSynthetic.process(SyntheticJoinPredicate.java:183)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:105)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:89)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderOnceWalker.walk(PreOrderOnceWalker.java:43)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderOnceWalker.walk(PreOrderOnceWalker.java:54)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderOnceWalker.walk(PreOrderOnceWalker.java:54)
> at 
> org.apache.hadoop.hive.ql.lib.PreOrderOnceWalker.walk(PreOrderOnceWalker.java:54)
> at 
> org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:120)
> at 
> org.apache.hadoop.hive.ql.ppd.SyntheticJoinPredicate.transform(SyntheticJoinPredicate.java:100)
> at 
> org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:236)
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10170)
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:231)
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:237)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:471)
> {code}
> Same error happens in auto_sortmerge_join_6.q.out for 
> {code}
> select count(*) FROM tbl1 a JOIN tbl2 b ON a.key = b.key join src h on 
> h.value = a.value
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12868) Fix empty operation-pool metrics

2016-01-14 Thread Szehon Ho (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12868?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Szehon Ho updated HIVE-12868:
-
Attachment: HIVE-12868.2.patch

Thanks for the suggestion; it is a better fix.

I fixed the TestHs2 test, and also moved the code so it gets executed in the 
same place by both the HiveServer2 main method and MiniHs2.start().

> Fix empty operation-pool metrics
> 
>
> Key: HIVE-12868
> URL: https://issues.apache.org/jira/browse/HIVE-12868
> Project: Hive
>  Issue Type: Sub-task
>  Components: Diagnosability
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-12868.2.patch, HIVE-12868.patch
>
>
> The newly-added operation pool metrics (thread-pool size, queue size) are 
> empty because the metrics system is initialized too late.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12657) selectDistinctStar.q results differ with jdk 1.7 vs jdk 1.8

2016-01-14 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099100#comment-15099100
 ] 

Pengcheng Xiong commented on HIVE-12657:


[~sershe], thanks for your prompt action. Patch looks good to me, +1 pending a 
QA run. Just one question: shall we also make invRslvMap a LinkedHashMap too?
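For context, a small demonstration of why switching map types matters (illustrative code, not Hive's actual resolver classes): HashMap iteration order is an implementation detail that changed between JDK 1.7 and 1.8, whereas LinkedHashMap always iterates in insertion order, which keeps q.out results stable across JDKs:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class IterationOrderDemo {
    // Concatenates keys in iteration order; with a plain HashMap this string
    // could differ across JDK versions, producing diffs like the ones quoted.
    static String keyOrder(Map<String, Integer> m) {
        StringBuilder sb = new StringBuilder();
        for (String k : m.keySet()) {
            if (sb.length() > 0) sb.append(',');
            sb.append(k);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> cols = new LinkedHashMap<>();
        cols.put("key", 0);
        cols.put("value", 1);
        cols.put("ds", 2);
        // LinkedHashMap guarantees insertion order on every JDK.
        System.out.println(keyOrder(cols)); // prints "key,value,ds"
    }
}
```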

> selectDistinctStar.q results differ with jdk 1.7 vs jdk 1.8
> ---
>
> Key: HIVE-12657
> URL: https://issues.apache.org/jira/browse/HIVE-12657
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12657.patch
>
>
> Encountered this issue when analysing test failures of HIVE-12609. 
> selectDistinctStar.q produces the following diff when I ran with java version 
> "1.7.0_55" and java version "1.8.0_60"
> {code}
> < 128   val_128 128 
> ---
> > 128   128 val_128
> 1770c1770
> < 224   val_224 224 
> ---
> > 224   224 val_224
> 1776c1776
> < 369   val_369 369 
> ---
> > 369   369 val_369
> 1799,1810c1799,1810
> < 146   val_146 146 val_146 146 val_146 2008-04-08  11
> < 150   val_150 150 val_150 150 val_150 2008-04-08  11
> < 213   val_213 213 val_213 213 val_213 2008-04-08  11
> < 238   val_238 238 val_238 238 val_238 2008-04-08  11
> < 255   val_255 255 val_255 255 val_255 2008-04-08  11
> < 273   val_273 273 val_273 273 val_273 2008-04-08  11
> < 278   val_278 278 val_278 278 val_278 2008-04-08  11
> < 311   val_311 311 val_311 311 val_311 2008-04-08  11
> < 401   val_401 401 val_401 401 val_401 2008-04-08  11
> < 406   val_406 406 val_406 406 val_406 2008-04-08  11
> < 66    val_66  66  val_66  66  val_66  2008-04-08  11
> < 98    val_98  98  val_98  98  val_98  2008-04-08  11
> ---
> > 146   val_146 2008-04-08  11  146 val_146 146 val_146
> > 150   val_150 2008-04-08  11  150 val_150 150 val_150
> > 213   val_213 2008-04-08  11  213 val_213 213 val_213
> > 238   val_238 2008-04-08  11  238 val_238 238 val_238
> > 255   val_255 2008-04-08  11  255 val_255 255 val_255
> > 273   val_273 2008-04-08  11  273 val_273 273 val_273
> > 278   val_278 2008-04-08  11  278 val_278 278 val_278
> > 311   val_311 2008-04-08  11  311 val_311 311 val_311
> > 401   val_401 2008-04-08  11  401 val_401 401 val_401
> > 406   val_406 2008-04-08  11  406 val_406 406 val_406
> > 66    val_66  2008-04-08  11  66  val_66  66  val_66
> > 98    val_98  2008-04-08  11  98  val_98  98  val_98
> 4212c4212
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12366) Refactor Heartbeater logic for transaction

2016-01-14 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-12366:
-
Attachment: HIVE-12366.14.patch

patch 14 for test

> Refactor Heartbeater logic for transaction
> --
>
> Key: HIVE-12366
> URL: https://issues.apache.org/jira/browse/HIVE-12366
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-12366.1.patch, HIVE-12366.11.patch, 
> HIVE-12366.12.patch, HIVE-12366.13.patch, HIVE-12366.14.patch, 
> HIVE-12366.2.patch, HIVE-12366.3.patch, HIVE-12366.4.patch, 
> HIVE-12366.5.patch, HIVE-12366.6.patch, HIVE-12366.7.patch, 
> HIVE-12366.8.patch, HIVE-12366.9.patch
>
>
> Currently there is a gap between the time of lock acquisition and the first 
> heartbeat being sent out. Normally the gap is negligible, but when it is 
> large it will cause the query to fail, since the locks have timed out by the 
> time the heartbeat is sent.
> This gap needs to be removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12724) ACID: Major compaction fails to include the original bucket files into MR job

2016-01-14 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-12724:
-
Attachment: HIVE-12724.ADDENDUM.1.patch

> ACID: Major compaction fails to include the original bucket files into MR job
> -
>
> Key: HIVE-12724
> URL: https://issues.apache.org/jira/browse/HIVE-12724
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-12724.1.patch, HIVE-12724.2.patch, 
> HIVE-12724.3.patch, HIVE-12724.ADDENDUM.1.patch, HIVE-12724.branch-1.patch
>
>
> How the problem happens:
> * Create a non-ACID table
> * Before non-ACID to ACID table conversion, we inserted row one
> * After non-ACID to ACID table conversion, we inserted row two
> * Both rows can be retrieved before MAJOR compaction
> * After MAJOR compaction, row one is lost
> {code}
> hive> USE acidtest;
> OK
> Time taken: 0.77 seconds
> hive> CREATE TABLE t1 (nationkey INT, name STRING, regionkey INT, comment 
> STRING)
> > CLUSTERED BY (regionkey) INTO 2 BUCKETS
> > STORED AS ORC;
> OK
> Time taken: 0.179 seconds
> hive> DESC FORMATTED t1;
> OK
> # col_name            data_type   comment
> nationkey int
> name  string
> regionkey int
> comment   string
> # Detailed Table Information
> Database: acidtest
> Owner:wzheng
> CreateTime:   Mon Dec 14 15:50:40 PST 2015
> LastAccessTime:   UNKNOWN
> Retention:0
> Location: file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> Table Type:   MANAGED_TABLE
> Table Parameters:
>   transient_lastDdlTime   1450137040
> # Storage Information
> SerDe Library:org.apache.hadoop.hive.ql.io.orc.OrcSerde
> InputFormat:  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> Compressed:   No
> Num Buckets:  2
> Bucket Columns:   [regionkey]
> Sort Columns: []
> Storage Desc Params:
>   serialization.format    1
> Time taken: 0.198 seconds, Fetched: 28 row(s)
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db;
> Found 1 items
> drwxr-xr-x   - wzheng staff 68 2015-12-14 15:50 
> /Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
> hive> INSERT INTO TABLE t1 VALUES (1, 'USA', 1, 'united states');
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
> future versions. Consider using a different execution engine (i.e. tez, 
> spark) or using Hive 1.X releases.
> Query ID = wzheng_20151214155028_630098c6-605f-4e7e-a797-6b49fb48360d
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=
> Job running in-process (local Hadoop)
> 2015-12-14 15:51:58,070 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_local73977356_0001
> Loading data to table acidtest.t1
> MapReduce Jobs Launched:
> Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> OK
> Time taken: 2.825 seconds
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
> Found 2 items
> -rwxr-xr-x   1 wzheng staff    112 2015-12-14 15:51 
> /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/00_0
> -rwxr-xr-x   1 wzheng staff    472 2015-12-14 15:51 
> /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/01_0
> hive> SELECT * FROM t1;
> OK
> 1 USA 1   united states
> Time taken: 0.434 seconds, Fetched: 1 row(s)
> hive> ALTER TABLE t1 SET TBLPROPERTIES ('transactional' = 'true');
> OK
> Time taken: 0.071 seconds
> hive> DESC FORMATTED t1;
> OK
> # col_name            data_type   comment
> nationkey int
> name  string
> regionkey int
> comment   string
> # Detailed Table Information
> Database: acidtest
> Owner:wzheng
> CreateTime:   Mon Dec 14 15:50:40 PST 2015
> LastAccessTime:   UNKNOWN
> Retention:0
> Location: file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> Table Type:   MANAGED_TABLE
> Table Parameters:
>   COLUMN_STATS_ACCURATE   false
>   last_modified_by    wzheng
>   last_modified_time  1450137141
>   numFiles2
>   numRows -1
> 

[jira] [Commented] (HIVE-12832) RDBMS schema changes for HIVE-11388

2016-01-14 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098727#comment-15098727
 ] 

Alan Gates commented on HIVE-12832:
---

Committed uber.2 patch to master.  Will also commit to branch-2.0 and branch-1 
shortly.

> RDBMS schema changes for HIVE-11388
> ---
>
> Key: HIVE-12832
> URL: https://issues.apache.org/jira/browse/HIVE-12832
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Affects Versions: 1.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-12382.patch, HIVE-12832.3.patch, 
> HIVE-12832.uber.2.patch, HIVE-12832.uber.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12783) fix the unit test failures in TestSparkClient and TestSparkSessionManagerImpl

2016-01-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098687#comment-15098687
 ] 

Sergey Shelukhin commented on HIVE-12783:
-

[~owen.omalley] Should this be committed? As far as I can see, HiveQA has run 
on the latest patch and there's a +1.

> fix the unit test failures in TestSparkClient and TestSparkSessionManagerImpl
> -
>
> Key: HIVE-12783
> URL: https://issues.apache.org/jira/browse/HIVE-12783
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.0.0
>Reporter: Pengcheng Xiong
>Assignee: Owen O'Malley
>Priority: Blocker
> Attachments: HIVE-12783.patch, HIVE-12783.patch, HIVE-12783.patch
>
>
> This includes
> {code}
> org.apache.hive.spark.client.TestSparkClient.testSyncRpc
> org.apache.hive.spark.client.TestSparkClient.testJobSubmission
> org.apache.hive.spark.client.TestSparkClient.testMetricsCollection
> org.apache.hive.spark.client.TestSparkClient.testCounters
> org.apache.hive.spark.client.TestSparkClient.testRemoteClient
> org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles
> org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob
> org.apache.hive.spark.client.TestSparkClient.testErrorJob
> org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
> org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
> {code}
> all of them passed on my laptop. cc'ing [~szehon], [~xuefuz], could you 
> please take a look? Shall we ignore them? Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12724) ACID: Major compaction fails to include the original bucket files into MR job

2016-01-14 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098704#comment-15098704
 ] 

Wei Zheng commented on HIVE-12724:
--

[~ekoifman] Can you review the two patches below?
Addendum of patch 3 for master: HIVE-12724.ADDENDUM.1.patch
Complete patch for branch-1: HIVE-12724.branch-1.2.patch

> ACID: Major compaction fails to include the original bucket files into MR job
> -
>
> Key: HIVE-12724
> URL: https://issues.apache.org/jira/browse/HIVE-12724
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-12724.1.patch, HIVE-12724.2.patch, 
> HIVE-12724.3.patch, HIVE-12724.4.patch, HIVE-12724.ADDENDUM.1.patch, 
> HIVE-12724.branch-1.2.patch, HIVE-12724.branch-1.patch
>
>
> How the problem happens:
> * Create a non-ACID table
> * Before non-ACID to ACID table conversion, we inserted row one
> * After non-ACID to ACID table conversion, we inserted row two
> * Both rows can be retrieved before MAJOR compaction
> * After MAJOR compaction, row one is lost
> {code}
> hive> USE acidtest;
> OK
> Time taken: 0.77 seconds
> hive> CREATE TABLE t1 (nationkey INT, name STRING, regionkey INT, comment 
> STRING)
> > CLUSTERED BY (regionkey) INTO 2 BUCKETS
> > STORED AS ORC;
> OK
> Time taken: 0.179 seconds
> hive> DESC FORMATTED t1;
> OK
> # col_name            data_type   comment
> nationkey int
> name  string
> regionkey int
> comment   string
> # Detailed Table Information
> Database: acidtest
> Owner:wzheng
> CreateTime:   Mon Dec 14 15:50:40 PST 2015
> LastAccessTime:   UNKNOWN
> Retention:0
> Location: file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> Table Type:   MANAGED_TABLE
> Table Parameters:
>   transient_lastDdlTime   1450137040
> # Storage Information
> SerDe Library:org.apache.hadoop.hive.ql.io.orc.OrcSerde
> InputFormat:  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> Compressed:   No
> Num Buckets:  2
> Bucket Columns:   [regionkey]
> Sort Columns: []
> Storage Desc Params:
>   serialization.format    1
> Time taken: 0.198 seconds, Fetched: 28 row(s)
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db;
> Found 1 items
> drwxr-xr-x   - wzheng staff 68 2015-12-14 15:50 
> /Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
> hive> INSERT INTO TABLE t1 VALUES (1, 'USA', 1, 'united states');
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
> future versions. Consider using a different execution engine (i.e. tez, 
> spark) or using Hive 1.X releases.
> Query ID = wzheng_20151214155028_630098c6-605f-4e7e-a797-6b49fb48360d
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=
> Job running in-process (local Hadoop)
> 2015-12-14 15:51:58,070 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_local73977356_0001
> Loading data to table acidtest.t1
> MapReduce Jobs Launched:
> Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> OK
> Time taken: 2.825 seconds
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
> Found 2 items
> -rwxr-xr-x   1 wzheng staff    112 2015-12-14 15:51 
> /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/00_0
> -rwxr-xr-x   1 wzheng staff    472 2015-12-14 15:51 
> /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/01_0
> hive> SELECT * FROM t1;
> OK
> 1 USA 1   united states
> Time taken: 0.434 seconds, Fetched: 1 row(s)
> hive> ALTER TABLE t1 SET TBLPROPERTIES ('transactional' = 'true');
> OK
> Time taken: 0.071 seconds
> hive> DESC FORMATTED t1;
> OK
> # col_name            data_type   comment
> nationkey int
> name  string
> regionkey int
> comment   string
> # Detailed Table Information
> Database: acidtest
> Owner:wzheng
> CreateTime:   Mon Dec 14 15:50:40 PST 2015
> LastAccessTime:   UNKNOWN
> Retention:0
> Location: file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> Table Type:   

[jira] [Commented] (HIVE-12695) LLAP: use somebody else's cluster

2016-01-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098731#comment-15098731
 ] 

Sergey Shelukhin commented on HIVE-12695:
-

After some discussion we have decided against pursuing this model. It may 
interfere with a future, improved security model, and we don't want to have to 
support it going forward.


> LLAP: use somebody else's cluster
> -
>
> Key: HIVE-12695
> URL: https://issues.apache.org/jira/browse/HIVE-12695
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Takahiko Saito
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12695.patch
>
>
> For non-HS2 case cluster sharing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12352) CompactionTxnHandler.markCleaned() may delete too much

2016-01-14 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098733#comment-15098733
 ] 

Eugene Koifman commented on HIVE-12352:
---

[~sershe] Alan will commit HIVE-12832 later today and I can commit this right 
after. I would really like to get HIVE-12353 into 2.0 as well; I should have a 
patch tomorrow.

> CompactionTxnHandler.markCleaned() may delete too much
> --
>
> Key: HIVE-12352
> URL: https://issues.apache.org/jira/browse/HIVE-12352
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-12352.2.patch, HIVE-12352.patch
>
>
> Worker will start with the DB in state X (wrt this partition). While it is 
> working, more txns will happen against the partition it is compacting. 
> markCleaned() will then delete state up to X and beyond: there may be new 
> delta files created between compaction starting and cleaning, and these will 
> not be compacted until more transactions happen. So ideally this should only 
> delete up to the TXN_ID that was compacted (i.e. the HWM in the Worker?). 
> Then this can also run at READ_COMMITTED, which means we would want to store 
> the HWM in COMPACTION_QUEUE when the Worker picks up the job.
> Actually the problem is even worse (but also solved using the HWM as above): 
> suppose some transactions (against the same partition) have started and 
> aborted since the Worker ran the compaction job. That means there are 
> never-compacted delta files with data that belongs to these aborted txns. 
> The following will pick up these aborted txns.
> s = "select txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and 
> txn_state = '" +
>   TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and 
> tc_table = '" +
>   info.tableName + "'";
> if (info.partName != null) s += " and tc_partition = '" + 
> info.partName + "'";
> The logic after that will delete relevant data from TXN_COMPONENTS and if one 
> of these txns becomes empty, it will be picked up by cleanEmptyAbortedTxns(). 
>  At that point any metadata about an Aborted txn is gone and the system will 
> think it's committed.
> HWM in this case would be (in ValidCompactorTxnList)
> if(minOpenTxn > 0)
> min(highWaterMark, minOpenTxn) 
> else 
> highWaterMark
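The quoted HWM rule can be written out as a small pure function (a sketch of the rule only; `compactorHighWaterMark` is an illustrative name, not the actual ValidCompactorTxnList API):

```java
public class CompactorHwmSketch {
    // Effective high water mark for the compactor: if any txn is still open
    // (minOpenTxn > 0), cap the HWM at the smallest open txn id so data from
    // concurrent, not-yet-resolved txns is never treated as cleanable.
    static long compactorHighWaterMark(long highWaterMark, long minOpenTxn) {
        return (minOpenTxn > 0) ? Math.min(highWaterMark, minOpenTxn)
                                : highWaterMark;
    }

    public static void main(String[] args) {
        System.out.println(compactorHighWaterMark(100L, 42L)); // prints 42
        System.out.println(compactorHighWaterMark(100L, -1L)); // prints 100
    }
}
```

Storing this value in COMPACTION_QUEUE when the Worker picks up the job would let markCleaned() delete only state at or below the txn id that was actually compacted.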



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12523) display Hive query name in explain plan

2016-01-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12523:

Description: 
Query name is being added by HIVE-12357

NO PRECOMMIT TESTS

  was:
Query name is being added by HIVE-12357



> display Hive query name in explain plan
> ---
>
> Key: HIVE-12523
> URL: https://issues.apache.org/jira/browse/HIVE-12523
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12523.01.patch, HIVE-12523.patch
>
>
> Query name is being added by HIVE-12357
> NO PRECOMMIT TESTS



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12523) display Hive query name in explain plan

2016-01-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12523:

Description: 
Query name is being added by HIVE-12357


  was:
Query name is being added by HIVE-12357

NO PRECOMMIT TESTS


> display Hive query name in explain plan
> ---
>
> Key: HIVE-12523
> URL: https://issues.apache.org/jira/browse/HIVE-12523
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12523.01.patch, HIVE-12523.patch
>
>
> Query name is being added by HIVE-12357



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12352) CompactionTxnHandler.markCleaned() may delete too much

2016-01-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098741#comment-15098741
 ] 

Sergey Shelukhin commented on HIVE-12352:
-

How about a HiveQA run for this?

> CompactionTxnHandler.markCleaned() may delete too much
> --
>
> Key: HIVE-12352
> URL: https://issues.apache.org/jira/browse/HIVE-12352
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-12352.2.patch, HIVE-12352.patch
>
>
> Worker will start with the DB in state X (wrt this partition). While it is 
> working, more txns will happen against the partition it is compacting. 
> markCleaned() will then delete state up to X and beyond: there may be new 
> delta files created between compaction starting and cleaning, and these will 
> not be compacted until more transactions happen. So ideally this should only 
> delete up to the TXN_ID that was compacted (i.e. the HWM in the Worker?). 
> Then this can also run at READ_COMMITTED, which means we would want to store 
> the HWM in COMPACTION_QUEUE when the Worker picks up the job.
> Actually the problem is even worse (but also solved using the HWM as above): 
> suppose some transactions (against the same partition) have started and 
> aborted since the Worker ran the compaction job. That means there are 
> never-compacted delta files with data that belongs to these aborted txns. 
> The following will pick up these aborted txns.
> s = "select txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and 
> txn_state = '" +
>   TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and 
> tc_table = '" +
>   info.tableName + "'";
> if (info.partName != null) s += " and tc_partition = '" + 
> info.partName + "'";
> The logic after that will delete relevant data from TXN_COMPONENTS and if one 
> of these txns becomes empty, it will be picked up by cleanEmptyAbortedTxns(). 
>  At that point any metadata about an Aborted txn is gone and the system will 
> think it's committed.
> HWM in this case would be (in ValidCompactorTxnList)
> if(minOpenTxn > 0)
> min(highWaterMark, minOpenTxn) 
> else 
> highWaterMark



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12794) LLAP cannot run queries against HBase due to missing HBase jars

2016-01-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098701#comment-15098701
 ] 

Sergey Shelukhin commented on HIVE-12794:
-

Posted an RB. https://reviews.apache.org/r/42318/diff/1-2/ shows the changes 
that are not yet +1'd.

> LLAP cannot run queries against HBase due to missing HBase jars
> ---
>
> Key: HIVE-12794
> URL: https://issues.apache.org/jira/browse/HIVE-12794
> Project: Hive
>  Issue Type: Bug
>Reporter: Takahiko Saito
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12794.01.patch, HIVE-12794.02.patch, 
> HIVE-12794.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12794) LLAP cannot run queries against HBase due to missing HBase jars

2016-01-14 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098715#comment-15098715
 ] 

Gunther Hagleitner commented on HIVE-12794:
---

+1 to .02 (latest)

> LLAP cannot run queries against HBase due to missing HBase jars
> ---
>
> Key: HIVE-12794
> URL: https://issues.apache.org/jira/browse/HIVE-12794
> Project: Hive
>  Issue Type: Bug
>Reporter: Takahiko Saito
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12794.01.patch, HIVE-12794.02.patch, 
> HIVE-12794.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12523) display Hive query name in explain plan

2016-01-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-12523:

Target Version/s: 2.0.0
 Description: 
Query name is being added by HIVE-12357

NO PRECOMMIT TESTS

Let's see if we can get these in for 2.0

  was:
Query name is being added by HIVE-12357

NO PRECOMMIT TESTS


> display Hive query name in explain plan
> ---
>
> Key: HIVE-12523
> URL: https://issues.apache.org/jira/browse/HIVE-12523
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-12523.01.patch, HIVE-12523.patch
>
>
> Query name is being added by HIVE-12357
> NO PRECOMMIT TESTS
> Let's see if we can get these in for 2.0



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12352) CompactionTxnHandler.markCleaned() may delete too much

2016-01-14 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098747#comment-15098747
 ] 

Eugene Koifman commented on HIVE-12352:
---

I'll rebase and get it going.


> CompactionTxnHandler.markCleaned() may delete too much
> --
>
> Key: HIVE-12352
> URL: https://issues.apache.org/jira/browse/HIVE-12352
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-12352.2.patch, HIVE-12352.patch
>
>
> Worker will start with the DB in state X (wrt this partition). While it is 
> working, more txns will happen against the partition it is compacting. 
> markCleaned() will then delete state up to X and beyond: there may be new 
> delta files created between compaction starting and cleaning, and these will 
> not be compacted until more transactions happen. So ideally this should only 
> delete up to the TXN_ID that was compacted (i.e. the HWM in the Worker?). 
> Then this can also run at READ_COMMITTED, which means we would want to store 
> the HWM in COMPACTION_QUEUE when the Worker picks up the job.
> Actually the problem is even worse (but also solved using the HWM as above): 
> suppose some transactions (against the same partition) have started and 
> aborted since the Worker ran the compaction job. That means there are 
> never-compacted delta files with data that belongs to these aborted txns. 
> The following will pick up these aborted txns.
> s = "select txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and 
> txn_state = '" +
>   TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and 
> tc_table = '" +
>   info.tableName + "'";
> if (info.partName != null) s += " and tc_partition = '" + 
> info.partName + "'";
> The logic after that will delete relevant data from TXN_COMPONENTS and if one 
> of these txns becomes empty, it will be picked up by cleanEmptyAbortedTxns(). 
>  At that point any metadata about an Aborted txn is gone and the system will 
> think it's committed.
> HWM in this case would be (in ValidCompactorTxnList)
> if(minOpenTxn > 0)
> min(highWaterMark, minOpenTxn) 
> else 
> highWaterMark
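The HWM rule quoted at the end of the description can be sketched as follows (a minimal Python illustration of the quoted pseudocode, not the actual ValidCompactorTxnList code; the function name is invented for the example):

```python
def compaction_high_water_mark(high_water_mark: int, min_open_txn: int) -> int:
    # Mirrors the pseudocode quoted above: if any transaction is still
    # open (minOpenTxn > 0), the cleaner must not remove state at or
    # beyond the lowest open txn id.
    if min_open_txn > 0:
        return min(high_water_mark, min_open_txn)
    return high_water_mark

# Txns 1..100 seen, txn 42 still open: cleanup is capped at 42.
print(compaction_high_water_mark(100, 42))  # 42
# No open txns: the full high water mark applies.
print(compaction_high_water_mark(100, 0))   # 100
```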



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12828) Update Spark version to 1.6

2016-01-14 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101220#comment-15101220
 ] 

Xuefu Zhang commented on HIVE-12828:


I think we are okay. Please feel free to commit the patch. We will address the 
env issue tomorrow.

> Update Spark version to 1.6
> ---
>
> Key: HIVE-12828
> URL: https://issues.apache.org/jira/browse/HIVE-12828
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
> Attachments: HIVE-12828.1-spark.patch, HIVE-12828.2-spark.patch, 
> HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, 
> mem.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12828) Update Spark version to 1.6

2016-01-14 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101183#comment-15101183
 ] 

Xuefu Zhang commented on HIVE-12828:


+1

> Update Spark version to 1.6
> ---
>
> Key: HIVE-12828
> URL: https://issues.apache.org/jira/browse/HIVE-12828
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
> Attachments: HIVE-12828.1-spark.patch, HIVE-12828.2-spark.patch, 
> HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, 
> mem.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12661) StatsSetupConst.COLUMN_STATS_ACCURATE is not used correctly

2016-01-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101228#comment-15101228
 ] 

Ashutosh Chauhan commented on HIVE-12661:
-

+1 pending QA run

> StatsSetupConst.COLUMN_STATS_ACCURATE is not used correctly
> ---
>
> Key: HIVE-12661
> URL: https://issues.apache.org/jira/browse/HIVE-12661
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-12661.01.patch, HIVE-12661.02.patch, 
> HIVE-12661.03.patch, HIVE-12661.04.patch, HIVE-12661.05.patch, 
> HIVE-12661.06.patch, HIVE-12661.07.patch, HIVE-12661.08.patch, 
> HIVE-12661.09.patch, HIVE-12661.10.patch, HIVE-12661.11.patch
>
>
> PROBLEM:
> Hive stats are autogathered properly till an 'analyze table [tablename] 
> compute statistics for columns' is run. Then it does not auto-update the 
> stats till the command is run again. Repro:
> {code}
> set hive.stats.autogather=true; 
> set hive.stats.atomic=false ; 
> set hive.stats.collect.rawdatasize=true ; 
> set hive.stats.collect.scancols=false ; 
> set hive.stats.collect.tablekeys=false ; 
> set hive.stats.fetch.column.stats=true; 
> set hive.stats.fetch.partition.stats=true ; 
> set hive.stats.reliable=false ; 
> set hive.compute.query.using.stats=true; 
> CREATE TABLE `default`.`calendar` (`year` int) ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' TBLPROPERTIES ( 
> 'orc.compress'='NONE') ; 
> insert into calendar values (2010), (2011), (2012); 
> select * from calendar; 
> ++--+ 
> | calendar.year | 
> ++--+ 
> | 2010 | 
> | 2011 | 
> | 2012 | 
> ++--+ 
> select max(year) from calendar; 
> | 2012 | 
> insert into calendar values (2013); 
> select * from calendar; 
> ++--+ 
> | calendar.year | 
> ++--+ 
> | 2010 | 
> | 2011 | 
> | 2012 | 
> | 2013 | 
> ++--+ 
> select max(year) from calendar; 
> | 2013 | 
> insert into calendar values (2014); 
> select max(year) from calendar; 
> | 2014 |
> analyze table calendar compute statistics for columns;
> insert into calendar values (2015);
> select max(year) from calendar;
> | 2014 |
> insert into calendar values (2016), (2017), (2018);
> select max(year) from calendar;
> | 2014  |
> analyze table calendar compute statistics for columns;
> select max(year) from calendar;
> | 2018  |
> {code}
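The staleness in the repro above can be modeled in a few lines (a toy sketch of the assumed behaviour, not Hive internals: with hive.compute.query.using.stats=true, max(year) is answered from cached column stats, and after an explicit ANALYZE the automatic refresh stops):

```python
# State right after 'analyze table calendar compute statistics for columns':
rows = [2010, 2011, 2012, 2013, 2014]
column_stats = {"max": max(rows)}

rows.append(2015)                   # INSERT that no longer refreshes the stats
stale_answer = column_stats["max"]
print(stale_answer)                 # 2014 -- max() served from stale stats

column_stats["max"] = max(rows)     # re-running ANALYZE recomputes the cache
print(column_stats["max"])          # 2015
```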



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9862) Vectorized execution corrupts timestamp values

2016-01-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101203#comment-15101203
 ] 

Hive QA commented on HIVE-9862:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782388/HIVE-9862.06.patch

{color:green}SUCCESS:{color} +1 due to 14 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 9718 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union9
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hadoop.hive.ql.exec.vector.expressions.TestVectorTypeCasts.testCastTimestampToDouble
org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark.testSparkQuery
org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark.testTempTable
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6630/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6630/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6630/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12782388 - PreCommit-HIVE-TRUNK-Build

> Vectorized execution corrupts timestamp values
> --
>
> Key: HIVE-9862
> URL: https://issues.apache.org/jira/browse/HIVE-9862
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.0.0
>Reporter: Nathan Howell
>Assignee: Matt McCline
> Attachments: HIVE-9862.01.patch, HIVE-9862.02.patch, 
> HIVE-9862.03.patch, HIVE-9862.04.patch, HIVE-9862.05.patch, HIVE-9862.06.patch
>
>
> Timestamps in the future (year 2250?) and before ~1700 are silently corrupted 
> in vectorized execution mode. Simple repro:
> {code}
> hive> DROP TABLE IF EXISTS test;
> hive> CREATE TABLE test(ts TIMESTAMP) STORED AS ORC;
> hive> INSERT INTO TABLE test VALUES ('-12-31 23:59:59');
> hive> SET hive.vectorized.execution.enabled = false;
> hive> SELECT MAX(ts) FROM test;
> -12-31 23:59:59
> hive> SET hive.vectorized.execution.enabled = true;
> hive> SELECT MAX(ts) FROM test;
> 1816-03-30 05:56:07.066277376
> {code}
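One plausible explanation for the corruption window (an assumption consistent with the "year 2250?" observation above, not a statement about the patch) is that the vectorized path stores a timestamp as a single signed 64-bit count of nanoseconds since the epoch, which caps the representable range:

```python
from datetime import datetime, timezone

# A signed 64-bit nanosecond counter tops out in April 2262; the lower
# bound lands near 1677, matching "before ~1700" in the report. Values
# outside that window wrap around silently.
MAX_INT64 = 2**63 - 1
latest = datetime.fromtimestamp(MAX_INT64 / 1_000_000_000, tz=timezone.utc)
print(latest.year)  # 2262 -- later timestamps overflow the long
```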



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12828) Update Spark version to 1.6

2016-01-14 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101204#comment-15101204
 ] 

Rui Li commented on HIVE-12828:
---

[~xuefuz], do we need to make parquet_join pass here?

> Update Spark version to 1.6
> ---
>
> Key: HIVE-12828
> URL: https://issues.apache.org/jira/browse/HIVE-12828
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
> Attachments: HIVE-12828.1-spark.patch, HIVE-12828.2-spark.patch, 
> HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, 
> mem.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12826) Vectorization: VectorUDAF* suspect isNull checks

2016-01-14 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101201#comment-15101201
 ] 

Matt McCline commented on HIVE-12826:
-

+1 LGTM

> Vectorization: VectorUDAF* suspect isNull checks
> 
>
> Key: HIVE-12826
> URL: https://issues.apache.org/jira/browse/HIVE-12826
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.3.0, 2.0.0, 2.1.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-12826.1.patch
>
>
> for isRepeating=true, checking isNull[selected[i]] might return incorrect 
> results (without a heavy array fill of isNull).
> VectorUDAFSum/Min/Max/Avg and SumDecimal impls need to be reviewed for this 
> pattern.
> {code}
> private void iterateHasNullsRepeatingSelectionWithAggregationSelection(
>   VectorAggregationBufferRow[] aggregationBufferSets,
>   int aggregateIndex,
>value,
>   int batchSize,
>   int[] selection,
>   boolean[] isNull) {
>   
>   for (int i=0; i < batchSize; ++i) {
> if (!isNull[selection[i]]) {
>   Aggregation myagg = getCurrentAggregationBuffer(
> aggregationBufferSets, 
> aggregateIndex,
> i);
>   myagg.sumValue(value);
> }
>   }
> }
> {code}
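The suspect pattern can be illustrated outside of Hive (a Python sketch under the assumption that a repeating column vector only guarantees slot 0 of isNull is populated, so reads that go through the selection vector can hit stale slots):

```python
batch_size = 3
selection = [5, 7, 9]            # logical rows selected into the batch
is_null = [True] + [False] * 9   # only slot 0 is meaningful when repeating

# Buggy aggregation guard: consults uninitialized slots via selection.
buggy = [not is_null[selection[i]] for i in range(batch_size)]
# Guard that honors isRepeating: slot 0 describes every row.
fixed = [not is_null[0] for i in range(batch_size)]
print(buggy)  # [True, True, True] -- the repeated NULL is aggregated anyway
print(fixed)  # [False, False, False]
```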



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12863) fix test failure for TestMiniTezCliDriver.testCliDriver_tez_union

2016-01-14 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-12863:
---
Attachment: HIVE-12863.01.patch

> fix test failure for TestMiniTezCliDriver.testCliDriver_tez_union
> -
>
> Key: HIVE-12863
> URL: https://issues.apache.org/jira/browse/HIVE-12863
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-12863.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12863) fix test failure for TestMiniTezCliDriver.testCliDriver_tez_union

2016-01-14 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101243#comment-15101243
 ] 

Pengcheng Xiong commented on HIVE-12863:


[~alangates] and [~daijy], could you take a look at the patch? I think it is 
related to the HBaseStore. Thanks.

> fix test failure for TestMiniTezCliDriver.testCliDriver_tez_union
> -
>
> Key: HIVE-12863
> URL: https://issues.apache.org/jira/browse/HIVE-12863
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-12863.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12863) fix test failure for TestMiniTezCliDriver.testCliDriver_tez_union

2016-01-14 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-12863:
---
Attachment: HIVE-12863.01.patch

> fix test failure for TestMiniTezCliDriver.testCliDriver_tez_union
> -
>
> Key: HIVE-12863
> URL: https://issues.apache.org/jira/browse/HIVE-12863
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-12863.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12863) fix test failure for TestMiniTezCliDriver.testCliDriver_tez_union

2016-01-14 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-12863:
---
Attachment: (was: HIVE-12863.01.patch)

> fix test failure for TestMiniTezCliDriver.testCliDriver_tez_union
> -
>
> Key: HIVE-12863
> URL: https://issues.apache.org/jira/browse/HIVE-12863
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-12863.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-9862) Vectorized execution corrupts timestamp values

2016-01-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-9862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099038#comment-15099038
 ] 

Hive QA commented on HIVE-9862:
---



Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782279/HIVE-9862.05.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6626/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6626/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6626/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Tests exited with: ExecutionException: java.util.concurrent.ExecutionException: 
org.apache.hive.ptest.execution.ssh.SSHExecutionException: RSyncResult 
[localFile=/data/hive-ptest/logs/PreCommit-HIVE-TRUNK-Build-6626/succeeded/TestAcidUtils,
 remoteFile=/home/hiveptest/174.129.104.177-hiveptest-2/logs/, 
getExitCode()=12, getException()=null, getUser()=hiveptest, 
getHost()=174.129.104.177, getInstance()=2]: 'ssh: connect to host 
174.129.104.177 port 22: Connection timed out
rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
rsync error: unexplained error (code 255) at io.c(600) [receiver=3.0.6]
ssh: connect to host 174.129.104.177 port 22: Connection timed out
rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(600) 
[receiver=3.0.6]
ssh: connect to host 174.129.104.177 port 22: Connection timed out
rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(600) 
[receiver=3.0.6]
ssh: connect to host 174.129.104.177 port 22: Connection timed out
rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(600) 
[receiver=3.0.6]
ssh: connect to host 174.129.104.177 port 22: Connection timed out
rsync: connection unexpectedly closed (0 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(600) 
[receiver=3.0.6]
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12782279 - PreCommit-HIVE-TRUNK-Build

> Vectorized execution corrupts timestamp values
> --
>
> Key: HIVE-9862
> URL: https://issues.apache.org/jira/browse/HIVE-9862
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.0.0
>Reporter: Nathan Howell
>Assignee: Matt McCline
> Attachments: HIVE-9862.01.patch, HIVE-9862.02.patch, 
> HIVE-9862.03.patch, HIVE-9862.04.patch, HIVE-9862.05.patch
>
>
> Timestamps in the future (year 2250?) and before ~1700 are silently corrupted 
> in vectorized execution mode. Simple repro:
> {code}
> hive> DROP TABLE IF EXISTS test;
> hive> CREATE TABLE test(ts TIMESTAMP) STORED AS ORC;
> hive> INSERT INTO TABLE test VALUES ('-12-31 23:59:59');
> hive> SET hive.vectorized.execution.enabled = false;
> hive> SELECT MAX(ts) FROM test;
> -12-31 23:59:59
> hive> SET hive.vectorized.execution.enabled = true;
> hive> SELECT MAX(ts) FROM test;
> 1816-03-30 05:56:07.066277376
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12875) Verify sem.getInputs() and sem.getOutputs()

2016-01-14 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099144#comment-15099144
 ] 

Sushanth Sowmyan commented on HIVE-12875:
-

(Will upload patch shortly)

> Verify sem.getInputs() and sem.getOutputs()
> ---
>
> Key: HIVE-12875
> URL: https://issues.apache.org/jira/browse/HIVE-12875
> Project: Hive
>  Issue Type: Bug
>Reporter: Sushanth Sowmyan
>Assignee: Sushanth Sowmyan
>
> For every partition entity object present in sem.getInputs() and 
> sem.getOutputs(), we must ensure that the appropriate Table is also added to 
> the list of entities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12661) StatsSetupConst.COLUMN_STATS_ACCURATE is not used correctly

2016-01-14 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-12661:
---
Attachment: HIVE-12661.11.patch

address [~ashutoshc]'s comments.

> StatsSetupConst.COLUMN_STATS_ACCURATE is not used correctly
> ---
>
> Key: HIVE-12661
> URL: https://issues.apache.org/jira/browse/HIVE-12661
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-12661.01.patch, HIVE-12661.02.patch, 
> HIVE-12661.03.patch, HIVE-12661.04.patch, HIVE-12661.05.patch, 
> HIVE-12661.06.patch, HIVE-12661.07.patch, HIVE-12661.08.patch, 
> HIVE-12661.09.patch, HIVE-12661.10.patch, HIVE-12661.11.patch
>
>
> PROBLEM:
> Hive stats are autogathered properly till an 'analyze table [tablename] 
> compute statistics for columns' is run. Then it does not auto-update the 
> stats till the command is run again. Repro:
> {code}
> set hive.stats.autogather=true; 
> set hive.stats.atomic=false ; 
> set hive.stats.collect.rawdatasize=true ; 
> set hive.stats.collect.scancols=false ; 
> set hive.stats.collect.tablekeys=false ; 
> set hive.stats.fetch.column.stats=true; 
> set hive.stats.fetch.partition.stats=true ; 
> set hive.stats.reliable=false ; 
> set hive.compute.query.using.stats=true; 
> CREATE TABLE `default`.`calendar` (`year` int) ROW FORMAT SERDE 
> 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' TBLPROPERTIES ( 
> 'orc.compress'='NONE') ; 
> insert into calendar values (2010), (2011), (2012); 
> select * from calendar; 
> ++--+ 
> | calendar.year | 
> ++--+ 
> | 2010 | 
> | 2011 | 
> | 2012 | 
> ++--+ 
> select max(year) from calendar; 
> | 2012 | 
> insert into calendar values (2013); 
> select * from calendar; 
> ++--+ 
> | calendar.year | 
> ++--+ 
> | 2010 | 
> | 2011 | 
> | 2012 | 
> | 2013 | 
> ++--+ 
> select max(year) from calendar; 
> | 2013 | 
> insert into calendar values (2014); 
> select max(year) from calendar; 
> | 2014 |
> analyze table calendar compute statistics for columns;
> insert into calendar values (2015);
> select max(year) from calendar;
> | 2014 |
> insert into calendar values (2016), (2017), (2018);
> select max(year) from calendar;
> | 2014  |
> analyze table calendar compute statistics for columns;
> select max(year) from calendar;
> | 2018  |
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12832) RDBMS schema changes for HIVE-11388

2016-01-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated HIVE-12832:
--
Attachment: HIVE-12832.uber.2.branch-2.0.patch

Attaching the patch I used for branch-2.0

> RDBMS schema changes for HIVE-11388
> ---
>
> Key: HIVE-12832
> URL: https://issues.apache.org/jira/browse/HIVE-12832
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Affects Versions: 1.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-12382.patch, HIVE-12832.3.patch, 
> HIVE-12832.uber.2.branch-2.0.patch, HIVE-12832.uber.2.patch, 
> HIVE-12832.uber.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-10632) Make sure TXN_COMPONENTS gets cleaned up if table is dropped before compaction.

2016-01-14 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15099194#comment-15099194
 ] 

Eugene Koifman commented on HIVE-10632:
---

Also, note that insert into T values(...) generates an auxiliary 
values_tmp_table_N which ends up in ACID metastore tables.  This is a temp 
table so it gets cleaned up but it still causes "noise" in ACID subsystem.

> Make sure TXN_COMPONENTS gets cleaned up if table is dropped before 
> compaction.
> ---
>
> Key: HIVE-10632
> URL: https://issues.apache.org/jira/browse/HIVE-10632
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore, Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Critical
>
> The compaction process will clean up entries in  TXNS, 
> COMPLETED_TXN_COMPONENTS, TXN_COMPONENTS.  If the table/partition is dropped 
> before compaction is complete there will be data left in these tables.  Need 
> to investigate if there are other situations where this may happen and 
> address it.
> see HIVE-10595 for additional info



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12876) OpenCSV Serde treats everything as strings.

2016-01-14 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-12876:
---
Affects Version/s: 1.2.1

> OpenCSV Serde treats everything as strings.
> ---
>
> Key: HIVE-12876
> URL: https://issues.apache.org/jira/browse/HIVE-12876
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Serializers/Deserializers
>Affects Versions: 1.2.1
>Reporter: Carter Shanklin
>
> This one caught me by surprise after some wrong results. I'm filing this as 
> Brock suggested in HIVE-. Back to Ctrl-A for me it seems.
> To repro:
> {code}
> drop table int_table;
> create table int_table(
> x int
> )
> ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde';
> show create table int_table;
> {code}
> And note that x is a string.
> Applicable to Hive 1.2.1.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12877) Hive use index for queries will lose some data if the Query file is compressed.

2016-01-14 Thread yangfang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yangfang updated HIVE-12877:

Description: 
When a file is compressed, Hive builds its index using the extracted 
(uncompressed) file length. However, when MapReduce divides the data into 
splits, Hive compares the on-disk file length with the extracted length; if 
the two lengths do not match, it filters out the file, so the query loses 
some data.

  was:
Hive created the index using the extracted file length when the file is  the 
compressed,
but when to divide the data into pieces in MapReduce,Hive use the file length 
to compare with the extracted file length,if
If it found that these two lengths are not matched, It filters out the file.So 
the query will lose some data


> Hive use index for queries will lose some data if the Query file is 
> compressed.
> ---
>
> Key: HIVE-12877
> URL: https://issues.apache.org/jira/browse/HIVE-12877
> Project: Hive
>  Issue Type: Bug
>  Components: Indexing
>Affects Versions: 1.2.1
> Environment: This problem exists in all Hive versions, on any platform.
>Reporter: yangfang
> Attachments: HIVE-12877.patch
>
>
> When a file is compressed, Hive builds its index using the extracted 
> (uncompressed) file length. However, when MapReduce divides the data into 
> splits, Hive compares the on-disk file length with the extracted length; if 
> the two lengths do not match, it filters out the file, so the query loses 
> some data.
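The mismatch described above can be modeled minimally (assumed logic for illustration, not the actual split-planning code: the index records the uncompressed length, the planner compares it to the on-disk compressed length, and a mismatch silently drops the file from the query):

```python
def keep_file(on_disk_len: int, indexed_len: int) -> bool:
    # The file survives split planning only if the lengths agree.
    return on_disk_len == indexed_len

compressed_on_disk = 4_096      # length of the compressed file
uncompressed_in_index = 16_384  # extracted length recorded in the index
print(keep_file(compressed_on_disk, uncompressed_in_index))  # False -> rows silently lost
```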



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12839) Upgrade Hive to Calcite 1.6

2016-01-14 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-12839:
---
Attachment: HIVE-12839.02.patch

> Upgrade Hive to Calcite 1.6
> ---
>
> Key: HIVE-12839
> URL: https://issues.apache.org/jira/browse/HIVE-12839
> Project: Hive
>  Issue Type: Improvement
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-12839.01.patch, HIVE-12839.02.patch
>
>
> CLEAR LIBRARY CACHE
> Upgrade Hive to Calcite 1.6.0-incubating.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12877) Hive use index for queries will lose some data if the Query file is compressed.

2016-01-14 Thread yangfang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yangfang updated HIVE-12877:

Description: 
When a file is compressed, Hive builds its index using the extracted 
(uncompressed) file length. However, when MapReduce divides the data into 
splits, Hive compares the on-disk file length with the extracted length; if 
the two lengths do not match, it filters out the file, so the query loses 
some data.
I modified the source code so that the Hive index can be used when the files 
are compressed; please test it.

  was:
Hive created the index using the extracted file length when the file is  the 
compressed,
but when to divide the data into pieces in MapReduce,Hive use the file length 
to compare with the extracted file length,if
If it found that these two lengths are not matched, It filters out the file.So 
the query will lose some data.


> Hive use index for queries will lose some data if the Query file is 
> compressed.
> ---
>
> Key: HIVE-12877
> URL: https://issues.apache.org/jira/browse/HIVE-12877
> Project: Hive
>  Issue Type: Bug
>  Components: Indexing
>Affects Versions: 1.2.1
> Environment: This problem exists in all Hive versions, on any platform.
>Reporter: yangfang
> Attachments: HIVE-12877.patch
>
>
> When a file is compressed, Hive builds its index using the extracted 
> (uncompressed) file length. However, when MapReduce divides the data into 
> splits, Hive compares the on-disk file length with the extracted length; if 
> the two lengths do not match, it filters out the file, so the query loses 
> some data.
> I modified the source code so that the Hive index can be used when the 
> files are compressed; please test it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12828) Update Spark version to 1.6

2016-01-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098777#comment-15098777
 ] 

Hive QA commented on HIVE-12828:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782315/HIVE-12828.2-spark.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 45 failed/errored test(s), 9312 tests 
executed
*Failed tests:*
{noformat}
TestCliDriver-auto_sortmerge_join_7.q-exim_04_evolved_parts.q-query_with_semi.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-bool_literal.q-authorization_cli_createtab.q-explain_ddl.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-bucketsortoptimize_insert_7.q-list_bucket_query_multiskew_1.q-skewjoin_noskew.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-cp_mj_rc.q-decimal_2.q-union32.q-and-12-more - did not produce a 
TEST-*.xml file
TestCliDriver-groupby_map_ppr_multi_distinct.q-vectorization_16.q-union_remove_15.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-join39.q-exim_07_all_part_over_nonoverlap.q-cbo_windowing.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-join9.q-insert_values_partitioned.q-progress_1.q-and-12-more - 
did not produce a TEST-*.xml file
TestCliDriver-join_cond_pushdown_unqual4.q-udf_var_samp.q-load_dyn_part2.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-metadata_export_drop.q-udf_sin.q-udf_reverse.q-and-12-more - did 
not produce a TEST-*.xml file
TestCliDriver-orc_split_elimination.q-udf_xpath_string.q-partition_wise_fileformat.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-ptf_general_queries.q-unionDistinct_1.q-groupby1_noskew.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-push_or.q-infer_bucket_sort_list_bucket.q-vector_interval_2.q-and-12-more
 - did not produce a TEST-*.xml file
TestCliDriver-script_pipe.q-auto_join24.q-cast1.q-and-12-more - did not produce 
a TEST-*.xml file
TestCliDriver-show_conf.q-udaf_covar_samp.q-udf_md5.q-and-12-more - did not 
produce a TEST-*.xml file
TestCliDriver-skewjoin_mapjoin4.q-groupby6_map.q-cbo_rp_union.q-and-12-more - 
did not produce a TEST-*.xml file
TestCliDriver-smb_mapjoin_4.q-udf_asin.q-udf_to_unix_timestamp.q-and-12-more - 
did not produce a TEST-*.xml file
TestCliDriver-stats13.q-join_parse.q-sort_merge_join_desc_2.q-and-12-more - did not produce a TEST-*.xml file
TestCliDriver-timestamp_lazy.q-union29.q-groupby_ppd.q-and-12-more - did not produce a TEST-*.xml file
TestCliDriver-timestamp_literal.q-inputddl8.q-runtime_skewjoin_mapjoin_spark.q-and-12-more - did not produce a TEST-*.xml file
TestCliDriver-udf_current_user.q-join44.q-union2.q-and-12-more - did not produce a TEST-*.xml file
TestCliDriver-udf_nvl.q-alter_char1.q-serde_reported_schema.q-and-12-more - did not produce a TEST-*.xml file
TestCliDriver-union36.q-acid_join.q-part_inherit_tbl_props_empty.q-and-6-more - did not produce a TEST-*.xml file
TestCliDriver-varchar_nested_types.q-leadlag.q-semicolon.q-and-12-more - did not produce a TEST-*.xml file
TestCliDriver-vector_distinct_2.q-nullscript.q-vector_char_mapjoin1.q-and-12-more - did not produce a TEST-*.xml file
TestCliDriver-vectorization_limit.q-union19.q-groupby_grouping_sets6.q-and-12-more - did not produce a TEST-*.xml file
TestHWISessionManager - did not produce a TEST-*.xml file
TestMiniTezCliDriver-auto_join30.q-vector_decimal_10_0.q-schema_evol_orc_acidvec_mapwork_part.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_grouping_sets.q-mapjoin_mapjoin.q-update_all_partitioned.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_interval_2.q-constprog_dpp.q-dynamic_partition_pruning.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vector_non_string_partition.q-delete_where_non_partitioned.q-auto_sortmerge_join_16.q-and-12-more - did not produce a TEST-*.xml file
TestMiniTezCliDriver-vectorization_13.q-tez_bmj_schema_evolution.q-bucket3.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-auto_join30.q-join9.q-input17.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-avro_joins.q-join36.q-join4.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-bucketmapjoin3.q-enforce_order.q-union11.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-groupby6_map.q-join13.q-join_reorder3.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-groupby_grouping_id2.q-vectorization_13.q-auto_sortmerge_join_13.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-input1_limit.q-groupby8_map.q-varchar_join1.q-and-12-more - did not produce a TEST-*.xml file
TestSparkCliDriver-join_cond_pushdown_3.q-groupby7.q-auto_join17.q-and-12-more - did not produce a TEST-*.xml file

[jira] [Updated] (HIVE-12352) CompactionTxnHandler.markCleaned() may delete too much

2016-01-14 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-12352:
--
Attachment: HIVE-12352.3.patch

> CompactionTxnHandler.markCleaned() may delete too much
> --
>
> Key: HIVE-12352
> URL: https://issues.apache.org/jira/browse/HIVE-12352
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-12352.2.patch, HIVE-12352.3.patch, HIVE-12352.patch
>
>
> Worker will start with the DB in state X (w.r.t. this partition).
> While it is working, more txns will happen against the partition it is
> compacting.
> The cleaner will then delete state up to X and beyond.  New delta files may be
> created between compaction starting and cleaning; these will not be compacted
> until more transactions happen.  So ideally this should only delete up to the
> TXN_ID that was compacted (i.e. the HWM in Worker?).  Then this can also run at
> READ_COMMITTED.  This means we would want to store the HWM in COMPACTION_QUEUE
> when the Worker picks up the job.
> Actually the problem is even worse (but also solved using the HWM as above):
> suppose some transactions (against the same partition) have started and aborted
> since the time the Worker ran the compaction job.  That means there are
> never-compacted delta files with data that belongs to these aborted txns.
> The following will pick up these aborted txns:
> {code}
> s = "select txn_id from TXNS, TXN_COMPONENTS where txn_id = tc_txnid and txn_state = '" +
>     TXN_ABORTED + "' and tc_database = '" + info.dbname + "' and tc_table = '" +
>     info.tableName + "'";
> if (info.partName != null) s += " and tc_partition = '" + info.partName + "'";
> {code}
> The logic after that will delete the relevant data from TXN_COMPONENTS, and if
> one of these txns becomes empty, it will be picked up by cleanEmptyAbortedTxns().
> At that point any metadata about an aborted txn is gone and the system will
> think it is committed.
> The HWM in this case would be (in ValidCompactorTxnList):
> {code}
> if (minOpenTxn > 0)
>     min(highWaterMark, minOpenTxn)
> else
>     highWaterMark
> {code}
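The HWM rule quoted above can be expressed as a one-line helper; a minimal sketch under stated assumptions (the class and method names are hypothetical, not Hive's actual API):

```java
public class CompactorHwm {
    // Hypothetical helper mirroring the quoted rule: an open transaction
    // caps how far the cleaner may delete; otherwise the high water mark does.
    static long effectiveHwm(long highWaterMark, long minOpenTxn) {
        // minOpenTxn <= 0 is taken to mean "no transaction currently open"
        return (minOpenTxn > 0) ? Math.min(highWaterMark, minOpenTxn) : highWaterMark;
    }

    public static void main(String[] args) {
        System.out.println(effectiveHwm(100, 50));  // open txn 50 caps cleaning at 50
        System.out.println(effectiveHwm(100, -1));  // no open txns: clean up to 100
    }
}
```

Storing this value in COMPACTION_QUEUE when the Worker picks up the job, as the comment proposes, is what lets the Cleaner run at READ_COMMITTED without deleting deltas created after the compaction started.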



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12868) Fix empty operation-pool metrics

2016-01-14 Thread Szehon Ho (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098866#comment-15098866
 ] 

Szehon Ho commented on HIVE-12868:
--

[~jxiang] could you help me take a quick look at this patch?  Thanks

> Fix empty operation-pool metrics
> 
>
> Key: HIVE-12868
> URL: https://issues.apache.org/jira/browse/HIVE-12868
> Project: Hive
>  Issue Type: Sub-task
>  Components: Diagnosability
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-12868.patch
>
>
> The newly-added operation pool metrics (thread-pool size, queue size) are
> empty because the metrics system is initialized too late.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-12657) selectDistinctStar.q results differ with jdk 1.7 vs jdk 1.8

2016-01-14 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-12657:
---

Assignee: Sergey Shelukhin  (was: Pengcheng Xiong)

> selectDistinctStar.q results differ with jdk 1.7 vs jdk 1.8
> ---
>
> Key: HIVE-12657
> URL: https://issues.apache.org/jira/browse/HIVE-12657
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Sergey Shelukhin
>
> Encountered this issue when analysing the test failures of HIVE-12609.
> selectDistinctStar.q produces the following diff when run with java version
> "1.7.0_55" vs java version "1.8.0_60":
> {code}
> < 128   val_128 128 
> ---
> > 128   128 val_128
> 1770c1770
> < 224   val_224 224 
> ---
> > 224   224 val_224
> 1776c1776
> < 369   val_369 369 
> ---
> > 369   369 val_369
> 1799,1810c1799,1810
> < 146   val_146 146 val_146 146 val_146 2008-04-08  11
> < 150   val_150 150 val_150 150 val_150 2008-04-08  11
> < 213   val_213 213 val_213 213 val_213 2008-04-08  11
> < 238   val_238 238 val_238 238 val_238 2008-04-08  11
> < 255   val_255 255 val_255 255 val_255 2008-04-08  11
> < 273   val_273 273 val_273 273 val_273 2008-04-08  11
> < 278   val_278 278 val_278 278 val_278 2008-04-08  11
> < 311   val_311 311 val_311 311 val_311 2008-04-08  11
> < 401   val_401 401 val_401 401 val_401 2008-04-08  11
> < 406   val_406 406 val_406 406 val_406 2008-04-08  11
> < 66val_66  66  val_66  66  val_66  2008-04-08  11
> < 98val_98  98  val_98  98  val_98  2008-04-08  11
> ---
> > 146   val_146 2008-04-08  11  146 val_146 146 val_146
> > 150   val_150 2008-04-08  11  150 val_150 150 val_150
> > 213   val_213 2008-04-08  11  213 val_213 213 val_213
> > 238   val_238 2008-04-08  11  238 val_238 238 val_238
> > 255   val_255 2008-04-08  11  255 val_255 255 val_255
> > 273   val_273 2008-04-08  11  273 val_273 273 val_273
> > 278   val_278 2008-04-08  11  278 val_278 278 val_278
> > 311   val_311 2008-04-08  11  311 val_311 311 val_311
> > 401   val_401 2008-04-08  11  401 val_401 401 val_401
> > 406   val_406 2008-04-08  11  406 val_406 406 val_406
> > 66val_66  2008-04-08  11  66  val_66  66  val_66
> > 98val_98  2008-04-08  11  98  val_98  98  val_98
> 4212c4212
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12864) StackOverflowError parsing queries with very large predicates

2016-01-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098885#comment-15098885
 ] 

Hive QA commented on HIVE-12864:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782246/HIVE-12864.01.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10018 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6625/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6625/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6625/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 6 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12782246 - PreCommit-HIVE-TRUNK-Build

> StackOverflowError parsing queries with very large predicates
> -
>
> Key: HIVE-12864
> URL: https://issues.apache.org/jira/browse/HIVE-12864
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-12864.01.patch, HIVE-12864.patch
>
>
> We have seen that queries with very large predicates might fail with the 
> following stacktrace:
> {noformat}
> 016-01-12 05:47:36,516|beaver.machine|INFO|552|5072|Thread-22|Exception in 
> thread "main" java.lang.StackOverflowError
> 2016-01-12 05:47:36,517|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:145)
> 2016-01-12 05:47:36,517|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,517|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,517|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,517|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 

[jira] [Commented] (HIVE-12864) StackOverflowError parsing queries with very large predicates

2016-01-14 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098893#comment-15098893
 ] 

Jesus Camacho Rodriguez commented on HIVE-12864:


Clean QA. [~ashutoshc]/[~jpullokkaran], could you take a look? Thanks

> StackOverflowError parsing queries with very large predicates
> -
>
> Key: HIVE-12864
> URL: https://issues.apache.org/jira/browse/HIVE-12864
> Project: Hive
>  Issue Type: Bug
>  Components: Parser
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-12864.01.patch, HIVE-12864.patch
>
>
> We have seen that queries with very large predicates might fail with the 
> following stacktrace:
> {noformat}
> 016-01-12 05:47:36,516|beaver.machine|INFO|552|5072|Thread-22|Exception in 
> thread "main" java.lang.StackOverflowError
> 2016-01-12 05:47:36,517|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:145)
> 2016-01-12 05:47:36,517|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,517|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,517|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,517|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,519|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,520|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,520|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,520|beaver.machine|INFO|552|5072|Thread-22|at 
> org.antlr.runtime.tree.CommonTree.setUnknownTokenBoundaries(CommonTree.java:146)
> 2016-01-12 05:47:36,520|beaver.machine|INFO|552|5072|Thread-22|at 
> 
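The repeated frames above show one recursive call per AST node, so a very large predicate parsed into a deep tree exhausts the default JVM thread stack. A standalone sketch of that failure mode, with an illustrative node count and class names that are not Hive code:

```java
public class DeepTreeDemo {
    // A minimal stand-in for a left-deep AST: each node has one child.
    static final class Node { Node child; }

    // One stack frame per node, mirroring the recursive tree walk in the trace.
    static int depth(Node n) {
        return n == null ? 0 : 1 + depth(n.child);
    }

    public static void main(String[] args) {
        // Build a 1,000,000-node chain, standing in for a huge predicate tree.
        Node root = new Node(), cur = root;
        for (int i = 0; i < 1_000_000; i++) { cur.child = new Node(); cur = cur.child; }
        try {
            depth(root);
            System.out.println("completed");
        } catch (StackOverflowError e) {
            System.out.println("StackOverflowError");
        }
    }
}
```

Typical fixes for this class of bug are either an iterative (explicit-stack) traversal or flattening the operator tree before walking it; which one HIVE-12864's patch uses is not stated in this thread.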

[jira] [Commented] (HIVE-12868) Fix empty operation-pool metrics

2016-01-14 Thread Jimmy Xiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098928#comment-15098928
 ] 

Jimmy Xiang commented on HIVE-12868:


This patch is ok. +1

Are you going to have a follow-up patch to fix miniHs2/test?

> Fix empty operation-pool metrics
> 
>
> Key: HIVE-12868
> URL: https://issues.apache.org/jira/browse/HIVE-12868
> Project: Hive
>  Issue Type: Sub-task
>  Components: Diagnosability
>Reporter: Szehon Ho
>Assignee: Szehon Ho
> Attachments: HIVE-12868.patch
>
>
> The newly-added operation pool metrics (thread-pool size, queue size) are
> empty because the metrics system is initialized too late.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12724) ACID: Major compaction fails to include the original bucket files into MR job

2016-01-14 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-12724:
-
Attachment: HIVE-12724.branch-1.2.patch

> ACID: Major compaction fails to include the original bucket files into MR job
> -
>
> Key: HIVE-12724
> URL: https://issues.apache.org/jira/browse/HIVE-12724
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-12724.1.patch, HIVE-12724.2.patch, 
> HIVE-12724.3.patch, HIVE-12724.4.patch, HIVE-12724.ADDENDUM.1.patch, 
> HIVE-12724.branch-1.2.patch, HIVE-12724.branch-1.patch
>
>
> How the problem happens:
> * Create a non-ACID table
> * Before non-ACID to ACID table conversion, we inserted row one
> * After non-ACID to ACID table conversion, we inserted row two
> * Both rows can be retrieved before MAJOR compaction
> * After MAJOR compaction, row one is lost
> {code}
> hive> USE acidtest;
> OK
> Time taken: 0.77 seconds
> hive> CREATE TABLE t1 (nationkey INT, name STRING, regionkey INT, comment 
> STRING)
> > CLUSTERED BY (regionkey) INTO 2 BUCKETS
> > STORED AS ORC;
> OK
> Time taken: 0.179 seconds
> hive> DESC FORMATTED t1;
> OK
> # col_name  data_type   comment
> nationkey int
> name  string
> regionkey int
> comment   string
> # Detailed Table Information
> Database: acidtest
> Owner:wzheng
> CreateTime:   Mon Dec 14 15:50:40 PST 2015
> LastAccessTime:   UNKNOWN
> Retention:0
> Location: file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> Table Type:   MANAGED_TABLE
> Table Parameters:
>   transient_lastDdlTime   1450137040
> # Storage Information
> SerDe Library:org.apache.hadoop.hive.ql.io.orc.OrcSerde
> InputFormat:  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> Compressed:   No
> Num Buckets:  2
> Bucket Columns:   [regionkey]
> Sort Columns: []
> Storage Desc Params:
>   serialization.format1
> Time taken: 0.198 seconds, Fetched: 28 row(s)
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db;
> Found 1 items
> drwxr-xr-x   - wzheng staff 68 2015-12-14 15:50 
> /Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
> hive> INSERT INTO TABLE t1 VALUES (1, 'USA', 1, 'united states');
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
> future versions. Consider using a different execution engine (i.e. tez, 
> spark) or using Hive 1.X releases.
> Query ID = wzheng_20151214155028_630098c6-605f-4e7e-a797-6b49fb48360d
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=
> Job running in-process (local Hadoop)
> 2015-12-14 15:51:58,070 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_local73977356_0001
> Loading data to table acidtest.t1
> MapReduce Jobs Launched:
> Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> OK
> Time taken: 2.825 seconds
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
> Found 2 items
> -rwxr-xr-x   1 wzheng staff112 2015-12-14 15:51 
> /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/00_0
> -rwxr-xr-x   1 wzheng staff472 2015-12-14 15:51 
> /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/01_0
> hive> SELECT * FROM t1;
> OK
> 1 USA 1   united states
> Time taken: 0.434 seconds, Fetched: 1 row(s)
> hive> ALTER TABLE t1 SET TBLPROPERTIES ('transactional' = 'true');
> OK
> Time taken: 0.071 seconds
> hive> DESC FORMATTED t1;
> OK
> # col_name  data_type   comment
> nationkey int
> name  string
> regionkey int
> comment   string
> # Detailed Table Information
> Database: acidtest
> Owner:wzheng
> CreateTime:   Mon Dec 14 15:50:40 PST 2015
> LastAccessTime:   UNKNOWN
> Retention:0
> Location: file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> Table Type:   MANAGED_TABLE
> Table Parameters:
>   COLUMN_STATS_ACCURATE   false
>   last_modified_by  wzheng
>   last_modified_time  1450137141
>   numFiles   

[jira] [Updated] (HIVE-12724) ACID: Major compaction fails to include the original bucket files into MR job

2016-01-14 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-12724:
-
Attachment: HIVE-12724.4.patch

Patch 4 is exactly the same as ADDENDUM 1; it is attached just to trigger the QA run.

> ACID: Major compaction fails to include the original bucket files into MR job
> -
>
> Key: HIVE-12724
> URL: https://issues.apache.org/jira/browse/HIVE-12724
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-12724.1.patch, HIVE-12724.2.patch, 
> HIVE-12724.3.patch, HIVE-12724.4.patch, HIVE-12724.ADDENDUM.1.patch, 
> HIVE-12724.branch-1.patch
>
>
> How the problem happens:
> * Create a non-ACID table
> * Before non-ACID to ACID table conversion, we inserted row one
> * After non-ACID to ACID table conversion, we inserted row two
> * Both rows can be retrieved before MAJOR compaction
> * After MAJOR compaction, row one is lost
> {code}
> hive> USE acidtest;
> OK
> Time taken: 0.77 seconds
> hive> CREATE TABLE t1 (nationkey INT, name STRING, regionkey INT, comment 
> STRING)
> > CLUSTERED BY (regionkey) INTO 2 BUCKETS
> > STORED AS ORC;
> OK
> Time taken: 0.179 seconds
> hive> DESC FORMATTED t1;
> OK
> # col_name  data_type   comment
> nationkey int
> name  string
> regionkey int
> comment   string
> # Detailed Table Information
> Database: acidtest
> Owner:wzheng
> CreateTime:   Mon Dec 14 15:50:40 PST 2015
> LastAccessTime:   UNKNOWN
> Retention:0
> Location: file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> Table Type:   MANAGED_TABLE
> Table Parameters:
>   transient_lastDdlTime   1450137040
> # Storage Information
> SerDe Library:org.apache.hadoop.hive.ql.io.orc.OrcSerde
> InputFormat:  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> Compressed:   No
> Num Buckets:  2
> Bucket Columns:   [regionkey]
> Sort Columns: []
> Storage Desc Params:
>   serialization.format1
> Time taken: 0.198 seconds, Fetched: 28 row(s)
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db;
> Found 1 items
> drwxr-xr-x   - wzheng staff 68 2015-12-14 15:50 
> /Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
> hive> INSERT INTO TABLE t1 VALUES (1, 'USA', 1, 'united states');
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
> future versions. Consider using a different execution engine (i.e. tez, 
> spark) or using Hive 1.X releases.
> Query ID = wzheng_20151214155028_630098c6-605f-4e7e-a797-6b49fb48360d
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=
> Job running in-process (local Hadoop)
> 2015-12-14 15:51:58,070 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_local73977356_0001
> Loading data to table acidtest.t1
> MapReduce Jobs Launched:
> Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> OK
> Time taken: 2.825 seconds
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
> Found 2 items
> -rwxr-xr-x   1 wzheng staff112 2015-12-14 15:51 
> /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/00_0
> -rwxr-xr-x   1 wzheng staff472 2015-12-14 15:51 
> /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/01_0
> hive> SELECT * FROM t1;
> OK
> 1 USA 1   united states
> Time taken: 0.434 seconds, Fetched: 1 row(s)
> hive> ALTER TABLE t1 SET TBLPROPERTIES ('transactional' = 'true');
> OK
> Time taken: 0.071 seconds
> hive> DESC FORMATTED t1;
> OK
> # col_name  data_type   comment
> nationkey int
> name  string
> regionkey int
> comment   string
> # Detailed Table Information
> Database: acidtest
> Owner:wzheng
> CreateTime:   Mon Dec 14 15:50:40 PST 2015
> LastAccessTime:   UNKNOWN
> Retention:0
> Location: file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> Table Type:   MANAGED_TABLE
> Table Parameters:
>   COLUMN_STATS_ACCURATE   false
>   last_modified_by  wzheng
>   

[jira] [Commented] (HIVE-12777) Add capability to restore session

2016-01-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098714#comment-15098714
 ] 

Hive QA commented on HIVE-12777:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782213/HIVE-12777.13.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10009 tests 
executed
*Failed tests:*
{noformat}
TestEmbeddedThriftBinaryCLIService - did not produce a TEST-*.xml file
TestHWISessionManager - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestSSL.testSSLVersion
org.apache.hive.jdbc.authorization.TestJdbcMetadataApiAuth.org.apache.hive.jdbc.authorization.TestJdbcMetadataApiAuth
org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthUDFBlacklist.testBlackListedUdfUsage
org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization.testAllowedCommands
org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization.testAuthorization1
org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization.testBlackListedUdfUsage
org.apache.hive.jdbc.authorization.TestJdbcWithSQLAuthorization.testConfigWhiteList
org.apache.hive.minikdc.TestJdbcWithMiniKdcSQLAuthBinary.testAuthorization1
org.apache.hive.minikdc.TestJdbcWithMiniKdcSQLAuthHttp.testAuthorization1
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6624/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6624/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6624/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12782213 - PreCommit-HIVE-TRUNK-Build

> Add capability to restore session
> -
>
> Key: HIVE-12777
> URL: https://issues.apache.org/jira/browse/HIVE-12777
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajat Khandelwal
>Assignee: Rajat Khandelwal
> Attachments: HIVE-12777.04.patch, HIVE-12777.08.patch, 
> HIVE-12777.09.patch, HIVE-12777.11.patch, HIVE-12777.12.patch, 
> HIVE-12777.13.patch
>
>
> Extensions using Hive session handles should be able to restore the Hive
> session from the handle.
> Apache Lens depends on a fork of Hive, and that fork has such a capability.
> Relevant commit: 
> https://github.com/InMobi/hive/commit/931fe9116161a18952c082c14223ad6745fefe00#diff-0acb35f7cab7492f522b0c40ce3ce1be



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12353) When Compactor fails it calls CompactionTxnHandler.markedCleaned(). it should not.

2016-01-14 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-12353:
--
Target Version/s: 1.3.0, 2.0.0  (was: 1.3.0)

> When Compactor fails it calls CompactionTxnHandler.markedCleaned().  it 
> should not.
> ---
>
> Key: HIVE-12353
> URL: https://issues.apache.org/jira/browse/HIVE-12353
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Blocker
> Attachments: HIVE-12353.2.patch, HIVE-12353.3.patch, HIVE-12353.patch
>
>
> One of the things this method does is delete entries from TXN_COMPONENTS for
> the partition it was trying to compact.
> This causes aborted transactions in TXNS to become empty according to
> CompactionTxnHandler.cleanEmptyAbortedTxns(), which means they can now be
> deleted.
> Once they are deleted, data that belongs to these txns is deemed committed...
> We should extend the COMPACTION_QUEUE state with 'f' and 's' (failed, success)
> states, and we should not delete the entry from markedCleaned().
> We would have a separate process that cleans 'f' and 's' records after X
> minutes (or after more than N records exist for a given partition).
> This allows SHOW COMPACTIONS to show some history, including how many times
> compaction failed on a given partition (subject to the retention interval), so
> that we do not have to call markCleaned() on Compactor failures, while at the
> same time preventing the Compactor from constantly getting stuck on the same
> bad partition/table.
> Ideally we would also want to include an END_TIME field.
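The proposal above amounts to adding terminal states plus a retention-based purge. A minimal sketch of that state machine; the names are hypothetical and not the eventual Hive implementation:

```java
public class CompactionHistory {
    // Hypothetical queue states: the active ones plus the proposed terminal
    // 'f' (FAILED) and 's' (SUCCEEDED) records kept around for history.
    enum State { INITIATED, WORKING, READY_FOR_CLEANING, FAILED, SUCCEEDED }

    static boolean isTerminal(State s) {
        return s == State.FAILED || s == State.SUCCEEDED;
    }

    // A separate cleaner purges only terminal records older than the
    // retention window, so SHOW COMPACTIONS can still report recent history.
    static boolean shouldPurge(State s, long ageMinutes, long retentionMinutes) {
        return isTerminal(s) && ageMinutes > retentionMinutes;
    }

    public static void main(String[] args) {
        System.out.println(shouldPurge(State.FAILED, 120, 60));   // old failure: purge
        System.out.println(shouldPurge(State.WORKING, 120, 60));  // still active: keep
    }
}
```

Keeping failed records visible (instead of calling markCleaned() on failure) is what prevents the Compactor from repeatedly re-selecting the same bad partition.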



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12828) Update Spark version to 1.6

2016-01-14 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang updated HIVE-12828:
---
Attachment: HIVE-12828.2-spark.patch

> Update Spark version to 1.6
> ---
>
> Key: HIVE-12828
> URL: https://issues.apache.org/jira/browse/HIVE-12828
> Project: Hive
>  Issue Type: Task
>  Components: Spark
>Reporter: Xuefu Zhang
>Assignee: Rui Li
> Attachments: HIVE-12828.1-spark.patch, HIVE-12828.2-spark.patch, 
> HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, HIVE-12828.2-spark.patch, 
> mem.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12783) fix the unit test failures in TestSparkClient and TestSparkSessionManagerImpl

2016-01-14 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098690#comment-15098690
 ] 

Sergey Shelukhin commented on HIVE-12783:
-

{quote}
The real fix is for spark to either remove their dependence on the eclipse 
version or to shroud it so that it doesn't leak through to all of their users.
{quote}
Should there be a Spark JIRA for this?

> fix the unit test failures in TestSparkClient and TestSparkSessionManagerImpl
> -
>
> Key: HIVE-12783
> URL: https://issues.apache.org/jira/browse/HIVE-12783
> Project: Hive
>  Issue Type: Sub-task
>  Components: Test
>Affects Versions: 2.0.0
>Reporter: Pengcheng Xiong
>Assignee: Owen O'Malley
>Priority: Blocker
> Attachments: HIVE-12783.patch, HIVE-12783.patch, HIVE-12783.patch
>
>
> This includes
> {code}
> org.apache.hive.spark.client.TestSparkClient.testSyncRpc
> org.apache.hive.spark.client.TestSparkClient.testJobSubmission
> org.apache.hive.spark.client.TestSparkClient.testMetricsCollection
> org.apache.hive.spark.client.TestSparkClient.testCounters
> org.apache.hive.spark.client.TestSparkClient.testRemoteClient
> org.apache.hive.spark.client.TestSparkClient.testAddJarsAndFiles
> org.apache.hive.spark.client.TestSparkClient.testSimpleSparkJob
> org.apache.hive.spark.client.TestSparkClient.testErrorJob
> org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
> org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
> {code}
> All of them passed on my laptop. cc'ing [~szehon], [~xuefuz], could you 
> please take a look? Shall we ignore them? Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-9862) Vectorized execution corrupts timestamp values

2016-01-14 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-9862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-9862:
---
Attachment: HIVE-9862.05.patch

> Vectorized execution corrupts timestamp values
> --
>
> Key: HIVE-9862
> URL: https://issues.apache.org/jira/browse/HIVE-9862
> Project: Hive
>  Issue Type: Bug
>  Components: Vectorization
>Affects Versions: 1.0.0
>Reporter: Nathan Howell
>Assignee: Matt McCline
> Attachments: HIVE-9862.01.patch, HIVE-9862.02.patch, 
> HIVE-9862.03.patch, HIVE-9862.04.patch, HIVE-9862.05.patch
>
>
> Timestamps in the future (year 2250?) and before ~1700 are silently corrupted 
> in vectorized execution mode. Simple repro:
> {code}
> hive> DROP TABLE IF EXISTS test;
> hive> CREATE TABLE test(ts TIMESTAMP) STORED AS ORC;
> hive> INSERT INTO TABLE test VALUES ('9999-12-31 23:59:59');
> hive> SET hive.vectorized.execution.enabled = false;
> hive> SELECT MAX(ts) FROM test;
> 9999-12-31 23:59:59
> hive> SET hive.vectorized.execution.enabled = true;
> hive> SELECT MAX(ts) FROM test;
> 1816-03-30 05:56:07.066277376
> {code}
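The corrupted value is consistent with the timestamp being held as a signed 64-bit count of nanoseconds since the epoch, which overflows for dates this far out. That interpretation is inferred from the numbers, not from the patch, but the wraparound can be reproduced exactly:

```python
from datetime import datetime, timedelta, timezone

# 9999-12-31 23:59:59 UTC as nanoseconds since the Unix epoch
ts = datetime(9999, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
nanos = int(ts.timestamp()) * 10**9           # ~2.53e20, far beyond int64 range

# Wrap into a signed 64-bit value, as a Java `long` would
wrapped = (nanos + 2**63) % 2**64 - 2**63
assert wrapped < 0                            # overflowed to a pre-1970 instant

corrupted = datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(
    seconds=wrapped / 10**9)
print(corrupted.date())  # 1816-03-30, matching the corrupted value in the repro
```

The wrapped value, -4852116232933722624 ns, lands on 1816-03-30 05:56:07.066277376, exactly the output shown above, which strongly suggests a 64-bit nanosecond representation somewhere in the vectorized path.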



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12832) RDBMS schema changes for HIVE-11388

2016-01-14 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098197#comment-15098197
 ] 

Hive QA commented on HIVE-12832:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12782181/HIVE-12832.3.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10003 tests 
executed
*Failed tests:*
{noformat}
TestHWISessionManager - did not produce a TEST-*.xml file
TestSparkCliDriver-timestamp_lazy.q-bucketsortoptimize_insert_4.q-date_udf.q-and-12-more
 - did not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_tez_union
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_char_simple
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_authorization_uri_import
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testMultiSessionMultipleUse
org.apache.hadoop.hive.ql.exec.spark.session.TestSparkSessionManagerImpl.testSingleSessionMultipleUse
org.apache.hive.jdbc.TestSSL.testSSLVersion
{noformat}

Test results: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6622/testReport
Console output: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/6622/console
Test logs: 
http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-6622/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12782181 - PreCommit-HIVE-TRUNK-Build

> RDBMS schema changes for HIVE-11388
> ---
>
> Key: HIVE-12832
> URL: https://issues.apache.org/jira/browse/HIVE-12832
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Affects Versions: 1.0.0
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-12382.patch, HIVE-12832.3.patch, 
> HIVE-12832.uber.2.patch, HIVE-12832.uber.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12270) Add DBTokenStore support to HS2 delegation token

2016-01-14 Thread Robert Kanter (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098946#comment-15098946
 ] 

Robert Kanter commented on HIVE-12270:
--

Oozie needs this to work 100% of the time with secure HS2 HA.  Otherwise, the 
Oozie server can get a delegation token from one HS2 server, but the actual 
query might run against another HS2 server, which won't recognize the HS2 
delegation token.
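The HA failure mode above can be modeled in a few lines: with a per-server (in-memory) token store, a delegation token issued by one HS2 instance is unknown to the other, while a shared database-backed store is visible to both. All class and method names below are illustrative, not Hive's actual API:

```python
import uuid

class HS2Server:
    """Toy HS2 instance; `store` is its delegation-token backing store."""
    def __init__(self, store):
        self.store = store
    def issue_token(self, owner):
        token = str(uuid.uuid4())
        self.store[token] = owner
        return token
    def verify(self, token):
        return token in self.store

# MemoryTokenStore-style: each server has a private store, so a token from
# server A is rejected by server B -- the Oozie failure mode described above.
a, b = HS2Server({}), HS2Server({})
tok = a.issue_token("oozie")
mem_ok = (a.verify(tok), b.verify(tok))      # (True, False): query fails on B

# DBTokenStore-style: both servers share one DB-backed store, so either
# server can validate the token regardless of which one issued it.
shared = {}
a, b = HS2Server(shared), HS2Server(shared)
tok = a.issue_token("oozie")
db_ok = (a.verify(tok), b.verify(tok))       # (True, True)
print(mem_ok, db_ok)
```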

> Add DBTokenStore support to HS2 delegation token
> 
>
> Key: HIVE-12270
> URL: https://issues.apache.org/jira/browse/HIVE-12270
> Project: Hive
>  Issue Type: New Feature
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
>
> DBTokenStore was initially introduced by HIVE-3255 in Hive 0.12, mainly for 
> the HMS delegation token. Later, in Hive 0.13, HS2 delegation token support 
> was introduced by HIVE-5155, but it used MemoryTokenStore as the token 
> store. The approach in HIVE-9622, which uses the shared RawStore (or 
> HMSHandler) to access the token/key information in the HMS DB directly from 
> HS2, does not seem like the right way to support DBTokenStore in HS2. I 
> think we should use HiveMetaStoreClient in HS2 instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-12724) ACID: Major compaction fails to include the original bucket files into MR job

2016-01-14 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098967#comment-15098967
 ] 

Eugene Koifman commented on HIVE-12724:
---

+1 pending tests

> ACID: Major compaction fails to include the original bucket files into MR job
> -
>
> Key: HIVE-12724
> URL: https://issues.apache.org/jira/browse/HIVE-12724
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-12724.1.patch, HIVE-12724.2.patch, 
> HIVE-12724.3.patch, HIVE-12724.4.patch, HIVE-12724.ADDENDUM.1.patch, 
> HIVE-12724.branch-1.2.patch, HIVE-12724.branch-1.patch
>
>
> How the problem happens:
> * Create a non-ACID table
> * Before non-ACID to ACID table conversion, we inserted row one
> * After non-ACID to ACID table conversion, we inserted row two
> * Both rows can be retrieved before MAJOR compaction
> * After MAJOR compaction, row one is lost
> {code}
> hive> USE acidtest;
> OK
> Time taken: 0.77 seconds
> hive> CREATE TABLE t1 (nationkey INT, name STRING, regionkey INT, comment 
> STRING)
> > CLUSTERED BY (regionkey) INTO 2 BUCKETS
> > STORED AS ORC;
> OK
> Time taken: 0.179 seconds
> hive> DESC FORMATTED t1;
> OK
> # col_name            data_type           comment
> nationkey             int
> name                  string
> regionkey             int
> comment               string
> # Detailed Table Information
> Database: acidtest
> Owner:wzheng
> CreateTime:   Mon Dec 14 15:50:40 PST 2015
> LastAccessTime:   UNKNOWN
> Retention:0
> Location: file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> Table Type:   MANAGED_TABLE
> Table Parameters:
>   transient_lastDdlTime   1450137040
> # Storage Information
> SerDe Library:org.apache.hadoop.hive.ql.io.orc.OrcSerde
> InputFormat:  org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
> OutputFormat: org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
> Compressed:   No
> Num Buckets:  2
> Bucket Columns:   [regionkey]
> Sort Columns: []
> Storage Desc Params:
>   serialization.format1
> Time taken: 0.198 seconds, Fetched: 28 row(s)
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db;
> Found 1 items
> drwxr-xr-x   - wzheng staff 68 2015-12-14 15:50 
> /Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
> hive> INSERT INTO TABLE t1 VALUES (1, 'USA', 1, 'united states');
> WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the 
> future versions. Consider using a different execution engine (i.e. tez, 
> spark) or using Hive 1.X releases.
> Query ID = wzheng_20151214155028_630098c6-605f-4e7e-a797-6b49fb48360d
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 2
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Job running in-process (local Hadoop)
> 2015-12-14 15:51:58,070 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_local73977356_0001
> Loading data to table acidtest.t1
> MapReduce Jobs Launched:
> Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
> Total MapReduce CPU Time Spent: 0 msec
> OK
> Time taken: 2.825 seconds
> hive> dfs -ls /Users/wzheng/hivetmp/warehouse/acidtest.db/t1;
> Found 2 items
> -rwxr-xr-x   1 wzheng staff        112 2015-12-14 15:51 
> /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/000000_0
> -rwxr-xr-x   1 wzheng staff        472 2015-12-14 15:51 
> /Users/wzheng/hivetmp/warehouse/acidtest.db/t1/000001_0
> hive> SELECT * FROM t1;
> OK
> 1 USA 1   united states
> Time taken: 0.434 seconds, Fetched: 1 row(s)
> hive> ALTER TABLE t1 SET TBLPROPERTIES ('transactional' = 'true');
> OK
> Time taken: 0.071 seconds
> hive> DESC FORMATTED t1;
> OK
> # col_name            data_type           comment
> nationkey             int
> name                  string
> regionkey             int
> comment               string
> # Detailed Table Information
> Database: acidtest
> Owner:wzheng
> CreateTime:   Mon Dec 14 15:50:40 PST 2015
> LastAccessTime:   UNKNOWN
> Retention:0
> Location: file:/Users/wzheng/hivetmp/warehouse/acidtest.db/t1
> Table Type:   MANAGED_TABLE
> Table Parameters:
>   COLUMN_STATS_ACCURATE   false
>   last_modified_by        wzheng
>   last_modified_time  

[jira] [Updated] (HIVE-12724) ACID: Major compaction fails to include the original bucket files into MR job

2016-01-14 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-12724:
--
Priority: Blocker  (was: Major)

> ACID: Major compaction fails to include the original bucket files into MR job
> -
>
> Key: HIVE-12724
> URL: https://issues.apache.org/jira/browse/HIVE-12724
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>Priority: Blocker
> Attachments: HIVE-12724.1.patch, HIVE-12724.2.patch, 
> HIVE-12724.3.patch, HIVE-12724.4.patch, HIVE-12724.ADDENDUM.1.patch, 
> HIVE-12724.branch-1.2.patch, HIVE-12724.branch-1.patch
>
>

[jira] [Updated] (HIVE-12724) ACID: Major compaction fails to include the original bucket files into MR job

2016-01-14 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-12724:
--
Affects Version/s: (was: 2.1.0)
   1.3.0

> ACID: Major compaction fails to include the original bucket files into MR job
> -
>
> Key: HIVE-12724
> URL: https://issues.apache.org/jira/browse/HIVE-12724
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>Priority: Blocker
> Attachments: HIVE-12724.1.patch, HIVE-12724.2.patch, 
> HIVE-12724.3.patch, HIVE-12724.4.patch, HIVE-12724.ADDENDUM.1.patch, 
> HIVE-12724.branch-1.2.patch, HIVE-12724.branch-1.patch
>
>

[jira] [Commented] (HIVE-12724) ACID: Major compaction fails to include the original bucket files into MR job

2016-01-14 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15098971#comment-15098971
 ] 

Eugene Koifman commented on HIVE-12724:
---

Made this a Blocker since it involves data loss.

> ACID: Major compaction fails to include the original bucket files into MR job
> -
>
> Key: HIVE-12724
> URL: https://issues.apache.org/jira/browse/HIVE-12724
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Transactions
>Affects Versions: 1.3.0, 2.0.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
>Priority: Blocker
> Attachments: HIVE-12724.1.patch, HIVE-12724.2.patch, 
> HIVE-12724.3.patch, HIVE-12724.4.patch, HIVE-12724.ADDENDUM.1.patch, 
> HIVE-12724.branch-1.2.patch, HIVE-12724.branch-1.patch
>
>