[jira] [Commented] (HIVE-17275) Auto-merge fails on writes of UNION ALL output to ORC file with dynamic partitioning

2017-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119514#comment-16119514
 ] 

Hive QA commented on HIVE-17275:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12880946/HIVE-17275.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 11000 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6316/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6316/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6316/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12880946 - PreCommit-HIVE-Build

> Auto-merge fails on writes of UNION ALL output to ORC file with dynamic 
> partitioning
> 
>
> Key: HIVE-17275
> URL: https://issues.apache.org/jira/browse/HIVE-17275
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: Chris Drome
>Assignee: Chris Drome
> Attachments: HIVE-17275-branch-2.2.patch, HIVE-17275-branch-2.patch, 
> HIVE-17275.patch
>
>
> If dynamic partitioning is used to write the output of UNION or UNION ALL 
> queries into ORC files with hive.merge.tezfiles=true, the merge step fails as 
> follows:
> {noformat}
> 2017-08-08T11:27:19,958 ERROR [e7b1f06d-d632-408a-9dff-f7ae042cd25a main] 
> SessionState: Vertex failed, vertexName=File Merge, 
> vertexId=vertex_1502216690354_0001_33_00, diagnostics=[Task failed, 
> taskId=task_1502216690354_0001_33_00_00, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Error while running task ( failure ) : 
> attempt_1502216690354_0001_33_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: Multiple partitions for one merge mapper: 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/1
>  NOT EQUAL TO 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/2
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> 
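A minimal query shape that exercises this path (a hypothetical repro distilled from the staging paths in the trace above; table and column names are illustrative):
{code}
set hive.merge.tezfiles=true;
set hive.exec.dynamic.partition.mode=nonstrict;

create table partunion1 (id1 int) partitioned by (part1 string) stored as orc;

-- each UNION ALL branch writes its own subdirectory (.../part1=2014/1 and
-- .../part1=2014/2), and the file-merge mapper then sees both of them
insert into table partunion1 partition(part1)
select id1, part1 from (
  select 1 as id1, '2014' as part1
  union all
  select 2 as id1, '2014' as part1
) t;
{code}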

[jira] [Updated] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-08-09 Thread Vlad Gudikov (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vlad Gudikov updated HIVE-17148:

Attachment: HIVE-17148.2.patch

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.1.patch, HIVE-17148.2.patch, HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL   1
> {code}
> The issue seems to be because of the incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it filters out all the rows if 
> any column mentioned in the COALESCE has a null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> Filter Operator [FIL_13] (rows=1 width=4)
>   predicate:(a1 is not null and b1 is not null)
>   TableScan [TS_0] (rows=1 width=4)
> default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if the join is an inner join; otherwise HiveJoinAddNotNullRule, 
> which creates this problem, is skipped.
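For reference, the only null-filter that can be soundly inferred from {{COALESCE(a1,b1)=a2}} is a disjunction, not the conjunction shown in the plan (sketched in the plan's own notation):
{code}
-- sound: keeps the row (1, NULL) because COALESCE('1', NULL) is non-null
predicate:(a1 is not null or b1 is not null)
-- unsound: the plan's (a1 is not null and b1 is not null) drops any row
-- where exactly one of a1, b1 is NULL
{code}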



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-08-09 Thread Vlad Gudikov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119531#comment-16119531
 ] 

Vlad Gudikov commented on HIVE-17148:
-

Uploaded new patch with fixed tests

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.1.patch, HIVE-17148.2.patch, HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create table ct1 (a1 string,b1 string);
> Step 2: Create a table ct2
> create table ct2 (a2 string);
> Step 3 : Insert following data into table ct1
> insert into table ct1 (a1) values ('1');
> Step 4 : Insert following data into table ct2
> insert into table ct2 (a2) values ('1');
> Step 5 : Execute the following query 
> select * from ct1 c1, ct2 c2 where COALESCE(a1,b1)=a2;
> {code}
> ACTUAL RESULT:
> {code}
> The query returns nothing;
> {code}
> EXPECTED RESULT:
> {code}
> 1   NULL   1
> {code}
> The issue seems to be because of the incorrect query plan. In the plan we can 
> see:
> predicate:(a1 is not null and b1 is not null)
> which does not look correct. As a result, it filters out all the rows if 
> any column mentioned in the COALESCE has a null value.
> Please find the query plan below:
> {code}
> Plan optimized by CBO.
> Vertex dependency in root stage
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Stage-0
>   Fetch Operator
> limit:-1
> Stage-1
>   Map 1
>   File Output Operator [FS_10]
> Map Join Operator [MAPJOIN_15] (rows=1 width=4)
>   
> Conds:SEL_2.COALESCE(_col0,_col1)=RS_7._col0(Inner),HybridGraceHashJoin:true,Output:["_col0","_col1","_col2"]
> <-Map 2 [BROADCAST_EDGE]
>   BROADCAST [RS_7]
> PartitionCols:_col0
> Select Operator [SEL_5] (rows=1 width=1)
>   Output:["_col0"]
>   Filter Operator [FIL_14] (rows=1 width=1)
> predicate:a2 is not null
> TableScan [TS_3] (rows=1 width=1)
>   default@ct2,c2,Tbl:COMPLETE,Col:NONE,Output:["a2"]
> <-Select Operator [SEL_2] (rows=1 width=4)
> Output:["_col0","_col1"]
> Filter Operator [FIL_13] (rows=1 width=4)
>   predicate:(a1 is not null and b1 is not null)
>   TableScan [TS_0] (rows=1 width=4)
> default@ct1,c1,Tbl:COMPLETE,Col:NONE,Output:["a1","b1"]
> {code}
> This happens only if the join is an inner join; otherwise HiveJoinAddNotNullRule, 
> which creates this problem, is skipped.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-17279) spark/bin/beeline throws Unknown column 'A0.IS_REWRITE_ENABLED' in 'field list'

2017-08-09 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V resolved HIVE-17279.

Resolution: Invalid

spark/bin/beeline is not part of this code base - this is a BUG in the Spark 
project.

https://github.com/apache/spark/blob/master/bin/beeline

From a quick look, the error is related to Materialized Views support in Hive 
2.0.
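For anyone hitting this, a missing {{TBLS.IS_REWRITE_ENABLED}} column typically means the metastore schema is older than the Hive client talking to it. A hedged sketch of the usual check and remedy (assuming a MySQL-backed metastore; adjust {{-dbType}} to your database):
{noformat}
# show the schema version recorded in the metastore vs. the client's expected version
schematool -dbType mysql -info
# upgrade the metastore schema to the client's version
schematool -dbType mysql -upgradeSchema
{noformat}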

> spark/bin/beeline throws Unknown column 'A0.IS_REWRITE_ENABLED' in 'field 
> list'
> ---
>
> Key: HIVE-17279
> URL: https://issues.apache.org/jira/browse/HIVE-17279
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.1.0
> Environment: spark 2.10. , hive 2.1.0
>Reporter: saurab
>
> If I run this code on /spark/bin/beeline
> CREATE EXTERNAL TABLE IF NOT EXISTS foods_txt (
>   name string, 
>   type string
> ) ROW FORMAT delimited fields terminated by ','
> STORED AS TEXTFILE
> LOCATION 'hdfs://:9000/hello'
> I get
> Error: javax.jdo.JDOFatalInternalException: The initCause method cannot be 
> used. To set the cause of this exception, use a constructor with a 
> Throwable[] argument. (state=08S01,code=0)
> and /tmp//hive.log shows
> 2017-08-09T02:21:14,427  WARN [HiveServer2-Handler-Pool: Thread-39] 
> thrift.ThriftCLIService: Error executing statement:
> org.apache.hive.service.cli.HiveSQLException: Error while compiling 
> statement: FAILED: IllegalStateException Unexpected Exception thrown: Unab$
> at 
> org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380)
>  ~[hive-service-2.3.0.jar:2.3.0]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:206)
>  ~[hive-service-2.3.0.jar:2.3.0]
> at 
> org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:290)
>  ~[hive-service-2.3.0.jar:2.3.0]
> at 
> org.apache.hive.service.cli.operation.Operation.run(Operation.java:320) 
> ~[hive-service-2.3.0.jar:2.3.0]
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:530)
>  ~[hive-service-2.3.0.jar:2.3$
> at 
> org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:517)
>  ~[hive-service-2.3.0.jar:2.3.0]
> at 
> org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:310)
>  ~[hive-service-2.3.0.jar:2.3.0]
> at 
> org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:530)
>  ~[hive-service-2.3.0.jar:2.3.0]
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1437)
>  ~[hive-exec-2.3.0.jar:2.$
> at 
> org.apache.hive.service.rpc.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1422)
>  ~[hive-exec-2.3.0.jar:2.$
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> ~[hive-exec-2.3.0.jar:2.3.0]
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> ~[hive-exec-2.3.0.jar:2.3.0]
> at 
> org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
>  ~[hive-service-2.3.0.jar:2.3.0]
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
>  ~[hive-exec-2.3.0.jar:2.3.0]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  [?:1.8.0_131]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  [?:1.8.0_131]
> at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
> Caused by: java.lang.IllegalStateException: Unexpected Exception thrown: 
> Unable to fetch table foods_txt. Exception thrown when executing quer$
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeCreateTable(SemanticAnalyzer.java:12000)
>  ~[hive-exec-2.3.0.jar:2.3.0]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genResolvedParseTree(SemanticAnalyzer.java:11001)
>  ~[hive-exec-2.3.0.jar:2.3.0]
> at 
> org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:4)
>  ~[hive-exec-2.3.0.jar:2.3.0]
> at 
> org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:286)
>  ~[hive-exec-2.3.0.jar:2.3.0]
> at 
> org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:258)
>  ~[hive-exec-2.3.0.jar:2.3.0]
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:511) 
> ~[hive-exec-2.3.0.jar:2.3.0]
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1316) 
> ~[hive-exec-2.3.0.jar:2.3.0]
> at 
> org.apache.hadoop.hi

[jira] [Updated] (HIVE-17164) Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default)

2017-08-09 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-17164:

Status: In Progress  (was: Patch Available)

> Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default)
> ---
>
> Key: HIVE-17164
> URL: https://issues.apache.org/jira/browse/HIVE-17164
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-17164.01.patch, HIVE-17164.02.patch
>
>
> Add disk storage backing.  Turn hive.vectorized.execution.ptf.enabled on by 
> default.
> Add hive.vectorized.ptf.max.memory.buffering.batch.count to specify the 
> maximum number of vectorized row batches to buffer in memory before spilling to 
> disk.
> Add the hive.vectorized.testing.reducer.batch.size parameter to have the Tez 
> Reducer produce small batches, creating many key-group batches that exercise 
> memory buffering and disk storage backing.
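The properties above could be exercised like this (a sketch; the values are illustrative assumptions, not recommended defaults):
{code}
set hive.vectorized.execution.ptf.enabled=true;
-- how many vectorized row batches to hold in memory before spilling
set hive.vectorized.ptf.max.memory.buffering.batch.count=25;
-- test-only knob: force the Tez Reducer to emit small batches
set hive.vectorized.testing.reducer.batch.size=128;
{code}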



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17164) Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default)

2017-08-09 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-17164:

Attachment: HIVE-17164.03.patch

> Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default)
> ---
>
> Key: HIVE-17164
> URL: https://issues.apache.org/jira/browse/HIVE-17164
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-17164.01.patch, HIVE-17164.02.patch, 
> HIVE-17164.03.patch
>
>
> Add disk storage backing.  Turn hive.vectorized.execution.ptf.enabled on by 
> default.
> Add hive.vectorized.ptf.max.memory.buffering.batch.count to specify the 
> maximum number of vectorized row batches to buffer in memory before spilling to 
> disk.
> Add the hive.vectorized.testing.reducer.batch.size parameter to have the Tez 
> Reducer produce small batches, creating many key-group batches that exercise 
> memory buffering and disk storage backing.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17164) Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default)

2017-08-09 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-17164:

Status: Patch Available  (was: In Progress)

> Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default)
> ---
>
> Key: HIVE-17164
> URL: https://issues.apache.org/jira/browse/HIVE-17164
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-17164.01.patch, HIVE-17164.02.patch, 
> HIVE-17164.03.patch
>
>
> Add disk storage backing.  Turn hive.vectorized.execution.ptf.enabled on by 
> default.
> Add hive.vectorized.ptf.max.memory.buffering.batch.count to specify the 
> maximum number of vectorized row batches to buffer in memory before spilling to 
> disk.
> Add the hive.vectorized.testing.reducer.batch.size parameter to have the Tez 
> Reducer produce small batches, creating many key-group batches that exercise 
> memory buffering and disk storage backing.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16990) REPL LOAD should update last repl ID only after successful copy of data files.

2017-08-09 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16990:

Attachment: HIVE-16990.03.patch

Added 03.patch for rebasing against master.

> REPL LOAD should update last repl ID only after successful copy of data files.
> --
>
> Key: HIVE-16990
> URL: https://issues.apache.org/jira/browse/HIVE-16990
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-16990.01.patch, HIVE-16990.02.patch, 
> HIVE-16990.03.patch
>
>
> REPL LOAD operations that include both metadata and data changes should 
> follow the rule below.
> 1. Copy the metadata, excluding the last repl ID.
> 2. Copy the data files.
> 3. If steps 1 and 2 are successful, then update the last repl ID of the object.
> This rule allows failed events to be re-applied by REPL LOAD and 
> ensures no data loss due to failures.
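A minimal sketch of the intended ordering (the method names are hypothetical, not Hive's actual API):
{code:java}
// hypothetical sketch of the three-step rule above
void applyReplLoadEvent(ReplEvent event) throws IOException {
  copyMetadataExcludingLastReplId(event);  // step 1
  copyDataFiles(event);                    // step 2
  // step 3 runs only if steps 1 and 2 threw nothing, so a failed
  // load can later be re-applied without data loss
  updateLastReplId(event);
}
{code}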



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16990) REPL LOAD should update last repl ID only after successful copy of data files.

2017-08-09 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16990:

Status: Open  (was: Patch Available)

> REPL LOAD should update last repl ID only after successful copy of data files.
> --
>
> Key: HIVE-16990
> URL: https://issues.apache.org/jira/browse/HIVE-16990
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-16990.01.patch, HIVE-16990.02.patch, 
> HIVE-16990.03.patch
>
>
> REPL LOAD operations that include both metadata and data changes should 
> follow the rule below.
> 1. Copy the metadata, excluding the last repl ID.
> 2. Copy the data files.
> 3. If steps 1 and 2 are successful, then update the last repl ID of the object.
> This rule allows failed events to be re-applied by REPL LOAD and 
> ensures no data loss due to failures.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16990) REPL LOAD should update last repl ID only after successful copy of data files.

2017-08-09 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-16990:

Status: Patch Available  (was: Open)

> REPL LOAD should update last repl ID only after successful copy of data files.
> --
>
> Key: HIVE-16990
> URL: https://issues.apache.org/jira/browse/HIVE-16990
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-16990.01.patch, HIVE-16990.02.patch, 
> HIVE-16990.03.patch
>
>
> REPL LOAD operations that include both metadata and data changes should 
> follow the rule below.
> 1. Copy the metadata, excluding the last repl ID.
> 2. Copy the data files.
> 3. If steps 1 and 2 are successful, then update the last repl ID of the object.
> This rule allows failed events to be re-applied by REPL LOAD and 
> ensures no data loss due to failures.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-14747) Remove JAVA paths from profiles by sending them from ptest-client

2017-08-09 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-14747:
---
Attachment: HIVE-14747.05.patch

Renamed new constant string to conform to existing naming standards.

> Remove JAVA paths from profiles by sending them from ptest-client
> -
>
> Key: HIVE-14747
> URL: https://issues.apache.org/jira/browse/HIVE-14747
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-14747.01.patch, HIVE-14747.02.patch, 
> HIVE-14747.03.patch, HIVE-14747.04.patch, HIVE-14747.05.patch
>
>
> Hive ptest uses some properties files per branch that contain information 
> about how to execute the tests.
> This profile includes JAVA paths to build and execute the tests. We should 
> get rid of these by passing such information from Jenkins to the 
> ptest-server. If a profile needs a different Java version, we can 
> create a specific Jenkins job for it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work stopped] (HIVE-17183) Disable rename operations during bootstrap dump

2017-08-09 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-17183 stopped by Sankar Hariappan.
---
> Disable rename operations during bootstrap dump
> ---
>
> Key: HIVE-17183
> URL: https://issues.apache.org/jira/browse/HIVE-17183
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
>
> Currently, a bootstrap dump can lead to data loss when any rename happens 
> while the dump is in progress. This feature can be supported in a later phase of 
> development, as it needs a proper design to keep track of renamed 
> tables/partitions. 
> So, for the time being, we shall disable rename operations while a bootstrap dump 
> is in progress to avoid any inconsistent state.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17183) Disable rename operations during bootstrap dump

2017-08-09 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17183:

Attachment: HIVE-17183.01.patch

Added 01.patch with the below changes (a sketch of the guard follows the list).
- Added a new parameter (bootstrap.dump.state) to the database object.
- Set the value to "active" while a bootstrap dump is in progress and "idle" if not.
- Added a test case to verify a concurrent rename of a partition/table while a bootstrap 
dump is in progress.
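A hypothetical sketch of such a guard (names are illustrative; only the parameter name and its values come from the list above):
{code:java}
// hypothetical guard on the rename path; not the patch's actual code
private void ensureNoBootstrapDumpInProgress(Database db) throws HiveException {
  String state = db.getParameters().get("bootstrap.dump.state");
  if ("active".equalsIgnoreCase(state)) {
    throw new HiveException("Rename is disabled while a bootstrap dump is in progress on "
        + db.getName());
  }
}
{code}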

Request [~anishek]/[~daijy]/[~thejas] to please review the patch!

> Disable rename operations during bootstrap dump
> ---
>
> Key: HIVE-17183
> URL: https://issues.apache.org/jira/browse/HIVE-17183
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17183.01.patch
>
>
> Currently, a bootstrap dump can lead to data loss when any rename happens 
> while the dump is in progress. This feature can be supported in a later phase of 
> development, as it needs a proper design to keep track of renamed 
> tables/partitions. 
> So, for the time being, we shall disable rename operations while a bootstrap dump 
> is in progress to avoid any inconsistent state.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17183) Disable rename operations during bootstrap dump

2017-08-09 Thread Sankar Hariappan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sankar Hariappan updated HIVE-17183:

Status: Patch Available  (was: Open)

> Disable rename operations during bootstrap dump
> ---
>
> Key: HIVE-17183
> URL: https://issues.apache.org/jira/browse/HIVE-17183
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17183.01.patch
>
>
> Currently, a bootstrap dump can lead to data loss when any rename happens 
> while the dump is in progress. This feature can be supported in a later phase of 
> development, as it needs a proper design to keep track of renamed 
> tables/partitions. 
> So, for the time being, we shall disable rename operations while a bootstrap dump 
> is in progress to avoid any inconsistent state.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17183) Disable rename operations during bootstrap dump

2017-08-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119779#comment-16119779
 ] 

ASF GitHub Bot commented on HIVE-17183:
---

GitHub user sankarh opened a pull request:

https://github.com/apache/hive/pull/225

HIVE-17183: Disable rename operations during bootstrap dump



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sankarh/hive HIVE-17183

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/hive/pull/225.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #225


commit 7ce9f8f3f5e63d25027b8a51a92d2b30caea8f17
Author: Sankar Hariappan 
Date:   2017-08-09T05:02:25Z

HIVE-17183: Disable rename operations during bootstrap dump




> Disable rename operations during bootstrap dump
> ---
>
> Key: HIVE-17183
> URL: https://issues.apache.org/jira/browse/HIVE-17183
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17183.01.patch
>
>
> Currently, a bootstrap dump can lead to data loss when any rename happens 
> while the dump is in progress. This feature can be supported in a later phase of 
> development, as it needs a proper design to keep track of renamed 
> tables/partitions. 
> So, for the time being, we shall disable rename operations while a bootstrap dump 
> is in progress to avoid any inconsistent state.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17164) Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default)

2017-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119886#comment-16119886
 ] 

Hive QA commented on HIVE-17164:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12880975/HIVE-17164.03.patch

{color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 13 failed/errored test(s), 10999 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_windowing_expressions]
 (batchId=75)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_ptf_part_simple]
 (batchId=152)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6317/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6317/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6317/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 13 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12880975 - PreCommit-HIVE-Build

> Vectorization: Support PTF (Part 2: Unbounded Support-- Turn ON by default)
> ---
>
> Key: HIVE-17164
> URL: https://issues.apache.org/jira/browse/HIVE-17164
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-17164.01.patch, HIVE-17164.02.patch, 
> HIVE-17164.03.patch
>
>
> Add disk storage backing.  Turn hive.vectorized.execution.ptf.enabled on by 
> default.
> Add hive.vectorized.ptf.max.memory.buffering.batch.count to specify the 
> maximum number of vectorized row batches to buffer in memory before spilling to 
> disk.
> Add the hive.vectorized.testing.reducer.batch.size parameter to have the Tez 
> Reducer produce small batches, creating many key-group batches that exercise 
> memory buffering and disk storage backing.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17280) Data loss in CONCATENATE ORC created by Spark

2017-08-09 Thread Marco Gaido (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Gaido updated HIVE-17280:
---
Description: 
Hive concatenation causes data loss if the ORC files in the table were written 
by Spark.

Here are the steps to reproduce the problem:
 - create a table;
{code:java}
hive
hive> create table aa (a string, b int) stored as orc;
{code}
 - insert 2 rows using Spark;
{code:java}
spark-shell
scala> case class AA(a:String, b:Int)
scala> val df = sc.parallelize(Array(AA("b",2),AA("c",3) )).toDF
scala> df.write.insertInto("aa")
{code}
 - change table schema;
{code:java}
hive
hive> alter table aa add columns(aa string, bb int);
{code}
 - insert other 2 rows with Spark
{code:java}
spark-shell
scala> case class BB(a:String, b:Int, aa:String, bb:Int)
scala> val df = sc.parallelize(Array(BB("b",2,"b",2),BB("c",3,"c",3) )).toDF
scala> df.write.insertInto("aa")
{code}
 - at this point, running a select statement with Hive returns correctly *4 
rows* in the table; then run the concatenation
{code:java}
hive
hive> alter table aa concatenate;
{code}


At this point, a select returns only *3 rows, i.e. a row is missing*.


  was:
Hive concatenation causes data loss if the ORC files in the table were written 
by Spark.

Here are the steps to reproduce the problem:
 - create a table;
{code:java}
hive
hive> create table aa (a string, b int) stored as orc;
{code}
 - insert 2 rows using Spark;
{code:java}
spark-shell
scala> case class AA(a:String, b:Int)
scala> val df = sc.parallelize(Array(AA("b",2),AA("c",3) )).toDF
scala> df.write.insertInto("aa")
{code}
 - change table schema;
{code:java}
hive
hive> alter table aa add columns(aa string, bb int);
{code}
 - insert other 2 rows with Spark
{code:java}
spark-shell
scala> case class BB(a:String, b:Int, aa:String, bb:Int)
scala> val df = sc.parallelize(Array(BB("b",2,"b",2),BB("c",3,"c",3) )).toDF
scala> df.write.insertInto("aa")
{code}
 - at this point, running a select statement with Hive returns correctly 4 rows 
in the table; then run the concatenation
{code:java}
hive
hive> alter table aa concatenate;
{code}
At this point, a select returns only* 3 rows, ie. a row is missing*.



> Data loss in CONCATENATE ORC created by Spark
> -
>
> Key: HIVE-17280
> URL: https://issues.apache.org/jira/browse/HIVE-17280
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Spark
>Affects Versions: 1.2.1
> Environment: Tested in HDP-2.6
>Reporter: Marco Gaido
>
> Hive concatenation causes data loss if the ORC files in the table were 
> written by Spark.
> Here are the steps to reproduce the problem:
>  - create a table;
> {code:java}
> hive
> hive> create table aa (a string, b int) stored as orc;
> {code}
>  - insert 2 rows using Spark;
> {code:java}
> spark-shell
> scala> case class AA(a:String, b:Int)
> scala> val df = sc.parallelize(Array(AA("b",2),AA("c",3) )).toDF
> scala> df.write.insertInto("aa")
> {code}
>  - change table schema;
> {code:java}
> hive
> hive> alter table aa add columns(aa string, bb int);
> {code}
>  - insert other 2 rows with Spark
> {code:java}
> spark-shell
> scala> case class BB(a:String, b:Int, aa:String, bb:Int)
> scala> val df = sc.parallelize(Array(BB("b",2,"b",2),BB("c",3,"c",3) )).toDF
> scala> df.write.insertInto("aa")
> {code}
>  - at this point, running a select statement with Hive returns correctly *4 
> rows* in the table; then run the concatenation
> {code:java}
> hive
> hive> alter table aa concatenate;
> {code}
> At this point, a select returns only *3 rows, i.e. a row is missing*.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17280) Data loss in CONCATENATE ORC created by Spark

2017-08-09 Thread Marco Gaido (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Gaido updated HIVE-17280:
---
Priority: Critical  (was: Major)

> Data loss in CONCATENATE ORC created by Spark
> -
>
> Key: HIVE-17280
> URL: https://issues.apache.org/jira/browse/HIVE-17280
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Spark
>Affects Versions: 1.2.1
> Environment: Tested in HDP-2.6
>Reporter: Marco Gaido
>Priority: Critical
>
> Hive concatenation causes data loss if the ORC files in the table were 
> written by Spark.
> Here are the steps to reproduce the problem:
>  - create a table;
> {code:java}
> hive
> hive> create table aa (a string, b int) stored as orc;
> {code}
>  - insert 2 rows using Spark;
> {code:java}
> spark-shell
> scala> case class AA(a:String, b:Int)
> scala> val df = sc.parallelize(Array(AA("b",2),AA("c",3) )).toDF
> scala> df.write.insertInto("aa")
> {code}
>  - change table schema;
> {code:java}
> hive
> hive> alter table aa add columns(aa string, bb int);
> {code}
>  - insert other 2 rows with Spark
> {code:java}
> spark-shell
> scala> case class BB(a:String, b:Int, aa:String, bb:Int)
> scala> val df = sc.parallelize(Array(BB("b",2,"b",2),BB("c",3,"c",3) )).toDF
> scala> df.write.insertInto("aa")
> {code}
>  - at this point, running a select statement with Hive returns correctly *4 
> rows* in the table; then run the concatenation
> {code:java}
> hive
> hive> alter table aa concatenate;
> {code}
> At this point, a select returns only *3 rows, i.e. a row is missing*.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-16853) Minor org.apache.hadoop.hive.ql.exec.HashTableSinkOperator Improvement

2017-08-09 Thread BELUGA BEHR (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119918#comment-16119918
 ] 

BELUGA BEHR edited comment on HIVE-16853 at 8/9/17 1:48 PM:


[~csun] Also related to 
{{org.apache.hadoop.hive.ql.exec.HashTableSinkOperator}}. Mind taking a peek? 
:)  Thanks!


was (Author: belugabehr):
[~csun] Also related to 
{{org.apache.hadoop.hive.ql.exec.HashTableSinkOperator}}. Mind taking a peek? 
:)  Thanks@

> Minor org.apache.hadoop.hive.ql.exec.HashTableSinkOperator Improvement
> --
>
> Key: HIVE-16853
> URL: https://issues.apache.org/jira/browse/HIVE-16853
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.1, 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Attachments: HIVE-16853.1.patch
>
>
> # Remove the custom buffer size for {{BufferedOutputStream}} and rely on the JVM 
> default (usually a larger 8192 bytes these days)
> # Remove needless logger checks
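A hedged before/after sketch of the two changes (illustrative only, not the patch itself):
{code:java}
import java.io.*;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

class HashSinkSketch {
  private static final Logger LOG = LoggerFactory.getLogger(HashSinkSketch.class);

  // before: custom buffer size plus a needless level guard
  OutputStream openBefore(File f) throws IOException {
    if (LOG.isInfoEnabled()) {
      LOG.info("Dumping hash table to " + f);
    }
    return new BufferedOutputStream(new FileOutputStream(f), 4096);
  }

  // after: JVM default buffer size (8192 in current JDKs) and
  // parameterized logging, which makes the guard unnecessary
  OutputStream openAfter(File f) throws IOException {
    LOG.info("Dumping hash table to {}", f);
    return new BufferedOutputStream(new FileOutputStream(f));
  }
}
{code}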



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16853) Minor org.apache.hadoop.hive.ql.exec.HashTableSinkOperator Improvement

2017-08-09 Thread BELUGA BEHR (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119918#comment-16119918
 ] 

BELUGA BEHR commented on HIVE-16853:


[~csun] Also related to 
{{org.apache.hadoop.hive.ql.exec.HashTableSinkOperator}}. Mind taking a peek? 
:)  Thanks@

> Minor org.apache.hadoop.hive.ql.exec.HashTableSinkOperator Improvement
> --
>
> Key: HIVE-16853
> URL: https://issues.apache.org/jira/browse/HIVE-16853
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.1, 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Attachments: HIVE-16853.1.patch
>
>
> # Remove the custom buffer size for {{BufferedOutputStream}} and rely on the JVM 
> default (usually a larger 8192 bytes these days)
> # Remove needless logger checks



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16990) REPL LOAD should update last repl ID only after successful copy of data files.

2017-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16119978#comment-16119978
 ] 

Hive QA commented on HIVE-16990:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12880981/HIVE-16990.03.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 11 failed/errored test(s), 11000 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6318/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6318/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6318/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 11 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12880981 - PreCommit-HIVE-Build

> REPL LOAD should update last repl ID only after successful copy of data files.
> --
>
> Key: HIVE-16990
> URL: https://issues.apache.org/jira/browse/HIVE-16990
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-16990.01.patch, HIVE-16990.02.patch, 
> HIVE-16990.03.patch
>
>
> REPL LOAD operations that include both metadata and data changes should 
> follow the rule below.
> 1. Copy the metadata, excluding the last repl ID.
> 2. Copy the data files.
> 3. If steps 1 and 2 are successful, then update the last repl ID of the object.
> This rule allows failed events to be re-applied by REPL LOAD and 
> ensures no data loss due to failures.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17280) Data loss in CONCATENATE ORC created by Spark

2017-08-09 Thread Marco Gaido (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marco Gaido updated HIVE-17280:
---
Environment: Spark 1.6.3  (was: Tested in HDP-2.6)

> Data loss in CONCATENATE ORC created by Spark
> -
>
> Key: HIVE-17280
> URL: https://issues.apache.org/jira/browse/HIVE-17280
> Project: Hive
>  Issue Type: Bug
>  Components: Hive, Spark
>Affects Versions: 1.2.1
> Environment: Spark 1.6.3
>Reporter: Marco Gaido
>Priority: Critical
>
> Hive concatenation causes data loss if the ORC files in the table were 
> written by Spark.
> Here are the steps to reproduce the problem:
>  - create a table;
> {code:java}
> hive
> hive> create table aa (a string, b int) stored as orc;
> {code}
>  - insert 2 rows using Spark;
> {code:java}
> spark-shell
> scala> case class AA(a:String, b:Int)
> scala> val df = sc.parallelize(Array(AA("b",2),AA("c",3) )).toDF
> scala> df.write.insertInto("aa")
> {code}
>  - change table schema;
> {code:java}
> hive
> hive> alter table aa add columns(aa string, bb int);
> {code}
>  - insert other 2 rows with Spark
> {code:java}
> spark-shell
> scala> case class BB(a:String, b:Int, aa:String, bb:Int)
> scala> val df = sc.parallelize(Array(BB("b",2,"b",2),BB("c",3,"c",3) )).toDF
> scala> df.write.insertInto("aa")
> {code}
>  - at this point, running a select statement with Hive returns correctly *4 
> rows* in the table; then run the concatenation
> {code:java}
> hive
> hive> alter table aa concatenate;
> {code}
> At this point, a select returns only *3 rows, i.e. a row is missing*.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-14747) Remove JAVA paths from profiles by sending them from ptest-client

2017-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120025#comment-16120025
 ] 

Hive QA commented on HIVE-14747:





{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/Hive-Build/123/testReport
Console output: https://builds.apache.org/job/Hive-Build/123/console
Test logs: http://172.31.115.19/logs/Hive-Build-123/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-08-09 07:48:00.620
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ MAVEN_OPTS='-Xmx1g -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/Hive-Build-123/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-08-09 07:48:00.623
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 16bfb9c HIVE-17191: Add InterfaceAudience and InterfaceStability 
annotations for StorageHandler APIs (Sahil Takiar, reviewd by Aihua Xu)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 16bfb9c HIVE-17191: Add InterfaceAudience and InterfaceStability 
annotations for StorageHandler APIs (Sahil Takiar, reviewd by Aihua Xu)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-08-09 07:48:01.453
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q 
-Dmaven.repo.local=/data/hiveptest/working/maven
[ERROR] Failed to execute goal on project spark-client: Could not resolve 
dependencies for project org.apache.hive:spark-client:jar:3.0.0-SNAPSHOT: 
Failed to collect dependencies for [com.esotericsoftware:kryo-shaded:jar:3.0.3 
(compile), com.google.guava:guava:jar:14.0.1 (compile), 
io.netty:netty-all:jar:4.0.29.Final (compile), 
org.apache.hive:hive-common:jar:3.0.0-SNAPSHOT (compile), 
org.apache.spark:spark-core_2.11:jar:2.0.0 (compile), junit:junit:jar:4.11 
(test), org.mockito:mockito-all:jar:1.10.19 (test), 
com.sun.jersey:jersey-servlet:jar:1.14 (test), 
org.glassfish.jersey.containers:jersey-container-servlet:jar:2.22.2 (test), 
org.apache.hive:hive-service-rpc:jar:3.0.0-SNAPSHOT (compile), 
org.slf4j:slf4j-api:jar:1.7.10 (compile)]: Failed to read artifact descriptor 
for javax.ws.rs:javax.ws.rs-api:jar:2.0.1: Could not transfer artifact 
javax.ws.rs:javax.ws.rs-api:pom:2.0.1 from/to datanucleus 
(http://www.datanucleus.org/downloads/maven2): Connect to localhost:3128 
[localhost/127.0.0.1, localhost/0:0:0:0:0:0:0:1] failed: Connection refused 
(Connection refused) -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/DependencyResolutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :spark-client
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID:  - Hive-Build

> Remove JAVA paths from profiles by sending them from ptest-client
> -
>
> Key: HIVE-14747
> URL: https://issues.apache.org/jira/browse/HIVE-14747
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-14747.01.patch, HIVE-14747.02.patch, 
> HIVE-1

[jira] [Commented] (HIVE-17270) Qtest results show wrong number of executors

2017-08-09 Thread Peter Vary (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120039#comment-16120039
 ] 

Peter Vary commented on HIVE-17270:
---

Ok, I think I understand now what is happening.

When running {{TestMiniSparkOnYarnCliDriver}} tests, we set the number of 
execution instances to 2 in the {{data/conf/spark/yarn-client/hive-site.xml}}:
{code}
<property>
  <name>spark.executor.instances</name>
  <value>2</value>
</property>
{code}

When running this code against a {{Hadoop23Shims.MiniSparkShim}}, we create the 
cluster with {{new MiniSparkOnYARNCluster("sparkOnYarn")}}. This means that the 
cluster is created with 1 NodeManager, so no matter what we set in the 
configuration we will have only 1 executor ({{SparkClientImpl.GetExecutorCountJob}} 
will return 1). Thus the resulting explain output will show 2*1 reducers:
{code}
Stage: Stage-1
  Spark
    Edges:
      Reducer 2 <- Map 1 (GROUP, 2)    <-- number of reducers = instances * cores
      Reducer 3 <- Reducer 2 (SORT, 1)
{code}

The good news is that the resulting output is consistent - it will always show 
{{2}}.
No change is absolutely needed, but I would prefer a consistent 
configuration, so I propose one of the following:
# Change {{spark.executor.instances}} to 1 in the hive-site.xml, or
# Change the cluster creation to have more nodes: {{new 
MiniSparkOnYARNCluster("sparkOnYarn", 1, 2)}}

I expect these changes will not affect the test outputs at all. I personally 
prefer the first, since it will require fewer resources to run the tests, and I do 
not see the point of having more cores for every test.


Another change I propose: when we are not able to get the number of executors, we 
should fall back to the value provided by the configuration, so we can provide a 
better experience on the first run too. So instead of this:
{code:title=SparkClientImpl}
  public ObjectPair getMemoryAndCores() throws Exception {
SparkConf sparkConf = hiveSparkClient.getSparkConf();
int numExecutors = hiveSparkClient.getExecutorCount();
// at start-up, we may be unable to get number of executors
if (numExecutors <= 0) {
  return new ObjectPair(-1L, -1);
}
[..]
  }
{code}

We should read the configuration:
{code}
  @Override
  public ObjectPair getMemoryAndCores() throws Exception {
SparkConf sparkConf = hiveSparkClient.getSparkConf();
int numExecutors = hiveSparkClient.getExecutorCount();
// at start-up, we may be unable to get number of executors, use the 
configuration values in this case
if (numExecutors <= 0) {
  numExecutors = sparkConf.getInt("spark.executor.instances", 1);
}
[..]
  }
{code}

What do you think about this [~stakiar], [~xuefuz], [~lirui]? If we agree, I 
can create the required patches / jiras etc.
And many thanks for your pointers! They helped me tremendously!

Peter

> Qtest results show wrong number of executors
> 
>
> Key: HIVE-17270
> URL: https://issues.apache.org/jira/browse/HIVE-17270
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>
> The hive-site.xml shows, that the TestMiniSparkOnYarnCliDriver uses 2 cores, 
> and 2 executor instances to run the queries. See: 
> https://github.com/apache/hive/blob/master/data/conf/spark/yarn-client/hive-site.xml#L233
> When reading the log files for the query tests, I see the following:
> {code}
> 2017-08-08T07:41:03,315  INFO [0381325d-2c8c-46fb-ab51-423defaddd84 main] 
> session.SparkSession: Spark cluster current has executors: 1, total cores: 2, 
> memory per executor: 512M, memoryFraction: 0.4
> {code}
> See: 
> http://104.198.109.242/logs/PreCommit-HIVE-Build-6299/succeeded/171-TestMiniSparkOnYarnCliDriver-insert_overwrite_directory2.q-scriptfile1.q-vector_outer_join0.q-and-17-more/logs/hive.log
> When running the tests against a real cluster, I found that running an 
> explain query for the first time I see 1 executor, but running it for the 
> second time I see 2 executors.
> Also setting some spark configuration on the cluster resets this behavior. 
> For the first time I will see 1 executor, and for the second time I will see 
> 2 executors again.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17148) Incorrect result for Hive join query with COALESCE in WHERE condition

2017-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120077#comment-16120077
 ] 

Hive QA commented on HIVE-17148:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12880974/HIVE-17148.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 32 failed/errored test(s), 11000 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[annotate_stats_join] 
(batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[correlated_join_keys] 
(batchId=26)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[vector_interval_mapjoin] 
(batchId=37)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[smb_mapjoin_15]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[subquery_select]
 (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_interval_mapjoin]
 (batchId=153)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query17] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query25] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query29] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query30] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query49] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query50] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query5] (batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query64] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query81] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query85] 
(batchId=235)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[annotate_stats_join]
 (batchId=123)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[smb_mapjoin_15] 
(batchId=133)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6319/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6319/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6319/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 32 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12880974 - PreCommit-HIVE-Build

> Incorrect result for Hive join query with COALESCE in WHERE condition
> -
>
> Key: HIVE-17148
> URL: https://issues.apache.org/jira/browse/HIVE-17148
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.1.1
>Reporter: Vlad Gudikov
>Assignee: Vlad Gudikov
> Attachments: HIVE-17148.1.patch, HIVE-17148.2.patch, HIVE-17148.patch
>
>
> The issue exists in Hive-2.1. In Hive-1.2 the query works fine with cbo 
> enabled:
> STEPS TO REPRODUCE:
> {code}
> Step 1: Create a table ct1
> create

[jira] [Commented] (HIVE-15728) Empty table returns result when querying partitioned field with a function

2017-08-09 Thread Jared Leable (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120106#comment-16120106
 ] 

Jared Leable commented on HIVE-15728:
-

What version did you test? It is still an issue in Hive version 1.1.0.

> Empty table returns result when querying partitioned field with a function
> --
>
> Key: HIVE-15728
> URL: https://issues.apache.org/jira/browse/HIVE-15728
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.0
>Reporter: Jared Leable
>Assignee: Fei Hui
>
> If a partitioned table contained data and is then truncated, a query will 
> still return a result when using a function on a partitioned field.
> {code}
> create table test1 (
>   field string
> )
> partitioned by(dt string);
> set hive.exec.dynamic.partition.mode=nonstrict;
> insert into test1 
> partition(dt)
> select 'a','2017-01-01';
> -- to view inserted records
> select * from test1;
> -- to delete all records from the table
> truncate table test1;
> -- to view 0 records in the table
> select * from test1;
> -- still returns a result of '2017-01-01'
> select max(dt) from test1;
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-15705) Event replication for constraints

2017-08-09 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120163#comment-16120163
 ] 

Thejas M Nair commented on HIVE-15705:
--

+1

> Event replication for constraints
> -
>
> Key: HIVE-15705
> URL: https://issues.apache.org/jira/browse/HIVE-15705
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-15705.1.patch, HIVE-15705.2.patch, 
> HIVE-15705.3.patch, HIVE-15705.4.patch, HIVE-15705.5.patch, 
> HIVE-15705.6.patch, HIVE-15705.7.patch, HIVE-15705.8.patch
>
>
> Make event replication for primary key and foreign key work.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-14747) Remove JAVA paths from profiles by sending them from ptest-client

2017-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120204#comment-16120204
 ] 

Hive QA commented on HIVE-14747:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12880982/HIVE-14747.05.patch

{color:green}SUCCESS:{color} +1 due to 7 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 15 failed/errored test(s), 10999 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDecimal 
(batchId=183)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDecimalXY 
(batchId=183)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6320/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6320/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6320/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 15 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12880982 - PreCommit-HIVE-Build

> Remove JAVA paths from profiles by sending them from ptest-client
> -
>
> Key: HIVE-14747
> URL: https://issues.apache.org/jira/browse/HIVE-14747
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive, Testing Infrastructure
>Reporter: Sergio Peña
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-14747.01.patch, HIVE-14747.02.patch, 
> HIVE-14747.03.patch, HIVE-14747.04.patch, HIVE-14747.05.patch
>
>
> Hive ptest uses some properties files per branch that contain information 
> about how to execute the tests.
> This profile includes JAVA paths to build and execute the tests. We should 
> get rid of these by passing such information from Jenkins to the 
> ptest-server. If a profile needs a different Java version, we can create a 
> specific Jenkins job for that.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17275) Auto-merge fails on writes of UNION ALL output to ORC file with dynamic partitioning

2017-08-09 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120270#comment-16120270
 ] 

Mithun Radhakrishnan commented on HIVE-17275:
-

Thanks for the patch, [~cdrome]. +1.

> Auto-merge fails on writes of UNION ALL output to ORC file with dynamic 
> partitioning
> 
>
> Key: HIVE-17275
> URL: https://issues.apache.org/jira/browse/HIVE-17275
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: Chris Drome
>Assignee: Chris Drome
> Attachments: HIVE-17275-branch-2.2.patch, HIVE-17275-branch-2.patch, 
> HIVE-17275.patch
>
>
> If dynamic partitioning is used to write the output of UNION or UNION ALL 
> queries into ORC files with hive.merge.tezfiles=true, the merge step fails as 
> follows:
> {noformat}
> 2017-08-08T11:27:19,958 ERROR [e7b1f06d-d632-408a-9dff-f7ae042cd25a main] 
> SessionState: Vertex failed, vertexName=File Merge, 
> vertexId=vertex_1502216690354_0001_33_00, diagnostics=[Task failed, 
> taskId=task_1502216690354_0001_33_00_00, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Error while running task ( failure ) : 
> attempt_1502216690354_0001_33_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: Multiple partitions for one merge mapper: 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/1
>  NOT EQUAL TO 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/2
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> Multiple partitions for one merge mapper: 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/1
>  NOT EQUAL TO 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/2
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:225)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.run(MergeFileRecordProcessor.java:154)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   ... 14 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: Multiple partitions for one merge mapper: 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/1
>  NOT EQUAL TO 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/2
>   at 
> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.processKeyValuePairs(OrcFileMergeOperator.java:169)
>   at 
> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.process(OrcFileMergeOperator.java:72)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:216)
>   ... 16 more
> Caused by: java.io.IOException: Multiple partitions for one merge mapper: 
> hdfs://localhost:39943/build/ql/test/data/warehouse

[jira] [Commented] (HIVE-16853) Minor org.apache.hadoop.hive.ql.exec.HashTableSinkOperator Improvement

2017-08-09 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120282#comment-16120282
 ] 

Chao Sun commented on HIVE-16853:
-

+1

> Minor org.apache.hadoop.hive.ql.exec.HashTableSinkOperator Improvement
> --
>
> Key: HIVE-16853
> URL: https://issues.apache.org/jira/browse/HIVE-16853
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.1, 3.0.0
>Reporter: BELUGA BEHR
>Assignee: BELUGA BEHR
>Priority: Trivial
> Attachments: HIVE-16853.1.patch
>
>
> # Remove the custom buffer size for {{BufferedOutputStream}} and rely on the JVM 
> default (usually a larger 8192 these days)
> # Remove needless logger checks (both changes are sketched below)
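A hedged sketch of both cleanups (an illustrative class, not the actual patch):

{code}
// Hedged sketch only; mirrors the two cleanups above, not the actual patch.
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SinkExample {
  private static final Logger LOG = LoggerFactory.getLogger(SinkExample.class);

  void write(String path) throws IOException {
    // Before: new BufferedOutputStream(out, someCustomSize).
    // After: rely on the JVM default buffer (8192 bytes in current JDKs).
    try (BufferedOutputStream out =
        new BufferedOutputStream(new FileOutputStream(path))) {
      out.write(42);
    }
    // Before: if (LOG.isDebugEnabled()) { LOG.debug("wrote " + path); }
    // After: parameterized logging makes the explicit guard needless.
    LOG.debug("wrote {}", path);
  }
}
{code}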



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-8472) Add ALTER DATABASE SET LOCATION

2017-08-09 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120303#comment-16120303
 ] 

Mithun Radhakrishnan commented on HIVE-8472:


Thank you, [~leftylev].

bq. If you post your username here, I'll see it and set you up for editing.
I'd be most grateful if you granted permissions for {{mit...@apache.org}}.

> Add ALTER DATABASE SET LOCATION
> ---
>
> Key: HIVE-8472
> URL: https://issues.apache.org/jira/browse/HIVE-8472
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Jeremy Beard
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-8472.1.patch, HIVE-8472.3.patch
>
>
> Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there 
> was an equivalent for databases.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-8472) Add ALTER DATABASE SET LOCATION

2017-08-09 Thread Mithun Radhakrishnan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-8472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120303#comment-16120303
 ] 

Mithun Radhakrishnan edited comment on HIVE-8472 at 8/9/17 5:27 PM:


Thank you, [~leftylev].

bq. If you post your username here, I'll see it and set you up for editing.
I'd be most grateful if you'd grant permissions for {{mit...@apache.org}}.


was (Author: mithun):
Thank you, [~leftylev].

bq. If you post your username here, I'll see it and set you up for editing.
I'd be most grateful if you granted permissions for {{mit...@apache.org}}.

> Add ALTER DATABASE SET LOCATION
> ---
>
> Key: HIVE-8472
> URL: https://issues.apache.org/jira/browse/HIVE-8472
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0, 3.0.0
>Reporter: Jeremy Beard
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-8472.1.patch, HIVE-8472.3.patch
>
>
> Similarly to ALTER TABLE tablename SET LOCATION, it would be helpful if there 
> was an equivalent for databases.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17270) Qtest results show wrong number of executors

2017-08-09 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120310#comment-16120310
 ] 

Xuefu Zhang commented on HIVE-17270:


Thanks for the investigation, [~pvary].

Auto-determining the number of reducers seems rather complicated. Currently, we 
don't try to get memory/cores when dynamic allocation is enabled, because of the 
dynamic nature of the executors. While it seems reasonable to use 
{{spark.executor.instances}} as an input, the benefit is limited in my opinion, 
as static allocation is expected to be rare in real production.

I also had some doubts about the original idea, which was to automatically and 
dynamically decide the parallelism based on available memory/cores. Maybe we 
should go back to basics, where the number of reducers is solely determined 
(statically) by the total shuffled data size (stats) divided by the configured 
"bytes per reducer". I'm open to all proposals, including doing this for dynamic 
allocation and using {{spark.executor.instances}} for static allocation.
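For concreteness, a minimal sketch of that "back to basics" estimation: reducers = ceil(total shuffled bytes / bytes per reducer), clamped to a maximum. The method and parameter names are illustrative; the config names in the comments ({{hive.exec.reducers.bytes.per.reducer}}, {{hive.exec.reducers.max}}) are the existing Hive knobs this mirrors:

{code}
// Hedged sketch: reducers = ceil(total shuffled bytes / bytes per reducer),
// clamped to [1, maxReducers]. Names are illustrative.
public class ReducerEstimator {
  static int estimateReducers(long totalShuffledBytes,
      long bytesPerReducer, // cf. hive.exec.reducers.bytes.per.reducer
      int maxReducers) {    // cf. hive.exec.reducers.max
    if (totalShuffledBytes <= 0 || bytesPerReducer <= 0) {
      return 1;
    }
    long reducers = (totalShuffledBytes + bytesPerReducer - 1) / bytesPerReducer;
    return (int) Math.min(Math.max(reducers, 1L), (long) maxReducers);
  }
}
{code}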


> Qtest results show wrong number of executors
> 
>
> Key: HIVE-17270
> URL: https://issues.apache.org/jira/browse/HIVE-17270
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>
> The hive-site.xml shows that the TestMiniSparkOnYarnCliDriver uses 2 cores 
> and 2 executor instances to run the queries. See: 
> https://github.com/apache/hive/blob/master/data/conf/spark/yarn-client/hive-site.xml#L233
> When reading the log files for the query tests, I see the following:
> {code}
> 2017-08-08T07:41:03,315  INFO [0381325d-2c8c-46fb-ab51-423defaddd84 main] 
> session.SparkSession: Spark cluster current has executors: 1, total cores: 2, 
> memory per executor: 512M, memoryFraction: 0.4
> {code}
> See: 
> http://104.198.109.242/logs/PreCommit-HIVE-Build-6299/succeeded/171-TestMiniSparkOnYarnCliDriver-insert_overwrite_directory2.q-scriptfile1.q-vector_outer_join0.q-and-17-more/logs/hive.log
> When running the tests against a real cluster, I found that when running an 
> explain query for the first time I see 1 executor, but when running it for the 
> second time I see 2 executors.
> Also, setting some Spark configuration on the cluster resets this behavior: 
> the first time I will see 1 executor, and the second time I will see 
> 2 executors again.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17183) Disable rename operations during bootstrap dump

2017-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120326#comment-16120326
 ] 

Hive QA commented on HIVE-17183:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881003/HIVE-17183.01.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 11000 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6321/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6321/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6321/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881003 - PreCommit-HIVE-Build

> Disable rename operations during bootstrap dump
> ---
>
> Key: HIVE-17183
> URL: https://issues.apache.org/jira/browse/HIVE-17183
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Affects Versions: 2.1.0
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
>  Labels: DR, replication
> Fix For: 3.0.0
>
> Attachments: HIVE-17183.01.patch
>
>
> Currently, a bootstrap dump will lead to data loss when any rename happens 
> while the dump is in progress. This feature can be supported in a later phase 
> of development, as it needs a proper design to keep track of renamed 
> tables/partitions. 
> So, for the time being, we shall disable rename operations while a bootstrap 
> dump is in progress, to avoid any inconsistent state.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17281) LLAP external client not properly handling KILLED notification that occurs when a fragment is rejected

2017-08-09 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere reassigned HIVE-17281:
-


> LLAP external client not properly handling KILLED notification that occurs 
> when a fragment is rejected
> --
>
> Key: HIVE-17281
> URL: https://issues.apache.org/jira/browse/HIVE-17281
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
>
> When LLAP fragment submission is rejected, the external client receives both 
> REJECTED and KILLED notifications for the fragment. The KILLED notification 
> is being treated as an error, which prevents the retry logic from 
> resubmitting the fragment. This needs to be fixed in the client logic; a 
> sketch follows the log below.
> {noformat}
> 17/08/02 04:36:16 INFO LlapBaseInputFormat: Registered id: 
> attempt_2519876382789748565_0005_0_00_21_0
> 17/08/02 04:36:16 INFO LlapTaskUmbilicalExternalClient: Fragment: 
> attempt_2519876382789748565_0005_0_00_21_0 rejected. Server Busy.
> 17/08/02 04:36:16 ERROR LlapTaskUmbilicalExternalClient: Task killed - 
> attempt_2519876382789748565_0005_0_00_21_0
> {noformat}
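A hedged sketch of the client-side handling the description implies: ignore a KILLED notification that follows a REJECTED one, so the retry logic can resubmit. All names here are hypothetical; this is not the actual LlapTaskUmbilicalExternalClient code.

{code}
// Illustrative only; the actual states and fields in the LLAP client differ.
import java.util.concurrent.ConcurrentHashMap;

public class FragmentStateTracker {
  enum State { RUNNING, REJECTED, KILLED }

  private final ConcurrentHashMap<String, State> fragments =
      new ConcurrentHashMap<>();

  void onRejected(String attemptId) {
    fragments.put(attemptId, State.REJECTED);
    // ...schedule resubmission of the fragment here...
  }

  void onKilled(String attemptId) {
    // A KILLED notification routinely follows REJECTED. Treating it as a
    // hard error would cancel the pending retry, so swallow it instead.
    if (fragments.get(attemptId) == State.REJECTED) {
      return; // already queued for retry; not an error
    }
    fragments.put(attemptId, State.KILLED);
    // ...surface a genuine kill as an error to the caller here...
  }
}
{code}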



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17222) Llap: Iotrace throws java.lang.UnsupportedOperationException with IncompleteCb

2017-08-09 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-17222:

Fix Version/s: 2.4.0

> Llap: Iotrace throws  java.lang.UnsupportedOperationException with 
> IncompleteCb
> ---
>
> Key: HIVE-17222
> URL: https://issues.apache.org/jira/browse/HIVE-17222
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17222.1.patch
>
>
> branch: hive master 
> Running Q76 at 1 TB generates the following exception.
> {noformat}
> Caused by: java.io.IOException: java.lang.UnsupportedOperationException
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.rethrowErrorIfAny(LlapRecordReader.java:349)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.nextCvb(LlapRecordReader.java:304)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.next(LlapRecordReader.java:244)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.next(LlapRecordReader.java:67)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
> ... 23 more
> Caused by: java.lang.UnsupportedOperationException
> at 
> org.apache.hadoop.hive.common.io.DiskRange.getData(DiskRange.java:86)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.IoTrace.logRange(IoTrace.java:304)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.IoTrace.logRanges(IoTrace.java:291)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:328)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:426)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:250)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:247)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:247)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:96)
> ... 6 more
> {noformat}
> When {{IncompleteCb}} is encountered, it ends up throwing this error.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17222) Llap: Iotrace throws java.lang.UnsupportedOperationException with IncompleteCb

2017-08-09 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120392#comment-16120392
 ] 

Sergey Shelukhin commented on HIVE-17222:
-

Backported to branch-2

> Llap: Iotrace throws  java.lang.UnsupportedOperationException with 
> IncompleteCb
> ---
>
> Key: HIVE-17222
> URL: https://issues.apache.org/jira/browse/HIVE-17222
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Fix For: 3.0.0, 2.4.0
>
> Attachments: HIVE-17222.1.patch
>
>
> branch: hive master 
> Running Q76 at 1 TB generates the following exception.
> {noformat}
> Caused by: java.io.IOException: java.lang.UnsupportedOperationException
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.rethrowErrorIfAny(LlapRecordReader.java:349)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.nextCvb(LlapRecordReader.java:304)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.next(LlapRecordReader.java:244)
> at 
> org.apache.hadoop.hive.llap.io.api.impl.LlapRecordReader.next(LlapRecordReader.java:67)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:360)
> ... 23 more
> Caused by: java.lang.UnsupportedOperationException
> at 
> org.apache.hadoop.hive.common.io.DiskRange.getData(DiskRange.java:86)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.IoTrace.logRange(IoTrace.java:304)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.IoTrace.logRanges(IoTrace.java:291)
> at 
> org.apache.hadoop.hive.ql.io.orc.encoded.EncodedReaderImpl.readEncodedColumns(EncodedReaderImpl.java:328)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.performDataRead(OrcEncodedDataReader.java:426)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:250)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader$4.run(OrcEncodedDataReader.java:247)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:247)
> at 
> org.apache.hadoop.hive.llap.io.encoded.OrcEncodedDataReader.callInternal(OrcEncodedDataReader.java:96)
> ... 6 more
> {noformat}
> When {{IncompleteCb}} is encountered, it ends up throwing this error.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17197) Flaky test: TestMiniSparkOnYarnCliDriver

2017-08-09 Thread PRASHANT GOLASH (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PRASHANT GOLASH reassigned HIVE-17197:
--

Assignee: PRASHANT GOLASH

> Flaky test: TestMiniSparkOnYarnCliDriver
> 
>
> Key: HIVE-17197
> URL: https://issues.apache.org/jira/browse/HIVE-17197
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Janaki Lahorani
>Assignee: PRASHANT GOLASH
>
> Error:
> Failed during createSources processLine with code=3



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-11548) HCatLoader should support predicate pushdown.

2017-08-09 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-11548:

Attachment: HIVE-11548.4.patch

Yet another patch for master.

> HCatLoader should support predicate pushdown.
> -
>
> Key: HIVE-11548
> URL: https://issues.apache.org/jira/browse/HIVE-11548
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-11548.1.patch, HIVE-11548.2.patch, 
> HIVE-11548.3.patch, HIVE-11548.4.patch
>
>
> When one uses {{HCatInputFormat}}/{{HCatLoader}} to read from file-formats 
> that support predicate pushdown (such as ORC, with 
> {{hive.optimize.index.filter=true}}), one sees that the predicates aren't 
> actually pushed down into the storage layer.
> The forthcoming patch should allow for filter-pushdown, if any of the 
> partitions being scanned with {{HCatLoader}} support the functionality. The 
> patch should technically allow the same for users of {{HCatInputFormat}}, but 
> I don't currently have a neat interface to build a compound 
> predicate-expression. Will add this separately, if required.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-11548) HCatLoader should support predicate pushdown.

2017-08-09 Thread Mithun Radhakrishnan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-11548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mithun Radhakrishnan updated HIVE-11548:

Affects Version/s: 3.0.0
   Status: Patch Available  (was: Open)

> HCatLoader should support predicate pushdown.
> -
>
> Key: HIVE-11548
> URL: https://issues.apache.org/jira/browse/HIVE-11548
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-11548.1.patch, HIVE-11548.2.patch, 
> HIVE-11548.3.patch, HIVE-11548.4.patch
>
>
> When one uses {{HCatInputFormat}}/{{HCatLoader}} to read from file-formats 
> that support predicate pushdown (such as ORC, with 
> {{hive.optimize.index.filter=true}}), one sees that the predicates aren't 
> actually pushed down into the storage layer.
> The forthcoming patch should allow for filter-pushdown, if any of the 
> partitions being scanned with {{HCatLoader}} support the functionality. The 
> patch should technically allow the same for users of {{HCatInputFormat}}, but 
> I don't currently have a neat interface to build a compound 
> predicate-expression. Will add this separately, if required.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-15705) Event replication for constraints

2017-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120451#comment-16120451
 ] 

Hive QA commented on HIVE-15705:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12880970/HIVE-15705.8.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 11000 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6322/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6322/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6322/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12880970 - PreCommit-HIVE-Build

> Event replication for constraints
> -
>
> Key: HIVE-15705
> URL: https://issues.apache.org/jira/browse/HIVE-15705
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-15705.1.patch, HIVE-15705.2.patch, 
> HIVE-15705.3.patch, HIVE-15705.4.patch, HIVE-15705.5.patch, 
> HIVE-15705.6.patch, HIVE-15705.7.patch, HIVE-15705.8.patch
>
>
> Make event replication for primary key and foreign key work.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-15705) Event replication for constraints

2017-08-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-15705:
--
   Resolution: Fixed
 Hadoop Flags: Reviewed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Test failures are not related. Patch pushed to master. Thanks Sankar, Thejas 
for review!

> Event replication for constraints
> -
>
> Key: HIVE-15705
> URL: https://issues.apache.org/jira/browse/HIVE-15705
> Project: Hive
>  Issue Type: Sub-task
>  Components: repl
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 3.0.0
>
> Attachments: HIVE-15705.1.patch, HIVE-15705.2.patch, 
> HIVE-15705.3.patch, HIVE-15705.4.patch, HIVE-15705.5.patch, 
> HIVE-15705.6.patch, HIVE-15705.7.patch, HIVE-15705.8.patch
>
>
> Make event replication for primary key and foreign key work.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-11548) HCatLoader should support predicate pushdown.

2017-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-11548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120555#comment-16120555
 ] 

Hive QA commented on HIVE-11548:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881059/HIVE-11548.4.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 25 failed/errored test(s), 11003 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=241)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=244)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=244)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=244)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=236)
org.apache.hadoop.hive.ql.io.orc.TestNewInputOutputFormat.testNewInputFormat 
(batchId=264)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
org.apache.hive.hcatalog.pig.TestAvroHCatLoader.testColumnarStorePushdown 
(batchId=183)
org.apache.hive.hcatalog.pig.TestAvroHCatLoader.testGetInputBytes (batchId=183)
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testColumnarStorePushdown 
(batchId=183)
org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testGetInputBytes (batchId=183)
org.apache.hive.hcatalog.pig.TestParquetHCatLoader.testColumnarStorePushdown 
(batchId=183)
org.apache.hive.hcatalog.pig.TestParquetHCatLoader.testGetInputBytes 
(batchId=183)
org.apache.hive.hcatalog.pig.TestRCFileHCatLoader.testColumnarStorePushdown 
(batchId=183)
org.apache.hive.hcatalog.pig.TestRCFileHCatLoader.testGetInputBytes 
(batchId=183)
org.apache.hive.hcatalog.pig.TestSequenceFileHCatLoader.testColumnarStorePushdown
 (batchId=183)
org.apache.hive.hcatalog.pig.TestSequenceFileHCatLoader.testGetInputBytes 
(batchId=183)
org.apache.hive.hcatalog.pig.TestTextFileHCatLoader.testColumnarStorePushdown 
(batchId=183)
org.apache.hive.hcatalog.pig.TestTextFileHCatLoader.testGetInputBytes 
(batchId=183)
org.apache.hive.jdbc.TestJdbcWithMiniHS2.testHttpRetryOnServerIdleTimeout 
(batchId=229)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6323/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6323/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6323/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 25 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881059 - PreCommit-HIVE-Build

> HCatLoader should support predicate pushdown.
> -
>
> Key: HIVE-11548
> URL: https://issues.apache.org/jira/browse/HIVE-11548
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog
>Affects Versions: 3.0.0
>Reporter: Mithun Radhakrishnan
>Assignee: Mithun Radhakrishnan
> Attachments: HIVE-11548.1.patch, HIVE-11548.2.patch, 
> HIVE-11548.3.patch, HIVE-11548.4.patch
>
>
> When one uses {{HCatInputFormat}}/{{HCatLoader}} to read from file-formats 
> that support predicate pushdown (such as ORC, with 
> {{hive.optimize.index.filter=true}}), one sees that the predicates aren't 
> actually pushed down into the storage layer.
> The forthcoming patch should allow for filter-pushdown, if any of the 
> partitions being scanned with {{HCatLoader}} support the functionality. The 
> patch should technically allow the same for users of {{HCatInputFormat}}, but 
> I don't currently have a neat interface to build a compound 
> predicate-expression. Will add this separately, if required.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-6431) Hive table name start with underscore (_)

2017-08-09 Thread Christopher (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-6431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120628#comment-16120628
 ] 

Christopher commented on HIVE-6431:
---

Why on earth are filenames starting with an underscore prohibited??

> Hive table name start with underscore (_)
> -
>
> Key: HIVE-6431
> URL: https://issues.apache.org/jira/browse/HIVE-6431
> Project: Hive
>  Issue Type: Bug
>  Components: Database/Schema
>Affects Versions: 0.12.0
>Reporter: Kuek Chiew Yea
>
> When I create a Hive table with a table name starting with an underscore, I am 
> able to create it successfully. The command that I am using is the following:
> *CREATE TABLE `_testtable` AS SELECT * FROM `dimdate`;*
> Once created, I issue a command (as follows) to query all records from that 
> table.
> *SELECT * FROM `_testtable`;*
> However, I get the following error:
> *Failed with exception 
> java.io.IOException:org.apache.hadoop.mapred.InvalidInputException: Input 
> path does not exist: hdfs://sandbox:8020/apps/hive/warehouse/_testtable*
> When I run the hdfs command to list the files in directory 
> */apps/hive/warehouse/_testtable*, I am able to get the list of files in that 
> directory.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-17126) Hive Metastore is incompatible with MariaDB 10.x

2017-08-09 Thread Eric Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eric Yang resolved HIVE-17126.
--
Resolution: Not A Problem

I am inclined to close this issue as user error: a mysql-connector-java driver 
version mismatched with the MariaDB version.

> Hive Metastore is incompatible with MariaDB 10.x
> 
>
> Key: HIVE-17126
> URL: https://issues.apache.org/jira/browse/HIVE-17126
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.0, 1.1.0, 2.0.0
>Reporter: Eric Yang
>
> MariaDB 10.x is commonly used for cheap RDBMS high availability. Hive's usage 
> of DataNucleus currently prevents the Hive Metastore from using MariaDB 10.x as 
> a highly available metastore: DataNucleus generates SQL statements that are not 
> parsable by MariaDB 10.x when dropping a Hive table or database schema. 
> Even without a MariaDB HA setup, the SQL statement problem exists for metastore 
> interaction with MariaDB 10.x.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17138) FileSinkOperator doesn't create empty files for acid path

2017-08-09 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120678#comment-16120678
 ] 

Eugene Koifman commented on HIVE-17138:
---

OrcRecordUpdater also has some inconsistent logic as to when it creates an 
empty file: for "legacy", always; for "default", never.

We should add a switch, just like FileSinkOperator, that checks the engine type 
(and some other properties); a sketch follows.
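A minimal sketch of such a switch, with hypothetical names and an assumed set of inputs; the real decision would live in OrcRecordUpdater's setup path:

{code}
// Illustrative sketch only: a single switch deciding whether to materialize
// an empty bucket file, instead of the current "legacy: always / default:
// never" split. Inputs are assumed, not OrcRecordUpdater's real parameters.
public class EmptyBucketPolicy {
  static boolean shouldCreateEmptyBucket(String executionEngine,
      boolean bucketedTable) {
    // Bucketed inserts on engines that expect a fixed file count still get
    // their (possibly empty) bucketN files; everything else skips them.
    return bucketedTable
        && ("tez".equalsIgnoreCase(executionEngine)
            || "mr".equalsIgnoreCase(executionEngine));
  }
}
{code}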

> FileSinkOperator doesn't create empty files for acid path
> -
>
> Key: HIVE-17138
> URL: https://issues.apache.org/jira/browse/HIVE-17138
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>
> For bucketed tables, FileSinkOperator is expected (in some cases)  to produce 
> a specific number of files even if they are empty.
> FileSinkOperator.closeOp(boolean abort) has logic to create files even if 
> empty.
> This doesn't properly work for the Acid path. For Insert, the 
> OrcRecordUpdater(s) is set up in createBucketForFileIdx(), which creates the 
> actual bucketN file (as of HIVE-14007, it does so regardless of whether the 
> RecordUpdater sees any rows). This causes empty (i.e. ORC metadata only) 
> bucket files to be created for multiFileSpray=true if a particular 
> FileSinkOperator.process() sees at least 1 row. For example,
> {noformat}
> create table fourbuckets (a int, b int) clustered by (a) into 4 buckets 
> stored as orc TBLPROPERTIES ('transactional'='true');
> insert into fourbuckets values(0,1),(1,1);
> with mapreduce.job.reduces = 1 or 2 
> {noformat}
> For the Update/Delete path, the OrcRecordWriter is created lazily when the 1st 
> row that needs to land there is seen. Thus it never creates empty buckets, no 
> matter what the value of _skipFiles_ is in closeOp(boolean).
> Once Split Update does the split early (in the operator pipeline), only the 
> Insert path will matter, since base and delta are the only files that split 
> computation, etc. looks at. delete_delta is only for Acid internals, so there 
> is never any reason to create empty files there.
> Also make sure to close RecordUpdaters in FileSinkOperator.abortWriters()



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16811) Estimate statistics in absence of stats

2017-08-09 Thread Vineet Garg (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120723#comment-16120723
 ] 

Vineet Garg commented on HIVE-16811:


Failures are unrelated

> Estimate statistics in absence of stats
> ---
>
> Key: HIVE-16811
> URL: https://issues.apache.org/jira/browse/HIVE-16811
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vineet Garg
>Assignee: Vineet Garg
> Attachments: HIVE-16811.1.patch, HIVE-16811.2.patch, 
> HIVE-16811.3.patch, HIVE-16811.4.patch, HIVE-16811.5.patch, 
> HIVE-16811.6.patch, HIVE-16811.7.patch
>
>
> Currently, join ordering completely bails out in the absence of statistics, 
> and this can lead to bad joins such as cross joins.
> E.g., the following select query will produce a cross join.
> {code:sql}
> create table supplier (S_SUPPKEY INT, S_NAME STRING, S_ADDRESS STRING, 
> S_NATIONKEY INT, 
> S_PHONE STRING, S_ACCTBAL DOUBLE, S_COMMENT STRING);
> CREATE TABLE lineitem (L_ORDERKEY  INT,
> L_PARTKEY   INT,
> L_SUPPKEY   INT,
> L_LINENUMBERINT,
> L_QUANTITY  DOUBLE,
> L_EXTENDEDPRICE DOUBLE,
> L_DISCOUNT  DOUBLE,
> L_TAX   DOUBLE,
> L_RETURNFLAGSTRING,
> L_LINESTATUSSTRING,
> l_shipdate  STRING,
> L_COMMITDATESTRING,
> L_RECEIPTDATE   STRING,
> L_SHIPINSTRUCT  STRING,
> L_SHIPMODE  STRING,
> L_COMMENT   STRING) partitioned by (dl 
> int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '|';
> CREATE TABLE part(
> p_partkey INT,
> p_name STRING,
> p_mfgr STRING,
> p_brand STRING,
> p_type STRING,
> p_size INT,
> p_container STRING,
> p_retailprice DOUBLE,
> p_comment STRING
> );
> explain select count(1) from part,supplier,lineitem where p_partkey = 
> l_partkey and s_suppkey = l_suppkey;
> {code}
> Estimating stats will prevent the join ordering algorithm from bailing out, 
> and will come up with a join order at least better than a cross join.
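For illustration, a minimal hedged sketch of one way "estimating stats" can work: derive a row count from raw data size when the metastore has no statistics. The names and fallback constants are assumptions, not the patch's actual logic.

{code}
// Illustrative only: estimate a row count from on-disk size when the
// metastore has no statistics, so the join-ordering cost model has
// something better than "unknown" to work with.
public class BasicStatsEstimator {
  static long estimateRowCount(long rawDataSizeBytes, long avgRowSizeBytes) {
    if (rawDataSizeBytes <= 0 || avgRowSizeBytes <= 0) {
      return 1; // avoid zero-row estimates that collapse the cost model
    }
    return Math.max(1L, rawDataSizeBytes / avgRowSizeBytes);
  }
}
{code}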



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17281) LLAP external client not properly handling KILLED notification that occurs when a fragment is rejected

2017-08-09 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-17281:
--
Attachment: HIVE-17281.1.patch

> LLAP external client not properly handling KILLED notification that occurs 
> when a fragment is rejected
> --
>
> Key: HIVE-17281
> URL: https://issues.apache.org/jira/browse/HIVE-17281
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17281.1.patch
>
>
> When LLAP fragment submission is rejected, the external client receives both 
> REJECTED and KILLED notifications for the fragment. The KILLED notification 
> is being treated as an error, which prevents the retry logic from 
> resubmitting the fragment. This needs to be fixed in the client logic.
> {noformat}
> 17/08/02 04:36:16 INFO LlapBaseInputFormat: Registered id: 
> attempt_2519876382789748565_0005_0_00_21_0
> 17/08/02 04:36:16 INFO LlapTaskUmbilicalExternalClient: Fragment: 
> attempt_2519876382789748565_0005_0_00_21_0 rejected. Server Busy.
> 17/08/02 04:36:16 ERROR LlapTaskUmbilicalExternalClient: Task killed - 
> attempt_2519876382789748565_0005_0_00_21_0
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17281) LLAP external client not properly handling KILLED notification that occurs when a fragment is rejected

2017-08-09 Thread Jason Dere (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dere updated HIVE-17281:
--
Status: Patch Available  (was: Open)

[~rajesh.balamohan] [~sershe] Can you review?

> LLAP external client not properly handling KILLED notification that occurs 
> when a fragment is rejected
> --
>
> Key: HIVE-17281
> URL: https://issues.apache.org/jira/browse/HIVE-17281
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17281.1.patch
>
>
> When LLAP fragment submission is rejected, the external client receives both 
> REJECTED and KILLED notifications for the fragment. The KILLED notification 
> is being treated as an error, which prevents the retry logic from 
> resubmitting the fragment. This needs to be fixed in the client logic.
> {noformat}
> 17/08/02 04:36:16 INFO LlapBaseInputFormat: Registered id: 
> attempt_2519876382789748565_0005_0_00_21_0
> 17/08/02 04:36:16 INFO LlapTaskUmbilicalExternalClient: Fragment: 
> attempt_2519876382789748565_0005_0_00_21_0 rejected. Server Busy.
> 17/08/02 04:36:16 ERROR LlapTaskUmbilicalExternalClient: Task killed - 
> attempt_2519876382789748565_0005_0_00_21_0
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17281) LLAP external client not properly handling KILLED notification that occurs when a fragment is rejected

2017-08-09 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120769#comment-16120769
 ] 

Sergey Shelukhin commented on HIVE-17281:
-

+1 on the logic. I'm assuming no extra synchronization is necessary w.r.t. 
state checks and changes.

> LLAP external client not properly handling KILLED notification that occurs 
> when a fragment is rejected
> --
>
> Key: HIVE-17281
> URL: https://issues.apache.org/jira/browse/HIVE-17281
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17281.1.patch
>
>
> When LLAP fragment submission is rejected, the external client receives both 
> REJECTED and KILLED notifications for the fragment. The KILLED notification 
> is being treated as an error, which prevents the retry logic from 
> resubmitting the fragment. This needs to be fixed in the client logic.
> {noformat}
> 17/08/02 04:36:16 INFO LlapBaseInputFormat: Registered id: 
> attempt_2519876382789748565_0005_0_00_21_0
> 17/08/02 04:36:16 INFO LlapTaskUmbilicalExternalClient: Fragment: 
> attempt_2519876382789748565_0005_0_00_21_0 rejected. Server Busy.
> 17/08/02 04:36:16 ERROR LlapTaskUmbilicalExternalClient: Task killed - 
> attempt_2519876382789748565_0005_0_00_21_0
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17283) Enable parallel edges of semijoin along with mapjoins

2017-08-09 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal reassigned HIVE-17283:
-


> Enable parallel edges of semijoin along with mapjoins
> -
>
> Key: HIVE-17283
> URL: https://issues.apache.org/jira/browse/HIVE-17283
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> https://issues.apache.org/jira/browse/HIVE-16260 removes parallel edges of 
> semijoin with mapjoin. However, in some cases it may be beneficial to have it.
> We need a config which can enable it.
> The default should be false which maintains the existing behavior.
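As a hedged illustration only, this is the ConfVars-style shape such a knob usually takes in Hive; the property name below is made up, and the real name comes from the patch:

{code}
// Sketch of the ConfVars-style shape of a boolean knob; the property name
// is hypothetical.
public enum ExampleConfVars {
  SEMIJOIN_PARALLEL_MAPJOIN("hive.example.semijoin.parallel.mapjoin.enabled",
      false,
      "Whether to keep a semijoin reduction edge that runs in parallel with "
          + "a mapjoin edge. Defaults to false, preserving existing behavior.");

  final String varname;
  final boolean defaultVal;
  final String description;

  ExampleConfVars(String varname, boolean defaultVal, String description) {
    this.varname = varname;
    this.defaultVal = defaultVal;
    this.description = description;
  }
}
{code}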



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17224) Move JDO classes to standalone metastore

2017-08-09 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120787#comment-16120787
 ] 

Vihang Karajgaonkar commented on HIVE-17224:


Failures are unrelated. LGTM +1

> Move JDO classes to standalone metastore
> 
>
> Key: HIVE-17224
> URL: https://issues.apache.org/jira/browse/HIVE-17224
> Project: Hive
>  Issue Type: Sub-task
>  Components: Metastore
>Reporter: Alan Gates
>Assignee: Alan Gates
> Attachments: HIVE-17224.patch
>
>
> The JDO model classes (MDatabase, MTable, etc.) and the package.jdo file that 
> defines the DB mapping need to be moved to the standalone metastore.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (HIVE-17283) Enable parallel edges of semijoin along with mapjoins

2017-08-09 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-17283 started by Deepak Jaiswal.
-
> Enable parallel edges of semijoin along with mapjoins
> -
>
> Key: HIVE-17283
> URL: https://issues.apache.org/jira/browse/HIVE-17283
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> https://issues.apache.org/jira/browse/HIVE-16260 removes parallel edges of 
> semijoin with mapjoin. However, in some cases it may be beneficial to have it.
> We need a config which can enable it.
> The default should be false which maintains the existing behavior.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17089) make acid 2.0 the default

2017-08-09 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17089:
--
Attachment: HIVE-17089.07.patch

> make acid 2.0 the default
> -
>
> Key: HIVE-17089
> URL: https://issues.apache.org/jira/browse/HIVE-17089
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-17089.01.patch, HIVE-17089.03.patch, 
> HIVE-17089.05.patch, HIVE-17089.06.patch, HIVE-17089.07.patch
>
>
> Acid 2.0 was introduced in HIVE-14035.  It replaces Update events with a 
> combination of Delete + Insert events.  This now makes U=D+I the default (and 
> only) supported acid table type in Hive 3.0.  
> The expectation for the upgrade is that major compaction has to be run on all 
> acid tables in the existing Hive cluster and that no new writes to these 
> tables take place after the start of compaction (we need to add a mechanism to 
> put a table in read-only mode - this way it can still be read while it's 
> being compacted).  Then the upgrade to Hive 3.0 can take place.
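As a sketch of that pre-upgrade step (table and partition names are illustrative):

{code:sql}
-- Run major compaction on every acid table before the upgrade:
ALTER TABLE acid_orders COMPACT 'major';
ALTER TABLE acid_events PARTITION (ds='2017-08-09') COMPACT 'major';
-- Confirm each compaction reached the succeeded state before upgrading:
SHOW COMPACTIONS;
{code}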



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17283) Enable parallel edges of semijoin along with mapjoins

2017-08-09 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-17283:
--
Status: Patch Available  (was: In Progress)

> Enable parallel edges of semijoin along with mapjoins
> -
>
> Key: HIVE-17283
> URL: https://issues.apache.org/jira/browse/HIVE-17283
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> https://issues.apache.org/jira/browse/HIVE-16260 removes parallel edges of 
> semijoin with mapjoin. However, in some cases it may be beneficial to have them.
> We need a config that can enable them.
> The default should be false, which maintains the existing behavior.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-17283) Enable parallel edges of semijoin along with mapjoins

2017-08-09 Thread Deepak Jaiswal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120811#comment-16120811
 ] 

Deepak Jaiswal edited comment on HIVE-17283 at 8/9/17 11:13 PM:


Patch with config to enable parallel semijoin edges and a test.

[~gopalv] [~jdere] Can you please review?


was (Author: djaiswal):
Patch with config to enable parallel semijoin edges and a test.

> Enable parallel edges of semijoin along with mapjoins
> -
>
> Key: HIVE-17283
> URL: https://issues.apache.org/jira/browse/HIVE-17283
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-17283.1.patch
>
>
> https://issues.apache.org/jira/browse/HIVE-16260 removes parallel edges of 
> semijoin with mapjoin. However, in some cases it may be beneficial to have them.
> We need a config that can enable them.
> The default should be false, which maintains the existing behavior.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17283) Enable parallel edges of semijoin along with mapjoins

2017-08-09 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-17283:
--
Attachment: HIVE-17283.1.patch

Patch with config to enable parallel semijoin edges and a test.

> Enable parallel edges of semijoin along with mapjoins
> -
>
> Key: HIVE-17283
> URL: https://issues.apache.org/jira/browse/HIVE-17283
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-17283.1.patch
>
>
> https://issues.apache.org/jira/browse/HIVE-16260 removes parallel edges of 
> semijoin with mapjoin. However, in some cases it may be beneficial to have them.
> We need a config that can enable them.
> The default should be false, which maintains the existing behavior.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17275) Auto-merge fails on writes of UNION ALL output to ORC file with dynamic partitioning

2017-08-09 Thread Chris Drome (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Drome updated HIVE-17275:
---
Attachment: HIVE-17275.2.patch
HIVE-17275.2-branch-2.patch
HIVE-17275.2-branch-2.2.patch

Moved qfile test to minillap.
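For anyone hitting the failure before a fix lands, the merge step itself can be disabled as a stopgap (a workaround, not what this patch does):

{code:sql}
-- Workaround only: skips the failing File Merge vertex, at the cost of more small files.
SET hive.merge.tezfiles=false;
{code}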

> Auto-merge fails on writes of UNION ALL output to ORC file with dynamic 
> partitioning
> 
>
> Key: HIVE-17275
> URL: https://issues.apache.org/jira/browse/HIVE-17275
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: Chris Drome
>Assignee: Chris Drome
> Attachments: HIVE-17275.2-branch-2.2.patch, 
> HIVE-17275.2-branch-2.patch, HIVE-17275.2.patch, HIVE-17275-branch-2.2.patch, 
> HIVE-17275-branch-2.patch, HIVE-17275.patch
>
>
> If dynamic partitioning is used to write the output of UNION or UNION ALL 
> queries into ORC files with hive.merge.tezfiles=true, the merge step fails as 
> follows:
> {noformat}
> 2017-08-08T11:27:19,958 ERROR [e7b1f06d-d632-408a-9dff-f7ae042cd25a main] 
> SessionState: Vertex failed, vertexName=File Merge, 
> vertexId=vertex_1502216690354_0001_33_00, diagnostics=[Task failed, 
> taskId=task_1502216690354_0001_33_00_00, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Error while running task ( failure ) : 
> attempt_1502216690354_0001_33_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: Multiple partitions for one merge mapper: 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/1
>  NOT EQUAL TO 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/2
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: java.io.IOException: 
> Multiple partitions for one merge mapper: 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/1
>  NOT EQUAL TO 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/2
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:225)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.run(MergeFileRecordProcessor.java:154)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:185)
>   ... 14 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: Multiple partitions for one merge mapper: 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/1
>  NOT EQUAL TO 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/2
>   at 
> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.processKeyValuePairs(OrcFileMergeOperator.java:169)
>   at 
> org.apache.hadoop.hive.ql.exec.OrcFileMergeOperator.process(OrcFileMergeOperator.java:72)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileRecordProcessor.processRow(MergeFileRecordProcessor.java:216)
>   ... 16 more
> Ca

[jira] [Commented] (HIVE-17281) LLAP external client not properly handling KILLED notification that occurs when a fragment is rejected

2017-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120852#comment-16120852
 ] 

Hive QA commented on HIVE-17281:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881098/HIVE-17281.1.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 10973 tests 
executed
*Failed tests:*
{noformat}
TestCliDriver - did not produce a TEST-*.xml file (likely timed out) (batchId=6)

[ptf_general_queries.q,correlationoptimizer9.q,cross_join_merge.q,sample2.q,parquet_decimal.q,join1.q,decimal_join.q,bucket_if_with_path_filter.q,join32_lessSize.q,combine2.q,escape3.q,windowing_range_multiorder.q,authorization_admin_almighty1.q,cte_mat_4.q,udf_weekofyear.q,masking_disablecbo_4.q,char_pad_convert.q,groupby9.q,udaf_covar_samp.q,column_table_stats_orc.q,parquet_columnar.q,skewjoinopt18.q,colstats_all_nulls.q,groupby_duplicate_key.q,pointlookup3.q,orc_remove_cols.q,udf_classloader.q,subq2.q,ctas.q,setop_subq.q]
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6324/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6324/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6324/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881098 - PreCommit-HIVE-Build

> LLAP external client not properly handling KILLED notification that occurs 
> when a fragment is rejected
> --
>
> Key: HIVE-17281
> URL: https://issues.apache.org/jira/browse/HIVE-17281
> Project: Hive
>  Issue Type: Bug
>  Components: llap
>Reporter: Jason Dere
>Assignee: Jason Dere
> Attachments: HIVE-17281.1.patch
>
>
> When LLAP fragment submission is rejected, the external client receives both 
> REJECTED and KILLED notifications for the fragment. The KILLED notification 
> is being treated as an error, which prevents the retry logic from 
> resubmitting the fragment. This needs to be fixed in the client logic.
> {noformat}
> 17/08/02 04:36:16 INFO LlapBaseInputFormat: Registered id: 
> attempt_2519876382789748565_0005_0_00_21_0
> 17/08/02 04:36:16 INFO LlapTaskUmbilicalExternalClient: Fragment: 
> attempt_2519876382789748565_0005_0_00_21_0 rejected. Server Busy.
> 17/08/02 04:36:16 ERROR LlapTaskUmbilicalExternalClient: Task killed - 
> attempt_2519876382789748565_0005_0_00_21_0
> {noformat}
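A minimal sketch of the intended client-side handling, assuming the client tracks which fragment ids were rejected (all names here are hypothetical, not the actual patch):

{code:java}
// Hypothetical sketch: a KILLED that follows a REJECTED is expected and must not
// fail the fragment, so the retry logic can still resubmit it.
void taskKilled(String fragmentId) {
  if (rejectedFragments.remove(fragmentId)) {
    LOG.info("Ignoring KILLED for rejected fragment {}; retry will resubmit", fragmentId);
    return;
  }
  failFragment(fragmentId);  // a genuine kill is still an error for the client
}
{code}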



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17089) make acid 2.0 the default

2017-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120900#comment-16120900
 ] 

Hive QA commented on HIVE-17089:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881100/HIVE-17089.07.patch

{color:green}SUCCESS:{color} +1 due to 12 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 10963 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[row__id] (batchId=74)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[dynamic_semijoin_reduction_3]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6325/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6325/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6325/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881100 - PreCommit-HIVE-Build

> make acid 2.0 the default
> -
>
> Key: HIVE-17089
> URL: https://issues.apache.org/jira/browse/HIVE-17089
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-17089.01.patch, HIVE-17089.03.patch, 
> HIVE-17089.05.patch, HIVE-17089.06.patch, HIVE-17089.07.patch
>
>
> Acid 2.0 was introduced in HIVE-14035.  It replaces Update events with a 
> combination of Delete + Insert events.  This now makes U=D+I the default (and 
> only) supported acid table type in Hive 3.0.  
> The expectation for the upgrade is that major compaction has to be run on all 
> acid tables in the existing Hive cluster and that no new writes to these 
> tables take place after the start of compaction (we need to add a mechanism to 
> put a table in read-only mode - this way it can still be read while it's 
> being compacted).  Then the upgrade to Hive 3.0 can take place.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17089) make acid 2.0 the default

2017-08-09 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120909#comment-16120909
 ] 

Eugene Koifman commented on HIVE-17089:
---

TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1] and 
TestPerfCliDriver.testCliDriver[query14] failures appear intermittently in 
other runs.  Will check the others.

> make acid 2.0 the default
> -
>
> Key: HIVE-17089
> URL: https://issues.apache.org/jira/browse/HIVE-17089
> Project: Hive
>  Issue Type: New Feature
>  Components: Transactions
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-17089.01.patch, HIVE-17089.03.patch, 
> HIVE-17089.05.patch, HIVE-17089.06.patch, HIVE-17089.07.patch
>
>
> Acid 2.0 was introduced in HIVE-14035.  It replaces Update events with a 
> combination of Delete + Insert events.  This now makes U=D+I the default (and 
> only) supported acid table type in Hive 3.0.  
> The expectation for the upgrade is that major compaction has to be run on all 
> acid tables in the existing Hive cluster and that no new writes to these 
> tables take place after the start of compaction (we need to add a mechanism to 
> put a table in read-only mode - this way it can still be read while it's 
> being compacted).  Then the upgrade to Hive 3.0 can take place.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17283) Enable parallel edges of semijoin along with mapjoins

2017-08-09 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120910#comment-16120910
 ] 

Gopal V commented on HIVE-17283:


Left 2 doc comments on the RB - LGTM - +1 tests pending.

> Enable parallel edges of semijoin along with mapjoins
> -
>
> Key: HIVE-17283
> URL: https://issues.apache.org/jira/browse/HIVE-17283
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-17283.1.patch
>
>
> https://issues.apache.org/jira/browse/HIVE-16260 removes parallel edges of 
> semijoin with mapjoin. However, in some cases it may be beneficial to have them.
> We need a config that can enable them.
> The default should be false, which maintains the existing behavior.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17284) remove OrcRecordUpdater.deleteEventIndexBuilder

2017-08-09 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-17284:
-


> remove OrcRecordUpdater.deleteEventIndexBuilder
> ---
>
> Key: HIVE-17284
> URL: https://issues.apache.org/jira/browse/HIVE-17284
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Minor
>
> There is no point in it.  We know how many rows a delete_delta file has from 
> ORC and they are all the same type - so no need for AcidStats.
> hive.acid.key.index has no value since delete_delta files are never split and 
> are not likely to have more than 1 stripe since they are very small.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17284) remove OrcRecordUpdater.deleteEventIndexBuilder

2017-08-09 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-17284:
--
Description: 
There is no point in it.  We know how many rows a delete_delta file has from 
ORC and they are all the same type - so no need for AcidStats.
hive.acid.key.index has no value since delete_delta files are never split and 
are not likely to have more than 1 stripe since they are very small.

Also can remove KeyIndexBuilder.acidStats - we only have 1 type of event per 
file

  was:
There is no point in it.  We know how many rows a delete_delta file has from 
ORC and they are all the same type - so no need for AcidStats.
hive.acid.key.index has no value since delete_delta files are never split and 
are not likely to have more than 1 stripe since they are very small.


> remove OrcRecordUpdater.deleteEventIndexBuilder
> ---
>
> Key: HIVE-17284
> URL: https://issues.apache.org/jira/browse/HIVE-17284
> Project: Hive
>  Issue Type: Improvement
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Minor
>
> There is no point in it.  We know how many rows a delete_delta file has from 
> ORC and they are all the same type - so no need for AcidStats.
> hive.acid.key.index has no value since delete_delta files are never split and 
> are not likely to have more than 1 stripe since they are very small.
> Also can remove KeyIndexBuilder.acidStats - we only have 1 type of event per 
> file



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17285) Fixes for bit vector retrievals and merging

2017-08-09 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-17285:

Attachment: HIVE-17285.patch

> Fixes for bit vector retrievals and merging
> ---
>
> Key: HIVE-17285
> URL: https://issues.apache.org/jira/browse/HIVE-17285
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashutosh Chauhan
> Attachments: HIVE-17285.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17285) Fixes for bit vector retrievals and merging

2017-08-09 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-17285:

Status: Patch Available  (was: Open)

> Fixes for bit vector retrievals and merging
> ---
>
> Key: HIVE-17285
> URL: https://issues.apache.org/jira/browse/HIVE-17285
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashutosh Chauhan
> Attachments: HIVE-17285.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17286) Avoid expensive String serialization/deserialization for bitvectors

2017-08-09 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez reassigned HIVE-17286:
--


> Avoid expensive String serialization/deserialization for bitvectors
> ---
>
> Key: HIVE-17286
> URL: https://issues.apache.org/jira/browse/HIVE-17286
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (HIVE-17286) Avoid expensive String serialization/deserialization for bitvectors

2017-08-09 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-17286 started by Jesus Camacho Rodriguez.
--
> Avoid expensive String serialization/deserialization for bitvectors
> ---
>
> Key: HIVE-17286
> URL: https://issues.apache.org/jira/browse/HIVE-17286
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17286.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17286) Avoid expensive String serialization/deserialization for bitvectors

2017-08-09 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17286:
---
Attachment: HIVE-17286.patch

> Avoid expensive String serialization/deserialization for bitvectors
> ---
>
> Key: HIVE-17286
> URL: https://issues.apache.org/jira/browse/HIVE-17286
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17286.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17286) Avoid expensive String serialization/deserialization for bitvectors

2017-08-09 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-17286:
---
Status: Patch Available  (was: In Progress)

> Avoid expensive String serialization/deserialization for bitvectors
> ---
>
> Key: HIVE-17286
> URL: https://issues.apache.org/jira/browse/HIVE-17286
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
> Attachments: HIVE-17286.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17283) Enable parallel edges of semijoin along with mapjoins

2017-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120939#comment-16120939
 ] 

Hive QA commented on HIVE-17283:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881101/HIVE-17283.1.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 11000 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[create_merge_compressed]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[insert_overwrite_local_directory_1]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[unionDistinct_1] 
(batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[columnstats_part_coltype]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6326/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6326/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6326/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 14 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881101 - PreCommit-HIVE-Build

> Enable parallel edges of semijoin along with mapjoins
> -
>
> Key: HIVE-17283
> URL: https://issues.apache.org/jira/browse/HIVE-17283
> Project: Hive
>  Issue Type: Bug
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-17283.1.patch
>
>
> https://issues.apache.org/jira/browse/HIVE-16260 removes parallel edges of 
> semijoin with mapjoin. However, in some cases it may be beneficial to have them.
> We need a config that can enable them.
> The default should be false, which maintains the existing behavior.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-16945) Add method to compare Operators

2017-08-09 Thread Rui Li (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-16945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rui Li updated HIVE-16945:
--
Attachment: HIVE-16945.3.patch

[~jcamachorodriguez], nice catch. Updated to address the comment.

> Add method to compare Operators 
> 
>
> Key: HIVE-16945
> URL: https://issues.apache.org/jira/browse/HIVE-16945
> Project: Hive
>  Issue Type: Improvement
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Rui Li
> Attachments: HIVE-16945.1.patch, HIVE-16945.2.patch, 
> HIVE-16945.3.patch
>
>
> HIVE-10844 introduced a comparator factory class for operators that 
> encapsulates all the logic to assess whether two operators are equal:
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java
> The current design might create problems, as any change in the fields of an 
> operator will break the comparators. It would be better to do this via inheritance 
> from the Operator base class, by adding a {{logicalEquals(Operator other)}} 
> method.
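A rough sketch of the proposed shape (only the method name comes from this issue; the body and the per-descriptor hook are assumptions for illustration):

{code:java}
// In the Operator base class: each subclass's descriptor compares its own
// fields, so a newly added field is handled where it is declared.
public boolean logicalEquals(Operator other) {
  return other != null
      && getClass() == other.getClass()
      && getConf().isSame(other.getConf());  // hypothetical per-descriptor comparison hook
}
{code}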



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16945) Add method to compare Operators

2017-08-09 Thread Jesus Camacho Rodriguez (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120956#comment-16120956
 ] 

Jesus Camacho Rodriguez commented on HIVE-16945:


+1

> Add method to compare Operators 
> 
>
> Key: HIVE-16945
> URL: https://issues.apache.org/jira/browse/HIVE-16945
> Project: Hive
>  Issue Type: Improvement
>  Components: Operators
>Affects Versions: 3.0.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Rui Li
> Attachments: HIVE-16945.1.patch, HIVE-16945.2.patch, 
> HIVE-16945.3.patch
>
>
> HIVE-10844 introduced a comparator factory class for operators that 
> encapsulates all the logic to assess whether two operators are equal:
> https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/OperatorComparatorFactory.java
> The current design might create problems, as any change in the fields of an 
> operator will break the comparators. It would be better to do this via inheritance 
> from the Operator base class, by adding a {{logicalEquals(Operator other)}} 
> method.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17240) Function ACOS(2) and ASIN(2) should be null

2017-08-09 Thread Yuming Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated HIVE-17240:
---
Summary: Function ACOS(2) and ASIN(2) should be null  (was: function 
ACOS(2) and ASIN(2) should be null)

> Function ACOS(2) and ASIN(2) should be null
> ---
>
> Key: HIVE-17240
> URL: https://issues.apache.org/jira/browse/HIVE-17240
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.1.1, 1.2.2, 2.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
> Attachments: HIVE-17240.1.patch, HIVE-17240.2.patch, 
> HIVE-17240.3.patch, HIVE-17240.4.patch, HIVE-17240.5.patch
>
>
> {{acos(2)}} should be null, same as MySQL:
> {code:sql}
> hive> desc function extended acos;
> OK
> acos(x) - returns the arc cosine of x if -1<=x<=1 or NULL otherwise
> Example:
>   > SELECT acos(1) FROM src LIMIT 1;
>   0
>   > SELECT acos(2) FROM src LIMIT 1;
>   NULL
> Time taken: 0.009 seconds, Fetched: 6 row(s)
> hive> select acos(2);
> OK
> NaN
> Time taken: 0.437 seconds, Fetched: 1 row(s)
> {code}
> {code:sql}
> mysql>  select acos(2);
> +-+
> | acos(2) |
> +-+
> |NULL |
> +-+
> 1 row in set (0.00 sec)
> {code}
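The fix amounts to a domain check in the UDF; a minimal sketch of the intended behavior (not the actual patch):

{code:java}
// Sketch only: return NULL instead of NaN for out-of-range input, matching MySQL.
public DoubleWritable evaluate(DoubleWritable x) {
  if (x == null || x.get() < -1.0 || x.get() > 1.0) {
    return null;                    // acos(2) -> NULL
  }
  result.set(Math.acos(x.get()));   // Math.acos is well-defined for -1 <= x <= 1
  return result;
}
{code}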



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17240) Function ACOS(2) and ASIN(2) should be null

2017-08-09 Thread Yuming Wang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated HIVE-17240:
---
Attachment: HIVE-17240.6.patch

> Function ACOS(2) and ASIN(2) should be null
> ---
>
> Key: HIVE-17240
> URL: https://issues.apache.org/jira/browse/HIVE-17240
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Affects Versions: 1.1.1, 1.2.2, 2.2.0
>Reporter: Yuming Wang
>Assignee: Yuming Wang
> Attachments: HIVE-17240.1.patch, HIVE-17240.2.patch, 
> HIVE-17240.3.patch, HIVE-17240.4.patch, HIVE-17240.5.patch, HIVE-17240.6.patch
>
>
> {{acos(2)}} should be null, same as MySQL:
> {code:sql}
> hive> desc function extended acos;
> OK
> acos(x) - returns the arc cosine of x if -1<=x<=1 or NULL otherwise
> Example:
>   > SELECT acos(1) FROM src LIMIT 1;
>   0
>   > SELECT acos(2) FROM src LIMIT 1;
>   NULL
> Time taken: 0.009 seconds, Fetched: 6 row(s)
> hive> select acos(2);
> OK
> NaN
> Time taken: 0.437 seconds, Fetched: 1 row(s)
> {code}
> {code:sql}
> mysql>  select acos(2);
> +-+
> | acos(2) |
> +-+
> |NULL |
> +-+
> 1 row in set (0.00 sec)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17275) Auto-merge fails on writes of UNION ALL output to ORC file with dynamic partitioning

2017-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120978#comment-16120978
 ] 

Hive QA commented on HIVE-17275:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881108/HIVE-17275.2.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 11001 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_queries]
 (batchId=231)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[union_fast_stats]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_2] 
(batchId=100)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=235)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6327/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6327/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6327/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 12 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881108 - PreCommit-HIVE-Build

> Auto-merge fails on writes of UNION ALL output to ORC file with dynamic 
> partitioning
> 
>
> Key: HIVE-17275
> URL: https://issues.apache.org/jira/browse/HIVE-17275
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 2.2.0
>Reporter: Chris Drome
>Assignee: Chris Drome
> Attachments: HIVE-17275.2-branch-2.2.patch, 
> HIVE-17275.2-branch-2.patch, HIVE-17275.2.patch, HIVE-17275-branch-2.2.patch, 
> HIVE-17275-branch-2.patch, HIVE-17275.patch
>
>
> If dynamic partitioning is used to write the output of UNION or UNION ALL 
> queries into ORC files with hive.merge.tezfiles=true, the merge step fails as 
> follows:
> {noformat}
> 2017-08-08T11:27:19,958 ERROR [e7b1f06d-d632-408a-9dff-f7ae042cd25a main] 
> SessionState: Vertex failed, vertexName=File Merge, 
> vertexId=vertex_1502216690354_0001_33_00, diagnostics=[Task failed, 
> taskId=task_1502216690354_0001_33_00_00, diagnostics=[TaskAttempt 0 
> failed, info=[Error: Error while running task ( failure ) : 
> attempt_1502216690354_0001_33_00_00_0:java.lang.RuntimeException: 
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.io.IOException: Multiple partitions for one merge mapper: 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/1
>  NOT EQUAL TO 
> hdfs://localhost:39943/build/ql/test/data/warehouse/partunion1/.hive-staging_hive_2017-08-08_11-27-09_105_286405133968521828-1/-ext-10002/part1=2014/2
>   at 
> org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:211)
>   at 
> org.apache.hadoop.hive.ql.exec.tez.MergeFileTezProcessor.run(MergeFileTezProcessor.java:42)
>   at 
> org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
>   at 
> org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:422)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
>   at 
> org.apache.tez.runtime.ta

[jira] [Commented] (HIVE-17285) Fixes for bit vector retrievals and merging

2017-08-09 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120982#comment-16120982
 ] 

Gopal V commented on HIVE-17285:


[~ashutoshc]: the new String(new byte[]) needs Base64 to work.

To fix the metastore properly, we need to go fix 

https://github.com/apache/hive/blob/master/standalone-metastore/src/main/thrift/hive_metastore.thrift#L387

and make it {{optional bytes bitVectors}} instead of {{optional String}}.
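The underlying problem is that raw sketch bytes do not round-trip through a Java String; a small self-contained illustration:

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class BitVectorRoundTrip {
  public static void main(String[] args) {
    byte[] bitVectors = {(byte) 0xFF, (byte) 0xC3, 0x00, 0x7F};  // arbitrary sketch bytes
    // String decoding replaces invalid UTF-8 sequences, so the bytes do not survive:
    byte[] back = new String(bitVectors, StandardCharsets.UTF_8)
        .getBytes(StandardCharsets.UTF_8);
    System.out.println(Arrays.equals(bitVectors, back));  // false
    // Base64 round-trips correctly, but costs ~33% extra space plus encode/decode time:
    String b64 = Base64.getEncoder().encodeToString(bitVectors);
    System.out.println(Arrays.equals(bitVectors, Base64.getDecoder().decode(b64)));  // true
  }
}
{code}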

> Fixes for bit vector retrievals and merging
> ---
>
> Key: HIVE-17285
> URL: https://issues.apache.org/jira/browse/HIVE-17285
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashutosh Chauhan
> Attachments: HIVE-17285.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-17285) Fixes for bit vector retrievals and merging

2017-08-09 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120982#comment-16120982
 ] 

Gopal V edited comment on HIVE-17285 at 8/10/17 3:11 AM:
-

[~ashutoshc]: the new String(new byte[]) needs Base64 to work, which takes up 
extra memory for no reason.

To fix the metastore properly, we need to go fix 

https://github.com/apache/hive/blob/master/standalone-metastore/src/main/thrift/hive_metastore.thrift#L387

and make it {{optional bytes bitVectors}} instead of {{optional String}}.


was (Author: gopalv):
[~ashutoshc]: the new String(new byte[]) needs Base64 to work.

To fix the metastore properly, we need to go fix 

https://github.com/apache/hive/blob/master/standalone-metastore/src/main/thrift/hive_metastore.thrift#L387

and make it {{optional bytes bitVectors}} instead of {{optional String}}.

> Fixes for bit vector retrievals and merging
> ---
>
> Key: HIVE-17285
> URL: https://issues.apache.org/jira/browse/HIVE-17285
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashutosh Chauhan
> Attachments: HIVE-17285.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17285) Fixes for bit vector retrievals and merging

2017-08-09 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120988#comment-16120988
 ] 

Gopal V commented on HIVE-17285:


Heh, [~jcamachorodriguez] beat me to the punch - HIVE-17286 already changed it.

{code}
-5: optional string bitVectors
+5: optional binary bitVectors
{code}

> Fixes for bit vector retrievals and merging
> ---
>
> Key: HIVE-17285
> URL: https://issues.apache.org/jira/browse/HIVE-17285
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashutosh Chauhan
> Attachments: HIVE-17285.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17261) Hive use deprecated ParquetInputSplit constructor which blocked parquet dictionary filter

2017-08-09 Thread Junjie Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junjie Chen reassigned HIVE-17261:
--

Assignee: Junjie Chen

> Hive use deprecated ParquetInputSplit constructor which blocked parquet 
> dictionary filter
> -
>
> Key: HIVE-17261
> URL: https://issues.apache.org/jira/browse/HIVE-17261
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Junjie Chen
>Assignee: Junjie Chen
>Priority: Minor
>
> Hive uses the deprecated ParquetInputSplit constructor in 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L128]
> Please see the interface definition in 
> [https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputSplit.java#L80]
> The old constructor sets rowGroupOffsets values, which causes the dictionary 
> filter to be skipped in Parquet.
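For reference, the contrast between the two calls, using the variable names from ParquetRecordReaderBase (the filtering consequence is our reading of the parquet-mr code, to be verified):

{code:java}
// Deprecated constructor: derives rowGroupOffsets from the block list, so the
// reader treats the row groups as already resolved and skips row-group
// filtering, dictionary filter included:
split = new ParquetInputSplit(finalPath, splitStart, splitLength,
    oldSplit.getLocations(), filtedBlocks,
    readContext.getRequestedSchema().toString(),
    fileMetaData.getSchema().toString(),
    fileMetaData.getKeyValueMetaData(),
    readContext.getReadSupportMetadata());

// Non-deprecated constructor: rowGroupOffsets stays null, so the reader runs
// the full row-group filtering path:
split = new ParquetInputSplit(finalPath, splitStart, splitStart + splitLength,
    splitLength, oldSplit.getLocations(), null);
{code}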



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17261) Hive use deprecated ParquetInputSplit constructor which blocked parquet dictionary filter

2017-08-09 Thread Junjie Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junjie Chen updated HIVE-17261:
---
Target Version/s: 2.3.0
  Status: Patch Available  (was: Open)

--- a/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java
+++ b/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java
@@ -131,15 +131,14 @@ protected ParquetInputSplit getSplit(
         filtedBlocks = splitGroup;
       }
 
+
       split = new ParquetInputSplit(finalPath,
-        splitStart,
-        splitLength,
-        oldSplit.getLocations(),
-        filtedBlocks,
-        readContext.getRequestedSchema().toString(),
-        fileMetaData.getSchema().toString(),
-        fileMetaData.getKeyValueMetaData(),
-        readContext.getReadSupportMetadata());
+          splitStart,
+          splitStart + splitLength,
+          splitLength,
+          oldSplit.getLocations(),
+          null);
+
       return split;
     } else {
       throw new IllegalArgumentException("Unknown split type: " + oldSplit);


> Hive use deprecated ParquetInputSplit constructor which blocked parquet 
> dictionary filter
> -
>
> Key: HIVE-17261
> URL: https://issues.apache.org/jira/browse/HIVE-17261
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Junjie Chen
>Assignee: Junjie Chen
>Priority: Minor
>
> Hive uses the deprecated ParquetInputSplit constructor in 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L128]
> Please see the interface definition in 
> [https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputSplit.java#L80]
> The old constructor sets rowGroupOffsets values, which causes the dictionary 
> filter to be skipped in Parquet.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17261) Hive use deprecated ParquetInputSplit constructor which blocked parquet dictionary filter

2017-08-09 Thread Junjie Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junjie Chen updated HIVE-17261:
---
Attachment: HIVE-17261.diff

> Hive use deprecated ParquetInputSplit constructor which blocked parquet 
> dictionary filter
> -
>
> Key: HIVE-17261
> URL: https://issues.apache.org/jira/browse/HIVE-17261
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Junjie Chen
>Assignee: Junjie Chen
>Priority: Minor
> Attachments: HIVE-17261.diff
>
>
> Hive uses the deprecated ParquetInputSplit constructor in 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L128]
> Please see the interface definition in 
> [https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputSplit.java#L80]
> The old constructor sets rowGroupOffsets values, which causes the dictionary 
> filter to be skipped in Parquet.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17287) HoS can not deal with skewed data group by

2017-08-09 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121004#comment-16121004
 ] 

Gopal V commented on HIVE-17287:


[~ liyunzhang_intel]: I suspect HoS is loading each partition as an independent 
RDD, which removes the effect of 
SemanticAnalyzer::genNotNullFilterForJoinSourcePlan()?

From what little I know about Hive-on-Spark, Query67 does not read any row from the 
default partition in MR or Tez.

> HoS can not deal with skewed data group by
> --
>
> Key: HIVE-17287
> URL: https://issues.apache.org/jira/browse/HIVE-17287
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>
> In 
> [tpcds/query67.sql|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query67.sql],
>  the fact table {{store_sales}} joins with the small tables {{date_dim}}, 
> {{item}}, {{store}}. After the join, a group-by is performed on the intermediate data.
> Here the data of {{store_sales}} on the 3TB TPC-DS scale is skewed: there are 1824 
> partitions; the biggest is 25.7G while the others are about 715M each.
> {code}
> hadoop fs -du -h 
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales
> 
> 715.0 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452639
> 713.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452640
> 714.1 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452641
> 712.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452642
> 25.7 G   
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__
> {code}
> The skewed {{store_sales}} table causes the job to fail. Is there any way to 
> solve the group-by problem for a skewed table?  I tried enabling 
> {{hive.groupby.skewindata}} to first distribute the data more evenly and then 
> do the group-by, but the job still hangs. 
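For reference, the setting mentioned above turns the aggregation into two stages (whether it helps on this data is exactly what is in question here):

{code:sql}
-- Stage 1 distributes rows randomly to even out the skew and pre-aggregates;
-- stage 2 does the final aggregation per group key.
SET hive.groupby.skewindata=true;
{code}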



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-17261) Hive use deprecated ParquetInputSplit constructor which blocked parquet dictionary filter

2017-08-09 Thread Junjie Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120996#comment-16120996
 ] 

Junjie Chen edited comment on HIVE-17261 at 8/10/17 3:42 AM:
-

This just updates one function for Parquet, so no unit test is added.


was (Author: junjie):
--- a/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java
+++ b/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java
@@ -131,15 +131,14 @@ protected ParquetInputSplit getSplit(
         filtedBlocks = splitGroup;
       }
 
+
       split = new ParquetInputSplit(finalPath,
-        splitStart,
-        splitLength,
-        oldSplit.getLocations(),
-        filtedBlocks,
-        readContext.getRequestedSchema().toString(),
-        fileMetaData.getSchema().toString(),
-        fileMetaData.getKeyValueMetaData(),
-        readContext.getReadSupportMetadata());
+          splitStart,
+          splitStart + splitLength,
+          splitLength,
+          oldSplit.getLocations(),
+          null);
+
       return split;
     } else {
       throw new IllegalArgumentException("Unknown split type: " + oldSplit);


> Hive use deprecated ParquetInputSplit constructor which blocked parquet 
> dictionary filter
> -
>
> Key: HIVE-17261
> URL: https://issues.apache.org/jira/browse/HIVE-17261
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Junjie Chen
>Assignee: Junjie Chen
>Priority: Minor
> Attachments: HIVE-17261.diff
>
>
> Hive uses the deprecated ParquetInputSplit constructor in 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L128]
> Please see the interface definition in 
> [https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputSplit.java#L80]
> The old constructor sets rowGroupOffsets values, which causes the dictionary 
> filter to be skipped in Parquet.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-17287) HoS can not deal with skewed data group by

2017-08-09 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121004#comment-16121004
 ] 

Gopal V edited comment on HIVE-17287 at 8/10/17 3:42 AM:
-

[~liyunzhang_intel]: I suspect HoS is loading each partition as an independent 
RDD, which removes the effect of 
SemanticAnalyzer::genNotNullFilterForJoinSourcePlan()?

From what little I know about Hive-on-Spark, Query67 does not read any row from the 
default partition in MR or Tez.


was (Author: gopalv):
[~ liyunzhang_intel]: I suspect HoS is loading each partition as an independent 
RDD, which removes the effect of 
SemanticAnalyzer::genNotNullFilterForJoinSourcePlan()?

As little as I know about Hive-on-Spark, Query67 does not read any row from the 
default partition in MR or Tez.

> HoS can not deal with skewed data group by
> --
>
> Key: HIVE-17287
> URL: https://issues.apache.org/jira/browse/HIVE-17287
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>
> In 
> [tpcds/query67.sql|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query67.sql],
>  the fact table {{store_sales}} joins with the small tables {{date_dim}}, 
> {{item}}, {{store}}. After the join, a group-by is performed on the intermediate data.
> Here the data of {{store_sales}} on the 3TB TPC-DS scale is skewed: there are 1824 
> partitions; the biggest is 25.7G while the others are about 715M each.
> {code}
> hadoop fs -du -h 
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales
> 
> 715.0 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452639
> 713.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452640
> 714.1 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452641
> 712.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452642
> 25.7 G   
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__
> {code}
> The skewed {{store_sales}} table causes the job to fail. Is there any way to 
> solve the group-by problem for a skewed table?  I tried enabling 
> {{hive.groupby.skewindata}} to first distribute the data more evenly and then 
> do the group-by, but the job still hangs. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17261) Hive use deprecated ParquetInputSplit constructor which blocked parquet dictionary filter

2017-08-09 Thread Junjie Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121006#comment-16121006
 ] 

Junjie Chen commented on HIVE-17261:


Hi [~kellyzly]
Could you please have a look?

> Hive use deprecated ParquetInputSplit constructor which blocked parquet 
> dictionary filter
> -
>
> Key: HIVE-17261
> URL: https://issues.apache.org/jira/browse/HIVE-17261
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Junjie Chen
>Assignee: Junjie Chen
>Priority: Minor
> Attachments: HIVE-17261.diff
>
>
> Hive uses the deprecated ParquetInputSplit constructor in 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L128]
> Please see the interface definition in 
> [https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputSplit.java#L80]
> The old constructor sets rowGroupOffsets values, which causes the dictionary 
> filter to be skipped in Parquet.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17287) HoS can not deal with skewed data group by

2017-08-09 Thread hsj (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121018#comment-16121018
 ] 

hsj commented on HIVE-17287:


[~gopalv]:  thanks for your comments
{quote}
From what little I know about Hive-on-Spark, Query67 does not read any row from the 
default partition in MR or Tez.

{quote}
The default partition stores the data (25.7G) whose ss_sold_date_sk is null. In 
MR/Tez, this part of the data will not be loaded because there is no matching 
data in the join?


> HoS can not deal with skewed data group by
> --
>
> Key: HIVE-17287
> URL: https://issues.apache.org/jira/browse/HIVE-17287
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>
> In 
> [tpcds/query67.sql|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query67.sql],
>  the fact table {{store_sales}} joins with the small tables {{date_dim}}, 
> {{item}}, {{store}}. After the join, a group-by is performed on the intermediate data.
> Here the data of {{store_sales}} on the 3TB TPC-DS scale is skewed: there are 1824 
> partitions; the biggest is 25.7G while the others are about 715M each.
> {code}
> hadoop fs -du -h 
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales
> 
> 715.0 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452639
> 713.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452640
> 714.1 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452641
> 712.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452642
> 25.7 G   
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__
> {code}
> The skewed {{store_sales}} table causes the job to fail. Is there any way to 
> solve the group-by problem for a skewed table?  I tried enabling 
> {{hive.groupby.skewindata}} to first distribute the data more evenly and then 
> do the group-by, but the job still hangs. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17287) HoS can not deal with skewed data group by

2017-08-09 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121021#comment-16121021
 ] 

Gopal V commented on HIVE-17287:


bq. is this part of the data not loaded because there is no matching data in the join?

Yes.

> HoS can not deal with skewed data group by
> --
>
> Key: HIVE-17287
> URL: https://issues.apache.org/jira/browse/HIVE-17287
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>
> In 
> [tpcds/query67.sql|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query67.sql],
> the fact table {{store_sales}} joins with the small tables {{date_dim}}, 
> {{item}}, and {{store}}; after the join, the intermediate data is grouped by.
> Here the data of {{store_sales}} on the 3TB TPC-DS dataset is skewed: there 
> are 1824 partitions; the biggest is 25.7G while the others are around 715M.
> {code}
> hadoop fs -du -h 
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales
> 
> 715.0 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452639
> 713.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452640
> 714.1 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452641
> 712.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452642
> 25.7 G   
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__
> {code}
> The skewed table {{store_sales}} causes the job to fail. Is there any way to 
> solve the group-by problem for a skewed table? I tried enabling 
> {{hive.groupby.skewindata}} to first divide the data more evenly and then do 
> the group by, but the job still hangs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17270) Qtest results show wrong number of executors

2017-08-09 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121020#comment-16121020
 ] 

Rui Li commented on HIVE-17270:
---

Hi [~pvary], I don't understand why 1 NM means we can only have 1 executor. 
IIUC, the NM can allocate an executor/container whenever there's enough memory. 
Each executor requires 512MB of memory, and the AM takes another 512MB, so the 
two executors plus the AM use 1536MB in total. We start the NM with [2048MB 
total 
mem|https://github.com/apache/hive/blob/master/shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L493],
 which should be able to satisfy all the requests.
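
Spelling out the arithmetic, plus one factor that could change it: Spark on 
YARN also requests off-heap overhead per container 
({{spark.yarn.executor.memoryOverhead}}, by default max(384MB, 10%)), assuming 
the defaults apply to this setup:

{noformat}
without overhead: 2 x 512MB (executors) + 512MB (AM)         = 1536MB <= 2048MB
with overhead:    2 x (512 + 384)MB     + (512 + 384)MB (AM) = 2688MB >  2048MB
{noformat}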

> Qtest results show wrong number of executors
> 
>
> Key: HIVE-17270
> URL: https://issues.apache.org/jira/browse/HIVE-17270
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>
> The hive-site.xml shows that TestMiniSparkOnYarnCliDriver uses 2 cores and 2 
> executor instances to run the queries. See: 
> https://github.com/apache/hive/blob/master/data/conf/spark/yarn-client/hive-site.xml#L233
> When reading the log files for the query tests, I see the following:
> {code}
> 2017-08-08T07:41:03,315  INFO [0381325d-2c8c-46fb-ab51-423defaddd84 main] 
> session.SparkSession: Spark cluster current has executors: 1, total cores: 2, 
> memory per executor: 512M, memoryFraction: 0.4
> {code}
> See: 
> http://104.198.109.242/logs/PreCommit-HIVE-Build-6299/succeeded/171-TestMiniSparkOnYarnCliDriver-insert_overwrite_directory2.q-scriptfile1.q-vector_outer_join0.q-and-17-more/logs/hive.log
> When running the tests against a real cluster, I found that running an 
> explain query the first time shows 1 executor, but running it a second time 
> shows 2 executors.
> Also, setting some Spark configuration on the cluster resets this behavior: 
> the first run shows 1 executor again, and the second run shows 2 executors.
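
For reference, a sketch of the kind of settings the referenced hive-site.xml 
carries; the property names are assumed from the Spark-on-YARN test setup, and 
the values come from the description above:

{code}
<property>
  <name>spark.executor.instances</name>
  <value>2</value>
</property>
<property>
  <name>spark.executor.cores</name>
  <value>2</value>
</property>
<property>
  <name>spark.executor.memory</name>
  <value>512m</value>
</property>
{code}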



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Issue Comment Deleted] (HIVE-17287) HoS can not deal with skewed data group by

2017-08-09 Thread hsj (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

hsj updated HIVE-17287:
---
Comment: was deleted

(was: [~gopalv]:  thanks for your comments
{quote}
As little as I know about Hive-on-Spark, Query67 does not read any row from the 
default partition in MR or Tez.

{quote}
the default_partition stores the data(25.7G) which ss_sold_date_sk is null. In 
MR/Tez, these part of data will not be load because there is no match data in 
join?
)

> HoS can not deal with skewed data group by
> --
>
> Key: HIVE-17287
> URL: https://issues.apache.org/jira/browse/HIVE-17287
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>
> In 
> [tpcds/query67.sql|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query67.sql],
> the fact table {{store_sales}} joins with the small tables {{date_dim}}, 
> {{item}}, and {{store}}; after the join, the intermediate data is grouped by.
> Here the data of {{store_sales}} on the 3TB TPC-DS dataset is skewed: there 
> are 1824 partitions; the biggest is 25.7G while the others are around 715M.
> {code}
> hadoop fs -du -h 
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales
> 
> 715.0 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452639
> 713.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452640
> 714.1 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452641
> 712.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452642
> 25.7 G   
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__
> {code}
> The skewed table {{store_sales}} causes the job to fail. Is there any way to 
> solve the group-by problem for a skewed table? I tried enabling 
> {{hive.groupby.skewindata}} to first divide the data more evenly and then do 
> the group by, but the job still hangs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17287) HoS can not deal with skewed data group by

2017-08-09 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121023#comment-16121023
 ] 

liyunzhang_intel commented on HIVE-17287:
-

[~gopalv]: thanks for your comments.
{quote}
As little as I know about Hive-on-Spark, Query67 does not read any row from the 
default partition in MR or Tez.
{quote}
The default partition stores the data (25.7G) where ss_sold_date_sk is null. In 
MR/Tez, is this part of the data not loaded because there is no matching data 
in the join?

{quote}
I suspect HoS is loading each partition as an independent RDD, which removes 
the effect of SemanticAnalyzer::genNotNullFilterForJoinSourcePlan()?
{quote}
I also see the filter that filters out the null data in the explain. Although 
HoS loads each partition as an independent RDD, I think the filter should still 
work in theory. Part of the explain of query67:
{code}
 Map 1 
Map Operator Tree:
TableScan
  alias: store_sales
  filterExpr: (ss_store_sk is not null and ss_item_sk is not 
null) (type: boolean)
  Statistics: Num rows: 8251124389 Data size: 181524736558 
Basic stats: COMPLETE Column stats: NONE
  Filter Operator
predicate: (ss_store_sk is not null and ss_item_sk is not 
null) (type: boolean)
Statistics: Num rows: 8251124389 Data size: 181524736558 
Basic stats: COMPLETE Column stats: NONE
Select Operator
  expressions: ss_item_sk (type: bigint), ss_store_sk 
(type: bigint), ss_quantity (type: int), ss_sales_price (type: double), 
ss_sold_date_sk (type: bigint) 
  outputColumnNames: _col0, _col1, _col2, _col3, _col4
  Statistics: Num rows: 8251124389 Data size: 181524736558 
Basic stats: COMPLETE Column stats: NONE
  Map Join Operator
{code}

> HoS can not deal with skewed data group by
> --
>
> Key: HIVE-17287
> URL: https://issues.apache.org/jira/browse/HIVE-17287
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>
> In 
> [tpcds/query67.sql|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query67.sql],
> the fact table {{store_sales}} joins with the small tables {{date_dim}}, 
> {{item}}, and {{store}}; after the join, the intermediate data is grouped by.
> Here the data of {{store_sales}} on the 3TB TPC-DS dataset is skewed: there 
> are 1824 partitions; the biggest is 25.7G while the others are around 715M.
> {code}
> hadoop fs -du -h 
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales
> 
> 715.0 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452639
> 713.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452640
> 714.1 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452641
> 712.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452642
> 25.7 G   
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__
> {code}
> The skewed table {{store_sales}} causes the job to fail. Is there any way to 
> solve the group-by problem for a skewed table? I tried enabling 
> {{hive.groupby.skewindata}} to first divide the data more evenly and then do 
> the group by, but the job still hangs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-17287) HoS can not deal with skewed data group by

2017-08-09 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121023#comment-16121023
 ] 

liyunzhang_intel edited comment on HIVE-17287 at 8/10/17 4:02 AM:
--

[~gopalv]: thanks for your comments.
{quote}
As little as I know about Hive-on-Spark, Query67 does not read any row from the 
default partition in MR or Tez.
{quote}
The default partition stores the data (25.7G) where ss_sold_date_sk is null. In 
MR/Tez, is this part of the data not loaded because there is no matching data 
in the join?

{quote}
I suspect HoS is loading each partition as an independent RDD, which removes 
the effect of SemanticAnalyzer::genNotNullFilterForJoinSourcePlan()?
{quote}
I also see the filter that filters out the null data in the explain. Although 
HoS loads each partition as an independent RDD, I think the filter should still 
work after loading the data, in theory. Part of the explain of query67:
{code}
 Map 1 
Map Operator Tree:
TableScan
  alias: store_sales
  filterExpr: (ss_store_sk is not null and ss_item_sk is not 
null) (type: boolean)
  Statistics: Num rows: 8251124389 Data size: 181524736558 
Basic stats: COMPLETE Column stats: NONE
  Filter Operator
predicate: (ss_store_sk is not null and ss_item_sk is not 
null) (type: boolean)
Statistics: Num rows: 8251124389 Data size: 181524736558 
Basic stats: COMPLETE Column stats: NONE
Select Operator
  expressions: ss_item_sk (type: bigint), ss_store_sk 
(type: bigint), ss_quantity (type: int), ss_sales_price (type: double), 
ss_sold_date_sk (type: bigint) 
  outputColumnNames: _col0, _col1, _col2, _col3, _col4
  Statistics: Num rows: 8251124389 Data size: 181524736558 
Basic stats: COMPLETE Column stats: NONE
  Map Join Operator
{code}


was (Author: kellyzly):
[~gopalv]:  thanks for your comments
{quote}
As little as I know about Hive-on-Spark, Query67 does not read any row from the 
default partition in MR or Tez.

{quote}
the default_partition stores the data(25.7G) which ss_sold_date_sk is null. In 
MR/Tez, these part of data will not be load because there is no match data in 
join? 

{quote}
I suspect HoS is loading each partition as an independent RDD, which removes 
the effect of SemanticAnalyzer::genNotNullFilterForJoinSourcePlan()?
{quote}
I also see the filter to filter null data in the explain, although hos load 
partition as an independent RDD, i think the filter should work in theory. 
part of explain of query67
{code}
 Map 1 
Map Operator Tree:
TableScan
  alias: store_sales
  filterExpr: (ss_store_sk is not null and ss_item_sk is not 
null) (type: boolean)
  Statistics: Num rows: 8251124389 Data size: 181524736558 
Basic stats: COMPLETE Column stats: NONE
  Filter Operator
predicate: (ss_store_sk is not null and ss_item_sk is not 
null) (type: boolean)
Statistics: Num rows: 8251124389 Data size: 181524736558 
Basic stats: COMPLETE Column stats: NONE
Select Operator
  expressions: ss_item_sk (type: bigint), ss_store_sk 
(type: bigint), ss_quantity (type: int), ss_sales_price (type: double), 
ss_sold_date_sk (type: bigint) 
  outputColumnNames: _col0, _col1, _col2, _col3, _col4
  Statistics: Num rows: 8251124389 Data size: 181524736558 
Basic stats: COMPLETE Column stats: NONE
  Map Join Operator
{code}

> HoS can not deal with skewed data group by
> --
>
> Key: HIVE-17287
> URL: https://issues.apache.org/jira/browse/HIVE-17287
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>
> In 
> [tpcds/query67.sql|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query67.sql],
> the fact table {{store_sales}} joins with the small tables {{date_dim}}, 
> {{item}}, and {{store}}; after the join, the intermediate data is grouped by.
> Here the data of {{store_sales}} on the 3TB TPC-DS dataset is skewed: there 
> are 1824 partitions; the biggest is 25.7G while the others are around 715M.
> {code}
> hadoop fs -du -h 
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales
> 
> 715.0 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452639
> 713.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452640
> 714.1 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452641
> 712.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_

[jira] [Commented] (HIVE-17287) HoS can not deal with skewed data group by

2017-08-09 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121026#comment-16121026
 ] 

Gopal V commented on HIVE-17287:


{code}
store_sales.ss_sold_date_sk=date_dim.d_date_sk
{code}

This implies ss_sold_date_sk is not null.

There's another optimizer in Hive called PCR (the partition condition remover), 
which removes any Filter that is a no-op.

So there's some possibility that "explain extended <query>" won't list the 
default partition at all.
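
A sketch of that inference with illustrative columns (the not-null filter 
comes from SemanticAnalyzer::genNotNullFilterForJoinSourcePlan, and PCR may 
then fold it into partition pruning):

{code}
-- The inner join key cannot be null on either side, so ...
SELECT ss.ss_item_sk, d.d_year
FROM store_sales ss
JOIN date_dim d ON ss.ss_sold_date_sk = d.d_date_sk;

-- ... the plan behaves as if the query contained
--   WHERE ss.ss_sold_date_sk IS NOT NULL
-- which lets pruning drop ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__
-- before any of its 25.7G is read.
{code}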

> HoS can not deal with skewed data group by
> --
>
> Key: HIVE-17287
> URL: https://issues.apache.org/jira/browse/HIVE-17287
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>
> In 
> [tpcds/query67.sql|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query67.sql],
> the fact table {{store_sales}} joins with the small tables {{date_dim}}, 
> {{item}}, and {{store}}; after the join, the intermediate data is grouped by.
> Here the data of {{store_sales}} on the 3TB TPC-DS dataset is skewed: there 
> are 1824 partitions; the biggest is 25.7G while the others are around 715M.
> {code}
> hadoop fs -du -h 
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales
> 
> 715.0 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452639
> 713.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452640
> 714.1 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452641
> 712.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452642
> 25.7 G   
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__
> {code}
> The skewed table {{store_sales}} causes the job to fail. Is there any way to 
> solve the group-by problem for a skewed table? I tried enabling 
> {{hive.groupby.skewindata}} to first divide the data more evenly and then do 
> the group by, but the job still hangs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17285) Fixes for bit vector retrievals and merging

2017-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121034#comment-16121034
 ] 

Hive QA commented on HIVE-17285:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881117/HIVE-17285.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 11000 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainuser_3] 
(batchId=99)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=235)
org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore.testStatsFastTrivial 
(batchId=206)
org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore.testStatsFastTrivial 
(batchId=208)
org.apache.hadoop.hive.metastore.TestSetUGIOnBothClientServer.testStatsFastTrivial
 (batchId=205)
org.apache.hadoop.hive.metastore.TestSetUGIOnOnlyClient.testStatsFastTrivial 
(batchId=203)
org.apache.hadoop.hive.metastore.TestSetUGIOnOnlyServer.testStatsFastTrivial 
(batchId=213)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testPartitionSpecRegistrationWithCustomSchema
 (batchId=180)
org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation 
(batchId=180)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteDecimal 
(batchId=183)
org.apache.hive.hcatalog.pig.TestTextFileHCatStorer.testWriteTinyint 
(batchId=183)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/6328/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/6328/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-6328/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 17 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12881117 - PreCommit-HIVE-Build

> Fixes for bit vector retrievals and merging
> ---
>
> Key: HIVE-17285
> URL: https://issues.apache.org/jira/browse/HIVE-17285
> Project: Hive
>  Issue Type: Bug
>Reporter: Ashutosh Chauhan
> Attachments: HIVE-17285.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-17287) HoS can not deal with skewed data group by

2017-08-09 Thread liyunzhang_intel (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liyunzhang_intel reassigned HIVE-17287:
---

Assignee: liyunzhang_intel

> HoS can not deal with skewed data group by
> --
>
> Key: HIVE-17287
> URL: https://issues.apache.org/jira/browse/HIVE-17287
> Project: Hive
>  Issue Type: Bug
>Reporter: liyunzhang_intel
>Assignee: liyunzhang_intel
>
> In 
> [tpcds/query67.sql|https://github.com/kellyzly/hive-testbench/blob/hive14/sample-queries-tpcds/query67.sql],
> the fact table {{store_sales}} joins with the small tables {{date_dim}}, 
> {{item}}, and {{store}}; after the join, the intermediate data is grouped by.
> Here the data of {{store_sales}} on the 3TB TPC-DS dataset is skewed: there 
> are 1824 partitions; the biggest is 25.7G while the others are around 715M.
> {code}
> hadoop fs -du -h 
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales
> 
> 715.0 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452639
> 713.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452640
> 714.1 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452641
> 712.9 M  
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=2452642
> 25.7 G   
> /user/hive/warehouse/tpcds_bin_partitioned_parquet_3000.db/store_sales/ss_sold_date_sk=__HIVE_DEFAULT_PARTITION__
> {code}
> The skewed table {{store_sales}} causes the job to fail. Is there any way to 
> solve the group-by problem for a skewed table? I tried enabling 
> {{hive.groupby.skewindata}} to first divide the data more evenly and then do 
> the group by, but the job still hangs.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17261) Hive use deprecated ParquetInputSplit constructor which blocked parquet dictionary filter

2017-08-09 Thread liyunzhang_intel (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121062#comment-16121062
 ] 

liyunzhang_intel commented on HIVE-17261:
-

[~junjie]: LGTM from my side. 
[~ferd] and [~csun]: could you help review, as you have more knowledge in this 
area?

> Hive use deprecated ParquetInputSplit constructor which blocked parquet 
> dictionary filter
> -
>
> Key: HIVE-17261
> URL: https://issues.apache.org/jira/browse/HIVE-17261
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Junjie Chen
>Assignee: Junjie Chen
>Priority: Minor
> Attachments: HIVE-17261.diff
>
>
> Hive uses the deprecated ParquetInputSplit constructor in 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L128]
> Please see the interface definition in 
> [https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputSplit.java#L80]
> The old constructor sets rowGroupOffsets values, which causes Parquet to skip 
> its dictionary filter.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17286) Avoid expensive String serialization/deserialization for bitvectors

2017-08-09 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121063#comment-16121063
 ] 

Hive QA commented on HIVE-17286:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12881120/HIVE-17286.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 54 failed/errored test(s), 11000 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[materialized_view_create_rewrite]
 (batchId=240)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_move]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_merge_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestBlobstoreCliDriver.testCliDriver[insert_overwrite_dynamic_partitions_move_only]
 (batchId=243)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_4] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_5] 
(batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_6] 
(batchId=63)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_7] 
(batchId=61)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_8] 
(batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_9] 
(batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[char_udf1] (batchId=86)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[column_pruner_multiple_children]
 (batchId=21)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[columnstats_partlvl] 
(batchId=34)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[columnstats_partlvl_dp] 
(batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[columnstats_quoting] 
(batchId=84)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[columnstats_tbllvl] 
(batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[compute_stats_date] 
(batchId=43)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[compute_stats_decimal] 
(batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[compute_stats_double] 
(batchId=14)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[compute_stats_empty_table]
 (batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[compute_stats_long] 
(batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[compute_stats_string] 
(batchId=4)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constant_prop_2] 
(batchId=27)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[display_colstats_tbllvl] 
(batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[exec_parallel_column_stats]
 (batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[parallel_colstats] 
(batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[partial_column_stats] 
(batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[reduceSinkDeDuplication_pRS_key_empty]
 (batchId=44)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[temp_table_display_colstats_tbllvl]
 (batchId=74)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_stats] 
(batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[parallel_colstats]
 (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[column_table_stats]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[column_table_stats_orc]
 (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[varchar_udf1]
 (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_udf1]
 (batchId=157)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[orc_merge1]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[orc_merge3]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[orc_merge_diff_fs]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[quotedid_smb]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_combine_equivalent_work]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_dynamic_partition_pruning_mapjoin_only]
 (batchId=170)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_use_op_stats]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[spark_vectorized_dynamic_partition_pruning]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[truncate_c

[jira] [Commented] (HIVE-17261) Hive use deprecated ParquetInputSplit constructor which blocked parquet dictionary filter

2017-08-09 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121073#comment-16121073
 ] 

Ferdinand Xu commented on HIVE-17261:
-

Can you rename the patch to HIVE-17261.patch? I see the new API doesn't 
require {{filtedBlocks}} as a parameter. So Parquet can handle the search 
argument on its side?

> Hive use deprecated ParquetInputSplit constructor which blocked parquet 
> dictionary filter
> -
>
> Key: HIVE-17261
> URL: https://issues.apache.org/jira/browse/HIVE-17261
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Junjie Chen
>Assignee: Junjie Chen
>Priority: Minor
> Attachments: HIVE-17261.diff
>
>
> Hive uses the deprecated ParquetInputSplit constructor in 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L128]
> Please see the interface definition in 
> [https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputSplit.java#L80]
> The old constructor sets rowGroupOffsets values, which causes Parquet to skip 
> its dictionary filter.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17261) Hive use deprecated ParquetInputSplit constructor which blocked parquet dictionary filter

2017-08-09 Thread Junjie Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Junjie Chen updated HIVE-17261:
---
Attachment: HIVE-17261.patch

> Hive use deprecated ParquetInputSplit constructor which blocked parquet 
> dictionary filter
> -
>
> Key: HIVE-17261
> URL: https://issues.apache.org/jira/browse/HIVE-17261
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Junjie Chen
>Assignee: Junjie Chen
>Priority: Minor
> Attachments: HIVE-17261.diff, HIVE-17261.patch
>
>
> Hive uses the deprecated ParquetInputSplit constructor in 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L128]
> Please see the interface definition in 
> [https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputSplit.java#L80]
> The old constructor sets rowGroupOffsets values, which causes Parquet to skip 
> its dictionary filter.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17261) Hive use deprecated ParquetInputSplit constructor which blocked parquet dictionary filter

2017-08-09 Thread Junjie Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16121082#comment-16121082
 ] 

Junjie Chen commented on HIVE-17261:


Hive converts the search argument to a FilterPredicate and pushes it down to 
Parquet. Please see here: 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L151
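
A hedged sketch of that pushdown path (class and method names as in Hive's 
parquet reader and parquet-mr; {{fileSchema}} stands for the file's parquet 
MessageType):

{code}
// Convert the Hive SearchArgument into a parquet-mr FilterPredicate and
// register it on the conf, so parquet-mr can evaluate it on its side
// (row-group stats, and dictionary filtering once rowGroupOffsets is no
// longer forced by the deprecated split constructor).
SearchArgument sarg = ConvertAstToSearchArg.createFromConf(conf);
if (sarg != null) {
  FilterPredicate p =
      ParquetFilterPredicateConverter.toFilterPredicate(sarg, fileSchema);
  if (p != null) {
    ParquetInputFormat.setFilterPredicate(conf, p);
  }
}
{code}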

> Hive use deprecated ParquetInputSplit constructor which blocked parquet 
> dictionary filter
> -
>
> Key: HIVE-17261
> URL: https://issues.apache.org/jira/browse/HIVE-17261
> Project: Hive
>  Issue Type: Improvement
>  Components: Database/Schema
>Affects Versions: 2.2.0
>Reporter: Junjie Chen
>Assignee: Junjie Chen
>Priority: Minor
> Attachments: HIVE-17261.diff, HIVE-17261.patch
>
>
> Hive uses the deprecated ParquetInputSplit constructor in 
> [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/ParquetRecordReaderBase.java#L128]
> Please see the interface definition in 
> [https://github.com/apache/parquet-mr/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ParquetInputSplit.java#L80]
> The old constructor sets rowGroupOffsets values, which causes Parquet to skip 
> its dictionary filter.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

