[jira] [Commented] (HIVE-15883) HBase mapped table in Hive insert fail for decimal

2017-02-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862274#comment-15862274
 ] 

Hive QA commented on HIVE-15883:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12852182/HIVE-15883.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10246 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3501/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3501/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3501/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12852182 - PreCommit-HIVE-Build

> HBase mapped table in Hive insert fail for decimal
> --
>
> Key: HIVE-15883
> URL: https://issues.apache.org/jira/browse/HIVE-15883
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-15883.patch
>
>
> CREATE TABLE hbase_table (
> id int,
> balance decimal(15,2))
> ROW FORMAT DELIMITED
> COLLECTION ITEMS TERMINATED BY '~'
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> "hbase.columns.mapping"=":key,cf:balance#b");
> insert into hbase_table values (1,1);
> 
> Diagnostic Messages for this Task:
> Error: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row {"tmp_values_col1":"1","tmp_values_col2":"1"}
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1783)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row {"tmp_values_col1":"1","tmp_values_col2":"1"}
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
> ... 8 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.serde2.SerDeException: java.lang.RuntimeException: 
> Hive internal error.
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:733)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:97)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
> ... 9 more
> Caused by: org.apache.hadoop.hive.serde2.SerDeException: 
> java.lang.RuntimeException: Hive internal error.
> at 
> org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:286)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:668)
> ... 15 more
> Caused by: java.lang.RuntimeException: Hive internal error.
> at 
> org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitive(LazyUtils.java:328)
> at 
> org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:220)
> at 
> 

[jira] [Updated] (HIVE-1555) JDBC Storage Handler

2017-02-10 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-1555:
-
Status: Patch Available  (was: Open)

> JDBC Storage Handler
> 
>
> Key: HIVE-1555
> URL: https://issues.apache.org/jira/browse/HIVE-1555
> Project: Hive
>  Issue Type: New Feature
>  Components: JDBC
>Reporter: Bob Robertson
>Assignee: Gunther Hagleitner
> Attachments: HIVE-1555.1.patch, HIVE-1555.2.patch, JDBCStorageHandler 
> Design Doc.pdf
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> With the Cassandra and HBase Storage Handlers I thought it would make sense 
> to include a generic JDBC RDBMS Storage Handler so that you could import a 
> standard DB table into Hive. Many people must want to perform HiveQL joins, 
> etc. against tables in other systems.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-1555) JDBC Storage Handler

2017-02-10 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-1555:
-
Attachment: HIVE-1555.2.patch

> JDBC Storage Handler
> 
>
> Key: HIVE-1555
> URL: https://issues.apache.org/jira/browse/HIVE-1555
> Project: Hive
>  Issue Type: New Feature
>  Components: JDBC
>Reporter: Bob Robertson
>Assignee: Gunther Hagleitner
> Attachments: HIVE-1555.1.patch, HIVE-1555.2.patch, JDBCStorageHandler 
> Design Doc.pdf
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> With the Cassandra and HBase Storage Handlers I thought it would make sense 
> to include a generic JDBC RDBMS Storage Handler so that you could import a 
> standard DB table into Hive. Many people must want to perform HiveQL joins, 
> etc. against tables in other systems.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-1555) JDBC Storage Handler

2017-02-10 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-1555:
-
Status: Open  (was: Patch Available)

> JDBC Storage Handler
> 
>
> Key: HIVE-1555
> URL: https://issues.apache.org/jira/browse/HIVE-1555
> Project: Hive
>  Issue Type: New Feature
>  Components: JDBC
>Reporter: Bob Robertson
>Assignee: Gunther Hagleitner
> Attachments: HIVE-1555.1.patch, HIVE-1555.2.patch, JDBCStorageHandler 
> Design Doc.pdf
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> With the Cassandra and HBase Storage Handlers I thought it would make sense 
> to include a generic JDBC RDBMS Storage Handler so that you could import a 
> standard DB table into Hive. Many people must want to perform HiveQL joins, 
> etc. against tables in other systems.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15872) The PERCENTILE UDAF does not work with empty set

2017-02-10 Thread Chaozhong Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaozhong Yang updated HIVE-15872:
--
Description: 
1. Original SQL:
{code}
select
percentile_approx(
column0,
array(0.50, 0.70, 0.90, 0.95, 0.99)
)
from
my_table
where
date = '20170207'
and column1 = 'value1'
and column2 = 'value2'
and column3 = 'value3'
and column4 = 'value4'
and column5 = 'value5'
{code}

2. Exception StackTrace:
{code}
Error: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row (tag=0) {"key":{},"value":{"_col0":[0.0,1.0]}} at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256) at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453) at 
org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:401) at 
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at 
java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:422) at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
 at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row (tag=0) {"key":{},"value":{"_col0":[0.0,1.0]}} at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244) ... 
7 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.IndexOutOfBoundsException: Index: 2, Size: 2 at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:766)
 at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235) 
... 7 more Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2 at 
java.util.ArrayList.rangeCheck(ArrayList.java:653) at 
java.util.ArrayList.get(ArrayList.java:429) at 
org.apache.hadoop.hive.ql.udf.generic.NumericHistogram.merge(NumericHistogram.java:134)
 at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox$GenericUDAFPercentileApproxEvaluator.merge(GenericUDAFPercentileApprox.java:318)
 at 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:188)
 at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:612)
 at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:851)
 at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:695)
 at 
org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:761)
 ... 8 more
{code}

3. Review the data:

{code}
select
column0
from
my_table
where
date = '20170207'
and column1 = 'value1'
and column2 = 'value2'
and column3 = 'value3'
and column4 = 'value4'
and column5 = 'value5'
{code}

After running this SQL, we found the result is NULL.

4. What's the meaning of [0.0, 1.0] in the stack trace?

In GenericUDAFPercentileApproxEvaluator, the `merge` method processes an 
ArrayList named partialHistogram. Normally, the basic structure of 
partialHistogram is [npercentiles, percentile0, percentile1, ..., nbins, bin0.x, 
bin0.y, bin1.x, bin1.y, ...]. However, if we process NULL (empty set) column 
values, the partialHistogram will only contain [npercentiles(0), nbins(1)]. 
That's why the stack trace shows the strange row data: 
{"key":{},"value":{"_col0":[0.0,1.0]}}

Before we call histogram#merge (the online histogram algorithm from the paper 
http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf ), the 
percentile elements should be removed from partialHistogram, as in 
`partialHistogram.subList(0, nquantiles+1).clear();`. In the empty-set case, 
GenericUDAFPercentileApproxEvaluator does not remove the percentiles. 
Consequently, NumericHistogram merges a list that contains only 2 elements 
([0.0, 1.0]) and throws an IndexOutOfBoundsException. 

  was:
1. Original SQL:

select
percentile_approx(
column0,
array(0.50, 0.70, 0.90, 0.95, 0.99)
)
from
my_table
where
date = '20170207'
and column1 = 'value1'
and column2 = 'value2'
and column3 = 'value3'
and column4 = 'value4'
and column5 = 'value5'

2. Exception StackTrace:

Error: java.lang.RuntimeException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
processing row (tag=0) {"key":{},"value":{"_col0":[0.0,1.0]}} at 
org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256) at 
org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453) at 
org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:401) at 
org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at 
java.security.AccessController.doPrivileged(Native Method) at 
javax.security.auth.Subject.doAs(Subject.java:422) at 

[jira] [Updated] (HIVE-1555) JDBC Storage Handler

2017-02-10 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-1555:
-
Attachment: HIVE-1555.1.patch

> JDBC Storage Handler
> 
>
> Key: HIVE-1555
> URL: https://issues.apache.org/jira/browse/HIVE-1555
> Project: Hive
>  Issue Type: New Feature
>  Components: JDBC
>Reporter: Bob Robertson
>Assignee: Gunther Hagleitner
> Attachments: HIVE-1555.1.patch, JDBCStorageHandler Design Doc.pdf
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> With the Cassandra and HBase Storage Handlers I thought it would make sense 
> to include a generic JDBC RDBMS Storage Handler so that you could import a 
> standard DB table into Hive. Many people must want to perform HiveQL joins, 
> etc. against tables in other systems.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-1555) JDBC Storage Handler

2017-02-10 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-1555:
-
Status: Patch Available  (was: In Progress)

> JDBC Storage Handler
> 
>
> Key: HIVE-1555
> URL: https://issues.apache.org/jira/browse/HIVE-1555
> Project: Hive
>  Issue Type: New Feature
>  Components: JDBC
>Reporter: Bob Robertson
>Assignee: Gunther Hagleitner
> Attachments: HIVE-1555.1.patch, JDBCStorageHandler Design Doc.pdf
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> With the Cassandra and HBase Storage Handlers I thought it would make sense 
> to include a generic JDBC RDBMS Storage Handler so that you could import a 
> standard DB table into Hive. Many people must want to perform HiveQL joins, 
> etc. against tables in other systems.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-1555) JDBC Storage Handler

2017-02-10 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862266#comment-15862266
 ] 

Gunther Hagleitner commented on HIVE-1555:
--

Can I take this over? It's been a few months w/o an update again. I've brought 
some code: following [~charithe]'s recommendation, I started from 
https://github.com/QubitProducts/hive-jdbc-storage-handler

I haven't tested w/ mysql or any other db yet, but you can write queries 
against derby in q files now. 

> JDBC Storage Handler
> 
>
> Key: HIVE-1555
> URL: https://issues.apache.org/jira/browse/HIVE-1555
> Project: Hive
>  Issue Type: New Feature
>  Components: JDBC
>Reporter: Bob Robertson
>Assignee: Gunther Hagleitner
> Attachments: JDBCStorageHandler Design Doc.pdf
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> With the Cassandra and HBase Storage Handlers I thought it would make sense 
> to include a generic JDBC RDBMS Storage Handler so that you could import a 
> standard DB table into Hive. Many people must want to perform HiveQL joins, 
> etc. against tables in other systems.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-1555) JDBC Storage Handler

2017-02-10 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner reassigned HIVE-1555:


Assignee: Gunther Hagleitner  (was: Dmitry Zagorulkin)

> JDBC Storage Handler
> 
>
> Key: HIVE-1555
> URL: https://issues.apache.org/jira/browse/HIVE-1555
> Project: Hive
>  Issue Type: New Feature
>  Components: JDBC
>Reporter: Bob Robertson
>Assignee: Gunther Hagleitner
> Attachments: JDBCStorageHandler Design Doc.pdf
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> With the Cassandra and HBase Storage Handlers I thought it would make sense 
> to include a generic JDBC RDBMS Storage Handler so that you could import a 
> standard DB table into Hive. Many people must want to perform HiveQL joins, 
> etc. against tables in other systems.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15883) HBase mapped table in Hive insert fail for decimal

2017-02-10 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-15883:
-
Status: Patch Available  (was: Open)

> HBase mapped table in Hive insert fail for decimal
> --
>
> Key: HIVE-15883
> URL: https://issues.apache.org/jira/browse/HIVE-15883
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-15883.patch
>
>
> CREATE TABLE hbase_table (
> id int,
> balance decimal(15,2))
> ROW FORMAT DELIMITED
> COLLECTION ITEMS TERMINATED BY '~'
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> "hbase.columns.mapping"=":key,cf:balance#b");
> insert into hbase_table values (1,1);
> 
> Diagnostic Messages for this Task:
> Error: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row {"tmp_values_col1":"1","tmp_values_col2":"1"}
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1783)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row {"tmp_values_col1":"1","tmp_values_col2":"1"}
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
> ... 8 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.serde2.SerDeException: java.lang.RuntimeException: 
> Hive internal error.
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:733)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:97)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
> ... 9 more
> Caused by: org.apache.hadoop.hive.serde2.SerDeException: 
> java.lang.RuntimeException: Hive internal error.
> at 
> org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:286)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:668)
> ... 15 more
> Caused by: java.lang.RuntimeException: Hive internal error.
> at 
> org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitive(LazyUtils.java:328)
> at 
> org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:220)
> at 
> org.apache.hadoop.hive.hbase.HBaseRowSerializer.serializeField(HBaseRowSerializer.java:194)
> at 
> org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:118)
> at 
> org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:282)
> ... 16 more 
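The stack trace bottoms out in LazyUtils.writePrimitive throwing "Hive internal 
error.", which is the typical shape of a primitive-category switch with no 
branch for DECIMAL when the column is mapped as binary (#b). A hedged, 
self-contained Java sketch of that failure shape (illustration only, not the 
actual LazyUtils code or this patch):

{code}
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class WritePrimitiveSketch {
    enum Category { INT, DECIMAL, OTHER }

    static void writePrimitive(OutputStream out, Category category, Object value)
            throws IOException {
        switch (category) {
        case INT:
            int v = (Integer) value;
            out.write((v >>> 24) & 0xFF);
            out.write((v >>> 16) & 0xFF);
            out.write((v >>> 8) & 0xFF);
            out.write(v & 0xFF);
            break;
        case DECIMAL:
            // The branch that is effectively missing: without it, decimals
            // fall through to the default case below. Writing the string
            // form is one plausible fallback.
            out.write(value.toString().getBytes(StandardCharsets.UTF_8));
            break;
        default:
            throw new RuntimeException("Hive internal error.");
        }
    }
}
{code}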



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15883) HBase mapped table in Hive insert fail for decimal

2017-02-10 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-15883:
-
Attachment: HIVE-15883.patch

{code}
hive> insert into hbase_table values (2,2);
Query ID = hive_20170210142121_a966b2a7-e371-4cba-a9e9-dae2554cec42
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1486734052415_0010, Tracking URL = 
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2017-02-10 14:21:54,959 Stage-0 map = 0%,  reduce = 0%
2017-02-10 14:22:05,579 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 3.87 
sec
MapReduce Total cumulative CPU time: 3 seconds 870 msec
Ended Job = job_1486734052415_0010
MapReduce Jobs Launched: 
Stage-Stage-0: Map: 1   Cumulative CPU: 3.87 sec   HDFS Read: 4065 HDFS Write: 
0 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 870 msec
OK
Time taken: 24.433 seconds
select * from hbase_table;
OK
1   1
2   2
Time taken: 2.426 seconds, Fetched: 2 row(s)
hive> insert into hbase_decimalbinary values (11, 11.11);
Query ID = hive_20170210200404_6e0a62ed-a587-4cd5-a105-524f3838f4a9
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1486734052415_0032, Tracking URL = 
Hadoop job information for Stage-0: number of mappers: 1; number of reducers: 0
2017-02-10 20:04:29,894 Stage-0 map = 0%,  reduce = 0%
2017-02-10 20:04:42,333 Stage-0 map = 100%,  reduce = 0%, Cumulative CPU 3.68 
sec
MapReduce Total cumulative CPU time: 3 seconds 680 msec
Ended Job = job_1486734052415_0032
MapReduce Jobs Launched: 
Stage-Stage-0: Map: 1   Cumulative CPU: 3.68 sec   HDFS Read: 4070 HDFS Write: 
0 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 680 msec
OK
Time taken: 25.385 seconds
hive> select * from hbase_table;
OK
1   1
11  11.11
2   2
Time taken: 0.394 seconds, Fetched: 3 row(s)
{code}



> HBase mapped table in Hive insert fail for decimal
> --
>
> Key: HIVE-15883
> URL: https://issues.apache.org/jira/browse/HIVE-15883
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
> Attachments: HIVE-15883.patch
>
>
> CREATE TABLE hbase_table (
> id int,
> balance decimal(15,2))
> ROW FORMAT DELIMITED
> COLLECTION ITEMS TERMINATED BY '~'
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> "hbase.columns.mapping"=":key,cf:balance#b");
> insert into hbase_table values (1,1);
> 
> Diagnostic Messages for this Task:
> Error: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row {"tmp_values_col1":"1","tmp_values_col2":"1"}
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1783)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row {"tmp_values_col1":"1","tmp_values_col2":"1"}
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
> ... 8 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.serde2.SerDeException: java.lang.RuntimeException: 
> Hive internal error.
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:733)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:97)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
> ... 9 more
> Caused by: org.apache.hadoop.hive.serde2.SerDeException: 
> java.lang.RuntimeException: Hive internal error.
> at 
> 

[jira] [Assigned] (HIVE-15883) HBase mapped table in Hive insert fail for decimal

2017-02-10 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam reassigned HIVE-15883:



> HBase mapped table in Hive insert fail for decimal
> --
>
> Key: HIVE-15883
> URL: https://issues.apache.org/jira/browse/HIVE-15883
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 1.1.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>
> CREATE TABLE hbase_table (
> id int,
> balance decimal(15,2))
> ROW FORMAT DELIMITED
> COLLECTION ITEMS TERMINATED BY '~'
> STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
> WITH SERDEPROPERTIES (
> "hbase.columns.mapping"=":key,cf:balance#b");
> insert into hbase_table values (1,1);
> 
> Diagnostic Messages for this Task:
> Error: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row {"tmp_values_col1":"1","tmp_values_col2":"1"}
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:179)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
> at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1783)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row {"tmp_values_col1":"1","tmp_values_col2":"1"}
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:507)
> at 
> org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:170)
> ... 8 more
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> org.apache.hadoop.hive.serde2.SerDeException: java.lang.RuntimeException: 
> Hive internal error.
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:733)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
> at 
> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
> at 
> org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:97)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:157)
> at 
> org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:497)
> ... 9 more
> Caused by: org.apache.hadoop.hive.serde2.SerDeException: 
> java.lang.RuntimeException: Hive internal error.
> at 
> org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:286)
> at 
> org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:668)
> ... 15 more
> Caused by: java.lang.RuntimeException: Hive internal error.
> at 
> org.apache.hadoop.hive.serde2.lazy.LazyUtils.writePrimitive(LazyUtils.java:328)
> at 
> org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:220)
> at 
> org.apache.hadoop.hive.hbase.HBaseRowSerializer.serializeField(HBaseRowSerializer.java:194)
> at 
> org.apache.hadoop.hive.hbase.HBaseRowSerializer.serialize(HBaseRowSerializer.java:118)
> at 
> org.apache.hadoop.hive.hbase.HBaseSerDe.serialize(HBaseSerDe.java:282)
> ... 16 more 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15882) HS2 generating high memory pressure with many partitions and concurrent queries

2017-02-10 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev updated HIVE-15882:
--
Attachment: hs2-crash-2000p-500m-50q.txt

> HS2 generating high memory pressure with many partitions and concurrent 
> queries
> ---
>
> Key: HIVE-15882
> URL: https://issues.apache.org/jira/browse/HIVE-15882
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
> Attachments: hs2-crash-2000p-500m-50q.txt
>
>
> I've created a Hive table with 2000 partitions, each backed by two files, 
> with one row in each file. When I execute some number of concurrent queries 
> against this table, e.g. as follows
> {code}
> for i in `seq 1 50`; do beeline -u jdbc:hive2://localhost:1 -n admin -p 
> admin -e "select count(i_f_1) from misha_table;" & done
> {code}
> it results in a big memory spike. With 20 queries I caused an OOM in an HS2 
> server with -Xmx200m, and with 50 queries in one with -Xmx500m.
> I am attaching the results of jxray (www.jxray.com) analysis of a heap dump 
> that was generated in the 50queries/500m heap scenario. It suggests that 
> there are several opportunities to reduce memory pressure with not very 
> invasive changes to the code:
> 1. 24.5% of memory is wasted by duplicate strings (see section 6). With 
> String.intern() calls added in the ~10 relevant places in the code, this 
> overhead can be greatly reduced.
> 2. Almost 20% of memory is wasted due to various suboptimally used 
> collections (see section 8). There are many maps and lists that are either 
> empty or have just 1 element. By modifying the code that creates and 
> populates these collections, we could likely save 5-10% of memory.
> 3. Almost 20% of memory is used by instances of java.util.Properties. It 
> looks like these objects are highly duplicated, since for each Partition each 
> concurrently running query creates its own copy of Partition, PartitionDesc, 
> and Properties. Thus we have nearly 100,000 (50 queries * 2,000 partitions) 
> Properties objects in memory. By interning/deduplicating these objects we may 
> be able to save perhaps 15% of memory.
> So overall, I think there is a good chance to reduce HS2 memory consumption 
> in this scenario by ~40%.
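For points 1 and 3, the change can be as small as routing hot values through a 
single canonicalizing map, so that 50 queries x 2,000 partitions share one 
instance per distinct value instead of private copies. A minimal sketch of the 
idea (names are illustrative, not the eventual patch):

{code}
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public final class CanonicalizerSketch {
    private static final ConcurrentMap<String, String> POOL =
            new ConcurrentHashMap<>();

    // Returns one shared instance per distinct string value.
    public static String canonicalize(String s) {
        if (s == null) {
            return null;
        }
        String prior = POOL.putIfAbsent(s, s);
        return prior != null ? prior : s;
    }
}
{code}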



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-15882) HS2 generating high memory pressure with many partitions and concurrent queries

2017-02-10 Thread Misha Dmitriev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Misha Dmitriev reassigned HIVE-15882:
-


> HS2 generating high memory pressure with many partitions and concurrent 
> queries
> ---
>
> Key: HIVE-15882
> URL: https://issues.apache.org/jira/browse/HIVE-15882
> Project: Hive
>  Issue Type: Improvement
>  Components: HiveServer2
>Reporter: Misha Dmitriev
>Assignee: Misha Dmitriev
>
> I've created a Hive table with 2000 partitions, each backed by two files, 
> with one row in each file. When I execute some number of concurrent queries 
> against this table, e.g. as follows
> {code}
> for i in `seq 1 50`; do beeline -u jdbc:hive2://localhost:1 -n admin -p 
> admin -e "select count(i_f_1) from misha_table;" & done
> {code}
> it results in a big memory spike. With 20 queries I caused an OOM in an HS2 
> server with -Xmx200m, and with 50 queries in one with -Xmx500m.
> I am attaching the results of jxray (www.jxray.com) analysis of a heap dump 
> that was generated in the 50queries/500m heap scenario. It suggests that 
> there are several opportunities to reduce memory pressure with not very 
> invasive changes to the code:
> 1. 24.5% of memory is wasted by duplicate strings (see section 6). With 
> String.intern() calls added in the ~10 relevant places in the code, this 
> overhead can be greatly reduced.
> 2. Almost 20% of memory is wasted due to various suboptimally used 
> collections (see section 8). There are many maps and lists that are either 
> empty or have just 1 element. By modifying the code that creates and 
> populates these collections, we could likely save 5-10% of memory.
> 3. Almost 20% of memory is used by instances of java.util.Properties. It 
> looks like these objects are highly duplicated, since for each Partition each 
> concurrently running query creates its own copy of Partition, PartitionDesc, 
> and Properties. Thus we have nearly 100,000 (50 queries * 2,000 partitions) 
> Properties objects in memory. By interning/deduplicating these objects we may 
> be able to save perhaps 15% of memory.
> So overall, I think there is a good chance to reduce HS2 memory consumption 
> in this scenario by ~40%.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15665) LLAP: OrcFileMetadata objects in cache can impact heap usage

2017-02-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15665:

Attachment: HIVE-15665.WIP.patch

WIP patch that refactors the buffers for multiple-type support and changes file 
metadata to a buffer. Stripe metadata still needs work, since it doesn't have a 
convenient method to get a buffer, and row indexes will probably also need 
separate caching.

> LLAP: OrcFileMetadata objects in cache can impact heap usage
> 
>
> Key: HIVE-15665
> URL: https://issues.apache.org/jira/browse/HIVE-15665
> Project: Hive
>  Issue Type: Improvement
>  Components: llap
>Reporter: Rajesh Balamohan
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15665.WIP.patch
>
>
> OrcFileMetadata internally has filestats, stripestats etc which are allocated 
> in heap. On large data sets, this could have an impact on the heap usage and 
> the memory usage by different executors in LLAP.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15878) LLAP text cache: bug in last merge

2017-02-10 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862154#comment-15862154
 ] 

Gopal V commented on HIVE-15878:


LGTM - +1.

Seeing hit-rates again now.

> LLAP text cache: bug in last merge
> --
>
> Key: HIVE-15878
> URL: https://issues.apache.org/jira/browse/HIVE-15878
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15878.patch
>
>
> While rebasing the last patch, a bug was introduced.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15489) Alternatively use table scan stats for HoS

2017-02-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862128#comment-15862128
 ] 

Hive QA commented on HIVE-15489:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12852162/HIVE-15489.5.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 38 failed/errored test(s), 10246 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[index_auto_mult_tables] 
(batchId=78)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=140)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join1]
 (batchId=160)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[vector_outer_join2]
 (batchId=160)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join0] 
(batchId=132)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join13] 
(batchId=129)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join22] 
(batchId=118)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join2] 
(batchId=122)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join30] 
(batchId=108)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join31] 
(batchId=114)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join_stats2] 
(batchId=132)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_join_stats] 
(batchId=115)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_sortmerge_join_12]
 (batchId=109)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[auto_sortmerge_join_9]
 (batchId=123)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucket_map_join_spark4]
 (batchId=95)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucket_map_join_tez1]
 (batchId=131)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[bucket_map_join_tez2]
 (batchId=100)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[cross_product_check_2]
 (batchId=133)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[identity_project_remove_skip]
 (batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join28] 
(batchId=131)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join29] 
(batchId=114)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join31] 
(batchId=133)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join32] 
(batchId=103)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join32_lessSize] 
(batchId=98)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join33] 
(batchId=102)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[join_star] 
(batchId=108)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[mapjoin_mapjoin] 
(batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[mapjoin_subquery2] 
(batchId=98)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[mapjoin_subquery] 
(batchId=116)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[reduce_deduplicate_exclude_join]
 (batchId=128)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[smb_mapjoin_17] 
(batchId=97)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[smb_mapjoin_25] 
(batchId=99)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[subquery_multiinsert]
 (batchId=131)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_left_outer_join]
 (batchId=105)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vector_mapjoin_reduce]
 (batchId=129)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[vectorized_nested_mapjoin]
 (batchId=102)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3500/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3500/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3500/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 38 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12852162 - PreCommit-HIVE-Build

> Alternatively use table scan stats for HoS
> --
>
> Key: HIVE-15489
> URL: https://issues.apache.org/jira/browse/HIVE-15489
> Project: Hive
>   

[jira] [Commented] (HIVE-15881) Use new thread count variable name instead of mapred.dfsclient.parallelism.max

2017-02-10 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862099#comment-15862099
 ] 

Sahil Takiar commented on HIVE-15881:
-

As far as I can tell, this config *is only used by Hive* across the entire 
Hadoop ecosystem.

The only reference I have found to it is here: 
https://github.com/facebookarchive/hadoop-20/blob/master/src/mapred/org/apache/hadoop/mapred/FileInputFormat.java
 - but this seems to be an archived, Facebook-specific fork of Hadoop. I've 
checked Hadoop trunk a few times, including old commits, and have found no 
reference to this config.

It was originally added in HIVE-2051 by [~sdong].

> Use new thread count variable name instead of mapred.dfsclient.parallelism.max
> --
>
> Key: HIVE-15881
> URL: https://issues.apache.org/jira/browse/HIVE-15881
> Project: Hive
>  Issue Type: Task
>  Components: Query Planning
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Minor
>
> The Utilities class has two methods, {{getInputSummary}} and 
> {{getInputPaths}}, that use the variable {{mapred.dfsclient.parallelism.max}} 
> to get the summary of a list of input locations in parallel. These methods 
> are Hive related, but the variable name does not look like it is specific to 
> Hive.
> Also, the above variable is not in HiveConf nor used anywhere else. I just 
> found a reference in the Hadoop MR1 code.
> I'd like to propose deprecating {{mapred.dfsclient.parallelism.max}} and 
> using a different variable name, such as 
> {{hive.get.input.listing.num.threads}}, that reflects the intention of the 
> variable. The removal of the old variable might happen in Hive 3.x.
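A hedged sketch of the proposed deprecation step: read the new Hive-owned key 
first and fall back to the legacy one while it still exists. Note that 
{{hive.get.input.listing.num.threads}} is the name proposed in this issue, not 
an existing HiveConf constant:

{code}
import org.apache.hadoop.conf.Configuration;

public class ListingThreadsSketch {
    static int getInputListingThreads(Configuration conf) {
        // Legacy key, kept only for backward compatibility during deprecation.
        int legacy = conf.getInt("mapred.dfsclient.parallelism.max", 0);
        // Proposed replacement key wins when set.
        return conf.getInt("hive.get.input.listing.num.threads", legacy);
    }
}
{code}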



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15880) Allow insert overwrite query to use auto.purge table property

2017-02-10 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862096#comment-15862096
 ] 

Vihang Karajgaonkar commented on HIVE-15880:


Hi [~ashutoshc] [~spena], just wanted to get your opinion on this. Is this 
something that you think would be useful/consistent, or do you see any issues 
if we decide to implement this?

> Allow insert overwrite query to use auto.purge table property
> -
>
> Key: HIVE-15880
> URL: https://issues.apache.org/jira/browse/HIVE-15880
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>
> It seems inconsistent that the auto.purge property is not considered when we 
> do an INSERT OVERWRITE, while it is when we do a DROP TABLE.
> DROP TABLE doesn't move table data to Trash when auto.purge is set to true:
> {noformat}
> > create table temp(col1 string, col2 string);
> No rows affected (0.064 seconds)
> > alter table temp set tblproperties('auto.purge'='true');
> No rows affected (0.083 seconds)
> > insert into temp values ('test', 'test'), ('test2', 'test2');
> No rows affected (25.473 seconds)
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt   3 hive hive 22 2017-02-09 13:03 
> /user/hive/warehouse/temp/00_0
> #
> > drop table temp;
> No rows affected (0.242 seconds)
> # hdfs dfs -ls /user/hive/warehouse/temp
> ls: `/user/hive/warehouse/temp': No such file or directory
> #
> # sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
> #
> {noformat}
> An INSERT OVERWRITE query moves the table data to Trash even when auto.purge 
> is set to true:
> {noformat}
> > create table temp(col1 string, col2 string);
> > alter table temp set tblproperties('auto.purge'='true');
> > insert into temp values ('test', 'test'), ('test2', 'test2');
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt   3 hive hive 22 2017-02-09 13:07 
> /user/hive/warehouse/temp/00_0
> #
> > insert overwrite table temp select * from dummy;
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt   3 hive hive 26 2017-02-09 13:08 
> /user/hive/warehouse/temp/00_0
> # sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
> Found 1 items
> drwx--   - hive hive  0 2017-02-09 13:08 
> /user/hive/.Trash/Current/user/hive/warehouse/temp
> #
> {noformat}
> While move operations are not very costly on HDFS, they can be a significant 
> overhead on slow FileSystems like S3. This could improve the performance of 
> {{INSERT OVERWRITE TABLE}} queries, especially when there are a large number 
> of partitions on tables located on S3, should the user wish to set the 
> auto.purge property to true.
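A minimal sketch of the proposed behavior, assuming the caller has already read 
the table's auto.purge property (illustration only, not the eventual patch; 
Trash.moveToAppropriateTrash is the standard Hadoop API for the non-purge 
path):

{code}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.Trash;

public class OverwritePurgeSketch {
    // Mirror DROP TABLE semantics: with auto.purge=true, delete the data being
    // overwritten outright instead of moving it to Trash first.
    static void removeOldData(FileSystem fs, Path oldData, Configuration conf,
            String autoPurgeValue) throws IOException {
        if ("true".equalsIgnoreCase(autoPurgeValue)) {
            fs.delete(oldData, true);
        } else {
            Trash.moveToAppropriateTrash(fs, oldData, conf);
        }
    }
}
{code}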



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15873) Remove Windows-specific code

2017-02-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862060#comment-15862060
 ] 

Ashutosh Chauhan commented on HIVE-15873:
-

+1

> Remove Windows-specific code
> 
>
> Key: HIVE-15873
> URL: https://issues.apache.org/jira/browse/HIVE-15873
> Project: Hive
>  Issue Type: Bug
>Reporter: Gunther Hagleitner
>Assignee: Gunther Hagleitner
> Attachments: HIVE-15873.1.patch, HIVE-15873.2.patch
>
>
> I know a lot of work initially went into supporting UTs, the runtime, etc. on 
> Windows, but this code seems to have been rotting. 
> There have been no updates to the Windows-specific test files, nor any new 
> code to keep new features compatible.
> We're also not running the tests or builds on Windows, which is a real 
> impediment to keeping that code healthy.
> The code is sprinkled all over the codebase, which makes it hard to maintain. 
> I think we're better off removing it.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15769) Support view creation in CBO

2017-02-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862027#comment-15862027
 ] 

Hive QA commented on HIVE-15769:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12852131/HIVE-15769.06.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10247 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[create_or_replace_view4]
 (batchId=86)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[create_view_failure3]
 (batchId=86)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[create_view_failure6]
 (batchId=86)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[create_view_failure7]
 (batchId=85)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[create_view_failure8]
 (batchId=85)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[create_view_failure9]
 (batchId=86)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3499/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3499/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3499/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12852131 - PreCommit-HIVE-Build

> Support view creation in CBO
> 
>
> Key: HIVE-15769
> URL: https://issues.apache.org/jira/browse/HIVE-15769
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15769.01.patch, HIVE-15769.02.patch, 
> HIVE-15769.03.patch, HIVE-15769.04.patch, HIVE-15769.05.patch, 
> HIVE-15769.06.patch
>
>
> Right now, set operators need to run in CBO. If a view contains a set op, it 
> will throw an exception. We need to support view creation in CBO.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15489) Alternatively use table scan stats for HoS

2017-02-10 Thread Chao Sun (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated HIVE-15489:

Attachment: HIVE-15489.5.patch

> Alternatively use table scan stats for HoS
> --
>
> Key: HIVE-15489
> URL: https://issues.apache.org/jira/browse/HIVE-15489
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark, Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15489.1.patch, HIVE-15489.2.patch, 
> HIVE-15489.3.patch, HIVE-15489.4.patch, HIVE-15489.5.patch, 
> HIVE-15489.wip.patch
>
>
> For MapJoin in HoS, we should provide an option to only use stats in the TS 
> rather than the populated stats in each of the join branches. This could be 
> pretty conservative but more reliable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15388) HiveParser spends lots of time in parsing queries with lots of "("

2017-02-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861998#comment-15861998
 ] 

Ashutosh Chauhan commented on HIVE-15388:
-

+1 on the latest patch, LGTM. [~pxiong], please create a follow-up JIRA for 
vectorization.

> HiveParser spends lots of time in parsing queries with lots of "("
> --
>
> Key: HIVE-15388
> URL: https://issues.apache.org/jira/browse/HIVE-15388
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15388.01.patch, HIVE-15388.02.patch, 
> HIVE-15388.03.patch, HIVE-15388.04.patch, HIVE-15388.05.patch, 
> HIVE-15388.06.patch, HIVE-15388.07.patch, HIVE-15388.08.patch, 
> hive-15388.stacktrace.txt
>
>
> Branch: apache-master (applicable to previous releases as well)
> Queries generated via tools can have lots of "(" for "AND/OR" conditions. 
> This causes huge delays in the parsing phase when the number of expressions 
> is high.
> e.g
> {noformat}
> SELECT `iata`,
>`airport`,
>`city`,
>`state`,
>`country`,
>`lat`,
>`lon`
> FROM airports
> WHERE 
> ((`airports`.`airport`
>  = "Thigpen"
>   
>   OR `airports`.`airport` = "Astoria Regional")
>   
>  OR `airports`.`airport` = "Warsaw Municipal")
>   
> OR `airports`.`airport` = "John F Kennedy Memorial")
>  
> OR `airports`.`airport` = "Hall-Miller Municipal")
> 
> OR `airports`.`airport` = "Atqasuk")
>OR 
> `airports`.`airport` = "William B Hartsfield-Atlanta Intl")
>   OR 
> `airports`.`airport` = "Artesia Municipal")
>  OR 
> `airports`.`airport` = "Outagamie County Regional")
> OR 
> `airports`.`airport` = "Watertown Municipal")
>OR 
> `airports`.`airport` = "Augusta State")
>   OR 
> `airports`.`airport` = "Aurora Municipal")
>  OR 
> `airports`.`airport` = "Alakanuk")
> OR 
> `airports`.`airport` = "Austin Municipal")
>OR 
> `airports`.`airport` = "Auburn Municipal")
>   OR 
> `airports`.`airport` = "Auburn-Opelik")
>  OR 
> `airports`.`airport` = "Austin-Bergstrom International")
> OR 
> `airports`.`airport` = "Wausau Municipal")
>OR 
> `airports`.`airport` = "Mecklenburg-Brunswick Regional")
>   OR 
> `airports`.`airport` = "Alva Regional")
>  OR 
> `airports`.`airport` = "Asheville Regional")
> OR 
> `airports`.`airport` = "Avon Park Municipal")
>OR 
> `airports`.`airport` = "Wilkes-Barre/Scranton Intl")
>   OR 
> `airports`.`airport` = "Marana Northwest Regional")
>  OR 
> `airports`.`airport` = "Catalina")
> OR 
> `airports`.`airport` = "Washington Municipal")
>OR 
> `airports`.`airport` = "Wainwright")
>   OR `airports`.`airport` 
> = "West Memphis Municipal")
>  OR `airports`.`airport` 
> = "Arlington Municipal")
> OR `airports`.`airport` = 
> "Algona Municipal")
>   

[jira] [Commented] (HIVE-15881) Use new thread count variable name instead of mapred.dfsclient.parallelism.max

2017-02-10 Thread Thomas Poepping (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861995#comment-15861995
 ] 

Thomas Poepping commented on HIVE-15881:


From what I can tell, saying
bq. These methods are Hive related
isn't exactly true. 

I agree that the configuration value name is confusing, mainly because of the 
inclusion of {{mapred}} (we support Tez too, right? ;) ). But it seems like the 
dfsclient (HDFS, S3a, etc.) should actually have some level of control over the 
number of threads being used to access it. What if we have some {{FileSystem}} 
implementation that, for some reason, is non-threadsafe? It should be able to 
limit threads. Renaming the config to something like 
{{hive.get.input.listing.num.threads}} can hide what's _really_ happening here, 
which would be the dfsclient controlling its maximum allowable number of 
parallel accessors on its own.

That being said, I can't find a link in open source Hadoop to this 
configuration value. Can you link it here? If it is actually only used by Hive 
across the entire Hadoop ecosystem, then I have no problem doing what you 
suggest. I just have a hard time believing that's the case.

> Use new thread count variable name instead of mapred.dfsclient.parallelism.max
> --
>
> Key: HIVE-15881
> URL: https://issues.apache.org/jira/browse/HIVE-15881
> Project: Hive
>  Issue Type: Task
>  Components: Query Planning
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Minor
>
> The Utilities class has two methods, {{getInputSummary}} and 
> {{getInputPaths}}, that use the variable {{mapred.dfsclient.parallelism.max}} 
> to get the summary of a list of input locations in parallel. These methods 
> are Hive related, but the variable name does not look like it is specific to 
> Hive.
> Also, the above variable is not in HiveConf nor used anywhere else. I just 
> found a reference in the Hadoop MR1 code.
> I'd like to propose deprecating {{mapred.dfsclient.parallelism.max}} and 
> using a different variable name, such as 
> {{hive.get.input.listing.num.threads}}, that reflects the intention of the 
> variable. The removal of the old variable might happen in Hive 3.x.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15388) HiveParser spends lots of time in parsing queries with lots of "("

2017-02-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861980#comment-15861980
 ] 

Hive QA commented on HIVE-15388:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12852128/HIVE-15388.08.patch

{color:green}SUCCESS:{color} +1 due to 7 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10243 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[select_charliteral]
 (batchId=86)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=223)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
 (batchId=230)
org.apache.hadoop.hive.ql.security.TestAuthorizationPreEventListener.testListener
 (batchId=209)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3498/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3498/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3498/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12852128 - PreCommit-HIVE-Build

> HiveParser spends lots of time in parsing queries with lots of "("
> --
>
> Key: HIVE-15388
> URL: https://issues.apache.org/jira/browse/HIVE-15388
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15388.01.patch, HIVE-15388.02.patch, 
> HIVE-15388.03.patch, HIVE-15388.04.patch, HIVE-15388.05.patch, 
> HIVE-15388.06.patch, HIVE-15388.07.patch, HIVE-15388.08.patch, 
> hive-15388.stacktrace.txt
>
>
> Branch: apache-master (applicable with previous releases as well)
> Queries generated via tools can have lots of "(" for "AND/OR" conditions. 
> This causes huge delays in the parsing phase when the number of expressions 
> is high.
> e.g.:
> {noformat}
> SELECT `iata`,
>`airport`,
>`city`,
>`state`,
>`country`,
>`lat`,
>`lon`
> FROM airports
> WHERE 
> ((`airports`.`airport`
>  = "Thigpen"
>   
>   OR `airports`.`airport` = "Astoria Regional")
>   
>  OR `airports`.`airport` = "Warsaw Municipal")
>   
> OR `airports`.`airport` = "John F Kennedy Memorial")
>  
> OR `airports`.`airport` = "Hall-Miller Municipal")
> 
> OR `airports`.`airport` = "Atqasuk")
>OR 
> `airports`.`airport` = "William B Hartsfield-Atlanta Intl")
>   OR 
> `airports`.`airport` = "Artesia Municipal")
>  OR 
> `airports`.`airport` = "Outagamie County Regional")
> OR 
> `airports`.`airport` = "Watertown Municipal")
>OR 
> `airports`.`airport` = "Augusta State")
>   OR 
> `airports`.`airport` = "Aurora Municipal")
>  OR 
> `airports`.`airport` = "Alakanuk")
> OR 
> `airports`.`airport` = "Austin Municipal")
>OR 
> `airports`.`airport` = "Auburn Municipal")
>   OR 
> `airports`.`airport` = "Auburn-Opelik")
>  OR 

[jira] [Commented] (HIVE-15489) Alternatively use table scan stats for HoS

2017-02-10 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861978#comment-15861978
 ] 

Xuefu Zhang commented on HIVE-15489:


{quote}
I've thought about this. The downside is that many good cases will be turned 
into reduce joins as well. But I think this config is mainly for stability, so 
it should be fine, as long as we document this well. Will add to the next patch.
{quote}
My concern is that the map joins further down may also suffer the consequences 
of inaccurate stats.

{quote}
Do you think we should combine these two, since they are similar?
{quote}
It's probably better to have two, as they control different functionality.

> Alternatively use table scan stats for HoS
> --
>
> Key: HIVE-15489
> URL: https://issues.apache.org/jira/browse/HIVE-15489
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark, Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15489.1.patch, HIVE-15489.2.patch, 
> HIVE-15489.3.patch, HIVE-15489.4.patch, HIVE-15489.wip.patch
>
>
> For MapJoin in HoS, we should provide an option to only use stats in the TS 
> rather than the populated stats in each of the join branch. This could be 
> pretty conservative but more reliable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15769) Support view creation in CBO

2017-02-10 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861977#comment-15861977
 ] 

Ashutosh Chauhan commented on HIVE-15769:
-

+1 pending tests

> Support view creation in CBO
> 
>
> Key: HIVE-15769
> URL: https://issues.apache.org/jira/browse/HIVE-15769
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15769.01.patch, HIVE-15769.02.patch, 
> HIVE-15769.03.patch, HIVE-15769.04.patch, HIVE-15769.05.patch, 
> HIVE-15769.06.patch
>
>
> Right now, a set operator needs to run in CBO. If a view contains a set op, 
> it will throw an exception. We need to support view creation in CBO.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15769) Support view creation in CBO

2017-02-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15769:
---
Attachment: HIVE-15769.06.patch

> Support view creation in CBO
> 
>
> Key: HIVE-15769
> URL: https://issues.apache.org/jira/browse/HIVE-15769
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15769.01.patch, HIVE-15769.02.patch, 
> HIVE-15769.03.patch, HIVE-15769.04.patch, HIVE-15769.05.patch, 
> HIVE-15769.06.patch
>
>
> Right now, a set operator needs to run in CBO. If a view contains a set op, 
> it will throw an exception. We need to support view creation in CBO.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15769) Support view creation in CBO

2017-02-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15769:
---
Status: Patch Available  (was: Open)

> Support view creation in CBO
> 
>
> Key: HIVE-15769
> URL: https://issues.apache.org/jira/browse/HIVE-15769
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15769.01.patch, HIVE-15769.02.patch, 
> HIVE-15769.03.patch, HIVE-15769.04.patch, HIVE-15769.05.patch, 
> HIVE-15769.06.patch
>
>
> Right now, a set operator needs to run in CBO. If a view contains a set op, 
> it will throw an exception. We need to support view creation in CBO.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-15881) Use new thread count variable name instead of mapred.dfsclient.parallelism.max

2017-02-10 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-15881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña reassigned HIVE-15881:
--


> Use new thread count variable name instead of mapred.dfsclient.parallelism.max
> --
>
> Key: HIVE-15881
> URL: https://issues.apache.org/jira/browse/HIVE-15881
> Project: Hive
>  Issue Type: Task
>  Components: Query Planning
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Minor
>
> The Utilities class has two methods, {{getInputSummary}} and 
> {{getInputPaths}}, that use the variable {{mapred.dfsclient.parallelism.max}} 
> to get the summary of a list of input locations in parallel. These methods 
> are Hive-related, but the variable name does not look like it is specific to 
> Hive.
> Also, the above variable is not in HiveConf nor used anywhere else. I only 
> found a reference in the Hadoop MR1 code.
> I'd like to propose deprecating {{mapred.dfsclient.parallelism.max}} and 
> using a different variable name, such as 
> {{hive.get.input.listing.num.threads}}, that reflects the intention of the 
> variable. The removal of the old variable might happen in Hive 3.x.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15769) Support view creation in CBO

2017-02-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15769:
---
Status: Open  (was: Patch Available)

> Support view creation in CBO
> 
>
> Key: HIVE-15769
> URL: https://issues.apache.org/jira/browse/HIVE-15769
> Project: Hive
>  Issue Type: Bug
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15769.01.patch, HIVE-15769.02.patch, 
> HIVE-15769.03.patch, HIVE-15769.04.patch, HIVE-15769.05.patch, 
> HIVE-15769.06.patch
>
>
> Right now, a set operator needs to run in CBO. If a view contains a set op, 
> it will throw an exception. We need to support view creation in CBO.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15881) Use new thread count variable name instead of mapred.dfsclient.parallelism.max

2017-02-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/HIVE-15881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861943#comment-15861943
 ] 

Sergio Peña commented on HIVE-15881:


[~ashutoshc] [~poeppt] [~stakiar] What do you think about this change? We have 
found this variable name a little confusing because it looks like a 
Hadoop-specific variable, while the Utilities class is used only by Hive. The 
new and old variables will do the same thing during Hive 2.x.
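
One way to keep both names working through 2.x is Hadoop's key-deprecation 
mapping (an illustrative sketch only; the patch may wire this differently):
{noformat}
// Map the old MR1-flavored key onto the proposed Hive-scoped one, so either
// spelling resolves to the same thread-count setting during Hive 2.x.
Configuration.addDeprecation(
    "mapred.dfsclient.parallelism.max",      // legacy name, slated for removal
    "hive.get.input.listing.num.threads");   // proposed replacement
{noformat}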

> Use new thread count variable name instead of mapred.dfsclient.parallelism.max
> --
>
> Key: HIVE-15881
> URL: https://issues.apache.org/jira/browse/HIVE-15881
> Project: Hive
>  Issue Type: Task
>  Components: Query Planning
>Reporter: Sergio Peña
>Assignee: Sergio Peña
>Priority: Minor
>
> The Utilities class has two methods, {{getInputSummary}} and 
> {{getInputPaths}}, that use the variable {{mapred.dfsclient.parallelism.max}} 
> to get the summary of a list of input locations in parallel. These methods 
> are Hive-related, but the variable name does not look like it is specific to 
> Hive.
> Also, the above variable is not in HiveConf nor used anywhere else. I only 
> found a reference in the Hadoop MR1 code.
> I'd like to propose deprecating {{mapred.dfsclient.parallelism.max}} and 
> using a different variable name, such as 
> {{hive.get.input.listing.num.threads}}, that reflects the intention of the 
> variable. The removal of the old variable might happen in Hive 3.x.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15878) LLAP text cache: bug in last merge

2017-02-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861922#comment-15861922
 ] 

Hive QA commented on HIVE-15878:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12852125/HIVE-15878.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10246 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr]
 (batchId=140)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3497/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3497/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3497/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 5 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12852125 - PreCommit-HIVE-Build

> LLAP text cache: bug in last merge
> --
>
> Key: HIVE-15878
> URL: https://issues.apache.org/jira/browse/HIVE-15878
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15878.patch
>
>
> While rebasing the last patch, a bug was introduced.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-15880) Allow insert overwrite query to use auto.purge table property

2017-02-10 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reassigned HIVE-15880:
--


> Allow insert overwrite query to use auto.purge table property
> -
>
> Key: HIVE-15880
> URL: https://issues.apache.org/jira/browse/HIVE-15880
> Project: Hive
>  Issue Type: Improvement
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>
> It seems inconsistent that the auto.purge property is not considered when we 
> do an INSERT OVERWRITE, while it is when we do a DROP TABLE.
> DROP TABLE doesn't move table data to Trash when auto.purge is set to true:
> {noformat}
> > create table temp(col1 string, col2 string);
> No rows affected (0.064 seconds)
> > alter table temp set tblproperties('auto.purge'='true');
> No rows affected (0.083 seconds)
> > insert into temp values ('test', 'test'), ('test2', 'test2');
> No rows affected (25.473 seconds)
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt   3 hive hive 22 2017-02-09 13:03 
> /user/hive/warehouse/temp/00_0
> #
> > drop table temp;
> No rows affected (0.242 seconds)
> # hdfs dfs -ls /user/hive/warehouse/temp
> ls: `/user/hive/warehouse/temp': No such file or directory
> #
> # sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
> #
> {noformat}
> An INSERT OVERWRITE query moves the table data to Trash even when auto.purge 
> is set to true:
> {noformat}
> > create table temp(col1 string, col2 string);
> > alter table temp set tblproperties('auto.purge'='true');
> > insert into temp values ('test', 'test'), ('test2', 'test2');
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt   3 hive hive 22 2017-02-09 13:07 
> /user/hive/warehouse/temp/00_0
> #
> > insert overwrite table temp select * from dummy;
> # hdfs dfs -ls /user/hive/warehouse/temp
> Found 1 items
> -rwxrwxrwt   3 hive hive 26 2017-02-09 13:08 
> /user/hive/warehouse/temp/00_0
> # sudo -u hive hdfs dfs -ls /user/hive/.Trash/Current/user/hive/warehouse
> Found 1 items
> drwx--   - hive hive  0 2017-02-09 13:08 
> /user/hive/.Trash/Current/user/hive/warehouse/temp
> #
> {noformat}
> While move operations are not very costly on HDFS, they can be a significant 
> overhead on slow FileSystems like S3. Honoring auto.purge here could improve 
> the performance of {{INSERT OVERWRITE TABLE}} queries, especially when there 
> are a large number of partitions on tables located on S3, should the user 
> wish to set the auto.purge property to true.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15872) The PERCENTILE UDAF does not work with empty set

2017-02-10 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861898#comment-15861898
 ] 

Wei Zheng commented on HIVE-15872:
--

[~debugger87] Thanks for the patch. The fix looks good. Can you add a unit test 
for the failing case?
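
A minimal repro along these lines could back that unit test (a hedged sketch: 
with the fix, an empty group should yield NULL instead of throwing):
{noformat}
-- the predicate matches no rows, so the aggregate sees an empty set
select percentile_approx(col0, array(0.50, 0.90))
from my_table
where 1 = 0;
-- expected after the fix: NULL (previously: IndexOutOfBoundsException)
{noformat}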

> The PERCENTILE UDAF does not work with empty set
> 
>
> Key: HIVE-15872
> URL: https://issues.apache.org/jira/browse/HIVE-15872
> Project: Hive
>  Issue Type: Bug
>  Components: UDF
>Reporter: Chaozhong Yang
>Assignee: Chaozhong Yang
> Fix For: 2.1.2
>
> Attachments: HIVE-15872.patch
>
>
> 1. Original SQL:
> select
> percentile_approx(
> column0,
> array(0.50, 0.70, 0.90, 0.95, 0.99)
> )
> from
> my_table
> where
> date = '20170207'
> and column1 = 'value1'
> and column2 = 'value2'
> and column3 = 'value3'
> and column4 = 'value4'
> and column5 = 'value5'
> 2. Exception StackTrace:
> Error: java.lang.RuntimeException: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":[0.0,1.0]}} at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256) at 
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453) at 
> org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:401) at 
> org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:422) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
>  at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158) Caused by: 
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while 
> processing row (tag=0) {"key":{},"value":{"_col0":[0.0,1.0]}} at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244) 
> ... 7 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.IndexOutOfBoundsException: Index: 2, Size: 2 at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:766)
>  at 
> org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235) 
> ... 7 more Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2 
> at java.util.ArrayList.rangeCheck(ArrayList.java:653) at 
> java.util.ArrayList.get(ArrayList.java:429) at 
> org.apache.hadoop.hive.ql.udf.generic.NumericHistogram.merge(NumericHistogram.java:134)
>  at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox$GenericUDAFPercentileApproxEvaluator.merge(GenericUDAFPercentileApprox.java:318)
>  at 
> org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:188)
>  at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:612)
>  at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:851)
>  at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:695)
>  at 
> org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:761)
>  ... 8 more
> 3. Review the data:
> select
> column0
> from
> my_table
> where
> date = '20170207'
> and column1 = 'value1'
> and column2 = 'value2'
> and column3 = 'value3'
> and column4 = 'value4'
> and column5 = 'value5'
> After running this SQL, we found the result is NULL.
> 4. What's the meaning of [0.0, 1.0] in the stacktrace?
> In GenericUDAFPercentileApproxEvaluator, the method `merge` should process an 
> ArrayList named partialHistogram. Normally, the basic structure of 
> partialHistogram is [npercentiles, percentile0, percentile1..., nbins, 
> bin0.x, bin0.y, bin1.x, bin1.y,...]. However, if we process NULL (empty set) 
> column values, the partialHistogram will only contain [npercentiles(0), 
> nbins(1)]. That's the reason why the stacktrace shows a strange row of data: 
> {"key":{},"value":{"_col0":[0.0,1.0]}}
> Before we call histogram#merge (the on-line histogram algorithm from the 
> paper http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf ), the 
> partialHistogram should remove the elements which store percentiles, via 
> `partialHistogram.subList(0, nquantiles+1).clear();`. In the case of an 
> empty set, GenericUDAFPercentileApproxEvaluator will not remove percentiles. 
> Consequently, NumericHistogram will merge a list which contains only 2 
> elements ([0.0, 1.0]) and throws IndexOutOfBoundsException.
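
For illustration, the guard described above could look like this in 
GenericUDAFPercentileApproxEvaluator.merge() (a hedged sketch, not necessarily 
the attached patch):
{noformat}
// An empty group serializes only the two header slots [npercentiles, nbins],
// so bail out before NumericHistogram.merge() indexes past them.
if (partialHistogram == null || partialHistogram.size() < 3) {
  return; // empty set: leave the buffer untouched; the result stays NULL
}
{noformat}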



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-15879) Fix HiveMetaStoreChecker.checkPartitionDirs method

2017-02-10 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar reassigned HIVE-15879:
--


> Fix HiveMetaStoreChecker.checkPartitionDirs method
> --
>
> Key: HIVE-15879
> URL: https://issues.apache.org/jira/browse/HIVE-15879
> Project: Hive
>  Issue Type: Bug
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>
> HIVE-15803 fixes the msck hang issue in the 
> HiveMetaStoreChecker.checkPartitionDirs method by adding a check to see if 
> the thread pool has any spare threads. If not, it uses single-threaded 
> listing of the files.
> {noformat}
> if (pool != null) {
>   synchronized (pool) {
> // In case of recursive calls, it is possible to deadlock with TP. 
> Check TP usage here.
> if (pool.getActiveCount() < pool.getMaximumPoolSize()) {
>   useThreadPool = true;
> }
> if (!useThreadPool) {
>   if (LOG.isDebugEnabled()) {
> LOG.debug("Not using threadPool as active count:" + 
> pool.getActiveCount()
> + ", max:" + pool.getMaximumPoolSize());
>   }
> }
>   }
> }
> {noformat}
> Based on the Javadoc of getActiveCount() below,
> bq. Returns the approximate number of threads that are actively executing 
> tasks.
> it returns only an approximate number of threads, and it cannot be 
> guaranteed that it always returns the exact number of active threads. This 
> still exposes the method implementation to the msck hang bug in rare corner 
> cases.
> We could either:
> 1. Use an atomic counter to track exactly how many threads are actively 
> running, or
> 2. Relook at the method itself to make it much simpler, e.g. look into the 
> possibility of changing the recursive implementation to an iterative one 
> where worker threads pick tasks from a queue until the queue is empty (see 
> the sketch after this list).
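
A hedged sketch of option 2, walking the tree level by level so no pooled task 
ever waits on work it submitted itself ({{listSubDirs}} is an illustrative 
helper, not Hive's API):
{noformat}
// Breadth-first listing: submit one bounded task per directory, join the
// whole level, then descend. No task ever blocks on a recursively submitted
// task, so the pool cannot deadlock regardless of getActiveCount() accuracy.
List<Path> level = Collections.singletonList(tablePath);
while (!level.isEmpty()) {
  List<Future<List<Path>>> futures = new ArrayList<>();
  for (final Path dir : level) {
    futures.add(pool.submit(new Callable<List<Path>>() {
      public List<Path> call() throws IOException {
        return listSubDirs(fs, dir);  // hypothetical: child dirs of 'dir'
      }
    }));
  }
  List<Path> next = new ArrayList<>();
  for (Future<List<Path>> f : futures) {
    next.addAll(f.get());             // also where partition checks would go
  }
  level = next;
}
{noformat}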



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15388) HiveParser spends lots of time in parsing queries with lots of "("

2017-02-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15388:
---
Attachment: HIVE-15388.08.patch

> HiveParser spends lots of time in parsing queries with lots of "("
> --
>
> Key: HIVE-15388
> URL: https://issues.apache.org/jira/browse/HIVE-15388
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15388.01.patch, HIVE-15388.02.patch, 
> HIVE-15388.03.patch, HIVE-15388.04.patch, HIVE-15388.05.patch, 
> HIVE-15388.06.patch, HIVE-15388.07.patch, HIVE-15388.08.patch, 
> hive-15388.stacktrace.txt
>
>
> Branch: apache-master (applicable with previous releases as well)
> Queries generated via tools can have lots of "(" for "AND/OR" conditions. 
> This causes huge delays in the parsing phase when the number of expressions 
> is high.
> e.g.:
> {noformat}
> SELECT `iata`,
>`airport`,
>`city`,
>`state`,
>`country`,
>`lat`,
>`lon`
> FROM airports
> WHERE 
> ((`airports`.`airport`
>  = "Thigpen"
>   
>   OR `airports`.`airport` = "Astoria Regional")
>   
>  OR `airports`.`airport` = "Warsaw Municipal")
>   
> OR `airports`.`airport` = "John F Kennedy Memorial")
>  
> OR `airports`.`airport` = "Hall-Miller Municipal")
> 
> OR `airports`.`airport` = "Atqasuk")
>OR 
> `airports`.`airport` = "William B Hartsfield-Atlanta Intl")
>   OR 
> `airports`.`airport` = "Artesia Municipal")
>  OR 
> `airports`.`airport` = "Outagamie County Regional")
> OR 
> `airports`.`airport` = "Watertown Municipal")
>OR 
> `airports`.`airport` = "Augusta State")
>   OR 
> `airports`.`airport` = "Aurora Municipal")
>  OR 
> `airports`.`airport` = "Alakanuk")
> OR 
> `airports`.`airport` = "Austin Municipal")
>OR 
> `airports`.`airport` = "Auburn Municipal")
>   OR 
> `airports`.`airport` = "Auburn-Opelik")
>  OR 
> `airports`.`airport` = "Austin-Bergstrom International")
> OR 
> `airports`.`airport` = "Wausau Municipal")
>OR 
> `airports`.`airport` = "Mecklenburg-Brunswick Regional")
>   OR 
> `airports`.`airport` = "Alva Regional")
>  OR 
> `airports`.`airport` = "Asheville Regional")
> OR 
> `airports`.`airport` = "Avon Park Municipal")
>OR 
> `airports`.`airport` = "Wilkes-Barre/Scranton Intl")
>   OR 
> `airports`.`airport` = "Marana Northwest Regional")
>  OR 
> `airports`.`airport` = "Catalina")
> OR 
> `airports`.`airport` = "Washington Municipal")
>OR 
> `airports`.`airport` = "Wainwright")
>   OR `airports`.`airport` 
> = "West Memphis Municipal")
>  OR `airports`.`airport` 
> = "Arlington Municipal")
> OR `airports`.`airport` = 
> "Algona Municipal")
>OR `airports`.`airport` = 
> "Chandler")
> 

[jira] [Updated] (HIVE-15388) HiveParser spends lots of time in parsing queries with lots of "("

2017-02-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15388:
---
Status: Patch Available  (was: Open)

> HiveParser spends lots of time in parsing queries with lots of "("
> --
>
> Key: HIVE-15388
> URL: https://issues.apache.org/jira/browse/HIVE-15388
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15388.01.patch, HIVE-15388.02.patch, 
> HIVE-15388.03.patch, HIVE-15388.04.patch, HIVE-15388.05.patch, 
> HIVE-15388.06.patch, HIVE-15388.07.patch, HIVE-15388.08.patch, 
> hive-15388.stacktrace.txt
>
>
> Branch: apache-master (applicable with previous releases as well)
> Queries generated via tools can have lots of "(" for "AND/OR" conditions. 
> This causes huge delays in the parsing phase when the number of expressions 
> is high.
> e.g.:
> {noformat}
> SELECT `iata`,
>`airport`,
>`city`,
>`state`,
>`country`,
>`lat`,
>`lon`
> FROM airports
> WHERE 
> ((`airports`.`airport`
>  = "Thigpen"
>   
>   OR `airports`.`airport` = "Astoria Regional")
>   
>  OR `airports`.`airport` = "Warsaw Municipal")
>   
> OR `airports`.`airport` = "John F Kennedy Memorial")
>  
> OR `airports`.`airport` = "Hall-Miller Municipal")
> 
> OR `airports`.`airport` = "Atqasuk")
>OR 
> `airports`.`airport` = "William B Hartsfield-Atlanta Intl")
>   OR 
> `airports`.`airport` = "Artesia Municipal")
>  OR 
> `airports`.`airport` = "Outagamie County Regional")
> OR 
> `airports`.`airport` = "Watertown Municipal")
>OR 
> `airports`.`airport` = "Augusta State")
>   OR 
> `airports`.`airport` = "Aurora Municipal")
>  OR 
> `airports`.`airport` = "Alakanuk")
> OR 
> `airports`.`airport` = "Austin Municipal")
>OR 
> `airports`.`airport` = "Auburn Municipal")
>   OR 
> `airports`.`airport` = "Auburn-Opelik")
>  OR 
> `airports`.`airport` = "Austin-Bergstrom International")
> OR 
> `airports`.`airport` = "Wausau Municipal")
>OR 
> `airports`.`airport` = "Mecklenburg-Brunswick Regional")
>   OR 
> `airports`.`airport` = "Alva Regional")
>  OR 
> `airports`.`airport` = "Asheville Regional")
> OR 
> `airports`.`airport` = "Avon Park Municipal")
>OR 
> `airports`.`airport` = "Wilkes-Barre/Scranton Intl")
>   OR 
> `airports`.`airport` = "Marana Northwest Regional")
>  OR 
> `airports`.`airport` = "Catalina")
> OR 
> `airports`.`airport` = "Washington Municipal")
>OR 
> `airports`.`airport` = "Wainwright")
>   OR `airports`.`airport` 
> = "West Memphis Municipal")
>  OR `airports`.`airport` 
> = "Arlington Municipal")
> OR `airports`.`airport` = 
> "Algona Municipal")
>OR `airports`.`airport` = 
> "Chandler")
>

[jira] [Updated] (HIVE-15388) HiveParser spends lots of time in parsing queries with lots of "("

2017-02-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15388:
---
Status: Open  (was: Patch Available)

> HiveParser spends lots of time in parsing queries with lots of "("
> --
>
> Key: HIVE-15388
> URL: https://issues.apache.org/jira/browse/HIVE-15388
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15388.01.patch, HIVE-15388.02.patch, 
> HIVE-15388.03.patch, HIVE-15388.04.patch, HIVE-15388.05.patch, 
> HIVE-15388.06.patch, HIVE-15388.07.patch, HIVE-15388.08.patch, 
> hive-15388.stacktrace.txt
>
>
> Branch: apache-master (applicable with previous releases as well)
> Queries generated via tools can have lots of "(" for "AND/OR" conditions. 
> This causes huge delays in the parsing phase when the number of expressions 
> is high.
> e.g.:
> {noformat}
> SELECT `iata`,
>`airport`,
>`city`,
>`state`,
>`country`,
>`lat`,
>`lon`
> FROM airports
> WHERE 
> ((`airports`.`airport`
>  = "Thigpen"
>   
>   OR `airports`.`airport` = "Astoria Regional")
>   
>  OR `airports`.`airport` = "Warsaw Municipal")
>   
> OR `airports`.`airport` = "John F Kennedy Memorial")
>  
> OR `airports`.`airport` = "Hall-Miller Municipal")
> 
> OR `airports`.`airport` = "Atqasuk")
>OR 
> `airports`.`airport` = "William B Hartsfield-Atlanta Intl")
>   OR 
> `airports`.`airport` = "Artesia Municipal")
>  OR 
> `airports`.`airport` = "Outagamie County Regional")
> OR 
> `airports`.`airport` = "Watertown Municipal")
>OR 
> `airports`.`airport` = "Augusta State")
>   OR 
> `airports`.`airport` = "Aurora Municipal")
>  OR 
> `airports`.`airport` = "Alakanuk")
> OR 
> `airports`.`airport` = "Austin Municipal")
>OR 
> `airports`.`airport` = "Auburn Municipal")
>   OR 
> `airports`.`airport` = "Auburn-Opelik")
>  OR 
> `airports`.`airport` = "Austin-Bergstrom International")
> OR 
> `airports`.`airport` = "Wausau Municipal")
>OR 
> `airports`.`airport` = "Mecklenburg-Brunswick Regional")
>   OR 
> `airports`.`airport` = "Alva Regional")
>  OR 
> `airports`.`airport` = "Asheville Regional")
> OR 
> `airports`.`airport` = "Avon Park Municipal")
>OR 
> `airports`.`airport` = "Wilkes-Barre/Scranton Intl")
>   OR 
> `airports`.`airport` = "Marana Northwest Regional")
>  OR 
> `airports`.`airport` = "Catalina")
> OR 
> `airports`.`airport` = "Washington Municipal")
>OR 
> `airports`.`airport` = "Wainwright")
>   OR `airports`.`airport` 
> = "West Memphis Municipal")
>  OR `airports`.`airport` 
> = "Arlington Municipal")
> OR `airports`.`airport` = 
> "Algona Municipal")
>OR `airports`.`airport` = 
> "Chandler")
>

[jira] [Commented] (HIVE-15489) Alternatively use table scan stats for HoS

2017-02-10 Thread Chao Sun (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861872#comment-15861872
 ] 

Chao Sun commented on HIVE-15489:
-

Thanks for reviewing the patch, [~xuefuz]!

bq. 1. The new configuration might have a better name. 
"hive.spark.use.ts.stats" seems a little too general. Please consider a more 
specific name, something like "hive_on_spark.use.file.size.for.mapjoin". Very 
minor though.

Sure. I can change that. HIVE-15796 is also going to add a new config 
{{hive.spark.use.op.stats}}. Do you think we should combine these two, since 
they are similar?

bq. 2. For the new property, we probably want to default it to the old 
behavior when checking in. Maybe we can have some test cases run with this new 
configuration on.

Yes. I plan to set the default to false. Setting it to true is just for testing.
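
For illustration, the two toggles under discussion would look like this in a 
session (names as used in this thread and HIVE-15796; proposals, not shipped 
settings):
{noformat}
-- opt in to the conservative TS-stats mode proposed in this patch
set hive.spark.use.ts.stats=true;
-- the related but independent knob being added by HIVE-15796
set hive.spark.use.op.stats=true;
{noformat}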

bq. 3. If the join op isn't coming directly from a table scan, I saw we are 
still using operator stats to decide mapjoin. This can still cause the issue 
of inaccurate estimation, right? Should we just not convert it to a map join 
in such a case?

I've thought about this. The downside is that many good cases will be turned 
into reduce joins as well. But I think this config is mainly for stability, so 
it should be fine, as long as we document this well. Will add to the next patch.

bq. 4. There seem to be some test failures in the above run. Are they related?

I ran these tests locally and didn't see any issue.



> Alternatively use table scan stats for HoS
> --
>
> Key: HIVE-15489
> URL: https://issues.apache.org/jira/browse/HIVE-15489
> Project: Hive
>  Issue Type: Improvement
>  Components: Spark, Statistics
>Affects Versions: 2.2.0
>Reporter: Chao Sun
>Assignee: Chao Sun
> Attachments: HIVE-15489.1.patch, HIVE-15489.2.patch, 
> HIVE-15489.3.patch, HIVE-15489.4.patch, HIVE-15489.wip.patch
>
>
> For MapJoin in HoS, we should provide an option to only use stats in the TS 
> rather than the populated stats in each of the join branch. This could be 
> pretty conservative but more reliable.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15878) LLAP text cache: bug in last merge

2017-02-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15878:

Status: Patch Available  (was: Open)

> LLAP text cache: bug in last merge
> --
>
> Key: HIVE-15878
> URL: https://issues.apache.org/jira/browse/HIVE-15878
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15878.patch
>
>
> While rebasing the last patch, a bug was introduced.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15878) LLAP text cache: bug in last merge

2017-02-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-15878:

Attachment: HIVE-15878.patch

[~gopalv] can you take a look?
Note the comment that also made it into the patch: before ORC-141, the memory 
manager may potentially cause issues with large files.

> LLAP text cache: bug in last merge
> --
>
> Key: HIVE-15878
> URL: https://issues.apache.org/jira/browse/HIVE-15878
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15878.patch
>
>
> While rebasing the last patch, a bug was introduced.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-15878) LLAP text cache: bug in last merge

2017-02-10 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HIVE-15878:
---


> LLAP text cache: bug in last merge
> --
>
> Key: HIVE-15878
> URL: https://issues.apache.org/jira/browse/HIVE-15878
> Project: Hive
>  Issue Type: Bug
>Reporter: Gopal V
>Assignee: Sergey Shelukhin
>
> While rebasing the last patch, a bug was introduced.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15388) HiveParser spends lots of time in parsing queries with lots of "("

2017-02-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861859#comment-15861859
 ] 

Hive QA commented on HIVE-15388:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12851744/HIVE-15388.07.patch

{color:green}SUCCESS:{color} +1 due to 6 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 8 failed/errored test(s), 10248 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[show_functions] 
(batchId=67)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver[explainanalyze_3] 
(batchId=93)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.ql.parse.TestParseDriverIntervals.parseInterval[select 
(1) day] (batchId=252)
org.apache.hadoop.hive.ql.parse.TestParseDriverIntervals.parseInterval[select 
(1) days] (batchId=252)
org.apache.hadoop.hive.ql.parse.TestParseDriverIntervals.parseInterval[select 
(1+1) days] (batchId=252)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3496/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3496/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3496/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 8 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12851744 - PreCommit-HIVE-Build

> HiveParser spends lots of time in parsing queries with lots of "("
> --
>
> Key: HIVE-15388
> URL: https://issues.apache.org/jira/browse/HIVE-15388
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15388.01.patch, HIVE-15388.02.patch, 
> HIVE-15388.03.patch, HIVE-15388.04.patch, HIVE-15388.05.patch, 
> HIVE-15388.06.patch, HIVE-15388.07.patch, hive-15388.stacktrace.txt
>
>
> Branch: apache-master (applicable with previous releases as well)
> Queries generated via tools can have lots of "(" for "AND/OR" conditions. 
> This causes huge delays in the parsing phase when the number of expressions 
> is high.
> e.g.:
> {noformat}
> SELECT `iata`,
>`airport`,
>`city`,
>`state`,
>`country`,
>`lat`,
>`lon`
> FROM airports
> WHERE 
> ((`airports`.`airport`
>  = "Thigpen"
>   
>   OR `airports`.`airport` = "Astoria Regional")
>   
>  OR `airports`.`airport` = "Warsaw Municipal")
>   
> OR `airports`.`airport` = "John F Kennedy Memorial")
>  
> OR `airports`.`airport` = "Hall-Miller Municipal")
> 
> OR `airports`.`airport` = "Atqasuk")
>OR 
> `airports`.`airport` = "William B Hartsfield-Atlanta Intl")
>   OR 
> `airports`.`airport` = "Artesia Municipal")
>  OR 
> `airports`.`airport` = "Outagamie County Regional")
> OR 
> `airports`.`airport` = "Watertown Municipal")
>OR 
> `airports`.`airport` = "Augusta State")
>   OR 
> `airports`.`airport` = "Aurora Municipal")
>  OR 
> `airports`.`airport` = "Alakanuk")
> OR 
> `airports`.`airport` = "Austin Municipal")
>OR 
> `airports`.`airport` = "Auburn Municipal")
>   OR 
> `airports`.`airport` = "Auburn-Opelik")
> 

[jira] [Commented] (HIVE-15691) Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink

2017-02-10 Thread Roshan Naik (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861858#comment-15861858
 ] 

Roshan Naik commented on HIVE-15691:


Should be able to get to it on Monday.

> Create StrictRegexWriter to work with RegexSerializer for Flume Hive Sink
> -
>
> Key: HIVE-15691
> URL: https://issues.apache.org/jira/browse/HIVE-15691
> Project: Hive
>  Issue Type: New Feature
>  Components: HCatalog, Transactions
>Reporter: Kalyan
>Assignee: Kalyan
> Attachments: HIVE-15691.1.patch, HIVE-15691.patch, 
> HIVE-15691-updated.patch
>
>
> Create StrictRegexWriter to work with RegexSerializer for the Flume Hive Sink.
> It is similar to the StrictJsonWriter available in Hive.
> A dependent change is pending in Flume:
> FLUME-3036: Create a RegexSerializer for Hive Sink.
> A patch is available for Flume; please verify the link below:
> https://github.com/kalyanhadooptraining/flume/commit/1c651e81395404321f9964c8d9d2af6f4a2aaef9



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15786) Provide additional information from the llapstatus command

2017-02-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861814#comment-15861814
 ] 

Hive QA commented on HIVE-15786:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12852108/HIVE-15786.03.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3495/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3495/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3495/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-02-10 20:18:46.223
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-3495/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-02-10 20:18:46.226
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at 0959e77 HIVE-15866: LazySimpleDeserializeRead doesn't recognized 
lower case 'true' properly (Matt McCline, reviewed by Gunther Hagleitner)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at 0959e77 HIVE-15866: LazySimpleDeserializeRead doesn't recognized 
lower case 'true' properly (Matt McCline, reviewed by Gunther Hagleitner)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-02-10 20:18:47.335
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
Going to apply patch with: patch -p0
patching file 
llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapSliderUtils.java
patching file 
llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapStatusOptionsProcessor.java
patching file 
llap-server/src/java/org/apache/hadoop/hive/llap/cli/LlapStatusServiceDriver.java
patching file 
llap-server/src/java/org/apache/hadoop/hive/llap/cli/status/LlapStatusHelpers.java
patching file llap-server/src/main/resources/llap-cli-log4j2.properties
patching file pom.xml
+ [[ maven == \m\a\v\e\n ]]
+ rm -rf /data/hiveptest/working/maven/org/apache/hive
+ mvn -B clean install -DskipTests -T 4 -q 
-Dmaven.repo.local=/data/hiveptest/working/maven
ANTLR Parser Generator  Version 3.5.2
Output file 
/data/hiveptest/working/apache-github-source-source/metastore/target/generated-sources/antlr3/org/apache/hadoop/hive/metastore/parser/FilterParser.java
 does not exist: must build 
/data/hiveptest/working/apache-github-source-source/metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g
org/apache/hadoop/hive/metastore/parser/Filter.g
DataNucleus Enhancer (version 4.1.6) for API "JDO"
DataNucleus Enhancer : Classpath
>>  /usr/share/maven/boot/plexus-classworlds-2.x.jar
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MDatabase
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MFieldSchema
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MType
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MTable
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MConstraint
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MSerDeInfo
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MOrder
ENHANCED (Persistable) : 
org.apache.hadoop.hive.metastore.model.MColumnDescriptor
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MStringList
ENHANCED (Persistable) : 
org.apache.hadoop.hive.metastore.model.MStorageDescriptor
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MPartition
ENHANCED (Persistable) : org.apache.hadoop.hive.metastore.model.MIndex
ENHANCED 

[jira] [Commented] (HIVE-15877) Upload dependency jars for druid storage handler

2017-02-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861806#comment-15861806
 ] 

Hive QA commented on HIVE-15877:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12852110/HIVE-15877.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 10246 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testConnection (batchId=229)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testIsValid (batchId=229)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testIsValidNeg (batchId=229)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testNegativeProxyAuth 
(batchId=229)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testNegativeTokenAuth 
(batchId=229)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testProxyAuth (batchId=229)
org.apache.hive.minikdc.TestJdbcWithDBTokenStore.testTokenAuth (batchId=229)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3494/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3494/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3494/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 9 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12852110 - PreCommit-HIVE-Build

> Upload dependency jars for druid storage handler
> 
>
> Key: HIVE-15877
> URL: https://issues.apache.org/jira/browse/HIVE-15877
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-15877.patch
>
>
> Upload dependency jars for druid storage handler



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15388) HiveParser spends lots of time in parsing queries with lots of "("

2017-02-10 Thread Pengcheng Xiong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861788#comment-15861788
 ] 

Pengcheng Xiong commented on HIVE-15388:


I have made changes to the patch. We now support {{(1) days}} and {{('1') 
days}}, i.e., a string literal or number within parentheses. I discussed with 
[~ashutoshc] and we agree that we should keep the semantics for BETWEEN, i.e., 
NOT BETWEEN rather than BETWEEN NOT. We should try to (1) fold multiple NOTs 
and (2) make vectorization work, in separate patches.
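
For reference, the newly supported forms exercised by the added tests look 
roughly like this (hedged examples reconstructed from the test names in the QA 
run above):
{noformat}
select (1) day;     -- number within parentheses
select ('1') days;  -- string literal within parentheses
select (1+1) days;  -- expression within parentheses
{noformat}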

> HiveParser spends lots of time in parsing queries with lots of "("
> --
>
> Key: HIVE-15388
> URL: https://issues.apache.org/jira/browse/HIVE-15388
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15388.01.patch, HIVE-15388.02.patch, 
> HIVE-15388.03.patch, HIVE-15388.04.patch, HIVE-15388.05.patch, 
> HIVE-15388.06.patch, HIVE-15388.07.patch, hive-15388.stacktrace.txt
>
>
> Branch: apache-master (applicable with previous releases as well)
> Queries generated via tools can have lots of "(" for "AND/OR" conditions. 
> This causes huge delays in the parsing phase when the number of expressions 
> is high.
> e.g.:
> {noformat}
> SELECT `iata`,
>`airport`,
>`city`,
>`state`,
>`country`,
>`lat`,
>`lon`
> FROM airports
> WHERE 
> ((`airports`.`airport`
>  = "Thigpen"
>   
>   OR `airports`.`airport` = "Astoria Regional")
>   
>  OR `airports`.`airport` = "Warsaw Municipal")
>   
> OR `airports`.`airport` = "John F Kennedy Memorial")
>  
> OR `airports`.`airport` = "Hall-Miller Municipal")
> 
> OR `airports`.`airport` = "Atqasuk")
>OR 
> `airports`.`airport` = "William B Hartsfield-Atlanta Intl")
>   OR 
> `airports`.`airport` = "Artesia Municipal")
>  OR 
> `airports`.`airport` = "Outagamie County Regional")
> OR 
> `airports`.`airport` = "Watertown Municipal")
>OR 
> `airports`.`airport` = "Augusta State")
>   OR 
> `airports`.`airport` = "Aurora Municipal")
>  OR 
> `airports`.`airport` = "Alakanuk")
> OR 
> `airports`.`airport` = "Austin Municipal")
>OR 
> `airports`.`airport` = "Auburn Municipal")
>   OR 
> `airports`.`airport` = "Auburn-Opelik")
>  OR 
> `airports`.`airport` = "Austin-Bergstrom International")
> OR 
> `airports`.`airport` = "Wausau Municipal")
>OR 
> `airports`.`airport` = "Mecklenburg-Brunswick Regional")
>   OR 
> `airports`.`airport` = "Alva Regional")
>  OR 
> `airports`.`airport` = "Asheville Regional")
> OR 
> `airports`.`airport` = "Avon Park Municipal")
>OR 
> `airports`.`airport` = "Wilkes-Barre/Scranton Intl")
>   OR 
> `airports`.`airport` = "Marana Northwest Regional")
>  OR 
> `airports`.`airport` = "Catalina")
> OR 
> `airports`.`airport` = "Washington Municipal")
>OR 
> `airports`.`airport` = "Wainwright")
>   OR `airports`.`airport` 
> = "West Memphis Municipal")
>   

[jira] [Updated] (HIVE-15388) HiveParser spends lots of time in parsing queries with lots of "("

2017-02-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15388:
---
Attachment: (was: HIVE-15388.08.patch)

> HiveParser spends lots of time in parsing queries with lots of "("
> --
>
> Key: HIVE-15388
> URL: https://issues.apache.org/jira/browse/HIVE-15388
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15388.01.patch, HIVE-15388.02.patch, 
> HIVE-15388.03.patch, HIVE-15388.04.patch, HIVE-15388.05.patch, 
> HIVE-15388.06.patch, HIVE-15388.07.patch, hive-15388.stacktrace.txt
>
>
> Branch: apache-master (applicable with previous releases as well)
> Queries generated via tools can have lots of "(" for "AND/OR" conditions. 
> This causes huge delays in the parsing phase when the number of expressions 
> is high.
> e.g.:
> {noformat}
> SELECT `iata`,
>`airport`,
>`city`,
>`state`,
>`country`,
>`lat`,
>`lon`
> FROM airports
> WHERE 
> ((`airports`.`airport`
>  = "Thigpen"
>   
>   OR `airports`.`airport` = "Astoria Regional")
>   
>  OR `airports`.`airport` = "Warsaw Municipal")
>   
> OR `airports`.`airport` = "John F Kennedy Memorial")
>  
> OR `airports`.`airport` = "Hall-Miller Municipal")
> 
> OR `airports`.`airport` = "Atqasuk")
>OR 
> `airports`.`airport` = "William B Hartsfield-Atlanta Intl")
>   OR 
> `airports`.`airport` = "Artesia Municipal")
>  OR 
> `airports`.`airport` = "Outagamie County Regional")
> OR 
> `airports`.`airport` = "Watertown Municipal")
>OR 
> `airports`.`airport` = "Augusta State")
>   OR 
> `airports`.`airport` = "Aurora Municipal")
>  OR 
> `airports`.`airport` = "Alakanuk")
> OR 
> `airports`.`airport` = "Austin Municipal")
>OR 
> `airports`.`airport` = "Auburn Municipal")
>   OR 
> `airports`.`airport` = "Auburn-Opelik")
>  OR 
> `airports`.`airport` = "Austin-Bergstrom International")
> OR 
> `airports`.`airport` = "Wausau Municipal")
>OR 
> `airports`.`airport` = "Mecklenburg-Brunswick Regional")
>   OR 
> `airports`.`airport` = "Alva Regional")
>  OR 
> `airports`.`airport` = "Asheville Regional")
> OR 
> `airports`.`airport` = "Avon Park Municipal")
>OR 
> `airports`.`airport` = "Wilkes-Barre/Scranton Intl")
>   OR 
> `airports`.`airport` = "Marana Northwest Regional")
>  OR 
> `airports`.`airport` = "Catalina")
> OR 
> `airports`.`airport` = "Washington Municipal")
>OR 
> `airports`.`airport` = "Wainwright")
>   OR `airports`.`airport` 
> = "West Memphis Municipal")
>  OR `airports`.`airport` 
> = "Arlington Municipal")
> OR `airports`.`airport` = 
> "Algona Municipal")
>OR `airports`.`airport` = 
> "Chandler")
>   OR 

[jira] [Updated] (HIVE-15388) HiveParser spends lots of time in parsing queries with lots of "("

2017-02-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15388:
---
Status: Patch Available  (was: Open)

> HiveParser spends lots of time in parsing queries with lots of "("
> --
>
> Key: HIVE-15388
> URL: https://issues.apache.org/jira/browse/HIVE-15388
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15388.01.patch, HIVE-15388.02.patch, 
> HIVE-15388.03.patch, HIVE-15388.04.patch, HIVE-15388.05.patch, 
> HIVE-15388.06.patch, HIVE-15388.07.patch, HIVE-15388.08.patch, 
> hive-15388.stacktrace.txt
>
>
> Branch: apache-master (applicable with previous releases as well)
> Queries generated via tools can have lots of "(" for "AND/OR" conditions. 
> This causes huge delays in the parsing phase when the number of expressions 
> is high.
> e.g.:
> {noformat}
> SELECT `iata`,
>`airport`,
>`city`,
>`state`,
>`country`,
>`lat`,
>`lon`
> FROM airports
> WHERE 
> ((`airports`.`airport`
>  = "Thigpen"
>   
>   OR `airports`.`airport` = "Astoria Regional")
>   
>  OR `airports`.`airport` = "Warsaw Municipal")
>   
> OR `airports`.`airport` = "John F Kennedy Memorial")
>  
> OR `airports`.`airport` = "Hall-Miller Municipal")
> 
> OR `airports`.`airport` = "Atqasuk")
>OR 
> `airports`.`airport` = "William B Hartsfield-Atlanta Intl")
>   OR 
> `airports`.`airport` = "Artesia Municipal")
>  OR 
> `airports`.`airport` = "Outagamie County Regional")
> OR 
> `airports`.`airport` = "Watertown Municipal")
>OR 
> `airports`.`airport` = "Augusta State")
>   OR 
> `airports`.`airport` = "Aurora Municipal")
>  OR 
> `airports`.`airport` = "Alakanuk")
> OR 
> `airports`.`airport` = "Austin Municipal")
>OR 
> `airports`.`airport` = "Auburn Municipal")
>   OR 
> `airports`.`airport` = "Auburn-Opelik")
>  OR 
> `airports`.`airport` = "Austin-Bergstrom International")
> OR 
> `airports`.`airport` = "Wausau Municipal")
>OR 
> `airports`.`airport` = "Mecklenburg-Brunswick Regional")
>   OR 
> `airports`.`airport` = "Alva Regional")
>  OR 
> `airports`.`airport` = "Asheville Regional")
> OR 
> `airports`.`airport` = "Avon Park Municipal")
>OR 
> `airports`.`airport` = "Wilkes-Barre/Scranton Intl")
>   OR 
> `airports`.`airport` = "Marana Northwest Regional")
>  OR 
> `airports`.`airport` = "Catalina")
> OR 
> `airports`.`airport` = "Washington Municipal")
>OR 
> `airports`.`airport` = "Wainwright")
>   OR `airports`.`airport` 
> = "West Memphis Municipal")
>  OR `airports`.`airport` 
> = "Arlington Municipal")
> OR `airports`.`airport` = 
> "Algona Municipal")
>OR `airports`.`airport` = 
> "Chandler")
>

[jira] [Updated] (HIVE-15388) HiveParser spends lots of time in parsing queries with lots of "("

2017-02-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15388:
---
Status: Open  (was: Patch Available)

> HiveParser spends lots of time in parsing queries with lots of "("
> --
>
> Key: HIVE-15388
> URL: https://issues.apache.org/jira/browse/HIVE-15388
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15388.01.patch, HIVE-15388.02.patch, 
> HIVE-15388.03.patch, HIVE-15388.04.patch, HIVE-15388.05.patch, 
> HIVE-15388.06.patch, HIVE-15388.07.patch, HIVE-15388.08.patch, 
> hive-15388.stacktrace.txt
>
>
> Branch: apache-master (applicable with previous releases as well)
> Queries generated via tools can have lots of "(" for "AND/OR" conditions. 
> This causes huge delays in the parsing phase when the number of expressions 
> is high.
> e.g.:
> {noformat}
> SELECT `iata`,
>`airport`,
>`city`,
>`state`,
>`country`,
>`lat`,
>`lon`
> FROM airports
> WHERE 
> ((`airports`.`airport`
>  = "Thigpen"
>   
>   OR `airports`.`airport` = "Astoria Regional")
>   
>  OR `airports`.`airport` = "Warsaw Municipal")
>   
> OR `airports`.`airport` = "John F Kennedy Memorial")
>  
> OR `airports`.`airport` = "Hall-Miller Municipal")
> 
> OR `airports`.`airport` = "Atqasuk")
>OR 
> `airports`.`airport` = "William B Hartsfield-Atlanta Intl")
>   OR 
> `airports`.`airport` = "Artesia Municipal")
>  OR 
> `airports`.`airport` = "Outagamie County Regional")
> OR 
> `airports`.`airport` = "Watertown Municipal")
>OR 
> `airports`.`airport` = "Augusta State")
>   OR 
> `airports`.`airport` = "Aurora Municipal")
>  OR 
> `airports`.`airport` = "Alakanuk")
> OR 
> `airports`.`airport` = "Austin Municipal")
>OR 
> `airports`.`airport` = "Auburn Municipal")
>   OR 
> `airports`.`airport` = "Auburn-Opelik")
>  OR 
> `airports`.`airport` = "Austin-Bergstrom International")
> OR 
> `airports`.`airport` = "Wausau Municipal")
>OR 
> `airports`.`airport` = "Mecklenburg-Brunswick Regional")
>   OR 
> `airports`.`airport` = "Alva Regional")
>  OR 
> `airports`.`airport` = "Asheville Regional")
> OR 
> `airports`.`airport` = "Avon Park Municipal")
>OR 
> `airports`.`airport` = "Wilkes-Barre/Scranton Intl")
>   OR 
> `airports`.`airport` = "Marana Northwest Regional")
>  OR 
> `airports`.`airport` = "Catalina")
> OR 
> `airports`.`airport` = "Washington Municipal")
>OR 
> `airports`.`airport` = "Wainwright")
>   OR `airports`.`airport` 
> = "West Memphis Municipal")
>  OR `airports`.`airport` 
> = "Arlington Municipal")
> OR `airports`.`airport` = 
> "Algona Municipal")
>OR `airports`.`airport` = 
> "Chandler")
>

[jira] [Updated] (HIVE-15388) HiveParser spends lots of time in parsing queries with lots of "("

2017-02-10 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-15388:
---
Attachment: HIVE-15388.08.patch

> HiveParser spends lots of time in parsing queries with lots of "("
> --
>
> Key: HIVE-15388
> URL: https://issues.apache.org/jira/browse/HIVE-15388
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.2.0
>Reporter: Rajesh Balamohan
>Assignee: Pengcheng Xiong
> Attachments: HIVE-15388.01.patch, HIVE-15388.02.patch, 
> HIVE-15388.03.patch, HIVE-15388.04.patch, HIVE-15388.05.patch, 
> HIVE-15388.06.patch, HIVE-15388.07.patch, HIVE-15388.08.patch, 
> hive-15388.stacktrace.txt
>
>
> Branch: apache-master (applicable with previous releases as well)
> Queries generated via tools can have lots of "(" for "AND/OR" conditions. 
> This causes huge delays in the parsing phase when the number of expressions 
> is high, e.g.:
> {noformat}
> SELECT `iata`,
>`airport`,
>`city`,
>`state`,
>`country`,
>`lat`,
>`lon`
> FROM airports
> WHERE ((`airports`.`airport` = "Thigpen"
>   OR `airports`.`airport` = "Astoria Regional")
>   OR `airports`.`airport` = "Warsaw Municipal")
>   OR `airports`.`airport` = "John F Kennedy Memorial")
>   OR `airports`.`airport` = "Hall-Miller Municipal")
>   OR `airports`.`airport` = "Atqasuk")
>   OR `airports`.`airport` = "William B Hartsfield-Atlanta Intl")
>   OR `airports`.`airport` = "Artesia Municipal")
>   OR `airports`.`airport` = "Outagamie County Regional")
>   OR `airports`.`airport` = "Watertown Municipal")
>   OR `airports`.`airport` = "Augusta State")
>   OR `airports`.`airport` = "Aurora Municipal")
>   OR `airports`.`airport` = "Alakanuk")
>   OR `airports`.`airport` = "Austin Municipal")
>   OR `airports`.`airport` = "Auburn Municipal")
>   OR `airports`.`airport` = "Auburn-Opelik")
>   OR `airports`.`airport` = "Austin-Bergstrom International")
>   OR `airports`.`airport` = "Wausau Municipal")
>   OR `airports`.`airport` = "Mecklenburg-Brunswick Regional")
>   OR `airports`.`airport` = "Alva Regional")
>   OR `airports`.`airport` = "Asheville Regional")
>   OR `airports`.`airport` = "Avon Park Municipal")
>   OR `airports`.`airport` = "Wilkes-Barre/Scranton Intl")
>   OR `airports`.`airport` = "Marana Northwest Regional")
>   OR `airports`.`airport` = "Catalina")
>   OR `airports`.`airport` = "Washington Municipal")
>   OR `airports`.`airport` = "Wainwright")
>   OR `airports`.`airport` = "West Memphis Municipal")
>   OR `airports`.`airport` = "Arlington Municipal")
>   OR `airports`.`airport` = "Algona Municipal")
>   OR `airports`.`airport` = "Chandler")
> 
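
A quick way to see the effect described above is to generate that predicate shape 
with a varying number of "(" groups and time the parser directly. A minimal, 
hedged sketch (assumes hive-exec on the classpath; the class name and loop bounds 
are illustrative):
{code}
import org.apache.hadoop.hive.ql.parse.ParseDriver;

public class NestedParenParseBench {
  public static void main(String[] args) throws Exception {
    ParseDriver pd = new ParseDriver();
    for (int n : new int[] {100, 500, 1000, 2000}) {
      // Build ((...((a = "v0" OR a = "v1") OR a = "v2") ...) with n OR terms,
      // mirroring the tool-generated query quoted above.
      StringBuilder where = new StringBuilder();
      for (int i = 0; i < n; i++) {
        where.append('(');
      }
      where.append("`airport` = \"v0\"");
      for (int i = 1; i <= n; i++) {
        where.append(" OR `airport` = \"v").append(i).append("\")");
      }
      String sql = "SELECT `iata` FROM airports WHERE " + where;
      long start = System.nanoTime();
      pd.parse(sql);  // we only care about elapsed wall time here
      System.out.printf("n=%d: parse took %.1f ms%n", n, (System.nanoTime() - start) / 1e6);
    }
  }
}
{code}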

[jira] [Commented] (HIVE-15839) Don't force cardinality check if only WHEN NOT MATCHED is specified

2017-02-10 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861727#comment-15861727
 ] 

Eugene Koifman commented on HIVE-15839:
---

failures have age > 1, i.e. they predate this patch

[~wzheng] could you review, please?

> Don't force cardinality check if only WHEN NOT MATCHED is specified
> ---
>
> Key: HIVE-15839
> URL: https://issues.apache.org/jira/browse/HIVE-15839
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning, Query Processor, Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Minor
> Attachments: HIVE-15839.01.patch, HIVE-15839.02.patch, 
> HIVE-15839.03.patch
>
>
> should've been part of HIVE-14949
> if only WHEN NOT MATCHED is specified then the join is basically an anti-join 
> and we are not retrieving any values from target side. So the cardinality 
> check is just overhead (though presumably very minor since the filter above 
> the join will filter everything and thus there is nothing to group)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15839) Don't force cardinality check if only WHEN NOT MATCHED is specified

2017-02-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861723#comment-15861723
 ] 

Hive QA commented on HIVE-15839:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12852094/HIVE-15839.03.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10246 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3493/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3493/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3493/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12852094 - PreCommit-HIVE-Build

> Don't force cardinality check if only WHEN NOT MATCHED is specified
> ---
>
> Key: HIVE-15839
> URL: https://issues.apache.org/jira/browse/HIVE-15839
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning, Query Processor, Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Minor
> Attachments: HIVE-15839.01.patch, HIVE-15839.02.patch, 
> HIVE-15839.03.patch
>
>
> should've been part of HIVE-14949
> if only WHEN NOT MATCHED is specified then the join is basically an anti-join 
> and we are not retrieving any values from target side. So the cardinality 
> check is just overhead (though presumably very minor since the filter above 
> the join will filter everything and thus there is nothing to group)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15877) Upload dependency jars for druid storage handler

2017-02-10 Thread slim bouguerra (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861722#comment-15861722
 ] 

slim bouguerra commented on HIVE-15877:
---

[~ashutoshc] can you please review this?


> Upload dependency jars for druid storage handler
> 
>
> Key: HIVE-15877
> URL: https://issues.apache.org/jira/browse/HIVE-15877
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-15877.patch
>
>
> Upload dependency jars for druid storage handler



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15877) Upload dependency jars for druid storage handler

2017-02-10 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-15877:
--
Attachment: HIVE-15877.patch

> Upload dependency jars for druid storage handler
> 
>
> Key: HIVE-15877
> URL: https://issues.apache.org/jira/browse/HIVE-15877
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-15877.patch
>
>
> Upload dependency jars for druid storage handler



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15877) Upload dependency jars for druid storage handler

2017-02-10 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-15877:
--
Status: Patch Available  (was: Open)

> Upload dependency jars for druid storage handler
> 
>
> Key: HIVE-15877
> URL: https://issues.apache.org/jira/browse/HIVE-15877
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
> Attachments: HIVE-15877.patch
>
>
> Upload dependency jars for druid storage handler



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15876) Add the ability to define composed functions

2017-02-10 Thread Peter Attardo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Attardo updated HIVE-15876:
-
Description: 
I'm not entirely sure the solution implied by the subject is the best one, but 
it's a useful shorthand for addressing the problem I'm seeing.

I have a use case for wanting to see if a given unix timestamp (with 
milliseconds) falls within a date range expressed by a start_date and an 
end_date in 'yyyy-MM-dd' format. Without any UDFs, it would look something like 
this:

...
WHERE
time >= unix_timestamp('${start_date}','yyyy-MM-dd')*1000
AND time < unix_timestamp(date_add('${end_date}', 1),'yyyy-MM-dd')*1000

This condition is obviously quite a pain to read and write, and one that's easy 
to get wrong when trying to reproduce. I would like to simplify to something 
like:

...
WHERE time_in_range(time, '${start_date}','${end_date}')

I was able to write a UDF for the above relatively easily, but when testing it, 
it performed notably worse than the first example. The reason for that quickly 
became clear. Even though both functions are deterministic, that is largely 
irrelevant in the latter example, because the 'time' variable is not static. 
The query optimizer can do nothing with it, and must do the full function 
evaluation on each row. Whereas in the first example the optimizer can see that 
the "sub-functions" of unix_timestamp and date_add are both deterministic and 
have static inputs, and will only evaluate them once for the whole query.

I would like a way to define a single function that maintains the ability for 
the optimizer to see which of its constituent parts only need be evaluated 
once. Whether that is some syntax in the CREATE FUNCTION DDL or some 
annotations within Scala, either would be incredibly useful.
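
One illustration of the gap: with today's API you can approximate "evaluate once" 
by hand inside a GenericUDF, because constant arguments are visible at plan time 
through ConstantObjectInspector. A hedged sketch of a hand-written time_in_range 
(the class and function names are illustrative, not an actual Hive API proposal):
{code}
import java.text.SimpleDateFormat;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ConstantObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;

// Hypothetical time_in_range(time, start_date, end_date): the date bounds are
// parsed once in initialize() when they are query constants, so only the cheap
// range comparison runs per row.
public class GenericUDFTimeInRange extends GenericUDF {
  private long lowMs;
  private long highMs;

  @Override
  public ObjectInspector initialize(ObjectInspector[] args) throws UDFArgumentException {
    try {
      SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
      // Constant arguments show up as ConstantObjectInspector at plan time.
      String start = ((ConstantObjectInspector) args[1]).getWritableConstantValue().toString();
      String end = ((ConstantObjectInspector) args[2]).getWritableConstantValue().toString();
      lowMs = fmt.parse(start).getTime();
      highMs = fmt.parse(end).getTime() + 24L * 3600 * 1000;  // exclusive end: end_date + 1 day
    } catch (Exception e) {
      throw new UDFArgumentException("start/end must be constant yyyy-MM-dd strings");
    }
    return PrimitiveObjectInspectorFactory.javaBooleanObjectInspector;
  }

  @Override
  public Object evaluate(DeferredObject[] args) throws HiveException {
    Object t = args[0].get();
    if (t == null) {
      return null;
    }
    long time = Long.parseLong(t.toString());  // simplistic: assumes a long-like value
    return time >= lowMs && time < highMs;
  }

  @Override
  public String getDisplayString(String[] children) {
    return "time_in_range(" + String.join(", ", children) + ")";
  }
}
{code}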

  was:
I'm not entirely sure the solution implied by the subject is the best one, but 
it's a useful shorthand for addressing the problem I'm seeing.

I have a use case for wanting to see if a given unix timestamp (with 
microseconds) falls within a date range expressed by a start_date and an 
end_date in 'yyyy-MM-dd' format. Without any UDFs, it would look something like 
this:

...
WHERE
time >= unix_timestamp('${start_date}','yyyy-MM-dd')*1000
AND time < unix_timestamp(date_add('${end_date}', 1),'yyyy-MM-dd')*1000

This condition is obviously quite a pain to read and write, and one that's easy 
to get wrong when trying to reproduce. I would like to simplify to something 
like:

...
WHERE time_in_range(time, '${start_date}','${end_date}')

I was able to write a UDF for the above relatively easily, but when testing it, 
it performed notably worse than the first example. The reason for that quickly 
became clear. Even though both functions are deterministic, that is largely 
irrelevant in the latter example, because the 'time' variable is not static. 
The query optimizer can do nothing with it, and must do the full function 
evaluation on each row. Whereas in the first example the optimizer can see that 
the "sub-functions" of unix_timestamp and date_add are both deterministic and 
have static inputs, and will only evaluate them once for the whole query.

I would like a way to define a single function that maintains the ability for 
the optimizer to see which of its constituent parts only need be evaluated 
once. Whether that is some syntax in the CREATE FUNCTION DDL or some 
annotations within Scala, either would be incredibly useful.


> Add the ability to define composed functions
> 
>
> Key: HIVE-15876
> URL: https://issues.apache.org/jira/browse/HIVE-15876
> Project: Hive
>  Issue Type: Wish
>Reporter: Peter Attardo
>Priority: Minor
>
> I'm not entirely sure the solution implied by the subject is the best one, 
> but it's a useful shorthand for addressing the problem I'm seeing.
> I have a use case for wanting to see if a given unix timestamp (with 
> milliseconds) falls within a date range expressed by a start_date and an 
> end_date in 'yyyy-MM-dd' format. Without any UDFs, it would look something 
> like this:
> ...
> WHERE
> time >= unix_timestamp('${start_date}','yyyy-MM-dd')*1000
> AND time < unix_timestamp(date_add('${end_date}', 1),'yyyy-MM-dd')*1000
> This condition is obviously quite a pain to read and write, and one that's 
> easy to get wrong when trying to reproduce. I would like to simplify to 
> something like:
> ...
> WHERE time_in_range(time, '${start_date}','${end_date}')
> I was able to write a UDF for the above relatively easily, but when testing 
> it, it performed notably worse than the first example. The reason for that 
> quickly became clear. Even though both functions are deterministic, that is 
> largely irrelevant in the latter example, because the 'time' variable is not 
> static. The query optimizer can do nothing 

[jira] [Updated] (HIVE-15786) Provide additional information from the llapstatus command

2017-02-10 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-15786:
--
Attachment: HIVE-15786.03.patch

Updated patch.
- Rebased
- Fixed comments
- Added a separate logger and appender. (Want this information to go into 
llap-cli.log)
- Fixes to make sure this is logged once at the end.
- Some fixes to watchmode for states other than SUCCEEDED.

[~prasanth_j] - can you please take a look.

> Provide additional information from the llapstatus command
> --
>
> Key: HIVE-15786
> URL: https://issues.apache.org/jira/browse/HIVE-15786
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-15786.01.patch, HIVE-15786.03.patch
>
>
> Slider is making enhancements to provide additional information like 
> completed containers, pending containers etc.
> Integrate with this to provide additional details in llapstatus.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15799) LLAP: rename VertorDeserializeOrcWriter

2017-02-10 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861705#comment-15861705
 ] 

Prasanth Jayachandran commented on HIVE-15799:
--

+1

> LLAP: rename VertorDeserializeOrcWriter
> ---
>
> Key: HIVE-15799
> URL: https://issues.apache.org/jira/browse/HIVE-15799
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-15799.patch
>
>
> As convenient as it is to grep for, based on continuous RB comments I am not 
> sure the world is yet ready for vertorized execution.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-15877) Upload dependency jars for druid storage handler

2017-02-10 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra reassigned HIVE-15877:
-

Assignee: slim bouguerra

> Upload dependency jars for druid storage handler
> 
>
> Key: HIVE-15877
> URL: https://issues.apache.org/jira/browse/HIVE-15877
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>Assignee: slim bouguerra
>
> Upload dependency jars for druid storage handler



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15877) Upload dependency jars for druid storage handler

2017-02-10 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-15877:
--
Affects Version/s: 2.2.0

> Upload dependency jars for druid storage handler
> 
>
> Key: HIVE-15877
> URL: https://issues.apache.org/jira/browse/HIVE-15877
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>
> Upload dependency jars for druid storage handler



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15877) Upload dependency jars for druid storage handler

2017-02-10 Thread slim bouguerra (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

slim bouguerra updated HIVE-15877:
--
Component/s: Druid integration

> Upload dependency jars for druid storage handler
> 
>
> Key: HIVE-15877
> URL: https://issues.apache.org/jira/browse/HIVE-15877
> Project: Hive
>  Issue Type: Bug
>  Components: Druid integration
>Affects Versions: 2.2.0
>Reporter: slim bouguerra
>
> Upload dependency jars for druid storage handler



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15876) Add the ability to define composed functions

2017-02-10 Thread Peter Attardo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Attardo updated HIVE-15876:
-
Description: 
I'm not entirely sure the solution implied by the subject is the best one, but 
it's a useful shorthand for addressing the problem I'm seeing.

I have a use case for wanting to see if a given unix timestamp (with 
microseconds) falls within a date range expressed by a start_date and an 
end_date in 'yyyy-MM-dd' format. Without any UDFs, it would look something like 
this:

...
WHERE
time >= unix_timestamp('${start_date}','yyyy-MM-dd')*1000
AND time < unix_timestamp(date_add('${end_date}', 1),'yyyy-MM-dd')*1000

This condition is obviously quite a pain to read and write, and one that's easy 
to get wrong when trying to reproduce. I would like to simplify to something 
like:

...
WHERE time_in_range(time, '${start_date}','${end_date}')

I was able to write a UDF for the above relatively easily, but when testing it, 
it performed notably worse than the first example. The reason for that quickly 
became clear. Even though both functions are deterministic, that is largely 
irrelevant in the latter example, because the 'time' variable is not static. 
The query optimizer can do nothing with it, and must do the full function 
evaluation on each row. Whereas in the first example the optimizer can see that 
the "sub-functions" of unix_timestamp and date_add are both deterministic and 
have static inputs, and will only evaluate them once for the whole query.

I would like a way to define a single function that maintains the ability for 
the optimizer to see which of its constituent parts only need be evaluated 
once. Whether that is some syntax in the CREATE FUNCTION DDL or some 
annotations within Scala, either would be incredibly useful.

  was:
I'm not entirely sure the solution implied by the subject is the best one, but 
it's a useful shorthand for addressing the problem I'm seeing.

I have a use case for wanting to see if a given unix timestamp (with 
milliseconds) falls within a date range expressed by a start_date and an 
end_date in 'yyyy-MM-dd' format. Without any UDFs, it would look something like 
this:

...
WHERE
time >= unix_timestamp('${start_date}','yyyy-MM-dd')*1000
AND time < unix_timestamp(date_add('${end_date}', 1),'yyyy-MM-dd')*1000

This condition is obviously quite a pain to read and write, and one that's easy 
to get wrong when trying to reproduce. I would like to simplify to something 
like:

...
WHERE time_in_range(time, '${start_date}','${end_date}')

I was able to write a UDF for the above relatively easily, but when testing it, 
it performed notably worse than the first example. The reason for that quickly 
became clear. Even though both functions are deterministic, that is largely 
irrelevant in the latter example, because the 'time' variable is not static. 
The query optimizer can do nothing with it, and must do the full function 
evaluation on each row. Whereas in the first example the optimizer can see that 
the "sub-functions" of unix_timestamp and date_add are both deterministic and 
have static inputs, and will only evaluate them once for the whole query.

I would like a way to define a single function that maintains the ability for 
the optimizer to see which of its constituent parts only need be evaluated 
once. Whether that is some syntax in the CREATE FUNCTION DDL or some 
annotations within Scala, either would be incredibly useful.


> Add the ability to define composed functions
> 
>
> Key: HIVE-15876
> URL: https://issues.apache.org/jira/browse/HIVE-15876
> Project: Hive
>  Issue Type: Wish
>Reporter: Peter Attardo
>Priority: Minor
>
> I'm not entirely sure the solution implied by the subject is the best one, 
> but it's a useful shorthand for addressing the problem I'm seeing.
> I have a use case for wanting to see if a given unix timestamp (with 
> microseconds) falls within a date range expressed by a start_date and an 
> end_date in 'yyyy-MM-dd' format. Without any UDFs, it would look something 
> like this:
> ...
> WHERE
> time >= unix_timestamp('${start_date}','yyyy-MM-dd')*1000
> AND time < unix_timestamp(date_add('${end_date}', 1),'yyyy-MM-dd')*1000
> This condition is obviously quite a pain to read and write, and one that's 
> easy to get wrong when trying to reproduce. I would like to simplify to 
> something like:
> ...
> WHERE time_in_range(time, '${start_date}','${end_date}')
> I was able to write a UDF for the above relatively easily, but when testing 
> it, it performed notably worse than the first example. The reason for that 
> quickly became clear. Even though both functions are deterministic, that is 
> largely irrelevant in the latter example, because the 'time' variable is not 
> static. The query optimizer can do nothing 

[jira] [Commented] (HIVE-15858) Beeline ^C doesn't close the session

2017-02-10 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861651#comment-15861651
 ] 

Vihang Karajgaonkar commented on HIVE-15858:


bq. For the desired behavior described in HIVE-15626 for ^C, that would need to 
be dealt with in signalhandler. So I think it makes sense to handle these two 
different requirements in two different places.

Got it. In that case patch looks good to me too. Thanks [~thejas] and [~sankarh]

> Beeline ^C doesn't close the session
> 
>
> Key: HIVE-15858
> URL: https://issues.apache.org/jira/browse/HIVE-15858
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Reporter: Sankar Hariappan
>Assignee: Sankar Hariappan
> Attachments: HIVE-15858.01.patch
>
>
> When multiple connections are opened through Beeline to HiveServer2 and the 
> client is closed using the !quit or ^C command, it looks like not all of the 
> connections/sessions are getting closed.
> !quit seems to close the current active connection but fails to close other 
> open sessions.
> ^C doesn't close any session.
> This behaviour is noticed only with the HTTP mode of transport 
> (hive.server2.transport.mode=http). In BINARY mode, the server triggers 
> session close when a TCP connection is closed by the peer.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15626) beeline should not exit after canceling the query on ctrl-c

2017-02-10 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861635#comment-15861635
 ] 

Vihang Karajgaonkar commented on HIVE-15626:


Thanks [~thejas] for the information. I am okay with this patch not being 
committed to branch-1. Sergey mentioned 1.2 when the JIRA was created and hence 
I thought of including this patch in branch-1. I agree, let's stabilize branch-1 
first before adding new patches.

> beeline should not exit after canceling the query on ctrl-c
> ---
>
> Key: HIVE-15626
> URL: https://issues.apache.org/jira/browse/HIVE-15626
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
>Reporter: Sergey Shelukhin
>Assignee: Vihang Karajgaonkar
> Fix For: 2.2.0
>
> Attachments: HIVE-15626.01.patch
>
>
> I am seeing this in 1.2



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15866) LazySimpleDeserializeRead doesn't recognized lower case 'true' properly

2017-02-10 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15866:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> LazySimpleDeserializeRead doesn't recognized lower case 'true' properly
> ---
>
> Key: HIVE-15866
> URL: https://issues.apache.org/jira/browse/HIVE-15866
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE--15866.01.patch, HIVE--15866.02.patch
>
>
> The if stmt looks at the wrong index for the lower case variant...



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15866) LazySimpleDeserializeRead doesn't recognized lower case 'true' properly

2017-02-10 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861632#comment-15861632
 ] 

Matt McCline commented on HIVE-15866:
-

Committed to master.

> LazySimpleDeserializeRead doesn't recognized lower case 'true' properly
> ---
>
> Key: HIVE-15866
> URL: https://issues.apache.org/jira/browse/HIVE-15866
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE--15866.01.patch, HIVE--15866.02.patch
>
>
> The if stmt looks at the wrong index for the lower case variant...



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15866) LazySimpleDeserializeRead doesn't recognized lower case 'true' properly

2017-02-10 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-15866:

Attachment: HIVE--15866.02.patch

Update golden file.

> LazySimpleDeserializeRead doesn't recognized lower case 'true' properly
> ---
>
> Key: HIVE-15866
> URL: https://issues.apache.org/jira/browse/HIVE-15866
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE--15866.01.patch, HIVE--15866.02.patch
>
>
> The if stmt looks at the wrong index for the lower case variant...



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15671) RPCServer.registerClient() erroneously uses server/client handshake timeout for connection timeout

2017-02-10 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861611#comment-15861611
 ] 

Marcelo Vanzin commented on HIVE-15671:
---

bq. I was talking about server detecting a driver problem after it has 
connected back to the server.

Hmm. That is definitely not any of the "connect" timeouts, which probably means 
it isn't configured and is just using netty's default (which is probably no 
timeout?). Would probably need something using 
{{io.netty.handler.timeout.IdleStateHandler}}, and also some periodic "ping" so 
that the connection isn't torn down without reason.
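
A minimal sketch of that idea with Netty 4 (the handler names, timeout values, 
and ping message are illustrative, not Hive's actual RPC code):
{code}
import java.util.concurrent.TimeUnit;

import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.socket.SocketChannel;
import io.netty.handler.timeout.IdleStateEvent;
import io.netty.handler.timeout.IdleStateHandler;

// Detects a dead peer after connect: close if nothing is read for 60s, and
// send a ping when the write side has been idle for 30s so a healthy but
// quiet connection is not torn down without reason.
final class IdleSetup {
  static void install(SocketChannel ch, final Object pingMessage) {
    ch.pipeline().addFirst("idle-state", new IdleStateHandler(60, 30, 0, TimeUnit.SECONDS));
    ch.pipeline().addLast("idle-action", new ChannelInboundHandlerAdapter() {
      @Override
      public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
        if (evt instanceof IdleStateEvent) {
          switch (((IdleStateEvent) evt).state()) {
            case READER_IDLE:
              ctx.close();                      // peer silent too long: assume it is gone
              break;
            case WRITER_IDLE:
              ctx.writeAndFlush(pingMessage);   // keep-alive so the peer's reader stays busy
              break;
            default:
              break;
          }
        } else {
          super.userEventTriggered(ctx, evt);
        }
      }
    });
  }
}
{code}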

> RPCServer.registerClient() erroneously uses server/client handshake timeout 
> for connection timeout
> --
>
> Key: HIVE-15671
> URL: https://issues.apache.org/jira/browse/HIVE-15671
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-15671.1.patch, HIVE-15671.patch
>
>
> {code}
>   /**
>    * Tells the RPC server to expect a connection from a new client.
>    * ...
>    */
>   public Future registerClient(final String clientId, String secret,
>       RpcDispatcher serverDispatcher) {
>     return registerClient(clientId, secret, serverDispatcher,
>         config.getServerConnectTimeoutMs());
>   }
> {code}
> {{config.getServerConnectTimeoutMs()}} returns value for 
> *hive.spark.client.server.connect.timeout*, which is meant for timeout for 
> handshake between Hive client and remote Spark driver. Instead, the timeout 
> should be *hive.spark.client.connect.timeout*, which is for timeout for 
> remote Spark driver in connecting back to Hive client.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15839) Don't force cardinality check if only WHEN NOT MATCHED is specified

2017-02-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-15839:
--
Status: Patch Available  (was: Open)

> Don't force cardinality check if only WHEN NOT MATCHED is specified
> ---
>
> Key: HIVE-15839
> URL: https://issues.apache.org/jira/browse/HIVE-15839
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning, Query Processor, Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Minor
> Attachments: HIVE-15839.01.patch, HIVE-15839.02.patch, 
> HIVE-15839.03.patch
>
>
> should've been part of HIVE-14949
> if only WHEN NOT MATCHED is specified then the join is basically an anti-join 
> and we are not retrieving any values from target side. So the cardinality 
> check is just overhead (though presumably very minor since the filter above 
> the join will filter everything and thus there is nothing to group)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15839) Don't force cardinality check if only WHEN NOT MATCHED is specified

2017-02-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-15839:
--
Attachment: HIVE-15839.03.patch

> Don't force cardinality check if only WHEN NOT MATCHED is specified
> ---
>
> Key: HIVE-15839
> URL: https://issues.apache.org/jira/browse/HIVE-15839
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning, Query Processor, Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Minor
> Attachments: HIVE-15839.01.patch, HIVE-15839.02.patch, 
> HIVE-15839.03.patch
>
>
> should've been part of HIVE-14949
> if only WHEN NOT MATCHED is specified then the join is basically an anti-join 
> and we are not retrieving any values from target side. So the cardinality 
> check is just overhead (though presumably very minor since the filter above 
> the join will filter everything and thus there is nothing to group)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15839) Don't force cardinality check if only WHEN NOT MATCHED is specified

2017-02-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-15839:
--
Status: Open  (was: Patch Available)

> Don't force cardinality check if only WHEN NOT MATCHED is specified
> ---
>
> Key: HIVE-15839
> URL: https://issues.apache.org/jira/browse/HIVE-15839
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning, Query Processor, Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Minor
> Attachments: HIVE-15839.01.patch, HIVE-15839.02.patch
>
>
> should've been part of HIVE-14949
> if only WHEN NOT MATCHED is specified then the join is basically an anti-join 
> and we are not retrieving any values from target side. So the cardinality 
> check is just overhead (though presumably very minor since the filter above 
> the join will filter everything and thus there is nothing to group)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15839) Don't force cardinality check if only WHEN NOT MATCHED is specified

2017-02-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-15839:
--
Status: Patch Available  (was: Open)

> Don't force cardinality check if only WHEN NOT MATCHED is specified
> ---
>
> Key: HIVE-15839
> URL: https://issues.apache.org/jira/browse/HIVE-15839
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning, Query Processor, Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Minor
> Attachments: HIVE-15839.01.patch
>
>
> should've been part of HIVE-14949
> if only WHEN NOT MATCHED is specified then the join is basically an anti-join 
> and we are not retrieving any values from target side. So the cardinality 
> check is just overhead (though presumably very minor since the filter above 
> the join will filter everything and thus there is nothing to group)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-14901) HiveServer2: Use user supplied fetch size to determine #rows serialized in tasks

2017-02-10 Thread Norris Lee (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861573#comment-15861573
 ] 

Norris Lee commented on HIVE-14901:
---

Review board: https://reviews.apache.org/r/56555/

> HiveServer2: Use user supplied fetch size to determine #rows serialized in 
> tasks
> 
>
> Key: HIVE-14901
> URL: https://issues.apache.org/jira/browse/HIVE-14901
> Project: Hive
>  Issue Type: Sub-task
>  Components: HiveServer2, JDBC, ODBC
>Affects Versions: 2.1.0
>Reporter: Vaibhav Gumashta
>Assignee: Norris Lee
> Attachments: HIVE-14901.1.patch, HIVE-14901.2.patch, 
> HIVE-14901.3.patch, HIVE-14901.patch
>
>
> Currently, we use {{hive.server2.thrift.resultset.max.fetch.size}} to decide 
> the max number of rows that we write in tasks. However, we should ideally use 
> the user supplied value (which can be extracted from the 
> ThriftCLIService.FetchResults' request parameter) to decide how many rows to 
> serialize in a blob in the tasks. We should however use 
> {{hive.server2.thrift.resultset.max.fetch.size}} to have an upper bound on 
> it, so that we don't go OOM in tasks and HS2. 
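
A hedged sketch of that clamping rule (method and variable names are 
illustrative; the actual change would wire the client value through from 
ThriftCLIService.FetchResults):
{code}
// Rows serialized per blob = min(client-requested fetch size, configured max).
// The config value stays the OOM guard; the client value only lowers the batch.
static int effectiveFetchSize(long clientRequested, int serverMax) {
  if (clientRequested <= 0) {
    return serverMax;  // no usable client hint: fall back to the configured max
  }
  return (int) Math.min(clientRequested, (long) serverMax);
}
{code}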



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15871) Cannot insert into target table because column number/types are different with hive.merge.cardinality.check=false

2017-02-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-15871:
--
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

committed to master
thanks Wei for the review

> Cannot insert into target table because column number/types are different 
> with hive.merge.cardinality.check=false 
> --
>
> Key: HIVE-15871
> URL: https://issues.apache.org/jira/browse/HIVE-15871
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Fix For: 2.2.0
>
> Attachments: HIVE-15871.02.patch, HIVE-15871.03.patch
>
>
> Merge statement with WHEN MATCHED and hive.merge.cardinality.check=false
> causes errors like
> {noformat}
> FAILED: SemanticException [Error 10044]: Line 11:12 Cannot insert into target 
> table because column number/types are different 'part_0': Table insclause-0 
> has 3 columns, but query has 4 columns.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15839) Don't force cardinality check if only WHEN NOT MATCHED is specified

2017-02-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-15839:
--
Attachment: HIVE-15839.02.patch

> Don't force cardinality check if only WHEN NOT MATCHED is specified
> ---
>
> Key: HIVE-15839
> URL: https://issues.apache.org/jira/browse/HIVE-15839
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning, Query Processor, Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Minor
> Attachments: HIVE-15839.01.patch, HIVE-15839.02.patch
>
>
> should've been part of HIVE-14949
> if only WHEN NOT MATCHED is specified then the join is basically an anti-join 
> and we are not retrieving any values from target side. So the cardinality 
> check is just overhead (though presumably very minor since the filter above 
> the join will filter everything and thus there is nothing to group)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Assigned] (HIVE-15839) Don't force cardinality check if only WHEN NOT MATCHED is specified

2017-02-10 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman reassigned HIVE-15839:
-

Assignee: Eugene Koifman

> Don't force cardinality check if only WHEN NOT MATCHED is specified
> ---
>
> Key: HIVE-15839
> URL: https://issues.apache.org/jira/browse/HIVE-15839
> Project: Hive
>  Issue Type: Sub-task
>  Components: Query Planning, Query Processor, Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
>Priority: Minor
> Attachments: HIVE-15839.01.patch
>
>
> should've been part of HIVE-14949
> if only WHEN NOT MATCHED is specified then the join is basically an anti-join 
> and we are not retrieving any values from target side. So the cardinality 
> check is just overhead (though presumably very minor since the filter above 
> the join will filter everything and thus there is nothing to group)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15430) Change SchemaTool table validator to test based on the dbType

2017-02-10 Thread Naveen Gangam (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861539#comment-15861539
 ] 

Naveen Gangam commented on HIVE-15430:
--

review posted at https://reviews.apache.org/r/56549/

[~aihuaxu] [~ctang.ma] [~ychena] Could you please review? Thanks

> Change SchemaTool table validator to test based on the dbType
> -
>
> Key: HIVE-15430
> URL: https://issues.apache.org/jira/browse/HIVE-15430
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-15430.1.patch, HIVE-15430.2.patch
>
>
> Currently the validator parses the "oracle" schema file to determine what 
> tables are expected in the database (mostly because of the ease of parsing the 
> schema file compared to the other syntaxes). We have learnt from HIVE-15118 that 
> not all schema files have the same number of tables. For example, derby has 
> an old table that is never used that other DBs do not contain.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15871) Cannot insert into target table because column number/types are different with hive.merge.cardinality.check=false

2017-02-10 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861537#comment-15861537
 ] 

Wei Zheng commented on HIVE-15871:
--

+1

> Cannot insert into target table because column number/types are different 
> with hive.merge.cardinality.check=false 
> --
>
> Key: HIVE-15871
> URL: https://issues.apache.org/jira/browse/HIVE-15871
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-15871.02.patch, HIVE-15871.03.patch
>
>
> Merge statement with WHEN MATCHED and hive.merge.cardinality.check=false
> causes errors like
> {noformat}
> FAILED: SemanticException [Error 10044]: Line 11:12 Cannot insert into target 
> table because column number/types are different 'part_0': Table insclause-0 
> has 3 columns, but query has 4 columns.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15430) Change SchemaTool table validator to test based on the dbType

2017-02-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861478#comment-15861478
 ] 

Hive QA commented on HIVE-15430:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12852065/HIVE-15430.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10241 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver.org.apache.hadoop.hive.cli.TestSparkNegativeCliDriver
 (batchId=230)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3492/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3492/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3492/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12852065 - PreCommit-HIVE-Build

> Change SchemaTool table validator to test based on the dbType
> -
>
> Key: HIVE-15430
> URL: https://issues.apache.org/jira/browse/HIVE-15430
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-15430.1.patch, HIVE-15430.2.patch
>
>
> Currently the validator parses the "oracle" schema file to determine what 
> tables are expected in the database (mostly because of the ease of parsing the 
> schema file compared to the other syntaxes). We have learnt from HIVE-15118 that 
> not all schema files have the same number of tables. For example, derby has 
> an old table that is never used that other DBs do not contain.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15871) Cannot insert into target table because column number/types are different with hive.merge.cardinality.check=false

2017-02-10 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861462#comment-15861462
 ] 

Eugene Koifman commented on HIVE-15871:
---

all failures have age > 1, i.e. they predate this patch

> Cannot insert into target table because column number/types are different 
> with hive.merge.cardinality.check=false 
> --
>
> Key: HIVE-15871
> URL: https://issues.apache.org/jira/browse/HIVE-15871
> Project: Hive
>  Issue Type: Sub-task
>  Components: Transactions
>Affects Versions: 2.2.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-15871.02.patch, HIVE-15871.03.patch
>
>
> Merge statement with WHEN MATCHED and hive.merge.cardinality.check=false
> causes errors like
> {noformat}
> FAILED: SemanticException [Error 10044]: Line 11:12 Cannot insert into target 
> table because column number/types are different 'part_0': Table insclause-0 
> has 3 columns, but query has 4 columns.
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-13780) Allow user to update AVRO table schema via command even if table's definition was defined through schema file

2017-02-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861419#comment-15861419
 ] 

Hive QA commented on HIVE-13780:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12852059/HIVE-13780.0.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10247 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3491/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3491/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3491/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12852059 - PreCommit-HIVE-Build

> Allow user to update AVRO table schema via command even if table's definition 
> was defined through schema file
> -
>
> Key: HIVE-13780
> URL: https://issues.apache.org/jira/browse/HIVE-13780
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 2.0.0
>Reporter: Eric Lin
>Assignee: Adam Szita
>Priority: Minor
> Attachments: HIVE-13780.0.patch
>
>
> If a table is defined as below:
> {code}
> CREATE TABLE test
> STORED AS AVRO 
> TBLPROPERTIES ('avro.schema.url'='/tmp/schema.json');
> {code}
> if user tries to run command:
> {code}
> ALTER TABLE test CHANGE COLUMN col1 col1 STRING COMMENT 'test comment';
> {code}
> The query will return without any warning, but has no effect on the table.
> It would be good if we can allow the user to ALTER the table (add/change 
> column, update comment, etc.) even though the schema is defined through a 
> schema file.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15430) Change SchemaTool table validator to test based on the dbType

2017-02-10 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-15430:
-
Attachment: HIVE-15430.2.patch

Fixing a test failure for derby where the tables parsed from schema files 
contained the "APP." prefix, whereas the tables from the DB via JDBC returned 
just the table name without it. Fixed the schema tool to accommodate 
this difference.
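
A hedged sketch of that accommodation (the helper name is illustrative, not the 
patch itself):
{code}
// Derby schema scripts qualify tables as "APP.FOO" while the JDBC metadata
// returns a bare "FOO"; normalize both sides before comparing.
private static String normalizeTableName(String name) {
  String n = name.trim().toUpperCase();
  return n.startsWith("APP.") ? n.substring("APP.".length()) : n;
}
{code}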

> Change SchemaTool table validator to test based on the dbType
> -
>
> Key: HIVE-15430
> URL: https://issues.apache.org/jira/browse/HIVE-15430
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-15430.1.patch, HIVE-15430.2.patch
>
>
> Currently the validator parses the "oracle" schema file to determine what 
> tables are expected in the database (mostly because of the ease of parsing the 
> schema file compared to the other syntaxes). We have learnt from HIVE-15118 that 
> not all schema files have the same number of tables. For example, derby has 
> an old table that is never used that other DBs do not contain.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15430) Change SchemaTool table validator to test based on the dbType

2017-02-10 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-15430:
-
Status: Patch Available  (was: Open)

> Change SchemaTool table validator to test based on the dbType
> -
>
> Key: HIVE-15430
> URL: https://issues.apache.org/jira/browse/HIVE-15430
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-15430.1.patch, HIVE-15430.2.patch
>
>
> Currently the validator parses the "oracle" schema file to determine what 
> tables are expected in the database (mostly because of the ease of parsing the 
> schema file compared to the other syntaxes). We have learnt from HIVE-15118 that 
> not all schema files have the same number of tables. For example, derby has 
> an old table that is never used that other DBs do not contain.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15430) Change SchemaTool table validator to test based on the dbType

2017-02-10 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-15430:
-
Status: Open  (was: Patch Available)

> Change SchemaTool table validator to test based on the dbType
> -
>
> Key: HIVE-15430
> URL: https://issues.apache.org/jira/browse/HIVE-15430
> Project: Hive
>  Issue Type: Sub-task
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Naveen Gangam
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-15430.1.patch
>
>
> Currently the validator parses the "oracle" schema file to determine what 
> tables are expected in the database (mostly because of the ease of parsing the 
> schema file compared to the other syntaxes). We have learnt from HIVE-15118 that 
> not all schema files have the same number of tables. For example, derby has 
> an old table that is never used that other DBs do not contain.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-15872) The PERCENTILE UDAF does not work with empty set

2017-02-10 Thread Chaozhong Yang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaozhong Yang updated HIVE-15872:
--
Description: 
1. Original SQL:

select
percentile_approx(
column0,
array(0.50, 0.70, 0.90, 0.95, 0.99)
)
from
my_table
where
date = '20170207'
and column1 = 'value1'
and column2 = 'value2'
and column3 = 'value3'
and column4 = 'value4'
and column5 = 'value5'

2. Exception StackTrace:

Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":[0.0,1.0]}}
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:401)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":[0.0,1.0]}}
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
    ... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:766)
    at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:235)
    ... 7 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 2, Size: 2
    at java.util.ArrayList.rangeCheck(ArrayList.java:653)
    at java.util.ArrayList.get(ArrayList.java:429)
    at org.apache.hadoop.hive.ql.udf.generic.NumericHistogram.merge(NumericHistogram.java:134)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFPercentileApprox$GenericUDAFPercentileApproxEvaluator.merge(GenericUDAFPercentileApprox.java:318)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDAFEvaluator.aggregate(GenericUDAFEvaluator.java:188)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.updateAggregations(GroupByOperator.java:612)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:851)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:695)
    at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:761)
    ... 8 more

3. review data:

select
column0
from
my_table
where
date = '20170207'
and column1 = 'value1'
and column2 = 'value2'
and column3 = 'value3'
and column4 = 'value4'
and column5 = 'value5'

After run this sql, we found the result is NULL.

4. what's the meaning of [0.0, 1.0] in stacktrace?

In GenericUDAFPercentileApproxEvaluator, the `merge` method processes an 
ArrayList named partialHistogram. Normally, the basic structure of 
partialHistogram is [npercentiles, percentile0, percentile1, ..., nbins, 
bin0.x, bin0.y, bin1.x, bin1.y, ...]. However, if we process a NULL (empty 
set) column, the partialHistogram will only contain [npercentiles(0), 
nbins(1)]. That is why the stack trace shows a strange row: 
{"key":{},"value":{"_col0":[0.0,1.0]}}

Before we call histogram#merge (the on-line histogram algorithm from the 
paper http://www.jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf ), the 
percentile elements should be removed from partialHistogram, e.g. 
`partialHistogram.subList(0, nquantiles+1).clear();`. In the empty-set case, 
GenericUDAFPercentileApproxEvaluator does not remove the percentiles. 
Consequently, NumericHistogram merges a list that contains only 2 elements 
([0, 1.0]) and throws an IndexOutOfBoundsException. 

  was:
1. Original SQL:

select
percentile_approx(
column0,
array(0.50, 0.70, 0.90, 0.95, 0.99)
)
from
my_table
where
date = '20170207'
and column1 = 'value1'
and column2 = 'value2'
and column3 = 'value3'
and column4 = 'value4'
and column5 = 'value5'

2. Exception StackTrace:

Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":[0.0,1.0]}}
        at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:256)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:453)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:401)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422) at 

[jira] [Commented] (HIVE-12767) Implement table property to address Parquet int96 timestamp bug

2017-02-10 Thread Barna Zsombor Klara (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861366#comment-15861366
 ] 

Barna Zsombor Klara commented on HIVE-12767:


query14 and encryption_join_with_different_encryption_keys are known flaky 
tests (HIVE-15744, HIVE-15696).

> Implement table property to address Parquet int96 timestamp bug
> ---
>
> Key: HIVE-12767
> URL: https://issues.apache.org/jira/browse/HIVE-12767
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Sergio Peña
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-12767.3.patch, HIVE-12767.4.patch, 
> HIVE-12767.5.patch, HIVE-12767.6.patch, HIVE-12767.7.patch, 
> HIVE-12767.8.patch, TestNanoTimeUtils.java
>
>
> Parquet timestamps using INT96 are not compatible with other tools, like 
> Impala, because Hive adjusts timezone values in a different way than Impala.
> To address such issues, a new table property (parquet.mr.int96.write.zone) 
> must be used in Hive that determines which timezone to use when writing and 
> reading timestamps from Parquet.
> The following is the exit criteria for the fix:
> * Hive will read Parquet MR int96 timestamp data and adjust values using a 
> time zone from a table property, if set, or using the local time zone if it 
> is absent. No adjustment will be applied to data written by Impala.
> * Hive will write Parquet int96 timestamps using a time zone adjustment from 
> the same table property, if set, or using the local time zone if it is 
> absent. This keeps the data in the table consistent.
> * New tables created by Hive will set the table property to UTC if the global 
> option to set the property for new tables is enabled.
> ** Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not set 
> the property unless the global setting to do so is enabled.
> ** Tables created using CREATE TABLE LIKE  will copy the 
> property of the table that is copied.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-12767) Implement table property to address Parquet int96 timestamp bug

2017-02-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-12767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861350#comment-15861350
 ] 

Hive QA commented on HIVE-12767:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12852050/HIVE-12767.8.patch

{color:green}SUCCESS:{color} +1 due to 8 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10258 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3490/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3490/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3490/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 3 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12852050 - PreCommit-HIVE-Build

> Implement table property to address Parquet int96 timestamp bug
> ---
>
> Key: HIVE-12767
> URL: https://issues.apache.org/jira/browse/HIVE-12767
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Sergio Peña
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-12767.3.patch, HIVE-12767.4.patch, 
> HIVE-12767.5.patch, HIVE-12767.6.patch, HIVE-12767.7.patch, 
> HIVE-12767.8.patch, TestNanoTimeUtils.java
>
>
> Parquet timestamps using INT96 are not compatible with other tools, like 
> Impala, because Hive adjusts timezone values in a different way than Impala.
> To address such issues, a new table property (parquet.mr.int96.write.zone) 
> must be used in Hive that determines which timezone to use when writing and 
> reading timestamps from Parquet.
> The following is the exit criteria for the fix:
> * Hive will read Parquet MR int96 timestamp data and adjust values using a 
> time zone from a table property, if set, or using the local time zone if it 
> is absent. No adjustment will be applied to data written by Impala.
> * Hive will write Parquet int96 timestamps using a time zone adjustment from 
> the same table property, if set, or using the local time zone if it is 
> absent. This keeps the data in the table consistent.
> * New tables created by Hive will set the table property to UTC if the global 
> option to set the property for new tables is enabled.
> ** Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not set 
> the property unless the global setting to do so is enabled.
> ** Tables created using CREATE TABLE LIKE  will copy the 
> property of the table that is copied.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-13780) Allow user to update AVRO table schema via command even if table's definition was defined through schema file

2017-02-10 Thread Adam Szita (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861344#comment-15861344
 ] 

Adam Szita commented on HIVE-13780:
---

Approach in [^HIVE-13780.0.patch]:
- added a new config property to enable clearing the schema literal / url 
tblproperties upon ALTER TABLE commands on an Avro table
- if this property is set, the ALTER TABLE command will clear the schema 
literal and url parameters (if set) and will update HMS with the new schema
- HMS will then be responsible for handling the schema (a rough sketch follows)
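
A rough, self-contained sketch of that behavior. The class and the boolean 
flag are invented for illustration (the real patch adds a Hive config 
property); avro.schema.literal / avro.schema.url are the actual Avro SerDe 
table properties:

{code:title=AvroAlterSketch.java|borderStyle=solid}
import java.util.HashMap;
import java.util.Map;

public class AvroAlterSketch {

  // Invoked when an ALTER TABLE statement touches an Avro table.
  // 'flagEnabled' stands in for the new config property from the patch.
  static void onAlterTable(Map<String, String> tblProps, boolean flagEnabled) {
    if (!flagEnabled) {
      return; // legacy behavior: the schema file keeps overriding HMS columns
    }
    // Clear the external schema pointers so the columns stored in HMS
    // (updated by the ALTER TABLE itself) become the source of truth.
    tblProps.remove("avro.schema.literal");
    tblProps.remove("avro.schema.url");
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put("avro.schema.url", "/tmp/schema.json");
    onAlterTable(props, true);
    System.out.println(props); // {} -> HMS now owns the schema
  }
}
{code}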

> Allow user to update AVRO table schema via command even if table's definition 
> was defined through schema file
> -
>
> Key: HIVE-13780
> URL: https://issues.apache.org/jira/browse/HIVE-13780
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 2.0.0
>Reporter: Eric Lin
>Assignee: Adam Szita
>Priority: Minor
> Attachments: HIVE-13780.0.patch
>
>
> If a table is defined as below:
> {code}
> CREATE TABLE test
> STORED AS AVRO 
> TBLPROPERTIES ('avro.schema.url'='/tmp/schema.json');
> {code}
> if user tries to run command:
> {code}
> ALTER TABLE test CHANGE COLUMN col1 col1 STRING COMMENT 'test comment';
> {code}
> The query will return without any warning, but has no effect on the table.
> It would be good if we could allow the user to ALTER the table (add/change 
> column, update comment, etc.) even though the schema is defined through a 
> schema file.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-13780) Allow user to update AVRO table schema via command even if table's definition was defined through schema file

2017-02-10 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated HIVE-13780:
--
Status: Patch Available  (was: In Progress)

> Allow user to update AVRO table schema via command even if table's definition 
> was defined through schema file
> -
>
> Key: HIVE-13780
> URL: https://issues.apache.org/jira/browse/HIVE-13780
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 2.0.0
>Reporter: Eric Lin
>Assignee: Adam Szita
>Priority: Minor
> Attachments: HIVE-13780.0.patch
>
>
> If a table is defined as below:
> {code}
> CREATE TABLE test
> STORED AS AVRO 
> TBLPROPERTIES ('avro.schema.url'='/tmp/schema.json');
> {code}
> if user tries to run command:
> {code}
> ALTER TABLE test CHANGE COLUMN col1 col1 STRING COMMENT 'test comment';
> {code}
> The query will return without any warning, but has no effect on the table.
> It would be good if we could allow the user to ALTER the table (add/change 
> column, update comment, etc.) even though the schema is defined through a 
> schema file.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-13780) Allow user to update AVRO table schema via command even if table's definition was defined through schema file

2017-02-10 Thread Adam Szita (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Szita updated HIVE-13780:
--
Attachment: HIVE-13780.0.patch

> Allow user to update AVRO table schema via command even if table's definition 
> was defined through schema file
> -
>
> Key: HIVE-13780
> URL: https://issues.apache.org/jira/browse/HIVE-13780
> Project: Hive
>  Issue Type: Improvement
>  Components: CLI
>Affects Versions: 2.0.0
>Reporter: Eric Lin
>Assignee: Adam Szita
>Priority: Minor
> Attachments: HIVE-13780.0.patch
>
>
> If a table is defined as below:
> {code}
> CREATE TABLE test
> STORED AS AVRO 
> TBLPROPERTIES ('avro.schema.url'='/tmp/schema.json');
> {code}
> if user tries to run command:
> {code}
> ALTER TABLE test CHANGE COLUMN col1 col1 STRING COMMENT 'test comment';
> {code}
> The query will return without any warning, but has no effect on the table.
> It would be good if we could allow the user to ALTER the table (add/change 
> column, update comment, etc.) even though the schema is defined through a 
> schema file.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-12767) Implement table property to address Parquet int96 timestamp bug

2017-02-10 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-12767:
---
Attachment: HIVE-12767.8.patch

Rebased on master because the patch did not apply anymore.

> Implement table property to address Parquet int96 timestamp bug
> ---
>
> Key: HIVE-12767
> URL: https://issues.apache.org/jira/browse/HIVE-12767
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Sergio Peña
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-12767.3.patch, HIVE-12767.4.patch, 
> HIVE-12767.5.patch, HIVE-12767.6.patch, HIVE-12767.7.patch, 
> HIVE-12767.8.patch, TestNanoTimeUtils.java
>
>
> Parquet timestamps using INT96 are not compatible with other tools, like 
> Impala, because Hive adjusts timezone values in a different way than Impala.
> To address such issues, a new table property (parquet.mr.int96.write.zone) 
> must be used in Hive that determines which timezone to use when writing and 
> reading timestamps from Parquet.
> The following is the exit criteria for the fix:
> * Hive will read Parquet MR int96 timestamp data and adjust values using a 
> time zone from a table property, if set, or using the local time zone if it 
> is absent. No adjustment will be applied to data written by Impala.
> * Hive will write Parquet int96 timestamps using a time zone adjustment from 
> the same table property, if set, or using the local time zone if it is 
> absent. This keeps the data in the table consistent.
> * New tables created by Hive will set the table property to UTC if the global 
> option to set the property for new tables is enabled.
> ** Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not set 
> the property unless the global setting to do so is enabled.
> ** Tables created using CREATE TABLE LIKE  will copy the 
> property of the table that is copied.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (HIVE-12767) Implement table property to address Parquet int96 timestamp bug

2017-02-10 Thread Barna Zsombor Klara (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Barna Zsombor Klara updated HIVE-12767:
---
Attachment: (was: HIVE-12767.8.patch)

> Implement table property to address Parquet int96 timestamp bug
> ---
>
> Key: HIVE-12767
> URL: https://issues.apache.org/jira/browse/HIVE-12767
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1, 2.0.0
>Reporter: Sergio Peña
>Assignee: Barna Zsombor Klara
> Attachments: HIVE-12767.3.patch, HIVE-12767.4.patch, 
> HIVE-12767.5.patch, HIVE-12767.6.patch, HIVE-12767.7.patch, 
> TestNanoTimeUtils.java
>
>
> Parquet timestamps using INT96 are not compatible with other tools, like 
> Impala, because Hive adjusts timezone values in a different way than Impala.
> To address such issues, a new table property (parquet.mr.int96.write.zone) 
> must be used in Hive that determines which timezone to use when writing and 
> reading timestamps from Parquet.
> The following is the exit criteria for the fix:
> * Hive will read Parquet MR int96 timestamp data and adjust values using a 
> time zone from a table property, if set, or using the local time zone if it 
> is absent. No adjustment will be applied to data written by Impala.
> * Hive will write Parquet int96 timestamps using a time zone adjustment from 
> the same table property, if set, or using the local time zone if it is 
> absent. This keeps the data in the table consistent.
> * New tables created by Hive will set the table property to UTC if the global 
> option to set the property for new tables is enabled.
> ** Tables created using CREATE TABLE and CREATE TABLE LIKE FILE will not set 
> the property unless the global setting to do so is enabled.
> ** Tables created using CREATE TABLE LIKE  will copy the 
> property of the table that is copied.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15671) RPCServer.registerClient() erroneously uses server/client handshake timeout for connection timeout

2017-02-10 Thread KaiXu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861241#comment-15861241
 ] 

KaiXu commented on HIVE-15671:
--

I created HIVE-15859 for the issue; comments or suggestions are welcome. 
Thanks!

> RPCServer.registerClient() erroneously uses server/client handshake timeout 
> for connection timeout
> --
>
> Key: HIVE-15671
> URL: https://issues.apache.org/jira/browse/HIVE-15671
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Affects Versions: 1.1.0
>Reporter: Xuefu Zhang
>Assignee: Xuefu Zhang
> Attachments: HIVE-15671.1.patch, HIVE-15671.patch
>
>
> {code}
>   /**
>* Tells the RPC server to expect a connection from a new client.
>* ...
>*/
>   public Future registerClient(final String clientId, String secret,
>   RpcDispatcher serverDispatcher) {
> return registerClient(clientId, secret, serverDispatcher, 
> config.getServerConnectTimeoutMs());
>   }
> {code}
> {{config.getServerConnectTimeoutMs()}} returns the value of 
> *hive.spark.client.server.connect.timeout*, which is meant as the timeout 
> for the handshake between the Hive client and the remote Spark driver. 
> Instead, the timeout should be *hive.spark.client.connect.timeout*, which 
> is the timeout for the remote Spark driver connecting back to the Hive 
> client.
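
A one-line sketch of the fix implied above, mirroring the quoted fragment 
(whether the config object exposes a {{getConnectTimeoutMs()}} accessor with 
exactly this name is an assumption here, as is the {{Future<Rpc>}} return 
type):

{code}
// Sketch only: wait for the driver with the client connect timeout,
// not the server/client handshake timeout.
public Future<Rpc> registerClient(final String clientId, String secret,
    RpcDispatcher serverDispatcher) {
  return registerClient(clientId, secret, serverDispatcher,
      config.getConnectTimeoutMs()); // hive.spark.client.connect.timeout
}
{code}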



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15229) 'like any' and 'like all' operators in hive

2017-02-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861223#comment-15861223
 ] 

Hive QA commented on HIVE-15229:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12852042/HIVE-15229.2.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3489/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3489/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3489/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2017-02-10 12:56:16.800
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-3489/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2017-02-10 12:56:16.803
+ cd apache-github-source-source
+ git fetch origin
+ git reset --hard HEAD
HEAD is now at c0198e5 HIVE-15850: Proper handling of timezone in Druid storage 
handler (Jesus Camacho Rodriguez, reviewed by Ashutosh Chauhan)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is up-to-date with 'origin/master'.
+ git reset --hard origin/master
HEAD is now at c0198e5 HIVE-15850: Proper handling of timezone in Druid storage 
handler (Jesus Camacho Rodriguez, reviewed by Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2017-02-10 12:56:17.734
+ patchCommandPath=/data/hiveptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hiveptest/working/scratch/build.patch
+ [[ -f /data/hiveptest/working/scratch/build.patch ]]
+ chmod +x /data/hiveptest/working/scratch/smart-apply-patch.sh
+ /data/hiveptest/working/scratch/smart-apply-patch.sh 
/data/hiveptest/working/scratch/build.patch
error: patch failed: 
ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g:393
error: ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g: patch does not 
apply
error: patch failed: 
ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g:92
error: ql/src/java/org/apache/hadoop/hive/ql/parse/IdentifiersParser.g: patch 
does not apply
The patch does not appear to apply with p0, p1, or p2
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12852042 - PreCommit-HIVE-Build

> 'like any' and 'like all' operators in hive
> ---
>
> Key: HIVE-15229
> URL: https://issues.apache.org/jira/browse/HIVE-15229
> Project: Hive
>  Issue Type: New Feature
>  Components: Operators
>Reporter: Simanchal Das
>Assignee: Simanchal Das
>Priority: Minor
> Attachments: HIVE-15229.1.patch, HIVE-15229.2.patch
>
>
> In Teradata, the 'like any' and 'like all' operators are mostly used when 
> matching a text field against a number of patterns.
> The 'like any' and 'like all' operators are equivalent to multiple like 
> operators, as in the examples below.
> {noformat}
> --like any
> select col1 from table1 where col2 like any ('%accountant%', '%accounting%', 
> '%retail%', '%bank%', '%insurance%');
> --Can be written using multiple like condition 
> select col1 from table1 where col2 like '%accountant%' or col2 like 
> '%accounting%' or col2 like '%retail%' or col2 like '%bank%' or col2 like 
> '%insurance%' ;
> --like all
> select col1 from table1 where col2 like all ('%accountant%', '%accounting%', 
> '%retail%', '%bank%', '%insurance%');
> --Can be written using multiple like operator 
> select col1 from table1 where col2 like '%accountant%' and col2 like 
> '%accounting%' and col2 like '%retail%' and col2 like '%bank%' and col2 like 
> '%insurance%' ;
> {noformat}
> Problem statement:
> Nowadays many data warehouse projects are being migrated from Teradata to 
> Hive.
> Data engineers and business analysts are always searching for these two 
> operators.
> If we introduce these two 

[jira] [Commented] (HIVE-15860) RemoteSparkJobMonitor may hang when RemoteDriver exits abnormally

2017-02-10 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861218#comment-15861218
 ] 

Hive QA commented on HIVE-15860:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12852041/HIVE-15860.2.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10227 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver
 (batchId=162)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3488/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3488/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3488/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12852041 - PreCommit-HIVE-Build

> RemoteSparkJobMonitor may hang when RemoteDriver exits abnormally
> -
>
> Key: HIVE-15860
> URL: https://issues.apache.org/jira/browse/HIVE-15860
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-15860.1.patch, HIVE-15860.2.patch
>
>
> It happens when RemoteDriver crashes between {{JobStarted}} and 
> {{JobSubmitted}}, e.g. when killed by {{kill -9}}. RemoteSparkJobMonitor 
> will consider that the job has started; however, it can't get the job info 
> because it hasn't received the JobId, so the monitor will loop forever.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15849) hplsql should add enterGlobalScope func to UDF

2017-02-10 Thread Fei Hui (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861214#comment-15861214
 ] 

Fei Hui commented on HIVE-15849:


[~alangates] [~dmtolpeko] could you please give any suggestions?

> hplsql should add enterGlobalScope func to UDF
> --
>
> Key: HIVE-15849
> URL: https://issues.apache.org/jira/browse/HIVE-15849
> Project: Hive
>  Issue Type: Bug
>  Components: hpl/sql
>Affects Versions: 2.2.0
>Reporter: Fei Hui
>Assignee: Fei Hui
> Attachments: HIVE-15849.patch
>
>
> code in Udf.java
> {code:title=Udf.java|borderStyle=solid}
> if (exec == null) {
>   exec = new Exec();
>   String query = queryOI.getPrimitiveJavaObject(arguments[0].get());
>   String[] args = { "-e", query, "-trace" };
>   try {
> exec.setUdfRun(true);
> exec.init(args);
>   } catch (Exception e) {
> throw new HiveException(e.getMessage());
>   }
> }
> if (arguments.length > 1) {
>   setParameters(arguments);
> }
> Var result = exec.run();
> if (result != null) {
>   return result.toString();
> }
> {code}
> Here are my thoughts:
> {quote}
> We should add 'exec.enterGlobalScope();' between 'exec = new Exec();' and 
> 'setParameters(arguments);'.
> If we do not call exec.enterGlobalScope(), setParameters(arguments) will be 
> useless: the vars are not added into the scope, yet exec.run() will use the 
> vars which we set. The vars are the parameters passed to the UDF, [, :1, 
> :2, ...n], as described in Udf.java.
> {quote}
> Before adding this function, the result is as follows. We get the wrong 
> result because the result contains an empty string: 
> {quote}
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting query
> Query executed successfully (2.30 sec)
> Ln:8 SELECT completed successfully
> Ln:8 Standalone SELECT executed: 1 columns in the result set
> Hello, !
> Hello, !
> {quote}
> After adding this function, we get the right result:
> {quote}
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting pre-SQL statement
> Starting query
> Query executed successfully (2.35 sec)
> Ln:8 SELECT completed successfully
> Ln:8 Standalone SELECT executed: 1 columns in the result set
> Hello, fei!
> Hello, fei!
> {quote}
> tests come from http://www.hplsql.org/udf
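
As a fragment mirroring the Udf.java snippet quoted above (a sketch of the 
proposed ordering only, not the actual patch; error handling from the 
original is omitted):

{code}
exec = new Exec();
exec.setUdfRun(true);
exec.init(args);
exec.enterGlobalScope();     // open the global scope first, so that ...
if (arguments.length > 1) {
  setParameters(arguments);  // ... the :1, :2, ... vars land in an active scope
}
Var result = exec.run();     // run() can now resolve the parameters
{code}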



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HIVE-15860) RemoteSparkJobMonitor may hang when RemoteDriver exits abnormally

2017-02-10 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-15860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15861205#comment-15861205
 ] 

Xuefu Zhang commented on HIVE-15860:


+1. Patch #2 looks good to me. Thanks for fixing this, [~lirui]!


> RemoteSparkJobMonitor may hang when RemoteDriver exits abnormally
> -
>
> Key: HIVE-15860
> URL: https://issues.apache.org/jira/browse/HIVE-15860
> Project: Hive
>  Issue Type: Bug
>Reporter: Rui Li
>Assignee: Rui Li
> Attachments: HIVE-15860.1.patch, HIVE-15860.2.patch
>
>
> It happens when RemoteDriver crashes between {{JobStarted}} and 
> {{JobSubmitted}}, e.g. when killed by {{kill -9}}. RemoteSparkJobMonitor 
> will consider that the job has started; however, it can't get the job info 
> because it hasn't received the JobId, so the monitor will loop forever.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Issue Comment Deleted] (HIVE-15863) Calendar inside DATE, TIME and TIMESTAMP literals for Calcite should have UTC timezone

2017-02-10 Thread Jesus Camacho Rodriguez (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-15863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesus Camacho Rodriguez updated HIVE-15863:
---
Comment: was deleted

(was: 

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12852031/HIVE-15863.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 10244 tests 
executed
*Failed tests:*
{noformat}
TestDerbyConnector - did not produce a TEST-*.xml file (likely timed out) 
(batchId=235)
org.apache.hadoop.hive.cli.TestEncryptedHDFSCliDriver.testCliDriver[encryption_join_with_different_encryption_keys]
 (batchId=159)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query14] 
(batchId=223)
org.apache.hadoop.hive.cli.TestPerfCliDriver.testCliDriver[query23] 
(batchId=223)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/3487/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/3487/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-3487/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 4 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12852031 - PreCommit-HIVE-Build)

> Calendar inside DATE, TIME and TIMESTAMP literals for Calcite should have UTC 
> timezone
> --
>
> Key: HIVE-15863
> URL: https://issues.apache.org/jira/browse/HIVE-15863
> Project: Hive
>  Issue Type: Bug
>  Components: CBO
>Affects Versions: 2.2.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> Related to CALCITE-1623.
> At query preparation time, Calcite uses a Calendar to hold the value of DATE, 
> TIME, TIMESTAMP literals. It assumes that Calendar has a UTC (GMT) time zone, 
> and bad things might happen if it does not. Currently, we pass the Calendar 
> object with user timezone from Hive. We need to pass it with UTC timezone and 
> make the inverse conversion when we go back from Calcite to Hive.
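
As a tiny, runnable illustration of the difference (plain JDK Calendar, not 
Hive/Calcite code; the zone and date are arbitrary):

{code:title=UtcCalendarSketch.java|borderStyle=solid}
import java.util.Calendar;
import java.util.TimeZone;

public class UtcCalendarSketch {
  public static void main(String[] args) {
    // What the description asks for: hold the literal's fields in a
    // UTC-based Calendar before handing it to Calcite ...
    Calendar utc = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
    utc.clear();
    utc.set(2017, Calendar.FEBRUARY, 10, 0, 0, 0); // DATE '2017-02-10'

    // ... instead of one based on the user/session time zone, whose epoch
    // millis for the same wall-clock fields differ by the zone offset.
    Calendar local =
        Calendar.getInstance(TimeZone.getTimeZone("America/Los_Angeles"));
    local.clear();
    local.set(2017, Calendar.FEBRUARY, 10, 0, 0, 0);

    // 28800000 ms = 8 hours (PST offset in February)
    System.out.println(local.getTimeInMillis() - utc.getTimeInMillis());
  }
}
{code}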



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

