[jira] [Commented] (HIVE-18390) IndexOutOfBoundsException when querying a partitioned view in ColumnPruner

2018-01-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317922#comment-16317922
 ] 

Hive QA commented on HIVE-18390:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12904937/HIVE-18390.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 21 failed/errored test(s), 11549 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[char_pad_convert] 
(batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=35)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketsortoptimize_insert_2]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[hybridgrace_hashjoin_2]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] 
(batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver[bucketizedhiveinputformat]
 (batchId=177)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_part]
 (batchId=93)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[stats_aggregator_error_1]
 (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] 
(batchId=120)
org.apache.hadoop.hive.cli.control.TestDanglingQOuts.checkDanglingQOut 
(batchId=208)
org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore.testTransactionalValidation
 (batchId=213)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=253)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints 
(batchId=225)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=231)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=231)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=231)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8515/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8515/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8515/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 21 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12904937 - PreCommit-HIVE-Build

> IndexOutOfBoundsException when querying a partitioned view in ColumnPruner 
> -
>
> Key: HIVE-18390
> URL: https://issues.apache.org/jira/browse/HIVE-18390
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Views
>Affects Versions: 2.1.1
>Reporter: Hengyu Dai
> Attachments: HIVE-18390.patch
>
>
> An IndexOutOfBoundsException is encountered when querying a partitioned view.
> During column pruning, each SEL operator collects the columns accessed in the 
> current SEL operator.
> When ColumnPrunerSelectProc resolves the columns accessed through a view, it 
> first gets the index of each output column name in the view, then calls 
> Table.getCols().get(index).getName() to obtain the output column's name. But 
> Table.getCols() does not return all columns (partition columns are missing), 
> so if a partition column is queried, an IndexOutOfBoundsException is thrown.
> REPRODUCE: 
> {code:sql}
> create table foo
> (
> `a` string
> ) partitioned by (`b` string)
> ;
> create view bar partitioned on (b) as
> select a,b from foo;
> select * from bar; --IndexOutOfBoundsException
> {code}
> OPERATOR TREE:
> {code:java}
> TS[0]
>|
> SEL[1]
>|
> SEL[2]
>|
> FS[3]
> {code}
> SEL[1] collects the accessed columns (including partition column b). b's 
> internal column name is '_col1' and the corresponding column index is 1, but 
> bar's getCols() actually returns a list of length 1: ['a'], so 
> tab.getCols().get(1) throws an IndexOutOfBoundsException.
> HOW TO FIX:
> instead of calling the view's getCols() method, we should get all columns, 
> including partition columns.
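> A minimal sketch of that fix, assuming Table.getAllCols() (data columns 
> followed by partition columns) is the lookup to use where 
> ColumnPrunerSelectProc maps output-column indexes back to names:
> {code:java}
> // Resolve against the full schema instead of tab.getCols(), which omits
> // partition columns such as 'b' in the repro above.
> List<FieldSchema> allCols = tab.getAllCols(); // data cols + partition cols
> String outputColName = allCols.get(index).getName();
> {code}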



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18079) Statistics: Allow HyperLogLog to be merged to the lowest-common-denominator bit-size

2018-01-08 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-18079:
---
Attachment: HIVE-18079.8.patch

> Statistics: Allow HyperLogLog to be merged to the lowest-common-denominator 
> bit-size
> 
>
> Key: HIVE-18079
> URL: https://issues.apache.org/jira/browse/HIVE-18079
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore, Statistics
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-18079.1.patch, HIVE-18079.2.patch, 
> HIVE-18079.4.patch, HIVE-18079.5.patch, HIVE-18079.6.patch, 
> HIVE-18079.7.patch, HIVE-18079.8.patch
>
>
> HyperLogLog can merge a 14 bit HLL into a 10 bit HLL bitset, because of its 
> mathematical hash distribution & construction.
> Allow the squashing of a 14 bit HLL -> 10 bit HLL without needing a second 
> scan over the data-set.
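> A hedged sketch of such a fold (not this patch's code), assuming the 
> register index is the top p bits of the hash and each register stores the 
> 1-based position of the first set bit in the remaining bits; dropping from 
> p=14 to p'=10 moves d=4 index bits into each register's value stream:
> {code:java}
> // Fold 2^p HLL registers down to 2^p' without a second pass over the data.
> static int[] fold(int[] regs, int p, int pPrime) {
>   int d = p - pPrime;                  // bits moving from index to value
>   int[] out = new int[1 << pPrime];
>   for (int i = 0; i < regs.length; i++) {
>     if (regs[i] == 0) {
>       continue;                        // empty register
>     }
>     int t = i & ((1 << d) - 1);        // low index bits now lead the value stream
>     int rho = (t == 0)
>         ? d + regs[i]                  // first 1 still lies in the old value bits
>         : Integer.numberOfLeadingZeros(t) - (32 - d) + 1;
>     int j = i >>> d;                   // shorter register index
>     out[j] = Math.max(out[j], rho);
>   }
>   return out;
> }
> {code}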



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-18402) load data should rename files consistent with insert statements (bucketed tables only) Part4

2018-01-08 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal resolved HIVE-18402.
---
Resolution: Fixed

No test changes required.

> load data should rename files consistent with insert statements (bucketed 
> tables only) Part4
> 
>
> Key: HIVE-18402
> URL: https://issues.apache.org/jira/browse/HIVE-18402
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> As part of this effort, this JIRA tracks updating tests which use load data 
> files such as smb_bucket_input.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18079) Statistics: Allow HyperLogLog to be merged to the lowest-common-denominator bit-size

2018-01-08 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-18079:
---
Attachment: (was: HIVE-18079.8.patch)

> Statistics: Allow HyperLogLog to be merged to the lowest-common-denominator 
> bit-size
> 
>
> Key: HIVE-18079
> URL: https://issues.apache.org/jira/browse/HIVE-18079
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore, Statistics
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-18079.1.patch, HIVE-18079.2.patch, 
> HIVE-18079.4.patch, HIVE-18079.5.patch, HIVE-18079.6.patch, 
> HIVE-18079.7.patch, HIVE-18079.8.patch
>
>
> HyperLogLog can merge a 14 bit HLL into a 10 bit HLL bitset, because of its 
> mathematical hash distribution & construction.
> Allow the squashing of a 14 bit HLL -> 10 bit HLL without needing a second 
> scan over the data-set.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18079) Statistics: Allow HyperLogLog to be merged to the lowest-common-denominator bit-size

2018-01-08 Thread Gopal V (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gopal V updated HIVE-18079:
---
Attachment: HIVE-18079.8.patch

> Statistics: Allow HyperLogLog to be merged to the lowest-common-denominator 
> bit-size
> 
>
> Key: HIVE-18079
> URL: https://issues.apache.org/jira/browse/HIVE-18079
> Project: Hive
>  Issue Type: Improvement
>  Components: Standalone Metastore, Statistics
>Affects Versions: 3.0.0
>Reporter: Gopal V
>Assignee: Gopal V
> Attachments: HIVE-18079.1.patch, HIVE-18079.2.patch, 
> HIVE-18079.4.patch, HIVE-18079.5.patch, HIVE-18079.6.patch, 
> HIVE-18079.7.patch, HIVE-18079.8.patch
>
>
> HyperLogLog can merge a 14 bit HLL into a 10 bit HLL bitset, because of its 
> mathematical hash distribution & construction.
> Allow the squashing of a 14 bit HLL -> 10 bit HLL without needing a second 
> scan over the data-set.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-18382) Duplicate entry key when create_table/add_partition

2018-01-08 Thread Biao Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Biao Wu resolved HIVE-18382.

Resolution: Not A Bug

The failure was due to a JDO ConnectionFactoryImpl load error, not a Hive bug.

> Duplicate entry key when create_table/add_partition 
> 
>
> Key: HIVE-18382
> URL: https://issues.apache.org/jira/browse/HIVE-18382
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 1.2.1
> Environment: Hive: 1.2.1
> Hadoop: 2.7.1
> metadb: Mysql, version:5.1.40
>Reporter: Biao Wu
>Priority: Critical
>
> Add_partitions and create_table often fail.
> Here is the HMS log.
> {code:java}
> 2018-01-03 03:43:55,541 ERROR [pool-10-thread-76716]: 
> metastore.RetryingHMSHandler (RetryingHMSHandler.java:invoke(173)) - Retrying 
> HMSHandler after 2000 ms (attempt 1 of 10) with error: javax.jdo.JDOData
> StoreException: Get request failed : SELECT `A0`.`PARAM_VALUE` FROM 
> `SERDE_PARAMS` `A0` WHERE `A0`.`SERDE_ID` = ? AND `A0`.`PARAM_KEY` = ?
> at 
> org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:543)
> at 
> org.datanucleus.api.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:720)
> at 
> org.datanucleus.api.jdo.JDOPersistenceManager.makePersistent(JDOPersistenceManager.java:740)
> at 
> org.apache.hadoop.hive.metastore.ObjectStore.createTable(ObjectStore.java:859)
> at 
> org.apache.hadoop.hive.metastore.ObjectStoreWithBIMapping.createTable(ObjectStoreWithBIMapping.java:174)
> at sun.reflect.GeneratedMethodAccessor95.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114)
> at com.sun.proxy.$Proxy11.createTable(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_core(HiveMetaStore.java:1522)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.create_table_with_environment_context(HiveMetaStore.java:1555)
> at sun.reflect.GeneratedMethodAccessor87.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107)
> at 
> com.sun.proxy.$Proxy13.create_table_with_environment_context(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor87.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$MetricHMSProxy.invoke(HiveMetaStore.java:6098)
> at 
> com.sun.proxy.$Proxy13.create_table_with_environment_context(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:9216)
> at 
> org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$create_table_with_environment_context.getResult(ThriftHiveMetastore.java:9200)
> at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:731)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor$1.run(HadoopThriftAuthBridge.java:726)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1690)
> at 
> org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingProcessor.process(HadoopThriftAuthBridge.java:726)
> at 
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> NestedThrowablesStackTrace:
> java.sql.BatchUpdateException: Duplicate entry '508649089' for key 'PRIMARY'
> at 
> com.mysql.jdbc.SQLError.createBatchUpdateException(SQLError.java:1167)
> at 
> 

[jira] [Updated] (HIVE-18400) load data should rename files consistent with insert statements (bucketed tables only) Part2

2018-01-08 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18400:
--
Attachment: HIVE-18400.1.patch

[~ekoifman] can you please review the result-only changes?
No code changes.
Copied some data files with bucket-friendly names.

> load data should rename files consistent with insert statements (bucketed 
> tables only) Part2
> 
>
> Key: HIVE-18400
> URL: https://issues.apache.org/jira/browse/HIVE-18400
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-18400.1.patch
>
>
> As part of this effort, this JIRA tracks updating tests which use load data 
> files srcbucket0 etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18400) load data should rename files consistent with insert statements (bucketed tables only) Part2

2018-01-08 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18400:
--
Status: Patch Available  (was: In Progress)

> load data should rename files consistent with insert statements (bucketed 
> tables only) Part2
> 
>
> Key: HIVE-18400
> URL: https://issues.apache.org/jira/browse/HIVE-18400
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> As part of this effort, this JIRA tracks updating tests which use load data 
> files srcbucket0 etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (HIVE-18400) load data should rename files consistent with insert statements (bucketed tables only) Part2

2018-01-08 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-18400 started by Deepak Jaiswal.
-
> load data should rename files consistent with insert statements (bucketed 
> tables only) Part2
> 
>
> Key: HIVE-18400
> URL: https://issues.apache.org/jira/browse/HIVE-18400
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> As part of this effort, this JIRA tracks updating tests which use load data 
> files srcbucket0 etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18399) load data should rename files consistent with insert statements (bucketed tables only) Part1

2018-01-08 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18399:
--
Attachment: (was: HIVE-18399.1.patch)

> load data should rename files consistent with insert statements (bucketed 
> tables only) Part1
> 
>
> Key: HIVE-18399
> URL: https://issues.apache.org/jira/browse/HIVE-18399
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-18399.1.patch
>
>
> As part of this effort, this JIRA tracks updating tests which use load data 
> files srcsortbucket1outof4 etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18399) load data should rename files consistent with insert statements (bucketed tables only) Part1

2018-01-08 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18399:
--
Attachment: HIVE-18399.1.patch

[~ekoifman] can you please review the result-only changes?
No code changes.
Copied some data files with bucket-friendly names.
Only one test has a result diff because a different bucket is used.

> load data should rename files consistent with insert statements (bucketed 
> tables only) Part1
> 
>
> Key: HIVE-18399
> URL: https://issues.apache.org/jira/browse/HIVE-18399
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-18399.1.patch
>
>
> As part of this effort, this JIRA tracks updating tests which use load data 
> files srcsortbucket1outof4 etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18399) load data should rename files consistent with insert statements (bucketed tables only) Part1

2018-01-08 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18399:
--
Status: Patch Available  (was: In Progress)

> load data should rename files consistent with insert statements (bucketed 
> tables only) Part1
> 
>
> Key: HIVE-18399
> URL: https://issues.apache.org/jira/browse/HIVE-18399
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-18399.1.patch
>
>
> As part of this effort, this JIRA tracks updating tests which use load data 
> files srcsortbucket1outof4 etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Issue Comment Deleted] (HIVE-18399) load data should rename files consistent with insert statements (bucketed tables only) Part1

2018-01-08 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18399:
--
Comment: was deleted

(was: [~ekoifman] can you please review the result-only changes?

No code changes.
Copied some data files with bucket-friendly names.
Only one test has a result diff because a different bucket is used.)

> load data should rename files consistent with insert statements (bucketed 
> tables only) Part1
> 
>
> Key: HIVE-18399
> URL: https://issues.apache.org/jira/browse/HIVE-18399
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-18399.1.patch
>
>
> As part of this effort, this JIRA tracks updating tests which use load data 
> files srcsortbucket1outof4 etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18399) load data should rename files consistent with insert statements (bucketed tables only) Part1

2018-01-08 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18399:
--
Attachment: HIVE-18399.1.patch

[~ekoifman] can you please review the result-only changes?

No code changes.
Copied some data files with bucket-friendly names.
Only one test has a result diff because a different bucket is used.

> load data should rename files consistent with insert statements (bucketed 
> tables only) Part1
> 
>
> Key: HIVE-18399
> URL: https://issues.apache.org/jira/browse/HIVE-18399
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-18399.1.patch
>
>
> As part of this effort, this JIRA tracks updating tests which use load data 
> files srcsortbucket1outof4 etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (HIVE-18399) load data should rename files consistent with insert statements (bucketed tables only) Part1

2018-01-08 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-18399 started by Deepak Jaiswal.
-
> load data should rename files consistent with insert statements (bucketed 
> tables only) Part1
> 
>
> Key: HIVE-18399
> URL: https://issues.apache.org/jira/browse/HIVE-18399
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> As part of this effort, this JIRA tracks updating tests which use load data 
> files srcsortbucket1outof4 etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18390) IndexOutOfBoundsException when querying a partitioned view in ColumnPruner

2018-01-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317836#comment-16317836
 ] 

Hive QA commented on HIVE-18390:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
41s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
13s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
52s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
13s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 13m  6s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 8412748 |
| Default Java | 1.8.0_111 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8515/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> IndexOutOfBoundsException when querying a partitioned view in ColumnPruner 
> -
>
> Key: HIVE-18390
> URL: https://issues.apache.org/jira/browse/HIVE-18390
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Views
>Affects Versions: 2.1.1
>Reporter: Hengyu Dai
> Attachments: HIVE-18390.patch
>
>
> An IndexOutOfBoundsException is encountered when querying a partitioned view.
> During column pruning, each SEL operator collects the columns accessed in the 
> current SEL operator.
> When ColumnPrunerSelectProc resolves the columns accessed through a view, it 
> first gets the index of each output column name in the view, then calls 
> Table.getCols().get(index).getName() to obtain the output column's name. But 
> Table.getCols() does not return all columns (partition columns are missing), 
> so if a partition column is queried, an IndexOutOfBoundsException is thrown.
> REPRODUCE: 
> {code:sql}
> create table foo
> (
> `a` string
> ) partitioned by (`b` string)
> ;
> create view bar partitioned on (b) as
> select a,b from foo;
> select * from bar; --IndexOutOfBoundsException
> {code}
> OPERATOR TREE:
> {code:java}
> TS[0]
>|
> SEL[1]
>|
> SEL[2]
>|
> FS[3]
> {code}
> SEL[1] collects the accessed columns (including partition column b). b's 
> internal column name is '_col1' and the corresponding column index is 1, but 
> bar's getCols() actually returns a list of length 1: ['a'], so 
> tab.getCols().get(1) throws an IndexOutOfBoundsException.
> HOW TO FIX:
> instead of calling the view's getCols() method, we should get all columns, 
> including partition columns.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM

2018-01-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317820#comment-16317820
 ] 

Hive QA commented on HIVE-18269:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12905175/HIVE-18269.03.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 11549 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=35)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketsortoptimize_insert_2]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[hybridgrace_hashjoin_2]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] 
(batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=159)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_part]
 (batchId=93)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[stats_aggregator_error_1]
 (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] 
(batchId=120)
org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore.testTransactionalValidation
 (batchId=213)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=253)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints 
(batchId=225)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=231)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=231)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=231)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8514/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8514/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8514/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 17 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12905175 - PreCommit-HIVE-Build

> LLAP: Fast llap io with slow processing pipeline can lead to OOM
> 
>
> Key: HIVE-18269
> URL: https://issues.apache.org/jira/browse/HIVE-18269
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Sergey Shelukhin
> Attachments: HIVE-18269.01.patch, HIVE-18269.02.patch, 
> HIVE-18269.03.patch, HIVE-18269.1.patch, HIVE-18269.bad.patch, Screen Shot 
> 2017-12-13 at 1.15.16 AM.png
>
>
> The pendingData linked list in the LLAP IO elevator (LlapRecordReader.java) 
> may grow indefinitely when LLAP IO is faster than the processing pipeline. 
> Since we don't have backpressure to slow down the IO, this can lead to 
> indefinite growth of pending data, severe GC pressure, and eventually OOM.
> This specific instance of LLAP was running on HDFS on top of an EBS volume 
> backed by SSD. The query that triggered this issue was ANALYZE STATISTICS 
> .. FOR COLUMNS, which also gathers bitvectors: a fast-IO, slow-processing case.
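> For illustration only (not the committed fix), the standard way to add that 
> backpressure is a bounded queue between the IO elevator and the reader, so 
> the IO thread blocks instead of queueing unboundedly; the class name, batch 
> type parameter, and capacity below are placeholders:
> {code:java}
> import java.util.concurrent.ArrayBlockingQueue;
> import java.util.concurrent.BlockingQueue;
>
> class BoundedElevator<B> {
>   // A fixed capacity replaces the unbounded pendingData linked list.
>   private final BlockingQueue<B> pendingData = new ArrayBlockingQueue<>(64);
>
>   void onBatchFromIo(B batch) throws InterruptedException {
>     pendingData.put(batch);    // blocks when full -> backpressure on IO
>   }
>
>   B nextBatchForProcessing() throws InterruptedException {
>     return pendingData.take(); // draining unblocks the IO thread
>   }
> }
> {code}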



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM

2018-01-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317742#comment-16317742
 ] 

Hive QA commented on HIVE-18269:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
28s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
33s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
28s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  1m 
 2s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
16s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
57s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
30s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
32s{color} | {color:red} ql: The patch generated 6 new + 118 unchanged - 8 
fixed = 124 total (was 126) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
13s{color} | {color:red} llap-server: The patch generated 4 new + 250 unchanged 
- 4 fixed = 254 total (was 254) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
12s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 17m 41s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 8412748 |
| Default Java | 1.8.0_111 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8514/yetus/diff-checkstyle-ql.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8514/yetus/diff-checkstyle-llap-server.txt
 |
| modules | C: common ql llap-server U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8514/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> LLAP: Fast llap io with slow processing pipeline can lead to OOM
> 
>
> Key: HIVE-18269
> URL: https://issues.apache.org/jira/browse/HIVE-18269
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Sergey Shelukhin
> Attachments: HIVE-18269.01.patch, HIVE-18269.02.patch, 
> HIVE-18269.03.patch, HIVE-18269.1.patch, HIVE-18269.bad.patch, Screen Shot 
> 2017-12-13 at 1.15.16 AM.png
>
>
> The pendingData linked list in the LLAP IO elevator (LlapRecordReader.java) 
> may grow indefinitely when LLAP IO is faster than the processing pipeline. 
> Since we don't have backpressure to slow down the IO, this can lead to 
> indefinite growth of pending data, severe GC pressure, and eventually OOM.
> This specific instance of LLAP was running on HDFS on top of an EBS volume 
> backed by SSD. The query that triggered this issue was ANALYZE STATISTICS 
> .. FOR COLUMNS, which also gathers bitvectors: a fast-IO, slow-processing case.



--

[jira] [Commented] (HIVE-18352) introduce a METADATAONLY option while doing REPL DUMP to allow integrations of other tools

2018-01-08 Thread anishek (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317741#comment-16317741
 ] 

anishek commented on HIVE-18352:


[~thejas] yes it is

> introduce a METADATAONLY option while doing REPL DUMP to allow integrations 
> of other tools 
> ---
>
> Key: HIVE-18352
> URL: https://issues.apache.org/jira/browse/HIVE-18352
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
>  Labels: pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18352.0.patch, HIVE-18352.1.patch, 
> HIVE-18352.2.patch
>
>
> * Introduce a METADATAONLY option as part of the REPL DUMP command which will 
> only try to dump events for DDL changes; this will be faster since we won't 
> need to scan files on HDFS for DML changes. 
> * Additionally, since we are only going to dump metadata operations, it might 
> be useful to include ACID tables as well via an option. This option can be 
> removed when ACID support is complete via HIVE-18320.
> It would be good to support the "WITH" clause as part of the REPL DUMP command 
> as well (REPL DUMP already supports it via HIVE-17757) to achieve the above, 
> as that will require fewer changes to the statement's syntax and provide 
> more flexibility to include additional options in the future. 
> {code}
> REPL DUMP [db_name] {FROM [event_id]} {TO [event_id]} {WITH 
> (['key'='value'], ...)}
> {code}
> This will enable other tools like security / schema registry / metadata 
> discovery to use the replication subsystem for their needs as well. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18373) Make it easier to search for column name in a table

2018-01-08 Thread Madhudeep Petwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317732#comment-16317732
 ] 

Madhudeep Petwal commented on HIVE-18373:
-

Oh, sorry for misreading the statement. Will get back to this.

> Make it easier to search for column name in a table
> ---
>
> Key: HIVE-18373
> URL: https://issues.apache.org/jira/browse/HIVE-18373
> Project: Hive
>  Issue Type: New Feature
>Reporter: Siddhant Saraf
>Assignee: Madhudeep Petwal
>Priority: Minor
>
> Within a database, to filter for tables with the string 'abc' in their 
> names, I can use something like:
> {code:java}
> hive> use my_database;
> hive> show tables '*abc*';
> {code}
> It would be great if I can do something similar to search within the list of 
> columns in a table.
> I have a table with around 3200 columns. Searching for the column of interest 
> is an onerous task after doing a describe on it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18359) Extend grouping set limits from int to long

2018-01-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317719#comment-16317719
 ] 

Hive QA commented on HIVE-18359:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12905167/HIVE-18359.4.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 26 failed/errored test(s), 11550 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[cbo_rp_annotate_stats_groupby]
 (batchId=84)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=35)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucket_map_join_tez1]
 (batchId=169)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketsortoptimize_insert_2]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[cte_1] 
(batchId=169)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[hybridgrace_hashjoin_2]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_grouping_sets]
 (batchId=169)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_part]
 (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] 
(batchId=120)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query27] 
(batchId=247)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query36] 
(batchId=247)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query70] 
(batchId=247)
org.apache.hadoop.hive.cli.TestSparkPerfCliDriver.testCliDriver[query86] 
(batchId=247)
org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore.testTransactionalValidation
 (batchId=213)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=253)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints 
(batchId=225)
org.apache.hive.hcatalog.templeton.TestConcurrentJobRequestsThreadsAndTimeout.ConcurrentListJobsVerifyExceptions
 (batchId=188)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=231)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=231)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=231)
org.apache.hive.jdbc.TestTriggersMoveWorkloadManager.testTriggerMoveAndKill 
(batchId=235)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8513/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8513/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8513/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 26 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12905167 - PreCommit-HIVE-Build

> Extend grouping set limits from int to long
> ---
>
> Key: HIVE-18359
> URL: https://issues.apache.org/jira/browse/HIVE-18359
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-18359.1.patch, HIVE-18359.2.patch, 
> HIVE-18359.3.patch, HIVE-18359.4.patch
>
>
> Grouping sets are broken for >32 columns because an int is used for the 
> bitmap (and for the GROUPING__ID virtual column). This assumption breaks 
> grouping sets/rollups/cubes when the number of participating aggregation 
> columns is >32. The easier fix, for now, is to extend it to long. The correct 
> fix would be to use BitSets everywhere, but that would require the 
> GROUPING__ID column type to be binary, which would make predicates on 
> GROUPING__ID difficult to deal with. 
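> As a toy illustration of the bitmap (the exact bit convention here is 
> illustrative, not necessarily the one this patch adopts): with one bit per 
> GROUP BY key, an int caps grouping at 32 keys and a long lifts it to 64.
> {code:java}
> // Bit i is set when key i is aggregated away in a given grouping set.
> static long groupingId(boolean[] aggregatedAway) {
>   if (aggregatedAway.length > 64) {
>     throw new IllegalArgumentException("more than 64 grouping keys");
>   }
>   long id = 0L;
>   for (int i = 0; i < aggregatedAway.length; i++) {
>     if (aggregatedAway[i]) {
>       id |= 1L << i;   // with an int bitmap, 1 << i wraps past bit 31
>     }
>   }
>   return id;
> }
> {code}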



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16484) Investigate SparkLauncher for HoS as alternative to bin/spark-submit

2018-01-08 Thread Rui Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317692#comment-16317692
 ] 

Rui Li commented on HIVE-16484:
---

bq. Hive wouldn't need a separate Spark installation to be able to launch Spark 
apps. It could ship with everything ready to run HoS out of the box.
Yeah I also believe that's the main benefit. But if SparkLauncher cannot give 
us that, why don't we just use {{InProcessLauncher}}?

Regarding the extra connection, I'm not sure how it impacts us 
performance-wise. My main concern is that it brings an extra chance of issues 
while the benefits are not quite clear. For example, we had several connection 
timeout issues with the RPC framework, and it seems 
{{LauncherServer}}/{{LauncherBackend}} have very similar configs to tweak, like 
{{spark.launcher.childConnectionTimeout}}.

Regarding debugging, I assume it's mainly for yarn-client mode, right? Because 
the process we launch in yarn-cluster mode is only a lightweight client talking 
to the RM, and by default it exits once the app starts running (HIVE-13895). I 
agree it makes debugging easier, but again that requires InProcessLauncher.

So my suggestion is we wait until InProcessLauncher is released and implement 
another SparkClient using it. We can decide whether to get rid of the current 
SparkClientImpl when InProcessLauncher is mature. Does that make sense?

BTW, is there any docs about the SparkLauncher implementation? I just want to 
have a better understanding about it.

> Investigate SparkLauncher for HoS as alternative to bin/spark-submit
> 
>
> Key: HIVE-16484
> URL: https://issues.apache.org/jira/browse/HIVE-16484
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-16484.1.patch, HIVE-16484.10.patch, 
> HIVE-16484.2.patch, HIVE-16484.3.patch, HIVE-16484.4.patch, 
> HIVE-16484.5.patch, HIVE-16484.6.patch, HIVE-16484.7.patch, 
> HIVE-16484.8.patch, HIVE-16484.9.patch
>
>
> The {{SparkClientImpl#startDriver}} currently looks for the {{SPARK_HOME}} 
> directory and invokes the {{bin/spark-submit}} script, which spawns a 
> separate process to run the Spark application.
> {{SparkLauncher}} was added in SPARK-4924 and is a programmatic way to launch 
> Spark applications.
> I see a few advantages:
> * No need to spawn a separate process to launch a HoS --> lower startup time
> * Simplifies the code in {{SparkClientImpl}} --> easier to debug
> * {{SparkLauncher#startApplication}} returns a {{SparkAppHandle}} which 
> contains some useful utilities for querying the state of the Spark job
> ** It also allows the launcher to specify a list of job listeners
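> For reference, a minimal sketch of the {{SparkLauncher}} path described 
> above; the master, app resource, and main class are placeholders, not 
> Hive's real values:
> {code:java}
> import java.io.IOException;
> import org.apache.spark.launcher.SparkAppHandle;
> import org.apache.spark.launcher.SparkLauncher;
>
> static SparkAppHandle launch() throws IOException {
>   // startApplication() returns a handle for querying/killing the app and
>   // accepts state listeners, unlike a forked bin/spark-submit process.
>   return new SparkLauncher()
>       .setMaster("yarn")                          // placeholder
>       .setAppResource("/path/to/app.jar")         // placeholder
>       .setMainClass("org.example.RemoteDriver")   // placeholder
>       .startApplication(new SparkAppHandle.Listener() {
>         @Override public void stateChanged(SparkAppHandle h) {
>           System.out.println("Spark app state: " + h.getState());
>         }
>         @Override public void infoChanged(SparkAppHandle h) { }
>       });
> }
> {code}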



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18359) Extend grouping set limits from int to long

2018-01-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317672#comment-16317672
 ] 

Hive QA commented on HIVE-18359:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
55s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
46s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
50s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
56s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
51s{color} | {color:red} ql: The patch generated 10 new + 1446 unchanged - 3 
fixed = 1456 total (was 1449) {color} |
| {color:red}-1{color} | {color:red} whitespace {color} | {color:red}  0m  
1s{color} | {color:red} The patch has 96 line(s) that end in whitespace. Use 
git apply --whitespace=fix <>. Refer 
https://git-scm.com/docs/git-apply {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
54s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
12s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 14m 47s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 8412748 |
| Default Java | 1.8.0_111 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8513/yetus/diff-checkstyle-ql.txt
 |
| whitespace | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8513/yetus/whitespace-eol.txt 
|
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8513/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Extend grouping set limits from int to long
> ---
>
> Key: HIVE-18359
> URL: https://issues.apache.org/jira/browse/HIVE-18359
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-18359.1.patch, HIVE-18359.2.patch, 
> HIVE-18359.3.patch, HIVE-18359.4.patch
>
>
> Grouping sets are broken for >32 columns because an int is used for the 
> bitmap (and for the GROUPING__ID virtual column). This assumption breaks 
> grouping sets/rollups/cubes when the number of participating aggregation 
> columns is >32. The easier fix, for now, is to extend it to long. The correct 
> fix would be to use BitSets everywhere, but that would require the 
> GROUPING__ID column type to be binary, which would make predicates on 
> GROUPING__ID difficult to deal with. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18367) Describe Extended output is truncated on a table with an explicit row format containing tabs or newlines.

2018-01-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317648#comment-16317648
 ] 

Hive QA commented on HIVE-18367:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12905158/HIVE-18367.3.patch

{color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 11551 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=35)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketsortoptimize_insert_2]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[hybridgrace_hashjoin_2]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] 
(batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=159)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_part]
 (batchId=93)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[stats_aggregator_error_1]
 (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] 
(batchId=120)
org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore.testTransactionalValidation
 (batchId=213)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=253)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints 
(batchId=225)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=231)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=231)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=231)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8512/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8512/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8512/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 18 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12905158 - PreCommit-HIVE-Build

> Describe Extended output is truncated on a table with an explicit row format 
> containing tabs or newlines.
> -
>
> Key: HIVE-18367
> URL: https://issues.apache.org/jira/browse/HIVE-18367
> Project: Hive
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
> Attachments: HIVE-18367.1.patch, HIVE-18367.2.patch, 
> HIVE-18367.3.patch
>
>
> 'Describe Extended' dumps information about a table. The protocol for sending 
> this data relies on tabs and newlines to separate pieces of data. If a table 
> has 'FIELDS terminated by XXX' or 'LINES terminated by XXX' where XXX is a 
> tab or newline, then the output seen by the user is prematurely truncated. Fix 
> this by replacing tabs and newlines in the table description with "\t" and 
> "\n" respectively.
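> A minimal sketch of that replacement, assuming it is applied to the 
> delimiter values before the description is written out (the method is 
> illustrative, not the actual patch):
> {code:java}
> static String escapeDelimiters(String value) {
>   // Literal "\t" / "\n" survive tab- and newline-delimited describe output.
>   return value.replace("\t", "\\t").replace("\n", "\\n");
> }
> {code}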



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (HIVE-17916) remove ConfVars.HIVE_VECTORIZATION_ROW_IDENTIFIER_ENABLED

2018-01-08 Thread Teddy Choi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teddy Choi resolved HIVE-17916.
---
Resolution: Fixed

> remove ConfVars.HIVE_VECTORIZATION_ROW_IDENTIFIER_ENABLED
> -
>
> Key: HIVE-17916
> URL: https://issues.apache.org/jira/browse/HIVE-17916
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Teddy Choi
>
> Follow-up from HIVE-12631; filing so it doesn't get lost.
> There is this code in UpdateDeleteSemanticAnalyzer
> {noformat}
>   // TODO: remove when this is enabled everywhere
> HiveConf.setBoolVar(conf, 
> ConfVars.HIVE_VECTORIZATION_ROW_IDENTIFIER_ENABLED, true);
> {noformat}
> The first update/delete statement in a session will enable this, and it will 
> remain enabled for all future queries, which makes this flag 
> useless/misleading.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18367) Describe Extended output is truncated on a table with an explicit row format containing tabs or newlines.

2018-01-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317604#comment-16317604
 ] 

Hive QA commented on HIVE-18367:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
35s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
34s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
32s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
42s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
12s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
22s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
56s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
31s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
45s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m 
15s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
12s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 17m  6s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 8412748 |
| Default Java | 1.8.0_111 |
| modules | C: ql itests/hive-unit U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8512/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Describe Extended output is truncated on a table with an explicit row format 
> containing tabs or newlines.
> -
>
> Key: HIVE-18367
> URL: https://issues.apache.org/jira/browse/HIVE-18367
> Project: Hive
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
> Attachments: HIVE-18367.1.patch, HIVE-18367.2.patch, 
> HIVE-18367.3.patch
>
>
> 'Describe Extended' dumps information about a table. The protocol for sending 
> this data relies on tabs and newlines to separate pieces of data. If a table 
> has 'FIELDS terminated by XXX' or 'LINES terminated by XXX' where XXX is a 
> tab or newline then the output seen by the user is prematurely truncated. Fix 
> this by replacing tabs and newlines in the table description with the 
> literal escapes “\t” and “\n”.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18411) Fix ArrayIndexOutOfBoundsException for VectorizedListColumnReader

2018-01-08 Thread Colin Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Ma updated HIVE-18411:

Status: Patch Available  (was: Open)

[~Ferd], can you help review the fix? Thanks.

> Fix ArrayIndexOutOfBoundsException for VectorizedListColumnReader
> -
>
> Key: HIVE-18411
> URL: https://issues.apache.org/jira/browse/HIVE-18411
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Colin Ma
>Assignee: Colin Ma
>Priority: Critical
> Attachments: HIVE-18411.001.patch
>
>
> ColumnVector should be initialized to the default size at the beginning of 
> readBatch(); otherwise, an ArrayIndexOutOfBoundsException will be thrown 
> because the size of the ColumnVector may have been changed by the previous 
> readBatch().
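
A hedged sketch of that initialization, using Hive's ColumnVector.ensureSize and VectorizedRowBatch.DEFAULT_SIZE; the surrounding reader method is a stand-in, not the actual patch:

{code:java}
import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;

final class ReadBatchSketch {
  static void readBatch(ColumnVector column, int total) {
    // ensureSize(n, false) reallocates to hold n entries without preserving
    // old data; DEFAULT_SIZE is the standard batch size. This way a smaller
    // capacity left over from the previous readBatch() cannot trigger an
    // ArrayIndexOutOfBoundsException.
    column.ensureSize(VectorizedRowBatch.DEFAULT_SIZE, false);
    column.reset();
    // ... decode up to 'total' values into 'column' as before ...
  }
}
{code}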



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18391) load data should rename files consistent with insert statements (bucketed tables only)

2018-01-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317572#comment-16317572
 ] 

Hive QA commented on HIVE-18391:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12905150/HIVE-18391.3.patch

{color:green}SUCCESS:{color} +1 due to 66 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 155 failed/errored test(s), 11549 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_10] 
(batchId=245)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_1] 
(batchId=245)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_2] 
(batchId=245)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_3] 
(batchId=245)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_7] 
(batchId=245)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_2] 
(batchId=83)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join32] (batchId=85)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_11] 
(batchId=86)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_12] 
(batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_1] 
(batchId=45)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_3] 
(batchId=2)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_4] 
(batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_7] 
(batchId=89)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_1] 
(batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_2] 
(batchId=58)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_spark1] 
(batchId=68)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_spark2] 
(batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_spark3] 
(batchId=45)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_1] 
(batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_2] 
(batchId=65)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_3] 
(batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_4] 
(batchId=41)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_6] 
(batchId=83)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_7] 
(batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_8] 
(batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin10] 
(batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin11] 
(batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin12] 
(batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin5] 
(batchId=84)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin8] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin9] 
(batchId=16)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin_negative2] 
(batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin_negative] 
(batchId=23)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[confirm_initial_tbl_stats]
 (batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constprog2] (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constprog_partitioner] 
(batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view] (batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dynpart_sort_opt_bucketing]
 (batchId=85)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_1_23] 
(batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_2] 
(batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_3] 
(batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_4] 
(batchId=88)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_5] 
(batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_7] 
(batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_8] 
(batchId=52)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_9] 
(batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_skew_1_23] 
(batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_test_1] 
(batchId=8)

[jira] [Updated] (HIVE-18411) Fix ArrayIndexOutOfBoundsException for VectorizedListColumnReader

2018-01-08 Thread Colin Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Ma updated HIVE-18411:

Attachment: HIVE-18411.001.patch

> Fix ArrayIndexOutOfBoundsException for VectorizedListColumnReader
> -
>
> Key: HIVE-18411
> URL: https://issues.apache.org/jira/browse/HIVE-18411
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Colin Ma
>Assignee: Colin Ma
>Priority: Critical
> Attachments: HIVE-18411.001.patch
>
>
> ColumnVector should be initialized to the default size at the beginning of 
> readBatch(); otherwise, an ArrayIndexOutOfBoundsException will be thrown 
> because the size of the ColumnVector may have been changed by the previous 
> readBatch().



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18391) load data should rename files consistent with insert statements (bucketed tables only)

2018-01-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317557#comment-16317557
 ] 

Hive QA commented on HIVE-18391:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
21s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
37s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 
51s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 
 8s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  6m 
12s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
21s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
44s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m  
9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  5m  
9s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
31s{color} | {color:red} ql: The patch generated 3 new + 10 unchanged - 0 fixed 
= 13 total (was 10) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 
33s{color} | {color:red} root: The patch generated 3 new + 10 unchanged - 0 
fixed = 13 total (was 10) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  6m 
21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
12s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 43m  5s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 8412748 |
| Default Java | 1.8.0_111 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8510/yetus/diff-checkstyle-ql.txt
 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8510/yetus/diff-checkstyle-root.txt
 |
| modules | C: ql . U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8510/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> load data should rename files consistent with insert statements (bucketed 
> tables only)
> --
>
> Key: HIVE-18391
> URL: https://issues.apache.org/jira/browse/HIVE-18391
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-18391.1.patch, HIVE-18391.2.patch, 
> HIVE-18391.3.patch
>
>
> Insert statements create files with names ending in _0, 0001_0, etc. 
> However, load data uses the input file name. That results in an inconsistent 
> naming convention, which makes SMB joins difficult in some scenarios and may 
> cause trouble for other types of queries in the future.
> We need a consistent naming convention.
> For a bucketed table, Hive relies on the user to name the files to match the 
> buckets in non-strict mode. Hive assumes that the data in a file belongs to 
> the same bucket. In strict mode, loading a bucketed table is disabled.
> This will likely affect most of the tests which load data, which is pretty 
> significant.
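
An illustrative sketch of the renaming idea, not the patch: a loaded file is renamed to the taskid-style name an insert would have produced for the same bucket, so bucket N always maps to one predictable file name. The exact zero-padding below is an assumption about the scheme:

{code:java}
final class BucketFileName {
  // Insert paths produce names of the form <taskid>_0; renaming loaded
  // files the same way keeps bucket -> file-name mapping consistent.
  static String forBucket(int bucketId) {
    return String.format("%06d_0", bucketId);  // e.g. bucket 1 -> 000001_0
  }
}
{code}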



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Commented] (HIVE-18323) Vectorization: add the support of timestamp in VectorizedPrimitiveColumnReader for parquet

2018-01-08 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317556#comment-16317556
 ] 

Vihang Karajgaonkar commented on HIVE-18323:


Are timestamps serialized as binary instead of longs, based on 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java#L499
 ?

If yes, then I think we should read binary instead of reading longs. In that 
case we should use 
{{NanoTimeUtils.getTimestamp(NanoTime.fromBinary(dataColumn.readBytes()), 
false)}} to get the timestamps deserialized from the binary values.
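
A hedged sketch of that read path; NanoTime and NanoTimeUtils are Hive's parquet timestamp helpers, while the wrapper class here is illustrative and the Binary is assumed to come from dataColumn.readBytes():

{code:java}
import org.apache.hadoop.hive.ql.io.parquet.timestamp.NanoTime;
import org.apache.hadoop.hive.ql.io.parquet.timestamp.NanoTimeUtils;
import org.apache.parquet.io.api.Binary;

final class Int96Decode {
  // An int96 timestamp arrives as a 12-byte Binary; it is decoded into a
  // NanoTime and converted to a Timestamp, rather than being read as a long.
  static java.sql.Timestamp decode(Binary int96) {
    return NanoTimeUtils.getTimestamp(NanoTime.fromBinary(int96), false);
  }
}
{code}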

> Vectorization: add the support of timestamp in 
> VectorizedPrimitiveColumnReader for parquet
> --
>
> Key: HIVE-18323
> URL: https://issues.apache.org/jira/browse/HIVE-18323
> Project: Hive
>  Issue Type: Sub-task
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-18323.1.patch
>
>
> {noformat}
> CREATE TABLE `t1`(
>   `ts` timestamp,
>   `s1` string)
> STORED AS PARQUET;
> set hive.vectorized.execution.enabled=true;
> SELECT * from t1 SORT BY s1;
> {noformat}
> This query will throw exception since timestamp is not supported here yet.
> {noformat}
> Caused by: java.io.IOException: java.io.IOException: Unsupported type: 
> optional int96 ts
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:116)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18229) add the unmanaged mapping command

2018-01-08 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18229:

Attachment: HIVE-18229.01.patch

Fixing issues and addressing RB feedback; also fixed an unrelated issue in 
alter mapping.

> add the unmanaged mapping command
> -
>
> Key: HIVE-18229
> URL: https://issues.apache.org/jira/browse/HIVE-18229
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Attachments: HIVE-18229.01.patch, HIVE-18229.patch
>
>
> This is to add an option to enable WM but map queries past the default pool 
> and explicitly into unmanaged sessions.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-18411) Fix ArrayIndexOutOfBoundsException for VectorizedListColumnReader

2018-01-08 Thread Colin Ma (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Ma reassigned HIVE-18411:
---


> Fix ArrayIndexOutOfBoundsException for VectorizedListColumnReader
> -
>
> Key: HIVE-18411
> URL: https://issues.apache.org/jira/browse/HIVE-18411
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Colin Ma
>Assignee: Colin Ma
>Priority: Critical
>
> ColumnVector should be initialized to the default size at the beginning of 
> readBatch(); otherwise, an ArrayIndexOutOfBoundsException will be thrown 
> because the size of the ColumnVector may have been changed by the previous 
> readBatch().



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18221) test acid default

2018-01-08 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-18221:
--
Attachment: HIVE-18221.26.patch

patch 26 fixes (hacks) MetastoreConf so that it agrees with HiveConf on which 
hive-site.xml to use

> test acid default
> -
>
> Key: HIVE-18221
> URL: https://issues.apache.org/jira/browse/HIVE-18221
> Project: Hive
>  Issue Type: Test
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Eugene Koifman
>Assignee: Eugene Koifman
> Attachments: HIVE-18221.01.patch, HIVE-18221.02.patch, 
> HIVE-18221.03.patch, HIVE-18221.04.patch, HIVE-18221.07.patch, 
> HIVE-18221.08.patch, HIVE-18221.09.patch, HIVE-18221.10.patch, 
> HIVE-18221.11.patch, HIVE-18221.12.patch, HIVE-18221.13.patch, 
> HIVE-18221.14.patch, HIVE-18221.16.patch, HIVE-18221.18.patch, 
> HIVE-18221.19.patch, HIVE-18221.20.patch, HIVE-18221.21.patch, 
> HIVE-18221.22.patch, HIVE-18221.23.patch, HIVE-18221.24.patch, 
> HIVE-18221.26.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

2018-01-08 Thread Ratandeep Ratti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ratandeep Ratti updated HIVE-18410:
---
Status: Patch Available  (was: In Progress)

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.2
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
> Fix For: 2.3.2
>
> Attachments: HIVE-18410.patch, profiling_with_patch.nps, 
> profiling_with_patch.png, profiling_without_patch.nps, 
> profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro 
> tables. When reading the same flat dataset in Pig, it takes half the time.  
> On profiling, a lot of time is spent in 
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the 
> time is spent in GenericData.get().resolveUnion(), which calls 
> GenericData.getSchemaName(Object datum), which does a lot of instanceof 
> checks. This could be simplified, with performance benefits. An approach is 
> described in this patch which almost halves the runtime.
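
A hedged sketch of that simplification, not the actual patch: for the common two-branch [null, T] union the branch index follows from a null check, and anything else falls back to Avro's generic resolveUnion:

{code:java}
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;

final class UnionFastPath {
  static int resolveBranch(Object datum, Schema union) {
    List<Schema> branches = union.getTypes();
    if (branches.size() == 2) {
      boolean firstIsNull = branches.get(0).getType() == Schema.Type.NULL;
      boolean secondIsNull = branches.get(1).getType() == Schema.Type.NULL;
      if (firstIsNull != secondIsNull) {
        // exactly one null branch: pick it on null, the other otherwise,
        // skipping resolveUnion()'s chain of instanceof checks
        int nullBranch = firstIsNull ? 0 : 1;
        return datum == null ? nullBranch : 1 - nullBranch;
      }
    }
    // other unions fall back to Avro's generic resolution
    return GenericData.get().resolveUnion(union, datum);
  }
}
{code}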



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

2018-01-08 Thread Ratandeep Ratti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ratandeep Ratti updated HIVE-18410:
---
Attachment: HIVE-18410.patch

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.2
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
> Fix For: 2.3.2
>
> Attachments: HIVE-18410.patch, profiling_with_patch.nps, 
> profiling_with_patch.png, profiling_without_patch.nps, 
> profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro 
> tables. When reading the same flat dataset in Pig, it takes half the time.  
> On profiling, a lot of time is spent in 
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the 
> time is spent in GenericData.get().resolveUnion(), which calls 
> GenericData.getSchemaName(Object datum), which does a lot of instanceof 
> checks. This could be simplified, with performance benefits. An approach is 
> described in this patch which almost halves the runtime.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

2018-01-08 Thread Ratandeep Ratti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ratandeep Ratti updated HIVE-18410:
---
Affects Version/s: 2.3.2

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.2
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
> Fix For: 2.3.2
>
> Attachments: profiling_with_patch.nps, profiling_with_patch.png, 
> profiling_without_patch.nps, profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro 
> tables. When reading the same flat dataset in Pig, it takes half the time.  
> On profiling, a lot of time is spent in 
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the 
> time is spent in GenericData.get().resolveUnion(), which calls 
> GenericData.getSchemaName(Object datum), which does a lot of instanceof 
> checks. This could be simplified, with performance benefits. An approach is 
> described in this patch which almost halves the runtime.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

2018-01-08 Thread Ratandeep Ratti (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317521#comment-16317521
 ] 

Ratandeep Ratti commented on HIVE-18410:


Attached profiling data

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.2
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
> Fix For: 2.3.2
>
> Attachments: profiling_with_patch.nps, profiling_with_patch.png, 
> profiling_without_patch.nps, profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro 
> tables. When reading the same flat dataset in Pig, it takes half the time.  
> On profiling, a lot of time is spent in 
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the 
> time is spent in GenericData.get().resolveUnion(), which calls 
> GenericData.getSchemaName(Object datum), which does a lot of instanceof 
> checks. This could be simplified, with performance benefits. An approach is 
> described in this patch which almost halves the runtime.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

2018-01-08 Thread Ratandeep Ratti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ratandeep Ratti updated HIVE-18410:
---
Fix Version/s: 2.3.2

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.3.2
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
> Fix For: 2.3.2
>
> Attachments: profiling_with_patch.nps, profiling_with_patch.png, 
> profiling_without_patch.nps, profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro 
> tables. When reading the same flat dataset in Pig, it takes half the time.  
> On profiling, a lot of time is spent in 
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the 
> time is spent in GenericData.get().resolveUnion(), which calls 
> GenericData.getSchemaName(Object datum), which does a lot of instanceof 
> checks. This could be simplified, with performance benefits. An approach is 
> described in this patch which almost halves the runtime.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

2018-01-08 Thread Ratandeep Ratti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ratandeep Ratti updated HIVE-18410:
---
Attachment: profiling_with_patch.png
profiling_without_patch.png

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
> Attachments: profiling_with_patch.nps, profiling_with_patch.png, 
> profiling_without_patch.nps, profiling_without_patch.png
>
>
> There's a performance penalty when reading flat [no nested fields] Avro 
> tables. When reading the same flat dataset in Pig, it takes half the time.  
> On profiling, a lot of time is spent in 
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the 
> time is spent in GenericData.get().resolveUnion(), which calls 
> GenericData.getSchemaName(Object datum), which does a lot of instanceof 
> checks. This could be simplified, with performance benefits. An approach is 
> described in this patch which almost halves the runtime.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Work started] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

2018-01-08 Thread Ratandeep Ratti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-18410 started by Ratandeep Ratti.
--
> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
> Attachments: profiling_with_patch.nps, profiling_without_patch.nps
>
>
> There's a performance penalty when reading flat [no nested fields] Avro 
> tables. When reading the same flat dataset in Pig, it takes half the time.  
> On profiling, a lot of time is spent in 
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the 
> time is spent in GenericData.get().resolveUnion(), which calls 
> GenericData.getSchemaName(Object datum), which does a lot of instanceof 
> checks. This could be simplified, with performance benefits. An approach is 
> described in this patch which almost halves the runtime.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

2018-01-08 Thread Ratandeep Ratti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ratandeep Ratti updated HIVE-18410:
---
Attachment: profiling_with_patch.nps
profiling_without_patch.nps

> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
> Attachments: profiling_with_patch.nps, profiling_without_patch.nps
>
>
> There's a performance penalty when reading flat [no nested fields] Avro 
> tables. When reading the same flat dataset in Pig, it takes half the time.  
> On profiling, a lot of time is spent in 
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the 
> time is spent in GenericData.get().resolveUnion(), which calls 
> GenericData.getSchemaName(Object datum), which does a lot of instanceof 
> checks. This could be simplified, with performance benefits. An approach is 
> described in this patch which almost halves the runtime.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-18410) [Performance][Avro] Reading flat Avro tables is very expensive in Hive

2018-01-08 Thread Ratandeep Ratti (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ratandeep Ratti reassigned HIVE-18410:
--


> [Performance][Avro] Reading flat Avro tables is very expensive in Hive
> --
>
> Key: HIVE-18410
> URL: https://issues.apache.org/jira/browse/HIVE-18410
> Project: Hive
>  Issue Type: Improvement
>Reporter: Ratandeep Ratti
>Assignee: Ratandeep Ratti
>
> There's a performance penalty when reading flat [no nested fields] Avro 
> tables. When reading the same flat dataset in Pig, it takes half the time.  
> On profiling, a lot of time is spent in 
> {{AvroDeserializer.deserializeSingleItemNullableUnion()}}. The bulk of the 
> time is spent in GenericData.get().resolveUnion(), which calls 
> GenericData.getSchemaName(Object datum), which does a lot of instanceof 
> checks. This could be simplified, with performance benefits. An approach is 
> described in this patch which almost halves the runtime.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18326) LLAP Tez scheduler - only preempt tasks if there's a dependency between them

2018-01-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317508#comment-16317508
 ] 

Hive QA commented on HIVE-18326:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12904855/HIVE-18326.02.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 11539 tests 
executed
*Failed tests:*
{noformat}
TestJdbcWithMiniKdc - did not produce a TEST-*.xml file (likely timed out) 
(batchId=246)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=35)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketsortoptimize_insert_2]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[hybridgrace_hashjoin_2]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] 
(batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=159)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_part]
 (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] 
(batchId=120)
org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore.testTransactionalValidation
 (batchId=213)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=253)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints 
(batchId=225)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=231)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=231)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=231)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8509/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8509/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8509/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 17 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12904855 - PreCommit-HIVE-Build

> LLAP Tez scheduler - only preempt tasks if there's a dependency between them
> 
>
> Key: HIVE-18326
> URL: https://issues.apache.org/jira/browse/HIVE-18326
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0
>
> Attachments: HIVE-18326.01.patch, HIVE-18326.02.patch, 
> HIVE-18326.02.patch, HIVE-18326.patch
>
>
> It is currently possible for e.g. two sides of a union (or a join for that 
> matter) to have slightly different priorities. We don't want to preempt 
> running tasks on one side in favor of the other side in such cases.
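
A hypothetical illustration of such a guard; none of these types are Hive/Tez API, and the priority convention (lower value = more urgent) is an assumption. Preemption is considered only when the two vertices are actually ordered in the DAG, so the two sides of a union or join, which have no path between them, never preempt each other over a small priority difference:

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class PreemptionGuard {
  private final Map<String, Set<String>> downstream = new HashMap<>();

  void addEdge(String from, String to) {
    downstream.computeIfAbsent(from, k -> new HashSet<>()).add(to);
  }

  // depth-first search over vertex edges
  private boolean hasPath(String from, String to) {
    Deque<String> stack = new ArrayDeque<>();
    stack.push(from);
    Set<String> seen = new HashSet<>();
    while (!stack.isEmpty()) {
      String v = stack.pop();
      if (v.equals(to)) return true;
      if (seen.add(v)) {
        for (String next : downstream.getOrDefault(v, new HashSet<>())) {
          stack.push(next);
        }
      }
    }
    return false;
  }

  boolean shouldPreempt(String waitingVertex, int waitingPriority,
                        String runningVertex, int runningPriority) {
    if (waitingPriority >= runningPriority) {
      return false;  // waiting task is not more urgent
    }
    // preempt only when one vertex depends on the other
    return hasPath(waitingVertex, runningVertex)
        || hasPath(runningVertex, waitingVertex);
  }
}
{code}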



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18352) introduce a METADATAONLY option while doing REPL DUMP to allow integrations of other tools

2018-01-08 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317505#comment-16317505
 ] 

Thejas M Nair commented on HIVE-18352:
--

[~anishek]
Is the github pull request updated with the latest patch ?


> introduce a METADATAONLY option while doing REPL DUMP to allow integrations 
> of other tools 
> ---
>
> Key: HIVE-18352
> URL: https://issues.apache.org/jira/browse/HIVE-18352
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: anishek
>Assignee: anishek
>  Labels: pull-request-available
> Fix For: 3.0.0
>
> Attachments: HIVE-18352.0.patch, HIVE-18352.1.patch, 
> HIVE-18352.2.patch
>
>
> * Introduce a METADATAONLY option as part of the REPL DUMP command which will 
> only try to dump out events for DDL changes; this will be faster, as we won't 
> need a scan of files on HDFS for DML changes. 
> * Additionally, since we are only going to dump metadata operations, it might 
> be useful to include ACID tables via an option as well. This option can be 
> removed when ACID support is complete via HIVE-18320.
> It will be good to support the "WITH" clause as part of the REPL DUMP command 
> as well (REPL DUMP already supports it via HIVE-17757) to achieve the above, 
> as that will require fewer changes to the syntax of the statement and provide 
> more flexibility in the future to include additional options. 
> {code}
> REPL DUMP [db_name] {FROM [event_id]} {TO [event_id]} {WITH (['key'='value'],...)}
> {code}
> This will enable other tools like security / schema registry / metadata 
> discovery to use the replication subsystem for their needs as well. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18323) Vectorization: add the support of timestamp in VectorizedPrimitiveColumnReader for parquet

2018-01-08 Thread Ferdinand Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ferdinand Xu updated HIVE-18323:

Issue Type: Sub-task  (was: Improvement)
Parent: HIVE-14826

> Vectorization: add the support of timestamp in 
> VectorizedPrimitiveColumnReader for parquet
> --
>
> Key: HIVE-18323
> URL: https://issues.apache.org/jira/browse/HIVE-18323
> Project: Hive
>  Issue Type: Sub-task
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-18323.1.patch
>
>
> {noformat}
> CREATE TABLE `t1`(
>   `ts` timestamp,
>   `s1` string)
> STORED AS PARQUET;
> set hive.vectorized.execution.enabled=true;
> SELECT * from t1 SORT BY s1;
> {noformat}
> This query will throw exception since timestamp is not supported here yet.
> {noformat}
> Caused by: java.io.IOException: java.io.IOException: Unsupported type: 
> optional int96 ts
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:116)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18323) Vectorization: add the support of timestamp in VectorizedPrimitiveColumnReader for parquet

2018-01-08 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317493#comment-16317493
 ] 

Ferdinand Xu commented on HIVE-18323:
-

Thanks [~aihuaxu] for the patch. Generally LGTM. Could you add some unit test 
cases for this?

> Vectorization: add the support of timestamp in 
> VectorizedPrimitiveColumnReader for parquet
> --
>
> Key: HIVE-18323
> URL: https://issues.apache.org/jira/browse/HIVE-18323
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-18323.1.patch
>
>
> {noformat}
> CREATE TABLE `t1`(
>   `ts` timestamp,
>   `s1` string)
> STORED AS PARQUET;
> set hive.vectorized.execution.enabled=true;
> SELECT * from t1 SORT BY s1;
> {noformat}
> This query will throw exception since timestamp is not supported here yet.
> {noformat}
> Caused by: java.io.IOException: java.io.IOException: Unsupported type: 
> optional int96 ts
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:116)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18379) ALTER TABLE authorization_part SET PROPERTIES ("PARTITIONL_LEVEL_PRIVILEGE"="TRUE"); fails when authorization_part is MicroManaged table.

2018-01-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317486#comment-16317486
 ] 

Sergey Shelukhin commented on HIVE-18379:
-

What was the error? Is the table object absent? It seems to change the logic 
for the normal case; I wonder if it just needs an extra null check.

> ALTER TABLE authorization_part SET PROPERTIES 
> ("PARTITIONL_LEVEL_PRIVILEGE"="TRUE"); fails when authorization_part is 
> MicroManaged table.
> -
>
> Key: HIVE-18379
> URL: https://issues.apache.org/jira/browse/HIVE-18379
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Steve Yeom
>Assignee: Steve Yeom
>Priority: Minor
> Attachments: HIVE-18379.01.patch, HIVE-18379.02.patch, 
> HIVE-18379.03.patch
>
>
> ALTER TABLE authorization_part SET TBLPROPERTIES 
> ("PARTITION_LEVEL_PRIVILEGE"="TRUE") fails when authorization_part is a 
> Micromanaged table.
> This is from authorization_2.q qtest.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18380) ALTER TABLE CONCATENATE is not supported on Micro-managed table

2018-01-08 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317481#comment-16317481
 ] 

Eugene Koifman commented on HIVE-18380:
---

I think concatenating/compacting contiguous deltas should be possible - there 
is no technical reason it can't be.

> ALTER TABLE CONCATENATE is not supported on Micro-managed table
> ---
>
> Key: HIVE-18380
> URL: https://issues.apache.org/jira/browse/HIVE-18380
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Steve Yeom
>Priority: Minor
>
> ALTER TABLE CONCATENATE is not supported on a micro-managed table. 
> An example qtest is "alter_merge_2_orc.q"; the missing support shows up 
> when we start with tables of the micro-managed table type (insert_only 
> transactional).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18397) Support vectorization for INTERVAL_DAY_TIME type

2018-01-08 Thread Ferdinand Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317479#comment-16317479
 ] 

Ferdinand Xu commented on HIVE-18397:
-

I think Parquet doesn't support the interval_day_time type, since the write 
path does not support it, as shown in the code you mentioned.

> Support vectorization for INTERVAL_DAY_TIME type
> 
>
> Key: HIVE-18397
> URL: https://issues.apache.org/jira/browse/HIVE-18397
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>
> Vectorizer currently assumes that all the primitives are supported for an 
> input format which implements {{VectorizedInputFormatInterface}}. Currently 
> any table which has timestamp or interval_day_time type will fail execution 
> in vectorized mode. HIVE-18323 adds support for timestamp. We should also add 
> support for interval_day_time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18380) ALTER TABLE CONCATENATE is not supported on Micro-managed table

2018-01-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317467#comment-16317467
 ] 

Sergey Shelukhin commented on HIVE-18380:
-

[~ekoifman] [~steveyeom2017] should it be? Alter table concatenate that 
preserves txn boundaries (i.e. concats files in each directory separately) 
would generally be pointless, and destroying txn boundaries to combined 
directories is basically a compaction and should be done by compactor, as far 
as I understand

> ALTER TABLE CONCATENATE is not supported on Micro-managed table
> ---
>
> Key: HIVE-18380
> URL: https://issues.apache.org/jira/browse/HIVE-18380
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Steve Yeom
>Priority: Minor
>
> ALTER TABLE CONCATENATE is not supported on a micro-managed table. 
> An example qtest is "alter_merge_2_orc.q"; the missing support shows up 
> when we start with tables of the micro-managed table type (insert_only 
> transactional).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-18380) ALTER TABLE CONCATENATE is not supported on Micro-managed table

2018-01-08 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317467#comment-16317467
 ] 

Sergey Shelukhin edited comment on HIVE-18380 at 1/9/18 1:12 AM:
-

[~ekoifman] [~steveyeom2017] should it be? Alter table concatenate that 
preserves txn boundaries (i.e. concats files in each directory separately) 
would generally be pointless, and destroying txn boundaries to combine 
directories is basically a compaction and should be done by compactor, as far 
as I understand


was (Author: sershe):
[~ekoifman] [~steveyeom2017] should it be? Alter table concatenate that 
preserves txn boundaries (i.e. concats files in each directory separately) 
would generally be pointless, and destroying txn boundaries to combined 
directories is basically a compaction and should be done by compactor, as far 
as I understand

> ALTER TABLE CONCATENATE is not supported on Micro-managed table
> ---
>
> Key: HIVE-18380
> URL: https://issues.apache.org/jira/browse/HIVE-18380
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Steve Yeom
>Priority: Minor
>
> ALTER TABLE CONCATENATE is not supported on a micro-managed table. 
> An example qtest is "alter_merge_2_orc.q"; the missing support shows up 
> when we start with tables of the micro-managed table type (insert_only 
> transactional).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18397) Support vectorization for INTERVAL_DAY_TIME type

2018-01-08 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317465#comment-16317465
 ] 

Vihang Karajgaonkar commented on HIVE-18397:


Based on what I see in 
https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/io/parquet/write/DataWritableWriter.java#L106
 Parquet in Hive doesn't seem to support intervals. [~Ferd] Do you know if this 
understanding is correct?

> Support vectorization for INTERVAL_DAY_TIME type
> 
>
> Key: HIVE-18397
> URL: https://issues.apache.org/jira/browse/HIVE-18397
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Vihang Karajgaonkar
>
> Vectorizer currently assumes that all the primitives are supported for an 
> input format which implements {{VectorizedInputFormatInterface}}. Currently 
> any table which has timestamp or interval_day_time type will fail execution 
> in vectorized mode. HIVE-18323 adds support for timestamp. We should also add 
> support for interval_day_time.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-18409) load data should rename files consistent with insert statements (bucketed tables only) Part11

2018-01-08 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal reassigned HIVE-18409:
-


> load data should rename files consistent with insert statements (bucketed 
> tables only) Part11
> -
>
> Key: HIVE-18409
> URL: https://issues.apache.org/jira/browse/HIVE-18409
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> As part of this effort, this JIRA tracks updating tests which use load data 
> files which are in binary formats (dat, rc, etc.).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-18407) load data should rename files consistent with insert statements (bucketed tables only) Part9

2018-01-08 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal reassigned HIVE-18407:
-


> load data should rename files consistent with insert statements (bucketed 
> tables only) Part9
> 
>
> Key: HIVE-18407
> URL: https://issues.apache.org/jira/browse/HIVE-18407
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> As part of this effort, this JIRA tracks updating tests which use load data 
> files sortdp etc



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-18408) load data should rename files consistent with insert statements (bucketed tables only) Part10

2018-01-08 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal reassigned HIVE-18408:
-


> load data should rename files consistent with insert statements (bucketed 
> tables only) Part10
> -
>
> Key: HIVE-18408
> URL: https://issues.apache.org/jira/browse/HIVE-18408
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> As part of this effort, this JIRA tracks updating tests which use load data 
> files in1 etc



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-18406) load data should rename files consistent with insert statements (bucketed tables only) Part8

2018-01-08 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal reassigned HIVE-18406:
-


> load data should rename files consistent with insert statements (bucketed 
> tables only) Part8
> 
>
> Key: HIVE-18406
> URL: https://issues.apache.org/jira/browse/HIVE-18406
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> As part of this effort, this JIRA tracks updating tests which use load data 
> files empty1 etc



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-18405) load data should rename files consistent with insert statements (bucketed tables only) Part7

2018-01-08 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal reassigned HIVE-18405:
-


> load data should rename files consistent with insert statements (bucketed 
> tables only) Part7
> 
>
> Key: HIVE-18405
> URL: https://issues.apache.org/jira/browse/HIVE-18405
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> As part of this effort, this JIRA tracks updating tests which use load data 
> files SortCol1Col2 etc



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-18404) load data should rename files consistent with insert statements (bucketed tables only) Part6

2018-01-08 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal reassigned HIVE-18404:
-


> load data should rename files consistent with insert statements (bucketed 
> tables only) Part6
> 
>
> Key: HIVE-18404
> URL: https://issues.apache.org/jira/browse/HIVE-18404
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> As part of this effort, this JIRA tracks updating tests which use load data 
> files t1 etc



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-18403) load data should rename files consistent with insert statements (bucketed tables only) Part5

2018-01-08 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal reassigned HIVE-18403:
-


> load data should rename files consistent with insert statements (bucketed 
> tables only) Part5
> 
>
> Key: HIVE-18403
> URL: https://issues.apache.org/jira/browse/HIVE-18403
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> As part of this effort, this JIRA tracks updating tests which use load data 
> files  smbbucket_1 etc



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18402) load data should rename files consistent with insert statements (bucketed tables only) Part4

2018-01-08 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal updated HIVE-18402:
--
Summary: load data should rename files consistent with insert statements 
(bucketed tables only) Part4  (was: load data should rename files consistent 
with insert statements (bucketed tables only) Part1)

> load data should rename files consistent with insert statements (bucketed 
> tables only) Part4
> 
>
> Key: HIVE-18402
> URL: https://issues.apache.org/jira/browse/HIVE-18402
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> As part of this effort, this JIRA tracks updating tests which use load data 
> files smb_bucket_input etc



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-18402) load data should rename files consistent with insert statements (bucketed tables only) Part1

2018-01-08 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal reassigned HIVE-18402:
-


> load data should rename files consistent with insert statements (bucketed 
> tables only) Part1
> 
>
> Key: HIVE-18402
> URL: https://issues.apache.org/jira/browse/HIVE-18402
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> As part of this effort, this JIRA tracks updating tests which use load data 
> files smb_bucket_input etc



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-18401) load data should rename files consistent with insert statements (bucketed tables only) Part3

2018-01-08 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal reassigned HIVE-18401:
-


> load data should rename files consistent with insert statements (bucketed 
> tables only) Part3
> 
>
> Key: HIVE-18401
> URL: https://issues.apache.org/jira/browse/HIVE-18401
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> As part of this effort, this JIRA tracks updating tests which use load data 
> files srcbucket20 etc



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-18400) load data should rename files consistent with insert statements (bucketed tables only) Part2

2018-01-08 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal reassigned HIVE-18400:
-


> load data should rename files consistent with insert statements (bucketed 
> tables only) Part2
> 
>
> Key: HIVE-18400
> URL: https://issues.apache.org/jira/browse/HIVE-18400
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> As part of this effort, this JIRA tracks updating tests which use load data 
> files srcbucket0 etc.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (HIVE-18399) load data should rename files consistent with insert statements (bucketed tables only) Part1

2018-01-08 Thread Deepak Jaiswal (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Deepak Jaiswal reassigned HIVE-18399:
-


> load data should rename files consistent with insert statements (bucketed 
> tables only) Part1
> 
>
> Key: HIVE-18399
> URL: https://issues.apache.org/jira/browse/HIVE-18399
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
>
> As part of this effort, this JIRA tracks updating tests that use load data 
> files such as srcsortbucket1outof4.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18326) LLAP Tez scheduler - only preempt tasks if there's a dependency between them

2018-01-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317425#comment-16317425
 ] 

Hive QA commented on HIVE-18326:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
0s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
51s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
40s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
30s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
30s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
20s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
23s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
29s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
10s{color} | {color:red} llap-tez: The patch generated 6 new + 146 unchanged - 
0 fixed = 152 total (was 146) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
22s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
11s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 11m 39s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 8412748 |
| Default Java | 1.8.0_111 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8509/yetus/diff-checkstyle-llap-tez.txt
 |
| modules | C: common llap-tez U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8509/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> LLAP Tez scheduler - only preempt tasks if there's a dependency between them
> 
>
> Key: HIVE-18326
> URL: https://issues.apache.org/jira/browse/HIVE-18326
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0
>
> Attachments: HIVE-18326.01.patch, HIVE-18326.02.patch, 
> HIVE-18326.02.patch, HIVE-18326.patch
>
>
> It is currently possible for e.g. two sides of a union (or a join for that 
> matter) to have slightly different priorities. We don't want to preempt 
> running tasks on one side in favor of the other side in such cases.
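
For illustration, a hedged, self-contained sketch of the guard the title 
describes; the names below are hypothetical, not the actual LLAP Tez scheduler 
code. The idea: preempt a running task only when it transitively depends on the 
waiting, higher-priority task, so parallel branches of a union or join leave 
each other alone.
{code:java}
import java.util.*;

class PreemptionGuard {
  static final class TaskInfo {
    final String vertex;
    final int priority; // Tez convention: lower value = higher priority
    TaskInfo(String vertex, int priority) { this.vertex = vertex; this.priority = priority; }
  }

  // Adjacency map: vertex -> downstream vertices that consume its output.
  private final Map<String, Set<String>> downstream;

  PreemptionGuard(Map<String, Set<String>> downstream) { this.downstream = downstream; }

  boolean shouldPreempt(TaskInfo waiting, TaskInfo running) {
    // Priority alone is not enough: two sides of a union may differ slightly.
    if (waiting.priority >= running.priority) {
      return false;
    }
    // Walk the DAG from the waiting task's vertex; preempt only if the running
    // task's vertex is reachable, i.e. it actually depends on the waiting one.
    Deque<String> stack = new ArrayDeque<>();
    Set<String> seen = new HashSet<>();
    stack.push(waiting.vertex);
    while (!stack.isEmpty()) {
      String v = stack.pop();
      if (!seen.add(v)) continue;
      if (v.equals(running.vertex)) return true; // real dependency exists
      stack.addAll(downstream.getOrDefault(v, Collections.emptySet()));
    }
    return false; // parallel branches: no dependency, no preemption
  }
}
{code}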



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18326) LLAP Tez scheduler - only preempt tasks if there's a dependency between them

2018-01-08 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317403#comment-16317403
 ] 

Siddharth Seth commented on HIVE-18326:
---

On DAG itself - I think this is the only place. That's a core component 
internal to Tez, so I would be very careful about depending on it. I know 
there are other places where Tez internals are used - but they're mostly from 
the runtime.
I think a Tez API-specific change can be made. In fact, TEZ-3770 will likely go 
in without API changes, which I had asked for there as well.

> LLAP Tez scheduler - only preempt tasks if there's a dependency between them
> 
>
> Key: HIVE-18326
> URL: https://issues.apache.org/jira/browse/HIVE-18326
> Project: Hive
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
> Fix For: 3.0.0
>
> Attachments: HIVE-18326.01.patch, HIVE-18326.02.patch, 
> HIVE-18326.02.patch, HIVE-18326.patch
>
>
> It is currently possible for e.g. two sides of a union (or a join for that 
> matter) to have slightly different priorities. We don't want to preempt 
> running tasks on one side in favor of the other side in such cases.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18379) ALTER TABLE authorization_part SET PROPERTIES ("PARTITIONL_LEVEL_PRIVILEGE"="TRUE"); fails when authorization_part is MicroManaged table.

2018-01-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317398#comment-16317398
 ] 

Hive QA commented on HIVE-18379:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12905143/HIVE-18379.02.patch

{color:red}ERROR:{color} -1 due to no test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 17 failed/errored test(s), 11549 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=35)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketsortoptimize_insert_2]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[hybridgrace_hashjoin_2]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] 
(batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=159)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_part]
 (batchId=93)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[stats_aggregator_error_1]
 (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] 
(batchId=120)
org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore.testTransactionalValidation
 (batchId=213)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=253)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints 
(batchId=225)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=231)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=231)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=231)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8508/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8508/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8508/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 17 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12905143 - PreCommit-HIVE-Build

> ALTER TABLE authorization_part SET PROPERTIES 
> ("PARTITIONL_LEVEL_PRIVILEGE"="TRUE"); fails when authorization_part is 
> MicroManaged table.
> -
>
> Key: HIVE-18379
> URL: https://issues.apache.org/jira/browse/HIVE-18379
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Steve Yeom
>Assignee: Steve Yeom
>Priority: Minor
> Attachments: HIVE-18379.01.patch, HIVE-18379.02.patch, 
> HIVE-18379.03.patch
>
>
> ALTER TABLE authorization_part SET TBLPROPERTIES 
> ("PARTITION_LEVEL_PRIVILEGE"="TRUE") fails when authorization_part is a 
> Micromanaged table.
> This is from authorization_2.q qtest.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16484) Investigate SparkLauncher for HoS as alternative to bin/spark-submit

2018-01-08 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317380#comment-16317380
 ] 

Sahil Takiar commented on HIVE-16484:
-

[~xuefuz] makes sense. In that case, maybe we should make this change 
configurable (and turned off by default)? I can create a separate instance of 
{{SparkClient}} so that few changes are required to 
{{SparkClientImpl}}.
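
As a hedged sketch of what such a configurable fallback could look like (the 
config key and class names below are illustrative, not actual Hive code):
{code:java}
import java.util.Properties;

interface SparkClient { void startDriver(); }

class SparkSubmitClient implements SparkClient {
  public void startDriver() { /* existing, proven path: fork bin/spark-submit */ }
}

class LauncherApiClient implements SparkClient {
  public void startDriver() { /* new path: SparkLauncher#startApplication */ }
}

final class SparkClientSelector {
  // Assumed config key, not a real HiveConf property; defaults to off.
  static final String KEY = "hive.spark.use.launcher.api";

  static SparkClient create(Properties conf) {
    boolean useLauncher = Boolean.parseBoolean(conf.getProperty(KEY, "false"));
    return useLauncher ? new LauncherApiClient() : new SparkSubmitClient();
  }
}
{code}
With the flag off by default, the proven spark-submit path stays in effect 
unless the new launcher is explicitly opted into.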

> Investigate SparkLauncher for HoS as alternative to bin/spark-submit
> 
>
> Key: HIVE-16484
> URL: https://issues.apache.org/jira/browse/HIVE-16484
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-16484.1.patch, HIVE-16484.10.patch, 
> HIVE-16484.2.patch, HIVE-16484.3.patch, HIVE-16484.4.patch, 
> HIVE-16484.5.patch, HIVE-16484.6.patch, HIVE-16484.7.patch, 
> HIVE-16484.8.patch, HIVE-16484.9.patch
>
>
> The {{SparkClientImpl#startDriver}} currently looks for the {{SPARK_HOME}} 
> directory and invokes the {{bin/spark-submit}} script, which spawns a 
> separate process to run the Spark application.
> {{SparkLauncher}} was added in SPARK-4924 and is a programmatic way to launch 
> Spark applications.
> I see a few advantages:
> * No need to spawn a separate process to launch a HoS --> lower startup time
> * Simplifies the code in {{SparkClientImpl}} --> easier to debug
> * {{SparkLauncher#startApplication}} returns a {{SparkAppHandle}} which 
> contains some useful utilities for querying the state of the Spark job
> ** It also allows the launcher to specify a list of job listeners



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Comment Edited] (HIVE-16484) Investigate SparkLauncher for HoS as alternative to bin/spark-submit

2018-01-08 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317336#comment-16317336
 ] 

Xuefu Zhang edited comment on HIVE-16484 at 1/8/18 11:57 PM:
-

[~stakiar], I'm not denying the potential benefits we might get; I'm 
totally up for them. However, I wouldn't feel comfortable replacing a critical 
code path that's proven to work with something that's completely new. For this, 
a fallback is much better than a sheer replacement.


was (Author: xuefuz):
[~stakiar], I'm not denying the potential benefits we might get; I'm 
totally up for them. However, I wouldn't feel comfortable replacing a code 
path that's proven to work with something completely new. For this, a fallback 
is much better than a sheer replacement.

> Investigate SparkLauncher for HoS as alternative to bin/spark-submit
> 
>
> Key: HIVE-16484
> URL: https://issues.apache.org/jira/browse/HIVE-16484
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-16484.1.patch, HIVE-16484.10.patch, 
> HIVE-16484.2.patch, HIVE-16484.3.patch, HIVE-16484.4.patch, 
> HIVE-16484.5.patch, HIVE-16484.6.patch, HIVE-16484.7.patch, 
> HIVE-16484.8.patch, HIVE-16484.9.patch
>
>
> The {{SparkClientImpl#startDriver}} currently looks for the {{SPARK_HOME}} 
> directory and invokes the {{bin/spark-submit}} script, which spawns a 
> separate process to run the Spark application.
> {{SparkLauncher}} was added in SPARK-4924 and is a programmatic way to launch 
> Spark applications.
> I see a few advantages:
> * No need to spawn a separate process to launch a HoS --> lower startup time
> * Simplifies the code in {{SparkClientImpl}} --> easier to debug
> * {{SparkLauncher#startApplication}} returns a {{SparkAppHandle}} which 
> contains some useful utilities for querying the state of the Spark job
> ** It also allows the launcher to specify a list of job listeners



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18379) ALTER TABLE authorization_part SET PROPERTIES ("PARTITIONL_LEVEL_PRIVILEGE"="TRUE"); fails when authorization_part is MicroManaged table.

2018-01-08 Thread Steve Yeom (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Yeom updated HIVE-18379:
--
Attachment: HIVE-18379.03.patch

Cleared checkstyle warning.

> ALTER TABLE authorization_part SET PROPERTIES 
> ("PARTITIONL_LEVEL_PRIVILEGE"="TRUE"); fails when authorization_part is 
> MicroManaged table.
> -
>
> Key: HIVE-18379
> URL: https://issues.apache.org/jira/browse/HIVE-18379
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Steve Yeom
>Assignee: Steve Yeom
>Priority: Minor
> Attachments: HIVE-18379.01.patch, HIVE-18379.02.patch, 
> HIVE-18379.03.patch
>
>
> ALTER TABLE authorization_part SET TBLPROPERTIES 
> ("PARTITION_LEVEL_PRIVILEGE"="TRUE") fails when authorization_part is a 
> Micromanaged table.
> This is from authorization_2.q qtest.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18323) Vectorization: add the support of timestamp in VectorizedPrimitiveColumnReader for parquet

2018-01-08 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317372#comment-16317372
 ] 

Vihang Karajgaonkar commented on HIVE-18323:


Also, looks like there is some code duplication in 
{{VectorizedListColumnReader.readPrimitiveTypedRow}} which may need some change 
as well.

> Vectorization: add the support of timestamp in 
> VectorizedPrimitiveColumnReader for parquet
> --
>
> Key: HIVE-18323
> URL: https://issues.apache.org/jira/browse/HIVE-18323
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-18323.1.patch
>
>
> {noformat}
> CREATE TABLE `t1`(
>   `ts` timestamp,
>   `s1` string)
> STORED AS PARQUET;
> set hive.vectorized.execution.enabled=true;
> SELECT * from t1 SORT BY s1;
> {noformat}
> This query will throw an exception since timestamp is not supported here yet.
> {noformat}
> Caused by: java.io.IOException: java.io.IOException: Unsupported type: 
> optional int96 ts
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:116)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16484) Investigate SparkLauncher for HoS as alternative to bin/spark-submit

2018-01-08 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317336#comment-16317336
 ] 

Xuefu Zhang commented on HIVE-16484:


[~stakiar], I'm not denying the potential benefits we might get; I'm 
totally up for them. However, I wouldn't feel comfortable replacing a code 
path that's proven to work with something completely new. For this, a fallback 
is much better than a sheer replacement.

> Investigate SparkLauncher for HoS as alternative to bin/spark-submit
> 
>
> Key: HIVE-16484
> URL: https://issues.apache.org/jira/browse/HIVE-16484
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-16484.1.patch, HIVE-16484.10.patch, 
> HIVE-16484.2.patch, HIVE-16484.3.patch, HIVE-16484.4.patch, 
> HIVE-16484.5.patch, HIVE-16484.6.patch, HIVE-16484.7.patch, 
> HIVE-16484.8.patch, HIVE-16484.9.patch
>
>
> The {{SparkClientImpl#startDriver}} currently looks for the {{SPARK_HOME}} 
> directory and invokes the {{bin/spark-submit}} script, which spawns a 
> separate process to run the Spark application.
> {{SparkLauncher}} was added in SPARK-4924 and is a programmatic way to launch 
> Spark applications.
> I see a few advantages:
> * No need to spawn a separate process to launch a HoS --> lower startup time
> * Simplifies the code in {{SparkClientImpl}} --> easier to debug
> * {{SparkLauncher#startApplication}} returns a {{SparkAppHandle}} which 
> contains some useful utilities for querying the state of the Spark job
> ** It also allows the launcher to specify a list of job listeners



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18379) ALTER TABLE authorization_part SET PROPERTIES ("PARTITIONL_LEVEL_PRIVILEGE"="TRUE"); fails when authorization_part is MicroManaged table.

2018-01-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317331#comment-16317331
 ] 

Hive QA commented on HIVE-18379:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 
49s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m  
0s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
33s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
53s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
19s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  0m 
59s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 
34s{color} | {color:red} ql: The patch generated 2 new + 119 unchanged - 1 
fixed = 121 total (was 120) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  0m 
57s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
11s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 13m 34s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 8412748 |
| Default Java | 1.8.0_111 |
| checkstyle | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8508/yetus/diff-checkstyle-ql.txt
 |
| modules | C: ql U: ql |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8508/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> ALTER TABLE authorization_part SET PROPERTIES 
> ("PARTITIONL_LEVEL_PRIVILEGE"="TRUE"); fails when authorization_part is 
> MicroManaged table.
> -
>
> Key: HIVE-18379
> URL: https://issues.apache.org/jira/browse/HIVE-18379
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 3.0.0
>Reporter: Steve Yeom
>Assignee: Steve Yeom
>Priority: Minor
> Attachments: HIVE-18379.01.patch, HIVE-18379.02.patch
>
>
> ALTER TABLE authorization_part SET TBLPROPERTIES 
> ("PARTITION_LEVEL_PRIVILEGE"="TRUE") fails when authorization_part is a 
> Micromanaged table.
> This is from authorization_2.q qtest.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18393) Error returned when some other type is read as string from parquet tables

2018-01-08 Thread Janaki Lahorani (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Janaki Lahorani updated HIVE-18393:
---
Attachment: HIVE-18393.1.patch

The build failures don't make sense. Resubmitting to evaluate.

> Error returned when some other type is read as string from parquet tables
> -
>
> Key: HIVE-18393
> URL: https://issues.apache.org/jira/browse/HIVE-18393
> Project: Hive
>  Issue Type: Bug
>Reporter: Janaki Lahorani
>Assignee: Janaki Lahorani
> Fix For: 3.0.0
>
> Attachments: HIVE-18393.1.patch, HIVE-18393.1.patch
>
>
> TimeStamp, Decimal, Double, Float, BigInt, Int, SmallInt, Tinyint and Boolean, 
> when read as String, Varchar or Char, should return the correct data. Currently 
> this results in an error for parquet tables.
> Test Case:
> drop table if exists testAltCol;
> create table testAltCol
> (cId  TINYINT,
>  cTimeStamp TIMESTAMP,
>  cDecimal   DECIMAL(38,18),
>  cDoubleDOUBLE,
>  cFloat   FLOAT,
>  cBigIntBIGINT,
>  cInt INT,
>  cSmallInt  SMALLINT,
>  cTinyint   TINYINT,
>  cBoolean   BOOLEAN);
> insert into testAltCol values
> (1,
>  '2017-11-07 09:02:49.9',
>  12345678901234567890.123456789012345678,
>  1.79e308,
>  3.4e38,
>  1234567890123456789,
>  1234567890,
>  12345,
>  123,
>  TRUE);
> insert into testAltCol values
> (2,
>  '1400-01-01 01:01:01.1',
>  1.1,
>  2.2,
>  3.3,
>  1,
>  2,
>  3,
>  4,
>  FALSE);
> insert into testAltCol values
> (3,
>  '1400-01-01 01:01:01.1',
>  10.1,
>  20.2,
>  30.3,
>  1234567890123456789,
>  1234567890,
>  12345,
>  123,
>  TRUE);
> select cId, cTimeStamp from testAltCol order by cId;
> select cId, cDecimal, cDouble, cFloat from testAltCol order by cId;
> select cId, cBigInt, cInt, cSmallInt, cTinyint from testAltCol order by cId;
> select cId, cBoolean from testAltCol order by cId;
> drop table if exists testAltColP;
> create table testAltColP stored as parquet as select * from testAltCol;
> select cId, cTimeStamp from testAltColP order by cId;
> select cId, cDecimal, cDouble, cFloat from testAltColP order by cId;
> select cId, cBigInt, cInt, cSmallInt, cTinyint from testAltColP order by cId;
> select cId, cBoolean from testAltColP order by cId;
> alter table testAltColP replace columns
> (cId  TINYINT,
>  cTimeStamp STRING,
>  cDecimal   STRING,
>  cDoubleSTRING,
>  cFloat   STRING,
>  cBigIntSTRING,
>  cInt STRING,
>  cSmallInt  STRING,
>  cTinyint   STRING,
>  cBoolean   STRING);
> select cId, cTimeStamp from testAltColP order by cId;
> select cId, cDecimal, cDouble, cFloat from testAltColP order by cId;
> select cId, cBigInt, cInt, cSmallInt, cTinyint from testAltColP order by cId;
> select cId, cBoolean from testAltColP order by cId;
> alter table testAltColP replace columns
> (cId  TINYINT,
>  cTimeStamp VARCHAR(100),
>  cDecimal   VARCHAR(100),
>  cDoubleVARCHAR(100),
>  cFloat   VARCHAR(100),
>  cBigIntVARCHAR(100),
>  cInt VARCHAR(100),
>  cSmallInt  VARCHAR(100),
>  cTinyint   VARCHAR(100),
>  cBoolean   VARCHAR(100));
> select cId, cTimeStamp from testAltColP order by cId;
> select cId, cDecimal, cDouble, cFloat from testAltColP order by cId;
> select cId, cBigInt, cInt, cSmallInt, cTinyint from testAltColP order by cId;
> select cId, cBoolean from testAltColP order by cId;
> alter table testAltColP replace columns
> (cId  TINYINT,
>  cTimeStamp CHAR(100),
>  cDecimal   CHAR(100),
>  cDoubleCHAR(100),
>  cFloat   CHAR(100),
>  cBigIntCHAR(100),
>  cInt CHAR(100),
>  cSmallInt  CHAR(100),
>  cTinyint   CHAR(100),
>  cBoolean   CHAR(100));
> select cId, cTimeStamp from testAltColP order by cId;
> select cId, cDecimal, cDouble, cFloat from testAltColP order by cId;
> select cId, cBigInt, cInt, cSmallInt, cTinyint from testAltColP order by cId;
> select cId, cBoolean from testAltColP order by cId;
> drop table if exists testAltColP;
> Error:
> FAILED: Execution Error, return code 2 from 
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> Excerpt from the log:
> 2018-01-05T15:54:05,756 ERROR [LocalJobRunner Map Task Executor #0] 
> mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime 
> Error while processing row [Error getting row data with exception 
> java.lang.UnsupportedOperationException: Cannot inspect 
> org.apache.hadoop.hive.serde2.io.TimestampWritable
>   at 
> org.apache.hadoop.hive.ql.io.parquet.serde.primitive.ParquetStringInspector.getPrimitiveJavaObject(ParquetStringInspector.java:77)



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16484) Investigate SparkLauncher for HoS as alternative to bin/spark-submit

2018-01-08 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317318#comment-16317318
 ] 

Marcelo Vanzin commented on HIVE-16484:
---

Just amending my previous response: if the concern with {{SparkLauncher}} is 
the number of file descriptors because of the extra one used by the launcher 
server connection, using {{InProcessLauncher}} will probably end up decreasing 
the number of fds being used. Instead of potentially 3 fds for a child process 
(pipes for stdin / stdout / stderr), you have one for the socket connection. 
For the normal, child process case, then yes, you just get one extra file 
descriptor. (Or maybe you break even, because I think {{SparkLauncher}} will 
merge stdout and stderr in that case.)

As for why this is better, I think the main advantage will come by using 
{{InProcessLauncher}} eventually, since Hive wouldn't need a separate Spark 
installation to be able to launch Spark apps. It could ship with everything 
ready to run HoS out of the box.

Security can probably become simpler; instead of having to run kinit before 
starting a Spark child process, HS2 could potentially just instantiate 
{{InProcessLauncher}} inside a {{proxyUser.doAs}} call. I haven't actually 
tried that, but that's the general idea of how to use it in a secure env.
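
A minimal, untried sketch of that idea, assuming Spark 2.3+ for 
{{InProcessLauncher}} and Hadoop's UserGroupInformation; the driver class name 
is hypothetical and error handling is omitted:
{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.security.UserGroupInformation;
import org.apache.spark.launcher.InProcessLauncher;
import org.apache.spark.launcher.SparkAppHandle;

public class InProcessLaunchSketch {
  public static SparkAppHandle launchAs(String user) throws Exception {
    // Run the in-process driver as the proxy user instead of kinit'ing first.
    UserGroupInformation proxy =
        UserGroupInformation.createProxyUser(user, UserGroupInformation.getLoginUser());
    return proxy.doAs((PrivilegedExceptionAction<SparkAppHandle>) () ->
        new InProcessLauncher()
            .setMainClass("org.example.RemoteDriver") // hypothetical driver class
            .setConf("spark.master", "yarn")
            .startApplication()); // the handle exposes the app's state transitions
  }
}
{code}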


> Investigate SparkLauncher for HoS as alternative to bin/spark-submit
> 
>
> Key: HIVE-16484
> URL: https://issues.apache.org/jira/browse/HIVE-16484
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-16484.1.patch, HIVE-16484.10.patch, 
> HIVE-16484.2.patch, HIVE-16484.3.patch, HIVE-16484.4.patch, 
> HIVE-16484.5.patch, HIVE-16484.6.patch, HIVE-16484.7.patch, 
> HIVE-16484.8.patch, HIVE-16484.9.patch
>
>
> The {{SparkClientImpl#startDriver}} currently looks for the {{SPARK_HOME}} 
> directory and invokes the {{bin/spark-submit}} script, which spawns a 
> separate process to run the Spark application.
> {{SparkLauncher}} was added in SPARK-4924 and is a programmatic way to launch 
> Spark applications.
> I see a few advantages:
> * No need to spawn a separate process to launch a HoS --> lower startup time
> * Simplifies the code in {{SparkClientImpl}} --> easier to debug
> * {{SparkLauncher#startApplication}} returns a {{SparkAppHandle}} which 
> contains some useful utilities for querying the state of the Spark job
> ** It also allows the launcher to specify a list of job listeners



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18330) Fix TestMsgBusConnection - doesn't test tests the original intention

2018-01-08 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-18330:

   Resolution: Fixed
Fix Version/s: 3.0.0
   Status: Resolved  (was: Patch Available)

Pushed to master. Thanks, Zoltan!


> Fix TestMsgBusConnection - doesn't test tests the original intention
> 
>
> Key: HIVE-18330
> URL: https://issues.apache.org/jira/browse/HIVE-18330
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Fix For: 3.0.0
>
> Attachments: HIVE-18330.01.patch
>
>
> If msgbus usage is configured and the ActiveMQ broker is down, the 
> notification listener throws NPEs.
> This test should never have passed... there is a point where it drops a 
> database, and that command returns with an error - there are other interesting 
> things, like create database on an existing db succeeding 
> somewhere - so it gets posted to the msgbus.
> Discovered during HIVE-18238.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18330) Fix TestMsgBusConnection - doesn't test tests the original intention

2018-01-08 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317304#comment-16317304
 ] 

Ashutosh Chauhan commented on HIVE-18330:
-

+1, let's get the test fix in.

> Fix TestMsgBusConnection - doesn't test tests the original intention
> 
>
> Key: HIVE-18330
> URL: https://issues.apache.org/jira/browse/HIVE-18330
> Project: Hive
>  Issue Type: Bug
>Reporter: Zoltan Haindrich
>Assignee: Zoltan Haindrich
> Attachments: HIVE-18330.01.patch
>
>
> If msgbus usage is configured and the ActiveMQ broker is down, the 
> notification listener throws NPEs.
> This test should never have passed... there is a point where it drops a 
> database, and that command returns with an error - there are other interesting 
> things, like create database on an existing db succeeding 
> somewhere - so it gets posted to the msgbus.
> Discovered during HIVE-18238.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17396) Support DPP with map joins where the source and target belong in the same stage

2018-01-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317302#comment-16317302
 ] 

Hive QA commented on HIVE-17396:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12905139/HIVE-17396.6.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 20 failed/errored test(s), 11549 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[mapjoin_hook] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=35)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketsortoptimize_insert_2]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[hybridgrace_hashjoin_2]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] 
(batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=159)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_part]
 (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] 
(batchId=120)
org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore.testTransactionalValidation
 (batchId=213)
org.apache.hadoop.hive.metastore.TestRemoteUGIHiveMetaStoreIpAddress.testIpAddress
 (batchId=218)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=253)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints 
(batchId=225)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=231)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=231)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=231)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8507/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8507/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8507/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 20 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12905139 - PreCommit-HIVE-Build

> Support DPP with map joins where the source and target belong in the same 
> stage
> ---
>
> Key: HIVE-17396
> URL: https://issues.apache.org/jira/browse/HIVE-17396
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Janaki Lahorani
>Assignee: Janaki Lahorani
> Attachments: HIVE-17396.1.patch, HIVE-17396.1.patch, 
> HIVE-17396.1.patch, HIVE-17396.2.patch, HIVE-17396.3.patch, 
> HIVE-17396.4.patch, HIVE-17396.4.patch, HIVE-17396.5.patch, HIVE-17396.6.patch
>
>
> When the target of a partition pruning sink operator is not the same as 
> the target of the hash table sink operator, both source and target get 
> scheduled within the same spark job, and that can result in a File Not Found 
> Exception. HIVE-17225 has a fix to disable DPP in that scenario. This JIRA is 
> to support DPP for such cases.
> Test Case:
> SET hive.spark.dynamic.partition.pruning=true;
> SET hive.auto.convert.join=true;
> SET hive.strict.checks.cartesian.product=false;
> CREATE TABLE part_table1 (col int) PARTITIONED BY (part1_col int);
> CREATE TABLE part_table2 (col int) PARTITIONED BY (part2_col int);
> CREATE TABLE reg_table (col int);
> ALTER TABLE part_table1 ADD PARTITION (part1_col = 1);
> ALTER TABLE part_table2 ADD PARTITION (part2_col = 1);
> ALTER TABLE part_table2 ADD PARTITION (part2_col = 2);
> INSERT INTO TABLE part_table1 PARTITION (part1_col = 1) VALUES (1);
> INSERT INTO TABLE part_table2 PARTITION (part2_col = 1) VALUES (1);
> INSERT INTO TABLE part_table2 PARTITION (part2_col = 2) VALUES (2);
> INSERT INTO table reg_table VALUES (1), (2), (3), (4), (5), (6);
> EXPLAIN SELECT *
> FROM   part_table1 pt1,
>part_table2 pt2,
>reg_table rt
> WHERE  rt.col = pt1.part1_col
> ANDpt2.part2_col = pt1.part1_col;
> Plan:
> STAGE 

[jira] [Updated] (HIVE-18390) IndexOutOfBoundsException when query a partitioned view in ColumnPruner

2018-01-08 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-18390:

Status: Patch Available  (was: Open)

> IndexOutOfBoundsException when query a  partitioned view in ColumnPruner 
> -
>
> Key: HIVE-18390
> URL: https://issues.apache.org/jira/browse/HIVE-18390
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning, Views
>Affects Versions: 2.1.1
>Reporter: Hengyu Dai
> Attachments: HIVE-18390.patch
>
>
> IndexOutOfBoundsException is encountered when querying a partitioned view.
> In column pruning, each SEL operator collects the columns accessed in the 
> current SEL operator.
> When ColumnPrunerSelectProc gets a view's accessed columns, it will first 
> get the index of the output column name in the view, then call 
> Table.getCols().get(index).getName() to finally get the 
> name of the output column. But Table.getCols() will not return all columns 
> (partition columns are
> missing), so if a partition column is queried, an IndexOutOfBoundsException 
> will be thrown.
> REPRODUCE: 
> {code:sql}
> create table foo
> (
> `a` string
> ) partitioned by (`b` string)
> ;
> create view bar partitioned on (b) as
> select a,b from foo;
> select * from bar; --IndexOutOfBoundsException
> {code}
> OPERATOR TREE:
> {code:java}
> TS[0]
>|
> SEL[1]
>|
> SEL[2]
>|
> FS[3]
> {code}
> SEL[1] collects the accessed columns (including partition column b); b's 
> internal column name is '_col1' and the corresponding column index is 1, but 
> bar's getCols() returns a list of length 1: ['a'], so tab.getCols().get(1) 
> throws an IndexOutOfBoundsException.
> HOW TO FIX:
> instead of calling the view's getCols() method, we should get all columns, 
> including partition columns
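
A minimal sketch of that direction (not the committed patch): build the lookup 
list from the regular columns followed by the partition columns, so index 1 in 
the repro above resolves to 'b' instead of falling off the end of getCols().
{code:java}
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hive.metastore.api.FieldSchema;
import org.apache.hadoop.hive.ql.metadata.Table;

class ViewColumnLookup {
  static String columnNameAt(Table view, int index) {
    // Partition columns come after the data columns in the full schema.
    List<FieldSchema> all = new ArrayList<>(view.getCols());
    all.addAll(view.getPartCols());
    return all.get(index).getName(); // index 1 -> 'b' for the repro view
  }
}
{code}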



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18323) Vectorization: add the support of timestamp in VectorizedPrimitiveColumnReader for parquet

2018-01-08 Thread Vihang Karajgaonkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317297#comment-16317297
 ] 

Vihang Karajgaonkar commented on HIVE-18323:


Thanks [~aihuaxu] for taking this up. Can you please add some test cases?

Vectorizer expects all the primitive types to be vectorized if an input format 
implements VectorizedInputFormatInterface. This means any parquet table which 
has timestamp or {{INTERVAL_DAY_TIME}} will fail to execute in vectorized mode. 
I think we should fix this soon. I will create another JIRA for supporting 
interval_day_time.

I took a quick look. The isRepeating flag is always set to false in the patch. 
Can we change line 330 to something like below?
{noformat}
c.isRepeating = c.isRepeating && (c.vector[0] == c.vector[rowId]);
{noformat}
Would be good if [~Ferd] also takes a look at this one.
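
For context, a hedged illustration of how the suggested accumulating check 
behaves, using simplified types (a plain long[] instead of Hive's 
LongColumnVector): the flag starts true and any row that differs from row 0 
clears it permanently.
{code:java}
class RepeatingFlagSketch {
  long[] vector = new long[1024];
  boolean isRepeating = true; // optimistic until a differing value shows up

  void setRow(int rowId, long value) {
    vector[rowId] = value;
    // Stays true only while every value written so far equals vector[0].
    isRepeating = isRepeating && (vector[0] == vector[rowId]);
  }
}
{code}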

> Vectorization: add the support of timestamp in 
> VectorizedPrimitiveColumnReader for parquet
> --
>
> Key: HIVE-18323
> URL: https://issues.apache.org/jira/browse/HIVE-18323
> Project: Hive
>  Issue Type: Improvement
>  Components: Vectorization
>Affects Versions: 3.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-18323.1.patch
>
>
> {noformat}
> CREATE TABLE `t1`(
>   `ts` timestamp,
>   `s1` string)
> STORED AS PARQUET;
> set hive.vectorized.execution.enabled=true;
> SELECT * from t1 SORT BY s1;
> {noformat}
> This query will throw an exception since timestamp is not supported here yet.
> {noformat}
> Caused by: java.io.IOException: java.io.IOException: Unsupported type: 
> optional int96 ts
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
> at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:365)
> at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:116)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18269) LLAP: Fast llap io with slow processing pipeline can lead to OOM

2018-01-08 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18269:

Attachment: HIVE-18269.03.patch

I cannot repro the new failures and they look like they are in unstable tests. 
Attaching the patch again just in case.

> LLAP: Fast llap io with slow processing pipeline can lead to OOM
> 
>
> Key: HIVE-18269
> URL: https://issues.apache.org/jira/browse/HIVE-18269
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Sergey Shelukhin
> Attachments: HIVE-18269.01.patch, HIVE-18269.02.patch, 
> HIVE-18269.03.patch, HIVE-18269.1.patch, HIVE-18269.bad.patch, Screen Shot 
> 2017-12-13 at 1.15.16 AM.png
>
>
> The pendingData linked list in the Llap IO elevator (LlapRecordReader.java) may 
> grow indefinitely when Llap IO is faster than the processing pipeline. Since we 
> don't have backpressure to slow down the IO, this can lead to indefinite growth 
> of pending data, leading to severe GC pressure and eventually to OOM.
> This specific instance of LLAP was running on HDFS on top of an EBS volume 
> backed by SSD. The query that triggered this issue was ANALYZE STATISTICS 
> .. FOR COLUMNS, which also gathers bitvectors. Fast IO and slow processing case.
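
As a conceptual sketch only (not the fix in the attached patches), a bounded 
queue is the textbook way to add the missing backpressure: the fast IO side 
blocks once the slow consumer falls behind. The capacity below is an assumed 
knob; a real limit would more likely be memory-based than a fixed element count.
{code:java}
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BoundedPendingData<T> {
  private final BlockingQueue<T> pending = new ArrayBlockingQueue<>(64);

  void produce(T batch) throws InterruptedException {
    pending.put(batch); // blocks the fast IO thread when the queue is full
  }

  T consume() throws InterruptedException {
    return pending.take(); // slow processing side drains at its own pace
  }
}
{code}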



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18152) Idempotent state change for resource plan

2018-01-08 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HIVE-18152:

Resolution: Done
Status: Resolved  (was: Patch Available)

Looks like it's already fixed somewhere else (resourceplan.q has tests for 
active->active, etc.)

> Idempotent state change for resource plan
> -
>
> Key: HIVE-18152
> URL: https://issues.apache.org/jira/browse/HIVE-18152
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-18152.1.patch
>
>
> {code}
> show resource plans;
> +--+--++
> | rp_name  |  status  | query_parallelism  |
> +--+--++
> | llap | ACTIVE   | 1  |
> | global   | ENABLED  | 1  |
> +--+--++
> ALTER RESOURCE PLAN llap ACTIVATE;
> Error: Error while processing statement: FAILED: Execution Error, return code 
> 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Cannot get a resource plan to 
> apply (state=08S01,code=1)
> {code}
> It is better not to throw an error when the current state is the same as the 
> altered state.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-18359) Extend grouping set limits from int to long

2018-01-08 Thread Prasanth Jayachandran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-18359:
-
Attachment: HIVE-18359.4.patch

[~mmccline]/[~kgyrtkirk] vector_grouping_sets.q is failing after this patch. I 
spent a lot of time debugging the issue but couldn't crack it. This is very 
likely related to the states/assumptions introduced by HIVE-17617. Could you 
help debug the issue?
When running vector_grouping_sets.q with this patch, the following exception is 
thrown:
{code}
Caused by: java.lang.NullPointerException
at java.lang.System.arraycopy(Native Method)
at 
org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector.setVal(BytesColumnVector.java:173)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorHashKeyWrapperBatch.assignRowColumn(VectorHashKeyWrapperBatch.java:1065)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.writeSingleRow(VectorGroupByOperator.java:1134)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.access$800(VectorGroupByOperator.java:74)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator$ProcessingModeReduceMergePartial.close(VectorGroupByOperator.java:862)
at 
org.apache.hadoop.hive.ql.exec.vector.VectorGroupByOperator.closeOp(VectorGroupByOperator.java:1176)
at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:705)
at 
org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.close(ReduceRecordProcessor.java:383)
{code}


> Extend grouping set limits from int to long
> ---
>
> Key: HIVE-18359
> URL: https://issues.apache.org/jira/browse/HIVE-18359
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-18359.1.patch, HIVE-18359.2.patch, 
> HIVE-18359.3.patch, HIVE-18359.4.patch
>
>
> Grouping sets is broken for >32 columns because of the use of int for the 
> bitmap (also the GROUPING__ID virtual column). This assumption breaks grouping 
> sets/rollups/cube when the number of participating aggregation columns is >32. 
> The easier fix would be to extend it to long for now. The correct fix would be 
> to use BitSets everywhere, but that would require the GROUPING__ID column type 
> to be binary, which would make predicates on GROUPING__ID difficult to deal with.
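
For illustration, the arithmetic behind the >32-column breakage (plain Java, not 
Hive code): Java masks shift counts, so with an int bitmap the mask for column 
32 wraps around and collides with column 0, while a long mask stays distinct up 
to 64 columns.
{code:java}
public class GroupingBitmapDemo {
  public static void main(String[] args) {
    int intMask = 1 << 32;    // shift count is taken mod 32, so this is 1 << 0
    long longMask = 1L << 32; // a genuinely distinct bit
    System.out.println(intMask);  // 1 -- column 32 collides with column 0
    System.out.println(longMask); // 4294967296
  }
}
{code}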



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-17396) Support DPP with map joins where the source and target belong in the same stage

2018-01-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317197#comment-16317197
 ] 

Hive QA commented on HIVE-17396:


| (/) *{color:green}+1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  
1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  
0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 
32s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 
31s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
11s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
49s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
1s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 
20s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  1m 
27s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  1m 
14s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
17s{color} | {color:green} The patch common passed checkstyle {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  0m 
29s{color} | {color:green} ql: The patch generated 0 new + 21 unchanged - 2 
fixed = 21 total (was 23) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m 
 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  1m  
0s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 
12s{color} | {color:green} The patch does not generate ASF License warnings. 
{color} |
| {color:black}{color} | {color:black} {color} | {color:black} 15m 24s{color} | 
{color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 
3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 148807a |
| Default Java | 1.8.0_111 |
| modules | C: common ql U: . |
| Console output | 
http://104.198.109.242/logs//PreCommit-HIVE-Build-8507/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> Support DPP with map joins where the source and target belong in the same 
> stage
> ---
>
> Key: HIVE-17396
> URL: https://issues.apache.org/jira/browse/HIVE-17396
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Janaki Lahorani
>Assignee: Janaki Lahorani
> Attachments: HIVE-17396.1.patch, HIVE-17396.1.patch, 
> HIVE-17396.1.patch, HIVE-17396.2.patch, HIVE-17396.3.patch, 
> HIVE-17396.4.patch, HIVE-17396.4.patch, HIVE-17396.5.patch, HIVE-17396.6.patch
>
>
> When the target of a partition pruning sink operator is not the same as 
> the target of the hash table sink operator, both source and target get 
> scheduled within the same spark job, and that can result in a File Not Found 
> Exception. HIVE-17225 has a fix to disable DPP in that scenario. This JIRA is 
> to support DPP for such cases.
> Test Case:
> SET hive.spark.dynamic.partition.pruning=true;
> SET hive.auto.convert.join=true;
> SET hive.strict.checks.cartesian.product=false;
> CREATE TABLE part_table1 (col int) PARTITIONED BY (part1_col int);
> CREATE TABLE part_table2 (col int) PARTITIONED BY (part2_col int);
> CREATE TABLE reg_table (col int);
> ALTER TABLE part_table1 ADD PARTITION (part1_col = 1);
> ALTER TABLE part_table2 ADD 

[jira] [Commented] (HIVE-18372) Create testing infra to test different HMS instances

2018-01-08 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317175#comment-16317175
 ] 

Alan Gates commented on HIVE-18372:
---

Peter, could you comment a bit on the goals behind this change?  I think what 
you're trying to do is take what we currently do with the TestHiveMetaStore and 
make it generic so that we do not have to put everything we want tested in that 
one class.  Is that correct, or is there more to it?  

Also, I don't understand the ClusterMetaStore class.  Are you connecting to an 
existing external metastore in this case?

A side note on the parameterization.  I would like to make it so that metastore 
(and really all of Hive's unit tests) can run in parallel using the surefire 
plugin's parallel features.  I don't know how that interacts with the 
parameterization of the tests.  I'd like to figure that out before we commit 
this. 

> Create testing infra to test different HMS instances
> 
>
> Key: HIVE-18372
> URL: https://issues.apache.org/jira/browse/HIVE-18372
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Peter Vary
>Assignee: Peter Vary
> Attachments: HIVE-18372.patch
>
>
> Since there will be multiple tests, it would be good to have a good 
> infrastructure to help create them faster and more easily.
> This patch will also include the test cases for the Database-related methods 
> to showcase the infra.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18349) Misc metastore changes for debuggability, error on commit txn failures

2018-01-08 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317173#comment-16317173
 ] 

Prasanth Jayachandran commented on HIVE-18349:
--

bq. Why should this method be throwing if the context says failure?
Context is available only through listeners. There is no way to know the final 
state when using the API directly without registering any listeners. 

The commit transaction returns a boolean which is not handled by any of the 
metastore functions. If the commit transaction fails for some reason on the db 
side, we roll back the transaction but the API will silently ignore the 
failure. The only way to know if a metastore api succeeded is to look for the 
state in the context, which is only available if listeners are registered. In 
one customer case, we saw the drop_table API complete successfully although 
the commit transaction failed in postgres (the transaction was rolled back but 
there was no indication via the API). To detect such cases, the exception is 
thrown in the endFunction when the state is failure and no exception is 
observed.
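
A hedged sketch of that pattern with made-up names (not the actual patch): check 
the boolean from the commit and surface a rollback as an exception instead of 
silently reporting success.
{code:java}
class TxnCommitCheck {
  interface Txn {
    boolean commitTransaction(); // returns false when the DB rolled back
  }

  static void commitOrThrow(Txn txn, String api) {
    // Without this check the API reports success even on a rolled-back commit.
    if (!txn.commitTransaction()) {
      throw new RuntimeException("Transaction commit failed in " + api);
    }
  }
}
{code}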

> Misc metastore changes for debuggability, error on commit txn failures
> --
>
> Key: HIVE-18349
> URL: https://issues.apache.org/jira/browse/HIVE-18349
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-18349.1.patch, HIVE-18349.2.patch, 
> HIVE-18349.3.patch, HIVE-18349.4.patch, HIVE-18349.5.patch, 
> HIVE-18349.6.patch, HIVE-18349.7.patch
>
>
> 1) The Hive metastore audit event log/metastore log does not log the final 
> status (success or failure) of the event. Some operations, for example 
> drop_table, return a boolean success flag, but it never gets logged anywhere. 
> However, the same is sent to end event listeners or other metastore event 
> listeners. It would be good to log the final status of the events. 
> 2) Make the connection timeout when using a connection pool configurable. 
> Currently it is hard-coded to 30 seconds.
> 3) Provide a config to enable connection leak detection for HikariCP, or 
> enable it when debug logging is enabled.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18391) load data should rename files consistent with insert statements (bucketed tables only)

2018-01-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317160#comment-16317160
 ] 

Hive QA commented on HIVE-18391:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12905150/HIVE-18391.3.patch

{color:green}SUCCESS:{color} +1 due to 66 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 155 failed/errored test(s), 11549 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_10] 
(batchId=245)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_1] 
(batchId=245)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_2] 
(batchId=245)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_3] 
(batchId=245)
org.apache.hadoop.hive.cli.TestBeeLineDriver.testCliDriver[smb_mapjoin_7] 
(batchId=245)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_2] 
(batchId=83)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join32] (batchId=85)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_11] 
(batchId=86)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_12] 
(batchId=33)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_1] 
(batchId=45)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_2] 
(batchId=48)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_3] 
(batchId=2)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_4] 
(batchId=62)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_sortmerge_join_7] 
(batchId=89)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_1] 
(batchId=66)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_2] 
(batchId=58)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_spark1] 
(batchId=68)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_spark2] 
(batchId=3)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucket_map_join_spark3] 
(batchId=45)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_1] 
(batchId=32)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_2] 
(batchId=65)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_3] 
(batchId=67)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_4] 
(batchId=41)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_6] 
(batchId=83)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_7] 
(batchId=38)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketcontext_8] 
(batchId=37)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin10] 
(batchId=51)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin11] 
(batchId=71)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin12] 
(batchId=35)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin5] 
(batchId=84)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin8] 
(batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin9] 
(batchId=16)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin_negative2] 
(batchId=69)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[bucketmapjoin_negative] 
(batchId=23)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[confirm_initial_tbl_stats]
 (batchId=30)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constprog2] (batchId=13)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[constprog_partitioner] 
(batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[create_view] (batchId=40)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[dynpart_sort_opt_bucketing]
 (batchId=85)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_1_23] 
(batchId=78)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_2] 
(batchId=24)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_3] 
(batchId=80)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_4] 
(batchId=88)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_5] 
(batchId=46)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_7] 
(batchId=76)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_8] 
(batchId=52)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_9] 
(batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_skew_1_23] 
(batchId=8)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[groupby_sort_test_1] 
(batchId=8)
{noformat}

[jira] [Comment Edited] (HIVE-18395) Using stats for aggregates on Acid/MM is off even with "hive.compute.query.using.stats" is true.

2018-01-08 Thread Steve Yeom (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317130#comment-16317130
 ] 

Steve Yeom edited comment on HIVE-18395 at 1/8/18 9:46 PM:
---

I talked with Ashutosh Chauhan. 
We turned off using stats on ACID/MM since inserting into an ACID/MM table 
and updating the table's stats in the Metastore are not transactional as a unit. 
I.e., there could be a case where an insert on an MM/ACID table succeeds but 
the stats update on the table in the Metastore fails.

Making the set of operations (insert and Metastore metadata update) 
transactional would probably fix the issue.  
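
A minimal sketch of that fix direction, assuming a RawStore-style transaction 
API covers both steps (the finishInsert helper is hypothetical):
{code}
// Hypothetical: commit the data write and the stats update together, so the
// stats can never describe rows that were rolled back.
boolean committed = false;
rawStore.openTransaction();
try {
  finishInsert(table);                          // data write (assumed helper)
  rawStore.updateTableColumnStatistics(stats);  // stats update in the same txn
  committed = rawStore.commitTransaction();
} finally {
  if (!committed) {
    rawStore.rollbackTransaction();
  }
}
{code}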


was (Author: steveyeom2017):
I talked with Ashutosh Chauhan. 
We turned off using stats on ACID/MM since inserting into an ACID/MM table 
and updating the table's stats in the Metastore are not transactional as a unit. 
I.e., there could be a case where an insert on an MM/ACID table succeeds but 
the stats update on the table in the Metastore fails.

Probably the multi-statement transaction feature is needed, or the set of 
operations (insert and Metastore metadata update) has to be made transactional.  

> Using stats for aggregates on Acid/MM is off even with 
> "hive.compute.query.using.stats" is true.
> 
>
> Key: HIVE-18395
> URL: https://issues.apache.org/jira/browse/HIVE-18395
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Steve Yeom
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18349) Misc metastore changes for debuggability, error on commit txn failures

2018-01-08 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317133#comment-16317133
 ] 

Alan Gates commented on HIVE-18349:
---

HiveMetaStore.endFunction, line 887 after your patch: I don't understand the 
following addition:
{code}
887   if (!context.isSuccess() && context.getException() == null) {
888     throw new MetaException("Commit transaction failed");
889   }
{code}
Why should this method be throwing if the context says failure?  Shouldn't that 
be up to the calling methods?  Also, since the context is passed to the end 
function listeners, a broken end function listener that messes with the context 
could cause metastore calls to fail.




> Misc metastore changes for debuggability, error on commit txn failures
> --
>
> Key: HIVE-18349
> URL: https://issues.apache.org/jira/browse/HIVE-18349
> Project: Hive
>  Issue Type: Bug
>  Components: Metastore
>Affects Versions: 3.0.0
>Reporter: Prasanth Jayachandran
>Assignee: Prasanth Jayachandran
> Attachments: HIVE-18349.1.patch, HIVE-18349.2.patch, 
> HIVE-18349.3.patch, HIVE-18349.4.patch, HIVE-18349.5.patch, 
> HIVE-18349.6.patch, HIVE-18349.7.patch
>
>
> 1) The Hive metastore audit event log/metastore log does not log the final 
> status (success or failure) of an event. Some operations, for example 
> drop_table, return a boolean success flag, but it never gets logged anywhere. 
> However, the same flag is sent to end event listeners and other metastore event 
> listeners. It would be good to log the final status of the events. 
> 2) Make the connection timeout when using a connection pool configurable. 
> Currently it is hard-coded to 30 seconds.
> 3) Provide a config to enable connection leak detection for HikariCP, or 
> enable it when debug logging is enabled.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18395) Using stats for aggregates on Acid/MM is off even with "hive.compute.query.using.stats" is true.

2018-01-08 Thread Steve Yeom (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317130#comment-16317130
 ] 

Steve Yeom commented on HIVE-18395:
---

I talked with Ashutosh Chauhan. 
We turned off using stats on ACID/MM since inserting into an ACID/MM table 
and updating the table's stats in the Metastore are not transactional as a unit. 
I.e., there could be a case where an insert on an MM/ACID table succeeds but 
the stats update on the table in the Metastore fails.

Probably the multi-statement transaction feature is needed, or the set of 
operations (insert and Metastore metadata update) has to be made transactional.  

> Using stats for aggregates on Acid/MM is off even with 
> "hive.compute.query.using.stats" is true.
> 
>
> Key: HIVE-18395
> URL: https://issues.apache.org/jira/browse/HIVE-18395
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 3.0.0
>Reporter: Steve Yeom
>Priority: Minor
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18391) load data should rename files consistent with insert statements (bucketed tables only)

2018-01-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317110#comment-16317110
 ] 

Hive QA commented on HIVE-18391:


| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue}  0m  1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green}  0m  0s{color} | {color:green} The patch does not contain any @author tags. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  1m 21s{color} | {color:blue} Maven dependency ordering for branch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  5m 33s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 28s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green}  2m 15s{color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  6m  3s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:blue}0{color} | {color:blue} mvndep {color} | {color:blue}  0m 18s{color} | {color:blue} Maven dependency ordering for patch {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green}  6m 24s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  5m 29s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  5m 29s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  0m 38s{color} | {color:red} ql: The patch generated 3 new + 10 unchanged - 0 fixed = 13 total (was 10) {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red}  1m 43s{color} | {color:red} root: The patch generated 3 new + 10 unchanged - 0 fixed = 13 total (was 10) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green}  0m  0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green}  6m 21s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green}  0m 12s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 42m 44s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| Optional Tests |  asflicense  javac  javadoc  findbugs  checkstyle  compile  |
| uname | Linux hiveptest-server-upstream 3.16.0-4-amd64 #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03) x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /data/hiveptest/working/yetus/dev-support/hive-personality.sh |
| git revision | master / 148807a |
| Default Java | 1.8.0_111 |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8506/yetus/diff-checkstyle-ql.txt |
| checkstyle | http://104.198.109.242/logs//PreCommit-HIVE-Build-8506/yetus/diff-checkstyle-root.txt |
| modules | C: ql . U: . |
| Console output | http://104.198.109.242/logs//PreCommit-HIVE-Build-8506/yetus.txt |
| Powered by | Apache Yetus http://yetus.apache.org |


This message was automatically generated.



> load data should rename files consistent with insert statements (bucketed 
> tables only)
> --
>
> Key: HIVE-18391
> URL: https://issues.apache.org/jira/browse/HIVE-18391
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Deepak Jaiswal
>Assignee: Deepak Jaiswal
> Attachments: HIVE-18391.1.patch, HIVE-18391.2.patch, 
> HIVE-18391.3.patch
>
>
> Insert statements create files with names ending in _0, 0001_0, etc. 
> However, load data uses the input file name. That results in an inconsistent 
> naming convention, which makes SMB joins difficult in some scenarios and may 
> cause trouble for other types of queries in the future.
> We need a consistent naming convention.
> For bucketed tables, Hive relies on the user to name the files matching the 
> bucket in non-strict mode. Hive assumes that the data within a file belongs to 
> the same bucket. In strict mode, loading a bucketed table is disabled.
> This will likely affect most of the tests that load data, which is pretty 
> significant.
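
To illustrate the direction, a hypothetical sketch of renaming loaded files to 
an insert-style convention (the format string and variable names are 
illustrative assumptions, not the patch's actual code):
{code}
// Hypothetical: give a loaded file an insert-style name derived from its
// bucket, so bucket N always lands in a predictably named file.
String targetName = String.format("%06d_%d", bucketId, copyIndex); // e.g. "000001_0"
fs.rename(srcFile, new Path(destDir, targetName));
{code}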



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

[jira] [Updated] (HIVE-18367) Describe Extended output is truncated on a table with an explicit row format containing tabs or newlines.

2018-01-08 Thread Andrew Sherman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Sherman updated HIVE-18367:
--
Attachment: HIVE-18367.3.patch

Some test changes, because the 'describe extended' output now always comes
on one line. This seems better. Anyone who needs to parse the output
(for example, to get view text) can use 'describe formatted'.
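
A minimal sketch of the escaping described in the fix (assuming the description 
string is post-processed in Java before it is written to the 
tab/newline-delimited output):
{code}
// Replace literal control characters with their two-character escape
// sequences so a tab or newline inside the row format cannot break the
// tab/newline-delimited describe protocol.
static String escapeDescribeValue(String value) {
  return value.replace("\t", "\\t").replace("\n", "\\n");
}
{code}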

> Describe Extended output is truncated on a table with an explicit row format 
> containing tabs or newlines.
> -
>
> Key: HIVE-18367
> URL: https://issues.apache.org/jira/browse/HIVE-18367
> Project: Hive
>  Issue Type: Bug
>Reporter: Andrew Sherman
>Assignee: Andrew Sherman
> Attachments: HIVE-18367.1.patch, HIVE-18367.2.patch, 
> HIVE-18367.3.patch
>
>
> 'Describe Extended' dumps information about a table. The protocol for sending 
> this data relies on tabs and newlines to separate pieces of data. If a table 
> has 'FIELDS terminated by XXX' or 'LINES terminated by XXX' where XXX is a 
> tab or newline, then the output seen by the user is prematurely truncated. Fix 
> this by replacing tabs and newlines in the table description with the literal 
> strings “\t” and “\n”.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18355) Add builder for metastore Thrift classes missed in the first pass - FunctionBuilder

2018-01-08 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317103#comment-16317103
 ] 

Alan Gates commented on HIVE-18355:
---

A couple of comments in the build method:
# It should only set the owner if the owner hasn't already been set.  As it is 
now, it will override any owner set via setOwner.
# Why the double try blocks, with the outer one catching all exceptions?  The 
outer block can be removed.  If any exceptions go uncaught as a result, 
they should be caught and converted to MetaException in the inner block, as 
IOException is.
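
A minimal sketch of the first point (field and helper names are assumed, not 
the actual FunctionBuilder code; the IOException-to-MetaException handling from 
the second point is elided):
{code}
// Hypothetical: default the owner only when setOwner() was never called,
// instead of unconditionally overwriting it in build().
if (owner == null) {
  owner = SecurityUtils.getUser(); // assumed default-owner lookup
}
{code}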

> Add builder for metastore Thrift classes missed in the first pass - 
> FunctionBuilder
> ---
>
> Key: HIVE-18355
> URL: https://issues.apache.org/jira/browse/HIVE-18355
> Project: Hive
>  Issue Type: Bug
>  Components: Standalone Metastore
>Affects Versions: 3.0.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Minor
> Attachments: HIVE-18355.patch
>
>
> Add a FunctionBuilder class



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16484) Investigate SparkLauncher for HoS as alternative to bin/spark-submit

2018-01-08 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317066#comment-16317066
 ] 

Sahil Takiar commented on HIVE-16484:
-

[~xuefuz] thanks for voicing your concern. I see a few benefits to doing this:

* The main benefit is the usage of {{InProcessLauncher}}, which was added in 
SPARK-11035 (see the sketch below)
** I didn't add the integration with {{InProcessLauncher}} to this patch, mainly 
because I didn't want the diff to get too big; I plan to add integration with 
{{InProcessLauncher}} in another JIRA
** The {{InProcessLauncher}} avoids running {{bin/spark-submit}}; it calls 
{{SparkSubmit#main}} directly, which decreases the amount of time it takes to 
start a HoS session, since a separate process doesn't need to be launched to 
start the Spark app
** It also makes HoS easier to debug because everything runs in a single 
process; we don't have to rely on redirecting stdout / stderr output streams, 
etc.
* The API is much cleaner than building up command line arguments for 
{{bin/spark-submit}}
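
For reference, a minimal sketch of what the in-process launch could look like 
(a sketch only, based on the Spark launcher API added by SPARK-11035; the main 
class and conf shown are placeholders, not this patch's actual wiring):
{code}
import org.apache.spark.launcher.InProcessLauncher;
import org.apache.spark.launcher.SparkAppHandle;

SparkAppHandle handle = new InProcessLauncher()
    .setMaster("yarn")
    .setMainClass("org.apache.hive.spark.client.RemoteDriver") // placeholder
    .setConf("spark.executor.memory", "4g")                    // placeholder
    .startApplication(new SparkAppHandle.Listener() {
      @Override
      public void stateChanged(SparkAppHandle h) {
        // job state comes through the handle instead of parsing child stdout
        System.out.println("Spark app state: " + h.getState());
      }

      @Override
      public void infoChanged(SparkAppHandle h) { }
    });
{code}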

Some other thoughts:

{quote} Moreover, security-related stuff will need more testing at least. 
{quote} I'm not that familiar with the security aspects of HoS, but I can add 
some tests with {{MiniHiveKdc}} / doAs to check that things are still good. 

{quote} I'd feel nervous about a completely different code path for something 
so critical {quote} Valid point, but the code path isn't that different; at the 
end of the day everything is going through {{SparkSubmit.scala}}.

{quote} we can make a switch in later releases {quote} I don't think we have 
plans to release Hive 3.0.0 anytime soon, so we can fix any issues with 
{{SparkLauncher}} before the release.

Let me know your thoughts.

> Investigate SparkLauncher for HoS as alternative to bin/spark-submit
> 
>
> Key: HIVE-16484
> URL: https://issues.apache.org/jira/browse/HIVE-16484
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-16484.1.patch, HIVE-16484.10.patch, 
> HIVE-16484.2.patch, HIVE-16484.3.patch, HIVE-16484.4.patch, 
> HIVE-16484.5.patch, HIVE-16484.6.patch, HIVE-16484.7.patch, 
> HIVE-16484.8.patch, HIVE-16484.9.patch
>
>
> The {{SparkClientImpl#startDriver}} currently looks for the {{SPARK_HOME}} 
> directory and invokes the {{bin/spark-submit}} script, which spawns a 
> separate process to run the Spark application.
> {{SparkLauncher}} was added in SPARK-4924 and is a programatic way to launch 
> Spark applications.
> I see a few advantages:
> * No need to spawn a separate process to launch a HoS --> lower startup time
> * Simplifies the code in {{SparkClientImpl}} --> easier to debug
> * {{SparkLauncher#startApplication}} returns a {{SparkAppHandle}} which 
> contains some useful utilities for querying the state of the Spark job
> ** It also allows the launcher to specify a list of job listeners



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17580) Remove dependency of get_fields_with_environment_context API to serde

2018-01-08 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-17580:
---
Status: Patch Available  (was: In Progress)

> Remove dependency of get_fields_with_environment_context API to serde
> -
>
> Key: HIVE-17580
> URL: https://issues.apache.org/jira/browse/HIVE-17580
> Project: Hive
>  Issue Type: Sub-task
>  Components: Standalone Metastore
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>  Labels: pull-request-available
> Attachments: HIVE-17580-standalone-metastore.001.patch
>
>
> The {{get_fields_with_environment_context}} metastore API uses the 
> {{Deserializer}} class to access the fields metadata for the cases where it is 
> stored along with the data files (Avro tables). The problem is that the 
> Deserializer class is defined in the hive-serde module, and in order to make 
> the metastore independent of Hive we will have to remove this dependency (at 
> least we should change it to a runtime dependency instead of a compile-time one).
> The other option is to investigate whether we can use SearchArgument to 
> provide this functionality.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Updated] (HIVE-17580) Remove dependency of get_fields_with_environment_context API to serde

2018-01-08 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-17580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar updated HIVE-17580:
---
Attachment: HIVE-17580-standalone-metastore.001.patch

Attaching the first version of the patch to trigger tests.

> Remove dependency of get_fields_with_environment_context API to serde
> -
>
> Key: HIVE-17580
> URL: https://issues.apache.org/jira/browse/HIVE-17580
> Project: Hive
>  Issue Type: Sub-task
>  Components: Standalone Metastore
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>  Labels: pull-request-available
> Attachments: HIVE-17580-standalone-metastore.001.patch
>
>
> The {{get_fields_with_environment_context}} metastore API uses the 
> {{Deserializer}} class to access the fields metadata for the cases where it is 
> stored along with the data files (Avro tables). The problem is that the 
> Deserializer class is defined in the hive-serde module, and in order to make 
> the metastore independent of Hive we will have to remove this dependency (at 
> least we should change it to a runtime dependency instead of a compile-time one).
> The other option is to investigate whether we can use SearchArgument to 
> provide this functionality.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-16484) Investigate SparkLauncher for HoS as alternative to bin/spark-submit

2018-01-08 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317034#comment-16317034
 ] 

Xuefu Zhang commented on HIVE-16484:


I'd echo [~lirui] in wondering what benefits the proposal brings. While I 
only gave the patch a brief look, from the conversations I found that 
SparkLauncher doesn't really offer all the advantages that are listed in the 
description. Rather, it brings uncertainty and possible stability issues to 
Hive.

We have been using HoS with spark-submit in production. While it bears 
some imperfection (like launching a dummy process), it works for us. I'd feel 
nervous about a completely different code path for something so critical. 
Moreover, security-related stuff will need more testing at least.

Having said that, I'd suggest we keep the existing implementation of Spark job 
submission. If we want to test out SparkLauncher, I think we can use it to 
replace the other code path where the class {{org.apache.spark.deploy.SparkSubmit}} 
is directly invoked (if that makes sense at all).

When SparkLauncher becomes mature and capable of replacing {{bin/spark-submit}} 
with the promised benefits, we can make a switch in later releases, which 
hopefully brings no impact to Hive on Spark users.

> Investigate SparkLauncher for HoS as alternative to bin/spark-submit
> 
>
> Key: HIVE-16484
> URL: https://issues.apache.org/jira/browse/HIVE-16484
> Project: Hive
>  Issue Type: Bug
>  Components: Spark
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-16484.1.patch, HIVE-16484.10.patch, 
> HIVE-16484.2.patch, HIVE-16484.3.patch, HIVE-16484.4.patch, 
> HIVE-16484.5.patch, HIVE-16484.6.patch, HIVE-16484.7.patch, 
> HIVE-16484.8.patch, HIVE-16484.9.patch
>
>
> The {{SparkClientImpl#startDriver}} currently looks for the {{SPARK_HOME}} 
> directory and invokes the {{bin/spark-submit}} script, which spawns a 
> separate process to run the Spark application.
> {{SparkLauncher}} was added in SPARK-4924 and is a programatic way to launch 
> Spark applications.
> I see a few advantages:
> * No need to spawn a separate process to launch a HoS --> lower startup time
> * Simplifies the code in {{SparkClientImpl}} --> easier to debug
> * {{SparkLauncher#startApplication}} returns a {{SparkAppHandle}} which 
> contains some useful utilities for querying the state of the Spark job
> ** It also allows the launcher to specify a list of job listeners



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HIVE-18393) Error returned when some other type is read as string from parquet tables

2018-01-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-18393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317028#comment-16317028
 ] 

Hive QA commented on HIVE-18393:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12905136/HIVE-18393.1.patch

{color:red}ERROR:{color} -1 due to build exiting with an error

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8505/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8505/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8505/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Tests exited with: NonZeroExitCodeException
Command 'bash /data/hiveptest/working/scratch/source-prep.sh' failed with exit 
status 1 and output '+ date '+%Y-%m-%d %T.%3N'
2018-01-08 20:39:56.649
+ [[ -n /usr/lib/jvm/java-8-openjdk-amd64 ]]
+ export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
+ export 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ 
PATH=/usr/lib/jvm/java-8-openjdk-amd64/bin/:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m '
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m '
+ export 'MAVEN_OPTS=-Xmx1g '
+ MAVEN_OPTS='-Xmx1g '
+ cd /data/hiveptest/working/
+ tee /data/hiveptest/logs/PreCommit-HIVE-Build-8505/source-prep.txt
+ [[ false == \t\r\u\e ]]
+ mkdir -p maven ivy
+ [[ git = \s\v\n ]]
+ [[ git = \g\i\t ]]
+ [[ -z master ]]
+ [[ -d apache-github-source-source ]]
+ [[ ! -d apache-github-source-source/.git ]]
+ [[ ! -d apache-github-source-source ]]
+ date '+%Y-%m-%d %T.%3N'
2018-01-08 20:39:56.652
+ cd apache-github-source-source
+ git fetch origin
From https://github.com/apache/hive
   a6b88d9..148807a  master -> origin/master
Auto packing the repository in background for optimum performance.
See "git help gc" for manual housekeeping.
+ git reset --hard HEAD
HEAD is now at a6b88d9 HIVE-18096 : add a user-friendly show plan command 
(Harish Jaiprakash, reviewed by Sergey Shelukhin)
+ git clean -f -d
+ git checkout master
Already on 'master'
Your branch is behind 'origin/master' by 3 commits, and can be fast-forwarded.
  (use "git pull" to update your local branch)
+ git reset --hard origin/master
HEAD is now at 148807a HIVE-16970: General Improvements To 
org.apache.hadoop.hive.metastore.cache.CacheUtils (BELUGA BEHR, reviewed by 
Ashutosh Chauhan)
+ git merge --ff-only origin/master
Already up-to-date.
+ date '+%Y-%m-%d %T.%3N'
2018-01-08 20:40:00.806
+ rm -rf ../yetus
+ mkdir ../yetus
+ cp -R . ../yetus
cp: cannot stat ‘./.git/gc.pid’: No such file or directory
+ exit 1
'
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12905136 - PreCommit-HIVE-Build

> Error returned when some other type is read as string from parquet tables
> -
>
> Key: HIVE-18393
> URL: https://issues.apache.org/jira/browse/HIVE-18393
> Project: Hive
>  Issue Type: Bug
>Reporter: Janaki Lahorani
>Assignee: Janaki Lahorani
> Fix For: 3.0.0
>
> Attachments: HIVE-18393.1.patch
>
>
> TimeStamp, Decimal, Double, Float, BigInt, Int, SmallInt, Tinyint and Boolean 
> columns, when read as String, Varchar or Char, should return the correct data.  
> Currently this results in an error for Parquet tables.
> Test Case:
> drop table if exists testAltCol;
> create table testAltCol
> (cId        TINYINT,
>  cTimeStamp TIMESTAMP,
>  cDecimal   DECIMAL(38,18),
>  cDouble    DOUBLE,
>  cFloat     FLOAT,
>  cBigInt    BIGINT,
>  cInt       INT,
>  cSmallInt  SMALLINT,
>  cTinyint   TINYINT,
>  cBoolean   BOOLEAN);
> insert into testAltCol values
> (1,
>  '2017-11-07 09:02:49.9',
>  12345678901234567890.123456789012345678,
>  1.79e308,
>  3.4e38,
>  1234567890123456789,
>  1234567890,
>  12345,
>  123,
>  TRUE);
> insert into testAltCol values
> (2,
>  '1400-01-01 01:01:01.1',
>  1.1,
>  2.2,
>  3.3,
>  1,
>  2,
>  3,
>  4,
>  FALSE);
> insert into testAltCol values
> (3,
>  '1400-01-01 01:01:01.1',
>  10.1,
>  20.2,
>  30.3,
>  1234567890123456789,
>  1234567890,
>  12345,
>  123,
>  TRUE);
> select cId, cTimeStamp from testAltCol order by cId;
> select cId, cDecimal, cDouble, cFloat from testAltCol order by cId;
> select cId, cBigInt, cInt, cSmallInt, cTinyint from testAltCol order by cId;
> select cId, cBoolean from testAltCol order by cId;
> drop table if exists testAltColP;
> create table testAltColP stored as parquet as select * from testAltCol;
> select cId, cTimeStamp from testAltColP order by cId;
> select cId, cDecimal, cDouble, cFloat from testAltColP order by cId;
> select cId, cBigInt, cInt, cSmallInt, cTinyint from testAltColP order by cId;

[jira] [Commented] (HIVE-17396) Support DPP with map joins where the source and target belong in the same stage

2018-01-08 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-17396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16317014#comment-16317014
 ] 

Hive QA commented on HIVE-17396:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12905135/HIVE-17396.5.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 18 failed/errored test(s), 11549 tests 
executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[auto_join25] (batchId=72)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[ppd_join5] (batchId=35)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[llap_smb] 
(batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[bucketsortoptimize_insert_2]
 (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[hybridgrace_hashjoin_2]
 (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[insert_values_orig_table_use_metadata]
 (batchId=164)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid] 
(batchId=168)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[llap_acid_fast]
 (batchId=159)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[sysdb] 
(batchId=159)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[authorization_part]
 (batchId=93)
org.apache.hadoop.hive.cli.TestSparkCliDriver.testCliDriver[ppd_join5] 
(batchId=120)
org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore.testTransactionalValidation
 (batchId=213)
org.apache.hadoop.hive.ql.io.TestDruidRecordWriter.testWrite (batchId=253)
org.apache.hadoop.hive.ql.parse.TestReplicationScenarios.testConstraints 
(batchId=225)
org.apache.hive.jdbc.TestSSL.testConnectionMismatch (batchId=231)
org.apache.hive.jdbc.TestSSL.testConnectionWrongCertCN (batchId=231)
org.apache.hive.jdbc.TestSSL.testMetastoreConnectionWrongCertCN (batchId=231)
org.apache.hive.jdbc.TestTriggersMoveWorkloadManager.testTriggerMoveConflictKill
 (batchId=235)
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/8504/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/8504/console
Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-8504/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.YetusPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 18 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12905135 - PreCommit-HIVE-Build

> Support DPP with map joins where the source and target belong in the same 
> stage
> ---
>
> Key: HIVE-17396
> URL: https://issues.apache.org/jira/browse/HIVE-17396
> Project: Hive
>  Issue Type: Sub-task
>  Components: Spark
>Reporter: Janaki Lahorani
>Assignee: Janaki Lahorani
> Attachments: HIVE-17396.1.patch, HIVE-17396.1.patch, 
> HIVE-17396.1.patch, HIVE-17396.2.patch, HIVE-17396.3.patch, 
> HIVE-17396.4.patch, HIVE-17396.4.patch, HIVE-17396.5.patch, HIVE-17396.6.patch
>
>
> When the target of a partition pruning sink operator is not the same as 
> the target of the hash table sink operator, both source and target get scheduled 
> within the same Spark job, and that can result in a File Not Found Exception.  
> HIVE-17225 has a fix to disable DPP in that scenario.  This JIRA is to 
> support DPP for such cases.
> Test Case:
> SET hive.spark.dynamic.partition.pruning=true;
> SET hive.auto.convert.join=true;
> SET hive.strict.checks.cartesian.product=false;
> CREATE TABLE part_table1 (col int) PARTITIONED BY (part1_col int);
> CREATE TABLE part_table2 (col int) PARTITIONED BY (part2_col int);
> CREATE TABLE reg_table (col int);
> ALTER TABLE part_table1 ADD PARTITION (part1_col = 1);
> ALTER TABLE part_table2 ADD PARTITION (part2_col = 1);
> ALTER TABLE part_table2 ADD PARTITION (part2_col = 2);
> INSERT INTO TABLE part_table1 PARTITION (part1_col = 1) VALUES (1);
> INSERT INTO TABLE part_table2 PARTITION (part2_col = 1) VALUES (1);
> INSERT INTO TABLE part_table2 PARTITION (part2_col = 2) VALUES (2);
> INSERT INTO table reg_table VALUES (1), (2), (3), (4), (5), (6);
> EXPLAIN SELECT *
> FROM   part_table1 pt1,
>part_table2 pt2,
>reg_table rt
> WHERE  rt.col = pt1.part1_col
> ANDpt2.part2_col = pt1.part1_col;
> Plan:
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
> Spark
>  A 
