[jira] [Updated] (HIVE-13934) Configure Tez to make noconditional task size memory available for the Processor

2016-07-19 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13934:
-
Affects Version/s: 2.1.0

> Configure Tez to make noconditional task size memory available for the 
> Processor
> ---
>
> Key: HIVE-13934
> URL: https://issues.apache.org/jira/browse/HIVE-13934
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.1.0
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-13934.1.patch, HIVE-13934.10.patch, 
> HIVE-13934.11.patch, HIVE-13934.2.patch, HIVE-13934.3.patch, 
> HIVE-13934.4.patch, HIVE-13934.6.patch, HIVE-13934.7.patch, 
> HIVE-13934.8.patch, HIVE-13934.9.patch
>
>
> Currently, noconditionaltasksize is not validated against the container size 
> or against the reservations made in the container by Tez for Inputs / Outputs, etc.
> Check this at compile time to see if enough memory is available, or set up 
> the vertex to reserve additional memory for the Processor.
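The compile-time check described above can be sketched as follows. This is an illustrative sketch only; the class and method names are hypothetical, not Hive's or Tez's actual API:

```java
// Hypothetical sketch: validate the noconditionaltask size against the memory
// that is actually left for the Processor once Tez's Input/Output reservations
// are subtracted from the container size.
class NoConditionalTaskSizeCheck {

    /** Memory (bytes) available to the Processor after Tez I/O reservations. */
    static long availableProcessorMemory(long containerBytes, long tezIoReservedBytes) {
        return containerBytes - tezIoReservedBytes;
    }

    /** True if the configured noconditionaltask size fits in what remains. */
    static boolean fits(long noConditionalTaskSizeBytes,
                        long containerBytes, long tezIoReservedBytes) {
        return noConditionalTaskSizeBytes
                <= availableProcessorMemory(containerBytes, tezIoReservedBytes);
    }
}
```

If the check fails at compile time, the alternative path described in the issue is to configure the vertex to reserve additional memory for the Processor.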



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13934) Configure Tez to make noconditional task size memory available for the Processor

2016-07-19 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13934:
-
Component/s: Hive






[jira] [Updated] (HIVE-13934) Configure Tez to make noconditional task size memory available for the Processor

2016-07-19 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13934:
-
Status: Patch Available  (was: Open)






[jira] [Updated] (HIVE-13934) Configure Tez to make noconditional task size memory available for the Processor

2016-07-19 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13934:
-
Attachment: HIVE-13934.11.patch

Thanks for the suggestion. Patch 11 changes the default to -1f and uses >0 for 
the comparison. The previous patch didn't trigger a QA run; hopefully patch 11 
will.
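The -1f default / >0 comparison described above can be sketched like this. The names are illustrative, not Hive's actual configuration code:

```java
// Hypothetical sketch: a default of -1f marks the fraction as "unset", and the
// configured value is honored only when it is positive; anything <= 0 falls
// back to the computed default.
class ReserveFraction {
    static final float UNSET = -1f; // new default, never matches "> 0"

    /** Use the configured fraction only if it is > 0; otherwise fall back. */
    static float resolve(float configured, float fallback) {
        return configured > 0 ? configured : fallback;
    }
}
```

Using a sentinel that can never satisfy the `> 0` test avoids treating an unset value as a deliberate zero.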






[jira] [Commented] (HIVE-14204) Optimize loading dynamic partitions

2016-07-19 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385363#comment-15385363
 ] 

Rajesh Balamohan commented on HIVE-14204:
-

I will check and repost the patch.

> Optimize loading dynamic partitions 
> 
>
> Key: HIVE-14204
> URL: https://issues.apache.org/jira/browse/HIVE-14204
> Project: Hive
>  Issue Type: Improvement
>Reporter: Rajesh Balamohan
>Assignee: Rajesh Balamohan
>Priority: Minor
> Attachments: HIVE-14204.1.patch
>
>
> A lot of time is spent loading a dynamically partitioned dataset sequentially 
> on the driver side. E.g., a simple dynamic partition load like the following 
> takes 300+ seconds:
> {noformat}
> INSERT INTO web_sales_test partition(ws_sold_date_sk) select * from 
> tpcds_bin_partitioned_orc_200.web_sales;
> Time taken to load dynamic partitions: 309.22 seconds
> {noformat}
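The direction of the optimization — submitting per-partition loads to a thread pool instead of looping sequentially on the driver — can be sketched as below. This is a hypothetical illustration; `loadPartition` is a stand-in for the real metastore and filesystem work, not Hive's actual method:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: fan out per-partition loads onto a fixed-size pool and
// wait for all of them, so driver-side load time scales with the slowest
// partition rather than the sum of all partitions.
class ParallelPartitionLoader {
    static final AtomicInteger loaded = new AtomicInteger();

    static void loadAll(List<String> partitions, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<?>> futures = new ArrayList<>();
        for (String p : partitions) {
            futures.add(pool.submit(() -> loadPartition(p)));
        }
        for (Future<?> f : futures) {
            f.get(); // propagate any per-partition failure to the caller
        }
        pool.shutdown();
    }

    static void loadPartition(String p) {
        loaded.incrementAndGet(); // placeholder for metastore + filesystem work
    }
}
```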





[jira] [Commented] (HIVE-14292) ACID table creation fails on mysql with MySQLIntegrityConstraintViolationException

2016-07-19 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385357#comment-15385357
 ] 

Eugene Koifman commented on HIVE-14292:
---

If you look at the logic where isDuplicateKey() is used, the idea is to lock a 
record with a specific value; if it's not there, insert it and then lock it. 
Since you cannot be certain that two entities are not going through this same 
process concurrently, the logic catches the "duplicate key" error and proceeds, 
but raises any other error.
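The pattern described above can be sketched as follows. This is a simplified illustration, not TxnHandler's actual code; it keys off the MySQL duplicate-entry signature (SQLState 23000, error code 1062) seen in the stack trace below:

```java
import java.sql.SQLException;

// Simplified sketch: losing the race to insert the mutex row surfaces as a
// duplicate-key SQLException and is benign (the row now exists, so we can
// proceed to lock it); any other SQLException is a real failure and rethrown.
class DuplicateKeyTolerance {

    /** MySQL duplicate-entry: SQLState 23000, vendor error code 1062. */
    static boolean isDuplicateKey(SQLException e) {
        return "23000".equals(e.getSQLState()) && e.getErrorCode() == 1062;
    }

    /** Swallow a lost insert race; rethrow everything else. */
    static void handleInsertFailure(SQLException e) throws SQLException {
        if (!isDuplicateKey(e)) {
            throw e;
        }
        // duplicate key: a concurrent caller inserted the row first; proceed
    }
}
```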

> ACID table creation fails on mysql with 
> MySQLIntegrityConstraintViolationException
> --
>
> Key: HIVE-14292
> URL: https://issues.apache.org/jira/browse/HIVE-14292
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0, 2.1.0
> Environment: MySQL
>Reporter: Deepesh Khandelwal
>Assignee: Eugene Koifman
> Attachments: HIVE-14292.patch
>
>
> While creating an ACID table, ran into the following error:
> {noformat}
> >>>  create table acidcount1 (id int) 
> clustered by (id) into 2 buckets 
> stored as orc 
> tblproperties('transactional'='true');
> INFO  : Compiling 
> command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15): 
> create table acidcount1 (id int) 
> clustered by (id) into 2 buckets 
> stored as orc 
> tblproperties('transactional'='true')
> INFO  : Semantic Analysis Completed
> INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15); 
> Time taken: 0.111 seconds
> Error: Error running query: java.lang.RuntimeException: Unable to lock 
> 'CheckLock' due to: Duplicate entry 'CheckLock-0' for key 'PRIMARY' 
> (SQLState=23000, ErrorCode=1062) (state=,code=0)
> Aborting command set because "force" is false and command failed: "create 
> table acidcount1 (id int) 
> clustered by (id) into 2 buckets 
> stored as orc 
> tblproperties('transactional'='true');"
> {noformat}
> Saw the following detailed stack in the server log:
> {noformat}
> 2016-07-19T10:59:46,213 ERROR [HiveServer2-Background-Pool: Thread-463]: 
> metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(196)) - 
> java.lang.RuntimeException: Unable to lock 'CheckLock' due to: Duplicate 
> entry 'CheckLock-0' for key 'PRIMARY' (SQLState=23000, ErrorCode=1062)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.acquireLock(TxnHandler.java:3235)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLock(TxnHandler.java:2309)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLockWithRetry(TxnHandler.java:1012)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:784)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:5941)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at com.sun.proxy.$Proxy26.lock(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:2109)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
> at com.sun.proxy.$Proxy28.lock(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2259)
> at com.sun.proxy.$Proxy28.lock(Unknown Source)
> at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$SynchronizedMetaStoreClient.lock(DbTxnManager.java:740)
> at 
> 

[jira] [Commented] (HIVE-14292) ACID table creation fails on mysql with MySQLIntegrityConstraintViolationException

2016-07-19 Thread Wei Zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385354#comment-15385354
 ] 

Wei Zheng commented on HIVE-14292:
--

I see we're relaxing the !isDuplicateKeyError logic. I just want to understand 
why we don't throw an exception for duplicate key errors. There are a dozen 
errors with "duplicate something" in the link above.


[jira] [Updated] (HIVE-14205) Hive doesn't support union type with AVRO file format

2016-07-19 Thread Yibing Shi (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yibing Shi updated HIVE-14205:
--
Attachment: HIVE-14205.5.patch

Attached a new patch that includes the latest changes from the master branch. If 
this still doesn't work, I will remove the binary files and use insert 
statements instead, as [~ctang.ma] suggested.

> Hive doesn't support union type with AVRO file format
> -
>
> Key: HIVE-14205
> URL: https://issues.apache.org/jira/browse/HIVE-14205
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Yibing Shi
>Assignee: Yibing Shi
> Attachments: HIVE-14205.1.patch, HIVE-14205.2.patch, 
> HIVE-14205.3.patch, HIVE-14205.4.patch, HIVE-14205.5.patch
>
>
> Reproduce steps:
> {noformat}
> hive> CREATE TABLE avro_union_test
> > PARTITIONED BY (p int)
> > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> > STORED AS INPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> > OUTPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> > TBLPROPERTIES ('avro.schema.literal'='{
> >"type":"record",
> >"name":"nullUnionTest",
> >"fields":[
> >   {
> >  "name":"value",
> >  "type":[
> > "null",
> > "int",
> > "long"
> >  ],
> >  "default":null
> >   }
> >]
> > }');
> OK
> Time taken: 0.105 seconds
> hive> alter table avro_union_test add partition (p=1);
> OK
> Time taken: 0.093 seconds
> hive> select * from avro_union_test;
> FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: 
> Failed with exception Hive internal error inside 
> isAssignableFromSettablePrimitiveOI void not supported 
> yet.java.lang.RuntimeException: Hive internal error inside 
> isAssignableFromSettablePrimitiveOI void not supported yet.
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettablePrimitiveOI(ObjectInspectorUtils.java:1140)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettableOI(ObjectInspectorUtils.java:1149)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1187)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1220)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1200)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConvertedOI(ObjectInspectorConverters.java:219)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:581)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.initialize(FetchOperator.java:172)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.(FetchOperator.java:140)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:79)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:482)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:311)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1194)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1289)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1120)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1108)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:218)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:170)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:381)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:773)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:691)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {noformat}
> Another test case to show this problem is:
> {noformat}
> hive> create table avro_union_test2 (value uniontype<int,long>) stored as 
> avro;
> OK
> Time taken: 0.053 seconds
> hive> show create table avro_union_test2;
> OK
> CREATE TABLE `avro_union_test2`(
>   `value` uniontype<int,long> COMMENT '')

[jira] [Comment Edited] (HIVE-14293) PerfLogger.openScopes should be transient

2016-07-19 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385229#comment-15385229
 ] 

Gopal V edited comment on HIVE-14293 at 7/20/16 3:02 AM:
-

[~daijy]: yes, the transient would be needed in the same way - to ensure that 
the PerfLogger is initialized correctly on the operator side instead of being a 
copy of state left over from the planner.


was (Author: gopalv):
[~daijy]: yes, the transient would be needed in the same way - to ensure that 
the PerfLogger is initialized correctly in the operator side instead of being a 
copy from the planner.

> PerfLogger.openScopes should be transient
> -
>
> Key: HIVE-14293
> URL: https://issues.apache.org/jira/browse/HIVE-14293
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-14293.1.patch
>
>
> See the following exception when running Hive e2e tests:
> {code}
> 0: jdbc:hive2://nat-r6-ojss-hsihs2-1.openstac> SELECT s.name, s2.age, s.gpa, 
> v.registration, v2.contributions FROM student s INNER JOIN voter v ON (s.name 
> = v.name) INNER JOIN student s2 ON (s2.age = v.age and s.name = s2.name) 
> INNER JOIN voter v2 ON (v2.name = s2.name and v2.age = s2.age) WHERE v2.age = 
> s.age ORDER BY s.name, s2.age, s.gpa, v.registration, v2.contributions;
> INFO  : Compiling 
> command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8): 
> SELECT s.name, s2.age, s.gpa, v.registration, v2.contributions FROM student s 
> INNER JOIN voter v ON (s.name = v.name) INNER JOIN student s2 ON (s2.age = 
> v.age and s.name = s2.name) INNER JOIN voter v2 ON (v2.name = s2.name and 
> v2.age = s2.age) WHERE v2.age = s.age ORDER BY s.name, s2.age, s.gpa, 
> v.registration, v2.contributions
> INFO  : Semantic Analysis Completed
> INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:s.name, 
> type:string, comment:null), FieldSchema(name:s2.age, type:int, comment:null), 
> FieldSchema(name:s.gpa, type:double, comment:null), 
> FieldSchema(name:v.registration, type:string, comment:null), 
> FieldSchema(name:v2.contributions, type:float, comment:null)], 
> properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8); 
> Time taken: 1.165 seconds
> INFO  : Executing 
> command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8): 
> SELECT s.name, s2.age, s.gpa, v.registration, v2.contributions FROM student s 
> INNER JOIN voter v ON (s.name = v.name) INNER JOIN student s2 ON (s2.age = 
> v.age and s.name = s2.name) INNER JOIN voter v2 ON (v2.name = s2.name and 
> v2.age = s2.age) WHERE v2.age = s.age ORDER BY s.name, s2.age, s.gpa, 
> v.registration, v2.contributions
> INFO  : Query ID = hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8
> INFO  : Total jobs = 1
> INFO  : Launching Job 1 out of 1
> INFO  : Starting task [Stage-1:MAPRED] in serial mode
> INFO  : Session is already open
> INFO  : Dag name: SELECT s.name, s2.age, sv2.contributions(Stage-1)
> ERROR : Failed to execute tez graph.
> java.lang.RuntimeException: Error caching map.xml: 
> org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.util.ConcurrentModificationException
> Serialization trace:
> classes (sun.misc.Launcher$AppClassLoader)
> classloader (java.security.ProtectionDomain)
> context (java.security.AccessControlContext)
> acc (org.apache.hadoop.hive.ql.exec.UDFClassLoader)
> classLoader (org.apache.hadoop.hive.conf.HiveConf)
> conf (org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics)
> metrics 
> (org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics$CodahaleMetricsScope)
> openScopes (org.apache.hadoop.hive.ql.log.PerfLogger)
> perfLogger (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.FilterOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:582) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:516) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.DagUtils.createVertex(DagUtils.java:601) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.DagUtils.createVertex(DagUtils.java:1147) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>  

[jira] [Commented] (HIVE-14293) PerfLogger.openScopes should be transient

2016-07-19 Thread Gopal V (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385229#comment-15385229
 ] 

Gopal V commented on HIVE-14293:


[~daijy]: yes, the transient would be needed in the same way - to ensure that 
the PerfLogger is initialized correctly in the operator side instead of being a 
copy from the planner.


[jira] [Commented] (HIVE-14293) PerfLogger.openScopes should be transient

2016-07-19 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385211#comment-15385211
 ] 

Daniel Dai commented on HIVE-14293:
---

We do want to use PerfLogger in the backend in map join to print out 
performance messages. It's possible to switch to a different mechanism for that 
purpose, but that's worth a separate ticket.
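The effect of marking the field transient can be shown with a minimal round-trip. This uses plain Java serialization rather than Kryo, and stand-in classes rather than the real Hive operators, purely as an illustration of the mechanism:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrative stand-in for an operator carrying a logger reference: with the
// field transient, planner-side logger state is dropped from the serialized
// plan, so the executor side re-initializes it instead of inheriting (and
// dragging along) a snapshot of planner state.
class OperatorLike implements Serializable {
    transient StringBuilder perfLogger = new StringBuilder("planner-state");
    String name = "MAPJOIN";

    static OperatorLike roundTrip(OperatorLike op) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(op); // transient field is skipped here
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (OperatorLike) in.readObject();
        }
    }
}
```

After deserialization the transient field is null and must be re-created on the executor, which is exactly the behavior the patch relies on.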

>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:516) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.DagUtils.createVertex(DagUtils.java:601) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.DagUtils.createVertex(DagUtils.java:1147) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.build(TezTask.java:390) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:164) 
> 

[jira] [Commented] (HIVE-14282) HCatLoader ToDate() exception with hive partition table ,partitioned by column of DATE datatype

2016-07-19 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385208#comment-15385208
 ] 

Daniel Dai commented on HIVE-14282:
---

Forgot to mention here: I also bumped the Pig version, since partition pushdown 
of a constant UDF condition only works with Pig 0.15+. We need to watch whether 
it breaks any unit tests.

> HCatLoader ToDate() exception with hive partition table ,partitioned by 
> column of DATE datatype
> ---
>
> Key: HIVE-14282
> URL: https://issues.apache.org/jira/browse/HIVE-14282
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1
> Environment: PIG Version : (0.15.0) 
> HIVE : 1.2.1
> OS Version : CentOS release 6.7 (Final)
> OS Kernel : 2.6.32-573.18.1.el6.x86_64
>Reporter: Raghavender Rao Guruvannagari
>Assignee: Daniel Dai
> Fix For: 1.3.0, 2.2.0, 2.1.1
>
> Attachments: HIVE-14282.1.patch
>
>
> The ToDate() function doesn't work with a partitioned table, partitioned by a 
> column of DATE data type.
> Below are the steps I followed to recreate the problem.
> -->Sample input file to hive table :
> hdfs@testhost ~$ cat test.log 
> 2012-06-13,16:11:17,574,140.134.127.109,SearchPage,Google.com,Win8,5,HTC
> 2012-06-13,16:11:17,466,43.176.108.158,Electronics,Google.com,Win8,3,iPhone
> 2012-06-13,16:11:17,501,97.73.102.79,Appliances,Google.com,Android,4,iPhone
> 2012-06-13,16:11:17,469,166.98.157.122,Recommendations,Google.com,Win8,5,HTC
> 2012-06-13,16:11:17,557,36.159.147.50,Sporting,Google.com,Win8,3,Samsung
> 2012-06-13,16:11:17,449,128.215.122.234,ShoppingCart,Google.com,Win8,5,HTC
> 2012-06-13,16:11:17,502,46.81.131.92,Electronics,Google.com,Android,5,Samsung
> 2012-06-13,16:11:17,554,120.187.105.127,Automotive,Google.com,Win8,5,HTC
> 2012-06-13,16:11:17,447,127.94.64.59,DetailPage,Google.com,Win8,3,Samsung
> 2012-06-13,16:11:17,490,132.54.25.75,ShoppingCart,Google.com,Win8,3,iPhone
> 2012-06-13,16:11:17,578,79.201.53.179,Automotive,Google.com,Win8,5,Samsung
> 2012-06-13,16:11:17,435,158.106.164.38,HomePage,Google.com,Web,5,Chrome
> 2012-06-13,16:11:17,523,17.131.82.171,Recommendations,Google.com,Web,3,IE9
> 2012-06-13,16:11:17,575,178.95.126.105,Appliances,Google.com,iOS,3,iPhone
> 2012-06-13,16:11:17,468,225.143.39.176,SearchPage,Google.com,iOS,5,HTC
> 2012-06-13,16:11:17,511,43.103.102.147,ShoppingCart,Google.com,iOS,5,Samsung
> --> Copied to hdfs directory:
> hdfs@testhost ~$ hdfs dfs -put -f test.log /user/hdfs/
> -->Create partitioned table (partitioned by a DATE data type column) in Hive:
> 0: jdbc:hive2://hdp2.raghav.com:1/default> create table mytable(Dt 
> DATE,Time STRING,Number INT,IPAddr STRING,Type STRING,Site STRING,OSType 
> STRING,Visit INT,PhModel STRING) row format delimited fields terminated by 
> ',' stored as textfile;
> 0: jdbc:hive2://testhost.com:1/default> load data inpath 
> '/user/hdfs/test.log' overwrite into table mytable;
> 0: jdbc:hive2://testhost..com:1/default> SET hive.exec.dynamic.partition 
> = true;
> 0: jdbc:hive2://testhost.com:1/default> SET 
> hive.exec.dynamic.partition.mode = nonstrict;
> 0: jdbc:hive2://testhost.com:1/default> create table partmytable(Number 
> INT,IPAddr STRING,Type STRING,Site STRING,OSType STRING,Visit INT,PhModel 
> STRING) partitioned by (Dt DATE,Time STRING) row format delimited fields 
> terminated by ',' stored as textfile;
> 0: jdbc:hive2://testhost.com:1/default> insert overwrite table 
> partmytable partition(Dt,Time) select 
> Number,IPAddr,Type,Site,OSType,Visit,PhModel,Dt,Time from mytable;
> 0: jdbc:hive2://hdp2.raghav.com:1/default> describe partmytable;
> --> Try to filter with the ToDate function, which fails with an error:
> hdfs@testhost ~$ pig -useHCatalog
> grunt>
> grunt> temp = LOAD 'partmytable' using 
> org.apache.hive.hcatalog.pig.HCatLoader();
> grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','yyyy-MM-dd');
> grunt> dump temp1;
> --> Filtering the normal table with the same statement works:
> grunt>
> grunt> temp = LOAD 'mytable' using org.apache.hive.hcatalog.pig.HCatLoader();
> grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','yyyy-MM-dd');
> grunt> dump temp1;
> Workaround:
> Use the statement below instead of a direct ToDate():
> grunt>temp1 = FILTER temp5 by DaysBetween(dt,(datetime)ToDate('2012-06-13', 
> 'yyyy-MM-dd')) >=(long)0 AND DaysBetween(dt,(datetime)ToDate('2012-06-13', 
> 'yyyy-MM-dd')) <=(long)0;



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13934) Configure Tez to make nocondiional task size memory available for the Processor

2016-07-19 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385198#comment-15385198
 ] 

Gunther Hagleitner commented on HIVE-13934:
---

+1, although I'm hoping you could change the float comparison on commit ( != 
0.0f). Either make the default -1f and check "> 0", or, if possible, 
make the default "null" and check for that. 
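The review suggestion above can be sketched as follows. The idea is to avoid fragile float equality against 0.0f by using a sentinel default of -1f and treating only strictly positive values as explicitly configured. The class and method names are illustrative, not Hive's actual configuration code.

```java
// Sketch of the "-1f default, check > 0" pattern suggested in the review.
class MemoryConfigSketch {
    // Sentinel meaning "not explicitly configured".
    static final float UNSET = -1f;

    // Returns the configured value when it was explicitly set,
    // otherwise the computed fallback.
    static float effectiveFraction(float configured, float fallback) {
        // "> 0" avoids float-equality comparisons and also rejects
        // negatives and NaN (NaN > 0f is false).
        return configured > 0f ? configured : fallback;
    }
}
```

Compared to `configured != 0.0f`, this check cannot be defeated by a value that is numerically close to but not exactly zero, and the sentinel never collides with a legitimate configured fraction.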

> Configure Tez to make nocondiional task size memory available for the 
> Processor
> ---
>
> Key: HIVE-13934
> URL: https://issues.apache.org/jira/browse/HIVE-13934
> Project: Hive
>  Issue Type: Bug
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-13934.1.patch, HIVE-13934.10.patch, 
> HIVE-13934.2.patch, HIVE-13934.3.patch, HIVE-13934.4.patch, 
> HIVE-13934.6.patch, HIVE-13934.7.patch, HIVE-13934.8.patch, HIVE-13934.9.patch
>
>
> Currently, noconditionaltasksize is not validated against the container size, 
> the reservations made in the container by Tez for Inputs / Outputs etc.
> Check this at compile time to see if enough memory is available, or set up 
> the vertex to reserve additional memory for the Processor.





[jira] [Commented] (HIVE-14282) HCatLoader ToDate() exception with hive partition table ,partitioned by column of DATE datatype

2016-07-19 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385191#comment-15385191
 ] 

Gunther Hagleitner commented on HIVE-14282:
---

FWIW, this seems to silently update the minimum Pig version for HCatalog from 0.12 to 0.16...

> HCatLoader ToDate() exception with hive partition table ,partitioned by 
> column of DATE datatype
> ---
>
> Key: HIVE-14282
> URL: https://issues.apache.org/jira/browse/HIVE-14282
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1
> Environment: PIG Version : (0.15.0) 
> HIVE : 1.2.1
> OS Version : CentOS release 6.7 (Final)
> OS Kernel : 2.6.32-573.18.1.el6.x86_64
>Reporter: Raghavender Rao Guruvannagari
>Assignee: Daniel Dai
> Fix For: 1.3.0, 2.2.0, 2.1.1
>
> Attachments: HIVE-14282.1.patch
>
>
> The ToDate() function doesn't work with a partitioned table, partitioned by a 
> column of DATE data type.
> Below are the steps I followed to recreate the problem.
> -->Sample input file to hive table :
> hdfs@testhost ~$ cat test.log 
> 2012-06-13,16:11:17,574,140.134.127.109,SearchPage,Google.com,Win8,5,HTC
> 2012-06-13,16:11:17,466,43.176.108.158,Electronics,Google.com,Win8,3,iPhone
> 2012-06-13,16:11:17,501,97.73.102.79,Appliances,Google.com,Android,4,iPhone
> 2012-06-13,16:11:17,469,166.98.157.122,Recommendations,Google.com,Win8,5,HTC
> 2012-06-13,16:11:17,557,36.159.147.50,Sporting,Google.com,Win8,3,Samsung
> 2012-06-13,16:11:17,449,128.215.122.234,ShoppingCart,Google.com,Win8,5,HTC
> 2012-06-13,16:11:17,502,46.81.131.92,Electronics,Google.com,Android,5,Samsung
> 2012-06-13,16:11:17,554,120.187.105.127,Automotive,Google.com,Win8,5,HTC
> 2012-06-13,16:11:17,447,127.94.64.59,DetailPage,Google.com,Win8,3,Samsung
> 2012-06-13,16:11:17,490,132.54.25.75,ShoppingCart,Google.com,Win8,3,iPhone
> 2012-06-13,16:11:17,578,79.201.53.179,Automotive,Google.com,Win8,5,Samsung
> 2012-06-13,16:11:17,435,158.106.164.38,HomePage,Google.com,Web,5,Chrome
> 2012-06-13,16:11:17,523,17.131.82.171,Recommendations,Google.com,Web,3,IE9
> 2012-06-13,16:11:17,575,178.95.126.105,Appliances,Google.com,iOS,3,iPhone
> 2012-06-13,16:11:17,468,225.143.39.176,SearchPage,Google.com,iOS,5,HTC
> 2012-06-13,16:11:17,511,43.103.102.147,ShoppingCart,Google.com,iOS,5,Samsung
> --> Copied to hdfs directory:
> hdfs@testhost ~$ hdfs dfs -put -f test.log /user/hdfs/
> -->Create partitioned table (partitioned by a DATE data type column) in Hive:
> 0: jdbc:hive2://hdp2.raghav.com:1/default> create table mytable(Dt 
> DATE,Time STRING,Number INT,IPAddr STRING,Type STRING,Site STRING,OSType 
> STRING,Visit INT,PhModel STRING) row format delimited fields terminated by 
> ',' stored as textfile;
> 0: jdbc:hive2://testhost.com:1/default> load data inpath 
> '/user/hdfs/test.log' overwrite into table mytable;
> 0: jdbc:hive2://testhost..com:1/default> SET hive.exec.dynamic.partition 
> = true;
> 0: jdbc:hive2://testhost.com:1/default> SET 
> hive.exec.dynamic.partition.mode = nonstrict;
> 0: jdbc:hive2://testhost.com:1/default> create table partmytable(Number 
> INT,IPAddr STRING,Type STRING,Site STRING,OSType STRING,Visit INT,PhModel 
> STRING) partitioned by (Dt DATE,Time STRING) row format delimited fields 
> terminated by ',' stored as textfile;
> 0: jdbc:hive2://testhost.com:1/default> insert overwrite table 
> partmytable partition(Dt,Time) select 
> Number,IPAddr,Type,Site,OSType,Visit,PhModel,Dt,Time from mytable;
> 0: jdbc:hive2://hdp2.raghav.com:1/default> describe partmytable;
> --> Try to filter with the ToDate function, which fails with an error:
> hdfs@testhost ~$ pig -useHCatalog
> grunt>
> grunt> temp = LOAD 'partmytable' using 
> org.apache.hive.hcatalog.pig.HCatLoader();
> grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','yyyy-MM-dd');
> grunt> dump temp1;
> --> Filtering the normal table with the same statement works:
> grunt>
> grunt> temp = LOAD 'mytable' using org.apache.hive.hcatalog.pig.HCatLoader();
> grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','yyyy-MM-dd');
> grunt> dump temp1;
> Workaround:
> Use the statement below instead of a direct ToDate():
> grunt>temp1 = FILTER temp5 by DaysBetween(dt,(datetime)ToDate('2012-06-13', 
> 'yyyy-MM-dd')) >=(long)0 AND DaysBetween(dt,(datetime)ToDate('2012-06-13', 
> 'yyyy-MM-dd')) <=(long)0;





[jira] [Updated] (HIVE-14292) ACID table creation fails on mysql with MySQLIntegrityConstraintViolationException

2016-07-19 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14292:
--
Target Version/s: 1.3.0, 2.2.0, 2.1.1

> ACID table creation fails on mysql with 
> MySQLIntegrityConstraintViolationException
> --
>
> Key: HIVE-14292
> URL: https://issues.apache.org/jira/browse/HIVE-14292
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0, 2.1.0
> Environment: MySQL
>Reporter: Deepesh Khandelwal
>Assignee: Eugene Koifman
> Attachments: HIVE-14292.patch
>
>
> While creating an ACID table, I ran into the following error:
> {noformat}
> >>>  create table acidcount1 (id int) 
> clustered by (id) into 2 buckets 
> stored as orc 
> tblproperties('transactional'='true');
> INFO  : Compiling 
> command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15): 
> create table acidcount1 (id int) 
> clustered by (id) into 2 buckets 
> stored as orc 
> tblproperties('transactional'='true')
> INFO  : Semantic Analysis Completed
> INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15); 
> Time taken: 0.111 seconds
> Error: Error running query: java.lang.RuntimeException: Unable to lock 
> 'CheckLock' due to: Duplicate entry 'CheckLock-0' for key 'PRIMARY' 
> (SQLState=23000, ErrorCode=1062) (state=,code=0)
> Aborting command set because "force" is false and command failed: "create 
> table acidcount1 (id int) 
> clustered by (id) into 2 buckets 
> stored as orc 
> tblproperties('transactional'='true');"
> {noformat}
> Saw the following detailed stack in the server log:
> {noformat}
> 2016-07-19T10:59:46,213 ERROR [HiveServer2-Background-Pool: Thread-463]: 
> metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(196)) - 
> java.lang.RuntimeException: Unable to lock 'CheckLock' due to: Duplicate 
> entry 'CheckLock-0' for key 'PRIMARY' (SQLState=23000, ErrorCode=1062)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.acquireLock(TxnHandler.java:3235)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLock(TxnHandler.java:2309)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLockWithRetry(TxnHandler.java:1012)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:784)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:5941)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at com.sun.proxy.$Proxy26.lock(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:2109)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
> at com.sun.proxy.$Proxy28.lock(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2259)
> at com.sun.proxy.$Proxy28.lock(Unknown Source)
> at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$SynchronizedMetaStoreClient.lock(DbTxnManager.java:740)
> at 
> org.apache.hadoop.hive.ql.lockmgr.DbLockManager.lock(DbLockManager.java:103)
> at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.java:341)
> at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocksWithHeartbeatDelay(DbTxnManager.java:357)
> at 
> 

[jira] [Commented] (HIVE-14292) ACID table creation fails on mysql with MySQLIntegrityConstraintViolationException

2016-07-19 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385189#comment-15385189
 ] 

Eugene Koifman commented on HIVE-14292:
---

The TxnHandler.isDuplicateKey() method is checking the wrong SQL error code:
https://dev.mysql.com/doc/refman/5.5/en/error-messages-server.html
MySQL has more than one error code with an identical message. The method 
currently checks 1022, whose message is "Can't write; duplicate key in table 
'%s'", but the code MySQL actually comes back with here is 1062, "Duplicate 
entry '%s' for key %d". There is also 1586, with "Duplicate entry '%s' for key 
'%s'". I don't see anything that explains which is produced when...
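The direction of the fix discussed above can be sketched as follows: recognize a MySQL duplicate-key failure by its SQLState class (23000, the value in the error above) together with the error codes MySQL actually returns, 1062 and 1586, rather than 1022. This is an illustration of the check, not Hive's actual TxnHandler.isDuplicateKey() implementation.

```java
import java.sql.SQLException;

// Sketch: classify a MySQL duplicate-key violation by SQLState plus the
// vendor error codes MySQL is documented to return for duplicate entries.
class DuplicateKeySketch {
    static boolean isDuplicateKey(SQLException e) {
        // 23000 is the integrity-constraint-violation SQLState class,
        // matching "SQLState=23000" in the log above.
        if (!"23000".equals(e.getSQLState())) {
            return false;
        }
        int code = e.getErrorCode();
        // 1062: "Duplicate entry '%s' for key %d"
        // 1586: "Duplicate entry '%s' for key '%s'"
        return code == 1062 || code == 1586;
    }
}
```

Checking both 1062 and 1586 hedges against the ambiguity noted in the comment, since MySQL's documentation does not make clear which of the two identical-message codes is produced in which situation.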

> ACID table creation fails on mysql with 
> MySQLIntegrityConstraintViolationException
> --
>
> Key: HIVE-14292
> URL: https://issues.apache.org/jira/browse/HIVE-14292
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0, 2.1.0
> Environment: MySQL
>Reporter: Deepesh Khandelwal
>Assignee: Eugene Koifman
> Attachments: HIVE-14292.patch
>
>
> While creating an ACID table, I ran into the following error:
> {noformat}
> >>>  create table acidcount1 (id int) 
> clustered by (id) into 2 buckets 
> stored as orc 
> tblproperties('transactional'='true');
> INFO  : Compiling 
> command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15): 
> create table acidcount1 (id int) 
> clustered by (id) into 2 buckets 
> stored as orc 
> tblproperties('transactional'='true')
> INFO  : Semantic Analysis Completed
> INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15); 
> Time taken: 0.111 seconds
> Error: Error running query: java.lang.RuntimeException: Unable to lock 
> 'CheckLock' due to: Duplicate entry 'CheckLock-0' for key 'PRIMARY' 
> (SQLState=23000, ErrorCode=1062) (state=,code=0)
> Aborting command set because "force" is false and command failed: "create 
> table acidcount1 (id int) 
> clustered by (id) into 2 buckets 
> stored as orc 
> tblproperties('transactional'='true');"
> {noformat}
> Saw the following detailed stack in the server log:
> {noformat}
> 2016-07-19T10:59:46,213 ERROR [HiveServer2-Background-Pool: Thread-463]: 
> metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(196)) - 
> java.lang.RuntimeException: Unable to lock 'CheckLock' due to: Duplicate 
> entry 'CheckLock-0' for key 'PRIMARY' (SQLState=23000, ErrorCode=1062)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.acquireLock(TxnHandler.java:3235)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLock(TxnHandler.java:2309)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLockWithRetry(TxnHandler.java:1012)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:784)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:5941)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at com.sun.proxy.$Proxy26.lock(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:2109)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
> at com.sun.proxy.$Proxy28.lock(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2259)
> at com.sun.proxy.$Proxy28.lock(Unknown Source)
> at 
> 

[jira] [Commented] (HIVE-14292) ACID table creation fails on mysql with MySQLIntegrityConstraintViolationException

2016-07-19 Thread Eugene Koifman (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385190#comment-15385190
 ] 

Eugene Koifman commented on HIVE-14292:
---

[~wzheng], could you review please?

> ACID table creation fails on mysql with 
> MySQLIntegrityConstraintViolationException
> --
>
> Key: HIVE-14292
> URL: https://issues.apache.org/jira/browse/HIVE-14292
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0, 2.1.0
> Environment: MySQL
>Reporter: Deepesh Khandelwal
>Assignee: Eugene Koifman
> Attachments: HIVE-14292.patch
>
>
> While creating an ACID table, I ran into the following error:
> {noformat}
> >>>  create table acidcount1 (id int) 
> clustered by (id) into 2 buckets 
> stored as orc 
> tblproperties('transactional'='true');
> INFO  : Compiling 
> command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15): 
> create table acidcount1 (id int) 
> clustered by (id) into 2 buckets 
> stored as orc 
> tblproperties('transactional'='true')
> INFO  : Semantic Analysis Completed
> INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15); 
> Time taken: 0.111 seconds
> Error: Error running query: java.lang.RuntimeException: Unable to lock 
> 'CheckLock' due to: Duplicate entry 'CheckLock-0' for key 'PRIMARY' 
> (SQLState=23000, ErrorCode=1062) (state=,code=0)
> Aborting command set because "force" is false and command failed: "create 
> table acidcount1 (id int) 
> clustered by (id) into 2 buckets 
> stored as orc 
> tblproperties('transactional'='true');"
> {noformat}
> Saw the following detailed stack in the server log:
> {noformat}
> 2016-07-19T10:59:46,213 ERROR [HiveServer2-Background-Pool: Thread-463]: 
> metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(196)) - 
> java.lang.RuntimeException: Unable to lock 'CheckLock' due to: Duplicate 
> entry 'CheckLock-0' for key 'PRIMARY' (SQLState=23000, ErrorCode=1062)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.acquireLock(TxnHandler.java:3235)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLock(TxnHandler.java:2309)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLockWithRetry(TxnHandler.java:1012)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:784)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:5941)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at com.sun.proxy.$Proxy26.lock(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:2109)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
> at com.sun.proxy.$Proxy28.lock(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2259)
> at com.sun.proxy.$Proxy28.lock(Unknown Source)
> at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$SynchronizedMetaStoreClient.lock(DbTxnManager.java:740)
> at 
> org.apache.hadoop.hive.ql.lockmgr.DbLockManager.lock(DbLockManager.java:103)
> at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.java:341)
> at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocksWithHeartbeatDelay(DbTxnManager.java:357)
> at 
> 

[jira] [Updated] (HIVE-14292) ACID table creation fails on mysql with MySQLIntegrityConstraintViolationException

2016-07-19 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14292:
--
Status: Patch Available  (was: Open)

> ACID table creation fails on mysql with 
> MySQLIntegrityConstraintViolationException
> --
>
> Key: HIVE-14292
> URL: https://issues.apache.org/jira/browse/HIVE-14292
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 2.1.0, 1.3.0
> Environment: MySQL
>Reporter: Deepesh Khandelwal
>Assignee: Eugene Koifman
> Attachments: HIVE-14292.patch
>
>
> While creating an ACID table, I ran into the following error:
> {noformat}
> >>>  create table acidcount1 (id int) 
> clustered by (id) into 2 buckets 
> stored as orc 
> tblproperties('transactional'='true');
> INFO  : Compiling 
> command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15): 
> create table acidcount1 (id int) 
> clustered by (id) into 2 buckets 
> stored as orc 
> tblproperties('transactional'='true')
> INFO  : Semantic Analysis Completed
> INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15); 
> Time taken: 0.111 seconds
> Error: Error running query: java.lang.RuntimeException: Unable to lock 
> 'CheckLock' due to: Duplicate entry 'CheckLock-0' for key 'PRIMARY' 
> (SQLState=23000, ErrorCode=1062) (state=,code=0)
> Aborting command set because "force" is false and command failed: "create 
> table acidcount1 (id int) 
> clustered by (id) into 2 buckets 
> stored as orc 
> tblproperties('transactional'='true');"
> {noformat}
> Saw the following detailed stack in the server log:
> {noformat}
> 2016-07-19T10:59:46,213 ERROR [HiveServer2-Background-Pool: Thread-463]: 
> metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(196)) - 
> java.lang.RuntimeException: Unable to lock 'CheckLock' due to: Duplicate 
> entry 'CheckLock-0' for key 'PRIMARY' (SQLState=23000, ErrorCode=1062)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.acquireLock(TxnHandler.java:3235)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLock(TxnHandler.java:2309)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLockWithRetry(TxnHandler.java:1012)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:784)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:5941)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at com.sun.proxy.$Proxy26.lock(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:2109)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
> at com.sun.proxy.$Proxy28.lock(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2259)
> at com.sun.proxy.$Proxy28.lock(Unknown Source)
> at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$SynchronizedMetaStoreClient.lock(DbTxnManager.java:740)
> at 
> org.apache.hadoop.hive.ql.lockmgr.DbLockManager.lock(DbLockManager.java:103)
> at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.java:341)
> at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocksWithHeartbeatDelay(DbTxnManager.java:357)
> at 
> 

[jira] [Updated] (HIVE-14292) ACID table creation fails on mysql with MySQLIntegrityConstraintViolationException

2016-07-19 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14292:
--
Attachment: HIVE-14292.patch

> ACID table creation fails on mysql with 
> MySQLIntegrityConstraintViolationException
> --
>
> Key: HIVE-14292
> URL: https://issues.apache.org/jira/browse/HIVE-14292
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0, 2.1.0
> Environment: MySQL
>Reporter: Deepesh Khandelwal
>Assignee: Eugene Koifman
> Attachments: HIVE-14292.patch
>
>
> While creating an ACID table, I ran into the following error:
> {noformat}
> >>>  create table acidcount1 (id int) 
> clustered by (id) into 2 buckets 
> stored as orc 
> tblproperties('transactional'='true');
> INFO  : Compiling 
> command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15): 
> create table acidcount1 (id int) 
> clustered by (id) into 2 buckets 
> stored as orc 
> tblproperties('transactional'='true')
> INFO  : Semantic Analysis Completed
> INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15); 
> Time taken: 0.111 seconds
> Error: Error running query: java.lang.RuntimeException: Unable to lock 
> 'CheckLock' due to: Duplicate entry 'CheckLock-0' for key 'PRIMARY' 
> (SQLState=23000, ErrorCode=1062) (state=,code=0)
> Aborting command set because "force" is false and command failed: "create 
> table acidcount1 (id int) 
> clustered by (id) into 2 buckets 
> stored as orc 
> tblproperties('transactional'='true');"
> {noformat}
> Saw the following detailed stack in the server log:
> {noformat}
> 2016-07-19T10:59:46,213 ERROR [HiveServer2-Background-Pool: Thread-463]: 
> metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(196)) - 
> java.lang.RuntimeException: Unable to lock 'CheckLock' due to: Duplicate 
> entry 'CheckLock-0' for key 'PRIMARY' (SQLState=23000, ErrorCode=1062)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.acquireLock(TxnHandler.java:3235)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLock(TxnHandler.java:2309)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLockWithRetry(TxnHandler.java:1012)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:784)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:5941)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at com.sun.proxy.$Proxy26.lock(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:2109)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
> at com.sun.proxy.$Proxy28.lock(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2259)
> at com.sun.proxy.$Proxy28.lock(Unknown Source)
> at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$SynchronizedMetaStoreClient.lock(DbTxnManager.java:740)
> at 
> org.apache.hadoop.hive.ql.lockmgr.DbLockManager.lock(DbLockManager.java:103)
> at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.java:341)
> at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocksWithHeartbeatDelay(DbTxnManager.java:357)
> at 
> 
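The "Duplicate entry 'CheckLock-0' for key 'PRIMARY'" failure comes from a mutex scheme in which a global lock is taken by inserting a row with a fixed primary key into an auxiliary table. A minimal sketch of that scheme and its failure mode, using Python's sqlite3 as a stand-in for MySQL (table and function names here are illustrative, not Hive's actual TxnHandler code):

```python
import sqlite3

def acquire_mutex(conn, key="CheckLock"):
    """Return True if the mutex row was inserted, False if it already exists."""
    try:
        conn.execute("INSERT INTO aux_table (mt_key1, mt_key2) VALUES (?, 0)",
                     (key,))
        return True
    except sqlite3.IntegrityError:
        # Duplicate primary key: another caller (or an earlier retry of this
        # one) already inserted the row. A robust implementation must treat
        # this as "lock already held" rather than surfacing it as a fatal
        # RuntimeException, which is what the stack trace above shows.
        return False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aux_table (mt_key1 TEXT, mt_key2 INTEGER, "
             "PRIMARY KEY (mt_key1, mt_key2))")

first = acquire_mutex(conn)    # inserts the 'CheckLock-0' row
second = acquire_mutex(conn)   # duplicate key -> handled, not fatal
```

If the row is left behind by a failed transaction, or a retry layer re-runs the INSERT, the second path is taken; the bug is that this path was not handled gracefully.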

[jira] [Commented] (HIVE-14214) ORC Schema Evolution and Predicate Push Down do not work together (no rows returned)

2016-07-19 Thread Matt McCline (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385169#comment-15385169
 ] 

Matt McCline commented on HIVE-14214:
-

OK, review board created. New patch submitted for Hive QA.

> ORC Schema Evolution and Predicate Push Down do not work together (no rows 
> returned)
> 
>
> Key: HIVE-14214
> URL: https://issues.apache.org/jira/browse/HIVE-14214
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14214.01.patch, HIVE-14214.02.patch, 
> HIVE-14214.03.patch, HIVE-14214.04.patch, HIVE-14214.05.patch, 
> HIVE-14214.WIP.patch
>
>
> In Schema Evolution, the reader schema is different from the file schema,
> which is used to evaluate predicate push down.
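A sketch of the failure mode this describes: the predicate (SARG) carries column positions from the reader schema, but row-group min/max statistics are laid out in file-schema order, so evaluating the predicate without mapping between the two checks the wrong (or a missing) column and filters out every stripe. The schemas and stats below are illustrative, not ORC's actual layout:

```python
file_schema = ["id", "amount"]                 # old files, written pre-evolution
reader_schema = ["id", "category", "amount"]   # table schema after ADD COLUMNS
file_stats = {"id": (1, 100), "amount": (5.0, 9.0)}  # per-row-group min/max

def stripe_may_match(pred_col_reader_idx, low, high, map_schemas=True):
    """Can any row in this row group satisfy `low <= col <= high`?"""
    col = reader_schema[pred_col_reader_idx]
    if map_schemas:
        if col not in file_schema:
            return True  # column absent in old files: stats unknown, must read
        stats = file_stats[col]
    else:
        # Bug being described: treat the reader index as a file index.
        if pred_col_reader_idx >= len(file_schema):
            return False  # "no stats" wrongly treated as "cannot match"
        stats = file_stats[file_schema[pred_col_reader_idx]]
    return not (high < stats[0] or low > stats[1])

# Predicate: amount BETWEEN 6 and 7 (index 2 in the reader schema).
buggy = stripe_may_match(2, 6.0, 7.0, map_schemas=False)   # skips the stripe
fixed = stripe_may_match(2, 6.0, 7.0, map_schemas=True)    # reads the stripe
```

With the unmapped index every stripe is skipped, which matches the "no rows returned" symptom in the summary.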



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14214) ORC Schema Evolution and Predicate Push Down do not work together (no rows returned)

2016-07-19 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14214:

Attachment: HIVE-14214.05.patch

> ORC Schema Evolution and Predicate Push Down do not work together (no rows 
> returned)
> 
>
> Key: HIVE-14214
> URL: https://issues.apache.org/jira/browse/HIVE-14214
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14214.01.patch, HIVE-14214.02.patch, 
> HIVE-14214.03.patch, HIVE-14214.04.patch, HIVE-14214.05.patch, 
> HIVE-14214.WIP.patch
>
>
> In Schema Evolution, the reader schema is different from the file schema,
> which is used to evaluate predicate push down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14214) ORC Schema Evolution and Predicate Push Down do not work together (no rows returned)

2016-07-19 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14214:

Attachment: (was: HIVE-14214.05.patch)

> ORC Schema Evolution and Predicate Push Down do not work together (no rows 
> returned)
> 
>
> Key: HIVE-14214
> URL: https://issues.apache.org/jira/browse/HIVE-14214
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14214.01.patch, HIVE-14214.02.patch, 
> HIVE-14214.03.patch, HIVE-14214.04.patch, HIVE-14214.WIP.patch
>
>
> In Schema Evolution, the reader schema is different from the file schema,
> which is used to evaluate predicate push down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14251) Union All of different types resolves to incorrect data

2016-07-19 Thread Mohit Sabharwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385164#comment-15385164
 ] 

Mohit Sabharwal commented on HIVE-14251:


LGTM.

If I understand correctly, the only difference between isCommonTypeOf and
implicitConvertible is this line:
{code}
// Allow implicit String to Double conversion
if (fromPg == PrimitiveGrouping.STRING_GROUP && to == PrimitiveCategory.DOUBLE) {
  return true;
}
{code}

Wondering if it's easy to re-use implicitConvertible instead of duplicating the
code? Maybe add a flag to the method to control the String-to-Double conversion?
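The refactor being suggested can be sketched as follows: keep one conversion check and gate the single differing rule behind a flag, so the second method becomes a thin wrapper. The type names below are simplified stand-ins for Hive's PrimitiveCategory/PrimitiveGrouping machinery, not its real API:

```python
STRING_GROUP = {"STRING", "VARCHAR", "CHAR"}
NUMERIC_WIDENING = {("INT", "BIGINT"), ("INT", "DOUBLE"), ("FLOAT", "DOUBLE")}

def implicit_convertible(from_type, to_type, allow_string_to_double=False):
    """One shared check; the flag gates the rule that used to be duplicated."""
    if from_type == to_type:
        return True
    # The one rule that previously lived only in the isCommonTypeOf copy:
    if (allow_string_to_double
            and from_type in STRING_GROUP and to_type == "DOUBLE"):
        return True
    return (from_type, to_type) in NUMERIC_WIDENING

def is_common_type_of(from_type, to_type):
    # Former near-duplicate, now a thin wrapper over the shared logic.
    return implicit_convertible(from_type, to_type,
                                allow_string_to_double=True)

strict = implicit_convertible("STRING", "DOUBLE")   # strict rules: no
relaxed = is_common_type_of("STRING", "DOUBLE")     # relaxed rules: yes
```

This keeps the two behaviors in sync automatically: a future rule change only has to be made in one place.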


> Union All of different types resolves to incorrect data
> ---
>
> Key: HIVE-14251
> URL: https://issues.apache.org/jira/browse/HIVE-14251
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14251.1.patch
>
>
> create table src(c1 date, c2 int, c3 double);
> insert into src values ('2016-01-01',5,1.25);
> select * from 
> (select c1 from src union all
> select c2 from src union all
> select c3 from src) t;
> It will return NULL for the c1 values. It seems the common data type is resolved
> to that of the last column, c3, which is double.
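A sketch of why the c1 values come back NULL: if the planner folds a common type across the union branches and each pairwise step simply takes the later/wider type, date + int + double collapses to double, and a date string has no numeric representation, so the cast yields NULL. The type lattice and fold below are simplified assumptions for illustration, not Hive's actual resolution rules:

```python
WIDENING_ORDER = ["date", "int", "double"]  # illustrative lattice only

def naive_common_type(branch_types):
    """The behavior being reported: later branches win over earlier ones."""
    common = branch_types[0]
    for t in branch_types[1:]:
        if WIDENING_ORDER.index(t) > WIDENING_ORDER.index(common):
            common = t
    return common

def cast(value, to_type):
    if to_type != "double":
        return value
    try:
        return float(value)
    except (TypeError, ValueError):
        return None  # date -> double has no sensible conversion -> NULL

resolved = naive_common_type(["date", "int", "double"])  # c1, c2, c3
c1_after_union = cast("2016-01-01", resolved)            # the reported NULL
```

A correct resolution would instead pick a type all branches can convert to without data loss (here, string), or reject the query.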



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12181) Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver

2016-07-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-12181:

Status: Patch Available  (was: Open)

> Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver
> ---
>
> Key: HIVE-12181
> URL: https://issues.apache.org/jira/browse/HIVE-12181
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-12181.1.patch, HIVE-12181.2.patch, 
> HIVE-12181.patch, HIVE-12181.patch
>
>
> There was a performance concern earlier, but HIVE-7587 has fixed that. We can 
> change the default to true now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12181) Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver

2016-07-19 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-12181:

Status: Open  (was: Patch Available)

> Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver
> ---
>
> Key: HIVE-12181
> URL: https://issues.apache.org/jira/browse/HIVE-12181
> Project: Hive
>  Issue Type: Improvement
>  Components: Statistics
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: HIVE-12181.1.patch, HIVE-12181.2.patch, HIVE-12181.patch
>
>
> There was a performance concern earlier, but HIVE-7587 has fixed that. We can 
> change the default to true now.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14214) ORC Schema Evolution and Predicate Push Down do not work together (no rows returned)

2016-07-19 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14214:

Status: Patch Available  (was: In Progress)

> ORC Schema Evolution and Predicate Push Down do not work together (no rows 
> returned)
> 
>
> Key: HIVE-14214
> URL: https://issues.apache.org/jira/browse/HIVE-14214
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14214.01.patch, HIVE-14214.02.patch, 
> HIVE-14214.03.patch, HIVE-14214.04.patch, HIVE-14214.05.patch, 
> HIVE-14214.WIP.patch
>
>
> In Schema Evolution, the reader schema is different from the file schema,
> which is used to evaluate predicate push down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14214) ORC Schema Evolution and Predicate Push Down do not work together (no rows returned)

2016-07-19 Thread Matt McCline (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt McCline updated HIVE-14214:

Attachment: HIVE-14214.05.patch

> ORC Schema Evolution and Predicate Push Down do not work together (no rows 
> returned)
> 
>
> Key: HIVE-14214
> URL: https://issues.apache.org/jira/browse/HIVE-14214
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Reporter: Matt McCline
>Assignee: Matt McCline
>Priority: Critical
> Attachments: HIVE-14214.01.patch, HIVE-14214.02.patch, 
> HIVE-14214.03.patch, HIVE-14214.04.patch, HIVE-14214.05.patch, 
> HIVE-14214.WIP.patch
>
>
> In Schema Evolution, the reader schema is different from the file schema,
> which is used to evaluate predicate push down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14167) Use work directories provided by Tez instead of directly using YARN local dirs

2016-07-19 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385129#comment-15385129
 ] 

Siddharth Seth commented on HIVE-14167:
---

+1. Assuming you've tested it on a cluster, and seen the correct directories 
being used.

> Use work directories provided by Tez instead of directly using YARN local dirs
> --
>
> Key: HIVE-14167
> URL: https://issues.apache.org/jira/browse/HIVE-14167
> Project: Hive
>  Issue Type: Improvement
>Affects Versions: 2.1.0
>Reporter: Siddharth Seth
>Assignee: Wei Zheng
> Attachments: HIVE-14167.1.patch, HIVE-14167.2.patch, 
> HIVE-14167.3.patch
>
>
> HIVE-13303 changed things to use multiple directories instead of a single tmp
> directory. However, it uses the yarn-local-dirs directly.
> I'm not sure how well using the yarn-local-dirs will work on a secure cluster.
> It would be better to use Tez*Context.getWorkDirs, which provides app-specific
> directories writable by the user.
> cc [~sershe]
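The distinction can be sketched as below: reading the node-level YARN local dirs gives directories shared across applications and users, whereas framework-provided work dirs are per-application subdirectories already created with the submitting user's permissions. The environment variable name and directory layout here are simplified assumptions:

```python
def dirs_from_yarn_local(env):
    # What HIVE-13303 did: raw node-level dirs, shared across apps and users,
    # with no guarantee the current user can write there on a secure cluster.
    return [d for d in env.get("YARN_LOCAL_DIRS", "").split(",") if d]

def dirs_from_framework(work_dirs):
    # Preferred: the framework (Tez*Context.getWorkDirs) hands each task
    # app-specific subdirectories it has already created for this user.
    return [d for d in work_dirs if d]

yarn_dirs = dirs_from_yarn_local(
    {"YARN_LOCAL_DIRS": "/data1/yarn,/data2/yarn"})
tez_dirs = dirs_from_framework(
    ["/data1/yarn/usercache/alice/appcache/app_01",
     "/data2/yarn/usercache/alice/appcache/app_01"])
```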



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13934) Configure Tez to make nocondiional task size memory available for the Processor

2016-07-19 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13934:
-
Attachment: HIVE-13934.10.patch

[~hagleitn] Patch 10 addresses the review comments. Thanks!

> Configure Tez to make nocondiional task size memory available for the 
> Processor
> ---
>
> Key: HIVE-13934
> URL: https://issues.apache.org/jira/browse/HIVE-13934
> Project: Hive
>  Issue Type: Bug
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-13934.1.patch, HIVE-13934.10.patch, 
> HIVE-13934.2.patch, HIVE-13934.3.patch, HIVE-13934.4.patch, 
> HIVE-13934.6.patch, HIVE-13934.7.patch, HIVE-13934.8.patch, HIVE-13934.9.patch
>
>
> Currently, noconditionaltasksize is not validated against the container size
> or the reservations made in the container by Tez for Inputs / Outputs, etc.
> Check this at compile time to see if enough memory is available, or set up 
> the vertex to reserve additional memory for the Processor.
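The compile-time check being asked for can be sketched like this: subtract the per-Input/Output reservations and runtime overhead from the container, then compare what is left for the Processor against the configured noconditional task size. All numbers and the reservation model here are assumptions for illustration; they are not Tez's real memory accounting:

```python
def processor_memory_available(container_mb, num_inputs, num_outputs,
                               per_io_reservation_mb=100,
                               runtime_overhead_mb=256):
    """Memory left for the Processor after hypothetical Tez reservations."""
    reserved = (runtime_overhead_mb
                + per_io_reservation_mb * (num_inputs + num_outputs))
    return container_mb - reserved

def validate_noconditional_task_size(nocond_size_mb, container_mb,
                                     num_inputs, num_outputs):
    """Return (ok, extra_mb): ok if the map-join hash tables fit; otherwise
    how much additional memory the vertex would need to reserve."""
    available = processor_memory_available(container_mb,
                                           num_inputs, num_outputs)
    if nocond_size_mb <= available:
        return True, 0
    return False, nocond_size_mb - available

# 1536 MB container, 3 inputs, 1 output, 1024 MB noconditional task size:
ok, extra = validate_noconditional_task_size(
    nocond_size_mb=1024, container_mb=1536, num_inputs=3, num_outputs=1)
```

When `ok` is false, the two options in the description correspond to failing/adjusting at compile time or asking Tez to reserve `extra` more MB for the Processor on that vertex.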



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13934) Configure Tez to make nocondiional task size memory available for the Processor

2016-07-19 Thread Wei Zheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Zheng updated HIVE-13934:
-
Status: Open  (was: Patch Available)

> Configure Tez to make nocondiional task size memory available for the 
> Processor
> ---
>
> Key: HIVE-13934
> URL: https://issues.apache.org/jira/browse/HIVE-13934
> Project: Hive
>  Issue Type: Bug
>Reporter: Wei Zheng
>Assignee: Wei Zheng
> Attachments: HIVE-13934.1.patch, HIVE-13934.2.patch, 
> HIVE-13934.3.patch, HIVE-13934.4.patch, HIVE-13934.6.patch, 
> HIVE-13934.7.patch, HIVE-13934.8.patch, HIVE-13934.9.patch
>
>
> Currently, noconditionaltasksize is not validated against the container size
> or the reservations made in the container by Tez for Inputs / Outputs, etc.
> Check this at compile time to see if enough memory is available, or set up 
> the vertex to reserve additional memory for the Processor.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14282) HCatLoader ToDate() exception with hive partition table ,partitioned by column of DATE datatype

2016-07-19 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385098#comment-15385098
 ] 

Thejas M Nair commented on HIVE-14282:
--

+1


> HCatLoader ToDate() exception with hive partition table ,partitioned by 
> column of DATE datatype
> ---
>
> Key: HIVE-14282
> URL: https://issues.apache.org/jira/browse/HIVE-14282
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1
> Environment: PIG Version : (0.15.0) 
> HIVE : 1.2.1
> OS Version : CentOS release 6.7 (Final)
> OS Kernel : 2.6.32-573.18.1.el6.x86_64
>Reporter: Raghavender Rao Guruvannagari
>Assignee: Daniel Dai
> Fix For: 1.3.0, 2.2.0, 2.1.1
>
> Attachments: HIVE-14282.1.patch
>
>
> The ToDate() function doesn't work with a partitioned table that is partitioned
> by a column of the DATE data type.
> Below are the steps I followed to recreate the problem.
> --> Sample input file for the Hive table:
> hdfs@testhost ~$ cat test.log 
> 2012-06-13,16:11:17,574,140.134.127.109,SearchPage,Google.com,Win8,5,HTC
> 2012-06-13,16:11:17,466,43.176.108.158,Electronics,Google.com,Win8,3,iPhone
> 2012-06-13,16:11:17,501,97.73.102.79,Appliances,Google.com,Android,4,iPhone
> 2012-06-13,16:11:17,469,166.98.157.122,Recommendations,Google.com,Win8,5,HTC
> 2012-06-13,16:11:17,557,36.159.147.50,Sporting,Google.com,Win8,3,Samsung
> 2012-06-13,16:11:17,449,128.215.122.234,ShoppingCart,Google.com,Win8,5,HTC
> 2012-06-13,16:11:17,502,46.81.131.92,Electronics,Google.com,Android,5,Samsung
> 2012-06-13,16:11:17,554,120.187.105.127,Automotive,Google.com,Win8,5,HTC
> 2012-06-13,16:11:17,447,127.94.64.59,DetailPage,Google.com,Win8,3,Samsung
> 2012-06-13,16:11:17,490,132.54.25.75,ShoppingCart,Google.com,Win8,3,iPhone
> 2012-06-13,16:11:17,578,79.201.53.179,Automotive,Google.com,Win8,5,Samsung
> 2012-06-13,16:11:17,435,158.106.164.38,HomePage,Google.com,Web,5,Chrome
> 2012-06-13,16:11:17,523,17.131.82.171,Recommendations,Google.com,Web,3,IE9
> 2012-06-13,16:11:17,575,178.95.126.105,Appliances,Google.com,iOS,3,iPhone
> 2012-06-13,16:11:17,468,225.143.39.176,SearchPage,Google.com,iOS,5,HTC
> 2012-06-13,16:11:17,511,43.103.102.147,ShoppingCart,Google.com,iOS,5,Samsung
> --> Copy it to an HDFS directory:
> hdfs@testhost ~$ hdfs dfs -put -f test.log /user/hdfs/
> --> Create a partitioned table (partitioned by a DATE column) in Hive:
> 0: jdbc:hive2://hdp2.raghav.com:1/default> create table mytable(Dt 
> DATE,Time STRING,Number INT,IPAddr STRING,Type STRING,Site STRING,OSType 
> STRING,Visit INT,PhModel STRING) row format delimited fields terminated by 
> ',' stored as textfile;
> 0: jdbc:hive2://testhost.com:1/default> load data inpath 
> '/user/hdfs/test.log' overwrite into table mytable;
> 0: jdbc:hive2://testhost.com:1/default> SET hive.exec.dynamic.partition 
> = true;
> 0: jdbc:hive2://testhost.com:1/default> SET 
> hive.exec.dynamic.partition.mode = nonstrict;
> 0: jdbc:hive2://testhost.com:1/default> create table partmytable(Number 
> INT,IPAddr STRING,Type STRING,Site STRING,OSType STRING,Visit INT,PhModel 
> STRING) partitioned by (Dt DATE,Time STRING) row format delimited fields 
> terminated by ',' stored as textfile;
> 0: jdbc:hive2://testhost.com:1/default> insert overwrite table 
> partmytable partition(Dt,Time) select 
> Number,IPAddr,Type,Site,OSType,Visit,PhModel,Dt,Time from mytable;
> 0: jdbc:hive2://hdp2.raghav.com:1/default> describe partmytable;
> --> Trying to filter with the ToDate() function fails with an error:
> hdfs@testhost ~$ pig -useHCatalog
> grunt>
> grunt> temp = LOAD 'partmytable' using 
> org.apache.hive.hcatalog.pig.HCatLoader();
> grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','yyyy-MM-dd');
> grunt> dump temp1;
> --> Filtering the non-partitioned table with the same statement works:
> grunt>
> grunt> temp = LOAD 'mytable' using org.apache.hive.hcatalog.pig.HCatLoader();
> grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','yyyy-MM-dd');
> grunt> dump temp1;
> Workaround:
> Use the statement below instead of a direct ToDate() comparison:
> grunt> temp1 = FILTER temp5 by DaysBetween(dt,(datetime)ToDate('2012-06-13', 
> 'yyyy-MM-dd')) >=(long)0 AND DaysBetween(dt,(datetime)ToDate('2012-06-13', 
> 'yyyy-MM-dd')) <=(long)0;
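One plausible reading of why the workaround helps (an assumption, not confirmed by the report): the DATE partition value and the datetime produced by ToDate() are different types, and a direct equality test between mismatched types silently yields no matches, while the day-distance comparison normalises both sides first. Python's date/datetime pair shows the same shape of problem:

```python
import datetime

part_value = datetime.date(2012, 6, 13)        # DATE-typed partition value
filter_value = datetime.datetime(2012, 6, 13)  # ToDate()-style datetime

# Direct equality between a date and a datetime is always False in Python --
# it does not error, it just silently never matches.
direct_equal = (part_value == filter_value)

def days_between(d, dt):
    # Normalise the datetime down to a date before comparing, which is in
    # spirit what the DaysBetween(...) >= 0 AND <= 0 workaround does.
    return (d - dt.date()).days

diff = days_between(part_value, filter_value)
workaround_match = (diff >= 0) and (diff <= 0)  # equivalent to diff == 0
```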



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14293) PerfLogger.openScopes should be transient

2016-07-19 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385089#comment-15385089
 ] 

Prasanth Jayachandran commented on HIVE-14293:
--

+1

> PerfLogger.openScopes should be transient
> -
>
> Key: HIVE-14293
> URL: https://issues.apache.org/jira/browse/HIVE-14293
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-14293.1.patch
>
>
> See the following exception when running Hive e2e tests:
> {code}
> 0: jdbc:hive2://nat-r6-ojss-hsihs2-1.openstac> SELECT s.name, s2.age, s.gpa, 
> v.registration, v2.contributions FROM student s INNER JOIN voter v ON (s.name 
> = v.name) INNER JOIN student s2 ON (s2.age = v.age and s.name = s2.name) 
> INNER JOIN voter v2 ON (v2.name = s2.name and v2.age = s2.age) WHERE v2.age = 
> s.age ORDER BY s.name, s2.age, s.gpa, v.registration, v2.contributions;
> INFO  : Compiling 
> command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8): 
> SELECT s.name, s2.age, s.gpa, v.registration, v2.contributions FROM student s 
> INNER JOIN voter v ON (s.name = v.name) INNER JOIN student s2 ON (s2.age = 
> v.age and s.name = s2.name) INNER JOIN voter v2 ON (v2.name = s2.name and 
> v2.age = s2.age) WHERE v2.age = s.age ORDER BY s.name, s2.age, s.gpa, 
> v.registration, v2.contributions
> INFO  : Semantic Analysis Completed
> INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:s.name, 
> type:string, comment:null), FieldSchema(name:s2.age, type:int, comment:null), 
> FieldSchema(name:s.gpa, type:double, comment:null), 
> FieldSchema(name:v.registration, type:string, comment:null), 
> FieldSchema(name:v2.contributions, type:float, comment:null)], 
> properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8); 
> Time taken: 1.165 seconds
> INFO  : Executing 
> command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8): 
> SELECT s.name, s2.age, s.gpa, v.registration, v2.contributions FROM student s 
> INNER JOIN voter v ON (s.name = v.name) INNER JOIN student s2 ON (s2.age = 
> v.age and s.name = s2.name) INNER JOIN voter v2 ON (v2.name = s2.name and 
> v2.age = s2.age) WHERE v2.age = s.age ORDER BY s.name, s2.age, s.gpa, 
> v.registration, v2.contributions
> INFO  : Query ID = hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8
> INFO  : Total jobs = 1
> INFO  : Launching Job 1 out of 1
> INFO  : Starting task [Stage-1:MAPRED] in serial mode
> INFO  : Session is already open
> INFO  : Dag name: SELECT s.name, s2.age, sv2.contributions(Stage-1)
> ERROR : Failed to execute tez graph.
> java.lang.RuntimeException: Error caching map.xml: 
> org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.util.ConcurrentModificationException
> Serialization trace:
> classes (sun.misc.Launcher$AppClassLoader)
> classloader (java.security.ProtectionDomain)
> context (java.security.AccessControlContext)
> acc (org.apache.hadoop.hive.ql.exec.UDFClassLoader)
> classLoader (org.apache.hadoop.hive.conf.HiveConf)
> conf (org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics)
> metrics 
> (org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics$CodahaleMetricsScope)
> openScopes (org.apache.hadoop.hive.ql.log.PerfLogger)
> perfLogger (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.FilterOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:582) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:516) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.DagUtils.createVertex(DagUtils.java:601) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.DagUtils.createVertex(DagUtils.java:1147) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.build(TezTask.java:390) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:164) 
> [hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) 
> [hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at 
> 
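What marking `openScopes` transient accomplishes can be sketched with Python's pickle as a stand-in for Kryo: a live runtime object reachable from the serialized graph (in the trace above, a metrics scope that drags in a classloader; below, a lock) either cannot be serialized or is mutated concurrently while being walked. Excluding it from the serialized state (Java: `transient`; Python: `__getstate__`) and recreating it on deserialization avoids both problems. The class below is an illustrative analogue, not Hive's PerfLogger:

```python
import pickle
import threading

class PerfLoggerLike:
    def __init__(self):
        self.times = {}                      # plain data: fine to serialize
        self.open_scopes = threading.Lock()  # runtime-only: must be excluded

    def __getstate__(self):
        # The "transient" treatment: drop the runtime-only field so the
        # serializer never walks into it (pickling a Lock raises TypeError).
        state = self.__dict__.copy()
        state.pop("open_scopes", None)
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.open_scopes = threading.Lock()  # recreate after deserialization

restored = pickle.loads(pickle.dumps(PerfLoggerLike()))
```

Without `__getstate__`, the round-trip fails on the lock; with it, the graph serializes cleanly and the field is rebuilt fresh on the other side.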

[jira] [Updated] (HIVE-14275) Driver#releasePlan throws NullPointerException

2016-07-19 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-14275:

Component/s: (was: HiveServer2)

> Driver#releasePlan throws NullPointerException
> --
>
> Key: HIVE-14275
> URL: https://issues.apache.org/jira/browse/HIVE-14275
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 0.14.0, 1.2.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>
> We'll need to add a null check.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14275) Driver#releasePlan throws NullPointerException

2016-07-19 Thread Vaibhav Gumashta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vaibhav Gumashta updated HIVE-14275:

Affects Version/s: (was: 2.1.0)
   0.14.0
   1.2.1

> Driver#releasePlan throws NullPointerException
> --
>
> Key: HIVE-14275
> URL: https://issues.apache.org/jira/browse/HIVE-14275
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Affects Versions: 0.14.0, 1.2.1
>Reporter: Vaibhav Gumashta
>Assignee: Vaibhav Gumashta
>
> We'll need to add a null check.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13906) Remove guava dependence from storage-api module

2016-07-19 Thread Owen O'Malley (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Owen O'Malley updated HIVE-13906:
-
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

I just committed this. Thanks for the review, Sergey.

> Remove guava dependence from storage-api module
> ---
>
> Key: HIVE-13906
> URL: https://issues.apache.org/jira/browse/HIVE-13906
> Project: Hive
>  Issue Type: Bug
>  Components: storage-api
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Fix For: 2.2.0
>
> Attachments: HIVE-13906.patch
>
>
> Guava is a very problematic library to depend on because of the version 
> incompatibilities and the use of it in the storage-api module causes it to 
> leak into everything that depends on it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14293) PerfLogger.openScopes should be transient

2016-07-19 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-14293:
--
Status: Patch Available  (was: Open)

> PerfLogger.openScopes should be transient
> -
>
> Key: HIVE-14293
> URL: https://issues.apache.org/jira/browse/HIVE-14293
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-14293.1.patch
>
>
> See the following exception when running Hive e2e tests:
> {code}
> 0: jdbc:hive2://nat-r6-ojss-hsihs2-1.openstac> SELECT s.name, s2.age, s.gpa, 
> v.registration, v2.contributions FROM student s INNER JOIN voter v ON (s.name 
> = v.name) INNER JOIN student s2 ON (s2.age = v.age and s.name = s2.name) 
> INNER JOIN voter v2 ON (v2.name = s2.name and v2.age = s2.age) WHERE v2.age = 
> s.age ORDER BY s.name, s2.age, s.gpa, v.registration, v2.contributions;
> INFO  : Compiling 
> command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8): 
> SELECT s.name, s2.age, s.gpa, v.registration, v2.contributions FROM student s 
> INNER JOIN voter v ON (s.name = v.name) INNER JOIN student s2 ON (s2.age = 
> v.age and s.name = s2.name) INNER JOIN voter v2 ON (v2.name = s2.name and 
> v2.age = s2.age) WHERE v2.age = s.age ORDER BY s.name, s2.age, s.gpa, 
> v.registration, v2.contributions
> INFO  : Semantic Analysis Completed
> INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:s.name, 
> type:string, comment:null), FieldSchema(name:s2.age, type:int, comment:null), 
> FieldSchema(name:s.gpa, type:double, comment:null), 
> FieldSchema(name:v.registration, type:string, comment:null), 
> FieldSchema(name:v2.contributions, type:float, comment:null)], 
> properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8); 
> Time taken: 1.165 seconds
> INFO  : Executing 
> command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8): 
> SELECT s.name, s2.age, s.gpa, v.registration, v2.contributions FROM student s 
> INNER JOIN voter v ON (s.name = v.name) INNER JOIN student s2 ON (s2.age = 
> v.age and s.name = s2.name) INNER JOIN voter v2 ON (v2.name = s2.name and 
> v2.age = s2.age) WHERE v2.age = s.age ORDER BY s.name, s2.age, s.gpa, 
> v.registration, v2.contributions
> INFO  : Query ID = hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8
> INFO  : Total jobs = 1
> INFO  : Launching Job 1 out of 1
> INFO  : Starting task [Stage-1:MAPRED] in serial mode
> INFO  : Session is already open
> INFO  : Dag name: SELECT s.name, s2.age, sv2.contributions(Stage-1)
> ERROR : Failed to execute tez graph.
> java.lang.RuntimeException: Error caching map.xml: 
> org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.util.ConcurrentModificationException
> Serialization trace:
> classes (sun.misc.Launcher$AppClassLoader)
> classloader (java.security.ProtectionDomain)
> context (java.security.AccessControlContext)
> acc (org.apache.hadoop.hive.ql.exec.UDFClassLoader)
> classLoader (org.apache.hadoop.hive.conf.HiveConf)
> conf (org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics)
> metrics 
> (org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics$CodahaleMetricsScope)
> openScopes (org.apache.hadoop.hive.ql.log.PerfLogger)
> perfLogger (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.FilterOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:582) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:516) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.DagUtils.createVertex(DagUtils.java:601) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.DagUtils.createVertex(DagUtils.java:1147) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.build(TezTask.java:390) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:164) 
> [hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) 
> [hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at 
> 

[jira] [Updated] (HIVE-14293) PerfLogger.openScopes should be transient

2016-07-19 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-14293:
--
Attachment: HIVE-14293.1.patch

> PerfLogger.openScopes should be transient
> -
>
> Key: HIVE-14293
> URL: https://issues.apache.org/jira/browse/HIVE-14293
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 2.1.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Attachments: HIVE-14293.1.patch
>
>
> See the following exception when running Hive e2e tests:
> {code}
> 0: jdbc:hive2://nat-r6-ojss-hsihs2-1.openstac> SELECT s.name, s2.age, s.gpa, 
> v.registration, v2.contributions FROM student s INNER JOIN voter v ON (s.name 
> = v.name) INNER JOIN student s2 ON (s2.age = v.age and s.name = s2.name) 
> INNER JOIN voter v2 ON (v2.name = s2.name and v2.age = s2.age) WHERE v2.age = 
> s.age ORDER BY s.name, s2.age, s.gpa, v.registration, v2.contributions;
> INFO  : Compiling 
> command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8): 
> SELECT s.name, s2.age, s.gpa, v.registration, v2.contributions FROM student s 
> INNER JOIN voter v ON (s.name = v.name) INNER JOIN student s2 ON (s2.age = 
> v.age and s.name = s2.name) INNER JOIN voter v2 ON (v2.name = s2.name and 
> v2.age = s2.age) WHERE v2.age = s.age ORDER BY s.name, s2.age, s.gpa, 
> v.registration, v2.contributions
> INFO  : Semantic Analysis Completed
> INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:s.name, 
> type:string, comment:null), FieldSchema(name:s2.age, type:int, comment:null), 
> FieldSchema(name:s.gpa, type:double, comment:null), 
> FieldSchema(name:v.registration, type:string, comment:null), 
> FieldSchema(name:v2.contributions, type:float, comment:null)], 
> properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8); 
> Time taken: 1.165 seconds
> INFO  : Executing 
> command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8): 
> SELECT s.name, s2.age, s.gpa, v.registration, v2.contributions FROM student s 
> INNER JOIN voter v ON (s.name = v.name) INNER JOIN student s2 ON (s2.age = 
> v.age and s.name = s2.name) INNER JOIN voter v2 ON (v2.name = s2.name and 
> v2.age = s2.age) WHERE v2.age = s.age ORDER BY s.name, s2.age, s.gpa, 
> v.registration, v2.contributions
> INFO  : Query ID = hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8
> INFO  : Total jobs = 1
> INFO  : Launching Job 1 out of 1
> INFO  : Starting task [Stage-1:MAPRED] in serial mode
> INFO  : Session is already open
> INFO  : Dag name: SELECT s.name, s2.age, sv2.contributions(Stage-1)
> ERROR : Failed to execute tez graph.
> java.lang.RuntimeException: Error caching map.xml: 
> org.apache.hive.com.esotericsoftware.kryo.KryoException: 
> java.util.ConcurrentModificationException
> Serialization trace:
> classes (sun.misc.Launcher$AppClassLoader)
> classloader (java.security.ProtectionDomain)
> context (java.security.AccessControlContext)
> acc (org.apache.hadoop.hive.ql.exec.UDFClassLoader)
> classLoader (org.apache.hadoop.hive.conf.HiveConf)
> conf (org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics)
> metrics 
> (org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics$CodahaleMetricsScope)
> openScopes (org.apache.hadoop.hive.ql.log.PerfLogger)
> perfLogger (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.FilterOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:582) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at 
> org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:516) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.DagUtils.createVertex(DagUtils.java:601) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at 
> org.apache.hadoop.hive.ql.exec.tez.DagUtils.createVertex(DagUtils.java:1147) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.build(TezTask.java:390) 
> ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:164) 
> [hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) 
> [hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>   at 
> 
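The serialization trace above shows Kryo walking from MapWork down into PerfLogger.openScopes, a live map that can be modified mid-serialization. Marking such a field transient keeps serializers from traversing it. A minimal sketch of the idea, using JDK serialization rather than Kryo (both honor the transient keyword); the class here is illustrative, not Hive's actual PerfLogger:

```java
import java.io.*;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch: a logger-like class whose open-scope map is excluded from
// serialization by the transient keyword, mirroring the proposed fix.
public class TransientDemo {
    static class PerfLoggerLike implements Serializable {
        // Live, mutable state: serializers should skip it.
        transient Map<String, Long> openScopes = new ConcurrentHashMap<>();
        String name = "query-1";
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        PerfLoggerLike logger = new PerfLoggerLike();
        logger.openScopes.put("compile", System.nanoTime());

        byte[] bytes = serialize(logger);
        PerfLoggerLike copy;
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            copy = (PerfLoggerLike) ois.readObject();
        }
        // The transient map was not serialized: it comes back null.
        if (copy.openScopes != null) throw new AssertionError("map was serialized");
        if (!"query-1".equals(copy.name)) throw new AssertionError("name lost");
        System.out.println("transient field skipped as expected");
    }
}
```

Because the transient field is never walked, a concurrent modification of the map during serialization can no longer surface as the ConcurrentModificationException seen above.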

[jira] [Updated] (HIVE-14292) ACID table creation fails on mysql with MySQLIntegrityConstraintViolationException

2016-07-19 Thread Eugene Koifman (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eugene Koifman updated HIVE-14292:
--
Affects Version/s: (was: 1.2.1)
   1.3.0

> ACID table creation fails on mysql with 
> MySQLIntegrityConstraintViolationException
> --
>
> Key: HIVE-14292
> URL: https://issues.apache.org/jira/browse/HIVE-14292
> Project: Hive
>  Issue Type: Bug
>  Components: Transactions
>Affects Versions: 1.3.0, 2.1.0
> Environment: MySQL
>Reporter: Deepesh Khandelwal
>Assignee: Eugene Koifman
>
> While creating an ACID table, we ran into the following error:
> {noformat}
> >>>  create table acidcount1 (id int) 
> clustered by (id) into 2 buckets 
> stored as orc 
> tblproperties('transactional'='true');
> INFO  : Compiling 
> command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15): 
> create table acidcount1 (id int) 
> clustered by (id) into 2 buckets 
> stored as orc 
> tblproperties('transactional'='true')
> INFO  : Semantic Analysis Completed
> INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
> INFO  : Completed compiling 
> command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15); 
> Time taken: 0.111 seconds
> Error: Error running query: java.lang.RuntimeException: Unable to lock 
> 'CheckLock' due to: Duplicate entry 'CheckLock-0' for key 'PRIMARY' 
> (SQLState=23000, ErrorCode=1062) (state=,code=0)
> Aborting command set because "force" is false and command failed: "create 
> table acidcount1 (id int) 
> clustered by (id) into 2 buckets 
> stored as orc 
> tblproperties('transactional'='true');"
> {noformat}
> The server log contained the following detailed stack trace:
> {noformat}
> 2016-07-19T10:59:46,213 ERROR [HiveServer2-Background-Pool: Thread-463]: 
> metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(196)) - 
> java.lang.RuntimeException: Unable to lock 'CheckLock' due to: Duplicate 
> entry 'CheckLock-0' for key 'PRIMARY' (SQLState=23000, ErrorCode=1062)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.acquireLock(TxnHandler.java:3235)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLock(TxnHandler.java:2309)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLockWithRetry(TxnHandler.java:1012)
> at 
> org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:784)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:5941)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
> at 
> org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
> at com.sun.proxy.$Proxy26.lock(Unknown Source)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:2109)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
> at com.sun.proxy.$Proxy28.lock(Unknown Source)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2259)
> at com.sun.proxy.$Proxy28.lock(Unknown Source)
> at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$SynchronizedMetaStoreClient.lock(DbTxnManager.java:740)
> at 
> org.apache.hadoop.hive.ql.lockmgr.DbLockManager.lock(DbLockManager.java:103)
> at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.java:341)
> at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocksWithHeartbeatDelay(DbTxnManager.java:357)
> at 
> 

[jira] [Updated] (HIVE-14225) Llap slider package should support configuring YARN rolling log aggregation

2016-07-19 Thread Siddharth Seth (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Siddharth Seth updated HIVE-14225:
--
Attachment: HIVE-14225.01.patch

Patch to:
- Configure Slider to inform YARN to aggregate files with the name .done.
- Remove the query-based routing.
- Move to RFA as the default router, since query-based routing still requires 
some work.
- Add a value in HiveConf, similar to other variables like container-size, to 
access this value at runtime (when present in hive-site.xml).


> Llap slider package should support configuring YARN rolling log aggregation
> ---
>
> Key: HIVE-14225
> URL: https://issues.apache.org/jira/browse/HIVE-14225
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14225.01.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14290) Refactor HIVE-14054 to use Collections#newSetFromMap

2016-07-19 Thread Peter Slawski (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Slawski updated HIVE-14290:
-
Target Version/s: 2.2.0
   Fix Version/s: (was: 2.2.0)

> Refactor HIVE-14054 to use Collections#newSetFromMap
> 
>
> Key: HIVE-14290
> URL: https://issues.apache.org/jira/browse/HIVE-14290
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.0
>Reporter: Peter Slawski
>Assignee: Peter Slawski
>Priority: Trivial
> Attachments: HIVE-14290.1.patch
>
>
> There is a minor refactor that can be made to HiveMetaStoreChecker so that it 
> cleanly creates and uses a set that is backed by a Map implementation. In 
> this case, the underlying Map implementation is ConcurrentHashMap. This 
> refactor will help prevent issues such as the one reported in HIVE-14054.
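The refactor described above can be illustrated with a short, self-contained sketch (the set contents are hypothetical):

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the refactor described above: a thread-safe Set view
// backed by a ConcurrentHashMap, via Collections.newSetFromMap.
public class NewSetFromMapDemo {
    public static void main(String[] args) {
        Set<String> partitions =
                Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
        partitions.add("part=1");
        partitions.add("part=2");
        partitions.add("part=1"); // duplicate, ignored

        if (partitions.size() != 2) throw new AssertionError();
        if (!partitions.contains("part=2")) throw new AssertionError();
        System.out.println("set size = " + partitions.size());
    }
}
```

The returned set inherits the concurrency guarantees of its backing map, which is what makes it safer than a hand-rolled map-wrapping set.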



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14290) Refactor HIVE-14054 to use Collections#newSetFromMap

2016-07-19 Thread Peter Slawski (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Slawski updated HIVE-14290:
-
Fix Version/s: 2.2.0
   Status: Patch Available  (was: Open)

I've attached a patch that makes this minor refactor.

> Refactor HIVE-14054 to use Collections#newSetFromMap
> 
>
> Key: HIVE-14290
> URL: https://issues.apache.org/jira/browse/HIVE-14290
> Project: Hive
>  Issue Type: Improvement
>  Components: Metastore
>Affects Versions: 2.1.0
>Reporter: Peter Slawski
>Assignee: Peter Slawski
>Priority: Trivial
> Fix For: 2.2.0
>
> Attachments: HIVE-14290.1.patch
>
>
> There is a minor refactor that can be made to HiveMetaStoreChecker so that it 
> cleanly creates and uses a set that is backed by a Map implementation. In 
> this case, the underlying Map implementation is ConcurrentHashMap. This 
> refactor will help prevent issues such as the one reported in HIVE-14054.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-19 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Attachment: HIVE-13995.5.patch

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch
>
>
> TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when 
> the query has no filter on the partition column, the generated metastore 
> queries have a large IN clause listing all the partition names. Most RDBMS 
> systems have trouble optimizing large IN clauses, and even when a good index 
> plan is chosen, comparing against 1800+ string values will not yield the best 
> execution time.
> When all partitions are chosen, omitting the partition list and filtering only 
> on table and column name will generate the same result set, as long as there 
> are no concurrent modifications (adding/dropping partitions) to the partition 
> list of the Hive table.
> For example, for TPCDS query18 the metastore query gathering partition column 
> statistics runs in 0.5 seconds in MySQL. The following is output from the 
> MySQL log:
> {noformat}
> -- Query_time: 0.482063  Lock_time: 0.003037 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales' 
>  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  and "PARTITION_NAME" in 
> ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654')
>  group by "PARTITION_NAME";
> {noformat}
> A functionally equivalent query runs in 0.1 seconds:
> {noformat}
> --Query_time: 0.121296  Lock_time: 0.000156 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales'  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  group by "PARTITION_NAME";
> {noformat}
> If removing the partition list seems drastic, it's also possible to simply 
> specify the range, since Hive gets an ordered list of partition names. This 
> performs as well as the earlier query:
> {noformat}
> # Query_time: 0.143874  Lock_time: 0.000154 Rows_sent: 1836  Rows_examined: 
> 18360
> SET timestamp=1464014881;
> select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = 
> 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales'  and 
> "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>   and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 
> 'cs_sold_date_sk=2452654' 
> group by "PARTITION_NAME";
> {noformat}
> Another thing to check is the IN clause of column names. The columns in the 
> projection list of the Hive query are listed here; it is not clear whether 
> statistics for these columns are required for Hive query optimization.
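Since Hive receives the partition names in sorted order, the range rewrite suggested above can be generated mechanically. A hypothetical helper sketching the idea (names and layout are illustrative, not actual Hive metastore code):

```java
import java.util.List;

// Hypothetical helper sketching the range rewrite described above:
// when every partition is selected, replace the large IN (...) list
// with a range predicate over the sorted partition names.
public class PartitionPredicateDemo {
    static String rangePredicate(List<String> sortedPartNames) {
        if (sortedPartNames.isEmpty()) return "1 = 1"; // no restriction
        String first = sortedPartNames.get(0);
        String last = sortedPartNames.get(sortedPartNames.size() - 1);
        return "\"PARTITION_NAME\" >= '" + first
                + "' and \"PARTITION_NAME\" <= '" + last + "'";
    }

    public static void main(String[] args) {
        List<String> parts = List.of(
                "cs_sold_date_sk=2450815",
                "cs_sold_date_sk=2450816",
                "cs_sold_date_sk=2452654");
        String pred = rangePredicate(parts);
        if (!pred.contains(">= 'cs_sold_date_sk=2450815'")) throw new AssertionError();
        if (!pred.contains("<= 'cs_sold_date_sk=2452654'")) throw new AssertionError();
        System.out.println(pred);
    }
}
```

The rewrite is only sound when no partitions were pruned; otherwise the range would admit names that the IN list excluded.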



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-19 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Attachment: (was: HIVE-13995.5.patch)

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch
>
>
> TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when 
> the query has no filter on the partition column, the generated metastore 
> queries have a large IN clause listing all the partition names. Most RDBMS 
> systems have trouble optimizing large IN clauses, and even when a good index 
> plan is chosen, comparing against 1800+ string values will not yield the best 
> execution time.
> When all partitions are chosen, omitting the partition list and filtering only 
> on table and column name will generate the same result set, as long as there 
> are no concurrent modifications (adding/dropping partitions) to the partition 
> list of the Hive table.
> For example, for TPCDS query18 the metastore query gathering partition column 
> statistics runs in 0.5 seconds in MySQL. The following is output from the 
> MySQL log:
> {noformat}
> -- Query_time: 0.482063  Lock_time: 0.003037 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales' 
>  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  and "PARTITION_NAME" in 
> ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654')
>  group by "PARTITION_NAME";
> {noformat}
> A functionally equivalent query runs in 0.1 seconds:
> {noformat}
> --Query_time: 0.121296  Lock_time: 0.000156 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales'  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  group by "PARTITION_NAME";
> {noformat}
> If removing the partition list seems drastic, it's also possible to simply 
> specify the range, since Hive gets an ordered list of partition names. This 
> performs as well as the earlier query:
> {noformat}
> # Query_time: 0.143874  Lock_time: 0.000154 Rows_sent: 1836  Rows_examined: 
> 18360
> SET timestamp=1464014881;
> select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = 
> 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales'  and 
> "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>   and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 
> 'cs_sold_date_sk=2452654' 
> group by "PARTITION_NAME";
> {noformat}
> Another thing to check is the IN clause of column names. The columns in the 
> projection list of the Hive query are listed here; it is not clear whether 
> statistics for these columns are required for Hive query optimization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used

2016-07-19 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384933#comment-15384933
 ] 

Sahil Takiar commented on HIVE-14170:
-

Hey Tao,

I addressed your comments, and updated the RB. I also pulled in the changes 
from HIVE-14169 since it doesn't really make sense to commit them separately.

Can you take a look at the RB? Link: https://reviews.apache.org/r/49782/

Thanks!

> Beeline IncrementalRows should buffer rows and incrementally re-calculate 
> width if TableOutputFormat is used
> 
>
> Key: HIVE-14170
> URL: https://issues.apache.org/jira/browse/HIVE-14170
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch, 
> HIVE-14170.3.patch, HIVE-14170.4.patch
>
>
> If {{--incremental}} is specified in Beeline, rows are meant to be printed 
> out immediately. However, if {{TableOutputFormat}} is used with this option 
> the formatting can look really off.
> The reason is that {{IncrementalRows}} does not do a global calculation of 
> the optimal width size for {{TableOutputFormat}} (it can't because it only 
> sees one row at a time). The output of {{BufferedRows}} looks much better 
> because it can do this global calculation.
> If {{--incremental}} is used, and {{TableOutputFormat}} is used, the width 
> should be re-calculated every "x" rows ("x" can be configurable and by 
> default it can be 1000).
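The scheme described above (buffer rows, recompute column widths every "x" rows) can be sketched as follows; the class, method names, and batch size are illustrative, not Beeline's actual implementation:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the buffering scheme described above: hold up
// to `batchSize` rows, recompute per-column widths over the buffer, then
// flush the batch with those widths.
public class IncrementalWidthDemo {
    static int[] columnWidths(List<String[]> rows, int cols) {
        int[] widths = new int[cols];
        for (String[] row : rows)
            for (int c = 0; c < cols; c++)
                widths[c] = Math.max(widths[c], row[c].length());
        return widths;
    }

    public static void main(String[] args) {
        int batchSize = 2; // Beeline would default this to something like 1000
        List<String[]> buffer = new ArrayList<>();
        List<int[]> flushedWidths = new ArrayList<>();

        String[][] incoming = {
                {"a", "bbbb"}, {"cc", "d"}, {"eeeeee", "ff"}};
        for (String[] row : incoming) {
            buffer.add(row);
            if (buffer.size() == batchSize) {
                flushedWidths.add(columnWidths(buffer, 2));
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) flushedWidths.add(columnWidths(buffer, 2));

        // First batch: widths from rows 1-2; second batch: from row 3 alone.
        if (flushedWidths.get(0)[0] != 2) throw new AssertionError();
        if (flushedWidths.get(0)[1] != 4) throw new AssertionError();
        if (flushedWidths.get(1)[0] != 6) throw new AssertionError();
        System.out.println("batches flushed: " + flushedWidths.size());
    }
}
```

This keeps memory bounded by the batch size while giving each batch the globally optimal widths within that window, which is the compromise the issue proposes between IncrementalRows and BufferedRows.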



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14169) Honor --incremental flag only if TableOutputFormat is used

2016-07-19 Thread Sahil Takiar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384925#comment-15384925
 ] 

Sahil Takiar commented on HIVE-14169:
-

Hey [~taoli-hwx]

* Yes, by default it is still false
* For non-table formats, we concluded that there is no real benefit to using 
BufferedRows: it only makes sense with the table output format, where 
BufferedRows can calculate the optimal sizing for each row it prints out. This 
is why I changed the code to stop honoring the value of --incremental when a 
non-table format is used.

Also, I am going to close this JIRA and mark it as a duplicate of HIVE-14170 - 
since it doesn't make sense to commit these changes without HIVE-14170 along 
with it.

> Honor --incremental flag only if TableOutputFormat is used
> --
>
> Key: HIVE-14169
> URL: https://issues.apache.org/jira/browse/HIVE-14169
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14169.1.patch
>
>
> * When Beeline prints out a {{ResultSet}} to stdout it uses the 
> {{BeeLine.print}} method
> * This method takes the {{ResultSet}} from the completed query and uses a 
> specified {{OutputFormat}} to print the rows (by default it uses 
> {{TableOutputFormat}})
> * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class 
> (either a {{IncrementalRows}} or a {{BufferedRows}} class)
> The advantage of {{BufferedRows}} is that it can do a global calculation of 
> the column width, however, this is only useful for {{TableOutputFormat}}. So 
> there is no need to buffer all the rows if a different {{OutputFormat}} is 
> used. This JIRA will change the behavior of the {{--incremental}} flag so 
> that it is only honored if {{TableOutputFormat}} is used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-19 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384877#comment-15384877
 ] 

Hari Sankar Sivarama Subramaniyan commented on HIVE-13995:
--

Updated the RB. Did some basic testing on the failed tests to make sure that 
1. no NPE is encountered, and 2. we remove the unnecessary PART_NAME IN () 
clause whenever we do not prune any partitions.

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch
>
>
> TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when 
> the query has no filter on the partition column, the generated metastore 
> queries have a large IN clause listing all the partition names. Most RDBMS 
> systems have trouble optimizing large IN clauses, and even when a good index 
> plan is chosen, comparing against 1800+ string values will not yield the best 
> execution time.
> When all partitions are chosen, omitting the partition list and filtering only 
> on table and column name will generate the same result set, as long as there 
> are no concurrent modifications (adding/dropping partitions) to the partition 
> list of the Hive table.
> For example, for TPCDS query18 the metastore query gathering partition column 
> statistics runs in 0.5 seconds in MySQL. The following is output from the 
> MySQL log:
> {noformat}
> -- Query_time: 0.482063  Lock_time: 0.003037 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales' 
>  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  and "PARTITION_NAME" in 
> ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654')
>  group by "PARTITION_NAME";
> {noformat}
> A functionally equivalent query runs in 0.1 seconds:
> {noformat}
> --Query_time: 0.121296  Lock_time: 0.000156 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales'  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  group by "PARTITION_NAME";
> {noformat}
> If removing the partition list seems drastic, it's also possible to simply 
> specify the range, since Hive gets an ordered list of partition names. This 
> performs as well as the earlier query:
> {noformat}
> # Query_time: 0.143874  Lock_time: 0.000154 Rows_sent: 1836  Rows_examined: 
> 18360
> SET timestamp=1464014881;
> select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = 
> 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales'  and 
> "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>   and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 
> 'cs_sold_date_sk=2452654' 
> group by "PARTITION_NAME";
> {noformat}
> Another thing to check is the IN clause of column names. The columns in the 
> projection list of the Hive query are listed here. Not sure if statistics 

[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-19 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Attachment: HIVE-13995.5.patch

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch
>
>
> TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when 
> the query does not have a filter on the partition column, the generated 
> metastore queries have a large IN clause listing all the partition names. Most 
> RDBMS systems have trouble optimizing a large IN clause, and even when a good 
> index plan is chosen, comparing against 1800+ string values will not yield the 
> best execution time.
> When all partitions are chosen, not specifying the partition list and 
> filtering only on table and column name will generate the same result set, as 
> long as there are no concurrent modifications to the partition list of the 
> Hive table (adding/dropping partitions).
> For example, for TPCDS query18, the metastore query gathering partition column 
> statistics runs in 0.5 seconds in MySQL. The following is output from the 
> MySQL log:
> {noformat}
> -- Query_time: 0.482063  Lock_time: 0.003037 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales' 
>  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  and "PARTITION_NAME" in 
> ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654')
>  group by "PARTITION_NAME";
> {noformat}
> A functionally equivalent query runs in 0.1 seconds:
> {noformat}
> --Query_time: 0.121296  Lock_time: 0.000156 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales'  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  group by "PARTITION_NAME";
> {noformat}
> If removing the partition list seems drastic, it's also possible to simply 
> list the range, since Hive gets an ordered list of partition names. This 
> performs as well as the earlier query:
> {noformat}
> # Query_time: 0.143874  Lock_time: 0.000154 Rows_sent: 1836  Rows_examined: 
> 18360
> SET timestamp=1464014881;
> select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = 
> 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales'  and 
> "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>   and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 
> 'cs_sold_date_sk=2452654' 
> group by "PARTITION_NAME";
> {noformat}
> Another thing to check is the IN clause of column names. Columns in the 
> projection list of the Hive query are listed here. Not sure if statistics of 
> these columns are required for Hive query optimization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
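The range rewrite described in HIVE-13995 can be sketched outside Hive (illustrative Python, not the metastore code; `partition_predicate` and its threshold are assumptions): when every partition is selected, a sorted partition-name list collapses into a bounded range predicate instead of a huge IN list.

```python
# Illustrative sketch: build either a range predicate or an IN list for
# PARTITION_NAME. The range form is only equivalent to the IN form when
# the whole (string-ordered) partition list is selected.
def partition_predicate(partition_names, all_selected, max_in_list=1000):
    """Return a SQL predicate fragment for PARTITION_NAME."""
    names = sorted(partition_names)
    if all_selected or len(names) > max_in_list:
        # Bounded range over the sorted list: two comparisons instead of
        # 1800+ string equality checks.
        return ('"PARTITION_NAME" >= \'%s\' and "PARTITION_NAME" <= \'%s\''
                % (names[0], names[-1]))
    return '"PARTITION_NAME" in (%s)' % ', '.join("'%s'" % n for n in names)

pred = partition_predicate(
    ['cs_sold_date_sk=2450815', 'cs_sold_date_sk=2452654'], all_selected=True)
```

The `max_in_list` cutoff is a hypothetical tuning knob; the point is that a range predicate lets the RDBMS use an index range scan rather than comparing against every listed value.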


[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-19 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Attachment: (was: HIVE-13995.5.patch)

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch
>
>
> TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when 
> the query does not have a filter on the partition column, the generated 
> metastore queries have a large IN clause listing all the partition names. Most 
> RDBMS systems have trouble optimizing a large IN clause, and even when a good 
> index plan is chosen, comparing against 1800+ string values will not yield the 
> best execution time.
> When all partitions are chosen, not specifying the partition list and 
> filtering only on table and column name will generate the same result set, as 
> long as there are no concurrent modifications to the partition list of the 
> Hive table (adding/dropping partitions).
> For example, for TPCDS query18, the metastore query gathering partition column 
> statistics runs in 0.5 seconds in MySQL. The following is output from the 
> MySQL log:
> {noformat}
> -- Query_time: 0.482063  Lock_time: 0.003037 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales' 
>  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  and "PARTITION_NAME" in 
> ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654')
>  group by "PARTITION_NAME";
> {noformat}
> A functionally equivalent query runs in 0.1 seconds:
> {noformat}
> --Query_time: 0.121296  Lock_time: 0.000156 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales'  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  group by "PARTITION_NAME";
> {noformat}
> If removing the partition list seems drastic, it's also possible to simply 
> list the range, since Hive gets an ordered list of partition names. This 
> performs as well as the earlier query:
> {noformat}
> # Query_time: 0.143874  Lock_time: 0.000154 Rows_sent: 1836  Rows_examined: 
> 18360
> SET timestamp=1464014881;
> select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = 
> 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales'  and 
> "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>   and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 
> 'cs_sold_date_sk=2452654' 
> group by "PARTITION_NAME";
> {noformat}
> Another thing to check is the IN clause of column names. Columns in the 
> projection list of the Hive query are listed here. Not sure if statistics of 
> these columns are required for Hive query optimization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-12646) beeline and HIVE CLI do not parse ; in quote properly

2016-07-19 Thread Sahil Takiar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-12646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sahil Takiar updated HIVE-12646:

Attachment: HIVE-12646.4.patch

Updated patch to address review comments (details in RB).

> beeline and HIVE CLI do not parse ; in quote properly
> -
>
> Key: HIVE-12646
> URL: https://issues.apache.org/jira/browse/HIVE-12646
> Project: Hive
>  Issue Type: Bug
>  Components: CLI, Clients
>Reporter: Yongzhi Chen
>Assignee: Sahil Takiar
> Attachments: HIVE-12646.2.patch, HIVE-12646.3.patch, 
> HIVE-12646.4.patch, HIVE-12646.patch
>
>
> Beeline and the CLI have to escape ';' inside quotes, while most other shells 
> do not. For example:
> in Beeline:
> {noformat}
> 0: jdbc:hive2://localhost:1> select ';' from tlb1;
> select ';' from tlb1;
> 15/12/10 10:45:26 DEBUG TSaslTransport: writing data length: 115
> 15/12/10 10:45:26 DEBUG TSaslTransport: CLIENT: reading data length: 3403
> Error: Error while compiling statement: FAILED: ParseException line 1:8 
> cannot recognize input near '' '
> {noformat}
> while in mysql shell:
> {noformat}
> mysql> SELECT CONCAT(';', 'foo') FROM test limit 3;
> ++
> | ;foo   |
> | ;foo   |
> | ;foo   |
> ++
> 3 rows in set (0.00 sec)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
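The quote-aware splitting that HIVE-12646 asks for can be sketched as follows (a minimal illustration, not Beeline's actual parser; it ignores escaped quotes and comments): a ';' only terminates a statement when it appears outside single or double quotes.

```python
# Minimal sketch of quote-aware statement splitting: track whether we are
# inside a quoted string and only split on ';' when we are not.
def split_statements(buf):
    stmts, cur, quote = [], [], None
    for ch in buf:
        if quote:                      # inside a quoted string
            cur.append(ch)
            if ch == quote:            # closing quote
                quote = None
        elif ch in ("'", '"'):         # opening quote
            quote = ch
            cur.append(ch)
        elif ch == ';':                # statement terminator
            stmts.append(''.join(cur).strip())
            cur = []
        else:
            cur.append(ch)
    if ''.join(cur).strip():           # trailing statement without ';'
        stmts.append(''.join(cur).strip())
    return stmts
```

With this approach the failing example splits correctly: `select ';' from tlb1;` stays one statement because the ';' is quoted.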


[jira] [Commented] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out

2016-07-19 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384867#comment-15384867
 ] 

Chaoyu Tang commented on HIVE-14267:


+1.

> HS2 open_operations metrics not decremented when an operation gets timed out
> 
>
> Key: HIVE-14267
> URL: https://issues.apache.org/jira/browse/HIVE-14267
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: David Karoly
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-14267.2.patch, HIVE-14267.patch
>
>
> When an operation times out, it is removed from the handleToOperation hash 
> map in OperationManager.removeTimedOutOperation(). However, the 
> OPEN_OPERATIONS counter is not decremented.
> This can result in an inaccurate open-operations metric being reported. 
> Especially when submitting queries to Hive from Hue with the 
> close_queries=false option, this results in misleading HS2 metrics charts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
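The shape of the fix can be sketched like this (an assumed illustration in Python, not the actual Hive patch): the open-operations gauge must be decremented on every removal path, including the timeout path.

```python
# Sketch of the bug and fix: a counter that mirrors the map of open
# operations must be updated on *every* removal path.
class OperationManager:
    def __init__(self):
        self.handle_to_operation = {}
        self.open_operations = 0  # stands in for the OPEN_OPERATIONS metric

    def add_operation(self, handle, op):
        self.handle_to_operation[handle] = op
        self.open_operations += 1

    def remove_timed_out_operation(self, handle):
        op = self.handle_to_operation.pop(handle, None)
        if op is not None:
            # The reported bug: this decrement was missing on the timeout
            # path, so the metric drifted upward over time.
            self.open_operations -= 1
        return op

mgr = OperationManager()
mgr.add_operation("h1", object())
mgr.remove_timed_out_operation("h1")
```

The invariant to preserve is `open_operations == len(handle_to_operation)` after every mutation.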


[jira] [Commented] (HIVE-14205) Hive doesn't support union type with AVRO file format

2016-07-19 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384863#comment-15384863
 ] 

Chaoyu Tang commented on HIVE-14205:


I am not sure why the infrastructure could not apply this patch, but I was able 
to apply it on my local machine and also verified the fix. I wonder if it was 
caused by the binary Avro file. If so, maybe we can consider using INSERT 
instead of LOAD DATA to populate the test table?

> Hive doesn't support union type with AVRO file format
> -
>
> Key: HIVE-14205
> URL: https://issues.apache.org/jira/browse/HIVE-14205
> Project: Hive
>  Issue Type: Bug
>  Components: Serializers/Deserializers
>Reporter: Yibing Shi
>Assignee: Yibing Shi
> Attachments: HIVE-14205.1.patch, HIVE-14205.2.patch, 
> HIVE-14205.3.patch, HIVE-14205.4.patch
>
>
> Reproduce steps:
> {noformat}
> hive> CREATE TABLE avro_union_test
> > PARTITIONED BY (p int)
> > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
> > STORED AS INPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
> > OUTPUTFORMAT 
> 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
> > TBLPROPERTIES ('avro.schema.literal'='{
> >"type":"record",
> >"name":"nullUnionTest",
> >"fields":[
> >   {
> >  "name":"value",
> >  "type":[
> > "null",
> > "int",
> > "long"
> >  ],
> >  "default":null
> >   }
> >]
> > }');
> OK
> Time taken: 0.105 seconds
> hive> alter table avro_union_test add partition (p=1);
> OK
> Time taken: 0.093 seconds
> hive> select * from avro_union_test;
> FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: 
> Failed with exception Hive internal error inside 
> isAssignableFromSettablePrimitiveOI void not supported 
> yet.java.lang.RuntimeException: Hive internal error inside 
> isAssignableFromSettablePrimitiveOI void not supported yet.
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettablePrimitiveOI(ObjectInspectorUtils.java:1140)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettableOI(ObjectInspectorUtils.java:1149)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1187)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1220)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1200)
>   at 
> org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConvertedOI(ObjectInspectorConverters.java:219)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:581)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.initialize(FetchOperator.java:172)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:140)
>   at 
> org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:79)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:482)
>   at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:311)
>   at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1194)
>   at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1289)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1120)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1108)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:218)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:170)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:381)
>   at 
> org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:773)
>   at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:691)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:497)
>   at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {noformat}
> Another test case to show this problem is:
> {noformat}
> hive> create table avro_union_test2 (value uniontype) stored as 
> avro;
> OK
> Time taken: 0.053 seconds
> hive> show create table avro_union_test2;
> OK
> 

[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out

2016-07-19 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-14267:
-
Attachment: (was: HIVE-14267.2.patch)

> HS2 open_operations metrics not decremented when an operation gets timed out
> 
>
> Key: HIVE-14267
> URL: https://issues.apache.org/jira/browse/HIVE-14267
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: David Karoly
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-14267.patch
>
>
> When an operation times out, it is removed from the handleToOperation hash 
> map in OperationManager.removeTimedOutOperation(). However, the 
> OPEN_OPERATIONS counter is not decremented.
> This can result in an inaccurate open-operations metric being reported. 
> Especially when submitting queries to Hive from Hue with the 
> close_queries=false option, this results in misleading HS2 metrics charts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out

2016-07-19 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-14267:
-
Status: Patch Available  (was: Open)

> HS2 open_operations metrics not decremented when an operation gets timed out
> 
>
> Key: HIVE-14267
> URL: https://issues.apache.org/jira/browse/HIVE-14267
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: David Karoly
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-14267.2.patch, HIVE-14267.patch
>
>
> When an operation times out, it is removed from the handleToOperation hash 
> map in OperationManager.removeTimedOutOperation(). However, the 
> OPEN_OPERATIONS counter is not decremented.
> This can result in an inaccurate open-operations metric being reported. 
> Especially when submitting queries to Hive from Hue with the 
> close_queries=false option, this results in misleading HS2 metrics charts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out

2016-07-19 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-14267:
-
Attachment: HIVE-14267.2.patch

Patch isn't getting picked up for pre-commits. Re-attaching the same patch.

> HS2 open_operations metrics not decremented when an operation gets timed out
> 
>
> Key: HIVE-14267
> URL: https://issues.apache.org/jira/browse/HIVE-14267
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: David Karoly
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-14267.2.patch, HIVE-14267.patch
>
>
> When an operation times out, it is removed from the handleToOperation hash 
> map in OperationManager.removeTimedOutOperation(). However, the 
> OPEN_OPERATIONS counter is not decremented.
> This can result in an inaccurate open-operations metric being reported. 
> Especially when submitting queries to Hive from Hue with the 
> close_queries=false option, this results in misleading HS2 metrics charts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out

2016-07-19 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-14267:
-
Status: Open  (was: Patch Available)

> HS2 open_operations metrics not decremented when an operation gets timed out
> 
>
> Key: HIVE-14267
> URL: https://issues.apache.org/jira/browse/HIVE-14267
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: David Karoly
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-14267.2.patch, HIVE-14267.patch
>
>
> When an operation times out, it is removed from the handleToOperation hash 
> map in OperationManager.removeTimedOutOperation(). However, the 
> OPEN_OPERATIONS counter is not decremented.
> This can result in an inaccurate open-operations metric being reported. 
> Especially when submitting queries to Hive from Hue with the 
> close_queries=false option, this results in misleading HS2 metrics charts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14281) Issue in decimal multiplication

2016-07-19 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384793#comment-15384793
 ] 

Chaoyu Tang edited comment on HIVE-14281 at 7/19/16 8:24 PM:
-

Another use case is when we use a decimal with a smaller scale, such as decimal(38, 6):
{code}
create table test1 (a decimal(38, 6), b decimal(38, 6), c decimal(38, 6), d 
decimal(38, 6), e decimal(38, 6), f decimal(38, 6))
insert into test1 values (1.00, 2.00, 3.00, 4.00, 5.00, 
6.00);
hive> explain select a*b*c*d*e*f from test1;
OK
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
TableScan
  alias: test1
  Statistics: Num rows: 1 Data size: 53 Basic stats: COMPLETE Column 
stats: NONE
  Select Operator
expressions: (((((a * b) * c) * d) * e) * f) (type: decimal(38,36))
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 53 Basic stats: COMPLETE Column 
stats: NONE
ListSink

hive> select a*b*c*d*e*f from test1;
OK
NULL
{code}


was (Author: ctang.ma):
Another use case is when we use a decimal with a smaller scale, such as decimal(38, 6):
{code}
create table test1 (a decimal(38, 6), b decimal(38, 6), c decimal(38, 6), d 
decimal(38, 6), e decimal(38, 6), f decimal(38, 6))
insert into test1 values (1.00, 2.00, 3.00, 4.00, 5.00, 
6.00);
hive> explain select a*b*c*d*e*f from test1;
OK
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
TableScan
  alias: test1
  Statistics: Num rows: 1 Data size: 53 Basic stats: COMPLETE Column 
stats: NONE
  Select Operator
expressions: (((((a * b) * c) * d) * e) * f) (type: decimal(38,36))
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 53 Basic stats: COMPLETE Column 
stats: NONE
ListSink

hive> select a*b*c*d*e*f from test1;
OK
NULL
{code}

> Issue in decimal multiplication
> ---
>
> Key: HIVE-14281
> URL: https://issues.apache.org/jira/browse/HIVE-14281
> Project: Hive
>  Issue Type: Bug
>  Components: Types
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
>
> {code}
> CREATE TABLE test (a DECIMAL(38,18), b DECIMAL(38,18));
> INSERT OVERWRITE TABLE test VALUES (20, 20);
> SELECT a*b from test
> {code}
> The returned result is NULL (instead of 400).
> This is because Hive adds the scales of the operands, so the type for a*b is 
> set to decimal(38, 36). Hive cannot handle this case properly (e.g. by 
> rounding).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
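Why decimal(38,36) overflows, and one possible mitigation, can be sketched numerically (the capping rule below is an assumption for illustration, not Hive's actual behavior): 38 total digits with 36 after the decimal point leave only 2 integer digits, so 400 cannot be represented.

```python
# Sketch of result-type derivation for decimal multiplication. The naive
# rule adds the operand scales; the assumed fix caps the scale so a
# minimum number of integer digits always remains.
def multiply_result_type(p1, s1, p2, s2, max_precision=38, min_int_digits=3):
    precision = min(p1 + p2 + 1, max_precision)
    scale = s1 + s2
    if precision - scale < min_int_digits:
        # Not enough integer digits: give up some scale (i.e. round)
        # instead of returning NULL on overflow.
        scale = precision - min_int_digits
    return precision, scale

# Naive rule for (38,18) * (38,18) gives scale 36, leaving only 2 integer
# digits; the capped rule keeps the scale smaller.
p, s = multiply_result_type(38, 18, 38, 18)
```

`min_int_digits` is a hypothetical parameter; real systems differ in exactly how they trade precision for scale when the combined scale exceeds the maximum precision.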


[jira] [Commented] (HIVE-14281) Issue in decimal multiplication

2016-07-19 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384793#comment-15384793
 ] 

Chaoyu Tang commented on HIVE-14281:


Another use case is when we use a decimal with a smaller scale, such as decimal(38, 6):
{code}
create table test1 (a decimal(38, 6), b decimal(38, 6), c decimal(38, 6), d 
decimal(38, 6), e decimal(38, 6), f decimal(38, 6))
insert into test1 values (1.00, 2.00, 3.00, 4.00, 5.00, 
6.00);
hive> explain select a*b*c*d*e*f from test1;
OK
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
Fetch Operator
  limit: -1
  Processor Tree:
TableScan
  alias: test1
  Statistics: Num rows: 1 Data size: 53 Basic stats: COMPLETE Column 
stats: NONE
  Select Operator
expressions: (((((a * b) * c) * d) * e) * f) (type: decimal(38,36))
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 53 Basic stats: COMPLETE Column 
stats: NONE
ListSink

hive> select a*b*c*d*e*f from test1;
OK
NULL
{code}

> Issue in decimal multiplication
> ---
>
> Key: HIVE-14281
> URL: https://issues.apache.org/jira/browse/HIVE-14281
> Project: Hive
>  Issue Type: Bug
>  Components: Types
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
>
> {code}
> CREATE TABLE test (a DECIMAL(38,18), b DECIMAL(38,18));
> INSERT OVERWRITE TABLE test VALUES (20, 20);
> SELECT a*b from test
> {code}
> The returned result is NULL (instead of 400).
> This is because Hive adds the scales of the operands, so the type for a*b is 
> set to decimal(38, 36). Hive cannot handle this case properly (e.g. by 
> rounding).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14086) org.apache.hadoop.hive.metastore.api.Table does not return columns from Avro schema file

2016-07-19 Thread Lars Volker (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Volker updated HIVE-14086:
---
Attachment: avroremoved.json
avro.sql
avro.json

SQL to create table (avro.sql):
{noformat}
CREATE TABLE avro_table
  PARTITIONED BY (str_part STRING)
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES (
'avro.schema.url'='hdfs://localhost:20500/tmp/avro.json'
  );
{noformat}

avro.json:
{noformat}
{
"namespace": "com.cloudera.test",
"name": "avro_table",
"type": "record",
"fields": [
{ "name":"string1", "type":"string" },
{ "name":"CamelCol", "type":"string" }
]
}
{noformat}

avroremoved.json (one column removed from schema):
{noformat}
{
"namespace": "com.cloudera.test",
"name": "avro_table",
"type": "record",
"fields": [
{ "name":"string1", "type":"string" }
]
}
{noformat}

> org.apache.hadoop.hive.metastore.api.Table does not return columns from Avro 
> schema file
> 
>
> Key: HIVE-14086
> URL: https://issues.apache.org/jira/browse/HIVE-14086
> Project: Hive
>  Issue Type: Bug
>  Components: API
>Reporter: Lars Volker
> Attachments: avro.json, avro.sql, avroremoved.json
>
>
> Consider this table, using an external Avro schema file:
> {noformat}
> CREATE TABLE avro_table
>   PARTITIONED BY (str_part STRING)
>   ROW FORMAT SERDE
>   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>   STORED AS INPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>   OUTPUTFORMAT
>   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>   TBLPROPERTIES (
> 'avro.schema.url'='hdfs://localhost:20500/tmp/avro.json'
>   );
> {noformat}
> This will populate the "COLUMNS_V2" metastore table with the correct column 
> information (as per HIVE-6308). The columns of this table can then be queried 
> via the Hive API, for example by calling {{.getSd().getCols()}} on a 
> {{org.apache.hadoop.hive.metastore.api.Table}} object.
> Changes to the avro.schema.url file - either changing where it points to or 
> changing its contents - will be reflected in the output of {{describe 
> formatted avro_table}} *but not* in the result of the {{.getSd().getCols()}} 
> API call. Instead it looks like Hive only reads the Avro schema file 
> internally, but does not expose the information therein via its API.
> Is there a way to obtain the effective Table information via Hive? Would it 
> make sense to fix table retrieval so calls to {{get_table}} return the 
> correct set of columns?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
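The behavior the reporter of HIVE-14086 expects can be sketched as follows (illustrative Python; `effective_columns` and `read_file` are hypothetical helpers, not the metastore API): when `avro.schema.url` is set, the effective columns come from the schema file rather than from the snapshot stored in COLUMNS_V2.

```python
import json

# Illustrative sketch: resolve the effective column list of a table whose
# schema lives in an external Avro schema file.
def effective_columns(table_params, metastore_cols, read_file):
    url = table_params.get('avro.schema.url')
    if not url:
        return metastore_cols          # no external schema: use COLUMNS_V2
    schema = json.loads(read_file(url))  # e.g. the avro.json shown above
    return [f['name'] for f in schema.get('fields', [])]

# Simulate the "column removed from schema file" case: the metastore
# snapshot still lists both columns, but the file only has one.
avro_json = ('{"type":"record","name":"avro_table",'
             '"fields":[{"name":"string1","type":"string"}]}')
cols = effective_columns(
    {'avro.schema.url': 'hdfs://localhost:20500/tmp/avro.json'},
    ['string1', 'CamelCol'],
    lambda url: avro_json)
```

This is the divergence the ticket describes: `describe formatted` reflects the file, while `.getSd().getCols()` returns the stale snapshot.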


[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-19 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Attachment: (was: HIVE-13995.5.patch)

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch
>
>
> TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when 
> the query does not have a filter on the partition column, the generated 
> metastore queries have a large IN clause listing all the partition names. Most 
> RDBMS systems have trouble optimizing a large IN clause, and even when a good 
> index plan is chosen, comparing against 1800+ string values will not yield the 
> best execution time.
> When all partitions are chosen, not specifying the partition list and 
> filtering only on table and column name will generate the same result set, as 
> long as there are no concurrent modifications to the partition list of the 
> Hive table (adding/dropping partitions).
> For example, for TPCDS query18, the metastore query gathering partition column 
> statistics runs in 0.5 seconds in MySQL. The following is output from the 
> MySQL log:
> {noformat}
> -- Query_time: 0.482063  Lock_time: 0.003037 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales' 
>  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  and "PARTITION_NAME" in 
> ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654')
>  group by "PARTITION_NAME";
> {noformat}
> A functionally equivalent query runs in 0.1 seconds:
> {noformat}
> --Query_time: 0.121296  Lock_time: 0.000156 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales'  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  group by "PARTITION_NAME";
> {noformat}
> If removing the partition list seems drastic, it's also possible to simply 
> list the range, since Hive gets an ordered list of partition names. This 
> performs as well as the earlier query:
> {noformat}
> # Query_time: 0.143874  Lock_time: 0.000154 Rows_sent: 1836  Rows_examined: 
> 18360
> SET timestamp=1464014881;
> select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = 
> 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales'  and 
> "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>   and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 
> 'cs_sold_date_sk=2452654' 
> group by "PARTITION_NAME";
> {noformat}
> Another thing to check is the IN clause of column names. Columns in the 
> projection list of the Hive query are mentioned here. It is not clear whether 
> statistics for these columns are required for Hive query optimization.
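The range rewrite described above can be sketched as a small helper that takes the ordered partition-name list Hive already holds and emits a bounded range predicate instead of the large IN list. This is a hypothetical sketch; the class and method names are illustrative, not Hive's actual metastore code, and the rewrite is only equivalent to the IN list when every partition in the lexicographic range is selected.

```java
import java.util.List;

public class PartitionFilterSketch {
    // Given the sorted partition names Hive already holds, build a range
    // predicate over PARTITION_NAME instead of enumerating 1800+ values.
    // Only valid when the selected partitions form a contiguous sorted range.
    static String rangePredicate(List<String> sortedPartNames) {
        String first = sortedPartNames.get(0);
        String last = sortedPartNames.get(sortedPartNames.size() - 1);
        return "\"PARTITION_NAME\" >= '" + first
             + "' and \"PARTITION_NAME\" <= '" + last + "'";
    }

    public static void main(String[] args) {
        List<String> parts = List.of(
            "cs_sold_date_sk=2450815",
            "cs_sold_date_sk=2450816",
            "cs_sold_date_sk=2452654");
        // prints: "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and
        //         "PARTITION_NAME" <= 'cs_sold_date_sk=2452654'
        System.out.println(rangePredicate(parts));
    }
}
```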



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HIVE-14283) Beeline tests are broken

2016-07-19 Thread Vihang Karajgaonkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vihang Karajgaonkar resolved HIVE-14283.

Resolution: Not A Bug

This was an environment issue. The tests are working fine.

> Beeline tests are broken
> 
>
> Key: HIVE-14283
> URL: https://issues.apache.org/jira/browse/HIVE-14283
> Project: Hive
>  Issue Type: Bug
>  Components: Beeline
>Affects Versions: 2.2.0
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>
> The Beeline tests seem to be broken.
> {noformat}
> ---
>  T E S T S
> ---
> ---
>  T E S T S
> ---
> Running org.apache.hive.beeline.cli.TestHiveCli
> Tests run: 22, Failures: 22, Errors: 0, Skipped: 0, Time elapsed: 8.514 sec 
> <<< FAILURE! - in org.apache.hive.beeline.cli.TestHiveCli
> testSetPromptValue(org.apache.hive.beeline.cli.TestHiveCli)  Time elapsed: 
> 1.599 sec  <<< FAILURE!
> java.lang.AssertionError: Supported return code is 0 while the actual is 2
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73)
>   at 
> org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260)
>   at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283)
> testSourceCmd2(org.apache.hive.beeline.cli.TestHiveCli)  Time elapsed: 0.291 
> sec  <<< FAILURE!
> java.lang.AssertionError: Supported return code is 0 while the actual is 2
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73)
>   at 
> org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260)
>   at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283)
> testSourceCmd3(org.apache.hive.beeline.cli.TestHiveCli)  Time elapsed: 0.306 
> sec  <<< FAILURE!
> java.lang.AssertionError: Supported return code is 0 while the actual is 2
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73)
>   at 
> org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260)
>   at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283)
> testInvalidOptions2(org.apache.hive.beeline.cli.TestHiveCli)  Time elapsed: 
> 0.292 sec  <<< FAILURE!
> java.lang.AssertionError: Supported return code is 0 while the actual is 2
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73)
>   at 
> org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260)
>   at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283)
> testCmd(org.apache.hive.beeline.cli.TestHiveCli)  Time elapsed: 0.271 sec  
> <<< FAILURE!
> java.lang.AssertionError: Supported return code is 0 while the actual is 2
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73)
>   at 
> org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260)
>   at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283)
> testHelp(org.apache.hive.beeline.cli.TestHiveCli)  Time elapsed: 0.284 sec  
> <<< FAILURE!
> java.lang.AssertionError: Supported return code is 0 while the actual is 2
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73)
>   at 
> org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260)
>   at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283)
> testSourceCmd(org.apache.hive.beeline.cli.TestHiveCli)  Time elapsed: 0.259 
> sec  <<< FAILURE!
> java.lang.AssertionError: Supported return code is 0 while the actual is 2
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73)
>   at 
> org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260)
>   at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283)
> testSqlFromCmdWithDBName(org.apache.hive.beeline.cli.TestHiveCli)  Time 
> elapsed: 0.214 sec  <<< FAILURE!
> java.lang.AssertionError: Supported return code is 0 while the actual is 2
>   at org.junit.Assert.fail(Assert.java:88)
>   at 
> org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73)
>   at 
> org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260)
>   at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283)
> 

[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-19 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Attachment: HIVE-13995.5.patch

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch
>
>
> TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when 
> the query does not have a filter on the partition column, the generated 
> metastore queries have a large IN clause listing all the partition names. Most 
> RDBMS systems have trouble optimizing large IN clauses, and even when a good 
> index plan is chosen, comparing against 1800+ string values will not yield the 
> best execution time.
> When all partitions are chosen, not specifying the partition list and having 
> filters only on the table and column names will generate the same result set, 
> as long as there are no concurrent modifications (added/dropped partitions) to 
> the partition list of the Hive table.
> For example, for TPCDS query 18, the metastore query gathering partition column 
> statistics runs in 0.5 seconds in MySQL. The following is output from the MySQL log:
> {noformat}
> -- Query_time: 0.482063  Lock_time: 0.003037 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales' 
>  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  and "PARTITION_NAME" in 
> ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654')
>  group by "PARTITION_NAME";
> {noformat}
> A functionally equivalent query runs in 0.1 seconds:
> {noformat}
> --Query_time: 0.121296  Lock_time: 0.000156 Rows_sent: 1836  Rows_examined: 
> 18360
> select count("COLUMN_NAME") from "PART_COL_STATS"
>  where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 
> 'catalog_sales'  and "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>  group by "PARTITION_NAME";
> {noformat}
> If removing the partition list seems drastic, it's also possible to simply 
> list the range, since Hive gets an ordered list of partition names. This 
> performs as well as the earlier query:
> {noformat}
> # Query_time: 0.143874  Lock_time: 0.000154 Rows_sent: 1836  Rows_examined: 
> 18360
> SET timestamp=1464014881;
> select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = 
> 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales'  and 
> "COLUMN_NAME" in 
> ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit')
>   and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 
> 'cs_sold_date_sk=2452654' 
> group by "PARTITION_NAME";
> {noformat}
> Another thing to check is the IN clause of column names. Columns in the 
> projection list of the Hive query are mentioned here. It is not clear whether 
> statistics for these columns are required for Hive query optimization.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14281) Issue in decimal multiplication

2016-07-19 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384742#comment-15384742
 ] 

Chaoyu Tang commented on HIVE-14281:


For Java BigDecimal, there is a comment about rounding; I wonder if it can be 
used in Hive:
{code}
Before rounding, the scale of the logical exact intermediate result (e.g. 
multiplier.scale() + multiplicand.scale()) is the preferred scale for that 
operation (e.g. multiply). If the exact numerical result cannot be represented 
in precision digits, rounding selects the set of digits to return and the scale 
of the result is reduced from the scale of the intermediate result to the least 
scale which can represent the precision digits actually returned. If the exact 
result can be represented with at most precision digits, the representation of 
the result with the scale closest to the preferred scale is returned.
{code}
I checked MySQL, which supports a max precision of 65 and a max scale of 30:
{code}
create table decimaltest (col1 decimal(65,14), col2 decimal(65, 14));
insert into decimaltest values 
(987654321001234567890123456789012345678901234567890.12345678901234, 
10.12345678901234);
select col1 * col2 from decimaltest;
-- returns
-- 987654321001234567890123456789012345678901234567890123456789.0
{code}
It is hard to interpret this result: its precision is 73 (> the max of 65) and its 
scale is 9 (instead of 28). Yet its metadata in a JDBC application is decimal with 
precision 65 and scale 28.
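The BigDecimal rounding behavior quoted above can be demonstrated directly. This is a minimal sketch using operands analogous to Hive's decimal(38,18) (the 20 * 20 case from the issue description); the class name is illustrative.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class BigDecimalRoundingDemo {
    public static void main(String[] args) {
        // Operands with scale 18, analogous to Hive's decimal(38,18).
        BigDecimal a = new BigDecimal("20.000000000000000000");
        BigDecimal b = new BigDecimal("20.000000000000000000");

        // Exact product: the preferred scale is a.scale() + b.scale() = 36,
        // and the precision (39 digits) exceeds 38.
        BigDecimal exact = a.multiply(b);
        System.out.println(exact.scale() + " " + exact.precision()); // 36 39

        // With a 38-digit MathContext, BigDecimal rounds and reduces the
        // scale so the result still fits, rather than failing.
        BigDecimal rounded = a.multiply(b,
            new MathContext(38, RoundingMode.HALF_UP));
        System.out.println(rounded.precision() + " " + rounded.scale()); // 38 35
        System.out.println(rounded.compareTo(new BigDecimal("400")) == 0); // true
    }
}
```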

> Issue in decimal multiplication
> ---
>
> Key: HIVE-14281
> URL: https://issues.apache.org/jira/browse/HIVE-14281
> Project: Hive
>  Issue Type: Bug
>  Components: Types
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
>
> {code}
> CREATE TABLE test (a DECIMAL(38,18), b DECIMAL(38,18));
> INSERT OVERWRITE TABLE test VALUES (20, 20);
> SELECT a*b from test
> {code}
> The returned result is NULL (instead of 400).
> This is because Hive adds the scales of the operands, so the type of a*b is set 
> to decimal(38,36). Hive cannot handle this case properly (e.g. by 
> rounding).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14284) HiveAuthorizer: Pass HiveAuthzContext to grant/revoke/role apis as well

2016-07-19 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-14284:
-
Component/s: Security

> HiveAuthorizer: Pass HiveAuthzContext to grant/revoke/role apis as well
> ---
>
> Key: HIVE-14284
> URL: https://issues.apache.org/jira/browse/HIVE-14284
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, Security
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
>
> HiveAuthzContext provides useful information about the context of the 
> commands, such as the command string and IP address. However, 
> this is available only to the checkPrivileges and filterListCmdObjects API calls.
> It should be made available to other API calls as well, such as the grant/revoke 
> methods and role management methods.
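The proposed API shape can be sketched roughly as follows. This is a hypothetical, simplified stand-in: the types and method names are illustrative, not Hive's actual HiveAuthorizer interfaces.

```java
public class AuthzContextSketch {
    // Simplified stand-in for HiveAuthzContext: carries the command string
    // and caller address that plugins want for auditing.
    record AuthzContext(String commandString, String ipAddress) {}

    // Hypothetical authorizer: grant now receives the context,
    // mirroring what checkPrivileges already gets.
    interface Authorizer {
        void checkPrivileges(String op, AuthzContext ctx);
        void grantPrivilege(String principal, String priv, AuthzContext ctx);
    }

    public static void main(String[] args) {
        AuthzContext ctx =
            new AuthzContext("GRANT SELECT ON t TO user1", "10.0.0.5");
        Authorizer audit = new Authorizer() {
            public void checkPrivileges(String op, AuthzContext c) {
                System.out.println("check " + op + " from " + c.ipAddress());
            }
            public void grantPrivilege(String p, String priv, AuthzContext c) {
                // The context lets a plugin log who issued the grant and
                // from which address.
                System.out.println("grant " + priv + " to " + p
                    + " from " + c.ipAddress());
            }
        };
        audit.grantPrivilege("user1", "SELECT", ctx);
    }
}
```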



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14284) HiveAuthorizer: Pass HiveAuthzContext to grant/revoke/role apis as well

2016-07-19 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-14284:
-
Component/s: Authorization

> HiveAuthorizer: Pass HiveAuthzContext to grant/revoke/role apis as well
> ---
>
> Key: HIVE-14284
> URL: https://issues.apache.org/jira/browse/HIVE-14284
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization, Security
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-19 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Status: Open  (was: Patch Available)

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time

2016-07-19 Thread Hari Sankar Sivarama Subramaniyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sankar Sivarama Subramaniyan updated HIVE-13995:
-
Status: Patch Available  (was: Open)

> Hive generates inefficient metastore queries for TPCDS tables with 1800+ 
> partitions leading to higher compile time
> --
>
> Key: HIVE-13995
> URL: https://issues.apache.org/jira/browse/HIVE-13995
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Nita Dembla
>Assignee: Hari Sankar Sivarama Subramaniyan
> Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, 
> HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out

2016-07-19 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-14267:
-
Status: Patch Available  (was: Open)

> HS2 open_operations metrics not decremented when an operation gets timed out
> 
>
> Key: HIVE-14267
> URL: https://issues.apache.org/jira/browse/HIVE-14267
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: David Karoly
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-14267.2.patch, HIVE-14267.patch
>
>
> When an operation times out, it is removed from the handleToOperation hash 
> map in OperationManager.removeTimedOutOperation(); however, the OPEN_OPERATIONS 
> counter is not decremented.
> This can result in an inaccurate open-operations metric being reported, 
> especially when submitting queries to Hive from Hue with the 
> close_queries=false option, which results in misleading HS2 metrics charts.
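The fix implied by the report can be sketched as follows. This is a hypothetical, simplified stand-in for OperationManager; the names and counter type are illustrative, not the actual HiveServer2 code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class OperationManagerSketch {
    private final Map<String, Object> handleToOperation = new ConcurrentHashMap<>();
    private final AtomicLong openOperations = new AtomicLong();

    void addOperation(String handle, Object op) {
        handleToOperation.put(handle, op);
        openOperations.incrementAndGet();
    }

    Object removeTimedOutOperation(String handle) {
        Object op = handleToOperation.remove(handle);
        if (op != null) {
            // The missing step in the report: keep the open-operations
            // gauge in sync when an operation is removed by timeout,
            // just as close() would.
            openOperations.decrementAndGet();
        }
        return op;
    }

    long openCount() { return openOperations.get(); }

    public static void main(String[] args) {
        OperationManagerSketch mgr = new OperationManagerSketch();
        mgr.addOperation("op1", new Object());
        mgr.addOperation("op2", new Object());
        mgr.removeTimedOutOperation("op1");
        System.out.println(mgr.openCount()); // prints 1
    }
}
```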



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out

2016-07-19 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-14267:
-
Attachment: HIVE-14267.2.patch

Attaching a new patch based on the input from RB.

> HS2 open_operations metrics not decremented when an operation gets timed out
> 
>
> Key: HIVE-14267
> URL: https://issues.apache.org/jira/browse/HIVE-14267
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: David Karoly
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-14267.2.patch, HIVE-14267.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out

2016-07-19 Thread Naveen Gangam (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Naveen Gangam updated HIVE-14267:
-
Status: Open  (was: Patch Available)

> HS2 open_operations metrics not decremented when an operation gets timed out
> ----------------------------------------------------------------------------
>
> Key: HIVE-14267
> URL: https://issues.apache.org/jira/browse/HIVE-14267
> Project: Hive
>  Issue Type: Bug
>  Components: HiveServer2
>Reporter: David Karoly
>Assignee: Naveen Gangam
>Priority: Minor
> Attachments: HIVE-14267.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14229) the jars in hive.aux.jar.paths are not added to session classpath

2016-07-19 Thread Mohit Sabharwal (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384632#comment-15384632
 ] 

Mohit Sabharwal commented on HIVE-14229:


+1

> the jars in hive.aux.jar.paths are not added to session classpath 
> --
>
> Key: HIVE-14229
> URL: https://issues.apache.org/jira/browse/HIVE-14229
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14229.1.patch
>
>
> The jars in hive.reloadable.aux.jar.paths are being added to the HiveServer2 
> classpath, while those in hive.aux.jar.paths are not. 
> A local task such as 'select udf(x) from src' will then fail to find the 
> needed UDF class.
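The direction of the fix is to add the jars from both properties, not just the reloadable ones, when the session classpath is built. A toy sketch of that merge (Python, illustrative only — the property names come from the report above, the function itself is hypothetical):

```python
def build_session_classpath(aux_jar_paths, reloadable_jar_paths):
    """Merge both jar lists into one deduplicated classpath string.

    Illustrative sketch: Hive's real logic lives in its Java session-state
    code; this only shows that both lists must contribute entries.
    """
    seen = []
    for path in list(aux_jar_paths) + list(reloadable_jar_paths):
        path = path.strip()
        if path and path not in seen:  # skip blanks and duplicates
            seen.append(path)
    return ":".join(seen)
```

For example, merging `["/u/udfs.jar"]` with `["/u/extra.jar", "/u/udfs.jar"]` yields a classpath containing each jar exactly once.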



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14277) Disable StatsOptimizer for all ACID tables

2016-07-19 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14277:
---
Affects Version/s: 2.0.0
   2.1.0

> Disable StatsOptimizer for all ACID tables
> --
>
> Key: HIVE-14277
> URL: https://issues.apache.org/jira/browse/HIVE-14277
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14277.01.patch
>
>
> We have observed many cases where an ACID table is created for HCat 
> streaming. Streaming inserts data directly into the table, but the table's 
> stats are not updated (and there is no good way to update them). We would 
> like to disable StatsOptimizer for all ACID tables so that it at least 
> does not return wrong results.
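The guard amounts to: never answer a query purely from stored stats when the table is transactional. A hedged sketch of that check (Python, hypothetical names; Hive's real StatsOptimizer is Java):

```python
def answer_from_stats(table_props, stats):
    """Return a stats-only answer for a count(*)-style query, or None to
    force a real scan.

    Sketch only: 'transactional' mirrors the table property that marks a
    table as ACID; the surrounding optimizer plumbing is not shown.
    """
    if table_props.get("transactional") == "true":
        # Streaming ingest can leave stats stale on ACID tables, so a
        # stats-only answer may be wrong: fall back to scanning the data.
        return None
    return stats.get("numRows")
```

A non-ACID table still gets the fast stats-only answer; an ACID table always falls through to a scan, trading speed for correctness.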



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14277) Disable StatsOptimizer for all ACID tables

2016-07-19 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14277:
---
Fix Version/s: 2.1.1
   2.2.0

> Disable StatsOptimizer for all ACID tables
> --
>
> Key: HIVE-14277
> URL: https://issues.apache.org/jira/browse/HIVE-14277
> Project: Hive
>  Issue Type: Sub-task
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.2.0, 2.1.1
>
> Attachments: HIVE-14277.01.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14277) Disable StatsOptimizer for all ACID tables

2016-07-19 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14277:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Pushed to master. Thanks [~ashutoshc] for the review.

> Disable StatsOptimizer for all ACID tables
> --
>
> Key: HIVE-14277
> URL: https://issues.apache.org/jira/browse/HIVE-14277
> Project: Hive
>  Issue Type: Sub-task
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Attachments: HIVE-14277.01.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14221) set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER

2016-07-19 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14221:
---
Affects Version/s: 2.0.0

> set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER
> ----------------------------------------------------------------------------
>
> Key: HIVE-14221
> URL: https://issues.apache.org/jira/browse/HIVE-14221
> Project: Hive
>  Issue Type: Sub-task
>  Components: Security
>Affects Versions: 2.0.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0, 2.2.0
>
> Attachments: HIVE-14221.01.patch, HIVE-14221.02.patch, 
> HIVE-14221.03.patch, HIVE-14221.04.patch, HIVE-14221.05.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14221) set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER

2016-07-19 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14221:
---
Fix Version/s: 2.2.0

> set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER
> ----------------------------------------------------------------------------
>
> Key: HIVE-14221
> URL: https://issues.apache.org/jira/browse/HIVE-14221
> Project: Hive
>  Issue Type: Sub-task
>  Components: Security
>Affects Versions: 2.0.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0, 2.2.0
>
> Attachments: HIVE-14221.01.patch, HIVE-14221.02.patch, 
> HIVE-14221.03.patch, HIVE-14221.04.patch, HIVE-14221.05.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14221) set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER

2016-07-19 Thread Pengcheng Xiong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pengcheng Xiong updated HIVE-14221:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

Pushed to master. Thanks [~ashutoshc] for the review.

> set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER
> ----------------------------------------------------------------------------
>
> Key: HIVE-14221
> URL: https://issues.apache.org/jira/browse/HIVE-14221
> Project: Hive
>  Issue Type: Sub-task
>  Components: Security
>Affects Versions: 2.0.0
>Reporter: Pengcheng Xiong
>Assignee: Pengcheng Xiong
> Fix For: 2.1.0
>
> Attachments: HIVE-14221.01.patch, HIVE-14221.02.patch, 
> HIVE-14221.03.patch, HIVE-14221.04.patch, HIVE-14221.05.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14242) Backport ORC-53 to Hive

2016-07-19 Thread Prasanth Jayachandran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384589#comment-15384589
 ] 

Prasanth Jayachandran commented on HIVE-14242:
--

+1

> Backport ORC-53 to Hive
> ---
>
> Key: HIVE-14242
> URL: https://issues.apache.org/jira/browse/HIVE-14242
> Project: Hive
>  Issue Type: Bug
>  Components: ORC
>Reporter: Owen O'Malley
>Assignee: Owen O'Malley
> Attachments: HIVE-14242.patch
>
>
> ORC-53 was mostly about the mapreduce shims for ORC, but it fixed a problem 
> in TypeDescription that should be backported to Hive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HIVE-14224) LLAP rename query specific log files once a query is complete

2016-07-19 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384575#comment-15384575
 ] 

Hive QA commented on HIVE-14224:




Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12818775/HIVE-14224.04.patch

{color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10321 tests 
executed
*Failed tests:*
{noformat}
TestMiniTezCliDriver-acid_globallimit.q-cte_mat_1.q-union5.q-and-12-more - did 
not produce a TEST-*.xml file
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions
org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken
org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testConnections
{noformat}

Test results: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/578/testReport
Console output: 
https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/578/console
Test logs: 
http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-578/

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.TestCheckPhase
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 7 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12818775 - PreCommit-HIVE-MASTER-Build

> LLAP rename query specific log files once a query is complete
> -
>
> Key: HIVE-14224
> URL: https://issues.apache.org/jira/browse/HIVE-14224
> Project: Hive
>  Issue Type: Improvement
>Reporter: Siddharth Seth
>Assignee: Siddharth Seth
> Attachments: HIVE-14224.02.patch, HIVE-14224.03.patch, 
> HIVE-14224.04.patch, HIVE-14224.wip.01.patch
>
>
> Once a query is complete, rename the query specific log file so that YARN can 
> aggregate the logs (once it's configured to do so).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14282) HCatLoader ToDate() exception with hive partition table ,partitioned by column of DATE datatype

2016-07-19 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-14282:
--
Fix Version/s: 2.1.1
   2.2.0
   1.3.0
   Status: Patch Available  (was: Open)

> HCatLoader ToDate() exception with hive partition table ,partitioned by 
> column of DATE datatype
> ---
>
> Key: HIVE-14282
> URL: https://issues.apache.org/jira/browse/HIVE-14282
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1
> Environment: PIG Version : (0.15.0) 
> HIVE : 1.2.1
> OS Version : CentOS release 6.7 (Final)
> OS Kernel : 2.6.32-573.18.1.el6.x86_64
>Reporter: Raghavender Rao Guruvannagari
>Assignee: Daniel Dai
> Fix For: 1.3.0, 2.2.0, 2.1.1
>
> Attachments: HIVE-14282.1.patch
>
>
> The ToDate() function doesn't work with a table partitioned by a column of 
> the DATE datatype.
> Below are the steps I followed to reproduce the problem.
> -->Sample input file to hive table :
> hdfs@testhost ~$ cat test.log 
> 2012-06-13,16:11:17,574,140.134.127.109,SearchPage,Google.com,Win8,5,HTC
> 2012-06-13,16:11:17,466,43.176.108.158,Electronics,Google.com,Win8,3,iPhone
> 2012-06-13,16:11:17,501,97.73.102.79,Appliances,Google.com,Android,4,iPhone
> 2012-06-13,16:11:17,469,166.98.157.122,Recommendations,Google.com,Win8,5,HTC
> 2012-06-13,16:11:17,557,36.159.147.50,Sporting,Google.com,Win8,3,Samsung
> 2012-06-13,16:11:17,449,128.215.122.234,ShoppingCart,Google.com,Win8,5,HTC
> 2012-06-13,16:11:17,502,46.81.131.92,Electronics,Google.com,Android,5,Samsung
> 2012-06-13,16:11:17,554,120.187.105.127,Automotive,Google.com,Win8,5,HTC
> 2012-06-13,16:11:17,447,127.94.64.59,DetailPage,Google.com,Win8,3,Samsung
> 2012-06-13,16:11:17,490,132.54.25.75,ShoppingCart,Google.com,Win8,3,iPhone
> 2012-06-13,16:11:17,578,79.201.53.179,Automotive,Google.com,Win8,5,Samsung
> 2012-06-13,16:11:17,435,158.106.164.38,HomePage,Google.com,Web,5,Chrome
> 2012-06-13,16:11:17,523,17.131.82.171,Recommendations,Google.com,Web,3,IE9
> 2012-06-13,16:11:17,575,178.95.126.105,Appliances,Google.com,iOS,3,iPhone
> 2012-06-13,16:11:17,468,225.143.39.176,SearchPage,Google.com,iOS,5,HTC
> 2012-06-13,16:11:17,511,43.103.102.147,ShoppingCart,Google.com,iOS,5,Samsung
> --> Copied to hdfs directory:
> hdfs@testhost ~$ hdfs dfs -put -f test.log /user/hdfs/
> -->Create a partitioned table (partitioned by a DATE column) in Hive:
> 0: jdbc:hive2://hdp2.raghav.com:1/default> create table mytable(Dt 
> DATE,Time STRING,Number INT,IPAddr STRING,Type STRING,Site STRING,OSType 
> STRING,Visit INT,PhModel STRING) row format delimited fields terminated by 
> ',' stored as textfile;
> 0: jdbc:hive2://testhost.com:1/default> load data inpath 
> '/user/hdfs/test.log' overwrite into table mytable;
> 0: jdbc:hive2://testhost..com:1/default> SET hive.exec.dynamic.partition 
> = true;
> 0: jdbc:hive2://testhost.com:1/default> SET 
> hive.exec.dynamic.partition.mode = nonstrict;
> 0: jdbc:hive2://testhost.com:1/default> create table partmytable(Number 
> INT,IPAddr STRING,Type STRING,Site STRING,OSType STRING,Visit INT,PhModel 
> STRING) partitioned by (Dt DATE,Time STRING) row format delimited fields 
> terminated by ',' stored as textfile;
> 0: jdbc:hive2://testhost.com:1/default> insert overwrite table 
> partmytable partition(Dt,Time) select 
> Number,IPAddr,Type,Site,OSType,Visit,PhModel,Dt,Time from mytable;
> 0: jdbc:hive2://hdp2.raghav.com:1/default> describe partmytable;
> --> Try to filter with the ToDate function, which fails with an error:
> hdfs@testhost ~$ pig -useHCatalog
> grunt>
> grunt> temp = LOAD 'partmytable' using 
> org.apache.hive.hcatalog.pig.HCatLoader();
> grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','yyyy-MM-dd');
> grunt> dump temp1;
> -->Filtering the non-partitioned table with the same statement works:
> grunt>
> grunt> temp = LOAD 'mytable' using org.apache.hive.hcatalog.pig.HCatLoader();
> grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','yyyy-MM-dd');
> grunt> dump temp1;
> Workaround:
> Use the statement below instead of ToDate() directly:
> grunt> temp1 = FILTER temp by DaysBetween(dt,(datetime)ToDate('2012-06-13', 
> 'yyyy-MM-dd')) >=(long)0 AND DaysBetween(dt,(datetime)ToDate('2012-06-13', 
> 'yyyy-MM-dd')) <=(long)0;
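The failure comes down to a type mismatch: the partition column's value and the datetime that ToDate() returns do not compare equal directly. The idea behind a fix can be sketched by normalizing both sides to a plain date before comparing (Python sketch with hypothetical names, not HCatLoader's actual code):

```python
from datetime import date, datetime


def dates_equal(partition_value, filter_value):
    """Compare a partition value against a filter value after normalizing
    both to a plain date.

    Sketch of the idea only; the real fix lives in HCatLoader's Java
    predicate handling.
    """
    def to_date(value):
        if isinstance(value, datetime):  # check datetime before date:
            return value.date()          # datetime is a subclass of date
        if isinstance(value, date):
            return value
        # Strings such as a partition key '2012-06-13'
        return datetime.strptime(str(value), "%Y-%m-%d").date()

    return to_date(partition_value) == to_date(filter_value)
```

This makes a string partition key like '2012-06-13' compare equal to the midnight datetime a ToDate-style call produces, which is what the DaysBetween workaround above achieves indirectly.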



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used

2016-07-19 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384559#comment-15384559
 ] 

Tao Li edited comment on HIVE-14170 at 7/19/16 5:38 PM:


[~stakiar] Another thought: we could improve the "buffered page" mode to 
avoid the OOM issue. For example, we could iterate through the whole result set 
once to calculate the max column widths (without loading the result set into 
memory), then iterate it again to print. The pro is that this requires minimal 
code change; the con is higher latency, since we iterate the result set twice. 



> Beeline IncrementalRows should buffer rows and incrementally re-calculate 
> width if TableOutputFormat is used
> ----------------------------------------------------------------------------
>
> Key: HIVE-14170
> URL: https://issues.apache.org/jira/browse/HIVE-14170
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch
>
>
> If {{--incremental}} is specified in Beeline, rows are meant to be printed 
> out immediately. However, if {{TableOutputFormat}} is used with this option 
> the formatting can look really off.
> The reason is that {{IncrementalRows}} does not do a global calculation of 
> the optimal width size for {{TableOutputFormat}} (it can't because it only 
> sees one row at a time). The output of {{BufferedRows}} looks much better 
> because it can do this global calculation.
> If {{--incremental}} and {{TableOutputFormat}} are both used, the width 
> should be re-calculated every x rows (x can be made configurable, with a 
> default of 1000).
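The proposal above can be sketched as chunked rendering: buffer x rows, compute column widths over that chunk, print, and repeat (Python sketch; Beeline's real implementation is Java, and the names here are hypothetical):

```python
def render_in_chunks(rows, chunk_size=1000):
    """Format rows chunk by chunk, recomputing column widths per chunk.

    Sketch of the proposed middle ground between fully buffered output
    (best alignment, unbounded memory) and row-at-a-time output
    (bounded memory, poor alignment).
    """
    out = []
    for start in range(0, len(rows), chunk_size):
        chunk = rows[start:start + chunk_size]
        # Width of each column over this chunk only: memory stays bounded
        # by chunk_size, yet columns line up within every chunk.
        widths = [max(len(str(row[c])) for row in chunk)
                  for c in range(len(chunk[0]))]
        for row in chunk:
            out.append(" | ".join(str(v).ljust(w)
                                  for v, w in zip(row, widths)))
    return out
```

With a chunk size of 1000 this behaves like the proposal: widths are re-derived every 1000 rows, so a late very-wide value only disturbs alignment from its own chunk onward.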



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14282) HCatLoader ToDate() exception with hive partition table ,partitioned by column of DATE datatype

2016-07-19 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-14282:
--
Attachment: HIVE-14282.1.patch

> HCatLoader ToDate() exception with hive partition table ,partitioned by 
> column of DATE datatype
> ---
>
> Key: HIVE-14282
> URL: https://issues.apache.org/jira/browse/HIVE-14282
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1
> Environment: PIG Version : (0.15.0) 
> HIVE : 1.2.1
> OS Version : CentOS release 6.7 (Final)
> OS Kernel : 2.6.32-573.18.1.el6.x86_64
>Reporter: Raghavender Rao Guruvannagari
>Assignee: Daniel Dai
> Attachments: HIVE-14282.1.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14282) HCatLoader ToDate() exception with hive partition table ,partitioned by column of DATE datatype

2016-07-19 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-14282:
--
Component/s: HCatalog

> HCatLoader ToDate() exception with hive partition table ,partitioned by 
> column of DATE datatype
> ---
>
> Key: HIVE-14282
> URL: https://issues.apache.org/jira/browse/HIVE-14282
> Project: Hive
>  Issue Type: Bug
>  Components: HCatalog
>Affects Versions: 1.2.1
> Environment: PIG Version : (0.15.0) 
> HIVE : 1.2.1
> OS Version : CentOS release 6.7 (Final)
> OS Kernel : 2.6.32-573.18.1.el6.x86_64
>Reporter: Raghavender Rao Guruvannagari
>Assignee: Daniel Dai
> Attachments: HIVE-14282.1.patch
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HIVE-14282) HCatLoader ToDate() exception with hive partition table ,partitioned by column of DATE datatype

2016-07-19 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated HIVE-14282:
--
Summary: HCatLoader ToDate() exception with hive partition table 
,partitioned by column of DATE datatype  (was: Pig ToDate() exception with hive 
partition table ,partitioned by column of DATE datatype)

> HCatLoader ToDate() exception with hive partition table ,partitioned by 
> column of DATE datatype
> ---
>
> Key: HIVE-14282
> URL: https://issues.apache.org/jira/browse/HIVE-14282
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
> Environment: PIG Version : (0.15.0) 
> HIVE : 1.2.1
> OS Version : CentOS release 6.7 (Final)
> OS Kernel : 2.6.32-573.18.1.el6.x86_64
>Reporter: Raghavender Rao Guruvannagari
>Assignee: Daniel Dai
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HIVE-14282) Pig ToDate() exception with hive partition table ,partitioned by column of DATE datatype

2016-07-19 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai reassigned HIVE-14282:
-

Assignee: Daniel Dai

> Pig ToDate() exception with hive partition table ,partitioned by column of 
> DATE datatype
> ----------------------------------------------------------------------------
>
> Key: HIVE-14282
> URL: https://issues.apache.org/jira/browse/HIVE-14282
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
> Environment: PIG Version : (0.15.0) 
> HIVE : 1.2.1
> OS Version : CentOS release 6.7 (Final)
> OS Kernel : 2.6.32-573.18.1.el6.x86_64
>Reporter: Raghavender Rao Guruvannagari
>Assignee: Daniel Dai
>
> The ToDate() function doesn't work with a partitioned table that is 
> partitioned by a column of DATE data type.
> Below are the steps I followed to reproduce the problem.
> --> Sample input file for the Hive table:
> hdfs@testhost ~$ cat test.log 
> 2012-06-13,16:11:17,574,140.134.127.109,SearchPage,Google.com,Win8,5,HTC
> 2012-06-13,16:11:17,466,43.176.108.158,Electronics,Google.com,Win8,3,iPhone
> 2012-06-13,16:11:17,501,97.73.102.79,Appliances,Google.com,Android,4,iPhone
> 2012-06-13,16:11:17,469,166.98.157.122,Recommendations,Google.com,Win8,5,HTC
> 2012-06-13,16:11:17,557,36.159.147.50,Sporting,Google.com,Win8,3,Samsung
> 2012-06-13,16:11:17,449,128.215.122.234,ShoppingCart,Google.com,Win8,5,HTC
> 2012-06-13,16:11:17,502,46.81.131.92,Electronics,Google.com,Android,5,Samsung
> 2012-06-13,16:11:17,554,120.187.105.127,Automotive,Google.com,Win8,5,HTC
> 2012-06-13,16:11:17,447,127.94.64.59,DetailPage,Google.com,Win8,3,Samsung
> 2012-06-13,16:11:17,490,132.54.25.75,ShoppingCart,Google.com,Win8,3,iPhone
> 2012-06-13,16:11:17,578,79.201.53.179,Automotive,Google.com,Win8,5,Samsung
> 2012-06-13,16:11:17,435,158.106.164.38,HomePage,Google.com,Web,5,Chrome
> 2012-06-13,16:11:17,523,17.131.82.171,Recommendations,Google.com,Web,3,IE9
> 2012-06-13,16:11:17,575,178.95.126.105,Appliances,Google.com,iOS,3,iPhone
> 2012-06-13,16:11:17,468,225.143.39.176,SearchPage,Google.com,iOS,5,HTC
> 2012-06-13,16:11:17,511,43.103.102.147,ShoppingCart,Google.com,iOS,5,Samsung
> --> Copied to an HDFS directory:
> hdfs@testhost ~$ hdfs dfs -put -f test.log /user/hdfs/
> --> Create a partitioned table (partitioned by a DATE column) in Hive:
> 0: jdbc:hive2://hdp2.raghav.com:1/default> create table mytable(Dt 
> DATE,Time STRING,Number INT,IPAddr STRING,Type STRING,Site STRING,OSType 
> STRING,Visit INT,PhModel STRING) row format delimited fields terminated by 
> ',' stored as textfile;
> 0: jdbc:hive2://testhost.com:1/default> load data inpath 
> '/user/hdfs/test.log' overwrite into table mytable;
> 0: jdbc:hive2://testhost.com:1/default> SET hive.exec.dynamic.partition 
> = true;
> 0: jdbc:hive2://testhost.com:1/default> SET 
> hive.exec.dynamic.partition.mode = nonstrict;
> 0: jdbc:hive2://testhost.com:1/default> create table partmytable(Number 
> INT,IPAddr STRING,Type STRING,Site STRING,OSType STRING,Visit INT,PhModel 
> STRING) partitioned by (Dt DATE,Time STRING) row format delimited fields 
> terminated by ',' stored as textfile;
> 0: jdbc:hive2://testhost.com:1/default> insert overwrite table 
> partmytable partition(Dt,Time) select 
> Number,IPAddr,Type,Site,OSType,Visit,PhModel,Dt,Time from mytable;
> 0: jdbc:hive2://hdp2.raghav.com:1/default> describe partmytable;
> --> Trying to filter with the ToDate function fails with an error:
> hdfs@testhost ~$ pig -useHCatalog
> grunt>
> grunt> temp = LOAD 'partmytable' using 
> org.apache.hive.hcatalog.pig.HCatLoader();
> grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','yyyy-MM-dd');
> grunt> dump temp1;
> --> The same filter statement works on the non-partitioned table:
> grunt>
> grunt> temp = LOAD 'mytable' using org.apache.hive.hcatalog.pig.HCatLoader();
> grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','yyyy-MM-dd');
> grunt> dump temp1;
> Workaround:
> Use the statement below instead of calling ToDate() directly:
> grunt> temp1 = FILTER temp5 by DaysBetween(dt,(datetime)ToDate('2012-06-13', 
> 'yyyy-MM-dd')) >=(long)0 AND DaysBetween(dt,(datetime)ToDate('2012-06-13', 
> 'yyyy-MM-dd')) <=(long)0;
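The workaround expresses date equality as a pair of DaysBetween bounds. A minimal Python sketch of why the two formulations select the same rows (the helper below merely mirrors Pig's DaysBetween semantics; it is an illustration, not Hive or Pig code):

```python
from datetime import date

def days_between(d1: date, d2: date) -> int:
    # Mirrors Pig's DaysBetween(d1, d2): whole days from d2 to d1.
    return (d1 - d2).days

target = date(2012, 6, 13)

for row_date in (date(2012, 6, 12), date(2012, 6, 13), date(2012, 6, 14)):
    direct = (row_date == target)
    # The workaround: equality rewritten as 0 <= DaysBetween(...) <= 0.
    bounded = 0 <= days_between(row_date, target) <= 0
    assert direct == bounded
```

The rewritten predicate presumably works because it is no longer a plain equality on the DATE partition column that HCatLoader tries, and fails, to push down.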





[jira] [Commented] (HIVE-14254) Correct the hive version by changing "svn" to "git"

2016-07-19 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384561#comment-15384561
 ] 

Tao Li commented on HIVE-14254:
---

Thanks [~spena] for your help!

> Correct the hive version by changing "svn" to "git"
> ---
>
> Key: HIVE-14254
> URL: https://issues.apache.org/jira/browse/HIVE-14254
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.1.0
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-14254.1.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> When running "hive --version", "Subversion" is displayed (as shown below), 
> but it should say "git".
> $ hive --version
> Hive 2.1.0-SNAPSHOT
> Subversion git://





[jira] [Updated] (HIVE-14282) Pig ToDate() exception with hive partition table ,partitioned by column of DATE datatype

2016-07-19 Thread Raghavender Rao Guruvannagari (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghavender Rao Guruvannagari updated HIVE-14282:
-
Affects Version/s: (was: 0.15.0)
   1.2.1
  Environment: 
PIG Version : (0.15.0) 
HIVE : 1.2.1
OS Version : CentOS release 6.7 (Final)
OS Kernel : 2.6.32-573.18.1.el6.x86_64

  was:
PIG Version : (0.15.0) 
OS Version : CentOS release 6.7 (Final)
OS Kernel : 2.6.32-573.18.1.el6.x86_64


> Pig ToDate() exception with hive partition table ,partitioned by column of 
> DATE datatype
> 
>
> Key: HIVE-14282
> URL: https://issues.apache.org/jira/browse/HIVE-14282
> Project: Hive
>  Issue Type: Bug
>Affects Versions: 1.2.1
> Environment: PIG Version : (0.15.0) 
> HIVE : 1.2.1
> OS Version : CentOS release 6.7 (Final)
> OS Kernel : 2.6.32-573.18.1.el6.x86_64
>Reporter: Raghavender Rao Guruvannagari
>
> The ToDate() function doesn't work with a partitioned table that is 
> partitioned by a column of DATE data type.
> Below are the steps I followed to reproduce the problem.
> --> Sample input file for the Hive table:
> hdfs@testhost ~$ cat test.log 
> 2012-06-13,16:11:17,574,140.134.127.109,SearchPage,Google.com,Win8,5,HTC
> 2012-06-13,16:11:17,466,43.176.108.158,Electronics,Google.com,Win8,3,iPhone
> 2012-06-13,16:11:17,501,97.73.102.79,Appliances,Google.com,Android,4,iPhone
> 2012-06-13,16:11:17,469,166.98.157.122,Recommendations,Google.com,Win8,5,HTC
> 2012-06-13,16:11:17,557,36.159.147.50,Sporting,Google.com,Win8,3,Samsung
> 2012-06-13,16:11:17,449,128.215.122.234,ShoppingCart,Google.com,Win8,5,HTC
> 2012-06-13,16:11:17,502,46.81.131.92,Electronics,Google.com,Android,5,Samsung
> 2012-06-13,16:11:17,554,120.187.105.127,Automotive,Google.com,Win8,5,HTC
> 2012-06-13,16:11:17,447,127.94.64.59,DetailPage,Google.com,Win8,3,Samsung
> 2012-06-13,16:11:17,490,132.54.25.75,ShoppingCart,Google.com,Win8,3,iPhone
> 2012-06-13,16:11:17,578,79.201.53.179,Automotive,Google.com,Win8,5,Samsung
> 2012-06-13,16:11:17,435,158.106.164.38,HomePage,Google.com,Web,5,Chrome
> 2012-06-13,16:11:17,523,17.131.82.171,Recommendations,Google.com,Web,3,IE9
> 2012-06-13,16:11:17,575,178.95.126.105,Appliances,Google.com,iOS,3,iPhone
> 2012-06-13,16:11:17,468,225.143.39.176,SearchPage,Google.com,iOS,5,HTC
> 2012-06-13,16:11:17,511,43.103.102.147,ShoppingCart,Google.com,iOS,5,Samsung
> --> Copied to an HDFS directory:
> hdfs@testhost ~$ hdfs dfs -put -f test.log /user/hdfs/
> --> Create a partitioned table (partitioned by a DATE column) in Hive:
> 0: jdbc:hive2://hdp2.raghav.com:1/default> create table mytable(Dt 
> DATE,Time STRING,Number INT,IPAddr STRING,Type STRING,Site STRING,OSType 
> STRING,Visit INT,PhModel STRING) row format delimited fields terminated by 
> ',' stored as textfile;
> 0: jdbc:hive2://testhost.com:1/default> load data inpath 
> '/user/hdfs/test.log' overwrite into table mytable;
> 0: jdbc:hive2://testhost.com:1/default> SET hive.exec.dynamic.partition 
> = true;
> 0: jdbc:hive2://testhost.com:1/default> SET 
> hive.exec.dynamic.partition.mode = nonstrict;
> 0: jdbc:hive2://testhost.com:1/default> create table partmytable(Number 
> INT,IPAddr STRING,Type STRING,Site STRING,OSType STRING,Visit INT,PhModel 
> STRING) partitioned by (Dt DATE,Time STRING) row format delimited fields 
> terminated by ',' stored as textfile;
> 0: jdbc:hive2://testhost.com:1/default> insert overwrite table 
> partmytable partition(Dt,Time) select 
> Number,IPAddr,Type,Site,OSType,Visit,PhModel,Dt,Time from mytable;
> 0: jdbc:hive2://hdp2.raghav.com:1/default> describe partmytable;
> --> Trying to filter with the ToDate function fails with an error:
> hdfs@testhost ~$ pig -useHCatalog
> grunt>
> grunt> temp = LOAD 'partmytable' using 
> org.apache.hive.hcatalog.pig.HCatLoader();
> grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','yyyy-MM-dd');
> grunt> dump temp1;
> --> The same filter statement works on the non-partitioned table:
> grunt>
> grunt> temp = LOAD 'mytable' using org.apache.hive.hcatalog.pig.HCatLoader();
> grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','yyyy-MM-dd');
> grunt> dump temp1;
> Workaround:
> Use the statement below instead of calling ToDate() directly:
> grunt> temp1 = FILTER temp5 by DaysBetween(dt,(datetime)ToDate('2012-06-13', 
> 'yyyy-MM-dd')) >=(long)0 AND DaysBetween(dt,(datetime)ToDate('2012-06-13', 
> 'yyyy-MM-dd')) <=(long)0;





[jira] [Comment Edited] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used

2016-07-19 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384559#comment-15384559
 ] 

Tao Li edited comment on HIVE-14170 at 7/19/16 5:31 PM:


[~stakiar] Another thought is that we could improve the "buffered page" mode to 
avoid the OOM issue. For example, we could iterate through the whole result set 
once to calculate the max column width (without loading the result set into 
memory), and then iterate it again to print the rows. The pro is that this 
requires minimal code change; the con is higher latency, because we iterate 
the result set twice. 


was (Author: taoli-hwx):
@stakiar Another thinking is that we may improve the "buffered page" mode to 
avoid OOM issue. For example, we can iterate through the whole result set once 
to calculate the max column width (and without loading the result set into 
memory). Then we iterate the result set again to print out. The pros is that it 
requires minimal code change. The cons is that the latency should be higher 
because we iterate the result set twice. 

> Beeline IncrementalRows should buffer rows and incrementally re-calculate 
> width if TableOutputFormat is used
> 
>
> Key: HIVE-14170
> URL: https://issues.apache.org/jira/browse/HIVE-14170
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch
>
>
> If {{--incremental}} is specified in Beeline, rows are meant to be printed 
> out immediately. However, if {{TableOutputFormat}} is used with this option 
> the formatting can look really off.
> The reason is that {{IncrementalRows}} does not do a global calculation of 
> the optimal width size for {{TableOutputFormat}} (it can't because it only 
> sees one row at a time). The output of {{BufferedRows}} looks much better 
> because it can do this global calculation.
> If {{--incremental}} is used, and {{TableOutputFormat}} is used, the width 
> should be re-calculated every "x" rows ("x" can be configurable and by 
> default it can be 1000).
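The scheme proposed above can be sketched in Python (the names here are illustrative, not Beeline's actual classes): buffer up to "x" rows, compute the column widths over that batch, print it, and repeat, so the width calculation is refreshed every batch instead of being fixed after the first row:

```python
from typing import Iterable, Iterator, List

def render_in_batches(rows: Iterable[List[str]],
                      batch_size: int = 1000) -> Iterator[str]:
    """Buffer up to batch_size rows, size the columns over that batch, emit it."""
    batch: List[List[str]] = []

    def flush() -> Iterator[str]:
        if not batch:
            return
        # Re-calculate the optimal width of each column for this batch only.
        widths = [max(len(row[i]) for row in batch)
                  for i in range(len(batch[0]))]
        for row in batch:
            yield " | ".join(cell.ljust(w) for cell, w in zip(row, widths))
        batch.clear()

    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            yield from flush()
    yield from flush()  # emit the final partial batch
```

With batch_size=1 this degenerates to today's one-row-at-a-time behavior; with a batch_size larger than the result set it matches BufferedRows' global calculation, trading memory for alignment.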





[jira] [Commented] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used

2016-07-19 Thread Tao Li (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384559#comment-15384559
 ] 

Tao Li commented on HIVE-14170:
---

@stakiar Another thought is that we could improve the "buffered page" mode to 
avoid the OOM issue. For example, we could iterate through the whole result set 
once to calculate the max column width (without loading the result set into 
memory), and then iterate it again to print the rows. The pro is that this 
requires minimal code change; the con is higher latency, because we iterate 
the result set twice. 

> Beeline IncrementalRows should buffer rows and incrementally re-calculate 
> width if TableOutputFormat is used
> 
>
> Key: HIVE-14170
> URL: https://issues.apache.org/jira/browse/HIVE-14170
> Project: Hive
>  Issue Type: Sub-task
>  Components: Beeline
>Reporter: Sahil Takiar
>Assignee: Sahil Takiar
> Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch
>
>
> If {{--incremental}} is specified in Beeline, rows are meant to be printed 
> out immediately. However, if {{TableOutputFormat}} is used with this option 
> the formatting can look really off.
> The reason is that {{IncrementalRows}} does not do a global calculation of 
> the optimal width size for {{TableOutputFormat}} (it can't because it only 
> sees one row at a time). The output of {{BufferedRows}} looks much better 
> because it can do this global calculation.
> If {{--incremental}} is used, and {{TableOutputFormat}} is used, the width 
> should be re-calculated every "x" rows ("x" can be configurable and by 
> default it can be 1000).





[jira] [Updated] (HIVE-14251) Union All of different types resolves to incorrect data

2016-07-19 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-14251:

Attachment: (was: HIVE-14251.1.patch)

> Union All of different types resolves to incorrect data
> ---
>
> Key: HIVE-14251
> URL: https://issues.apache.org/jira/browse/HIVE-14251
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14251.1.patch
>
>
> create table src(c1 date, c2 int, c3 double);
> insert into src values ('2016-01-01',5,1.25);
> select * from 
> (select c1 from src union all
> select c2 from src union all
> select c3 from src) t;
> It will return NULL for the c1 values. It seems the common data type is 
> resolved to that of the last column, c3, which is double.
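A hedged Python sketch of the failure mode described above (the type-resolution rule here is a simplified guess, not Hive's actual logic): once the branches' common type collapses to double, the DATE branch has no numeric representation and degrades to NULL:

```python
from typing import Union

def common_type(t1: str, t2: str) -> str:
    # Simplified stand-in for common-type resolution: mismatched branch
    # types collapse to double, which is what the report observes here.
    return t1 if t1 == t2 else "double"

def cast_value(value: Union[str, int, float], target: str):
    if target != "double":
        return value
    try:
        return float(value)
    except ValueError:
        return None  # a DATE like '2016-01-01' has no double form -> NULL

t = common_type(common_type("date", "int"), "double")
rows = [cast_value(v, t) for v in ("2016-01-01", 5, 1.25)]  # [None, 5.0, 1.25]
```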





[jira] [Updated] (HIVE-14251) Union All of different types resolves to incorrect data

2016-07-19 Thread Aihua Xu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aihua Xu updated HIVE-14251:

Attachment: HIVE-14251.1.patch

> Union All of different types resolves to incorrect data
> ---
>
> Key: HIVE-14251
> URL: https://issues.apache.org/jira/browse/HIVE-14251
> Project: Hive
>  Issue Type: Bug
>  Components: Query Planning
>Affects Versions: 2.0.0
>Reporter: Aihua Xu
>Assignee: Aihua Xu
> Attachments: HIVE-14251.1.patch
>
>
> create table src(c1 date, c2 int, c3 double);
> insert into src values ('2016-01-01',5,1.25);
> select * from 
> (select c1 from src union all
> select c2 from src union all
> select c3 from src) t;
> It will return NULL for the c1 values. It seems the common data type is 
> resolved to that of the last column, c3, which is double.





[jira] [Updated] (HIVE-14254) Correct the hive version by changing "svn" to "git"

2016-07-19 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-14254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergio Peña updated HIVE-14254:
---
   Resolution: Fixed
Fix Version/s: 2.2.0
   Status: Resolved  (was: Patch Available)

Thanks [~taoli-hwx] for your patch. I committed this to 2.2.

> Correct the hive version by changing "svn" to "git"
> ---
>
> Key: HIVE-14254
> URL: https://issues.apache.org/jira/browse/HIVE-14254
> Project: Hive
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 2.1.0
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-14254.1.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> When running "hive --version", "Subversion" is displayed (as shown below), 
> but it should say "git".
> $ hive --version
> Hive 2.1.0-SNAPSHOT
> Subversion git://





[jira] [Commented] (HIVE-14281) Issue in decimal multiplication

2016-07-19 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384459#comment-15384459
 ] 

Xuefu Zhang commented on HIVE-14281:


Not sure this is a problem though. The next row may contain data with 18 
decimal places, for which precision may be lost. I would think a user 
shouldn't specify decimal(38, 18) for numbers that don't require such a scale.

Of course, we may want to check how other DBs handle this.

> Issue in decimal multiplication
> ---
>
> Key: HIVE-14281
> URL: https://issues.apache.org/jira/browse/HIVE-14281
> Project: Hive
>  Issue Type: Bug
>  Components: Types
>Reporter: Chaoyu Tang
>Assignee: Chaoyu Tang
>
> {code}
> CREATE TABLE test (a DECIMAL(38,18), b DECIMAL(38,18));
> INSERT OVERWRITE TABLE test VALUES (20, 20);
> SELECT a*b from test
> {code}
> The returned result is NULL (instead of 400).
> This is because Hive adds the scales of the operands, so the type for a*b is 
> set to decimal(38, 36). Hive cannot handle this case properly (e.g. by 
> rounding).
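A back-of-the-envelope Python sketch of the scale arithmetic described above (MAX_PRECISION and the derivation rule follow the common SQL convention of capping precision at 38; the names are illustrative, not Hive internals):

```python
MAX_PRECISION = 38

def multiply_result_type(p1: int, s1: int, p2: int, s2: int):
    # Result scale is the sum of the operand scales; precision is capped.
    scale = s1 + s2                              # 18 + 18 = 36
    precision = min(p1 + p2 + 1, MAX_PRECISION)  # capped at 38
    return precision, scale

def fits(value: int, precision: int, scale: int) -> bool:
    # Digits left of the decimal point that the type can still represent.
    integer_digits = precision - scale           # 38 - 36 = 2
    return len(str(abs(value))) <= integer_digits

p, s = multiply_result_type(38, 18, 38, 18)
# 400 needs 3 integer digits but decimal(38, 36) leaves room for only 2,
# so the multiplication overflows.
overflow = not fits(400, p, s)
```

Under these rules any product whose integer part exceeds two digits cannot be represented, which matches the NULL observed for 20*20.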





[jira] [Updated] (HIVE-14123) Add beeline configuration option to show database in the prompt

2016-07-19 Thread Peter Vary (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Vary updated HIVE-14123:
--
Attachment: HIVE-14123.10.patch

Addressing review comments

> Add beeline configuration option to show database in the prompt
> ---
>
> Key: HIVE-14123
> URL: https://issues.apache.org/jira/browse/HIVE-14123
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline, CLI
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Minor
> Attachments: HIVE-14123.10.patch, HIVE-14123.2.patch, 
> HIVE-14123.3.patch, HIVE-14123.4.patch, HIVE-14123.5.patch, 
> HIVE-14123.6.patch, HIVE-14123.7.patch, HIVE-14123.8.patch, 
> HIVE-14123.9.patch, HIVE-14123.patch
>
>
> There are several JIRA issues complaining that Beeline does not respect 
> hive.cli.print.current.db.
> This is only partially true: in embedded mode it has used 
> hive.cli.print.current.db to change the prompt since HIVE-10511.
> In beeline mode, I think this function should use a Beeline command line 
> option instead, like the showHeader option, emphasizing that this is a 
> client-side option.





[jira] [Commented] (HIVE-13815) Improve logic to infer false predicates

2016-07-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-13815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384418#comment-15384418
 ] 

Ashutosh Chauhan commented on HIVE-13815:
-

This is a useful optimization to have, especially for machine-generated queries.

> Improve logic to infer false predicates
> ---
>
> Key: HIVE-13815
> URL: https://issues.apache.org/jira/browse/HIVE-13815
> Project: Hive
>  Issue Type: Sub-task
>  Components: CBO
>Affects Versions: 2.1.0
>Reporter: Jesus Camacho Rodriguez
>Assignee: Jesus Camacho Rodriguez
>
> Follow-up/extension of the work done in HIVE-13068.
> Ex.
> ql/src/test/results/clientpositive/annotate_stats_filter.q.out
> {{predicate: ((year = 2001) and (state = 'OH') and (state = 'FL')) (type: 
> boolean)}} -> {{false}}
> ql/src/test/results/clientpositive/cbo_rp_join1.q.out
> {{predicate: ((_col0 = _col1) and (_col1 = 40) and (_col0 = 40)) (type: 
> boolean)}} -> {{predicate: ((_col1 = 40) and (_col0 = 40)) (type: boolean)}}
> ql/src/test/results/clientpositive/constprog_semijoin.q.out 
> {{predicate: (((id = 100) = true) and (id <> 100)) (type: boolean)}} -> 
> {{false}}





[jira] [Commented] (HIVE-14123) Add beeline configuration option to show database in the prompt

2016-07-19 Thread Aihua Xu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384394#comment-15384394
 ] 

Aihua Xu commented on HIVE-14123:
-

Minor comments. The patch looks good to me. 

+1.

> Add beeline configuration option to show database in the prompt
> ---
>
> Key: HIVE-14123
> URL: https://issues.apache.org/jira/browse/HIVE-14123
> Project: Hive
>  Issue Type: Improvement
>  Components: Beeline, CLI
>Affects Versions: 2.2.0
>Reporter: Peter Vary
>Assignee: Peter Vary
>Priority: Minor
> Attachments: HIVE-14123.2.patch, HIVE-14123.3.patch, 
> HIVE-14123.4.patch, HIVE-14123.5.patch, HIVE-14123.6.patch, 
> HIVE-14123.7.patch, HIVE-14123.8.patch, HIVE-14123.9.patch, HIVE-14123.patch
>
>
> There are several JIRA issues complaining that Beeline does not respect 
> hive.cli.print.current.db.
> This is only partially true: in embedded mode it has used 
> hive.cli.print.current.db to change the prompt since HIVE-10511.
> In beeline mode, I think this function should use a Beeline command line 
> option instead, like the showHeader option, emphasizing that this is a 
> client-side option.





[jira] [Commented] (HIVE-10022) Authorization checks for non existent file/directory should not be recursive

2016-07-19 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-10022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384391#comment-15384391
 ] 

Sushanth Sowmyan commented on HIVE-10022:
-

Yup, those are valid concerns, I'm trying to test them out.

> Authorization checks for non existent file/directory should not be recursive
> 
>
> Key: HIVE-10022
> URL: https://issues.apache.org/jira/browse/HIVE-10022
> Project: Hive
>  Issue Type: Bug
>  Components: Authorization
>Affects Versions: 0.14.0
>Reporter: Pankit Thapar
>Assignee: Pankit Thapar
> Attachments: HIVE-10022.2.patch, HIVE-10022.3.patch, HIVE-10022.patch
>
>
> I am testing a query like : 
> set hive.test.authz.sstd.hs2.mode=true;
> set 
> hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactoryForTest;
> set 
> hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateConfigUserAuthenticator;
> set hive.security.authorization.enabled=true;
> set user.name=user1;
> create table auth_noupd(i int) clustered by (i) into 2 buckets stored as orc 
> location '${OUTPUT}' TBLPROPERTIES ('transactional'='true');
> Now, in the above query, since authorization is enabled, we end up calling 
> doAuthorizationV2(), which ultimately calls 
> SQLAuthorizationUtils.getPrivilegesFromFS(), which calls a recursive method, 
> FileUtils.isActionPermittedForFileHierarchy(), with the object we are trying 
> to authorize (or its ancestor, if the object does not exist). 
> The logic in FileUtils.isActionPermittedForFileHierarchy() is a DFS.
> Now assume we are trying to authorize a path a/b/c/d.
> If a/b/c/d does not exist, we would call 
> FileUtils.isActionPermittedForFileHierarchy() with, say, a/b/, assuming a/b/c 
> also does not exist.
> If the subtree under a/b contains millions of files, then 
> FileUtils.isActionPermittedForFileHierarchy() is going to check the file 
> permissions on each of those objects. 
> I do not completely understand why we have to check file permissions on all 
> the objects in branches of the tree that we are not trying to read from or 
> write to.  
> We could instead check the file permission on the ancestor that exists and, 
> if it matches what we expect, return true.
> Please confirm whether this is a bug so that I can submit a patch, or else 
> let me know what I am missing.
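A toy Python sketch contrasting the two strategies described above (the filesystem model and function names are hypothetical, not Hive's FileUtils API): the recursive check visits every object under the subtree, while the proposed check tests only the nearest existing ancestor of the non-existent target:

```python
import os
from typing import Dict, Set

# Toy model: path -> set of actions permitted for the current user.
perms: Dict[str, Set[str]] = {
    "/a": {"write"},
    "/a/b": {"write"},
    "/a/b/file1": set(),  # one file under the subtree the user cannot write
}

def exists(path: str) -> bool:
    return path in perms or any(p.startswith(path + "/") for p in perms)

def permitted_for_hierarchy(path: str, action: str) -> bool:
    """The recursive check: every object under `path` must permit `action`."""
    return all(action in acts for p, acts in perms.items()
               if p == path or p.startswith(path + "/"))

def permitted_on_nearest_ancestor(path: str, action: str) -> bool:
    """Proposed check for non-existent targets: walk up to the first
    existing ancestor and test only that single object."""
    while not exists(path) and path != "/":
        path = os.path.dirname(path)
    return action in perms.get(path, set())

# Authorizing the non-existent /a/b/c/d:
recursive = permitted_for_hierarchy("/a/b", "write")           # scans all of /a/b
ancestor = permitted_on_nearest_ancestor("/a/b/c/d", "write")  # checks /a/b only
```

On a subtree with millions of files the recursive variant issues millions of permission checks (and can be vetoed by any unrelated file), while the ancestor variant issues at most one check per path component.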





[jira] [Updated] (HIVE-14268) INSERT-OVERWRITE is not generating an INSERT event during hive replication

2016-07-19 Thread Sushanth Sowmyan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sushanth Sowmyan updated HIVE-14268:

Attachment: HIVE-14268.3.patch

> INSERT-OVERWRITE is not generating an INSERT event during hive replication
> --
>
> Key: HIVE-14268
> URL: https://issues.apache.org/jira/browse/HIVE-14268
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Murali Ramasami
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-14268.2.patch, HIVE-14268.3.patch, HIVE-14268.patch
>
>
> During Hive replication invoked from Falcon, the source cluster did not 
> generate the appropriate INSERT events for an INSERT OVERWRITE, generating 
> only an ALTER PARTITION event. However, ALTER PARTITION is a metadata-only 
> event, so only metadata changes were replicated, modifying the metadata of 
> the destination while not updating the data. 





[jira] [Commented] (HIVE-14268) INSERT-OVERWRITE is not generating an INSERT event during hive replication

2016-07-19 Thread Sushanth Sowmyan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384376#comment-15384376
 ] 

Sushanth Sowmyan commented on HIVE-14268:
-

Sounds good - reuploading .1.patch as .3.patch so the tests run on that.

> INSERT-OVERWRITE is not generating an INSERT event during hive replication
> --
>
> Key: HIVE-14268
> URL: https://issues.apache.org/jira/browse/HIVE-14268
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Murali Ramasami
>Assignee: Sushanth Sowmyan
> Attachments: HIVE-14268.2.patch, HIVE-14268.patch
>
>
> During Hive replication invoked from Falcon, the source cluster did not 
> generate the appropriate INSERT events for an INSERT OVERWRITE, generating 
> only an ALTER PARTITION event. However, ALTER PARTITION is a metadata-only 
> event, so only metadata changes were replicated, modifying the metadata of 
> the destination while not updating the data. 





[jira] [Commented] (HIVE-14278) Migrate TestHadoop23SAuthBridge.java from Unit3 to Unit4

2016-07-19 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384362#comment-15384362
 ] 

Ashutosh Chauhan commented on HIVE-14278:
-

+1
At some point we need to make changes in pom files so that we do not download 
junit3 jars.

> Migrate TestHadoop23SAuthBridge.java from Unit3 to Unit4
> 
>
> Key: HIVE-14278
> URL: https://issues.apache.org/jira/browse/HIVE-14278
> Project: Hive
>  Issue Type: Bug
>  Components: Hive
>Affects Versions: 2.2.0
>Reporter: Balint Molnar
>Assignee: Balint Molnar
>Priority: Minor
> Fix For: 2.2.0
>
> Attachments: HIVE-14278.patch
>
>
> Migrate TestHadoop23SAuthBridge.java from JUnit 3 to JUnit 4




