[jira] [Updated] (HIVE-13934) Configure Tez to make noconditional task size memory available for the Processor
[ https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Zheng updated HIVE-13934:
-----------------------------
    Affects Version/s: 2.1.0

> Configure Tez to make noconditional task size memory available for the Processor
> --------------------------------------------------------------------------------
>
>                 Key: HIVE-13934
>                 URL: https://issues.apache.org/jira/browse/HIVE-13934
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.1.0
>            Reporter: Wei Zheng
>            Assignee: Wei Zheng
>         Attachments: HIVE-13934.1.patch, HIVE-13934.10.patch, HIVE-13934.11.patch, HIVE-13934.2.patch, HIVE-13934.3.patch, HIVE-13934.4.patch, HIVE-13934.6.patch, HIVE-13934.7.patch, HIVE-13934.8.patch, HIVE-13934.9.patch
>
> Currently, noconditionaltasksize is not validated against the container size or the reservations made in the container by Tez for Inputs/Outputs, etc.
> Check this at compile time to see if enough memory is available, or set up the vertex to reserve additional memory for the Processor.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (HIVE-13934) Configure Tez to make noconditional task size memory available for the Processor
[ https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Zheng updated HIVE-13934:
-----------------------------
    Component/s: Hive
[jira] [Updated] (HIVE-13934) Configure Tez to make noconditional task size memory available for the Processor
[ https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Zheng updated HIVE-13934:
-----------------------------
    Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-13934) Configure Tez to make noconditional task size memory available for the Processor
[ https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Zheng updated HIVE-13934:
-----------------------------
    Attachment: HIVE-13934.11.patch

Thanks for the suggestion. Patch 11 changes the default to -1f and uses > 0 for the comparison. The previous patch didn't trigger a QA run; hopefully patch 11 will trigger it.
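The sentinel-default scheme described in the comment above (default the setting to -1f, and treat only a value > 0 as explicitly configured) can be sketched as follows. The class and method names here are illustrative, not Hive's actual configuration code:

```java
// Sketch of the sentinel-default pattern from the comment: the config value
// defaults to -1f ("unset"), and only a value > 0 is treated as an explicit
// override. Names are hypothetical, not Hive's HiveConf API.
public class SentinelDefault {
    public static final float UNSET = -1f;

    // Returns the effective fraction: the configured value when it was
    // explicitly set to something positive, else the computed fallback.
    // Using > 0 (rather than != UNSET) also rejects 0 and negative values.
    public static float effectiveFraction(float configured, float fallback) {
        return configured > 0 ? configured : fallback;
    }

    public static void main(String[] args) {
        if (effectiveFraction(UNSET, 0.3f) != 0.3f) {
            throw new AssertionError("unset value should fall back");
        }
        if (effectiveFraction(0.5f, 0.3f) != 0.5f) {
            throw new AssertionError("explicit positive value should win");
        }
        System.out.println("sentinel default behaves as described");
    }
}
```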
[jira] [Commented] (HIVE-14204) Optimize loading dynamic partitions
[ https://issues.apache.org/jira/browse/HIVE-14204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385363#comment-15385363 ]

Rajesh Balamohan commented on HIVE-14204:
-----------------------------------------
I will check and repost the patch.

> Optimize loading dynamic partitions
> -----------------------------------
>
>                 Key: HIVE-14204
>                 URL: https://issues.apache.org/jira/browse/HIVE-14204
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>            Priority: Minor
>         Attachments: HIVE-14204.1.patch
>
> A lot of time is spent loading a dynamically partitioned dataset sequentially on the driver side. E.g., a simple dynamic-partitioned load like the following takes 300+ seconds:
> {noformat}
> INSERT INTO web_sales_test partition(ws_sold_date_sk) select * from tpcds_bin_partitioned_orc_200.web_sales;
> Time taken to load dynamic partitions: 309.22 seconds
> {noformat}
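The slowness described above comes from handling each dynamic partition one after another on the driver. One plausible direction for such an optimization is to fan the per-partition work out to a thread pool; this is an illustrative sketch under that assumption, not Hive's actual loadDynamicPartitions code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch: loading N partitions sequentially costs roughly
// N * perPartitionLatency; a thread pool overlaps the per-partition
// metastore/filesystem calls. The "work" here is a stand-in string.
public class ParallelPartitionLoad {
    // Submits one task per partition and waits for all of them,
    // preserving input order in the results.
    public static List<String> loadAll(List<String> partitions, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String p : partitions) {
                // Stand-in for the real per-partition load work.
                futures.add(pool.submit(() -> "loaded:" + p));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> out = loadAll(
            Arrays.asList("ws_sold_date_sk=2451180", "ws_sold_date_sk=2451181"), 2);
        if (out.size() != 2) throw new AssertionError("expected 2 results");
        System.out.println(out);
    }
}
```

The join-on-all-futures step keeps the driver's "all partitions loaded" semantics while letting the slow calls run concurrently.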
[jira] [Commented] (HIVE-14292) ACID table creation fails on mysql with MySQLIntegrityConstraintViolationException
[ https://issues.apache.org/jira/browse/HIVE-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385357#comment-15385357 ]

Eugene Koifman commented on HIVE-14292:
---------------------------------------
If you look at the logic where isDuplicateKey() is used, the idea is to lock a record with a specific value: if it's not there, insert it, then lock it. Since you cannot be certain that two entities aren't going through this same process concurrently, the logic catches the "duplicate key" error and proceeds, but raises an error for anything else.

> ACID table creation fails on mysql with MySQLIntegrityConstraintViolationException
> ----------------------------------------------------------------------------------
>
>                 Key: HIVE-14292
>                 URL: https://issues.apache.org/jira/browse/HIVE-14292
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>    Affects Versions: 1.3.0, 2.1.0
>         Environment: MySQL
>            Reporter: Deepesh Khandelwal
>            Assignee: Eugene Koifman
>         Attachments: HIVE-14292.patch
>
> While creating an ACID table, ran into the following error:
> {noformat}
> >>> create table acidcount1 (id int)
> clustered by (id) into 2 buckets
> stored as orc
> tblproperties('transactional'='true');
> INFO : Compiling command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15): create table acidcount1 (id int) clustered by (id) into 2 buckets stored as orc tblproperties('transactional'='true')
> INFO : Semantic Analysis Completed
> INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
> INFO : Completed compiling command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15); Time taken: 0.111 seconds
> Error: Error running query: java.lang.RuntimeException: Unable to lock 'CheckLock' due to: Duplicate entry 'CheckLock-0' for key 'PRIMARY' (SQLState=23000, ErrorCode=1062) (state=,code=0)
> Aborting command set because "force" is false and command failed: "create table acidcount1 (id int) clustered by (id) into 2 buckets stored as orc tblproperties('transactional'='true');"
> {noformat}
> Saw the following detailed stack in the server log:
> {noformat}
> 2016-07-19T10:59:46,213 ERROR [HiveServer2-Background-Pool: Thread-463]: metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(196)) - java.lang.RuntimeException: Unable to lock 'CheckLock' due to: Duplicate entry 'CheckLock-0' for key 'PRIMARY' (SQLState=23000, ErrorCode=1062)
>         at org.apache.hadoop.hive.metastore.txn.TxnHandler.acquireLock(TxnHandler.java:3235)
>         at org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLock(TxnHandler.java:2309)
>         at org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLockWithRetry(TxnHandler.java:1012)
>         at org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:784)
>         at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:5941)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140)
>         at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99)
>         at com.sun.proxy.$Proxy26.lock(Unknown Source)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:2109)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154)
>         at com.sun.proxy.$Proxy28.lock(Unknown Source)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:606)
>         at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2259)
>         at com.sun.proxy.$Proxy28.lock(Unknown Source)
>         at org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$SynchronizedMetaStoreClient.lock(DbTxnManager.java:740)
[jira] [Commented] (HIVE-14292) ACID table creation fails on mysql with MySQLIntegrityConstraintViolationException
[ https://issues.apache.org/jira/browse/HIVE-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385354#comment-15385354 ]

Wei Zheng commented on HIVE-14292:
----------------------------------
I see we're relaxing the !isDuplicateKeyError logic. I just want to understand why we don't throw an exception for duplicate key errors. There are a dozen errors with "duplicate something" in the link above.
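The insert-then-tolerate-duplicate-key pattern discussed in the comments above can be sketched as below. The SQLState/message check is an illustrative assumption modeled on the MySQL error in the report (SQLState 23000, error code 1062), not Hive's exact TxnHandler code:

```java
import java.sql.SQLException;

// Sketch of the race-tolerant lock-row pattern: two metastore handlers may
// race to insert the same row, so a duplicate-key failure means the other
// writer won the race and can be ignored; any other SQL error is still fatal.
public class DuplicateKeyTolerant {
    // True only for the MySQL-style duplicate-key failure seen in the report:
    // SQLState 23000 (integrity constraint violation) with a "Duplicate entry"
    // message. Hypothetical check, not Hive's isDuplicateKeyError.
    public static boolean isDuplicateKey(SQLException e) {
        return "23000".equals(e.getSQLState())
                && e.getMessage() != null
                && e.getMessage().contains("Duplicate entry");
    }

    // Try to insert the row; if another handler inserted it first, swallow
    // the duplicate-key error and carry on, but rethrow anything else.
    public static void insertLockRow(Runnable insert) {
        try {
            insert.run();
        } catch (RuntimeException re) {
            if (re.getCause() instanceof SQLException
                    && isDuplicateKey((SQLException) re.getCause())) {
                return; // someone else won the race; the row exists, proceed
            }
            throw re;
        }
    }

    public static void main(String[] args) {
        SQLException dup = new SQLException(
            "Duplicate entry 'CheckLock-0' for key 'PRIMARY'", "23000", 1062);
        SQLException other = new SQLException(
            "Lock wait timeout exceeded", "41000", 1205);
        if (!isDuplicateKey(dup)) throw new AssertionError("1062 should be tolerated");
        if (isDuplicateKey(other)) throw new AssertionError("other errors must still fail");
        System.out.println("duplicate key tolerated; other SQL errors rethrown");
    }
}
```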
[jira] [Updated] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibing Shi updated HIVE-14205:
------------------------------
    Attachment: HIVE-14205.5.patch

Attaching a new patch that includes the latest changes from the master branch. If this still doesn't work, I will remove the binary files and use insert instead, as [~ctang.ma] has said.

> Hive doesn't support union type with AVRO file format
> -----------------------------------------------------
>
>                 Key: HIVE-14205
>                 URL: https://issues.apache.org/jira/browse/HIVE-14205
>             Project: Hive
>          Issue Type: Bug
>          Components: Serializers/Deserializers
>            Reporter: Yibing Shi
>            Assignee: Yibing Shi
>         Attachments: HIVE-14205.1.patch, HIVE-14205.2.patch, HIVE-14205.3.patch, HIVE-14205.4.patch, HIVE-14205.5.patch
>
> Reproduce steps:
> {noformat}
> hive> CREATE TABLE avro_union_test
>     > PARTITIONED BY (p int)
>     > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
>     > STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
>     > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
>     > TBLPROPERTIES ('avro.schema.literal'='{
>     >    "type":"record",
>     >    "name":"nullUnionTest",
>     >    "fields":[
>     >       {
>     >          "name":"value",
>     >          "type":[
>     >             "null",
>     >             "int",
>     >             "long"
>     >          ],
>     >          "default":null
>     >       }
>     >    ]
>     > }');
> OK
> Time taken: 0.105 seconds
> hive> alter table avro_union_test add partition (p=1);
> OK
> Time taken: 0.093 seconds
> hive> select * from avro_union_test;
> FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: Failed with exception Hive internal error inside isAssignableFromSettablePrimitiveOI void not supported yet.java.lang.RuntimeException: Hive internal error inside isAssignableFromSettablePrimitiveOI void not supported yet.
>         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettablePrimitiveOI(ObjectInspectorUtils.java:1140)
>         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettableOI(ObjectInspectorUtils.java:1149)
>         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1187)
>         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1220)
>         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1200)
>         at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConvertedOI(ObjectInspectorConverters.java:219)
>         at org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:581)
>         at org.apache.hadoop.hive.ql.exec.FetchOperator.initialize(FetchOperator.java:172)
>         at org.apache.hadoop.hive.ql.exec.FetchOperator.<init>(FetchOperator.java:140)
>         at org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:79)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:482)
>         at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:311)
>         at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1194)
>         at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1289)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1120)
>         at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1108)
>         at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:218)
>         at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:170)
>         at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:381)
>         at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:773)
>         at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:691)
>         at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:497)
>         at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
> {noformat}
> Another test case to show this problem is:
> {noformat}
> hive> create table avro_union_test2 (value uniontype) stored as avro;
> OK
> Time taken: 0.053 seconds
> hive> show create table avro_union_test2;
> OK
> CREATE TABLE `avro_union_test2`(
>   `value` uniontype COMMENT '')
[jira] [Comment Edited] (HIVE-14293) PerfLogger.openScopes should be transient
[ https://issues.apache.org/jira/browse/HIVE-14293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385229#comment-15385229 ]

Gopal V edited comment on HIVE-14293 at 7/20/16 3:02 AM:
---------------------------------------------------------
[~daijy]: yes, the transient would be needed in the same way - to ensure that the PerfLogger is initialized correctly on the operator side instead of being a copy of state left over from the planner.

was (Author: gopalv):
[~daijy]: yes, the transient would be needed in the same way - to ensure that the PerfLogger is initialized correctly on the operator side instead of being a copy from the planner.

> PerfLogger.openScopes should be transient
> -----------------------------------------
>
>                 Key: HIVE-14293
>                 URL: https://issues.apache.org/jira/browse/HIVE-14293
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 2.1.0
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>         Attachments: HIVE-14293.1.patch
>
> See the following exception when running Hive e2e tests:
> {code}
> 0: jdbc:hive2://nat-r6-ojss-hsihs2-1.openstac> SELECT s.name, s2.age, s.gpa, v.registration, v2.contributions FROM student s INNER JOIN voter v ON (s.name = v.name) INNER JOIN student s2 ON (s2.age = v.age and s.name = s2.name) INNER JOIN voter v2 ON (v2.name = s2.name and v2.age = s2.age) WHERE v2.age = s.age ORDER BY s.name, s2.age, s.gpa, v.registration, v2.contributions;
> INFO : Compiling command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8): SELECT s.name, s2.age, s.gpa, v.registration, v2.contributions FROM student s INNER JOIN voter v ON (s.name = v.name) INNER JOIN student s2 ON (s2.age = v.age and s.name = s2.name) INNER JOIN voter v2 ON (v2.name = s2.name and v2.age = s2.age) WHERE v2.age = s.age ORDER BY s.name, s2.age, s.gpa, v.registration, v2.contributions
> INFO : Semantic Analysis Completed
> INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:s.name, type:string, comment:null), FieldSchema(name:s2.age, type:int, comment:null), FieldSchema(name:s.gpa, type:double, comment:null), FieldSchema(name:v.registration, type:string, comment:null), FieldSchema(name:v2.contributions, type:float, comment:null)], properties:null)
> INFO : Completed compiling command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8); Time taken: 1.165 seconds
> INFO : Executing command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8): SELECT s.name, s2.age, s.gpa, v.registration, v2.contributions FROM student s INNER JOIN voter v ON (s.name = v.name) INNER JOIN student s2 ON (s2.age = v.age and s.name = s2.name) INNER JOIN voter v2 ON (v2.name = s2.name and v2.age = s2.age) WHERE v2.age = s.age ORDER BY s.name, s2.age, s.gpa, v.registration, v2.contributions
> INFO : Query ID = hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8
> INFO : Total jobs = 1
> INFO : Launching Job 1 out of 1
> INFO : Starting task [Stage-1:MAPRED] in serial mode
> INFO : Session is already open
> INFO : Dag name: SELECT s.name, s2.age, sv2.contributions(Stage-1)
> ERROR : Failed to execute tez graph.
> java.lang.RuntimeException: Error caching map.xml: org.apache.hive.com.esotericsoftware.kryo.KryoException: java.util.ConcurrentModificationException
> Serialization trace:
> classes (sun.misc.Launcher$AppClassLoader)
> classloader (java.security.ProtectionDomain)
> context (java.security.AccessControlContext)
> acc (org.apache.hadoop.hive.ql.exec.UDFClassLoader)
> classLoader (org.apache.hadoop.hive.conf.HiveConf)
> conf (org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics)
> metrics (org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics$CodahaleMetricsScope)
> openScopes (org.apache.hadoop.hive.ql.log.PerfLogger)
> perfLogger (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.FilterOperator)
> childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator)
> aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork)
>         at org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:582) ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>         at org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:516) ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>         at org.apache.hadoop.hive.ql.exec.tez.DagUtils.createVertex(DagUtils.java:601) ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
>         at org.apache.hadoop.hive.ql.exec.tez.DagUtils.createVertex(DagUtils.java:1147) ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009]
[jira] [Commented] (HIVE-14293) PerfLogger.openScopes should be transient
[ https://issues.apache.org/jira/browse/HIVE-14293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385229#comment-15385229 ]

Gopal V commented on HIVE-14293:
--------------------------------
[~daijy]: yes, the transient would be needed in the same way - to ensure that the PerfLogger is initialized correctly on the operator side instead of being a copy from the planner.
[jira] [Commented] (HIVE-14293) PerfLogger.openScopes should be transient
[ https://issues.apache.org/jira/browse/HIVE-14293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385211#comment-15385211 ] Daniel Dai commented on HIVE-14293: --- We do want to use PerfLogger in the backend in map join to print out performance message. It's possible to switch to a different mechanism for that purpose, but that worth a separate ticket. > PerfLogger.openScopes should be transient > - > > Key: HIVE-14293 > URL: https://issues.apache.org/jira/browse/HIVE-14293 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-14293.1.patch > > > See the following exception when running Hive e2e tests: > {code} > 0: jdbc:hive2://nat-r6-ojss-hsihs2-1.openstac> SELECT s.name, s2.age, s.gpa, > v.registration, v2.contributions FROM student s INNER JOIN voter v ON (s.name > = v.name) INNER JOIN student s2 ON (s2.age = v.age and s.name = s2.name) > INNER JOIN voter v2 ON (v2.name = s2.name and v2.age = s2.age) WHERE v2.age = > s.age ORDER BY s.name, s2.age, s.gpa, v.registration, v2.contributions; > INFO : Compiling > command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8): > SELECT s.name, s2.age, s.gpa, v.registration, v2.contributions FROM student s > INNER JOIN voter v ON (s.name = v.name) INNER JOIN student s2 ON (s2.age = > v.age and s.name = s2.name) INNER JOIN voter v2 ON (v2.name = s2.name and > v2.age = s2.age) WHERE v2.age = s.age ORDER BY s.name, s2.age, s.gpa, > v.registration, v2.contributions > INFO : Semantic Analysis Completed > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:s.name, > type:string, comment:null), FieldSchema(name:s2.age, type:int, comment:null), > FieldSchema(name:s.gpa, type:double, comment:null), > FieldSchema(name:v.registration, type:string, comment:null), > FieldSchema(name:v2.contributions, type:float, comment:null)], > properties:null) > INFO : Completed compiling > 
command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8); > Time taken: 1.165 seconds > INFO : Executing > command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8): > SELECT s.name, s2.age, s.gpa, v.registration, v2.contributions FROM student s > INNER JOIN voter v ON (s.name = v.name) INNER JOIN student s2 ON (s2.age = > v.age and s.name = s2.name) INNER JOIN voter v2 ON (v2.name = s2.name and > v2.age = s2.age) WHERE v2.age = s.age ORDER BY s.name, s2.age, s.gpa, > v.registration, v2.contributions > INFO : Query ID = hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8 > INFO : Total jobs = 1 > INFO : Launching Job 1 out of 1 > INFO : Starting task [Stage-1:MAPRED] in serial mode > INFO : Session is already open > INFO : Dag name: SELECT s.name, s2.age, sv2.contributions(Stage-1) > ERROR : Failed to execute tez graph. > java.lang.RuntimeException: Error caching map.xml: > org.apache.hive.com.esotericsoftware.kryo.KryoException: > java.util.ConcurrentModificationException > Serialization trace: > classes (sun.misc.Launcher$AppClassLoader) > classloader (java.security.ProtectionDomain) > context (java.security.AccessControlContext) > acc (org.apache.hadoop.hive.ql.exec.UDFClassLoader) > classLoader (org.apache.hadoop.hive.conf.HiveConf) > conf (org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics) > metrics > (org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics$CodahaleMetricsScope) > openScopes (org.apache.hadoop.hive.ql.log.PerfLogger) > perfLogger (org.apache.hadoop.hive.ql.exec.MapJoinOperator) > childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator) > childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator) > childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator) > childOperators (org.apache.hadoop.hive.ql.exec.FilterOperator) > childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) > aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) > at > 
org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:582) > ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009] > at > org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:516) > ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009] > at > org.apache.hadoop.hive.ql.exec.tez.DagUtils.createVertex(DagUtils.java:601) > ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009] > at > org.apache.hadoop.hive.ql.exec.tez.DagUtils.createVertex(DagUtils.java:1147) > ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009] > at org.apache.hadoop.hive.ql.exec.tez.TezTask.build(TezTask.java:390) > ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009] > at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:164) >
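The fix the ticket title describes (marking `PerfLogger.openScopes` transient) works because serializers skip transient fields, so the object graph hanging off the field — in the stack trace above, the metrics scope that drags in `HiveConf`, classloaders, and ultimately the `ConcurrentModificationException` — is never walked. The sketch below is a minimal stand-in, not Hive's actual `PerfLogger`; it demonstrates the behavior with `java.io` serialization, which Kryo handles analogously for `transient`:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for PerfLogger: runtime-only state is marked
// transient so serializing the operator tree never traverses it.
public class TransientDemo implements Serializable {
    private static final long serialVersionUID = 1L;

    // Analogous to PerfLogger.openScopes: excluded from serialization.
    transient Map<String, Long> openScopes = new HashMap<>();
    String name = "perfLogger";

    static TransientDemo roundTrip(TransientDemo in) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(in);
            }
            try (ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                return (TransientDemo) ois.readObject();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        TransientDemo d = new TransientDemo();
        d.openScopes.put("compile", 42L);
        TransientDemo copy = roundTrip(d);
        // The transient map is dropped; deserialization leaves it null.
        System.out.println(copy.openScopes == null); // true
        System.out.println(copy.name);               // perfLogger
    }
}
```

After deserialization the transient field comes back as `null`, so code using it must lazily reinitialize — which is acceptable here precisely because it is runtime-only performance-logging state.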
[jira] [Commented] (HIVE-14282) HCatLoader ToDate() exception with hive partition table ,partitioned by column of DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385208#comment-15385208 ] Daniel Dai commented on HIVE-14282: --- Forgot to mention here: I also bumped the Pig version, since partition push down of a constant UDF condition only works with Pig 0.15+. We need to watch whether it breaks any unit tests. > HCatLoader ToDate() exception with hive partition table ,partitioned by > column of DATE datatype > --- > > Key: HIVE-14282 > URL: https://issues.apache.org/jira/browse/HIVE-14282 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1 > Environment: PIG Version : (0.15.0) > HIVE : 1.2.1 > OS Version : CentOS release 6.7 (Final) > OS Kernel : 2.6.32-573.18.1.el6.x86_64 >Reporter: Raghavender Rao Guruvannagari >Assignee: Daniel Dai > Fix For: 1.3.0, 2.2.0, 2.1.1 > > Attachments: HIVE-14282.1.patch > > > ToDate() function doesnt work with a partitioned table, partitioned by the > column of DATE Datatype. > Below are the steps I followed to recreate the problem. 
> -->Sample input file to hive table : > hdfs@testhost ~$ cat test.log > 2012-06-13,16:11:17,574,140.134.127.109,SearchPage,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,466,43.176.108.158,Electronics,Google.com,Win8,3,iPhone > 2012-06-13,16:11:17,501,97.73.102.79,Appliances,Google.com,Android,4,iPhone > 2012-06-13,16:11:17,469,166.98.157.122,Recommendations,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,557,36.159.147.50,Sporting,Google.com,Win8,3,Samsung > 2012-06-13,16:11:17,449,128.215.122.234,ShoppingCart,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,502,46.81.131.92,Electronics,Google.com,Android,5,Samsung > 2012-06-13,16:11:17,554,120.187.105.127,Automotive,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,447,127.94.64.59,DetailPage,Google.com,Win8,3,Samsung > 2012-06-13,16:11:17,490,132.54.25.75,ShoppingCart,Google.com,Win8,3,iPhone > 2012-06-13,16:11:17,578,79.201.53.179,Automotive,Google.com,Win8,5,Samsung > 2012-06-13,16:11:17,435,158.106.164.38,HomePage,Google.com,Web,5,Chrome > 2012-06-13,16:11:17,523,17.131.82.171,Recommendations,Google.com,Web,3,IE9 > 2012-06-13,16:11:17,575,178.95.126.105,Appliances,Google.com,iOS,3,iPhone > 2012-06-13,16:11:17,468,225.143.39.176,SearchPage,Google.com,iOS,5,HTC > 2012-06-13,16:11:17,511,43.103.102.147,ShoppingCart,Google.com,iOS,5,Samsung > --> Copied to hdfs directory: > hdfs@testhost ~$ hdfs dfs -put -f test.log /user/hdfs/ > -->Create partitoned table (partitioned with date data type column) in hive: > 0: jdbc:hive2://hdp2.raghav.com:1/default> create table mytable(Dt > DATE,Time STRING,Number INT,IPAddr STRING,Type STRING,Site STRING,OSType > STRING,Visit INT,PhModel STRING) row format delimited fields terminated by > ',' stored as textfile; > 0: jdbc:hive2://testhost.com:1/default> load data inpath > '/user/hdfs/test.log' overwrite into table mytable; > 0: jdbc:hive2://testhost..com:1/default> SET hive.exec.dynamic.partition > = true; > 0: jdbc:hive2://testhost.com:1/default> SET > hive.exec.dynamic.partition.mode = 
nonstrict; > 0: jdbc:hive2://testhost.com:1/default> create table partmytable(Number > INT,IPAddr STRING,Type STRING,Site STRING,OSType STRING,Visit INT,PhModel > STRING) partitioned by (Dt DATE,Time STRING) row format delimited fields > terminated by ',' stored as textfile; > 0: jdbc:hive2://testhost.com:1/default> insert overwrite table > partmytable partition(Dt,Time) select > Number,IPAddr,Type,Site,OSType,Visit,PhModel,Dt,Time from mytable; > 0: jdbc:hive2://hdp2.raghav.com:1/default> describe partmytable; > --> Try to filter with ToDate function which fails with error: > hdfs@testhost ~$ pig -useHCatalog > grunt> > grunt> temp = LOAD 'partmytable' using > org.apache.hive.hcatalog.pig.HCatLoader(); > grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','-MM-dd'); > grunt> dump temp1; > -->Try to filter the normal table with same statement works; > grunt> > grunt> temp = LOAD 'mytable' using org.apache.hive.hcatalog.pig.HCatLoader(); > grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','-MM-dd'); > grunt> dump temp1; > Workaround : > Use below statement instead of direct ToDate(); > grunt>temp1 = FILTER temp5 by DaysBetween(dt,(datetime)ToDate('2012-06-13', > '-MM-dd')) >=(long)0 AND DaysBetween(dt,(datetime)ToDate('2012-06-13', > '-MM-dd')) <=(long)0; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13934) Configure Tez to make nocondiional task size memory available for the Processor
[ https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385198#comment-15385198 ] Gunther Hagleitner commented on HIVE-13934: --- +1 although I'm hoping you could change the float comparison on commit ( != 0.0f). Either make the default -1f and do "> 0" when checking or if possible make the default "null" and check for that. > Configure Tez to make nocondiional task size memory available for the > Processor > --- > > Key: HIVE-13934 > URL: https://issues.apache.org/jira/browse/HIVE-13934 > Project: Hive > Issue Type: Bug >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-13934.1.patch, HIVE-13934.10.patch, > HIVE-13934.2.patch, HIVE-13934.3.patch, HIVE-13934.4.patch, > HIVE-13934.6.patch, HIVE-13934.7.patch, HIVE-13934.8.patch, HIVE-13934.9.patch > > > Currently, noconditionaltasksize is not validated against the container size, > the reservations made in the container by Tez for Inputs / Outputs etc. > Check this at compile time to see if enough memory is available, or set up > the vertex to reserve additional memory for the Processor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
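The review suggestion above — avoid `!= 0.0f` in favor of a `-1f` sentinel default with a `> 0` check — can be sketched as follows. This is an illustration of the suggested pattern, not the committed Hive code; the names are hypothetical:

```java
// Sketch of the review suggestion: a float config with -1f meaning "unset",
// tested with a range check rather than float equality.
public class FloatDefaultCheck {
    // Hypothetical config default; -1f unambiguously means "not set".
    static final float UNSET = -1f;

    static boolean isConfigured(float value) {
        // A > 0 check never compares floats for equality, so it is immune
        // to the usual float-comparison pitfalls that != 0.0f invites.
        return value > 0;
    }

    public static void main(String[] args) {
        System.out.println(isConfigured(UNSET)); // false
        System.out.println(isConfigured(0.5f));  // true
    }
}
```

The alternative the reviewer mentions — a boxed `Float` defaulting to `null` — achieves the same unambiguous "unset" state at the cost of autoboxing.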
[jira] [Commented] (HIVE-14282) HCatLoader ToDate() exception with hive partition table ,partitioned by column of DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385191#comment-15385191 ] Gunther Hagleitner commented on HIVE-14282: --- FWIW this seems to silently update min pig version for hcat from 0.12 to 0.16... > HCatLoader ToDate() exception with hive partition table ,partitioned by > column of DATE datatype > --- > > Key: HIVE-14282 > URL: https://issues.apache.org/jira/browse/HIVE-14282 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1 > Environment: PIG Version : (0.15.0) > HIVE : 1.2.1 > OS Version : CentOS release 6.7 (Final) > OS Kernel : 2.6.32-573.18.1.el6.x86_64 >Reporter: Raghavender Rao Guruvannagari >Assignee: Daniel Dai > Fix For: 1.3.0, 2.2.0, 2.1.1 > > Attachments: HIVE-14282.1.patch > > > ToDate() function doesnt work with a partitioned table, partitioned by the > column of DATE Datatype. > Below are the steps I followed to recreate the problem. > -->Sample input file to hive table : > hdfs@testhost ~$ cat test.log > 2012-06-13,16:11:17,574,140.134.127.109,SearchPage,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,466,43.176.108.158,Electronics,Google.com,Win8,3,iPhone > 2012-06-13,16:11:17,501,97.73.102.79,Appliances,Google.com,Android,4,iPhone > 2012-06-13,16:11:17,469,166.98.157.122,Recommendations,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,557,36.159.147.50,Sporting,Google.com,Win8,3,Samsung > 2012-06-13,16:11:17,449,128.215.122.234,ShoppingCart,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,502,46.81.131.92,Electronics,Google.com,Android,5,Samsung > 2012-06-13,16:11:17,554,120.187.105.127,Automotive,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,447,127.94.64.59,DetailPage,Google.com,Win8,3,Samsung > 2012-06-13,16:11:17,490,132.54.25.75,ShoppingCart,Google.com,Win8,3,iPhone > 2012-06-13,16:11:17,578,79.201.53.179,Automotive,Google.com,Win8,5,Samsung > 2012-06-13,16:11:17,435,158.106.164.38,HomePage,Google.com,Web,5,Chrome > 
2012-06-13,16:11:17,523,17.131.82.171,Recommendations,Google.com,Web,3,IE9 > 2012-06-13,16:11:17,575,178.95.126.105,Appliances,Google.com,iOS,3,iPhone > 2012-06-13,16:11:17,468,225.143.39.176,SearchPage,Google.com,iOS,5,HTC > 2012-06-13,16:11:17,511,43.103.102.147,ShoppingCart,Google.com,iOS,5,Samsung > --> Copied to hdfs directory: > hdfs@testhost ~$ hdfs dfs -put -f test.log /user/hdfs/ > -->Create partitoned table (partitioned with date data type column) in hive: > 0: jdbc:hive2://hdp2.raghav.com:1/default> create table mytable(Dt > DATE,Time STRING,Number INT,IPAddr STRING,Type STRING,Site STRING,OSType > STRING,Visit INT,PhModel STRING) row format delimited fields terminated by > ',' stored as textfile; > 0: jdbc:hive2://testhost.com:1/default> load data inpath > '/user/hdfs/test.log' overwrite into table mytable; > 0: jdbc:hive2://testhost..com:1/default> SET hive.exec.dynamic.partition > = true; > 0: jdbc:hive2://testhost.com:1/default> SET > hive.exec.dynamic.partition.mode = nonstrict; > 0: jdbc:hive2://testhost.com:1/default> create table partmytable(Number > INT,IPAddr STRING,Type STRING,Site STRING,OSType STRING,Visit INT,PhModel > STRING) partitioned by (Dt DATE,Time STRING) row format delimited fields > terminated by ',' stored as textfile; > 0: jdbc:hive2://testhost.com:1/default> insert overwrite table > partmytable partition(Dt,Time) select > Number,IPAddr,Type,Site,OSType,Visit,PhModel,Dt,Time from mytable; > 0: jdbc:hive2://hdp2.raghav.com:1/default> describe partmytable; > --> Try to filter with ToDate function which fails with error: > hdfs@testhost ~$ pig -useHCatalog > grunt> > grunt> temp = LOAD 'partmytable' using > org.apache.hive.hcatalog.pig.HCatLoader(); > grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','-MM-dd'); > grunt> dump temp1; > -->Try to filter the normal table with same statement works; > grunt> > grunt> temp = LOAD 'mytable' using org.apache.hive.hcatalog.pig.HCatLoader(); > grunt> temp1 = FILTER temp by dt == 
ToDate('2012-06-13','-MM-dd'); > grunt> dump temp1; > Workaround : > Use below statement instead of direct ToDate(); > grunt>temp1 = FILTER temp5 by DaysBetween(dt,(datetime)ToDate('2012-06-13', > '-MM-dd')) >=(long)0 AND DaysBetween(dt,(datetime)ToDate('2012-06-13', > '-MM-dd')) <=(long)0; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14292) ACID table creation fails on mysql with MySQLIntegrityConstraintViolationException
[ https://issues.apache.org/jira/browse/HIVE-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-14292: -- Target Version/s: 1.3.0, 2.2.0, 2.1.1 > ACID table creation fails on mysql with > MySQLIntegrityConstraintViolationException > -- > > Key: HIVE-14292 > URL: https://issues.apache.org/jira/browse/HIVE-14292 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0, 2.1.0 > Environment: MySQL >Reporter: Deepesh Khandelwal >Assignee: Eugene Koifman > Attachments: HIVE-14292.patch > > > While creating a ACID table ran into the following error: > {noformat} > >>> create table acidcount1 (id int) > clustered by (id) into 2 buckets > stored as orc > tblproperties('transactional'='true'); > INFO : Compiling > command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15): > create table acidcount1 (id int) > clustered by (id) into 2 buckets > stored as orc > tblproperties('transactional'='true') > INFO : Semantic Analysis Completed > INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null) > INFO : Completed compiling > command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15); > Time taken: 0.111 seconds > Error: Error running query: java.lang.RuntimeException: Unable to lock > 'CheckLock' due to: Duplicate entry 'CheckLock-0' for key 'PRIMARY' > (SQLState=23000, ErrorCode=1062) (state=,code=0) > Aborting command set because "force" is false and command failed: "create > table acidcount1 (id int) > clustered by (id) into 2 buckets > stored as orc > tblproperties('transactional'='true');" > {noformat} > Saw the following detailed stack in the server log: > {noformat} > 2016-07-19T10:59:46,213 ERROR [HiveServer2-Background-Pool: Thread-463]: > metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(196)) - > java.lang.RuntimeException: Unable to lock 'CheckLock' due to: Duplicate > entry 'CheckLock-0' for key 'PRIMARY' (SQLState=23000, 
ErrorCode=1062) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.acquireLock(TxnHandler.java:3235) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLock(TxnHandler.java:2309) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLockWithRetry(TxnHandler.java:1012) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:784) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:5941) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99) > at com.sun.proxy.$Proxy26.lock(Unknown Source) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:2109) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154) > at com.sun.proxy.$Proxy28.lock(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2259) > at 
com.sun.proxy.$Proxy28.lock(Unknown Source) > at > org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$SynchronizedMetaStoreClient.lock(DbTxnManager.java:740) > at > org.apache.hadoop.hive.ql.lockmgr.DbLockManager.lock(DbLockManager.java:103) > at > org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.java:341) > at > org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocksWithHeartbeatDelay(DbTxnManager.java:357) > at >
[jira] [Commented] (HIVE-14292) ACID table creation fails on mysql with MySQLIntegrityConstraintViolationException
[ https://issues.apache.org/jira/browse/HIVE-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385189#comment-15385189 ] Eugene Koifman commented on HIVE-14292: --- The TxnHandler.isDuplicateKey() method is checking the wrong SQL error code (see https://dev.mysql.com/doc/refman/5.5/en/error-messages-server.html). MySQL has more than one code with a nearly identical message. It currently checks 1022, whose message is "Can't write; duplicate key in table '%s'", but the one MySQL actually comes back with is 1062, "Duplicate entry '%s' for key %d". There is also 1586, "Duplicate entry '%s' for key '%s'". I don't see anything that explains which is produced when... > ACID table creation fails on mysql with > MySQLIntegrityConstraintViolationException > -- > > Key: HIVE-14292 > URL: https://issues.apache.org/jira/browse/HIVE-14292 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0, 2.1.0 > Environment: MySQL >Reporter: Deepesh Khandelwal >Assignee: Eugene Koifman > Attachments: HIVE-14292.patch > > > While creating a ACID table ran into the following error: > {noformat} > >>> create table acidcount1 (id int) > clustered by (id) into 2 buckets > stored as orc > tblproperties('transactional'='true'); > INFO : Compiling > command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15): > create table acidcount1 (id int) > clustered by (id) into 2 buckets > stored as orc > tblproperties('transactional'='true') > INFO : Semantic Analysis Completed > INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null) > INFO : Completed compiling > command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15); > Time taken: 0.111 seconds > Error: Error running query: java.lang.RuntimeException: Unable to lock > 'CheckLock' due to: Duplicate entry 'CheckLock-0' for key 'PRIMARY' > (SQLState=23000, ErrorCode=1062) (state=,code=0) > Aborting command set because "force" is false and command failed: "create > table acidcount1 (id int) > clustered by 
(id) into 2 buckets > stored as orc > tblproperties('transactional'='true');" > {noformat} > Saw the following detailed stack in the server log: > {noformat} > 2016-07-19T10:59:46,213 ERROR [HiveServer2-Background-Pool: Thread-463]: > metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(196)) - > java.lang.RuntimeException: Unable to lock 'CheckLock' due to: Duplicate > entry 'CheckLock-0' for key 'PRIMARY' (SQLState=23000, ErrorCode=1062) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.acquireLock(TxnHandler.java:3235) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLock(TxnHandler.java:2309) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLockWithRetry(TxnHandler.java:1012) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:784) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:5941) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99) > at com.sun.proxy.$Proxy26.lock(Unknown Source) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:2109) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154) > at com.sun.proxy.$Proxy28.lock(Unknown Source) > 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2259) > at com.sun.proxy.$Proxy28.lock(Unknown Source) > at >
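The error-code mismatch described in the comment above can be sketched like this. This is not Hive's actual `TxnHandler.isDuplicateKey()`, just an illustration of the kind of check the fix would need: the error log shows MySQL reporting the duplicate key as code 1062 under SQLState 23000, and the comment notes that 1586 carries a near-identical message, while 1022 (the code being checked) is not what the server returned:

```java
import java.sql.SQLException;

// Hedged sketch of a duplicate-key check for MySQL, per the comment above.
// Checks SQLState 23000 (integrity constraint violation) plus the vendor
// codes the server actually reports for duplicate entries: 1062 and 1586.
public class DuplicateKeyCheck {
    static boolean isDuplicateKeyError(SQLException e) {
        if (!"23000".equals(e.getSQLState())) {
            return false; // not an integrity-constraint violation at all
        }
        int code = e.getErrorCode();
        return code == 1062 || code == 1586;
    }

    public static void main(String[] args) {
        SQLException dup = new SQLException(
            "Duplicate entry 'CheckLock-0' for key 'PRIMARY'", "23000", 1062);
        System.out.println(isDuplicateKeyError(dup));   // true
        SQLException wrongCode = new SQLException(
            "Can't write; duplicate key in table 't'", "23000", 1022);
        System.out.println(isDuplicateKeyError(wrongCode)); // false
    }
}
```

Gating on SQLState first keeps the check portable: vendor codes differ across databases, but 23000 is the standard class for constraint violations.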
[jira] [Commented] (HIVE-14292) ACID table creation fails on mysql with MySQLIntegrityConstraintViolationException
[ https://issues.apache.org/jira/browse/HIVE-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385190#comment-15385190 ] Eugene Koifman commented on HIVE-14292: --- [~wzheng] could you review please > ACID table creation fails on mysql with > MySQLIntegrityConstraintViolationException > -- > > Key: HIVE-14292 > URL: https://issues.apache.org/jira/browse/HIVE-14292 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0, 2.1.0 > Environment: MySQL >Reporter: Deepesh Khandelwal >Assignee: Eugene Koifman > Attachments: HIVE-14292.patch > > > While creating a ACID table ran into the following error: > {noformat} > >>> create table acidcount1 (id int) > clustered by (id) into 2 buckets > stored as orc > tblproperties('transactional'='true'); > INFO : Compiling > command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15): > create table acidcount1 (id int) > clustered by (id) into 2 buckets > stored as orc > tblproperties('transactional'='true') > INFO : Semantic Analysis Completed > INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null) > INFO : Completed compiling > command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15); > Time taken: 0.111 seconds > Error: Error running query: java.lang.RuntimeException: Unable to lock > 'CheckLock' due to: Duplicate entry 'CheckLock-0' for key 'PRIMARY' > (SQLState=23000, ErrorCode=1062) (state=,code=0) > Aborting command set because "force" is false and command failed: "create > table acidcount1 (id int) > clustered by (id) into 2 buckets > stored as orc > tblproperties('transactional'='true');" > {noformat} > Saw the following detailed stack in the server log: > {noformat} > 2016-07-19T10:59:46,213 ERROR [HiveServer2-Background-Pool: Thread-463]: > metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(196)) - > java.lang.RuntimeException: Unable to lock 'CheckLock' due to: Duplicate > entry 'CheckLock-0' for 
key 'PRIMARY' (SQLState=23000, ErrorCode=1062) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.acquireLock(TxnHandler.java:3235) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLock(TxnHandler.java:2309) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLockWithRetry(TxnHandler.java:1012) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:784) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:5941) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99) > at com.sun.proxy.$Proxy26.lock(Unknown Source) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:2109) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154) > at com.sun.proxy.$Proxy28.lock(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2259) 
> at com.sun.proxy.$Proxy28.lock(Unknown Source) > at > org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$SynchronizedMetaStoreClient.lock(DbTxnManager.java:740) > at > org.apache.hadoop.hive.ql.lockmgr.DbLockManager.lock(DbLockManager.java:103) > at > org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.java:341) > at > org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocksWithHeartbeatDelay(DbTxnManager.java:357) > at >
[jira] [Updated] (HIVE-14292) ACID table creation fails on mysql with MySQLIntegrityConstraintViolationException
[ https://issues.apache.org/jira/browse/HIVE-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-14292: -- Status: Patch Available (was: Open) > ACID table creation fails on mysql with > MySQLIntegrityConstraintViolationException > -- > > Key: HIVE-14292 > URL: https://issues.apache.org/jira/browse/HIVE-14292 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.1.0, 1.3.0 > Environment: MySQL >Reporter: Deepesh Khandelwal >Assignee: Eugene Koifman > Attachments: HIVE-14292.patch > > > While creating a ACID table ran into the following error: > {noformat} > >>> create table acidcount1 (id int) > clustered by (id) into 2 buckets > stored as orc > tblproperties('transactional'='true'); > INFO : Compiling > command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15): > create table acidcount1 (id int) > clustered by (id) into 2 buckets > stored as orc > tblproperties('transactional'='true') > INFO : Semantic Analysis Completed > INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null) > INFO : Completed compiling > command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15); > Time taken: 0.111 seconds > Error: Error running query: java.lang.RuntimeException: Unable to lock > 'CheckLock' due to: Duplicate entry 'CheckLock-0' for key 'PRIMARY' > (SQLState=23000, ErrorCode=1062) (state=,code=0) > Aborting command set because "force" is false and command failed: "create > table acidcount1 (id int) > clustered by (id) into 2 buckets > stored as orc > tblproperties('transactional'='true');" > {noformat} > Saw the following detailed stack in the server log: > {noformat} > 2016-07-19T10:59:46,213 ERROR [HiveServer2-Background-Pool: Thread-463]: > metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(196)) - > java.lang.RuntimeException: Unable to lock 'CheckLock' due to: Duplicate > entry 'CheckLock-0' for key 'PRIMARY' (SQLState=23000, 
ErrorCode=1062) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.acquireLock(TxnHandler.java:3235) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLock(TxnHandler.java:2309) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLockWithRetry(TxnHandler.java:1012) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:784) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:5941) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99) > at com.sun.proxy.$Proxy26.lock(Unknown Source) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:2109) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154) > at com.sun.proxy.$Proxy28.lock(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2259) > at 
com.sun.proxy.$Proxy28.lock(Unknown Source) > at > org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$SynchronizedMetaStoreClient.lock(DbTxnManager.java:740) > at > org.apache.hadoop.hive.ql.lockmgr.DbLockManager.lock(DbLockManager.java:103) > at > org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.java:341) > at > org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocksWithHeartbeatDelay(DbTxnManager.java:357) > at >
[jira] [Updated] (HIVE-14292) ACID table creation fails on mysql with MySQLIntegrityConstraintViolationException
[ https://issues.apache.org/jira/browse/HIVE-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-14292: -- Attachment: HIVE-14292.patch > ACID table creation fails on mysql with > MySQLIntegrityConstraintViolationException > -- > > Key: HIVE-14292 > URL: https://issues.apache.org/jira/browse/HIVE-14292 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0, 2.1.0 > Environment: MySQL >Reporter: Deepesh Khandelwal >Assignee: Eugene Koifman > Attachments: HIVE-14292.patch > > > While creating a ACID table ran into the following error: > {noformat} > >>> create table acidcount1 (id int) > clustered by (id) into 2 buckets > stored as orc > tblproperties('transactional'='true'); > INFO : Compiling > command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15): > create table acidcount1 (id int) > clustered by (id) into 2 buckets > stored as orc > tblproperties('transactional'='true') > INFO : Semantic Analysis Completed > INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null) > INFO : Completed compiling > command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15); > Time taken: 0.111 seconds > Error: Error running query: java.lang.RuntimeException: Unable to lock > 'CheckLock' due to: Duplicate entry 'CheckLock-0' for key 'PRIMARY' > (SQLState=23000, ErrorCode=1062) (state=,code=0) > Aborting command set because "force" is false and command failed: "create > table acidcount1 (id int) > clustered by (id) into 2 buckets > stored as orc > tblproperties('transactional'='true');" > {noformat} > Saw the following detailed stack in the server log: > {noformat} > 2016-07-19T10:59:46,213 ERROR [HiveServer2-Background-Pool: Thread-463]: > metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(196)) - > java.lang.RuntimeException: Unable to lock 'CheckLock' due to: Duplicate > entry 'CheckLock-0' for key 'PRIMARY' (SQLState=23000, 
ErrorCode=1062) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.acquireLock(TxnHandler.java:3235) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLock(TxnHandler.java:2309) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLockWithRetry(TxnHandler.java:1012) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:784) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:5941) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99) > at com.sun.proxy.$Proxy26.lock(Unknown Source) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:2109) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154) > at com.sun.proxy.$Proxy28.lock(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2259) > at 
com.sun.proxy.$Proxy28.lock(Unknown Source) > at > org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$SynchronizedMetaStoreClient.lock(DbTxnManager.java:740) > at > org.apache.hadoop.hive.ql.lockmgr.DbLockManager.lock(DbLockManager.java:103) > at > org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.java:341) > at > org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocksWithHeartbeatDelay(DbTxnManager.java:357) > at >
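The stack above shows TxnHandler.acquireLock failing when its sentinel INSERT hits an existing primary key. The pattern at issue is a table-based mutex: acquiring the lock means inserting a row with a fixed key, so a concurrent acquirer gets a duplicate-key error instead of waiting. A minimal sketch of the duplicate-key-tolerant variant, in Python with sqlite3 (table and column names are illustrative, not Hive's actual metastore schema):

```python
import sqlite3

def acquire_mutex(conn, key):
    """Illustrative table-based mutex, not Hive's actual code.

    The JIRA error arises when two sessions INSERT the same sentinel
    row: the second one fails with a duplicate-key error. A robust
    pattern treats that error as "row already exists" and falls back
    to locking the existing row instead of failing the whole call.
    """
    try:
        conn.execute(
            "INSERT INTO aux_table (mt_key1, mt_key2) VALUES (?, 0)", (key,))
    except sqlite3.IntegrityError:
        # Row already present: do not error out, lock the existing row.
        pass
    # In a real RDBMS this would be SELECT ... FOR UPDATE so the caller
    # blocks until the competing transaction commits; sqlite3 has no
    # FOR UPDATE, so we just read the row back here.
    row = conn.execute(
        "SELECT mt_key1 FROM aux_table WHERE mt_key1 = ?", (key,)).fetchone()
    return row is not None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aux_table (mt_key1 TEXT PRIMARY KEY, mt_key2 INTEGER)")
print(acquire_mutex(conn, "CheckLock"))  # first acquisition succeeds
print(acquire_mutex(conn, "CheckLock"))  # duplicate INSERT is tolerated
```

Both calls return True: the second acquirer survives the duplicate-key error, which is the behavior the reported stack trace lacks.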
[jira] [Commented] (HIVE-14214) ORC Schema Evolution and Predicate Push Down do not work together (no rows returned)
[ https://issues.apache.org/jira/browse/HIVE-14214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385169#comment-15385169 ] Matt McCline commented on HIVE-14214: - Ok, review board created. New patch submitted for Hive QA. > ORC Schema Evolution and Predicate Push Down do not work together (no rows > returned) > > > Key: HIVE-14214 > URL: https://issues.apache.org/jira/browse/HIVE-14214 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-14214.01.patch, HIVE-14214.02.patch, > HIVE-14214.03.patch, HIVE-14214.04.patch, HIVE-14214.05.patch, > HIVE-14214.WIP.patch > > > In Schema Evolution, the reader schema is different than the file schema > which is used to evaluate predicate push down. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14214) ORC Schema Evolution and Predicate Push Down do not work together (no rows returned)
[ https://issues.apache.org/jira/browse/HIVE-14214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-14214: Attachment: HIVE-14214.05.patch > ORC Schema Evolution and Predicate Push Down do not work together (no rows > returned) > > > Key: HIVE-14214 > URL: https://issues.apache.org/jira/browse/HIVE-14214 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-14214.01.patch, HIVE-14214.02.patch, > HIVE-14214.03.patch, HIVE-14214.04.patch, HIVE-14214.05.patch, > HIVE-14214.WIP.patch > > > In Schema Evolution, the reader schema is different than the file schema > which is used to evaluate predicate push down. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14214) ORC Schema Evolution and Predicate Push Down do not work together (no rows returned)
[ https://issues.apache.org/jira/browse/HIVE-14214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-14214: Attachment: (was: HIVE-14214.05.patch) > ORC Schema Evolution and Predicate Push Down do not work together (no rows > returned) > > > Key: HIVE-14214 > URL: https://issues.apache.org/jira/browse/HIVE-14214 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-14214.01.patch, HIVE-14214.02.patch, > HIVE-14214.03.patch, HIVE-14214.04.patch, HIVE-14214.WIP.patch > > > In Schema Evolution, the reader schema is different than the file schema > which is used to evaluate predicate push down. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14251) Union All of different types resolves to incorrect data
[ https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385164#comment-15385164 ] Mohit Sabharwal commented on HIVE-14251: LGTM. If I understand correctly, the only difference between isCommonTypeOf and implicitConvertible is this line {code} // Allow implicit String to Double conversion if (fromPg == PrimitiveGrouping.STRING_GROUP && to == PrimitiveCategory.DOUBLE) { return true; } {code} Wondering if it's easy to re-use implicitConvertible instead of duplicating the code? Maybe add a flag to the method for String to Double conversion? > Union All of different types resolves to incorrect data > --- > > Key: HIVE-14251 > URL: https://issues.apache.org/jira/browse/HIVE-14251 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14251.1.patch > > > create table src(c1 date, c2 int, c3 double); > insert into src values ('2016-01-01',5,1.25); > select * from > (select c1 from src union all > select c2 from src union all > select c3 from src) t; > It will return NULL for the c1 values. Seems the common data type is resolved > to the last c3 which is double. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
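The review suggestion above (re-use implicitConvertible with a flag gating the string-to-double rule) can be sketched as follows. This is hypothetical Python, not Hive's TypeInfoUtils; the type names and conversion lattice are simplified assumptions:

```python
# Simplified numeric widening order; Hive's real lattice is larger.
NUMERIC_ORDER = ["int", "bigint", "float", "double"]
STRING_GROUP = ("string", "varchar", "char")

def implicit_convertible(frm, to, allow_string_to_double=False):
    """Can `frm` be implicitly converted to `to`?

    The `allow_string_to_double` flag is the one rule the comment says
    separates isCommonTypeOf from implicitConvertible.
    """
    if frm == to:
        return True
    if frm in NUMERIC_ORDER and to in NUMERIC_ORDER:
        return NUMERIC_ORDER.index(frm) <= NUMERIC_ORDER.index(to)
    if allow_string_to_double and frm in STRING_GROUP and to == "double":
        return True
    return False

def common_type(types):
    """Pick the first branch type every other branch converts to, else None."""
    for cand in types:
        if all(implicit_convertible(t, cand, allow_string_to_double=True)
               for t in types):
            return cand
    return None

print(common_type(["int", "double"]))           # double
print(common_type(["string", "double"]))        # double, via the flag
print(common_type(["date", "int", "double"]))   # None: no common type
```

The last case mirrors the reported bug: date, int, and double share no common type, so the resolver should refuse (or return NULL-safe behavior) rather than silently settling on the last branch's double.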
[jira] [Updated] (HIVE-12181) Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver
[ https://issues.apache.org/jira/browse/HIVE-12181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-12181: Status: Patch Available (was: Open) > Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver > --- > > Key: HIVE-12181 > URL: https://issues.apache.org/jira/browse/HIVE-12181 > Project: Hive > Issue Type: Improvement > Components: Statistics >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: HIVE-12181.1.patch, HIVE-12181.2.patch, > HIVE-12181.patch, HIVE-12181.patch > > > There was a performance concern earlier, but HIVE-7587 has fixed that. We can > change the default to true now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12181) Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver
[ https://issues.apache.org/jira/browse/HIVE-12181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-12181: Status: Open (was: Patch Available) > Change hive.stats.fetch.column.stats value to true for MiniTezCliDriver > --- > > Key: HIVE-12181 > URL: https://issues.apache.org/jira/browse/HIVE-12181 > Project: Hive > Issue Type: Improvement > Components: Statistics >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: HIVE-12181.1.patch, HIVE-12181.2.patch, HIVE-12181.patch > > > There was a performance concern earlier, but HIVE-7587 has fixed that. We can > change the default to true now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14214) ORC Schema Evolution and Predicate Push Down do not work together (no rows returned)
[ https://issues.apache.org/jira/browse/HIVE-14214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-14214: Status: Patch Available (was: In Progress) > ORC Schema Evolution and Predicate Push Down do not work together (no rows > returned) > > > Key: HIVE-14214 > URL: https://issues.apache.org/jira/browse/HIVE-14214 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-14214.01.patch, HIVE-14214.02.patch, > HIVE-14214.03.patch, HIVE-14214.04.patch, HIVE-14214.05.patch, > HIVE-14214.WIP.patch > > > In Schema Evolution, the reader schema is different than the file schema > which is used to evaluate predicate push down. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14214) ORC Schema Evolution and Predicate Push Down do not work together (no rows returned)
[ https://issues.apache.org/jira/browse/HIVE-14214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-14214: Attachment: HIVE-14214.05.patch > ORC Schema Evolution and Predicate Push Down do not work together (no rows > returned) > > > Key: HIVE-14214 > URL: https://issues.apache.org/jira/browse/HIVE-14214 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline >Priority: Critical > Attachments: HIVE-14214.01.patch, HIVE-14214.02.patch, > HIVE-14214.03.patch, HIVE-14214.04.patch, HIVE-14214.05.patch, > HIVE-14214.WIP.patch > > > In Schema Evolution, the reader schema is different than the file schema > which is used to evaluate predicate push down. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14167) Use work directories provided by Tez instead of directly using YARN local dirs
[ https://issues.apache.org/jira/browse/HIVE-14167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385129#comment-15385129 ] Siddharth Seth commented on HIVE-14167: --- +1. Assuming you've tested it on a cluster, and seen the correct directories being used. > Use work directories provided by Tez instead of directly using YARN local dirs > -- > > Key: HIVE-14167 > URL: https://issues.apache.org/jira/browse/HIVE-14167 > Project: Hive > Issue Type: Improvement >Affects Versions: 2.1.0 >Reporter: Siddharth Seth >Assignee: Wei Zheng > Attachments: HIVE-14167.1.patch, HIVE-14167.2.patch, > HIVE-14167.3.patch > > > HIVE-13303 fixed things to use multiple directories instead of a single tmp > directory. However it's using yarn-local-dirs directly. > I'm not sure how well using the yarn-local-dir will work on a secure cluster. > Would be better to use Tez*Context.getWorkDirs. This provides an app specific > directory - writable by the user. > cc [~sershe] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13934) Configure Tez to make nocondiional task size memory available for the Processor
[ https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-13934: - Attachment: HIVE-13934.10.patch [~hagleitn] patch 10 addresses review comments. Thanks! > Configure Tez to make nocondiional task size memory available for the > Processor > --- > > Key: HIVE-13934 > URL: https://issues.apache.org/jira/browse/HIVE-13934 > Project: Hive > Issue Type: Bug >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-13934.1.patch, HIVE-13934.10.patch, > HIVE-13934.2.patch, HIVE-13934.3.patch, HIVE-13934.4.patch, > HIVE-13934.6.patch, HIVE-13934.7.patch, HIVE-13934.8.patch, HIVE-13934.9.patch > > > Currently, noconditionaltasksize is not validated against the container size, > the reservations made in the container by Tez for Inputs / Outputs etc. > Check this at compile time to see if enough memory is available, or set up > the vertex to reserve additional memory for the Processor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
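The compile-time check described in the issue (validate noconditionaltasksize against the container size minus Tez's Input/Output reservations, and reserve extra Processor memory if it does not fit) can be sketched like this. The function name, units, and headroom fraction are assumptions for illustration, not Hive's actual configuration:

```python
def plan_processor_memory(container_mb, tez_reserved_mb, noconditional_task_mb,
                          headroom_fraction=0.1):
    """Return the extra memory (MB) the vertex should reserve for the
    Processor, or 0 when the hash-table budget already fits in what is
    left of the container after Tez's own Input/Output reservations.

    Illustrative sketch only; Hive's real check lives in the Tez
    compiler path and uses different knobs.
    """
    # Memory left for the Processor after Tez reservations and a safety margin.
    available = container_mb - tez_reserved_mb - int(container_mb * headroom_fraction)
    if noconditional_task_mb <= available:
        return 0
    return noconditional_task_mb - available

print(plan_processor_memory(4096, 1024, 2048))  # fits: 0 extra MB needed
print(plan_processor_memory(4096, 1024, 3500))  # shortfall: 837 extra MB
```

With a 4 GB container, 1 GB of Tez reservations, and 10% headroom, 2663 MB remain; a 2048 MB hash table fits, while a 3500 MB one requires an 837 MB additional reservation.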
[jira] [Updated] (HIVE-13934) Configure Tez to make nocondiional task size memory available for the Processor
[ https://issues.apache.org/jira/browse/HIVE-13934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-13934: - Status: Open (was: Patch Available) > Configure Tez to make nocondiional task size memory available for the > Processor > --- > > Key: HIVE-13934 > URL: https://issues.apache.org/jira/browse/HIVE-13934 > Project: Hive > Issue Type: Bug >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-13934.1.patch, HIVE-13934.2.patch, > HIVE-13934.3.patch, HIVE-13934.4.patch, HIVE-13934.6.patch, > HIVE-13934.7.patch, HIVE-13934.8.patch, HIVE-13934.9.patch > > > Currently, noconditionaltasksize is not validated against the container size, > the reservations made in the container by Tez for Inputs / Outputs etc. > Check this at compile time to see if enough memory is available, or set up > the vertex to reserve additional memory for the Processor. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14282) HCatLoader ToDate() exception with hive partition table ,partitioned by column of DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385098#comment-15385098 ] Thejas M Nair commented on HIVE-14282: -- +1 > HCatLoader ToDate() exception with hive partition table ,partitioned by > column of DATE datatype > --- > > Key: HIVE-14282 > URL: https://issues.apache.org/jira/browse/HIVE-14282 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1 > Environment: PIG Version : (0.15.0) > HIVE : 1.2.1 > OS Version : CentOS release 6.7 (Final) > OS Kernel : 2.6.32-573.18.1.el6.x86_64 >Reporter: Raghavender Rao Guruvannagari >Assignee: Daniel Dai > Fix For: 1.3.0, 2.2.0, 2.1.1 > > Attachments: HIVE-14282.1.patch > > > ToDate() function doesnt work with a partitioned table, partitioned by the > column of DATE Datatype. > Below are the steps I followed to recreate the problem. > -->Sample input file to hive table : > hdfs@testhost ~$ cat test.log > 2012-06-13,16:11:17,574,140.134.127.109,SearchPage,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,466,43.176.108.158,Electronics,Google.com,Win8,3,iPhone > 2012-06-13,16:11:17,501,97.73.102.79,Appliances,Google.com,Android,4,iPhone > 2012-06-13,16:11:17,469,166.98.157.122,Recommendations,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,557,36.159.147.50,Sporting,Google.com,Win8,3,Samsung > 2012-06-13,16:11:17,449,128.215.122.234,ShoppingCart,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,502,46.81.131.92,Electronics,Google.com,Android,5,Samsung > 2012-06-13,16:11:17,554,120.187.105.127,Automotive,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,447,127.94.64.59,DetailPage,Google.com,Win8,3,Samsung > 2012-06-13,16:11:17,490,132.54.25.75,ShoppingCart,Google.com,Win8,3,iPhone > 2012-06-13,16:11:17,578,79.201.53.179,Automotive,Google.com,Win8,5,Samsung > 2012-06-13,16:11:17,435,158.106.164.38,HomePage,Google.com,Web,5,Chrome > 2012-06-13,16:11:17,523,17.131.82.171,Recommendations,Google.com,Web,3,IE9 > 
2012-06-13,16:11:17,575,178.95.126.105,Appliances,Google.com,iOS,3,iPhone > 2012-06-13,16:11:17,468,225.143.39.176,SearchPage,Google.com,iOS,5,HTC > 2012-06-13,16:11:17,511,43.103.102.147,ShoppingCart,Google.com,iOS,5,Samsung > --> Copied to hdfs directory: > hdfs@testhost ~$ hdfs dfs -put -f test.log /user/hdfs/ > -->Create partitioned table (partitioned with date data type column) in hive: > 0: jdbc:hive2://hdp2.raghav.com:1/default> create table mytable(Dt > DATE,Time STRING,Number INT,IPAddr STRING,Type STRING,Site STRING,OSType > STRING,Visit INT,PhModel STRING) row format delimited fields terminated by > ',' stored as textfile; > 0: jdbc:hive2://testhost.com:1/default> load data inpath > '/user/hdfs/test.log' overwrite into table mytable; > 0: jdbc:hive2://testhost..com:1/default> SET hive.exec.dynamic.partition > = true; > 0: jdbc:hive2://testhost.com:1/default> SET > hive.exec.dynamic.partition.mode = nonstrict; > 0: jdbc:hive2://testhost.com:1/default> create table partmytable(Number > INT,IPAddr STRING,Type STRING,Site STRING,OSType STRING,Visit INT,PhModel > STRING) partitioned by (Dt DATE,Time STRING) row format delimited fields > terminated by ',' stored as textfile; > 0: jdbc:hive2://testhost.com:1/default> insert overwrite table > partmytable partition(Dt,Time) select > Number,IPAddr,Type,Site,OSType,Visit,PhModel,Dt,Time from mytable; > 0: jdbc:hive2://hdp2.raghav.com:1/default> describe partmytable; > --> Try to filter with ToDate function which fails with error: > hdfs@testhost ~$ pig -useHCatalog > grunt> > grunt> temp = LOAD 'partmytable' using > org.apache.hive.hcatalog.pig.HCatLoader(); > grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','yyyy-MM-dd'); > grunt> dump temp1; > -->Try to filter the normal table with same statement works; > grunt> > grunt> temp = LOAD 'mytable' using org.apache.hive.hcatalog.pig.HCatLoader(); > grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','yyyy-MM-dd'); > grunt> dump temp1; > Workaround : > Use below 
statement instead of direct ToDate(); > grunt>temp1 = FILTER temp5 by DaysBetween(dt,(datetime)ToDate('2012-06-13', > 'yyyy-MM-dd')) >=(long)0 AND DaysBetween(dt,(datetime)ToDate('2012-06-13', > 'yyyy-MM-dd')) <=(long)0; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
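The workaround above rewrites the equality filter as a pair of DaysBetween inequalities (0 <= days-between <= 0), which is logically identical but avoids the HCatLoader pushdown path that fails on the DATE partition column. That the two predicates select the same rows can be checked with a small Python sketch over hypothetical data:

```python
from datetime import date, datetime

def to_date(s, fmt="%Y-%m-%d"):
    # Analogue of Pig's ToDate(chararray, format).
    return datetime.strptime(s, fmt).date()

def days_between(a, b):
    # Analogue of Pig's DaysBetween(datetime, datetime).
    return (a - b).days

# Hypothetical rows standing in for the partitioned table.
rows = [
    {"dt": date(2012, 6, 13), "ostype": "Win8"},
    {"dt": date(2012, 6, 14), "ostype": "iOS"},
]

target = to_date("2012-06-13")

# Direct equality: what the failing Pig filter expresses.
eq = [r for r in rows if r["dt"] == target]

# The workaround: equality bracketed as two inequalities.
rng = [r for r in rows if 0 <= days_between(r["dt"], target) <= 0]

print(eq == rng)  # True: both filters select exactly the 2012-06-13 row
```

The equivalence holds for any date value, since days-between is zero exactly when the two dates are equal.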
[jira] [Commented] (HIVE-14293) PerfLogger.openScopes should be transient
[ https://issues.apache.org/jira/browse/HIVE-14293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15385089#comment-15385089 ] Prasanth Jayachandran commented on HIVE-14293: -- +1 > PerfLogger.openScopes should be transient > - > > Key: HIVE-14293 > URL: https://issues.apache.org/jira/browse/HIVE-14293 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-14293.1.patch > > > See the following exception when running Hive e2e tests: > {code} > 0: jdbc:hive2://nat-r6-ojss-hsihs2-1.openstac> SELECT s.name, s2.age, s.gpa, > v.registration, v2.contributions FROM student s INNER JOIN voter v ON (s.name > = v.name) INNER JOIN student s2 ON (s2.age = v.age and s.name = s2.name) > INNER JOIN voter v2 ON (v2.name = s2.name and v2.age = s2.age) WHERE v2.age = > s.age ORDER BY s.name, s2.age, s.gpa, v.registration, v2.contributions; > INFO : Compiling > command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8): > SELECT s.name, s2.age, s.gpa, v.registration, v2.contributions FROM student s > INNER JOIN voter v ON (s.name = v.name) INNER JOIN student s2 ON (s2.age = > v.age and s.name = s2.name) INNER JOIN voter v2 ON (v2.name = s2.name and > v2.age = s2.age) WHERE v2.age = s.age ORDER BY s.name, s2.age, s.gpa, > v.registration, v2.contributions > INFO : Semantic Analysis Completed > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:s.name, > type:string, comment:null), FieldSchema(name:s2.age, type:int, comment:null), > FieldSchema(name:s.gpa, type:double, comment:null), > FieldSchema(name:v.registration, type:string, comment:null), > FieldSchema(name:v2.contributions, type:float, comment:null)], > properties:null) > INFO : Completed compiling > command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8); > Time taken: 1.165 seconds > INFO : Executing > command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8): > SELECT 
s.name, s2.age, s.gpa, v.registration, v2.contributions FROM student s > INNER JOIN voter v ON (s.name = v.name) INNER JOIN student s2 ON (s2.age = > v.age and s.name = s2.name) INNER JOIN voter v2 ON (v2.name = s2.name and > v2.age = s2.age) WHERE v2.age = s.age ORDER BY s.name, s2.age, s.gpa, > v.registration, v2.contributions > INFO : Query ID = hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8 > INFO : Total jobs = 1 > INFO : Launching Job 1 out of 1 > INFO : Starting task [Stage-1:MAPRED] in serial mode > INFO : Session is already open > INFO : Dag name: SELECT s.name, s2.age, sv2.contributions(Stage-1) > ERROR : Failed to execute tez graph. > java.lang.RuntimeException: Error caching map.xml: > org.apache.hive.com.esotericsoftware.kryo.KryoException: > java.util.ConcurrentModificationException > Serialization trace: > classes (sun.misc.Launcher$AppClassLoader) > classloader (java.security.ProtectionDomain) > context (java.security.AccessControlContext) > acc (org.apache.hadoop.hive.ql.exec.UDFClassLoader) > classLoader (org.apache.hadoop.hive.conf.HiveConf) > conf (org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics) > metrics > (org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics$CodahaleMetricsScope) > openScopes (org.apache.hadoop.hive.ql.log.PerfLogger) > perfLogger (org.apache.hadoop.hive.ql.exec.MapJoinOperator) > childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator) > childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator) > childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator) > childOperators (org.apache.hadoop.hive.ql.exec.FilterOperator) > childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) > aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) > at > org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:582) > ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009] > at > org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:516) > 
~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009] > at > org.apache.hadoop.hive.ql.exec.tez.DagUtils.createVertex(DagUtils.java:601) > ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009] > at > org.apache.hadoop.hive.ql.exec.tez.DagUtils.createVertex(DagUtils.java:1147) > ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009] > at org.apache.hadoop.hive.ql.exec.tez.TezTask.build(TezTask.java:390) > ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009] > at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:164) > [hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009] > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) > [hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009] > at >
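The fix named in the issue title, marking openScopes transient, keeps the runtime-only scope map out of the Kryo-serialized plan, breaking the serialization chain shown in the trace (MapWork → operators → PerfLogger → metrics → classloaders). Python's analogue of Java's `transient` is dropping the field in `__getstate__`; a toy sketch, not the real PerfLogger:

```python
import pickle

class PerfLoggerLike:
    """Toy analogue of a logger holding runtime-only open scopes that
    must not travel with serialized query plans."""

    def __init__(self):
        # Runtime-only state, like PerfLogger.openScopes; may hold
        # objects that cannot (and should not) be serialized.
        self.open_scopes = {}

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("open_scopes", None)  # the `transient` equivalent
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.open_scopes = {}           # re-initialize after deserialization

logger = PerfLoggerLike()
logger.open_scopes["compile"] = lambda: None  # unpicklable payload is now safe
clone = pickle.loads(pickle.dumps(logger))
print(clone.open_scopes)  # {} -- the scope map did not travel
```

Without `__getstate__`, pickling this object would fail on the lambda, just as Kryo fails (here with ConcurrentModificationException) when it tries to walk the live scope map and everything reachable from it.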
[jira] [Updated] (HIVE-14275) Driver#releasePlan throws NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-14275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-14275: Component/s: (was: HiveServer2) > Driver#releasePlan throws NullPointerException > -- > > Key: HIVE-14275 > URL: https://issues.apache.org/jira/browse/HIVE-14275 > Project: Hive > Issue Type: Bug >Affects Versions: 0.14.0, 1.2.1 >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta > > We'll need to add a null check. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14275) Driver#releasePlan throws NullPointerException
[ https://issues.apache.org/jira/browse/HIVE-14275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-14275: Affects Version/s: (was: 2.1.0) 0.14.0 1.2.1 > Driver#releasePlan throws NullPointerException > -- > > Key: HIVE-14275 > URL: https://issues.apache.org/jira/browse/HIVE-14275 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.14.0, 1.2.1 >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta > > We'll need to add a null check. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13906) Remove guava dependence from storage-api module
[ https://issues.apache.org/jira/browse/HIVE-13906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-13906: - Resolution: Fixed Fix Version/s: 2.2.0 Status: Resolved (was: Patch Available) I just committed this. Thanks for the review, Sergey. > Remove guava dependence from storage-api module > --- > > Key: HIVE-13906 > URL: https://issues.apache.org/jira/browse/HIVE-13906 > Project: Hive > Issue Type: Bug > Components: storage-api >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 2.2.0 > > Attachments: HIVE-13906.patch > > > Guava is a very problematic library to depend on because of the version > incompatibilities and the use of it in the storage-api module causes it to > leak into everything that depends on it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14293) PerfLogger.openScopes should be transient
[ https://issues.apache.org/jira/browse/HIVE-14293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-14293: -- Status: Patch Available (was: Open) > PerfLogger.openScopes should be transient > - > > Key: HIVE-14293 > URL: https://issues.apache.org/jira/browse/HIVE-14293 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-14293.1.patch > > > See the following exception when running Hive e2e tests: > {code} > 0: jdbc:hive2://nat-r6-ojss-hsihs2-1.openstac> SELECT s.name, s2.age, s.gpa, > v.registration, v2.contributions FROM student s INNER JOIN voter v ON (s.name > = v.name) INNER JOIN student s2 ON (s2.age = v.age and s.name = s2.name) > INNER JOIN voter v2 ON (v2.name = s2.name and v2.age = s2.age) WHERE v2.age = > s.age ORDER BY s.name, s2.age, s.gpa, v.registration, v2.contributions; > INFO : Compiling > command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8): > SELECT s.name, s2.age, s.gpa, v.registration, v2.contributions FROM student s > INNER JOIN voter v ON (s.name = v.name) INNER JOIN student s2 ON (s2.age = > v.age and s.name = s2.name) INNER JOIN voter v2 ON (v2.name = s2.name and > v2.age = s2.age) WHERE v2.age = s.age ORDER BY s.name, s2.age, s.gpa, > v.registration, v2.contributions > INFO : Semantic Analysis Completed > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:s.name, > type:string, comment:null), FieldSchema(name:s2.age, type:int, comment:null), > FieldSchema(name:s.gpa, type:double, comment:null), > FieldSchema(name:v.registration, type:string, comment:null), > FieldSchema(name:v2.contributions, type:float, comment:null)], > properties:null) > INFO : Completed compiling > command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8); > Time taken: 1.165 seconds > INFO : Executing > command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8): > SELECT s.name, s2.age, 
s.gpa, v.registration, v2.contributions FROM student s > INNER JOIN voter v ON (s.name = v.name) INNER JOIN student s2 ON (s2.age = > v.age and s.name = s2.name) INNER JOIN voter v2 ON (v2.name = s2.name and > v2.age = s2.age) WHERE v2.age = s.age ORDER BY s.name, s2.age, s.gpa, > v.registration, v2.contributions > INFO : Query ID = hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8 > INFO : Total jobs = 1 > INFO : Launching Job 1 out of 1 > INFO : Starting task [Stage-1:MAPRED] in serial mode > INFO : Session is already open > INFO : Dag name: SELECT s.name, s2.age, sv2.contributions(Stage-1) > ERROR : Failed to execute tez graph. > java.lang.RuntimeException: Error caching map.xml: > org.apache.hive.com.esotericsoftware.kryo.KryoException: > java.util.ConcurrentModificationException > Serialization trace: > classes (sun.misc.Launcher$AppClassLoader) > classloader (java.security.ProtectionDomain) > context (java.security.AccessControlContext) > acc (org.apache.hadoop.hive.ql.exec.UDFClassLoader) > classLoader (org.apache.hadoop.hive.conf.HiveConf) > conf (org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics) > metrics > (org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics$CodahaleMetricsScope) > openScopes (org.apache.hadoop.hive.ql.log.PerfLogger) > perfLogger (org.apache.hadoop.hive.ql.exec.MapJoinOperator) > childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator) > childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator) > childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator) > childOperators (org.apache.hadoop.hive.ql.exec.FilterOperator) > childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) > aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) > at > org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:582) > ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009] > at > org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:516) > 
~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009] > at > org.apache.hadoop.hive.ql.exec.tez.DagUtils.createVertex(DagUtils.java:601) > ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009] > at > org.apache.hadoop.hive.ql.exec.tez.DagUtils.createVertex(DagUtils.java:1147) > ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009] > at org.apache.hadoop.hive.ql.exec.tez.TezTask.build(TezTask.java:390) > ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009] > at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:164) > [hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009] > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) > [hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009] > at >
[jira] [Updated] (HIVE-14293) PerfLogger.openScopes should be transient
[ https://issues.apache.org/jira/browse/HIVE-14293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-14293: -- Attachment: HIVE-14293.1.patch > PerfLogger.openScopes should be transient > - > > Key: HIVE-14293 > URL: https://issues.apache.org/jira/browse/HIVE-14293 > Project: Hive > Issue Type: Bug >Affects Versions: 2.1.0 >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-14293.1.patch > > > See the following exception when running Hive e2e tests: > {code} > 0: jdbc:hive2://nat-r6-ojss-hsihs2-1.openstac> SELECT s.name, s2.age, s.gpa, > v.registration, v2.contributions FROM student s INNER JOIN voter v ON (s.name > = v.name) INNER JOIN student s2 ON (s2.age = v.age and s.name = s2.name) > INNER JOIN voter v2 ON (v2.name = s2.name and v2.age = s2.age) WHERE v2.age = > s.age ORDER BY s.name, s2.age, s.gpa, v.registration, v2.contributions; > INFO : Compiling > command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8): > SELECT s.name, s2.age, s.gpa, v.registration, v2.contributions FROM student s > INNER JOIN voter v ON (s.name = v.name) INNER JOIN student s2 ON (s2.age = > v.age and s.name = s2.name) INNER JOIN voter v2 ON (v2.name = s2.name and > v2.age = s2.age) WHERE v2.age = s.age ORDER BY s.name, s2.age, s.gpa, > v.registration, v2.contributions > INFO : Semantic Analysis Completed > INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:s.name, > type:string, comment:null), FieldSchema(name:s2.age, type:int, comment:null), > FieldSchema(name:s.gpa, type:double, comment:null), > FieldSchema(name:v.registration, type:string, comment:null), > FieldSchema(name:v2.contributions, type:float, comment:null)], > properties:null) > INFO : Completed compiling > command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8); > Time taken: 1.165 seconds > INFO : Executing > command(queryId=hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8): > SELECT s.name, s2.age, s.gpa, 
v.registration, v2.contributions FROM student s > INNER JOIN voter v ON (s.name = v.name) INNER JOIN student s2 ON (s2.age = > v.age and s.name = s2.name) INNER JOIN voter v2 ON (v2.name = s2.name and > v2.age = s2.age) WHERE v2.age = s.age ORDER BY s.name, s2.age, s.gpa, > v.registration, v2.contributions > INFO : Query ID = hive_20160717224915_3a52719f-539f-4f82-a9cd-0c0af4e09ef8 > INFO : Total jobs = 1 > INFO : Launching Job 1 out of 1 > INFO : Starting task [Stage-1:MAPRED] in serial mode > INFO : Session is already open > INFO : Dag name: SELECT s.name, s2.age, sv2.contributions(Stage-1) > ERROR : Failed to execute tez graph. > java.lang.RuntimeException: Error caching map.xml: > org.apache.hive.com.esotericsoftware.kryo.KryoException: > java.util.ConcurrentModificationException > Serialization trace: > classes (sun.misc.Launcher$AppClassLoader) > classloader (java.security.ProtectionDomain) > context (java.security.AccessControlContext) > acc (org.apache.hadoop.hive.ql.exec.UDFClassLoader) > classLoader (org.apache.hadoop.hive.conf.HiveConf) > conf (org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics) > metrics > (org.apache.hadoop.hive.common.metrics.metrics2.CodahaleMetrics$CodahaleMetricsScope) > openScopes (org.apache.hadoop.hive.ql.log.PerfLogger) > perfLogger (org.apache.hadoop.hive.ql.exec.MapJoinOperator) > childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator) > childOperators (org.apache.hadoop.hive.ql.exec.MapJoinOperator) > childOperators (org.apache.hadoop.hive.ql.exec.SelectOperator) > childOperators (org.apache.hadoop.hive.ql.exec.FilterOperator) > childOperators (org.apache.hadoop.hive.ql.exec.TableScanOperator) > aliasToWork (org.apache.hadoop.hive.ql.plan.MapWork) > at > org.apache.hadoop.hive.ql.exec.Utilities.setBaseWork(Utilities.java:582) > ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009] > at > org.apache.hadoop.hive.ql.exec.Utilities.setMapWork(Utilities.java:516) > 
~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009] > at > org.apache.hadoop.hive.ql.exec.tez.DagUtils.createVertex(DagUtils.java:601) > ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009] > at > org.apache.hadoop.hive.ql.exec.tez.DagUtils.createVertex(DagUtils.java:1147) > ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009] > at org.apache.hadoop.hive.ql.exec.tez.TezTask.build(TezTask.java:390) > ~[hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009] > at org.apache.hadoop.hive.ql.exec.tez.TezTask.execute(TezTask.java:164) > [hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009] > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197) > [hive-exec-2.1.0.2.5.0.0-1009.jar:2.1.0.2.5.0.0-1009] > at >
[jira] [Updated] (HIVE-14292) ACID table creation fails on mysql with MySQLIntegrityConstraintViolationException
[ https://issues.apache.org/jira/browse/HIVE-14292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-14292: -- Affects Version/s: (was: 1.2.1) 1.3.0 > ACID table creation fails on mysql with > MySQLIntegrityConstraintViolationException > -- > > Key: HIVE-14292 > URL: https://issues.apache.org/jira/browse/HIVE-14292 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.3.0, 2.1.0 > Environment: MySQL >Reporter: Deepesh Khandelwal >Assignee: Eugene Koifman > > While creating a ACID table ran into the following error: > {noformat} > >>> create table acidcount1 (id int) > clustered by (id) into 2 buckets > stored as orc > tblproperties('transactional'='true'); > INFO : Compiling > command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15): > create table acidcount1 (id int) > clustered by (id) into 2 buckets > stored as orc > tblproperties('transactional'='true') > INFO : Semantic Analysis Completed > INFO : Returning Hive schema: Schema(fieldSchemas:null, properties:null) > INFO : Completed compiling > command(queryId=hive_20160719105944_bfe65377-59fa-4e17-941e-1f86b8daca15); > Time taken: 0.111 seconds > Error: Error running query: java.lang.RuntimeException: Unable to lock > 'CheckLock' due to: Duplicate entry 'CheckLock-0' for key 'PRIMARY' > (SQLState=23000, ErrorCode=1062) (state=,code=0) > Aborting command set because "force" is false and command failed: "create > table acidcount1 (id int) > clustered by (id) into 2 buckets > stored as orc > tblproperties('transactional'='true');" > {noformat} > Saw the following detailed stack in the server log: > {noformat} > 2016-07-19T10:59:46,213 ERROR [HiveServer2-Background-Pool: Thread-463]: > metastore.RetryingHMSHandler (RetryingHMSHandler.java:invokeInternal(196)) - > java.lang.RuntimeException: Unable to lock 'CheckLock' due to: Duplicate > entry 'CheckLock-0' for key 'PRIMARY' (SQLState=23000, ErrorCode=1062) > at > 
org.apache.hadoop.hive.metastore.txn.TxnHandler.acquireLock(TxnHandler.java:3235) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLock(TxnHandler.java:2309) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.checkLockWithRetry(TxnHandler.java:1012) > at > org.apache.hadoop.hive.metastore.txn.TxnHandler.lock(TxnHandler.java:784) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.lock(HiveMetaStore.java:5941) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99) > at com.sun.proxy.$Proxy26.lock(Unknown Source) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.lock(HiveMetaStoreClient.java:2109) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:154) > at com.sun.proxy.$Proxy28.lock(Unknown Source) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2259) > at com.sun.proxy.$Proxy28.lock(Unknown Source) > at 
> org.apache.hadoop.hive.ql.lockmgr.DbTxnManager$SynchronizedMetaStoreClient.lock(DbTxnManager.java:740) > at > org.apache.hadoop.hive.ql.lockmgr.DbLockManager.lock(DbLockManager.java:103) > at > org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocks(DbTxnManager.java:341) > at > org.apache.hadoop.hive.ql.lockmgr.DbTxnManager.acquireLocksWithHeartbeatDelay(DbTxnManager.java:357) > at >
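The stack trace ends inside TxnHandler.acquireLock, and the error message ("Duplicate entry 'CheckLock-0' for key 'PRIMARY'") suggests the metastore uses a database row as a mutex: acquiring the lock means inserting a row with a fixed primary key, and a duplicate-key failure means another caller already holds it. As a rough in-memory analogue of that insert-as-lock pattern (the class and method names below are hypothetical, not Hive code):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sketch: putIfAbsent plays the role of an INSERT against a
// PRIMARY KEY column; a non-null return corresponds to the duplicate-key
// error seen in the stack trace, meaning the lock is already held.
public class DbMutexSketch {
    private final ConcurrentMap<String, String> lockTable = new ConcurrentHashMap<>();

    public boolean tryAcquire(String lockName, String owner) {
        // Succeeds only if no row with this key exists yet.
        return lockTable.putIfAbsent(lockName, owner) == null;
    }

    public void release(String lockName) {
        lockTable.remove(lockName); // analogous to DELETEing the row
    }
}
```

In the real metastore the "row" lives in the backing RDBMS so the mutex works across processes, which is why the failure surfaces as a MySQL integrity-constraint violation rather than a plain Java exception.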
[jira] [Updated] (HIVE-14225) Llap slider package should support configuring YARN rolling log aggregation
[ https://issues.apache.org/jira/browse/HIVE-14225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14225: -- Attachment: HIVE-14225.01.patch Patch to: - configure slider to inform YARN to aggregate files with the name .done. - remove the query-based routing. - move to RFA as the default router, since query-routing still requires some work. - add a value in HiveConf, similar to other variables like container-size, so the value can be accessed at runtime (when present in hive-site.xml) > Llap slider package should support configuring YARN rolling log aggregation > --- > > Key: HIVE-14225 > URL: https://issues.apache.org/jira/browse/HIVE-14225 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14225.01.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14290) Refactor HIVE-14054 to use Collections#newSetFromMap
[ https://issues.apache.org/jira/browse/HIVE-14290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Slawski updated HIVE-14290: - Target Version/s: 2.2.0 Fix Version/s: (was: 2.2.0) > Refactor HIVE-14054 to use Collections#newSetFromMap > > > Key: HIVE-14290 > URL: https://issues.apache.org/jira/browse/HIVE-14290 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 2.1.0 >Reporter: Peter Slawski >Assignee: Peter Slawski >Priority: Trivial > Attachments: HIVE-14290.1.patch > > > There is a minor refactor that can be made to HiveMetaStoreChecker so that it > cleanly creates and uses a set that is backed by a Map implementation. In > this case, the underlying Map implementation is ConcurrentHashMap. This > refactor will help prevent issues such as the one reported in HIVE-14054. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14290) Refactor HIVE-14054 to use Collections#newSetFromMap
[ https://issues.apache.org/jira/browse/HIVE-14290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Slawski updated HIVE-14290: - Fix Version/s: 2.2.0 Status: Patch Available (was: Open) I've attached a patch that makes this minor refactor. > Refactor HIVE-14054 to use Collections#newSetFromMap > > > Key: HIVE-14290 > URL: https://issues.apache.org/jira/browse/HIVE-14290 > Project: Hive > Issue Type: Improvement > Components: Metastore >Affects Versions: 2.1.0 >Reporter: Peter Slawski >Assignee: Peter Slawski >Priority: Trivial > Fix For: 2.2.0 > > Attachments: HIVE-14290.1.patch > > > There is a minor refactor that can be made to HiveMetaStoreChecker so that it > cleanly creates and uses a set that is backed by a Map implementation. In > this case, the underlying Map implementation is ConcurrentHashMap. This > refactor will help prevent issues such as the one reported in HIVE-14054. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
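For readers unfamiliar with the helper named in the summary: `Collections#newSetFromMap` wraps any empty `Boolean`-valued map as a `Set` view, so backing it with a `ConcurrentHashMap` yields a thread-safe set without hand-rolling map-as-set bookkeeping. A minimal standalone sketch (the class name is illustrative, not the actual HiveMetaStoreChecker code):

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentSetDemo {
    // The Set view delegates add/contains/remove to the backing map's keys,
    // so it inherits ConcurrentHashMap's thread-safety guarantees.
    public static Set<String> newConcurrentSet() {
        return Collections.newSetFromMap(new ConcurrentHashMap<String, Boolean>());
    }

    public static void main(String[] args) {
        Set<String> inProgress = newConcurrentSet();
        inProgress.add("db1.table1");
        inProgress.add("db1.table1"); // duplicate add is a no-op
        System.out.println(inProgress.size()); // prints 1
    }
}
```

Since Java 8 the same thing is also available as `ConcurrentHashMap.newKeySet()`.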
[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13995: - Attachment: HIVE-13995.5.patch > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch > > > TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when > the query does not have a filter on the partition column, the metastore queries > generated have a large IN clause listing all the partition names. Most RDBMS > systems have trouble optimizing large IN clauses, and even when a good index > plan is chosen, comparing against 1800+ string values will not lead to the best > execution time. > When all partitions are chosen, not specifying the partition list and having > filters only on table and column name will generate the same result set as > long as there are no concurrent modifications to the partition list of the Hive > table (adding/dropping partitions). > For example, for TPCDS query18, the metastore query gathering partition column > statistics runs in 0.5 secs in MySQL. 
Following is output from mysql log > {noformat} > -- Query_time: 0.482063 Lock_time: 0.003037 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' > and "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" in > ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654') > group by "PARTITION_NAME"; > {noformat} > Functionally equivalent query runs in 0.1 seconds > {noformat} > --Query_time: 0.121296 Lock_time: 0.000156 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' and "COLUMN_NAME" in > 
('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > group by "PARTITION_NAME"; > {noformat} > If removing the partition list seems drastic, it's also possible to simply > list the range, since Hive gets an ordered list of partition names. This > performs equally well as the earlier query > {noformat} > # Query_time: 0.143874 Lock_time: 0.000154 Rows_sent: 1836 Rows_examined: > 18360 > SET timestamp=1464014881; > select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = > 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales' and > "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= > 'cs_sold_date_sk=2452654' > group by "PARTITION_NAME"; > {noformat} > Another thing to check is the IN clause of column names. Columns in the > projection list of the Hive query are mentioned here. Not sure if statistics of > these columns are required for Hive query optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
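The rewrite described in the report, dropping the huge IN list in favor of a closed range over the sorted partition names, can be sketched as a small helper. This is illustrative only (the class and method are hypothetical, not the patch's code), and it assumes the caller has already established that all partitions are selected and that none are being added or dropped concurrently:

```java
import java.util.Arrays;
import java.util.List;

public class PartitionRangePredicate {
    // Given the already-sorted partition names the metastore holds, emit a
    // BETWEEN-style range predicate instead of a 1800+-element IN list.
    public static String rangePredicate(List<String> sortedPartitionNames) {
        String first = sortedPartitionNames.get(0);
        String last = sortedPartitionNames.get(sortedPartitionNames.size() - 1);
        return "\"PARTITION_NAME\" >= '" + first
             + "' and \"PARTITION_NAME\" <= '" + last + "'";
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList(
            "cs_sold_date_sk=2450815", "cs_sold_date_sk=2450816", "cs_sold_date_sk=2452654");
        System.out.println(rangePredicate(parts));
    }
}
```

A range predicate lets the RDBMS do a single index range scan rather than evaluating membership against thousands of string constants, which matches the ~4x speedup shown in the mysql log above.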
[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13995: - Attachment: (was: HIVE-13995.5.patch) > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch > > > TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when > the query does not have a filter on the partition column, the metastore queries > generated have a large IN clause listing all the partition names. Most RDBMS > systems have trouble optimizing large IN clauses, and even when a good index > plan is chosen, comparing against 1800+ string values will not lead to the best > execution time. > When all partitions are chosen, not specifying the partition list and having > filters only on table and column name will generate the same result set as > long as there are no concurrent modifications to the partition list of the Hive > table (adding/dropping partitions). > For example, for TPCDS query18, the metastore query gathering partition column > statistics runs in 0.5 secs in MySQL. 
Following is output from mysql log > {noformat} > -- Query_time: 0.482063 Lock_time: 0.003037 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' > and "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" in > ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654') > group by "PARTITION_NAME"; > {noformat} > Functionally equivalent query runs in 0.1 seconds > {noformat} > --Query_time: 0.121296 Lock_time: 0.000156 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' and "COLUMN_NAME" in > 
('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > group by "PARTITION_NAME"; > {noformat} > If removing the partition list seems drastic, it's also possible to simply > list the range, since Hive gets an ordered list of partition names. This > performs equally well as the earlier query > {noformat} > # Query_time: 0.143874 Lock_time: 0.000154 Rows_sent: 1836 Rows_examined: > 18360 > SET timestamp=1464014881; > select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = > 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales' and > "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= > 'cs_sold_date_sk=2452654' > group by "PARTITION_NAME"; > {noformat} > Another thing to check is the IN clause of column names. Columns in the > projection list of the Hive query are mentioned here. Not sure if statistics of > these columns are required for Hive query optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used
[ https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384933#comment-15384933 ] Sahil Takiar commented on HIVE-14170: - Hey Tao, I addressed your comments, and updated the RB. I also pulled in the changes from HIVE-14169 since it doesn't really make sense to commit them separately. Can you take a look at the RB? Link: https://reviews.apache.org/r/49782/ Thanks! > Beeline IncrementalRows should buffer rows and incrementally re-calculate > width if TableOutputFormat is used > > > Key: HIVE-14170 > URL: https://issues.apache.org/jira/browse/HIVE-14170 > Project: Hive > Issue Type: Sub-task > Components: Beeline >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch, > HIVE-14170.3.patch, HIVE-14170.4.patch > > > If {{--incremental}} is specified in Beeline, rows are meant to be printed > out immediately. However, if {{TableOutputFormat}} is used with this option > the formatting can look really off. > The reason is that {{IncrementalRows}} does not do a global calculation of > the optimal width size for {{TableOutputFormat}} (it can't because it only > sees one row at a time). The output of {{BufferedRows}} looks much better > because it can do this global calculation. > If {{--incremental}} is used, and {{TableOutputFormat}} is used, the width > should be re-calculated every "x" rows ("x" can be configurable and by > default it can be 1000). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
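The batching idea in the description, recalculating the table width every "x" rows instead of once per row or once globally, might look roughly like this. All names are hypothetical; this is a sketch of the approach, not Beeline's IncrementalRows implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedWidths {
    // Column widths for one batch: the longest value seen per column.
    static int[] widthsFor(List<String[]> batch, int columns) {
        int[] widths = new int[columns];
        for (String[] row : batch) {
            for (int c = 0; c < columns; c++) {
                widths[c] = Math.max(widths[c], row[c].length());
            }
        }
        return widths;
    }

    // Buffer rows in groups of batchSize and derive widths per batch, so
    // memory stays bounded while formatting adapts every batch rather than
    // being fixed by the very first row.
    static List<int[]> widthsPerBatch(List<String[]> rows, int columns, int batchSize) {
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            List<String[]> batch = rows.subList(i, Math.min(i + batchSize, rows.size()));
            result.add(widthsFor(batch, columns));
        }
        return result;
    }
}
```

With batchSize set to the proposed default of 1000, the first 1000 rows would be buffered, printed with widths derived from that buffer, and the calculation repeated for each subsequent batch.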
[jira] [Commented] (HIVE-14169) Honor --incremental flag only if TableOutputFormat is used
[ https://issues.apache.org/jira/browse/HIVE-14169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384925#comment-15384925 ] Sahil Takiar commented on HIVE-14169: - Hey [~taoli-hwx] * Yes, by default it is still false * For non-table formats we came to the conclusion that there is no real benefit to using BufferedRows. It only really makes sense if the table output format is used. The reason is that if table output format is used along with BufferedRows, then the BufferedRows can calculate the optimal sizing for each row that it prints out. However, this isn't applicable for non-table formats. This is why I made the change to stop honoring the value of incremental if a non-table format is used. Also, I am going to close this JIRA and mark it as a duplicate of HIVE-14170 - since it doesn't make sense to commit these changes without HIVE-14170 along with it. > Honor --incremental flag only if TableOutputFormat is used > -- > > Key: HIVE-14169 > URL: https://issues.apache.org/jira/browse/HIVE-14169 > Project: Hive > Issue Type: Sub-task > Components: Beeline >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-14169.1.patch > > > * When Beeline prints out a {{ResultSet}} to stdout it uses the > {{BeeLine.print}} method > * This method takes the {{ResultSet}} from the completed query and uses a > specified {{OutputFormat}} to print the rows (by default it uses > {{TableOutputFormat}}) > * The {{print}} method also wraps the {{ResultSet}} into a {{Rows}} class > (either a {{IncrementalRows}} or a {{BufferedRows}} class) > The advantage of {{BufferedRows}} is that it can do a global calculation of > the column width, however, this is only useful for {{TableOutputFormat}}. So > there is no need to buffer all the rows if a different {{OutputFormat}} is > used. This JIRA will change the behavior of the {{--incremental}} flag so > that it is only honored if {{TableOutputFormat}} is used. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
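The policy described in the comment reduces to a small predicate: buffer rows only when the table output format is in use and --incremental was not requested; every other combination streams rows as they arrive, since global width calculation only benefits table output. A sketch (class and method names are illustrative, not Beeline's code):

```java
public class RowsChoice {
    // tableFormat: whether TableOutputFormat is the active output format.
    // incrementalFlag: whether the user passed --incremental.
    // Returns true when rows should be fully buffered (BufferedRows),
    // false when they should stream (IncrementalRows).
    static boolean useBufferedRows(boolean tableFormat, boolean incrementalFlag) {
        return tableFormat && !incrementalFlag;
    }
}
```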
[jira] [Commented] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384877#comment-15384877 ] Hari Sankar Sivarama Subramaniyan commented on HIVE-13995: -- Updated RB, did some basic testing on the failed tests to make sure that 1. NPE is not encountered 2. We remove the unnecessary PART_NAME IN () whenever we do not prune any partitions. > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch > > > TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when > the query does not have a filter on the partition column, the metastore queries > generated have a large IN clause listing all the partition names. Most RDBMS > systems have trouble optimizing large IN clauses, and even when a good index > plan is chosen, comparing against 1800+ string values will not lead to the best > execution time. > When all partitions are chosen, not specifying the partition list and having > filters only on table and column name will generate the same result set as > long as there are no concurrent modifications to the partition list of the Hive > table (adding/dropping partitions). > For example, for TPCDS query18, the metastore query gathering partition column > statistics runs in 0.5 secs in MySQL. 
Following is output from mysql log > {noformat} > -- Query_time: 0.482063 Lock_time: 0.003037 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' > and "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" in > ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654') > group by "PARTITION_NAME"; > {noformat} > Functionally equivalent query runs in 0.1 seconds > {noformat} > --Query_time: 0.121296 Lock_time: 0.000156 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' and "COLUMN_NAME" in > 
('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > group by "PARTITION_NAME"; > {noformat} > If removing the partition list seems drastic, it's also possible to simply > list the range, since Hive gets an ordered list of partition names. This > performs equally well as the earlier query > {noformat} > # Query_time: 0.143874 Lock_time: 0.000154 Rows_sent: 1836 Rows_examined: > 18360 > SET timestamp=1464014881; > select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = > 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales' and > "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= > 'cs_sold_date_sk=2452654' > group by "PARTITION_NAME"; > {noformat} > Another thing to check is the IN clause of column names. Columns in the > projection list of the Hive query are mentioned here. Not sure if statistics
[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13995: - Attachment: HIVE-13995.5.patch > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch > > > TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when > the query does not have a filter on the partition column, the metastore queries > generated have a large IN clause listing all the partition names. Most RDBMS > systems have trouble optimizing large IN clauses, and even when a good index > plan is chosen, comparing against 1800+ string values will not lead to the best > execution time. > When all partitions are chosen, not specifying the partition list and having > filters only on table and column name will generate the same result set as > long as there are no concurrent modifications to the partition list of the Hive > table (adding/dropping partitions). > For example, for TPCDS query18, the metastore query gathering partition column > statistics runs in 0.5 secs in MySQL. 
Following is output from mysql log > {noformat} > -- Query_time: 0.482063 Lock_time: 0.003037 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' > and "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" in > ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654') > group by "PARTITION_NAME"; > {noformat} > Functionally equivalent query runs in 0.1 seconds > {noformat} > --Query_time: 0.121296 Lock_time: 0.000156 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' and "COLUMN_NAME" in > 
('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > group by "PARTITION_NAME"; > {noformat} > If removing the partition list seems drastic, it's also possible to simply > list the range, since Hive gets an ordered list of partition names. This > performs equally well as the earlier query > {noformat} > # Query_time: 0.143874 Lock_time: 0.000154 Rows_sent: 1836 Rows_examined: > 18360 > SET timestamp=1464014881; > select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = > 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales' and > "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= > 'cs_sold_date_sk=2452654' > group by "PARTITION_NAME"; > {noformat} > Another thing to check is the IN clause of column names. Columns in the > projection list of the Hive query are mentioned here. Not sure if statistics of > these columns are required for Hive query optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13995: - Attachment: (was: HIVE-13995.5.patch) > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch > > > TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when > the query does not have a filter on the partition column, the metastore queries > generated have a large IN clause listing all the partition names. Most RDBMS > systems have trouble optimizing large IN clauses, and even when a good index > plan is chosen, comparing against 1800+ string values will not lead to the best > execution time. > When all partitions are chosen, not specifying the partition list and having > filters only on table and column name will generate the same result set as > long as there are no concurrent modifications to the partition list of the Hive > table (adding/dropping partitions). > For example, for TPCDS query18, the metastore query gathering partition column > statistics runs in 0.5 secs in MySQL. 
Following is output from mysql log > {noformat} > -- Query_time: 0.482063 Lock_time: 0.003037 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' > and "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" in > ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654') > group by "PARTITION_NAME"; > {noformat} > Functionally equivalent query runs in 0.1 seconds > {noformat} > --Query_time: 0.121296 Lock_time: 0.000156 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' and "COLUMN_NAME" in > 
('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > group by "PARTITION_NAME"; > {noformat} > If removing the partition list seems drastic, it's also possible to simply > specify the range, since Hive gets an ordered list of partition names. This > performs as well as the earlier query: > {noformat} > # Query_time: 0.143874 Lock_time: 0.000154 Rows_sent: 1836 Rows_examined: > 18360 > SET timestamp=1464014881; > select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = > 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales' and > "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= > 'cs_sold_date_sk=2452654' > group by "PARTITION_NAME"; > {noformat} > Another thing to check is the IN clause of column names. Columns in the > projection list of the Hive query are listed here. It is not clear whether statistics for > these columns are required for Hive query optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
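The rewrite described in this report can be sketched outside Hive. The snippet below (Python; all names are hypothetical, the real fix lives in the metastore's SQL-generation layer) shows the idea: when the sorted partition-name list grows past a threshold and covers a contiguous range, emit a bounded range predicate instead of a huge IN list.

```python
def partition_predicate(partition_names, max_in_list=1000):
    """Build a SQL predicate on PARTITION_NAME.

    Short lists keep the IN clause; long sorted lists collapse to a
    range predicate (valid only when every partition in the range is
    selected, as in the report above).
    """
    names = sorted(partition_names)
    if len(names) <= max_in_list:
        quoted = ", ".join("'%s'" % n for n in names)
        return '"PARTITION_NAME" in (%s)' % quoted
    # All partitions selected: the min/max names bound the whole set.
    return ('"PARTITION_NAME" >= \'%s\' and "PARTITION_NAME" <= \'%s\''
            % (names[0], names[-1]))

parts = ["cs_sold_date_sk=2450815", "cs_sold_date_sk=2450816"]
print(partition_predicate(parts, max_in_list=1))
```

Under the stated assumption (no concurrent partition add/drop), both predicates select the same `PART_COL_STATS` rows, but the range form avoids the large-IN-clause optimizer problem.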
[jira] [Updated] (HIVE-12646) beeline and HIVE CLI do not parse ; in quote properly
[ https://issues.apache.org/jira/browse/HIVE-12646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-12646: Attachment: HIVE-12646.4.patch Updated patch to address review comments (details in RB). > beeline and HIVE CLI do not parse ; in quote properly > - > > Key: HIVE-12646 > URL: https://issues.apache.org/jira/browse/HIVE-12646 > Project: Hive > Issue Type: Bug > Components: CLI, Clients >Reporter: Yongzhi Chen >Assignee: Sahil Takiar > Attachments: HIVE-12646.2.patch, HIVE-12646.3.patch, > HIVE-12646.4.patch, HIVE-12646.patch > > > Beeline and the Hive CLI require escaping ; inside quotes, while most other shells > do not. For example, in Beeline: > {noformat} > 0: jdbc:hive2://localhost:1> select ';' from tlb1; > select ';' from tlb1; > 15/12/10 10:45:26 DEBUG TSaslTransport: writing data length: 115 > 15/12/10 10:45:26 DEBUG TSaslTransport: CLIENT: reading data length: 3403 > Error: Error while compiling statement: FAILED: ParseException line 1:8 > cannot recognize input near '' ' > {noformat} > while in the mysql shell: > {noformat} > mysql> SELECT CONCAT(';', 'foo') FROM test limit 3; > ++ > | ;foo | > | ;foo | > | ;foo | > ++ > 3 rows in set (0.00 sec) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
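The behavior the report asks for, splitting a command line on `;` only when the `;` is outside quotes, can be sketched as follows (Python; a simplified illustration of the parsing idea, not the actual Beeline patch, and it ignores escaped quotes and comments):

```python
def split_statements(line):
    """Split a command line on ';' delimiters that sit outside quotes.

    A ';' inside single or double quotes is part of the statement, as
    in: select ';' from tlb1
    """
    statements, buf, quote = [], [], None
    for ch in line:
        if quote:
            buf.append(ch)
            if ch == quote:     # closing quote of the current literal
                quote = None
        elif ch in ("'", '"'):  # opening quote
            quote = ch
            buf.append(ch)
        elif ch == ';':         # unquoted delimiter: end of statement
            statements.append(''.join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    tail = ''.join(buf).strip()
    if tail:
        statements.append(tail)
    return statements

print(split_statements("select ';' from tlb1; select 1"))
```

With this rule, the Beeline example from the report stays one statement instead of being cut at the quoted semicolon.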
[jira] [Commented] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out
[ https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384867#comment-15384867 ] Chaoyu Tang commented on HIVE-14267: +1. > HS2 open_operations metrics not decremented when an operation gets timed out > > > Key: HIVE-14267 > URL: https://issues.apache.org/jira/browse/HIVE-14267 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: David Karoly >Assignee: Naveen Gangam >Priority: Minor > Attachments: HIVE-14267.2.patch, HIVE-14267.patch > > > When an operation gets timed out, it is removed from handleToOperation hash > map in OperationManager.removeTimedOutOperation(). However OPEN_OPERATIONS > counter is not decremented. > This can result in an inaccurate open operations metrics value being > reported. Especially when submitting queries to Hive from Hue with > close_queries=false option, this results in misleading HS2 metrics charts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
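A minimal sketch of the bookkeeping described in this report (Python with illustrative names only; the real classes are Java inside HiveServer2): the fix direction is to decrement the open-operations gauge at the same point where the timed-out operation is removed from the handle map.

```python
class OperationManager:
    """Toy model of HS2 operation tracking (names illustrative only)."""

    def __init__(self):
        self.handle_to_operation = {}
        self.open_operations = 0  # stands in for the OPEN_OPERATIONS metric

    def add_operation(self, handle, op):
        self.handle_to_operation[handle] = op
        self.open_operations += 1

    def remove_timed_out_operation(self, handle):
        # Removing from the map alone reproduces the bug; the metric must
        # be decremented here as well, in the same code path.
        op = self.handle_to_operation.pop(handle, None)
        if op is not None:
            self.open_operations -= 1
        return op
```

Without the decrement, every timed-out operation leaves the gauge permanently inflated, which is exactly the misleading-charts symptom described for Hue with close_queries=false.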
[jira] [Commented] (HIVE-14205) Hive doesn't support union type with AVRO file format
[ https://issues.apache.org/jira/browse/HIVE-14205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384863#comment-15384863 ] Chaoyu Tang commented on HIVE-14205: I am not sure why the infrastructure could not apply this patch, but I was able to apply it on my local machine and also verified the fix. I wonder if it was caused by the binary Avro file. If so, maybe we can consider inserting data into the test table instead of loading it? > Hive doesn't support union type with AVRO file format > - > > Key: HIVE-14205 > URL: https://issues.apache.org/jira/browse/HIVE-14205 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Reporter: Yibing Shi >Assignee: Yibing Shi > Attachments: HIVE-14205.1.patch, HIVE-14205.2.patch, > HIVE-14205.3.patch, HIVE-14205.4.patch > > > Reproduce steps: > {noformat} > hive> CREATE TABLE avro_union_test > > PARTITIONED BY (p int) > > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' > > STORED AS INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' > > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' > > TBLPROPERTIES ('avro.schema.literal'='{ > >"type":"record", > >"name":"nullUnionTest", > >"fields":[ > > { > > "name":"value", > > "type":[ > > "null", > > "int", > > "long" > > ], > > "default":null > > } > >] > > }'); > OK > Time taken: 0.105 seconds > hive> alter table avro_union_test add partition (p=1); > OK > Time taken: 0.093 seconds > hive> select * from avro_union_test; > FAILED: RuntimeException org.apache.hadoop.hive.ql.metadata.HiveException: > Failed with exception Hive internal error inside > isAssignableFromSettablePrimitiveOI void not supported > yet.java.lang.RuntimeException: Hive internal error inside > isAssignableFromSettablePrimitiveOI void not supported yet. 
> at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettablePrimitiveOI(ObjectInspectorUtils.java:1140) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.isInstanceOfSettableOI(ObjectInspectorUtils.java:1149) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1187) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1220) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.hasAllFieldsSettable(ObjectInspectorUtils.java:1200) > at > org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorConverters.getConvertedOI(ObjectInspectorConverters.java:219) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.setupOutputObjectInspector(FetchOperator.java:581) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.initialize(FetchOperator.java:172) > at > org.apache.hadoop.hive.ql.exec.FetchOperator.(FetchOperator.java:140) > at > org.apache.hadoop.hive.ql.exec.FetchTask.initialize(FetchTask.java:79) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:482) > at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:311) > at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1194) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1289) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1120) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1108) > at > org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:218) > at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:170) > at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:381) > at > org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:773) > at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:691) > at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:626) > at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:497) > at org.apache.hadoop.util.RunJar.run(RunJar.java:221) > at org.apache.hadoop.util.RunJar.main(RunJar.java:136) > {noformat} > Another test case to show this problem is: > {noformat} > hive> create table avro_union_test2 (value uniontype) stored as > avro; > OK > Time taken: 0.053 seconds > hive> show create table avro_union_test2; > OK >
[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out
[ https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-14267: - Attachment: (was: HIVE-14267.2.patch) > HS2 open_operations metrics not decremented when an operation gets timed out > > > Key: HIVE-14267 > URL: https://issues.apache.org/jira/browse/HIVE-14267 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: David Karoly >Assignee: Naveen Gangam >Priority: Minor > Attachments: HIVE-14267.patch > > > When an operation gets timed out, it is removed from handleToOperation hash > map in OperationManager.removeTimedOutOperation(). However OPEN_OPERATIONS > counter is not decremented. > This can result in an inaccurate open operations metrics value being > reported. Especially when submitting queries to Hive from Hue with > close_queries=false option, this results in misleading HS2 metrics charts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out
[ https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-14267: - Status: Patch Available (was: Open) > HS2 open_operations metrics not decremented when an operation gets timed out > > > Key: HIVE-14267 > URL: https://issues.apache.org/jira/browse/HIVE-14267 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: David Karoly >Assignee: Naveen Gangam >Priority: Minor > Attachments: HIVE-14267.2.patch, HIVE-14267.patch > > > When an operation gets timed out, it is removed from handleToOperation hash > map in OperationManager.removeTimedOutOperation(). However OPEN_OPERATIONS > counter is not decremented. > This can result in an inaccurate open operations metrics value being > reported. Especially when submitting queries to Hive from Hue with > close_queries=false option, this results in misleading HS2 metrics charts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out
[ https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-14267: - Attachment: HIVE-14267.2.patch The patch isn't getting picked up for pre-commits. Re-attaching the same patch. > HS2 open_operations metrics not decremented when an operation gets timed out > > > Key: HIVE-14267 > URL: https://issues.apache.org/jira/browse/HIVE-14267 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: David Karoly >Assignee: Naveen Gangam >Priority: Minor > Attachments: HIVE-14267.2.patch, HIVE-14267.patch > > > When an operation gets timed out, it is removed from handleToOperation hash > map in OperationManager.removeTimedOutOperation(). However OPEN_OPERATIONS > counter is not decremented. > This can result in an inaccurate open operations metrics value being > reported. Especially when submitting queries to Hive from Hue with > close_queries=false option, this results in misleading HS2 metrics charts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out
[ https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-14267: - Status: Open (was: Patch Available) > HS2 open_operations metrics not decremented when an operation gets timed out > > > Key: HIVE-14267 > URL: https://issues.apache.org/jira/browse/HIVE-14267 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: David Karoly >Assignee: Naveen Gangam >Priority: Minor > Attachments: HIVE-14267.2.patch, HIVE-14267.patch > > > When an operation gets timed out, it is removed from handleToOperation hash > map in OperationManager.removeTimedOutOperation(). However OPEN_OPERATIONS > counter is not decremented. > This can result in an inaccurate open operations metrics value being > reported. Especially when submitting queries to Hive from Hue with > close_queries=false option, this results in misleading HS2 metrics charts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-14281) Issue in decimal multiplication
[ https://issues.apache.org/jira/browse/HIVE-14281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384793#comment-15384793 ] Chaoyu Tang edited comment on HIVE-14281 at 7/19/16 8:24 PM: - Another use case uses a decimal with a small scale, such as decimal(38, 6): {code} create table test1 (a decimal(38, 6), b decimal(38, 6), c decimal(38, 6), d decimal(38, 6), e decimal(38, 6), f decimal(38, 6)); insert into test1 values (1.00, 2.00, 3.00, 4.00, 5.00, 6.00); hive> explain select a*b*c*d*e*f from test1; OK STAGE DEPENDENCIES: Stage-0 is a root stage STAGE PLANS: Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: TableScan alias: test1 Statistics: Num rows: 1 Data size: 53 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: (((((a * b) * c) * d) * e) * f) (type: decimal(38,36)) outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 53 Basic stats: COMPLETE Column stats: NONE ListSink hive> select a*b*c*d*e*f from test1; OK NULL {code} was (Author: ctang.ma): Another use case uses a decimal with a small scale, such as decimal(38, 6): {code} create table test1 (a decimal(38, 6), b decimal(38, 6), c decimal(38, 6), d decimal(38, 6), e decimal(38, 6), f decimal(38, 6)); insert into test1 values (1.00, 2.00, 3.00, 4.00, 5.00, 6.00); hive> explain select a*b*c*d*e*f from test1; OK STAGE DEPENDENCIES: Stage-0 is a root stage STAGE PLANS: Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: TableScan alias: test1 Statistics: Num rows: 1 Data size: 53 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: (((((a * b) * c) * d) * e) * f) (type: decimal(38,36)) outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 53 Basic stats: COMPLETE Column stats: NONE ListSink hive> select a*b*c*d*e*f from test1; OK NULL {code} > Issue in decimal multiplication > --- > > Key: HIVE-14281 > URL: https://issues.apache.org/jira/browse/HIVE-14281 > Project: Hive > Issue Type: Bug > Components: Types >Reporter: Chaoyu Tang 
>Assignee: Chaoyu Tang > > {code} > CREATE TABLE test (a DECIMAL(38,18), b DECIMAL(38,18)); > INSERT OVERWRITE TABLE test VALUES (20, 20); > SELECT a*b from test > {code} > The returned result is NULL (instead of 400) > It is because Hive adds the scales from operands and the type for a*b is set > to decimal (38, 36). Hive could not handle this case properly (e.g. by > rounding) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14281) Issue in decimal multiplication
[ https://issues.apache.org/jira/browse/HIVE-14281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384793#comment-15384793 ] Chaoyu Tang commented on HIVE-14281: Another use case uses a decimal with a small scale, such as decimal(38, 6): {code} create table test1 (a decimal(38, 6), b decimal(38, 6), c decimal(38, 6), d decimal(38, 6), e decimal(38, 6), f decimal(38, 6)); insert into test1 values (1.00, 2.00, 3.00, 4.00, 5.00, 6.00); hive> explain select a*b*c*d*e*f from test1; OK STAGE DEPENDENCIES: Stage-0 is a root stage STAGE PLANS: Stage: Stage-0 Fetch Operator limit: -1 Processor Tree: TableScan alias: test1 Statistics: Num rows: 1 Data size: 53 Basic stats: COMPLETE Column stats: NONE Select Operator expressions: (((((a * b) * c) * d) * e) * f) (type: decimal(38,36)) outputColumnNames: _col0 Statistics: Num rows: 1 Data size: 53 Basic stats: COMPLETE Column stats: NONE ListSink hive> select a*b*c*d*e*f from test1; OK NULL {code} > Issue in decimal multiplication > --- > > Key: HIVE-14281 > URL: https://issues.apache.org/jira/browse/HIVE-14281 > Project: Hive > Issue Type: Bug > Components: Types >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > > {code} > CREATE TABLE test (a DECIMAL(38,18), b DECIMAL(38,18)); > INSERT OVERWRITE TABLE test VALUES (20, 20); > SELECT a*b from test > {code} > The returned result is NULL (instead of 400) > It is because Hive adds the scales from operands and the type for a*b is set > to decimal (38, 36). Hive could not handle this case properly (e.g. by > rounding) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
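The precision/scale arithmetic behind these NULLs can be checked with a small sketch (Python; the formula below is a simplification of the rule described in this issue, not Hive's exact implementation): the result scale is the sum of the operand scales, the precision is capped at 38, and a value overflows when its integer digits plus the result scale exceed the capped precision.

```python
from decimal import Decimal

def result_type(p1, s1, p2, s2, max_precision=38):
    """Multiplication result type per the rule described in the report:
    scale = s1 + s2, precision capped at max_precision (simplified)."""
    scale = s1 + s2
    precision = min(p1 + p2, max_precision)
    return precision, scale

def fits(value, precision, scale):
    """Can `value` be represented as DECIMAL(precision, scale)?"""
    integer_digits = len(str(abs(int(value))))
    return integer_digits + scale <= precision

# DECIMAL(38,18) * DECIMAL(38,18) -> DECIMAL(38,36)
p, s = result_type(38, 18, 38, 18)
# 400 needs 3 integer digits + 36 fraction digits = 39 > 38 -> overflow (NULL)
print(p, s, fits(Decimal("400"), p, s))  # -> 38 36 False
```

The same arithmetic explains the decimal(38, 6) chain in the comment above: each multiplication adds the scales until the type reaches decimal(38, 36), leaving only two digits for the integer part.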
[jira] [Updated] (HIVE-14086) org.apache.hadoop.hive.metastore.api.Table does not return columns from Avro schema file
[ https://issues.apache.org/jira/browse/HIVE-14086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lars Volker updated HIVE-14086: --- Attachment: avroremoved.json avro.sql avro.json SQL to create table (avro.sql): {noformat} CREATE TABLE avro_table PARTITIONED BY (str_part STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' TBLPROPERTIES ( 'avro.schema.url'='hdfs://localhost:20500/tmp/avro.json' ); {noformat} avro.json: {noformat} { "namespace": "com.cloudera.test", "name": "avro_table", "type": "record", "fields": [ { "name":"string1", "type":"string" }, { "name":"CamelCol", "type":"string" } ] } {noformat} avroremoved.json (one column removed from schema): {noformat} { "namespace": "com.cloudera.test", "name": "avro_table", "type": "record", "fields": [ { "name":"string1", "type":"string" } ] } {noformat} > org.apache.hadoop.hive.metastore.api.Table does not return columns from Avro > schema file > > > Key: HIVE-14086 > URL: https://issues.apache.org/jira/browse/HIVE-14086 > Project: Hive > Issue Type: Bug > Components: API >Reporter: Lars Volker > Attachments: avro.json, avro.sql, avroremoved.json > > > Consider this table, using an external Avro schema file: > {noformat} > CREATE TABLE avro_table > PARTITIONED BY (str_part STRING) > ROW FORMAT SERDE > 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' > STORED AS INPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' > OUTPUTFORMAT > 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' > TBLPROPERTIES ( > 'avro.schema.url'='hdfs://localhost:20500/tmp/avro.json' > ); > {noformat} > This will populate the "COLUMNS_V2" metastore table with the correct column > information (as per HIVE-6308). 
The columns of this table can then be queried > via the Hive API, for example by calling {{.getSd().getCols()}} on a > {{org.apache.hadoop.hive.metastore.api.Table}} object. > Changes to the avro.schema.url file - either changing where it points to or > changing its contents - will be reflected in the output of {{describe > formatted avro_table}} *but not* in the result of the {{.getSd().getCols()}} > API call. Instead it looks like Hive only reads the Avro schema file > internally, but does not expose the information therein via its API. > Is there a way to obtain the effective Table information via Hive? Would it > make sense to fix table retrieval so calls to {{get_table}} return the > correct set of columns? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
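The mismatch can be illustrated by reading the schema file directly (Python sketch; in the report the schema sits on HDFS and the comparison would be against the Java {{getSd().getCols()}} result, so the names here are only illustrative): the Avro schema, not the cached COLUMNS_V2 rows, defines the effective columns.

```python
import json

def avro_field_names(schema_json):
    """Field names as defined in an Avro record schema -- the 'effective'
    columns that `describe formatted` reflects but, per this report,
    the getSd().getCols() API call does not."""
    schema = json.loads(schema_json)
    return [f["name"] for f in schema.get("fields", [])]

# The avro.json schema attached to the issue, as a literal string.
schema = ('{"namespace": "com.cloudera.test", "name": "avro_table", '
          '"type": "record", "fields": ['
          '{"name": "string1", "type": "string"}, '
          '{"name": "CamelCol", "type": "string"}]}')

print(avro_field_names(schema))  # -> ['string1', 'CamelCol']
```

After swapping in avroremoved.json, the function would return only `['string1']`, while the metastore columns would still list both fields, which is the discrepancy the report describes.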
[jira] [Resolved] (HIVE-14283) Beeline tests are broken
[ https://issues.apache.org/jira/browse/HIVE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar resolved HIVE-14283. Resolution: Not A Bug This was an environment issue. Tests are working fine. > Beeline tests are broken > > > Key: HIVE-14283 > URL: https://issues.apache.org/jira/browse/HIVE-14283 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 2.2.0 >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar > > Beeline tests seem to be broken. > {noformat} > --- > T E S T S > --- > --- > T E S T S > --- > Running org.apache.hive.beeline.cli.TestHiveCli > Tests run: 22, Failures: 22, Errors: 0, Skipped: 0, Time elapsed: 8.514 sec > <<< FAILURE! - in org.apache.hive.beeline.cli.TestHiveCli > testSetPromptValue(org.apache.hive.beeline.cli.TestHiveCli) Time elapsed: > 1.599 sec <<< FAILURE! > java.lang.AssertionError: Supported return code is 0 while the actual is 2 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73) > at > org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260) > at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283) > testSourceCmd2(org.apache.hive.beeline.cli.TestHiveCli) Time elapsed: 0.291 > sec <<< FAILURE! > java.lang.AssertionError: Supported return code is 0 while the actual is 2 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73) > at > org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260) > at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283) > testSourceCmd3(org.apache.hive.beeline.cli.TestHiveCli) Time elapsed: 0.306 > sec <<< FAILURE! 
> java.lang.AssertionError: Supported return code is 0 while the actual is 2 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73) > at > org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260) > at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283) > testInvalidOptions2(org.apache.hive.beeline.cli.TestHiveCli) Time elapsed: > 0.292 sec <<< FAILURE! > java.lang.AssertionError: Supported return code is 0 while the actual is 2 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73) > at > org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260) > at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283) > testCmd(org.apache.hive.beeline.cli.TestHiveCli) Time elapsed: 0.271 sec > <<< FAILURE! > java.lang.AssertionError: Supported return code is 0 while the actual is 2 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73) > at > org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260) > at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283) > testHelp(org.apache.hive.beeline.cli.TestHiveCli) Time elapsed: 0.284 sec > <<< FAILURE! > java.lang.AssertionError: Supported return code is 0 while the actual is 2 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73) > at > org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260) > at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283) > testSourceCmd(org.apache.hive.beeline.cli.TestHiveCli) Time elapsed: 0.259 > sec <<< FAILURE! 
> java.lang.AssertionError: Supported return code is 0 while the actual is 2 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73) > at > org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260) > at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283) > testSqlFromCmdWithDBName(org.apache.hive.beeline.cli.TestHiveCli) Time > elapsed: 0.214 sec <<< FAILURE! > java.lang.AssertionError: Supported return code is 0 while the actual is 2 > at org.junit.Assert.fail(Assert.java:88) > at > org.apache.hive.beeline.cli.TestHiveCli.executeCMD(TestHiveCli.java:73) > at > org.apache.hive.beeline.cli.TestHiveCli.initFromFile(TestHiveCli.java:260) > at org.apache.hive.beeline.cli.TestHiveCli.setup(TestHiveCli.java:283) >
[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13995: - Attachment: HIVE-13995.5.patch > Hive generates inefficient metastore queries for TPCDS tables with 1800+ > partitions leading to higher compile time > -- > > Key: HIVE-13995 > URL: https://issues.apache.org/jira/browse/HIVE-13995 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Nita Dembla >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-13995.1.patch, HIVE-13995.2.patch, > HIVE-13995.3.patch, HIVE-13995.4.patch, HIVE-13995.5.patch > > > TPCDS fact tables (store_sales, catalog_sales) have 1800+ partitions, and when > the query does not have a filter on the partition column, the generated metastore queries > have a large IN clause listing all the partition names. Most RDBMS > systems have trouble optimizing large IN clauses, and even when a good index > plan is chosen, comparing against 1800+ string values will not lead to the best > execution time. > When all partitions are chosen, not specifying the partition list and having > filters only on table and column name will generate the same result set, as > long as there are no concurrent modifications to the partition list of the Hive > table (adding/dropping partitions). > For example: for TPCDS query18, the metastore query gathering partition column > statistics runs in 0.5 secs in MySQL. 
Following is output from mysql log > {noformat} > -- Query_time: 0.482063 Lock_time: 0.003037 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' > and "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" in > ('cs_sold_date_sk=2450815','cs_sold_date_sk=2450816','cs_sold_date_sk=2450817','cs_sold_date_sk=2450818','cs_sold_date_sk=2450819','cs_sold_date_sk=2450820','cs_sold_date_sk=2450821','cs_sold_date_sk=2450822','cs_sold_date_sk=2450823','cs_sold_date_sk=2450824','cs_sold_date_sk=2450825','cs_sold_date_sk=2450826','cs_sold_date_sk=2450827','cs_sold_date_sk=2450828','cs_sold_date_sk=2450829','cs_sold_date_sk=2450830','cs_sold_date_sk=2450831','cs_sold_date_sk=2450832','cs_sold_date_sk=2450833','cs_sold_date_sk=2450834','cs_sold_date_sk=2450835','cs_sold_date_sk=2450836','cs_sold_date_sk=2450837','cs_sold_date_sk=2450838','cs_sold_date_sk=2450839','cs_sold_date_sk=2450840','cs_sold_date_sk=2450841','cs_sold_date_sk=2450842','cs_sold_date_sk=2450843','cs_sold_date_sk=2450844','cs_sold_date_sk=2450845','cs_sold_date_sk=2450846','cs_sold_date_sk=2450847','cs_sold_date_sk=2450848','cs_sold_date_sk=2450849','cs_sold_date_sk=2450850','cs_sold_date_sk=2450851','cs_sold_date_sk=2450852','cs_sold_date_sk=2450853','cs_sold_date_sk=2450854','cs_sold_date_sk=2450855','cs_sold_date_sk=2450856',...,'cs_sold_date_sk=2452654') > group by "PARTITION_NAME"; > {noformat} > Functionally equivalent query runs in 0.1 seconds > {noformat} > --Query_time: 0.121296 Lock_time: 0.000156 Rows_sent: 1836 Rows_examined: > 18360 > select count("COLUMN_NAME") from "PART_COL_STATS" > where "DB_NAME" = 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = > 'catalog_sales' and "COLUMN_NAME" in > 
('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > group by "PARTITION_NAME"; > {noformat} > If removing the partition list seems drastic, it's also possible to simply > specify the range, since Hive gets an ordered list of partition names. This > performs as well as the earlier query > {noformat} > # Query_time: 0.143874 Lock_time: 0.000154 Rows_sent: 1836 Rows_examined: > 18360 > SET timestamp=1464014881; > select count("COLUMN_NAME") from "PART_COL_STATS" where "DB_NAME" = > 'tpcds_bin_partitioned_orc_3' and "TABLE_NAME" = 'catalog_sales' and > "COLUMN_NAME" in > ('cs_bill_customer_sk','cs_bill_cdemo_sk','cs_item_sk','cs_quantity','cs_list_price','cs_sales_price','cs_coupon_amt','cs_net_profit') > and "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= > 'cs_sold_date_sk=2452654' > group by "PARTITION_NAME"; > {noformat} > Another thing to check is the IN clause of column names. The columns in the > projection list of the Hive query are listed here; it is not clear whether statistics for > these columns are required for Hive query optimization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
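Since Hive already receives the partition names in sorted order, the range-predicate rewrite suggested above can be sketched as follows. This is a minimal Python illustration; the function name and call shape are hypothetical, not Hive's actual metastore code:

```python
def partition_filter(partition_names):
    """Build a WHERE fragment over PARTITION_NAME from a sorted list of
    partition names, using a range predicate instead of a large IN clause.

    Assumes partition_names is already sorted, as Hive provides it, and
    that every partition in the range is selected (the "all partitions
    are chosen" case from the description)."""
    if not partition_names:
        return ""
    if len(partition_names) == 1:
        return '"PARTITION_NAME" = \'%s\'' % partition_names[0]
    # Replace the N-element IN list with two bound comparisons.
    return ('"PARTITION_NAME" >= \'%s\' and "PARTITION_NAME" <= \'%s\''
            % (partition_names[0], partition_names[-1]))

names = ['cs_sold_date_sk=%d' % k for k in range(2450815, 2452655)]
print(partition_filter(names))
# "PARTITION_NAME" >= 'cs_sold_date_sk=2450815' and "PARTITION_NAME" <= 'cs_sold_date_sk=2452654'
```

The predicate size is now constant regardless of the number of partitions, which is what makes the 0.14s plan above possible.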
[jira] [Commented] (HIVE-14281) Issue in decimal multiplication
[ https://issues.apache.org/jira/browse/HIVE-14281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384742#comment-15384742 ] Chaoyu Tang commented on HIVE-14281: For Java BigDecimal, there is a note about rounding, and I wonder if it can be used in Hive: {code} Before rounding, the scale of the logical exact intermediate result (e.g. multiplier.scale() + multiplicand.scale()) is the preferred scale for that operation (e.g. multiply). If the exact numerical result cannot be represented in precision digits, rounding selects the set of digits to return and the scale of the result is reduced from the scale of the intermediate result to the least scale which can represent the precision digits actually returned. If the exact result can be represented with at most precision digits, the representation of the result with the scale closest to the preferred scale is returned. {code} I checked MySQL, which supports a max precision of 65 and a max scale of 30: {code} create table decimaltest (col1 decimal(65,14), col2 decimal(65, 14)); insert into decimaltest values (987654321001234567890123456789012345678901234567890.12345678901234, 10.12345678901234); select col1 * col2 from decimaltest -- returns 987654321001234567890123456789012345678901234567890123456789.0 {code} It is hard to interpret this result: its precision is 73 (> the max of 65) and its scale is 9 (instead of 28). But its metadata in a JDBC application is decimal with precision 65 and scale 28. > Issue in decimal multiplication > --- > > Key: HIVE-14281 > URL: https://issues.apache.org/jira/browse/HIVE-14281 > Project: Hive > Issue Type: Bug > Components: Types >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > > {code} > CREATE TABLE test (a DECIMAL(38,18), b DECIMAL(38,18)); > INSERT OVERWRITE TABLE test VALUES (20, 20); > SELECT a*b from test > {code} > The returned result is NULL (instead of 400). > This is because Hive adds the scales of the operands, so the type for a*b is set > to decimal(38,36). 
Hive could not handle this case properly (e.g. by > rounding) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
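The type arithmetic behind the NULL result can be shown outside Hive. The sketch below is a Python illustration with hypothetical helper names; the precision formula min(38, p1+p2+1) is the common SQL convention and is an assumption here, but it reproduces the decimal(38,36) type from the report, and the scale is the sum of the operand scales as the report states:

```python
def multiply_result_type(p1, s1, p2, s2, max_precision=38):
    """Result type of decimal(p1,s1) * decimal(p2,s2): scales add,
    precision is capped at the system maximum."""
    scale = s1 + s2
    precision = min(max_precision, p1 + p2 + 1)
    return precision, scale

def fits(value_integer_digits, precision, scale):
    # A decimal(p, s) can hold at most p - s digits before the point.
    return value_integer_digits <= precision - scale

p, s = multiply_result_type(38, 18, 38, 18)
print(p, s)            # 38 36
print(fits(3, p, s))   # False: 400 has 3 integer digits but only 2 fit -> NULL
```

Rounding in the BigDecimal style quoted above would instead reduce the scale until the integer digits fit, returning 400 at a smaller scale rather than NULL.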
[jira] [Updated] (HIVE-14284) HiveAuthorizer: Pass HiveAuthzContext to grant/revoke/role apis as well
[ https://issues.apache.org/jira/browse/HIVE-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-14284: - Component/s: Security > HiveAuthorizer: Pass HiveAuthzContext to grant/revoke/role apis as well > --- > > Key: HIVE-14284 > URL: https://issues.apache.org/jira/browse/HIVE-14284 > Project: Hive > Issue Type: Bug > Components: Authorization, Security >Reporter: Thejas M Nair >Assignee: Thejas M Nair > > HiveAuthzContext provides useful information about the context of the > commands, such as the command string and IP address. However, > it is available only to the checkPrivileges and filterListCmdObjects API calls. > It should also be made available to other API calls, such as the grant/revoke > methods and the role management methods. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14284) HiveAuthorizer: Pass HiveAuthzContext to grant/revoke/role apis as well
[ https://issues.apache.org/jira/browse/HIVE-14284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-14284: - Component/s: Authorization -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13995: - Status: Open (was: Patch Available) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13995) Hive generates inefficient metastore queries for TPCDS tables with 1800+ partitions leading to higher compile time
[ https://issues.apache.org/jira/browse/HIVE-13995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-13995: - Status: Patch Available (was: Open) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out
[ https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-14267: - Status: Patch Available (was: Open) > HS2 open_operations metrics not decremented when an operation gets timed out > > > Key: HIVE-14267 > URL: https://issues.apache.org/jira/browse/HIVE-14267 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: David Karoly >Assignee: Naveen Gangam >Priority: Minor > Attachments: HIVE-14267.2.patch, HIVE-14267.patch > > > When an operation times out, it is removed from the handleToOperation hash > map in OperationManager.removeTimedOutOperation(); however, the OPEN_OPERATIONS > counter is not decremented. > This can result in an inaccurate open-operations metric being > reported, especially when submitting queries to Hive from Hue with the > close_queries=false option, where it leads to misleading HS2 metrics charts. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out
[ https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-14267: - Attachment: HIVE-14267.2.patch Attaching new patch based on the input from RB -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14267) HS2 open_operations metrics not decremented when an operation gets timed out
[ https://issues.apache.org/jira/browse/HIVE-14267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-14267: - Status: Open (was: Patch Available) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
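The HIVE-14267 fix described above amounts to mirroring, on the timeout path, the bookkeeping done on the normal close path. A minimal sketch in Python; the class and method names only echo the Java ones, this is not the actual HiveServer2 code:

```python
class OperationManager:
    def __init__(self):
        self.handle_to_operation = {}
        self.open_operations = 0  # stands in for the OPEN_OPERATIONS metric

    def add_operation(self, handle, operation):
        self.handle_to_operation[handle] = operation
        self.open_operations += 1

    def remove_timed_out_operation(self, handle):
        op = self.handle_to_operation.pop(handle, None)
        if op is not None:
            # The bug: the map entry was removed without this decrement,
            # so the reported metric drifted upward over time.
            self.open_operations -= 1
        return op

mgr = OperationManager()
mgr.add_operation("h1", object())
mgr.remove_timed_out_operation("h1")
print(mgr.open_operations)  # 0
```

Guarding the decrement on the entry actually being present keeps the metric correct even if a handle is timed out and closed concurrently.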
[jira] [Commented] (HIVE-14229) the jars in hive.aux.jar.paths are not added to session classpath
[ https://issues.apache.org/jira/browse/HIVE-14229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384632#comment-15384632 ] Mohit Sabharwal commented on HIVE-14229: +1 > the jars in hive.aux.jar.paths are not added to session classpath > -- > > Key: HIVE-14229 > URL: https://issues.apache.org/jira/browse/HIVE-14229 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14229.1.patch > > > The jars in hive.reloadable.aux.jar.paths are being added to the HiveServer2 > classpath, while those in hive.aux.jar.paths are not. > A local task such as 'select udf(x) from src' will then fail to find the needed > UDF class. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14277) Disable StatsOptimizer for all ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14277: --- Affects Version/s: 2.0.0 2.1.0 > Disable StatsOptimizer for all ACID tables > -- > > Key: HIVE-14277 > URL: https://issues.apache.org/jira/browse/HIVE-14277 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.0.0, 2.1.0 >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.2.0, 2.1.1 > > Attachments: HIVE-14277.01.patch > > > We have observed many cases where an ACID table is created for HCat > streaming. Streaming directly inserts data into the table, but the stats > of the table are not updated (and there is no good way to update them). We would > like to disable StatsOptimizer for all ACID tables so that it will at least > not give wrong results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14277) Disable StatsOptimizer for all ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14277: --- Fix Version/s: 2.1.1 2.2.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14277) Disable StatsOptimizer for all ACID tables
[ https://issues.apache.org/jira/browse/HIVE-14277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14277: --- Resolution: Fixed Status: Resolved (was: Patch Available) pushed to master. Thanks [~ashutoshc] for the review. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
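The guard HIVE-14277 adds can be thought of as one extra condition in StatsOptimizer's eligibility check. A rough Python sketch with hypothetical field names; the real check lives in Hive's Java StatsOptimizer:

```python
def can_answer_from_stats(table):
    """Whether a query over `table` may be answered from stored statistics
    alone. ACID tables are excluded outright: streaming ingest writes data
    without updating stats, so the stored numbers may be stale."""
    if table.get("is_acid", False):
        return False
    return table.get("stats_accurate", False)

print(can_answer_from_stats({"is_acid": True, "stats_accurate": True}))   # False
print(can_answer_from_stats({"is_acid": False, "stats_accurate": True}))  # True
```

Falling back to a full scan for ACID tables trades speed for correctness, which matches the issue's goal of at least not returning wrong results.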
[jira] [Updated] (HIVE-14221) set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER
[ https://issues.apache.org/jira/browse/HIVE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14221: --- Affects Version/s: 2.0.0 > set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER > > > Key: HIVE-14221 > URL: https://issues.apache.org/jira/browse/HIVE-14221 > Project: Hive > Issue Type: Sub-task > Components: Security >Affects Versions: 2.0.0 >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.1.0, 2.2.0 > > Attachments: HIVE-14221.01.patch, HIVE-14221.02.patch, > HIVE-14221.03.patch, HIVE-14221.04.patch, HIVE-14221.05.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14221) set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER
[ https://issues.apache.org/jira/browse/HIVE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14221: --- Fix Version/s: 2.2.0 > set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER > > > Key: HIVE-14221 > URL: https://issues.apache.org/jira/browse/HIVE-14221 > Project: Hive > Issue Type: Sub-task > Components: Security >Affects Versions: 2.0.0 >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.1.0, 2.2.0 > > Attachments: HIVE-14221.01.patch, HIVE-14221.02.patch, > HIVE-14221.03.patch, HIVE-14221.04.patch, HIVE-14221.05.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14221) set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER
[ https://issues.apache.org/jira/browse/HIVE-14221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-14221: --- Resolution: Fixed Status: Resolved (was: Patch Available) pushed to master. Thanks [~ashutoshc] for the review. > set SQLStdHiveAuthorizerFactoryForTest as default HIVE_AUTHORIZATION_MANAGER > > > Key: HIVE-14221 > URL: https://issues.apache.org/jira/browse/HIVE-14221 > Project: Hive > Issue Type: Sub-task > Components: Security >Affects Versions: 2.0.0 >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Fix For: 2.1.0 > > Attachments: HIVE-14221.01.patch, HIVE-14221.02.patch, > HIVE-14221.03.patch, HIVE-14221.04.patch, HIVE-14221.05.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14242) Backport ORC-53 to Hive
[ https://issues.apache.org/jira/browse/HIVE-14242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384589#comment-15384589 ] Prasanth Jayachandran commented on HIVE-14242: -- +1 > Backport ORC-53 to Hive > --- > > Key: HIVE-14242 > URL: https://issues.apache.org/jira/browse/HIVE-14242 > Project: Hive > Issue Type: Bug > Components: ORC >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: HIVE-14242.patch > > > ORC-53 was mostly about the mapreduce shims for ORC, but it fixed a problem > in TypeDescription that should be backported to Hive. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14224) LLAP rename query specific log files once a query is complete
[ https://issues.apache.org/jira/browse/HIVE-14224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384575#comment-15384575 ] Hive QA commented on HIVE-14224: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12818775/HIVE-14224.04.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 7 failed/errored test(s), 10321 tests executed *Failed tests:* {noformat} TestMiniTezCliDriver-acid_globallimit.q-cte_mat_1.q-union5.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_acid_globallimit org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_subquery_multiinsert org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testCheckPermissions org.apache.hadoop.hive.llap.daemon.impl.TestLlapTokenChecker.testGetToken org.apache.hadoop.hive.metastore.TestMetaStoreMetrics.testConnections {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/578/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/578/console Test logs: http://ec2-204-236-174-241.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-578/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 7 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12818775 - PreCommit-HIVE-MASTER-Build > LLAP rename query specific log files once a query is complete > - > > Key: HIVE-14224 > URL: https://issues.apache.org/jira/browse/HIVE-14224 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14224.02.patch, HIVE-14224.03.patch, > HIVE-14224.04.patch, HIVE-14224.wip.01.patch > > > Once a query is complete, rename the query specific log file so that YARN can > aggregate the logs (once it's configured to do so). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14282) HCatLoader ToDate() exception with hive partition table ,partitioned by column of DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-14282: -- Fix Version/s: 2.1.1 2.2.0 1.3.0 Status: Patch Available (was: Open) > HCatLoader ToDate() exception with hive partition table ,partitioned by > column of DATE datatype > --- > > Key: HIVE-14282 > URL: https://issues.apache.org/jira/browse/HIVE-14282 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 1.2.1 > Environment: PIG Version : (0.15.0) > HIVE : 1.2.1 > OS Version : CentOS release 6.7 (Final) > OS Kernel : 2.6.32-573.18.1.el6.x86_64 >Reporter: Raghavender Rao Guruvannagari >Assignee: Daniel Dai > Fix For: 1.3.0, 2.2.0, 2.1.1 > > Attachments: HIVE-14282.1.patch > > > The ToDate() function doesn't work with a partitioned table that is partitioned by a > column of DATE datatype. > Below are the steps I followed to recreate the problem. > -->Sample input file to hive table : > hdfs@testhost ~$ cat test.log > 2012-06-13,16:11:17,574,140.134.127.109,SearchPage,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,466,43.176.108.158,Electronics,Google.com,Win8,3,iPhone > 2012-06-13,16:11:17,501,97.73.102.79,Appliances,Google.com,Android,4,iPhone > 2012-06-13,16:11:17,469,166.98.157.122,Recommendations,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,557,36.159.147.50,Sporting,Google.com,Win8,3,Samsung > 2012-06-13,16:11:17,449,128.215.122.234,ShoppingCart,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,502,46.81.131.92,Electronics,Google.com,Android,5,Samsung > 2012-06-13,16:11:17,554,120.187.105.127,Automotive,Google.com,Win8,5,HTC > 2012-06-13,16:11:17,447,127.94.64.59,DetailPage,Google.com,Win8,3,Samsung > 2012-06-13,16:11:17,490,132.54.25.75,ShoppingCart,Google.com,Win8,3,iPhone > 2012-06-13,16:11:17,578,79.201.53.179,Automotive,Google.com,Win8,5,Samsung > 2012-06-13,16:11:17,435,158.106.164.38,HomePage,Google.com,Web,5,Chrome > 2012-06-13,16:11:17,523,17.131.82.171,Recommendations,Google.com,Web,3,IE9 > 
2012-06-13,16:11:17,575,178.95.126.105,Appliances,Google.com,iOS,3,iPhone > 2012-06-13,16:11:17,468,225.143.39.176,SearchPage,Google.com,iOS,5,HTC > 2012-06-13,16:11:17,511,43.103.102.147,ShoppingCart,Google.com,iOS,5,Samsung > --> Copied to hdfs directory: > hdfs@testhost ~$ hdfs dfs -put -f test.log /user/hdfs/ > -->Create partitioned table (partitioned with date data type column) in hive: > 0: jdbc:hive2://hdp2.raghav.com:1/default> create table mytable(Dt > DATE,Time STRING,Number INT,IPAddr STRING,Type STRING,Site STRING,OSType > STRING,Visit INT,PhModel STRING) row format delimited fields terminated by > ',' stored as textfile; > 0: jdbc:hive2://testhost.com:1/default> load data inpath > '/user/hdfs/test.log' overwrite into table mytable; > 0: jdbc:hive2://testhost.com:1/default> SET hive.exec.dynamic.partition > = true; > 0: jdbc:hive2://testhost.com:1/default> SET > hive.exec.dynamic.partition.mode = nonstrict; > 0: jdbc:hive2://testhost.com:1/default> create table partmytable(Number > INT,IPAddr STRING,Type STRING,Site STRING,OSType STRING,Visit INT,PhModel > STRING) partitioned by (Dt DATE,Time STRING) row format delimited fields > terminated by ',' stored as textfile; > 0: jdbc:hive2://testhost.com:1/default> insert overwrite table > partmytable partition(Dt,Time) select > Number,IPAddr,Type,Site,OSType,Visit,PhModel,Dt,Time from mytable; > 0: jdbc:hive2://hdp2.raghav.com:1/default> describe partmytable; > --> Try to filter with the ToDate function, which fails with an error: > hdfs@testhost ~$ pig -useHCatalog > grunt> > grunt> temp = LOAD 'partmytable' using > org.apache.hive.hcatalog.pig.HCatLoader(); > grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','yyyy-MM-dd'); > grunt> dump temp1; > -->Filtering the normal table with the same statement works: > grunt> > grunt> temp = LOAD 'mytable' using org.apache.hive.hcatalog.pig.HCatLoader(); > grunt> temp1 = FILTER temp by dt == ToDate('2012-06-13','yyyy-MM-dd'); > grunt> dump temp1; > Workaround : > Use below 
statement instead of direct ToDate(); > grunt>temp1 = FILTER temp5 by DaysBetween(dt,(datetime)ToDate('2012-06-13', > 'yyyy-MM-dd')) >=(long)0 AND DaysBetween(dt,(datetime)ToDate('2012-06-13', > 'yyyy-MM-dd')) <=(long)0; -- This message was sent by Atlassian JIRA (v6.3.4#6332)
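The bracketing workaround above is equivalent to a plain equality test at day granularity: DaysBetween(dt, target) >= 0 AND <= 0 can only hold when dt falls on the target day. A minimal Python sketch of the same predicate (not Pig, and assuming the full 'yyyy-MM-dd' pattern for the date literal):

```python
from datetime import date, datetime

def days_between(d1: date, d2: date) -> int:
    # Whole days from d2 to d1, mirroring Pig's DaysBetween(d1, d2).
    return (d1 - d2).days

target = datetime.strptime("2012-06-13", "%Y-%m-%d").date()
rows = [date(2012, 6, 13), date(2012, 6, 14)]

# DaysBetween(dt, target) >= 0 AND DaysBetween(dt, target) <= 0
# holds exactly when dt == target, so the filter keeps only the target day.
kept = [dt for dt in rows
        if days_between(dt, target) >= 0 and days_between(dt, target) <= 0]
```

This is why the workaround returns the same rows as a direct `dt == ToDate(...)` comparison would, while sidestepping the HCatLoader date-equality bug.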
[jira] [Comment Edited] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used
[ https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384559#comment-15384559 ] Tao Li edited comment on HIVE-14170 at 7/19/16 5:38 PM: [~stakiar] Another thought is that we could improve the "buffered page" mode to avoid the OOM issue. For example, we could iterate through the whole result set once to calculate the max column widths (without loading the result set into memory), then iterate the result set again to print it out. The upside is that this requires minimal code change; the downside is higher latency, because we iterate the result set twice. was (Author: taoli-hwx): [~stakiar] Another thought is that we could improve the "buffered page" mode to avoid the OOM issue. For example, we could iterate through the whole result set once to calculate the max column widths (without loading the result set into memory), then iterate the result set again to print it out. The upside is that this requires minimal code change; the downside is higher latency, because we iterate the result set twice. > Beeline IncrementalRows should buffer rows and incrementally re-calculate > width if TableOutputFormat is used > > > Key: HIVE-14170 > URL: https://issues.apache.org/jira/browse/HIVE-14170 > Project: Hive > Issue Type: Sub-task > Components: Beeline >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch > > > If {{--incremental}} is specified in Beeline, rows are meant to be printed > out immediately. However, if {{TableOutputFormat}} is used with this option > the formatting can look really off. > The reason is that {{IncrementalRows}} does not do a global calculation of > the optimal width size for {{TableOutputFormat}} (it can't because it only > sees one row at a time). The output of {{BufferedRows}} looks much better > because it can do this global calculation. 
> If {{--incremental}} is used, and {{TableOutputFormat}} is used, the width > should be re-calculated every "x" rows ("x" can be configurable and by > default it can be 1000). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
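The scheme described in the issue — buffer rows, derive column widths per page of "x" rows, render the page, repeat — can be sketched outside Beeline as follows. The page size, the `" | "` separator, and the rendering format are illustrative assumptions, not Hive's actual TableOutputFormat logic:

```python
def render_page(page):
    # Column widths are derived from this page only, so each page of
    # `page_size` rows gets widths re-calculated from scratch.
    widths = [max(len(row[j]) for row in page) for j in range(len(page[0]))]
    for row in page:
        yield " | ".join(cell.ljust(w) for cell, w in zip(row, widths))

def paged_table(rows, page_size=1000):
    # Buffer up to `page_size` rows, compute widths for that page,
    # render it, then start the next page — bounded memory, decent layout.
    page = []
    for row in rows:
        page.append([str(c) for c in row])
        if len(page) == page_size:
            yield from render_page(page)
            page = []
    if page:  # flush the final partial page
        yield from render_page(page)
```

With `page_size=1` this degenerates to the current per-row IncrementalRows behavior (widths drift row to row); with `page_size` equal to the result-set size it matches BufferedRows' global calculation.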
[jira] [Updated] (HIVE-14282) HCatLoader ToDate() exception with hive partition table ,partitioned by column of DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-14282: -- Attachment: HIVE-14282.1.patch > HCatLoader ToDate() exception with hive partition table ,partitioned by > column of DATE datatype > --- > > Key: HIVE-14282 > URL: https://issues.apache.org/jira/browse/HIVE-14282 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14282) HCatLoader ToDate() exception with hive partition table ,partitioned by column of DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-14282: -- Component/s: HCatalog > HCatLoader ToDate() exception with hive partition table ,partitioned by > column of DATE datatype > --- > > Key: HIVE-14282 > URL: https://issues.apache.org/jira/browse/HIVE-14282 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14282) HCatLoader ToDate() exception with hive partition table ,partitioned by column of DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-14282: -- Summary: HCatLoader ToDate() exception with hive partition table ,partitioned by column of DATE datatype (was: Pig ToDate() exception with hive partition table ,partitioned by column of DATE datatype) > HCatLoader ToDate() exception with hive partition table ,partitioned by > column of DATE datatype > --- > > Key: HIVE-14282 > URL: https://issues.apache.org/jira/browse/HIVE-14282 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-14282) Pig ToDate() exception with hive partition table ,partitioned by column of DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reassigned HIVE-14282: - Assignee: Daniel Dai > Pig ToDate() exception with hive partition table ,partitioned by column of > DATE datatype > > > Key: HIVE-14282 > URL: https://issues.apache.org/jira/browse/HIVE-14282 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14254) Correct the hive version by changing "svn" to "git"
[ https://issues.apache.org/jira/browse/HIVE-14254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384561#comment-15384561 ] Tao Li commented on HIVE-14254: --- Thanks [~spena] for your help! > Correct the hive version by changing "svn" to "git" > --- > > Key: HIVE-14254 > URL: https://issues.apache.org/jira/browse/HIVE-14254 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 2.1.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-14254.1.patch > > Original Estimate: 2h > Remaining Estimate: 2h > > When running "hive --version", "subversion" is displayed below, which should > be "git". > $ hive --version > Hive 2.1.0-SNAPSHOT > Subversion git:// -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14282) Pig ToDate() exception with hive partition table ,partitioned by column of DATE datatype
[ https://issues.apache.org/jira/browse/HIVE-14282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghavender Rao Guruvannagari updated HIVE-14282: - Affects Version/s: (was: 0.15.0) 1.2.1 Environment: PIG Version : (0.15.0) HIVE : 1.2.1 OS Version : CentOS release 6.7 (Final) OS Kernel : 2.6.32-573.18.1.el6.x86_64 was: PIG Version : (0.15.0) OS Version : CentOS release 6.7 (Final) OS Kernel : 2.6.32-573.18.1.el6.x86_64 > Pig ToDate() exception with hive partition table ,partitioned by column of > DATE datatype > > > Key: HIVE-14282 > URL: https://issues.apache.org/jira/browse/HIVE-14282 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used
[ https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384559#comment-15384559 ] Tao Li edited comment on HIVE-14170 at 7/19/16 5:31 PM: [~stakiar] Another thought is that we could improve the "buffered page" mode to avoid the OOM issue. For example, we could iterate through the whole result set once to calculate the max column widths (without loading the result set into memory), then iterate the result set again to print it out. The upside is that this requires minimal code change; the downside is higher latency, because we iterate the result set twice. was (Author: taoli-hwx): @stakiar Another thought is that we could improve the "buffered page" mode to avoid the OOM issue. For example, we could iterate through the whole result set once to calculate the max column widths (without loading the result set into memory), then iterate the result set again to print it out. The upside is that this requires minimal code change; the downside is higher latency, because we iterate the result set twice. > Beeline IncrementalRows should buffer rows and incrementally re-calculate > width if TableOutputFormat is used > > > Key: HIVE-14170 > URL: https://issues.apache.org/jira/browse/HIVE-14170 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used
[ https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384559#comment-15384559 ] Tao Li commented on HIVE-14170: --- @stakiar Another thought is that we could improve the "buffered page" mode to avoid the OOM issue. For example, we could iterate through the whole result set once to calculate the max column widths (without loading the result set into memory), then iterate the result set again to print it out. The upside is that this requires minimal code change; the downside is higher latency, because we iterate the result set twice. > Beeline IncrementalRows should buffer rows and incrementally re-calculate > width if TableOutputFormat is used > > > Key: HIVE-14170 > URL: https://issues.apache.org/jira/browse/HIVE-14170 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
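The two-pass alternative in the comment — one scan to find the max column widths without buffering rows, a second scan to print — might look like this in outline. `run_query` re-executing the statement to obtain a fresh row iterator is an assumption for the sketch, since a forward-only JDBC result set cannot generally be rewound:

```python
def two_pass_render(run_query):
    # `run_query` returns a fresh iterator over the rows each time it is
    # called (e.g. by re-executing the query).
    widths = []
    for row in run_query():          # pass 1: widths only, O(1) row memory
        for j, cell in enumerate(row):
            if j >= len(widths):
                widths.append(0)
            widths[j] = max(widths[j], len(str(cell)))
    lines = []
    for row in run_query():          # pass 2: render with final widths
        lines.append(" | ".join(str(c).ljust(w) for c, w in zip(row, widths)))
    return lines
```

The trade-off is exactly the one the comment names: minimal memory and a globally correct layout, at the cost of reading (or re-running) the result set twice.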
[jira] [Updated] (HIVE-14251) Union All of different types resolves to incorrect data
[ https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-14251: Attachment: (was: HIVE-14251.1.patch) > Union All of different types resolves to incorrect data > --- > > Key: HIVE-14251 > URL: https://issues.apache.org/jira/browse/HIVE-14251 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14251.1.patch > > > create table src(c1 date, c2 int, c3 double); > insert into src values ('2016-01-01',5,1.25); > select * from > (select c1 from src union all > select c2 from src union all > select c3 from src) t; > It returns NULL for the c1 values. It seems the common data type is resolved > to that of the last column, c3, which is double. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14251) Union All of different types resolves to incorrect data
[ https://issues.apache.org/jira/browse/HIVE-14251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-14251: Attachment: HIVE-14251.1.patch > Union All of different types resolves to incorrect data > --- > > Key: HIVE-14251 > URL: https://issues.apache.org/jira/browse/HIVE-14251 > Project: Hive > Issue Type: Bug > Components: Query Planning >Affects Versions: 2.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-14251.1.patch > > > create table src(c1 date, c2 int, c3 double); > insert into src values ('2016-01-01',5,1.25); > select * from > (select c1 from src union all > select c2 from src union all > select c3 from src) t; > It returns NULL for the c1 values. It seems the common data type is resolved > to that of the last column, c3, which is double. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
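The behavior in HIVE-14251 suggests the common type for a multi-branch UNION ALL must be folded pairwise across every branch, rather than taken from the last pair alone. A hypothetical sketch of that folding (not Hive's actual FunctionRegistry logic; the type lattice here is invented for illustration):

```java
import java.util.Arrays;
import java.util.List;

public class CommonTypeSketch {
    // Hypothetical pairwise common-type resolution: widen within the
    // numeric group, otherwise fall back to string, which can represent
    // any value (so a date branch is not silently coerced to double).
    static String common(String a, String b) {
        if (a.equals(b)) return a;
        List<String> numeric = Arrays.asList("int", "bigint", "float", "double");
        if (numeric.contains(a) && numeric.contains(b)) {
            return numeric.indexOf(a) > numeric.indexOf(b) ? a : b; // wider numeric wins
        }
        return "string";
    }

    // Fold across *all* branches: the result must accommodate every one
    // of them, not just the last.
    static String commonForUnion(List<String> types) {
        String result = types.get(0);
        for (int i = 1; i < types.size(); i++) {
            result = common(result, types.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // The three branches from the reported query: date, int, double.
        System.out.println(commonForUnion(Arrays.asList("date", "int", "double"))); // prints "string"
    }
}
```

Under this folding, the date branch survives (as a string rendering) instead of becoming NULL when forced into a double column.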
[jira] [Updated] (HIVE-14254) Correct the hive version by changing "svn" to "git"
[ https://issues.apache.org/jira/browse/HIVE-14254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergio Peña updated HIVE-14254: --- Resolution: Fixed Fix Version/s: 2.2.0 Status: Resolved (was: Patch Available) Thanks [~taoli-hwx] for your patch. I committed this to 2.2. > Correct the hive version by changing "svn" to "git" > --- > > Key: HIVE-14254 > URL: https://issues.apache.org/jira/browse/HIVE-14254 > Project: Hive > Issue Type: Bug > Components: CLI >Affects Versions: 2.1.0 >Reporter: Tao Li >Assignee: Tao Li >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-14254.1.patch > > Original Estimate: 2h > Remaining Estimate: 2h > > When running "hive --version", "subversion" is displayed below, which should > be "git". > $ hive --version > Hive 2.1.0-SNAPSHOT > Subversion git:// -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14281) Issue in decimal multiplication
[ https://issues.apache.org/jira/browse/HIVE-14281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384459#comment-15384459 ] Xuefu Zhang commented on HIVE-14281: Not sure this is a problem though. The next row may contain data with 18 decimal digits, for which precision may get lost. I would think users shouldn't specify decimal(38, 18) for numbers that don't require such a scale. Of course, we may want to check how other DBs handle this. > Issue in decimal multiplication > --- > > Key: HIVE-14281 > URL: https://issues.apache.org/jira/browse/HIVE-14281 > Project: Hive > Issue Type: Bug > Components: Types >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > > {code} > CREATE TABLE test (a DECIMAL(38,18), b DECIMAL(38,18)); > INSERT OVERWRITE TABLE test VALUES (20, 20); > SELECT a*b from test > {code} > The returned result is NULL (instead of 400) > This is because Hive adds the scales of the operands, so the type for a*b is > set to decimal(38, 36). Hive does not handle this case properly (e.g. by > rounding) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
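The scale arithmetic behind HIVE-14281 can be reproduced with plain java.math.BigDecimal, which also adds the operands' scales on multiplication. This is only a demonstration of the arithmetic, not Hive's HiveDecimal implementation:

```java
import java.math.BigDecimal;

public class DecimalScaleDemo {
    public static void main(String[] args) {
        // Two decimal(38,18) operands, each holding the value 20.
        BigDecimal a = new BigDecimal("20").setScale(18);
        BigDecimal product = a.multiply(a); // BigDecimal adds the scales: 18 + 18 = 36

        int integerDigits = product.precision() - product.scale();
        System.out.println("scale=" + product.scale()
                + " integerDigits=" + integerDigits); // prints "scale=36 integerDigits=3"
        // A result type of decimal(38,36) leaves only 38 - 36 = 2 digits
        // before the decimal point, but 400 needs 3 -- which is why Hive
        // returns NULL instead of rounding the fractional digits away.
    }
}
```

So the value itself is exact; it is the derived result *type* that cannot hold it, and rounding excess scale away (as the report suggests) would avoid the NULL.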
[jira] [Updated] (HIVE-14123) Add beeline configuration option to show database in the prompt
[ https://issues.apache.org/jira/browse/HIVE-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-14123: -- Attachment: HIVE-14123.10.patch Addressing review comments > Add beeline configuration option to show database in the prompt > --- > > Key: HIVE-14123 > URL: https://issues.apache.org/jira/browse/HIVE-14123 > Project: Hive > Issue Type: Improvement > Components: Beeline, CLI >Affects Versions: 2.2.0 >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Minor > Attachments: HIVE-14123.10.patch, HIVE-14123.2.patch, > HIVE-14123.3.patch, HIVE-14123.4.patch, HIVE-14123.5.patch, > HIVE-14123.6.patch, HIVE-14123.7.patch, HIVE-14123.8.patch, > HIVE-14123.9.patch, HIVE-14123.patch > > > There are several jira issues complaining that Beeline does not respect > hive.cli.print.current.db. > This is partially true: in embedded mode it has used > hive.cli.print.current.db to change the prompt since HIVE-10511. > In beeline mode, I think this function should use a beeline command line > option instead, like the showHeader option, emphasizing that this is a > client-side option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13815) Improve logic to infer false predicates
[ https://issues.apache.org/jira/browse/HIVE-13815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384418#comment-15384418 ] Ashutosh Chauhan commented on HIVE-13815: - This is a useful optimization to have, especially for machine-generated queries. > Improve logic to infer false predicates > --- > > Key: HIVE-13815 > URL: https://issues.apache.org/jira/browse/HIVE-13815 > Project: Hive > Issue Type: Sub-task > Components: CBO >Affects Versions: 2.1.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > > Follow-up/extension of the work done in HIVE-13068. > Ex. > ql/src/test/results/clientpositive/annotate_stats_filter.q.out > {{predicate: ((year = 2001) and (state = 'OH') and (state = 'FL')) (type: > boolean)}} -> {{false}} > ql/src/test/results/clientpositive/cbo_rp_join1.q.out > {{predicate: ((_col0 = _col1) and (_col1 = 40) and (_col0 = 40)) (type: > boolean)}} -> {{predicate: ((_col1 = 40) and (_col0 = 40)) (type: boolean)}} > ql/src/test/results/clientpositive/constprog_semijoin.q.out > {{predicate: (((id = 100) = true) and (id <> 100)) (type: boolean)}} -> > {{false}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
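The first HIVE-13815 example (a column equated to two different constants in one conjunction) can be illustrated with a small sketch. This is hypothetical code, not Hive's CBO rules, which also cover transitive equalities and negations:

```java
import java.util.HashMap;
import java.util.Map;

public class FalsePredicateSketch {
    // A conjunction of "column = constant" terms is always false as soon as
    // one column is equated to two different constants, e.g.
    // (state = 'OH') AND (state = 'FL').
    static boolean isAlwaysFalse(String[][] equalities) {
        Map<String, String> seen = new HashMap<>();
        for (String[] eq : equalities) {
            String prev = seen.putIfAbsent(eq[0], eq[1]); // column -> constant
            if (prev != null && !prev.equals(eq[1])) {
                return true; // contradictory constants: whole predicate is false
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // (year = 2001) and (state = 'OH') and (state = 'FL') -> false
        System.out.println(isAlwaysFalse(new String[][] {
                {"year", "2001"}, {"state", "'OH'"}, {"state", "'FL'"}})); // prints "true"
        // (_col1 = 40) and (_col0 = 40) is satisfiable, so it is kept
        System.out.println(isAlwaysFalse(new String[][] {
                {"_col1", "40"}, {"_col0", "40"}})); // prints "false"
    }
}
```

Folding such predicates to a constant false lets the planner prune the whole branch at compile time, which matters most for machine-generated queries that routinely emit contradictory filters.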
[jira] [Commented] (HIVE-14123) Add beeline configuration option to show database in the prompt
[ https://issues.apache.org/jira/browse/HIVE-14123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384394#comment-15384394 ] Aihua Xu commented on HIVE-14123: - Minor comments. The patch looks good to me. +1. > Add beeline configuration option to show database in the prompt > --- > > Key: HIVE-14123 > URL: https://issues.apache.org/jira/browse/HIVE-14123 > Project: Hive > Issue Type: Improvement > Components: Beeline, CLI >Affects Versions: 2.2.0 >Reporter: Peter Vary >Assignee: Peter Vary >Priority: Minor > Attachments: HIVE-14123.2.patch, HIVE-14123.3.patch, > HIVE-14123.4.patch, HIVE-14123.5.patch, HIVE-14123.6.patch, > HIVE-14123.7.patch, HIVE-14123.8.patch, HIVE-14123.9.patch, HIVE-14123.patch > > > There are several jira issues complaining that Beeline does not respect > hive.cli.print.current.db. > This is partially true: in embedded mode it has used > hive.cli.print.current.db to change the prompt since HIVE-10511. > In beeline mode, I think this function should use a beeline command line > option instead, like the showHeader option, emphasizing that this is a > client-side option. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-10022) Authorization checks for non existent file/directory should not be recursive
[ https://issues.apache.org/jira/browse/HIVE-10022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384391#comment-15384391 ] Sushanth Sowmyan commented on HIVE-10022: - Yup, those are valid concerns, I'm trying to test them out. > Authorization checks for non existent file/directory should not be recursive > > > Key: HIVE-10022 > URL: https://issues.apache.org/jira/browse/HIVE-10022 > Project: Hive > Issue Type: Bug > Components: Authorization >Affects Versions: 0.14.0 >Reporter: Pankit Thapar >Assignee: Pankit Thapar > Attachments: HIVE-10022.2.patch, HIVE-10022.3.patch, HIVE-10022.patch > > > I am testing a query like : > set hive.test.authz.sstd.hs2.mode=true; > set > hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactoryForTest; > set > hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateConfigUserAuthenticator; > set hive.security.authorization.enabled=true; > set user.name=user1; > create table auth_noupd(i int) clustered by (i) into 2 buckets stored as orc > location '${OUTPUT}' TBLPROPERTIES ('transactional'='true'); > Now, in the above query, since authorization is true, > we would end up calling doAuthorizationV2() which ultimately ends up calling > SQLAuthorizationUtils.getPrivilegesFromFS() which calls a recursive method : > FileUtils.isActionPermittedForFileHierarchy() with the object or the ancestor > of the object we are trying to authorize if the object does not exist. > The logic in FileUtils.isActionPermittedForFileHierarchy() is DFS. > Now assume, we have a path as a/b/c/d that we are trying to authorize. > In case, a/b/c/d does not exist, we would call > FileUtils.isActionPermittedForFileHierarchy() with say a/b/ assuming a/b/c > also does not exist. > If under the subtree at a/b, we have millions of files, then > FileUtils.isActionPermittedForFileHierarchy() is going to check file > permission on each of those objects. 
> I do not completely understand why we have to check file permissions > on all the objects in a branch of the tree that we are not trying to read > from or write to. > We could check the file permission on the deepest ancestor that exists and, > if it matches what we expect, return true. > Please confirm whether this is a bug so that I can submit a patch, or let me > know what I am missing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
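The ancestor-based alternative proposed in HIVE-10022 could, hypothetically, look like the following. It uses java.nio.file for illustration rather than Hadoop's FileSystem API, and it is not the code from the attached patches:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class AncestorCheckSketch {
    // Walk *up* from a not-yet-existing path to the deepest ancestor that
    // does exist, so a single permission check can be made there -- instead
    // of walking *down* the whole sibling subtree (possibly millions of
    // files) as the recursive isActionPermittedForFileHierarchy check does.
    static Path deepestExistingAncestor(Path p) {
        while (p != null && !Files.exists(p)) {
            p = p.getParent();
        }
        return p;
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("authz");
        Path target = base.resolve("a/b/c/d"); // a/b/c/d does not exist yet
        Path ancestor = deepestExistingAncestor(target);
        // One writability check on the existing ancestor decides whether
        // the create may proceed; nothing below it needs to be visited.
        System.out.println(ancestor.equals(base)
                + " writable=" + Files.isWritable(ancestor)); // prints "true writable=true"
    }
}
```

The cost of this check is proportional to the path depth, not to the number of files under the nearest existing directory, which is exactly the asymmetry the report complains about.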
[jira] [Updated] (HIVE-14268) INSERT-OVERWRITE is not generating an INSERT event during hive replication
[ https://issues.apache.org/jira/browse/HIVE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-14268: Attachment: HIVE-14268.3.patch > INSERT-OVERWRITE is not generating an INSERT event during hive replication > -- > > Key: HIVE-14268 > URL: https://issues.apache.org/jira/browse/HIVE-14268 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Murali Ramasami >Assignee: Sushanth Sowmyan > Attachments: HIVE-14268.2.patch, HIVE-14268.3.patch, HIVE-14268.patch > > > During Hive replication invoked from falcon, the source cluster did not > generate appropriate INSERT events associated with the INSERT OVERWRITE, > generating only an ALTER PARTITION event. However, an ALTER PARTITION is a > metadata-only event, and thus, only metadata changes were replicated across, > modifying the metadata of the destination, while not updating the data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14268) INSERT-OVERWRITE is not generating an INSERT event during hive replication
[ https://issues.apache.org/jira/browse/HIVE-14268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384376#comment-15384376 ] Sushanth Sowmyan commented on HIVE-14268: - Sounds good - reuploading .1.patch as .3.patch so the tests run on that. > INSERT-OVERWRITE is not generating an INSERT event during hive replication > -- > > Key: HIVE-14268 > URL: https://issues.apache.org/jira/browse/HIVE-14268 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Murali Ramasami >Assignee: Sushanth Sowmyan > Attachments: HIVE-14268.2.patch, HIVE-14268.patch > > > During Hive replication invoked from falcon, the source cluster did not > generate appropriate INSERT events associated with the INSERT OVERWRITE, > generating only an ALTER PARTITION event. However, an ALTER PARTITION is a > metadata-only event, and thus, only metadata changes were replicated across, > modifying the metadata of the destination, while not updating the data. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14278) Migrate TestHadoop23SAuthBridge.java from Unit3 to Unit4
[ https://issues.apache.org/jira/browse/HIVE-14278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15384362#comment-15384362 ] Ashutosh Chauhan commented on HIVE-14278: - +1 At some point we need to make changes in pom files so that we do not download junit3 jars. > Migrate TestHadoop23SAuthBridge.java from Unit3 to Unit4 > > > Key: HIVE-14278 > URL: https://issues.apache.org/jira/browse/HIVE-14278 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.2.0 >Reporter: Balint Molnar >Assignee: Balint Molnar >Priority: Minor > Fix For: 2.2.0 > > Attachments: HIVE-14278.patch > > > Migrate TestHadoop23SAuthBridge.java from unit3 to unit4 -- This message was sent by Atlassian JIRA (v6.3.4#6332)