[jira] [Commented] (HIVE-16353) Jetty 9 upgrade breaks hive master which is 2.3.0-snapshot
[ https://issues.apache.org/jira/browse/HIVE-16353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982369#comment-15982369 ]

Rajesh Balamohan commented on HIVE-16353:
-----------------------------------------

Thanks [~gopalv]. Patch lgtm. +1.

> Jetty 9 upgrade breaks hive master which is 2.3.0-snapshot
> ----------------------------------------------------------
>
>                 Key: HIVE-16353
>                 URL: https://issues.apache.org/jira/browse/HIVE-16353
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>            Assignee: Gopal V
>            Priority: Minor
>         Attachments: HIVE-16353.patch
>
>
> HIVE-16049 upgraded to Jetty 9. It is committed to the Apache master branch, which is
> still at 2.3.0-SNAPSHOT. This breaks a couple of other components, such as LLAP, and
> ends up throwing the following error at runtime.
> {noformat}
> 2017-04-02T20:17:45,435 WARN [main ()] org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon: Failed to start LLAP Daemon with exception
> java.lang.NoClassDefFoundError: org/eclipse/jetty/rewrite/handler/Rule
> 	at org.apache.hive.http.HttpServer$Builder.build(HttpServer.java:125) ~[hive-exec-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> 	at org.apache.hadoop.hive.llap.daemon.services.impl.LlapWebServices.serviceInit(LlapWebServices.java:102) ~[hive-llap-server-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> 	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) ~[hadoop-common-2.7.3.2.5.0.0-1245.jar:?]
> 	at org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) ~[hadoop-common-2.7.3.2.5.0.0-1245.jar:?]
> 	at org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.serviceInit(LlapDaemon.java:385) ~[hive-llap-server-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> 	at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) ~[hadoop-common-2.7.3.2.5.0.0-1245.jar:?]
> 	at org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.main(LlapDaemon.java:528) [hive-llap-server-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT]
> Caused by: java.lang.ClassNotFoundException: org.eclipse.jetty.rewrite.handler.Rule
> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_77]
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_77]
> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) ~[?:1.8.0_77]
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_77]
> 	... 7 more
> 2017-04-02T20:17:45,441 INFO [main ()] org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon: LlapDaemon shutdown invoked
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
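The {{NoClassDefFoundError}} above means the jetty-rewrite artifact was absent from the daemon's runtime classpath. As a quick, hypothetical diagnostic (not part of the attached patch), one can probe whether the offending class is loadable before starting the daemon:

```java
// Minimal classpath diagnostic sketch. The class name below is the one whose
// absence triggers the failure in HttpServer$Builder.build; any other fully
// qualified name can be probed the same way.
public class ClasspathCheck {
    // Returns true if the named class is loadable from the current classpath.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("jetty-rewrite present: "
                + isLoadable("org.eclipse.jetty.rewrite.handler.Rule"));
    }
}
```

If this prints `false` in the daemon's environment, the jetty-rewrite jar needs to be added to the packaging/classpath, which is what the attached patch addresses.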
[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified
[ https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982368#comment-15982368 ]

Sahil Takiar commented on HIVE-15396:
-------------------------------------

[~pxiong] created an RB: https://reviews.apache.org/r/58691/

> Basic Stats are not collected when for managed tables with LOCATION specified
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-15396
>                 URL: https://issues.apache.org/jira/browse/HIVE-15396
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-15396.1.patch, HIVE-15396.2.patch, HIVE-15396.3.patch, HIVE-15396.4.patch, HIVE-15396.5.patch, HIVE-15396.6.patch, HIVE-15396.7.patch
>
>
> Basic stats are not collected when a managed table is created with a specified {{LOCATION}} clause.
> {code}
> 0: jdbc:hive2://localhost:1> create table hdfs_1 (col int);
> 0: jdbc:hive2://localhost:1> describe formatted hdfs_1;
> | col_name                     | data_type                                                  | comment                   |
> | # col_name                   | data_type                                                  | comment                   |
> | col                          | int                                                        |                           |
> | # Detailed Table Information | NULL                                                       | NULL                      |
> | Database:                    | default                                                    | NULL                      |
> | Owner:                       | anonymous                                                  | NULL                      |
> | CreateTime:                  | Wed Mar 22 18:09:19 PDT 2017                               | NULL                      |
> | LastAccessTime:              | UNKNOWN                                                    | NULL                      |
> | Retention:                   | 0                                                          | NULL                      |
> | Location:                    | file:/warehouse/hdfs_1                                     | NULL                      |
> | Table Type:                  | MANAGED_TABLE                                              | NULL                      |
> | Table Parameters:            | NULL                                                       | NULL                      |
> |                              | COLUMN_STATS_ACCURATE                                      | {\"BASIC_STATS\":\"true\"} |
> |                              | numFiles                                                   | 0                         |
> |                              | numRows                                                    | 0                         |
> |                              | rawDataSize                                                | 0                         |
> |                              | totalSize                                                  | 0                         |
> |                              | transient_lastDdlTime                                      | 1490231359                |
> | # Storage Information        | NULL                                                       | NULL                      |
> | SerDe Library:               | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe         | NULL                      |
> | InputFormat:                 | org.apache.hadoop.mapred.TextInputFormat                   | NULL                      |
> | OutputFormat:                | org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL                      |
> | Compressed:                  | No                                                         | NULL                      |
> | Num Buckets:                 | -1                                                         | NULL                      |
> | Bucket Columns:              | []                                                         | NULL                      |
> | Sort Columns:                | []                                                         | NULL                      |
> | Storage Desc Params:         | NULL                                                       | NULL
[jira] [Commented] (HIVE-15396) Basic Stats are not collected when for managed tables with LOCATION specified
[ https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982333#comment-15982333 ]

Pengcheng Xiong commented on HIVE-15396:
----------------------------------------

[~stakiar], could you create a review request? Thanks.

> Basic Stats are not collected when for managed tables with LOCATION specified
> -----------------------------------------------------------------------------
>
>                 Key: HIVE-15396
>                 URL: https://issues.apache.org/jira/browse/HIVE-15396
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>         Attachments: HIVE-15396.1.patch, HIVE-15396.2.patch, HIVE-15396.3.patch, HIVE-15396.4.patch, HIVE-15396.5.patch, HIVE-15396.6.patch, HIVE-15396.7.patch
>
>
> Basic stats are not collected when a managed table is created with a specified {{LOCATION}} clause.
[jira] [Commented] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats
[ https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982331#comment-15982331 ]

Pengcheng Xiong commented on HIVE-16147:
----------------------------------------

LGTM. +1 pending tests.

> Rename a partitioned table should not drop its partition columns stats
> ----------------------------------------------------------------------
>
>                 Key: HIVE-16147
>                 URL: https://issues.apache.org/jira/browse/HIVE-16147
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Chaoyu Tang
>            Assignee: Chaoyu Tang
>         Attachments: HIVE-16147.patch
>
>
> When a partitioned table (e.g. sample_pt) is renamed (e.g. to sample_pt_rename), describing its partition shows that the partition column stats are still accurate, but actually they have all been dropped.
> It can be reproduced as follows:
> 1. analyze table sample_pt compute statistics for columns;
> 2. describe formatted default.sample_pt partition (dummy = 3): COLUMN_STATS for all columns are true
> {code}
> ...
> # Detailed Partition Information
> Partition Value:       [3]
> Database:              default
> Table:                 sample_pt
> CreateTime:            Fri Jan 20 15:42:30 EST 2017
> LastAccessTime:        UNKNOWN
> Location:              file:/user/hive/warehouse/apache/sample_pt/dummy=3
> Partition Parameters:
> 	COLUMN_STATS_ACCURATE	{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
> 	last_modified_by	ctang
> 	last_modified_time	1485217063
> 	numFiles	1
> 	numRows 	100
> 	rawDataSize	5143
> 	totalSize	5243
> 	transient_lastDdlTime	1488842358
> ...
> {code}
> 3. describe formatted default.sample_pt partition (dummy = 3) salary: column stats exist
> {code}
> # col_name	data_type	min	max	num_nulls	distinct_count	avg_col_len	max_col_len	num_trues	num_falses	comment
> salary	int	1	151370	0	94					from deserializer
> {code}
> 4. alter table sample_pt rename to sample_pt_rename;
> 5. describe formatted default.sample_pt_rename partition (dummy = 3): describing the renamed table's partition (dummy = 3) shows that COLUMN_STATS for the columns are still true.
> {code}
> # Detailed Partition Information
> Partition Value:       [3]
> Database:              default
> Table:                 sample_pt_rename
> CreateTime:            Fri Jan 20 15:42:30 EST 2017
> LastAccessTime:        UNKNOWN
> Location:              file:/user/hive/warehouse/apache/sample_pt_rename/dummy=3
> Partition Parameters:
> 	COLUMN_STATS_ACCURATE	{\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}}
> 	last_modified_by	ctang
> 	last_modified_time	1485217063
> 	numFiles	1
> 	numRows 	100
> 	rawDataSize	5143
> 	totalSize	5243
> 	transient_lastDdlTime	1488842358
> {code}
> describe formatted default.sample_pt_rename partition (dummy = 3) salary: the column stats have been dropped.
> {code}
> # col_name	data_type	comment
> salary	int	from deserializer
>
> Time taken: 0.131 seconds, Fetched: 3 row(s)
> {code}
[jira] [Commented] (HIVE-16445) enable Acid by default in the parent patch and run build bot
[ https://issues.apache.org/jira/browse/HIVE-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982323#comment-15982323 ]

Hive QA commented on HIVE-16445:
--------------------------------

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12864866/HIVE-16445.01.patch

{color:green}SUCCESS:{color} +1 due to 11 test(s) being added or modified.

{color:red}ERROR:{color} -1 due to 599 failed/errored test(s), 9092 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[acid_table_stats] (batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[autoColumnStats_4] (batchId=12)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[insert_values_orig_table_use_metadata] (batchId=59)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[lock1] (batchId=7)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[lock2] (batchId=29)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[lock3] (batchId=50)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[lock4] (batchId=49)
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver[row__id] (batchId=73)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapCliDriver (batchId=137)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapCliDriver (batchId=138)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapCliDriver (batchId=139)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapCliDriver (batchId=140)
org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapCliDriver (batchId=141)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=142)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=143)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=144)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=145)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=146)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=147)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=148)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=149)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=150)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=151)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=152)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=153)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=154)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=155)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=156)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=157)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=158)
org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver (batchId=159)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver (batchId=165)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver (batchId=166)
org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver (batchId=167)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver (batchId=96)
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver (batchId=97)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[insert_into1] (batchId=87)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[insert_into2] (batchId=88)
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testCliDriver[insert_into3] (batchId=87)
[jira] [Updated] (HIVE-16510) Vectorization: Add vectorized PTF tests in preparation for HIVE-16369
[ https://issues.apache.org/jira/browse/HIVE-16510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt McCline updated HIVE-16510:
--------------------------------

    Attachment: HIVE-16510.02.patch

> Vectorization: Add vectorized PTF tests in preparation for HIVE-16369
> ---------------------------------------------------------------------
>
>                 Key: HIVE-16510
>                 URL: https://issues.apache.org/jira/browse/HIVE-16510
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-16510.01.patch, HIVE-16510.02.patch
>
>
> Had trouble with the HIVE-16369 patch being blocked by Apache SPAM filters -- so
> separating out adding vectorized versions of the current windowing_*.q tests.
[jira] [Updated] (HIVE-16510) Vectorization: Add vectorized PTF tests in preparation for HIVE-16369
[ https://issues.apache.org/jira/browse/HIVE-16510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt McCline updated HIVE-16510:
--------------------------------

    Attachment: (was: HIVE-16510.02.patch)

> Vectorization: Add vectorized PTF tests in preparation for HIVE-16369
> ---------------------------------------------------------------------
>
>                 Key: HIVE-16510
>                 URL: https://issues.apache.org/jira/browse/HIVE-16510
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-16510.01.patch, HIVE-16510.02.patch
[jira] [Updated] (HIVE-16510) Vectorization: Add vectorized PTF tests in preparation for HIVE-16369
[ https://issues.apache.org/jira/browse/HIVE-16510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt McCline updated HIVE-16510:
--------------------------------

    Status: Patch Available  (was: In Progress)

> Vectorization: Add vectorized PTF tests in preparation for HIVE-16369
> ---------------------------------------------------------------------
>
>                 Key: HIVE-16510
>                 URL: https://issues.apache.org/jira/browse/HIVE-16510
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-16510.01.patch, HIVE-16510.02.patch
[jira] [Updated] (HIVE-16510) Vectorization: Add vectorized PTF tests in preparation for HIVE-16369
[ https://issues.apache.org/jira/browse/HIVE-16510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt McCline updated HIVE-16510:
--------------------------------

    Attachment: HIVE-16510.02.patch

> Vectorization: Add vectorized PTF tests in preparation for HIVE-16369
> ---------------------------------------------------------------------
>
>                 Key: HIVE-16510
>                 URL: https://issues.apache.org/jira/browse/HIVE-16510
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-16510.01.patch, HIVE-16510.02.patch
[jira] [Commented] (HIVE-16503) LLAP: Oversubscribe memory for noconditional task size
[ https://issues.apache.org/jira/browse/HIVE-16503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982262#comment-15982262 ]

Prasanth Jayachandran commented on HIVE-16503:
----------------------------------------------

bq. Is SESSIONS_PER_DEFAULT_QUEUE guaranteed to be >= 1?
The default value is 1, but that does not guard against values set by the user. The .3 patch adds a guard against <= 0 values, and a unit test for the same.

bq. Does it make sense to add range validators for the new settings (0, 1.0)?
The .1 patch had a RangeValidator, but I removed it because values > 1.0 are also valid; something like 120% oversubscription is a valid scenario. Under-subscription, however, is not allowed and is guarded by the return value
{code}
return Math.max(maxSize, llapMaxSize);
{code}
where maxSize is the initial noconditional task size.

> LLAP: Oversubscribe memory for noconditional task size
> ------------------------------------------------------
>
>                 Key: HIVE-16503
>                 URL: https://issues.apache.org/jira/browse/HIVE-16503
>             Project: Hive
>          Issue Type: Improvement
>          Components: llap
>    Affects Versions: 3.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>         Attachments: HIVE-16503.1.patch, HIVE-16503.2.patch, HIVE-16503.3.patch
>
>
> When running map joins in LLAP, Hive can potentially use more memory for hash table loading (assuming other executors in the daemons have some memory to spare). This map join conversion decision has to be made during compilation, so that it can provide some more room for LLAP.
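The guard described in the comment above can be sketched as follows (hypothetical method and parameter names, not the actual patch): the computed oversubscribed size never falls below the configured noconditional task size, and non-positive inputs fall back to it.

```java
// Minimal sketch of the oversubscription clamp discussed in HIVE-16503.
// All names here are illustrative, not Hive's actual identifiers.
public class MemoryOversubscription {
    // maxSize: configured noconditional task size.
    // factor: oversubscription fraction per spare executor (may exceed 1.0).
    // numSpareExecutors: executors whose memory can be borrowed.
    static long oversubscribedTaskSize(long maxSize, double factor, int numSpareExecutors) {
        if (numSpareExecutors <= 0 || factor <= 0) {
            return maxSize; // guard against bad user-supplied config
        }
        long llapMaxSize = (long) (maxSize + maxSize * factor * numSpareExecutors);
        // Never return less than the configured size: no under-subscription.
        return Math.max(maxSize, llapMaxSize);
    }
}
```

Note that factors above 1.0 simply scale the result further (the "120% oversubscription" case), while the `Math.max` matches the guard quoted in the comment.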
[jira] [Updated] (HIVE-16503) LLAP: Oversubscribe memory for noconditional task size
[ https://issues.apache.org/jira/browse/HIVE-16503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prasanth Jayachandran updated HIVE-16503:
-----------------------------------------

    Attachment: HIVE-16503.3.patch

> LLAP: Oversubscribe memory for noconditional task size
> ------------------------------------------------------
>
>                 Key: HIVE-16503
>                 URL: https://issues.apache.org/jira/browse/HIVE-16503
>             Project: Hive
>          Issue Type: Improvement
>          Components: llap
>    Affects Versions: 3.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>         Attachments: HIVE-16503.1.patch, HIVE-16503.2.patch, HIVE-16503.3.patch
>
>
> When running map joins in LLAP, Hive can potentially use more memory for hash table loading (assuming other executors in the daemons have some memory to spare). This map join conversion decision has to be made during compilation, so that it can provide some more room for LLAP.
[jira] [Updated] (HIVE-16524) Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon
[ https://issues.apache.org/jira/browse/HIVE-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhangBing Lin updated HIVE-16524:
---------------------------------

    Description:
The id attribute is defined by the W3C as follows:
1. The id attribute specifies a unique id for an HTML element.
2. An id must be unique within the HTML document.
3. The id attribute can be used as a link anchor, and by JavaScript (HTML DOM) or CSS to change or add a style to the element with the specified id.

However, the "id='attributes_table'" in hiveserver2.jsp and QueryProfileTmpl.jamon:
1. is not referenced by any CSS or JS;
2. appears more than once on the same page.

So I suggest removing this id attribute definition. Please check it.

> Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-16524
>                 URL: https://issues.apache.org/jira/browse/HIVE-16524
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: ZhangBing Lin
>            Assignee: ZhangBing Lin
>            Priority: Minor
>         Attachments: HIVE-16524.1.patch
[jira] [Comment Edited] (HIVE-16523) VectorHashKeyWrapper hash code for strings is not so good
[ https://issues.apache.org/jira/browse/HIVE-16523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982241#comment-15982241 ]

Sergey Shelukhin edited comment on HIVE-16523 at 4/25/17 1:25 AM:
------------------------------------------------------------------

[~gopalv] [~mmccline] do you mind taking a look? With this patch, I see the collisions in Q58 reduced to 1-2 elements per hash code (from 400-1200 for the worst codes before).

was (Author: sershe):
[~gopalv] [~mmccline] do you mind taking a look?

> VectorHashKeyWrapper hash code for strings is not so good
> ---------------------------------------------------------
>
>                 Key: HIVE-16523
>                 URL: https://issues.apache.org/jira/browse/HIVE-16523
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-16523.patch
>
>
> Perf issues in vectorized gby on some string keys
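To illustrate the kind of collision clustering being fixed (a standalone demonstration, not the HIVE-16523 patch): an order-insensitive byte-sum hash collapses all permutations of the same characters onto one code, while a multiplicative 31-based hash, like `String.hashCode`, disperses them.

```java
import java.util.HashSet;
import java.util.Set;

// Demonstrates why a weak string hash causes heavy collision clustering on
// similar keys, and how a multiplicative hash avoids it.
public class StringHashDemo {
    static int sumHash(byte[] bytes) {
        int h = 0;
        for (byte b : bytes) h += b;   // order-insensitive: "ab" and "ba" collide
        return h;
    }

    static int mulHash(byte[] bytes) {
        int h = 0;
        for (byte b : bytes) h = 31 * h + b;  // order-sensitive, better dispersion
        return h;
    }

    // Counts how many distinct hash codes a key set produces under each scheme.
    static int distinctHashes(String[] keys, boolean multiplicative) {
        Set<Integer> codes = new HashSet<>();
        for (String k : keys) {
            byte[] b = k.getBytes();
            codes.add(multiplicative ? mulHash(b) : sumHash(b));
        }
        return codes.size();
    }
}
```

On eight permutation-style keys the sum hash yields only two distinct codes, while the multiplicative hash yields eight, which mirrors the "400-1200 elements per hash code" clustering reported in the comment.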
[jira] [Updated] (HIVE-16524) Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon
[ https://issues.apache.org/jira/browse/HIVE-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhangBing Lin updated HIVE-16524:
---------------------------------

    Status: Patch Available  (was: Open)

> Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-16524
>                 URL: https://issues.apache.org/jira/browse/HIVE-16524
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: ZhangBing Lin
>            Assignee: ZhangBing Lin
>            Priority: Minor
>         Attachments: HIVE-16524.1.patch
[jira] [Commented] (HIVE-16503) LLAP: Oversubscribe memory for noconditional task size
[ https://issues.apache.org/jira/browse/HIVE-16503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982242#comment-15982242 ]

Gunther Hagleitner commented on HIVE-16503:
-------------------------------------------

LGTM +1. Some smaller questions:
- Is SESSIONS_PER_DEFAULT_QUEUE guaranteed to be >= 1?
- Does it make sense to add range validators for the new settings (0, 1.0)?

> LLAP: Oversubscribe memory for noconditional task size
> ------------------------------------------------------
>
>                 Key: HIVE-16503
>                 URL: https://issues.apache.org/jira/browse/HIVE-16503
>             Project: Hive
>          Issue Type: Improvement
>          Components: llap
>    Affects Versions: 3.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>         Attachments: HIVE-16503.1.patch, HIVE-16503.2.patch
>
>
> When running map joins in LLAP, Hive can potentially use more memory for hash table loading (assuming other executors in the daemons have some memory to spare). This map join conversion decision has to be made during compilation, so that it can provide some more room for LLAP.
[jira] [Updated] (HIVE-16524) Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon
[ https://issues.apache.org/jira/browse/HIVE-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhangBing Lin updated HIVE-16524:
---------------------------------

    Attachment: HIVE-16524.1.patch

> Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-16524
>                 URL: https://issues.apache.org/jira/browse/HIVE-16524
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: ZhangBing Lin
>            Assignee: ZhangBing Lin
>            Priority: Minor
>         Attachments: HIVE-16524.1.patch
[jira] [Updated] (HIVE-16523) VectorHashKeyWrapper hash code for strings is not so good
[ https://issues.apache.org/jira/browse/HIVE-16523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-16523:
------------------------------------

    Attachment: HIVE-16523.patch

> VectorHashKeyWrapper hash code for strings is not so good
> ---------------------------------------------------------
>
>                 Key: HIVE-16523
>                 URL: https://issues.apache.org/jira/browse/HIVE-16523
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-16523.patch
>
>
> Perf issues in vectorized gby on some string keys
[jira] [Updated] (HIVE-16523) VectorHashKeyWrapper hash code for strings is not so good
[ https://issues.apache.org/jira/browse/HIVE-16523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-16523:
------------------------------------

    Status: Patch Available  (was: Open)

[~gopalv] [~mmccline] do you mind taking a look?

> VectorHashKeyWrapper hash code for strings is not so good
> ---------------------------------------------------------
>
>                 Key: HIVE-16523
>                 URL: https://issues.apache.org/jira/browse/HIVE-16523
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-16523.patch
>
>
> Perf issues in vectorized gby on some string keys
[jira] [Assigned] (HIVE-16524) Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon
[ https://issues.apache.org/jira/browse/HIVE-16524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ZhangBing Lin reassigned HIVE-16524:
------------------------------------

> Remove the redundant item type in hiveserver2.jsp and QueryProfileTmpl.jamon
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-16524
>                 URL: https://issues.apache.org/jira/browse/HIVE-16524
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.0.0
>            Reporter: ZhangBing Lin
>            Assignee: ZhangBing Lin
>            Priority: Minor
[jira] [Assigned] (HIVE-16523) VectorHashKeyWrapper hash code for strings is not so good
[ https://issues.apache.org/jira/browse/HIVE-16523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin reassigned HIVE-16523:
---------------------------------------

> VectorHashKeyWrapper hash code for strings is not so good
> ---------------------------------------------------------
>
>                 Key: HIVE-16523
>                 URL: https://issues.apache.org/jira/browse/HIVE-16523
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>
> Perf issues in vectorized gby on some string keys
[jira] [Commented] (HIVE-11420) add support for "set autocommit"
[ https://issues.apache.org/jira/browse/HIVE-11420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982201#comment-15982201 ]

Eugene Koifman commented on HIVE-11420:
---------------------------------------

Currently (as of HIVE-12636), the system recognizes the "set autocommit" command, but it has no effect. This should probably be supported at the SessionState level to match JDBC semantics.

> add support for "set autocommit"
> --------------------------------
>
>                 Key: HIVE-11420
>                 URL: https://issues.apache.org/jira/browse/HIVE-11420
>             Project: Hive
>          Issue Type: Sub-task
>          Components: CLI, Transactions
>    Affects Versions: 1.3.0
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>
> HIVE-11077 added support for "set autocommit true/false".
> We should add support for "set autocommit" to return the current value.
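A minimal sketch (hypothetical, not Hive code) of what SessionState-level support could look like, following JDBC's `Connection.setAutoCommit`/`getAutoCommit` semantics: "set autocommit true/false" stores the flag, and a bare "set autocommit" reads the current value back.

```java
// Hypothetical session-level autocommit holder, mirroring JDBC semantics.
// Hive's actual SessionState does not currently expose this; this only
// illustrates the behavior the comment above asks for.
public class AutoCommitState {
    private boolean autoCommit = true; // JDBC default is auto-commit enabled

    // Models "set autocommit true/false".
    public void set(boolean value) { this.autoCommit = value; }

    public boolean get() { return autoCommit; }

    // Models a bare "set autocommit": report the current value.
    public String describe() { return "autocommit=" + autoCommit; }
}
```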
[jira] [Commented] (HIVE-16503) LLAP: Oversubscribe memory for noconditional task size
[ https://issues.apache.org/jira/browse/HIVE-16503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982193#comment-15982193 ]

Prasanth Jayachandran commented on HIVE-16503:
----------------------------------------------

The test failures are unrelated to the patch. accumulo_index.q and accumulo_index.q are already failing in master. skewjoinopt1.q passes locally for me, so it is probably flaky. I will trigger another test run anyway to make sure.

> LLAP: Oversubscribe memory for noconditional task size
> ------------------------------------------------------
>
>                 Key: HIVE-16503
>                 URL: https://issues.apache.org/jira/browse/HIVE-16503
>             Project: Hive
>          Issue Type: Improvement
>          Components: llap
>    Affects Versions: 3.0.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>         Attachments: HIVE-16503.1.patch, HIVE-16503.2.patch
>
>
> When running map joins in LLAP, Hive can potentially use more memory for hash table loading (assuming other executors in the daemons have some memory to spare). This map join conversion decision has to be made during compilation, so that it can provide some more room for LLAP.
[jira] [Updated] (HIVE-16445) enable Acid by default in the parent patch and run build bot
[ https://issues.apache.org/jira/browse/HIVE-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-16445:
----------------------------------

    Attachment: HIVE-16445.01.patch

set hive.support.concurrency=true
hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

> enable Acid by default in the parent patch and run build bot
> ------------------------------------------------------------
>
>                 Key: HIVE-16445
>                 URL: https://issues.apache.org/jira/browse/HIVE-16445
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Transactions
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>         Attachments: HIVE-16445.01.patch
[jira] [Updated] (HIVE-16445) enable Acid by default in the parent patch and run build bot
[ https://issues.apache.org/jira/browse/HIVE-16445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eugene Koifman updated HIVE-16445:
----------------------------------

    Status: Patch Available  (was: Open)

> enable Acid by default in the parent patch and run build bot
> ------------------------------------------------------------
>
>                 Key: HIVE-16445
>                 URL: https://issues.apache.org/jira/browse/HIVE-16445
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Transactions
>            Reporter: Eugene Koifman
>            Assignee: Eugene Koifman
>         Attachments: HIVE-16445.01.patch
[jira] [Commented] (HIVE-12636) Ensure that all queries (with DbTxnManager) run in a transaction
[ https://issues.apache.org/jira/browse/HIVE-12636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982186#comment-15982186 ] Eugene Koifman commented on HIVE-12636: --- No related failures for HIVE-12636.17.patch > Ensure that all queries (with DbTxnManager) run in a transaction > > > Key: HIVE-12636 > URL: https://issues.apache.org/jira/browse/HIVE-12636 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-12636.01.patch, HIVE-12636.02.patch, > HIVE-12636.03.patch, HIVE-12636.04.patch, HIVE-12636.05.patch, > HIVE-12636.06.patch, HIVE-12636.07.patch, HIVE-12636.09.patch, > HIVE-12636.10.patch, HIVE-12636.12.patch, HIVE-12636.13.patch, > HIVE-12636.17.patch > > > Assuming Hive is using DbTxnManager > Currently (as of this writing only auto commit mode is supported), only > queries that write to an Acid table start a transaction. > Read-only queries don't open a txn but still acquire locks. > This makes internal structures confusing/odd. > There are constantly 2 code paths to deal with, which is inconvenient and error > prone. > Also, a txn id is a convenient "handle" for all locks/resources within a txn. > Doing this would mean the client no longer needs to track locks that it > acquired. This enables further improvements to the metastore side of Acid. > # add metastore call to openTxn() and acquireLocks() in a single call. this > is to make sure perf doesn't degrade for read-only queries. (Would also be > useful for auto commit write queries) > # Should RO queries generate txn ids from the same sequence? (they could for > example use negative values of a different sequence). Txnid is part of the > delta/base file name. Currently it's 7 digits. If we use the same sequence, > we'll exceed 7 digits faster. (possible upgrade issue). 
On the other hand > there is value in being able to pick txn id and commit timestamp out of the > same logical sequence. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16399) create an index for tc_txnid in TXN_COMPONENTS
[ https://issues.apache.org/jira/browse/HIVE-16399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982182#comment-15982182 ] Eugene Koifman commented on HIVE-16399: --- The master patch contains upgrade-2.1.0-to-2.2.0.sql files but there is no branch-2.2 patch. Shouldn't they match? Otherwise LGTM. [~wzheng] yes, I think if it's still possible to add it to 2.3, that would be good > create an index for tc_txnid in TXN_COMPONENTS > -- > > Key: HIVE-16399 > URL: https://issues.apache.org/jira/browse/HIVE-16399 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Wei Zheng > Attachments: HIVE-16399.branch-2.3.patch, HIVE-16399.branch-2.patch, > HIVE-16399.master.patch > > > Without this, TxnStore.cleanEmptyAbortedTxns() can be very slow -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-12636) Ensure that all queries (with DbTxnManager) run in a transaction
[ https://issues.apache.org/jira/browse/HIVE-12636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982170#comment-15982170 ] Hive QA commented on HIVE-12636: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12864848/HIVE-12636.17.patch {color:green}SUCCESS:{color} +1 due to 11 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 10620 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestAccumuloCliDriver.testCliDriver[accumulo_index] (batchId=225) org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[dynamic_semijoin_user_level] (batchId=140) org.apache.hadoop.hive.cli.TestMiniLlapLocalCliDriver.testCliDriver[vector_if_expr] (batchId=143) {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/4859/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/4859/console Test logs: http://104.198.109.242/logs/PreCommit-HIVE-Build-4859/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12864848 - PreCommit-HIVE-Build > Ensure that all queries (with DbTxnManager) run in a transaction > > > Key: HIVE-12636 > URL: https://issues.apache.org/jira/browse/HIVE-12636 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-12636.01.patch, HIVE-12636.02.patch, > HIVE-12636.03.patch, HIVE-12636.04.patch, HIVE-12636.05.patch, > HIVE-12636.06.patch, HIVE-12636.07.patch, HIVE-12636.09.patch, > HIVE-12636.10.patch, HIVE-12636.12.patch, HIVE-12636.13.patch, > HIVE-12636.17.patch > > > Assuming Hive is using DbTxnManager > Currently (as of this writing only auto commit mode is supported), only > queries that write to an Acid table start a transaction. > Read-only queries don't open a txn but still acquire locks. > This makes internal structures confusing/odd. > There are constantly 2 code paths to deal with, which is inconvenient and error > prone. > Also, a txn id is a convenient "handle" for all locks/resources within a txn. > Doing this would mean the client no longer needs to track locks that it > acquired. This enables further improvements to the metastore side of Acid. > # add metastore call to openTxn() and acquireLocks() in a single call. this > is to make sure perf doesn't degrade for read-only queries. (Would also be > useful for auto commit write queries) > # Should RO queries generate txn ids from the same sequence? (they could for > example use negative values of a different sequence). Txnid is part of the > delta/base file name. Currently it's 7 digits. If we use the same sequence, > we'll exceed 7 digits faster. (possible upgrade issue). On the other hand > there is value in being able to pick txn id and commit timestamp out of the > same logical sequence. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16520) Cache hive metadata in metastore
[ https://issues.apache.org/jira/browse/HIVE-16520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-16520: -- Status: Patch Available (was: Open) > Cache hive metadata in metastore > > > Key: HIVE-16520 > URL: https://issues.apache.org/jira/browse/HIVE-16520 > Project: Hive > Issue Type: New Feature > Components: Metastore >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-16520-proto-2.patch, HIVE-16520-proto.patch > > > During Hive 2 benchmarks, we found that Hive metastore operations take a lot of time > and thus slow down Hive compilation. In some extreme cases, they take much > longer than the actual query run time. In particular, we found the latency of a > cloud DB is very high, and 90% of total query runtime is spent waiting for metastore > SQL database operations. Based on this observation, metastore operation > performance would be greatly enhanced by an in-memory structure that > caches the database query results. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16520) Cache hive metadata in metastore
[ https://issues.apache.org/jira/browse/HIVE-16520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-16520: -- Attachment: HIVE-16520-proto-2.patch Triggering a unit test run. > Cache hive metadata in metastore > > > Key: HIVE-16520 > URL: https://issues.apache.org/jira/browse/HIVE-16520 > Project: Hive > Issue Type: New Feature > Components: Metastore >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-16520-proto-2.patch, HIVE-16520-proto.patch > > > During Hive 2 benchmarks, we found that Hive metastore operations take a lot of time > and thus slow down Hive compilation. In some extreme cases, they take much > longer than the actual query run time. In particular, we found the latency of a > cloud DB is very high, and 90% of total query runtime is spent waiting for metastore > SQL database operations. Based on this observation, metastore operation > performance would be greatly enhanced by an in-memory structure that > caches the database query results. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16213) ObjectStore can leak Queries when rollbackTransaction throws an exception
[ https://issues.apache.org/jira/browse/HIVE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar updated HIVE-16213: --- Attachment: HIVE-16213.04.patch Attaching the patch again to trigger QA > ObjectStore can leak Queries when rollbackTransaction throws an exception > - > > Key: HIVE-16213 > URL: https://issues.apache.org/jira/browse/HIVE-16213 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Alexander Kolbasov >Assignee: Vihang Karajgaonkar > Attachments: HIVE-16213.01.patch, HIVE-16213.02.patch, > HIVE-16213.03.patch, HIVE-16213.04.patch > > > In ObjectStore.java there are a few places with the code similar to: > {code} > Query query = null; > try { > openTransaction(); > query = pm.newQuery(Something.class); > ... > commited = commitTransaction(); > } finally { > if (!commited) { > rollbackTransaction(); > } > if (query != null) { > query.closeAll(); > } > } > {code} > The problem is that rollbackTransaction() may throw an exception in which > case query.closeAll() wouldn't be executed. > The fix would be to wrap rollbackTransaction in its own try-catch block. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
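The fix described in the HIVE-16213 report above can be sketched as follows. This is a hypothetical, self-contained illustration of the pattern only: the `Query` class, the transaction helpers, and the simulated rollback failure are stand-ins, not the real ObjectStore/DataNucleus API. The point is that wrapping `rollbackTransaction()` in its own try/catch lets `query.closeAll()` run even when the rollback throws.

```java
// Minimal sketch of the proposed fix: rollbackTransaction() is wrapped in its
// own try/catch so the query cleanup in the finally block always executes.
// All names below are illustrative stand-ins for the real ObjectStore code.
class RollbackLeakFix {
    static class Query {
        boolean closed = false;
        void closeAll() { closed = true; }
    }

    static void openTransaction() { }
    static boolean commitTransaction() { return false; }          // simulate a failed commit
    static void rollbackTransaction() {
        throw new RuntimeException("simulated rollback failure"); // the problematic case
    }

    static Query runQuery() {
        Query query = null;
        boolean committed = false;
        try {
            openTransaction();
            query = new Query();
            committed = commitTransaction();
        } finally {
            if (!committed) {
                try {
                    rollbackTransaction();
                } catch (Exception e) {
                    // Swallow (log in real code) so the cleanup below is still reached.
                    System.err.println("rollback threw: " + e.getMessage());
                }
            }
            if (query != null) {
                query.closeAll(); // now runs even when rollback throws
            }
        }
        return query;
    }

    public static void main(String[] args) {
        Query q = runQuery();
        System.out.println("query closed: " + q.closed);
    }
}
```

Without the inner try/catch, the exception thrown by `rollbackTransaction()` would propagate out of the finally block before `query.closeAll()` is reached, leaking the query.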
[jira] [Updated] (HIVE-12636) Ensure that all queries (with DbTxnManager) run in a transaction
[ https://issues.apache.org/jira/browse/HIVE-12636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-12636: -- Attachment: (was: HIVE-12636.16.patch) > Ensure that all queries (with DbTxnManager) run in a transaction > > > Key: HIVE-12636 > URL: https://issues.apache.org/jira/browse/HIVE-12636 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-12636.01.patch, HIVE-12636.02.patch, > HIVE-12636.03.patch, HIVE-12636.04.patch, HIVE-12636.05.patch, > HIVE-12636.06.patch, HIVE-12636.07.patch, HIVE-12636.09.patch, > HIVE-12636.10.patch, HIVE-12636.12.patch, HIVE-12636.13.patch, > HIVE-12636.17.patch > > > Assuming Hive is using DbTxnManager > Currently (as of this writing only auto commit mode is supported), only > queries that write to an Acid table start a transaction. > Read-only queries don't open a txn but still acquire locks. > This makes internal structures confusing/odd. > There are constantly 2 code paths to deal with, which is inconvenient and error > prone. > Also, a txn id is a convenient "handle" for all locks/resources within a txn. > Doing this would mean the client no longer needs to track locks that it > acquired. This enables further improvements to the metastore side of Acid. > # add metastore call to openTxn() and acquireLocks() in a single call. this > is to make sure perf doesn't degrade for read-only queries. (Would also be > useful for auto commit write queries) > # Should RO queries generate txn ids from the same sequence? (they could for > example use negative values of a different sequence). Txnid is part of the > delta/base file name. Currently it's 7 digits. If we use the same sequence, > we'll exceed 7 digits faster. (possible upgrade issue). On the other hand > there is value in being able to pick txn id and commit timestamp out of the > same logical sequence. 
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-12636) Ensure that all queries (with DbTxnManager) run in a transaction
[ https://issues.apache.org/jira/browse/HIVE-12636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-12636: -- Attachment: HIVE-12636.17.patch [~wzheng], could you review, please? > Ensure that all queries (with DbTxnManager) run in a transaction > > > Key: HIVE-12636 > URL: https://issues.apache.org/jira/browse/HIVE-12636 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-12636.01.patch, HIVE-12636.02.patch, > HIVE-12636.03.patch, HIVE-12636.04.patch, HIVE-12636.05.patch, > HIVE-12636.06.patch, HIVE-12636.07.patch, HIVE-12636.09.patch, > HIVE-12636.10.patch, HIVE-12636.12.patch, HIVE-12636.13.patch, > HIVE-12636.16.patch, HIVE-12636.17.patch > > > Assuming Hive is using DbTxnManager > Currently (as of this writing only auto commit mode is supported), only > queries that write to an Acid table start a transaction. > Read-only queries don't open a txn but still acquire locks. > This makes internal structures confusing/odd. > There are constantly 2 code paths to deal with, which is inconvenient and error > prone. > Also, a txn id is a convenient "handle" for all locks/resources within a txn. > Doing this would mean the client no longer needs to track locks that it > acquired. This enables further improvements to the metastore side of Acid. > # add metastore call to openTxn() and acquireLocks() in a single call. this > is to make sure perf doesn't degrade for read-only queries. (Would also be > useful for auto commit write queries) > # Should RO queries generate txn ids from the same sequence? (they could for > example use negative values of a different sequence). Txnid is part of the > delta/base file name. Currently it's 7 digits. If we use the same sequence, > we'll exceed 7 digits faster. (possible upgrade issue). 
On the other hand > there is value in being able to pick txn id and commit timestamp out of the > same logical sequence. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16354) Modularization efforts - change some dependencies to smaller client/api modules
[ https://issues.apache.org/jira/browse/HIVE-16354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-16354: Status: Patch Available (was: Open) It seems like there were some ptest problems... Instead of splitting all of my current changes and triggering those separately, I'm sending the whole package for test purposes > Modularization efforts - change some dependencies to smaller client/api > modules > --- > > Key: HIVE-16354 > URL: https://issues.apache.org/jira/browse/HIVE-16354 > Project: Hive > Issue Type: Improvement > Components: Metastore, Server Infrastructure >Reporter: Zoltan Haindrich > Attachments: allinwonder.1.patch > > > In HIVE-16214 I've identified some pieces which might be good to move to new > modules... Since then I've looked into it a bit more to see what could be done in > this aspect... and to prevent going backward on this path, or getting stuck at > some point, I would like to be able to propose smaller changes prior to > creating any modules... > The goal here is to remove the unneeded dependencies from the modules which > don't necessarily need them: the biggest fish in this tank is the {{jdbc}} > module, which currently ships with the full hiveserver server side + all of the > ql code + the whole metastore (including the JPA persistence libs) - this > makes the jdbc driver a really fat jar... > These changes will also reduce the hive binary distribution size; introducing > service-client has reduced it by 20% alone. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16354) Modularization efforts - change some dependencies to smaller client/api modules
[ https://issues.apache.org/jira/browse/HIVE-16354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-16354: Attachment: allinwonder.1.patch > Modularization efforts - change some dependencies to smaller client/api > modules > --- > > Key: HIVE-16354 > URL: https://issues.apache.org/jira/browse/HIVE-16354 > Project: Hive > Issue Type: Improvement > Components: Metastore, Server Infrastructure >Reporter: Zoltan Haindrich > Attachments: allinwonder.1.patch > > > In HIVE-16214 I've identified some pieces which might be good to move to new > modules... Since then I've looked into it a bit more to see what could be done in > this aspect... and to prevent going backward on this path, or getting stuck at > some point, I would like to be able to propose smaller changes prior to > creating any modules... > The goal here is to remove the unneeded dependencies from the modules which > don't necessarily need them: the biggest fish in this tank is the {{jdbc}} > module, which currently ships with the full hiveserver server side + all of the > ql code + the whole metastore (including the JPA persistence libs) - this > makes the jdbc driver a really fat jar... > These changes will also reduce the hive binary distribution size; introducing > service-client has reduced it by 20% alone. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats
[ https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982021#comment-15982021 ] Chaoyu Tang commented on HIVE-16147: Patch has been uploaded to RB. [~pxiong], could you help review it? Thanks. > Rename a partitioned table should not drop its partition columns stats > -- > > Key: HIVE-16147 > URL: https://issues.apache.org/jira/browse/HIVE-16147 > Project: Hive > Issue Type: Bug >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Attachments: HIVE-16147.patch > > > When a partitioned table (e.g. sample_pt) is renamed (e.g. to > sample_pt_rename), describing its partition shows that the partition column > stats are still accurate, but actually they have all been dropped. > It can be reproduced as follows: > 1. analyze table sample_pt compute statistics for columns; > 2. describe formatted default.sample_pt partition (dummy = 3): COLUMN_STATS > for all columns are true > {code} > ... > # Detailed Partition Information > Partition Value: [3] > Database: default > Table: sample_pt > CreateTime: Fri Jan 20 15:42:30 EST 2017 > LastAccessTime: UNKNOWN > Location: file:/user/hive/warehouse/apache/sample_pt/dummy=3 > Partition Parameters: > COLUMN_STATS_ACCURATE > {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}} > last_modified_by ctang > last_modified_time 1485217063 > numFiles 1 > numRows 100 > rawDataSize 5143 > totalSize 5243 > transient_lastDdlTime 1488842358 > ... > {code} > 3. describe formatted default.sample_pt partition (dummy = 3) salary: column > stats exist > {code} > # col_name data_type min > max num_nulls distinct_count > avg_col_len max_col_len num_trues > num_falses comment > > > salary int 1 151370 > 0 94 > > from deserializer > {code} > 4. alter table sample_pt rename to sample_pt_rename; > 5. 
describe formatted default.sample_pt_rename partition (dummy = 3): > describing the renamed table's partition (dummy = 3) shows that COLUMN_STATS for > columns are still true. > {code} > # Detailed Partition Information > Partition Value: [3] > Database: default > Table: sample_pt_rename > CreateTime: Fri Jan 20 15:42:30 EST 2017 > LastAccessTime: UNKNOWN > Location: > file:/user/hive/warehouse/apache/sample_pt_rename/dummy=3 > Partition Parameters: > COLUMN_STATS_ACCURATE > {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}} > last_modified_by ctang > last_modified_time 1485217063 > numFiles 1 > numRows 100 > rawDataSize 5143 > totalSize 5243 > transient_lastDdlTime 1488842358 > {code} > describe formatted default.sample_pt_rename partition (dummy = 3) salary: the > column stats have been dropped. > {code} > # col_name data_type comment > > > > salary int from deserializer > > Time taken: 0.131 seconds, Fetched: 3 row(s) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-12614) RESET command does not close spark session
[ https://issues.apache.org/jira/browse/HIVE-12614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15982020#comment-15982020 ] Xuefu Zhang commented on HIVE-12614: +1 > RESET command does not close spark session > -- > > Key: HIVE-12614 > URL: https://issues.apache.org/jira/browse/HIVE-12614 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 1.3.0, 2.1.0 >Reporter: Nemon Lou >Assignee: Sahil Takiar >Priority: Minor > Attachments: HIVE-12614.1.patch, HIVE-12614.2.patch, > HIVE-12614.3.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HIVE-15795) Support Accumulo Index Tables in Hive Accumulo Connector
[ https://issues.apache.org/jira/browse/HIVE-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981915#comment-15981915 ] Mike Fagan edited comment on HIVE-15795 at 4/24/17 10:23 PM: - Patch to fix Accumulo integration test failures attached to the ticket was (Author: faganm): Patch to fix Accumulo integration test failures. > Support Accumulo Index Tables in Hive Accumulo Connector > > > Key: HIVE-15795 > URL: https://issues.apache.org/jira/browse/HIVE-15795 > Project: Hive > Issue Type: Improvement > Components: Accumulo Storage Handler >Reporter: Mike Fagan >Assignee: Mike Fagan >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-15795.1.patch, HIVE-15795.2.patch, > HIVE-15795.3.patch > > > Ability to specify an accumulo index table for an accumulo-hive table. > This would greatly improve performance for non-rowid query predicates -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16080) Add parquet to possible values for hive.default.fileformat and hive.default.fileformat.managed
[ https://issues.apache.org/jira/browse/HIVE-16080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981941#comment-15981941 ] Sahil Takiar commented on HIVE-16080: - This patch is going into the 2.3 release, not 2.2. I've updated the wiki for both of these config keys and added the version info. > Add parquet to possible values for hive.default.fileformat and > hive.default.fileformat.managed > -- > > Key: HIVE-16080 > URL: https://issues.apache.org/jira/browse/HIVE-16080 > Project: Hive > Issue Type: Bug >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Labels: TODOC2.2 > Fix For: 2.2.0 > > Attachments: HIVE-16080.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16484) Investigate SparkLauncher for HoS as alternative to bin/spark-submit
[ https://issues.apache.org/jira/browse/HIVE-16484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-16484: Attachment: HIVE-16484.4.patch > Investigate SparkLauncher for HoS as alternative to bin/spark-submit > > > Key: HIVE-16484 > URL: https://issues.apache.org/jira/browse/HIVE-16484 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-16484.1.patch, HIVE-16484.2.patch, > HIVE-16484.3.patch, HIVE-16484.4.patch > > > The {{SparkClientImpl#startDriver}} currently looks for the {{SPARK_HOME}} > directory and invokes the {{bin/spark-submit}} script, which spawns a > separate process to run the Spark application. > {{SparkLauncher}} was added in SPARK-4924 and is a programatic way to launch > Spark applications. > I see a few advantages: > * No need to spawn a separate process to launch a HoS --> lower startup time > * Simplifies the code in {{SparkClientImpl}} --> easier to debug > * {{SparkLauncher#startApplication}} returns a {{SparkAppHandle}} which > contains some useful utilities for querying the state of the Spark job > ** It also allows the launcher to specify a list of job listeners -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-15795) Support Accumulo Index Tables in Hive Accumulo Connector
[ https://issues.apache.org/jira/browse/HIVE-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Fagan updated HIVE-15795: -- Attachment: HIVE-15795.3.patch Patch to fix Accumulo integration test failures. > Support Accumulo Index Tables in Hive Accumulo Connector > > > Key: HIVE-15795 > URL: https://issues.apache.org/jira/browse/HIVE-15795 > Project: Hive > Issue Type: Improvement > Components: Accumulo Storage Handler >Reporter: Mike Fagan >Assignee: Mike Fagan >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-15795.1.patch, HIVE-15795.2.patch, > HIVE-15795.3.patch > > > Ability to specify an accumulo index table for an accumulo-hive table. > This would greatly improve performance for non-rowid query predicates -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Comment Edited] (HIVE-15795) Support Accumulo Index Tables in Hive Accumulo Connector
[ https://issues.apache.org/jira/browse/HIVE-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981911#comment-15981911 ] Mike Fagan edited comment on HIVE-15795 at 4/24/17 8:59 PM: To address failures in integration tests was (Author: faganm): To address failures in inegration tests > Support Accumulo Index Tables in Hive Accumulo Connector > > > Key: HIVE-15795 > URL: https://issues.apache.org/jira/browse/HIVE-15795 > Project: Hive > Issue Type: Improvement > Components: Accumulo Storage Handler >Reporter: Mike Fagan >Assignee: Mike Fagan >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-15795.1.patch, HIVE-15795.2.patch > > > Ability to specify an accumulo index table for an accumulo-hive table. > This would greatly improve performance for non-rowid query predicates -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Reopened] (HIVE-15795) Support Accumulo Index Tables in Hive Accumulo Connector
[ https://issues.apache.org/jira/browse/HIVE-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Fagan reopened HIVE-15795: --- To address failures in integration tests > Support Accumulo Index Tables in Hive Accumulo Connector > > > Key: HIVE-15795 > URL: https://issues.apache.org/jira/browse/HIVE-15795 > Project: Hive > Issue Type: Improvement > Components: Accumulo Storage Handler >Reporter: Mike Fagan >Assignee: Mike Fagan >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-15795.1.patch, HIVE-15795.2.patch > > > Ability to specify an accumulo index table for an accumulo-hive table. > This would greatly improve performance for non-rowid query predicates -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16346) inheritPerms should be conditional based on the target filesystem
[ https://issues.apache.org/jira/browse/HIVE-16346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-16346: Resolution: Fixed Fix Version/s: 2.4.0 Status: Resolved (was: Patch Available) pushed to branch-2. Thanks Sahil. > inheritPerms should be conditional based on the target filesystem > - > > Key: HIVE-16346 > URL: https://issues.apache.org/jira/browse/HIVE-16346 > Project: Hive > Issue Type: Sub-task >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Fix For: 2.4.0 > > Attachments: HIVE-16346.1-branch-2.patch, > HIVE-16346.2-branch-2.patch, HIVE-16346.3-branch-2.patch > > > Right now, a lot of the logic in {{Hive.java}} attempts to set permissions of > different files that have been moved / copied. This is only triggered if > {{hive.warehouse.subdir.inherit.perms}} is set to true. > However, on blobstores such as S3, there is no concept of file permissions so > these calls are unnecessary, which can hurt performance. > One solution would be to set {{hive.warehouse.subdir.inherit.perms}} to > false, but this would be a global change that affects an entire HS2 instance. > So HDFS tables will no longer have permissions inheritance. > A better solution would be to make the inheritance of permissions conditional > on the target filesystem. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
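The "conditional based on the target filesystem" idea in the HIVE-16346 description above could look roughly like the following. This is a hedged sketch, not Hive's actual implementation: the method name, the config-flag parameter, and the blobstore scheme list are assumptions, chosen only to illustrate gating permission inheritance on the target filesystem's URI scheme.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: inherit permissions only when the config flag is on
// AND the target filesystem supports POSIX-style permissions. Blobstore
// schemes (where permission calls are unnecessary and slow) are skipped.
class InheritPermsCheck {
    // Hypothetical list; the real set of blobstore schemes may differ.
    private static final List<String> BLOBSTORE_SCHEMES =
            Arrays.asList("s3", "s3a", "s3n", "swift", "wasb");

    static boolean shouldInheritPerms(boolean inheritPermsConf, URI target) {
        if (!inheritPermsConf) {
            return false; // hive.warehouse.subdir.inherit.perms is off
        }
        String scheme = target.getScheme();
        // Default (schemeless) paths are treated like HDFS here.
        return scheme == null || !BLOBSTORE_SCHEMES.contains(scheme);
    }

    public static void main(String[] args) {
        // HDFS target: permissions are inherited as before.
        System.out.println(shouldInheritPerms(true, URI.create("hdfs://nn:8020/warehouse/t")));
        // S3 target: the unnecessary permission calls are skipped.
        System.out.println(shouldInheritPerms(true, URI.create("s3a://bucket/warehouse/t")));
    }
}
```

This keeps the global flag meaningful for HDFS tables while avoiding the per-file permission calls on object stores, which is the behavior the description asks for.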
[jira] [Commented] (HIVE-16497) FileUtils. isActionPermittedForFileHierarchy, isOwnerOfFileHierarchy file system operations should be impersonated
[ https://issues.apache.org/jira/browse/HIVE-16497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981848#comment-15981848 ] Sushanth Sowmyan commented on HIVE-16497: - +1, LGTM. > FileUtils. isActionPermittedForFileHierarchy, isOwnerOfFileHierarchy file > system operations should be impersonated > -- > > Key: HIVE-16497 > URL: https://issues.apache.org/jira/browse/HIVE-16497 > Project: Hive > Issue Type: Bug > Components: Authorization >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Fix For: 3.0.0 > > Attachments: HIVE-16497.1.patch, HIVE-16497.2.patch > > > FileUtils.isActionPermittedForFileHierarchy checks if user has permissions > for given action. The checks are made by impersonating the user. > However, the listing of child dirs are done as the hiveserver2 user. If the > hive user doesn't have permissions on the filesystem, it gives incorrect > error that the user doesn't have permissions to perform the action. > Impersonating the end user for all file operations in that function is also > logically correct thing to do. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-10865) Beeline needs to support DELIMITER command
[ https://issues.apache.org/jira/browse/HIVE-10865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981843#comment-15981843 ] Sahil Takiar commented on HIVE-10865: - [~ctang.ma], [~ngangam], [~ychena] could someone take a look? > Beeline needs to support DELIMITER command > -- > > Key: HIVE-10865 > URL: https://issues.apache.org/jira/browse/HIVE-10865 > Project: Hive > Issue Type: Bug > Components: Beeline >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Sahil Takiar > Attachments: HIVE-10865.1.patch, HIVE-10865.2.patch, > HIVE-10865.3.patch, HIVE-10865.4.patch, HIVE-10865.5.patch > > > MySQL Client provides a DELIMITER command to set statement delimiter. > Beeline needs to support a similar command to allow commands having > semi-colon as non-statement delimiter (as with MySQL stored procedures). This > is a follow-up jira for HIVE-10659 -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16277) Exchange Partition between filesystems throws "IllegalArgumentException Wrong FS"
[ https://issues.apache.org/jira/browse/HIVE-16277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar updated HIVE-16277: Status: Open (was: Patch Available) > Exchange Partition between filesystems throws "IllegalArgumentException Wrong > FS" > - > > Key: HIVE-16277 > URL: https://issues.apache.org/jira/browse/HIVE-16277 > Project: Hive > Issue Type: Bug >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-16277.1.patch, HIVE-16277.2.patch, > HIVE-16277.3.patch, HIVE-16277.4.patch > > > The following query: {{alter table s3_tbl exchange partition (country='USA') > with table hdfs_tbl}} fails with the following exception: > {code} > Error: org.apache.hive.service.cli.HiveSQLException: Error while processing > statement: FAILED: Execution Error, return code 1 from > org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: > java.lang.IllegalArgumentException Wrong FS: > s3a://[bucket]/table/country=USA, expected: file:///) > at > org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:379) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:256) > at > org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:347) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657) > at > org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:361) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: > MetaException(message:Got exception: java.lang.IllegalArgumentException Wrong > FS: s3a://[bucket]/table/country=USA, expected: file:///) > at > org.apache.hadoop.hive.ql.metadata.Hive.exchangeTablePartitions(Hive.java:3553) > at > org.apache.hadoop.hive.ql.exec.DDLTask.exchangeTablePartition(DDLTask.java:4691) > at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:570) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2182) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1838) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1525) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1236) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1231) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:254) > ... 
11 more > Caused by: MetaException(message:Got exception: > java.lang.IllegalArgumentException Wrong FS: > s3a://[bucket]/table/country=USA, expected: file:///) > at > org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:1387) > at > org.apache.hadoop.hive.metastore.Warehouse.renameDir(Warehouse.java:208) > at > org.apache.hadoop.hive.metastore.Warehouse.renameDir(Warehouse.java:200) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.exchange_partitions(HiveMetaStore.java:2967) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:148) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:107) > at com.sun.proxy.$Proxy28.exchange_partitions(Unknown Source) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.exchange_partitions(HiveMetaStoreClient.java:690) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >
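[Editor's note] The "Wrong FS" failure above is Hadoop's filesystem consistency check firing: Warehouse.renameDir resolves the destination against the default filesystem (here file:///), while the source partition lives on s3a, and a rename can never cross filesystems. The following is a minimal, self-contained sketch of that scheme/authority comparison; the class and method names are illustrative stand-ins, not Hadoop's actual API (the real check lives around FileSystem.checkPath).

```java
import java.net.URI;

// Illustrative sketch of the scheme/authority check that produces
// "IllegalArgumentException Wrong FS" when it fails. Names are hypothetical.
public class FsCheck {
    // Returns true when 'path' belongs to the filesystem rooted at 'fsUri'.
    public static boolean sameFs(URI fsUri, URI path) {
        String pathScheme = path.getScheme();
        if (pathScheme == null) {
            return true; // scheme-less paths resolve against the default FS
        }
        if (!fsUri.getScheme().equalsIgnoreCase(pathScheme)) {
            return false; // e.g. s3a vs file -> "Wrong FS"
        }
        String fsAuth = fsUri.getAuthority();
        String pathAuth = path.getAuthority();
        return fsAuth == null ? pathAuth == null : fsAuth.equalsIgnoreCase(pathAuth);
    }

    public static void main(String[] args) {
        URI localFs = URI.create("file:///");
        URI s3Partition = URI.create("s3a://bucket/table/country=USA");
        // The combination from the stack trace: an s3a path handed to file:///
        System.out.println(sameFs(localFs, s3Partition)); // prints "false"
    }
}
```

A fix along the lines of the attached patches would presumably detect this mismatch up front and either copy across filesystems or fail with a clearer message, instead of attempting a cross-filesystem rename.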
[jira] [Commented] (HIVE-15396) Basic Stats are not collected for managed tables with LOCATION specified
[ https://issues.apache.org/jira/browse/HIVE-15396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981840#comment-15981840 ] Sahil Takiar commented on HIVE-15396: - [~pxiong] wanted to see if we can still get this patch in. Let me know what you think of the most recent patch. To summarize: * The patch added basic stats collection for table with a {{LOCATION}} specified, but only if the specified location is empty and the table is not an external table * This should be useful when running on blobstores such as S3, where users commonly specify an explicit {{LOCATION}} clause Thanks for spending the time to look at this! > Basic Stats are not collected when for managed tables with LOCATION specified > - > > Key: HIVE-15396 > URL: https://issues.apache.org/jira/browse/HIVE-15396 > Project: Hive > Issue Type: Bug > Components: Statistics >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-15396.1.patch, HIVE-15396.2.patch, > HIVE-15396.3.patch, HIVE-15396.4.patch, HIVE-15396.5.patch, > HIVE-15396.6.patch, HIVE-15396.7.patch > > > Basic stats are not collected when a managed table is created with a > specified {{LOCATION}} clause. 
> {code} > 0: jdbc:hive2://localhost:1> create table hdfs_1 (col int); > 0: jdbc:hive2://localhost:1> describe formatted hdfs_1; > +---++-+ > | col_name| data_type >| comment | > +---++-+ > | # col_name| data_type >| comment | > | | NULL >| NULL| > | col | int >| | > | | NULL >| NULL| > | # Detailed Table Information | NULL >| NULL| > | Database: | default >| NULL| > | Owner:| anonymous >| NULL| > | CreateTime: | Wed Mar 22 18:09:19 PDT 2017 >| NULL| > | LastAccessTime: | UNKNOWN >| NULL| > | Retention:| 0 >| NULL| > | Location: | file:/warehouse/hdfs_1 | NULL > | > | Table Type: | MANAGED_TABLE >| NULL| > | Table Parameters: | NULL >| NULL| > | | COLUMN_STATS_ACCURATE >| {\"BASIC_STATS\":\"true\"} | > | | numFiles >| 0 | > | | numRows >| 0 | > | | rawDataSize >| 0 | > | | totalSize >| 0 | > | | transient_lastDdlTime >| 1490231359 | > | | NULL >| NULL| > | # Storage Information | NULL >| NULL| > | SerDe Library:| > org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL >| > | InputFormat: | org.apache.hadoop.mapred.TextInputFormat >| NULL| > | OutputFormat: | > org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat | NULL >| > | Compressed: | No >| NULL| > | Num Buckets: | -1
[jira] [Commented] (HIVE-16346) inheritPerms should be conditional based on the target filesystem
[ https://issues.apache.org/jira/browse/HIVE-16346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981835#comment-15981835 ] Sahil Takiar commented on HIVE-16346: - [~aihuaxu] can this be merged? > inheritPerms should be conditional based on the target filesystem > - > > Key: HIVE-16346 > URL: https://issues.apache.org/jira/browse/HIVE-16346 > Project: Hive > Issue Type: Sub-task >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Attachments: HIVE-16346.1-branch-2.patch, > HIVE-16346.2-branch-2.patch, HIVE-16346.3-branch-2.patch > > > Right now, a lot of the logic in {{Hive.java}} attempts to set permissions of > different files that have been moved / copied. This is only triggered if > {{hive.warehouse.subdir.inherit.perms}} is set to true. > However, on blobstores such as S3, there is no concept of file permissions so > these calls are unnecessary, which can hurt performance. > One solution would be to set {{hive.warehouse.subdir.inherit.perms}} to > false, but this would be a global change that affects an entire HS2 instance. > So HDFS tables will no longer have permissions inheritance. > A better solution would be to make the inheritance of permissions conditional > on the target filesystem. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
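[Editor's note] The conditional proposed above might be sketched as follows, assuming the decision hinges on the target URI's scheme. The scheme list and method names are illustrative assumptions for this note, not Hive's actual code.

```java
import java.net.URI;
import java.util.Set;

// Hypothetical sketch: make permission inheritance conditional on the
// target filesystem, skipping chmod/chgrp round trips on blobstores.
public class InheritPerms {
    // Blobstore schemes with no real file permissions (assumed list).
    private static final Set<String> NO_PERMS_SCHEMES =
        Set.of("s3a", "s3n", "wasb", "adl", "gs");

    public static boolean shouldInheritPerms(boolean inheritPermsConf, URI target) {
        if (!inheritPermsConf) {
            return false; // global hive.warehouse.subdir.inherit.perms=false
        }
        String scheme = target.getScheme();
        // Scheme-less targets resolve against the default FS (usually HDFS).
        return scheme == null || !NO_PERMS_SCHEMES.contains(scheme.toLowerCase());
    }

    public static void main(String[] args) {
        System.out.println(shouldInheritPerms(true, URI.create("hdfs://nn/warehouse/t")));   // true
        System.out.println(shouldInheritPerms(true, URI.create("s3a://bucket/warehouse/t"))); // false
    }
}
```

This keeps HDFS behavior unchanged while sparing S3 tables the pointless permission calls, which is the per-filesystem conditionality the issue asks for.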
[jira] [Commented] (HIVE-14864) Distcp is not called from MoveTask when src is a directory
[ https://issues.apache.org/jira/browse/HIVE-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981833#comment-15981833 ] Sahil Takiar commented on HIVE-14864: - I updated the documentation for this here: https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution > Distcp is not called from MoveTask when src is a directory > -- > > Key: HIVE-14864 > URL: https://issues.apache.org/jira/browse/HIVE-14864 > Project: Hive > Issue Type: Bug >Reporter: Vihang Karajgaonkar >Assignee: Sahil Takiar > Labels: TODOC2.2 > Fix For: 2.2.0 > > Attachments: HIVE-14864.1.patch, HIVE-14864.2.patch, > HIVE-14864.3.patch, HIVE-14864.4.patch, HIVE-14864.patch > > > In FileUtils.java the following code does not get executed even when src > directory size is greater than HIVE_EXEC_COPYFILE_MAXSIZE because > srcFS.getFileStatus(src).getLen() returns 0 when src is a directory. We > should use srcFS.getContentSummary(src).getLength() instead. > {noformat} > /* Run distcp if source file/dir is too big */ > if (srcFS.getUri().getScheme().equals("hdfs") && > srcFS.getFileStatus(src).getLen() > > conf.getLongVar(HiveConf.ConfVars.HIVE_EXEC_COPYFILE_MAXSIZE)) { > LOG.info("Source is " + srcFS.getFileStatus(src).getLen() + " bytes. > (MAX: " + conf.getLongVar(HiveConf.ConfVars.HIVE_EXEC_COPYFILE_MAXSIZE) + > ")"); > LOG.info("Launch distributed copy (distcp) job."); > HiveConfUtil.updateJobCredentialProviders(conf); > copied = shims.runDistCp(src, dst, conf); > if (copied && deleteSource) { > srcFS.delete(src, true); > } > } > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
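[Editor's note] The bug above boils down to a directory's own file-status length not being the size of its contents, so the distcp size threshold never triggers for directories. A plain-JDK analogue of the fix (summing the tree, as srcFS.getContentSummary(src).getLength() does in Hadoop); class and method names here are illustrative:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Sketch: total size of a directory tree, the JDK analogue of
// getContentSummary(src).getLength(). A directory's own length would be 0.
public class DirSize {
    public static long contentLength(Path dir) throws IOException {
        try (Stream<Path> s = Files.walk(dir)) {
            return s.filter(Files::isRegularFile).mapToLong(p -> {
                try { return Files.size(p); } catch (IOException e) { return 0L; }
            }).sum();
        }
    }

    // Builds a small fixture and returns its recursive size.
    public static long demo() {
        try {
            Path dir = Files.createTempDirectory("dirsize");
            Files.write(dir.resolve("a.txt"), new byte[1024]);
            Files.write(dir.resolve("b.txt"), new byte[2048]);
            return contentLength(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 3072; a naive length check on the dir itself sees 0
    }
}
```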
[jira] [Commented] (HIVE-8750) Commit initial encryption work
[ https://issues.apache.org/jira/browse/HIVE-8750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981832#comment-15981832 ] Sahil Takiar commented on HIVE-8750: I updated the documentation for this here: https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-QueryandDDLExecution > Commit initial encryption work > -- > > Key: HIVE-8750 > URL: https://issues.apache.org/jira/browse/HIVE-8750 > Project: Hive > Issue Type: Sub-task >Reporter: Brock Noland >Assignee: Sergio Peña > Labels: TODOC15 > Fix For: encryption-branch, 1.1.0 > > Attachments: HIVE-8750.1.patch > > > I believe Sergio has some work done for encryption. In this item we'll commit > it to branch. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-12636) Ensure that all queries (with DbTxnManager) run in a transaction
[ https://issues.apache.org/jira/browse/HIVE-12636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-12636: -- Attachment: HIVE-12636.16.patch > Ensure that all queries (with DbTxnManager) run in a transaction > > > Key: HIVE-12636 > URL: https://issues.apache.org/jira/browse/HIVE-12636 > Project: Hive > Issue Type: Improvement > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Critical > Attachments: HIVE-12636.01.patch, HIVE-12636.02.patch, > HIVE-12636.03.patch, HIVE-12636.04.patch, HIVE-12636.05.patch, > HIVE-12636.06.patch, HIVE-12636.07.patch, HIVE-12636.09.patch, > HIVE-12636.10.patch, HIVE-12636.12.patch, HIVE-12636.13.patch, > HIVE-12636.16.patch > > > Assuming Hive is using DbTxnManager. > Currently (as of this writing, only auto-commit mode is supported), only > queries that write to an Acid table start a transaction. > Read-only queries don't open a txn but still acquire locks. > This makes internal structures confusing/odd. > There are constantly two code paths to deal with, which is inconvenient and error > prone. > Also, a txn id is a convenient "handle" for all locks/resources within a txn. > Doing this would mean the client no longer needs to track locks that it > acquired. This enables further improvements to the metastore side of Acid. > # Add a metastore call that does openTxn() and acquireLocks() in a single call; this > is to make sure perf doesn't degrade for read-only queries. (Would also be > useful for auto-commit write queries.) > # Should RO queries generate txn ids from the same sequence? (They could, for > example, use negative values of a different sequence.) Txnid is part of the > delta/base file name. Currently it's 7 digits. If we use the same sequence, > we'll exceed 7 digits faster (possible upgrade issue). On the other hand, > there is value in being able to pick the txn id and commit timestamp out of the > same logical sequence. 
-- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-14170) Beeline IncrementalRows should buffer rows and incrementally re-calculate width if TableOutputFormat is used
[ https://issues.apache.org/jira/browse/HIVE-14170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981784#comment-15981784 ] Sahil Takiar commented on HIVE-14170: - This didn't go into the 2.2 release, seems its going into the 2.3 release. I've updated the wiki: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-BeelineCommandOptions > Beeline IncrementalRows should buffer rows and incrementally re-calculate > width if TableOutputFormat is used > > > Key: HIVE-14170 > URL: https://issues.apache.org/jira/browse/HIVE-14170 > Project: Hive > Issue Type: Sub-task > Components: Beeline >Reporter: Sahil Takiar >Assignee: Sahil Takiar > Labels: TODOC2.2 > Fix For: 2.2.0 > > Attachments: HIVE-14170.1.patch, HIVE-14170.2.patch, > HIVE-14170.3.patch, HIVE-14170.4.patch > > > If {{--incremental}} is specified in Beeline, rows are meant to be printed > out immediately. However, if {{TableOutputFormat}} is used with this option > the formatting can look really off. > The reason is that {{IncrementalRows}} does not do a global calculation of > the optimal width size for {{TableOutputFormat}} (it can't because it only > sees one row at a time). The output of {{BufferedRows}} looks much better > because it can do this global calculation. > If {{--incremental}} is used, and {{TableOutputFormat}} is used, the width > should be re-calculated every "x" rows ("x" can be configurable and by > default it can be 1000). -- This message was sent by Atlassian JIRA (v6.3.15#6346)
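[Editor's note] The re-calculation scheme proposed above can be sketched as: buffer up to "x" rows, compute column widths over that buffer, print, and repeat, so memory stays bounded while the widths stay close to the global optimum. This is a toy illustration, not Beeline's actual IncrementalRows/TableOutputFormat code.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of incremental table rendering with periodic width recalculation.
public class IncrementalWidths {
    // Max display width per column over a batch of rows.
    static int[] widths(List<String[]> rows) {
        int[] w = new int[rows.get(0).length];
        for (String[] r : rows)
            for (int c = 0; c < w.length; c++)
                w[c] = Math.max(w[c], r[c].length());
        return w;
    }

    // Pads each row to widths recomputed every 'batch' rows.
    public static List<String> render(List<String[]> rows, int batch) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batch) {
            List<String[]> chunk = rows.subList(i, Math.min(i + batch, rows.size()));
            int[] w = widths(chunk); // bounded-memory width estimate
            for (String[] r : chunk) {
                StringBuilder sb = new StringBuilder();
                for (int c = 0; c < r.length; c++) {
                    sb.append(String.format("%-" + w[c] + "s", r[c]));
                    if (c < r.length - 1) sb.append(" | ");
                }
                out.add(sb.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(new String[]{"id", "name"},
                                      new String[]{"1", "alexander"});
        render(rows, 1000).forEach(System.out::println);
    }
}
```

With batch = 1000 (the suggested default), small result sets get the same formatting as BufferedRows, while huge ones only ever hold 1000 rows in memory.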
[jira] [Commented] (HIVE-7224) Set incremental printing to true by default in Beeline
[ https://issues.apache.org/jira/browse/HIVE-7224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981770#comment-15981770 ] Sahil Takiar commented on HIVE-7224: This didn't go into the 2.2 release, seems its going into the 2.3 release. I've updated the wiki to reflect this: https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients#HiveServer2Clients-BeelineCommandOptions > Set incremental printing to true by default in Beeline > -- > > Key: HIVE-7224 > URL: https://issues.apache.org/jira/browse/HIVE-7224 > Project: Hive > Issue Type: Bug > Components: Beeline, Clients, JDBC >Affects Versions: 0.13.0, 1.0.0, 1.2.0, 1.1.0 >Reporter: Vaibhav Gumashta >Assignee: Sahil Takiar > Labels: TODOC2.2 > Fix For: 2.2.0 > > Attachments: HIVE-7224.1.patch, HIVE-7224.2.patch, HIVE-7224.2.patch, > HIVE-7224.3.patch, HIVE-7224.4.patch, HIVE-7224.5.patch > > > See HIVE-7221. > By default beeline tries to buffer the entire output relation before printing > it on stdout. This can cause OOM when the output relation is large. However, > beeline has the option of incremental prints. We should keep that as the > default. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats
[ https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-16147: --- Attachment: HIVE-16147.patch The patch is to: 1. preserve the column stats in a partitioned table rename 2. since the column stats are no more invalidated during a table rename, I renamed the alter_table_invalidate_column_stats.q to alter_table_column_stats.q > Rename a partitioned table should not drop its partition columns stats > -- > > Key: HIVE-16147 > URL: https://issues.apache.org/jira/browse/HIVE-16147 > Project: Hive > Issue Type: Bug >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Attachments: HIVE-16147.patch > > > When a partitioned table (e.g. sample_pt) is renamed (e.g to > sample_pt_rename), describing its partition shows that the partition column > stats are still accurate, but actually they all have been dropped. > It could be reproduce as following: > 1. analyze table sample_pt compute statistics for columns; > 2. describe formatted default.sample_pt partition (dummy = 3): COLUMN_STATS > for all columns are true > {code} > ... > # Detailed Partition Information > Partition Value: [3] > Database: default > Table:sample_pt > CreateTime: Fri Jan 20 15:42:30 EST 2017 > LastAccessTime: UNKNOWN > Location: file:/user/hive/warehouse/apache/sample_pt/dummy=3 > Partition Parameters: > COLUMN_STATS_ACCURATE > {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}} > last_modified_byctang > last_modified_time 1485217063 > numFiles1 > numRows 100 > rawDataSize 5143 > totalSize 5243 > transient_lastDdlTime 1488842358 > ... > {code} > 3: describe formatted default.sample_pt partition (dummy = 3) salary: column > stats exists > {code} > # col_namedata_type min > max num_nulls distinct_count > avg_col_len max_col_len num_trues > num_falses comment > > > salaryint 1 151370 > 0 94 > > from deserializer > {code} > 4. 
alter table sample_pt rename to sample_pt_rename; > 5. describe formatted default.sample_pt_rename partition (dummy = 3): > describe the rename table partition (dummy =3) shows that COLUMN_STATS for > columns are still true. > {code} > # Detailed Partition Information > Partition Value: [3] > Database: default > Table:sample_pt_rename > CreateTime: Fri Jan 20 15:42:30 EST 2017 > LastAccessTime: UNKNOWN > Location: > file:/user/hive/warehouse/apache/sample_pt_rename/dummy=3 > Partition Parameters: > COLUMN_STATS_ACCURATE > {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}} > last_modified_byctang > last_modified_time 1485217063 > numFiles1 > numRows 100 > rawDataSize 5143 > totalSize 5243 > transient_lastDdlTime 1488842358 > {code} > describe formatted default.sample_pt_rename partition (dummy = 3) salary: the > column stats have been dropped. > {code} > # col_namedata_type comment > > > > salaryint from deserializer > > Time taken: 0.131 seconds, Fetched: 3 row(s) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16147) Rename a partitioned table should not drop its partition columns stats
[ https://issues.apache.org/jira/browse/HIVE-16147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-16147: --- Status: Patch Available (was: Open) > Rename a partitioned table should not drop its partition columns stats > -- > > Key: HIVE-16147 > URL: https://issues.apache.org/jira/browse/HIVE-16147 > Project: Hive > Issue Type: Bug >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Attachments: HIVE-16147.patch > > > When a partitioned table (e.g. sample_pt) is renamed (e.g to > sample_pt_rename), describing its partition shows that the partition column > stats are still accurate, but actually they all have been dropped. > It could be reproduce as following: > 1. analyze table sample_pt compute statistics for columns; > 2. describe formatted default.sample_pt partition (dummy = 3): COLUMN_STATS > for all columns are true > {code} > ... > # Detailed Partition Information > Partition Value: [3] > Database: default > Table:sample_pt > CreateTime: Fri Jan 20 15:42:30 EST 2017 > LastAccessTime: UNKNOWN > Location: file:/user/hive/warehouse/apache/sample_pt/dummy=3 > Partition Parameters: > COLUMN_STATS_ACCURATE > {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}} > last_modified_byctang > last_modified_time 1485217063 > numFiles1 > numRows 100 > rawDataSize 5143 > totalSize 5243 > transient_lastDdlTime 1488842358 > ... > {code} > 3: describe formatted default.sample_pt partition (dummy = 3) salary: column > stats exists > {code} > # col_namedata_type min > max num_nulls distinct_count > avg_col_len max_col_len num_trues > num_falses comment > > > salaryint 1 151370 > 0 94 > > from deserializer > {code} > 4. alter table sample_pt rename to sample_pt_rename; > 5. describe formatted default.sample_pt_rename partition (dummy = 3): > describe the rename table partition (dummy =3) shows that COLUMN_STATS for > columns are still true. 
> {code} > # Detailed Partition Information > Partition Value: [3] > Database: default > Table:sample_pt_rename > CreateTime: Fri Jan 20 15:42:30 EST 2017 > LastAccessTime: UNKNOWN > Location: > file:/user/hive/warehouse/apache/sample_pt_rename/dummy=3 > Partition Parameters: > COLUMN_STATS_ACCURATE > {\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"code\":\"true\",\"description\":\"true\",\"salary\":\"true\",\"total_emp\":\"true\"}} > last_modified_byctang > last_modified_time 1485217063 > numFiles1 > numRows 100 > rawDataSize 5143 > totalSize 5243 > transient_lastDdlTime 1488842358 > {code} > describe formatted default.sample_pt_rename partition (dummy = 3) salary: the > column stats have been dropped. > {code} > # col_namedata_type comment > > > > salaryint from deserializer > > Time taken: 0.131 seconds, Fetched: 3 row(s) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16522) Hive query timer is not keeping track of the fetch task execution
[ https://issues.apache.org/jira/browse/HIVE-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981720#comment-15981720 ] slim bouguerra commented on HIVE-16522: --- [~ashutoshc] can you please check this ? > Hive is query timer is not keeping track of the fetch task execution > > > Key: HIVE-16522 > URL: https://issues.apache.org/jira/browse/HIVE-16522 > Project: Hive > Issue Type: Bug >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: HIVE-16522.patch > > > Currently Hive CLI query execution time does not include fetch time execution. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16522) Hive query timer is not keeping track of the fetch task execution
[ https://issues.apache.org/jira/browse/HIVE-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-16522: -- Attachment: HIVE-16522.patch > Hive is query timer is not keeping track of the fetch task execution > > > Key: HIVE-16522 > URL: https://issues.apache.org/jira/browse/HIVE-16522 > Project: Hive > Issue Type: Bug >Reporter: slim bouguerra >Assignee: slim bouguerra > Attachments: HIVE-16522.patch > > > Currently Hive CLI query execution time does not include fetch time execution. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16343) LLAP: Publish YARN's ProcFs based memory usage to metrics for monitoring
[ https://issues.apache.org/jira/browse/HIVE-16343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981707#comment-15981707 ] Siddharth Seth commented on HIVE-16343: --- This lookup can be quite expensive, e.g. the SMAPS-based lookup can take multiple seconds. I don't think refreshing it every 10s is a good idea. We need some kind of guard around when it gets refreshed (independent of the metrics config). > LLAP: Publish YARN's ProcFs based memory usage to metrics for monitoring > > > Key: HIVE-16343 > URL: https://issues.apache.org/jira/browse/HIVE-16343 > Project: Hive > Issue Type: Improvement > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-16343.1.patch > > > Publish MemInfo from ProcfsBasedProcessTree to llap metrics. This will be useful > for monitoring and also for setting up triggers via JMC. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
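[Editor's note] The guard being suggested is essentially a time-based cache around the expensive read, so the metrics polling frequency no longer dictates how often procfs/SMAPS is walked. A minimal sketch under that assumption (names are illustrative; time is passed in explicitly only to keep the sketch testable):

```java
import java.util.function.LongSupplier;

// Hypothetical guard: recompute an expensive gauge at most once per interval,
// regardless of how often the metrics system polls it.
public class GuardedGauge {
    private final LongSupplier expensiveRead; // e.g. a procfs/SMAPS walk
    private final long minIntervalNanos;
    private boolean initialized = false;
    private long lastRefreshNanos;
    private long cached;

    public GuardedGauge(LongSupplier expensiveRead, long minIntervalNanos) {
        this.expensiveRead = expensiveRead;
        this.minIntervalNanos = minIntervalNanos;
    }

    public synchronized long value(long nowNanos) {
        if (!initialized || nowNanos - lastRefreshNanos >= minIntervalNanos) {
            cached = expensiveRead.getAsLong(); // the multi-second lookup
            lastRefreshNanos = nowNanos;
            initialized = true;
        }
        return cached; // polls inside the window are served from cache
    }

    public static void main(String[] args) {
        final long[] calls = {0};
        GuardedGauge g = new GuardedGauge(() -> ++calls[0], 1_000_000_000L); // 1s guard
        g.value(0); g.value(1); g.value(2);      // three polls inside the window
        System.out.println(calls[0]);            // 1: only one expensive read
        g.value(2_000_000_000L);                 // past the window: refresh
        System.out.println(calls[0]);            // 2
    }
}
```

In production the caller would pass System.nanoTime(); the refresh interval becomes a config independent of the hadoop-metrics2 collection period, which is the decoupling the comment asks for.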
[jira] [Updated] (HIVE-16522) Hive query timer is not keeping track of the fetch task execution
[ https://issues.apache.org/jira/browse/HIVE-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-16522: -- Status: Patch Available (was: Open) > Hive is query timer is not keeping track of the fetch task execution > > > Key: HIVE-16522 > URL: https://issues.apache.org/jira/browse/HIVE-16522 > Project: Hive > Issue Type: Bug >Reporter: slim bouguerra >Assignee: slim bouguerra > > Currently Hive CLI query execution time does not include fetch time execution. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (HIVE-16522) Hive query timer is not keeping track of the fetch task execution
[ https://issues.apache.org/jira/browse/HIVE-16522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra reassigned HIVE-16522: - > Hive is query timer is not keeping track of the fetch task execution > > > Key: HIVE-16522 > URL: https://issues.apache.org/jira/browse/HIVE-16522 > Project: Hive > Issue Type: Bug >Reporter: slim bouguerra >Assignee: slim bouguerra > > Currently Hive CLI query execution time does not include fetch time execution. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (HIVE-16521) HoS user level explain plan possibly incorrect for UNION clause
[ https://issues.apache.org/jira/browse/HIVE-16521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sahil Takiar reassigned HIVE-16521: --- > HoS user level explain plan possibly incorrect for UNION clause > --- > > Key: HIVE-16521 > URL: https://issues.apache.org/jira/browse/HIVE-16521 > Project: Hive > Issue Type: Bug > Components: Spark >Affects Versions: 3.0.0 >Reporter: Sahil Takiar >Assignee: Sahil Takiar > > The user-level explain plan for queries with a UNION operator look very > different for HoS vs. Hive-on-Tez. Furthermore, the HoS plan looks incomplete: > Query: {{EXPLAIN select count(*) from srcpart where srcpart.ds in (select > max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart)}} > Hive-on-Tez: > {code} > Plan optimized by CBO. > Vertex dependency in root stage > Reducer 2 <- Map 1 (SIMPLE_EDGE), Reducer 7 (SIMPLE_EDGE) > Reducer 3 <- Reducer 2 (CUSTOM_SIMPLE_EDGE) > Reducer 5 <- Map 4 (CUSTOM_SIMPLE_EDGE), Union 6 (CONTAINS) > Reducer 7 <- Union 6 (SIMPLE_EDGE) > Reducer 9 <- Map 8 (CUSTOM_SIMPLE_EDGE), Union 6 (CONTAINS) > Stage-0 > Fetch Operator > limit:-1 > Stage-1 > Reducer 3 > File Output Operator [FS_34] > Group By Operator [GBY_32] (rows=1 width=8) > Output:["_col0"],aggregations:["count(VALUE._col0)"] > <-Reducer 2 [CUSTOM_SIMPLE_EDGE] > PARTITION_ONLY_SHUFFLE [RS_31] > Group By Operator [GBY_30] (rows=1 width=8) > Output:["_col0"],aggregations:["count()"] > Merge Join Operator [MERGEJOIN_44] (rows=1000 width=8) > Conds:RS_26._col0=RS_27._col0(Inner) > <-Map 1 [SIMPLE_EDGE] > SHUFFLE [RS_26] > PartitionCols:_col0 > Select Operator [SEL_2] (rows=2000 width=184) > Output:["_col0"] > TableScan [TS_0] (rows=2000 width=194) > default@srcpart,srcpart,Tbl:COMPLETE,Col:COMPLETE > <-Reducer 7 [SIMPLE_EDGE] > SHUFFLE [RS_27] > PartitionCols:_col0 > Group By Operator [GBY_24] (rows=1 width=184) > Output:["_col0"],keys:KEY._col0 > <-Union 6 [SIMPLE_EDGE] > <-Reducer 5 [CONTAINS] > Reduce Output Operator 
[RS_23] > PartitionCols:_col0 > Group By Operator [GBY_22] (rows=1 width=184) > Output:["_col0"],keys:_col0 > Filter Operator [FIL_9] (rows=1 width=184) > predicate:_col0 is not null > Group By Operator [GBY_7] (rows=1 width=184) > > Output:["_col0"],aggregations:["max(VALUE._col0)"] > <-Map 4 [CUSTOM_SIMPLE_EDGE] > PARTITION_ONLY_SHUFFLE [RS_6] > Group By Operator [GBY_5] (rows=1 width=184) > Output:["_col0"],aggregations:["max(ds)"] > Select Operator [SEL_4] (rows=2000 > width=194) > Output:["ds"] > TableScan [TS_3] (rows=2000 width=194) > > default@srcpart,srcpart,Tbl:COMPLETE,Col:COMPLETE > <-Reducer 9 [CONTAINS] > Reduce Output Operator [RS_23] > PartitionCols:_col0 > Group By Operator [GBY_22] (rows=1 width=184) > Output:["_col0"],keys:_col0 > Filter Operator [FIL_17] (rows=1 width=184) > predicate:_col0 is not null > Group By Operator [GBY_15] (rows=1 width=184) > > Output:["_col0"],aggregations:["min(VALUE._col0)"] > <-Map 8 [CUSTOM_SIMPLE_EDGE] > PARTITION_ONLY_SHUFFLE [RS_14] > Group By Operator [GBY_13] (rows=1 width=184) > Output:["_col0"],aggregations:["min(ds)"] > Select Operator [SEL_12] (rows=2000 > width=194) > Output:["ds"] > TableScan [TS_11] (rows=2000 width=194) > > default@srcpart,srcpart,Tbl:COMPLETE,Col:COMPLETE > Dynamic Partitioning Event Operator [EVENT_43] (rows=1 > width=184) > Group By Operator [GBY_42] (rows=1 width=184) > Output:["_col0"],keys:_col0 >
[jira] [Commented] (HIVE-11133) Support hive.explain.user for Spark
[ https://issues.apache.org/jira/browse/HIVE-11133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981683#comment-15981683 ] Sahil Takiar commented on HIVE-11133: - [~lirui] no it isn't truncated. This could be another bug. I filed HIVE-16521 as another follow up item. I suspect it has something to do with the UNION operator, the user level plans for HoS vs. Hive-on-Tez look very different for queries with a UNION. > Support hive.explain.user for Spark > --- > > Key: HIVE-11133 > URL: https://issues.apache.org/jira/browse/HIVE-11133 > Project: Hive > Issue Type: Sub-task > Components: Spark >Reporter: Mohit Sabharwal >Assignee: Sahil Takiar > Attachments: HIVE-11133.1.patch, HIVE-11133.2.patch, > HIVE-11133.3.patch, HIVE-11133.4.patch, HIVE-11133.5.patch, > HIVE-11133.6.patch, HIVE-11133.7.patch, HIVE-11133.8.patch > > > User friendly explain output ({{set hive.explain.user=true}}) should support > Spark as well. > Once supported, we should also enable related q-tests like {{explainuser_1.q}} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-14881) integrate MM tables into ACID: merge cleaner into ACID threads
[ https://issues.apache.org/jira/browse/HIVE-14881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981650#comment-15981650 ] Eugene Koifman commented on HIVE-14881: --- As we discussed, I think adding a new compaction type here is going to cause confusion longer term. Also, if you make Worker delete the Directory.getAbortedDirectories(), then the Cleaner doesn't have to change at all and probably has better scalability. getAbortedDirectories() should have a JavaDoc noting that it's a list of deltas that have nothing but aborted txns. I'd change {noformat} if (MetaStoreUtils.isInsertOnlyTable(tblproperties) && txnList.isTxnAborted(delta.minTransaction)) { // for MM table, minTxnId & maxTxnId is same aborted.add(child); } {noformat} to check that all txns between min/max for the current delta are aborted and, if so, add the delta to the list - this way the code is not specific to MM tables and the collection being built may be useful somewhere else. > integrate MM tables into ACID: merge cleaner into ACID threads > --- > > Key: HIVE-14881 > URL: https://issues.apache.org/jira/browse/HIVE-14881 > Project: Hive > Issue Type: Sub-task > Components: Transactions >Reporter: Sergey Shelukhin >Assignee: Wei Zheng > Attachments: HIVE-14881.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
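[Editor's note] The suggested generalization (treat a delta as removable only when every txn in its [minTxnId, maxTxnId] range is aborted) might look roughly like this; the types and names are simplified stand-ins for ValidTxnList and the delta metadata, not Hive's actual API:

```java
import java.util.Set;

// Sketch of the range check: a delta is dead only when every txn it covers
// is aborted. 'abortedTxns' stands in for ValidTxnList.isTxnAborted lookups.
public class AbortedDelta {
    public static boolean allAborted(long minTxn, long maxTxn, Set<Long> abortedTxns) {
        for (long t = minTxn; t <= maxTxn; t++) {
            if (!abortedTxns.contains(t)) {
                return false; // at least one txn committed or is open
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<Long> aborted = Set.of(5L, 6L, 7L);
        System.out.println(allAborted(5, 7, aborted)); // true: whole delta is dead
        System.out.println(allAborted(5, 8, aborted)); // false: txn 8 not aborted
        System.out.println(allAborted(5, 5, aborted)); // true: the MM single-txn case
    }
}
```

Because the MM case (minTxnId == maxTxnId) is just a one-element range, the same predicate covers both MM and regular deltas, which is why the check is no longer MM-specific.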
[jira] [Commented] (HIVE-15795) Support Accumulo Index Tables in Hive Accumulo Connector
[ https://issues.apache.org/jira/browse/HIVE-15795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981634#comment-15981634 ] Josh Elser commented on HIVE-15795: --- bq. I am curious why I didn't run into this before Sergey committed. Sorry about that, folks. Ahh, I had applied this onto a 2.2 branch and not master. That's probably why I didn't catch the issue. Sorry again. [~faganm], can you please attach your patch from reviewboard here for HadoopQA to run? > Support Accumulo Index Tables in Hive Accumulo Connector > > > Key: HIVE-15795 > URL: https://issues.apache.org/jira/browse/HIVE-15795 > Project: Hive > Issue Type: Improvement > Components: Accumulo Storage Handler >Reporter: Mike Fagan >Assignee: Mike Fagan >Priority: Minor > Fix For: 3.0.0 > > Attachments: HIVE-15795.1.patch, HIVE-15795.2.patch > > > Ability to specify an accumulo index table for an accumulo-hive table. > This would greatly improve performance for non-rowid query predicates -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-15571) Support Insert into for druid storage handler
[ https://issues.apache.org/jira/browse/HIVE-15571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-15571: Status: Patch Available (was: Open) > Support Insert into for druid storage handler > - > > Key: HIVE-15571 > URL: https://issues.apache.org/jira/browse/HIVE-15571 > Project: Hive > Issue Type: New Feature > Components: Druid integration >Reporter: slim bouguerra >Assignee: Nishant Bangarwa > Attachments: HIVE-15571.01.patch > > > Add support for the insert into operator for the Druid storage handler. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16343) LLAP: Publish YARN's ProcFs based memory usage to metrics for monitoring
[ https://issues.apache.org/jira/browse/HIVE-16343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981631#comment-15981631 ] Prasanth Jayachandran commented on HIVE-16343: -- bq. While launching the process. environment.put("JVM_PID", "$$") / export. Within the process - System.getenv().get("JVM_PID"). Does it also happen for processes launched by Slider? bq. Is there an easier and more reliable way to do this, instead of relying on a pid file I thought this is the most reliable when compared to others :) The current default location for the pid file is not reliable as it defaults to a /tmp/user location. If JVM_PID is guaranteed to be set I can add that option as well. bq. May want to introduce a config for which process monitor to use, instead of relying on a YARN configuration. hmm.. why do we need this? Unless LLAP adds its own class it is not that useful. Isn't it? bq. How often will the metrics be collected? Configurable in the hadoop-metrics2.properties file. The template and Ambari default is to collect every 10s and publish every 5 mins. > LLAP: Publish YARN's ProcFs based memory usage to metrics for monitoring > > > Key: HIVE-16343 > URL: https://issues.apache.org/jira/browse/HIVE-16343 > Project: Hive > Issue Type: Improvement > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-16343.1.patch > > > Publish MemInfo from ProcfsBasedProcessTree to llap metrics. This will be useful > for monitoring and also setting up triggers via JMC. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16519) Fix exception thrown by checkOutputSpecs
[ https://issues.apache.org/jira/browse/HIVE-16519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981614#comment-15981614 ] Sergey Shelukhin commented on HIVE-16519: - +1 > Fix exception thrown by checkOutputSpecs > > > Key: HIVE-16519 > URL: https://issues.apache.org/jira/browse/HIVE-16519 > Project: Hive > Issue Type: Bug > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > Labels: druid > Attachments: HIVE-16519.patch > > > do not throw exception by checkOutputSpecs -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16343) LLAP: Publish YARN's ProcFs based memory usage to metrics for monitoring
[ https://issues.apache.org/jira/browse/HIVE-16343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981608#comment-15981608 ] Siddharth Seth commented on HIVE-16343: --- Getting access to the PID. Is there an easier and more reliable way to do this, instead of relying on a pid file? Tez/YARN use the following - While launching the process. environment.put("JVM_PID", "$$") / export. Within the process - System.getenv().get("JVM_PID"). If retaining the current method of accessing the pid file, please move to a helper class. The daemon class is getting a little noisy. May want to introduce a config for which process monitor to use, instead of relying on a YARN configuration. How often will the metrics be collected? > LLAP: Publish YARN's ProcFs based memory usage to metrics for monitoring > > > Key: HIVE-16343 > URL: https://issues.apache.org/jira/browse/HIVE-16343 > Project: Hive > Issue Type: Improvement > Components: llap >Affects Versions: 3.0.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-16343.1.patch > > > Publish MemInfo from ProcfsBasedProcessTree to llap metrics. This will be useful > for monitoring and also setting up triggers via JMC. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
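The JVM_PID handshake described in the comment above can be sketched roughly like this (a hedged illustration — `PidLookup` and the fallback behavior are illustrative, not Hive or Tez code):

```java
import java.lang.management.ManagementFactory;
import java.util.Map;

// The launcher exports the shell's PID into the child's environment, e.g.:
//   environment.put("JVM_PID", "$$");   // "$$" is expanded by the launcher shell
// The child reads it back, falling back to the JMX runtime name (which is
// "pid@hostname" on HotSpot JVMs) when the variable is absent.
public class PidLookup {
    static String currentPid(Map<String, String> env) {
        String pid = env.get("JVM_PID");
        if (pid != null && !pid.isEmpty()) {
            return pid;
        }
        String name = ManagementFactory.getRuntimeMXBean().getName();
        return name.split("@")[0];  // fallback; the format is JVM-implementation specific
    }
}
```

In the daemon this would be called as `PidLookup.currentPid(System.getenv())`, avoiding any dependence on a pid file under /tmp.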
[jira] [Updated] (HIVE-16520) Cache hive metadata in metastore
[ https://issues.apache.org/jira/browse/HIVE-16520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated HIVE-16520: -- Attachment: HIVE-16520-proto.patch > Cache hive metadata in metastore > > > Key: HIVE-16520 > URL: https://issues.apache.org/jira/browse/HIVE-16520 > Project: Hive > Issue Type: New Feature > Components: Metastore >Reporter: Daniel Dai >Assignee: Daniel Dai > Attachments: HIVE-16520-proto.patch > > > During Hive 2 benchmarks, we find Hive metastore operations take a lot of time > and thus slow down Hive compilation. In some extreme cases, they take much > longer than the actual query run time. In particular, we find the latency of a > cloud db is very high and 90% of total query runtime is spent waiting for metastore > SQL database operations. Based on this observation, metastore operation > performance would be greatly enhanced if we had a memory structure which > caches the database query results. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
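The proposed in-memory structure amounts to a read-through cache in front of the metastore's SQL backend. A minimal sketch of the idea (hypothetical names, not the classes in the attached prototype):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Read-through cache: repeated metadata lookups for the same key skip the
// database round trip after the first load.
public class MetadataCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;  // e.g. the underlying SQL lookup

    public MetadataCache(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        // Loads from the backend at most once per key; later reads are in-memory.
        return cache.computeIfAbsent(key, loader);
    }

    public void invalidate(K key) {
        // Must be called when DDL mutates the cached object, or reads go stale.
        cache.remove(key);
    }
}
```

A real implementation also needs a consistency story — invalidation when another metastore instance mutates the same object — which is the hard part of any such proposal.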
[jira] [Assigned] (HIVE-16520) Cache hive metadata in metastore
[ https://issues.apache.org/jira/browse/HIVE-16520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reassigned HIVE-16520: - > Cache hive metadata in metastore > > > Key: HIVE-16520 > URL: https://issues.apache.org/jira/browse/HIVE-16520 > Project: Hive > Issue Type: New Feature > Components: Metastore >Reporter: Daniel Dai >Assignee: Daniel Dai > > During Hive 2 benchmarks, we find Hive metastore operations take a lot of time > and thus slow down Hive compilation. In some extreme cases, they take much > longer than the actual query run time. In particular, we find the latency of a > cloud db is very high and 90% of total query runtime is spent waiting for metastore > SQL database operations. Based on this observation, metastore operation > performance would be greatly enhanced if we had a memory structure which > caches the database query results. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16079) HS2: high memory pressure due to duplicate Properties objects
[ https://issues.apache.org/jira/browse/HIVE-16079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981588#comment-15981588 ] Misha Dmitriev commented on HIVE-16079: --- vector_if_expr test fails in pretty much every Hive build. accumulo_index fails in every 2nd..3rd build, so it looks flaky as well. > HS2: high memory pressure due to duplicate Properties objects > - > > Key: HIVE-16079 > URL: https://issues.apache.org/jira/browse/HIVE-16079 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Misha Dmitriev >Assignee: Misha Dmitriev > Attachments: HIVE-16079.01.patch, HIVE-16079.02.patch, > HIVE-16079.03.patch, hs2-crash-2000p-500m-50q.txt > > > I've created a Hive table with 2000 partitions, each backed by two files, > with one row in each file. When I execute some number of concurrent queries > against this table, e.g. as follows > {code} > for i in `seq 1 50`; do beeline -u jdbc:hive2://localhost:1 -n admin -p > admin -e "select count(i_f_1) from misha_table;" & done > {code} > it results in a big memory spike. With 20 queries I caused an OOM in a HS2 > server with -Xmx200m and with 50 queries - in the one with -Xmx500m. > I am attaching the results of jxray (www.jxray.com) analysis of a heap dump > that was generated in the 50queries/500m heap scenario. It suggests that > there are several opportunities to reduce memory pressure with not very > invasive changes to the code. One (duplicate strings) has been addressed in > https://issues.apache.org/jira/browse/HIVE-15882 In this ticket, I am going > to address the fact that almost 20% of memory is used by instances of > java.util.Properties. These objects are highly duplicated, since for each > partition each concurrently running query creates its own copy of Partition, > PartitionDesc and Properties. Thus we have nearly 100,000 (50 queries * 2,000 > partitions) Properties in memory. 
By interning/deduplicating these objects we > may be able to save perhaps 15% of memory. > Note, however, that if there are queries that mutate partitions, the > corresponding Properties would be mutated as well. Thus we cannot simply use > a single "canonicalized" Properties object at all times for all Partition > objects representing the same DB partition. Instead, I am going to introduce > a special CopyOnFirstWriteProperties class. Such an object initially > internally references a canonicalized Properties object, and keeps doing so > while only read methods are called. However, once any mutating method is > called, the given CopyOnFirstWriteProperties copies the data into its own > table from the canonicalized table, and uses it ever after. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
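The copy-on-first-write idea described above could look roughly like this — a minimal sketch, not the actual patch, with only a few methods overridden for illustration:

```java
import java.util.Properties;

// Readers share one canonical Properties instance; a private copy is
// materialized only when a caller first mutates this view.
public class CopyOnFirstWriteProperties extends Properties {
    private Properties interned;  // shared read-only view; null once copied

    public CopyOnFirstWriteProperties(Properties canonical) {
        this.interned = canonical;
    }

    @Override
    public String getProperty(String key) {
        Properties i = interned;
        return i != null ? i.getProperty(key) : super.getProperty(key);
    }

    @Override
    public synchronized Object get(Object key) {
        return interned != null ? interned.get(key) : super.get(key);
    }

    @Override
    public synchronized Object put(Object key, Object value) {
        copyOnWrite();
        return super.put(key, value);
    }

    private synchronized void copyOnWrite() {
        if (interned != null) {
            Properties src = interned;
            interned = null;        // clear first: putAll dispatches back to our put()
            super.putAll(src);      // one private copy, taken exactly once
        }
    }
}
```

A complete version would have to route every Hashtable read and mutation method through the same switch; the two cases above show the pattern.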
[jira] [Updated] (HIVE-16441) De-duplicate semijoin branches in n-way joins
[ https://issues.apache.org/jira/browse/HIVE-16441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-16441: -- Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Committed to master > De-duplicate semijoin branches in n-way joins > - > > Key: HIVE-16441 > URL: https://issues.apache.org/jira/browse/HIVE-16441 > Project: Hive > Issue Type: Improvement >Reporter: Deepak Jaiswal >Assignee: Deepak Jaiswal > Fix For: 3.0.0 > > Attachments: HIVE-16441.1.patch, HIVE-16441.2.patch, > HIVE-16441.3.patch, HIVE-16441.4.patch > > > Currently in n-way joins, semi join optimization creates n branches on same > key. Instead it should reuse one branch for all the joins. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16519) Fix exception thrown by checkOutputSpecs
[ https://issues.apache.org/jira/browse/HIVE-16519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981553#comment-15981553 ] slim bouguerra commented on HIVE-16519: --- [~sershe] can you review this please > Fix exception thrown by checkOutputSpecs > > > Key: HIVE-16519 > URL: https://issues.apache.org/jira/browse/HIVE-16519 > Project: Hive > Issue Type: Bug > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > Labels: druid > Attachments: HIVE-16519.patch > > > do not throw exception by checkOutputSpecs -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16519) Fix exception thrown by checkOutputSpecs
[ https://issues.apache.org/jira/browse/HIVE-16519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-16519: -- Attachment: HIVE-16519.patch > Fix exception thrown by checkOutputSpecs > > > Key: HIVE-16519 > URL: https://issues.apache.org/jira/browse/HIVE-16519 > Project: Hive > Issue Type: Bug > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > Labels: druid > Attachments: HIVE-16519.patch > > > do not throw exception by checkOutputSpecs -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16519) Fix exception thrown by checkOutputSpecs
[ https://issues.apache.org/jira/browse/HIVE-16519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra updated HIVE-16519: -- Status: Patch Available (was: Open) > Fix exception thrown by checkOutputSpecs > > > Key: HIVE-16519 > URL: https://issues.apache.org/jira/browse/HIVE-16519 > Project: Hive > Issue Type: Bug > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > Labels: druid > Attachments: HIVE-16519.patch > > > do not throw exception by checkOutputSpecs -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (HIVE-16519) Fix exception thrown by checkOutputSpecs
[ https://issues.apache.org/jira/browse/HIVE-16519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] slim bouguerra reassigned HIVE-16519: - > Fix exception thrown by checkOutputSpecs > > > Key: HIVE-16519 > URL: https://issues.apache.org/jira/browse/HIVE-16519 > Project: Hive > Issue Type: Bug > Components: Druid integration >Reporter: slim bouguerra >Assignee: slim bouguerra > Labels: druid > > do not throw exception by checkOutputSpecs -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16513) width_bucket issues
[ https://issues.apache.org/jira/browse/HIVE-16513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981546#comment-15981546 ] Sahil Takiar commented on HIVE-16513: - Yup, I'll try to take a look later today, if not tomorrow. > width_bucket issues > --- > > Key: HIVE-16513 > URL: https://issues.apache.org/jira/browse/HIVE-16513 > Project: Hive > Issue Type: Bug >Reporter: Carter Shanklin > > width_bucket was recently added with HIVE-15982. This ticket notes a few > issues. > Usability issue: > Currently only accepts integral numeric types. Decimals, floats and doubles > are not supported. > Runtime failures: This query will cause a runtime divide-by-zero in the > reduce stage. > select width_bucket(c1, 0, c1*2, 10) from e011_01 group by c1; > The divide-by-zero seems to trigger any time I use a group-by. Here's another > example (that actually requires the group-by): > select width_bucket(c1, 0, max(c1), 10) from e011_01 group by c1; > Advanced Usage Issues: > Suppose you have a table e011_01 as follows: > create table e011_01 (c1 integer, c2 smallint); > insert into e011_01 values (1, 1), (2, 2); > Compile-time problems: > You cannot use simple case expressions, searched case expressions or grouping > sets. These queries fail: > select width_bucket(5, c2, case c1 when 1 then c1 * 2 else c1 * 3 end, 10) > from e011_01; > select width_bucket(5, c2, case when c1 < 2 then c1 * 2 else c1 * 3 end, 10) > from e011_01; > select width_bucket(5, c2, max(c1)*10, cast(grouping(c1, c2)*20+1 as > integer)) from e011_02 group by cube(c1, c2); > I'll admit the grouping one is pretty contrived but the case ones seem > straightforward, valid, and it's strange that they don't work. Similar > queries work with other UDFs like sum. Why wouldn't they "just work"? Maybe > [~ashutoshc] can lend some perspective on that? 
> Interestingly, you can use window functions in width_bucket, example: > select width_bucket(rank() over (order by c2), 0, 10, 10) from e011_01; > works just fine. Hopefully we can get to a place where people implementing > functions like this don't need to think about value expression support but we > don't seem to be there yet. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
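For reference, SQL-standard width_bucket semantics for an ascending range can be sketched as below — a simplified illustration, not Hive's UDF (the real function must also handle descending ranges and NULLs). Taking doubles addresses the integral-only limitation noted above, and rejecting an empty range (minVal == maxVal) sidesteps one possible source of divide-by-zero:

```java
public class WidthBucket {
    // Divides [minVal, maxVal) into numBuckets equal-width buckets numbered
    // 1..numBuckets; values below the range map to 0, at or above it to
    // numBuckets + 1, per the SQL standard.
    static long widthBucket(double expr, double minVal, double maxVal, long numBuckets) {
        if (numBuckets <= 0) {
            throw new IllegalArgumentException("numBuckets must be positive");
        }
        if (minVal == maxVal) {
            throw new IllegalArgumentException("empty bucket range");
        }
        if (expr < minVal) {
            return 0;                    // below the range
        }
        if (expr >= maxVal) {
            return numBuckets + 1;       // at or above the range
        }
        return (long) Math.floor((expr - minVal) / (maxVal - minVal) * numBuckets) + 1;
    }
}
```

For example, with range [0, 10) and 10 buckets, the value 5 falls in bucket 6 (the bucket covering [5, 6)), while -1 maps to the underflow bucket 0.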
[jira] [Commented] (HIVE-16513) width_bucket issues
[ https://issues.apache.org/jira/browse/HIVE-16513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981540#comment-15981540 ] Ashutosh Chauhan commented on HIVE-16513: - [~stakiar] Would you like to take a look? > width_bucket issues > --- > > Key: HIVE-16513 > URL: https://issues.apache.org/jira/browse/HIVE-16513 > Project: Hive > Issue Type: Bug >Reporter: Carter Shanklin > > width_bucket was recently added with HIVE-15982. This ticket notes a few > issues. > Usability issue: > Currently only accepts integral numeric types. Decimals, floats and doubles > are not supported. > Runtime failures: This query will cause a runtime divide-by-zero in the > reduce stage. > select width_bucket(c1, 0, c1*2, 10) from e011_01 group by c1; > The divide-by-zero seems to trigger any time I use a group-by. Here's another > example (that actually requires the group-by): > select width_bucket(c1, 0, max(c1), 10) from e011_01 group by c1; > Advanced Usage Issues: > Suppose you have a table e011_01 as follows: > create table e011_01 (c1 integer, c2 smallint); > insert into e011_01 values (1, 1), (2, 2); > Compile-time problems: > You cannot use simple case expressions, searched case expressions or grouping > sets. These queries fail: > select width_bucket(5, c2, case c1 when 1 then c1 * 2 else c1 * 3 end, 10) > from e011_01; > select width_bucket(5, c2, case when c1 < 2 then c1 * 2 else c1 * 3 end, 10) > from e011_01; > select width_bucket(5, c2, max(c1)*10, cast(grouping(c1, c2)*20+1 as > integer)) from e011_02 group by cube(c1, c2); > I'll admit the grouping one is pretty contrived but the case ones seem > straightforward, valid, and it's strange that they don't work. Similar > queries work with other UDFs like sum. Why wouldn't they "just work"? Maybe > [~ashutoshc] can lend some perspective on that? 
> Interestingly, you can use window functions in width_bucket, example: > select width_bucket(rank() over (order by c2), 0, 10, 10) from e011_01; > works just fine. Hopefully we can get to a place where people implementing > functions like this don't need to think about value expression support but we > don't seem to be there yet. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16516) Set storage-api.version to 3.0.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/HIVE-16516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981539#comment-15981539 ] Sergey Shelukhin commented on HIVE-16516: - +1. I think the idea is that the default should point to a snapshot, otherwise it would be impossible to change anything without a release. Then, the next Hive release would require a s-a release, and would need to point to that. > Set storage-api.version to 3.0.0-SNAPSHOT > - > > Key: HIVE-16516 > URL: https://issues.apache.org/jira/browse/HIVE-16516 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich > Attachments: HIVE-16516.1.patch > > > I think the update of this property was missed during preparation to 3.0.0; > I've bumped into this after cleaning the local .m2 repo caches. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16483) HoS should populate split related configurations to HiveConf
[ https://issues.apache.org/jira/browse/HIVE-16483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun updated HIVE-16483: Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Test failure not related. Committed to master. Thanks Xuefu for the review. > HoS should populate split related configurations to HiveConf > > > Key: HIVE-16483 > URL: https://issues.apache.org/jira/browse/HIVE-16483 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Chao Sun >Assignee: Chao Sun > Fix For: 3.0.0 > > Attachments: HIVE-16483.1.patch > > > There are several split related configurations, such as > {{MAPREDMINSPLITSIZE}}, {{MAPREDMINSPLITSIZEPERNODE}}, > {{MAPREDMINSPLITSIZEPERRACK}}, etc., that should be populated to HiveConf. > Currently we only do this for {{MAPREDMINSPLITSIZE}}. > All the others, if not set, will be using the default value, which is 1. > Without these, Spark sometimes will not merge small files for file formats > such as text. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16488) Support replicating into existing db if the db is empty
[ https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-16488: Attachment: HIVE-16488.01.patch > Support replicating into existing db if the db is empty > --- > > Key: HIVE-16488 > URL: https://issues.apache.org/jira/browse/HIVE-16488 > Project: Hive > Issue Type: Sub-task > Components: repl >Affects Versions: 2.2.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Replication > Attachments: HIVE-16488.01.patch > > > This is a potential usecase where a user may want to manually create a db on > destination to make sure it goes to a certain dir root, or they may have > cases where the db (default, for instance) was automatically created. We > should still allow replicating into this without failing if the db is empty. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16488) Support replicating into existing db if the db is empty
[ https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-16488: Status: Patch Available (was: Open) > Support replicating into existing db if the db is empty > --- > > Key: HIVE-16488 > URL: https://issues.apache.org/jira/browse/HIVE-16488 > Project: Hive > Issue Type: Sub-task > Components: repl >Affects Versions: 2.2.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Replication > Attachments: HIVE-16488.01.patch > > > This is a potential usecase where a user may want to manually create a db on > destination to make sure it goes to a certain dir root, or they may have > cases where the db (default, for instance) was automatically created. We > should still allow replicating into this without failing if the db is empty. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16488) Support replicating into existing db if the db is empty
[ https://issues.apache.org/jira/browse/HIVE-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sankar Hariappan updated HIVE-16488: Attachment: (was: HIVE-16488.01.patch) > Support replicating into existing db if the db is empty > --- > > Key: HIVE-16488 > URL: https://issues.apache.org/jira/browse/HIVE-16488 > Project: Hive > Issue Type: Sub-task > Components: repl >Affects Versions: 2.2.0 >Reporter: Sankar Hariappan >Assignee: Sankar Hariappan > Labels: DR, Replication > > This is a potential usecase where a user may want to manually create a db on > destination to make sure it goes to a certain dir root, or they may have > cases where the db (default, for instance) was automatically created. We > should still allow replicating into this without failing if the db is empty. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16353) Jetty 9 upgrade breaks hive master which is 2.3.0-snapshot
[ https://issues.apache.org/jira/browse/HIVE-16353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-16353: --- Status: Patch Available (was: Open) > Jetty 9 upgrade breaks hive master which is 2.3.0-snapshot > -- > > Key: HIVE-16353 > URL: https://issues.apache.org/jira/browse/HIVE-16353 > Project: Hive > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Gopal V >Priority: Minor > Attachments: HIVE-16353.patch > > > HIVE-16049 upgraded to jetty 9. It is committed to apache master which is > still 2.3.0-snapshot. This breaks couple of other components like LLAP and > ends up throwing the following error during runtime. > {noformat} > 2017-04-02T20:17:45,435 WARN [main ()] > org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon: Failed to start LLAP > Daemon with exception > java.lang.NoClassDefFoundError: org/eclipse/jetty/rewrite/handler/Rule > at org.apache.hive.http.HttpServer$Builder.build(HttpServer.java:125) > ~[hive-exec-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT] > at > org.apache.hadoop.hive.llap.daemon.services.impl.LlapWebServices.serviceInit(LlapWebServices.java:102) > ~[hive-llap-server-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT] > at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ~[hadoop-common-2.7.3.2.5.0.0-1245.jar:?] > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > ~[hadoop-common-2.7.3.2.5.0.0-1245.jar:?] > at > org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.serviceInit(LlapDaemon.java:385) > ~[hive-llap-server-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT] > at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ~[hadoop-common-2.7.3.2.5.0.0-1245.jar:?] 
> at > org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.main(LlapDaemon.java:528) > [hive-llap-server-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT] > Caused by: java.lang.ClassNotFoundException: > org.eclipse.jetty.rewrite.handler.Rule > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_77] > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_77] > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) ~[?:1.8.0_77] > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_77] > ... 7 more > 2017-04-02T20:17:45,441 INFO [main ()] > org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon: LlapDaemon shutdown > invoked > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16353) Jetty 9 upgrade breaks hive master which is 2.3.0-snapshot
[ https://issues.apache.org/jira/browse/HIVE-16353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-16353: --- Attachment: HIVE-16353.patch This is possibly due to the replacement of jetty-all.jar with multiple sub jars in the pom.xml > Jetty 9 upgrade breaks hive master which is 2.3.0-snapshot > -- > > Key: HIVE-16353 > URL: https://issues.apache.org/jira/browse/HIVE-16353 > Project: Hive > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Gopal V >Priority: Minor > Attachments: HIVE-16353.patch > > > HIVE-16049 upgraded to jetty 9. It is committed to apache master which is > still 2.3.0-snapshot. This breaks couple of other components like LLAP and > ends up throwing the following error during runtime. > {noformat} > 2017-04-02T20:17:45,435 WARN [main ()] > org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon: Failed to start LLAP > Daemon with exception > java.lang.NoClassDefFoundError: org/eclipse/jetty/rewrite/handler/Rule > at org.apache.hive.http.HttpServer$Builder.build(HttpServer.java:125) > ~[hive-exec-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT] > at > org.apache.hadoop.hive.llap.daemon.services.impl.LlapWebServices.serviceInit(LlapWebServices.java:102) > ~[hive-llap-server-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT] > at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ~[hadoop-common-2.7.3.2.5.0.0-1245.jar:?] > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > ~[hadoop-common-2.7.3.2.5.0.0-1245.jar:?] > at > org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.serviceInit(LlapDaemon.java:385) > ~[hive-llap-server-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT] > at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ~[hadoop-common-2.7.3.2.5.0.0-1245.jar:?] 
> at > org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.main(LlapDaemon.java:528) > [hive-llap-server-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT] > Caused by: java.lang.ClassNotFoundException: > org.eclipse.jetty.rewrite.handler.Rule > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_77] > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_77] > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) ~[?:1.8.0_77] > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_77] > ... 7 more > 2017-04-02T20:17:45,441 INFO [main ()] > org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon: LlapDaemon shutdown > invoked > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16288) Add blobstore tests for ORC and RCFILE file formats
[ https://issues.apache.org/jira/browse/HIVE-16288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981412#comment-15981412 ] Rentao Wu commented on HIVE-16288: -- I think these tests were added to the wrong directory (should be under itest/hive-blobstore instead of hive-blobstore). https://github.com/apache/hive/commit/ea41d0a685fb346ce20075d0dcc3c736e375bb20 > Add blobstore tests for ORC and RCFILE file formats > --- > > Key: HIVE-16288 > URL: https://issues.apache.org/jira/browse/HIVE-16288 > Project: Hive > Issue Type: Test > Components: Tests >Affects Versions: 2.1.1 >Reporter: Thomas Poepping >Assignee: Thomas Poepping > Fix For: 2.3.0, 3.0.0 > > Attachments: HIVE-16288.patch > > > This patch adds four tests each for ORC and RCFILE when running against > blobstore filesystems: > * Test for bucketed tables > * Test for nonpartitioned tables > * Test for partitioned tables > * Test for partitioned tables with nonstandard partition locations -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (HIVE-16353) Jetty 9 upgrade breaks hive master which is 2.3.0-snapshot
[ https://issues.apache.org/jira/browse/HIVE-16353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V reassigned HIVE-16353: -- Assignee: Gopal V > Jetty 9 upgrade breaks hive master which is 2.3.0-snapshot > -- > > Key: HIVE-16353 > URL: https://issues.apache.org/jira/browse/HIVE-16353 > Project: Hive > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Gopal V >Priority: Minor > > HIVE-16049 upgraded to jetty 9. It is committed to apache master which is > still 2.3.0-snapshot. This breaks couple of other components like LLAP and > ends up throwing the following error during runtime. > {noformat} > 2017-04-02T20:17:45,435 WARN [main ()] > org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon: Failed to start LLAP > Daemon with exception > java.lang.NoClassDefFoundError: org/eclipse/jetty/rewrite/handler/Rule > at org.apache.hive.http.HttpServer$Builder.build(HttpServer.java:125) > ~[hive-exec-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT] > at > org.apache.hadoop.hive.llap.daemon.services.impl.LlapWebServices.serviceInit(LlapWebServices.java:102) > ~[hive-llap-server-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT] > at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ~[hadoop-common-2.7.3.2.5.0.0-1245.jar:?] > at > org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107) > ~[hadoop-common-2.7.3.2.5.0.0-1245.jar:?] > at > org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.serviceInit(LlapDaemon.java:385) > ~[hive-llap-server-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT] > at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163) > ~[hadoop-common-2.7.3.2.5.0.0-1245.jar:?] 
> at > org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon.main(LlapDaemon.java:528) > [hive-llap-server-2.3.0-SNAPSHOT.jar:2.3.0-SNAPSHOT] > Caused by: java.lang.ClassNotFoundException: > org.eclipse.jetty.rewrite.handler.Rule > at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_77] > at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_77] > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) ~[?:1.8.0_77] > at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_77] > ... 7 more > 2017-04-02T20:17:45,441 INFO [main ()] > org.apache.hadoop.hive.llap.daemon.impl.LlapDaemon: LlapDaemon shutdown > invoked > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16518) Insert override for druid does not replace all existing segments
[ https://issues.apache.org/jira/browse/HIVE-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Bangarwa updated HIVE-16518: Fix Version/s: 3.0.0 > Insert override for druid does not replace all existing segments > > > Key: HIVE-16518 > URL: https://issues.apache.org/jira/browse/HIVE-16518 > Project: Hive > Issue Type: Bug > Components: Druid integration >Reporter: Nishant Bangarwa >Assignee: Nishant Bangarwa > Fix For: 3.0.0 > > > Insert override for Druid does not replace segments for all intervals. > It just replaces segments for the intervals which are newly ingested. > INSERT OVERRIDE TABLE statement on DruidStorageHandler should override all > existing segments for the table. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16518) Insert override for druid does not replace all existing segments
[ https://issues.apache.org/jira/browse/HIVE-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Bangarwa updated HIVE-16518: Component/s: Druid integration > Insert override for druid does not replace all existing segments > > > Key: HIVE-16518 > URL: https://issues.apache.org/jira/browse/HIVE-16518 > Project: Hive > Issue Type: Bug > Components: Druid integration >Reporter: Nishant Bangarwa >Assignee: Nishant Bangarwa > Fix For: 3.0.0 > > > Insert override for Druid does not replace segments for all intervals. > It just replaces segments for the intervals which are newly ingested. > INSERT OVERRIDE TABLE statement on DruidStorageHandler should override all > existing segments for the table. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (HIVE-16451) Race condition between HiveStatement.getQueryLog and HiveStatement.runAsyncOnServer
[ https://issues.apache.org/jira/browse/HIVE-16451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15981334#comment-15981334 ] Peter Vary commented on HIVE-16451: --- Over the weekend I did some thinking, and I realized that HiveStatement cannot be made thread-safe (e.g. execute and getResultSet could be called from different threads and cause problems). We cannot make HiveStatement thread-safe, but we should at least make sure that calling getQueryLog will not cause problems if it is called in parallel with any of the following: cancel, close, execute, executeAsync, executeQuery, executeUpdate, getUpdateCount and, more interestingly, HiveQueryResultSet.next too. It is quite a complex problem at first glance, so I created a new jira for it: HIVE-16517 - HiveStatement thread safety issues. In the meantime, this patch solves the problems that can arise on the happy path. > Race condition between HiveStatement.getQueryLog and > HiveStatement.runAsyncOnServer > --- > > Key: HIVE-16451 > URL: https://issues.apache.org/jira/browse/HIVE-16451 > Project: Hive > Issue Type: Bug > Components: Beeline, JDBC >Affects Versions: 3.0.0 >Reporter: Peter Vary >Assignee: Peter Vary > Attachments: HIVE-16451.02.patch, HIVE-16451.03.patch, > HIVE-16451.patch > > > During BeeLineDriver testing I encountered the following race condition: > - Run the query asynchronously through BeeLine > - Query the logs in BeeLine > In the following code: > {code:title=HiveStatement.runAsyncOnServer} > private void runAsyncOnServer(String sql) throws SQLException { > checkConnection("execute"); > closeClientOperation(); > initFlags(); > [..] > } > {code} > {code:title=HiveStatement.getQueryLog} > public List getQueryLog(boolean incremental, int fetchSize) > throws SQLException, ClosedOrCancelledStatementException { > [..] > try { > if (stmtHandle != null) { > [..] 
> } else { > if (isQueryClosed) { > throw new ClosedOrCancelledStatementException("Method getQueryLog() > failed. The " + > "statement has been closed or cancelled."); > } else { > return logs; > } > } > } catch (SQLException e) { > [..] > } > [..] > } > {code} > The runAsyncOnServer {{closeClientOperation}} sets the {{isQueryClosed}} flag to > true: > {code:title=HiveStatement.closeClientOperation} > void closeClientOperation() throws SQLException { > [..] > isQueryClosed = true; > isExecuteStatementFailed = false; > stmtHandle = null; > } > {code} > The {{initFlags}} sets it to false: > {code} > private void initFlags() { > isCancelled = false; > isQueryClosed = false; > isLogBeingGenerated = true; > isExecuteStatementFailed = false; > isOperationComplete = false; > } > {code} > If {{getQueryLog}} is called after {{closeClientOperation}}, but > before {{initFlags}}, then we get the following warning if verbose > mode is set to true in BeeLine: > {code} > Warning: org.apache.hive.jdbc.ClosedOrCancelledStatementException: Method > getQueryLog() failed. The statement has been closed or cancelled. > (state=,code=0) > {code} > This caused the following test failure: > https://builds.apache.org/job/PreCommit-HIVE-Build/4691/testReport/org.apache.hadoop.hive.cli/TestBeeLineDriver/testCliDriver_smb_mapjoin_11_/ > {code} > Error Message > Client result comparison failed with error code = 1 while executing > fname=smb_mapjoin_11 > 16a17 > > Warning: org.apache.hive.jdbc.ClosedOrCancelledStatementException: Method > > getQueryLog() failed. The statement has been closed or cancelled. > > (state=,code=0) > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
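The window described above exists because the reset happens in two separate locked steps. A minimal, hypothetical sketch (simplified names, not the actual HIVE-16451 patch) of closing it by making the close-and-reinit transition atomic with respect to readers:

```java
// Illustrative sketch of the flag race (names simplified, not the actual
// patch). The racy path exposes an intermediate "closed" state between
// closeClientOperation() and initFlags(); the fixed path performs the
// whole transition under one lock.
public class StatementFlags {
    private final Object lock = new Object();
    private boolean isQueryClosed = false;

    // Racy version: two separate locked steps, as in runAsyncOnServer().
    public void racyReset() {
        synchronized (lock) { isQueryClosed = true; }   // closeClientOperation()
        // A concurrent getQueryLog() here observes isQueryClosed == true
        // and throws ClosedOrCancelledStatementException.
        synchronized (lock) { isQueryClosed = false; }  // initFlags()
    }

    // Fixed version: close and re-init as one atomic transition, so a
    // reader can never observe the intermediate state.
    public void atomicReset() {
        synchronized (lock) {
            isQueryClosed = false;
        }
    }

    // Stand-in for the flag check at the start of getQueryLog().
    public boolean queryClosed() {
        synchronized (lock) { return isQueryClosed; }
    }

    public static void main(String[] args) {
        StatementFlags f = new StatementFlags();
        f.atomicReset();
        System.out.println(f.queryClosed()); // prints "false"
    }
}
```

As the comment above notes, this only protects the happy path; full thread safety for every method pair is the subject of HIVE-16517.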
[jira] [Assigned] (HIVE-16518) Insert override for druid does not replace all existing segments
[ https://issues.apache.org/jira/browse/HIVE-16518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Bangarwa reassigned HIVE-16518: --- -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16426) Query cancel: improve the way to handle files
[ https://issues.apache.org/jira/browse/HIVE-16426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen updated HIVE-16426: Resolution: Fixed Fix Version/s: 3.0.0 Status: Resolved (was: Patch Available) Pushed to master. Thanks [~aihuaxu] and [~ctang.ma] for reviewing the patch. > Query cancel: improve the way to handle files > - > > Key: HIVE-16426 > URL: https://issues.apache.org/jira/browse/HIVE-16426 > Project: Hive > Issue Type: Improvement >Reporter: Yongzhi Chen >Assignee: Yongzhi Chen > Fix For: 3.0.0 > > Attachments: HIVE-16426.1.patch > > > 1. Add data structure support to make it easy to check query cancel status. > 2. Handle query cancel more gracefully. Remove possible file leaks caused by > query cancel, as shown in the following stack: > {noformat} > 2017-04-11 09:57:30,727 WARN org.apache.hadoop.hive.ql.exec.Utilities: > [HiveServer2-Background-Pool: Thread-149]: Failed to clean-up tmp directories. > java.io.InterruptedIOException: Call interrupted > at org.apache.hadoop.ipc.Client.call(Client.java:1496) > at org.apache.hadoop.ipc.Client.call(Client.java:1439) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230) > at com.sun.proxy.$Proxy20.delete(Unknown Source) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:535) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256) > at > org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104) > at com.sun.proxy.$Proxy21.delete(Unknown Source) > at 
org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:2059) > at > org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:675) > at > org.apache.hadoop.hdfs.DistributedFileSystem$13.doCall(DistributedFileSystem.java:671) > at > org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81) > at > org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:671) > at > org.apache.hadoop.hive.ql.exec.Utilities.clearWork(Utilities.java:277) > at > org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:463) > at > org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:142) > at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:214) > at > org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100) > at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1978) > at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1691) > at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1423) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1207) > at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1202) > at > org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:238) > at > org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:88) > at > org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:303) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920) > at > org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:316) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {noformat} > 3. Add checkpoints to related file operations to improve response time for > query cancelling. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
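The three points above can be sketched in a few lines. This is a hypothetical illustration, not the HIVE-16426 patch itself: keep cancel state where the file-handling code can check it cheaply, add checkpoints around each file operation, and record paths whose delete was interrupted so they can be retried later instead of being leaked.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch only; class and method names are stand-ins.
public class CancellableCleanup {
    private final AtomicBoolean cancelRequested = new AtomicBoolean(false);
    private final List<String> pendingDeletes = new ArrayList<>();

    // Stand-in for FileSystem.delete(), which may be interrupted mid-RPC.
    interface Deleter { void delete(String path) throws IOException; }

    public void cancel() { cancelRequested.set(true); }

    public void cleanupTmp(List<String> tmpPaths, Deleter fs) {
        for (String path : tmpPaths) {
            if (cancelRequested.get()) {         // checkpoint: respond to cancel fast
                pendingDeletes.add(path);        // defer, don't leak the file
                continue;
            }
            try {
                fs.delete(path);
            } catch (InterruptedIOException e) { // RPC interrupted by the cancel
                pendingDeletes.add(path);        // retry in a later cleanup pass
            } catch (IOException e) {
                pendingDeletes.add(path);
            }
        }
    }

    public List<String> pending() { return pendingDeletes; }

    public static void main(String[] args) {
        CancellableCleanup c = new CancellableCleanup();
        c.cancel();
        c.cleanupTmp(Arrays.asList("/tmp/hive/scratch1"), p -> {});
        System.out.println(c.pending()); // deferred instead of leaked
    }
}
```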
[jira] [Updated] (HIVE-15571) Support Insert into for druid storage handler
[ https://issues.apache.org/jira/browse/HIVE-15571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Bangarwa updated HIVE-15571: Attachment: HIVE-15571.01.patch > Support Insert into for druid storage handler > - > > Key: HIVE-15571 > URL: https://issues.apache.org/jira/browse/HIVE-15571 > Project: Hive > Issue Type: New Feature > Components: Druid integration >Reporter: slim bouguerra >Assignee: Nishant Bangarwa > Attachments: HIVE-15571.01.patch > > > Add support of the insert into operator for the druid storage handler. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16516) Set storage-api.version to 3.0.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/HIVE-16516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-16516: Status: Patch Available (was: Open) [~owen.omalley] could you take a look? ...am I right to change this to 3.0.0-SNAPSHOT, or should this property point to a 'released' version? > Set storage-api.version to 3.0.0-SNAPSHOT > - > > Key: HIVE-16516 > URL: https://issues.apache.org/jira/browse/HIVE-16516 > Project: Hive > Issue Type: Bug >Reporter: Zoltan Haindrich >Assignee: Zoltan Haindrich > Attachments: HIVE-16516.1.patch > > > I think the update of this property was missed during the preparation for 3.0.0; > I've bumped into this after cleaning the local .m2 repo caches. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16516) Set storage-api.version to 3.0.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/HIVE-16516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich updated HIVE-16516: Attachment: HIVE-16516.1.patch -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Assigned] (HIVE-16516) Set storage-api.version to 3.0.0-SNAPSHOT
[ https://issues.apache.org/jira/browse/HIVE-16516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zoltan Haindrich reassigned HIVE-16516: --- -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Updated] (HIVE-16449) BeeLineDriver should handle query result sorting
[ https://issues.apache.org/jira/browse/HIVE-16449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Vary updated HIVE-16449: -- Attachment: HIVE-16449.06.patch Retrigger the tests with the same patch file > BeeLineDriver should handle query result sorting > > > Key: HIVE-16449 > URL: https://issues.apache.org/jira/browse/HIVE-16449 > Project: Hive > Issue Type: Improvement > Components: Testing Infrastructure >Affects Versions: 3.0.0 >Reporter: Peter Vary >Assignee: Peter Vary > Attachments: HIVE-16449.02.patch, HIVE-16449.03.patch, > HIVE-16449.04.patch, HIVE-16449.05.patch, HIVE-16449.06.patch, > HIVE-16449.patch > > > The CLI driver supports the following features: > -- SORT_QUERY_RESULTS > -- HASH_QUERY_RESULTS > -- SORT_AND_HASH_QUERY_RESULTS > BeeLineDriver should find a way to support these -- This message was sent by Atlassian JIRA (v6.3.15#6346)
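For illustration, a rough sketch (not the actual CliDriver implementation) of what the listed directives amount to as post-processing of the query output before the golden-file comparison: SORT_QUERY_RESULTS sorts the result rows so row order is ignored, and HASH_QUERY_RESULTS collapses the rows into a single digest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch only; MD5 is an assumption, not necessarily the
// digest the real driver uses.
public class ResultPostProcess {
    // SORT_QUERY_RESULTS: make the comparison insensitive to row order.
    static List<String> sortResults(List<String> rows) {
        return rows.stream().sorted().collect(Collectors.toList());
    }

    // HASH_QUERY_RESULTS: replace the (possibly huge) result with a digest.
    static String hashResults(List<String> rows) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        for (String row : rows) {
            md.update(row.getBytes(StandardCharsets.UTF_8));
            md.update((byte) '\n');
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        List<String> a = Arrays.asList("2\tb", "1\ta");
        List<String> b = Arrays.asList("1\ta", "2\tb");
        // SORT_AND_HASH_QUERY_RESULTS: sort first, then hash, so two runs
        // differing only in row order produce the same digest.
        System.out.println(hashResults(sortResults(a)).equals(hashResults(sortResults(b)))); // prints "true"
    }
}
```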
[jira] [Updated] (HIVE-16450) Some metastore operations are not retried even with desired underlying exceptions
[ https://issues.apache.org/jira/browse/HIVE-16450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-16450: Attachment: HIVE-16450.2.patch > Some metastore operations are not retried even with desired underlying > exceptions > -- > > Key: HIVE-16450 > URL: https://issues.apache.org/jira/browse/HIVE-16450 > Project: Hive > Issue Type: Bug > Components: Metastore >Affects Versions: 2.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Attachments: HIVE-16450.1.patch, HIVE-16450.2.patch > > > In the RetryingHMSHandler class, we expect the operations to retry > when the cause of a MetaException is a JDOException or NucleusException. > {noformat} > if (e.getCause() instanceof MetaException && e.getCause().getCause() > != null) { > if (e.getCause().getCause() instanceof javax.jdo.JDOException || > e.getCause().getCause() instanceof NucleusException) { > // The JDOException or the Nucleus Exception may be wrapped > further in a MetaException > caughtException = e.getCause().getCause(); >} > {noformat} > While in ObjectStore, in many places we throw new MetaException(msg) > without the cause, so we miss retries in some cases. E.g., for the > following JDOException we should retry, but it is ignored. 
> {noformat} > 2017-04-04 17:28:21,602 ERROR metastore.ObjectStore > (ObjectStore.java:getMTableColumnStatistics(6555)) - Error retrieving > statistics via jdo > javax.jdo.JDOException: Exception thrown when executing query > at > org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:596) > at > org.datanucleus.api.jdo.JDOQuery.executeWithArray(JDOQuery.java:321) > at > org.apache.hadoop.hive.metastore.ObjectStore.getMTableColumnStatistics(ObjectStore.java:6546) > at > org.apache.hadoop.hive.metastore.ObjectStore.access$1200(ObjectStore.java:171) > at > org.apache.hadoop.hive.metastore.ObjectStore$9.getJdoResult(ObjectStore.java:6606) > at > org.apache.hadoop.hive.metastore.ObjectStore$9.getJdoResult(ObjectStore.java:6595) > at > org.apache.hadoop.hive.metastore.ObjectStore$GetHelper.run(ObjectStore.java:2633) > at > org.apache.hadoop.hive.metastore.ObjectStore.getTableColumnStatisticsInternal(ObjectStore.java:6594) > at > org.apache.hadoop.hive.metastore.ObjectStore.getTableColumnStatistics(ObjectStore.java:6588) > at sun.reflect.GeneratedMethodAccessor23.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:103) > at com.sun.proxy.$Proxy0.getTableColumnStatistics(Unknown Source) > at > org.apache.hadoop.hive.metastore.HiveAlterHandler.alterTableUpdateTableColumnStats(HiveAlterHandler.java:787) > at > org.apache.hadoop.hive.metastore.HiveAlterHandler.alterTable(HiveAlterHandler.java:247) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_core(HiveMetaStore.java:3809) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.alter_table_with_environment_context(HiveMetaStore.java:3779) > at sun.reflect.GeneratedMethodAccessor67.invoke(Unknown Source) > at > 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invokeInternal(RetryingHMSHandler.java:140) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:99) > at com.sun.proxy.$Proxy3.alter_table_with_environment_context(Unknown > Source) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_table_with_environment_context.getResult(ThriftHiveMetastore.java:9617) > at > org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$alter_table_with_environment_context.getResult(ThriftHiveMetastore.java:9601) > at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:110) > at > org.apache.hadoop.hive.metastore.TUGIBasedProcessor$1.run(TUGIBasedProcessor.java:106) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:415) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920) >
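The failure mode can be sketched with stand-in exception classes (local types replace the real Hive/JDO ones, and the real handler peels an extra reflective-invocation wrapper via e.getCause().getCause(), simplified to one level here): a MetaException constructed without its cause breaks the cause chain that the retry logic inspects.

```java
// Illustrative stand-ins, not the actual Hive or DataNucleus classes.
public class RetryCause {
    static class MetaException extends Exception {
        MetaException(String msg) { super(msg); }
        MetaException(String msg, Throwable cause) { super(msg, cause); }
    }
    static class JDOException extends RuntimeException {}

    // Simplified version of the RetryingHMSHandler instanceof check.
    static boolean shouldRetry(MetaException e) {
        return e.getCause() instanceof JDOException;
    }

    public static void main(String[] args) {
        JDOException jdo = new JDOException();
        // Cause dropped, as in "throw new MetaException(msg)": no retry.
        MetaException dropped = new MetaException("Error retrieving statistics via jdo");
        // Cause preserved: the retry logic can see the JDOException.
        MetaException preserved = new MetaException("Error retrieving statistics via jdo", jdo);
        System.out.println(shouldRetry(dropped));   // prints "false"
        System.out.println(shouldRetry(preserved)); // prints "true"
    }
}
```

The fix direction suggested by the description is therefore to attach the underlying JDOException or NucleusException as the cause wherever ObjectStore constructs a MetaException.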
[jira] [Updated] (HIVE-16450) Some metastore operations are not retried even with desired underlying exceptions
[ https://issues.apache.org/jira/browse/HIVE-16450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-16450: Attachment: (was: HIVE-16450.2.patch)
[jira] [Updated] (HIVE-16450) Some metastore operations are not retried even with desired underlying exceptions
[ https://issues.apache.org/jira/browse/HIVE-16450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-16450: Status: In Progress (was: Patch Available)