[jira] [Updated] (HIVE-14045) (Vectorization) Add missing case for BINARY in VectorizationContext.getNormalizedName method
[ https://issues.apache.org/jira/browse/HIVE-14045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-14045: Status: Patch Available (was: In Progress) > (Vectorization) Add missing case for BINARY in > VectorizationContext.getNormalizedName method > > > Key: HIVE-14045 > URL: https://issues.apache.org/jira/browse/HIVE-14045 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline > Fix For: 2.2.0 > > Attachments: HIVE-14045.01.patch, HIVE-14045.02.patch > > > Missing case for BINARY data type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
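The kind of fix described in this issue can be sketched roughly as follows. This is a hypothetical stand-in for VectorizationContext.getNormalizedName, not the actual Hive code: a type-name normalization switch where the BINARY case had been omitted, so binary columns fell through to the default branch.

```java
// Hypothetical sketch of a type-name normalization method with the
// previously missing BINARY case added. The real VectorizationContext
// implementation differs; this only illustrates the shape of the bug.
public class TypeNameNormalizer {
    public static String getNormalizedName(String typeName) {
        String lower = typeName.toLowerCase();
        switch (lower) {
            case "int":
            case "integer":
                return "int";
            case "string":
            case "varchar":
            case "char":
                return "string";
            case "binary":   // the missing case: without it, BINARY fell
                return "binary"; // through to the default branch
            default:
                return lower;
        }
    }
}
```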
[jira] [Updated] (HIVE-14045) (Vectorization) Add missing case for BINARY in VectorizationContext.getNormalizedName method
[ https://issues.apache.org/jira/browse/HIVE-14045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-14045: Attachment: HIVE-14045.02.patch > (Vectorization) Add missing case for BINARY in > VectorizationContext.getNormalizedName method > > > Key: HIVE-14045 > URL: https://issues.apache.org/jira/browse/HIVE-14045 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline > Fix For: 2.2.0 > > Attachments: HIVE-14045.01.patch, HIVE-14045.02.patch > > > Missing case for BINARY data type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14060) Hive: Remove bogus "localhost" from Hive splits
[ https://issues.apache.org/jira/browse/HIVE-14060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341114#comment-15341114 ] Hive QA commented on HIVE-14060: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12812001/HIVE-14060.1.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10235 tests executed *Failed tests:* {noformat} TestMiniTezCliDriver-vectorization_16.q-vector_decimal_round.q-orc_merge6.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hive.jdbc.TestJdbcWithMiniLlap.testLlapInputFormatEndToEnd {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/198/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/198/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-198/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12812001 - PreCommit-HIVE-MASTER-Build > Hive: Remove bogus "localhost" from Hive splits > --- > > Key: HIVE-14060 > URL: https://issues.apache.org/jira/browse/HIVE-14060 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 2.1.0, 2.2.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: HIVE-14060.1.patch > > > On remote filesystems like Azure, GCP and S3, the splits contain a filler > location of "localhost". > This is worse than having no location information at all - on large clusters > YARN waits up to 200[1] seconds for a heartbeat from "localhost" before > allocating a container. > To speed up this process, the split affinity provider should scrub the bogus > "localhost" from the locations and allow for the allocation of "*" containers > instead on each heartbeat. > [1] - yarn.scheduler.capacity.node-locality-delay=40 x heartbeat of 5s -- This message was sent by Atlassian JIRA (v6.3.4#6332)
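The scrub described in the issue can be sketched as below (hypothetical names, not the actual patch). The 200-second figure in the description is the node-locality delay of 40 scheduling opportunities multiplied by a 5-second node heartbeat interval; dropping the filler "localhost" entry leaves an empty location list, which schedulers treat as "*" (any node).

```java
import java.util.Arrays;

// Illustrative sketch only: remove the filler "localhost" entries from a
// split's location list so YARN can allocate the container anywhere ("*")
// instead of waiting ~200s (node-locality-delay=40 x 5s heartbeat) for a
// heartbeat from a host named "localhost".
public class SplitLocationScrubber {
    public static String[] scrub(String[] locations) {
        // Keep only real host names; an empty result means "no locality
        // preference", i.e. the scheduler may place the task on any node.
        return Arrays.stream(locations)
                .filter(host -> !"localhost".equals(host))
                .toArray(String[]::new);
    }
}
```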
[jira] [Updated] (HIVE-14045) (Vectorization) Add missing case for BINARY in VectorizationContext.getNormalizedName method
[ https://issues.apache.org/jira/browse/HIVE-14045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-14045: Status: In Progress (was: Patch Available) > (Vectorization) Add missing case for BINARY in > VectorizationContext.getNormalizedName method > > > Key: HIVE-14045 > URL: https://issues.apache.org/jira/browse/HIVE-14045 > Project: Hive > Issue Type: Bug > Components: Hive >Reporter: Matt McCline >Assignee: Matt McCline > Fix For: 2.2.0 > > Attachments: HIVE-14045.01.patch > > > Missing case for BINARY data type. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11527) bypass HiveServer2 thrift interface for query results
[ https://issues.apache.org/jira/browse/HIVE-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15341082#comment-15341082 ] Takanobu Asanuma commented on HIVE-11527: - BTW, somehow Jenkins did not run for HIVE-11527.10.patch. This time Jenkins is likely to run for the new patch. > bypass HiveServer2 thrift interface for query results > - > > Key: HIVE-11527 > URL: https://issues.apache.org/jira/browse/HIVE-11527 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Sergey Shelukhin >Assignee: Takanobu Asanuma > Attachments: HIVE-11527.10.patch, HIVE-11527.11.patch, > HIVE-11527.WIP.patch > > > Right now, HS2 reads query results and returns them to the caller via its > thrift API. > There should be an option for HS2 to return some pointer to results (an HDFS > link?) and for the user to read the results directly off HDFS inside the > cluster, or via something like WebHDFS outside the cluster > Review board link: https://reviews.apache.org/r/40867 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11527) bypass HiveServer2 thrift interface for query results
[ https://issues.apache.org/jira/browse/HIVE-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HIVE-11527: Status: Open (was: Patch Available) > bypass HiveServer2 thrift interface for query results > - > > Key: HIVE-11527 > URL: https://issues.apache.org/jira/browse/HIVE-11527 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Sergey Shelukhin >Assignee: Takanobu Asanuma > Attachments: HIVE-11527.10.patch, HIVE-11527.11.patch, > HIVE-11527.WIP.patch > > > Right now, HS2 reads query results and returns them to the caller via its > thrift API. > There should be an option for HS2 to return some pointer to results (an HDFS > link?) and for the user to read the results directly off HDFS inside the > cluster, or via something like WebHDFS outside the cluster > Review board link: https://reviews.apache.org/r/40867 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11527) bypass HiveServer2 thrift interface for query results
[ https://issues.apache.org/jira/browse/HIVE-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HIVE-11527: Status: Patch Available (was: Open) > bypass HiveServer2 thrift interface for query results > - > > Key: HIVE-11527 > URL: https://issues.apache.org/jira/browse/HIVE-11527 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Sergey Shelukhin >Assignee: Takanobu Asanuma > Attachments: HIVE-11527.10.patch, HIVE-11527.11.patch, > HIVE-11527.WIP.patch > > > Right now, HS2 reads query results and returns them to the caller via its > thrift API. > There should be an option for HS2 to return some pointer to results (an HDFS > link?) and for the user to read the results directly off HDFS inside the > cluster, or via something like WebHDFS outside the cluster > Review board link: https://reviews.apache.org/r/40867 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11527) bypass HiveServer2 thrift interface for query results
[ https://issues.apache.org/jira/browse/HIVE-11527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takanobu Asanuma updated HIVE-11527: Attachment: HIVE-11527.11.patch [~thejas] I uploaded a new patch to this jira and RB, and left some comments in RB. Could you please check it? > bypass HiveServer2 thrift interface for query results > - > > Key: HIVE-11527 > URL: https://issues.apache.org/jira/browse/HIVE-11527 > Project: Hive > Issue Type: Improvement > Components: HiveServer2 >Reporter: Sergey Shelukhin >Assignee: Takanobu Asanuma > Attachments: HIVE-11527.10.patch, HIVE-11527.11.patch, > HIVE-11527.WIP.patch > > > Right now, HS2 reads query results and returns them to the caller via its > thrift API. > There should be an option for HS2 to return some pointer to results (an HDFS > link?) and for the user to read the results directly off HDFS inside the > cluster, or via something like WebHDFS outside the cluster > Review board link: https://reviews.apache.org/r/40867 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
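The proposal in the description can be illustrated with a toy sketch (hypothetical names and local files standing in for HDFS; not the actual HS2 API): the server persists results and returns a pointer to them, and the client fetches rows straight from storage rather than over the thrift channel.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Toy illustration of "return a pointer to results" (hypothetical names;
// a local file stands in for an HDFS location).
public class ResultPointerDemo {
    /** Server side: persist results, return a pointer to them. */
    public static Path writeResults(List<String> rows, Path dir) throws IOException {
        Path results = dir.resolve("query-results.txt");
        return Files.write(results, rows);
    }

    /** Client side: read rows straight from storage, bypassing the RPC layer. */
    public static List<String> readResults(Path pointer) throws IOException {
        return Files.readAllLines(pointer);
    }
}
```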
[jira] [Updated] (HIVE-13872) Vectorization: Fix cross-product reduce sink serialization
[ https://issues.apache.org/jira/browse/HIVE-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matt McCline updated HIVE-13872: Attachment: HIVE-13872.03.patch > Vectorization: Fix cross-product reduce sink serialization > -- > > Key: HIVE-13872 > URL: https://issues.apache.org/jira/browse/HIVE-13872 > Project: Hive > Issue Type: Bug > Components: Vectorization >Affects Versions: 2.1.0 >Reporter: Gopal V >Assignee: Matt McCline > Attachments: HIVE-13872.01.patch, HIVE-13872.02.patch, > HIVE-13872.03.patch, HIVE-13872.WIP.patch, customer_demographics.txt, > vector_include_no_sel.q, vector_include_no_sel.q.out > > > TPC-DS Q13 produces a cross-product without CBO simplifying the query > {code} > Caused by: java.lang.RuntimeException: null STRING entry: batchIndex 0 > projection column num 1 > at > org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.nullBytesReadError(VectorExtractRow.java:349) > at > org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRowColumn(VectorExtractRow.java:267) > at > org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRow(VectorExtractRow.java:343) > at > org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.process(VectorReduceSinkOperator.java:103) > at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837) > at > org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130) > at > org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:762) > ... 
18 more > {code} > Simplified query > {code} > set hive.cbo.enable=false; > -- explain > select count(1) > from store_sales > ,customer_demographics > where ( > ( > customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk > and customer_demographics.cd_marital_status = 'M' > )or > ( >customer_demographics.cd_demo_sk = ss_cdemo_sk > and customer_demographics.cd_marital_status = 'U' > )) > ; > {code} > {code} > Map 3 > Map Operator Tree: > TableScan > alias: customer_demographics > Statistics: Num rows: 1920800 Data size: 717255532 Basic > stats: COMPLETE Column stats: NONE > Reduce Output Operator > sort order: > Statistics: Num rows: 1920800 Data size: 717255532 Basic > stats: COMPLETE Column stats: NONE > value expressions: cd_demo_sk (type: int), > cd_marital_status (type: string) > Execution mode: vectorized, llap > LLAP IO: all inputs > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13982) Extensions to RS dedup: execute with different column order and sorting direction if possible
[ https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-13982: --- Attachment: HIVE-13982.4.patch [~ashutoshc], the latest version of the patch deals with the PTF operator, which could indeed cause problems if we ignore the order direction. I have updated the RB link accordingly. > Extensions to RS dedup: execute with different column order and sorting > direction if possible > - > > Key: HIVE-13982 > URL: https://issues.apache.org/jira/browse/HIVE-13982 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-13982.2.patch, HIVE-13982.3.patch, > HIVE-13982.4.patch, HIVE-13982.patch > > > Pointed out by [~gopalv]. > RS dedup should kick in for these cases, avoiding an additional shuffle stage. > {code} > select state, city, sum(sales) from table > group by state, city > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state desc, city > limit 10; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13982) Extensions to RS dedup: execute with different column order and sorting direction if possible
[ https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-13982: --- Attachment: (was: HIVE-13982.4.patch) > Extensions to RS dedup: execute with different column order and sorting > direction if possible > - > > Key: HIVE-13982 > URL: https://issues.apache.org/jira/browse/HIVE-13982 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-13982.2.patch, HIVE-13982.3.patch, HIVE-13982.patch > > > Pointed out by [~gopalv]. > RS dedup should kick in for these cases, avoiding an additional shuffle stage. > {code} > select state, city, sum(sales) from table > group by state, city > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state desc, city > limit 10; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14069) update curator version to 2.10.0
[ https://issues.apache.org/jira/browse/HIVE-14069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-14069: - Target Version/s: 2.2.0 Status: Patch Available (was: Open) > update curator version to 2.10.0 > - > > Key: HIVE-14069 > URL: https://issues.apache.org/jira/browse/HIVE-14069 > Project: Hive > Issue Type: Improvement > Components: HiveServer2, Metastore >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Attachments: HIVE-14069.1.patch > > > curator-2.10.0 has several bug fixes over current version (2.6.0), updating > would help improve stability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14069) update curator version to 2.10.0
[ https://issues.apache.org/jira/browse/HIVE-14069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-14069: - Attachment: HIVE-14069.1.patch > update curator version to 2.10.0 > - > > Key: HIVE-14069 > URL: https://issues.apache.org/jira/browse/HIVE-14069 > Project: Hive > Issue Type: Improvement > Components: HiveServer2, Metastore >Reporter: Thejas M Nair >Assignee: Thejas M Nair > Attachments: HIVE-14069.1.patch > > > curator-2.10.0 has several bug fixes over current version (2.6.0), updating > would help improve stability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14069) update curator version to 2.10.0
[ https://issues.apache.org/jira/browse/HIVE-14069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-14069: - Issue Type: Improvement (was: Bug) > update curator version to 2.10.0 > - > > Key: HIVE-14069 > URL: https://issues.apache.org/jira/browse/HIVE-14069 > Project: Hive > Issue Type: Improvement > Components: HiveServer2, Metastore >Reporter: Thejas M Nair >Assignee: Thejas M Nair > > curator-2.10.0 has several bug fixes over current version (2.6.0), updating > would help improve stability. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14060) Hive: Remove bogus "localhost" from Hive splits
[ https://issues.apache.org/jira/browse/HIVE-14060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340988#comment-15340988 ] Gopal V commented on HIVE-14060: This happens to any FS which calls FileSystem.listLocatedStatus via super(). https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java#L697 > Hive: Remove bogus "localhost" from Hive splits > --- > > Key: HIVE-14060 > URL: https://issues.apache.org/jira/browse/HIVE-14060 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 2.1.0, 2.2.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: HIVE-14060.1.patch > > > On remote filesystems like Azure, GCP and S3, the splits contain a filler > location of "localhost". > This is worse than having no location information at all - on large clusters > yarn waits upto 200[1] seconds for heartbeat from "localhost" before > allocating a container. > To speed up this process, the split affinity provider should scrub the bogus > "localhost" from the locations and allow for the allocation of "*" containers > instead on each heartbeat. > [1] - yarn.scheduler.capacity.node-locality-delay=40 x heartbeat of 5s -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13725) ACID: Streaming API should synchronize calls when multiple threads use the same endpoint
[ https://issues.apache.org/jira/browse/HIVE-13725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340986#comment-15340986 ] Hive QA commented on HIVE-13725: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12812006/HIVE-13725.2.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10250 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hive.jdbc.TestJdbcWithMiniLlap.testLlapInputFormatEndToEnd {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/197/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/197/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-197/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12812006 - PreCommit-HIVE-MASTER-Build > ACID: Streaming API should synchronize calls when multiple threads use the > same endpoint > > > Key: HIVE-13725 > URL: https://issues.apache.org/jira/browse/HIVE-13725 > Project: Hive > Issue Type: Bug > Components: HCatalog, Metastore, Transactions >Affects Versions: 1.2.1, 2.0.0 >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta >Priority: Critical > Labels: ACID, Streaming > Attachments: HIVE-13725.1.patch, HIVE-13725.2.patch > > > Currently, the streaming endpoint creates a metastore client which gets used > for RPC. The client itself is not internally thread safe. Therefore, the API > methods should provide the relevant synchronization so that the methods can > be called from different threads. A sample use case is as follows: > 1. Thread 1 creates a streaming endpoint and opens a txn batch. > 2. Thread 2 heartbeats the txn batch. > With the current impl, this can result in an "out of sequence response", > since the response of the calls in thread1 might end up going to thread2 and > vice-versa. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
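The race in the description can be sketched as follows (hypothetical names, not the actual streaming API): two threads sharing one non-thread-safe RPC client must serialize their calls on a common lock, otherwise a response can be consumed by the wrong thread, producing the "out of sequence response" error.

```java
// Illustrative sketch only: serialize all calls that share one underlying
// metastore client on a common lock, so request/response pairs stay
// matched to the calling thread.
public class SynchronizedEndpoint {
    private final Object lock = new Object();
    private int txnCounter = 0;

    /** Thread 1: open a txn batch. The RPC would run while holding the lock. */
    public int openTxnBatch() {
        synchronized (lock) {
            return ++txnCounter;
        }
    }

    /** Thread 2: heartbeat the same batch, serialized on the same lock. */
    public void heartbeat(int txnId) {
        synchronized (lock) {
            // The heartbeat RPC would go here; holding the lock prevents its
            // response from being interleaved with thread 1's calls.
        }
    }
}
```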
[jira] [Updated] (HIVE-13985) ORC improvements for reducing the file system calls in task side
[ https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13985: - Fix Version/s: (was: 2.1.0) 2.1.1 > ORC improvements for reducing the file system calls in task side > > > Key: HIVE-13985 > URL: https://issues.apache.org/jira/browse/HIVE-13985 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 1.3.0, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 1.3.0, 2.2.0, 2.1.1 > > Attachments: HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, > HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, > HIVE-13985-branch-2.1.patch, HIVE-13985.1.patch, HIVE-13985.2.patch, > HIVE-13985.3.patch, HIVE-13985.4.patch, HIVE-13985.5.patch, HIVE-13985.6.patch > > > HIVE-13840 fixed some issues with additional file system invocations during > split generation. Similarly, this jira will fix issues with additional file > system invocations on the task side. To avoid reading footers on the task > side, users can set hive.orc.splits.include.file.footer to true which will > serialize the orc footers on the splits. But this has issues with serializing > unwanted information like column statistics and other metadata which are not > really required for reading the orc split on the task side. We can reduce the > payload on the orc splits by serializing only the minimum required > information (stripe information, types, compression details). This will > decrease the payload on the orc splits and can potentially avoid OOMs in the > application master (AM) during split generation. This jira also addresses other > issues concerning the AM cache. The local cache used by the AM is a soft-reference > cache. This can introduce unpredictability across multiple runs of the same > query. We can cache the serialized footer in the local cache and also use a > strong-reference cache, which should avoid memory pressure and will have > better predictability. 
> One other improvement that we can do is when > hive.orc.splits.include.file.footer is set to false, on the task side we make > one additional file system call to know the size of the file. If we can > serialize the file length in the orc split this can be avoided. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
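The reduced split payload described above can be sketched as a plain value class (hypothetical names; the actual patch differs): carry only what the task side needs to read the split, plus the file length so no extra file-system call is required when the footer is not included.

```java
import java.io.Serializable;
import java.util.List;

// Hypothetical sketch of a slimmed-down per-split payload: stripe layout,
// type description, and compression, but no column statistics or other
// footer metadata the task side never reads. The file length is carried
// along to avoid an extra getFileStatus() call on the task.
public class SlimOrcSplitInfo implements Serializable {
    final List<long[]> stripes;   // {offset, length} per stripe
    final byte[] types;           // serialized type description
    final String compression;     // e.g. "ZLIB", "SNAPPY"
    final long fileLength;        // avoids one file system call per task

    SlimOrcSplitInfo(List<long[]> stripes, byte[] types,
                     String compression, long fileLength) {
        this.stripes = stripes;
        this.types = types;
        this.compression = compression;
        this.fileLength = fileLength;
    }
}
```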
[jira] [Commented] (HIVE-13985) ORC improvements for reducing the file system calls in task side
[ https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340893#comment-15340893 ] Prasanth Jayachandran commented on HIVE-13985: -- The last test run was an initialization test failure. I ran the tests locally to make sure this patch did not break anything, and they ran successfully. > ORC improvements for reducing the file system calls in task side > > > Key: HIVE-13985 > URL: https://issues.apache.org/jira/browse/HIVE-13985 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 1.3.0, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 1.3.0, 2.1.0, 2.2.0 > > Attachments: HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, > HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, > HIVE-13985-branch-2.1.patch, HIVE-13985.1.patch, HIVE-13985.2.patch, > HIVE-13985.3.patch, HIVE-13985.4.patch, HIVE-13985.5.patch, HIVE-13985.6.patch > > > HIVE-13840 fixed some issues with additional file system invocations during > split generation. Similarly, this jira will fix issues with additional file > system invocations on the task side. To avoid reading footers on the task > side, users can set hive.orc.splits.include.file.footer to true which will > serialize the orc footers on the splits. But this has issues with serializing > unwanted information like column statistics and other metadata which are not > really required for reading the orc split on the task side. We can reduce the > payload on the orc splits by serializing only the minimum required > information (stripe information, types, compression details). This will > decrease the payload on the orc splits and can potentially avoid OOMs in the > application master (AM) during split generation. This jira also addresses other > issues concerning the AM cache. The local cache used by the AM is a soft-reference > cache. This can introduce unpredictability across multiple runs of the same > query. 
We can cache the serialized footer in the local cache and also use a > strong-reference cache, which should avoid memory pressure and will have > better predictability. > One other improvement that we can do is when > hive.orc.splits.include.file.footer is set to false, on the task side we make > one additional file system call to know the size of the file. If we can > serialize the file length in the orc split this can be avoided. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13744) LLAP IO - add complex types support
[ https://issues.apache.org/jira/browse/HIVE-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13744: - Attachment: HIVE-13744.2.patch Fixed the comment > LLAP IO - add complex types support > --- > > Key: HIVE-13744 > URL: https://issues.apache.org/jira/browse/HIVE-13744 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Sergey Shelukhin >Assignee: Prasanth Jayachandran > Labels: llap, orc > Attachments: HIVE-13744.1.patch, HIVE-13744.2.patch > > > Recently, complex type column vectors were added to Hive. We should use them > in IO elevator. > Vectorization itself doesn't support complex types (yet), but this would be > useful when it does, also it will enable LLAP IO elevator to be used in > non-vectorized context with complex types after HIVE-13617 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14062) Changes from HIVE-13502 overwritten by HIVE-13566
[ https://issues.apache.org/jira/browse/HIVE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340867#comment-15340867 ] Aihua Xu commented on HIVE-14062: - The patch looks good. +1 pending test. > Changes from HIVE-13502 overwritten by HIVE-13566 > - > > Key: HIVE-14062 > URL: https://issues.apache.org/jira/browse/HIVE-14062 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 2.1.0 >Reporter: Naveen Gangam >Assignee: Naveen Gangam > Attachments: HIVE-14062.1.patch > > > Appears that changes from HIVE-13566 overwrote the changes from HIVE-13502. I > will confirm with the author that it was inadvertent before I re-add it. > Thanks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13744) LLAP IO - add complex types support
[ https://issues.apache.org/jira/browse/HIVE-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340865#comment-15340865 ] Sergey Shelukhin commented on HIVE-13744: - +1, small nit on rb > LLAP IO - add complex types support > --- > > Key: HIVE-13744 > URL: https://issues.apache.org/jira/browse/HIVE-13744 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Sergey Shelukhin >Assignee: Prasanth Jayachandran > Labels: llap, orc > Attachments: HIVE-13744.1.patch > > > Recently, complex type column vectors were added to Hive. We should use them > in IO elevator. > Vectorization itself doesn't support complex types (yet), but this would be > useful when it does, also it will enable LLAP IO elevator to be used in > non-vectorized context with complex types after HIVE-13617 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13913) LLAP: introduce backpressure to recordreader
[ https://issues.apache.org/jira/browse/HIVE-13913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340861#comment-15340861 ] Sergey Shelukhin commented on HIVE-13913: - [~gopalv] do you want to review? > LLAP: introduce backpressure to recordreader > > > Key: HIVE-13913 > URL: https://issues.apache.org/jira/browse/HIVE-13913 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13913.01.patch, HIVE-13913.02.patch, > HIVE-13913.03.patch, HIVE-13913.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14065) Provide an API for making Hive read-only for a short period
[ https://issues.apache.org/jira/browse/HIVE-14065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340856#comment-15340856 ] Mohit Sabharwal commented on HIVE-14065: [~alangates], I think Colin is requesting an API that effectively takes a (shared?) lock at the metastore level, disallowing all writes that currently each need an exclusive Zk lock. > Provide an API for making Hive read-only for a short period > --- > > Key: HIVE-14065 > URL: https://issues.apache.org/jira/browse/HIVE-14065 > Project: Hive > Issue Type: Improvement >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > > HIVE-7973 added a notification log which allows clients to do incremental > replication of the Hive metastore. However, it is a challenge to get the > initial state of the Hive database. Using existing APIs may give us an > inconsistent state. For example, if a Hive table is renamed while we're > loading all tables, we may miss that information. > The easiest way to fix this would be to provide an API for making Hive > read-only for a short period. This locking API would come with a timeout so > that if the locker failed, the system would not stay down. It would return > an ID which uniquely identified the lock instance. The read-only lock itself > could be implemented by taking all the ZooKeeper locks. The RPC for removing > the lock would return back a status indicating whether the lock had timed out > before being removed or not. If it had timed out, we could retry our > snapshot loading process with a longer timeout period. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
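The API shape proposed in the description can be sketched as follows. All names here are hypothetical (none of this exists in Hive): a read-only lock is taken with a timeout so a failed locker cannot keep the system down, an ID uniquely identifies the lock instance, and release reports whether the lock had already timed out.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the proposed read-only-lock API. Real
// coordination would go through the metastore / ZooKeeper, not an
// in-memory map; this only illustrates the contract.
public class ReadOnlyLockManager {
    private final ConcurrentHashMap<String, Long> expiries = new ConcurrentHashMap<>();

    /** Make the system read-only; returns an id identifying this lock instance. */
    public String lockReadOnly(long timeoutMs) {
        String id = UUID.randomUUID().toString();
        expiries.put(id, System.currentTimeMillis() + timeoutMs);
        return id;
    }

    /**
     * Release the lock. Returns true if the lock was still held, false if
     * it had already timed out (the caller should then retry its snapshot
     * with a longer timeout).
     */
    public boolean unlock(String id) {
        Long expiry = expiries.remove(id);
        return expiry != null && System.currentTimeMillis() < expiry;
    }
}
```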
[jira] [Updated] (HIVE-14055) directSql - getting the number of partitions is broken
[ https://issues.apache.org/jira/browse/HIVE-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14055: Attachment: HIVE-14055.01.patch Nm, this actually uses filter not expr... I just c/ped the better way to handle it without the exception from the other filter call. If GetHelper::getSqlResult either throws or disables directsql for itself, the caller falls back to JDO. Added a comment to this effect... it looks like JDO path also exists for this call. > directSql - getting the number of partitions is broken > -- > > Key: HIVE-14055 > URL: https://issues.apache.org/jira/browse/HIVE-14055 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14055.01.patch, HIVE-14055.patch > > > Noticed while looking at something else. If the filter cannot be pushed down > it just returns 0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
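The fallback pattern described in the comment can be sketched like this (hypothetical names, not the actual GetHelper code): try the direct-SQL path first and fall back to the JDO path when it throws, instead of silently returning 0 as the broken code did.

```java
import java.util.function.Supplier;

// Illustrative sketch only: if the fast direct-SQL path fails (e.g. the
// filter cannot be pushed down), fall back to the slower but more general
// JDO path rather than returning a bogus 0.
public class DirectSqlFallback {
    public static int getNumPartitions(Supplier<Integer> directSql, Supplier<Integer> jdo) {
        try {
            return directSql.get();
        } catch (RuntimeException e) {
            // Direct SQL failed for this query; the JDO path still works.
            return jdo.get();
        }
    }
}
```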
[jira] [Updated] (HIVE-14038) miscellaneous acid improvements
[ https://issues.apache.org/jira/browse/HIVE-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-14038: -- Status: Patch Available (was: Open) > miscellaneous acid improvements > --- > > Key: HIVE-14038 > URL: https://issues.apache.org/jira/browse/HIVE-14038 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-14038.2.patch, HIVE-14038.3.patch, > HIVE-14038.4.patch, HIVE-14038.5.patch, HIVE-14038.6.patch, > HIVE-14038.7.patch, HIVE-14038.patch > > > 1. fix thread name in HouseKeeperServiceBase (currently they are all > "org.apache.hadoop.hive.ql.txn.compactor.HouseKeeperServiceBase$1-0") > 2. dump metastore configs from HiveConf on start up to help record values of > properties > 3. add some tests -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14038) miscellaneous acid improvements
[ https://issues.apache.org/jira/browse/HIVE-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-14038: -- Attachment: HIVE-14038.7.patch > miscellaneous acid improvements > --- > > Key: HIVE-14038 > URL: https://issues.apache.org/jira/browse/HIVE-14038 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-14038.2.patch, HIVE-14038.3.patch, > HIVE-14038.4.patch, HIVE-14038.5.patch, HIVE-14038.6.patch, > HIVE-14038.7.patch, HIVE-14038.patch > > > 1. fix thread name in HouseKeeperServiceBase (currently they are all > "org.apache.hadoop.hive.ql.txn.compactor.HouseKeeperServiceBase$1-0") > 2. dump metastore configs from HiveConf on start up to help record values of > properties > 3. add some tests -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14038) miscellaneous acid improvements
[ https://issues.apache.org/jira/browse/HIVE-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-14038: -- Status: Open (was: Patch Available) > miscellaneous acid improvements > --- > > Key: HIVE-14038 > URL: https://issues.apache.org/jira/browse/HIVE-14038 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-14038.2.patch, HIVE-14038.3.patch, > HIVE-14038.4.patch, HIVE-14038.5.patch, HIVE-14038.6.patch, > HIVE-14038.7.patch, HIVE-14038.patch > > > 1. fix thread name in HouseKeeperServiceBase (currently they are all > "org.apache.hadoop.hive.ql.txn.compactor.HouseKeeperServiceBase$1-0") > 2. dump metastore configs from HiveConf on start up to help record values of > properties > 3. add some tests -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13985) ORC improvements for reducing the file system calls in task side
[ https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340842#comment-15340842 ] Hive QA commented on HIVE-13985: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12811988/HIVE-13985.6.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 22 failed/errored test(s), 10234 tests executed *Failed tests:* {noformat} TestMiniTezCliDriver-vector_interval_2.q-dynamic_partition_pruning.q-vectorization_10.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_13 org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.org.apache.hadoop.hive.cli.TestMiniTezCliDriver org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cbo_subq_in org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_cte_4 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_delete_all_non_partitioned org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_disable_merge_for_bucketing org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_empty_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_groupby1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_groupby3 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_insert_into2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_orc_merge7 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_partition_column_names_with_leading_and_trailing_spaces org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_schema_evol_orc_vec_mapwork_part_all_primitive 
org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_groupby_reduce org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_inner_join org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vector_struct_in org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_vectorized_case org.apache.hive.jdbc.TestJdbcWithMiniLlap.testLlapInputFormatEndToEnd {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/196/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/196/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-196/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 22 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12811988 - PreCommit-HIVE-MASTER-Build > ORC improvements for reducing the file system calls in task side > > > Key: HIVE-13985 > URL: https://issues.apache.org/jira/browse/HIVE-13985 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 1.3.0, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 1.3.0, 2.1.0, 2.2.0 > > Attachments: HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, > HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, > HIVE-13985-branch-2.1.patch, HIVE-13985.1.patch, HIVE-13985.2.patch, > HIVE-13985.3.patch, HIVE-13985.4.patch, HIVE-13985.5.patch, HIVE-13985.6.patch > > > HIVE-13840 fixed some issues with additional file system invocations during > split generation. Similarly, this jira will fix issues with additional file > system invocations on the task side.
To avoid reading footers on the task > side, users can set hive.orc.splits.include.file.footer to true which will > serialize the orc footers on the splits. But this has issues with serializing > unwanted information like column statistics and other metadata which are not > really required for reading orc split on the task side. We can reduce the > payload on the orc splits by serializing only the minimum required > information (stripe information, types, compression details). This will > decrease the payload on the orc splits and can potentially avoid OOMs in > application master (AM) during split generation. This jira also addresses other > issues concerning the AM cache. The local cache used by the AM is a soft reference > cache. This can introduce unpredictability across multiple runs of the same > query. We can cache the serialized footer in the local cache and also use a strong > reference cache which should avoid memory pressure and will have better > predictability.
[jira] [Commented] (HIVE-13985) ORC improvements for reducing the file system calls in task side
[ https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340834#comment-15340834 ] Prasanth Jayachandran commented on HIVE-13985: -- The test failures are not related btw. > ORC improvements for reducing the file system calls in task side > > > Key: HIVE-13985 > URL: https://issues.apache.org/jira/browse/HIVE-13985 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 1.3.0, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 1.3.0, 2.1.0, 2.2.0 > > Attachments: HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, > HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, > HIVE-13985-branch-2.1.patch, HIVE-13985.1.patch, HIVE-13985.2.patch, > HIVE-13985.3.patch, HIVE-13985.4.patch, HIVE-13985.5.patch, HIVE-13985.6.patch > > > HIVE-13840 fixed some issues with additional file system invocations during > split generation. Similarly, this jira will fix issues with additional file > system invocations on the task side. To avoid reading footers on the task > side, users can set hive.orc.splits.include.file.footer to true which will > serialize the orc footers on the splits. But this has issues with serializing > unwanted information like column statistics and other metadata which are not > really required for reading orc split on the task side. We can reduce the > payload on the orc splits by serializing only the minimum required > information (stripe information, types, compression details). This will > decrease the payload on the orc splits and can potentially avoid OOMs in > application master (AM) during split generation. This jira also addresses other > issues concerning the AM cache. The local cache used by the AM is a soft reference > cache. This can introduce unpredictability across multiple runs of the same > query. 
We can cache the serialized footer in the local cache and also use a > strong reference cache which should avoid memory pressure and will have > better predictability. > One other improvement that we can do is when > hive.orc.splits.include.file.footer is set to false, on the task side we make > one additional file system call to know the size of the file. If we can > serialize the file length in the orc split, this can be avoided. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
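The AM-cache point in the description above (swapping the soft-reference cache for a strong-reference cache so that eviction is deterministic rather than GC-driven) can be illustrated with a small bounded LRU cache. This is only a sketch under that assumption, not the actual Hive AM cache:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a bounded strong-reference footer cache. Unlike a
// SoftReference-based cache, entries are evicted deterministically
// (least-recently-used first) instead of whenever the GC feels memory
// pressure, which makes repeated runs of the same query predictable.
public class FooterCache {
    private final int maxEntries;
    private final LinkedHashMap<String, byte[]> cache;

    public FooterCache(int maxEntries) {
        this.maxEntries = maxEntries;
        // accessOrder=true gives LRU iteration order; removeEldestEntry
        // bounds the cache so memory use stays capped.
        this.cache = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > FooterCache.this.maxEntries;
            }
        };
    }

    public synchronized void put(String path, byte[] serializedFooter) {
        cache.put(path, serializedFooter);
    }

    public synchronized byte[] get(String path) {
        return cache.get(path);
    }

    public static void main(String[] args) {
        FooterCache cache = new FooterCache(2);
        cache.put("/warehouse/t/f1.orc", new byte[]{1});
        cache.put("/warehouse/t/f2.orc", new byte[]{2});
        cache.get("/warehouse/t/f1.orc");           // touch f1; f2 is now LRU
        cache.put("/warehouse/t/f3.orc", new byte[]{3});
        System.out.println(cache.get("/warehouse/t/f2.orc") == null); // prints true
    }
}
```

Caching the serialized footer bytes (rather than deserialized objects) also keeps the per-entry footprint small and bounded, which is the other half of the trade-off described in the issue.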
[jira] [Updated] (HIVE-13985) ORC improvements for reducing the file system calls in task side
[ https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13985: - Resolution: Fixed Fix Version/s: 2.2.0 2.1.0 1.3.0 Status: Resolved (was: Patch Available) Thanks [~sershe] for the reviews! Committed to branch-2.1 and master as well. > ORC improvements for reducing the file system calls in task side > > > Key: HIVE-13985 > URL: https://issues.apache.org/jira/browse/HIVE-13985 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 1.3.0, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Fix For: 1.3.0, 2.1.0, 2.2.0 > > Attachments: HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, > HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, > HIVE-13985-branch-2.1.patch, HIVE-13985.1.patch, HIVE-13985.2.patch, > HIVE-13985.3.patch, HIVE-13985.4.patch, HIVE-13985.5.patch, HIVE-13985.6.patch > > > HIVE-13840 fixed some issues with additional file system invocations during > split generation. Similarly, this jira will fix issues with additional file > system invocations on the task side. To avoid reading footers on the task > side, users can set hive.orc.splits.include.file.footer to true which will > serialize the orc footers on the splits. But this has issues with serializing > unwanted information like column statistics and other metadata which are not > really required for reading orc split on the task side. We can reduce the > payload on the orc splits by serializing only the minimum required > information (stripe information, types, compression details). This will > decrease the payload on the orc splits and can potentially avoid OOMs in > application master (AM) during split generation. This jira also addresses other > issues concerning the AM cache. The local cache used by the AM is a soft reference > cache. This can introduce unpredictability across multiple runs of the same > query. 
We can cache the serialized footer in the local cache and also use a > strong reference cache which should avoid memory pressure and will have > better predictability. > One other improvement that we can do is when > hive.orc.splits.include.file.footer is set to false, on the task side we make > one additional file system call to know the size of the file. If we can > serialize the file length in the orc split, this can be avoided. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-14066) LLAP: Orc encoded data reader should support complex types
[ https://issues.apache.org/jira/browse/HIVE-14066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340826#comment-15340826 ] Sergey Shelukhin edited comment on HIVE-14066 at 6/21/16 1:07 AM: -- heh... it's better to close the old one, since this one already had the patch... I was just being nitpicky was (Author: sershe): heh... it was better to close the old one, since this one already had the patch... I was just being nitpicky > LLAP: Orc encoded data reader should support complex types > -- > > Key: HIVE-14066 > URL: https://issues.apache.org/jira/browse/HIVE-14066 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Sergey Shelukhin >Assignee: Prasanth Jayachandran > Attachments: HIVE-14066.1.patch > > > Currently LLAP encoded data reader does not support complex types. Now that > ORC supports reading complex vectors we should support in LLAP as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14055) directSql - getting the number of partitions is broken
[ https://issues.apache.org/jira/browse/HIVE-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340824#comment-15340824 ] Sergey Shelukhin commented on HIVE-14055: - null means "filter cannot be pushed down", which is a normal condition, many filters cannot be pushed down; some of the methods (e.g. get partitions) evaluate it in metastore instead, using partition name list, some give up and fall back to ORM path. Perhaps I can c/p the evaluation path. > directSql - getting the number of partitions is broken > -- > > Key: HIVE-14055 > URL: https://issues.apache.org/jira/browse/HIVE-14055 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14055.patch > > > Noticed while looking at something else. If the filter cannot be pushed down > it just returns 0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
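The fallback behavior described in this comment and the patch comment above it (a null result means "filter cannot be pushed down", which is a normal condition; a throw or a disabled direct-SQL path means fall back to the JDO/ORM path) can be sketched as follows. The names here are hypothetical, loosely modeled on the GetHelper idiom mentioned in the thread, not the actual metastore classes:

```java
import java.util.function.Supplier;

// Illustrative fallback pattern for the metastore read path: try the
// direct-SQL implementation first, and fall back to the slower JDO/ORM
// path when the filter cannot be pushed down (null result) or the SQL
// path throws. Names are hypothetical sketches, not Hive classes.
public class PartitionCountFetcher {
    public static int getNumPartitions(Supplier<Integer> directSql,
                                       Supplier<Integer> jdoFallback) {
        try {
            Integer result = directSql.get();
            if (result != null) {
                return result;
            }
            // null signals "filter cannot be pushed down" -- a normal
            // condition for many filters, not an error; fall through.
        } catch (RuntimeException e) {
            // Direct SQL failed; disable it for this call and fall back.
        }
        return jdoFallback.get();
    }

    public static void main(String[] args) {
        // Direct SQL declines (null), so the JDO path supplies the answer.
        System.out.println(getNumPartitions(() -> null, () -> 42)); // prints 42
    }
}
```

The bug in the issue description is exactly the absence of this fallback: returning 0 when pushdown fails conflates "no partitions matched" with "could not evaluate the filter".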
[jira] [Commented] (HIVE-14066) LLAP: Orc encoded data reader should support complex types
[ https://issues.apache.org/jira/browse/HIVE-14066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340826#comment-15340826 ] Sergey Shelukhin commented on HIVE-14066: - heh... it was better to close the old one, since this one already had the patch... I was just being nitpicky > LLAP: Orc encoded data reader should support complex types > -- > > Key: HIVE-14066 > URL: https://issues.apache.org/jira/browse/HIVE-14066 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Sergey Shelukhin >Assignee: Prasanth Jayachandran > Attachments: HIVE-14066.1.patch > > > Currently LLAP encoded data reader does not support complex types. Now that > ORC supports reading complex vectors we should support in LLAP as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13982) Extensions to RS dedup: execute with different column order and sorting direction if possible
[ https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340825#comment-15340825 ] Jesus Camacho Rodriguez commented on HIVE-13982: New patch uploaded; I want to get a QA run. I still need to check whether PTF would cause troubles with the new dedup extension. I will update the JIRA case shortly. > Extensions to RS dedup: execute with different column order and sorting > direction if possible > - > > Key: HIVE-13982 > URL: https://issues.apache.org/jira/browse/HIVE-13982 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-13982.2.patch, HIVE-13982.3.patch, > HIVE-13982.4.patch, HIVE-13982.patch > > > Pointed out by [~gopalv]. > RS dedup should kick in for these cases, avoiding an additional shuffle stage. > {code} > select state, city, sum(sales) from table > group by state, city > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state desc, city > limit 10; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13982) Extensions to RS dedup: execute with different column order and sorting direction if possible
[ https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-13982: --- Status: Patch Available (was: In Progress) > Extensions to RS dedup: execute with different column order and sorting > direction if possible > - > > Key: HIVE-13982 > URL: https://issues.apache.org/jira/browse/HIVE-13982 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-13982.2.patch, HIVE-13982.3.patch, > HIVE-13982.4.patch, HIVE-13982.patch > > > Pointed out by [~gopalv]. > RS dedup should kick in for these cases, avoiding an additional shuffle stage. > {code} > select state, city, sum(sales) from table > group by state, city > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state desc, city > limit 10; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Work started] (HIVE-13982) Extensions to RS dedup: execute with different column order and sorting direction if possible
[ https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HIVE-13982 started by Jesus Camacho Rodriguez. -- > Extensions to RS dedup: execute with different column order and sorting > direction if possible > - > > Key: HIVE-13982 > URL: https://issues.apache.org/jira/browse/HIVE-13982 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-13982.2.patch, HIVE-13982.3.patch, > HIVE-13982.4.patch, HIVE-13982.patch > > > Pointed out by [~gopalv]. > RS dedup should kick in for these cases, avoiding an additional shuffle stage. > {code} > select state, city, sum(sales) from table > group by state, city > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state desc, city > limit 10; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13982) Extensions to RS dedup: execute with different column order and sorting direction if possible
[ https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-13982: --- Status: Open (was: Patch Available) > Extensions to RS dedup: execute with different column order and sorting > direction if possible > - > > Key: HIVE-13982 > URL: https://issues.apache.org/jira/browse/HIVE-13982 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-13982.2.patch, HIVE-13982.3.patch, > HIVE-13982.4.patch, HIVE-13982.patch > > > Pointed out by [~gopalv]. > RS dedup should kick in for these cases, avoiding an additional shuffle stage. > {code} > select state, city, sum(sales) from table > group by state, city > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state desc, city > limit 10; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13982) Extensions to RS dedup: execute with different column order and sorting direction if possible
[ https://issues.apache.org/jira/browse/HIVE-13982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jesus Camacho Rodriguez updated HIVE-13982: --- Attachment: HIVE-13982.4.patch > Extensions to RS dedup: execute with different column order and sorting > direction if possible > - > > Key: HIVE-13982 > URL: https://issues.apache.org/jira/browse/HIVE-13982 > Project: Hive > Issue Type: Improvement > Components: Physical Optimizer >Affects Versions: 2.2.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-13982.2.patch, HIVE-13982.3.patch, > HIVE-13982.4.patch, HIVE-13982.patch > > > Pointed out by [~gopalv]. > RS dedup should kick in for these cases, avoiding an additional shuffle stage. > {code} > select state, city, sum(sales) from table > group by state, city > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state, city > limit 10; > {code} > {code} > select state, city, sum(sales) from table > group by city, state > order by state desc, city > limit 10; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14060) Hive: Remove bogus "localhost" from Hive splits
[ https://issues.apache.org/jira/browse/HIVE-14060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340815#comment-15340815 ] Sergey Shelukhin commented on HIVE-14060: - Does it happen on FSes other than Azure? The culprit there seems to be AZURE_BLOCK_LOCATION_HOST_DEFAULT in the FS. It may be azure-specific... > Hive: Remove bogus "localhost" from Hive splits > --- > > Key: HIVE-14060 > URL: https://issues.apache.org/jira/browse/HIVE-14060 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 2.1.0, 2.2.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: HIVE-14060.1.patch > > > On remote filesystems like Azure, GCP and S3, the splits contain a filler > location of "localhost". > This is worse than having no location information at all - on large clusters > yarn waits up to 200[1] seconds for heartbeat from "localhost" before > allocating a container. > To speed up this process, the split affinity provider should scrub the bogus > "localhost" from the locations and allow for the allocation of "*" containers > instead on each heartbeat. > [1] - yarn.scheduler.capacity.node-locality-delay=40 x heartbeat of 5s -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-14060) Hive: Remove bogus "localhost" from Hive splits
[ https://issues.apache.org/jira/browse/HIVE-14060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340815#comment-15340815 ] Sergey Shelukhin edited comment on HIVE-14060 at 6/21/16 1:02 AM: -- Does it happen on FSes other than Azure? The culprit there seems to be AZURE_BLOCK_LOCATION_HOST_DEFAULT in the FS. It may be azure-specific... maybe it should be fixed in HDFS too was (Author: sershe): Does it happen on FSes other than Azure? The culprit there seems to be AZURE_BLOCK_LOCATION_HOST_DEFAULT in the FS. It may be azure-specific... > Hive: Remove bogus "localhost" from Hive splits > --- > > Key: HIVE-14060 > URL: https://issues.apache.org/jira/browse/HIVE-14060 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 2.1.0, 2.2.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: HIVE-14060.1.patch > > > On remote filesystems like Azure, GCP and S3, the splits contain a filler > location of "localhost". > This is worse than having no location information at all - on large clusters > yarn waits up to 200[1] seconds for heartbeat from "localhost" before > allocating a container. > To speed up this process, the split affinity provider should scrub the bogus > "localhost" from the locations and allow for the allocation of "*" containers > instead on each heartbeat. > [1] - yarn.scheduler.capacity.node-locality-delay=40 x heartbeat of 5s -- This message was sent by Atlassian JIRA (v6.3.4#6332)
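The fix proposed in this issue (scrubbing the filler "localhost" location that remote filesystems like Azure, GCP, and S3 report, so the scheduler can allocate "*" any-host containers immediately instead of waiting out the node-locality delay of 40 heartbeats x 5 s = 200 s) amounts to filtering the split's location array. A hedged sketch, not the actual HIVE-14060 patch:

```java
import java.util.Arrays;

// Sketch of scrubbing the bogus "localhost" filler from split locations.
// An empty location array means "no locality preference", letting YARN
// allocate a container anywhere ("*") on the next heartbeat rather than
// waiting for a heartbeat from a "localhost" node that never comes.
public class SplitLocationScrubber {
    public static String[] scrub(String[] locations) {
        return Arrays.stream(locations)
                .filter(host -> !"localhost".equals(host))
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] scrubbed = scrub(new String[]{"localhost", "node7"});
        System.out.println(Arrays.toString(scrubbed)); // prints [node7]
    }
}
```

As the comments above discuss, the filler value itself originates in the filesystem layer (e.g. AZURE_BLOCK_LOCATION_HOST_DEFAULT), so an alternative fix would be upstream in the FS implementations rather than in Hive's split affinity provider.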
[jira] [Updated] (HIVE-14038) miscellaneous acid improvements
[ https://issues.apache.org/jira/browse/HIVE-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-14038: -- Attachment: HIVE-14038.6.patch > miscellaneous acid improvements > --- > > Key: HIVE-14038 > URL: https://issues.apache.org/jira/browse/HIVE-14038 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-14038.2.patch, HIVE-14038.3.patch, > HIVE-14038.4.patch, HIVE-14038.5.patch, HIVE-14038.6.patch, HIVE-14038.patch > > > 1. fix thread name in HouseKeeperServiceBase (currently they are all > "org.apache.hadoop.hive.ql.txn.compactor.HouseKeeperServiceBase$1-0") > 2. dump metastore configs from HiveConf on start up to help record values of > properties > 3. add some tests -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14038) miscellaneous acid improvements
[ https://issues.apache.org/jira/browse/HIVE-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-14038: -- Attachment: (was: HIVE-14038.branch-1.2.patch) > miscellaneous acid improvements > --- > > Key: HIVE-14038 > URL: https://issues.apache.org/jira/browse/HIVE-14038 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-14038.2.patch, HIVE-14038.3.patch, > HIVE-14038.4.patch, HIVE-14038.5.patch, HIVE-14038.patch > > > 1. fix thread name in HouseKeeperServiceBase (currently they are all > "org.apache.hadoop.hive.ql.txn.compactor.HouseKeeperServiceBase$1-0") > 2. dump metastore configs from HiveConf on start up to help record values of > properties > 3. add some tests -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14066) LLAP: Orc encoded data reader should support complex types
[ https://issues.apache.org/jira/browse/HIVE-14066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14066: - Resolution: Duplicate Status: Resolved (was: Patch Available) Duplicate of HIVE-13744. > LLAP: Orc encoded data reader should support complex types > -- > > Key: HIVE-14066 > URL: https://issues.apache.org/jira/browse/HIVE-14066 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Sergey Shelukhin >Assignee: Prasanth Jayachandran > Attachments: HIVE-14066.1.patch > > > Currently LLAP encoded data reader does not support complex types. Now that > ORC supports reading complex vectors we should support in LLAP as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13744) LLAP IO - add complex types support
[ https://issues.apache.org/jira/browse/HIVE-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13744: - Affects Version/s: 2.2.0 > LLAP IO - add complex types support > --- > > Key: HIVE-13744 > URL: https://issues.apache.org/jira/browse/HIVE-13744 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Sergey Shelukhin >Assignee: Prasanth Jayachandran > Labels: llap, orc > Attachments: HIVE-13744.1.patch > > > Recently, complex type column vectors were added to Hive. We should use them > in IO elevator. > Vectorization itself doesn't support complex types (yet), but this would be > useful when it does, also it will enable LLAP IO elevator to be used in > non-vectorized context with complex types after HIVE-13617 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13744) LLAP IO - add complex types support
[ https://issues.apache.org/jira/browse/HIVE-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13744: - Target Version/s: 2.2.0 Status: Patch Available (was: Open) > LLAP IO - add complex types support > --- > > Key: HIVE-13744 > URL: https://issues.apache.org/jira/browse/HIVE-13744 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Sergey Shelukhin >Assignee: Prasanth Jayachandran > Labels: llap, orc > Attachments: HIVE-13744.1.patch > > > Recently, complex type column vectors were added to Hive. We should use them > in IO elevator. > Vectorization itself doesn't support complex types (yet), but this would be > useful when it does, also it will enable LLAP IO elevator to be used in > non-vectorized context with complex types after HIVE-13617 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13744) LLAP IO - add complex types support
[ https://issues.apache.org/jira/browse/HIVE-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13744: - Attachment: HIVE-13744.1.patch > LLAP IO - add complex types support > --- > > Key: HIVE-13744 > URL: https://issues.apache.org/jira/browse/HIVE-13744 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Sergey Shelukhin >Assignee: Prasanth Jayachandran > Labels: llap, orc > Attachments: HIVE-13744.1.patch > > > Recently, complex type column vectors were added to Hive. We should use them > in IO elevator. > Vectorization itself doesn't support complex types (yet), but this would be > useful when it does, also it will enable LLAP IO elevator to be used in > non-vectorized context with complex types after HIVE-13617 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13744) LLAP IO - add complex types support
[ https://issues.apache.org/jira/browse/HIVE-13744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13744: - Labels: llap orc (was: ) > LLAP IO - add complex types support > --- > > Key: HIVE-13744 > URL: https://issues.apache.org/jira/browse/HIVE-13744 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Sergey Shelukhin >Assignee: Prasanth Jayachandran > Labels: llap, orc > Attachments: HIVE-13744.1.patch > > > Recently, complex type column vectors were added to Hive. We should use them > in IO elevator. > Vectorization itself doesn't support complex types (yet), but this would be > useful when it does, also it will enable LLAP IO elevator to be used in > non-vectorized context with complex types after HIVE-13617 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14066) LLAP: Orc encoded data reader should support complex types
[ https://issues.apache.org/jira/browse/HIVE-14066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340799#comment-15340799 ] Prasanth Jayachandran commented on HIVE-14066: -- Sorry for the oversight :) I will close this one.. Will use the old one instead. > LLAP: Orc encoded data reader should support complex types > -- > > Key: HIVE-14066 > URL: https://issues.apache.org/jira/browse/HIVE-14066 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Sergey Shelukhin >Assignee: Prasanth Jayachandran > Attachments: HIVE-14066.1.patch > > > Currently LLAP encoded data reader does not support complex types. Now that > ORC supports reading complex vectors we should support in LLAP as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14066) LLAP: Orc encoded data reader should support complex types
[ https://issues.apache.org/jira/browse/HIVE-14066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14066: Reporter: Sergey Shelukhin (was: Prasanth Jayachandran) > LLAP: Orc encoded data reader should support complex types > -- > > Key: HIVE-14066 > URL: https://issues.apache.org/jira/browse/HIVE-14066 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Sergey Shelukhin >Assignee: Prasanth Jayachandran > Attachments: HIVE-14066.1.patch > > > Currently LLAP encoded data reader does not support complex types. Now that > ORC supports reading complex vectors we should support in LLAP as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14066) LLAP: Orc encoded data reader should support complex types
[ https://issues.apache.org/jira/browse/HIVE-14066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340795#comment-15340795 ] Sergey Shelukhin commented on HIVE-14066: - Dup of HIVE-13744 :P I'll take a look eventually > LLAP: Orc encoded data reader should support complex types > -- > > Key: HIVE-14066 > URL: https://issues.apache.org/jira/browse/HIVE-14066 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Sergey Shelukhin >Assignee: Prasanth Jayachandran > Attachments: HIVE-14066.1.patch > > > Currently LLAP encoded data reader does not support complex types. Now that > ORC supports reading complex vectors we should support in LLAP as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13985) ORC improvements for reducing the file system calls in task side
[ https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340793#comment-15340793 ] Sergey Shelukhin commented on HIVE-13985: - +1 > ORC improvements for reducing the file system calls in task side > > > Key: HIVE-13985 > URL: https://issues.apache.org/jira/browse/HIVE-13985 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 1.3.0, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, > HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, > HIVE-13985-branch-2.1.patch, HIVE-13985.1.patch, HIVE-13985.2.patch, > HIVE-13985.3.patch, HIVE-13985.4.patch, HIVE-13985.5.patch, HIVE-13985.6.patch > > > HIVE-13840 fixed some issues with additional file system invocations during > split generation. Similarly, this jira will fix issues with additional file > system invocations on the task side. To avoid reading footers on the task > side, users can set hive.orc.splits.include.file.footer to true, which will > serialize the orc footers on the splits. But this has issues with serializing > unwanted information like column statistics and other metadata which are not > really required for reading the orc split on the task side. We can reduce the > payload on the orc splits by serializing only the minimum required > information (stripe information, types, compression details). This will > decrease the payload on the orc splits and can potentially avoid OOMs in the > application master (AM) during split generation. This jira also addresses other > issues concerning the AM cache. The local cache used by the AM is a soft reference > cache. This can introduce unpredictability across multiple runs of the same > query. We can cache the serialized footer in the local cache and also use a > strong reference cache, which should avoid memory pressure and will have > better predictability. 
> One other improvement that we can do: when > hive.orc.splits.include.file.footer is set to false, on the task side we make > one additional file system call to know the size of the file. If we can > serialize the file length in the orc split, this can be avoided. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
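The strong-reference footer cache discussed above can be sketched roughly as follows. This is illustrative only: the class and method names (`SerializedFooterCache`, `getOrLoad`) are assumptions for the example, not Hive's actual code. The point is that caching the serialized footer bytes under strong references gives predictable hits across runs of the same query, unlike a SoftReference cache the GC may clear under pressure.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch: cache the *serialized* footer bytes (stripe
// information, types, compression details) under strong references so
// repeated runs hit the cache predictably.
class SerializedFooterCache {
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    byte[] getOrLoad(String path, Supplier<byte[]> loader) {
        // computeIfAbsent only invokes the loader on a cache miss
        return cache.computeIfAbsent(path, p -> loader.get());
    }
}
```

A caller would pass a loader that reads the footer from the file system; on every subsequent call for the same path the bytes come straight from the map.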
[jira] [Updated] (HIVE-13590) Kerberized HS2 with LDAP auth enabled fails in multi-domain LDAP case
[ https://issues.apache.org/jira/browse/HIVE-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-13590: --- Resolution: Fixed Fix Version/s: 2.1.1 2.2.0 Status: Resolved (was: Patch Available) Patch has been committed to master (for 2.2.0) and branch-2.1 (for 2.1.1). Thank [~szehon] and [~spena] for review. > Kerberized HS2 with LDAP auth enabled fails in multi-domain LDAP case > - > > Key: HIVE-13590 > URL: https://issues.apache.org/jira/browse/HIVE-13590 > Project: Hive > Issue Type: Bug > Components: Authentication, Security >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Fix For: 2.2.0, 2.1.1 > > Attachments: HIVE-13590.1.patch, HIVE-13590.1.patch, > HIVE-13590.patch, HIVE-13590.patch > > > In a kerberized HS2 with LDAP authentication enabled, LDAP user usually logs > in using username in form of username@domain in LDAP multi-domain case. But > it fails if the domain was not in the Hadoop auth_to_local mapping rule, the > error is as following: > {code} > Caused by: > org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: > No rules applied to ct...@mydomain.com > at > org.apache.hadoop.security.authentication.util.KerberosName.getShortName(KerberosName.java:389) > at org.apache.hadoop.security.User.(User.java:48) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
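The `NoMatchingRule` failure above typically means `hadoop.security.auth_to_local` in core-site.xml has no rule covering the extra domain. As a hedged illustration (the realm name `MYDOMAIN.COM` is taken from the stack trace; the exact rule needed depends on the deployment), a rule mapping any principal in that realm to its short name would look like:

```xml
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[1:$1@$0](.*@MYDOMAIN\.COM)s/@.*//
    DEFAULT
  </value>
</property>
```

With such a rule in place, `KerberosName.getShortName()` resolves `ctang@MYDOMAIN.COM` to `ctang` instead of throwing.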
[jira] [Commented] (HIVE-14057) Add an option in llapstatus to generate output to a file
[ https://issues.apache.org/jira/browse/HIVE-14057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340791#comment-15340791 ] Sergey Shelukhin commented on HIVE-14057: - if it's invoked remotely, then how would one use the file (which is remote)? Redirecting the remote command on the remote side will create the file there too, w/o motd/etc. Normally such tools would output the results into stdout and other crap into stderr. Also some comments on RB > Add an option in llapstatus to generate output to a file > > > Key: HIVE-14057 > URL: https://issues.apache.org/jira/browse/HIVE-14057 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14057.01.patch, HIVE-14057.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14023) LLAP: Make the Hive query id available in ContainerRunner
[ https://issues.apache.org/jira/browse/HIVE-14023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340778#comment-15340778 ] Sergey Shelukhin commented on HIVE-14023: - +1 > LLAP: Make the Hive query id available in ContainerRunner > - > > Key: HIVE-14023 > URL: https://issues.apache.org/jira/browse/HIVE-14023 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14023.01.patch, HIVE-14023.02.patch > > > Needed to generate logs per query. > We can use the dag identifier for now, but that isn't very useful. (The > queryId may not be too useful either if users cannot find it - but that's > better than a dagIdentifier) > The queryId is available right now after the Processor starts, which is too > late for log changes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14068) make more effort to find hive-site.xml
[ https://issues.apache.org/jira/browse/HIVE-14068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14068: Status: Patch Available (was: Open) > make more effort to find hive-site.xml > -- > > Key: HIVE-14068 > URL: https://issues.apache.org/jira/browse/HIVE-14068 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14068.patch > > > It pretty much doesn't make sense to run Hive w/o the config, so we should > make more effort to find one if it's missing on the classpath, or the > classloader does not return it for some reason (e.g. classloader ignores some > permission issues; explicitly looking for the file may expose them better) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14068) make more effort to find hive-site.xml
[ https://issues.apache.org/jira/browse/HIVE-14068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14068: Attachment: HIVE-14068.patch Small patch; no behavior changes as long as the config is in the classpath. [~ashutoshc] can you take a look? > make more effort to find hive-site.xml > -- > > Key: HIVE-14068 > URL: https://issues.apache.org/jira/browse/HIVE-14068 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14068.patch > > > It pretty much doesn't make sense to run Hive w/o the config, so we should > make more effort to find one if it's missing on the classpath, or the > classloader does not return it for some reason (e.g. classloader ignores some > permission issues; explicitly looking for the file may expose them better) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
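The "make more effort" idea described above can be sketched as a classloader lookup with an explicit file-system fallback. This is an illustration, not the actual patch: the class name `HiveSiteLocator` and the notion of caller-supplied fallback directories are assumptions for the example.

```java
import java.io.File;
import java.net.URL;

// Illustrative sketch (not Hive's actual logic): ask the classloader for
// hive-site.xml first, then explicitly probe candidate conf directories.
// An explicit File probe can surface permission problems that a
// classloader may silently swallow.
class HiveSiteLocator {
    static URL find(ClassLoader cl, String... fallbackDirs) throws Exception {
        URL fromClasspath = cl.getResource("hive-site.xml");
        if (fromClasspath != null) {
            return fromClasspath;
        }
        for (String dir : fallbackDirs) {
            File candidate = new File(dir, "hive-site.xml");
            if (candidate.isFile()) {
                return candidate.toURI().toURL();
            }
        }
        return null; // caller decides whether a missing config is fatal
    }
}
```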
[jira] [Commented] (HIVE-13985) ORC improvements for reducing the file system calls in task side
[ https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340768#comment-15340768 ] Prasanth Jayachandran commented on HIVE-13985: -- [~sershe] Can you take another look plz? > ORC improvements for reducing the file system calls in task side > > > Key: HIVE-13985 > URL: https://issues.apache.org/jira/browse/HIVE-13985 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 1.3.0, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, > HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, > HIVE-13985-branch-2.1.patch, HIVE-13985.1.patch, HIVE-13985.2.patch, > HIVE-13985.3.patch, HIVE-13985.4.patch, HIVE-13985.5.patch, HIVE-13985.6.patch > > > HIVE-13840 fixed some issues with addition file system invocations during > split generation. Similarly, this jira will fix issues with additional file > system invocations on the task side. To avoid reading footers on the task > side, users can set hive.orc.splits.include.file.footer to true which will > serialize the orc footers on the splits. But this has issues with serializing > unwanted information like column statistics and other metadata which are not > really required for reading orc split on the task side. We can reduce the > payload on the orc splits by serializing only the minimum required > information (stripe information, types, compression details). This will > decrease the payload on the orc splits and can potentially avoid OOMs in > application master (AM) during split generation. This jira also address other > issues concerning the AM cache. The local cache used by AM is soft reference > cache. This can introduce unpredictability across multiple runs of the same > query. We can cache the serialized footer in the local cache and also use > strong reference cache which should avoid memory pressure and will have > better predictability. 
> One other improvement that we can do: when > hive.orc.splits.include.file.footer is set to false, on the task side we make > one additional file system call to know the size of the file. If we can > serialize the file length in the orc split, this can be avoided. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13901) Hivemetastore add partitions can be slow depending on filesystems
[ https://issues.apache.org/jira/browse/HIVE-13901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340743#comment-15340743 ] Ashutosh Chauhan commented on HIVE-13901: - Some of these tests are still failing for me when run locally: {code} Running org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore Tests run: 34, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 91.734 sec <<< FAILURE! - in org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore testPartition(org.apache.hadoop.hive.metastore.TestEmbeddedHiveMetaStore) Time elapsed: 1.749 sec <<< FAILURE! junit.framework.AssertionFailedError: null at junit.framework.Assert.fail(Assert.java:55) at junit.framework.Assert.assertTrue(Assert.java:22) at junit.framework.Assert.assertTrue(Assert.java:31) at junit.framework.TestCase.assertTrue(TestCase.java:201) at org.apache.hadoop.hive.metastore.TestHiveMetaStore.partitionTester(TestHiveMetaStore.java:443) at org.apache.hadoop.hive.metastore.TestHiveMetaStore.testPartition(TestHiveMetaStore.java:146) {code} {code} Running org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore Tests run: 34, Failures: 1, Errors: 1, Skipped: 0, Time elapsed: 91.111 sec <<< FAILURE! - in org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore testTransactionalValidation(org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore) Time elapsed: 0.143 sec <<< ERROR! 
org.apache.hadoop.hive.metastore.api.AlreadyExistsException: Table acidTable already exists at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:41480) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result$create_table_with_environment_context_resultStandardScheme.read(ThriftHiveMetastore.java:41466) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$create_table_with_environment_context_result.read(ThriftHiveMetastore.java:41392) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:86) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_create_table_with_environment_context(ThriftHiveMetastore.java:1183) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.create_table_with_environment_context(ThriftHiveMetastore.java:1169) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.create_table_with_environment_context(HiveMetaStoreClient.java:2325) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:738) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.createTable(HiveMetaStoreClient.java:726) at org.apache.hadoop.hive.metastore.TestHiveMetaStore.createTable(TestHiveMetaStore.java:2967) at org.apache.hadoop.hive.metastore.TestHiveMetaStore.testTransactionalValidation(TestHiveMetaStore.java:2897) {code} {code} testPartition(org.apache.hadoop.hive.metastore.TestRemoteHiveMetaStore) Time elapsed: 1.675 sec <<< FAILURE! 
junit.framework.AssertionFailedError: null at junit.framework.Assert.fail(Assert.java:55) at junit.framework.Assert.assertTrue(Assert.java:22) at junit.framework.Assert.assertTrue(Assert.java:31) at junit.framework.TestCase.assertTrue(TestCase.java:201) at org.apache.hadoop.hive.metastore.TestHiveMetaStore.partitionTester(TestHiveMetaStore.java:443) at org.apache.hadoop.hive.metastore.TestHiveMetaStore.testPartition(TestHiveMetaStore.java:146) {code} {code} testPartition(org.apache.hadoop.hive.metastore.TestSetUGIOnOnlyServer) Time elapsed: 1.771 sec <<< FAILURE! junit.framework.AssertionFailedError: null at junit.framework.Assert.fail(Assert.java:55) at junit.framework.Assert.assertTrue(Assert.java:22) at junit.framework.Assert.assertTrue(Assert.java:31) at junit.framework.TestCase.assertTrue(TestCase.java:201) at org.apache.hadoop.hive.metastore.TestHiveMetaStore.partitionTester(TestHiveMetaStore.java:443) at org.apache.hadoop.hive.metastore.TestHiveMetaStore.testPartition(TestHiveMetaStore.java:146) {code} > Hivemetastore add partitions can be slow depending on filesystems > - > > Key: HIVE-13901 > URL: https://issues.apache.org/jira/browse/HIVE-13901 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HIVE-13901.1.patch, HIVE-13901.2.patch, > HIVE-13901.6.patch > > > Depending on FS, creating external tables & adding partitions can be > expensive (e.g msck
[jira] [Commented] (HIVE-13960) Session timeout may happen before HIVE_SERVER2_IDLE_SESSION_TIMEOUT for back-to-back synchronous operations.
[ https://issues.apache.org/jira/browse/HIVE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340739#comment-15340739 ] zhihai xu commented on HIVE-13960: -- Yes, renaming pendingCount to activeCalls sounds good to me. Will fix it in a follow-up JIRA HIVE-14067. Thanks for the review [~thejas]! > Session timeout may happen before HIVE_SERVER2_IDLE_SESSION_TIMEOUT for > back-to-back synchronous operations. > > > Key: HIVE-13960 > URL: https://issues.apache.org/jira/browse/HIVE-13960 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: zhihai xu >Assignee: zhihai xu > Fix For: 2.2.0 > > Attachments: HIVE-13960.000.patch > > > Session timeout may happen before > HIVE_SERVER2_IDLE_SESSION_TIMEOUT(hive.server2.idle.session.timeout) for > back-to-back synchronous operations. > This issue can happen with the following two operations op1 and op2: op2 is a > synchronous long-running operation, op2 is running right after op1 is closed. > > 1. closeOperation(op1) is called: > this will set {{lastIdleTime}} with value System.currentTimeMillis() because > {{opHandleSet}} becomes empty after {{closeOperation}} removes op1 from > {{opHandleSet}}. > 2. op2 is running for a long time by calling {{executeStatement}} right after > closeOperation(op1) is called. > If op2 is running for more than HIVE_SERVER2_IDLE_SESSION_TIMEOUT, then the > session will time out even when op2 is still running. 
> We hit this issue when we use PyHive to execute non-async operation > The following is the exception we see: > {code} > File "/usr/local/lib/python2.7/dist-packages/pyhive/hive.py", line 126, in > close > _check_status(response) > File "/usr/local/lib/python2.7/dist-packages/pyhive/hive.py", line 362, in > _check_status > raise OperationalError(response) > OperationalError: TCloseSessionResp(status=TStatus(errorCode=0, > errorMessage='Session does not exist!', sqlState=None, > infoMessages=['*org.apache.hive.service.cli.HiveSQLException:Session does not > exist!:12:11', > 'org.apache.hive.service.cli.session.SessionManager:closeSession:SessionManager.java:311', > 'org.apache.hive.service.cli.CLIService:closeSession:CLIService.java:221', > 'org.apache.hive.service.cli.thrift.ThriftCLIService:CloseSession:ThriftCLIService.java:471', > > 'org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession:getResult:TCLIService.java:1273', > > 'org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession:getResult:TCLIService.java:1258', > 'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', > 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', > 'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56', > > 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285', > > 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145', > > 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615', > 'java.lang.Thread:run:Thread.java:745'], statusCode=3)) > TCloseSessionResp(status=TStatus(errorCode=0, errorMessage='Session does not > exist!', sqlState=None, > infoMessages=['*org.apache.hive.service.cli.HiveSQLException:Session does not > exist!:12:11', > 'org.apache.hive.service.cli.session.SessionManager:closeSession:SessionManager.java:311', > 'org.apache.hive.service.cli.CLIService:closeSession:CLIService.java:221', > 
'org.apache.hive.service.cli.thrift.ThriftCLIService:CloseSession:ThriftCLIService.java:471', > > 'org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession:getResult:TCLIService.java:1273', > > 'org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession:getResult:TCLIService.java:1258', > 'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', > 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', > 'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56', > > 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285', > > 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145', > > 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615', > 'java.lang.Thread:run:Thread.java:745'], statusCode=3)) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
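The fix being discussed above can be sketched as a counter of in-flight synchronous calls: the session is considered idle only when no call is active, so a long-running op2 can no longer be timed out just because op1's close reset the idle clock. The names (`activeCalls`, `lastIdleTime`) follow the review comments, but this class is an illustration, not HiveServer2's actual code.

```java
// Minimal sketch of the idle-timeout fix: count in-flight synchronous
// calls and treat the session as idle only when none are active.
class SessionIdleTracker {
    private int activeCalls = 0;
    private long lastIdleTime = System.currentTimeMillis();

    synchronized void callStarted() {
        activeCalls++;
        lastIdleTime = 0; // not idle while any call is running
    }

    synchronized void callFinished() {
        if (--activeCalls == 0) {
            // the idle clock starts only when the last call completes
            lastIdleTime = System.currentTimeMillis();
        }
    }

    synchronized boolean isTimedOut(long nowMs, long timeoutMs) {
        return activeCalls == 0 && lastIdleTime > 0 && nowMs - lastIdleTime > timeoutMs;
    }
}
```

In the scenario from the description, `closeOperation(op1)` followed immediately by a long-running `executeStatement` for op2 keeps `activeCalls` above zero, so the session cannot time out mid-operation.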
[jira] [Commented] (HIVE-14065) Provide an API for making Hive read-only for a short period
[ https://issues.apache.org/jira/browse/HIVE-14065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340696#comment-15340696 ] Alan Gates commented on HIVE-14065: --- I'm unclear what you mean by "taking all the ZooKeeper locks". Can you elaborate? > Provide an API for making Hive read-only for a short period > --- > > Key: HIVE-14065 > URL: https://issues.apache.org/jira/browse/HIVE-14065 > Project: Hive > Issue Type: Improvement >Reporter: Colin Patrick McCabe >Assignee: Colin Patrick McCabe > > HIVE-7973 added a notification log which allows clients to do incremental > replication of the Hive metastore. However, it is a challenge to get the > initial state of the Hive database. Using existing APIs may give us an > inconsistent state. For example, if a Hive table is renamed while we're > loading all tables, we may miss that information. > The easiest way to fix this would be to provide an API for making Hive > read-only for a short period. This locking API would come with a timeout so > that if the locker failed, the system would not stay down. It would return > an ID which uniquely identified the lock instance. The read-only lock itself > could be implemented by taking all the ZooKeeper locks. The RPC for removing > the lock would return back a status indicating whether the lock had timed out > before being removed or not. If it had timed out, we could retry our > snapshot loading process with a longer timeout period. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
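The proposed locking API above could look roughly like the following sketch: acquiring returns a unique id, the lock carries a timeout so a failed locker cannot wedge the system, and release reports whether the lock had already timed out. Everything here (class name, method shapes) is illustrative of the JIRA description, not an actual Hive interface, and it ignores the ZooKeeper-backed implementation question raised in the comment.

```java
import java.util.UUID;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of the read-only lock API described above.
class ReadOnlyLock {
    private static final class Held {
        final String id;
        final long expiresAt;
        Held(String id, long expiresAt) { this.id = id; this.expiresAt = expiresAt; }
    }

    private final AtomicReference<Held> held = new AtomicReference<>();

    /** @return the lock id, or null if another holder has the lock */
    String acquire(long nowMs, long timeoutMs) {
        Held h = new Held(UUID.randomUUID().toString(), nowMs + timeoutMs);
        return held.compareAndSet(null, h) ? h.id : null;
    }

    /** @return true if released before its timeout; false if it had expired */
    boolean release(String id, long nowMs) {
        Held h = held.getAndSet(null);
        return h != null && h.id.equals(id) && nowMs < h.expiresAt;
    }
}
```

A `false` from `release` is the signal described in the text: the snapshot loader would retry with a longer timeout period.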
[jira] [Updated] (HIVE-14066) LLAP: Orc encoded data reader should support complex types
[ https://issues.apache.org/jira/browse/HIVE-14066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14066: - Status: Patch Available (was: Open) > LLAP: Orc encoded data reader should support complex types > -- > > Key: HIVE-14066 > URL: https://issues.apache.org/jira/browse/HIVE-14066 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14066.1.patch > > > Currently LLAP encoded data reader does not support complex types. Now that > ORC supports reading complex vectors we should support in LLAP as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14066) LLAP: Orc encoded data reader should support complex types
[ https://issues.apache.org/jira/browse/HIVE-14066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-14066: - Attachment: HIVE-14066.1.patch NOTE: The vectorized operator pipeline does not support complex types yet. Although the reader supports reading out complex vectors, the Vectorizer will fail to vectorize. We use the same test cases from vectorization in MiniLlap; when vectorization supports complex types, the encoded data reader should work fine. > LLAP: Orc encoded data reader should support complex types > -- > > Key: HIVE-14066 > URL: https://issues.apache.org/jira/browse/HIVE-14066 > Project: Hive > Issue Type: Bug >Affects Versions: 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-14066.1.patch > > > Currently LLAP encoded data reader does not support complex types. Now that > ORC supports reading complex vectors we should support in LLAP as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14057) Add an option in llapstatus to generate output to a file
[ https://issues.apache.org/jira/browse/HIVE-14057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340652#comment-15340652 ] Hive QA commented on HIVE-14057: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12811969/HIVE-14057.02.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 10237 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hive.jdbc.TestJdbcWithMiniLlap.testLlapInputFormatEndToEnd {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/194/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/194/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-194/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12811969 - PreCommit-HIVE-MASTER-Build > Add an option in llapstatus to generate output to a file > > > Key: HIVE-14057 > URL: https://issues.apache.org/jira/browse/HIVE-14057 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14057.01.patch, HIVE-14057.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14062) Changes from HIVE-13502 overwritten by HIVE-13566
[ https://issues.apache.org/jira/browse/HIVE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340637#comment-15340637 ] Naveen Gangam commented on HIVE-14062: -- [~aihuaxu] Could you please re-review and re-commit? Thank you > Changes from HIVE-13502 overwritten by HIVE-13566 > - > > Key: HIVE-14062 > URL: https://issues.apache.org/jira/browse/HIVE-14062 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 2.1.0 >Reporter: Naveen Gangam >Assignee: Naveen Gangam > Attachments: HIVE-14062.1.patch > > > Appears that changes from HIVE-13566 overwrote the changes from HIVE-13502. I > will confirm with the author that it was inadvertent before I re-add it. > Thanks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14062) Changes from HIVE-13502 overwritten by HIVE-13566
[ https://issues.apache.org/jira/browse/HIVE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-14062: - Attachment: HIVE-14062.1.patch The fix and test are exactly the same as in HIVE-13502. > Changes from HIVE-13502 overwritten by HIVE-13566 > - > > Key: HIVE-14062 > URL: https://issues.apache.org/jira/browse/HIVE-14062 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 2.1.0 >Reporter: Naveen Gangam >Assignee: Naveen Gangam > Attachments: HIVE-14062.1.patch > > > Appears that changes from HIVE-13566 overwrote the changes from HIVE-13502. I > will confirm with the author that it was inadvertent before I re-add it. > Thanks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14062) Changes from HIVE-13502 overwritten by HIVE-13566
[ https://issues.apache.org/jira/browse/HIVE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Naveen Gangam updated HIVE-14062: - Status: Patch Available (was: Open) > Changes from HIVE-13502 overwritten by HIVE-13566 > - > > Key: HIVE-14062 > URL: https://issues.apache.org/jira/browse/HIVE-14062 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 2.1.0 >Reporter: Naveen Gangam >Assignee: Naveen Gangam > Attachments: HIVE-14062.1.patch > > > Appears that changes from HIVE-13566 overwrote the changes from HIVE-13502. I > will confirm with the author that it was inadvertent before I re-add it. > Thanks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13930) upgrade Hive to latest Hadoop version
[ https://issues.apache.org/jira/browse/HIVE-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340620#comment-15340620 ] Sergio Peña commented on HIVE-13930: [~sershe] I don't know why you're getting those dependency issues. The Spark file that is downloaded is written to {{itests/thirdparty}}, but I do not see any JARs there. Please don't remove the {{SparkCliDriver}} yet. That is an important test that validates Hive on Spark. [~xuefuz] Do you have any idea why this is failing? > upgrade Hive to latest Hadoop version > - > > Key: HIVE-13930 > URL: https://issues.apache.org/jira/browse/HIVE-13930 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-13930.01.patch, HIVE-13930.02.patch, > HIVE-13930.03.patch, HIVE-13930.04.patch, HIVE-13930.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13960) Session timeout may happen before HIVE_SERVER2_IDLE_SESSION_TIMEOUT for back-to-back synchronous operations.
[ https://issues.apache.org/jira/browse/HIVE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340617#comment-15340617 ] Thejas M Nair commented on HIVE-13960: -- Thanks for the patch [~zxu] and [~jxiang] From the variable name pendingCount, it's hard to understand what it represents. Should we name it activeCalls instead? If you agree, the change can be done in a follow-up jira. > Session timeout may happen before HIVE_SERVER2_IDLE_SESSION_TIMEOUT for > back-to-back synchronous operations. > > > Key: HIVE-13960 > URL: https://issues.apache.org/jira/browse/HIVE-13960 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: zhihai xu >Assignee: zhihai xu > Fix For: 2.2.0 > > Attachments: HIVE-13960.000.patch > > > Session timeout may happen before > HIVE_SERVER2_IDLE_SESSION_TIMEOUT(hive.server2.idle.session.timeout) for > back-to-back synchronous operations. > This issue can happen with the following two operations op1 and op2: op2 is a > synchronous long-running operation, op2 is running right after op1 is closed. > > 1. closeOperation(op1) is called: > this will set {{lastIdleTime}} with value System.currentTimeMillis() because > {{opHandleSet}} becomes empty after {{closeOperation}} removes op1 from > {{opHandleSet}}. > 2. op2 is running for a long time by calling {{executeStatement}} right after > closeOperation(op1) is called. > If op2 is running for more than HIVE_SERVER2_IDLE_SESSION_TIMEOUT, then the > session will time out even when op2 is still running. 
> We hit this issue when we use PyHive to execute non-async operation > The following is the exception we see: > {code} > File "/usr/local/lib/python2.7/dist-packages/pyhive/hive.py", line 126, in > close > _check_status(response) > File "/usr/local/lib/python2.7/dist-packages/pyhive/hive.py", line 362, in > _check_status > raise OperationalError(response) > OperationalError: TCloseSessionResp(status=TStatus(errorCode=0, > errorMessage='Session does not exist!', sqlState=None, > infoMessages=['*org.apache.hive.service.cli.HiveSQLException:Session does not > exist!:12:11', > 'org.apache.hive.service.cli.session.SessionManager:closeSession:SessionManager.java:311', > 'org.apache.hive.service.cli.CLIService:closeSession:CLIService.java:221', > 'org.apache.hive.service.cli.thrift.ThriftCLIService:CloseSession:ThriftCLIService.java:471', > > 'org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession:getResult:TCLIService.java:1273', > > 'org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession:getResult:TCLIService.java:1258', > 'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', > 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', > 'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56', > > 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285', > > 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145', > > 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615', > 'java.lang.Thread:run:Thread.java:745'], statusCode=3)) > TCloseSessionResp(status=TStatus(errorCode=0, errorMessage='Session does not > exist!', sqlState=None, > infoMessages=['*org.apache.hive.service.cli.HiveSQLException:Session does not > exist!:12:11', > 'org.apache.hive.service.cli.session.SessionManager:closeSession:SessionManager.java:311', > 'org.apache.hive.service.cli.CLIService:closeSession:CLIService.java:221', > 
'org.apache.hive.service.cli.thrift.ThriftCLIService:CloseSession:ThriftCLIService.java:471', > > 'org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession:getResult:TCLIService.java:1273', > > 'org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession:getResult:TCLIService.java:1258', > 'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', > 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', > 'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56', > > 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285', > > 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145', > > 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615', > 'java.lang.Thread:run:Thread.java:745'], statusCode=3)) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
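The race described in HIVE-13960 can be sketched with a minimal model. This is a hypothetical illustration, not the actual HiveServer2 code: the class and method names (`Session`, `begin_call`, `end_call`) are invented, and `active_calls` follows the rename suggested in the comment. The key point is that a session with an in-flight synchronous call must never be considered idle, regardless of `last_idle_time`:

```python
import time

class Session:
    """Minimal model of the HiveServer2 idle-timeout race (hypothetical sketch)."""

    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout
        self.last_idle_time = time.time()
        self.active_calls = 0  # the fix: count of in-flight synchronous calls

    def begin_call(self):
        # e.g. executeStatement for op2 starts here
        self.active_calls += 1

    def end_call(self):
        # e.g. closeOperation(op1): only now does the idle clock start
        self.active_calls -= 1
        if self.active_calls == 0:
            self.last_idle_time = time.time()

    def is_timed_out(self, now):
        # The buggy behavior checked only last_idle_time; with the counter,
        # a session with in-flight calls is never treated as idle.
        if self.active_calls > 0:
            return False
        return now - self.last_idle_time > self.idle_timeout
```

Without the `active_calls` guard, a long-running op2 started right after `closeOperation(op1)` would be timed out mid-execution, producing exactly the "Session does not exist!" failure shown in the traceback above.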
[jira] [Comment Edited] (HIVE-13960) Session timeout may happen before HIVE_SERVER2_IDLE_SESSION_TIMEOUT for back-to-back synchronous operations.
[ https://issues.apache.org/jira/browse/HIVE-13960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340617#comment-15340617 ] Thejas M Nair edited comment on HIVE-13960 at 6/20/16 11:00 PM: Thanks for the patch [~zxu] and [~jxiang]. From the variable name pendingCount, it's hard to understand what it represents. Should we name it activeCalls (or something along those lines) instead? If you agree, the change can be done in a follow-up jira. was (Author: thejas): Thanks for the patch [~zxu] and [~jxiang]. From the variable name pendingCount, it's hard to understand what it represents. Should we name it activeCalls instead? If you agree, the change can be done in a follow-up jira. > Session timeout may happen before HIVE_SERVER2_IDLE_SESSION_TIMEOUT for > back-to-back synchronous operations. > > > Key: HIVE-13960 > URL: https://issues.apache.org/jira/browse/HIVE-13960 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Reporter: zhihai xu >Assignee: zhihai xu > Fix For: 2.2.0 > > Attachments: HIVE-13960.000.patch > > > Session timeout may happen before > HIVE_SERVER2_IDLE_SESSION_TIMEOUT (hive.server2.idle.session.timeout) for > back-to-back synchronous operations. > This issue can happen with the following two operations op1 and op2, where op2 is a > synchronous long-running operation that runs right after op1 is closed. > > 1. closeOperation(op1) is called: > this will set {{lastIdleTime}} to System.currentTimeMillis() because > {{opHandleSet}} becomes empty after {{closeOperation}} removes op1 from > {{opHandleSet}}. > 2. op2 runs for a long time via {{executeStatement}} right after > closeOperation(op1) is called. > If op2 runs for more than HIVE_SERVER2_IDLE_SESSION_TIMEOUT, the > session will time out even though op2 is still running. 
> We hit this issue when we use PyHive to execute non-async operation > The following is the exception we see: > {code} > File "/usr/local/lib/python2.7/dist-packages/pyhive/hive.py", line 126, in > close > _check_status(response) > File "/usr/local/lib/python2.7/dist-packages/pyhive/hive.py", line 362, in > _check_status > raise OperationalError(response) > OperationalError: TCloseSessionResp(status=TStatus(errorCode=0, > errorMessage='Session does not exist!', sqlState=None, > infoMessages=['*org.apache.hive.service.cli.HiveSQLException:Session does not > exist!:12:11', > 'org.apache.hive.service.cli.session.SessionManager:closeSession:SessionManager.java:311', > 'org.apache.hive.service.cli.CLIService:closeSession:CLIService.java:221', > 'org.apache.hive.service.cli.thrift.ThriftCLIService:CloseSession:ThriftCLIService.java:471', > > 'org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession:getResult:TCLIService.java:1273', > > 'org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession:getResult:TCLIService.java:1258', > 'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', > 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', > 'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56', > > 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285', > > 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145', > > 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615', > 'java.lang.Thread:run:Thread.java:745'], statusCode=3)) > TCloseSessionResp(status=TStatus(errorCode=0, errorMessage='Session does not > exist!', sqlState=None, > infoMessages=['*org.apache.hive.service.cli.HiveSQLException:Session does not > exist!:12:11', > 'org.apache.hive.service.cli.session.SessionManager:closeSession:SessionManager.java:311', > 'org.apache.hive.service.cli.CLIService:closeSession:CLIService.java:221', > 
'org.apache.hive.service.cli.thrift.ThriftCLIService:CloseSession:ThriftCLIService.java:471', > > 'org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession:getResult:TCLIService.java:1273', > > 'org.apache.hive.service.cli.thrift.TCLIService$Processor$CloseSession:getResult:TCLIService.java:1258', > 'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39', > 'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39', > 'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56', > > 'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285', > > 'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145', > > 'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615', >
[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340603#comment-15340603 ] Sergey Shelukhin commented on HIVE-13884: - It would fall back to ORM in this case, assuming there was an ORM implementation in the original patch. > Disallow queries fetching more than a configured number of partitions in > PartitionPruner > > > Key: HIVE-13884 > URL: https://issues.apache.org/jira/browse/HIVE-13884 > Project: Hive > Issue Type: Improvement >Reporter: Mohit Sabharwal >Assignee: Sergio Peña > Attachments: HIVE-13884.1.patch, HIVE-13884.2.patch, > HIVE-13884.3.patch, HIVE-13884.4.patch, HIVE-13884.5.patch, HIVE-13884.6.patch > > > Currently the PartitionPruner requests either all partitions or partitions > based on a filter expression. In either scenario, if the number of partitions > accessed is large, there can be significant memory pressure at the HMS server > end. > We already have a config {{hive.limit.query.max.table.partition}} that > enforces limits on the number of partitions that may be scanned per operator. But > this check happens after the PartitionPruner has already fetched all > partitions. > We should add an option at the PartitionPruner level to disallow queries that > attempt to access a number of partitions beyond a configurable limit. > Note that {{hive.mapred.mode=strict}} disallows queries without a partition > filter in the PartitionPruner, but this check accepts any query with a pruning > condition, even if the partitions fetched are large. In multi-tenant > environments, admins could use more control w.r.t. the number of partitions > allowed based on HMS memory capacity. > One option is to have the PartitionPruner first fetch the partition names > (instead of partition specs) and throw an exception if the number of partitions > exceeds the configured value. Otherwise, fetch the partition specs. 
> Looks like the existing {{listPartitionNames}} call could be used if extended > to take partition filter expressions like {{getPartitionsByExpr}} call does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
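The fail-fast option sketched in the HIVE-13884 description — list partition names first, and only fetch full partition specs when the count is under the limit — looks roughly like this. This is a hypothetical sketch, not the patch itself: `list_partition_names` and `get_partitions` stand in for the real HMS calls, and the `-1 = unlimited` convention is an assumption:

```python
class PartitionLimitExceeded(Exception):
    """Raised before any expensive partition-spec fetch happens."""

def fetch_partitions(metastore, table, filter_expr, max_partitions):
    # Step 1: cheap call -- partition *names* only, filtered server-side.
    names = metastore.list_partition_names(table, filter_expr)
    if 0 <= max_partitions < len(names):
        raise PartitionLimitExceeded(
            "query over %s would fetch %d partitions; limit is %d"
            % (table, len(names), max_partitions))
    # Step 2: expensive call -- full partition specs, only if under the limit.
    return metastore.get_partitions(table, names)
```

Failing on the name listing keeps the HMS from ever materializing the large partition objects that cause the memory pressure described above.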
[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340594#comment-15340594 ] Sergio Peña commented on HIVE-13884: If {{MetastoreDirectSql.getNumPartitionsViaSqlFilter()}} returns an error or throws an exception whenever the internal {{PartitionFilterGenerator.generateSqlFilter}} fails, then how should we handle the partition limit request? There is no data to validate this, and we cannot abort the query because of this. [~sershe] [~mohitsabharwal] Any ideas on this? Should we fix {{generateSqlFilter}} to avoid returning NULL when the filter cannot be formed? > Disallow queries fetching more than a configured number of partitions in > PartitionPruner > > > Key: HIVE-13884 > URL: https://issues.apache.org/jira/browse/HIVE-13884 > Project: Hive > Issue Type: Improvement >Reporter: Mohit Sabharwal >Assignee: Sergio Peña > Attachments: HIVE-13884.1.patch, HIVE-13884.2.patch, > HIVE-13884.3.patch, HIVE-13884.4.patch, HIVE-13884.5.patch, HIVE-13884.6.patch > > > Currently the PartitionPruner requests either all partitions or partitions > based on a filter expression. In either scenario, if the number of partitions > accessed is large, there can be significant memory pressure at the HMS server > end. > We already have a config {{hive.limit.query.max.table.partition}} that > enforces limits on the number of partitions that may be scanned per operator. But > this check happens after the PartitionPruner has already fetched all > partitions. > We should add an option at the PartitionPruner level to disallow queries that > attempt to access a number of partitions beyond a configurable limit. > Note that {{hive.mapred.mode=strict}} disallows queries without a partition > filter in the PartitionPruner, but this check accepts any query with a pruning > condition, even if the partitions fetched are large. In multi-tenant > environments, admins could use more control w.r.t. 
number of partitions > allowed based on HMS memory capacity. > One option is to have PartitionPruner first fetch the partition names > (instead of partition specs) and throw an exception if number of partitions > exceeds the configured value. Otherwise, fetch the partition specs. > Looks like the existing {{listPartitionNames}} call could be used if extended > to take partition filter expressions like {{getPartitionsByExpr}} call does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14041) llap scripts add hadoop and other libraries from the machine local install to the daemon classpath
[ https://issues.apache.org/jira/browse/HIVE-14041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340584#comment-15340584 ] Siddharth Seth commented on HIVE-14041: --- Ok. Looked into this some more. `hadoop classpath` does not provide the LD_LIBRARY_PATH. This is set up separately via template.py, and requires hadoop_home to be set. (Ends up being "None/lib/native".) Somehow on the cluster I was using to test this, the LD_LIBRARY_PATH was set up correctly before invoking runLlapDaemon.sh. A YARN NM export, maybe? In any case, the native part seems unrelated to this jira and can be investigated in a follow-up. [~gopalv] - please review. > llap scripts add hadoop and other libraries from the machine local install to > the daemon classpath > -- > > Key: HIVE-14041 > URL: https://issues.apache.org/jira/browse/HIVE-14041 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14041.01.patch > > > `hadoop classpath` ends up getting added to the classpath of llap daemons. > This essentially means picking up the classpath from the local deploy. > This isn't required since the slider package includes the relevant libraries > (shipped from the client) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13884) Disallow queries fetching more than a configured number of partitions in PartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-13884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340585#comment-15340585 ] Sergio Peña commented on HIVE-13884: Sometimes {{MetastoreDirectSql.getNumPartitionsViaSqlFilter()}} returns 0 when the query filter expression couldn't be created. This produces a false positive for the limit check even when the number of partitions is actually too large, causing the query to fetch all partitions. HIVE-14055 is required for this patch. > Disallow queries fetching more than a configured number of partitions in > PartitionPruner > > > Key: HIVE-13884 > URL: https://issues.apache.org/jira/browse/HIVE-13884 > Project: Hive > Issue Type: Improvement >Reporter: Mohit Sabharwal >Assignee: Sergio Peña > Attachments: HIVE-13884.1.patch, HIVE-13884.2.patch, > HIVE-13884.3.patch, HIVE-13884.4.patch, HIVE-13884.5.patch, HIVE-13884.6.patch > > > Currently the PartitionPruner requests either all partitions or partitions > based on a filter expression. In either scenario, if the number of partitions > accessed is large, there can be significant memory pressure at the HMS server > end. > We already have a config {{hive.limit.query.max.table.partition}} that > enforces limits on the number of partitions that may be scanned per operator. But > this check happens after the PartitionPruner has already fetched all > partitions. > We should add an option at the PartitionPruner level to disallow queries that > attempt to access a number of partitions beyond a configurable limit. > Note that {{hive.mapred.mode=strict}} disallows queries without a partition > filter in the PartitionPruner, but this check accepts any query with a pruning > condition, even if the partitions fetched are large. In multi-tenant > environments, admins could use more control w.r.t. the number of partitions > allowed based on HMS memory capacity. 
> One option is to have PartitionPruner first fetch the partition names > (instead of partition specs) and throw an exception if number of partitions > exceeds the configured value. Otherwise, fetch the partition specs. > Looks like the existing {{listPartitionNames}} call could be used if extended > to take partition filter expressions like {{getPartitionsByExpr}} call does. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-13809) Hybrid Grace Hash Join memory usage estimation didn't take into account the bloom filter size
[ https://issues.apache.org/jira/browse/HIVE-13809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei Zheng updated HIVE-13809: - Resolution: Fixed Fix Version/s: 2.2.0 2.1.0 Status: Resolved (was: Patch Available) Thanks [~gopalv] for the review. Committed to master and branch-2.1. > Hybrid Grace Hash Join memory usage estimation didn't take into account the > bloom filter size > - > > Key: HIVE-13809 > URL: https://issues.apache.org/jira/browse/HIVE-13809 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.0.0, 2.1.0 >Reporter: Wei Zheng >Assignee: Wei Zheng > Fix For: 2.1.0, 2.2.0 > > Attachments: HIVE-13809.1.patch > > > Memory estimation is important during hash table loading, because we need to > make the decision of whether to load the next hash partition in memory or > spill it. If the assumption is there's enough memory but it turns out not the > case, we will run into OOM problem. > Currently hybrid grace hash join memory usage estimation didn't take into > account the bloom filter size. In large test cases (TB scale) the bloom > filter grows as big as hundreds of MB, big enough to cause estimation error. > The solution is to count in the bloom filter size into memory estimation. > Another issue this patch will fix is possible NPE due to object cache reuse > during hybrid grace hash join. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
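The HIVE-13809 fix amounts to including the bloom filter's footprint in the spill decision, not just the hash table's. A minimal sketch, assuming the standard bloom filter sizing formula m = -n·ln(p)/(ln 2)² bits (the function names and the `fpp` default are illustrative, not Hive's actual code):

```python
import math

def bloom_filter_bytes(expected_entries, fpp=0.05):
    """Standard bloom filter sizing: m = -n * ln(p) / (ln 2)^2 bits."""
    bits = -expected_entries * math.log(fpp) / (math.log(2) ** 2)
    return int(math.ceil(bits / 8.0))

def can_load_in_memory(partition_bytes, expected_entries, used_bytes, memory_limit):
    # The buggy estimate ignored the bloom filter; at TB scale its
    # footprint reaches hundreds of MB and the under-estimate causes OOM.
    projected = used_bytes + partition_bytes + bloom_filter_bytes(expected_entries)
    return projected <= memory_limit
```

At one million expected keys the filter alone is roughly 780 KB, and it scales linearly with key count, which matches the hundreds-of-MB figures mentioned in the description for TB-scale runs.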
[jira] [Reopened] (HIVE-14064) beeline to auto connect to the HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-14064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar reopened HIVE-14064: > beeline to auto connect to the HiveServer2 > -- > > Key: HIVE-14064 > URL: https://issues.apache.org/jira/browse/HIVE-14064 > Project: Hive > Issue Type: Improvement > Components: Beeline >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Minor > > Currently one has to give a jdbc:hive2 url in order for Beeline to connect to a > hiveserver2 instance. It would be great if Beeline could get the info somehow > (from a properties file at a well-known location?) and connect automatically > if the user doesn't specify such a url. If the properties file is not present, > then beeline would expect the user to provide the url and credentials using > !connect or ./beeline -u .. commands. > While Beeline is flexible (being a mere JDBC client), most environments would > have just a single HS2. Having users manually connect via either > "beeline ~/.propsfile" or -u or !connect statements lowers the > experience. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-14064) beeline to auto connect to the HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-14064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vihang Karajgaonkar resolved HIVE-14064. Resolution: Duplicate > beeline to auto connect to the HiveServer2 > -- > > Key: HIVE-14064 > URL: https://issues.apache.org/jira/browse/HIVE-14064 > Project: Hive > Issue Type: Improvement > Components: Beeline >Reporter: Vihang Karajgaonkar >Assignee: Vihang Karajgaonkar >Priority: Minor > > Currently one has to give a jdbc:hive2 url in order for Beeline to connect to a > hiveserver2 instance. It would be great if Beeline could get the info somehow > (from a properties file at a well-known location?) and connect automatically > if the user doesn't specify such a url. If the properties file is not present, > then beeline would expect the user to provide the url and credentials using > !connect or ./beeline -u .. commands. > While Beeline is flexible (being a mere JDBC client), most environments would > have just a single HS2. Having users manually connect via either > "beeline ~/.propsfile" or -u or !connect statements lowers the > experience. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14038) miscellaneous acid improvements
[ https://issues.apache.org/jira/browse/HIVE-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-14038: -- Attachment: HIVE-14038.branch-1.2.patch > miscellaneous acid improvements > --- > > Key: HIVE-14038 > URL: https://issues.apache.org/jira/browse/HIVE-14038 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-14038.2.patch, HIVE-14038.3.patch, > HIVE-14038.4.patch, HIVE-14038.5.patch, HIVE-14038.branch-1.2.patch, > HIVE-14038.patch > > > 1. fix thread name in HouseKeeperServiceBase (currently they are all > "org.apache.hadoop.hive.ql.txn.compactor.HouseKeeperServiceBase$1-0") > 2. dump metastore configs from HiveConf on start up to help record values of > properties > 3. add some tests -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14062) Changes from HIVE-13502 overwritten by HIVE-13566
[ https://issues.apache.org/jira/browse/HIVE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340543#comment-15340543 ] Naveen Gangam commented on HIVE-14062: -- [~pxiong] No problem at all. It's all good. Thanks for the quick confirmation. > Changes from HIVE-13502 overwritten by HIVE-13566 > - > > Key: HIVE-14062 > URL: https://issues.apache.org/jira/browse/HIVE-14062 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 2.1.0 >Reporter: Naveen Gangam >Assignee: Naveen Gangam > > Appears that changes from HIVE-13566 overwrote the changes from HIVE-13502. I > will confirm with the author that it was inadvertent before I re-add it. > Thanks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14062) Changes from HIVE-13502 overwritten by HIVE-13566
[ https://issues.apache.org/jira/browse/HIVE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340529#comment-15340529 ] Pengcheng Xiong commented on HIVE-14062: The changes were lost during rebase and were not intentional at all. I am sorry about that; please add them back. Thanks. > Changes from HIVE-13502 overwritten by HIVE-13566 > - > > Key: HIVE-14062 > URL: https://issues.apache.org/jira/browse/HIVE-14062 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 2.1.0 >Reporter: Naveen Gangam >Assignee: Naveen Gangam > > Appears that changes from HIVE-13566 overwrote the changes from HIVE-13502. I > will confirm with the author that it was inadvertent before I re-add it. > Thanks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14062) Changes from HIVE-13502 overwritten by HIVE-13566
[ https://issues.apache.org/jira/browse/HIVE-14062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340512#comment-15340512 ] Naveen Gangam commented on HIVE-14062: -- [~pxiong] Could you please take a look at the change in HIVE-13566 and let me know whether the overwritten changes were inadvertent (perhaps lost during rebase) or intentional? Thanks > Changes from HIVE-13502 overwritten by HIVE-13566 > - > > Key: HIVE-14062 > URL: https://issues.apache.org/jira/browse/HIVE-14062 > Project: Hive > Issue Type: Bug > Components: Beeline >Affects Versions: 2.1.0 >Reporter: Naveen Gangam >Assignee: Naveen Gangam > > Appears that changes from HIVE-13566 overwrote the changes from HIVE-13502. I > will confirm with the author that it was inadvertent before I re-add it. > Thanks -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13809) Hybrid Grace Hash Join memory usage estimation didn't take into account the bloom filter size
[ https://issues.apache.org/jira/browse/HIVE-13809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340503#comment-15340503 ] Gopal V commented on HIVE-13809: LGTM - +1. The bloom filter sizing needs a revisit, since this is pre-allocated based on estimates, not on real row-counts - allowing more false positives at higher cardinalities, to keep the memory utilization under check. > Hybrid Grace Hash Join memory usage estimation didn't take into account the > bloom filter size > - > > Key: HIVE-13809 > URL: https://issues.apache.org/jira/browse/HIVE-13809 > Project: Hive > Issue Type: Bug > Components: Hive >Affects Versions: 2.0.0, 2.1.0 >Reporter: Wei Zheng >Assignee: Wei Zheng > Attachments: HIVE-13809.1.patch > > > Memory estimation is important during hash table loading, because we need to > make the decision of whether to load the next hash partition in memory or > spill it. If the assumption is there's enough memory but it turns out not the > case, we will run into OOM problem. > Currently hybrid grace hash join memory usage estimation didn't take into > account the bloom filter size. In large test cases (TB scale) the bloom > filter grows as big as hundreds of MB, big enough to cause estimation error. > The solution is to count in the bloom filter size into memory estimation. > Another issue this patch will fix is possible NPE due to object cache reuse > during hybrid grace hash join. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14060) Hive: Remove bogus "localhost" from Hive splits
[ https://issues.apache.org/jira/browse/HIVE-14060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-14060: --- Status: Patch Available (was: Open) > Hive: Remove bogus "localhost" from Hive splits > --- > > Key: HIVE-14060 > URL: https://issues.apache.org/jira/browse/HIVE-14060 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 2.1.0, 2.2.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: HIVE-14060.1.patch > > > On remote filesystems like Azure, GCP and S3, the splits contain a filler > location of "localhost". > This is worse than having no location information at all - on large clusters > YARN waits up to 200[1] seconds for a heartbeat from "localhost" before > allocating a container. > To speed up this process, the split affinity provider should scrub the bogus > "localhost" from the locations and allow for the allocation of "*" containers > instead on each heartbeat. > [1] - yarn.scheduler.capacity.node-locality-delay=40 x heartbeat of 5s -- This message was sent by Atlassian JIRA (v6.3.4#6332)
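The scrubbing described in HIVE-14060 can be sketched as follows. This is a hypothetical illustration, not the patch itself (the real change lives in Hive's split affinity handling for Tez, and the function name here is invented):

```python
def scrub_split_locations(locations):
    """Drop filler 'localhost' entries from a split's location hints.

    An empty result lets the scheduler allocate '*' (any-node) containers
    immediately, instead of waiting out the node-locality delay
    (e.g. node-locality-delay=40 x 5s heartbeats = 200s per the footnote).
    """
    return [host for host in locations if host and host != "localhost"]
```

Real hostnames are preserved, so genuine locality hints still work; only the bogus filler is removed.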
[jira] [Updated] (HIVE-14060) Hive: Remove bogus "localhost" from Hive splits
[ https://issues.apache.org/jira/browse/HIVE-14060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gopal V updated HIVE-14060: --- Attachment: HIVE-14060.1.patch > Hive: Remove bogus "localhost" from Hive splits > --- > > Key: HIVE-14060 > URL: https://issues.apache.org/jira/browse/HIVE-14060 > Project: Hive > Issue Type: Bug > Components: Tez >Affects Versions: 2.1.0, 2.2.0 >Reporter: Gopal V >Assignee: Gopal V > Attachments: HIVE-14060.1.patch > > > On remote filesystems like Azure, GCP and S3, the splits contain a filler > location of "localhost". > This is worse than having no location information at all - on large clusters > YARN waits up to 200[1] seconds for a heartbeat from "localhost" before > allocating a container. > To speed up this process, the split affinity provider should scrub the bogus > "localhost" from the locations and allow for the allocation of "*" containers > instead on each heartbeat. > [1] - yarn.scheduler.capacity.node-locality-delay=40 x heartbeat of 5s -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13566) Auto-gather column stats - phase 1
[ https://issues.apache.org/jira/browse/HIVE-13566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340489#comment-15340489 ] Pengcheng Xiong commented on HIVE-13566: [~vihangk1], sorry about that, could you add the fix back? Thanks. > Auto-gather column stats - phase 1 > -- > > Key: HIVE-13566 > URL: https://issues.apache.org/jira/browse/HIVE-13566 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Labels: TODOC2.1 > Fix For: 2.1.0 > > Attachments: HIVE-13566.01.patch, HIVE-13566.02.patch, > HIVE-13566.03.patch > > > This jira adds code and tests for auto-gather column stats. Golden file > update will be done in phase 2 - HIVE-11160 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13566) Auto-gather column stats - phase 1
[ https://issues.apache.org/jira/browse/HIVE-13566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340476#comment-15340476 ] Vihang Karajgaonkar commented on HIVE-13566: Looks like the commit for this Jira removed the fix for HIVE-13502 too. > Auto-gather column stats - phase 1 > -- > > Key: HIVE-13566 > URL: https://issues.apache.org/jira/browse/HIVE-13566 > Project: Hive > Issue Type: Sub-task >Affects Versions: 2.0.0 >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Labels: TODOC2.1 > Fix For: 2.1.0 > > Attachments: HIVE-13566.01.patch, HIVE-13566.02.patch, > HIVE-13566.03.patch > > > This jira adds code and tests for auto-gather column stats. Golden file > update will be done in phase 2 - HIVE-11160 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13901) Hivemetastore add partitions can be slow depending on filesystems
[ https://issues.apache.org/jira/browse/HIVE-13901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340475#comment-15340475 ] Sergey Shelukhin commented on HIVE-13901: - +1 > Hivemetastore add partitions can be slow depending on filesystems > - > > Key: HIVE-13901 > URL: https://issues.apache.org/jira/browse/HIVE-13901 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HIVE-13901.1.patch, HIVE-13901.2.patch, > HIVE-13901.6.patch > > > Depending on FS, creating external tables & adding partitions can be > expensive (e.g msck which adds all partitions). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13901) Hivemetastore add partitions can be slow depending on filesystems
[ https://issues.apache.org/jira/browse/HIVE-13901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340468#comment-15340468 ] Rajesh Balamohan commented on HIVE-13901: - This is due to HIVE-14054. With HIVE-14054, all these tests pass in my machine. > Hivemetastore add partitions can be slow depending on filesystems > - > > Key: HIVE-13901 > URL: https://issues.apache.org/jira/browse/HIVE-13901 > Project: Hive > Issue Type: Sub-task > Components: Metastore >Reporter: Rajesh Balamohan >Assignee: Rajesh Balamohan >Priority: Minor > Attachments: HIVE-13901.1.patch, HIVE-13901.2.patch, > HIVE-13901.6.patch > > > Depending on FS, creating external tables & adding partitions can be > expensive (e.g msck which adds all partitions). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14012) some ColumnVector-s are missing ensureSize
[ https://issues.apache.org/jira/browse/HIVE-14012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14012: Resolution: Fixed Fix Version/s: 2.0.2 2.1.1 2.2.0 1.3.0 Status: Resolved (was: Patch Available) Committed to all the affected branches... > some ColumnVector-s are missing ensureSize > -- > > Key: HIVE-14012 > URL: https://issues.apache.org/jira/browse/HIVE-14012 > Project: Hive > Issue Type: Bug >Reporter: Takahiko Saito >Assignee: Sergey Shelukhin > Fix For: 1.3.0, 2.2.0, 2.1.1, 2.0.2 > > Attachments: HIVE-14012.01.patch, HIVE-14012.01.patch, > HIVE-14012.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14024) setAllColumns is called incorrectly after some changes
[ https://issues.apache.org/jira/browse/HIVE-14024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-14024: Resolution: Fixed Fix Version/s: 2.0.2 2.1.1 2.2.0 Status: Resolved (was: Patch Available) Committed > setAllColumns is called incorrectly after some changes > -- > > Key: HIVE-14024 > URL: https://issues.apache.org/jira/browse/HIVE-14024 > Project: Hive > Issue Type: Bug >Reporter: Takahiko Saito >Assignee: Sergey Shelukhin > Fix For: 2.2.0, 2.1.1, 2.0.2 > > Attachments: HIVE-14024.01.patch, HIVE-14024.patch > > > h/t [~gopalv] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14012) some ColumnVector-s are missing ensureSize
[ https://issues.apache.org/jira/browse/HIVE-14012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340426#comment-15340426 ] Sergey Shelukhin commented on HIVE-14012: - All the failures are known > some ColumnVector-s are missing ensureSize > -- > > Key: HIVE-14012 > URL: https://issues.apache.org/jira/browse/HIVE-14012 > Project: Hive > Issue Type: Bug >Reporter: Takahiko Saito >Assignee: Sergey Shelukhin > Attachments: HIVE-14012.01.patch, HIVE-14012.01.patch, > HIVE-14012.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14038) miscellaneous acid improvements
[ https://issues.apache.org/jira/browse/HIVE-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340422#comment-15340422 ] Eugene Koifman commented on HIVE-14038: --- [~wzheng] could you review please? > miscellaneous acid improvements > --- > > Key: HIVE-14038 > URL: https://issues.apache.org/jira/browse/HIVE-14038 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-14038.2.patch, HIVE-14038.3.patch, > HIVE-14038.4.patch, HIVE-14038.patch > > > 1. fix thread name in HouseKeeperServiceBase (currently they are all > "org.apache.hadoop.hive.ql.txn.compactor.HouseKeeperServiceBase$1-0") > 2. dump metastore configs from HiveConf on start up to help record values of > properties > 3. add some tests -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14038) miscellaneous acid improvements
[ https://issues.apache.org/jira/browse/HIVE-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340418#comment-15340418 ] Hive QA commented on HIVE-14038: Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12811851/HIVE-14038.3.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 6 failed/errored test(s), 10223 tests executed *Failed tests:* {noformat} TestMiniTezCliDriver-tez_union_group_by.q-vector_auto_smb_mapjoin_14.q-union_fast_stats.q-and-12-more - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_constantPropagateForSubQuery org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_list_bucket_dml_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_list_bucket org.apache.hadoop.hive.cli.TestMiniSparkOnYarnCliDriver.testCliDriver_index_bitmap3 org.apache.hive.jdbc.TestJdbcWithMiniLlap.testLlapInputFormatEndToEnd {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/193/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-MASTER-Build/193/console Test logs: http://ec2-50-18-27-0.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-MASTER-Build-193/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 6 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12811851 - PreCommit-HIVE-MASTER-Build > miscellaneous acid improvements > --- > > Key: HIVE-14038 > URL: https://issues.apache.org/jira/browse/HIVE-14038 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-14038.2.patch, HIVE-14038.3.patch, > HIVE-14038.4.patch, HIVE-14038.patch > > > 1. fix thread name in HouseKeeperServiceBase (currently they are all > "org.apache.hadoop.hive.ql.txn.compactor.HouseKeeperServiceBase$1-0") > 2. dump metastore configs from HiveConf on start up to help record values of > properties > 3. add some tests -- This message was sent by Atlassian JIRA (v6.3.4#6332)
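Item 1 of HIVE-14038 is about the anonymous "$1-0" thread names produced when an executor is created without a ThreadFactory. The usual fix for this kind of problem is to pass a factory that assigns descriptive names; the sketch below shows that general technique with illustrative names, not the actual Hive patch:

```java
// Minimal sketch: give executor threads a descriptive name via a
// ThreadFactory instead of the default anonymous-class-derived name.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class NamedThreadFactoryDemo {
    // Build a factory that names threads prefix-0, prefix-1, ...
    static ThreadFactory named(String prefix) {
        AtomicInteger n = new AtomicInteger(0);
        return r -> {
            Thread t = new Thread(r, prefix + "-" + n.getAndIncrement());
            t.setDaemon(true); // housekeeping threads should not block JVM exit
            return t;
        };
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService pool =
            Executors.newScheduledThreadPool(1, named("AcidHouseKeeper"));
        // The running thread now carries the readable name.
        pool.submit(() -> System.out.println(Thread.currentThread().getName()))
            .get(); // prints AcidHouseKeeper-0
        pool.shutdown();
    }
}
```

Readable thread names make jstack dumps and log lines from background services much easier to attribute, which is the motivation stated in the ticket.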
[jira] [Commented] (HIVE-13490) Change itests to be part of the main Hive build
[ https://issues.apache.org/jira/browse/HIVE-13490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340368#comment-15340368 ] Zoltan Haindrich commented on HIVE-13490: - This whole sentence was confusing. I just wanted to point out the positive side of this: it will compile the integration tests too when it's enabled, which might come in handy if someone is working on API changes. After a few minutes I ended up removing it because it doesn't really align with the section's topic. > Change itests to be part of the main Hive build > --- > > Key: HIVE-13490 > URL: https://issues.apache.org/jira/browse/HIVE-13490 > Project: Hive > Issue Type: Improvement >Reporter: Siddharth Seth >Assignee: Zoltan Haindrich > Fix For: 2.2.0 > > Attachments: HIVE-13490.01.patch, HIVE-13490.02.patch, > HIVE-13490.03.patch > > > Instead of having to build Hive, and then itests separately. > With IntelliJ, this ends up being loaded as two separate dependencies, and > there's a lot of hops involved to make changes. > Does anyone know why these have been kept separate ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-13590) Kerberized HS2 with LDAP auth enabled fails in multi-domain LDAP case
[ https://issues.apache.org/jira/browse/HIVE-13590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340353#comment-15340353 ] Sergio Peña commented on HIVE-13590: Thanks [~ctang.ma] +1 > Kerberized HS2 with LDAP auth enabled fails in multi-domain LDAP case > - > > Key: HIVE-13590 > URL: https://issues.apache.org/jira/browse/HIVE-13590 > Project: Hive > Issue Type: Bug > Components: Authentication, Security >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Attachments: HIVE-13590.1.patch, HIVE-13590.1.patch, > HIVE-13590.patch, HIVE-13590.patch > > > In a kerberized HS2 with LDAP authentication enabled, LDAP user usually logs > in using username in form of username@domain in LDAP multi-domain case. But > it fails if the domain was not in the Hadoop auth_to_local mapping rule, the > error is as following: > {code} > Caused by: > org.apache.hadoop.security.authentication.util.KerberosName$NoMatchingRule: > No rules applied to ct...@mydomain.com > at > org.apache.hadoop.security.authentication.util.KerberosName.getShortName(KerberosName.java:389) > at org.apache.hadoop.security.User.<init>(User.java:48) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
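The "No rules applied" error in HIVE-13590 occurs when the principal's domain has no matching hadoop.security.auth_to_local rule. As a hedged illustration of the kind of configuration involved (realm name and mapping are placeholders, not taken from the ticket), a core-site.xml entry mapping a second realm to short names might look like:

```xml
<!-- Hypothetical core-site.xml snippet: strip the realm from principals
     in a second domain (MYDOMAIN.COM is a placeholder) so they map to
     short names; DEFAULT handles the cluster's own realm. -->
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[1:$1@$0](.*@MYDOMAIN\.COM)s/@.*//
    DEFAULT
  </value>
</property>
```

Without a rule covering the extra realm, KerberosName.getShortName throws the NoMatchingRule exception shown in the stack trace.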
[jira] [Updated] (HIVE-13985) ORC improvements for reducing the file system calls in task side
[ https://issues.apache.org/jira/browse/HIVE-13985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated HIVE-13985: - Attachment: HIVE-13985.6.patch Addressed [~sershe]'s review comments for master patch. > ORC improvements for reducing the file system calls in task side > > > Key: HIVE-13985 > URL: https://issues.apache.org/jira/browse/HIVE-13985 > Project: Hive > Issue Type: Bug > Components: ORC >Affects Versions: 1.3.0, 2.2.0 >Reporter: Prasanth Jayachandran >Assignee: Prasanth Jayachandran > Attachments: HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, > HIVE-13985-branch-1.patch, HIVE-13985-branch-1.patch, > HIVE-13985-branch-2.1.patch, HIVE-13985.1.patch, HIVE-13985.2.patch, > HIVE-13985.3.patch, HIVE-13985.4.patch, HIVE-13985.5.patch, HIVE-13985.6.patch > > > HIVE-13840 fixed some issues with additional file system invocations during > split generation. Similarly, this jira will fix issues with additional file > system invocations on the task side. To avoid reading footers on the task > side, users can set hive.orc.splits.include.file.footer to true which will > serialize the orc footers on the splits. But this has issues with serializing > unwanted information like column statistics and other metadata which are not > really required for reading orc split on the task side. We can reduce the > payload on the orc splits by serializing only the minimum required > information (stripe information, types, compression details). This will > decrease the payload on the orc splits and can potentially avoid OOMs in > application master (AM) during split generation. This jira also addresses other > issues concerning the AM cache. The local cache used by AM is a soft reference > cache. This can introduce unpredictability across multiple runs of the same > query. We can cache the serialized footer in the local cache and also use a > strong reference cache which should avoid memory pressure and will have > better predictability.
> One other improvement that we can do is when > hive.orc.splits.include.file.footer is set to false, on the task side we make > one additional file system call to know the size of the file. If we can > serialize the file length in the orc split this can be avoided. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
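The last paragraph of HIVE-13985 proposes serializing the file length into the split itself so the task side can skip one file system call. The sketch below illustrates that idea with plain java.io serialization; the field layout is hypothetical and not the actual OrcSplit wire format:

```java
// Sketch: carry the file length (plus minimal stripe info) inside the
// serialized split so the reader avoids an extra getFileStatus() call.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class SlimSplitDemo {
    static byte[] writeSplit(String path, long fileLen, long[] stripeOffsets)
            throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.writeUTF(path);
        out.writeLong(fileLen);           // saves a FS call on the task side
        out.writeInt(stripeOffsets.length);
        for (long off : stripeOffsets) {
            out.writeLong(off);           // minimal stripe information only
        }
        out.flush();
        return bos.toByteArray();
    }

    static long readFileLen(byte[] split) throws IOException {
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(split));
        in.readUTF();                     // skip the path
        return in.readLong();
    }

    public static void main(String[] args) throws IOException {
        byte[] split = writeSplit("/warehouse/t/part-0", 123456L,
                                  new long[] {3, 9000});
        System.out.println(readFileLen(split)); // 123456
    }
}
```

The trade-off the ticket describes is a few extra bytes per split versus one metadata RPC per task, which matters at large split counts.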
[jira] [Updated] (HIVE-14041) llap scripts add hadoop and other libraries from the machine local install to the daemon classpath
[ https://issues.apache.org/jira/browse/HIVE-14041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Siddharth Seth updated HIVE-14041: -- Status: Open (was: Patch Available) > llap scripts add hadoop and other libraries from the machine local install to > the daemon classpath > -- > > Key: HIVE-14041 > URL: https://issues.apache.org/jira/browse/HIVE-14041 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14041.01.patch > > > `hadoop classpath` ends up getting added to the classpath of llap daemons. > This essentially means picking up the classpath from the local deploy. > This isn't required since the slider package includes relevant libraries > (shipped from the client) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14041) llap scripts add hadoop and other libraries from the machine local install to the daemon classpath
[ https://issues.apache.org/jira/browse/HIVE-14041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340347#comment-15340347 ] Siddharth Seth commented on HIVE-14041: --- From talking with [~gopalv] - hadoop classpath was making native libs available. Will make some changes to the patch for the same. > llap scripts add hadoop and other libraries from the machine local install to > the daemon classpath > -- > > Key: HIVE-14041 > URL: https://issues.apache.org/jira/browse/HIVE-14041 > Project: Hive > Issue Type: Bug > Components: llap >Reporter: Siddharth Seth >Assignee: Siddharth Seth > Attachments: HIVE-14041.01.patch > > > `hadoop classpath` ends up getting added to the classpath of llap daemons. > This essentially means picking up the classpath from the local deploy. > This isn't required since the slider package includes relevant libraries > (shipped from the client) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-14038) miscellaneous acid improvements
[ https://issues.apache.org/jira/browse/HIVE-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-14038: -- Attachment: HIVE-14038.4.patch > miscellaneous acid improvements > --- > > Key: HIVE-14038 > URL: https://issues.apache.org/jira/browse/HIVE-14038 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 2.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-14038.2.patch, HIVE-14038.3.patch, > HIVE-14038.4.patch, HIVE-14038.patch > > > 1. fix thread name in HouseKeeperServiceBase (currently they are all > "org.apache.hadoop.hive.ql.txn.compactor.HouseKeeperServiceBase$1-0") > 2. dump metastore configs from HiveConf on start up to help record values of > properties > 3. add some tests -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-14055) directSql - getting the number of partitions is broken
[ https://issues.apache.org/jira/browse/HIVE-14055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340243#comment-15340243 ] Sergio Peña commented on HIVE-14055: What about throwing a checked exception? I agree with you about invalid values. I see that a NULL value can be returned in case the filter couldn't be formed correctly. If NULL means an error, then a checked exception should be better handled by developers, shouldn't it? > directSql - getting the number of partitions is broken > -- > > Key: HIVE-14055 > URL: https://issues.apache.org/jira/browse/HIVE-14055 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-14055.patch > > > Noticed while looking at something else. If the filter cannot be pushed down > it just returns 0 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
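The API question in the HIVE-14055 discussion (return 0/null versus throw a checked exception when the filter cannot be pushed down) can be sketched as follows; the class names and the filter check are illustrative, not Hive's actual metastore code:

```java
// Sketch of the design choice under discussion: signal "filter cannot
// be pushed down" with a checked exception the caller must handle,
// instead of silently returning 0 or null.
public class PartitionCountDemo {
    static class FilterNotSupportedException extends Exception {
        FilterNotSupportedException(String msg) { super(msg); }
    }

    // Returns the partition count, or signals explicitly that the
    // filter could not be compiled to direct SQL. The "udf(" check is a
    // stand-in for a real pushdown-compatibility test.
    static int countPartitions(String filter)
            throws FilterNotSupportedException {
        if (filter == null || filter.contains("udf(")) {
            throw new FilterNotSupportedException(
                "filter cannot be pushed down: " + filter);
        }
        return 42; // stand-in for the real directSQL count
    }

    public static void main(String[] args) {
        try {
            countPartitions("udf(x) = 1");
        } catch (FilterNotSupportedException e) {
            // Caller is forced to decide: fall back to the ORM path,
            // or surface the failure, rather than trusting a bogus 0.
            System.out.println("fallback: " + e.getMessage());
        }
    }
}
```

The advantage of the checked exception is that a count of 0 stays unambiguous: it always means "no partitions matched", never "the query silently failed".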
[jira] [Commented] (HIVE-13441) LLAPIF: security and signed fragments
[ https://issues.apache.org/jira/browse/HIVE-13441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340226#comment-15340226 ] Sergey Shelukhin commented on HIVE-13441: - [~jdere] is there documentation for external access? We could add some information there > LLAPIF: security and signed fragments > - > > Key: HIVE-13441 > URL: https://issues.apache.org/jira/browse/HIVE-13441 > Project: Hive > Issue Type: Improvement >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Labels: llap > Fix For: 2.2.0 > > > Allows external clients to get securely signed splits from HS2, and submit > them to LLAP without running as a privileged user; LLAP will verify the > splits before running. -- This message was sent by Atlassian JIRA (v6.3.4#6332)