[jira] [Commented] (HIVE-11533) Loop optimization for SIMD in integer comparisons
[ https://issues.apache.org/jira/browse/HIVE-11533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948096#comment-14948096 ] Hive QA commented on HIVE-11533: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12765289/HIVE-11533.4.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9563 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5565/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5565/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5565/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12765289 - PreCommit-HIVE-TRUNK-Build > Loop optimization for SIMD in integer comparisons > - > > Key: HIVE-11533 > URL: https://issues.apache.org/jira/browse/HIVE-11533 > Project: Hive > Issue Type: Sub-task > Components: Vectorization >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Minor > Attachments: HIVE-11533.1.patch, HIVE-11533.2.patch, > HIVE-11533.3.patch, HIVE-11533.4.patch > > > Long*CompareLong* classes can be optimized with subtraction and bitwise > operators for better SIMD optimization. > {code} > for(int i = 0; i != n; i++) { > outputVector[i] = vector1[0] > vector2[i] ? 1 : 0; > } > {code} > This issue will cover following classes; > - LongColEqualLongColumn > - LongColNotEqualLongColumn > - LongColGreaterLongColumn > - LongColGreaterEqualLongColumn > - LongColLessLongColumn > - LongColLessEqualLongColumn > - LongScalarEqualLongColumn > - LongScalarNotEqualLongColumn > - LongScalarGreaterLongColumn > - LongScalarGreaterEqualLongColumn > - LongScalarLessLongColumn > - LongScalarLessEqualLongColumn > - LongColEqualLongScalar > - LongColNotEqualLongScalar > - LongColGreaterLongScalar > - LongColGreaterEqualLongScalar > - LongColLessLongScalar > - LongColLessEqualLongScalar -- This message was sent by Atlassian JIRA (v6.3.4#6332)
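The subtraction-and-bitwise rewrite the description alludes to can be sketched as below. This is an illustrative sketch, not the actual patch: the class and method names (`SimdCompareSketch`, `scalarGreaterThanCol`) are hypothetical, and the sign-bit trick assumes the subtraction does not overflow.

```java
public class SimdCompareSketch {
    // out[i] = (scalar > vector[i]) ? 1 : 0, without a branch:
    // vector[i] - scalar is negative exactly when scalar > vector[i],
    // and the unsigned shift moves its sign bit into the low bit.
    public static void scalarGreaterThanCol(long scalar, long[] vector, long[] out, int n) {
        for (int i = 0; i != n; i++) {
            out[i] = (vector[i] - scalar) >>> 63;
        }
    }
}
```

A branch-free body like this keeps the loop a straight-line sequence of arithmetic ops, which HotSpot's auto-vectorizer can map onto SIMD instructions.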
[jira] [Commented] (HIVE-11954) Extend logic to choose side table in MapJoin Conversion algorithm
[ https://issues.apache.org/jira/browse/HIVE-11954?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948033#comment-14948033 ] Laljo John Pullokkaran commented on HIVE-11954: --- "getNumberOfCostlyOps" could be made recursive, use a graph walker, or be implemented by modifying NodeUtils. > Extend logic to choose side table in MapJoin Conversion algorithm > - > > Key: HIVE-11954 > URL: https://issues.apache.org/jira/browse/HIVE-11954 > Project: Hive > Issue Type: Bug > Components: Physical Optimizer >Affects Versions: 2.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-11954.01.patch, HIVE-11954.02.patch, > HIVE-11954.03.patch, HIVE-11954.patch, HIVE-11954.patch > > > Selection of the side table (in-memory/hash table) in the MapJoin Conversion > algorithm needs to be more sophisticated. > In an N-way Map Join, Hive should pick as the side table (in-memory table) the input stream > that has the least cost in producing its relation (like TS(FIL|Proj)*). > A cost-based choice needs an extended cost model; without the return path it's going to > be hard to do this. > For the time being we could employ a modified cost-based algorithm for side > table selection. > The new algorithm is described below: > 1. Identify the candidate set of inputs for the side table (in-memory/hash table) > from the inputs (based on conditional task size) > 2. For each of the inputs, identify its cost and memory requirement. Cost is 1 for > each heavy-weight relational op (Join, GB, PTF/Windowing, TF, etc.). The cost for > an input is the total number of heavy-weight ops in its branch. > 3. Order the set from #1 on cost & memory requirement (ascending order) > 4. Pick the first element from #3 as the side table. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
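Steps 1–4 of the side-table algorithm above can be sketched as follows. All names (`SideTableSelector`, `Input`, `pickSideTable`) are hypothetical, and the cost/memory fields stand in for Hive's real statistics.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SideTableSelector {
    public static class Input {
        public final String name;
        public final int heavyOpCount;  // cost: number of heavy-weight ops in the branch
        public final long memRequired;  // estimated in-memory hash table size
        public Input(String name, int heavyOpCount, long memRequired) {
            this.name = name;
            this.heavyOpCount = heavyOpCount;
            this.memRequired = memRequired;
        }
    }

    // Step 1: filter candidates by the conditional task size budget.
    // Step 3: order by (cost, memory requirement), ascending.
    // Step 4: pick the first element, or null if no input fits in memory.
    public static Input pickSideTable(List<Input> inputs, long maxSize) {
        List<Input> candidates = new ArrayList<>();
        for (Input in : inputs) {
            if (in.memRequired <= maxSize) {
                candidates.add(in);
            }
        }
        candidates.sort(Comparator
            .comparingInt((Input i) -> i.heavyOpCount)
            .thenComparingLong(i -> i.memRequired));
        return candidates.isEmpty() ? null : candidates.get(0);
    }
}
```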
[jira] [Commented] (HIVE-12025) refactor bucketId generating code
[ https://issues.apache.org/jira/browse/HIVE-12025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948024#comment-14948024 ] Hive QA commented on HIVE-12025: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12765297/HIVE-12025.2.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 9638 tests executed *Failed tests:* {noformat} TestMarkPartition - did not produce a TEST-*.xml file org.apache.hadoop.hive.cli.TestMiniTezCliDriver.initializationError org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation org.apache.hive.hcatalog.streaming.mutate.worker.TestBucketIdResolverImpl.testAttachBucketIdToRecord org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5564/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5564/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5564/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12765297 - PreCommit-HIVE-TRUNK-Build > refactor bucketId generating code > - > > Key: HIVE-12025 > URL: https://issues.apache.org/jira/browse/HIVE-12025 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 1.0.1 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-12025.2.patch, HIVE-12025.patch > > > HIVE-11983 adds ObjectInspectorUtils.getBucketHashCode() and > getBucketNumber(). > There are several (at least) places in Hive that perform this computation: > # ReduceSinkOperator.computeBucketNumber > # ReduceSinkOperator.computeHashCode > # BucketIdResolverImpl - only in 2.0.0 ASF line > # FileSinkOperator.findWriterOffset > # GenericUDFHash > Should refactor it and make sure they all call methods from > ObjectInspectorUtils. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
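For reference, the bucket computation those call sites duplicate boils down to something like the following sketch; the real `ObjectInspectorUtils` signatures may differ, and `BucketIdSketch` is a hypothetical name.

```java
public class BucketIdSketch {
    // Mask off the sign bit so a negative hash code still maps into
    // [0, numBuckets) instead of producing a negative remainder.
    public static int getBucketNumber(int hashCode, int numBuckets) {
        return (hashCode & Integer.MAX_VALUE) % numBuckets;
    }
}
```

Centralizing it in one method means a change to the hashing scheme cannot leave one of the five call sites computing a different bucket than the others.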
[jira] [Commented] (HIVE-11880) filter bug of UNION ALL when hive.ppd.remove.duplicatefilters=true and filter condition is type incompatible column
[ https://issues.apache.org/jira/browse/HIVE-11880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948023#comment-14948023 ] WangMeng commented on HIVE-11880: - [~jpullokkaran] I have added a Review Board link in Issue Links. The execution engine is MR (I don't use Tez). You can use TPC-H (http://www.tpc.org/tpch/) to reproduce this JIRA according to the description above. Thanks. Unlike HIVE-11919, this bug is hit only when a union type mismatch occurs, one of the mismatched columns is a constant, and that column is the filter column; only then does UNION ALL throw the error in HIVE-11880. > filter bug of UNION ALL when hive.ppd.remove.duplicatefilters=true and > filter condition is type incompatible column > - > > Key: HIVE-11880 > URL: https://issues.apache.org/jira/browse/HIVE-11880 > Project: Hive > Issue Type: Bug > Components: Logical Optimizer >Affects Versions: 1.2.1 >Reporter: WangMeng >Assignee: WangMeng > Attachments: HIVE-11880.01.patch, HIVE-11880.02.patch, > HIVE-11880.03.patch, HIVE-11880.04.patch > > >For UNION ALL, when a union operand is a constant column (such as '0L', > BIGINT type) and its corresponding column has an incompatible type (such as INT > type), a query with a filter condition on the type-incompatible column over this UNION ALL > will cause an IndexOutOfBoundsException. > Take the TPC-H table "orders": in the following query, the type of > 'orders'.'o_custkey' is normally INT, while the type of the > corresponding constant column "0" is BIGINT ( `0L AS `o_custkey` ). > This query (with a filter on the type-incompatible column 'o_custkey') will fail > with java.lang.IndexOutOfBoundsException: > {code} > SELECT Count(1) > FROM ( > SELECT `o_orderkey` , > `o_custkey` > FROM `orders` > UNION ALL > SELECT `o_orderkey`, > 0L AS `o_custkey` > FROM `orders`) `oo` > WHERE o_custkey<10 limit 4 ; > {code} > When > {code} > set hive.ppd.remove.duplicatefilters=true > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11901) StorageBasedAuthorizationProvider requires write permission on table for SELECT statements
[ https://issues.apache.org/jira/browse/HIVE-11901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948018#comment-14948018 ] Chengbing Liu commented on HIVE-11901: -- [~thejas], I think we can add test cases for the authorization part in another JIRA and check this in first, if you think the patch is ok. > StorageBasedAuthorizationProvider requires write permission on table for > SELECT statements > -- > > Key: HIVE-11901 > URL: https://issues.apache.org/jira/browse/HIVE-11901 > Project: Hive > Issue Type: Bug > Components: Authorization >Affects Versions: 1.2.1 >Reporter: Chengbing Liu >Assignee: Chengbing Liu > Attachments: HIVE-11901.01.patch > > > With HIVE-7895, it will require write permission on the table directory even > for a SELECT statement. > Looking at the stacktrace, it seems the method > {{StorageBasedAuthorizationProvider#authorize(Table table, Partition part, > Privilege[] readRequiredPriv, Privilege[] writeRequiredPriv)}} always treats > a null partition as a CREATE statement, which can also be a SELECT. > We may have to check {{readRequiredPriv}} and {{writeRequiredPriv}} first > in order to tell which statement it is. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
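The fix direction suggested in the description (inspect the requested privileges rather than treating a null partition as CREATE) could look roughly like this sketch. `FsAction` here is a stand-in enum, not Hadoop's class, and `requiredActions` is a hypothetical helper.

```java
import java.util.EnumSet;

public class StorageAuthSketch {
    public enum FsAction { READ, WRITE }

    // Derive the required filesystem actions from the requested privileges
    // instead of assuming a null partition implies CREATE (which demands
    // write access on the table directory even for SELECT).
    public static EnumSet<FsAction> requiredActions(boolean readPriv, boolean writePriv) {
        EnumSet<FsAction> actions = EnumSet.noneOf(FsAction.class);
        if (readPriv) {
            actions.add(FsAction.READ);   // SELECT only needs read permission
        }
        if (writePriv) {
            actions.add(FsAction.WRITE);  // CREATE/INSERT need write permission
        }
        return actions;
    }
}
```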
[jira] [Commented] (HIVE-11149) Fix issue with sometimes HashMap in PerfLogger.java hangs
[ https://issues.apache.org/jira/browse/HIVE-11149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14948011#comment-14948011 ] Chengbing Liu commented on HIVE-11149: -- [~sershe], would you commit this? > Fix issue with sometimes HashMap in PerfLogger.java hangs > -- > > Key: HIVE-11149 > URL: https://issues.apache.org/jira/browse/HIVE-11149 > Project: Hive > Issue Type: Bug > Components: Logging >Affects Versions: 1.2.1 >Reporter: WangMeng >Assignee: WangMeng > Attachments: HIVE-11149.01.patch, HIVE-11149.02.patch, > HIVE-11149.03.patch, HIVE-11149.04.patch > > > In a multi-threaded environment, the HashMap in PerfLogger.java > can sometimes cause massive numbers of Java processes to hang and consume large amounts of > unnecessary CPU and memory. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11768) java.io.DeleteOnExitHook leaks memory on long running Hive Server2 Instances
[ https://issues.apache.org/jira/browse/HIVE-11768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nemon Lou updated HIVE-11768: - Priority: Minor (was: Major) > java.io.DeleteOnExitHook leaks memory on long running Hive Server2 Instances > > > Key: HIVE-11768 > URL: https://issues.apache.org/jira/browse/HIVE-11768 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 1.2.1 >Reporter: Nemon Lou >Assignee: Navis >Priority: Minor > Fix For: 2.0.0 > > Attachments: HIVE-11768.1.patch.txt, HIVE-11768.2.patch.txt > > > More than 490,000 paths were added to java.io.DeleteOnExitHook on one of our > long running HiveServer2 instances, taking up more than 100MB on heap. > Most of the paths contain a suffix of ".pipeout". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
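The leak pattern can be reproduced with plain `java.io.File`: `deleteOnExit()` registers the path in a JVM-global set inside `java.io.DeleteOnExitHook` that is only drained at shutdown, so even deleting the file does not release the entry. An illustrative sketch (the `.pipeout` suffix mimics the report):

```java
import java.io.File;
import java.io.IOException;

public class PipeoutLeakSketch {
    public static void main(String[] args) throws IOException {
        for (int i = 0; i < 3; i++) {
            File f = File.createTempFile("session" + i, ".pipeout");
            f.deleteOnExit(); // path string is retained in DeleteOnExitHook's static set
            f.delete();       // deleting the file does NOT remove that entry
        }
        // In a long-lived server, each session repeats this, so the set of
        // registered paths grows without bound until the JVM exits.
    }
}
```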
[jira] [Commented] (HIVE-11533) Loop optimization for SIMD in integer comparisons
[ https://issues.apache.org/jira/browse/HIVE-11533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947998#comment-14947998 ] Chengxiang Li commented on HIVE-11533: -- Very nice job, the patch looks good; just one thing to note. I guess the performance data was measured with "selectedInUse" false. When "selectedInUse" is true, the loop cannot benefit from SIMD instructions; in my previous experience, the optimization might even degrade performance sometimes in that case. Have you verified that? > Loop optimization for SIMD in integer comparisons > - > > Key: HIVE-11533 > URL: https://issues.apache.org/jira/browse/HIVE-11533 > Project: Hive > Issue Type: Sub-task > Components: Vectorization >Reporter: Teddy Choi >Assignee: Teddy Choi >Priority: Minor > Attachments: HIVE-11533.1.patch, HIVE-11533.2.patch, > HIVE-11533.3.patch, HIVE-11533.4.patch > > > Long*CompareLong* classes can be optimized with subtraction and bitwise > operators for better SIMD optimization. > {code} > for(int i = 0; i != n; i++) { > outputVector[i] = vector1[0] > vector2[i] ? 1 : 0; > } > {code} > This issue will cover following classes; > - LongColEqualLongColumn > - LongColNotEqualLongColumn > - LongColGreaterLongColumn > - LongColGreaterEqualLongColumn > - LongColLessLongColumn > - LongColLessEqualLongColumn > - LongScalarEqualLongColumn > - LongScalarNotEqualLongColumn > - LongScalarGreaterLongColumn > - LongScalarGreaterEqualLongColumn > - LongScalarLessLongColumn > - LongScalarLessEqualLongColumn > - LongColEqualLongScalar > - LongColNotEqualLongScalar > - LongColGreaterLongScalar > - LongColGreaterEqualLongScalar > - LongColLessLongScalar > - LongColLessEqualLongScalar -- This message was sent by Atlassian JIRA (v6.3.4#6332)
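The `selectedInUse` concern can be illustrated as follows: with a selection vector the loop reads through an index indirection (a gather), which typically blocks the JIT's auto-vectorization even though the body stays branch-free. The names loosely follow Hive's VectorizedRowBatch conventions, but this is a sketch, not the generated code.

```java
public class SelectedCompareSketch {
    // out[i] = (scalar > vector[i]) ? 1 : 0 for the rows named by the
    // selection vector when selectedInUse is set.
    public static void scalarGreaterThanCol(long scalar, long[] vector, long[] out,
                                            boolean selectedInUse, int[] sel, int n) {
        if (selectedInUse) {
            for (int j = 0; j != n; j++) {
                int i = sel[j];                       // gather via indirection: stays scalar
                out[i] = (vector[i] - scalar) >>> 63;
            }
        } else {
            for (int i = 0; i != n; i++) {
                out[i] = (vector[i] - scalar) >>> 63; // contiguous access: SIMD-friendly
            }
        }
    }
}
```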
[jira] [Updated] (HIVE-12064) prevent transactional=false
[ https://issues.apache.org/jira/browse/HIVE-12064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-12064: -- Attachment: HIVE-12064.patch > prevent transactional=false > --- > > Key: HIVE-12064 > URL: https://issues.apache.org/jira/browse/HIVE-12064 > Project: Hive > Issue Type: Bug > Components: Transactions >Affects Versions: 1.0.0 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-12064.patch > > > currently a tblproperty transactional=true must be set to make a table behave > in ACID compliant way. > This is misleading in that it seems like changing it to transactional=false > makes the table non-acid but on disk layout of acid table is different than > plain tables. So changing this property may cause wrong data to be returned. > Should prevent transactional=false. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12021) HivePreFilteringRule may introduce wrong common operands
[ https://issues.apache.org/jira/browse/HIVE-12021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947975#comment-14947975 ] Laljo John Pullokkaran commented on HIVE-12021: --- The multi-map should also get pruned. Consider the following case: (x=f(y) and x=10 and expr1) or (x=10 and expr2) The multi-map "reductionCondition" will contain both 1) "x=10" and 2) "x=f(y)". However, x=f(y) is not present in all DNF elements. One may argue for transitive effects (i.e., x=f(y) should also be correct since we have x=10); I think it's brittle to leave the multi-map entries unchanged. > HivePreFilteringRule may introduce wrong common operands > > > Key: HIVE-12021 > URL: https://issues.apache.org/jira/browse/HIVE-12021 > Project: Hive > Issue Type: Bug > Components: CBO >Affects Versions: 1.3.0, 1.2.1, 2.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Fix For: 1.3.0, 2.0.0, 1.2.2 > > Attachments: HIVE-12021.01.patch, HIVE-12021.02.patch, > HIVE-12021.02.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
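The pruning rule described in the comment, i.e. keep only operands present in every DNF disjunct, can be sketched like this. Operands are modeled as strings, and `CommonOperandSketch`/`commonOperands` are hypothetical names, not Hive code.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CommonOperandSketch {
    // An operand may be extracted as a common pre-filter only if it appears
    // in every disjunct of the DNF; anything else must be pruned.
    public static Set<String> commonOperands(List<Set<String>> disjuncts) {
        if (disjuncts.isEmpty()) {
            return Collections.emptySet();
        }
        Set<String> common = new HashSet<>(disjuncts.get(0));
        for (Set<String> d : disjuncts.subList(1, disjuncts.size())) {
            common.retainAll(d); // drop operands missing from any disjunct
        }
        return common;
    }
}
```

On the example from the comment, only "x=10" survives; "x=f(y)" appears in the first disjunct but not the second, so it is pruned rather than relying on transitivity through x=10.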
[jira] [Commented] (HIVE-11887) spark tests break the build on a shared machine
[ https://issues.apache.org/jira/browse/HIVE-11887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947965#comment-14947965 ] Rui Li commented on HIVE-11887: --- Sorry for the late response. Seems the change was introduced in HIVE-9664. [~nntnag17] do you have any idea for this issue? Thanks. > spark tests break the build on a shared machine > --- > > Key: HIVE-11887 > URL: https://issues.apache.org/jira/browse/HIVE-11887 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > > Spark download creates UDFExampleAdd jar in /tmp; when building on a shared > machine, someone else's jar from a build prevents this jar from being created > (I have no permissions to this file because it was created by a different > user) and the build fails. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-6643) Add a check for cross products in plans and output a warning
[ https://issues.apache.org/jira/browse/HIVE-6643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anthony Hsu updated HIVE-6643: -- Description: Now that we support old style join syntax, it is easy to write queries that generate a plan with a cross product. For e.g. say you have A join B join C join D on A.x = B.x and A.y = D.y and C.z = D.z So the JoinTree is: A — B \|__ D — C Since we don't reorder join graphs, we will end up with a cross product between (A join B) and C was: Now that we support old style join syntax, it is easy to write queries that generate a plan with a cross product. For e.g. say you have A join B join C join D on A.x = B.x and A.y = D.y and C.z = D.z So the JoinTree is: A — B |__ D — C Since we don't reorder join graphs, we will end up with a cross product between (A join B) and C > Add a check for cross products in plans and output a warning > > > Key: HIVE-6643 > URL: https://issues.apache.org/jira/browse/HIVE-6643 > Project: Hive > Issue Type: Bug >Reporter: Harish Butani >Assignee: Harish Butani > Fix For: 0.13.0 > > Attachments: HIVE-6643.1.patch, HIVE-6643.2.patch, HIVE-6643.3.patch, > HIVE-6643.4.patch, HIVE-6643.5.patch, HIVE-6643.6.patch, HIVE-6643.7.patch > > > Now that we support old style join syntax, it is easy to write queries that > generate a plan with a cross product. > For e.g. say you have A join B join C join D on A.x = B.x and A.y = D.y and > C.z = D.z > So the JoinTree is: > A — B > \|__ D — C > Since we don't reorder join graphs, we will end up with a cross product > between (A join B) and C -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947948#comment-14947948 ] Swarnim Kulkarni commented on HIVE-11609: - [~ashutoshc] Mind giving this another quick look and let me know if my comment [here|https://issues.apache.org/jira/browse/HIVE-11609?focusedCommentId=14935951&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14935951] makes sense? > Capability to add a filter to hbase scan via composite key doesn't work > --- > > Key: HIVE-11609 > URL: https://issues.apache.org/jira/browse/HIVE-11609 > Project: Hive > Issue Type: Bug > Components: HBase Handler >Reporter: Swarnim Kulkarni >Assignee: Swarnim Kulkarni > Attachments: HIVE-11609.1.patch.txt, HIVE-11609.2.patch.txt, > HIVE-11609.3.patch.txt > > > It seems like the capability to add filter to an hbase scan which was added > as part of HIVE-6411 doesn't work. This is primarily because in the > HiveHBaseInputFormat, the filter is added in the getsplits instead of > getrecordreader. This works fine for start and stop keys but not for filter > because a filter is respected only when an actual scan is performed. This is > also related to the initial refactoring that was done as part of HIVE-3420. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11609) Capability to add a filter to hbase scan via composite key doesn't work
[ https://issues.apache.org/jira/browse/HIVE-11609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Swarnim Kulkarni updated HIVE-11609: Attachment: HIVE-11609.3.patch.txt Reattaching patch rebasing with master and very minor updates. > Capability to add a filter to hbase scan via composite key doesn't work > --- > > Key: HIVE-11609 > URL: https://issues.apache.org/jira/browse/HIVE-11609 > Project: Hive > Issue Type: Bug > Components: HBase Handler >Reporter: Swarnim Kulkarni >Assignee: Swarnim Kulkarni > Attachments: HIVE-11609.1.patch.txt, HIVE-11609.2.patch.txt, > HIVE-11609.3.patch.txt > > > It seems like the capability to add filter to an hbase scan which was added > as part of HIVE-6411 doesn't work. This is primarily because in the > HiveHBaseInputFormat, the filter is added in the getsplits instead of > getrecordreader. This works fine for start and stop keys but not for filter > because a filter is respected only when an actual scan is performed. This is > also related to the initial refactoring that was done as part of HIVE-3420. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12061) add file type support to file metadata by expr call
[ https://issues.apache.org/jira/browse/HIVE-12061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-12061: Description: Expr filtering, automatic caching, etc. should be aware of file types for advanced features. For now only ORC is supported, but I want to add boundary for between ORC-specific and general metastore code, that could later be used for other formats if needed. NO PRECOMMIT TESTS was:Expr filtering, automatic caching, etc. should be aware of file types for advanced features. For now only ORC is supported, but I want to add boundary for between ORC-specific and general metastore code, that could later be used for other formats if needed. > add file type support to file metadata by expr call > --- > > Key: HIVE-12061 > URL: https://issues.apache.org/jira/browse/HIVE-12061 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-12061.nogen.patch, HIVE-12061.patch > > > Expr filtering, automatic caching, etc. should be aware of file types for > advanced features. For now only ORC is supported, but I want to add boundary > for between ORC-specific and general metastore code, that could later be used > for other formats if needed. > NO PRECOMMIT TESTS -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12061) add file type support to file metadata by expr call
[ https://issues.apache.org/jira/browse/HIVE-12061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-12061: Attachment: HIVE-12061.patch HIVE-12061.nogen.patch Patch on top of HIVE-11676 > add file type support to file metadata by expr call > --- > > Key: HIVE-12061 > URL: https://issues.apache.org/jira/browse/HIVE-12061 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-12061.nogen.patch, HIVE-12061.patch > > > Expr filtering, automatic caching, etc. should be aware of file types for > advanced features. For now only ORC is supported, but I want to add boundary > for between ORC-specific and general metastore code, that could later be used > for other formats if needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12065) FS stats collection may generate incorrect stats for multi-insert query
[ https://issues.apache.org/jira/browse/HIVE-12065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-12065: Attachment: HIVE-12065.patch > FS stats collection may generate incorrect stats for multi-insert query > --- > > Key: HIVE-12065 > URL: https://issues.apache.org/jira/browse/HIVE-12065 > Project: Hive > Issue Type: Bug > Components: Statistics >Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0 >Reporter: Ashutosh Chauhan >Assignee: Ashutosh Chauhan > Attachments: HIVE-12065.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12057) ORC sarg is logged too much
[ https://issues.apache.org/jira/browse/HIVE-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-12057: Attachment: HIVE-12057.02.patch Patch with caching > ORC sarg is logged too much > --- > > Key: HIVE-12057 > URL: https://issues.apache.org/jira/browse/HIVE-12057 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Minor > Attachments: HIVE-12057.01.patch, HIVE-12057.02.patch, > HIVE-12057.patch > > > SARG itself has too many newlines and it's logged for every splitgenerator in > split generation -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (HIVE-12062) enable HBase metastore file metadata cache for tez tests
[ https://issues.apache.org/jira/browse/HIVE-12062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947883#comment-14947883 ] Vikram Dixit K edited comment on HIVE-12062 at 10/8/15 12:46 AM: - Yes. It will propagate to the AM when we create the tez session. TestMiniTezCliDriver picks up the config from this site.xm and then the config is shipped from the client to the AM at the session creation time. was (Author: vikram.dixit): Yes. It will propagate to the AM when we create the tez session. It is shipped from the client to the AM at that time. > enable HBase metastore file metadata cache for tez tests > > > Key: HIVE-12062 > URL: https://issues.apache.org/jira/browse/HIVE-12062 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-12062.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12062) enable HBase metastore file metadata cache for tez tests
[ https://issues.apache.org/jira/browse/HIVE-12062?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947883#comment-14947883 ] Vikram Dixit K commented on HIVE-12062: --- Yes. It will propagate to the AM when we create the tez session. It is shipped from the client to the AM at that time. > enable HBase metastore file metadata cache for tez tests > > > Key: HIVE-12062 > URL: https://issues.apache.org/jira/browse/HIVE-12062 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-12062.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12062) enable HBase metastore file metadata cache for tez tests
[ https://issues.apache.org/jira/browse/HIVE-12062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-12062: Attachment: HIVE-12062.patch [~daijy] should this be sufficient? [~vikram.dixit] does this config propagate to AM in MiniTez? This is a client-side setting (as in, AM-side, for metastore cache). > enable HBase metastore file metadata cache for tez tests > > > Key: HIVE-12062 > URL: https://issues.apache.org/jira/browse/HIVE-12062 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-12062.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-9695) Redundant filter operator in reducer Vertex when CBO is disabled
[ https://issues.apache.org/jira/browse/HIVE-9695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947872#comment-14947872 ] Ashutosh Chauhan commented on HIVE-9695: +1 LGTM > Redundant filter operator in reducer Vertex when CBO is disabled > > > Key: HIVE-9695 > URL: https://issues.apache.org/jira/browse/HIVE-9695 > Project: Hive > Issue Type: Improvement > Components: Logical Optimizer >Affects Versions: 0.14.0, 1.0.0, 1.1.0 >Reporter: Mostafa Mokhtar >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-9695.01.patch, HIVE-9695.01.patch, HIVE-9695.patch > > > There is a redundant filter operator in reducer Vertex when CBO is disabled. > Query > {code} > select > ss_item_sk, ss_ticket_number, ss_store_sk > from > store_sales a, store_returns b, store > where > a.ss_item_sk = b.sr_item_sk > and a.ss_ticket_number = b.sr_ticket_number > and ss_sold_date_sk between 2450816 and 2451500 > and sr_returned_date_sk between 2450816 and 2451500 > and s_store_sk = ss_store_sk; > {code} > Plan snippet > {code} > Statistics: Num rows: 57439344 Data size: 1838059008 Basic stats: COMPLETE > Column stats: COMPLETE > Filter Operator > predicate: (_col1 = _col27) and (_col8 = _col34)) and > _col22 BETWEEN 2450816 AND 2451500) and _col45 BETWEEN 2450816 AND 2451500) > and (_col49 = _col6)) (type: boolean) > {code} > Full plan with CBO disabled > {code} > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-1 > Tez > Edges: > Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (BROADCAST_EDGE), Map 4 > (SIMPLE_EDGE) > DagName: mmokhtar_20150214182626_ad6820c7-b667-4652-ab25-cb60deed1a6d:13 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: b > filterExpr: ((sr_item_sk is not null and sr_ticket_number > is not null) and sr_returned_date_sk BETWEEN 2450816 AND 2451500) (type: > boolean) > Statistics: Num rows: 2370038095 Data size: 170506118656 > Basic stats: COMPLETE Column stats: 
COMPLETE > Filter Operator > predicate: (sr_item_sk is not null and sr_ticket_number > is not null) (type: boolean) > Statistics: Num rows: 706893063 Data size: 6498502768 > Basic stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: sr_item_sk (type: int), > sr_ticket_number (type: int) > sort order: ++ > Map-reduce partition columns: sr_item_sk (type: int), > sr_ticket_number (type: int) > Statistics: Num rows: 706893063 Data size: 6498502768 > Basic stats: COMPLETE Column stats: COMPLETE > value expressions: sr_returned_date_sk (type: int) > Execution mode: vectorized > Map 3 > Map Operator Tree: > TableScan > alias: store > filterExpr: s_store_sk is not null (type: boolean) > Statistics: Num rows: 1704 Data size: 3256276 Basic stats: > COMPLETE Column stats: COMPLETE > Filter Operator > predicate: s_store_sk is not null (type: boolean) > Statistics: Num rows: 1704 Data size: 6816 Basic stats: > COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: s_store_sk (type: int) > sort order: + > Map-reduce partition columns: s_store_sk (type: int) > Statistics: Num rows: 1704 Data size: 6816 Basic stats: > COMPLETE Column stats: COMPLETE > Execution mode: vectorized > Map 4 > Map Operator Tree: > TableScan > alias: a > filterExpr: (((ss_item_sk is not null and ss_ticket_number > is not null) and ss_store_sk is not null) and ss_sold_date_sk BETWEEN 2450816 > AND 2451500) (type: boolean) > Statistics: Num rows: 28878719387 Data size: 2405805439460 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: ((ss_item_sk is not null and ss_ticket_number > is not null) and ss_store_sk is not null) (type: boolean) > Statistics: Num rows: 8405840828 Data size: 110101408700 > Basic stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key
[jira] [Commented] (HIVE-11976) Extend CBO rules to being able to apply rules only once on a given operator
[ https://issues.apache.org/jira/browse/HIVE-11976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947861#comment-14947861 ] Hive QA commented on HIVE-11976: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12765278/HIVE-11976.04.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9654 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5563/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5563/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5563/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12765278 - PreCommit-HIVE-TRUNK-Build > Extend CBO rules to being able to apply rules only once on a given operator > --- > > Key: HIVE-11976 > URL: https://issues.apache.org/jira/browse/HIVE-11976 > Project: Hive > Issue Type: New Feature > Components: CBO >Affects Versions: 2.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-11976.01.patch, HIVE-11976.02.patch, > HIVE-11976.03.patch, HIVE-11976.04.patch, HIVE-11976.patch > > > Create a way to bail out quickly from HepPlanner if the rule has been already > applied on a certain operator. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
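The bail-out idea described in HIVE-11976 above can be illustrated with a minimal sketch (plain Python, not Calcite's actual HepPlanner API; the class and field names here are invented for illustration): remember which operators a rule has already fired on, and return immediately on a repeat visit.

```python
# Hedged sketch of "apply a rule at most once per operator".
# Not Calcite code - ApplyOnceRule and its members are hypothetical.
class ApplyOnceRule:
    def __init__(self, transform):
        self.transform = transform   # the rewrite the rule performs
        self._applied = set()        # identities of operators already visited

    def apply(self, operator):
        key = id(operator)
        if key in self._applied:
            return operator          # bail out quickly: rule already applied here
        self._applied.add(key)
        return self.transform(operator)

calls = []
rule = ApplyOnceRule(lambda op: calls.append(op) or op)
node = object()
rule.apply(node)
rule.apply(node)      # second visit is a no-op
print(len(calls))     # 1
```

The real planner would key on operator identity (or a digest) rather than Python's `id()`, but the fast-path check before rule matching is the point of the feature.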
[jira] [Assigned] (HIVE-12062) enable HBase metastore file metadata cache for tez tests
[ https://issues.apache.org/jira/browse/HIVE-12062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-12062: --- Assignee: Sergey Shelukhin > enable HBase metastore file metadata cache for tez tests > > > Key: HIVE-12062 > URL: https://issues.apache.org/jira/browse/HIVE-12062 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11786) Deprecate the use of redundant column in colunm stats related tables
[ https://issues.apache.org/jira/browse/HIVE-11786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947848#comment-14947848 ] Chaoyu Tang commented on HIVE-11786: If the query is rewritten to get PART_ID first and then use PART_ID and COLUMN_NAME to query PART_COL_STATS table, probably a new index on COLUMN_NAME, PART_ID (CREATE INDEX COLNAME_PARTID_IDX ON PART_COL_STATS (COLUMN_NAME, PART_ID)) is still needed. I am going to work on the new queries. > Deprecate the use of redundant column in colunm stats related tables > > > Key: HIVE-11786 > URL: https://issues.apache.org/jira/browse/HIVE-11786 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Fix For: 1.3.0, 2.0.0 > > Attachments: HIVE-11786.1.patch, HIVE-11786.1.patch, > HIVE-11786.2.patch, HIVE-11786.patch > > > The stats tables such as TAB_COL_STATS, PART_COL_STATS have redundant columns > such as DB_NAME, TABLE_NAME, PARTITION_NAME since these tables already have > foreign key like TBL_ID, or PART_ID referencing to TBLS or PARTITIONS. > These redundant columns violate database normalization rules and cause a lot > of inconvenience (sometimes difficult) in column stats related feature > implementation. For example, when renaming a table, we have to update > TABLE_NAME column in these tables as well which is unnecessary. > This JIRA is first to deprecate the use of these columns at HMS code level. A > followed JIRA is to be opened to focus on DB schema change and upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
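The rewrite proposed above - resolve PART_IDs first, then probe PART_COL_STATS by (COLUMN_NAME, PART_ID) - can be exercised outside Hive. Below is a hedged sqlite3 sketch (a toy schema, not the real HMS tables or query) showing that the suggested composite index makes the stats lookup an index search rather than a scan:

```python
import sqlite3

# Hypothetical demonstration only: a minimal stand-in for PART_COL_STATS
# with the index suggested in the comment above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PART_COL_STATS (PART_ID INTEGER, COLUMN_NAME TEXT, NUM_NULLS INTEGER)")
conn.execute("CREATE INDEX COLNAME_PARTID_IDX ON PART_COL_STATS (COLUMN_NAME, PART_ID)")

# With PART_IDs already resolved, the stats query filters on the indexed columns.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT NUM_NULLS FROM PART_COL_STATS "
    "WHERE COLUMN_NAME = ? AND PART_ID IN (?, ?)", ("key", 1, 2)
).fetchall()
print(plan)  # the plan's detail text should mention COLNAME_PARTID_IDX
```

Whether the equivalent index helps in the metastore's backing RDBMS depends on its optimizer and row counts, as the benchmarking in this thread shows.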
[jira] [Updated] (HIVE-11894) CBO: Calcite Operator To Hive Operator (Calcite Return Path): correct table column name in CTAS queries
[ https://issues.apache.org/jira/browse/HIVE-11894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pengcheng Xiong updated HIVE-11894: --- Attachment: HIVE-11894.05.patch > CBO: Calcite Operator To Hive Operator (Calcite Return Path): correct table > column name in CTAS queries > --- > > Key: HIVE-11894 > URL: https://issues.apache.org/jira/browse/HIVE-11894 > Project: Hive > Issue Type: Sub-task > Components: CBO >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-11894.01.patch, HIVE-11894.02.patch, > HIVE-11894.03.patch, HIVE-11894.04.patch, HIVE-11894.05.patch > > > To repro, run lineage2.q with return path turned on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11212) Create vectorized types for complex types
[ https://issues.apache.org/jira/browse/HIVE-11212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947835#comment-14947835 ] Sergey Shelukhin commented on HIVE-11212: - Actually, never mind - I don't think this will affect much. > Create vectorized types for complex types > - > > Key: HIVE-11212 > URL: https://issues.apache.org/jira/browse/HIVE-11212 > Project: Hive > Issue Type: Sub-task >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: HIVE-11212.patch, HIVE-11212.patch, HIVE-11212.patch, > HIVE-11212.patch > > > We need vectorized types for structs, maps, lists, and unions.

[jira] [Commented] (HIVE-11212) Create vectorized types for complex types
[ https://issues.apache.org/jira/browse/HIVE-11212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947834#comment-14947834 ] Matt McCline commented on HIVE-11212: - ([~owen.omalley] talk to Sergey about this; it might affect his merge) > Create vectorized types for complex types > - > > Key: HIVE-11212 > URL: https://issues.apache.org/jira/browse/HIVE-11212 > Project: Hive > Issue Type: Sub-task >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: HIVE-11212.patch, HIVE-11212.patch, HIVE-11212.patch, > HIVE-11212.patch > > > We need vectorized types for structs, maps, lists, and unions.
[jira] [Commented] (HIVE-11212) Create vectorized types for complex types
[ https://issues.apache.org/jira/browse/HIVE-11212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947831#comment-14947831 ] Sergey Shelukhin commented on HIVE-11212: - Is it possible to hold off until llap branch merge? > Create vectorized types for complex types > - > > Key: HIVE-11212 > URL: https://issues.apache.org/jira/browse/HIVE-11212 > Project: Hive > Issue Type: Sub-task >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: HIVE-11212.patch, HIVE-11212.patch, HIVE-11212.patch, > HIVE-11212.patch > > > We need vectorized types for structs, maps, lists, and unions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12063) Pad Decimal numbers with trailing zeros to the scale of the column
[ https://issues.apache.org/jira/browse/HIVE-12063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-12063: --- Description: HIVE-7373 was to address the problems of trimming tailing zeros by Hive, which caused many problems including treating 0.0, 0.00 and so on as 0, which has different precision/scale. Please refer to HIVE-7373 description. However, HIVE-7373 was reverted by HIVE-8745 while the underlying problems remained. HIVE-11835 was resolved recently to address one of the problems, where 0.0, 0.00, and so on cannot be read into decimal(1,1). However, HIVE-11835 didn't address the problem of showing as 0 in query result for any decimal values such as 0.0, 0.00, etc. This causes confusion as 0 and 0.0 have different precision/scale than 0. The proposal here is to pad zeros for query result to the type's scale. This not only removes the confusion described above, but also aligns with many other DBs. Internal decimal number representation doesn't change, however. was: HIVE-7373 was to address the problem of trimming tailing zeros by Hive, which caused many problems including treating 0.0, 0.00 and so on as 0, which has different precision/scale. Please refer to HIVE-7373 description. However, HIVE-7373 was reverted by HIVE-8745 while the underlying problems remained. HIVE-11835 was resolved recently to address one of the problems, where 0.0, 0.00, and so cannot be read into decimal(1,1). However, HIVE-11835 didn't address the problem of showing as 0 in query result for any decimal values such as 0.0, 0.00, etc. This causes confusion as 0 and 0.0 have different precision/scale than 0. The proposal here is to pad zeros for query result to the type's scale. This not only removes the confusion described above, but also aligns with many other DBs. Internal decimal number representation doesn't change, however. 
> Pad Decimal numbers with trailing zeros to the scale of the column > -- > > Key: HIVE-12063 > URL: https://issues.apache.org/jira/browse/HIVE-12063 > Project: Hive > Issue Type: Improvement > Components: Types >Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 0.13 >Reporter: Xuefu Zhang >Assignee: Xuefu Zhang > > HIVE-7373 was to address the problems of trimming tailing zeros by Hive, > which caused many problems including treating 0.0, 0.00 and so on as 0, which > has different precision/scale. Please refer to HIVE-7373 description. > However, HIVE-7373 was reverted by HIVE-8745 while the underlying problems > remained. HIVE-11835 was resolved recently to address one of the problems, > where 0.0, 0.00, and so on cannot be read into decimal(1,1). > However, HIVE-11835 didn't address the problem of showing as 0 in query > result for any decimal values such as 0.0, 0.00, etc. This causes confusion > as 0 and 0.0 have different precision/scale than 0. > The proposal here is to pad zeros for query result to the type's scale. This > not only removes the confusion described above, but also aligns with many > other DBs. Internal decimal number representation doesn't change, however. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
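The proposed display behavior in HIVE-12063 can be mimicked with a small sketch (Python's `decimal` module, shown only to illustrate the padding rule; `pad_to_scale` is a hypothetical helper, not Hive's HiveDecimal implementation):

```python
from decimal import Decimal

def pad_to_scale(value: str, scale: int) -> str:
    """Render a decimal value with trailing zeros padded to the column's scale."""
    # quantize to 10^-scale, so 0, 0.0 and 0.00 all display at the declared scale
    return str(Decimal(value).quantize(Decimal(1).scaleb(-scale)))

# For a column declared decimal(3,2):
print(pad_to_scale("0", 2))     # 0.00
print(pad_to_scale("0.5", 2))   # 0.50
print(pad_to_scale("12.3", 2))  # 12.30
```

Only the rendered result changes; as the description notes, the internal decimal representation stays the same.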
[jira] [Commented] (HIVE-11786) Deprecate the use of redundant column in colunm stats related tables
[ https://issues.apache.org/jira/browse/HIVE-11786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947807#comment-14947807 ] Siddharth Seth commented on HIVE-11786: --- Yes. There's a large number of entries - 2million+ in PART_COL_STATS. The number of new entries there is what I was worried about - how many new entries per table / partition, and whether that can have a significant impact. IAC, if the query is being re-written - maybe the indexes will not be required. > Deprecate the use of redundant column in colunm stats related tables > > > Key: HIVE-11786 > URL: https://issues.apache.org/jira/browse/HIVE-11786 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Fix For: 1.3.0, 2.0.0 > > Attachments: HIVE-11786.1.patch, HIVE-11786.1.patch, > HIVE-11786.2.patch, HIVE-11786.patch > > > The stats tables such as TAB_COL_STATS, PART_COL_STATS have redundant columns > such as DB_NAME, TABLE_NAME, PARTITION_NAME since these tables already have > foreign key like TBL_ID, or PART_ID referencing to TBLS or PARTITIONS. > These redundant columns violate database normalization rules and cause a lot > of inconvenience (sometimes difficult) in column stats related feature > implementation. For example, when renaming a table, we have to update > TABLE_NAME column in these tables as well which is unnecessary. > This JIRA is first to deprecate the use of these columns at HMS code level. A > followed JIRA is to be opened to focus on DB schema change and upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11642) LLAP: make sure tests pass #3
[ https://issues.apache.org/jira/browse/HIVE-11642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11642: Attachment: (was: HIVE-11642.22.patch) > LLAP: make sure tests pass #3 > - > > Key: HIVE-11642 > URL: https://issues.apache.org/jira/browse/HIVE-11642 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-11642.01.patch, HIVE-11642.02.patch, > HIVE-11642.03.patch, HIVE-11642.04.patch, HIVE-11642.05.patch, > HIVE-11642.22.patch, HIVE-11642.patch > > > Tests should pass against the most recent branch and Tez 0.8. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11642) LLAP: make sure tests pass #3
[ https://issues.apache.org/jira/browse/HIVE-11642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11642: Attachment: HIVE-11642.22.patch > LLAP: make sure tests pass #3 > - > > Key: HIVE-11642 > URL: https://issues.apache.org/jira/browse/HIVE-11642 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-11642.01.patch, HIVE-11642.02.patch, > HIVE-11642.03.patch, HIVE-11642.04.patch, HIVE-11642.05.patch, > HIVE-11642.22.patch, HIVE-11642.patch > > > Tests should pass against the most recent branch and Tez 0.8. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-3976) Support specifying scale and precision with Hive decimal type
[ https://issues.apache.org/jira/browse/HIVE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947784#comment-14947784 ] Xuefu Zhang commented on HIVE-3976: --- [~DoingDone9], There is no change on the rules, which are enforced in their respective UDFs. > Support specifying scale and precision with Hive decimal type > - > > Key: HIVE-3976 > URL: https://issues.apache.org/jira/browse/HIVE-3976 > Project: Hive > Issue Type: New Feature > Components: Query Processor, Types >Affects Versions: 0.11.0 >Reporter: Mark Grover >Assignee: Xuefu Zhang > Fix For: 0.13.0 > > Attachments: HIVE-3976.1.patch, HIVE-3976.10.patch, > HIVE-3976.11.patch, HIVE-3976.2.patch, HIVE-3976.3.patch, HIVE-3976.4.patch, > HIVE-3976.5.patch, HIVE-3976.6.patch, HIVE-3976.7.patch, HIVE-3976.8.patch, > HIVE-3976.9.patch, HIVE-3976.patch, remove_prec_scale.diff > > > HIVE-2693 introduced support for Decimal datatype in Hive. However, the > current implementation has unlimited precision and provides no way to specify > precision and scale when creating the table. > For example, MySQL allows users to specify scale and precision of the decimal > datatype when creating the table: > {code} > CREATE TABLE numbers (a DECIMAL(20,2)); > {code} > Hive should support something similar too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11253) Move SearchArgument and VectorizedRowBatch classes to storage-api.
[ https://issues.apache.org/jira/browse/HIVE-11253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947769#comment-14947769 ] Xuefu Zhang commented on HIVE-11253: Just curious, why did we single out and move HiveDecimal.java to storage-api? It seems natural that it stays with other data types such as CHAR or VARCHAR. > Move SearchArgument and VectorizedRowBatch classes to storage-api. > -- > > Key: HIVE-11253 > URL: https://issues.apache.org/jira/browse/HIVE-11253 > Project: Hive > Issue Type: Sub-task >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 2.0.0 > > Attachments: HIVE-11253.patch, HIVE-11253.patch, HIVE-11253.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12053) Stats performance regression caused by HIVE-11786
[ https://issues.apache.org/jira/browse/HIVE-12053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947764#comment-14947764 ] Chaoyu Tang commented on HIVE-12053: [~sseth] has found that creating the following indexes does not help improve the stats performance either: {code} CREATE INDEX COLNAME_TBLID_IDX ON TAB_COL_STATS (COLUMN_NAME, TBL_ID); CREATE INDEX COLNAME_IDX ON TAB_COL_STATS (COLUMN_NAME); CREATE INDEX COLNAME_PARTID_IDX ON PART_COL_STATS (COLUMN_NAME, PART_ID); CREATE INDEX COLNAME_IDX ON PART_COL_STATS (COLUMN_NAME); CREATE INDEX PARTNAME_IDX ON PARTITIONS (PART_NAME); CREATE INDEX TBLNAME_IDX ON TBLS (TBL_NAME); {code} > Stats performance regression caused by HIVE-11786 > - > > Key: HIVE-12053 > URL: https://issues.apache.org/jira/browse/HIVE-12053 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Chaoyu Tang > > HIVE-11786 tried to normalize table TAB_COL_STATS/PART_COL_STATS but caused > performance regression.
[jira] [Commented] (HIVE-11786) Deprecate the use of redundant column in colunm stats related tables
[ https://issues.apache.org/jira/browse/HIVE-11786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947760#comment-14947760 ] Chaoyu Tang commented on HIVE-11786: Thanks [~sseth]. Maybe there are a large number of rows in TAB_COL_STATS/PART_COL_STATS, so it took a long time to create the index on their column "COLUMN_NAME". I am going to change the query to see if it is helpful. BTW, I have created a JIRA HIVE-12053 for the performance regression you found. > Deprecate the use of redundant column in colunm stats related tables > > > Key: HIVE-11786 > URL: https://issues.apache.org/jira/browse/HIVE-11786 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Fix For: 1.3.0, 2.0.0 > > Attachments: HIVE-11786.1.patch, HIVE-11786.1.patch, > HIVE-11786.2.patch, HIVE-11786.patch > > > The stats tables such as TAB_COL_STATS, PART_COL_STATS have redundant columns > such as DB_NAME, TABLE_NAME, PARTITION_NAME since these tables already have > foreign key like TBL_ID, or PART_ID referencing to TBLS or PARTITIONS. > These redundant columns violate database normalization rules and cause a lot > of inconvenience (sometimes difficult) in column stats related feature > implementation. For example, when renaming a table, we have to update > TABLE_NAME column in these tables as well which is unnecessary. > This JIRA is first to deprecate the use of these columns at HMS code level. A > followed JIRA is to be opened to focus on DB schema change and upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11634) Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)
[ https://issues.apache.org/jira/browse/HIVE-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11634: - Attachment: (was: HIVE-11634.990.patch) > Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...) > -- > > Key: HIVE-11634 > URL: https://issues.apache.org/jira/browse/HIVE-11634 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-11634.1.patch, HIVE-11634.2.patch, > HIVE-11634.3.patch, HIVE-11634.4.patch, HIVE-11634.5.patch, > HIVE-11634.6.patch, HIVE-11634.7.patch, HIVE-11634.8.patch, > HIVE-11634.9.patch, HIVE-11634.91.patch, HIVE-11634.92.patch, > HIVE-11634.93.patch, HIVE-11634.94.patch, HIVE-11634.95.patch, > HIVE-11634.96.patch, HIVE-11634.97.patch, HIVE-11634.98.patch, > HIVE-11634.99.patch, HIVE-11634.990.patch > > > Currently, we do not support partition pruning for the following scenario > {code} > create table pcr_t1 (key int, value string) partitioned by (ds string); > insert overwrite table pcr_t1 partition (ds='2000-04-08') select * from src > where key < 20 order by key; > insert overwrite table pcr_t1 partition (ds='2000-04-09') select * from src > where key < 20 order by key; > insert overwrite table pcr_t1 partition (ds='2000-04-10') select * from src > where key < 20 order by key; > explain extended select ds from pcr_t1 where struct(ds, key) in > (struct('2000-04-08',1), struct('2000-04-09',2)); > {code} > If we run the above query, we see that all the partitions of table pcr_t1 are > present in the filter predicate where as we can prune partition > (ds='2000-04-10'). > The optimization is to rewrite the above query into the following. 
> {code} > explain extended select ds from pcr_t1 where (struct(ds)) IN > (struct('2000-04-08'), struct('2000-04-09')) and struct(ds, key) in > (struct('2000-04-08',1), struct('2000-04-09',2)); > {code} > The predicate (struct(ds)) IN (struct('2000-04-08'), struct('2000-04-09')) > is used by partition pruner to prune the columns which otherwise will not be > pruned. > This is an extension of the idea presented in HIVE-11573. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11634) Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)
[ https://issues.apache.org/jira/browse/HIVE-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11634: - Attachment: HIVE-11634.990.patch > Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...) > -- > > Key: HIVE-11634 > URL: https://issues.apache.org/jira/browse/HIVE-11634 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-11634.1.patch, HIVE-11634.2.patch, > HIVE-11634.3.patch, HIVE-11634.4.patch, HIVE-11634.5.patch, > HIVE-11634.6.patch, HIVE-11634.7.patch, HIVE-11634.8.patch, > HIVE-11634.9.patch, HIVE-11634.91.patch, HIVE-11634.92.patch, > HIVE-11634.93.patch, HIVE-11634.94.patch, HIVE-11634.95.patch, > HIVE-11634.96.patch, HIVE-11634.97.patch, HIVE-11634.98.patch, > HIVE-11634.99.patch, HIVE-11634.990.patch > > > Currently, we do not support partition pruning for the following scenario > {code} > create table pcr_t1 (key int, value string) partitioned by (ds string); > insert overwrite table pcr_t1 partition (ds='2000-04-08') select * from src > where key < 20 order by key; > insert overwrite table pcr_t1 partition (ds='2000-04-09') select * from src > where key < 20 order by key; > insert overwrite table pcr_t1 partition (ds='2000-04-10') select * from src > where key < 20 order by key; > explain extended select ds from pcr_t1 where struct(ds, key) in > (struct('2000-04-08',1), struct('2000-04-09',2)); > {code} > If we run the above query, we see that all the partitions of table pcr_t1 are > present in the filter predicate where as we can prune partition > (ds='2000-04-10'). > The optimization is to rewrite the above query into the following. 
> {code} > explain extended select ds from pcr_t1 where (struct(ds)) IN > (struct('2000-04-08'), struct('2000-04-09')) and struct(ds, key) in > (struct('2000-04-08',1), struct('2000-04-09',2)); > {code} > The predicate (struct(ds)) IN (struct('2000-04-08'), struct('2000-04-09')) > is used by partition pruner to prune the columns which otherwise will not be > pruned. > This is an extension of the idea presented in HIVE-11573. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11634) Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)
[ https://issues.apache.org/jira/browse/HIVE-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11634: - Attachment: HIVE-11634.990.patch Added new test cases. > Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...) > -- > > Key: HIVE-11634 > URL: https://issues.apache.org/jira/browse/HIVE-11634 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-11634.1.patch, HIVE-11634.2.patch, > HIVE-11634.3.patch, HIVE-11634.4.patch, HIVE-11634.5.patch, > HIVE-11634.6.patch, HIVE-11634.7.patch, HIVE-11634.8.patch, > HIVE-11634.9.patch, HIVE-11634.91.patch, HIVE-11634.92.patch, > HIVE-11634.93.patch, HIVE-11634.94.patch, HIVE-11634.95.patch, > HIVE-11634.96.patch, HIVE-11634.97.patch, HIVE-11634.98.patch, > HIVE-11634.99.patch, HIVE-11634.990.patch > > > Currently, we do not support partition pruning for the following scenario > {code} > create table pcr_t1 (key int, value string) partitioned by (ds string); > insert overwrite table pcr_t1 partition (ds='2000-04-08') select * from src > where key < 20 order by key; > insert overwrite table pcr_t1 partition (ds='2000-04-09') select * from src > where key < 20 order by key; > insert overwrite table pcr_t1 partition (ds='2000-04-10') select * from src > where key < 20 order by key; > explain extended select ds from pcr_t1 where struct(ds, key) in > (struct('2000-04-08',1), struct('2000-04-09',2)); > {code} > If we run the above query, we see that all the partitions of table pcr_t1 are > present in the filter predicate where as we can prune partition > (ds='2000-04-10'). > The optimization is to rewrite the above query into the following. 
> {code} > explain extended select ds from pcr_t1 where (struct(ds)) IN > (struct('2000-04-08'), struct('2000-04-09')) and struct(ds, key) in > (struct('2000-04-08',1), struct('2000-04-09',2)); > {code} > The predicate (struct(ds)) IN (struct('2000-04-08'), struct('2000-04-09')) > is used by partition pruner to prune the columns which otherwise will not be > pruned. > This is an extension of the idea presented in HIVE-11573. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
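The essence of the rewrite above - projecting the struct IN-list onto the partition columns so the pruner gets a predicate it can evaluate on its own - can be sketched as follows (`derive_partition_predicate` is a hypothetical helper for illustration, not Hive's optimizer code):

```python
# Given the constants of struct(ds, key) IN (struct('2000-04-08', 1),
# struct('2000-04-09', 2)), derive a partition-only IN-list over ds.
def derive_partition_predicate(in_list, part_col_positions):
    """Project each struct constant onto its partition-column fields,
    deduplicating the resulting tuples."""
    return sorted({tuple(row[i] for i in part_col_positions) for row in in_list})

in_list = [("2000-04-08", 1), ("2000-04-09", 2)]
# ds is the struct's first field -> position 0
print(derive_partition_predicate(in_list, [0]))
# [('2000-04-08',), ('2000-04-09',)]
```

The derived predicate is conjoined with the original one (it is implied by it, so results are unchanged), which is what lets the pruner drop the ds='2000-04-10' partition.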
[jira] [Commented] (HIVE-11212) Create vectorized types for complex types
[ https://issues.apache.org/jira/browse/HIVE-11212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947747#comment-14947747 ] Matt McCline commented on HIVE-11212: - +1 lgtm. So much code drives through these basic objects that a successful Hive QA test run is ok in lieu of additional unit tests. > Create vectorized types for complex types > - > > Key: HIVE-11212 > URL: https://issues.apache.org/jira/browse/HIVE-11212 > Project: Hive > Issue Type: Sub-task >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: HIVE-11212.patch, HIVE-11212.patch, HIVE-11212.patch, > HIVE-11212.patch > > > We need vectorized types for structs, maps, lists, and unions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-12061) add file type support to file metadata by expr call
[ https://issues.apache.org/jira/browse/HIVE-12061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-12061: --- Assignee: Sergey Shelukhin > add file type support to file metadata by expr call > --- > > Key: HIVE-12061 > URL: https://issues.apache.org/jira/browse/HIVE-12061 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > > Expr filtering, automatic caching, etc. should be aware of file types for > advanced features. For now only ORC is supported, but I want to add boundary > for between ORC-specific and general metastore code, that could later be used > for other formats if needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11786) Deprecate the use of redundant column in colunm stats related tables
[ https://issues.apache.org/jira/browse/HIVE-11786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947732#comment-14947732 ] Siddharth Seth commented on HIVE-11786: --- No difference with the remaining indexes. (The index creation takes a long time btw - and may impact stat generation ?) {code} 2015-10-07T18:38:09,444 DEBUG [main([])]: metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:timingTrace(819)) - Direct SQL query in 16195.018669ms + 0.058186ms, the query is [select "COLUMN_NAME", "COLUMN_TYPE", min("LONG_LOW_VALUE"), max("LONG_HIGH_VALUE"), min("DOUBLE_LOW_VALUE"), max("DOUBLE_HIGH_VALUE"), min(cast("BIG_DECIMAL_LOW_VALUE" as decimal)), max(cast("BIG_DECIMAL_HIGH_VALUE" as decimal)), sum("NUM_NULLS"), max("NUM_DISTINCTS"), max("AVG_COL_LEN"), max("MAX_COL_LEN"), sum("NUM_TRUES"), sum("NUM_FALSES"), avg(("LONG_HIGH_VALUE"-"LONG_LOW_VALUE")/cast("NUM_DISTINCTS" as decimal)),avg(("DOUBLE_HIGH_VALUE"-"DOUBLE_LOW_VALUE")/"NUM_DISTINCTS"),avg((cast("BIG_DECIMAL_HIGH_VALUE" as decimal)-cast("BIG_DECIMAL_LOW_VALUE" as decimal))/"NUM_DISTINCTS"),sum("NUM_DISTINCTS") from (SELECT "DBS"."NAME" "DB_NAME", "TBLS"."TBL_NAME" "TABLE_NAME", "PARTITIONS"."PART_NAME" "PARTITION_NAME", "PCS"."COLUMN_NAME", "PCS"."COLUMN_TYPE", "PCS"."LONG_LOW_VALUE", "PCS"."LONG_HIGH_VALUE", "PCS"."DOUBLE_HIGH_VALUE", "PCS"."DOUBLE_LOW_VALUE", "PCS"."BIG_DECIMAL_LOW_VALUE", "PCS"."BIG_DECIMAL_HIGH_VALUE", "PCS"."NUM_NULLS", "PCS"."NUM_DISTINCTS", "PCS"."AVG_COL_LEN","PCS"."MAX_COL_LEN", "PCS"."NUM_TRUES", "PCS"."NUM_FALSES","PCS"."LAST_ANALYZED" FROM "PART_COL_STATS" "PCS" JOIN "PARTITIONS" ON ("PCS"."PART_ID" = "PARTITIONS"."PART_ID") JOIN "TBLS" ON ("PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID") JOIN "DBS" ON ("TBLS"."DB_ID" = "DBS"."DB_ID")) VW where "DB_NAME" = ? and "TABLE_NAME" = ? and "COLUMN_NAME" in (?) 
and "PARTITION_NAME" in (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,? {code} {code} 2015-10-07T18:38:29,309 DEBUG [main([])]: metastore.MetaStoreDirectSql (MetaStoreDirectSql.java:timingTrace(819)) - Direct SQL query in 18651.1996ms + 0.050665ms, the query is [select "COLUMN_NAME", "COLUMN_TYPE", min("LONG_LOW_VALUE"), max("LONG_HIGH_VALUE"), min("DOUBLE_LOW_VALUE"), max("DOUBLE_HIGH_VALUE"), min(cast("BIG_DECIMAL_LOW_VALUE" as decimal)), max(cast("BIG_DECIMAL_HIGH_VALUE" as decimal)), sum("NUM_NULLS"), max("NUM_DISTINCTS"), max("AVG_COL_LEN"), max("MAX_COL_LEN"), sum("NUM_TRUES"), sum("NUM_FALSES"), avg(("LONG_HIGH_VALUE"-"LONG_LOW_VALUE")/cast("NUM_DISTINCTS" as decimal)),avg(("DOUBLE_HIGH_VALUE"-"DOUBLE_LOW_VALUE")/"NUM_DISTINCTS"),avg((cast("BIG_DECIMAL_HIGH_VALUE" as decimal)-cast("BIG_DECIMAL_LOW_VALUE" as decimal))/"NUM_DISTINCTS"),sum("NUM_DISTINCTS") from (SELECT "DBS"."NAME" "DB_NAME", "TBLS"."TBL_NAME" "TABLE_NAME", "PARTITIONS"."PART_NAME" "PARTITION_NAME", "PCS"."COLUMN_NAME", "PCS"."COLUMN_TYPE", "PCS"."LONG_LOW_VALUE", "PCS"."LONG_HIGH_VALUE", "PCS"."DOUBLE_HIGH_VALUE", "PCS"."DOUBLE_LOW_VALUE", "PCS"."BIG_DECIMAL_LOW_VALUE", "PCS"."BIG_DECIMAL_HIGH_VALUE", "PCS"."NUM_NULLS", "PCS"."NUM_DISTINCTS", "PCS"."AVG_COL_LEN","PCS"."MAX_COL_LEN", "PCS"."NUM_TRUES", "PCS"."NUM_FALSES","PCS"."LAST_ANALYZED" FROM "PART_COL_STATS" "PCS" JOIN "PARTITIONS" ON ("PCS"."PART_ID" = "PARTITIONS"."PART_ID") JOIN "TBLS" ON ("PARTITIONS"."TBL_ID" = "TBLS"."TBL_ID") JOIN "DBS" ON ("TBLS"."DB_ID" = "DBS"."DB_ID")) VW where "DB_NAME" = ? 
and "TABLE_NAME" = ? and "COLUMN_NAME" in (?) and "PARTITION_NAME" in (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,?,? {code} > Deprecate the use of redundant column in colunm stats related tables > > > Key: HIVE-11786 > URL: https://issues.apache.org/jira/browse/HIVE-11786 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Fix For: 1.3.0, 2.0.0 > > Attachments: HIVE-11786.1.patch, HIVE-11786.1.patch, > HIVE-11786.2.patch, HIVE-11786.patch > > > The stats tables such as TAB_COL_STATS, PART_COL_STATS have redundant columns > such as DB_NAME, TABLE_NAME, PARTITION_NAME since these tables already have > foreign key like TBL_ID, or PART_ID referencing to TBLS or PARTITIONS. > These redundant columns violate
[jira] [Commented] (HIVE-12057) ORC sarg is logged too much
[ https://issues.apache.org/jira/browse/HIVE-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947734#comment-14947734 ] Prasanth Jayachandran commented on HIVE-12057: -- Actually, never mind. setSearchArgument is called during reader creation, i.e. for each split. That shouldn't be a big problem, I guess, as it happens after split generation and goes to task logs. A few log lines per task should be fine. > ORC sarg is logged too much > --- > > Key: HIVE-12057 > URL: https://issues.apache.org/jira/browse/HIVE-12057 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Minor > Attachments: HIVE-12057.01.patch, HIVE-12057.patch > > > SARG itself has too many newlines and it's logged for every splitgenerator in > split generation -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11212) Create vectorized types for complex types
[ https://issues.apache.org/jira/browse/HIVE-11212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-11212: - Attachment: HIVE-11212.patch Added the extra null check that Matthew asked for. > Create vectorized types for complex types > - > > Key: HIVE-11212 > URL: https://issues.apache.org/jira/browse/HIVE-11212 > Project: Hive > Issue Type: Sub-task >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: HIVE-11212.patch, HIVE-11212.patch, HIVE-11212.patch, > HIVE-11212.patch > > > We need vectorized types for structs, maps, lists, and unions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11894) CBO: Calcite Operator To Hive Operator (Calcite Return Path): correct table column name in CTAS queries
[ https://issues.apache.org/jira/browse/HIVE-11894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947728#comment-14947728 ] Hive QA commented on HIVE-11894: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12765271/HIVE-11894.04.patch {color:green}SUCCESS:{color} +1 due to 3 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9655 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5562/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5562/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5562/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12765271 - PreCommit-HIVE-TRUNK-Build > CBO: Calcite Operator To Hive Operator (Calcite Return Path): correct table > column name in CTAS queries > --- > > Key: HIVE-11894 > URL: https://issues.apache.org/jira/browse/HIVE-11894 > Project: Hive > Issue Type: Sub-task > Components: CBO >Reporter: Pengcheng Xiong >Assignee: Pengcheng Xiong > Attachments: HIVE-11894.01.patch, HIVE-11894.02.patch, > HIVE-11894.03.patch, HIVE-11894.04.patch > > > To repro, run lineage2.q with return path turned on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-8343) Return value from BlockingQueue.offer() is not checked in DynamicPartitionPruner
[ https://issues.apache.org/jira/browse/HIVE-8343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HIVE-8343: - Description: In addEvent() and processVertex(), there is call such as the following: {code} queue.offer(event); {code} The return value should be checked. If false is returned, event would not have been queued. Take a look at line 328 in: http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html was: In addEvent() and processVertex(), there is call such as the following: {code} queue.offer(event); {code} The return value should be checked. If false is returned, event would not have been queued. Take a look at line 328 in: http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html > Return value from BlockingQueue.offer() is not checked in > DynamicPartitionPruner > > > Key: HIVE-8343 > URL: https://issues.apache.org/jira/browse/HIVE-8343 > Project: Hive > Issue Type: Bug >Reporter: Ted Yu >Assignee: JongWon Park >Priority: Minor > Attachments: HIVE-8343.patch > > > In addEvent() and processVertex(), there is call such as the following: > {code} > queue.offer(event); > {code} > The return value should be checked. If false is returned, event would not > have been queued. > Take a look at line 328 in: > http://fuseyism.com/classpath/doc/java/util/concurrent/LinkedBlockingQueue-source.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
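The hazard described above can be shown with a small, self-contained sketch (the `addEvent` helper here is hypothetical, not the DynamicPartitionPruner code): on a bounded queue, `offer()` returns false instead of blocking when the queue is full, so an ignored return value means the event is silently dropped.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class OfferCheckDemo {
    // Returns true only if the event was actually enqueued. On a bounded
    // queue, offer() returns false rather than blocking when capacity is full.
    static boolean addEvent(BlockingQueue<String> queue, String event) {
        boolean queued = queue.offer(event);
        if (!queued) {
            // Real code should handle this: retry, block with put(), or fail loudly.
            System.err.println("Dropped event: " + event);
        }
        return queued;
    }

    public static void main(String[] args) {
        // Capacity of 1 makes the failure mode easy to trigger.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(1);
        System.out.println(addEvent(queue, "a")); // true
        System.out.println(addEvent(queue, "b")); // false: event "b" was dropped
    }
}
```

With an unbounded LinkedBlockingQueue the offer() call never returns false in practice, but checking the return value (or using put()) keeps the code correct if a capacity is ever introduced.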
[jira] [Updated] (HIVE-8285) Reference equality is used on boolean values in PartitionPruner#removeTruePredciates()
[ https://issues.apache.org/jira/browse/HIVE-8285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HIVE-8285: - Description: {code} if (e.getTypeInfo() == TypeInfoFactory.booleanTypeInfo && eC.getValue() == Boolean.TRUE) { {code} equals() should be used in the above comparison. was: {code} if (e.getTypeInfo() == TypeInfoFactory.booleanTypeInfo && eC.getValue() == Boolean.TRUE) { {code} equals() should be used in the above comparison. > Reference equality is used on boolean values in > PartitionPruner#removeTruePredciates() > -- > > Key: HIVE-8285 > URL: https://issues.apache.org/jira/browse/HIVE-8285 > Project: Hive > Issue Type: Bug >Affects Versions: 0.14.0 >Reporter: Ted Yu >Priority: Minor > Attachments: HIVE-8285.patch > > > {code} > if (e.getTypeInfo() == TypeInfoFactory.booleanTypeInfo > && eC.getValue() == Boolean.TRUE) { > {code} > equals() should be used in the above comparison. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
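Why `==` fails in the snippet above can be demonstrated in isolation (a minimal sketch; `referenceEqual` and `valueEqual` are illustrative names): a Boolean that did not come from the `Boolean.TRUE`/`Boolean.FALSE` cache is a distinct object, so reference comparison can report false for a logically true value.

```java
public class BooleanEqualityDemo {
    // Reference comparison: true only if v is the Boolean.TRUE singleton.
    static boolean referenceEqual(Object v) {
        return v == Boolean.TRUE;
    }

    // Value comparison: true for any Boolean whose value is true.
    static boolean valueEqual(Object v) {
        return Boolean.TRUE.equals(v);
    }

    public static void main(String[] args) {
        // Deliberately bypass the cached instance (e.g. a value produced by
        // deserialization can be a fresh object like this).
        Object fresh = new Boolean(true);

        System.out.println(referenceEqual(fresh)); // false: different references
        System.out.println(valueEqual(fresh));     // true: compares values
    }
}
```

This is exactly why the patch replaces `eC.getValue() == Boolean.TRUE` with an `equals()` call.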
[jira] [Updated] (HIVE-8282) Potential null deference in ConvertJoinMapJoin#convertJoinBucketMapJoin()
[ https://issues.apache.org/jira/browse/HIVE-8282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HIVE-8282: - Description: In convertJoinMapJoin(): {code} for (Operator parentOp : joinOp.getParentOperators()) { if (parentOp instanceof MuxOperator) { return null; } } {code} NPE would result if convertJoinMapJoin() returns null: {code} MapJoinOperator mapJoinOp = convertJoinMapJoin(joinOp, context, bigTablePosition); MapJoinDesc joinDesc = mapJoinOp.getConf(); {code} was: In convertJoinMapJoin(): {code} for (Operator parentOp : joinOp.getParentOperators()) { if (parentOp instanceof MuxOperator) { return null; } } {code} NPE would result if convertJoinMapJoin() returns null: {code} MapJoinOperator mapJoinOp = convertJoinMapJoin(joinOp, context, bigTablePosition); MapJoinDesc joinDesc = mapJoinOp.getConf(); {code} > Potential null deference in ConvertJoinMapJoin#convertJoinBucketMapJoin() > - > > Key: HIVE-8282 > URL: https://issues.apache.org/jira/browse/HIVE-8282 > Project: Hive > Issue Type: Bug >Affects Versions: 0.14.0 >Reporter: Ted Yu >Priority: Minor > Attachments: HIVE-8282.patch > > > In convertJoinMapJoin(): > {code} > for (Operator parentOp : > joinOp.getParentOperators()) { > if (parentOp instanceof MuxOperator) { > return null; > } > } > {code} > NPE would result if convertJoinMapJoin() returns null: > {code} > MapJoinOperator mapJoinOp = convertJoinMapJoin(joinOp, context, > bigTablePosition); > MapJoinDesc joinDesc = mapJoinOp.getConf(); > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
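The shape of the NPE and of the guard the patch adds can be sketched with stand-in types (hypothetical String-based stand-ins for the real operator classes, which aren't needed to show the control flow):

```java
public class NullGuardDemo {
    // Stand-in for ConvertJoinMapJoin#convertJoinMapJoin, which may return
    // null (e.g. when a parent operator is a MuxOperator).
    static String convertJoinMapJoin(boolean hasMuxParent) {
        if (hasMuxParent) {
            return null;
        }
        return "mapJoinOp";
    }

    // Stand-in for convertJoinBucketMapJoin: must guard the null return
    // before dereferencing the result.
    static String convertJoinBucketMapJoin(boolean hasMuxParent) {
        String mapJoinOp = convertJoinMapJoin(hasMuxParent);
        if (mapJoinOp == null) {
            // Without this guard, the dereference below would throw NPE.
            return null;
        }
        return mapJoinOp + ".conf"; // stands in for mapJoinOp.getConf()
    }

    public static void main(String[] args) {
        System.out.println(convertJoinBucketMapJoin(false)); // mapJoinOp.conf
        System.out.println(convertJoinBucketMapJoin(true));  // null, no NPE
    }
}
```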
[jira] [Updated] (HIVE-8458) Potential null dereference in Utilities#clearWork()
[ https://issues.apache.org/jira/browse/HIVE-8458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HIVE-8458: - Description: {code} Path mapPath = getPlanPath(conf, MAP_PLAN_NAME); Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME); // if the plan path hasn't been initialized just return, nothing to clean. if (mapPath == null && reducePath == null) { return; } try { FileSystem fs = mapPath.getFileSystem(conf); {code} If mapPath is null but reducePath is not null, getFileSystem() call would produce NPE was: {code} Path mapPath = getPlanPath(conf, MAP_PLAN_NAME); Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME); // if the plan path hasn't been initialized just return, nothing to clean. if (mapPath == null && reducePath == null) { return; } try { FileSystem fs = mapPath.getFileSystem(conf); {code} If mapPath is null but reducePath is not null, getFileSystem() call would produce NPE > Potential null dereference in Utilities#clearWork() > --- > > Key: HIVE-8458 > URL: https://issues.apache.org/jira/browse/HIVE-8458 > Project: Hive > Issue Type: Bug >Affects Versions: 0.13.1 >Reporter: Ted Yu >Assignee: skrho >Priority: Minor > Attachments: HIVE-8458_001.patch > > > {code} > Path mapPath = getPlanPath(conf, MAP_PLAN_NAME); > Path reducePath = getPlanPath(conf, REDUCE_PLAN_NAME); > // if the plan path hasn't been initialized just return, nothing to clean. > if (mapPath == null && reducePath == null) { > return; > } > try { > FileSystem fs = mapPath.getFileSystem(conf); > {code} > If mapPath is null but reducePath is not null, getFileSystem() call would > produce NPE -- This message was sent by Atlassian JIRA (v6.3.4#6332)
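The fix pattern is to pick whichever path is non-null before dereferencing, since the early return only covers the case where both are null. A hedged sketch (Strings stand in for Path/FileSystem; `clearWork` here is a simplified illustration, not the real Utilities method):

```java
public class ClearWorkDemo {
    static String clearWork(String mapPath, String reducePath) {
        // If the plan path hasn't been initialized, nothing to clean.
        if (mapPath == null && reducePath == null) {
            return "nothing to clean";
        }
        // Fixed: dereference a path that is guaranteed non-null, instead of
        // always dereferencing mapPath (which may be null here).
        String anyPath = (mapPath != null) ? mapPath : reducePath;
        return "fs of " + anyPath; // stands in for anyPath.getFileSystem(conf)
    }

    public static void main(String[] args) {
        System.out.println(clearWork(null, null));       // nothing to clean
        System.out.println(clearWork(null, "reduce.xml")); // no NPE: uses reducePath
    }
}
```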
[jira] [Commented] (HIVE-12057) ORC sarg is logged too much
[ https://issues.apache.org/jira/browse/HIVE-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947709#comment-14947709 ] Sergey Shelukhin commented on HIVE-12057: - Are you sure setSearchArgument is called for splits? I don't think split generation ever creates the actual record reader, it only creates ReaderImpl for metadata. As for caching sure, will make a v2 > ORC sarg is logged too much > --- > > Key: HIVE-12057 > URL: https://issues.apache.org/jira/browse/HIVE-12057 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Minor > Attachments: HIVE-12057.01.patch, HIVE-12057.patch > > > SARG itself has too many newlines and it's logged for every splitgenerator in > split generation -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12025) refactor bucketId generating code
[ https://issues.apache.org/jira/browse/HIVE-12025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947706#comment-14947706 ] Prasanth Jayachandran commented on HIVE-12025: -- The changes introduced in this patch in BucketIdResolverImpl are the correct way to compute the bucket number. ReduceSinkOperator had a bug in bucket number computation for negative hash codes (multiplying by -1 vs. masking with Integer.MAX_VALUE). There might be some test failures related to this change, but that is the expected change. Since these are util methods, it would be good to have unit tests for them (if they do not already exist). Other than that, lgtm +1. Pending tests. > refactor bucketId generating code > - > > Key: HIVE-12025 > URL: https://issues.apache.org/jira/browse/HIVE-12025 > Project: Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 1.0.1 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-12025.2.patch, HIVE-12025.patch > > > HIVE-11983 adds ObjectInspectorUtils.getBucketHashCode() and > getBucketNumber(). > There are several (at least) places in Hive that perform this computation: > # ReduceSinkOperator.computeBucketNumber > # ReduceSinkOperator.computeHashCode > # BucketIdResolverImpl - only in 2.0.0 ASF line > # FileSinkOperator.findWriterOffset > # GenericUDFHash > Should refactor it and make sure they all call methods from > ObjectInspectorUtils. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
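The negative-hashcode bug mentioned in the comment above can be sketched as follows (a hedged illustration, not Hive's actual code; `bucketByNegation` and `bucketByMask` are hypothetical names). Flipping the sign fails on `Integer.MIN_VALUE`, because `-Integer.MIN_VALUE` overflows back to `Integer.MIN_VALUE`; masking with `Integer.MAX_VALUE` clears the sign bit and always yields a bucket in range.

```java
public class BucketIdDemo {
    // Buggy variant: negate a negative hash code. Integer.MIN_VALUE has no
    // positive counterpart in two's complement, so -Integer.MIN_VALUE
    // overflows to Integer.MIN_VALUE and the modulo can still be negative.
    static int bucketByNegation(int hashCode, int numBuckets) {
        int h = hashCode < 0 ? -hashCode : hashCode;
        return h % numBuckets;
    }

    // Correct variant: masking with Integer.MAX_VALUE clears the sign bit,
    // so the result is always in [0, numBuckets).
    static int bucketByMask(int hashCode, int numBuckets) {
        return (hashCode & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        System.out.println(bucketByNegation(Integer.MIN_VALUE, 31)); // negative!
        System.out.println(bucketByMask(Integer.MIN_VALUE, 31));     // 0, in range
    }
}
```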
[jira] [Assigned] (HIVE-12052) automatically populate file metadata to HBase metastore based on config or table properties
[ https://issues.apache.org/jira/browse/HIVE-12052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-12052: --- Assignee: Sergey Shelukhin > automatically populate file metadata to HBase metastore based on config or > table properties > --- > > Key: HIVE-12052 > URL: https://issues.apache.org/jira/browse/HIVE-12052 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > > As discussed in HIVE-11500 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12057) ORC sarg is logged too much
[ https://issues.apache.org/jira/browse/HIVE-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947702#comment-14947702 ] Prasanth Jayachandran commented on HIVE-12057: -- This patch removes SARG logging from split generation. But there are going to be log lines for every split during reader creation (OrcInputFormat.setSearchArgument()). Also, regarding the log line immediately after HiveConf object creation: can we create a SARG once and cache it in the Context? We can avoid multiple SARG creations that way. Ideally, we should have the SARG object created once and logged once per query. > ORC sarg is logged too much > --- > > Key: HIVE-12057 > URL: https://issues.apache.org/jira/browse/HIVE-12057 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Minor > Attachments: HIVE-12057.01.patch, HIVE-12057.patch > > > SARG itself has too many newlines and it's logged for every splitgenerator in > split generation -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12057) ORC sarg is logged too much
[ https://issues.apache.org/jira/browse/HIVE-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-12057: Attachment: HIVE-12057.01.patch > ORC sarg is logged too much > --- > > Key: HIVE-12057 > URL: https://issues.apache.org/jira/browse/HIVE-12057 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Minor > Attachments: HIVE-12057.01.patch, HIVE-12057.patch > > > SARG itself has too many newlines and it's logged for every splitgenerator in > split generation -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12060) LLAP: create separate variable for llap tests
[ https://issues.apache.org/jira/browse/HIVE-12060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-12060: Attachment: HIVE-12060.01.patch Not sure if HiveQA will pick up new variable.. will try that > LLAP: create separate variable for llap tests > - > > Key: HIVE-12060 > URL: https://issues.apache.org/jira/browse/HIVE-12060 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin > Attachments: HIVE-12060.01.patch > > > No real reason to just reuse tez one -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-12060) LLAP: create separate variable for llap tests
[ https://issues.apache.org/jira/browse/HIVE-12060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-12060: --- Assignee: Sergey Shelukhin > LLAP: create separate variable for llap tests > - > > Key: HIVE-12060 > URL: https://issues.apache.org/jira/browse/HIVE-12060 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-12060.01.patch > > > No real reason to just reuse tez one -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11969) start Tez session in background when starting CLI
[ https://issues.apache.org/jira/browse/HIVE-11969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11969: Fix Version/s: 1.3.0 > start Tez session in background when starting CLI > - > > Key: HIVE-11969 > URL: https://issues.apache.org/jira/browse/HIVE-11969 > Project: Hive > Issue Type: Bug > Components: Tez >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 1.3.0, 2.0.0 > > Attachments: HIVE-11969.01.patch, HIVE-11969.02.patch, > HIVE-11969.03.patch, HIVE-11969.04.patch, HIVE-11969.patch, Screen Shot > 2015-10-02 at 14.23.17 .png > > > Tez session spins up AM, which can cause delays, esp. if the cluster is very > busy. > This can be done in background, so the AM might get started while the user is > running local commands and doing other things. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11408) HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used due to constructor caching in Hadoop ReflectionUtils
[ https://issues.apache.org/jira/browse/HIVE-11408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947610#comment-14947610 ] Thejas M Nair commented on HIVE-11408: -- +1 > HiveServer2 is leaking ClassLoaders when add jar / temporary functions are > used due to constructor caching in Hadoop ReflectionUtils > > > Key: HIVE-11408 > URL: https://issues.apache.org/jira/browse/HIVE-11408 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.1.1 >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta > Attachments: HIVE-11408.1.patch, HIVE-11408.2.patch > > > I'm able to reproduce with 0.14. I'm yet to see if HIVE-10453 fixes the issue > (since it's on top of a larger patch: HIVE-2573 that was added in 1.2). > Basically, add jar creates a new classloader for loading the classes from the > new jar and adds the new classloader to the SessionState object of user's > session, making the older one its parent. Creating a temporary function uses > the new classloader to load the class used for the function. On closing a > session, although there is code to close the classloader for the session, I'm > not seeing the new classloader getting GCed and from the heapdump I can see > it holds on to the temporary function's class that should have gone away > after the session close. > Steps to reproduce: > 1. > {code} > jdbc:hive2://localhost:1/> add jar hdfs:///tmp/audf.jar; > {code} > 2. > Use a profiler (I'm using yourkit) to verify that a new URLClassLoader was > added. > 3. > {code} > jdbc:hive2://localhost:1/> CREATE TEMPORARY FUNCTION funcA AS > 'org.gumashta.udf.AUDF'; > {code} > 4. > Close the jdbc session. > 5. > Take the memory snapshot and verify that the new URLClassLoader is indeed > there and is holding onto the class it loaded (org.gumashta.udf.AUDF) for the > session which we already closed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11408) HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used due to constructor caching in Hadoop ReflectionUtils
[ https://issues.apache.org/jira/browse/HIVE-11408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947611#comment-14947611 ] Thejas M Nair commented on HIVE-11408: -- Thanks for adding the test case! > HiveServer2 is leaking ClassLoaders when add jar / temporary functions are > used due to constructor caching in Hadoop ReflectionUtils > > > Key: HIVE-11408 > URL: https://issues.apache.org/jira/browse/HIVE-11408 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.1.1 >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta > Attachments: HIVE-11408.1.patch, HIVE-11408.2.patch > > > I'm able to reproduce with 0.14. I'm yet to see if HIVE-10453 fixes the issue > (since it's on top of a larger patch: HIVE-2573 that was added in 1.2). > Basically, add jar creates a new classloader for loading the classes from the > new jar and adds the new classloader to the SessionState object of user's > session, making the older one its parent. Creating a temporary function uses > the new classloader to load the class used for the function. On closing a > session, although there is code to close the classloader for the session, I'm > not seeing the new classloader getting GCed and from the heapdump I can see > it holds on to the temporary function's class that should have gone away > after the session close. > Steps to reproduce: > 1. > {code} > jdbc:hive2://localhost:1/> add jar hdfs:///tmp/audf.jar; > {code} > 2. > Use a profiler (I'm using yourkit) to verify that a new URLClassLoader was > added. > 3. > {code} > jdbc:hive2://localhost:1/> CREATE TEMPORARY FUNCTION funcA AS > 'org.gumashta.udf.AUDF'; > {code} > 4. > Close the jdbc session. > 5. > Take the memory snapshot and verify that the new URLClassLoader is indeed > there and is holding onto the class it loaded (org.gumashta.udf.AUDF) for the > session which we already closed. 
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11212) Create vectorized types for complex types
[ https://issues.apache.org/jira/browse/HIVE-11212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947600#comment-14947600 ] Matt McCline commented on HIVE-11212: - Noticed the new ensureSize method -- growing a batch beyond VectorizedRowBatch.DEFAULT_SIZE -- to support the new ListColumnVector's storing a range of elements. The current vectorized operators, which only support primitive types, do have some hard-coded assumptions where they allocate various arrays (usually copies of the selected array) as being no more than DEFAULT_SIZE. This doesn't affect this patch, but we'll need to be wary when we later try to make the vectorized operators support complex types. I think DecimalColumnVector.setElement needs to check if the return result from ...getHiveDecimal(precision, scale) is null and mark the vector column entry as null. I think in some cases that method returns null when the value doesn't fit, etc. I'm still trying to grok the flatten/unflatten stuff... > Create vectorized types for complex types > - > > Key: HIVE-11212 > URL: https://issues.apache.org/jira/browse/HIVE-11212 > Project: Hive > Issue Type: Sub-task >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Attachments: HIVE-11212.patch, HIVE-11212.patch, HIVE-11212.patch > > > We need vectorized types for structs, maps, lists, and unions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
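The null check Matt suggests for setElement can be sketched in miniature (a hypothetical simplification: `Long` stands in for the decimal conversion result, and the `isNull`/`vector` arrays mimic a column vector's layout; this is not the real DecimalColumnVector code):

```java
public class NullAwareSetDemo {
    static final int SIZE = 4;
    static boolean[] isNull = new boolean[SIZE];
    static long[] vector = new long[SIZE];

    // If the converted value is null (e.g. the source value did not fit at
    // the target precision/scale), mark the entry null instead of
    // dereferencing the null result.
    static void setElement(int i, Long converted) {
        if (converted == null) {
            isNull[i] = true;
            return;
        }
        vector[i] = converted;
        isNull[i] = false;
    }

    public static void main(String[] args) {
        setElement(0, null);  // value didn't fit: entry 0 marked null
        setElement(1, 123L);  // normal case
        System.out.println(isNull[0] + " " + isNull[1] + " " + vector[1]);
    }
}
```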
[jira] [Commented] (HIVE-11786) Deprecate the use of redundant column in colunm stats related tables
[ https://issues.apache.org/jira/browse/HIVE-11786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947594#comment-14947594 ] Chaoyu Tang commented on HIVE-11786: Thanks [~sseth] very much for the help. Yes, the remaining two have been covered by the composite index. But anyway, please give it a try and let me know. If that does not help, I will rewrite the query as [~sershe] suggested, which is also being used in directsql for getPartition. > Deprecate the use of redundant column in colunm stats related tables > > > Key: HIVE-11786 > URL: https://issues.apache.org/jira/browse/HIVE-11786 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Fix For: 1.3.0, 2.0.0 > > Attachments: HIVE-11786.1.patch, HIVE-11786.1.patch, > HIVE-11786.2.patch, HIVE-11786.patch > > > The stats tables such as TAB_COL_STATS, PART_COL_STATS have redundant columns > such as DB_NAME, TABLE_NAME, PARTITION_NAME since these tables already have > foreign key like TBL_ID, or PART_ID referencing to TBLS or PARTITIONS. > These redundant columns violate database normalization rules and cause a lot > of inconvenience (sometimes difficult) in column stats related feature > implementation. For example, when renaming a table, we have to update > TABLE_NAME column in these tables as well which is unnecessary. > This JIRA is first to deprecate the use of these columns at HMS code level. A > followed JIRA is to be opened to focus on DB schema change and upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12056) Branch 1.1.1: root pom and itest pom are not linked
[ https://issues.apache.org/jira/browse/HIVE-12056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-12056: Attachment: HIVE-12056.1.patch > Branch 1.1.1: root pom and itest pom are not linked > --- > > Key: HIVE-12056 > URL: https://issues.apache.org/jira/browse/HIVE-12056 > Project: Hive > Issue Type: Bug > Components: Testing Infrastructure >Affects Versions: 1.1.1 >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta > Attachments: HIVE-12056.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12056) Branch 1.1.1: root pom and itest pom are not linked
[ https://issues.apache.org/jira/browse/HIVE-12056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947592#comment-14947592 ] Thejas M Nair commented on HIVE-12056: -- +1 > Branch 1.1.1: root pom and itest pom are not linked > --- > > Key: HIVE-12056 > URL: https://issues.apache.org/jira/browse/HIVE-12056 > Project: Hive > Issue Type: Bug > Components: Testing Infrastructure >Affects Versions: 1.1.1 >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta > Attachments: HIVE-12056.1.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12056) Branch 1.1.1: root pom and itest pom are not linked
[ https://issues.apache.org/jira/browse/HIVE-12056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-12056: Attachment: HIVE-12056.1.1patch > Branch 1.1.1: root pom and itest pom are not linked > --- > > Key: HIVE-12056 > URL: https://issues.apache.org/jira/browse/HIVE-12056 > Project: Hive > Issue Type: Bug > Components: Testing Infrastructure >Affects Versions: 1.1.1 >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12056) Branch 1.1.1: root pom and itest pom are not linked
[ https://issues.apache.org/jira/browse/HIVE-12056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-12056: Attachment: (was: HIVE-12056.1.1patch) > Branch 1.1.1: root pom and itest pom are not linked > --- > > Key: HIVE-12056 > URL: https://issues.apache.org/jira/browse/HIVE-12056 > Project: Hive > Issue Type: Bug > Components: Testing Infrastructure >Affects Versions: 1.1.1 >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11969) start Tez session in background when starting CLI
[ https://issues.apache.org/jira/browse/HIVE-11969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947578#comment-14947578 ] Sergey Shelukhin commented on HIVE-11969: - It's a simple backport to branch-1; should we do that? > start Tez session in background when starting CLI > - > > Key: HIVE-11969 > URL: https://issues.apache.org/jira/browse/HIVE-11969 > Project: Hive > Issue Type: Bug > Components: Tez >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Fix For: 2.0.0 > > Attachments: HIVE-11969.01.patch, HIVE-11969.02.patch, > HIVE-11969.03.patch, HIVE-11969.04.patch, HIVE-11969.patch, Screen Shot > 2015-10-02 at 14.23.17 .png > > > Tez session spins up AM, which can cause delays, esp. if the cluster is very > busy. > This can be done in background, so the AM might get started while the user is > running local commands and doing other things. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12059) Clean up reference to deprecated constants in AvroSerdeUtils
[ https://issues.apache.org/jira/browse/HIVE-12059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947575#comment-14947575 ] Aaron Dossett commented on HIVE-12059: -- My patch gets all deprecated references out EXCEPT the SerDeSpec annotation in AvroSerDe. I don't have any experience developing annotations, so the fix for that isn't obvious to me. One approach would be to add some redundant Strings to AvroSerdeUtils, with a level of access below public, that AvroSerDe could use. Open to other suggestions if this is important enough. > Clean up reference to deprecated constants in AvroSerdeUtils > > > Key: HIVE-12059 > URL: https://issues.apache.org/jira/browse/HIVE-12059 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Reporter: Aaron Dossett >Assignee: Aaron Dossett >Priority: Minor > Attachments: HIVE-12059.patch > > > AvroSerdeUtils contains several deprecated String constants that are used by > other Hive modules. Those should be cleaned up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11408) HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used due to constructor caching in Hadoop ReflectionUtils
[ https://issues.apache.org/jira/browse/HIVE-11408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-11408: Attachment: HIVE-11408.2.patch > HiveServer2 is leaking ClassLoaders when add jar / temporary functions are > used due to constructor caching in Hadoop ReflectionUtils > > > Key: HIVE-11408 > URL: https://issues.apache.org/jira/browse/HIVE-11408 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.1.1 >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta > Attachments: HIVE-11408.1.patch, HIVE-11408.2.patch > > > I'm able to reproduce with 0.14. I'm yet to see if HIVE-10453 fixes the issue > (since it's on top of a larger patch: HIVE-2573 that was added in 1.2). > Basically, add jar creates a new classloader for loading the classes from the > new jar and adds the new classloader to the SessionState object of user's > session, making the older one its parent. Creating a temporary function uses > the new classloader to load the class used for the function. On closing a > session, although there is code to close the classloader for the session, I'm > not seeing the new classloader getting GCed and from the heapdump I can see > it holds on to the temporary function's class that should have gone away > after the session close. > Steps to reproduce: > 1. > {code} > jdbc:hive2://localhost:1/> add jar hdfs:///tmp/audf.jar; > {code} > 2. > Use a profiler (I'm using yourkit) to verify that a new URLClassLoader was > added. > 3. > {code} > jdbc:hive2://localhost:1/> CREATE TEMPORARY FUNCTION funcA AS > 'org.gumashta.udf.AUDF'; > {code} > 4. > Close the jdbc session. > 5. > Take the memory snapshot and verify that the new URLClassLoader is indeed > there and is holding onto the class it loaded (org.gumashta.udf.AUDF) for the > session which we already closed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11408) HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used due to constructor caching in Hadoop ReflectionUtils
[ https://issues.apache.org/jira/browse/HIVE-11408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-11408: Attachment: (was: HIVE-11408.2.patch) > HiveServer2 is leaking ClassLoaders when add jar / temporary functions are > used due to constructor caching in Hadoop ReflectionUtils > > > Key: HIVE-11408 > URL: https://issues.apache.org/jira/browse/HIVE-11408 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.1.1 >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta > Attachments: HIVE-11408.1.patch, HIVE-11408.2.patch > > > I'm able to reproduce with 0.14. I'm yet to see if HIVE-10453 fixes the issue > (since it's on top of a larger patch: HIVE-2573 that was added in 1.2). > Basically, add jar creates a new classloader for loading the classes from the > new jar and adds the new classloader to the SessionState object of user's > session, making the older one its parent. Creating a temporary function uses > the new classloader to load the class used for the function. On closing a > session, although there is code to close the classloader for the session, I'm > not seeing the new classloader getting GCed and from the heapdump I can see > it holds on to the temporary function's class that should have gone away > after the session close. > Steps to reproduce: > 1. > {code} > jdbc:hive2://localhost:1/> add jar hdfs:///tmp/audf.jar; > {code} > 2. > Use a profiler (I'm using yourkit) to verify that a new URLClassLoader was > added. > 3. > {code} > jdbc:hive2://localhost:1/> CREATE TEMPORARY FUNCTION funcA AS > 'org.gumashta.udf.AUDF'; > {code} > 4. > Close the jdbc session. > 5. > Take the memory snapshot and verify that the new URLClassLoader is indeed > there and is holding onto the class it loaded (org.gumashta.udf.AUDF) for the > session which we already closed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11408) HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used due to constructor caching in Hadoop ReflectionUtils
[ https://issues.apache.org/jira/browse/HIVE-11408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947569#comment-14947569 ] Vaibhav Gumashta commented on HIVE-11408: - [~thejas] Attached patch is based on branch 1.1.1. > HiveServer2 is leaking ClassLoaders when add jar / temporary functions are > used due to constructor caching in Hadoop ReflectionUtils > > > Key: HIVE-11408 > URL: https://issues.apache.org/jira/browse/HIVE-11408 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.1.1 >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta > Attachments: HIVE-11408.1.patch, HIVE-11408.2.patch > > > I'm able to reproduce with 0.14. I'm yet to see if HIVE-10453 fixes the issue > (since it's on top of a larger patch: HIVE-2573 that was added in 1.2). > Basically, add jar creates a new classloader for loading the classes from the > new jar and adds the new classloader to the SessionState object of user's > session, making the older one its parent. Creating a temporary function uses > the new classloader to load the class used for the function. On closing a > session, although there is code to close the classloader for the session, I'm > not seeing the new classloader getting GCed and from the heapdump I can see > it holds on to the temporary function's class that should have gone away > after the session close. > Steps to reproduce: > 1. > {code} > jdbc:hive2://localhost:1/> add jar hdfs:///tmp/audf.jar; > {code} > 2. > Use a profiler (I'm using yourkit) to verify that a new URLClassLoader was > added. > 3. > {code} > jdbc:hive2://localhost:1/> CREATE TEMPORARY FUNCTION funcA AS > 'org.gumashta.udf.AUDF'; > {code} > 4. > Close the jdbc session. > 5. > Take the memory snapshot and verify that the new URLClassLoader is indeed > there and is holding onto the class it loaded (org.gumashta.udf.AUDF) for the > session which we already closed. 
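The leak mechanism described in HIVE-11408 follows a common pattern: Hadoop's ReflectionUtils keeps a static, never-evicted cache keyed by Class, and every cached Class object strongly references the ClassLoader that loaded it. The following is a minimal, self-contained sketch of that pattern, not Hadoop's actual code; all names here are illustrative.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative model of the leak: a static cache keyed by Class. Each cached
// Class holds a strong reference to its ClassLoader, so a session-scoped
// URLClassLoader whose classes land in this cache can never be GCed, even
// after the session (and the loader) has been closed.
public class ConstructorCacheLeak {
    // Analogous in shape to the constructor cache in Hadoop's ReflectionUtils.
    static final Map<Class<?>, String> CACHE = new ConcurrentHashMap<>();

    static void register(Class<?> clazz) {
        // After this call, clazz.getClassLoader() is reachable via the cache key.
        CACHE.put(clazz, clazz.getName());
    }
}
```

The usual fix directions are clearing such a cache when a session's classloader is closed, or keying it with weak references so cached entries do not pin the loader.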
[jira] [Updated] (HIVE-11642) LLAP: make sure tests pass #3
[ https://issues.apache.org/jira/browse/HIVE-11642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11642: Attachment: HIVE-11642.22.patch > LLAP: make sure tests pass #3 > - > > Key: HIVE-11642 > URL: https://issues.apache.org/jira/browse/HIVE-11642 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-11642.01.patch, HIVE-11642.02.patch, > HIVE-11642.03.patch, HIVE-11642.04.patch, HIVE-11642.05.patch, > HIVE-11642.22.patch, HIVE-11642.patch > > > Tests should pass against the most recent branch and Tez 0.8. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12059) Clean up reference to deprecated constants in AvroSerdeUtils
[ https://issues.apache.org/jira/browse/HIVE-12059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Dossett updated HIVE-12059: - Attachment: HIVE-12059.patch > Clean up reference to deprecated constants in AvroSerdeUtils > > > Key: HIVE-12059 > URL: https://issues.apache.org/jira/browse/HIVE-12059 > Project: Hive > Issue Type: Improvement > Components: Serializers/Deserializers >Reporter: Aaron Dossett >Priority: Minor > Attachments: HIVE-12059.patch > > > AvroSerdeUtils contains several deprecated String constants that are used by > other Hive modules. Those should be cleaned up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11642) LLAP: make sure tests pass #3
[ https://issues.apache.org/jira/browse/HIVE-11642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11642: Attachment: (was: HIVE-11642.22.patch) > LLAP: make sure tests pass #3 > - > > Key: HIVE-11642 > URL: https://issues.apache.org/jira/browse/HIVE-11642 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-11642.01.patch, HIVE-11642.02.patch, > HIVE-11642.03.patch, HIVE-11642.04.patch, HIVE-11642.05.patch, > HIVE-11642.22.patch, HIVE-11642.patch > > > Tests should pass against the most recent branch and Tez 0.8. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11642) LLAP: make sure tests pass #3
[ https://issues.apache.org/jira/browse/HIVE-11642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11642: Attachment: (was: HIVE-11642.21.patch) > LLAP: make sure tests pass #3 > - > > Key: HIVE-11642 > URL: https://issues.apache.org/jira/browse/HIVE-11642 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-11642.01.patch, HIVE-11642.02.patch, > HIVE-11642.03.patch, HIVE-11642.04.patch, HIVE-11642.05.patch, > HIVE-11642.22.patch, HIVE-11642.patch > > > Tests should pass against the most recent branch and Tez 0.8. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11408) HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used due to constructor caching in Hadoop ReflectionUtils
[ https://issues.apache.org/jira/browse/HIVE-11408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-11408: Attachment: HIVE-11408.2.patch > HiveServer2 is leaking ClassLoaders when add jar / temporary functions are > used due to constructor caching in Hadoop ReflectionUtils > > > Key: HIVE-11408 > URL: https://issues.apache.org/jira/browse/HIVE-11408 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.1.1 >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta > Attachments: HIVE-11408.1.patch, HIVE-11408.2.patch > > > I'm able to reproduce with 0.14. I'm yet to see if HIVE-10453 fixes the issue > (since it's on top of a larger patch: HIVE-2573 that was added in 1.2). > Basically, add jar creates a new classloader for loading the classes from the > new jar and adds the new classloader to the SessionState object of user's > session, making the older one its parent. Creating a temporary function uses > the new classloader to load the class used for the function. On closing a > session, although there is code to close the classloader for the session, I'm > not seeing the new classloader getting GCed and from the heapdump I can see > it holds on to the temporary function's class that should have gone away > after the session close. > Steps to reproduce: > 1. > {code} > jdbc:hive2://localhost:1/> add jar hdfs:///tmp/audf.jar; > {code} > 2. > Use a profiler (I'm using yourkit) to verify that a new URLClassLoader was > added. > 3. > {code} > jdbc:hive2://localhost:1/> CREATE TEMPORARY FUNCTION funcA AS > 'org.gumashta.udf.AUDF'; > {code} > 4. > Close the jdbc session. > 5. > Take the memory snapshot and verify that the new URLClassLoader is indeed > there and is holding onto the class it loaded (org.gumashta.udf.AUDF) for the > session which we already closed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11642) LLAP: make sure tests pass #3
[ https://issues.apache.org/jira/browse/HIVE-11642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11642: Attachment: (was: HIVE-11642.20.patch) > LLAP: make sure tests pass #3 > - > > Key: HIVE-11642 > URL: https://issues.apache.org/jira/browse/HIVE-11642 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-11642.01.patch, HIVE-11642.02.patch, > HIVE-11642.03.patch, HIVE-11642.04.patch, HIVE-11642.05.patch, > HIVE-11642.22.patch, HIVE-11642.patch > > > Tests should pass against the most recent branch and Tez 0.8. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11408) HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used due to constructor caching in Hadoop ReflectionUtils
[ https://issues.apache.org/jira/browse/HIVE-11408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vaibhav Gumashta updated HIVE-11408: Summary: HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used due to constructor caching in Hadoop ReflectionUtils (was: HiveServer2 is leaking ClassLoaders when add jar / temporary functions are used) > HiveServer2 is leaking ClassLoaders when add jar / temporary functions are > used due to constructor caching in Hadoop ReflectionUtils > > > Key: HIVE-11408 > URL: https://issues.apache.org/jira/browse/HIVE-11408 > Project: Hive > Issue Type: Bug > Components: HiveServer2 >Affects Versions: 0.13.0, 0.14.0, 0.13.1, 1.0.0, 1.1.1 >Reporter: Vaibhav Gumashta >Assignee: Vaibhav Gumashta > Attachments: HIVE-11408.1.patch > > > I'm able to reproduce with 0.14. I'm yet to see if HIVE-10453 fixes the issue > (since it's on top of a larger patch: HIVE-2573 that was added in 1.2). > Basically, add jar creates a new classloader for loading the classes from the > new jar and adds the new classloader to the SessionState object of user's > session, making the older one its parent. Creating a temporary function uses > the new classloader to load the class used for the function. On closing a > session, although there is code to close the classloader for the session, I'm > not seeing the new classloader getting GCed and from the heapdump I can see > it holds on to the temporary function's class that should have gone away > after the session close. > Steps to reproduce: > 1. > {code} > jdbc:hive2://localhost:1/> add jar hdfs:///tmp/audf.jar; > {code} > 2. > Use a profiler (I'm using yourkit) to verify that a new URLClassLoader was > added. > 3. > {code} > jdbc:hive2://localhost:1/> CREATE TEMPORARY FUNCTION funcA AS > 'org.gumashta.udf.AUDF'; > {code} > 4. > Close the jdbc session. > 5. 
> Take the memory snapshot and verify that the new URLClassLoader is indeed > there and is holding onto the class it loaded (org.gumashta.udf.AUDF) for the > session which we already closed.
[jira] [Commented] (HIVE-11786) Deprecate the use of redundant column in colunm stats related tables
[ https://issues.apache.org/jira/browse/HIVE-11786?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947562#comment-14947562 ] Siddharth Seth commented on HIVE-11786: --- Quick note. I created the following indexes. {code} CREATE INDEX COLNAME_TBLID_IDX ON TAB_COL_STATS (COLUMN_NAME, TBL_ID); CREATE INDEX COLNAME_PARTID_IDX ON PART_COL_STATS (COLUMN_NAME, PART_ID); CREATE INDEX PARTNAME_IDX ON PARTITIONS (PART_NAME); CREATE INDEX TBLNAME_IDX ON TBLS (TBL_NAME); {code} This did not improve performance. I'm creating the remaining 2 indexes (even though I don't think they're required given they're part of a multi column index) right now. Will post an update once the index creation is done. > Deprecate the use of redundant column in colunm stats related tables > > > Key: HIVE-11786 > URL: https://issues.apache.org/jira/browse/HIVE-11786 > Project: Hive > Issue Type: Bug > Components: Metastore >Reporter: Chaoyu Tang >Assignee: Chaoyu Tang > Fix For: 1.3.0, 2.0.0 > > Attachments: HIVE-11786.1.patch, HIVE-11786.1.patch, > HIVE-11786.2.patch, HIVE-11786.patch > > > The stats tables such as TAB_COL_STATS, PART_COL_STATS have redundant columns > such as DB_NAME, TABLE_NAME, PARTITION_NAME since these tables already have > foreign key like TBL_ID, or PART_ID referencing to TBLS or PARTITIONS. > These redundant columns violate database normalization rules and cause a lot > of inconvenience (sometimes difficult) in column stats related feature > implementation. For example, when renaming a table, we have to update > TABLE_NAME column in these tables as well which is unnecessary. > This JIRA is first to deprecate the use of these columns at HMS code level. A > followed JIRA is to be opened to focus on DB schema change and upgrade. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11914) When transactions gets a heartbeat, it doesn't update the lock heartbeat.
[ https://issues.apache.org/jira/browse/HIVE-11914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947561#comment-14947561 ] Hive QA commented on HIVE-11914: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12765272/HIVE-11914.2.patch {color:green}SUCCESS:{color} +1 due to 1 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 9654 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.metastore.txn.TestCompactionTxnHandler.testRevokeTimedOutWorkers org.apache.hadoop.hive.ql.lockmgr.TestDbTxnManager.testExceptions org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5561/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5561/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5561/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12765272 - PreCommit-HIVE-TRUNK-Build > When transactions gets a heartbeat, it doesn't update the lock heartbeat. 
> - > > Key: HIVE-11914 > URL: https://issues.apache.org/jira/browse/HIVE-11914 > Project: Hive > Issue Type: Bug > Components: HCatalog, Transactions >Affects Versions: 1.0.1 >Reporter: Eugene Koifman >Assignee: Eugene Koifman > Attachments: HIVE-11914.2.patch, HIVE-11914.patch > > > TxnHandler.heartbeatTxn() updates the timestamp on the txn but not on the > associated locks. This makes SHOW LOCKS confusing/misleading. > This is especially visible in Streaming API use cases which use > TxnHandler.heartbeatTxnRange(HeartbeatTxnRangeRequest rqst) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
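The fix direction implied by the HIVE-11914 report can be modeled in a few lines. This is an in-memory sketch with hypothetical names, not TxnHandler's real schema or SQL: the point is only that a transaction heartbeat must refresh the timestamps of the associated locks as well, or SHOW LOCKS keeps reporting stale times.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory model (hypothetical names): heartbeating a transaction must also
// touch every lock attached to it.
public class TxnHeartbeatModel {
    final Map<Long, Long> txnLastHeartbeat = new HashMap<>();   // txnId -> time
    final Map<Long, Long> lockLastHeartbeat = new HashMap<>();  // lockId -> time
    final Map<Long, List<Long>> locksOfTxn = new HashMap<>();   // txnId -> lockIds

    void heartbeatTxn(long txnId, long now) {
        txnLastHeartbeat.put(txnId, now);
        // The reported bug amounts to this loop being absent: the txn row was
        // updated while the lock rows kept their old timestamps.
        for (long lockId : locksOfTxn.getOrDefault(txnId, List.of())) {
            lockLastHeartbeat.put(lockId, now);
        }
    }
}
```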
[jira] [Commented] (HIVE-11642) LLAP: make sure tests pass #3
[ https://issues.apache.org/jira/browse/HIVE-11642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947554#comment-14947554 ] Sergey Shelukhin commented on HIVE-11642: - Cannot repro either of these tests. Will try again... we might just disable explain test for minillap, I have no idea why stats keep changing. > LLAP: make sure tests pass #3 > - > > Key: HIVE-11642 > URL: https://issues.apache.org/jira/browse/HIVE-11642 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-11642.01.patch, HIVE-11642.02.patch, > HIVE-11642.03.patch, HIVE-11642.04.patch, HIVE-11642.05.patch, > HIVE-11642.20.patch, HIVE-11642.21.patch, HIVE-11642.22.patch, > HIVE-11642.patch > > > Tests should pass against the most recent branch and Tez 0.8. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HIVE-12057) ORC sarg is logged too much
[ https://issues.apache.org/jira/browse/HIVE-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-12057: --- Assignee: Sergey Shelukhin > ORC sarg is logged too much > --- > > Key: HIVE-12057 > URL: https://issues.apache.org/jira/browse/HIVE-12057 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Minor > Attachments: HIVE-12057.patch > > > SARG itself has too many newlines and it's logged for every splitgenerator in > split generation -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12057) ORC sarg is logged too much
[ https://issues.apache.org/jira/browse/HIVE-12057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-12057: Attachment: HIVE-12057.patch [~hagleitn] can you take a look? > ORC sarg is logged too much > --- > > Key: HIVE-12057 > URL: https://issues.apache.org/jira/browse/HIVE-12057 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Priority: Minor > Attachments: HIVE-12057.patch > > > SARG itself has too many newlines and it's logged for every splitgenerator in > split generation -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11634) Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...)
[ https://issues.apache.org/jira/browse/HIVE-11634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sankar Sivarama Subramaniyan updated HIVE-11634: - Attachment: HIVE-11634.99.patch > Support partition pruning for IN(STRUCT(partcol, nonpartcol..)...) > -- > > Key: HIVE-11634 > URL: https://issues.apache.org/jira/browse/HIVE-11634 > Project: Hive > Issue Type: Bug > Components: CBO >Reporter: Hari Sankar Sivarama Subramaniyan >Assignee: Hari Sankar Sivarama Subramaniyan > Attachments: HIVE-11634.1.patch, HIVE-11634.2.patch, > HIVE-11634.3.patch, HIVE-11634.4.patch, HIVE-11634.5.patch, > HIVE-11634.6.patch, HIVE-11634.7.patch, HIVE-11634.8.patch, > HIVE-11634.9.patch, HIVE-11634.91.patch, HIVE-11634.92.patch, > HIVE-11634.93.patch, HIVE-11634.94.patch, HIVE-11634.95.patch, > HIVE-11634.96.patch, HIVE-11634.97.patch, HIVE-11634.98.patch, > HIVE-11634.99.patch > > > Currently, we do not support partition pruning for the following scenario > {code} > create table pcr_t1 (key int, value string) partitioned by (ds string); > insert overwrite table pcr_t1 partition (ds='2000-04-08') select * from src > where key < 20 order by key; > insert overwrite table pcr_t1 partition (ds='2000-04-09') select * from src > where key < 20 order by key; > insert overwrite table pcr_t1 partition (ds='2000-04-10') select * from src > where key < 20 order by key; > explain extended select ds from pcr_t1 where struct(ds, key) in > (struct('2000-04-08',1), struct('2000-04-09',2)); > {code} > If we run the above query, we see that all the partitions of table pcr_t1 are > present in the filter predicate where as we can prune partition > (ds='2000-04-10'). > The optimization is to rewrite the above query into the following. 
> {code} > explain extended select ds from pcr_t1 where (struct(ds)) IN > (struct('2000-04-08'), struct('2000-04-09')) and struct(ds, key) in > (struct('2000-04-08',1), struct('2000-04-09',2)); > {code} > The predicate (struct(ds)) IN (struct('2000-04-08'), struct('2000-04-09')) > is used by partition pruner to prune the columns which otherwise will not be > pruned. > This is an extension of the idea presented in HIVE-11573. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HIVE-5205) Javadoc warnings in HCatalog prevent Hive from building under OpenJDK7
[ https://issues.apache.org/jira/browse/HIVE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Boudnik resolved HIVE-5205. -- Resolution: Won't Fix Release Note: Perhaps it is of no interest anymore, closing. > Javadoc warnings in HCatalog prevent Hive from building under OpenJDK7 > -- > > Key: HIVE-5205 > URL: https://issues.apache.org/jira/browse/HIVE-5205 > Project: Hive > Issue Type: Bug > Components: HCatalog >Affects Versions: 0.11.0 >Reporter: Konstantin Boudnik >Assignee: Konstantin Boudnik > Labels: target-version_0.11 > Fix For: 0.11.1 > > Attachments: HIVE-5205.patch > > > when building Hive with OpenJDK7 the following warning message makes the > build fail: > [javadoc] > /var/lib/jenkins/workspace/Shark-Hive-0.11-OJDK7/hcatalog/storage-handlers/hbase/src/java/org/apache/hcatalog/hbase/snapshot/RevisionManagerFactory.java:81: > warning - @return tag has no arguments. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-12008) Make last two tests added by HIVE-11384 pass when hive.in.test is false
[ https://issues.apache.org/jira/browse/HIVE-12008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yongzhi Chen updated HIVE-12008: Attachment: HIVE-12008.2.patch The second patch handles SelectOpt's prunelist when there is a constant column. > Make last two tests added by HIVE-11384 pass when hive.in.test is false > --- > > Key: HIVE-12008 > URL: https://issues.apache.org/jira/browse/HIVE-12008 > Project: Hive > Issue Type: Bug >Affects Versions: 2.0.0 >Reporter: Yongzhi Chen >Assignee: Yongzhi Chen > Attachments: HIVE-12008.1.patch, HIVE-12008.2.patch > > > The last two qfile unit tests fail when hive.in.test is false. It may relate > to how we handle the prunelist for select. When a select includes every column > in a table, the prunelist for the select is empty. This may cause issues when > calculating its parent's prunelist.
[jira] [Updated] (HIVE-11417) Create shims for the row by row read path that is backed by VectorizedRowBatch
[ https://issues.apache.org/jira/browse/HIVE-11417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley updated HIVE-11417: - Description: I'd like to make the default path for reading and writing ORC files to be vectorized. To ensure that Hive can still read row by row, we'll need shims to support the old API. (was: I'd like to make the default path for reading and writing ORC files to be vectorized. To ensure that Hive can still read row by row, I'll make ObjectInspectors that are backed by the VectorizedRowBatch.) Summary: Create shims for the row by row read path that is backed by VectorizedRowBatch (was: Create ObjectInspectors for VectorizedRowBatch) > Create shims for the row by row read path that is backed by VectorizedRowBatch > -- > > Key: HIVE-11417 > URL: https://issues.apache.org/jira/browse/HIVE-11417 > Project: Hive > Issue Type: Sub-task >Reporter: Owen O'Malley >Assignee: Owen O'Malley > Fix For: 2.0.0 > > > I'd like to make the default path for reading and writing ORC files to be > vectorized. To ensure that Hive can still read row by row, we'll need shims > to support the old API. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11985) handle long typenames from Avro schema in metastore
[ https://issues.apache.org/jira/browse/HIVE-11985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947406#comment-14947406 ] Sergey Shelukhin commented on HIVE-11985: - [~ashutoshc] ping? > handle long typenames from Avro schema in metastore > --- > > Key: HIVE-11985 > URL: https://issues.apache.org/jira/browse/HIVE-11985 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-11985.01.patch, HIVE-11985.patch > > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-12048) metastore file metadata cache should not be used when deltas are present
[ https://issues.apache.org/jira/browse/HIVE-12048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947382#comment-14947382 ] Hive QA commented on HIVE-12048: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12765250/HIVE-12048.patch {color:red}ERROR:{color} -1 due to no test(s) being added or modified. {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 9653 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5560/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5560/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5560/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12765250 - PreCommit-HIVE-TRUNK-Build > metastore file metadata cache should not be used when deltas are present > > > Key: HIVE-12048 > URL: https://issues.apache.org/jira/browse/HIVE-12048 > Project: Hive > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-12048.patch > > > Previous code doesn't check for deltas before getting footers from local > cache even though stripe filtering with deltas is not possible; this is > because checking local cache is cheap I guess. Make sure we check early for > metastore-based cache. 
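The guard described in HIVE-12048 amounts to a short-circuit before the cache lookup. The sketch below uses hypothetical names and a plain map in place of the metastore-backed cache; the real code deals with ORC footers and the metastore API.

```java
import java.util.Map;

// Sketch: consult the footer cache only when no delta files are present,
// since stripe filtering is not valid once a directory has deltas.
public class FooterCacheGuard {
    static byte[] cachedFooter(boolean hasDeltas, Map<String, byte[]> cache, String path) {
        if (hasDeltas) {
            return null; // caller falls back to reading the footer from the file
        }
        return cache.get(path);
    }
}
```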
[jira] [Updated] (HIVE-11642) LLAP: make sure tests pass #3
[ https://issues.apache.org/jira/browse/HIVE-11642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-11642: Attachment: HIVE-11642.22.patch Not sure why some processes failed > LLAP: make sure tests pass #3 > - > > Key: HIVE-11642 > URL: https://issues.apache.org/jira/browse/HIVE-11642 > Project: Hive > Issue Type: Sub-task >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin > Attachments: HIVE-11642.01.patch, HIVE-11642.02.patch, > HIVE-11642.03.patch, HIVE-11642.04.patch, HIVE-11642.05.patch, > HIVE-11642.20.patch, HIVE-11642.21.patch, HIVE-11642.22.patch, > HIVE-11642.patch > > > Tests should pass against the most recent branch and Tez 0.8. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-11892) UDTF run in local fetch task does not return rows forwarded during GenericUDTF.close()
[ https://issues.apache.org/jira/browse/HIVE-11892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-11892: -- Attachment: HIVE-11892.2.patch Updated golden files. Also removed the special case pattern mask for GenericUDTFCount2, I removed the query explain from udtf_nofetchtask.q > UDTF run in local fetch task does not return rows forwarded during > GenericUDTF.close() > -- > > Key: HIVE-11892 > URL: https://issues.apache.org/jira/browse/HIVE-11892 > Project: Hive > Issue Type: Bug > Components: UDF >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-11892.1.patch, HIVE-11892.2.patch > > > Using the example UDTF GenericUDTFCount2, which is part of hive-contrib: > {noformat} > create temporary function udtfCount2 as > 'org.apache.hadoop.hive.contrib.udtf.example.GenericUDTFCount2'; > set hive.fetch.task.conversion=minimal; > -- Task created, correct output (2 rows) > select udtfCount2() from src; > set hive.fetch.task.conversion=more; > -- Runs in local task, incorrect output (0 rows) > select udtfCount2() from src; > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11976) Extend CBO rules to being able to apply rules only once on a given operator
[ https://issues.apache.org/jira/browse/HIVE-11976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947346#comment-14947346 ] Laljo John Pullokkaran commented on HIVE-11976: --- +1 > Extend CBO rules to being able to apply rules only once on a given operator > --- > > Key: HIVE-11976 > URL: https://issues.apache.org/jira/browse/HIVE-11976 > Project: Hive > Issue Type: New Feature > Components: CBO >Affects Versions: 2.0.0 >Reporter: Jesus Camacho Rodriguez >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-11976.01.patch, HIVE-11976.02.patch, > HIVE-11976.03.patch, HIVE-11976.04.patch, HIVE-11976.patch > > > Create a way to bail out quickly from HepPlanner if the rule has been already > applied on a certain operator. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HIVE-11892) UDTF run in local fetch task does not return rows forwarded during GenericUDTF.close()
[ https://issues.apache.org/jira/browse/HIVE-11892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947291#comment-14947291 ] Jason Dere commented on HIVE-11892: --- Test failures are due to explain plan differences now that UDTFs will not use fetch task conversion. Will regenerate the golden files for these tests. > UDTF run in local fetch task does not return rows forwarded during > GenericUDTF.close() > -- > > Key: HIVE-11892 > URL: https://issues.apache.org/jira/browse/HIVE-11892 > Project: Hive > Issue Type: Bug > Components: UDF >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-11892.1.patch > > > Using the example UDTF GenericUDTFCount2, which is part of hive-contrib: > {noformat} > create temporary function udtfCount2 as > 'org.apache.hadoop.hive.contrib.udtf.example.GenericUDTFCount2'; > set hive.fetch.task.conversion=minimal; > -- Task created, correct output (2 rows) > select udtfCount2() from src; > set hive.fetch.task.conversion=more; > -- Runs in local task, incorrect output (0 rows) > select udtfCount2() from src; > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
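The behavior of GenericUDTFCount2 that exposes this bug is easy to model: it forwards nothing per input row and emits its single output row only from close(). Any execution path that stops collecting before close() runs therefore returns zero rows. A simplified sketch (the interface is reduced from Hive's GenericUDTF; names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Simplified model of a count-style UDTF: rows are only forwarded at close().
public class UdtfCloseDemo {
    static class CountingUdtf {
        private int count = 0;
        void process(Object row) { count++; }                    // buffers only
        void close(Consumer<Object> out) { out.accept(count); }  // emits here
    }

    static List<Object> run(boolean collectDuringClose, String... input) {
        List<Object> rows = new ArrayList<>();
        CountingUdtf udtf = new CountingUdtf();
        for (String s : input) {
            udtf.process(s);
        }
        if (collectDuringClose) {
            udtf.close(rows::add);  // correct path: the count row is returned
        } else {
            udtf.close(r -> { });   // buggy path: rows forwarded at close() are dropped
        }
        return rows;
    }
}
```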
[jira] [Updated] (HIVE-11785) Support escaping carriage return and new line for LazySimpleSerDe
[ https://issues.apache.org/jira/browse/HIVE-11785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-11785: Attachment: HIVE-11785.3.patch > Support escaping carriage return and new line for LazySimpleSerDe > - > > Key: HIVE-11785 > URL: https://issues.apache.org/jira/browse/HIVE-11785 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 2.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Fix For: 2.0.0 > > Attachments: HIVE-11785.2.patch, HIVE-11785.3.patch, > HIVE-11785.patch, test.parquet > > > Create the table and perform the queries as follows. You will see different > results when the setting changes. > The expected result should be: > {noformat} > 1 newline > here > 2 carriage return > 3 both > here > {noformat} > {noformat} > hive> create table repo (lvalue int, charstring string) stored as parquet; > OK > Time taken: 0.34 seconds > hive> load data inpath '/tmp/repo/test.parquet' overwrite into table repo; > Loading data to table default.repo > chgrp: changing ownership of > 'hdfs://nameservice1/user/hive/warehouse/repo/test.parquet': User does not > belong to hive > Table default.repo stats: [numFiles=1, numRows=0, totalSize=610, > rawDataSize=0] > OK > Time taken: 0.732 seconds > hive> set hive.fetch.task.conversion=more; > hive> select * from repo; > OK > 1 newline > here > here carriage return > 3 both > here > Time taken: 0.253 seconds, Fetched: 3 row(s) > hive> set hive.fetch.task.conversion=none; > hive> select * from repo; > Query ID = root_20150909113535_e081db8b-ccd9-4c44-aad9-d990ffb8edf3 > Total jobs = 1 > Launching Job 1 out of 1 > Number of reduce tasks is set to 0 since there's no reduce operator > Starting Job = job_1441752031022_0006, Tracking URL = > http://host-10-17-81-63.coe.cloudera.com:8088/proxy/application_1441752031022_0006/ > Kill Command = > /opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/lib/hadoop/bin/hadoop job > -kill job_1441752031022_0006 > Hadoop job 
information for Stage-1: number of mappers: 1; number of reducers: > 0 > 2015-09-09 11:35:54,127 Stage-1 map = 0%, reduce = 0% > 2015-09-09 11:36:04,664 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.98 > sec > MapReduce Total cumulative CPU time: 2 seconds 980 msec > Ended Job = job_1441752031022_0006 > MapReduce Jobs Launched: > Stage-Stage-1: Map: 1 Cumulative CPU: 2.98 sec HDFS Read: 4251 HDFS > Write: 51 SUCCESS > Total MapReduce CPU Time Spent: 2 seconds 980 msec > OK > 1 newline > NULL NULL > 2 carriage return > NULL NULL > 3 both > NULL NULL > Time taken: 25.131 seconds, Fetched: 6 row(s) > hive> > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
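The feature requested in HIVE-11785 boils down to escaping the two row-delimiter characters on write and unescaping them on read. A minimal sketch of the transform follows; this is not LazySimpleSerDe's actual implementation, only the escaping idea.

```java
// Minimal sketch of CR/LF escaping for a text serde: backslash-escape the
// characters that would otherwise be interpreted as row delimiters.
public class CrLfEscape {
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\r", "\\r").replace("\n", "\\n");
    }

    static String unescape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(++i);
                if (next == 'n') out.append('\n');
                else if (next == 'r') out.append('\r');
                else out.append(next); // covers "\\\\" and passes others through
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this in place the "newline / here" rows in the example above survive a write/read round trip instead of being split into extra NULL rows.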
[jira] [Updated] (HIVE-9695) Redundant filter operator in reducer Vertex when CBO is disabled
[ https://issues.apache.org/jira/browse/HIVE-9695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-9695: --- Affects Version/s: (was: 2.0.0) 0.14.0 1.0.0 1.1.0 > Redundant filter operator in reducer Vertex when CBO is disabled > > > Key: HIVE-9695 > URL: https://issues.apache.org/jira/browse/HIVE-9695 > Project: Hive > Issue Type: Improvement > Components: Logical Optimizer >Affects Versions: 0.14.0, 1.0.0, 1.1.0 >Reporter: Mostafa Mokhtar >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-9695.01.patch, HIVE-9695.01.patch, HIVE-9695.patch > > > There is a redundant filter operator in reducer Vertex when CBO is disabled. > Query > {code} > select > ss_item_sk, ss_ticket_number, ss_store_sk > from > store_sales a, store_returns b, store > where > a.ss_item_sk = b.sr_item_sk > and a.ss_ticket_number = b.sr_ticket_number > and ss_sold_date_sk between 2450816 and 2451500 > and sr_returned_date_sk between 2450816 and 2451500 > and s_store_sk = ss_store_sk; > {code} > Plan snippet > {code} > Statistics: Num rows: 57439344 Data size: 1838059008 Basic stats: COMPLETE > Column stats: COMPLETE > Filter Operator > predicate: (_col1 = _col27) and (_col8 = _col34)) and > _col22 BETWEEN 2450816 AND 2451500) and _col45 BETWEEN 2450816 AND 2451500) > and (_col49 = _col6)) (type: boolean) > {code} > Full plan with CBO disabled > {code} > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-1 > Tez > Edges: > Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (BROADCAST_EDGE), Map 4 > (SIMPLE_EDGE) > DagName: mmokhtar_20150214182626_ad6820c7-b667-4652-ab25-cb60deed1a6d:13 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: b > filterExpr: ((sr_item_sk is not null and sr_ticket_number > is not null) and sr_returned_date_sk BETWEEN 2450816 AND 2451500) (type: > boolean) > Statistics: Num rows: 2370038095 Data size: 170506118656 > Basic stats: COMPLETE Column stats: COMPLETE 
> Filter Operator > predicate: (sr_item_sk is not null and sr_ticket_number > is not null) (type: boolean) > Statistics: Num rows: 706893063 Data size: 6498502768 > Basic stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: sr_item_sk (type: int), > sr_ticket_number (type: int) > sort order: ++ > Map-reduce partition columns: sr_item_sk (type: int), > sr_ticket_number (type: int) > Statistics: Num rows: 706893063 Data size: 6498502768 > Basic stats: COMPLETE Column stats: COMPLETE > value expressions: sr_returned_date_sk (type: int) > Execution mode: vectorized > Map 3 > Map Operator Tree: > TableScan > alias: store > filterExpr: s_store_sk is not null (type: boolean) > Statistics: Num rows: 1704 Data size: 3256276 Basic stats: > COMPLETE Column stats: COMPLETE > Filter Operator > predicate: s_store_sk is not null (type: boolean) > Statistics: Num rows: 1704 Data size: 6816 Basic stats: > COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: s_store_sk (type: int) > sort order: + > Map-reduce partition columns: s_store_sk (type: int) > Statistics: Num rows: 1704 Data size: 6816 Basic stats: > COMPLETE Column stats: COMPLETE > Execution mode: vectorized > Map 4 > Map Operator Tree: > TableScan > alias: a > filterExpr: (((ss_item_sk is not null and ss_ticket_number > is not null) and ss_store_sk is not null) and ss_sold_date_sk BETWEEN 2450816 > AND 2451500) (type: boolean) > Statistics: Num rows: 28878719387 Data size: 2405805439460 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: ((ss_item_sk is not null and ss_ticket_number > is not null) and ss_store_sk is not null) (type: boolean) > Statistics: Num rows: 8405840828 Data size: 110101408700 > Basic stats: COMPLETE Column stats: COMPLETE >
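The redundancy shown in the plan is that the reducer's filter re-checks conjuncts (the NOT NULL and BETWEEN predicates) that the map-side filters already enforce. A hedged sketch of the pruning idea, with invented names and a deliberate simplification (it matches conjuncts textually, whereas a real planner must account for column renaming such as {{_col22}}):

```python
# Hypothetical sketch of the optimization behind this issue (not Hive's
# planner code): a reduce-side filter that repeats conjuncts already
# guaranteed by upstream map-side filters can be pruned down to only the
# genuinely new conjuncts -- here, the join condition.

def prune_redundant_conjuncts(reduce_pred, map_preds):
    """Drop conjuncts from the reduce-side predicate that some upstream
    map-side filter already guarantees."""
    enforced = set()
    for pred in map_preds:
        enforced.update(pred)
    return [c for c in reduce_pred if c not in enforced]

map_side = [
    {"sr_item_sk IS NOT NULL", "sr_ticket_number IS NOT NULL",
     "sr_returned_date_sk BETWEEN 2450816 AND 2451500"},
    {"ss_item_sk IS NOT NULL", "ss_ticket_number IS NOT NULL",
     "ss_sold_date_sk BETWEEN 2450816 AND 2451500"},
]
reduce_side = [
    "ss_item_sk = sr_item_sk",                          # join key: must stay
    "sr_returned_date_sk BETWEEN 2450816 AND 2451500",  # already map-side
    "ss_sold_date_sk BETWEEN 2450816 AND 2451500",      # already map-side
]
remaining = prune_redundant_conjuncts(reduce_side, map_side)
assert remaining == ["ss_item_sk = sr_item_sk"]
```

With CBO enabled this deduplication happens during planning, which is why the redundant Filter Operator only appears in the non-CBO plan above.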
[jira] [Updated] (HIVE-11785) Support escaping carriage return and new line for LazySimpleSerDe
[ https://issues.apache.org/jira/browse/HIVE-11785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aihua Xu updated HIVE-11785: Attachment: (was: HIVE-11785.3.patch) > Support escaping carriage return and new line for LazySimpleSerDe > - > > Key: HIVE-11785 > URL: https://issues.apache.org/jira/browse/HIVE-11785 > Project: Hive > Issue Type: New Feature > Components: Query Processor >Affects Versions: 2.0.0 >Reporter: Aihua Xu >Assignee: Aihua Xu > Fix For: 2.0.0 > > Attachments: HIVE-11785.2.patch, HIVE-11785.3.patch, > HIVE-11785.patch, test.parquet > > > Create the table and perform the queries as follows. You will see different > results when the setting changes. > The expected result should be: > {noformat} > 1 newline > here > 2 carriage return > 3 both > here > {noformat} > {noformat} > hive> create table repo (lvalue int, charstring string) stored as parquet; > OK > Time taken: 0.34 seconds > hive> load data inpath '/tmp/repo/test.parquet' overwrite into table repo; > Loading data to table default.repo > chgrp: changing ownership of > 'hdfs://nameservice1/user/hive/warehouse/repo/test.parquet': User does not > belong to hive > Table default.repo stats: [numFiles=1, numRows=0, totalSize=610, > rawDataSize=0] > OK > Time taken: 0.732 seconds > hive> set hive.fetch.task.conversion=more; > hive> select * from repo; > OK > 1 newline > here > here carriage return > 3 both > here > Time taken: 0.253 seconds, Fetched: 3 row(s) > hive> set hive.fetch.task.conversion=none; > hive> select * from repo; > Query ID = root_20150909113535_e081db8b-ccd9-4c44-aad9-d990ffb8edf3 > Total jobs = 1 > Launching Job 1 out of 1 > Number of reduce tasks is set to 0 since there's no reduce operator > Starting Job = job_1441752031022_0006, Tracking URL = > http://host-10-17-81-63.coe.cloudera.com:8088/proxy/application_1441752031022_0006/ > Kill Command = > /opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/lib/hadoop/bin/hadoop job > -kill job_1441752031022_0006 > 
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: > 0 > 2015-09-09 11:35:54,127 Stage-1 map = 0%, reduce = 0% > 2015-09-09 11:36:04,664 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.98 > sec > MapReduce Total cumulative CPU time: 2 seconds 980 msec > Ended Job = job_1441752031022_0006 > MapReduce Jobs Launched: > Stage-Stage-1: Map: 1 Cumulative CPU: 2.98 sec HDFS Read: 4251 HDFS > Write: 51 SUCCESS > Total MapReduce CPU Time Spent: 2 seconds 980 msec > OK > 1 newline > NULL NULL > 2 carriage return > NULL NULL > 3 both > NULL NULL > Time taken: 25.131 seconds, Fetched: 6 row(s) > hive> > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HIVE-9695) Redundant filter operator in reducer Vertex when CBO is disabled
[ https://issues.apache.org/jira/browse/HIVE-9695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-9695: --- Component/s: (was: Physical Optimizer) Logical Optimizer > Redundant filter operator in reducer Vertex when CBO is disabled > > > Key: HIVE-9695 > URL: https://issues.apache.org/jira/browse/HIVE-9695 > Project: Hive > Issue Type: Improvement > Components: Logical Optimizer >Affects Versions: 2.0.0 >Reporter: Mostafa Mokhtar >Assignee: Jesus Camacho Rodriguez > Attachments: HIVE-9695.01.patch, HIVE-9695.01.patch, HIVE-9695.patch > > > There is a redundant filter operator in reducer Vertex when CBO is disabled. > Query > {code} > select > ss_item_sk, ss_ticket_number, ss_store_sk > from > store_sales a, store_returns b, store > where > a.ss_item_sk = b.sr_item_sk > and a.ss_ticket_number = b.sr_ticket_number > and ss_sold_date_sk between 2450816 and 2451500 > and sr_returned_date_sk between 2450816 and 2451500 > and s_store_sk = ss_store_sk; > {code} > Plan snippet > {code} > Statistics: Num rows: 57439344 Data size: 1838059008 Basic stats: COMPLETE > Column stats: COMPLETE > Filter Operator > predicate: (_col1 = _col27) and (_col8 = _col34)) and > _col22 BETWEEN 2450816 AND 2451500) and _col45 BETWEEN 2450816 AND 2451500) > and (_col49 = _col6)) (type: boolean) > {code} > Full plan with CBO disabled > {code} > STAGE DEPENDENCIES: > Stage-1 is a root stage > Stage-0 depends on stages: Stage-1 > STAGE PLANS: > Stage: Stage-1 > Tez > Edges: > Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 3 (BROADCAST_EDGE), Map 4 > (SIMPLE_EDGE) > DagName: mmokhtar_20150214182626_ad6820c7-b667-4652-ab25-cb60deed1a6d:13 > Vertices: > Map 1 > Map Operator Tree: > TableScan > alias: b > filterExpr: ((sr_item_sk is not null and sr_ticket_number > is not null) and sr_returned_date_sk BETWEEN 2450816 AND 2451500) (type: > boolean) > Statistics: Num rows: 2370038095 Data size: 170506118656 > Basic stats: COMPLETE Column stats: COMPLETE > Filter 
Operator > predicate: (sr_item_sk is not null and sr_ticket_number > is not null) (type: boolean) > Statistics: Num rows: 706893063 Data size: 6498502768 > Basic stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: sr_item_sk (type: int), > sr_ticket_number (type: int) > sort order: ++ > Map-reduce partition columns: sr_item_sk (type: int), > sr_ticket_number (type: int) > Statistics: Num rows: 706893063 Data size: 6498502768 > Basic stats: COMPLETE Column stats: COMPLETE > value expressions: sr_returned_date_sk (type: int) > Execution mode: vectorized > Map 3 > Map Operator Tree: > TableScan > alias: store > filterExpr: s_store_sk is not null (type: boolean) > Statistics: Num rows: 1704 Data size: 3256276 Basic stats: > COMPLETE Column stats: COMPLETE > Filter Operator > predicate: s_store_sk is not null (type: boolean) > Statistics: Num rows: 1704 Data size: 6816 Basic stats: > COMPLETE Column stats: COMPLETE > Reduce Output Operator > key expressions: s_store_sk (type: int) > sort order: + > Map-reduce partition columns: s_store_sk (type: int) > Statistics: Num rows: 1704 Data size: 6816 Basic stats: > COMPLETE Column stats: COMPLETE > Execution mode: vectorized > Map 4 > Map Operator Tree: > TableScan > alias: a > filterExpr: (((ss_item_sk is not null and ss_ticket_number > is not null) and ss_store_sk is not null) and ss_sold_date_sk BETWEEN 2450816 > AND 2451500) (type: boolean) > Statistics: Num rows: 28878719387 Data size: 2405805439460 > Basic stats: COMPLETE Column stats: COMPLETE > Filter Operator > predicate: ((ss_item_sk is not null and ss_ticket_number > is not null) and ss_store_sk is not null) (type: boolean) > Statistics: Num rows: 8405840828 Data size: 110101408700 > Basic stats: COMPLETE Column stats: COMPLETE > Reduce Output Operator > ke
[jira] [Commented] (HIVE-11892) UDTF run in local fetch task does not return rows forwarded during GenericUDTF.close()
[ https://issues.apache.org/jira/browse/HIVE-11892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947235#comment-14947235 ] Hive QA commented on HIVE-11892: {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12765242/HIVE-11892.1.patch {color:green}SUCCESS:{color} +1 due to 2 test(s) being added or modified. {color:red}ERROR:{color} -1 due to 12 failed/errored test(s), 9653 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_lateral_view_noalias org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_select_dummy_source org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_explode org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_inline org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udtf_explode org.apache.hadoop.hive.cli.TestContribCliDriver.testCliDriver_udtf_output_on_close org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_1 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_explainuser_3 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_select_dummy_source org.apache.hive.hcatalog.api.TestHCatClient.testTableSchemaPropagation org.apache.hive.jdbc.TestSSL.testSSLVersion {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5559/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/5559/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-5559/ Messages: {noformat} Executing org.apache.hive.ptest.execution.TestCheckPhase Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing 
org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 12 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12765242 - PreCommit-HIVE-TRUNK-Build > UDTF run in local fetch task does not return rows forwarded during > GenericUDTF.close() > -- > > Key: HIVE-11892 > URL: https://issues.apache.org/jira/browse/HIVE-11892 > Project: Hive > Issue Type: Bug > Components: UDF >Reporter: Jason Dere >Assignee: Jason Dere > Attachments: HIVE-11892.1.patch > > > Using the example UDTF GenericUDTFCount2, which is part of hive-contrib: > {noformat} > create temporary function udtfCount2 as > 'org.apache.hadoop.hive.contrib.udtf.example.GenericUDTFCount2'; > set hive.fetch.task.conversion=minimal; > -- Task created, correct output (2 rows) > select udtfCount2() from src; > set hive.fetch.task.conversion=more; > -- Runs in local task, incorrect output (0 rows) > select udtfCount2() from src; > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
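The repro above hinges on the GenericUDTF contract: a UDTF may forward rows from {{close()}}, not just from {{process()}}. A toy model of the bug (not Hive's fetch-task code; the class below only imitates what hive-contrib's GenericUDTFCount2 does):

```python
# Toy model of the HIVE-11892 bug: a UDTF like GenericUDTFCount2 emits its
# rows only from close(), so any runner that skips the close() forwarding
# step silently returns zero rows.

class Count2UDTF:
    """Counts input rows; forwards the count twice, both from close()."""
    def __init__(self, forward):
        self.forward = forward
        self.count = 0

    def process(self, row):
        self.count += 1          # nothing is forwarded per input row

    def close(self):
        self.forward((self.count,))
        self.forward((self.count,))

def run_udtf(rows, call_close):
    out = []
    udtf = Count2UDTF(out.append)
    for row in rows:
        udtf.process(row)
    if call_close:               # the step the local fetch path was missing
        udtf.close()
    return out

src = [("a",), ("b",), ("c",)]
assert run_udtf(src, call_close=True) == [(3,), (3,)]   # task path: 2 rows
assert run_udtf(src, call_close=False) == []            # fetch path: 0 rows
```

This matches the repro: {{hive.fetch.task.conversion=minimal}} launches a task that drives the full operator lifecycle (2 rows), while the local fetch path skipped the close-time forwarding (0 rows).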
[jira] [Commented] (HIVE-12046) Re-create spark client if connection is dropped
[ https://issues.apache.org/jira/browse/HIVE-12046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14947172#comment-14947172 ] Jimmy Xiang commented on HIVE-12046: For getDefaultParallelism() and getExecutorCount(), connection drop doesn't break the query. We log a warning in this case. That's why I didn't touch them. Right, the remote client can become bad right after the isActive() check. User will get an exception in this case. We can enhance the error message and ask the user to re-try the query, which should be more convenient than logging out and in again. > Re-create spark client if connection is dropped > --- > > Key: HIVE-12046 > URL: https://issues.apache.org/jira/browse/HIVE-12046 > Project: Hive > Issue Type: Bug > Components: Spark >Reporter: Jimmy Xiang >Assignee: Jimmy Xiang >Priority: Minor > Fix For: 1.3.0, 2.0.0 > > Attachments: HIVE-12046.1.patch > > > Currently, if the connection to the spark cluster is dropped, the spark > client will stay in a bad state. A new Hive session is needed to re-establish > the connection. It is better to auto reconnect in this case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
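A hedged sketch of the reconnect idea (the classes and method names below are invented, not Hive's SparkClient interface): rebuild the client when a liveness check fails, instead of forcing the user into a new session. It also reflects the caveat from the comment above, that the client can die right after the {{isActive()}} check, so callers may still see an exception and need to retry.

```python
# Invented-API sketch of HIVE-12046's idea: wrap a client factory and
# re-create the client when the connection has dropped. The liveness check
# is best-effort and inherently racy, as noted in the comment thread.

class FlakyClient:
    def __init__(self):
        self.active = True

    def is_active(self):
        return self.active

    def submit(self, job):
        if not self.active:
            raise ConnectionError("client connection dropped")
        return f"result:{job}"

class ReconnectingClient:
    """Re-creates the underlying client when it goes inactive."""
    def __init__(self, factory):
        self.factory = factory
        self.client = factory()

    def submit(self, job):
        if not self.client.is_active():   # best-effort liveness check
            self.client = self.factory()  # re-create instead of failing
        return self.client.submit(job)

wrapper = ReconnectingClient(FlakyClient)
assert wrapper.submit("q1") == "result:q1"
wrapper.client.active = False               # simulate a dropped connection
assert wrapper.submit("q2") == "result:q2"  # transparently reconnected
```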
[jira] [Updated] (HIVE-11977) Hive should handle an external avro table with zero length files present
[ https://issues.apache.org/jira/browse/HIVE-11977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-11977: Affects Version/s: 0.14.0 1.0.0 1.2.0 1.1.0 > Hive should handle an external avro table with zero length files present > > > Key: HIVE-11977 > URL: https://issues.apache.org/jira/browse/HIVE-11977 > Project: Hive > Issue Type: Bug > Components: Serializers/Deserializers >Affects Versions: 0.14.0, 1.0.0, 1.2.0, 1.1.0, 1.2.1 >Reporter: Aaron Dossett >Assignee: Aaron Dossett > Fix For: 2.0.0 > > Attachments: HIVE-11977.2.patch, HIVE-11977.patch > > > If a zero length file is in the top level directory housing an external avro > table, all hive queries on the table fail. > This issue is that org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader > creates a new org.apache.avro.file.DataFileReader and DataFileReader throws > an exception when trying to read an empty file (because the empty file lacks > the magic number marking it as avro). > AvroGenericRecordReader should detect an empty file and then behave > reasonably. > Caused by: java.io.IOException: Not a data file. > at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:102) > at org.apache.avro.file.DataFileReader.(DataFileReader.java:97) > at > org.apache.hadoop.hive.ql.io.avro.AvroGenericRecordReader.(AvroGenericRecordReader.java:81) > at > org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat.getRecordReader(AvroContainerInputFormat.java:51) > at > org.apache.hadoop.hive.ql.io.HiveInputFormat.getRecordReader(HiveInputFormat.java:246) > ... 25 more -- This message was sent by Atlassian JIRA (v6.3.4#6332)
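The fix direction described above can be sketched as a guard in front of the reader (hypothetical reader function, not the actual AvroGenericRecordReader): a zero-length file cannot contain the Avro container magic bytes, so detect it up front and return an empty record stream rather than letting DataFileReader throw "Not a data file."

```python
# Sketch of the HIVE-11977 fix idea: check for an empty file before handing
# it to an Avro container reader. The reader below is a stub that only
# validates the 4-byte magic; real block parsing is elided.
import os
import tempfile

AVRO_MAGIC = b"Obj\x01"  # header every Avro container file starts with

def read_records(path):
    if os.path.getsize(path) == 0:
        return iter(())          # empty file: no records, no exception
    with open(path, "rb") as f:
        if f.read(4) != AVRO_MAGIC:
            raise IOError("Not a data file.")
        # A real reader would now parse the container blocks; elided here.
        return iter(())

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    empty_path = tmp.name        # zero bytes written
assert list(read_records(empty_path)) == []   # handled gracefully
os.unlink(empty_path)
```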