[jira] [Commented] (HIVE-6003) bin/hive --debug should not append HIVE_CLIENT_OPTS to HADOOP_OPTS
[ https://issues.apache.org/jira/browse/HIVE-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845173#comment-13845173 ] Hive QA commented on HIVE-6003: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618173/HIVE-6003.1.patch {color:green}SUCCESS:{color} +1 4762 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/609/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/609/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12618173 bin/hive --debug should not append HIVE_CLIENT_OPTS to HADOOP_OPTS --- Key: HIVE-6003 URL: https://issues.apache.org/jira/browse/HIVE-6003 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-6003.1.patch hadoop (0.20.2, 1.x, 2.x) appends HADOOP_CLIENT_OPTS to HADOOP_OPTS, so this statement in bin/hive, under debug mode, is unnecessary:
{code}
export HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
{code}
It causes HADOOP_CLIENT_OPTS to be appended twice, which produces this error in debug mode:
{code}
bin/hive --debug
ERROR: Cannot load this JVM TI agent twice, check your java command line for duplicate jdwp options.
Error occurred during initialization of VM
agent library failed to init: jdwp
{code}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
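The double append can be illustrated outside of Hive. The following is a hedged, standalone Python sketch, not Hive's actual scripts; the jdwp option string is a typical example, not copied from any particular hadoop-env:

```python
# Hedged illustration (not Hive's actual scripts): hadoop's launcher already
# appends HADOOP_CLIENT_OPTS to HADOOP_OPTS, so a second append in bin/hive
# puts the jdwp agent option on the java command line twice.
client_opts = "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8000"
hadoop_opts = ""

# Append done by hadoop (0.20.2, 1.x, 2.x):
hadoop_opts = f"{hadoop_opts} {client_opts}".strip()
# Redundant append formerly done by bin/hive in debug mode:
hadoop_opts = f"{hadoop_opts} {client_opts}".strip()

# The JVM refuses to load the jdwp agent twice, hence the error above.
assert hadoop_opts.count("-agentlib:jdwp") == 2
```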
[jira] [Commented] (HIVE-5978) Rollups not supported in vector mode.
[ https://issues.apache.org/jira/browse/HIVE-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845189#comment-13845189 ] Lefty Leverenz commented on HIVE-5978: -- Does this need to be documented in the design doc (HIVE-4160)? Rollups not supported in vector mode. - Key: HIVE-5978 URL: https://issues.apache.org/jira/browse/HIVE-5978 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Fix For: 0.13.0 Attachments: HIVE-5978.1.patch Rollups are not supported in vector mode; the query should fail to vectorize. A separate jira will be filed to implement rollups in vector mode. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6006) Add UDF to calculate distance between geographic coordinates
Kostiantyn Kudriavtsev created HIVE-6006: Summary: Add UDF to calculate distance between geographic coordinates Key: HIVE-6006 URL: https://issues.apache.org/jira/browse/HIVE-6006 Project: Hive Issue Type: New Feature Components: UDF Reporter: Kostiantyn Kudriavtsev Priority: Minor It would be nice to have a Hive UDF to calculate the distance between two points on Earth. The Haversine formula seems good enough for this purpose. The following function is proposed: HaversineDistance(lat1, lon1, lat2, lon2) - calculates the Haversine distance between two points with coordinates (lat1, lon1) and (lat2, lon2). -- This message was sent by Atlassian JIRA (v6.1.4#6159)
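A standalone sketch of the proposed computation, in Python rather than as a Hive UDF, with an assumed mean Earth radius (the proposal does not specify one); the function name follows the proposal above:

```python
from math import radians, sin, cos, asin, sqrt

# Assumed mean Earth radius in kilometers (an assumption of this sketch).
EARTH_RADIUS_KM = 6371.0

def haversine_distance(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between (lat1, lon1) and (lat2, lon2),
    all coordinates given in decimal degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine formula: a is the square of half the chord length.
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))
```

A real Hive UDF would wrap this logic in a Java class; the sketch above is only meant to pin down the math.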
[jira] [Commented] (HIVE-6006) Add UDF to calculate distance between geographic coordinates
[ https://issues.apache.org/jira/browse/HIVE-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845224#comment-13845224 ] Kostiantyn Kudriavtsev commented on HIVE-6006: -- Hi guys, could anyone please assign this issue to me? I'll provide a patch as soon as possible. Add UDF to calculate distance between geographic coordinates Key: HIVE-6006 URL: https://issues.apache.org/jira/browse/HIVE-6006 Project: Hive Issue Type: New Feature Components: UDF Reporter: Kostiantyn Kudriavtsev Priority: Minor Original Estimate: 336h Remaining Estimate: 336h It would be nice to have a Hive UDF to calculate the distance between two points on Earth. The Haversine formula seems good enough for this purpose. The following function is proposed: HaversineDistance(lat1, lon1, lat2, lon2) - calculates the Haversine distance between two points with coordinates (lat1, lon1) and (lat2, lon2). -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5978) Rollups not supported in vector mode.
[ https://issues.apache.org/jira/browse/HIVE-5978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845228#comment-13845228 ] Lefty Leverenz commented on HIVE-5978: -- Oh, just realized the wiki has a vectorization design doc now: [https://cwiki.apache.org/confluence/display/Hive/Vectorized+Query+Execution]. It links to HIVE-4160 for the complete doc, very helpful. ROLLUP isn't in the list of supported data types and operations, so perhaps its current lack of support is implied. Should there be a list of supported aggregates? -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5945) ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task.
[ https://issues.apache.org/jira/browse/HIVE-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845262#comment-13845262 ] Hive QA commented on HIVE-5945: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618185/HIVE-5945.2.patch.txt {color:green}SUCCESS:{color} +1 4761 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/611/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/611/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12618185 ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task. 
- Key: HIVE-5945 URL: https://issues.apache.org/jira/browse/HIVE-5945 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0 Reporter: Yin Huai Assignee: Navis Priority: Critical Attachments: HIVE-5945.1.patch.txt, HIVE-5945.2.patch.txt Here is an example
{code}
select i_item_id, s_state, avg(ss_quantity) agg1, avg(ss_list_price) agg2, avg(ss_coupon_amt) agg3, avg(ss_sales_price) agg4
FROM store_sales
JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk)
JOIN item on (store_sales.ss_item_sk = item.i_item_sk)
JOIN customer_demographics on (store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk)
JOIN store on (store_sales.ss_store_sk = store.s_store_sk)
where cd_gender = 'F' and cd_marital_status = 'U' and cd_education_status = 'Primary' and d_year = 2002 and s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL')
group by i_item_id, s_state
order by i_item_id, s_state
limit 100;
{code}
I turned off noconditionaltask, so I expected 4 Map-only jobs for this query. However, I got 1 Map-only job (joining store_sales and date_dim) and 3 MR jobs (for reduce-side joins). So I checked the conditional task determining the plan of the join involving item. In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask, aliasToFileSizeMap contains all input tables used in this query and the intermediate table generated by joining store_sales and date_dim. So, when we sum the size of all small tables, the size of store_sales (which is around 45GB in my test) will also be counted. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
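The miscount described above can be sketched abstractly. In this hedged Python sketch, all names, sizes, and the threshold are illustrative, not Hive's actual API or defaults:

```python
# Hypothetical sketch of the size check described in HIVE-5945; the names,
# sizes, and threshold are illustrative, not Hive's actual API or defaults.
SMALL_TABLE_THRESHOLD = 25 * 1024 * 1024  # bytes

# aliasToFileSizeMap analogue: every input of the query plus the
# intermediate result of joining store_sales and date_dim.
alias_to_size = {
    "store_sales": 45 * 1024**3,         # ~45GB; not an input of this child task
    "ss_dd_intermediate": 10 * 1024**2,  # intermediate join output
    "item": 5 * 1024**2,
}

# Aliases actually consumed by the child of this conditional task.
participating = {"ss_dd_intermediate", "item"}

# Reported behavior: summing every alias counts store_sales too,
# so the map-join candidate is rejected.
buggy_small_total = sum(alias_to_size.values())

# Intended behavior: only sum the aliases used by the child task.
fixed_small_total = sum(alias_to_size[a] for a in participating)

assert buggy_small_total > SMALL_TABLE_THRESHOLD   # map join wrongly ruled out
assert fixed_small_total <= SMALL_TABLE_THRESHOLD  # map join correctly chosen
```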
Re: doc on predicate pushdown in joins
Happy to fix the sentence and the link. I pointed out the name change just so you would review it, so please don't apologize! One more question: why am I not finding getQualifiedAliases() in the SemanticAnalyzer class? It turns up in OpProcFactory.java with javadoc comments, but I can't find it anywhere in the API docs -- not even in the index (Hive 0.12.0 API, http://hive.apache.org/docs/r0.12.0/api/):
getQMap() - Method in class org.apache.hadoop.hive.ql.QTestUtil
getQualifiedName() - Method in class org.apache.hadoop.hive.serde2.typeinfo.TypeInfo. String representing the qualified type name.
getQualifiers() - Method in class org.apache.hive.service.cli.thrift.TTypeQualifiers
getQualifiersSize() - Method in class org.apache.hive.service.cli.thrift.TTypeQualifiers
Most mysterious. -- Lefty On Tue, Dec 10, 2013 at 2:35 PM, Harish Butani hbut...@hortonworks.com wrote: I can see why you would rename.
But this sentence is not correct: 'Hive enforces the predicate pushdown rules by these methods in the SemanticAnalyzer and JoinPPD classes:' It should be: Hive enforces the rules by these methods in the SemanticAnalyzer and JoinPPD classes: (The implementation involves both predicate pushdown and analyzing join conditions) Sorry about this. So the link should say 'Hive Outer Join Behavior' regards, Harish. On Dec 10, 2013, at 2:01 PM, Lefty Leverenz leftylever...@gmail.com wrote: How's this? Hive Implementation (https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior#OuterJoinBehavior-HiveImplementation) Also, I moved the link on the Design Docs page (https://cwiki.apache.org/confluence/display/Hive/DesignDocs) from *Proposed* to *Other*. (It's called SQL Outer Join Predicate Pushdown Rules (https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior) which doesn't match the title, but seems okay because it's more descriptive.) -- Lefty On Tue, Dec 10, 2013 at 7:27 AM, Harish Butani hbut...@hortonworks.com wrote: You are correct, it is plural. regards, Harish. On Dec 10, 2013, at 4:03 AM, Lefty Leverenz leftylever...@gmail.com wrote: Okay, then monospace with () after the method name is a good way to show them: parseJoinCondition() and getQualifiedAlias() ... but I only found the latter pluralized, instead of singular, so should it be getQualifiedAliases() or am I missing something? trunk *grep -nr 'getQualifiedAlias' ./ql/src/java/* | grep -v 'svn'*
./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:221: * the comments for getQualifiedAliases function.
./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:230: Set<String> aliases = getQualifiedAliases((JoinOperator) nd, owi
./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:242: // be pushed down per getQualifiedAliases
./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:471: private Set<String> getQualifiedAliases(JoinOperator op, RowResolver rr) {
-- Lefty On Mon, Dec 9, 2013 at 2:12 PM, Harish Butani hbut...@hortonworks.com wrote: Looks good. Thanks for doing this. Minor point: *Rule 1:* During *QBJoinTree* construction in Plan Gen, the parse Join Condition logic applies this rule. *Rule 2:* During *JoinPPD* (Join Predicate Pushdown) the get Qualified Alias logic applies this rule. FYI 'parseJoinCondition' and 'getQualifiedAlias' are methods in the SemanticAnalyzer and JoinPPD classes respectively. Writing these as separate words may be confusing. You are a better judge of how to represent this (quoted/bold, etc.). regards, Harish. On Dec 9, 2013, at 1:52 AM, Lefty Leverenz leftylever...@gmail.com wrote: The Outer Join Behavior wikidoc (https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior) is done, with links from the Design Docs page (https://cwiki.apache.org/confluence/display/Hive/DesignDocs) and the Joins doc (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins#LanguageManualJoins-JoinOptimization). Harish (or anyone else) would you please review the changes I made to the definition for Null Supplying table
[jira] [Commented] (HIVE-5936) analyze command failing to collect stats with counter mechanism
[ https://issues.apache.org/jira/browse/HIVE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845307#comment-13845307 ] Hive QA commented on HIVE-5936: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618158/HIVE-5936.9.patch.txt {color:red}ERROR:{color} -1 due to 5 failed/errored test(s), 4762 tests executed *Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_aggregator_error_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats_publisher_error_1
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_aggregator_error_1
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_aggregator_error_2
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_stats_publisher_error_2
{noformat}
Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/612/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/612/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 5 tests failed {noformat} This message is automatically generated.
ATTACHMENT ID: 12618158 analyze command failing to collect stats with counter mechanism --- Key: HIVE-5936 URL: https://issues.apache.org/jira/browse/HIVE-5936 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Assignee: Navis Attachments: HIVE-5936.1.patch.txt, HIVE-5936.2.patch.txt, HIVE-5936.3.patch.txt, HIVE-5936.4.patch.txt, HIVE-5936.5.patch.txt, HIVE-5936.6.patch.txt, HIVE-5936.7.patch.txt, HIVE-5936.8.patch.txt, HIVE-5936.9.patch.txt With counter mechanism, MR job is successful, but StatsTask on client fails with NPE. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5595) Implement vectorized SMB JOIN
[ https://issues.apache.org/jira/browse/HIVE-5595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845334#comment-13845334 ] Hive QA commented on HIVE-5595: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618196/HIVE-5595.2.patch {color:green}SUCCESS:{color} +1 4764 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/613/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/613/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12618196 Implement vectorized SMB JOIN - Key: HIVE-5595 URL: https://issues.apache.org/jira/browse/HIVE-5595 Project: Hive Issue Type: Sub-task Reporter: Remus Rusanu Assignee: Remus Rusanu Priority: Critical Attachments: HIVE-5595.1.patch, HIVE-5595.2.patch Original Estimate: 168h Remaining Estimate: 168h -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6007) Make the output of the reduce side plan optimized by the correlation optimizer more reader-friendly.
Yin Huai created HIVE-6007: -- Summary: Make the output of the reduce side plan optimized by the correlation optimizer more reader-friendly. Key: HIVE-6007 URL: https://issues.apache.org/jira/browse/HIVE-6007 Project: Hive Issue Type: Sub-task Reporter: Yin Huai Assignee: Yin Huai Priority: Minor Because a MuxOperator can have multiple parents, the output of the plan can show the sub-plan starting from this MuxOperator multiple times, which makes the reduce side plan confusing. An example is shown in https://mail-archives.apache.org/mod_mbox/hive-user/201312.mbox/%3CCAO0ZKSjniR0z%2BOt4KWouq236fKXo%3D5nE_Oih7A87e3HiuBsG9w%40mail.gmail.com%3E. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: doc on predicate pushdown in joins
getQualifiedAliases is a private method in JoinPPD. Maybe we should remove the section on Hive Implementation here. It is in the Design doc; this information only concerns developers. regards, Harish. On Dec 11, 2013, at 3:05 AM, Lefty Leverenz leftylever...@gmail.com wrote: Happy to fix the sentence and the link. I pointed out the name change just so you would review it, so please don't apologize! One more question: why am I not finding getQualifiedAliases() in the SemanticAnalyzer class? It turns up in OpProcFactory.java with javadoc comments, but I can't find it anywhere in the API docs -- not even in the index (Hive 0.12.0 API): getQMap() - Method in class org.apache.hadoop.hive.ql.QTestUtil getQualifiedName() - Method in class org.apache.hadoop.hive.serde2.typeinfo.TypeInfo String representing the qualified type name. getQualifiers() - Method in class org.apache.hive.service.cli.thrift.TTypeQualifiers getQualifiersSize() - Method in class org.apache.hive.service.cli.thrift.TTypeQualifiers Most mysterious. -- Lefty On Tue, Dec 10, 2013 at 2:35 PM, Harish Butani hbut...@hortonworks.com wrote: I can see why you would rename. But this sentence is not correct: 'Hive enforces the predicate pushdown rules by these methods in the SemanticAnalyzer and JoinPPD classes:' It should be: Hive enforces the rules by these methods in the SemanticAnalyzer and JoinPPD classes: (The implementation involves both predicate pushdown and analyzing join conditions) Sorry about this. So the link should say 'Hive Outer Join Behavior' regards, Harish. On Dec 10, 2013, at 2:01 PM, Lefty Leverenz leftylever...@gmail.com wrote: How's this? Hive Implementation Also, I moved the link on the Design Docs page from Proposed to Other. (It's called SQL Outer Join Predicate Pushdown Rules which doesn't match the title, but seems okay because it's more descriptive.) -- Lefty On Tue, Dec 10, 2013 at 7:27 AM, Harish Butani hbut...@hortonworks.com wrote: You are correct, it is plural. regards, Harish. 
On Dec 10, 2013, at 4:03 AM, Lefty Leverenz leftylever...@gmail.com wrote: Okay, then monospace with () after the method name is a good way to show them: parseJoinCondition() and getQualifiedAlias() ... but I only found the latter pluralized, instead of singular, so should it be getQualifiedAliases() or am I missing something? trunk grep -nr 'getQualifiedAlias' ./ql/src/java/* | grep -v 'svn'
./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:221: * the comments for getQualifiedAliases function.
./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:230: Set<String> aliases = getQualifiedAliases((JoinOperator) nd, owi
./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:242: // be pushed down per getQualifiedAliases
./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:471: private Set<String> getQualifiedAliases(JoinOperator op, RowResolver rr) {
-- Lefty On Mon, Dec 9, 2013 at 2:12 PM, Harish Butani hbut...@hortonworks.com wrote: Looks good. Thanks for doing this. Minor point: Rule 1: During QBJoinTree construction in Plan Gen, the parse Join Condition logic applies this rule. Rule 2: During JoinPPD (Join Predicate Pushdown) the get Qualified Alias logic applies this rule. FYI 'parseJoinCondition' and 'getQualifiedAlias' are methods in the SemanticAnalyzer and JoinPPD classes respectively. Writing these as separate words may be confusing. You are a better judge of how to represent this (quoted/bold, etc.). regards, Harish. On Dec 9, 2013, at 1:52 AM, Lefty Leverenz leftylever...@gmail.com wrote: The Outer Join Behavior wikidoc (https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior) is done, with links from the Design Docs page (https://cwiki.apache.org/confluence/display/Hive/DesignDocs) and the Joins doc (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins#LanguageManualJoins-JoinOptimization).
Harish (or anyone else) would you please review the changes I made to the definition for Null Supplying table (https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior#OuterJoinBehavior-Definitions)? -- Lefty On Mon, Dec 2, 2013 at 6:46 PM, Thejas Nair the...@hortonworks.com wrote: :) On Mon, Dec 2, 2013 at 6:18 PM, Lefty Leverenz leftylever...@gmail.com wrote: Easy as 3.14159 (I can take a hint.) -- Lefty On Mon, Dec 2, 2013 at 5:34 PM, Thejas Nair the...@hortonworks.com wrote: FYI, Harish has written a very nice doc describing predicate push down rules for join. I have attached it to the design doc page. It will be very useful for anyone looking at joins. https://cwiki.apache.org/confluence/download/attachments/27362075/OuterJoinBehavior.html (any help converting it to wiki format from html is welcome!).
[jira] [Updated] (HIVE-6006) Add UDF to calculate distance between geographic coordinates
[ https://issues.apache.org/jira/browse/HIVE-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kostiantyn Kudriavtsev updated HIVE-6006: - Affects Version/s: 0.13.0 Tags: UDF Fix Version/s: 0.13.0 Add UDF to calculate distance between geographic coordinates Key: HIVE-6006 URL: https://issues.apache.org/jira/browse/HIVE-6006 Project: Hive Issue Type: New Feature Components: UDF Affects Versions: 0.13.0 Reporter: Kostiantyn Kudriavtsev Priority: Minor Fix For: 0.13.0 Original Estimate: 336h Remaining Estimate: 336h It would be nice to have a Hive UDF to calculate the distance between two points on Earth. The Haversine formula seems good enough for this purpose. The following function is proposed: HaversineDistance(lat1, lon1, lat2, lon2) - calculates the Haversine distance between two points with coordinates (lat1, lon1) and (lat2, lon2). -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5897) Fix hadoop2 execution environment Milestone 2
[ https://issues.apache.org/jira/browse/HIVE-5897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-5897: --- Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Vikram! Fix hadoop2 execution environment Milestone 2 - Key: HIVE-5897 URL: https://issues.apache.org/jira/browse/HIVE-5897 Project: Hive Issue Type: Sub-task Reporter: Brock Noland Assignee: Vikram Dixit K Fix For: 0.13.0 Attachments: HIVE-5897.4.patch, HIVE-5897.5.patch, HIVE-5897.patch, HIVE-5897.patch, HIVE-5897.patch Follow on to HIVE-5755. List of known issues: hcatalog-pig-adapter and ql need
{noformat}
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-common</artifactId>
  <version>${hadoop-23.version}</version>
  <scope>test</scope>
</dependency>
{noformat}
hcatalog core and hbase storage handler need
{noformat}
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>${hadoop-23.version}</version>
  <classifier>tests</classifier>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-hs</artifactId>
  <version>${hadoop-23.version}</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-yarn-server-tests</artifactId>
  <version>${hadoop-23.version}</version>
  <classifier>tests</classifier>
  <scope>test</scope>
</dependency>
{noformat}
hcatalog core needs:
{noformat}
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
  <version>${hadoop-23.version}</version>
  <scope>test</scope>
</dependency>
{noformat}
beeline needs
{noformat}
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>${hadoop-23.version}</version>
  <scope>test</scope>
</dependency>
{noformat}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result
[ https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845634#comment-13845634 ] Eric Hanson commented on HIVE-5996: --- -1 We should not be changing the output data types of expression results that are arguably reasonable. It causes code churn and can break existing apps. Having sum(bigint) return bigint is long-standing behavior in Hive and is reasonable. As a side note, SQL Server returns bigint for sum(bigint). If users need more digits, they can cast the input to sum to a decimal. Query for sum of a long column of a table with only two rows produces wrong result -- Key: HIVE-5996 URL: https://issues.apache.org/jira/browse/HIVE-5996 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-5996.patch
{code}
hive> desc test2;
OK
l	bigint	None
hive> select * from test2;
OK
666
555
hive> select sum(l) from test2;
OK
-6224521851487329395
{code}
It's believed that a wrap-around error occurred. It's surprising that it happens with only two rows. The same query in MySQL returns:
{code}
mysql> select sum(l) from test;
+--------+
| sum(l) |
+--------+
|   1221 |
+--------+
1 row in set (0.00 sec)
{code}
Hive should accommodate a large number of rows. Overflowing with only two rows makes this unusable. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
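For reference, signed 64-bit wrap-around (the mechanism the report suspects) can be sketched as below; this hedged illustration does not explain why two small rows overflowed, only what wrap-around looks like:

```python
# Hedged illustration of two's-complement 64-bit wrap-around, the mechanism
# the report suspects; it does not explain why two small rows overflowed.
def to_int64(n):
    """Reduce an arbitrary Python int to a signed 64-bit (Java long) value."""
    n &= (1 << 64) - 1
    return n - (1 << 64) if n >= (1 << 63) else n

assert to_int64(666 + 555) == 1221          # in range: the expected sum
assert to_int64(2**63 - 1 + 1) == -(2**63)  # one past max wraps negative
```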
Re: Review Request 16146: HIVE-5993: JDBC Driver should not hard-code the database name
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16146/#review30215 --- The patch looks fine. It would be better to avoid duplicating the code to call GetInfo(). jdbc/src/java/org/apache/hive/jdbc/HiveDatabaseMetaData.java https://reviews.apache.org/r/16146/#comment57827 I think it would be better to add a helper method to call GetInfo() with a given InfoType; we'll end up duplicating this code in multiple places. - Prasad Mujumdar On Dec. 10, 2013, 12:54 a.m., Szehon Ho wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16146/ --- (Updated Dec. 10, 2013, 12:54 a.m.) Review request for hive and Prasad Mujumdar. Bugs: HIVE-5993 https://issues.apache.org/jira/browse/HIVE-5993 Repository: hive-git Description --- Method HiveDatabaseMetadata.getDatabaseProductName() returns a hard-coded string "Hive". This should instead call the existing Hive-server2 api to return the db name. Incidentally, the server returns "Apache Hive". Diffs - itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java 1ba8ad3 jdbc/src/java/org/apache/hive/jdbc/HiveDatabaseMetaData.java 5087ded Diff: https://reviews.apache.org/r/16146/diff/ Testing --- Ran TestJdbcDriver2. Thanks, Szehon Ho
[jira] [Commented] (HIVE-5993) JDBC Driver should not hard-code the database name
[ https://issues.apache.org/jira/browse/HIVE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845646#comment-13845646 ] Prasad Mujumdar commented on HIVE-5993: --- [~szehon] I added a comment on the reviewboard. JDBC Driver should not hard-code the database name -- Key: HIVE-5993 URL: https://issues.apache.org/jira/browse/HIVE-5993 Project: Hive Issue Type: Improvement Components: JDBC Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-5993.patch, HIVE-5993.patch Method HiveDatabaseMetadata.getDatabaseProductName() returns a hard-coded string hive. This should instead call the existing Hive-server2 api to return the db name. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result
[ https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845650#comment-13845650 ] Thejas M Nair commented on HIVE-5996: - [~xuefuz] Can you please mark any jiras that make/propose such non-backward-compatible changes with the 'Incompatible change' flag? That would ensure that the community reviews such changes more carefully. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result
[ https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5996: Hadoop Flags: Incompatible change -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6004) Fix statistics annotation related test failures in hadoop2
[ https://issues.apache.org/jira/browse/HIVE-6004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845655#comment-13845655 ] Harish Butani commented on HIVE-6004: - +1 Fix statistics annotation related test failures in hadoop2 -- Key: HIVE-6004 URL: https://issues.apache.org/jira/browse/HIVE-6004 Project: Hive Issue Type: Sub-task Components: Query Processor, Statistics Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.13.0 Attachments: HIVE-6004.1.patch Fix test failures that are related to HIVE-5369 and its subtask changes. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5679) add date support to metastore JDO/SQL
[ https://issues.apache.org/jira/browse/HIVE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845666#comment-13845666 ] Sergey Shelukhin commented on HIVE-5679: added extra date parsing to metastore itself add date support to metastore JDO/SQL - Key: HIVE-5679 URL: https://issues.apache.org/jira/browse/HIVE-5679 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-5679.01.patch, HIVE-5679.patch Metastore supports strings and integral types in filters. It could also support dates. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5679) add date support to metastore JDO/SQL
[ https://issues.apache.org/jira/browse/HIVE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-5679: --- Attachment: HIVE-5679.01.patch add date support to metastore JDO/SQL - Key: HIVE-5679 URL: https://issues.apache.org/jira/browse/HIVE-5679 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-5679.01.patch, HIVE-5679.patch Metastore supports strings and integral types in filters. It could also support dates. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result
[ https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845664#comment-13845664 ] Xuefu Zhang commented on HIVE-5996: --- Okay. Will do. Query for sum of a long column of a table with only two rows produces wrong result -- Key: HIVE-5996 URL: https://issues.apache.org/jira/browse/HIVE-5996 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-5996.patch
{code}
hive> desc test2;
OK
l                   bigint              None
hive> select * from test2;
OK
6666666666666666666
5555555555555555555
hive> select sum(l) from test2;
OK
-6224521851487329395
{code}
It's believed that a wrap-around error occurred. It's surprising that it happens with only two rows. The same query in MySQL returns:
{code}
mysql> select sum(l) from test;
+----------------------+
| sum(l)               |
+----------------------+
| 12222222222222222221 |
+----------------------+
1 row in set (0.00 sec)
{code}
Hive should accommodate a large number of rows. Overflowing with only two rows is very unusable. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: Review Request 16171: HIVE-5679 add date support to metastore JDO/SQL
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16171/ --- (Updated Dec. 11, 2013, 7:41 p.m.) Review request for hive and Ashutosh Chauhan. Repository: hive-git Description --- See JIRA Diffs (updated) - metastore/src/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java 01c2626 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java a98d9d1 metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 04d399f metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java 93e9942 metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g 00e90cb ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 4b7fc73 ql/src/test/queries/clientpositive/partition_date.q 3c031db ql/src/test/results/clientpositive/partition_date.q.out 3462a1b Diff: https://reviews.apache.org/r/16171/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Updated] (HIVE-6008) optionally include metastore info into explain output
[ https://issues.apache.org/jira/browse/HIVE-6008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6008: --- Summary: optionally include metastore info into explain output (was: optionally include metastore path into explain output) optionally include metastore info into explain output - Key: HIVE-6008 URL: https://issues.apache.org/jira/browse/HIVE-6008 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Priority: Minor To verify some metastore perf improvements are working, it would be nice to (optionally) output basic description of what metastore did (whether filter was pushed down, whether sql was used) into explain output, and enable this option for some q tests (e.g. partition_date, and a few others), or all of them. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6008) optionally include metastore path into explain output
Sergey Shelukhin created HIVE-6008: -- Summary: optionally include metastore path into explain output Key: HIVE-6008 URL: https://issues.apache.org/jira/browse/HIVE-6008 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Priority: Minor To verify some metastore perf improvements are working, it would be nice to (optionally) output basic description of what metastore did (whether filter was pushed down, whether sql was used) into explain output, and enable this option for some q tests (e.g. partition_date, and a few others), or all of them. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5356) Move arithmatic UDFs to generic UDF implementations
[ https://issues.apache.org/jira/browse/HIVE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845677#comment-13845677 ] Eric Hanson commented on HIVE-5356: --- Vectorization has to implement a specific semantics for an operation. So if the semantics change, the vectorized implementation of the operation must be changed too, or the operation could either fail to vectorize or give wrong results. Move arithmatic UDFs to generic UDF implementations --- Key: HIVE-5356 URL: https://issues.apache.org/jira/browse/HIVE-5356 Project: Hive Issue Type: Task Components: UDF Affects Versions: 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.13.0 Attachments: HIVE-5356.1.patch, HIVE-5356.10.patch, HIVE-5356.11.patch, HIVE-5356.12.patch, HIVE-5356.2.patch, HIVE-5356.3.patch, HIVE-5356.4.patch, HIVE-5356.5.patch, HIVE-5356.6.patch, HIVE-5356.7.patch, HIVE-5356.8.patch, HIVE-5356.9.patch Currently, all of the arithmetic operators, such as add/sub/mult/div, are implemented as old-style UDFs and java reflection is used to determine the return type TypeInfos/ObjectInspectors, based on the return type of the evaluate() method chosen for the expression. This works fine for types that don't have type params. Hive decimal type participates in these operations just like int or double. Different from double or int, however, decimal has precision and scale, which cannot be determined by just looking at the return type (decimal) of the UDF evaluate() method, even though the operands have certain precision/scale. With the default of decimal without precision/scale, then (10, 0) will be the type params. This is certainly not desirable. To solve this problem, all of the arithmetic operators would need to be implemented as GenericUDFs, which allow returning ObjectInspector during the initialize() method. The object inspectors returned can carry type params, from which the exact return type can be determined. 
It's worth mentioning that, for user UDFs implemented in the non-generic way, if the return type of the chosen evaluate() method is decimal, the return type actually has (10,0) as its precision/scale, which might not be desirable. This needs to be documented. This JIRA will cover minus, plus, divide, multiply, mod, and pmod, to limit the scope of review. The remaining ones will be covered under HIVE-5706. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
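The distinction the description draws — a static return type versus type parameters carried by the value — can be seen in plain Java, outside Hive. The sketch below is illustrative (the class and method names are made up, not Hive's API): reflection on a stand-in evaluate() method yields only BigDecimal, with no (precision, scale) attached, while those parameters are recoverable only from a runtime value. This is why a GenericUDF, which returns an ObjectInspector from initialize(), can report exact decimal types when reflection cannot.

```java
import java.lang.reflect.Method;
import java.math.BigDecimal;

public class DecimalTypeParams {

    // Stand-in for an old-style UDF's evaluate() method.
    public static BigDecimal evaluate() {
        return new BigDecimal("12345.67"); // precision 7, scale 2
    }

    // All reflection can see: the class, with no precision/scale attached.
    public static String staticReturnType() {
        try {
            Method m = DecimalTypeParams.class.getMethod("evaluate");
            return m.getReturnType().getSimpleName();
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    // Precision and scale exist only on the runtime value.
    public static int[] runtimeTypeParams() {
        BigDecimal v = evaluate();
        return new int[] { v.precision(), v.scale() };
    }

    public static void main(String[] args) {
        System.out.println(staticReturnType());
        int[] ps = runtimeTypeParams();
        System.out.println("(" + ps[0] + ", " + ps[1] + ")");
    }
}
```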
[jira] [Commented] (HIVE-5356) Move arithmatic UDFs to generic UDF implementations
[ https://issues.apache.org/jira/browse/HIVE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845682#comment-13845682 ] Sergey Shelukhin commented on HIVE-5356: Maybe some special test can be added for this? Run a set of simple queries with and without, and ensure results are the same. That will solve the problem before commits. Move arithmatic UDFs to generic UDF implementations --- Key: HIVE-5356 URL: https://issues.apache.org/jira/browse/HIVE-5356 Project: Hive Issue Type: Task Components: UDF Affects Versions: 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.13.0 Attachments: HIVE-5356.1.patch, HIVE-5356.10.patch, HIVE-5356.11.patch, HIVE-5356.12.patch, HIVE-5356.2.patch, HIVE-5356.3.patch, HIVE-5356.4.patch, HIVE-5356.5.patch, HIVE-5356.6.patch, HIVE-5356.7.patch, HIVE-5356.8.patch, HIVE-5356.9.patch Currently, all of the arithmetic operators, such as add/sub/mult/div, are implemented as old-style UDFs and java reflection is used to determine the return type TypeInfos/ObjectInspectors, based on the return type of the evaluate() method chosen for the expression. This works fine for types that don't have type params. Hive decimal type participates in these operations just like int or double. Different from double or int, however, decimal has precision and scale, which cannot be determined by just looking at the return type (decimal) of the UDF evaluate() method, even though the operands have certain precision/scale. With the default of decimal without precision/scale, then (10, 0) will be the type params. This is certainly not desirable. To solve this problem, all of the arithmetic operators would need to be implemented as GenericUDFs, which allow returning ObjectInspector during the initialize() method. The object inspectors returned can carry type params, from which the exact return type can be determined. 
It's worth mentioning that, for user UDFs implemented in the non-generic way, if the return type of the chosen evaluate() method is decimal, the return type actually has (10,0) as its precision/scale, which might not be desirable. This needs to be documented. This JIRA will cover minus, plus, divide, multiply, mod, and pmod, to limit the scope of review. The remaining ones will be covered under HIVE-5706. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5679) add date support to metastore JDO/SQL
[ https://issues.apache.org/jira/browse/HIVE-5679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845711#comment-13845711 ] Hive QA commented on HIVE-5679: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618282/HIVE-5679.01.patch {color:green}SUCCESS:{color} +1 4762 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/614/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/614/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12618282 add date support to metastore JDO/SQL - Key: HIVE-5679 URL: https://issues.apache.org/jira/browse/HIVE-5679 Project: Hive Issue Type: Improvement Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-5679.01.patch, HIVE-5679.patch Metastore supports strings and integral types in filters. It could also support dates. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6003) bin/hive --debug should not append HIVE_CLIENT_OPTS to HADOOP_OPTS
[ https://issues.apache.org/jira/browse/HIVE-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845714#comment-13845714 ] Thejas M Nair commented on HIVE-6003: - bq. I think it is sufficient to just remove HADOOP_CLIENT_OPTS from HADOOP_OPTS to make it work. That is what I am doing in the patch. bin/hive --debug should not append HIVE_CLIENT_OPTS to HADOOP_OPTS --- Key: HIVE-6003 URL: https://issues.apache.org/jira/browse/HIVE-6003 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-6003.1.patch Hadoop (0.20.2, 1.x, 2.x) appends HADOOP_CLIENT_OPTS to HADOOP_OPTS. So it is unnecessary to have this statement in bin/hive under debug mode - export HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS" This causes HADOOP_CLIENT_OPTS to be appended twice, which produces this error in debug mode.
{code}
bin/hive --debug
ERROR: Cannot load this JVM TI agent twice, check your java command line for duplicate jdwp options.
Error occurred during initialization of VM
agent library failed to init: jdwp
{code}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5872) Make UDAFs such as GenericUDAFSum report accurate precision/scale for decimal types
[ https://issues.apache.org/jira/browse/HIVE-5872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845715#comment-13845715 ] Xuefu Zhang commented on HIVE-5872: --- Thanks for the review, Prasad. {quote} In general, it looks like we need more exception logic for handling decimals in UDAF (HIVE-5872, HIVE-5866). It might be useful to add a note in the dev guide for future work .. {quote} I assume you are referring to the following code snippet:
{code}
if (t == null) {
  return warnedOnceNullMapKey;
}
{code}
I agree with your assessment. Currently Hive emits null as the only error-handling option. Thus, null checks are (or are missed) everywhere in the code, not specific to decimal. In the long run, I agree we need better exception handling, especially when we introduce different error-handling modes (HIVE-5438). Make UDAFs such as GenericUDAFSum report accurate precision/scale for decimal types --- Key: HIVE-5872 URL: https://issues.apache.org/jira/browse/HIVE-5872 Project: Hive Issue Type: Improvement Components: Types, UDF Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.13.0 Attachments: HIVE-5872.1.patch, HIVE-5872.2.patch, HIVE-5872.3.patch, HIVE-5872.4.patch, HIVE-5872.patch Currently UDAFs are still reporting the system default precision/scale (38, 18) for decimal results. Not only is this coarse, but it can also cause problems in subsequent operators such as division, where the result depends on the precision/scale of the input, which can go out of bounds (38,38). Thus, these UDAFs should correctly report the precision/scale of the result. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6009) Add from_unixtime UDF that has controllable Timezone
Johndee Burks created HIVE-6009: --- Summary: Add from_unixtime UDF that has controllable Timezone Key: HIVE-6009 URL: https://issues.apache.org/jira/browse/HIVE-6009 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.10.0 Environment: CDH4.4 Reporter: Johndee Burks Priority: Trivial Currently the from_unixtime UDF takes into account the timezone of the system doing the transformation. I think that implementation is good, but it would be nice to add to or change the current UDF to support a configurable timezone. It would be useful for looking at timestamp data from different regions in the native region's timezone. Example:
{code}
from_unixtime(unix_time, format, timezone)
from_unixtime(129384, 'dd MMM', 'GMT-5')
{code}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
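The proposed three-argument semantics can be sketched with java.time; this is an illustrative re-derivation, not Hive's implementation, and the method name is made up. The key behavior the request asks for is that the zone is an explicit parameter instead of being taken from the JVM's default, so the same instant renders differently per region:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class FromUnixTimeTz {

    // Hypothetical from_unixtime(unix_time, format, timezone) semantics:
    // format the epoch-seconds instant in the caller-supplied zone,
    // rather than the system default zone.
    public static String fromUnixTime(long epochSeconds, String pattern, String zone) {
        return DateTimeFormatter.ofPattern(pattern, Locale.ENGLISH)
                .withZone(ZoneId.of(zone))
                .format(Instant.ofEpochSecond(epochSeconds));
    }

    public static void main(String[] args) {
        // The same instant rendered in two different zones.
        System.out.println(fromUnixTime(129384L, "dd MMM yyyy HH:mm", "UTC"));
        System.out.println(fromUnixTime(129384L, "dd MMM yyyy HH:mm", "GMT-5"));
    }
}
```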
Review Request 16184: Hive should be able to skip header and footer rows when reading data file for a table (HIVE-5795)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16184/ --- Review request for hive, Eric Hanson and Thejas Nair. Bugs: hive-5795 https://issues.apache.org/jira/browse/hive-5795 Repository: hive-git Description --- Hive should be able to skip header and footer rows when reading data file for a table (follow up with review https://reviews.apache.org/r/15663/diff/#index_header) Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fa3e048 conf/hive-default.xml.template c61a0bb data/files/header_footer_table_1/0001.txt PRE-CREATION data/files/header_footer_table_1/0002.txt PRE-CREATION data/files/header_footer_table_1/0003.txt PRE-CREATION data/files/header_footer_table_2/2012/01/01/0001.txt PRE-CREATION data/files/header_footer_table_2/2012/01/02/0002.txt PRE-CREATION data/files/header_footer_table_2/2012/01/03/0003.txt PRE-CREATION itests/qtest/pom.xml c3cbb89 ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java d2b2526 ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java dd5cb6b ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 974a5d6 ql/src/test/org/apache/hadoop/hive/ql/io/TestHiveBinarySearchRecordReader.java 85dd975 ql/src/test/org/apache/hadoop/hive/ql/io/TestSymlinkTextInputFormat.java 0686d9b ql/src/test/queries/clientnegative/file_with_header_footer_negative.q PRE-CREATION ql/src/test/queries/clientpositive/file_with_header_footer.q PRE-CREATION ql/src/test/results/clientnegative/file_with_header_footer_negative.q.out PRE-CREATION ql/src/test/results/clientpositive/file_with_header_footer.q.out PRE-CREATION serde/if/serde.thrift 2ceb572 serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/serdeConstants.java 22a6168 Diff: https://reviews.apache.org/r/16184/diff/ Testing --- Thanks, Shuaishuai Nie
[jira] [Commented] (HIVE-5795) Hive should be able to skip header and footer rows when reading data file for a table
[ https://issues.apache.org/jira/browse/HIVE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845731#comment-13845731 ] Shuaishuai Nie commented on HIVE-5795: -- Updated the code review at https://reviews.apache.org/r/16174/ Hive should be able to skip header and footer rows when reading data file for a table - Key: HIVE-5795 URL: https://issues.apache.org/jira/browse/HIVE-5795 Project: Hive Issue Type: Bug Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Attachments: HIVE-5795.1.patch, HIVE-5795.2.patch Hive should be able to skip header and footer lines when reading the data file for a table. In this way, users don't need to preprocess data generated by another application with a header or footer, and can directly use the file for table operations. To implement this, the idea is to add new table properties that define the number of header and footer lines, and to skip those lines when reading records from the record reader. A DDL example for creating a table with a header and footer should look like this:
{code}
create external table testtable (name string, message string)
row format delimited
fields terminated by '\t'
lines terminated by '\n'
location '/testtable'
tblproperties ("skip.header.number"="1", "skip.footer.number"="2");
{code}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
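The footer part is the interesting bit, since a record reader streams forward and cannot know in advance which lines are the last F. A minimal, Hive-independent sketch of the idea (class and method names here are made up, not from the patch): drop the first H lines outright, and delay output by F lines through a bounded queue so the trailing footer lines are never emitted — no second pass over the file is needed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class HeaderFooterReader {

    // Emit every line except the first 'header' and the last 'footer' ones,
    // in one forward pass: the queue delays output by 'footer' lines.
    public static List<String> readSkipping(Reader in, int header, int footer) {
        List<String> out = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(in)) {
            for (int i = 0; i < header; i++) {
                if (r.readLine() == null) return out; // fewer lines than the header
            }
            Deque<String> buf = new ArrayDeque<>();
            String line;
            while ((line = r.readLine()) != null) {
                buf.addLast(line);
                if (buf.size() > footer) out.add(buf.removeFirst());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out; // whatever is left in buf is the footer, dropped
    }

    public static void main(String[] args) {
        String data = "col_a\tcol_b\nr1\nr2\nr3\nfooter1\nfooter2\n";
        System.out.println(readSkipping(new StringReader(data), 1, 2));
    }
}
```

With header=1 and footer=2, the six-line input above yields only the three data rows.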
[jira] [Commented] (HIVE-5979) Failure in cast to timestamps.
[ https://issues.apache.org/jira/browse/HIVE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845745#comment-13845745 ] Jitendra Nath Pandey commented on HIVE-5979: Committed to trunk. Failure in cast to timestamps. -- Key: HIVE-5979 URL: https://issues.apache.org/jira/browse/HIVE-5979 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Fix For: 0.13.0 Attachments: HIVE-5979.1.patch, HIVE-5979.2.patch Query ran: {code} select cast(t as timestamp), cast(si as timestamp), cast(i as timestamp), cast(b as timestamp), cast(f as string), cast(d as timestamp), cast(bo as timestamp), cast(b * 0 as timestamp), cast(ts as timestamp), cast(s as timestamp), cast(substr(s, 1, 1) as timestamp) from Table1; {code} Running this query with hive.vectorized.execution.enabled=true fails with the following exception: {noformat} 13/12/05 07:56:36 ERROR tez.TezJobMonitor: Status: Failed Vertex failed, vertexName=Map 1, vertexId=vertex_1386227234886_0482_1_00, diagnostics=[Task failed, taskId=task_1386227234886_0482_1_00_00, diagnostics=[AttemptID:attempt_1386227234886_0482_1_00_00_0 Info:Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:205) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:112) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:201) at org.apache.hadoop.mapred.YarnTezDagChild$4.run(YarnTezDagChild.java:484) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at 
org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:474)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:193)
... 8 more
Caused by: java.lang.IllegalArgumentException: nanos > 999999999 or < 0
at java.sql.Timestamp.setNanos(Timestamp.java:383)
at org.apache.hadoop.hive.ql.exec.vector.TimestampUtils.assignTimeInNanoSec(TimestampUtils.java:27)
at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$1.writeValue(VectorExpressionWriterFactory.java:412)
at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:162)
at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.toString(VectorizedRowBatch.java:152)
at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.processOp(VectorFileSinkOperator.java:85)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:129)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:93)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:43)
... 9 more
{noformat}
Full log is attached.
Schema for the table is as follows:
{code}
hive> desc Table1;
OK
t                   tinyint             from deserializer
si                  smallint            from deserializer
i                   int                 from deserializer
b                   bigint              from deserializer
f                   float               from deserializer
d                   double              from deserializer
bo                  boolean             from deserializer
s                   string              from deserializer
s2                  string              from deserializer
ts                  timestamp           from deserializer
Time taken: 0.521 seconds, Fetched: 10 row(s)
{code}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
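The IllegalArgumentException in the trace comes from java.sql.Timestamp.setNanos, which only accepts a sub-second nanos field in [0, 999999999]. A hedged sketch of the conversion a helper like assignTimeInNanoSec has to perform (a plain-Java re-derivation, not Hive's actual code): split a signed nanoseconds-since-epoch value with floor semantics, so the nanos field stays in range even for instants before the epoch.

```java
import java.sql.Timestamp;

public class NanosToTimestamp {

    // Split signed nanoseconds-since-epoch into whole seconds plus a
    // sub-second nanos field guaranteed to lie in [0, 999999999].
    public static Timestamp fromNanos(long nanosSinceEpoch) {
        long seconds = Math.floorDiv(nanosSinceEpoch, 1_000_000_000L);
        int subNanos = (int) Math.floorMod(nanosSinceEpoch, 1_000_000_000L);
        Timestamp ts = new Timestamp(seconds * 1000L);
        ts.setNanos(subNanos); // rejects anything outside 0..999999999
        return ts;
    }

    public static void main(String[] args) {
        Timestamp t = fromNanos(1_500_000_000L); // 1.5 s after the epoch
        System.out.println(t.getTime() + " ms, nanos=" + t.getNanos());
        // Floor semantics keep nanos non-negative even before the epoch.
        System.out.println(fromNanos(-1L).getNanos());
    }
}
```

Using truncating `/` and `%` instead of floorDiv/floorMod would hand setNanos a negative value for pre-epoch instants, which is exactly the kind of out-of-range input the exception guards against.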
[jira] [Updated] (HIVE-5979) Failure in cast to timestamps.
[ https://issues.apache.org/jira/browse/HIVE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-5979: --- Release Note: (was: Committed to trunk.) Failure in cast to timestamps. -- Key: HIVE-5979 URL: https://issues.apache.org/jira/browse/HIVE-5979 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Fix For: 0.13.0 Attachments: HIVE-5979.1.patch, HIVE-5979.2.patch Query ran: {code} select cast(t as timestamp), cast(si as timestamp), cast(i as timestamp), cast(b as timestamp), cast(f as string), cast(d as timestamp), cast(bo as timestamp), cast(b * 0 as timestamp), cast(ts as timestamp), cast(s as timestamp), cast(substr(s, 1, 1) as timestamp) from Table1; {code} Running this query with hive.vectorized.execution.enabled=true fails with the following exception: {noformat} 13/12/05 07:56:36 ERROR tez.TezJobMonitor: Status: Failed Vertex failed, vertexName=Map 1, vertexId=vertex_1386227234886_0482_1_00, diagnostics=[Task failed, taskId=task_1386227234886_0482_1_00_00, diagnostics=[AttemptID:attempt_1386227234886_0482_1_00_00_0 Info:Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:205) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:112) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:201) at org.apache.hadoop.mapred.YarnTezDagChild$4.run(YarnTezDagChild.java:484) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:474) Caused 
by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:193)
... 8 more
Caused by: java.lang.IllegalArgumentException: nanos > 999999999 or < 0
at java.sql.Timestamp.setNanos(Timestamp.java:383)
at org.apache.hadoop.hive.ql.exec.vector.TimestampUtils.assignTimeInNanoSec(TimestampUtils.java:27)
at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$1.writeValue(VectorExpressionWriterFactory.java:412)
at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:162)
at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.toString(VectorizedRowBatch.java:152)
at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.processOp(VectorFileSinkOperator.java:85)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:129)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:93)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:43)
... 9 more
{noformat}
Full log is attached. Schema for the table is as follows:
{code}
hive> desc Table1;
OK
t                   tinyint             from deserializer
si                  smallint            from deserializer
i                   int                 from deserializer
b                   bigint              from deserializer
f                   float               from deserializer
d                   double              from deserializer
bo                  boolean             from deserializer
s                   string              from deserializer
s2                  string              from deserializer
ts                  timestamp           from deserializer
Time taken: 0.521 seconds, Fetched: 10 row(s)
{code}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5979) Failure in cast to timestamps.
[ https://issues.apache.org/jira/browse/HIVE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey updated HIVE-5979: --- Resolution: Fixed Fix Version/s: 0.13.0 Release Note: Committed to trunk. Status: Resolved (was: Patch Available) Failure in cast to timestamps. -- Key: HIVE-5979 URL: https://issues.apache.org/jira/browse/HIVE-5979 Project: Hive Issue Type: Sub-task Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Fix For: 0.13.0 Attachments: HIVE-5979.1.patch, HIVE-5979.2.patch Query ran: {code} select cast(t as timestamp), cast(si as timestamp), cast(i as timestamp), cast(b as timestamp), cast(f as string), cast(d as timestamp), cast(bo as timestamp), cast(b * 0 as timestamp), cast(ts as timestamp), cast(s as timestamp), cast(substr(s, 1, 1) as timestamp) from Table1; {code} Running this query with hive.vectorized.execution.enabled=true fails with the following exception: {noformat} 13/12/05 07:56:36 ERROR tez.TezJobMonitor: Status: Failed Vertex failed, vertexName=Map 1, vertexId=vertex_1386227234886_0482_1_00, diagnostics=[Task failed, taskId=task_1386227234886_0482_1_00_00, diagnostics=[AttemptID:attempt_1386227234886_0482_1_00_00_0 Info:Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:205) at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:171) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:112) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:201) at org.apache.hadoop.mapred.YarnTezDagChild$4.run(YarnTezDagChild.java:484) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491) at 
org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:474)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.processRow(MapRecordProcessor.java:193)
... 8 more
Caused by: java.lang.IllegalArgumentException: nanos > 999999999 or < 0
at java.sql.Timestamp.setNanos(Timestamp.java:383)
at org.apache.hadoop.hive.ql.exec.vector.TimestampUtils.assignTimeInNanoSec(TimestampUtils.java:27)
at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$1.writeValue(VectorExpressionWriterFactory.java:412)
at org.apache.hadoop.hive.ql.exec.vector.expressions.VectorExpressionWriterFactory$VectorExpressionWriterLong.writeValue(VectorExpressionWriterFactory.java:162)
at org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch.toString(VectorizedRowBatch.java:152)
at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.processOp(VectorFileSinkOperator.java:85)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.processOp(VectorSelectOperator.java:129)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:93)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:786)
at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:43)
... 9 more
{noformat}
Full log is attached.
Schema for the table is as follows:
{code}
hive> desc Table1;
OK
t                   tinyint             from deserializer
si                  smallint            from deserializer
i                   int                 from deserializer
b                   bigint              from deserializer
f                   float               from deserializer
d                   double              from deserializer
bo                  boolean             from deserializer
s                   string              from deserializer
s2                  string              from deserializer
ts                  timestamp           from deserializer
Time taken: 0.521 seconds, Fetched: 10 row(s)
{code}
-- This message was sent by Atlassian JIRA
[jira] [Commented] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result
[ https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845760#comment-13845760 ] Xuefu Zhang commented on HIVE-5996: --- {quote} Having sum(bigint) return bigint is long-standing behavior in Hive and is reasonable. As a side note, SQL Server returns bigint for sum(bigint). If users need more digits, they can cast the input to sum to a decimal. {quote} My concern is not about the number of digits that long can hold. Hive processes large numbers of rows that traditional DBs shy away from, and the chance of an overflow error is bigger. With the proposed change, Hive can guarantee 10B (or a certain number of) rows without worrying about this problem. Without it, Hive has no such guarantee, and two valid rows can overflow, as demonstrated in this JIRA. Query for sum of a long column of a table with only two rows produces wrong result -- Key: HIVE-5996 URL: https://issues.apache.org/jira/browse/HIVE-5996 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-5996.patch
{code}
hive> desc test2;
OK
l                   bigint              None
hive> select * from test2;
OK
6666666666666666666
5555555555555555555
hive> select sum(l) from test2;
OK
-6224521851487329395
{code}
It's believed that a wrap-around error occurred. It's surprising that it happens with only two rows. The same query in MySQL returns:
{code}
mysql> select sum(l) from test;
+----------------------+
| sum(l)               |
+----------------------+
| 12222222222222222221 |
+----------------------+
1 row in set (0.00 sec)
{code}
Hive should accommodate a large number of rows. Overflowing with only two rows is very unusable. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
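The wrap-around under discussion can be reproduced in a few lines of plain Java (the row values below are illustrative, not the ones from this JIRA): a long accumulator silently wraps past Long.MAX_VALUE, while a decimal accumulator — the kind of fix proposed in the patch — keeps the exact sum.

```java
import java.math.BigDecimal;

public class SumOverflow {

    public static long naiveSum(long[] xs) {
        long s = 0;
        for (long x : xs) s += x; // wraps silently past Long.MAX_VALUE
        return s;
    }

    public static BigDecimal decimalSum(long[] xs) {
        BigDecimal s = BigDecimal.ZERO;
        for (long x : xs) s = s.add(BigDecimal.valueOf(x)); // exact, never wraps
        return s;
    }

    public static void main(String[] args) {
        // Two in-range bigint values whose true sum exceeds Long.MAX_VALUE.
        long[] rows = { 6_660_000_000_000_000_000L, 5_550_000_000_000_000_000L };
        System.out.println(naiveSum(rows));   // negative: wrapped around
        System.out.println(decimalSum(rows)); // the exact 20-digit sum
    }
}
```

Each input fits comfortably in a bigint; only the sum overflows, which is why the bug shows up with just two rows.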
[jira] [Commented] (HIVE-5356) Move arithmatic UDFs to generic UDF implementations
[ https://issues.apache.org/jira/browse/HIVE-5356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845784#comment-13845784 ] Sergey Shelukhin commented on HIVE-5356: Filed HIVE-6010 Move arithmatic UDFs to generic UDF implementations --- Key: HIVE-5356 URL: https://issues.apache.org/jira/browse/HIVE-5356 Project: Hive Issue Type: Task Components: UDF Affects Versions: 0.11.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Fix For: 0.13.0 Attachments: HIVE-5356.1.patch, HIVE-5356.10.patch, HIVE-5356.11.patch, HIVE-5356.12.patch, HIVE-5356.2.patch, HIVE-5356.3.patch, HIVE-5356.4.patch, HIVE-5356.5.patch, HIVE-5356.6.patch, HIVE-5356.7.patch, HIVE-5356.8.patch, HIVE-5356.9.patch Currently, all of the arithmetic operators, such as add/sub/mult/div, are implemented as old-style UDFs, and Java reflection is used to determine the return type TypeInfos/ObjectInspectors, based on the return type of the evaluate() method chosen for the expression. This works fine for types that don't have type params. Hive's decimal type participates in these operations just like int or double. Unlike double or int, however, decimal has precision and scale, which cannot be determined by just looking at the return type (decimal) of the UDF evaluate() method, even though the operands have certain precision/scale. Because decimal defaults to no precision/scale, (10, 0) will be used as the type params. This is certainly not desirable. To solve this problem, all of the arithmetic operators would need to be implemented as GenericUDFs, which allow returning an ObjectInspector during the initialize() method. The object inspectors returned can carry type params, from which the exact return type can be determined. It's worth mentioning that, for a user UDF implemented in the non-generic way, if the return type of the chosen evaluate() method is decimal, the return type actually has (10,0) as precision/scale, which might not be desirable. This needs to be documented. 
This JIRA will cover minus, plus, divide, multiply, mod, and pmod, to limit the scope of review. The remaining ones will be covered under HIVE-5706. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
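To make the motivation concrete, here is a plain-Java sketch of why the result's type params must be derived from the operand types rather than from evaluate()'s return type. It assumes the commonly cited SQL Server-style rule for decimal addition (p = max(p1-s1, p2-s2) + max(s1, s2) + 1, s = max(s1, s2)); the rule Hive actually adopts is defined by the patch, not here.

```java
import java.math.BigDecimal;

public class DecimalTypeParams {
    // Result type params for decimal(p1,s1) + decimal(p2,s2), following the
    // widely used SQL Server-style rule (assumed here for illustration; real
    // implementations also cap precision at a maximum, e.g. 38).
    static int[] addResultType(int p1, int s1, int p2, int s2) {
        int s = Math.max(s1, s2);
        int p = Math.max(p1 - s1, p2 - s2) + s + 1;
        return new int[] {p, s};
    }

    public static void main(String[] args) {
        // decimal(10,2) + decimal(5,4): integer part needs max(8,1) = 8 digits
        // plus one carry digit, fractional part max(2,4) = 4 digits -> (13,4).
        int[] t = addResultType(10, 2, 5, 4);
        System.out.println("decimal(" + t[0] + "," + t[1] + ")"); // decimal(13,4)

        // Reflection on evaluate()'s return type (a bare decimal) alone could
        // never recover these params -- they depend on the operand types.
        BigDecimal sum = new BigDecimal("99999999.99").add(new BigDecimal("9.9999"));
        System.out.println(sum); // 100000009.9899
    }
}
```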
[jira] [Created] (HIVE-6010) create a test that would ensure vectorization produces same results as non-vectorized execution
Sergey Shelukhin created HIVE-6010: -- Summary: create a test that would ensure vectorization produces same results as non-vectorized execution Key: HIVE-6010 URL: https://issues.apache.org/jira/browse/HIVE-6010 Project: Hive Issue Type: Test Reporter: Sergey Shelukhin So as to ensure that vectorization is not forgotten when changes are made to things. Obviously it would not be viable to have a bulletproof test, but at least a subset of operations can be verified. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6010) create a test that would ensure vectorization produces same results as non-vectorized execution
[ https://issues.apache.org/jira/browse/HIVE-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845783#comment-13845783 ] Sergey Shelukhin commented on HIVE-6010: I will likely take it later this week if no one else takes it first. create a test that would ensure vectorization produces same results as non-vectorized execution --- Key: HIVE-6010 URL: https://issues.apache.org/jira/browse/HIVE-6010 Project: Hive Issue Type: Test Reporter: Sergey Shelukhin So as to ensure that vectorization is not forgotten when changes are made to things. Obviously it would not be viable to have a bulletproof test, but at least a subset of operations can be verified. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6010) create a test that would ensure vectorization produces same results as non-vectorized execution
[ https://issues.apache.org/jira/browse/HIVE-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6010: --- Component/s: Vectorization Tests create a test that would ensure vectorization produces same results as non-vectorized execution --- Key: HIVE-6010 URL: https://issues.apache.org/jira/browse/HIVE-6010 Project: Hive Issue Type: Test Components: Tests, Vectorization Reporter: Sergey Shelukhin So as to ensure that vectorization is not forgotten when changes are made to things. Obviously it would not be viable to have a bulletproof test, but at least a subset of operations can be verified. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: Review Request 16146: HIVE-5993: JDBC Driver should not hard-code the database name
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16146/ --- (Updated Dec. 11, 2013, 10:33 p.m.) Review request for hive and Prasad Mujumdar. Changes --- Thanks for the suggestion. Refactored the getInfo logic into a single method. Bugs: HIVE-5993 https://issues.apache.org/jira/browse/HIVE-5993 Repository: hive-git Description --- Method HiveDatabaseMetadata.getDatabaseProductName() returns a hard-coded string Hive. This should instead call the existing Hive-server2 api to return the db name. Incidentally, the server returns Apache Hive. Diffs (updated) - itests/hive-unit/src/test/java/org/apache/hive/jdbc/TestJdbcDriver2.java 1ba8ad3 jdbc/src/java/org/apache/hive/jdbc/HiveDatabaseMetaData.java 5087ded Diff: https://reviews.apache.org/r/16146/diff/ Testing --- Ran TestJdbcDriver2. Thanks, Szehon Ho
[jira] [Updated] (HIVE-5993) JDBC Driver should not hard-code the database name
[ https://issues.apache.org/jira/browse/HIVE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Szehon Ho updated HIVE-5993: Attachment: HIVE-5993.1.patch Incorporating review feedback. JDBC Driver should not hard-code the database name -- Key: HIVE-5993 URL: https://issues.apache.org/jira/browse/HIVE-5993 Project: Hive Issue Type: Improvement Components: JDBC Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-5993.1.patch, HIVE-5993.patch, HIVE-5993.patch Method HiveDatabaseMetadata.getDatabaseProductName() returns a hard-coded string hive. This should instead call the existing Hive-server2 api to return the db name. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5521) Remove CommonRCFileInputFormat
[ https://issues.apache.org/jira/browse/HIVE-5521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845825#comment-13845825 ] Jitendra Nath Pandey commented on HIVE-5521: +1 Remove CommonRCFileInputFormat -- Key: HIVE-5521 URL: https://issues.apache.org/jira/browse/HIVE-5521 Project: Hive Issue Type: Bug Components: File Formats, Vectorization Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-5521.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6011) correlation optimizer unit tests are failing on tez
Ashutosh Chauhan created HIVE-6011: -- Summary: correlation optimizer unit tests are failing on tez Key: HIVE-6011 URL: https://issues.apache.org/jira/browse/HIVE-6011 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Some extra clean-ups in tez branch made this to fail. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6011) correlation optimizer unit tests are failing on tez
[ https://issues.apache.org/jira/browse/HIVE-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-6011: --- Attachment: HIVE-6011-tez-branch.patch correlation optimizer unit tests are failing on tez Key: HIVE-6011 URL: https://issues.apache.org/jira/browse/HIVE-6011 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-6011-tez-branch.patch Some extra clean-ups in tez branch made this to fail. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-4574) XMLEncoder thread safety issues in openjdk7 causes HiveServer2 to be stuck
[ https://issues.apache.org/jira/browse/HIVE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845856#comment-13845856 ] Steven Wong commented on HIVE-4574: --- [https://bugs.openjdk.java.net/browse/JDK-8028054] now says that the bug is fixed in 8 and backported to 7u60. XMLEncoder thread safety issues in openjdk7 causes HiveServer2 to be stuck -- Key: HIVE-4574 URL: https://issues.apache.org/jira/browse/HIVE-4574 Project: Hive Issue Type: Bug Components: HiveServer2 Affects Versions: 0.11.0, 0.12.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4574.1.patch In OpenJDK 7, an XMLEncoder.writeObject call leads to calls to java.beans.MethodFinder.findMethod(). The MethodFinder class is not thread safe because it uses a static WeakHashMap that can be accessed from multiple threads. See - http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7-b147/com/sun/beans/finder/MethodFinder.java#46 Concurrent access to HashMap implementations that are not thread safe can sometimes result in infinite loops and other problems. If JDK 7 is in use, it makes sense to synchronize calls to XMLEncoder.writeObject. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
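A minimal sketch of the proposed synchronization (the helper class and lock name are hypothetical, not Hive's actual fix): every writeObject call funnels through one JVM-wide lock, so MethodFinder's static WeakHashMap is only ever touched by one thread at a time.

```java
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class SafeXmlEncoding {
    // One JVM-wide lock serializes all XMLEncoder.writeObject calls, so the
    // non-thread-safe static map inside MethodFinder is never mutated from
    // two threads at once. (Hypothetical helper, for illustration only.)
    private static final Object ENCODER_LOCK = new Object();

    public static String encode(Object obj) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        synchronized (ENCODER_LOCK) {
            XMLEncoder encoder = new XMLEncoder(out);
            encoder.writeObject(obj);
            encoder.close(); // flushes the XML footer
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String xml = encode("hello");
        System.out.println(xml.contains("<string>hello</string>")); // true
    }
}
```

The coarse lock trades some throughput for safety, which is acceptable here because the JDK bug can otherwise hang the whole server.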
[jira] [Created] (HIVE-6012) restore backward compatibility of arithmetic operations
Thejas M Nair created HIVE-6012: --- Summary: restore backward compatibility of arithmetic operations Key: HIVE-6012 URL: https://issues.apache.org/jira/browse/HIVE-6012 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Reporter: Thejas M Nair HIVE-5356 changed the behavior of some of the arithmetic operations, and the change is not backward compatible, as pointed out in this [jira comment|https://issues.apache.org/jira/browse/HIVE-5356?focusedCommentId=13813398page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13813398]
{code}
int / int = decimal
float / float = double
float * float = double
float + float = double
{code}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-6013: Attachment: QuotedIdentifier.html Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Attachments: QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. A Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in the attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6013) Supporting Quoted Identifiers in Column Names
Harish Butani created HIVE-6013: --- Summary: Supporting Quoted Identifiers in Column Names Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Attachments: QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. A Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in the attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6011) correlation optimizer unit tests are failing on tez
[ https://issues.apache.org/jira/browse/HIVE-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845878#comment-13845878 ] Gunther Hagleitner commented on HIVE-6011: -- LGTM +1 correlation optimizer unit tests are failing on tez Key: HIVE-6011 URL: https://issues.apache.org/jira/browse/HIVE-6011 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-6011-tez-branch.patch Some extra clean-ups in tez branch made this to fail. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-6013: Attachment: HIVE-6013.1.patch Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. A Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in the attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Assigned] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani reassigned HIVE-6013: --- Assignee: Harish Butani Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. A Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in the attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-6013: Fix Version/s: 0.13.0 Status: Patch Available (was: Open) Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. A Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in the attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6014) Stage ids differ in the tez branch
Vikram Dixit K created HIVE-6014: Summary: Stage ids differ in the tez branch Key: HIVE-6014 URL: https://issues.apache.org/jira/browse/HIVE-6014 Project: Hive Issue Type: Bug Components: Tez Affects Versions: tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-6014.1.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-3183) case expression should allow different types per ISO-SQL 2011
[ https://issues.apache.org/jira/browse/HIVE-3183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845883#comment-13845883 ] Szehon Ho commented on HIVE-3183: - I guess we can resolve this one as duplicate, unless there is something in this JIRA not captured by the other? case expression should allow different types per ISO-SQL 2011 - Key: HIVE-3183 URL: https://issues.apache.org/jira/browse/HIVE-3183 Project: Hive Issue Type: Bug Components: SQL Affects Versions: 0.8.0 Reporter: N Campbell Attachments: Hive-3183.patch.txt, udf_when_type_wrong2.q.out, udf_when_type_wrong3.q.out The ISO-SQL standard specification for CASE allows the specification to include different types in the WHEN and ELSE blocks, including this example, which mixes smallint and integer types: select case when vsint.csint is not null then vsint.csint else 1 end from cert.vsint vsint The Apache Hive docs do not state how it deviates from the standard or list any such restrictions, so it is unclear whether this is a bug or an enhancement. Many SQL applications mix types, so this seems to be a restrictive implementation if it is by design. Argument type mismatch '1': The expression after ELSE should have the same type as those after THEN: smallint is expected but int is found -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6014) Stage ids differ in the tez branch
[ https://issues.apache.org/jira/browse/HIVE-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-6014: - Status: Patch Available (was: Open) Stage ids differ in the tez branch -- Key: HIVE-6014 URL: https://issues.apache.org/jira/browse/HIVE-6014 Project: Hive Issue Type: Bug Components: Tez Affects Versions: tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-6014.1.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6014) Stage ids differ in the tez branch
[ https://issues.apache.org/jira/browse/HIVE-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-6014: - Attachment: HIVE-6014.1.patch Stage ids differ in the tez branch -- Key: HIVE-6014 URL: https://issues.apache.org/jira/browse/HIVE-6014 Project: Hive Issue Type: Bug Components: Tez Affects Versions: tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-6014.1.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6014) Stage ids differ in the tez branch
[ https://issues.apache.org/jira/browse/HIVE-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845892#comment-13845892 ] Gunther Hagleitner commented on HIVE-6014: -- LGTM +1 Stage ids differ in the tez branch -- Key: HIVE-6014 URL: https://issues.apache.org/jira/browse/HIVE-6014 Project: Hive Issue Type: Bug Components: Tez Affects Versions: tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-6014.1.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Resolved] (HIVE-6011) correlation optimizer unit tests are failing on tez
[ https://issues.apache.org/jira/browse/HIVE-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner resolved HIVE-6011. -- Resolution: Fixed Committed to branch. Thanks Ashutosh! correlation optimizer unit tests are failing on tez Key: HIVE-6011 URL: https://issues.apache.org/jira/browse/HIVE-6011 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Attachments: HIVE-6011-tez-branch.patch Some extra clean-ups in tez branch made this to fail. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6011) correlation optimizer unit tests are failing on tez
[ https://issues.apache.org/jira/browse/HIVE-6011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6011: - Fix Version/s: tez-branch correlation optimizer unit tests are failing on tez Key: HIVE-6011 URL: https://issues.apache.org/jira/browse/HIVE-6011 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan Fix For: tez-branch Attachments: HIVE-6011-tez-branch.patch Some extra clean-ups in tez branch made this to fail. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6015) vectorized logarithm produces results for 0 that are different from a non-vectorized one
Sergey Shelukhin created HIVE-6015: -- Summary: vectorized logarithm produces results for 0 that are different from a non-vectorized one Key: HIVE-6015 URL: https://issues.apache.org/jira/browse/HIVE-6015 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5993) JDBC Driver should not hard-code the database name
[ https://issues.apache.org/jira/browse/HIVE-5993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845900#comment-13845900 ] Hive QA commented on HIVE-5993: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618310/HIVE-5993.1.patch {color:green}SUCCESS:{color} +1 4763 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/615/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/615/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12618310 JDBC Driver should not hard-code the database name -- Key: HIVE-5993 URL: https://issues.apache.org/jira/browse/HIVE-5993 Project: Hive Issue Type: Improvement Components: JDBC Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-5993.1.patch, HIVE-5993.patch, HIVE-5993.patch Method HiveDatabaseMetadata.getDatabaseProductName() returns a hard-coded string hive. This should instead call the existing Hive-server2 api to return the db name. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
adding ANSI flag for hive
Hi. There's recently been some discussion about data type changes in Hive (double to decimal), and result changes for special cases like division by zero, etc., to bring it into compliance with MySQL (that's what the JIRAs use as an example; I am assuming ANSI SQL is meant). The latter are non-controversial (I guess), but for the former, performance may suffer and/or backward compat may be broken if Hive is brought into compliance. If fuller ANSI compat is sought in the future, there may be some even hairier issues such as double-quoted identifiers. In light of that, and also following MySQL, I wonder if we should add a flag, or set of flags, to Hive to be able to force ANSI compliance. When this flag (or these flags) is not set, for example, int/int division could return double for backward compat/perf, vectorization could skip the special-case handling for division by zero, etc. Wdyt?
[jira] [Commented] (HIVE-6012) restore backward compatibility of arithmetic operations
[ https://issues.apache.org/jira/browse/HIVE-6012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845916#comment-13845916 ] Sergey Shelukhin commented on HIVE-6012: I started a dev alias thread about having an ANSI flag to choose between the old Hive mode and an ANSI SQL mode restore backward compatibility of arithmetic operations --- Key: HIVE-6012 URL: https://issues.apache.org/jira/browse/HIVE-6012 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.13.0 Reporter: Thejas M Nair HIVE-5356 changed the behavior of some of the arithmetic operations, and the change is not backward compatible, as pointed out in this [jira comment|https://issues.apache.org/jira/browse/HIVE-5356?focusedCommentId=13813398page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13813398]
{code}
int / int = decimal
float / float = double
float * float = double
float + float = double
{code}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6015) vectorized logarithm produces results for 0 that are different from a non-vectorized one
[ https://issues.apache.org/jira/browse/HIVE-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6015: --- Status: Patch Available (was: Open) vectorized logarithm produces results for 0 that are different from a non-vectorized one Key: HIVE-6015 URL: https://issues.apache.org/jira/browse/HIVE-6015 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6015.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6015) vectorized logarithm produces results for 0 that are different from a non-vectorized one
[ https://issues.apache.org/jira/browse/HIVE-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6015: --- Attachment: HIVE-6015.patch Small (logically) patch vectorized logarithm produces results for 0 that are different from a non-vectorized one Key: HIVE-6015 URL: https://issues.apache.org/jira/browse/HIVE-6015 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6015.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Resolved] (HIVE-6005) BETWEEN is broken after using KRYO
[ https://issues.apache.org/jira/browse/HIVE-6005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin resolved HIVE-6005. Resolution: Duplicate HIVE-5263 appears to fix this. Can you try that patch? BETWEEN is broken after using KRYO -- Key: HIVE-6005 URL: https://issues.apache.org/jira/browse/HIVE-6005 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Eric Chu After taking in HIVE-1511, HIVE-5422, and HIVE-5257 on top of Hive 0.12 to use Kryo, queries with BETWEEN start to fail with the following exception: com.esotericsoftware.kryo.KryoException: Class cannot be created (missing no-arg constructor): org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableConstantBooleanObjectInspector Serialization trace: argumentOIs (org.apache.hadoop.hive.ql.udf.generic.GenericUDFBetween) genericUDF (org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc) filters (org.apache.hadoop.hive.ql.plan.JoinDesc) conf (org.apache.hadoop.hive.ql.exec.JoinOperator) reducer (org.apache.hadoop.hive.ql.plan.ReduceWork) at com.esotericsoftware.kryo.Kryo.newInstantiator(Kryo.java:1097) at com.esotericsoftware.kryo.Kryo.newInstance(Kryo.java:1109) at com.esotericsoftware.kryo.serializers.FieldSerializer.create(FieldSerializer.java:526) ... A workaround is to replace BETWEEN with >= and <=, but I think this failure is a bug and not by design. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6016) Hadoop23Shims has a bug in listLocatedStatus impl.
Sushanth Sowmyan created HIVE-6016: -- Summary: Hadoop23Shims has a bug in listLocatedStatus impl. Key: HIVE-6016 URL: https://issues.apache.org/jira/browse/HIVE-6016 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.13.0 Reporter: Sushanth Sowmyan Assignee: Prasanth J Prashant and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s) , with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). The problem with this approach, however, is that if you have an underlying (a,_s,b) and expect a (a,b) from it, you won't, because it translates to a (a,null,b), which then translates to a (a). Furthermore, there's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
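The failure mode generalizes to any lazily filtered iterator. Below is a self-contained sketch (names are hypothetical and unrelated to the actual Hadoop23Shims code) of the correct approach: when computing the next element, skip past filtered-out entries in a loop, rather than treating a single non-matching element as end-of-stream the way the buggy variant did.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

public class FilteredIteratorDemo {

    // Correct lazy filtering: advance past non-matching elements instead of
    // interpreting a single filtered-out element as the end of the iteration.
    static Iterator<String> filtered(Iterator<String> src, Predicate<String> keep) {
        return new Iterator<String>() {
            private String next = advance();

            // Scan forward until an element passes the filter, or the source
            // is genuinely exhausted. The buggy variant returned after one
            // src.next() call, so a filtered-out element looked like null/EOF.
            private String advance() {
                while (src.hasNext()) {
                    String candidate = src.next();
                    if (keep.test(candidate)) {
                        return candidate;
                    }
                }
                return null;
            }

            @Override
            public boolean hasNext() { return next != null; }

            @Override
            public String next() {
                String result = next;
                next = advance();
                return result;
            }
        };
    }

    public static void main(String[] args) {
        Predicate<String> notHidden = s -> !s.startsWith("_");
        // All three orderings from the description must yield [a, b].
        for (List<String> in : Arrays.asList(
                Arrays.asList("a", "b", "_s"),
                Arrays.asList("a", "_s", "b"),
                Arrays.asList("_s", "a", "b"))) {
            List<String> out = new ArrayList<>();
            filtered(in.iterator(), notHidden).forEachRemaining(out::add);
            System.out.println(in + " -> " + out);
        }
    }
}
```

The key invariant is that null is produced only when the source is exhausted, which keeps hasNext()'s null check sound regardless of where the filtered-out entries sit.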
[jira] [Commented] (HIVE-6016) Hadoop23Shims has a bug in listLocatedStatus impl.
[ https://issues.apache.org/jira/browse/HIVE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845932#comment-13845932 ] Sushanth Sowmyan commented on HIVE-6016: Thanks for the correction, Prashanth, I've edited the bug report to remove that case. Hadoop23Shims has a bug in listLocatedStatus impl. -- Key: HIVE-6016 URL: https://issues.apache.org/jira/browse/HIVE-6016 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.13.0 Reporter: Sushanth Sowmyan Assignee: Prasanth J Attachments: HIVE-6016.1.patch Prashant and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s) , with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). There's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6016) Hadoop23Shims has a bug in listLocatedStatus impl.
[ https://issues.apache.org/jira/browse/HIVE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-6016: - Attachment: HIVE-6016.1.patch Hadoop23Shims has a bug in listLocatedStatus impl. -- Key: HIVE-6016 URL: https://issues.apache.org/jira/browse/HIVE-6016 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.13.0 Reporter: Sushanth Sowmyan Assignee: Prasanth J Attachments: HIVE-6016.1.patch Prashant and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s) , with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). There's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6016) Hadoop23Shims has a bug in listLocatedStatus impl.
[ https://issues.apache.org/jira/browse/HIVE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-6016: --- Description: Prashant and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s) , with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). There's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. was: Prashant and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s) , with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). The problem with this approach, however, is that if you have an underlying (a,_s,b) and expect a (a,b) from it, you won't, because it translates to a (a,null,b), which then translates to a (a). Furthermore, there's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. Hadoop23Shims has a bug in listLocatedStatus impl. 
-- Key: HIVE-6016 URL: https://issues.apache.org/jira/browse/HIVE-6016 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.13.0 Reporter: Sushanth Sowmyan Assignee: Prasanth J Attachments: HIVE-6016.1.patch Prashant and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s) , with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). There's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6016) Hadoop23Shims has a bug in listLocatedStatus impl.
[ https://issues.apache.org/jira/browse/HIVE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845931#comment-13845931 ] Prasanth J commented on HIVE-6016: -- There is a correction to the description. I think only (_s,a,b) is a problem. The logic fails to apply the PathFilter to the first file alone. The other cases work fine because there is a while loop in next() that keeps iterating to the next valid file by applying the filter. So in the case of (a,_s,b), the first file is a, for which no filter is applied. For the next file, _s, the filter is applied and next becomes null, but the while loop continues to the next valid file, which is b. So finally only (a,b) is returned. The iterator will not return null in any case. Hadoop23Shims has a bug in listLocatedStatus impl. -- Key: HIVE-6016 URL: https://issues.apache.org/jira/browse/HIVE-6016 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.13.0 Reporter: Sushanth Sowmyan Assignee: Prasanth J Attachments: HIVE-6016.1.patch Prasanth and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s), with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). The problem with this approach, however, is that if you have an underlying (a,_s,b) and expect a (a,b) from it, you won't, because it translates to a (a,null,b), which then translates to a (a). 
Furthermore, there's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
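The corrected behavior described in the comment above can be sketched as a plain-Java filtering iterator that pre-fetches the next matching element, applying the filter before the first element is ever returned. This is a standalone illustration with hypothetical names, not the actual Hadoop23Shims code:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

// Sketch (not Hadoop23Shims itself) of a filtering iterator that applies
// the filter to EVERY element, including the first one -- the boundary
// case the original implementation missed for inputs like (_s,a,b).
class FilteringIterator<T> implements Iterator<T> {
    private final Iterator<T> inner;
    private final Predicate<T> filter;
    private T next; // pre-fetched next matching element, or null when exhausted

    FilteringIterator(Iterator<T> inner, Predicate<T> filter) {
        this.inner = inner;
        this.filter = filter;
        advance(); // apply the filter before the first next() call
    }

    private void advance() {
        next = null;
        while (inner.hasNext()) {
            T candidate = inner.next();
            if (filter.test(candidate)) { // skip filtered-out entries
                next = candidate;
                return;
            }
        }
    }

    @Override public boolean hasNext() { return next != null; }

    @Override public T next() {
        if (next == null) throw new NoSuchElementException();
        T result = next;
        advance();
        return result;
    }

    public static void main(String[] args) {
        // (_s,a,b) with a filter dropping names starting with '_'
        List<String> files = Arrays.asList("_s", "a", "b");
        FilteringIterator<String> it =
            new FilteringIterator<>(files.iterator(), f -> !f.startsWith("_"));
        StringBuilder out = new StringBuilder();
        while (it.hasNext()) out.append(it.next()).append(",");
        System.out.println(out); // a,b,
    }
}
```

Because the constructor pre-fetches, hasNext() never has to peek at an unfiltered value, so a leading _s is dropped just like one in the middle or at the end.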
[jira] [Updated] (HIVE-6016) Hadoop23Shims has a bug in listLocatedStatus impl.
[ https://issues.apache.org/jira/browse/HIVE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sushanth Sowmyan updated HIVE-6016: --- Description: Prashant and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s) , with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). There's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. The effect of this bug is that Orc will not be able to read directories where there is a _SUCCESS file, say, as the first entry returned by the FileStatus. was: Prashant and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s) , with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). There's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. Hadoop23Shims has a bug in listLocatedStatus impl. 
-- Key: HIVE-6016 URL: https://issues.apache.org/jira/browse/HIVE-6016 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.13.0 Reporter: Sushanth Sowmyan Assignee: Prasanth J Attachments: HIVE-6016.1.patch Prashant and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s) , with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). There's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. The effect of this bug is that Orc will not be able to read directories where there is a _SUCCESS file, say, as the first entry returned by the FileStatus. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6016) Hadoop23Shims has a bug in listLocatedStatus impl.
[ https://issues.apache.org/jira/browse/HIVE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845936#comment-13845936 ] Prasanth J commented on HIVE-6016: -- This should fix hcatalog unit test failure TestOrcDynamicPartitioned in hadoop2. Hadoop23Shims has a bug in listLocatedStatus impl. -- Key: HIVE-6016 URL: https://issues.apache.org/jira/browse/HIVE-6016 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.13.0 Reporter: Sushanth Sowmyan Assignee: Prasanth J Attachments: HIVE-6016.1.patch Prashant and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s) , with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). There's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. The effect of this bug is that Orc will not be able to read directories where there is a _SUCCESS file, say, as the first entry returned by the FileStatus. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6015) vectorized logarithm produces results for 0 that are different from a non-vectorized one
[ https://issues.apache.org/jira/browse/HIVE-6015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6015: --- Labels: vectorization (was: ) vectorized logarithm produces results for 0 that are different from a non-vectorized one Key: HIVE-6015 URL: https://issues.apache.org/jira/browse/HIVE-6015 Project: Hive Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Labels: vectorization Attachments: HIVE-6015.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6016) Hadoop23Shims has a bug in listLocatedStatus impl.
[ https://issues.apache.org/jira/browse/HIVE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-6016: - Status: Patch Available (was: Open) Marking it as Patch Available for precommit tests. Hadoop23Shims has a bug in listLocatedStatus impl. -- Key: HIVE-6016 URL: https://issues.apache.org/jira/browse/HIVE-6016 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.13.0 Reporter: Sushanth Sowmyan Assignee: Prasanth J Attachments: HIVE-6016.1.patch Prasanth and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s), with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). There's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. The effect of this bug is that Orc will not be able to read directories where there is a _SUCCESS file, say, as the first entry returned by the FileStatus. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6016) Hadoop23Shims has a bug in listLocatedStatus impl.
[ https://issues.apache.org/jira/browse/HIVE-6016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845946#comment-13845946 ] Sushanth Sowmyan commented on HIVE-6016: Patch looks good to me. +1. Paging [~ashutoshc]/[~owen.omalley] for another review. :) Hadoop23Shims has a bug in listLocatedStatus impl. -- Key: HIVE-6016 URL: https://issues.apache.org/jira/browse/HIVE-6016 Project: Hive Issue Type: Bug Components: Shims Affects Versions: 0.13.0 Reporter: Sushanth Sowmyan Assignee: Prasanth J Attachments: HIVE-6016.1.patch Prashant and I discovered that the implementation of the wrapping Iterator in listLocatedStatus at https://github.com/apache/hive/blob/2d2f89c21618341987c1257a88691981f1f606c7/shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java#L350-L393 is broken. Basically, if you had files (a,b,_s) , with a filter that is supposed to filter out _s, we expect an output result of (a,b). Instead, we get (a,b,null), with hasNext looking at the next value to see if it's null, and using that to decide if it has any more entries, and thus, (a,b,_s) becomes (a,b). There's a boundary condition on the very first pick, which causes a (_s,a,b) to result in (_s,a,b), bypassing the filter, and thus, we wind up with a resultant unfiltered (_s,a,b) which orc breaks on. The effect of this bug is that Orc will not be able to read directories where there is a _SUCCESS file, say, as the first entry returned by the FileStatus. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Assigned] (HIVE-6010) create a test that would ensure vectorization produces same results as non-vectorized execution
[ https://issues.apache.org/jira/browse/HIVE-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin reassigned HIVE-6010: -- Assignee: Sergey Shelukhin create a test that would ensure vectorization produces same results as non-vectorized execution --- Key: HIVE-6010 URL: https://issues.apache.org/jira/browse/HIVE-6010 Project: Hive Issue Type: Test Components: Tests, Vectorization Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin So as to ensure that vectorization is not forgotten when changes are made to things. Obviously it would not be viable to have a bulletproof test, but at least a subset of operations can be verified. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: adding ANSI flag for hive
Having too many configs complicates things for the user, and also complicates the code, and you also end up having many untested combinations of config flags. I think we should identify a bunch of incompatible changes that we think are important, fix them in a branch, and make a major version release (say 1.x). This is also related to HIVE-5875, where there is a discussion on switching the defaults for some of the configs to more desirable, but not backward compatible, values. On Wed, Dec 11, 2013 at 4:33 PM, Sergey Shelukhin ser...@hortonworks.com wrote: Hi. There's recently been some discussion about data type changes in Hive (double to decimal), and result changes for special cases like division by zero, etc., to bring it into compliance with MySQL (that's what the JIRAs use as an example; I am assuming ANSI SQL is meant). The latter are non-controversial (I guess), but for the former, performance may suffer and/or backward compat may be broken if Hive is brought into compliance. If fuller ANSI compat is sought in the future, there may be some even hairier issues such as double-quoted identifiers. In light of that, and also following MySQL, I wonder if we should add a flag, or set of flags, to Hive to be able to force ANSI compliance. When this flag (or these flags) is not set, for example, int/int division could return double for backward compat/perf, vectorization can skip the special-case handling for division by zero, etc. Wdyt? -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. 
If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
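The semantic differences discussed in the thread (int/int division and division-by-zero handling) can be seen in plain Java. This sketch is illustrative only, not Hive code; it shows why an engine must pick a convention for these cases, and hence why a compliance flag changes results:

```java
// Plain-Java illustration of the division semantics the thread is about.
// (Not Hive code; Hive and MySQL pick their own conventions, e.g. MySQL
// returns NULL on division by zero.)
public class DivisionSemantics {
    public static void main(String[] args) {
        // int/int truncates: 7 / 2 == 3, whereas SQL-standard exact-numeric
        // division would yield 3.5.
        System.out.println(7 / 2);        // 3
        System.out.println(7 / 2.0);      // 3.5

        // Division by zero: doubles follow IEEE 754 and produce Infinity/NaN,
        // while integer division throws. This is the "special case handling"
        // vectorized execution would have to emulate or skip.
        System.out.println(1.0 / 0.0);    // Infinity
        System.out.println(0.0 / 0.0);    // NaN
        try {
            System.out.println(1 / 0);
        } catch (ArithmeticException e) {
            System.out.println("int 1/0 throws: " + e.getMessage()); // / by zero
        }
    }
}
```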
Re: doc on predicate pushdown in joins
Maybe we should remove the section on Hive Implementation here. It is in the Design doc; this information only concerns developers. But this is the Design doc (unless there's another one somewhere -- maybe attached to a JIRA ticket?) and it's in the Resources for Contributors part of the wiki, so it seems appropriate to me. I'll delete the implementation section if that's your preference. Here are the links again, with fixes: - Design Docs https://cwiki.apache.org/confluence/display/Hive/DesignDocs (bottom of list) - Predicate Pushdown Rules https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior#OuterJoinBehavior-PredicatePushdownRules Speaking of JIRA tickets, is there one for this and should I add any version information? -- Lefty On Wed, Dec 11, 2013 at 7:59 AM, Harish Butani hbut...@hortonworks.com wrote: getQualifiedAliases is a private method in JoinPPD. Maybe we should remove the section on Hive Implementation here. It is in the Design doc; this information only concerns developers. regards, Harish. On Dec 11, 2013, at 3:05 AM, Lefty Leverenz leftylever...@gmail.com wrote: Happy to fix the sentence and the link. I pointed out the name change just so you would review it, so please don't apologize! One more question: why am I not finding getQualifiedAliases() in the SemanticAnalyzer class? 
It turns up in OpProcFactory.java with javadoc comments, but I can't find it anywhere in the API docs -- not even in the index (Hive 0.12.0 API http://hive.apache.org/docs/r0.12.0/api/):
- *getQMap()* (http://hive.apache.org/docs/r0.12.0/api/org/apache/hadoop/hive/ql/QTestUtil.html#getQMap()) - Method in class org.apache.hadoop.hive.ql.QTestUtil
- *getQualifiedName()* (http://hive.apache.org/docs/r0.12.0/api/org/apache/hadoop/hive/serde2/typeinfo/TypeInfo.html#getQualifiedName()) - Method in class org.apache.hadoop.hive.serde2.typeinfo.TypeInfo. String representing the qualified type name.
- *getQualifiers()* (http://hive.apache.org/docs/r0.12.0/api/org/apache/hive/service/cli/thrift/TTypeQualifiers.html#getQualifiers()) - Method in class org.apache.hive.service.cli.thrift.TTypeQualifiers
- *getQualifiersSize()* (http://hive.apache.org/docs/r0.12.0/api/org/apache/hive/service/cli/thrift/TTypeQualifiers.html#getQualifiersSize()) - Method in class org.apache.hive.service.cli.thrift.TTypeQualifiers
Most mysterious. -- Lefty On Tue, Dec 10, 2013 at 2:35 PM, Harish Butani hbut...@hortonworks.com wrote: I can see why you would rename. But this sentence is not correct: 'Hive enforces the predicate pushdown rules by these methods in the SemanticAnalyzer and JoinPPD classes:' It should be: Hive enforces the rules by these methods in the SemanticAnalyzer and JoinPPD classes: (The implementation involves both predicate pushdown and analyzing join conditions) Sorry about this. So the link should say 'Hive Outer Join Behavior' regards, Harish. On Dec 10, 2013, at 2:01 PM, Lefty Leverenz leftylever...@gmail.com wrote: How's this? 
Hive Implementation https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior#OuterJoinBehavior-HiveImplementation Also, I moved the link on the Design Docs page https://cwiki.apache.org/confluence/display/Hive/DesignDocs from *Proposed* to *Other*. (It's called SQL Outer Join Predicate Pushdown Rules https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior which doesn't match the title, but seems okay because it's more descriptive.) -- Lefty On Tue, Dec 10, 2013 at 7:27 AM, Harish Butani hbut...@hortonworks.com wrote: You are correct, it is plural. regards, Harish. On Dec 10, 2013, at 4:03 AM, Lefty Leverenz leftylever...@gmail.com wrote: Okay, then monospace with () after the method name is a good way to show them: parseJoinCondition() and getQualifiedAlias() ... but I only found the latter pluralized, instead of singular, so should it be getQualifiedAliases() or am I missing something? trunk *grep -nr 'getQualifiedAlias' ./ql/src/java/* | grep -v 'svn'*
./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:221: * the comments for getQualifiedAliases function.
./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:230: Set<String> aliases = getQualifiedAliases((JoinOperator) nd, owi
./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:242: // be pushed down per getQualifiedAliases
./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:471: private Set<String> getQualifiedAliases(JoinOperator op, RowResolver rr) {
-- Lefty On Mon, Dec 9, 2013 at
[jira] [Commented] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845963#comment-13845963 ] Hive QA commented on HIVE-6013: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618322/HIVE-6013.1.patch {color:red}ERROR:{color} -1 due to 4 failed/errored test(s), 4768 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quotedId_alter org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quotedId_skew org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quotedId_smb org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_invalid_columns {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/616/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/616/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 4 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12618322 Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions(as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. 
- At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6014) Stage ids differ in the tez branch
[ https://issues.apache.org/jira/browse/HIVE-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845975#comment-13845975 ] Hive QA commented on HIVE-6014: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12618323/HIVE-6014.1.patch Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/617/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/617/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]] + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-617/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . 
Reverted 'metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java' Reverted 'itests/qtest/pom.xml' Reverted 'common/src/java/org/apache/hadoop/hive/conf/HiveConf.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/ParseDriver.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/UnparseTranslator.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java' ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/assembly/target shims/0.20S/target shims/0.23/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/storage-handlers/hbase/target hcatalog/server-extensions/target hcatalog/core/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target hwi/target common/target common/src/gen service/target contrib/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target ql/src/test/results/clientpositive/quotedid_alter.q.out ql/src/test/results/clientpositive/quotedid_partition.q.out ql/src/test/results/clientpositive/quotedid_basic.q.out ql/src/test/results/clientpositive/quotedid_skew.q.out ql/src/test/results/clientpositive/quotedId_smb.q.out ql/src/test/queries/clientpositive/quotedId_alter.q ql/src/test/queries/clientpositive/quotedId_skew.q 
ql/src/test/queries/clientpositive/quotedid_basic.q ql/src/test/queries/clientpositive/quotedid_partition.q ql/src/test/queries/clientpositive/quotedId_smb.q + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1550329. At revision 1550329. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12618323 Stage ids differ in the tez branch -- Key: HIVE-6014 URL: https://issues.apache.org/jira/browse/HIVE-6014 Project: Hive Issue Type: Bug Components: Tez Affects Versions: tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-6014.1.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6002) Create new ORC write version to address the changes to RLEv2
[ https://issues.apache.org/jira/browse/HIVE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-6002: - Attachment: HIVE-6002.1.patch Bumped the ORC write version number to 0.12.1. [~owen.omalley] Can you please review this change? Create new ORC write version to address the changes to RLEv2 Key: HIVE-6002 URL: https://issues.apache.org/jira/browse/HIVE-6002 Project: Hive Issue Type: Bug Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile Attachments: HIVE-6002.1.patch HIVE-5994 encodes large negative big integers wrongly. This results in loss of original data that is being written using orc write version 0.12. Bump up the version number to differentiate the bad writes by 0.12 and the good writes by this new version (0.12.1?). -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6002) Create new ORC write version to address the changes to RLEv2
[ https://issues.apache.org/jira/browse/HIVE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-6002: - Attachment: (was: HIVE-6002.1.patch) Create new ORC write version to address the changes to RLEv2 Key: HIVE-6002 URL: https://issues.apache.org/jira/browse/HIVE-6002 Project: Hive Issue Type: Bug Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile HIVE-5994 encodes large negative big integers wrongly. This results in loss of original data that is being written using orc write version 0.12. Bump up the version number to differentiate the bad writes by 0.12 and the good writes by this new version (0.12.1?). -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6002) Create new ORC write version to address the changes to RLEv2
[ https://issues.apache.org/jira/browse/HIVE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-6002: - Attachment: HIVE-6002.1.patch Create new ORC write version to address the changes to RLEv2 Key: HIVE-6002 URL: https://issues.apache.org/jira/browse/HIVE-6002 Project: Hive Issue Type: Bug Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile Attachments: HIVE-6002.1.patch HIVE-5994 encodes large negative big integers wrongly. This results in loss of original data that is being written using orc write version 0.12. Bump up the version number to differentiate the bad writes by 0.12 and the good writes by this new version (0.12.1?). -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5975) [WebHCat] templeton mapreduce job failed if provide define parameters
[ https://issues.apache.org/jira/browse/HIVE-5975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845986#comment-13845986 ] Eugene Koifman commented on HIVE-5975: -- +1 [WebHCat] templeton mapreduce job failed if provide define parameters --- Key: HIVE-5975 URL: https://issues.apache.org/jira/browse/HIVE-5975 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0, 0.13.0 Reporter: shanyu zhao Assignee: shanyu zhao Attachments: hive-5975.2.patch, hive-5975.patch Trying to submit a mapreduce job through templeton failed: curl -k -u user:pass -d user.name=user -d define=JobName=MRPiJob -d class=pi -d arg=16 -d arg=100 -d jar=hadoop-mapreduce-examples.jar https://xxx/templeton/v1/mapreduce/jar The error message is: Usage: org.apache.hadoop.examples.QuasiMonteCarlo nMaps nSamples Generic options supported are -conf configuration file specify an application configuration file -D property=value use value for given property -fs local|namenode:port specify a namenode -jt local|jobtracker:port specify a job tracker -files comma separated list of files specify comma separated files to be copied to the map reduce cluster -libjars comma separated list of jars specify comma separated jar files to include in the classpath. -archives comma separated list of archives specify comma separated archives to be unarchived on the compute machines. The general command line syntax is bin/hadoop command [genericOptions] [commandOptions] templeton: job failed with exit code 2 Note that if we remove the define parameter it works fine. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Created] (HIVE-6017) Contribute Decimal128 high-performance decimal(p, s) package from Microsoft to Hive
Eric Hanson created HIVE-6017: - Summary: Contribute Decimal128 high-performance decimal(p, s) package from Microsoft to Hive Key: HIVE-6017 URL: https://issues.apache.org/jira/browse/HIVE-6017 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Eric Hanson Contribute the Decimal128 high-performance decimal package developed by Microsoft to Hive. This was originally written for Microsoft PolyBase by Hideaki Kimura. This code is about 8X more efficient than Java BigDecimal for typical operations. It uses a finite (128 bit) precision and can handle up to decimal(38, X). It is also mutable so you can change the contents of an existing object. This helps reduce the cost of new() and garbage collection. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
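The Decimal128 API itself is not shown in this issue, but the mutability argument can be illustrated with a small sketch: an immutable type like java.math.BigDecimal allocates a new object on every operation, while a mutable accumulator updates one object in place. The MutableSum class below is a hypothetical stand-in for the update-in-place style, not the actual Decimal128 API.

```java
import java.math.BigDecimal;

public class MutableVsImmutable {
    // Hypothetical mutable accumulator, standing in for Decimal128's
    // update-in-place style; NOT the real Decimal128 API.
    static final class MutableSum {
        private long value; // fixed-width state, reused across updates
        void addDestructive(long x) { value += x; } // mutates this object
        long get() { return value; }
    }

    public static void main(String[] args) {
        // Immutable style: each add() returns a brand-new BigDecimal,
        // so aggregating a million rows means a million allocations.
        BigDecimal immutable = BigDecimal.ZERO;
        for (int i = 1; i <= 1000; i++) {
            immutable = immutable.add(BigDecimal.valueOf(i)); // new object per row
        }

        // Mutable style: one object reused for the whole aggregation.
        MutableSum mutable = new MutableSum();
        for (int i = 1; i <= 1000; i++) {
            mutable.addDestructive(i); // no allocation per row
        }

        System.out.println(immutable);     // 500500
        System.out.println(mutable.get()); // 500500
    }
}
```

This allocation difference, not the arithmetic itself, is where the reduced new() and garbage-collection cost described above comes from.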
[jira] [Commented] (HIVE-6002) Create new ORC write version to address the changes to RLEv2
[ https://issues.apache.org/jira/browse/HIVE-6002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13845998#comment-13845998 ] Prasanth J commented on HIVE-6002: -- Do we need to discard the 0.12 version completely? It is no longer valid, but the config option still allows users to specify it. In that case, can we forcefully bump the version to 0.12.1? Create new ORC write version to address the changes to RLEv2 Key: HIVE-6002 URL: https://issues.apache.org/jira/browse/HIVE-6002 Project: Hive Issue Type: Bug Reporter: Prasanth J Assignee: Prasanth J Labels: orcfile Attachments: HIVE-6002.1.patch HIVE-5994 encodes large negative big integers wrongly. This results in loss of the original data written with ORC write version 0.12. Bump up the version number to differentiate the bad writes by 0.12 from the good writes by this new version (0.12.1?). -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5936) analyze command failing to collect stats with counter mechanism
[ https://issues.apache.org/jira/browse/HIVE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-5936: Attachment: HIVE-5936.10.patch.txt Fixed error message analyze command failing to collect stats with counter mechanism --- Key: HIVE-5936 URL: https://issues.apache.org/jira/browse/HIVE-5936 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Assignee: Navis Attachments: HIVE-5936.1.patch.txt, HIVE-5936.10.patch.txt, HIVE-5936.2.patch.txt, HIVE-5936.3.patch.txt, HIVE-5936.4.patch.txt, HIVE-5936.5.patch.txt, HIVE-5936.6.patch.txt, HIVE-5936.7.patch.txt, HIVE-5936.8.patch.txt, HIVE-5936.9.patch.txt With counter mechanism, MR job is successful, but StatsTask on client fails with NPE. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5936) analyze command failing to collect stats with counter mechanism
[ https://issues.apache.org/jira/browse/HIVE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-5936: Status: Open (was: Patch Available) analyze command failing to collect stats with counter mechanism --- Key: HIVE-5936 URL: https://issues.apache.org/jira/browse/HIVE-5936 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Assignee: Navis Attachments: HIVE-5936.1.patch.txt, HIVE-5936.10.patch.txt, HIVE-5936.2.patch.txt, HIVE-5936.3.patch.txt, HIVE-5936.4.patch.txt, HIVE-5936.5.patch.txt, HIVE-5936.6.patch.txt, HIVE-5936.7.patch.txt, HIVE-5936.8.patch.txt, HIVE-5936.9.patch.txt With counter mechanism, MR job is successful, but StatsTask on client fails with NPE. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5936) analyze command failing to collect stats with counter mechanism
[ https://issues.apache.org/jira/browse/HIVE-5936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-5936: Status: Patch Available (was: Open) analyze command failing to collect stats with counter mechanism --- Key: HIVE-5936 URL: https://issues.apache.org/jira/browse/HIVE-5936 Project: Hive Issue Type: Bug Components: Statistics Affects Versions: 0.13.0 Reporter: Ashutosh Chauhan Assignee: Navis Attachments: HIVE-5936.1.patch.txt, HIVE-5936.10.patch.txt, HIVE-5936.2.patch.txt, HIVE-5936.3.patch.txt, HIVE-5936.4.patch.txt, HIVE-5936.5.patch.txt, HIVE-5936.6.patch.txt, HIVE-5936.7.patch.txt, HIVE-5936.8.patch.txt, HIVE-5936.9.patch.txt With counter mechanism, MR job is successful, but StatsTask on client fails with NPE. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: doc on predicate pushdown in joins
I see. Let's leave it in. This is old code, hard to attribute to jiras: - The PPD code comes from: HIVE-279, HIVE-2337 - I cannot tell when the join condition parsing code was added. regards, Harish. On Dec 11, 2013, at 5:17 PM, Lefty Leverenz leftylever...@gmail.com wrote: Maybe we should remove the section on Hive Implementation here. It is in the Design doc; this information only concerns developers. But this is the Design doc (unless there's another one somewhere -- maybe attached to a JIRA ticket?) and it's in the Resources for Contributors part of the wiki, so it seems appropriate to me. I'll delete the implementation section if that's your preference. Here are the links again, with fixes: Design Docs (bottom of list) Predicate Pushdown Rules Speaking of JIRA tickets, is there one for this and should I add any version information? -- Lefty On Wed, Dec 11, 2013 at 7:59 AM, Harish Butani hbut...@hortonworks.com wrote: getQualifiedAliases is a private method in JoinPPD. Maybe we should remove the section on Hive Implementation here. It is in the Design doc; this information only concerns developers. regards, Harish. On Dec 11, 2013, at 3:05 AM, Lefty Leverenz leftylever...@gmail.com wrote: Happy to fix the sentence and the link. I pointed out the name change just so you would review it, so please don't apologize! One more question: why am I not finding getQualifiedAliases() in the SemanticAnalyzer class? It turns up in OpProcFactory.java with javadoc comments, but I can't find it anywhere in the API docs -- not even in the index (Hive 0.12.0 API): getQMap() - Method in class org.apache.hadoop.hive.ql.QTestUtil getQualifiedName() - Method in class org.apache.hadoop.hive.serde2.typeinfo.TypeInfo String representing the qualified type name. getQualifiers() - Method in class org.apache.hive.service.cli.thrift.TTypeQualifiers getQualifiersSize() - Method in class org.apache.hive.service.cli.thrift.TTypeQualifiers Most mysterious. 
-- Lefty On Tue, Dec 10, 2013 at 2:35 PM, Harish Butani hbut...@hortonworks.com wrote: I can see why you would rename. But this sentence is not correct: 'Hive enforces the predicate pushdown rules by these methods in the SemanticAnalyzer and JoinPPD classes:' It should be: Hive enforces the rules by these methods in the SemanticAnalyzer and JoinPPD classes: (The implementation involves both predicate pushdown and analyzing join conditions) Sorry about this. So the link should say 'Hive Outer Join Behavior' regards, Harish. On Dec 10, 2013, at 2:01 PM, Lefty Leverenz leftylever...@gmail.com wrote: How's this? Hive Implementation Also, I moved the link on the Design Docs page from Proposed to Other. (It's called SQL Outer Join Predicate Pushdown Rules which doesn't match the title, but seems okay because it's more descriptive.) -- Lefty On Tue, Dec 10, 2013 at 7:27 AM, Harish Butani hbut...@hortonworks.com wrote: You are correct, it is plural. regards, Harish. On Dec 10, 2013, at 4:03 AM, Lefty Leverenz leftylever...@gmail.com wrote: Okay, then monospace with () after the method name is a good way to show them: parseJoinCondition() and getQualifiedAlias() ... but I only found the latter pluralized, instead of singular, so should it be getQualifiedAliases() or am I missing something? trunk grep -nr 'getQualifiedAlias' ./ql/src/java/* | grep -v 'svn' ./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:221: * the comments for getQualifiedAliases function. ./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:230: Set<String> aliases = getQualifiedAliases((JoinOperator) nd, owi ./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:242: // be pushed down per getQualifiedAliases ./ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java:471: private Set<String> getQualifiedAliases(JoinOperator op, RowResolver rr) { -- Lefty On Mon, Dec 9, 2013 at 2:12 PM, Harish Butani hbut...@hortonworks.com wrote: Looks good. Thanks for doing this. 
Minor point: Rule 1: During QBJoinTree construction in Plan Gen, the parse Join Condition logic applies this rule. Rule 2: During JoinPPD (Join Predicate Pushdown) the get Qualified Alias logic applies this rule. FYI 'parseJoinCondition' and 'getQualifiedAlias' are methods in the SemanticAnalyzer and JoinPPD classes respectively. Writing these as separate words may be confusing. You are a better judge of how to represent this (quoted/bold, etc.). regards, Harish. On Dec 9, 2013, at 1:52 AM, Lefty Leverenz leftylever...@gmail.com wrote: The Outer Join Behavior wikidoc (https://cwiki.apache.org/confluence/display/Hive/OuterJoinBehavior) is done, with links from the Design Docs (https://cwiki.apache.org/confluence/display/Hive/DesignDocs) page and the Joins
[jira] [Commented] (HIVE-6010) create a test that would ensure vectorization produces same results as non-vectorized execution
[ https://issues.apache.org/jira/browse/HIVE-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846029#comment-13846029 ] Sergey Shelukhin commented on HIVE-6010: I looked at CliDriver generation/code/flow... the plan is as such (this can also be used for other stuff later if needed). There will be new CliDriver template, called TestCompareCliDriver, with separate set of .q files. Unlike normal CliDriver, it will not use .out files; instead, there will be multiple .qv (query version) initialization files; I haven't decided yet whether these should be a set per query (q file), or a set applied to all queries. The latter is simpler and solves the problem for vectorization, but the former may make sense for other things, esp. if we need to compare more things, Nqv x Nq combinations to run will quickly become ugly. Perhaps per-query qv files can be added when needed. The test, for each of its q files, will concatenate all the requisite qv files in turn with the q file, run each of the resulting queries w/different output files, and diff the outputs with each other. It will fail if they don't match. So, for vectorization we can have some simple queries (arithmetics, functions, etc.), with qv files being one-liners to enable and disable vectorization. [~ehans] [~jnp] opinions? create a test that would ensure vectorization produces same results as non-vectorized execution --- Key: HIVE-6010 URL: https://issues.apache.org/jira/browse/HIVE-6010 Project: Hive Issue Type: Test Components: Tests, Vectorization Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin So as to ensure that vectorization is not forgotten when changes are made to things. Obviously it would not be viable to have a bulletproof test, but at least a subset of operations can be verified. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
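The proposed flow (prepend each .qv version prelude to the .q file, execute each combined script, then diff the outputs against one another) can be sketched generically. The executor below is a placeholder lambda, not Hive's actual CliDriver, and runAndCompare is a hypothetical name for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class CompareDriverSketch {
    // Runs the query once per version prelude and checks all outputs agree,
    // mirroring the proposed TestCompareCliDriver failure condition.
    static String runAndCompare(List<String> preludes, String query,
                                Function<String, String> executor) {
        List<String> outputs = new ArrayList<>();
        for (String prelude : preludes) {
            // Concatenate the .qv prelude with the .q file, as proposed.
            outputs.add(executor.apply(prelude + "\n" + query));
        }
        for (String out : outputs) {
            if (!out.equals(outputs.get(0))) {
                throw new AssertionError("outputs diverge across versions");
            }
        }
        return outputs.get(0);
    }

    public static void main(String[] args) {
        // Toy executor: ignores the prelude, so all outputs match.
        Function<String, String> exec = script -> "result";
        List<String> preludes = List.of(
            "set hive.vectorized.execution.enabled=true;",
            "set hive.vectorized.execution.enabled=false;");
        System.out.println(runAndCompare(preludes, "select 1;", exec));
    }
}
```

With real execution plugged in, the two preludes above would correspond to the one-liner .qv files that enable and disable vectorization.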
[jira] [Commented] (HIVE-6010) create a test that would ensure vectorization produces same results as non-vectorized execution
[ https://issues.apache.org/jira/browse/HIVE-6010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846030#comment-13846030 ] Sergey Shelukhin commented on HIVE-6010: Actually, this can also be used instead of VerifyingObjectStore to verify MetaStoreDirectSql matches JDO, come to think of it. Will reduce the coverage but also remove the crutch from that part of the code. create a test that would ensure vectorization produces same results as non-vectorized execution --- Key: HIVE-6010 URL: https://issues.apache.org/jira/browse/HIVE-6010 Project: Hive Issue Type: Test Components: Tests, Vectorization Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin So as to ensure that vectorization is not forgotten when changes are made to things. Obviously it would not be viable to have a bulletproof test, but at least a subset of operations can be verified. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5996) Query for sum of a long column of a table with only two rows produces wrong result
[ https://issues.apache.org/jira/browse/HIVE-5996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846033#comment-13846033 ] Thejas M Nair commented on HIVE-5996: - I am curious what the datatype of sum(l) is in MySQL, where l is a bigint. Is it also using decimal? Query for sum of a long column of a table with only two rows produces wrong result -- Key: HIVE-5996 URL: https://issues.apache.org/jira/browse/HIVE-5996 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-5996.patch
{code}
hive> desc test2;
OK
l	bigint	None
hive> select * from test2;
OK
666
555
hive> select sum(l) from test2;
OK
-6224521851487329395
{code}
It's believed that a wrap-around error occurred. It's surprising that it happens only with two rows. The same query in MySQL returns:
{code}
mysql> select sum(l) from test;
+---------+
| sum(l)  |
+---------+
|    1221 |
+---------+
1 row in set (0.00 sec)
{code}
Hive should accommodate a large number of rows; overflowing with only two rows is unacceptable. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
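As the description notes, 666 + 555 is nowhere near the 64-bit range, so plain wrap-around alone does not explain the two-row result; still, the suspected failure mode itself is easy to demonstrate. The sketch below shows silent long wrap-around and one way a long-based sum can at least detect it (Math.addExact); it is an illustration, not Hive's actual sum UDAF.

```java
public class LongSumOverflow {
    public static void main(String[] args) {
        // Silent wrap-around: the failure mode suspected in the report.
        long wrapped = Long.MAX_VALUE + 1; // wraps to Long.MIN_VALUE
        System.out.println(wrapped == Long.MIN_VALUE); // true

        // Math.addExact throws instead of wrapping, making overflow detectable.
        try {
            Math.addExact(Long.MAX_VALUE, 1L);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }

        // The two-row case from the report does not overflow at all,
        // which is why the -6224521851487329395 result is so surprising.
        System.out.println(666L + 555L); // 1221
    }
}
```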
[jira] [Created] (HIVE-6018) FetchTask should not reference metastore classes
Navis created HIVE-6018: --- Summary: FetchTask should not reference metastore classes Key: HIVE-6018 URL: https://issues.apache.org/jira/browse/HIVE-6018 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial The code below in PartitionDesc sometimes throws NoClassDefFoundError during execution.
{noformat}
public Deserializer getDeserializer() {
  try {
    return MetaStoreUtils.getDeserializer(Hive.get().getConf(), getProperties());
  } catch (Exception e) {
    return null;
  }
}
{noformat}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6018) FetchTask should not reference metastore classes
[ https://issues.apache.org/jira/browse/HIVE-6018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-6018: Status: Patch Available (was: Open) FetchTask should not reference metastore classes Key: HIVE-6018 URL: https://issues.apache.org/jira/browse/HIVE-6018 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-6018.1.patch.txt The code below in PartitionDesc sometimes throws NoClassDefFoundError during execution.
{noformat}
public Deserializer getDeserializer() {
  try {
    return MetaStoreUtils.getDeserializer(Hive.get().getConf(), getProperties());
  } catch (Exception e) {
    return null;
  }
}
{noformat}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6018) FetchTask should not reference metastore classes
[ https://issues.apache.org/jira/browse/HIVE-6018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-6018: Attachment: HIVE-6018.1.patch.txt FetchTask should not reference metastore classes Key: HIVE-6018 URL: https://issues.apache.org/jira/browse/HIVE-6018 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Navis Assignee: Navis Priority: Trivial Attachments: HIVE-6018.1.patch.txt The code below in PartitionDesc sometimes throws NoClassDefFoundError during execution.
{noformat}
public Deserializer getDeserializer() {
  try {
    return MetaStoreUtils.getDeserializer(Hive.get().getConf(), getProperties());
  } catch (Exception e) {
    return null;
  }
}
{noformat}
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
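The NoClassDefFoundError arises because PartitionDesc links statically against MetaStoreUtils, so merely loading PartitionDesc on a classpath without the metastore jars fails. One common decoupling tactic (a sketch of the general technique, not the actual HIVE-6018 patch) is to resolve optional classes reflectively, which turns the hard link-time failure into a catchable ClassNotFoundException:

```java
public class LazyClassLookup {
    // Attempts to load a class by name without creating a static,
    // link-time dependency on it. Returns null when the class is absent.
    static Class<?> tryLoad(String className) {
        try {
            return Class.forName(className);
        } catch (ClassNotFoundException e) {
            // A reflective lookup fails softly here; a static reference
            // would instead surface later as NoClassDefFoundError.
            return null;
        }
    }

    public static void main(String[] args) {
        // Present on every JVM:
        System.out.println(tryLoad("java.util.ArrayList") != null); // true
        // Hypothetical absent class, standing in for metastore classes
        // missing from an execution-side classpath:
        System.out.println(tryLoad("com.example.Missing") == null); // true
    }
}
```

The class name "com.example.Missing" is a made-up placeholder; the point is only that the caller can decide what to do when the dependency is unavailable, instead of failing at class-loading time.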
[jira] [Commented] (HIVE-2093) create/drop database should populate inputs/outputs and check concurrency and user permission
[ https://issues.apache.org/jira/browse/HIVE-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13846046#comment-13846046 ] Phabricator commented on HIVE-2093: --- thejas has commented on the revision HIVE-2093 [jira] create/drop database should populate inputs/outputs and check concurrency and user permission. +1 REVISION DETAIL https://reviews.facebook.net/D12807 To: JIRA, navis Cc: thejas create/drop database should populate inputs/outputs and check concurrency and user permission - Key: HIVE-2093 URL: https://issues.apache.org/jira/browse/HIVE-2093 Project: Hive Issue Type: Bug Components: Authorization, Locking, Metastore, Security Reporter: Namit Jain Assignee: Navis Attachments: D12807.3.patch, D12807.4.patch, HIVE-2093.6.patch, HIVE-2093.7.patch.txt, HIVE-2093.8.patch.txt, HIVE-2093.9.patch.txt, HIVE-2093.D12807.1.patch, HIVE-2093.D12807.2.patch, HIVE.2093.1.patch, HIVE.2093.2.patch, HIVE.2093.3.patch, HIVE.2093.4.patch, HIVE.2093.5.patch concurrency and authorization are needed for create/drop table. Also to make concurrency work, it's better to have LOCK/UNLOCK DATABASE and SHOW LOCKS DATABASE -- This message was sent by Atlassian JIRA (v6.1.4#6159)