[jira] [Commented] (HIVE-6048) Hive load data command rejects file with '+' in the name
[ https://issues.apache.org/jira/browse/HIVE-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851492#comment-13851492 ]

Hive QA commented on HIVE-6048:
-------------------------------

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12619258/HIVE-6048.patch

{color:red}ERROR:{color} -1 due to 44 failed/errored test(s), 4791 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_concatenate_indexed_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketizedhiveinputformat_auto
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin_negative
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin_negative2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_global_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input40
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert2_overwrite_partitions
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge_dynamic_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge_dynamic_partition2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge_dynamic_partition3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge_dynamic_partition4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge_dynamic_partition5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_split_elimination
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_type_check
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats3
{noformat}

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/683/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/683/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 44 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12619258

Hive load data command rejects file with '+' in the name
--------------------------------------------------------

                Key: HIVE-6048
                URL: https://issues.apache.org/jira/browse/HIVE-6048
            Project: Hive
         Issue Type: Bug
         Components: Query Processor
   Affects Versions: 0.12.0
           Reporter: Xuefu Zhang
           Assignee: Xuefu Zhang
        Attachments: HIVE-6048.patch

'+' is a valid character in a file name on Linux and HDFS. However, loading data from such a file into a table results in the following error:
{code}
hive> load data local inpath './t+est' into table test;
FAILED: SemanticException Line 1:23 Invalid path
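The "Invalid path" failure above is consistent with the load path being run through URL decoding somewhere in the path lookup, since '+' is the URL-encoded form of a space. A minimal sketch of that suspected mechanism, using only JDK classes (this is illustrative; it is not the actual Hive code path):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class PlusInPath {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // The file as it actually exists on disk / in HDFS.
        String userPath = "./t+est";

        // If the path is URL-decoded before lookup, '+' silently becomes a
        // space, so the original file can no longer be matched.
        String decoded = URLDecoder.decode(userPath, "UTF-8");
        System.out.println(decoded); // "./t est" -- no longer the real file name
    }
}
```

Any validation that compares the decoded string against the filesystem would then reject the perfectly legal original name.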
[jira] [Commented] (HIVE-3746) Fix HS2 ResultSet Serialization Performance Regression
[ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851493#comment-13851493 ]

Hive QA commented on HIVE-3746:
-------------------------------

{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12619267/HIVE-3746.5.patch.txt

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/685/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/685/console

Messages:
{noformat}
This message was trimmed, see log for full details
[INFO] Including org.json:json:jar:20090211 in the shaded jar.
[INFO] Excluding stax:stax-api:jar:1.0.1 from the shaded jar.
[INFO] Excluding org.apache.hadoop:hadoop-core:jar:1.2.1 from the shaded jar.
[INFO] Excluding xmlenc:xmlenc:jar:0.52 from the shaded jar.
[INFO] Excluding com.sun.jersey:jersey-core:jar:1.14 from the shaded jar.
[INFO] Excluding com.sun.jersey:jersey-json:jar:1.14 from the shaded jar.
[INFO] Excluding org.codehaus.jettison:jettison:jar:1.1 from the shaded jar.
[INFO] Excluding com.sun.xml.bind:jaxb-impl:jar:2.2.3-1 from the shaded jar.
[INFO] Excluding javax.xml.bind:jaxb-api:jar:2.2.2 from the shaded jar.
[INFO] Excluding javax.xml.stream:stax-api:jar:1.0-2 from the shaded jar.
[INFO] Excluding javax.activation:activation:jar:1.1 from the shaded jar.
[INFO] Excluding org.codehaus.jackson:jackson-jaxrs:jar:1.9.2 from the shaded jar.
[INFO] Excluding org.codehaus.jackson:jackson-xc:jar:1.9.2 from the shaded jar.
[INFO] Excluding com.sun.jersey:jersey-server:jar:1.14 from the shaded jar.
[INFO] Excluding asm:asm:jar:3.1 from the shaded jar.
[INFO] Excluding org.apache.commons:commons-math:jar:2.1 from the shaded jar.
[INFO] Excluding commons-configuration:commons-configuration:jar:1.6 from the shaded jar.
[INFO] Excluding commons-digester:commons-digester:jar:1.8 from the shaded jar.
[INFO] Excluding commons-beanutils:commons-beanutils:jar:1.7.0 from the shaded jar.
[INFO] Excluding commons-beanutils:commons-beanutils-core:jar:1.8.0 from the shaded jar.
[INFO] Excluding commons-net:commons-net:jar:1.4.1 from the shaded jar.
[INFO] Excluding org.mortbay.jetty:jetty:jar:6.1.26 from the shaded jar.
[INFO] Excluding org.mortbay.jetty:jetty-util:jar:6.1.26 from the shaded jar.
[INFO] Excluding tomcat:jasper-runtime:jar:5.5.12 from the shaded jar.
[INFO] Excluding tomcat:jasper-compiler:jar:5.5.12 from the shaded jar.
[INFO] Excluding org.mortbay.jetty:jsp-api-2.1:jar:6.1.14 from the shaded jar.
[INFO] Excluding org.mortbay.jetty:servlet-api-2.5:jar:6.1.14 from the shaded jar.
[INFO] Excluding org.mortbay.jetty:jsp-2.1:jar:6.1.14 from the shaded jar.
[INFO] Excluding ant:ant:jar:1.6.5 from the shaded jar.
[INFO] Excluding commons-el:commons-el:jar:1.0 from the shaded jar.
[INFO] Excluding net.java.dev.jets3t:jets3t:jar:0.6.1 from the shaded jar.
[INFO] Excluding hsqldb:hsqldb:jar:1.8.0.10 from the shaded jar.
[INFO] Excluding oro:oro:jar:2.0.8 from the shaded jar.
[INFO] Excluding org.eclipse.jdt:core:jar:3.1.1 from the shaded jar.
[INFO] Excluding org.slf4j:slf4j-api:jar:1.7.5 from the shaded jar.
[INFO] Excluding org.slf4j:slf4j-log4j12:jar:1.7.5 from the shaded jar.
[INFO] Replacing original artifact with shaded artifact.
[INFO] Replacing /data/hive-ptest/working/apache-svn-trunk-source/ql/target/hive-exec-0.13.0-SNAPSHOT.jar with /data/hive-ptest/working/apache-svn-trunk-source/ql/target/hive-exec-0.13.0-SNAPSHOT-shaded.jar
[INFO] Dependency-reduced POM written at: /data/hive-ptest/working/apache-svn-trunk-source/ql/dependency-reduced-pom.xml
[INFO] Dependency-reduced POM written at: /data/hive-ptest/working/apache-svn-trunk-source/ql/dependency-reduced-pom.xml
[INFO]
[INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-exec ---
[INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/ql/target/hive-exec-0.13.0-SNAPSHOT.jar to /data/hive-ptest/working/maven/org/apache/hive/hive-exec/0.13.0-SNAPSHOT/hive-exec-0.13.0-SNAPSHOT.jar
[INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/ql/dependency-reduced-pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hive-exec/0.13.0-SNAPSHOT/hive-exec-0.13.0-SNAPSHOT.pom
[INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/ql/target/hive-exec-0.13.0-SNAPSHOT-tests.jar to /data/hive-ptest/working/maven/org/apache/hive/hive-exec/0.13.0-SNAPSHOT/hive-exec-0.13.0-SNAPSHOT-tests.jar
[INFO]
[INFO]
[INFO] Building Hive Service 0.13.0-SNAPSHOT
[INFO]
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-service ---
[INFO] Deleting
[jira] [Updated] (HIVE-6041) Incorrect task dependency graph for skewed join optimization
[ https://issues.apache.org/jira/browse/HIVE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-6041:
------------------------

    Affects Version/s: 0.12.0

Incorrect task dependency graph for skewed join optimization
------------------------------------------------------------

                Key: HIVE-6041
                URL: https://issues.apache.org/jira/browse/HIVE-6041
            Project: Hive
         Issue Type: Bug
         Components: Query Processor
   Affects Versions: 0.11.0, 0.12.0
        Environment: Hadoop 1.0.3
           Reporter: Adrian Popescu
           Assignee: Navis
           Priority: Critical

The dependency graph among task stages is incorrect for the skewed-join-optimized plan. Skewed joins are enabled through hive.optimize.skewjoin. In the case that skewed keys do not exist, all the tasks following the common join are filtered out at runtime. In particular, the conditional task in the optimized plan maintains no dependency on the child tasks of the common join task in the original plan. The conditional task contains the map join task, which maintains all these dependencies; but when the map join task is filtered out (i.e., no skewed keys exist), all these dependencies are lost. Hence, all the other task stages of the query (e.g., the move stage, which writes the results into the result table) are skipped. The bug resides in ql/optimizer/physical/GenMRSkewJoinProcessor.java, in the processSkewJoin() function, immediately after the ConditionalTask is created and its dependencies are set.

--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
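The reported failure mode can be reproduced with a toy task graph. The sketch below is a hypothetical model (the `Task` class here is mine, not Hive's `Task` hierarchy): if the conditional task's only path to the downstream move stage runs through the map-join branch, filtering that branch out at runtime orphans the move stage; re-attaching the common join's original children to the conditional task keeps them reachable.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Minimal model of the dependency bug described above -- illustrative only,
// not code from GenMRSkewJoinProcessor.java.
class Task {
    final String name;
    final List<Task> children = new ArrayList<>();
    boolean skipped = false; // set when the branch is filtered out at runtime

    Task(String name) { this.name = name; }

    void addDependentTask(Task t) {
        if (!children.contains(t)) children.add(t);
    }

    // Collect the names of tasks that would actually execute; a skipped
    // task's subtree is reachable only through other parents.
    static Set<String> executed(Task root) {
        Set<String> out = new LinkedHashSet<>();
        walk(root, out);
        return out;
    }

    private static void walk(Task t, Set<String> out) {
        if (t.skipped) return;
        out.add(t.name);
        for (Task c : t.children) walk(c, out);
    }
}

public class SkewJoinDeps {
    public static void main(String[] args) {
        Task moveStage = new Task("MoveStage");

        // Broken plan: only the map-join branch points at the move stage.
        Task mapJoin = new Task("MapJoin");
        mapJoin.addDependentTask(moveStage);
        Task conditional = new Task("Conditional");
        conditional.addDependentTask(mapJoin);

        mapJoin.skipped = true; // no skewed keys -> map-join branch filtered out
        System.out.println(Task.executed(conditional)); // MoveStage is lost

        // Fixed plan: the conditional task also inherits the common join's
        // original children, so downstream stages survive the filtering.
        conditional.addDependentTask(moveStage);
        System.out.println(Task.executed(conditional)); // MoveStage executes
    }
}
```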
[jira] [Assigned] (HIVE-6041) Incorrect task dependency graph for skewed join optimization
[ https://issues.apache.org/jira/browse/HIVE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis reassigned HIVE-6041:
---------------------------

    Assignee: Navis
[jira] [Updated] (HIVE-6041) Incorrect task dependency graph for skewed join optimization
[ https://issues.apache.org/jira/browse/HIVE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-6041:
------------------------

    Affects Version/s: 0.6.0
                       0.7.0
                       0.8.0
                       0.9.0
                       0.10.0
[jira] [Updated] (HIVE-6041) Incorrect task dependency graph for skewed join optimization
[ https://issues.apache.org/jira/browse/HIVE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-6041:
------------------------

    Status: Patch Available  (was: Open)

Running test
[jira] [Updated] (HIVE-6041) Incorrect task dependency graph for skewed join optimization
[ https://issues.apache.org/jira/browse/HIVE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-6041:
------------------------

    Attachment: HIVE-6041.1.patch.txt
[jira] [Updated] (HIVE-3746) Fix HS2 ResultSet Serialization Performance Regression
[ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-3746:
------------------------

    Status: Patch Available  (was: Open)

Fix HS2 ResultSet Serialization Performance Regression
------------------------------------------------------

                Key: HIVE-3746
                URL: https://issues.apache.org/jira/browse/HIVE-3746
            Project: Hive
         Issue Type: Sub-task
         Components: HiveServer2, Server Infrastructure
           Reporter: Carl Steinbach
           Assignee: Navis
             Labels: HiveServer2, jdbc, thrift
        Attachments: HIVE-3746.1.patch.txt, HIVE-3746.2.patch.txt, HIVE-3746.3.patch.txt, HIVE-3746.4.patch.txt, HIVE-3746.5.patch.txt, HIVE-3746.6.patch.txt

--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
[jira] [Updated] (HIVE-3746) Fix HS2 ResultSet Serialization Performance Regression
[ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-3746:
------------------------

    Attachment: HIVE-3746.6.patch.txt
[jira] [Updated] (HIVE-3746) Fix HS2 ResultSet Serialization Performance Regression
[ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-3746:
------------------------

    Status: Open  (was: Patch Available)
[jira] [Updated] (HIVE-3286) Explicit skew join on user provided condition
[ https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-3286:
------------------------

    Attachment: HIVE-3286.17.patch.txt

Explicit skew join on user provided condition
---------------------------------------------

                Key: HIVE-3286
                URL: https://issues.apache.org/jira/browse/HIVE-3286
            Project: Hive
         Issue Type: Improvement
         Components: Query Processor
           Reporter: Navis
           Assignee: Navis
           Priority: Minor
        Attachments: D4287.11.patch, HIVE-3286.12.patch.txt, HIVE-3286.13.patch.txt, HIVE-3286.14.patch.txt, HIVE-3286.15.patch.txt, HIVE-3286.16.patch.txt, HIVE-3286.17.patch.txt, HIVE-3286.D4287.10.patch, HIVE-3286.D4287.5.patch, HIVE-3286.D4287.6.patch, HIVE-3286.D4287.7.patch, HIVE-3286.D4287.8.patch, HIVE-3286.D4287.9.patch

A join on a table with skewed data spends most of its execution time handling the skewed keys. But usually we already know that, and even know what the skewed keys look like. If we could explicitly assign reducer slots to the skewed keys, total execution time could be greatly shortened. As a start, I've extended the join grammar like this:
{code}
select * from src a join src b on a.key=b.key skew on (a.key+1 < 50, a.key+1 < 100, a.key < 150);
{code}
which means that if the above query is executed by 20 reducers, one reducer handles a.key+1 < 50, one reducer 50 <= a.key+1 < 100, one reducer 99 <= a.key < 150, and 17 reducers handle the others (this could later be extended to assign more than one reducer per group). This can only be used with common inner equi-joins, and the skew condition should be composed of join keys only. Work done so far will be posted shortly, after code cleanup.

Skew expressions* in SKEW ON (expr, expr, ...) are evaluated sequentially at runtime, and the first 'true' one decides the skew group for the row. Each skew group has reserved partition slot(s), to which all rows in the group are assigned. The number of partition slots reserved for each group is also decided at runtime, by a simple percentage calculation. If a skew group is CLUSTER BY 20 PERCENT and the total number of partition slots (= number of reducers) is 20, that group reserves 4 partition slots, etc. DISTRIBUTE BY decides how the rows in a group are dispersed within the range of reserved slots (if there is only one slot for a group, this is meaningless). Currently, three distribution policies are available: RANDOM, KEYS, expression.

1. RANDOM : rows of the driver** alias are dispersed randomly and rows of non-driver aliases are duplicated to all the slots (default if not specified)
2. KEYS : determined by the hash value of the keys (same as before)
3. expression : determined by the hash of the object evaluated by a user-provided expression

Only possible with inner, equi, common joins. Does not yet support join tree merging. Might be used by other RS users like SORT BY or GROUP BY. If column statistics exist for the key, it could be possible to apply this automatically.

For example, if 20 reducers are used for the query below,
{code}
select count(*) from src a join src b on a.key=b.key skew on (
   a.key = '0' CLUSTER BY 10 PERCENT,
   b.key > '100' CLUSTER BY 20 PERCENT DISTRIBUTE BY upper(b.key),
   cast(a.key as int) > 300 CLUSTER BY 40 PERCENT DISTRIBUTE BY KEYS);
{code}
group-0 will reserve slots 6~7, group-1 slots 8~11, group-2 slots 12~19, and the others will reserve slots 0~5.

For a row with key='0' from alias a, the row is randomly assigned in the range 6~7 (driver alias) : 6 or 7
For a row with key='0' from alias b, the row is distributed to all slots in 6~7 (non-driver alias) : 6 and 7
For a row with key='50', the row is assigned in the range 8~11 by the hash code of upper(b.key) : 8 + (hash(upper(key)) % 4)
For a row with key='500', the row is assigned in the range 12~19 by the hash code of the join key : 12 + (hash(key) % 8)
For a row with key='200', which does not belong to any skew group : hash(key) % 6

*expressions in the skew condition :
1. All expressions should be made from expressions in the join condition; that is, if the join condition is a.key=b.key, the user can make any expression with a.key or b.key, but if the join condition is a.key+1=b.key, the user cannot make an expression with a.key alone (it must use a.key+1).
2. All expressions should reference one and only one side of the join. For example, simple constant expressions, or expressions referencing both sides of the join condition (a.key+b.key < 100), are not allowed.
3. All functions in an expression should be deterministic and stateless.
4. If DISTRIBUTE BY expression is used, the distribution expression should also reference the same alias as the skew expression.

**driver alias :
1. driver alias means the sole alias referenced by the skew expression, which is important for RANDOM distribution. rows of driver
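The slot arithmetic described above (20 reducers; groups reserving 10%, 20%, and 40% of the slots; hash-based placement within a group's reserved range) can be sketched as follows. This is my own illustration of the described math, with helper names that are not Hive's:

```java
// Hypothetical sketch of the skew-group slot math from the description above.
public class SkewSlots {
    // A group with CLUSTER BY `percent` of `totalSlots` reserves
    // totalSlots * percent / 100 consecutive slots.
    static int slotsFor(int totalSlots, int percent) {
        return totalSlots * percent / 100;
    }

    // Placement within a group's reserved range by hash, mirroring the
    // "8 + (hash(upper(key)) % 4)" formula in the description. floorMod
    // keeps the offset non-negative even for negative hash codes.
    static int place(int rangeStart, int rangeSize, Object distKey) {
        return rangeStart + Math.floorMod(distKey.hashCode(), rangeSize);
    }

    public static void main(String[] args) {
        int total = 20;
        int g0 = slotsFor(total, 10);       // 2 slots -> 6~7
        int g1 = slotsFor(total, 20);       // 4 slots -> 8~11
        int g2 = slotsFor(total, 40);       // 8 slots -> 12~19
        int others = total - g0 - g1 - g2;  // 6 slots -> 0~5
        System.out.println(g0 + " " + g1 + " " + g2 + " " + others); // 2 4 8 6

        // A row with key='50' in group-1, distributed by upper(b.key),
        // lands somewhere in 8..11:
        System.out.println(place(8, g1, "50".toUpperCase()));
    }
}
```

The reserved-slot counts match the worked example in the description (slots 6~7, 8~11, 12~19, with 0~5 left for non-skewed rows).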
[jira] [Updated] (HIVE-3286) Explicit skew join on user provided condition
[ https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-3286:
------------------------

    Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-3286) Explicit skew join on user provided condition
[ https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-3286: Status: Open (was: Patch Available) Explicit skew join on user provided condition - Key: HIVE-3286 URL: https://issues.apache.org/jira/browse/HIVE-3286 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: D4287.11.patch, HIVE-3286.12.patch.txt, HIVE-3286.13.patch.txt, HIVE-3286.14.patch.txt, HIVE-3286.15.patch.txt, HIVE-3286.16.patch.txt, HIVE-3286.17.patch.txt, HIVE-3286.D4287.10.patch, HIVE-3286.D4287.5.patch, HIVE-3286.D4287.6.patch, HIVE-3286.D4287.7.patch, HIVE-3286.D4287.8.patch, HIVE-3286.D4287.9.patch Join operation on table with skewed data takes most of execution time handling the skewed keys. But mostly we already know about that and even know what is look like the skewed keys. If we can explicitly assign reducer slots for the skewed keys, total execution time could be greatly shortened. As for a start, I've extended join grammar something like this. {code} select * from src a join src b on a.key=b.key skew on (a.key+1 50, a.key+1 100, a.key 150); {code} which means if above query is executed by 20 reducers, one reducer for a.key+1 50, one reducer for 50 = a.key+1 100, one reducer for 99 = a.key 150, and 17 reducers for others (could be extended to assign more than one reducer later) This can be only used with common-inner-equi joins. And skew condition should be composed of join keys only. Work till done now will be updated shortly after code cleanup. Skew expressions* in SKEW ON (expr, expr, ...) are evaluated sequentially at runtime, and first 'true' one decides skew group for the row. Each skew group has reserved partition slot(s), to which all rows in a group would be assigned. The number of partition slot reserved for each group is decided also at runtime by simple calculation of percentage. 
If a skew group is CLUSTER BY 20 PERCENT and total partition slot (=number of reducer) is 20, that group will reserve 4 partition slots, etc. DISTRIBUTE BY decides how the rows in a group is dispersed in the range of reserved slots (If there is only one slot for a group, this is meaningless). Currently, three distribution policies are available: RANDOM, KEYS, expression. 1. RANDOM : rows of driver** alias are dispersed by random and rows of non-driver alias are duplicated for all the slots (default if not specified) 2. KEYS : determined by hash value of keys (same with previous) 3. expression : determined by hash of object evaluated by user-provided expression Only possible with inner, equi, common-joins. Not yet supports join tree merging. Might be used by other RS users like SORT BY or GROUP BY If there exists column statistics for the key, it could be possible to apply automatically. For example, if 20 reducers are used for the query below, {code} select count(*) from src a join src b on a.key=b.key skew on ( a.key = '0' CLUSTER BY 10 PERCENT, b.key '100' CLUSTER BY 20 PERCENT DISTRIBUTE BY upper(b.key), cast(a.key as int) 300 CLUSTER BY 40 PERCENT DISTRIBUTE BY KEYS); {code} group-0 will reserve slots 6~7, group-1 8~11, group-2 12~19 and others will reserve slots 0~5. For a row with key='0' from alias a, the row is randomly assigned in the range of 6~7 (driver alias) : 6 or 7 For a row with key='0' from alias b, the row is disributed for all slots in 6~7 (non-driver alias) : 6 and 7 For a row with key='50', the row is assigned in the range of 8~11 by hashcode of upper(b.key) : 8 + (hash(upper(key)) % 4) For a row with key='500', the row is assigned in the range of 12~19 by hashcode of join key : 12 + (hash(key) % 8) For a row with key='200', this is not belong to any skew group : hash(key) % 6 *expressions in skew condition : 1. 
all expressions should be made of expressions in the join condition, which means that if the join condition is a.key=b.key, the user can make any expression with a.key or b.key. But if the join condition is a.key+1=b.key, the user cannot make an expression with a.key alone (it should be made with a.key+1). 2. all expressions should reference one and only one side of the aliases. For example, simple constant expressions or expressions referencing both sides of the join condition (a.key+b.key < 100) are not allowed. 3. all functions in an expression should be deterministic and stateless. 4. if DISTRIBUTE BY expression is used, the distribution expression should also reference the same alias as the skew expression. **driver alias : 1. the driver alias is the sole alias referenced from the skew expression, which is important for RANDOM distribution. rows of driver
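The percentage-to-slot arithmetic in the worked example above (20 reducers; groups of 10/20/40 percent reserving slots 6~7, 8~11 and 12~19, with 0~5 left for the default group) can be sketched outside Hive. This is an illustrative model only, not the patch's actual code; `SkewSlots` and its method names are invented.

```java
public class SkewSlots {
    // One reserved range per skew group, returned as {start, size}.
    // Default (non-skew) rows keep the leading slots [0, firstStart).
    static int[][] reserveSlots(int totalSlots, int[] percents) {
        int reserved = 0;
        for (int p : percents) reserved += totalSlots * p / 100;
        int start = totalSlots - reserved;      // e.g. 20 - (2+4+8) = 6
        int[][] ranges = new int[percents.length][2];
        for (int i = 0; i < percents.length; i++) {
            int size = totalSlots * percents[i] / 100;
            ranges[i] = new int[]{start, size};
            start += size;
        }
        return ranges;
    }

    // DISTRIBUTE BY <expr>: pick a slot inside the group's reserved range
    // by hashing the evaluated expression, i.e. start + (hash % size).
    static int slotFor(int[] range, Object distributeKey) {
        return range[0] + Math.floorMod(distributeKey.hashCode(), range[1]);
    }
}
```

With 20 slots and percents {10, 20, 40} this yields ranges {6,2}, {8,4}, {12,8}, matching the 6~7 / 8~11 / 12~19 layout described above.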
[jira] [Commented] (HIVE-3746) Fix HS2 ResultSet Serialization Performance Regression
[ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851500#comment-13851500 ] Hive QA commented on HIVE-3746: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619276/HIVE-3746.6.patch.txt Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/686/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/686/console Messages: {noformat} This message was trimmed, see log for full details [INFO] Including org.json:json:jar:20090211 in the shaded jar. [INFO] Excluding stax:stax-api:jar:1.0.1 from the shaded jar. [INFO] Excluding org.apache.hadoop:hadoop-core:jar:1.2.1 from the shaded jar. [INFO] Excluding xmlenc:xmlenc:jar:0.52 from the shaded jar. [INFO] Excluding com.sun.jersey:jersey-core:jar:1.14 from the shaded jar. [INFO] Excluding com.sun.jersey:jersey-json:jar:1.14 from the shaded jar. [INFO] Excluding org.codehaus.jettison:jettison:jar:1.1 from the shaded jar. [INFO] Excluding com.sun.xml.bind:jaxb-impl:jar:2.2.3-1 from the shaded jar. [INFO] Excluding javax.xml.bind:jaxb-api:jar:2.2.2 from the shaded jar. [INFO] Excluding javax.xml.stream:stax-api:jar:1.0-2 from the shaded jar. [INFO] Excluding javax.activation:activation:jar:1.1 from the shaded jar. [INFO] Excluding org.codehaus.jackson:jackson-jaxrs:jar:1.9.2 from the shaded jar. [INFO] Excluding org.codehaus.jackson:jackson-xc:jar:1.9.2 from the shaded jar. [INFO] Excluding com.sun.jersey:jersey-server:jar:1.14 from the shaded jar. [INFO] Excluding asm:asm:jar:3.1 from the shaded jar. [INFO] Excluding org.apache.commons:commons-math:jar:2.1 from the shaded jar. [INFO] Excluding commons-configuration:commons-configuration:jar:1.6 from the shaded jar. [INFO] Excluding commons-digester:commons-digester:jar:1.8 from the shaded jar. 
[INFO] Excluding commons-beanutils:commons-beanutils:jar:1.7.0 from the shaded jar. [INFO] Excluding commons-beanutils:commons-beanutils-core:jar:1.8.0 from the shaded jar. [INFO] Excluding commons-net:commons-net:jar:1.4.1 from the shaded jar. [INFO] Excluding org.mortbay.jetty:jetty:jar:6.1.26 from the shaded jar. [INFO] Excluding org.mortbay.jetty:jetty-util:jar:6.1.26 from the shaded jar. [INFO] Excluding tomcat:jasper-runtime:jar:5.5.12 from the shaded jar. [INFO] Excluding tomcat:jasper-compiler:jar:5.5.12 from the shaded jar. [INFO] Excluding org.mortbay.jetty:jsp-api-2.1:jar:6.1.14 from the shaded jar. [INFO] Excluding org.mortbay.jetty:servlet-api-2.5:jar:6.1.14 from the shaded jar. [INFO] Excluding org.mortbay.jetty:jsp-2.1:jar:6.1.14 from the shaded jar. [INFO] Excluding ant:ant:jar:1.6.5 from the shaded jar. [INFO] Excluding commons-el:commons-el:jar:1.0 from the shaded jar. [INFO] Excluding net.java.dev.jets3t:jets3t:jar:0.6.1 from the shaded jar. [INFO] Excluding hsqldb:hsqldb:jar:1.8.0.10 from the shaded jar. [INFO] Excluding oro:oro:jar:2.0.8 from the shaded jar. [INFO] Excluding org.eclipse.jdt:core:jar:3.1.1 from the shaded jar. [INFO] Excluding org.slf4j:slf4j-api:jar:1.7.5 from the shaded jar. [INFO] Excluding org.slf4j:slf4j-log4j12:jar:1.7.5 from the shaded jar. [INFO] Replacing original artifact with shaded artifact. 
[INFO] Replacing /data/hive-ptest/working/apache-svn-trunk-source/ql/target/hive-exec-0.13.0-SNAPSHOT.jar with /data/hive-ptest/working/apache-svn-trunk-source/ql/target/hive-exec-0.13.0-SNAPSHOT-shaded.jar [INFO] Dependency-reduced POM written at: /data/hive-ptest/working/apache-svn-trunk-source/ql/dependency-reduced-pom.xml [INFO] Dependency-reduced POM written at: /data/hive-ptest/working/apache-svn-trunk-source/ql/dependency-reduced-pom.xml [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-exec --- [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/ql/target/hive-exec-0.13.0-SNAPSHOT.jar to /data/hive-ptest/working/maven/org/apache/hive/hive-exec/0.13.0-SNAPSHOT/hive-exec-0.13.0-SNAPSHOT.jar [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/ql/dependency-reduced-pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hive-exec/0.13.0-SNAPSHOT/hive-exec-0.13.0-SNAPSHOT.pom [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/ql/target/hive-exec-0.13.0-SNAPSHOT-tests.jar to /data/hive-ptest/working/maven/org/apache/hive/hive-exec/0.13.0-SNAPSHOT/hive-exec-0.13.0-SNAPSHOT-tests.jar [INFO] [INFO] [INFO] Building Hive Service 0.13.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-service --- [INFO] Deleting
[jira] [Commented] (HIVE-6041) Incorrect task dependency graph for skewed join optimization
[ https://issues.apache.org/jira/browse/HIVE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851528#comment-13851528 ] Hive QA commented on HIVE-6041: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619274/HIVE-6041.1.patch.txt {color:green}SUCCESS:{color} +1 4792 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/687/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/687/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12619274 Incorrect task dependency graph for skewed join optimization Key: HIVE-6041 URL: https://issues.apache.org/jira/browse/HIVE-6041 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.6.0, 0.7.0, 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0 Environment: Hadoop 1.0.3 Reporter: Adrian Popescu Assignee: Navis Priority: Critical Attachments: HIVE-6041.1.patch.txt The dependency graph among task stages is incorrect for the skewed join optimized plan. Skewed joins are enabled through hive.optimize.skewjoin. For the case that skewed keys do not exist, all the tasks following the common join are filtered out at runtime. In particular, the conditional task in the optimized plan maintains no dependency with the child tasks of the common join task in the original plan. The conditional task is composed of the map join task which maintains all these dependencies, but for the case the map join task is filtered out (i.e., no skewed keys exist), all these dependencies are lost. Hence, all the other task stages of the query (e.g., move stage which writes down the results into the result table) are skipped. 
The bug resides in ql/optimizer/physical/GenMRSkewJoinProcessor.java, processSkewJoin() function, immediately after the ConditionalTask is created and its dependencies are set. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
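The dependency loss described in HIVE-6041 can be modeled with a toy task graph (invented classes, not Hive's actual Task API): if the common join's original children hang only off the map-join branch of the ConditionalTask, then when that branch is filtered out at runtime (no skewed keys), the downstream stages such as the move stage are never reached. A sketch of the fix is to re-attach those children so the no-skew path still reaches them.

```java
import java.util.ArrayList;
import java.util.List;

public class CondTaskDemo {
    static class Task {
        final String name;
        final List<Task> children = new ArrayList<>();
        Task(String name) { this.name = name; }
    }

    // Depth-first search: is 'target' reachable from 'root'?
    static boolean reaches(Task root, String target) {
        if (root.name.equals(target)) return true;
        for (Task c : root.children) if (reaches(c, target)) return true;
        return false;
    }

    // Build a simplified skew-join plan. The no-skew execution path runs
    // from commonJoin directly; the map-join branch may be filtered out.
    // 'fixed' models re-wiring the old children after the ConditionalTask
    // is created, as the patch does in processSkewJoin().
    static Task buildPlan(boolean fixed) {
        Task commonJoin = new Task("commonJoin");
        Task mapJoinBranch = new Task("mapJoinBranch");
        Task move = new Task("moveStage");
        mapJoinBranch.children.add(move);   // buggy wiring: only the skew branch
        if (fixed) commonJoin.children.add(move);
        return commonJoin;
    }
}
```

Without the fix, `reaches(buildPlan(false), "moveStage")` is false on the no-skew path, which is exactly the "move stage is skipped" symptom reported above.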
[jira] [Commented] (HIVE-3286) Explicit skew join on user provided condition
[ https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851560#comment-13851560 ] Hive QA commented on HIVE-3286: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619277/HIVE-3286.17.patch.txt {color:green}SUCCESS:{color} +1 4796 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/688/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/688/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12619277 Explicit skew join on user provided condition - Key: HIVE-3286 URL: https://issues.apache.org/jira/browse/HIVE-3286 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: D4287.11.patch, HIVE-3286.12.patch.txt, HIVE-3286.13.patch.txt, HIVE-3286.14.patch.txt, HIVE-3286.15.patch.txt, HIVE-3286.16.patch.txt, HIVE-3286.17.patch.txt, HIVE-3286.D4287.10.patch, HIVE-3286.D4287.5.patch, HIVE-3286.D4287.6.patch, HIVE-3286.D4287.7.patch, HIVE-3286.D4287.8.patch, HIVE-3286.D4287.9.patch A join on a table with skewed data spends most of its execution time handling the skewed keys. But in most cases we already know that, and even know what the skewed keys look like. If we can explicitly assign reducer slots for the skewed keys, total execution time could be greatly shortened. As a start, I've extended the join grammar like this: 
{code} select * from src a join src b on a.key=b.key skew on (a.key+1 < 50, a.key+1 < 100, a.key < 150); {code} which means that if the above query is executed by 20 reducers, one reducer handles a.key+1 < 50, one reducer 50 <= a.key+1 < 100, one reducer 99 <= a.key < 150, and 17 reducers handle the rest (this could be extended later to assign more than one reducer per group). This can only be used with common inner equi-joins, and the skew condition should be composed of join keys only. Work done so far will be posted shortly after code cleanup. Skew expressions* in SKEW ON (expr, expr, ...) are evaluated sequentially at runtime, and the first 'true' one decides the skew group for the row. Each skew group has reserved partition slot(s), to which all rows in the group are assigned. The number of partition slots reserved for each group is also decided at runtime by a simple percentage calculation. If a skew group is CLUSTER BY 20 PERCENT and the total number of partition slots (= number of reducers) is 20, that group reserves 4 partition slots, etc. DISTRIBUTE BY decides how the rows in a group are dispersed over the range of reserved slots (if there is only one slot for a group, this is meaningless). Currently, three distribution policies are available: RANDOM, KEYS, expression. 1. RANDOM : rows of the driver** alias are dispersed randomly and rows of a non-driver alias are duplicated to all the slots (default if not specified) 2. KEYS : determined by the hash value of the keys (same as before) 3. expression : determined by the hash of the object evaluated by a user-provided expression. Only possible with inner, equi, common joins. Join tree merging is not yet supported. Might be used by other RS users like SORT BY or GROUP BY. If column statistics exist for the key, it could be possible to apply this automatically. 
For example, if 20 reducers are used for the query below, {code} select count(*) from src a join src b on a.key=b.key skew on ( a.key = '0' CLUSTER BY 10 PERCENT, b.key < '100' CLUSTER BY 20 PERCENT DISTRIBUTE BY upper(b.key), cast(a.key as int) > 300 CLUSTER BY 40 PERCENT DISTRIBUTE BY KEYS); {code} group-0 will reserve slots 6~7, group-1 8~11, group-2 12~19, and the others will reserve slots 0~5. For a row with key='0' from alias a, the row is randomly assigned in the range 6~7 (driver alias) : 6 or 7. For a row with key='0' from alias b, the row is distributed to all slots in 6~7 (non-driver alias) : 6 and 7. For a row with key='50', the row is assigned in the range 8~11 by the hashcode of upper(b.key) : 8 + (hash(upper(key)) % 4). For a row with key='500', the row is assigned in the range 12~19 by the hashcode of the join key : 12 + (hash(key) % 8). For a row with key='200', which does not belong to any skew group : hash(key) % 6. *expressions in skew condition : 1. all expressions should be made of expressions in the join condition, which means that if the join condition is a.key=b.key, the user can make any
Re: Review Request 15654: Rewrite Trim and Pad UDFs based on GenericUDF
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15654/#review30610 --- ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBasePad.java https://reviews.apache.org/r/15654/#comment58620 Having gone through some pain with Hive on Windows, the bytes returned by String.getBytes() will not be in utf-8 if the default encoding is something other than utf-8. Would be safer here to either use getBytes(UTF-8), or Text.encode() if you want to get bytes from the string. Or just do the padding as Strings. - Jason Dere On Dec. 18, 2013, 3:16 a.m., Mohammad Islam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15654/ --- (Updated Dec. 18, 2013, 3:16 a.m.) Review request for hive, Ashutosh Chauhan, Carl Steinbach, and Jitendra Pandey. Bugs: HIVE-5829 https://issues.apache.org/jira/browse/HIVE-5829 Repository: hive-git Description --- Rewrite the UDFS *pads and *trim using GenericUDF. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java a895d65 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java bca1f26 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLTrim.java dc00cf9 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLpad.java d1da19a ql/src/java/org/apache/hadoop/hive/ql/udf/UDFRTrim.java 2bcc5fa ql/src/java/org/apache/hadoop/hive/ql/udf/UDFRpad.java 9652ce2 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFTrim.java 490886d ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBasePad.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLpad.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRpad.java PRE-CREATION 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTrim.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java eff251f ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFLTrim.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFLpad.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFRTrim.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFRpad.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFTrim.java PRE-CREATION Diff: https://reviews.apache.org/r/15654/diff/ Testing --- Thanks, Mohammad Islam
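Jason Dere's review comment above is easy to demonstrate: `String.getBytes()` with no argument uses the platform default charset, which on Windows is often not UTF-8, while passing `StandardCharsets.UTF_8` pins the encoding. A minimal sketch (class name invented, not part of the patch):

```java
import java.nio.charset.StandardCharsets;

public class PadBytesDemo {
    // Platform-dependent: the result varies with the JVM's default
    // charset (e.g. Cp1252 on Windows vs UTF-8 on most Linux setups).
    static byte[] defaultBytes(String s) {
        return s.getBytes();
    }

    // Deterministic: always UTF-8, as suggested for GenericUDFBasePad.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

For example, `utf8Bytes("\u00e9")` (é) is always the two bytes 0xC3 0xA9, whereas `defaultBytes` would return a single 0xE9 byte under Cp1252.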
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851661#comment-13851661 ] Justin Coffey commented on HIVE-5783: - Yes this is true. We are refactoring to merge the whole parquet-hive project into hive. There are a couple of folks involved at this point and so it's taking a smidgen extra time what with holidays and all. Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Attachments: HIVE-5783.patch, hive-0.11-parquet.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Hive-trunk-h0.21 - Build # 2511 - Still Failing
Changes for Build #2473 [brock] HIVE-4741 - Add Hive config API to modify the restrict list (Prasad Mujumdar, Navis via Brock Noland) Changes for Build #2474 [navis] HIVE-5827 : Incorrect location of logs for failed tests (Vikram Dixit K and Szehon Ho via Navis) [thejas] HIVE-4485 : beeline prints null as empty strings (Thejas Nair reviewed by Ashutosh Chauhan) [brock] HIVE-5704 - A couple of generic UDFs are not in the right folder/package (Xuefu Zhang via Brock Noland) [brock] HIVE-5706 - Move a few numeric UDFs to generic implementations (Xuefu Zhang via Brock Noland) [hashutosh] HIVE-5817 : column name to index mapping in VectorizationContext is broken (Remus Rusanu, Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5876 : Split elimination in ORC breaks for partitioned tables (Prasanth J via Ashutosh Chauhan) [hashutosh] HIVE-5886 : [Refactor] Remove unused class JobCloseFeedback (Ashutosh Chauhan via Thejas Nair) [brock] HIVE-5894 - Fix minor PTest2 issues (Brock Noland) Changes for Build #2475 [brock] HIVE-5755 - Fix hadoop2 execution environment Milestone 1 (Vikram Dixit K via Brock Noland) Changes for Build #2476 [xuefu] HIVE-5893: hive-schema-0.13.0.mysql.sql contains reference to nonexistent column (Carl via Xuefu) [xuefu] HIVE-5684: Serde support for char (Jason via Xuefu) Changes for Build #2477 Changes for Build #2478 Changes for Build #2479 Changes for Build #2480 [brock] HIVE-5441 - Async query execution doesn't return resultset status (Prasad Mujumdar via Thejas M Nair) [brock] HIVE-5880 - Rename HCatalog HBase Storage Handler artifact id (Brock Noland reviewed by Prasad Mujumdar) Changes for Build #2481 Changes for Build #2482 [ehans] HIVE-5581: Implement vectorized year/month/day... etc. 
for string arguments (Teddy Choi via Eric Hanson) Changes for Build #2483 [rhbutani] HIVE-5898 Make fetching of column statistics configurable (Prasanth Jayachandran via Harish Butani) Changes for Build #2484 [brock] HIVE-5880 - (Rename HCatalog HBase Storage Handler artifact id) breaks packaging (Xuefu Zhang via Brock Noland) Changes for Build #2485 [xuefu] HIVE-5866: Hive divide operator generates wrong results in certain cases (reviewed by Prasad) [ehans] HIVE-5877: Implement vectorized support for IN as boolean-valued expression (Eric Hanson) Changes for Build #2486 [ehans] HIVE-5895: vectorization handles division by zero differently from normal execution (Sergey Shelukhin via Eric Hanson) [hashutosh] HIVE-5938 : Remove apache.mina dependency for test (Navis via Ashutosh Chauhan) [xuefu] HIVE-5912: Show partition command doesn't support db.table (Yu Zhao via Xuefu) [brock] HIVE-5906 - TestGenericUDFPower should use delta to compare doubles (Szehon Ho via Brock Noland) [brock] HIVE-5855 - Add deprecated methods back to ColumnProjectionUtils (Brock Noland reviewed by Navis) [brock] HIVE-5915 - Shade Kryo dependency (Brock Noland reviewed by Ashutosh Chauhan) Changes for Build #2487 [hashutosh] HIVE-5916 : No need to aggregate statistics collected via counter mechanism (Ashutosh Chauhan via Navis) [xuefu] HIVE-5947: Fix test failure in decimal_udf.q (reviewed by Brock) [thejas] HIVE-5550 : Import fails for tables created with default text, sequence and orc file formats using HCatalog API (Sushanth Sowmyan via Thejas Nair) Changes for Build #2488 [hashutosh] HIVE-5935 : hive.query.string is not provided to FetchTask (Navis via Ashutosh Chauhan) [navis] HIVE-3455 : ANSI CORR(X,Y) is incorrect (Maxim Bolotin via Navis) [hashutosh] HIVE-5921 : Better heuristics for worst case statistics estimates for join, limit and filter operator (Prasanth J via Harish Butani) [rhbutani] HIVE-5899 NPE during explain extended with char/varchar columns (Jason Dere via Harish Butani) 
Changes for Build #2489 [xuefu] HIVE-3181: getDatabaseMajor/Minor version does not return values (Szehon via Xuefu, reviewed by Navis) [brock] HIVE-5641 - BeeLineOpts ignores Throwable (Brock Noland reviewed by Prasad and Thejas) [hashutosh] HIVE-5909 : locate and instr throw java.nio.BufferUnderflowException when empty string as substring (Navis via Ashutosh Chauhan) [hashutosh] HIVE-5686 : partition column type validation doesn't quite work for dates (Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5887 : metastore direct sql doesn't work with oracle (Sergey Shelukhin via Ashutosh Chauhan) Changes for Build #2490 Changes for Build #2491 Changes for Build #2492 [brock] HIVE-5981 - Add hive-unit back to itests pom (Brock Noland reviewed by Prasad) Changes for Build #2493 [xuefu] HIVE-5872: Make UDAFs such as GenericUDAFSum report accurate precision/scale for decimal types (reviewed by Sergey Shelukhin) [hashutosh] HIVE-5978 : Rollups not supported in vector mode. (Jitendra Nath Pandey via Ashutosh Chauhan) [hashutosh] HIVE-5830 : SubQuery: Not In subqueries should check if subquery contains nulls in matching column (Harish Butani
Re: Review Request 16172: ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16172/#review30612 --- ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java https://reviews.apache.org/r/16172/#comment58622 Seems it is not an error? If so, let's not put it in the ErrorMsg. ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverCommonJoin.java https://reviews.apache.org/r/16172/#comment58623 Is this one necessary? - Yin Huai On Dec. 18, 2013, 5:04 a.m., Navis Ryu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16172/ --- (Updated Dec. 18, 2013, 5:04 a.m.) Review request for hive. Bugs: HIVE-5945 https://issues.apache.org/jira/browse/HIVE-5945 Repository: hive-git Description --- Here is an example {code} select i_item_id, s_state, avg(ss_quantity) agg1, avg(ss_list_price) agg2, avg(ss_coupon_amt) agg3, avg(ss_sales_price) agg4 FROM store_sales JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk) JOIN item on (store_sales.ss_item_sk = item.i_item_sk) JOIN customer_demographics on (store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk) JOIN store on (store_sales.ss_store_sk = store.s_store_sk) where cd_gender = 'F' and cd_marital_status = 'U' and cd_education_status = 'Primary' and d_year = 2002 and s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL') group by i_item_id, s_state order by i_item_id, s_state limit 100; {code} I turned off noconditionaltask. So, I expected that there would be 4 Map-only jobs for this query. However, I got 1 Map-only job (joining store_sales and date_dim) and 3 MR jobs (for the reduce joins). So, I checked the conditional task determining the plan of the join involving item. In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask, aliasToFileSizeMap contains all input tables used in this query and the intermediate table generated by joining store_sales and date_dim. 
So, when we sum the size of all small tables, the size of store_sales (which is around 45GB in my test) will also be counted. Diffs - ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 45acc2b ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 9afc80b ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java 2efa7c2 ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverCommonJoin.java faf2f9b ql/src/test/org/apache/hadoop/hive/ql/plan/TestConditionalResolverCommonJoin.java 67203c9 ql/src/test/results/clientpositive/auto_join25.q.out 7427239 ql/src/test/results/clientpositive/infer_bucket_sort_convert_join.q.out 7d06739 ql/src/test/results/clientpositive/mapjoin_hook.q.out d60d16e Diff: https://reviews.apache.org/r/16172/diff/ Testing --- Thanks, Navis Ryu
[jira] [Commented] (HIVE-5945) ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task.
[ https://issues.apache.org/jira/browse/HIVE-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851756#comment-13851756 ] Yin Huai commented on HIVE-5945: Two minor comments in the review board. Two additional comments. When we find {code} bigTableFileAlias != null {code} can we also log sumOfOthers and the threshold of the size of small tables? So, the log entry will show the size of the big table, the total size of the other small tables, and the threshold of the size of small tables. Also, can you add a unit test? Thanks :) ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task. - Key: HIVE-5945 URL: https://issues.apache.org/jira/browse/HIVE-5945 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0 Reporter: Yin Huai Assignee: Navis Priority: Critical Attachments: HIVE-5945.1.patch.txt, HIVE-5945.2.patch.txt, HIVE-5945.3.patch.txt Here is an example {code} select i_item_id, s_state, avg(ss_quantity) agg1, avg(ss_list_price) agg2, avg(ss_coupon_amt) agg3, avg(ss_sales_price) agg4 FROM store_sales JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk) JOIN item on (store_sales.ss_item_sk = item.i_item_sk) JOIN customer_demographics on (store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk) JOIN store on (store_sales.ss_store_sk = store.s_store_sk) where cd_gender = 'F' and cd_marital_status = 'U' and cd_education_status = 'Primary' and d_year = 2002 and s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL') group by i_item_id, s_state order by i_item_id, s_state limit 100; {code} I turned off noconditionaltask. So, I expected that there would be 4 Map-only jobs for this query. However, I got 1 Map-only job (joining store_sales and date_dim) and 3 MR jobs (for the reduce joins). So, I checked the conditional task determining the plan of the join involving item. 
In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask, aliasToFileSizeMap contains all input tables used in this query and the intermediate table generated by joining store_sales and date_dim. So, when we sum the size of all small tables, the size of store_sales (which is around 45GB in my test) will also be counted. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
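The fix Yin describes amounts to summing only the aliases that actually participate in this conditional task, instead of everything in aliasToFileSizeMap. A standalone sketch with plain collections, not Hive's resolver classes (class, method, and alias names are illustrative):

```java
import java.util.Map;
import java.util.Set;

public class SmallTableSum {
    // Sum candidate small-table sizes. The reported bug is that the
    // participant filter was effectively missing, so unrelated inputs
    // (e.g. the ~45GB store_sales scan feeding another join) inflated
    // the sum and blocked the map-join conversion.
    static long sumSmallTables(Map<String, Long> aliasToFileSize,
                               Set<String> participantAliases,
                               String bigTableAlias) {
        long sum = 0;
        for (Map.Entry<String, Long> e : aliasToFileSize.entrySet()) {
            if (!participantAliases.contains(e.getKey())) continue; // skip non-participants
            if (e.getKey().equals(bigTableAlias)) continue;         // skip the big table
            sum += e.getValue();
        }
        return sum;
    }
}
```

With the filter in place, only the small side (here `item`) contributes to the sum compared against the small-table size threshold.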
[jira] [Updated] (HIVE-5945) ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task.
[ https://issues.apache.org/jira/browse/HIVE-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-5945: --- Status: Open (was: Patch Available) ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task. - Key: HIVE-5945 URL: https://issues.apache.org/jira/browse/HIVE-5945 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0, 0.11.0, 0.10.0, 0.9.0, 0.8.0, 0.13.0 Reporter: Yin Huai Assignee: Navis Priority: Critical Attachments: HIVE-5945.1.patch.txt, HIVE-5945.2.patch.txt, HIVE-5945.3.patch.txt Here is an example {code} select i_item_id, s_state, avg(ss_quantity) agg1, avg(ss_list_price) agg2, avg(ss_coupon_amt) agg3, avg(ss_sales_price) agg4 FROM store_sales JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk) JOIN item on (store_sales.ss_item_sk = item.i_item_sk) JOIN customer_demographics on (store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk) JOIN store on (store_sales.ss_store_sk = store.s_store_sk) where cd_gender = 'F' and cd_marital_status = 'U' and cd_education_status = 'Primary' and d_year = 2002 and s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL') group by i_item_id, s_state order by i_item_id, s_state limit 100; {code} I turned off noconditionaltask. So, I expected that there would be 4 Map-only jobs for this query. However, I got 1 Map-only job (joining store_sales and date_dim) and 3 MR jobs (for the reduce joins). So, I checked the conditional task determining the plan of the join involving item. In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask, aliasToFileSizeMap contains all input tables used in this query and the intermediate table generated by joining store_sales and date_dim. So, when we sum the size of all small tables, the size of store_sales (which is around 45GB in my test) will also be counted. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6050) JDBC backward compatibility is broken
[ https://issues.apache.org/jira/browse/HIVE-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-6050: --- Priority: Blocker (was: Major) I think this should be a blocker. Changed to blocker status. JDBC backward compatibility is broken - Key: HIVE-6050 URL: https://issues.apache.org/jira/browse/HIVE-6050 Project: Hive Issue Type: Bug Components: JDBC Reporter: Szehon Ho Priority: Blocker Connect from JDBC driver of Hive 0.13 (TProtocolVersion=v4) to HiveServer2 of Hive 0.10 (TProtocolVersion=v1), will return the following exception: {noformat} java.sql.SQLException: Could not establish connection to jdbc:hive2://localhost:1/default: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null) at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:336) at org.apache.hive.jdbc.HiveConnection.init(HiveConnection.java:158) at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105) at java.sql.DriverManager.getConnection(DriverManager.java:571) at java.sql.DriverManager.getConnection(DriverManager.java:187) at org.apache.hive.jdbc.MyTestJdbcDriver2.getConnection(MyTestJdbcDriver2.java:73) at org.apache.hive.jdbc.MyTestJdbcDriver2.<init>(MyTestJdbcDriver2.java:49) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.junit.runners.BlockJUnit4ClassRunner.createTest(BlockJUnit4ClassRunner.java:187) at org.junit.runners.BlockJUnit4ClassRunner$1.runReflectiveCall(BlockJUnit4ClassRunner.java:236) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.BlockJUnit4ClassRunner.methodBlock(BlockJUnit4ClassRunner.java:233) at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:523) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1063) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:914) Caused by: org.apache.thrift.TApplicationException: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null) at org.apache.thrift.TApplicationException.read(TApplicationException.java:108) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71) at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_OpenSession(TCLIService.java:160) at org.apache.hive.service.cli.thrift.TCLIService$Client.OpenSession(TCLIService.java:147) at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:327) ... 37 more {noformat} On code analysis, it looks like the 'client_protocol' scheme is a ThriftEnum, which doesn't seem to be backward-compatible. Look at the code path in the generated file 'TOpenSessionReq.java', method TOpenSessionReqStandardScheme.read(): 1. The method will call 'TProtocolVersion.findValue()' on the thrift protocol's byte stream, which returns null if the client is sending an enum value unknown to the server. (v4 is unknown to server) 2. 
The method will then call struct.validate(), which will throw the above exception because of the null version. So it doesn't look like the current backward-compatibility scheme will work. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
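The failure mode described above can be sketched in isolation. The following is a hypothetical, simplified stand-in for the generated Thrift code — ProtocolVersion and OpenSessionReq mirror TProtocolVersion and TOpenSessionReq, but are not the real classes:

```java
// Hypothetical, simplified stand-in for the generated Thrift classes; the
// names mirror TProtocolVersion / TOpenSessionReq but this is not Hive code.
import java.util.Arrays;

public class EnumCompatSketch {
    // Server-side view of the protocol enum: an old server only knows V1.
    enum ProtocolVersion {
        V1(0);
        final int value;
        ProtocolVersion(int value) { this.value = value; }
        // Mirrors TProtocolVersion.findValue(): returns null for wire values
        // the server's generated code does not know about.
        static ProtocolVersion findValue(int value) {
            return Arrays.stream(values())
                         .filter(v -> v.value == value)
                         .findFirst().orElse(null);
        }
    }

    static class OpenSessionReq {
        ProtocolVersion clientProtocol;
        // Mirrors struct.validate(): a required field must be set.
        void validate() {
            if (clientProtocol == null) {
                throw new IllegalStateException(
                    "Required field 'client_protocol' is unset!");
            }
        }
    }

    public static void main(String[] args) {
        // A newer client sends a wire value this server has never heard of;
        // findValue() maps it to null, and validate() then rejects the struct.
        OpenSessionReq req = new OpenSessionReq();
        req.clientProtocol = ProtocolVersion.findValue(3); // unknown -> null
        try {
            req.validate();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This is why enum-typed required fields are fragile for version negotiation: the enum decode step discards unknown values before the application ever sees them.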
Hive-trunk-hadoop2 - Build # 610 - Still Failing
Changes for Build #572 [brock] HIVE-4741 - Add Hive config API to modify the restrict list (Prasad Mujumdar, Navis via Brock Noland) Changes for Build #573 [navis] HIVE-5827 : Incorrect location of logs for failed tests (Vikram Dixit K and Szehon Ho via Navis) [thejas] HIVE-4485 : beeline prints null as empty strings (Thejas Nair reviewed by Ashutosh Chauhan) [brock] HIVE-5704 - A couple of generic UDFs are not in the right folder/package (Xuefu Zhang via Brock Noland) [brock] HIVE-5706 - Move a few numeric UDFs to generic implementations (Xuefu Zhang via Brock Noland) [hashutosh] HIVE-5817 : column name to index mapping in VectorizationContext is broken (Remus Rusanu, Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5876 : Split elimination in ORC breaks for partitioned tables (Prasanth J via Ashutosh Chauhan) [hashutosh] HIVE-5886 : [Refactor] Remove unused class JobCloseFeedback (Ashutosh Chauhan via Thejas Nair) [brock] HIVE-5894 - Fix minor PTest2 issues (Brock Noland) Changes for Build #574 [brock] HIVE-5755 - Fix hadoop2 execution environment Milestone 1 (Vikram Dixit K via Brock Noland) Changes for Build #575 [xuefu] HIVE-5893: hive-schema-0.13.0.mysql.sql contains reference to nonexistent column (Carl via Xuefu) [xuefu] HIVE-5684: Serde support for char (Jason via Xuefu) Changes for Build #576 Changes for Build #577 Changes for Build #578 Changes for Build #579 [brock] HIVE-5441 - Async query execution doesn't return resultset status (Prasad Mujumdar via Thejas M Nair) [brock] HIVE-5880 - Rename HCatalog HBase Storage Handler artifact id (Brock Noland reviewed by Prasad Mujumdar) Changes for Build #580 [ehans] HIVE-5581: Implement vectorized year/month/day... etc. 
for string arguments (Teddy Choi via Eric Hanson) Changes for Build #581 [rhbutani] HIVE-5898 Make fetching of column statistics configurable (Prasanth Jayachandran via Harish Butani) Changes for Build #582 [brock] HIVE-5880 - (Rename HCatalog HBase Storage Handler artifact id) breaks packaging (Xuefu Zhang via Brock Noland) Changes for Build #583 [xuefu] HIVE-5866: Hive divide operator generates wrong results in certain cases (reviewed by Prasad) [ehans] HIVE-5877: Implement vectorized support for IN as boolean-valued expression (Eric Hanson) Changes for Build #584 [thejas] HIVE-5550 : Import fails for tables created with default text, sequence and orc file formats using HCatalog API (Sushanth Sowmyan via Thejas Nair) [ehans] HIVE-5895: vectorization handles division by zero differently from normal execution (Sergey Shelukhin via Eric Hanson) [hashutosh] HIVE-5938 : Remove apache.mina dependency for test (Navis via Ashutosh Chauhan) [xuefu] HIVE-5912: Show partition command doesn't support db.table (Yu Zhao via Xuefu) [brock] HIVE-5906 - TestGenericUDFPower should use delta to compare doubles (Szehon Ho via Brock Noland) [brock] HIVE-5855 - Add deprecated methods back to ColumnProjectionUtils (Brock Noland reviewed by Navis) [brock] HIVE-5915 - Shade Kryo dependency (Brock Noland reviewed by Ashutosh Chauhan) Changes for Build #585 [hashutosh] HIVE-5916 : No need to aggregate statistics collected via counter mechanism (Ashutosh Chauhan via Navis) [xuefu] HIVE-5947: Fix test failure in decimal_udf.q (reviewed by Brock) Changes for Build #586 [hashutosh] HIVE-5935 : hive.query.string is not provided to FetchTask (Navis via Ashutosh Chauhan) [navis] HIVE-3455 : ANSI CORR(X,Y) is incorrect (Maxim Bolotin via Navis) [hashutosh] HIVE-5921 : Better heuristics for worst case statistics estimates for join, limit and filter operator (Prasanth J via Harish Butani) [rhbutani] HIVE-5899 NPE during explain extended with char/varchar columns (Jason Dere via Harish Butani) 
Changes for Build #587 [xuefu] HIVE-3181: getDatabaseMajor/Minor version does not return values (Szehon via Xuefu, reviewed by Navis) [brock] HIVE-5641 - BeeLineOpts ignores Throwable (Brock Noland reviewed by Prasad and Thejas) [hashutosh] HIVE-5909 : locate and instr throw java.nio.BufferUnderflowException when empty string as substring (Navis via Ashutosh Chauhan) [hashutosh] HIVE-5686 : partition column type validation doesn't quite work for dates (Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5887 : metastore direct sql doesn't work with oracle (Sergey Shelukhin via Ashutosh Chauhan) Changes for Build #588 Changes for Build #589 Changes for Build #590 [brock] HIVE-5981 - Add hive-unit back to itests pom (Brock Noland reviewed by Prasad) Changes for Build #591 [xuefu] HIVE-5872: Make UDAFs such as GenericUDAFSum report accurate precision/scale for decimal types (reviewed by Sergey Shelukhin) [hashutosh] HIVE-5978 : Rollups not supported in vector mode. (Jitendra Nath Pandey via Ashutosh Chauhan) [hashutosh] HIVE-5830 : SubQuery: Not In subqueries should check if subquery contains nulls in matching column (Harish Butani via Ashutosh Chauhan) [hashutosh] HIVE-5598 :
[jira] [Commented] (HIVE-5958) SQL std auth - disable statements that work with paths
[ https://issues.apache.org/jira/browse/HIVE-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851777#comment-13851777 ] Brock Noland commented on HIVE-5958: Hi Thejas, I feel disabling all commands which have a path will result in an unusable system since so many users use LOAD DATA ... and external tables. You agree that URI is essential: bq. URI authorization is very essential [source|https://issues.apache.org/jira/browse/HIVE-5837?focusedCommentId=13850885page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13850885] Therefore I assume "in the first pass" means in the very first iteration, and that URI support will be included in the first release? SQL std auth - disable statements that work with paths -- Key: HIVE-5958 URL: https://issues.apache.org/jira/browse/HIVE-5958 Project: Hive Issue Type: Sub-task Components: Authorization Reporter: Thejas M Nair Original Estimate: 24h Remaining Estimate: 24h In the first pass, statements such as create table and alter table that specify a path URI will get an authorization error under SQL std auth. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5065) Create proper (i.e.: non .q file based) junit tests for DagUtils and TezTask
[ https://issues.apache.org/jira/browse/HIVE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-5065: - Attachment: HIVE-5065-part-5.1.patch Create proper (i.e.: non .q file based) junit tests for DagUtils and TezTask Key: HIVE-5065 URL: https://issues.apache.org/jira/browse/HIVE-5065 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Priority: Blocker Fix For: tez-branch Attachments: HIVE-5065-part-1.1.patch, HIVE-5065-part-3.1.patch, HIVE-5065-part-4.1.patch, HIVE-5065-part-4.2.patch, HIVE-5065-part-5.1.patch, HIVE-5065-part2.1.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: Review Request 16299: HIVE-6013: Supporting Quoted Identifiers in Column Names
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16299/ --- (Updated Dec. 18, 2013, 5:42 p.m.) Review request for hive, Ashutosh Chauhan and Alan Gates. Bugs: HIVE-6013 https://issues.apache.org/jira/browse/HIVE-6013 Repository: hive-git Description --- Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: Introduce 'standard' quoted identifiers for columns only. At the language level this is turned on by a flag. At the metadata level we relax the constraint on column names. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fa3e048 itests/qtest/pom.xml 971c5d3 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 5b75ef3 ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java eb26e7f ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 321759b ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java dbf3f91 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g ed9917d ql/src/java/org/apache/hadoop/hive/ql/parse/ParseDriver.java 1e6826f ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b9cd65c ql/src/java/org/apache/hadoop/hive/ql/parse/UnparseTranslator.java 8fe2262 ql/src/test/queries/clientnegative/invalid_columns.q f8be8c8 ql/src/test/queries/clientpositive/quotedid_alter.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_basic.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_partition.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_skew.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_smb.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_tblproperty.q PRE-CREATION ql/src/test/results/clientnegative/invalid_columns.q.out 3311b0a
ql/src/test/results/clientpositive/quotedid_alter.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_basic.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_partition.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_skew.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_smb.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_tblproperty.q.out PRE-CREATION Diff: https://reviews.apache.org/r/16299/diff/ Testing --- added new tests for create, alter, delete, query with columns containing special characters. Tests start with quotedid Thanks, Harish Butani
Re: Review Request 16299: HIVE-6013: Supporting Quoted Identifiers in Column Names
On Dec. 18, 2013, 12:59 a.m., Ashutosh Chauhan wrote: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 872 https://reviews.apache.org/r/16299/diff/2/?file=398469#file398469line872 class PatternValidator was recently introduced in HiveConf, which doesn't let the user specify an invalid value for a config key. Using that here will be useful. done On Dec. 18, 2013, 12:59 a.m., Ashutosh Chauhan wrote: metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java, line 484 https://reviews.apache.org/r/16299/diff/2/?file=398471#file398471line484 Shall we remove this if() altogether and thus also the above newly introduced method? I kept the validateColumnName method around in case we decide to change the validation logic in the future. But if you feel strongly about it, I will remove it. On Dec. 18, 2013, 12:59 a.m., Ashutosh Chauhan wrote: ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java, line 283 https://reviews.apache.org/r/16299/diff/2/?file=398472#file398472line283 conf should never be null here. If it is null, then it's a bug. Also, returning null in those cases seems incorrect. Let's remove this null conf check. done On Dec. 18, 2013, 12:59 a.m., Ashutosh Chauhan wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g, line 34 https://reviews.apache.org/r/16299/diff/2/?file=398475#file398475line34 There can never be the case that hiveconf == null. That would be a bug. Let's remove this null check. done On Dec. 18, 2013, 12:59 a.m., Ashutosh Chauhan wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g, line 400 https://reviews.apache.org/r/16299/diff/2/?file=398475#file398475line400 It will be good to document all the places Identifier is used. Can be lifted straight from your html document. done On Dec. 
18, 2013, 12:59 a.m., Ashutosh Chauhan wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g, line 403 https://reviews.apache.org/r/16299/diff/2/?file=398475#file398475line403 Good to add a note here saying QuotedIdentifier is only optionally available for columns as of now. done - Harish --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16299/#review30570 --- On Dec. 18, 2013, 5:42 p.m., Harish Butani wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16299/ --- (Updated Dec. 18, 2013, 5:42 p.m.) Review request for hive, Ashutosh Chauhan and Alan Gates. Bugs: HIVE-6013 https://issues.apache.org/jira/browse/HIVE-6013 Repository: hive-git Description --- Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: Introduce 'standard' quoted identifiers for columns only. At the language level this is turned on by a flag. At the metadata level we relax the constraint on column names. 
Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fa3e048 itests/qtest/pom.xml 971c5d3 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 5b75ef3 ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java eb26e7f ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 321759b ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java dbf3f91 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g ed9917d ql/src/java/org/apache/hadoop/hive/ql/parse/ParseDriver.java 1e6826f ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b9cd65c ql/src/java/org/apache/hadoop/hive/ql/parse/UnparseTranslator.java 8fe2262 ql/src/test/queries/clientnegative/invalid_columns.q f8be8c8 ql/src/test/queries/clientpositive/quotedid_alter.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_basic.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_partition.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_skew.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_smb.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_tblproperty.q PRE-CREATION ql/src/test/results/clientnegative/invalid_columns.q.out 3311b0a ql/src/test/results/clientpositive/quotedid_alter.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_basic.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_partition.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_skew.q.out PRE-CREATION
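The PatternValidator mentioned in the review above can be sketched as follows. This is a hypothetical, simplified version of the idea — a config value is accepted only if it matches one of a fixed set of patterns — not the actual HiveConf implementation:

```java
// Hypothetical simplified sketch of a pattern-based config validator;
// not the actual HiveConf.PatternValidator class.
import java.util.Arrays;
import java.util.regex.Pattern;

public class ValidatorSketch {
    static class PatternValidator {
        private final Pattern[] patterns;
        PatternValidator(String... regexes) {
            patterns = Arrays.stream(regexes)
                             .map(Pattern::compile)
                             .toArray(Pattern[]::new);
        }
        // Returns null when the value is valid, an error message otherwise.
        String validate(String value) {
            for (Pattern p : patterns) {
                if (p.matcher(value).matches()) {
                    return null;
                }
            }
            return "Invalid value: " + value;
        }
    }

    public static void main(String[] args) {
        // e.g. hive.support.quoted.identifiers accepts only "none" or "column".
        PatternValidator v = new PatternValidator("none", "column");
        System.out.println(v.validate("column")); // null: accepted
        System.out.println(v.validate("row"));    // rejected with a message
    }
}
```

Centralizing validation this way rejects a bad value at `set` time instead of letting it silently disable the feature later.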
[jira] [Updated] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-6013: Attachment: HIVE-6013.4.patch Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, HIVE-6013.2.patch, HIVE-6013.3.patch, HIVE-6013.4.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851966#comment-13851966 ] Harish Butani commented on HIVE-6013: - 1. there is a .q that covers this: regex_col.q 2. Oh, yes. I forgot about the jdbc metadata apis. Just looked at the jdbc project. Currently a lot of the methods in the DBMetadata class just throw SQLException(Method not supported). Who should I talk to about this? Can this be done in a follow-up jira? Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, HIVE-6013.2.patch, HIVE-6013.3.patch, HIVE-6013.4.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851972#comment-13851972 ] Ashutosh Chauhan commented on HIVE-6013: 1. But that test case doesn't have {{set hive.support.quoted.identifiers=column;}} 2. Doing jdbc changes in a follow-up is ok. Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, HIVE-6013.2.patch, HIVE-6013.3.patch, HIVE-6013.4.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851982#comment-13851982 ] Harish Butani commented on HIVE-6013: - You mean set hive.support.quoted.identifiers=none; right? When it is set to 'column', it will treat it as a literal. And the query would be: select `a.*` from t1; You need the back-ticks. Otherwise this will not get past the lexer. Since 'none' is the default setting, I thought the existing test was enough. Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, HIVE-6013.2.patch, HIVE-6013.3.patch, HIVE-6013.4.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
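The two interpretations under discussion — `a.*` as a regular expression under the default 'none' versus a literal column name under 'column' — can be illustrated with a small sketch. This is a hypothetical helper, not the actual SemanticAnalyzer logic:

```java
// Hypothetical illustration of the two interpretations of a backticked name;
// not the actual Hive SemanticAnalyzer code.
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class QuotedIdSketch {
    static List<String> resolve(String backtickedName, List<String> columns,
                                boolean quotedIdsAreColumns) {
        if (quotedIdsAreColumns) {
            // hive.support.quoted.identifiers=column: a literal column name.
            return columns.stream()
                          .filter(c -> c.equals(backtickedName))
                          .collect(Collectors.toList());
        }
        // hive.support.quoted.identifiers=none (default): legacy behavior,
        // the backticked name is a regular expression over column names.
        Pattern p = Pattern.compile(backtickedName);
        return columns.stream()
                      .filter(c -> p.matcher(c).matches())
                      .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> cols = List.of("a1", "a2", "b1");
        // select `a.*` from t1;
        System.out.println(resolve("a.*", cols, false)); // regex: matches a1, a2
        System.out.println(resolve("a.*", cols, true));  // literal: no column named "a.*"
    }
}
```

This is why regex_col.q only exercises the default setting: under 'column' the same backticked string stops matching by pattern entirely.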
Hive-trunk-hadoop2 - Build # 611 - Still Failing
Hive-trunk-h0.21 - Build # 2512 - Still Failing
Changes for Build #2473 [brock] HIVE-4741 - Add Hive config API to modify the restrict list (Prasad Mujumdar, Navis via Brock Noland) Changes for Build #2474 [navis] HIVE-5827 : Incorrect location of logs for failed tests (Vikram Dixit K and Szehon Ho via Navis) [thejas] HIVE-4485 : beeline prints null as empty strings (Thejas Nair reviewed by Ashutosh Chauhan) [brock] HIVE-5704 - A couple of generic UDFs are not in the right folder/package (Xuefu Zhang via Brock Noland) [brock] HIVE-5706 - Move a few numeric UDFs to generic implementations (Xuefu Zhang via Brock Noland) [hashutosh] HIVE-5817 : column name to index mapping in VectorizationContext is broken (Remus Rusanu, Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5876 : Split elimination in ORC breaks for partitioned tables (Prasanth J via Ashutosh Chauhan) [hashutosh] HIVE-5886 : [Refactor] Remove unused class JobCloseFeedback (Ashutosh Chauhan via Thejas Nair) [brock] HIVE-5894 - Fix minor PTest2 issues (Brock Noland) Changes for Build #2475 [brock] HIVE-5755 - Fix hadoop2 execution environment Milestone 1 (Vikram Dixit K via Brock Noland) Changes for Build #2476 [xuefu] HIVE-5893: hive-schema-0.13.0.mysql.sql contains reference to nonexistent column (Carl via Xuefu) [xuefu] HIVE-5684: Serde support for char (Jason via Xuefu) Changes for Build #2477 Changes for Build #2478 Changes for Build #2479 Changes for Build #2480 [brock] HIVE-5441 - Async query execution doesn't return resultset status (Prasad Mujumdar via Thejas M Nair) [brock] HIVE-5880 - Rename HCatalog HBase Storage Handler artifact id (Brock Noland reviewed by Prasad Mujumdar) Changes for Build #2481 Changes for Build #2482 [ehans] HIVE-5581: Implement vectorized year/month/day... etc. 
for string arguments (Teddy Choi via Eric Hanson) Changes for Build #2483 [rhbutani] HIVE-5898 Make fetching of column statistics configurable (Prasanth Jayachandran via Harish Butani) Changes for Build #2484 [brock] HIVE-5880 - (Rename HCatalog HBase Storage Handler artifact id) breaks packaging (Xuefu Zhang via Brock Noland) Changes for Build #2485 [xuefu] HIVE-5866: Hive divide operator generates wrong results in certain cases (reviewed by Prasad) [ehans] HIVE-5877: Implement vectorized support for IN as boolean-valued expression (Eric Hanson) Changes for Build #2486 [ehans] HIVE-5895: vectorization handles division by zero differently from normal execution (Sergey Shelukhin via Eric Hanson) [hashutosh] HIVE-5938 : Remove apache.mina dependency for test (Navis via Ashutosh Chauhan) [xuefu] HIVE-5912: Show partition command doesn't support db.table (Yu Zhao via Xuefu) [brock] HIVE-5906 - TestGenericUDFPower should use delta to compare doubles (Szehon Ho via Brock Noland) [brock] HIVE-5855 - Add deprecated methods back to ColumnProjectionUtils (Brock Noland reviewed by Navis) [brock] HIVE-5915 - Shade Kryo dependency (Brock Noland reviewed by Ashutosh Chauhan) Changes for Build #2487 [hashutosh] HIVE-5916 : No need to aggregate statistics collected via counter mechanism (Ashutosh Chauhan via Navis) [xuefu] HIVE-5947: Fix test failure in decimal_udf.q (reviewed by Brock) [thejas] HIVE-5550 : Import fails for tables created with default text, sequence and orc file formats using HCatalog API (Sushanth Sowmyan via Thejas Nair) Changes for Build #2488 [hashutosh] HIVE-5935 : hive.query.string is not provided to FetchTask (Navis via Ashutosh Chauhan) [navis] HIVE-3455 : ANSI CORR(X,Y) is incorrect (Maxim Bolotin via Navis) [hashutosh] HIVE-5921 : Better heuristics for worst case statistics estimates for join, limit and filter operator (Prasanth J via Harish Butani) [rhbutani] HIVE-5899 NPE during explain extended with char/varchar columns (Jason Dere via Harish Butani) 
Changes for Build #2489 [xuefu] HIVE-3181: getDatabaseMajor/Minor version does not return values (Szehon via Xuefu, reviewed by Navis) [brock] HIVE-5641 - BeeLineOpts ignores Throwable (Brock Noland reviewed by Prasad and Thejas) [hashutosh] HIVE-5909 : locate and instr throw java.nio.BufferUnderflowException when empty string as substring (Navis via Ashutosh Chauhan) [hashutosh] HIVE-5686 : partition column type validation doesn't quite work for dates (Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5887 : metastore direct sql doesn't work with oracle (Sergey Shelukhin via Ashutosh Chauhan) Changes for Build #2490 Changes for Build #2491 Changes for Build #2492 [brock] HIVE-5981 - Add hive-unit back to itests pom (Brock Noland reviewed by Prasad) Changes for Build #2493 [xuefu] HIVE-5872: Make UDAFs such as GenericUDAFSum report accurate precision/scale for decimal types (reviewed by Sergey Shelukhin) [hashutosh] HIVE-5978 : Rollups not supported in vector mode. (Jitendra Nath Pandey via Ashutosh Chauhan) [hashutosh] HIVE-5830 : SubQuery: Not In subqueries should check if subquery contains nulls in matching column (Harish Butani
[jira] [Created] (HIVE-6055) Cleanup aisle tez
Gunther Hagleitner created HIVE-6055: Summary: Cleanup aisle tez Key: HIVE-6055 URL: https://issues.apache.org/jira/browse/HIVE-6055 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Some of the past merges have led to some dead code. Need to remove this from the tez branch. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6021) Problem in GroupByOperator for handling distinct aggregations
[ https://issues.apache.org/jira/browse/HIVE-6021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-6021: -- Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks to Sun Rui for the contribution. Problem in GroupByOperator for handling distinct aggregations Key: HIVE-6021 URL: https://issues.apache.org/jira/browse/HIVE-6021 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Reporter: Sun Rui Assignee: Sun Rui Fix For: 0.13.0 Attachments: HIVE-6021.1.patch, HIVE-6021.2.patch Use the following test case with HIVE 0.12: {code:sql} create table src(key int, value string); load data local inpath 'src/data/files/kv1.txt' overwrite into table src; set hive.map.aggr=false; select count(key),count(distinct value) from src group by key; {code} We will get an ArrayIndexOutOfBoundsException from GroupByOperator: {code} java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:485) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 
5 more Caused by: java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:159) ... 10 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:281) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:152) ... 10 more {code} explain select count(key),count(distinct value) from src group by key; {code} STAGE PLANS: Stage: Stage-1 Map Reduce Alias - Map Operator Tree: src TableScan alias: src Select Operator expressions: expr: key type: int expr: value type: string outputColumnNames: key, value Reduce Output Operator key expressions: expr: key type: int expr: value type: string sort order: ++ Map-reduce partition columns: expr: key type: int tag: -1 Reduce Operator Tree: Group By Operator aggregations: expr: count(KEY._col0) // The parameter causes this problem ^^^ expr: count(DISTINCT KEY._col1:0._col0) bucketGroup: false keys: expr: KEY._col0 type: int mode: complete outputColumnNames: _col0, _col1, _col2 Select Operator expressions: expr: _col1 type: bigint expr: _col2 type: bigint outputColumnNames: _col0, _col1 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Stage: Stage-0 Fetch Operator limit: -1 {code} The root cause is within GroupByOperator.initializeOp(). The method fails to handle the case where a query with distinct aggregations has an aggregation function whose parameter is a group-by key column but not a distinct key column. {code} if (unionExprEval != null) { String[] names =
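The mechanics of that failure can be illustrated with a toy Python model (an assumption about the behavior, deliberately simplified; GroupByOperator.initializeOp() is the real Java code): when only the distinct-key columns are registered in the lookup array, a plain group-by key used as an aggregation parameter indexes past the end of it.

```python
# Toy model of the reported bug, NOT Hive's actual code: the operator's
# field lookup holds only the DISTINCT value columns.
distinct_fields = ["KEY._col1:0._col0"]  # column for count(DISTINCT value)

def resolve_aggregation_param(index):
    # Pre-fix behavior in this model: assume every aggregation parameter
    # lives in the distinct-field array. count(KEY._col0)'s parameter is a
    # group-by key, not a distinct key, so its index (1) is out of range.
    return distinct_fields[index]

try:
    resolve_aggregation_param(1)
except IndexError:
    print("analogue of ArrayIndexOutOfBoundsException: 1")
```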
[jira] [Commented] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852018#comment-13852018 ] Hive QA commented on HIVE-6013: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619370/HIVE-6013.4.patch {color:red}ERROR:{color} -1 due to 48 failed/errored test(s), 4795 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_or_replace_view org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_view_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_database_drop org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_describe_formatted_view_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_describe_formatted_view_partitioned_json org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auth org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_empty org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_file_format org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_multiple org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_self_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_unused org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_update org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap2 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_auto org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_auto_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_compression org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_rc org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact_binary_search org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compression org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_creation org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_serde org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_stale org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_stale_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_infer_bucket_sort_convert_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_indexes_edge_cases org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_indexes_syntax org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_view org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_alter_view_as_select_with_partition org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_alter_view_failure4 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_alter_view_failure6 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_create_or_replace_view1 
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_create_or_replace_view2 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_index_bitmap_no_map_aggr org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_index_compact_entry_limit org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_index_compact_size_limit {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/689/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/689/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 48 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12619370 Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013
[jira] [Updated] (HIVE-6055) Cleanup aisle tez
[ https://issues.apache.org/jira/browse/HIVE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6055: - Attachment: HIVE-6055.1.patch Cleanup aisle tez - Key: HIVE-6055 URL: https://issues.apache.org/jira/browse/HIVE-6055 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Attachments: HIVE-6055.1.patch Some of the past merges have led to some dead code. Need to remove this from the tez branch. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5761) Implement vectorized support for the DATE data type
[ https://issues.apache.org/jira/browse/HIVE-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852027#comment-13852027 ] Eric Hanson commented on HIVE-5761: --- Please see my comments on ReviewBoard Implement vectorized support for the DATE data type --- Key: HIVE-5761 URL: https://issues.apache.org/jira/browse/HIVE-5761 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Teddy Choi Attachments: HIVE-5761.1.patch Add support to allow queries referencing DATE columns and expression results to run efficiently in vectorized mode. This should re-use the code for the integer/timestamp types to the extent possible and beneficial. Include unit tests and end-to-end tests. Consider re-using or extending existing end-to-end tests for vectorized integer and/or timestamp operations. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
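The reason DATE can reuse the integer code paths is that a date reduces to a single primitive: days since the Unix epoch. A short Python sketch of that idea (the encoding is standard; the function and variable names here are illustrative, not Hive's classes):

```python
# Sketch: a DATE column stored as days-since-epoch integers, so batch
# operations are plain integer-vector work, as with BIGINT columns.
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)

def to_days_vector(dates):
    # Encode a column of dates as one primitive int per row.
    return [(d - EPOCH).days for d in dates]

def vectorized_year(days_vector):
    # Decode per row; a real vectorized expression would loop over the
    # primitive array without creating per-row objects.
    return [(EPOCH + timedelta(days=n)).year for n in days_vector]

col = to_days_vector([date(2013, 12, 18), date(1969, 12, 31)])
print(col)                   # [16057, -1]
print(vectorized_year(col))  # [2013, 1969]
```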
[jira] [Commented] (HIVE-6045) Beeline hivevars is broken for more than one hivevar
[ https://issues.apache.org/jira/browse/HIVE-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852102#comment-13852102 ] Xuefu Zhang commented on HIVE-6045: --- In addition, please add a test case if it's not already in TestBeelineWithArgs. Beeline hivevars is broken for more than one hivevar Key: HIVE-6045 URL: https://issues.apache.org/jira/browse/HIVE-6045 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-6045.1.patch, HIVE-6045.patch HIVE-4568 introduced --hivevar flag. But if you specify more than one hivevar, for example {code} beeline --hivevar file1=/user/szehon/file1 --hivevar file2=/user/szehon/file2 {code} then the variables during runtime get mangled to evaluate to: {code} file1=/user/szehon/file1file2=/user/szehon/file2 {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6045) Beeline hivevars is broken for more than one hivevar
[ https://issues.apache.org/jira/browse/HIVE-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852101#comment-13852101 ] Xuefu Zhang commented on HIVE-6045: --- [~szehon], I believe this (with '' as separator) used to work fine. Do you know what change broke this? TestBeelineWithArgs used to pass also. I think it makes sense to fix it and include it in the test suite? Beeline hivevars is broken for more than one hivevar Key: HIVE-6045 URL: https://issues.apache.org/jira/browse/HIVE-6045 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-6045.1.patch, HIVE-6045.patch HIVE-4568 introduced --hivevar flag. But if you specify more than one hivevar, for example {code} beeline --hivevar file1=/user/szehon/file1 --hivevar file2=/user/szehon/file2 {code} then the variables during runtime get mangled to evaluate to: {code} file1=/user/szehon/file1file2=/user/szehon/file2 {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
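The class of bug described above can be reproduced in a few lines of Python (a hypothetical illustration, not Beeline's actual code): repeated key=value options collected into a list and then joined with an empty separator run together into one token, which is exactly the mangled output shown in the report.

```python
# Illustration of the mangling: joining repeated --hivevar pairs with no
# separator vs. keeping each pair distinct.
hivevars = ["file1=/user/szehon/file1", "file2=/user/szehon/file2"]

broken = "".join(hivevars)  # values run together, as in the bug report
fixed = dict(v.split("=", 1) for v in hivevars)  # one entry per variable

print(broken)          # file1=/user/szehon/file1file2=/user/szehon/file2
print(fixed["file2"])  # /user/szehon/file2
```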
[jira] [Created] (HIVE-6056) The AvroSerDe gives out BadSchemaException if a partition is added to the table
Rushil Gupta created HIVE-6056: -- Summary: The AvroSerDe gives out BadSchemaException if a partition is added to the table Key: HIVE-6056 URL: https://issues.apache.org/jira/browse/HIVE-6056 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.11.0 Environment: amazon EMR (hadoop Amazon 1.0.3), avro-1.7.5 Reporter: Rushil Gupta While creating an external table if I do not add a partition, I am able to read files using the following format: CREATE external TABLE event ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 's3n://test-event/input/2013/14/10' TBLPROPERTIES ('avro.schema.literal' = '..some schema..'); but if I add a partition based on date CREATE external TABLE event PARTITIONED BY (ds STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 's3n://test-event/input/' TBLPROPERTIES ('avro.schema.literal' = '..some schema..'); ALTER TABLE event ADD IF NOT EXISTS PARTITION (ds = '2013_12_16') LOCATION '2013/12/16/'; I get the following exception: java.io.IOException:org.apache.hadoop.hive.serde2.avro.BadSchemaException -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-6013: Attachment: HIVE-6013.5.patch Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, HIVE-6013.2.patch, HIVE-6013.3.patch, HIVE-6013.4.patch, HIVE-6013.5.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-6013: Status: Patch Available (was: Open) Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, HIVE-6013.2.patch, HIVE-6013.3.patch, HIVE-6013.4.patch, HIVE-6013.5.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-6013: Status: Open (was: Patch Available) Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, HIVE-6013.2.patch, HIVE-6013.3.patch, HIVE-6013.4.patch, HIVE-6013.5.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: Review Request 16299: HIVE-6013: Supporting Quoted Identifiers in Column Names
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16299/ --- (Updated Dec. 18, 2013, 8:38 p.m.) Review request for hive, Ashutosh Chauhan and Alan Gates. Changes --- the null check in HiveUtils.unparseIdentifier is needed. Most existing invocations (for everything other than columns) invoke the old function (line 273), which doesn't take a context object. Bugs: HIVE-6013 https://issues.apache.org/jira/browse/HIVE-6013 Repository: hive-git Description --- Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: Introduce 'standard' quoted identifiers for columns only. At the language level this is turned on by a flag. At the metadata level we relax the constraint on column names. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fa3e048 itests/qtest/pom.xml 971c5d3 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 5b75ef3 ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java eb26e7f ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 321759b ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java dbf3f91 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g ed9917d ql/src/java/org/apache/hadoop/hive/ql/parse/ParseDriver.java 1e6826f ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b9cd65c ql/src/java/org/apache/hadoop/hive/ql/parse/UnparseTranslator.java 8fe2262 ql/src/test/queries/clientnegative/invalid_columns.q f8be8c8 ql/src/test/queries/clientpositive/quotedid_alter.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_basic.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_partition.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_skew.q PRE-CREATION 
ql/src/test/queries/clientpositive/quotedid_smb.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_tblproperty.q PRE-CREATION ql/src/test/results/clientnegative/invalid_columns.q.out 3311b0a ql/src/test/results/clientpositive/quotedid_alter.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_basic.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_partition.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_skew.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_smb.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_tblproperty.q.out PRE-CREATION Diff: https://reviews.apache.org/r/16299/diff/ Testing --- added new tests for create, alter, delete, query with columns containing special characters. Tests start with quotedid Thanks, Harish Butani
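The core of "standard" quoted-identifier unparsing can be sketched in a few lines (an assumption-level sketch: HiveUtils.unparseIdentifier in the patch is the real implementation, and its exact escaping rules may differ). A column name is wrapped in backticks, and any embedded backtick is doubled so the quoted form round-trips:

```python
# Sketch of quoted-identifier unparsing for column names (hypothetical
# simplification of the patch's HiveUtils.unparseIdentifier behavior).
def unparse_identifier(name, quoted_ids_enabled=True):
    if not quoted_ids_enabled:
        # Legacy mode: emit the identifier bare.
        return name
    # Double embedded backticks, then wrap the whole name in backticks.
    return "`" + name.replace("`", "``") + "`"

print(unparse_identifier("x+y"))  # `x+y`
print(unparse_identifier("a`b"))  # `a``b`
```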
[jira] [Resolved] (HIVE-6056) The AvroSerDe gives out BadSchemaException if a partition is added to the table
[ https://issues.apache.org/jira/browse/HIVE-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushil Gupta resolved HIVE-6056. Resolution: Fixed The AvroSerDe gives out BadSchemaException if a partition is added to the table --- Key: HIVE-6056 URL: https://issues.apache.org/jira/browse/HIVE-6056 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.11.0 Environment: amazon EMR (hadoop Amazon 1.0.3), avro-1.7.5 Reporter: Rushil Gupta While creating an external table if I do not add a partition, I am able to read files using the following format: CREATE external TABLE event ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 's3n://test-event/input/2013/14/10' TBLPROPERTIES ('avro.schema.literal' = '..some schema..'); but if I add a partition based on date CREATE external TABLE event PARTITIONED BY (ds STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 's3n://test-event/input/' TBLPROPERTIES ('avro.schema.literal' = '..some schema..'); ALTER TABLE event ADD IF NOT EXISTS PARTITION (ds = '2013_12_16') LOCATION '2013/12/16/'; I get the following exception: java.io.IOException:org.apache.hadoop.hive.serde2.avro.BadSchemaException -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6056) The AvroSerDe gives out BadSchemaException if a partition is added to the table
[ https://issues.apache.org/jira/browse/HIVE-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852157#comment-13852157 ] Rushil Gupta commented on HIVE-6056: This bug is fixed as a part of hive 0.12 (https://issues.apache.org/jira/browse/HIVE-3953) The AvroSerDe gives out BadSchemaException if a partition is added to the table --- Key: HIVE-6056 URL: https://issues.apache.org/jira/browse/HIVE-6056 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.11.0 Environment: amazon EMR (hadoop Amazon 1.0.3), avro-1.7.5 Reporter: Rushil Gupta While creating an external table if I do not add a partition, I am able to read files using the following format: CREATE external TABLE event ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 's3n://test-event/input/2013/14/10' TBLPROPERTIES ('avro.schema.literal' = '..some schema..'); but if I add a partition based on date CREATE external TABLE event PARTITIONED BY (ds STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 's3n://test-event/input/' TBLPROPERTIES ('avro.schema.literal' = '..some schema..'); ALTER TABLE event ADD IF NOT EXISTS PARTITION (ds = '2013_12_16') LOCATION '2013/12/16/'; I get the following exception: java.io.IOException:org.apache.hadoop.hive.serde2.avro.BadSchemaException -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6045) Beeline hivevars is broken for more than one hivevar
[ https://issues.apache.org/jira/browse/HIVE-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852167#comment-13852167 ] Szehon Ho commented on HIVE-6045: - I don't see any immediate changes. Do you remember if the parsing used to expect '' character? I believe that would be the only way it would have worked, but I guess there was never a unit test for this case. I will take a look at effort to move the test to /itests/hive-unit , there are a few tests about null as empty string that seem broken when I ran it, I will have to take a look. Beeline hivevars is broken for more than one hivevar Key: HIVE-6045 URL: https://issues.apache.org/jira/browse/HIVE-6045 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-6045.1.patch, HIVE-6045.patch HIVE-4568 introduced --hivevar flag. But if you specify more than one hivevar, for example {code} beeline --hivevar file1=/user/szehon/file1 --hivevar file2=/user/szehon/file2 {code} then the variables during runtime get mangled to evaluate to: {code} file1=/user/szehon/file1file2=/user/szehon/file2 {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Hive-trunk-h0.21 - Build # 2513 - Still Failing
Changes for Build #2474 [navis] HIVE-5827 : Incorrect location of logs for failed tests (Vikram Dixit K and Szehon Ho via Navis) [thejas] HIVE-4485 : beeline prints null as empty strings (Thejas Nair reviewed by Ashutosh Chauhan) [brock] HIVE-5704 - A couple of generic UDFs are not in the right folder/package (Xuefu Zhang via Brock Noland) [brock] HIVE-5706 - Move a few numeric UDFs to generic implementations (Xuefu Zhang via Brock Noland) [hashutosh] HIVE-5817 : column name to index mapping in VectorizationContext is broken (Remus Rusanu, Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5876 : Split elimination in ORC breaks for partitioned tables (Prasanth J via Ashutosh Chauhan) [hashutosh] HIVE-5886 : [Refactor] Remove unused class JobCloseFeedback (Ashutosh Chauhan via Thejas Nair) [brock] HIVE-5894 - Fix minor PTest2 issues (Brock Noland) Changes for Build #2475 [brock] HIVE-5755 - Fix hadoop2 execution environment Milestone 1 (Vikram Dixit K via Brock Noland) Changes for Build #2476 [xuefu] HIVE-5893: hive-schema-0.13.0.mysql.sql contains reference to nonexistent column (Carl via Xuefu) [xuefu] HIVE-5684: Serde support for char (Jason via Xuefu) Changes for Build #2477 Changes for Build #2478 Changes for Build #2479 Changes for Build #2480 [brock] HIVE-5441 - Async query execution doesn't return resultset status (Prasad Mujumdar via Thejas M Nair) [brock] HIVE-5880 - Rename HCatalog HBase Storage Handler artifact id (Brock Noland reviewed by Prasad Mujumdar) Changes for Build #2481 Changes for Build #2482 [ehans] HIVE-5581: Implement vectorized year/month/day... etc. 
for string arguments (Teddy Choi via Eric Hanson) Changes for Build #2483 [rhbutani] HIVE-5898 Make fetching of column statistics configurable (Prasanth Jayachandran via Harish Butani) Changes for Build #2484 [brock] HIVE-5880 - (Rename HCatalog HBase Storage Handler artifact id) breaks packaging (Xuefu Zhang via Brock Noland) Changes for Build #2485 [xuefu] HIVE-5866: Hive divide operator generates wrong results in certain cases (reviewed by Prasad) [ehans] HIVE-5877: Implement vectorized support for IN as boolean-valued expression (Eric Hanson) Changes for Build #2486 [ehans] HIVE-5895: vectorization handles division by zero differently from normal execution (Sergey Shelukhin via Eric Hanson) [hashutosh] HIVE-5938 : Remove apache.mina dependency for test (Navis via Ashutosh Chauhan) [xuefu] HIVE-5912: Show partition command doesn't support db.table (Yu Zhao via Xuefu) [brock] HIVE-5906 - TestGenericUDFPower should use delta to compare doubles (Szehon Ho via Brock Noland) [brock] HIVE-5855 - Add deprecated methods back to ColumnProjectionUtils (Brock Noland reviewed by Navis) [brock] HIVE-5915 - Shade Kryo dependency (Brock Noland reviewed by Ashutosh Chauhan) Changes for Build #2487 [hashutosh] HIVE-5916 : No need to aggregate statistics collected via counter mechanism (Ashutosh Chauhan via Navis) [xuefu] HIVE-5947: Fix test failure in decimal_udf.q (reviewed by Brock) [thejas] HIVE-5550 : Import fails for tables created with default text, sequence and orc file formats using HCatalog API (Sushanth Sowmyan via Thejas Nair) Changes for Build #2488 [hashutosh] HIVE-5935 : hive.query.string is not provided to FetchTask (Navis via Ashutosh Chauhan) [navis] HIVE-3455 : ANSI CORR(X,Y) is incorrect (Maxim Bolotin via Navis) [hashutosh] HIVE-5921 : Better heuristics for worst case statistics estimates for join, limit and filter operator (Prasanth J via Harish Butani) [rhbutani] HIVE-5899 NPE during explain extended with char/varchar columns (Jason Dere via Harish Butani) 
Changes for Build #2489 [xuefu] HIVE-3181: getDatabaseMajor/Minor version does not return values (Szehon via Xuefu, reviewed by Navis) [brock] HIVE-5641 - BeeLineOpts ignores Throwable (Brock Noland reviewed by Prasad and Thejas) [hashutosh] HIVE-5909 : locate and instr throw java.nio.BufferUnderflowException when empty string as substring (Navis via Ashutosh Chauhan) [hashutosh] HIVE-5686 : partition column type validation doesn't quite work for dates (Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5887 : metastore direct sql doesn't work with oracle (Sergey Shelukhin via Ashutosh Chauhan) Changes for Build #2490 Changes for Build #2491 Changes for Build #2492 [brock] HIVE-5981 - Add hive-unit back to itests pom (Brock Noland reviewed by Prasad) Changes for Build #2493 [xuefu] HIVE-5872: Make UDAFs such as GenericUDAFSum report accurate precision/scale for decimal types (reviewed by Sergey Shelukhin) [hashutosh] HIVE-5978 : Rollups not supported in vector mode. (Jitendra Nath Pandey via Ashutosh Chauhan) [hashutosh] HIVE-5830 : SubQuery: Not In subqueries should check if subquery contains nulls in matching column (Harish Butani via Ashutosh Chauhan) [hashutosh] HIVE-5598 : Remove dummy new line at the end of non-sql commands (Navis via Ashutosh Chauhan) Changes
[jira] [Updated] (HIVE-6006) Add UDF to calculate distance between geographic coordinates
[ https://issues.apache.org/jira/browse/HIVE-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kostiantyn Kudriavtsev updated HIVE-6006: - Status: Open (was: Patch Available) Add UDF to calculate distance between geographic coordinates Key: HIVE-6006 URL: https://issues.apache.org/jira/browse/HIVE-6006 Project: Hive Issue Type: New Feature Components: UDF Affects Versions: 0.13.0 Reporter: Kostiantyn Kudriavtsev Priority: Minor Fix For: 0.13.0 Original Estimate: 336h Remaining Estimate: 336h It would be nice to have a Hive UDF to calculate the distance between two points on Earth. The Haversine formula seems good enough for this purpose. The following function is proposed: HaversineDistance(lat1, lon1, lat2, lon2) - calculate the Haversine distance between 2 points with coordinates (lat1, lon1) and (lat2, lon2) -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6006) Add UDF to calculate distance between geographic coordinates
[ https://issues.apache.org/jira/browse/HIVE-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kostiantyn Kudriavtsev updated HIVE-6006: - Attachment: (was: hive-6006.patch) Add UDF to calculate distance between geographic coordinates Key: HIVE-6006 URL: https://issues.apache.org/jira/browse/HIVE-6006 Project: Hive Issue Type: New Feature Components: UDF Affects Versions: 0.13.0 Reporter: Kostiantyn Kudriavtsev Priority: Minor Fix For: 0.13.0 Original Estimate: 336h Remaining Estimate: 336h It would be nice to have a Hive UDF to calculate the distance between two points on Earth. The Haversine formula seems good enough for this purpose. The following function is proposed: HaversineDistance(lat1, lon1, lat2, lon2) - calculate the Haversine distance between 2 points with coordinates (lat1, lon1) and (lat2, lon2) -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5957) Fix HCatalog Unit tests on Windows
[ https://issues.apache.org/jira/browse/HIVE-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852201#comment-13852201 ] Daniel Dai commented on HIVE-5957: -- Do we still need the change in TestMultiOutputFormat.java? Seems it is for debugging? Fix HCatalog Unit tests on Windows -- Key: HIVE-5957 URL: https://issues.apache.org/jira/browse/HIVE-5957 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-5957.patch org.apache.hcatalog.hbase.TestHBaseHCatStorageHandler fails on Windows. It generates java.lang.IllegalStateException: Failed to setup cluster at org.apache.hcatalog.hbase.ManyMiniCluster.start(ManyMiniCluster.java:119). Digging further there is Please fix invalid configuration for hbase.rootdir file://C:/tmp/build/test/data/test_default_573990410261077827/hbase java.lang.IllegalArgumentException: Wrong FS: file://C:/tmp/build/test/data/test_default_573990410261077827/hbase, expected: file:/// at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:642) This was fixed in HIVE-5015 (9/3/13) and got clobbered by HIVE-5261 (9/12/13). -- This message was sent by Atlassian JIRA (v6.1.4#6159)
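The "Wrong FS: file://C:/..." failure above comes from pasting a Windows path into a URI as file://C:/..., which makes "C:" parse as the URI authority rather than part of the path; the well-formed form uses an empty authority, file:///C:/.... A small sketch of the correct conversion (using Python's pathlib here for illustration, not Hadoop's Path class):

```python
# Build a well-formed file URI from a Windows path: note the three slashes,
# i.e. an empty authority before the drive letter.
from pathlib import PureWindowsPath

root = PureWindowsPath("C:/tmp/build/test/data/hbase")
print(root.as_uri())  # file:///C:/tmp/build/test/data/hbase
```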
[jira] [Commented] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852205#comment-13852205 ] Hive QA commented on HIVE-6013: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619403/HIVE-6013.5.patch {color:green}SUCCESS:{color} +1 4796 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/690/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/690/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12619403 Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, HIVE-6013.2.patch, HIVE-6013.3.patch, HIVE-6013.4.patch, HIVE-6013.5.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions(as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the langauage level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6006) Add UDF to calculate distance between geographic coordinates
[ https://issues.apache.org/jira/browse/HIVE-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kostiantyn Kudriavtsev updated HIVE-6006: - Status: Patch Available (was: Open) Add UDF to calculate distance between geographic coordinates Key: HIVE-6006 URL: https://issues.apache.org/jira/browse/HIVE-6006 Project: Hive Issue Type: New Feature Components: UDF Affects Versions: 0.13.0 Reporter: Kostiantyn Kudriavtsev Priority: Minor Fix For: 0.13.0 Attachments: hive-6006.patch Original Estimate: 336h Remaining Estimate: 336h It would be nice to have a Hive UDF to calculate the distance between two points on Earth. The Haversine formula seems good enough for this purpose. The following function is proposed: HaversineDistance(lat1, lon1, lat2, lon2) - calculates the Haversine distance between 2 points with coordinates (lat1, lon1) and (lat2, lon2) -- This message was sent by Atlassian JIRA (v6.1.4#6159)
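The Haversine computation the ticket proposes can be sketched as follows. This is a minimal illustration of the formula only, not the attached patch; the class and method names are hypothetical, and the real contribution would be packaged as a Hive UDF.

```java
// Illustrative sketch of the great-circle (Haversine) distance proposed in
// HIVE-6006. Class and method names are hypothetical, not Hive's actual API.
public class HaversineSketch {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    // Distance in kilometers between (lat1, lon1) and (lat2, lon2), in degrees.
    public static double haversineDistance(double lat1, double lon1,
                                           double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // Identical points yield zero distance.
        System.out.println(haversineDistance(50.0, 30.0, 50.0, 30.0));
    }
}
```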
[jira] [Updated] (HIVE-6006) Add UDF to calculate distance between geographic coordinates
[ https://issues.apache.org/jira/browse/HIVE-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kostiantyn Kudriavtsev updated HIVE-6006: - Attachment: hive-6006.patch
[jira] [Commented] (HIVE-5957) Fix HCatalog Unit tests on Windows
[ https://issues.apache.org/jira/browse/HIVE-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852240#comment-13852240 ] Eugene Koifman commented on HIVE-5957: -- [~daijy] Not sure I understand. It seems like a proper change.
Hive-trunk-hadoop2 - Build # 612 - Still Failing
Changes for Build #573 [navis] HIVE-5827 : Incorrect location of logs for failed tests (Vikram Dixit K and Szehon Ho via Navis) [thejas] HIVE-4485 : beeline prints null as empty strings (Thejas Nair reviewed by Ashutosh Chauhan) [brock] HIVE-5704 - A couple of generic UDFs are not in the right folder/package (Xuefu Zhang via Brock Noland) [brock] HIVE-5706 - Move a few numeric UDFs to generic implementations (Xuefu Zhang via Brock Noland) [hashutosh] HIVE-5817 : column name to index mapping in VectorizationContext is broken (Remus Rusanu, Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5876 : Split elimination in ORC breaks for partitioned tables (Prasanth J via Ashutosh Chauhan) [hashutosh] HIVE-5886 : [Refactor] Remove unused class JobCloseFeedback (Ashutosh Chauhan via Thejas Nair) [brock] HIVE-5894 - Fix minor PTest2 issues (Brock Noland) Changes for Build #574 [brock] HIVE-5755 - Fix hadoop2 execution environment Milestone 1 (Vikram Dixit K via Brock Noland) Changes for Build #575 [xuefu] HIVE-5893: hive-schema-0.13.0.mysql.sql contains reference to nonexistent column (Carl via Xuefu) [xuefu] HIVE-5684: Serde support for char (Jason via Xuefu) Changes for Build #576 Changes for Build #577 Changes for Build #578 Changes for Build #579 [brock] HIVE-5441 - Async query execution doesn't return resultset status (Prasad Mujumdar via Thejas M Nair) [brock] HIVE-5880 - Rename HCatalog HBase Storage Handler artifact id (Brock Noland reviewed by Prasad Mujumdar) Changes for Build #580 [ehans] HIVE-5581: Implement vectorized year/month/day... etc. 
for string arguments (Teddy Choi via Eric Hanson) Changes for Build #581 [rhbutani] HIVE-5898 Make fetching of column statistics configurable (Prasanth Jayachandran via Harish Butani) Changes for Build #582 [brock] HIVE-5880 - (Rename HCatalog HBase Storage Handler artifact id) breaks packaging (Xuefu Zhang via Brock Noland) Changes for Build #583 [xuefu] HIVE-5866: Hive divide operator generates wrong results in certain cases (reviewed by Prasad) [ehans] HIVE-5877: Implement vectorized support for IN as boolean-valued expression (Eric Hanson) Changes for Build #584 [thejas] HIVE-5550 : Import fails for tables created with default text, sequence and orc file formats using HCatalog API (Sushanth Sowmyan via Thejas Nair) [ehans] HIVE-5895: vectorization handles division by zero differently from normal execution (Sergey Shelukhin via Eric Hanson) [hashutosh] HIVE-5938 : Remove apache.mina dependency for test (Navis via Ashutosh Chauhan) [xuefu] HIVE-5912: Show partition command doesn't support db.table (Yu Zhao via Xuefu) [brock] HIVE-5906 - TestGenericUDFPower should use delta to compare doubles (Szehon Ho via Brock Noland) [brock] HIVE-5855 - Add deprecated methods back to ColumnProjectionUtils (Brock Noland reviewed by Navis) [brock] HIVE-5915 - Shade Kryo dependency (Brock Noland reviewed by Ashutosh Chauhan) Changes for Build #585 [hashutosh] HIVE-5916 : No need to aggregate statistics collected via counter mechanism (Ashutosh Chauhan via Navis) [xuefu] HIVE-5947: Fix test failure in decimal_udf.q (reviewed by Brock) Changes for Build #586 [hashutosh] HIVE-5935 : hive.query.string is not provided to FetchTask (Navis via Ashutosh Chauhan) [navis] HIVE-3455 : ANSI CORR(X,Y) is incorrect (Maxim Bolotin via Navis) [hashutosh] HIVE-5921 : Better heuristics for worst case statistics estimates for join, limit and filter operator (Prasanth J via Harish Butani) [rhbutani] HIVE-5899 NPE during explain extended with char/varchar columns (Jason Dere via Harish Butani) 
Changes for Build #587 [xuefu] HIVE-3181: getDatabaseMajor/Minor version does not return values (Szehon via Xuefu, reviewed by Navis) [brock] HIVE-5641 - BeeLineOpts ignores Throwable (Brock Noland reviewed by Prasad and Thejas) [hashutosh] HIVE-5909 : locate and instr throw java.nio.BufferUnderflowException when empty string as substring (Navis via Ashutosh Chauhan) [hashutosh] HIVE-5686 : partition column type validation doesn't quite work for dates (Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5887 : metastore direct sql doesn't work with oracle (Sergey Shelukhin via Ashutosh Chauhan) Changes for Build #588 Changes for Build #589 Changes for Build #590 [brock] HIVE-5981 - Add hive-unit back to itests pom (Brock Noland reviewed by Prasad) Changes for Build #591 [xuefu] HIVE-5872: Make UDAFs such as GenericUDAFSum report accurate precision/scale for decimal types (reviewed by Sergey Shelukhin) [hashutosh] HIVE-5978 : Rollups not supported in vector mode. (Jitendra Nath Pandey via Ashutosh Chauhan) [hashutosh] HIVE-5830 : SubQuery: Not In subqueries should check if subquery contains nulls in matching column (Harish Butani via Ashutosh Chauhan) [hashutosh] HIVE-5598 : Remove dummy new line at the end of non-sql commands (Navis via Ashutosh Chauhan) Changes for Build #592 [hashutosh] HIVE-5982 :
[jira] [Commented] (HIVE-6006) Add UDF to calculate distance between geographic coordinates
[ https://issues.apache.org/jira/browse/HIVE-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852260#comment-13852260 ] Hive QA commented on HIVE-6006: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619415/hive-6006.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 4796 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_functions org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_haversinedistance {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/691/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/691/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12619415
[jira] [Updated] (HIVE-6006) Add UDF to calculate distance between geographic coordinates
[ https://issues.apache.org/jira/browse/HIVE-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kostiantyn Kudriavtsev updated HIVE-6006: - Status: Open (was: Patch Available)
[jira] [Updated] (HIVE-6006) Add UDF to calculate distance between geographic coordinates
[ https://issues.apache.org/jira/browse/HIVE-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kostiantyn Kudriavtsev updated HIVE-6006: - Attachment: (was: hive-6006.patch)
[jira] [Reopened] (HIVE-6056) The AvroSerDe gives out BadSchemaException if a partition is added to the table
[ https://issues.apache.org/jira/browse/HIVE-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushil Gupta reopened HIVE-6056: Thought this was fixed in hive-0.12. However when I tried that, it failed with the error: java.lang.NoSuchMethodError: org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory.getPrimitiveJavaObjectInspector(Lorg/apache/hadoop/hive/serde2/typeinfo/PrimitiveTypeSpec;)Lorg/apache/hadoop/hive/serde2/objectinspector/primitive/AbstractPrimitiveJavaObjectInspector; The AvroSerDe gives out BadSchemaException if a partition is added to the table --- Key: HIVE-6056 URL: https://issues.apache.org/jira/browse/HIVE-6056 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.11.0 Environment: amazon EMR (hadoop Amazon 1.0.3), avro-1.7.5 Reporter: Rushil Gupta While creating an external table if I do not add a partition, I am able to read files using following format: CREATE external TABLE event ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 's3n://test-event/input/2013/14/10' TBLPROPERTIES ('avro.schema.literal' = '..some schema..'); but if I add a partition based on date CREATE external TABLE event PARTITIONED BY (ds STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 's3n://test-event/input/' TBLPROPERTIES ('avro.schema.literal' = '..some schema..'); ALTER TABLE event ADD IF NOT EXISTS PARTITION (ds = '2013_12_16') LOCATION '2013/12/16/'; I get the following exception: java.io.IOException:org.apache.hadoop.hive.serde2.avro.BadSchemaException -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6056) The AvroSerDe gives out BadSchemaException if a partition is added to the table
[ https://issues.apache.org/jira/browse/HIVE-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852392#comment-13852392 ] Rushil Gupta commented on HIVE-6056: Here is the full stacktrace: at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:95) at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspector(AvroObjectInspectorGenerator.java:81) at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.init(AvroObjectInspectorGenerator.java:55) at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:69) at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:203) at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:260) at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:253) at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:490) at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:518) at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3305) at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:242) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:134) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1326) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1118) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
[jira] [Commented] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852406#comment-13852406 ] Harish Butani commented on HIVE-6013: - Had a discussion with [~ashutoshc]. Leaning towards setting 'hive.support.quoted.identifiers' to support quoted identifiers by default. This is a backward incompatible change. The assumption is that the regex feature with backticks is an obscure feature; it makes more sense to have this feature on by default. Will document the incompatible change. If anybody strongly disagrees with this, please voice your concerns.
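The pre-change behavior this thread discusses — a backticked string in a SELECT list acting as a regular expression over column names rather than as a single identifier — can be illustrated outside Hive. This standalone helper is purely hypothetical and is not Hive's resolver code; it only demonstrates the expansion semantics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustration of Hive's historical backtick semantics in SELECT: the quoted
// string is treated as a regex and expands to every column whose name fully
// matches it. For example `(key|value)` selects both `key` and `value`.
public class ColumnRegexSketch {
    public static List<String> expand(String quoted, List<String> columns) {
        Pattern p = Pattern.compile(quoted);
        List<String> matched = new ArrayList<>();
        for (String col : columns) {
            if (p.matcher(col).matches()) { // full-name match, not substring
                matched.add(col);
            }
        }
        return matched;
    }
}
```

Under the proposed flag, the same backticked string would instead name exactly one column, which is why the change is backward incompatible.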
Re: Review Request 16184: Hive should be able to skip header and footer rows when reading data file for a table (HIVE-5795)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16184/ --- (Updated Dec. 19, 2013, 12:40 a.m.) Review request for hive, Eric Hanson and Thejas Nair. Bugs: hive-5795 https://issues.apache.org/jira/browse/hive-5795 Repository: hive-git Description --- Hive should be able to skip header and footer rows when reading data file for a table (follow up with review https://reviews.apache.org/r/15663/diff/#index_header) Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fa3e048 conf/hive-default.xml.template 1b30d19 data/files/header_footer_table_1/0001.txt PRE-CREATION data/files/header_footer_table_1/0002.txt PRE-CREATION data/files/header_footer_table_1/0003.txt PRE-CREATION data/files/header_footer_table_2/2012/01/01/0001.txt PRE-CREATION data/files/header_footer_table_2/2012/01/02/0002.txt PRE-CREATION data/files/header_footer_table_2/2012/01/03/0003.txt PRE-CREATION itests/qtest/pom.xml 971c5d3 ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java fc9b7e4 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 9afc80b ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java dd5cb6b ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 974a5d6 ql/src/test/org/apache/hadoop/hive/ql/io/TestHiveBinarySearchRecordReader.java 85dd975 ql/src/test/org/apache/hadoop/hive/ql/io/TestSymlinkTextInputFormat.java 0686d9b ql/src/test/queries/clientnegative/file_with_header_footer_negative.q PRE-CREATION ql/src/test/queries/clientpositive/file_with_header_footer.q PRE-CREATION ql/src/test/results/clientnegative/file_with_header_footer_negative.q.out PRE-CREATION ql/src/test/results/clientpositive/file_with_header_footer.q.out PRE-CREATION serde/if/serde.thrift 2ceb572 serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/serdeConstants.java 22a6168 Diff: https://reviews.apache.org/r/16184/diff/ Testing --- Thanks, Shuaishuai Nie
[jira] [Updated] (HIVE-5795) Hive should be able to skip header and footer rows when reading data file for a table
[ https://issues.apache.org/jira/browse/HIVE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaishuai Nie updated HIVE-5795: - Attachment: HIVE-5795.3.patch Hive should be able to skip header and footer rows when reading data file for a table - Key: HIVE-5795 URL: https://issues.apache.org/jira/browse/HIVE-5795 Project: Hive Issue Type: Bug Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Attachments: HIVE-5795.1.patch, HIVE-5795.2.patch, HIVE-5795.3.patch Hive should be able to skip header and footer lines when reading a data file for a table. This way, users don't need to preprocess data generated by other applications with a header or footer, and can use the file directly for table operations. To implement this, the idea is to add new table properties that define the number of header and footer lines, and to skip those lines when reading records from the record reader. A DDL example for creating a table with a header and footer: {code} Create external table testtable (name string, message string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/testtable' tblproperties ('skip.header.number'='1', 'skip.footer.number'='2'); {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
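The skip behavior the ticket describes can be sketched as a simple slice over a file's lines: drop the first N header lines and the last M footer lines before handing rows on. This is a simplified standalone illustration, not the actual record-reader change in the patch.

```java
import java.util.List;

// Simplified sketch of the HIVE-5795 idea: given all lines of a data file,
// drop the configured number of header and footer lines. The real patch does
// this inside Hive's record reader; this helper is only illustrative.
public class SkipHeaderFooterSketch {
    public static List<String> skip(List<String> lines, int header, int footer) {
        int from = Math.min(header, lines.size());            // skip header rows
        int to = Math.max(from, lines.size() - footer);       // trim footer rows
        return lines.subList(from, to);
    }
}
```

Note that footer skipping is the harder part in practice: a streaming reader cannot know a line is within the last M lines until it has read ahead, which is why the real implementation buffers records rather than slicing a list.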
Re: Incompatible Changes affecting Serdes and UDFS
Incompatible in what sense, or with what -- previous releases? Thanks. -- Lefty On Tue, Dec 17, 2013 at 7:06 AM, Brock Noland br...@cloudera.com wrote: Hi, Hive 0.12 made some incompatible changes which impacts Serdes and it appears 0.13 makes more incompatible changes. I created HIVE-6043 to track this, if you know of any more changes than what is described there, please do add them. Thanks! Brock
[jira] [Updated] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-6013: Status: Open (was: Patch Available)
[jira] [Updated] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-6013: Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-6013: Attachment: HIVE-6013.6.patch
[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-3159: - Status: Open (was: Patch Available) Update AvroSerde to determine schema of new tables -- Key: HIVE-3159 URL: https://issues.apache.org/jira/browse/HIVE-3159 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Jakob Homan Assignee: Mohammad Kamrul Islam Attachments: HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159v1.patch Currently when writing tables to Avro one must manually provide an Avro schema that matches what is being delivered by Hive. It'd be better to have the serde infer this schema by converting the table's TypeInfo into an appropriate AvroSchema. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5795) Hive should be able to skip header and footer rows when reading data file for a table
[ https://issues.apache.org/jira/browse/HIVE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852448#comment-13852448 ] Eric Hanson commented on HIVE-5795: --- The coding and comment style looks good now. Thanks. Did you accidentally pick up some changes to conf/hive-default.xml.template? There are many capitalization and spelling changes in there. Maybe you need to diff again from a clean latest version of trunk.
[jira] [Updated] (HIVE-4216) TestHBaseMinimrCliDriver throws weird error with HBase 0.94.5 and Hadoop 23 and test is stuck infinitely
[ https://issues.apache.org/jira/browse/HIVE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-4216: - Attachment: HIVE-4216.1.patch Cool, that did allow the test case to pass. Attaching patch, which also updates hbase_bulk.m to set the right properties settings and to fix some mask issues. TestHBaseMinimrCliDriver throws weird error with HBase 0.94.5 and Hadoop 23 and test is stuck infinitely Key: HIVE-4216 URL: https://issues.apache.org/jira/browse/HIVE-4216 Project: Hive Issue Type: Bug Components: StorageHandler Affects Versions: 0.9.0 Reporter: Viraj Bhat Attachments: HIVE-4216.1.patch After upgrading to Hadoop 23 and HBase 0.94.5 compiled for Hadoop 23. The TestHBaseMinimrCliDriver, fails after performing the following steps Update hbase_bulk.m with the following properties set mapreduce.totalorderpartitioner.naturalorder=false; set mapreduce.totalorderpartitioner.path=/tmp/hbpartition.lst; Otherwise I keep seeing: _partition.lst not found exception in the mappers, even though set total.order.partitioner.path=/tmp/hbpartition.lst is set. When the test runs, the 3 reducer phase of the second query fails with the following error, but the MiniMRCluster keeps spinning up new reducer and the test is stuck infinitely. 
{code} insert overwrite table hbsort select distinct value, case when key=103 then cast(null as string) else key end, case when key=103 then '' else cast(key+1 as string) end from src cluster by value; {code} The stack trace I see in the syslog for the Node Manager is the following: == 13-03-20 16:26:48,942 FATAL [IPC Server handler 17 on 55996] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1363821864968_0003_r_02_0 - exited : java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_200},value:{_col0:val_200,_col1:200,_col2:201.0},alias:0} at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:268) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:448) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:399) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_200},value:{_col0:val_200,_col1:200,_col2:201.0},alias:0} at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:256) ... 
7 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:237) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:477) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:525) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:247) ... 7 more Caused by: java.lang.NullPointerException at org.apache.hadoop.mapreduce.TaskID$CharTaskTypeMaps.getRepresentingCharacter(TaskID.java:265) at org.apache.hadoop.mapreduce.TaskID.appendTo(TaskID.java:153) at org.apache.hadoop.mapreduce.TaskAttemptID.appendTo(TaskAttemptID.java:119) at org.apache.hadoop.mapreduce.TaskAttemptID.toString(TaskAttemptID.java:151) at java.lang.String.valueOf(String.java:2826) at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.getTaskAttemptPath(FileOutputCommitter.java:209) at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.init(FileOutputCommitter.java:69) at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat.getRecordWriter(HFileOutputFormat.java:90) at org.apache.hadoop.hive.hbase.HiveHFileOutputFormat.getFileWriter(HiveHFileOutputFormat.java:67)
[jira] [Updated] (HIVE-4216) TestHBaseMinimrCliDriver throws weird error with HBase 0.94.5 and Hadoop 23 and test is stuck infinitely
[ https://issues.apache.org/jira/browse/HIVE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-4216: - Status: Patch Available (was: Open)
[jira] [Commented] (HIVE-4216) TestHBaseMinimrCliDriver throws weird error with HBase 0.94.5 and Hadoop 23 and test is stuck infinitely
[ https://issues.apache.org/jira/browse/HIVE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852454#comment-13852454 ] Viraj Bhat commented on HIVE-4216: -- Hi Jason, Andrey and Sheng, Thanks for finding the issue and fixing it in the patch. Let me test the patch and run it on our test clusters. Viraj
[jira] [Updated] (HIVE-4216) TestHBaseMinimrCliDriver throws weird error with HBase 0.94.5 and Hadoop 23 and test is stuck infinitely
[ https://issues.apache.org/jira/browse/HIVE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated HIVE-4216: - Environment: Hadoop 23.X Affects Version/s: 0.11.0, 0.12.0 Fix Version/s: 0.13.0
[jira] [Commented] (HIVE-5795) Hive should be able to skip header and footer rows when reading data file for a table
[ https://issues.apache.org/jira/browse/HIVE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852467#comment-13852467 ] Hive QA commented on HIVE-5795: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619445/HIVE-5795.3.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 4794 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_file_with_header_footer org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_file_with_header_footer_negative {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/692/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/692/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12619445 Hive should be able to skip header and footer rows when reading data file for a table - Key: HIVE-5795 URL: https://issues.apache.org/jira/browse/HIVE-5795 Project: Hive Issue Type: Bug Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Attachments: HIVE-5795.1.patch, HIVE-5795.2.patch, HIVE-5795.3.patch Hive should be able to skip header and footer lines when reading a data file for a table. This way, users don't need to preprocess data generated by other applications with a header or footer, and can use the file directly for table operations. To implement this, the idea is to add new table properties that define the number of header and footer lines, and to skip those lines when reading records from the record reader. 
A DDL example for creating a table with header and footer looks like this: {code} Create external table testtable (name string, message string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/testtable' tblproperties (skip.header.number=1, skip.footer.number=2); {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
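The proposed record-reader behavior can be sketched outside Hive. The Python generator below is illustrative only (the name and signature are not from the patch): it drops the first header_count lines up front and holds back footer_count lines so the trailing footer is never emitted.

```python
def skip_header_footer(lines, header_count, footer_count):
    """Yield data rows from an iterable of lines, dropping the first
    header_count lines and the last footer_count lines."""
    it = iter(lines)
    # Skip the header lines outright.
    for _ in range(header_count):
        next(it, None)
    # Buffer footer_count lines; only emit a line once we are sure
    # it cannot be part of the footer.
    buffer = []
    for line in it:
        buffer.append(line)
        if len(buffer) > footer_count:
            yield buffer.pop(0)
```

With header_count=1 and footer_count=2, a file of lines ['h', 'a', 'b', 'c', 'f1', 'f2'] yields just ['a', 'b', 'c'].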
[jira] [Updated] (HIVE-6052) metastore JDO filter pushdown for integers may produce unexpected results with non-normalized integer columns
[ https://issues.apache.org/jira/browse/HIVE-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6052: --- Attachment: HIVE-6052.01.patch updated patch metastore JDO filter pushdown for integers may produce unexpected results with non-normalized integer columns - Key: HIVE-6052 URL: https://issues.apache.org/jira/browse/HIVE-6052 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6052.01.patch, HIVE-6052.patch If integer partition columns have values stored in non-canonical form, for example with leading zeroes, the integer filter doesn't work. That is because JDO pushdown uses substrings to compare for equality, and SQL pushdown is intentionally crippled to do the same to produce the same results. Since both SQL pushdown and integer pushdown are just perf optimizations, we can probably remove it for JDO (or make it configurable and disabled by default), and uncripple SQL.
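The failure mode is easy to reproduce outside the metastore. Below, string_equal is an illustrative stand-in for the pushdown's string-based equality check (not the actual metastore code): a partition value stored with a leading zero no longer matches its numeric filter, while numeric comparison would.

```python
def string_equal(partition_value, filter_value):
    # Stand-in for the pushdown's string-based equality check.
    return partition_value == str(filter_value)

# Canonical form matches; the same logical value with a leading zero does not.
canonical = string_equal("11", 11)   # matches
padded = string_equal("011", 11)     # does not match
numeric = int("011") == 11           # numeric comparison still matches
```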
[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-3159: Status: Patch Available (was: Open) Update AvroSerde to determine schema of new tables -- Key: HIVE-3159 URL: https://issues.apache.org/jira/browse/HIVE-3159 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Jakob Homan Assignee: Mohammad Kamrul Islam Attachments: HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159.6.patch, HIVE-3159v1.patch Currently when writing tables to Avro one must manually provide an Avro schema that matches what is being delivered by Hive. It'd be better to have the serde infer this schema by converting the table's TypeInfo into an appropriate AvroSchema.
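The requested inference amounts to mapping the table's column types onto an Avro record schema. A minimal sketch under stated assumptions: hive_to_avro and its primitive table are hypothetical, covering only a few primitive types; a real implementation must also handle complex types, unions, and nullability.

```python
# Hypothetical Hive-primitive-to-Avro-type table (illustrative subset).
PRIMITIVES = {
    "string": "string",
    "int": "int",
    "bigint": "long",
    "float": "float",
    "double": "double",
    "boolean": "boolean",
}

def hive_to_avro(columns, record_name="hive_table"):
    """Build an Avro record schema (as a dict) from (name, hive_type) pairs."""
    return {
        "type": "record",
        "name": record_name,
        "fields": [{"name": n, "type": PRIMITIVES[t]} for n, t in columns],
    }
```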
[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-3159: Attachment: HIVE-3159.6.patch
Re: Review Request 16339: HIVE-6052 metastore JDO filter pushdown for integers may produce unexpected results with non-normalized integer columns
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16339/ --- (Updated Dec. 19, 2013, 2:17 a.m.) Review request for hive and Ashutosh Chauhan. Repository: hive-git Description --- see JIRA Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fa3e048 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java a98d9d1 metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 04d399f metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java 93e9942 ql/src/test/queries/clientpositive/alter_partition_coltype.q 5479afb ql/src/test/queries/clientpositive/annotate_stats_part.q 83510e3 ql/src/test/queries/clientpositive/dynamic_partition_skip_default.q 397a220 ql/src/test/results/clientpositive/alter_partition_coltype.q.out 27b1fbc ql/src/test/results/clientpositive/annotate_stats_part.q.out 87fb980 ql/src/test/results/clientpositive/dynamic_partition_skip_default.q.out baee525 Diff: https://reviews.apache.org/r/16339/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Updated] (HIVE-5957) Fix HCatalog Unit tests on Windows
[ https://issues.apache.org/jira/browse/HIVE-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-5957: - Status: Open (was: Patch Available) Fix HCatalog Unit tests on Windows -- Key: HIVE-5957 URL: https://issues.apache.org/jira/browse/HIVE-5957 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-5957.patch org.apache.hcatalog.hbase.TestHBaseHCatStorageHandler fails on Windows. It generates java.lang.IllegalStateException: Failed to setup cluster at org.apache.hcatalog.hbase.ManyMiniCluster.start(ManyMiniCluster.java:119). Digging further, there is: Please fix invalid configuration for hbase.rootdir file://C:/tmp/build/test/data/test_default_573990410261077827/hbase java.lang.IllegalArgumentException: Wrong FS: file://C:/tmp/build/test/data/test_default_573990410261077827/hbase, expected: file:/// at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:642) This was fixed in HIVE-5015 (9/3/13) and got clobbered by HIVE-5261 (9/12/13).
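The Wrong FS error comes from gluing file:// directly onto a raw Windows path, so the drive letter C: is parsed as a URI authority instead of part of the path. The Python sketch below merely illustrates the two URI shapes involved; the actual fix lives in the test setup, not in a helper like this.

```python
from pathlib import PureWindowsPath
from urllib.parse import urlparse

def windows_path_to_uri(path):
    """Render an absolute Windows path as a well-formed file:/// URI."""
    return PureWindowsPath(path).as_uri()

# Naive concatenation turns the drive letter into a URI host ("C:"):
broken = urlparse("file://C:/tmp/hbase")
# The three-slash form keeps the authority empty, as a local FS expects:
fixed = urlparse(windows_path_to_uri(r"C:\tmp\hbase"))
```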
[jira] [Updated] (HIVE-5957) Fix HCatalog Unit tests on Windows
[ https://issues.apache.org/jira/browse/HIVE-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-5957: - Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-5957) Fix HCatalog Unit tests on Windows
[ https://issues.apache.org/jira/browse/HIVE-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-5957: - Attachment: HIVE-5957.2.patch HIVE-5957.2.patch to address Daniel's comments
[jira] [Updated] (HIVE-5795) Hive should be able to skip header and footer rows when reading data file for a table
[ https://issues.apache.org/jira/browse/HIVE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaishuai Nie updated HIVE-5795: - Attachment: (was: HIVE-5795.3.patch)
Re: Review Request 15654: Rewrite Trim and Pad UDFs based on GenericUDF
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15654/#review30669 --- ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseTrim.java https://reviews.apache.org/r/15654/#comment58764 Would you please further explain on this? preferably with an example. - Mohammad Islam On Dec. 18, 2013, 3:16 a.m., Mohammad Islam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15654/ --- (Updated Dec. 18, 2013, 3:16 a.m.) Review request for hive, Ashutosh Chauhan, Carl Steinbach, and Jitendra Pandey. Bugs: HIVE-5829 https://issues.apache.org/jira/browse/HIVE-5829 Repository: hive-git Description --- Rewrite the UDFS *pads and *trim using GenericUDF. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java a895d65 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java bca1f26 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLTrim.java dc00cf9 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLpad.java d1da19a ql/src/java/org/apache/hadoop/hive/ql/udf/UDFRTrim.java 2bcc5fa ql/src/java/org/apache/hadoop/hive/ql/udf/UDFRpad.java 9652ce2 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFTrim.java 490886d ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBasePad.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLpad.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRpad.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTrim.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java eff251f ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFLTrim.java PRE-CREATION 
ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFLpad.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFRTrim.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFRpad.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFTrim.java PRE-CREATION Diff: https://reviews.apache.org/r/15654/diff/ Testing --- Thanks, Mohammad Islam
[jira] [Updated] (HIVE-5795) Hive should be able to skip header and footer rows when reading data file for a table
[ https://issues.apache.org/jira/browse/HIVE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaishuai Nie updated HIVE-5795: - Attachment: HIVE-5795.3.patch
Re: Review Request 15654: Rewrite Trim and Pad UDFs based on GenericUDF
On Dec. 18, 2013, 5:37 a.m., Xuefu Zhang wrote: ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java, line 774 https://reviews.apache.org/r/15654/diff/4/?file=399245#file399245line774 I don't think we need the bridge udf for generic UDFs. This change is only to replace UDFLTrim with new GenericUDFLTrim used in this test case. Generic bridge UDF is already there. On Dec. 18, 2013, 5:37 a.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBasePad.java, line 109 https://reviews.apache.org/r/15654/diff/4/?file=399238#file399238line109 I'm not sure if this is intentional, but the logic here means that any of the three input can have a type of INT. If INT is okay, then why not BYTE, SHORT, or LONG? It's probably better to check each argument's type separately. will do. On Dec. 18, 2013, 5:37 a.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBasePad.java, line 48 https://reviews.apache.org/r/15654/diff/4/?file=399238#file399238line48 Msg doesn't match the if condition. will correct. - Mohammad --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15654/#review30604 --- On Dec. 18, 2013, 3:16 a.m., Mohammad Islam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15654/ --- (Updated Dec. 18, 2013, 3:16 a.m.) Review request for hive, Ashutosh Chauhan, Carl Steinbach, and Jitendra Pandey. Bugs: HIVE-5829 https://issues.apache.org/jira/browse/HIVE-5829 Repository: hive-git Description --- Rewrite the UDFS *pads and *trim using GenericUDF. 
Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java a895d65 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java bca1f26 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLTrim.java dc00cf9 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLpad.java d1da19a ql/src/java/org/apache/hadoop/hive/ql/udf/UDFRTrim.java 2bcc5fa ql/src/java/org/apache/hadoop/hive/ql/udf/UDFRpad.java 9652ce2 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFTrim.java 490886d ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBasePad.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLpad.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRpad.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTrim.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java eff251f ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFLTrim.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFLpad.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFRTrim.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFRpad.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFTrim.java PRE-CREATION Diff: https://reviews.apache.org/r/15654/diff/ Testing --- Thanks, Mohammad Islam
Re: Review Request 15654: Rewrite Trim and Pad UDFs based on GenericUDF
On Dec. 18, 2013, 10:58 a.m., Jason Dere wrote: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBasePad.java, line 78 https://reviews.apache.org/r/15654/diff/4/?file=399238#file399238line78 Having gone through some pain with Hive on Windows, the bytes returned by String.getBytes() will not be in utf-8 if the default encoding is something other than utf-8. Would be safer here to either use getBytes(UTF-8), or Text.encode() if you want to get bytes from the string. Or just do the padding as Strings. str is of type Text. It doesn't have getBytes(UTF-8), only getBytes(). - Mohammad --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15654/#review30610 --- On Dec. 18, 2013, 3:16 a.m., Mohammad Islam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15654/ --- (Updated Dec. 18, 2013, 3:16 a.m.) Review request for hive, Ashutosh Chauhan, Carl Steinbach, and Jitendra Pandey. Bugs: HIVE-5829 https://issues.apache.org/jira/browse/HIVE-5829 Repository: hive-git Description --- Rewrite the UDFs *pad and *trim using GenericUDF. 
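Jason's encoding concern can be reproduced with plain JDK calls: String.getBytes() with no argument uses the JVM's default charset, so its result is platform-dependent, while passing StandardCharsets.UTF_8 explicitly is deterministic (Hadoop's static Text.encode(String) is another UTF-8 option). A minimal sketch, independent of the Hive code under review:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PadBytes {
    // String.getBytes() depends on the platform default charset; the same
    // string can yield different byte sequences on different machines.
    // An explicit charset makes the encoding deterministic.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);  // always UTF-8
    }

    public static void main(String[] args) {
        String s = "héllo";  // 'é' is 2 bytes in UTF-8, 1 byte in ISO-8859-1
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("default bytes:   " + s.getBytes().length);  // varies by platform
        System.out.println("utf-8 bytes:     " + utf8(s).length);       // 6 on every JVM
    }
}
```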
Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java a895d65 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java bca1f26 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLTrim.java dc00cf9 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLpad.java d1da19a ql/src/java/org/apache/hadoop/hive/ql/udf/UDFRTrim.java 2bcc5fa ql/src/java/org/apache/hadoop/hive/ql/udf/UDFRpad.java 9652ce2 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFTrim.java 490886d ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBasePad.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLpad.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRpad.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTrim.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java eff251f ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFLTrim.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFLpad.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFRTrim.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFRpad.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFTrim.java PRE-CREATION Diff: https://reviews.apache.org/r/15654/diff/ Testing --- Thanks, Mohammad Islam
[jira] [Updated] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-5829: Attachment: (was: HIVE-5829.3.patch) Rewrite Trim and Pad UDFs based on GenericUDF - Key: HIVE-5829 URL: https://issues.apache.org/jira/browse/HIVE-5829 Project: Hive Issue Type: Bug Components: UDF Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: HIVE-5829.1.patch, HIVE-5829.2.patch, tmp.HIVE-5829.patch This JIRA includes the following UDFs: 1. trim() 2. ltrim() 3. rtrim() 4. lpad() 5. rpad() -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-5829: Attachment: HIVE-5829.3.patch Rewrite Trim and Pad UDFs based on GenericUDF - Key: HIVE-5829 URL: https://issues.apache.org/jira/browse/HIVE-5829 Project: Hive Issue Type: Bug Components: UDF Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: HIVE-5829.1.patch, HIVE-5829.2.patch, HIVE-5829.3.patch, tmp.HIVE-5829.patch This JIRA includes the following UDFs: 1. trim() 2. ltrim() 3. rtrim() 4. lpad() 5. rpad() -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-5829: Status: Patch Available (was: Open) Rewrite Trim and Pad UDFs based on GenericUDF - Key: HIVE-5829 URL: https://issues.apache.org/jira/browse/HIVE-5829 Project: Hive Issue Type: Bug Components: UDF Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: HIVE-5829.1.patch, HIVE-5829.2.patch, HIVE-5829.3.patch, tmp.HIVE-5829.patch This JIRA includes the following UDFs: 1. trim() 2. ltrim() 3. rtrim() 4. lpad() 5. rpad() -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: Review Request 16329: HIVE-6039: Round, AVG and SUM functions reject char/varch input while accepting string input
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16329/#review30672 --- Looks good to go. very minor comments added. ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java https://reviews.apache.org/r/16329/#comment58769 Will it be better to explicitly add char and varchar as accepted type in the message? ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java https://reviews.apache.org/r/16329/#comment58770 same here. ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRound.java https://reviews.apache.org/r/16329/#comment58771 Same. more informative message could be helpful. - Mohammad Islam On Dec. 17, 2013, 9:26 p.m., Xuefu Zhang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16329/ --- (Updated Dec. 17, 2013, 9:26 p.m.) Review request for hive and Prasad Mujumdar. Bugs: HIVE-6039 https://issues.apache.org/jira/browse/HIVE-6039 Repository: hive-git Description --- Allow input to these UDFs for char and varchar. Diffs - data/files/char_varchar_udf.txt PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 4b219bd ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java 41d5efd ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRound.java fc9c1b2 ql/src/test/queries/clientpositive/char_varchar_udf.q PRE-CREATION ql/src/test/results/clientpositive/char_varchar_udf.q.out PRE-CREATION Diff: https://reviews.apache.org/r/16329/diff/ Testing --- Unit tested. New test added. Test suite passed. Thanks, Xuefu Zhang
[jira] [Commented] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852539#comment-13852539 ] Hive QA commented on HIVE-6013: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619449/HIVE-6013.6.patch {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 4796 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ambiguous_col org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_reordering_values org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_escape_clusterby1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_escape_distributeby1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_escape_orderby1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_escape_sortby1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_auto org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quote1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_tablestatus org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_index org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_ambiguous_col1 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_ambiguous_col2 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_invalidate_view1 {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/693/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/693/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is 
automatically generated. ATTACHMENT ID: 12619449 Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, HIVE-6013.2.patch, HIVE-6013.3.patch, HIVE-6013.4.patch, HIVE-6013.5.patch, HIVE-6013.6.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. A Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Current behavior and a proposed solution are documented in the attached doc. Summary of the solution: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5992) Hive inconsistently converts timestamp in AVG and SUM UDAF's
[ https://issues.apache.org/jira/browse/HIVE-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852540#comment-13852540 ] Mohammad Kamrul Islam commented on HIVE-5992: - Looks good. +1 Hive inconsistently converts timestamp in AVG and SUM UDAF's Key: HIVE-5992 URL: https://issues.apache.org/jira/browse/HIVE-5992 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-5992.patch {code} hive select t, sum(t), count(*), sum(t)/count(*), avg(t) from ts group by t; ... OK 1977-03-15 12:34:22.345678 227306062 1 227306062 2.27306062345678E8 {code} As it can be seen, timestamp value (1977-03-15 12:34:22.345678) is converted with fractional part ignored in sum, while preserved in avg. As a further result, sum()/count() is not equivalent to avg. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
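The behavior in the HIVE-5992 output is what one would expect if sum() accumulates the timestamp as an integral number of seconds while avg() works on a double. This is an illustrative assumption about the mechanism, not code from the patch; the 2.27306062345678E8 value is taken from the query output quoted above:

```java
public class TimestampFraction {
    public static void main(String[] args) {
        // Seconds since epoch for 1977-03-15 12:34:22.345678,
        // as printed by the avg() column in the JIRA description.
        double seconds = 2.27306062345678E8;

        long asLong = (long) seconds;   // fractional seconds truncated (sum-like)
        double asDouble = seconds;      // fractional seconds preserved (avg-like)

        System.out.println(asLong);     // 227306062, matching the sum(t) column
        System.out.println(asDouble);   // 2.27306062345678E8, matching avg(t)
    }
}
```

The truncation means sum(t)/count(*) and avg(t) disagree whenever any timestamp has a nonzero fractional part.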
[jira] [Commented] (HIVE-4216) TestHBaseMinimrCliDriver throws weird error with HBase 0.94.5 and Hadoop 23 and test is stuck infinitely
[ https://issues.apache.org/jira/browse/HIVE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852578#comment-13852578 ] Sheng Liu commented on HIVE-4216: - Hi! Thanks all for finding the issue and fixing it. I find that the path in the patch, shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java, does not exist in releases 0.11.0 and 0.12.0; should it be shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java? TestHBaseMinimrCliDriver throws weird error with HBase 0.94.5 and Hadoop 23 and test is stuck infinitely Key: HIVE-4216 URL: https://issues.apache.org/jira/browse/HIVE-4216 Project: Hive Issue Type: Bug Components: StorageHandler Affects Versions: 0.9.0, 0.11.0, 0.12.0 Environment: Hadoop 23.X Reporter: Viraj Bhat Fix For: 0.13.0 Attachments: HIVE-4216.1.patch After upgrading to Hadoop 23 and HBase 0.94.5 compiled for Hadoop 23. The TestHBaseMinimrCliDriver, fails after performing the following steps Update hbase_bulk.m with the following properties set mapreduce.totalorderpartitioner.naturalorder=false; set mapreduce.totalorderpartitioner.path=/tmp/hbpartition.lst; Otherwise I keep seeing: _partition.lst not found exception in the mappers, even though set total.order.partitioner.path=/tmp/hbpartition.lst is set. When the test runs, the 3 reducer phase of the second query fails with the following error, but the MiniMRCluster keeps spinning up new reducer and the test is stuck infinitely. 
{code} insert overwrite table hbsort select distinct value, case when key=103 then cast(null as string) else key end, case when key=103 then '' else cast(key+1 as string) end from src cluster by value; {code} The stack trace I see in the syslog for the Node Manager is the following: == 13-03-20 16:26:48,942 FATAL [IPC Server handler 17 on 55996] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1363821864968_0003_r_02_0 - exited : java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_200},value:{_col0:val_200,_col1:200,_col2:201.0},alias:0} at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:268) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:448) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:399) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_200},value:{_col0:val_200,_col1:200,_col2:201.0},alias:0} at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:256) ... 
7 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:237) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:477) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:525) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:247) ... 7 more Caused by: java.lang.NullPointerException at org.apache.hadoop.mapreduce.TaskID$CharTaskTypeMaps.getRepresentingCharacter(TaskID.java:265) at org.apache.hadoop.mapreduce.TaskID.appendTo(TaskID.java:153) at org.apache.hadoop.mapreduce.TaskAttemptID.appendTo(TaskAttemptID.java:119) at org.apache.hadoop.mapreduce.TaskAttemptID.toString(TaskAttemptID.java:151) at java.lang.String.valueOf(String.java:2826) at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.getTaskAttemptPath(FileOutputCommitter.java:209) at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.init(FileOutputCommitter.java:69)
[jira] [Commented] (HIVE-4216) TestHBaseMinimrCliDriver throws weird error with HBase 0.94.5 and Hadoop 23 and test is stuck infinitely
[ https://issues.apache.org/jira/browse/HIVE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852577#comment-13852577 ] Hive QA commented on HIVE-4216: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619451/HIVE-4216.1.patch {color:green}SUCCESS:{color} +1 4792 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/694/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/694/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12619451 TestHBaseMinimrCliDriver throws weird error with HBase 0.94.5 and Hadoop 23 and test is stuck infinitely Key: HIVE-4216 URL: https://issues.apache.org/jira/browse/HIVE-4216 Project: Hive Issue Type: Bug Components: StorageHandler Affects Versions: 0.9.0, 0.11.0, 0.12.0 Environment: Hadoop 23.X Reporter: Viraj Bhat Fix For: 0.13.0 Attachments: HIVE-4216.1.patch After upgrading to Hadoop 23 and HBase 0.94.5 compiled for Hadoop 23. The TestHBaseMinimrCliDriver, fails after performing the following steps Update hbase_bulk.m with the following properties set mapreduce.totalorderpartitioner.naturalorder=false; set mapreduce.totalorderpartitioner.path=/tmp/hbpartition.lst; Otherwise I keep seeing: _partition.lst not found exception in the mappers, even though set total.order.partitioner.path=/tmp/hbpartition.lst is set. When the test runs, the 3 reducer phase of the second query fails with the following error, but the MiniMRCluster keeps spinning up new reducer and the test is stuck infinitely. 
{code} insert overwrite table hbsort select distinct value, case when key=103 then cast(null as string) else key end, case when key=103 then '' else cast(key+1 as string) end from src cluster by value; {code} The stack trace I see in the syslog for the Node Manager is the following: == 13-03-20 16:26:48,942 FATAL [IPC Server handler 17 on 55996] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1363821864968_0003_r_02_0 - exited : java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_200},value:{_col0:val_200,_col1:200,_col2:201.0},alias:0} at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:268) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:448) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:399) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_200},value:{_col0:val_200,_col1:200,_col2:201.0},alias:0} at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:256) ... 
7 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:237) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:477) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:525) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:247) ... 7 more Caused by: java.lang.NullPointerException at org.apache.hadoop.mapreduce.TaskID$CharTaskTypeMaps.getRepresentingCharacter(TaskID.java:265) at org.apache.hadoop.mapreduce.TaskID.appendTo(TaskID.java:153) at
[jira] [Commented] (HIVE-6048) Hive load data command rejects file with '+' in the name
[ https://issues.apache.org/jira/browse/HIVE-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852594#comment-13852594 ] Xuefu Zhang commented on HIVE-6048: --- [~ashutoshc] I agree with you about the need for HIVE-6024. However, I think solving that problem doesn't necessarily address the problem described here, as the same issue also occurs with non-local files. {code} hive dfs -ls /tmp/files/t+est.txt; Found 1 items -rw-r--r-- 1 xzhang supergroup 9 2013-12-18 20:20 /tmp/files/t+est.txt hive load data inpath '/tmp/files/t+est.txt' into table test; Loading data to table default.test Failed with exception Wrong file format. Please check the file's format. FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask {code} Therefore, it might make sense to solve them separately. Hive load data command rejects file with '+' in the name Key: HIVE-6048 URL: https://issues.apache.org/jira/browse/HIVE-6048 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-6048.patch '+' is a valid character in a file name on Linux and HDFS. However, loading data from such a file into a table results in the following error: {code} hive load data local inpath './t+est' into table test; FAILED: SemanticException Line 1:23 Invalid path ''./t+est'': No files matching path file:/home/xzhang/apache/hive7/t%20est {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
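One plausible mechanism for the mangled names (an assumption suggested by the "t est.txt" and "t%20est" strings in the error output, not confirmed from the patch) is that the path is run through form-style URL decoding somewhere, where '+' denotes a space:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class PlusInPath {
    public static void main(String[] args) {
        // Under application/x-www-form-urlencoded rules, '+' decodes to a
        // space. A file path must never be decoded with these rules, since
        // '+' is a perfectly legal path character on Linux and HDFS.
        String decoded = URLDecoder.decode("t+est.txt", StandardCharsets.UTF_8);
        System.out.println(decoded);   // t est.txt
    }
}
```

Note that the reverse direction shows the same mismatch: a space percent-encodes to "%20" in a URI path but to "+" in a query string, which would account for both mangled forms seen in the two error messages.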
[jira] [Updated] (HIVE-6048) Hive load data command rejects file with '+' in the name
[ https://issues.apache.org/jira/browse/HIVE-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-6048: -- Description: '+' is a valid character in a file name on Linux and HDFS. However, loading data from such a file into a table results in the following error: {code} hive load data local inpath '/home/xzhang/temp/t+est.txt' into table test; Copying data from file:/home/xzhang/temp/t est.txt No files matching path: file:/home/xzhang/temp/t est.txt FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.CopyTask {code} was: '+' is a valid character in a file name on Linux and HDFS. However, loading data from such a file into a table results in the following error: {code} hive load data local inpath './t+est' into table test; FAILED: SemanticException Line 1:23 Invalid path ''./t+est'': No files matching path file:/home/xzhang/apache/hive7/t%20est {code} Hive load data command rejects file with '+' in the name Key: HIVE-6048 URL: https://issues.apache.org/jira/browse/HIVE-6048 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-6048.patch '+' is a valid character in a file name on Linux and HDFS. However, loading data from such a file into a table results in the following error: {code} hive load data local inpath '/home/xzhang/temp/t+est.txt' into table test; Copying data from file:/home/xzhang/temp/t est.txt No files matching path: file:/home/xzhang/temp/t est.txt FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.CopyTask {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6052) metastore JDO filter pushdown for integers may produce unexpected results with non-normalized integer columns
[ https://issues.apache.org/jira/browse/HIVE-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852604#comment-13852604 ] Hive QA commented on HIVE-6052: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619453/HIVE-6052.01.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 4792 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynamic_partition_skip_default {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/695/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/695/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12619453 metastore JDO filter pushdown for integers may produce unexpected results with non-normalized integer columns - Key: HIVE-6052 URL: https://issues.apache.org/jira/browse/HIVE-6052 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6052.01.patch, HIVE-6052.patch If integer partition columns have values stored in non-canonical form, for example with leading zeroes, the integer filter doesn't work. That is because JDO pushdown uses substrings to compare for equality, and SQL pushdown is intentionally crippled to do the same to produce the same results. Probably, since both SQL pushdown and integer pushdown are just perf optimizations, we can remove it for JDO (or make it configurable and disabled by default), and uncripple SQL. 
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
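The non-canonical-form problem HIVE-6052 describes is ordinary string-versus-numeric comparison: once integer partition values are compared as substrings, both equality and ordering break. A minimal standalone illustration (not code from the patch):

```java
public class StringIntCompare {
    public static void main(String[] args) {
        // Equality: a leading zero defeats string equality even though
        // the integer values are identical.
        System.out.println("007".equals("7"));        // false, yet 007 == 7 as ints

        // Ordering: lexicographic order disagrees with numeric order,
        // so string-based range filters misbehave too.
        System.out.println("9".compareTo("10") > 0);  // true, yet 9 < 10 as ints

        // Normalizing to integers before comparing restores correctness.
        System.out.println(Integer.parseInt("007") == Integer.parseInt("7"));  // true
    }
}
```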