[jira] [Commented] (HIVE-6048) Hive load data command rejects file with '+' in the name
[ https://issues.apache.org/jira/browse/HIVE-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851492#comment-13851492 ]

Hive QA commented on HIVE-6048:
-------------------------------

{color:red}Overall{color}: -1 at least one test failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12619258/HIVE-6048.patch

{color:red}ERROR:{color} -1 due to 44 failed/errored test(s), 4791 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_concatenate_indexed_table
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_merge_stats
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_6
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_7
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketcontext_8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketizedhiveinputformat_auto
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin1
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin10
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin12
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin8
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin9
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin_negative
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_bucketmapjoin_negative2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_global_limit
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input40
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_insert2_overwrite_partitions
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge_dynamic_partition
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge_dynamic_partition2
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge_dynamic_partition3
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge_dynamic_partition4
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_merge_dynamic_partition5
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_orc_split_elimination
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_partition_type_check
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats11
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_stats3
{noformat}

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/683/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/683/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests exited with: TestsFailedException: 44 tests failed
{noformat}

This message is automatically generated.

ATTACHMENT ID: 12619258

Hive load data command rejects file with '+' in the name
--------------------------------------------------------

                Key: HIVE-6048
                URL: https://issues.apache.org/jira/browse/HIVE-6048
            Project: Hive
         Issue Type: Bug
         Components: Query Processor
   Affects Versions: 0.12.0
           Reporter: Xuefu Zhang
           Assignee: Xuefu Zhang
        Attachments: HIVE-6048.patch

'+' is a valid character in a file name on Linux and HDFS. However, loading data from such a file into a table results in the following error:
{code}
hive> load data local inpath './t+est' into table test;
FAILED: SemanticException Line 1:23 Invalid path
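The "Invalid path" failure above is consistent with the load path being run through URL decoding somewhere in the path lookup, since '+' is the URL-encoded form of a space. A minimal sketch of that suspected mechanism, using only JDK classes (this is illustrative; it is not the actual Hive code path):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class PlusInPath {
    public static void main(String[] args) throws UnsupportedEncodingException {
        // The file as it actually exists on disk / in HDFS.
        String userPath = "./t+est";

        // If the path is URL-decoded before lookup, '+' silently becomes a
        // space, so the original file can no longer be matched.
        String decoded = URLDecoder.decode(userPath, "UTF-8");
        System.out.println(decoded); // "./t est" -- no longer the real file name
    }
}
```

Any validation that compares the decoded string against the filesystem would then reject the perfectly legal original name.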
[jira] [Commented] (HIVE-3746) Fix HS2 ResultSet Serialization Performance Regression
[ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13851493#comment-13851493 ]

Hive QA commented on HIVE-3746:
-------------------------------

{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12619267/HIVE-3746.5.patch.txt

Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/685/testReport
Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/685/console

Messages:
{noformat}
This message was trimmed, see log for full details
[INFO] Including org.json:json:jar:20090211 in the shaded jar.
[INFO] Excluding stax:stax-api:jar:1.0.1 from the shaded jar.
[INFO] Excluding org.apache.hadoop:hadoop-core:jar:1.2.1 from the shaded jar.
[INFO] Excluding xmlenc:xmlenc:jar:0.52 from the shaded jar.
[INFO] Excluding com.sun.jersey:jersey-core:jar:1.14 from the shaded jar.
[INFO] Excluding com.sun.jersey:jersey-json:jar:1.14 from the shaded jar.
[INFO] Excluding org.codehaus.jettison:jettison:jar:1.1 from the shaded jar.
[INFO] Excluding com.sun.xml.bind:jaxb-impl:jar:2.2.3-1 from the shaded jar.
[INFO] Excluding javax.xml.bind:jaxb-api:jar:2.2.2 from the shaded jar.
[INFO] Excluding javax.xml.stream:stax-api:jar:1.0-2 from the shaded jar.
[INFO] Excluding javax.activation:activation:jar:1.1 from the shaded jar.
[INFO] Excluding org.codehaus.jackson:jackson-jaxrs:jar:1.9.2 from the shaded jar.
[INFO] Excluding org.codehaus.jackson:jackson-xc:jar:1.9.2 from the shaded jar.
[INFO] Excluding com.sun.jersey:jersey-server:jar:1.14 from the shaded jar.
[INFO] Excluding asm:asm:jar:3.1 from the shaded jar.
[INFO] Excluding org.apache.commons:commons-math:jar:2.1 from the shaded jar.
[INFO] Excluding commons-configuration:commons-configuration:jar:1.6 from the shaded jar.
[INFO] Excluding commons-digester:commons-digester:jar:1.8 from the shaded jar.
[INFO] Excluding commons-beanutils:commons-beanutils:jar:1.7.0 from the shaded jar.
[INFO] Excluding commons-beanutils:commons-beanutils-core:jar:1.8.0 from the shaded jar.
[INFO] Excluding commons-net:commons-net:jar:1.4.1 from the shaded jar.
[INFO] Excluding org.mortbay.jetty:jetty:jar:6.1.26 from the shaded jar.
[INFO] Excluding org.mortbay.jetty:jetty-util:jar:6.1.26 from the shaded jar.
[INFO] Excluding tomcat:jasper-runtime:jar:5.5.12 from the shaded jar.
[INFO] Excluding tomcat:jasper-compiler:jar:5.5.12 from the shaded jar.
[INFO] Excluding org.mortbay.jetty:jsp-api-2.1:jar:6.1.14 from the shaded jar.
[INFO] Excluding org.mortbay.jetty:servlet-api-2.5:jar:6.1.14 from the shaded jar.
[INFO] Excluding org.mortbay.jetty:jsp-2.1:jar:6.1.14 from the shaded jar.
[INFO] Excluding ant:ant:jar:1.6.5 from the shaded jar.
[INFO] Excluding commons-el:commons-el:jar:1.0 from the shaded jar.
[INFO] Excluding net.java.dev.jets3t:jets3t:jar:0.6.1 from the shaded jar.
[INFO] Excluding hsqldb:hsqldb:jar:1.8.0.10 from the shaded jar.
[INFO] Excluding oro:oro:jar:2.0.8 from the shaded jar.
[INFO] Excluding org.eclipse.jdt:core:jar:3.1.1 from the shaded jar.
[INFO] Excluding org.slf4j:slf4j-api:jar:1.7.5 from the shaded jar.
[INFO] Excluding org.slf4j:slf4j-log4j12:jar:1.7.5 from the shaded jar.
[INFO] Replacing original artifact with shaded artifact.
[INFO] Replacing /data/hive-ptest/working/apache-svn-trunk-source/ql/target/hive-exec-0.13.0-SNAPSHOT.jar with /data/hive-ptest/working/apache-svn-trunk-source/ql/target/hive-exec-0.13.0-SNAPSHOT-shaded.jar
[INFO] Dependency-reduced POM written at: /data/hive-ptest/working/apache-svn-trunk-source/ql/dependency-reduced-pom.xml
[INFO] Dependency-reduced POM written at: /data/hive-ptest/working/apache-svn-trunk-source/ql/dependency-reduced-pom.xml
[INFO]
[INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-exec ---
[INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/ql/target/hive-exec-0.13.0-SNAPSHOT.jar to /data/hive-ptest/working/maven/org/apache/hive/hive-exec/0.13.0-SNAPSHOT/hive-exec-0.13.0-SNAPSHOT.jar
[INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/ql/dependency-reduced-pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hive-exec/0.13.0-SNAPSHOT/hive-exec-0.13.0-SNAPSHOT.pom
[INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/ql/target/hive-exec-0.13.0-SNAPSHOT-tests.jar to /data/hive-ptest/working/maven/org/apache/hive/hive-exec/0.13.0-SNAPSHOT/hive-exec-0.13.0-SNAPSHOT-tests.jar
[INFO]
[INFO]
[INFO] Building Hive Service 0.13.0-SNAPSHOT
[INFO]
[INFO]
[INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-service ---
[INFO] Deleting
[jira] [Updated] (HIVE-6041) Incorrect task dependency graph for skewed join optimization
[ https://issues.apache.org/jira/browse/HIVE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-6041:
------------------------

    Affects Version/s: 0.12.0

Incorrect task dependency graph for skewed join optimization
------------------------------------------------------------

                Key: HIVE-6041
                URL: https://issues.apache.org/jira/browse/HIVE-6041
            Project: Hive
         Issue Type: Bug
         Components: Query Processor
   Affects Versions: 0.11.0, 0.12.0
        Environment: Hadoop 1.0.3
           Reporter: Adrian Popescu
           Assignee: Navis
           Priority: Critical

The dependency graph among task stages is incorrect for the skewed-join-optimized plan. Skewed joins are enabled through hive.optimize.skewjoin. In the case that skewed keys do not exist, all the tasks following the common join are filtered out at runtime. In particular, the conditional task in the optimized plan maintains no dependency on the child tasks of the common join task in the original plan. The conditional task contains the map join task, which maintains all these dependencies; but when the map join task is filtered out (i.e., no skewed keys exist), all these dependencies are lost. Hence, all the other task stages of the query (e.g., the move stage, which writes the results into the result table) are skipped. The bug resides in ql/optimizer/physical/GenMRSkewJoinProcessor.java, in the processSkewJoin() function, immediately after the ConditionalTask is created and its dependencies are set.

--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
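The reported failure mode can be reproduced with a toy task graph. The sketch below is a hypothetical model (the `Task` class here is mine, not Hive's `Task` hierarchy): if the conditional task's only path to the downstream move stage runs through the map-join branch, filtering that branch out at runtime orphans the move stage; re-attaching the common join's original children to the conditional task keeps them reachable.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Minimal model of the dependency bug described above -- illustrative only,
// not code from GenMRSkewJoinProcessor.java.
class Task {
    final String name;
    final List<Task> children = new ArrayList<>();
    boolean skipped = false; // set when the branch is filtered out at runtime

    Task(String name) { this.name = name; }

    void addDependentTask(Task t) {
        if (!children.contains(t)) children.add(t);
    }

    // Collect the names of tasks that would actually execute; a skipped
    // task's subtree is reachable only through other parents.
    static Set<String> executed(Task root) {
        Set<String> out = new LinkedHashSet<>();
        walk(root, out);
        return out;
    }

    private static void walk(Task t, Set<String> out) {
        if (t.skipped) return;
        out.add(t.name);
        for (Task c : t.children) walk(c, out);
    }
}

public class SkewJoinDeps {
    public static void main(String[] args) {
        Task moveStage = new Task("MoveStage");

        // Broken plan: only the map-join branch points at the move stage.
        Task mapJoin = new Task("MapJoin");
        mapJoin.addDependentTask(moveStage);
        Task conditional = new Task("Conditional");
        conditional.addDependentTask(mapJoin);

        mapJoin.skipped = true; // no skewed keys -> map-join branch filtered out
        System.out.println(Task.executed(conditional)); // MoveStage is lost

        // Fixed plan: the conditional task also inherits the common join's
        // original children, so downstream stages survive the filtering.
        conditional.addDependentTask(moveStage);
        System.out.println(Task.executed(conditional)); // MoveStage executes
    }
}
```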
[jira] [Assigned] (HIVE-6041) Incorrect task dependency graph for skewed join optimization
[ https://issues.apache.org/jira/browse/HIVE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis reassigned HIVE-6041:
---------------------------

    Assignee: Navis
[jira] [Updated] (HIVE-6041) Incorrect task dependency graph for skewed join optimization
[ https://issues.apache.org/jira/browse/HIVE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-6041:
------------------------

    Affects Version/s: 0.6.0
                       0.7.0
                       0.8.0
                       0.9.0
                       0.10.0
[jira] [Updated] (HIVE-6041) Incorrect task dependency graph for skewed join optimization
[ https://issues.apache.org/jira/browse/HIVE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-6041:
------------------------

    Status: Patch Available  (was: Open)

Running test
[jira] [Updated] (HIVE-6041) Incorrect task dependency graph for skewed join optimization
[ https://issues.apache.org/jira/browse/HIVE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-6041:
------------------------

    Attachment: HIVE-6041.1.patch.txt
[jira] [Updated] (HIVE-3746) Fix HS2 ResultSet Serialization Performance Regression
[ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-3746:
------------------------

    Status: Patch Available  (was: Open)

Fix HS2 ResultSet Serialization Performance Regression
------------------------------------------------------

                Key: HIVE-3746
                URL: https://issues.apache.org/jira/browse/HIVE-3746
            Project: Hive
         Issue Type: Sub-task
         Components: HiveServer2, Server Infrastructure
           Reporter: Carl Steinbach
           Assignee: Navis
             Labels: HiveServer2, jdbc, thrift
        Attachments: HIVE-3746.1.patch.txt, HIVE-3746.2.patch.txt, HIVE-3746.3.patch.txt, HIVE-3746.4.patch.txt, HIVE-3746.5.patch.txt, HIVE-3746.6.patch.txt

--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
[jira] [Updated] (HIVE-3746) Fix HS2 ResultSet Serialization Performance Regression
[ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-3746:
------------------------

    Attachment: HIVE-3746.6.patch.txt
[jira] [Updated] (HIVE-3746) Fix HS2 ResultSet Serialization Performance Regression
[ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-3746:
------------------------

    Status: Open  (was: Patch Available)
[jira] [Updated] (HIVE-3286) Explicit skew join on user provided condition
[ https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-3286:
------------------------

    Attachment: HIVE-3286.17.patch.txt

Explicit skew join on user provided condition
---------------------------------------------

                Key: HIVE-3286
                URL: https://issues.apache.org/jira/browse/HIVE-3286
            Project: Hive
         Issue Type: Improvement
         Components: Query Processor
           Reporter: Navis
           Assignee: Navis
           Priority: Minor
        Attachments: D4287.11.patch, HIVE-3286.12.patch.txt, HIVE-3286.13.patch.txt, HIVE-3286.14.patch.txt, HIVE-3286.15.patch.txt, HIVE-3286.16.patch.txt, HIVE-3286.17.patch.txt, HIVE-3286.D4287.10.patch, HIVE-3286.D4287.5.patch, HIVE-3286.D4287.6.patch, HIVE-3286.D4287.7.patch, HIVE-3286.D4287.8.patch, HIVE-3286.D4287.9.patch

A join on a table with skewed data spends most of its execution time handling the skewed keys. But usually we already know that, and even know what the skewed keys look like. If we could explicitly assign reducer slots to the skewed keys, total execution time could be greatly shortened. As a start, I've extended the join grammar like this:
{code}
select * from src a join src b on a.key=b.key skew on (a.key+1 < 50, a.key+1 < 100, a.key < 150);
{code}
which means that if the above query is executed by 20 reducers, one reducer handles a.key+1 < 50, one reducer 50 <= a.key+1 < 100, one reducer 99 <= a.key < 150, and 17 reducers handle the others (this could later be extended to assign more than one reducer per group). This can only be used with common inner equi-joins, and the skew condition should be composed of join keys only. Work done so far will be posted shortly, after code cleanup.

Skew expressions* in SKEW ON (expr, expr, ...) are evaluated sequentially at runtime, and the first 'true' one decides the skew group for the row. Each skew group has reserved partition slot(s), to which all rows in the group are assigned. The number of partition slots reserved for each group is also decided at runtime, by a simple percentage calculation. If a skew group is CLUSTER BY 20 PERCENT and the total number of partition slots (= number of reducers) is 20, that group reserves 4 partition slots, etc. DISTRIBUTE BY decides how the rows in a group are dispersed within the range of reserved slots (if there is only one slot for a group, this is meaningless). Currently, three distribution policies are available: RANDOM, KEYS, expression.

1. RANDOM : rows of the driver** alias are dispersed randomly and rows of non-driver aliases are duplicated to all the slots (default if not specified)
2. KEYS : determined by the hash value of the keys (same as before)
3. expression : determined by the hash of the object evaluated by a user-provided expression

Only possible with inner, equi, common joins. Does not yet support join tree merging. Might be used by other RS users like SORT BY or GROUP BY. If column statistics exist for the key, it could be possible to apply this automatically.

For example, if 20 reducers are used for the query below,
{code}
select count(*) from src a join src b on a.key=b.key skew on (
   a.key = '0' CLUSTER BY 10 PERCENT,
   b.key > '100' CLUSTER BY 20 PERCENT DISTRIBUTE BY upper(b.key),
   cast(a.key as int) > 300 CLUSTER BY 40 PERCENT DISTRIBUTE BY KEYS);
{code}
group-0 will reserve slots 6~7, group-1 slots 8~11, group-2 slots 12~19, and the others will reserve slots 0~5.

For a row with key='0' from alias a, the row is randomly assigned in the range 6~7 (driver alias) : 6 or 7
For a row with key='0' from alias b, the row is distributed to all slots in 6~7 (non-driver alias) : 6 and 7
For a row with key='50', the row is assigned in the range 8~11 by the hash code of upper(b.key) : 8 + (hash(upper(key)) % 4)
For a row with key='500', the row is assigned in the range 12~19 by the hash code of the join key : 12 + (hash(key) % 8)
For a row with key='200', which does not belong to any skew group : hash(key) % 6

*expressions in the skew condition :
1. All expressions should be made from expressions in the join condition; that is, if the join condition is a.key=b.key, the user can make any expression with a.key or b.key, but if the join condition is a.key+1=b.key, the user cannot make an expression with a.key alone (it must use a.key+1).
2. All expressions should reference one and only one side of the join. For example, simple constant expressions, or expressions referencing both sides of the join condition (a.key+b.key < 100), are not allowed.
3. All functions in an expression should be deterministic and stateless.
4. If DISTRIBUTE BY expression is used, the distribution expression should also reference the same alias as the skew expression.

**driver alias :
1. driver alias means the sole alias referenced by the skew expression, which is important for RANDOM distribution. rows of driver
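The slot arithmetic described above (20 reducers; groups reserving 10%, 20%, and 40% of the slots; hash-based placement within a group's reserved range) can be sketched as follows. This is my own illustration of the described math, with helper names that are not Hive's:

```java
// Hypothetical sketch of the skew-group slot math from the description above.
public class SkewSlots {
    // A group with CLUSTER BY `percent` of `totalSlots` reserves
    // totalSlots * percent / 100 consecutive slots.
    static int slotsFor(int totalSlots, int percent) {
        return totalSlots * percent / 100;
    }

    // Placement within a group's reserved range by hash, mirroring the
    // "8 + (hash(upper(key)) % 4)" formula in the description. floorMod
    // keeps the offset non-negative even for negative hash codes.
    static int place(int rangeStart, int rangeSize, Object distKey) {
        return rangeStart + Math.floorMod(distKey.hashCode(), rangeSize);
    }

    public static void main(String[] args) {
        int total = 20;
        int g0 = slotsFor(total, 10);       // 2 slots -> 6~7
        int g1 = slotsFor(total, 20);       // 4 slots -> 8~11
        int g2 = slotsFor(total, 40);       // 8 slots -> 12~19
        int others = total - g0 - g1 - g2;  // 6 slots -> 0~5
        System.out.println(g0 + " " + g1 + " " + g2 + " " + others); // 2 4 8 6

        // A row with key='50' in group-1, distributed by upper(b.key),
        // lands somewhere in 8..11:
        System.out.println(place(8, g1, "50".toUpperCase()));
    }
}
```

The reserved-slot counts match the worked example in the description (slots 6~7, 8~11, 12~19, with 0~5 left for non-skewed rows).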
[jira] [Updated] (HIVE-3286) Explicit skew join on user provided condition
[ https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Navis updated HIVE-3286:
------------------------

    Status: Patch Available  (was: Open)
[jira] [Updated] (HIVE-3286) Explicit skew join on user provided condition
[ https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-3286: Status: Open (was: Patch Available) Explicit skew join on user provided condition - Key: HIVE-3286 URL: https://issues.apache.org/jira/browse/HIVE-3286 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: D4287.11.patch, HIVE-3286.12.patch.txt, HIVE-3286.13.patch.txt, HIVE-3286.14.patch.txt, HIVE-3286.15.patch.txt, HIVE-3286.16.patch.txt, HIVE-3286.17.patch.txt, HIVE-3286.D4287.10.patch, HIVE-3286.D4287.5.patch, HIVE-3286.D4287.6.patch, HIVE-3286.D4287.7.patch, HIVE-3286.D4287.8.patch, HIVE-3286.D4287.9.patch Join operation on table with skewed data takes most of execution time handling the skewed keys. But mostly we already know about that and even know what is look like the skewed keys. If we can explicitly assign reducer slots for the skewed keys, total execution time could be greatly shortened. As for a start, I've extended join grammar something like this. {code} select * from src a join src b on a.key=b.key skew on (a.key+1 50, a.key+1 100, a.key 150); {code} which means if above query is executed by 20 reducers, one reducer for a.key+1 50, one reducer for 50 = a.key+1 100, one reducer for 99 = a.key 150, and 17 reducers for others (could be extended to assign more than one reducer later) This can be only used with common-inner-equi joins. And skew condition should be composed of join keys only. Work till done now will be updated shortly after code cleanup. Skew expressions* in SKEW ON (expr, expr, ...) are evaluated sequentially at runtime, and first 'true' one decides skew group for the row. Each skew group has reserved partition slot(s), to which all rows in a group would be assigned. The number of partition slot reserved for each group is decided also at runtime by simple calculation of percentage. 
If a skew group is CLUSTER BY 20 PERCENT and total partition slot (=number of reducer) is 20, that group will reserve 4 partition slots, etc. DISTRIBUTE BY decides how the rows in a group is dispersed in the range of reserved slots (If there is only one slot for a group, this is meaningless). Currently, three distribution policies are available: RANDOM, KEYS, expression. 1. RANDOM : rows of driver** alias are dispersed by random and rows of non-driver alias are duplicated for all the slots (default if not specified) 2. KEYS : determined by hash value of keys (same with previous) 3. expression : determined by hash of object evaluated by user-provided expression Only possible with inner, equi, common-joins. Not yet supports join tree merging. Might be used by other RS users like SORT BY or GROUP BY If there exists column statistics for the key, it could be possible to apply automatically. For example, if 20 reducers are used for the query below, {code} select count(*) from src a join src b on a.key=b.key skew on ( a.key = '0' CLUSTER BY 10 PERCENT, b.key '100' CLUSTER BY 20 PERCENT DISTRIBUTE BY upper(b.key), cast(a.key as int) 300 CLUSTER BY 40 PERCENT DISTRIBUTE BY KEYS); {code} group-0 will reserve slots 6~7, group-1 8~11, group-2 12~19 and others will reserve slots 0~5. For a row with key='0' from alias a, the row is randomly assigned in the range of 6~7 (driver alias) : 6 or 7 For a row with key='0' from alias b, the row is disributed for all slots in 6~7 (non-driver alias) : 6 and 7 For a row with key='50', the row is assigned in the range of 8~11 by hashcode of upper(b.key) : 8 + (hash(upper(key)) % 4) For a row with key='500', the row is assigned in the range of 12~19 by hashcode of join key : 12 + (hash(key) % 8) For a row with key='200', this is not belong to any skew group : hash(key) % 6 *expressions in skew condition : 1. 
all expressions should be made of expressions in the join condition, which means that if the join condition is a.key=b.key, the user can make any expression with a.key or b.key. But if the join condition is a.key+1=b.key, the user cannot make an expression with a.key alone (it should be made with a.key+1). 2. all expressions should reference one and only one side of the aliases. For example, simple constant expressions or expressions referencing both sides of the join condition (a.key+b.key < 100) are not allowed. 3. all functions in an expression should be deterministic and stateless. 4. if DISTRIBUTE BY expression is used, the distribution expression should also reference the same alias as the skew expression. **driver alias : 1. the driver alias is the sole alias referenced from the skew expression, which is important for RANDOM distribution. rows of driver
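The percentage-to-slot arithmetic in the worked example above (20 reducers; groups of 10/20/40 percent reserving slots 6~7, 8~11 and 12~19, with 0~5 left for the default group) can be sketched outside Hive. This is an illustrative model only, not the patch's actual code; `SkewSlots` and its method names are invented.

```java
public class SkewSlots {
    // One reserved range per skew group, returned as {start, size}.
    // Default (non-skew) rows keep the leading slots [0, firstStart).
    static int[][] reserveSlots(int totalSlots, int[] percents) {
        int reserved = 0;
        for (int p : percents) reserved += totalSlots * p / 100;
        int start = totalSlots - reserved;      // e.g. 20 - (2+4+8) = 6
        int[][] ranges = new int[percents.length][2];
        for (int i = 0; i < percents.length; i++) {
            int size = totalSlots * percents[i] / 100;
            ranges[i] = new int[]{start, size};
            start += size;
        }
        return ranges;
    }

    // DISTRIBUTE BY <expr>: pick a slot inside the group's reserved range
    // by hashing the evaluated expression, i.e. start + (hash % size).
    static int slotFor(int[] range, Object distributeKey) {
        return range[0] + Math.floorMod(distributeKey.hashCode(), range[1]);
    }
}
```

With 20 slots and percents {10, 20, 40} this yields ranges {6,2}, {8,4}, {12,8}, matching the 6~7 / 8~11 / 12~19 layout described above.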
[jira] [Commented] (HIVE-3746) Fix HS2 ResultSet Serialization Performance Regression
[ https://issues.apache.org/jira/browse/HIVE-3746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851500#comment-13851500 ] Hive QA commented on HIVE-3746: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619276/HIVE-3746.6.patch.txt Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/686/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/686/console Messages: {noformat} This message was trimmed, see log for full details [INFO] Including org.json:json:jar:20090211 in the shaded jar. [INFO] Excluding stax:stax-api:jar:1.0.1 from the shaded jar. [INFO] Excluding org.apache.hadoop:hadoop-core:jar:1.2.1 from the shaded jar. [INFO] Excluding xmlenc:xmlenc:jar:0.52 from the shaded jar. [INFO] Excluding com.sun.jersey:jersey-core:jar:1.14 from the shaded jar. [INFO] Excluding com.sun.jersey:jersey-json:jar:1.14 from the shaded jar. [INFO] Excluding org.codehaus.jettison:jettison:jar:1.1 from the shaded jar. [INFO] Excluding com.sun.xml.bind:jaxb-impl:jar:2.2.3-1 from the shaded jar. [INFO] Excluding javax.xml.bind:jaxb-api:jar:2.2.2 from the shaded jar. [INFO] Excluding javax.xml.stream:stax-api:jar:1.0-2 from the shaded jar. [INFO] Excluding javax.activation:activation:jar:1.1 from the shaded jar. [INFO] Excluding org.codehaus.jackson:jackson-jaxrs:jar:1.9.2 from the shaded jar. [INFO] Excluding org.codehaus.jackson:jackson-xc:jar:1.9.2 from the shaded jar. [INFO] Excluding com.sun.jersey:jersey-server:jar:1.14 from the shaded jar. [INFO] Excluding asm:asm:jar:3.1 from the shaded jar. [INFO] Excluding org.apache.commons:commons-math:jar:2.1 from the shaded jar. [INFO] Excluding commons-configuration:commons-configuration:jar:1.6 from the shaded jar. [INFO] Excluding commons-digester:commons-digester:jar:1.8 from the shaded jar. 
[INFO] Excluding commons-beanutils:commons-beanutils:jar:1.7.0 from the shaded jar. [INFO] Excluding commons-beanutils:commons-beanutils-core:jar:1.8.0 from the shaded jar. [INFO] Excluding commons-net:commons-net:jar:1.4.1 from the shaded jar. [INFO] Excluding org.mortbay.jetty:jetty:jar:6.1.26 from the shaded jar. [INFO] Excluding org.mortbay.jetty:jetty-util:jar:6.1.26 from the shaded jar. [INFO] Excluding tomcat:jasper-runtime:jar:5.5.12 from the shaded jar. [INFO] Excluding tomcat:jasper-compiler:jar:5.5.12 from the shaded jar. [INFO] Excluding org.mortbay.jetty:jsp-api-2.1:jar:6.1.14 from the shaded jar. [INFO] Excluding org.mortbay.jetty:servlet-api-2.5:jar:6.1.14 from the shaded jar. [INFO] Excluding org.mortbay.jetty:jsp-2.1:jar:6.1.14 from the shaded jar. [INFO] Excluding ant:ant:jar:1.6.5 from the shaded jar. [INFO] Excluding commons-el:commons-el:jar:1.0 from the shaded jar. [INFO] Excluding net.java.dev.jets3t:jets3t:jar:0.6.1 from the shaded jar. [INFO] Excluding hsqldb:hsqldb:jar:1.8.0.10 from the shaded jar. [INFO] Excluding oro:oro:jar:2.0.8 from the shaded jar. [INFO] Excluding org.eclipse.jdt:core:jar:3.1.1 from the shaded jar. [INFO] Excluding org.slf4j:slf4j-api:jar:1.7.5 from the shaded jar. [INFO] Excluding org.slf4j:slf4j-log4j12:jar:1.7.5 from the shaded jar. [INFO] Replacing original artifact with shaded artifact. 
[INFO] Replacing /data/hive-ptest/working/apache-svn-trunk-source/ql/target/hive-exec-0.13.0-SNAPSHOT.jar with /data/hive-ptest/working/apache-svn-trunk-source/ql/target/hive-exec-0.13.0-SNAPSHOT-shaded.jar [INFO] Dependency-reduced POM written at: /data/hive-ptest/working/apache-svn-trunk-source/ql/dependency-reduced-pom.xml [INFO] Dependency-reduced POM written at: /data/hive-ptest/working/apache-svn-trunk-source/ql/dependency-reduced-pom.xml [INFO] [INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-exec --- [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/ql/target/hive-exec-0.13.0-SNAPSHOT.jar to /data/hive-ptest/working/maven/org/apache/hive/hive-exec/0.13.0-SNAPSHOT/hive-exec-0.13.0-SNAPSHOT.jar [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/ql/dependency-reduced-pom.xml to /data/hive-ptest/working/maven/org/apache/hive/hive-exec/0.13.0-SNAPSHOT/hive-exec-0.13.0-SNAPSHOT.pom [INFO] Installing /data/hive-ptest/working/apache-svn-trunk-source/ql/target/hive-exec-0.13.0-SNAPSHOT-tests.jar to /data/hive-ptest/working/maven/org/apache/hive/hive-exec/0.13.0-SNAPSHOT/hive-exec-0.13.0-SNAPSHOT-tests.jar [INFO] [INFO] [INFO] Building Hive Service 0.13.0-SNAPSHOT [INFO] [INFO] [INFO] --- maven-clean-plugin:2.5:clean (default-clean) @ hive-service --- [INFO] Deleting
[jira] [Commented] (HIVE-6041) Incorrect task dependency graph for skewed join optimization
[ https://issues.apache.org/jira/browse/HIVE-6041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851528#comment-13851528 ] Hive QA commented on HIVE-6041: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619274/HIVE-6041.1.patch.txt {color:green}SUCCESS:{color} +1 4792 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/687/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/687/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12619274 Incorrect task dependency graph for skewed join optimization Key: HIVE-6041 URL: https://issues.apache.org/jira/browse/HIVE-6041 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.6.0, 0.7.0, 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0 Environment: Hadoop 1.0.3 Reporter: Adrian Popescu Assignee: Navis Priority: Critical Attachments: HIVE-6041.1.patch.txt The dependency graph among task stages is incorrect for the skewed join optimized plan. Skewed joins are enabled through hive.optimize.skewjoin. For the case that skewed keys do not exist, all the tasks following the common join are filtered out at runtime. In particular, the conditional task in the optimized plan maintains no dependency with the child tasks of the common join task in the original plan. The conditional task is composed of the map join task which maintains all these dependencies, but for the case the map join task is filtered out (i.e., no skewed keys exist), all these dependencies are lost. Hence, all the other task stages of the query (e.g., move stage which writes down the results into the result table) are skipped. 
The bug resides in ql/optimizer/physical/GenMRSkewJoinProcessor.java, processSkewJoin() function, immediately after the ConditionalTask is created and its dependencies are set. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
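The dependency loss described in HIVE-6041 can be modeled with a toy task graph (invented classes, not Hive's actual Task API): if the common join's original children hang only off the map-join branch of the ConditionalTask, then when that branch is filtered out at runtime (no skewed keys), the downstream stages such as the move stage are never reached. A sketch of the fix is to re-attach those children so the no-skew path still reaches them.

```java
import java.util.ArrayList;
import java.util.List;

public class CondTaskDemo {
    static class Task {
        final String name;
        final List<Task> children = new ArrayList<>();
        Task(String name) { this.name = name; }
    }

    // Depth-first search: is 'target' reachable from 'root'?
    static boolean reaches(Task root, String target) {
        if (root.name.equals(target)) return true;
        for (Task c : root.children) if (reaches(c, target)) return true;
        return false;
    }

    // Build a simplified skew-join plan. The no-skew execution path runs
    // from commonJoin directly; the map-join branch may be filtered out.
    // 'fixed' models re-wiring the old children after the ConditionalTask
    // is created, as the patch does in processSkewJoin().
    static Task buildPlan(boolean fixed) {
        Task commonJoin = new Task("commonJoin");
        Task mapJoinBranch = new Task("mapJoinBranch");
        Task move = new Task("moveStage");
        mapJoinBranch.children.add(move);   // buggy wiring: only the skew branch
        if (fixed) commonJoin.children.add(move);
        return commonJoin;
    }
}
```

Without the fix, `reaches(buildPlan(false), "moveStage")` is false on the no-skew path, which is exactly the "move stage is skipped" symptom reported above.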
[jira] [Commented] (HIVE-3286) Explicit skew join on user provided condition
[ https://issues.apache.org/jira/browse/HIVE-3286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851560#comment-13851560 ] Hive QA commented on HIVE-3286: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619277/HIVE-3286.17.patch.txt {color:green}SUCCESS:{color} +1 4796 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/688/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/688/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12619277 Explicit skew join on user provided condition - Key: HIVE-3286 URL: https://issues.apache.org/jira/browse/HIVE-3286 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: D4287.11.patch, HIVE-3286.12.patch.txt, HIVE-3286.13.patch.txt, HIVE-3286.14.patch.txt, HIVE-3286.15.patch.txt, HIVE-3286.16.patch.txt, HIVE-3286.17.patch.txt, HIVE-3286.D4287.10.patch, HIVE-3286.D4287.5.patch, HIVE-3286.D4287.6.patch, HIVE-3286.D4287.7.patch, HIVE-3286.D4287.8.patch, HIVE-3286.D4287.9.patch A join on a table with skewed data spends most of its execution time handling the skewed keys. But in most cases we already know that, and even know what the skewed keys look like. If we can explicitly assign reducer slots for the skewed keys, total execution time could be greatly shortened. As a start, I've extended the join grammar like this: 
{code} select * from src a join src b on a.key=b.key skew on (a.key+1 < 50, a.key+1 < 100, a.key < 150); {code} which means that if the above query is executed by 20 reducers, one reducer handles a.key+1 < 50, one reducer 50 <= a.key+1 < 100, one reducer 99 <= a.key < 150, and 17 reducers handle the rest (this could be extended later to assign more than one reducer per group). This can only be used with common inner equi-joins, and the skew condition should be composed of join keys only. Work done so far will be posted shortly after code cleanup. Skew expressions* in SKEW ON (expr, expr, ...) are evaluated sequentially at runtime, and the first 'true' one decides the skew group for the row. Each skew group has reserved partition slot(s), to which all rows in the group are assigned. The number of partition slots reserved for each group is also decided at runtime by a simple percentage calculation. If a skew group is CLUSTER BY 20 PERCENT and the total number of partition slots (= number of reducers) is 20, that group reserves 4 partition slots, etc. DISTRIBUTE BY decides how the rows in a group are dispersed over the range of reserved slots (if there is only one slot for a group, this is meaningless). Currently, three distribution policies are available: RANDOM, KEYS, expression. 1. RANDOM : rows of the driver** alias are dispersed randomly and rows of a non-driver alias are duplicated to all the slots (default if not specified) 2. KEYS : determined by the hash value of the keys (same as before) 3. expression : determined by the hash of the object evaluated by a user-provided expression. Only possible with inner, equi, common joins. Join tree merging is not yet supported. Might be used by other RS users like SORT BY or GROUP BY. If column statistics exist for the key, it could be possible to apply this automatically. 
For example, if 20 reducers are used for the query below, {code} select count(*) from src a join src b on a.key=b.key skew on ( a.key = '0' CLUSTER BY 10 PERCENT, b.key < '100' CLUSTER BY 20 PERCENT DISTRIBUTE BY upper(b.key), cast(a.key as int) > 300 CLUSTER BY 40 PERCENT DISTRIBUTE BY KEYS); {code} group-0 will reserve slots 6~7, group-1 8~11, group-2 12~19, and the others will reserve slots 0~5. For a row with key='0' from alias a, the row is randomly assigned in the range 6~7 (driver alias) : 6 or 7. For a row with key='0' from alias b, the row is distributed to all slots in 6~7 (non-driver alias) : 6 and 7. For a row with key='50', the row is assigned in the range 8~11 by the hashcode of upper(b.key) : 8 + (hash(upper(key)) % 4). For a row with key='500', the row is assigned in the range 12~19 by the hashcode of the join key : 12 + (hash(key) % 8). For a row with key='200', which does not belong to any skew group : hash(key) % 6. *expressions in skew condition : 1. all expressions should be made of expressions in the join condition, which means that if the join condition is a.key=b.key, the user can make any
Re: Review Request 15654: Rewrite Trim and Pad UDFs based on GenericUDF
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15654/#review30610 --- ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBasePad.java https://reviews.apache.org/r/15654/#comment58620 Having gone through some pain with Hive on Windows, the bytes returned by String.getBytes() will not be in utf-8 if the default encoding is something other than utf-8. Would be safer here to either use getBytes(UTF-8), or Text.encode() if you want to get bytes from the string. Or just do the padding as Strings. - Jason Dere On Dec. 18, 2013, 3:16 a.m., Mohammad Islam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15654/ --- (Updated Dec. 18, 2013, 3:16 a.m.) Review request for hive, Ashutosh Chauhan, Carl Steinbach, and Jitendra Pandey. Bugs: HIVE-5829 https://issues.apache.org/jira/browse/HIVE-5829 Repository: hive-git Description --- Rewrite the UDFS *pads and *trim using GenericUDF. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java a895d65 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java bca1f26 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLTrim.java dc00cf9 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLpad.java d1da19a ql/src/java/org/apache/hadoop/hive/ql/udf/UDFRTrim.java 2bcc5fa ql/src/java/org/apache/hadoop/hive/ql/udf/UDFRpad.java 9652ce2 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFTrim.java 490886d ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBasePad.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLpad.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRpad.java PRE-CREATION 
ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTrim.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java eff251f ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFLTrim.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFLpad.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFRTrim.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFRpad.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFTrim.java PRE-CREATION Diff: https://reviews.apache.org/r/15654/diff/ Testing --- Thanks, Mohammad Islam
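Jason Dere's review comment above is easy to demonstrate: `String.getBytes()` with no argument uses the platform default charset, which on Windows is often not UTF-8, while passing `StandardCharsets.UTF_8` pins the encoding. A minimal sketch (class name invented, not part of the patch):

```java
import java.nio.charset.StandardCharsets;

public class PadBytesDemo {
    // Platform-dependent: the result varies with the JVM's default
    // charset (e.g. Cp1252 on Windows vs UTF-8 on most Linux setups).
    static byte[] defaultBytes(String s) {
        return s.getBytes();
    }

    // Deterministic: always UTF-8, as suggested for GenericUDFBasePad.
    static byte[] utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

For example, `utf8Bytes("\u00e9")` (é) is always the two bytes 0xC3 0xA9, whereas `defaultBytes` would return a single 0xE9 byte under Cp1252.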
[jira] [Commented] (HIVE-5783) Native Parquet Support in Hive
[ https://issues.apache.org/jira/browse/HIVE-5783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851661#comment-13851661 ] Justin Coffey commented on HIVE-5783: - Yes this is true. We are refactoring to merge the whole parquet-hive project into hive. There are a couple of folks involved at this point and so it's taking a smidgen extra time what with holidays and all. Native Parquet Support in Hive -- Key: HIVE-5783 URL: https://issues.apache.org/jira/browse/HIVE-5783 Project: Hive Issue Type: New Feature Components: Serializers/Deserializers Reporter: Justin Coffey Assignee: Justin Coffey Priority: Minor Attachments: HIVE-5783.patch, hive-0.11-parquet.patch Problem Statement: Hive would be easier to use if it had native Parquet support. Our organization, Criteo, uses Hive extensively. Therefore we built the Parquet Hive integration and would like to now contribute that integration to Hive. About Parquet: Parquet is a columnar storage format for Hadoop and integrates with many Hadoop ecosystem tools such as Thrift, Avro, Hadoop MapReduce, Cascading, Pig, Drill, Crunch, and Hive. Pig, Crunch, and Drill all contain native Parquet integration. Changes Details: Parquet was built with dependency management in mind and therefore only a single Parquet jar will be added as a dependency. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Hive-trunk-h0.21 - Build # 2511 - Still Failing
Changes for Build #2473 [brock] HIVE-4741 - Add Hive config API to modify the restrict list (Prasad Mujumdar, Navis via Brock Noland) Changes for Build #2474 [navis] HIVE-5827 : Incorrect location of logs for failed tests (Vikram Dixit K and Szehon Ho via Navis) [thejas] HIVE-4485 : beeline prints null as empty strings (Thejas Nair reviewed by Ashutosh Chauhan) [brock] HIVE-5704 - A couple of generic UDFs are not in the right folder/package (Xuefu Zhang via Brock Noland) [brock] HIVE-5706 - Move a few numeric UDFs to generic implementations (Xuefu Zhang via Brock Noland) [hashutosh] HIVE-5817 : column name to index mapping in VectorizationContext is broken (Remus Rusanu, Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5876 : Split elimination in ORC breaks for partitioned tables (Prasanth J via Ashutosh Chauhan) [hashutosh] HIVE-5886 : [Refactor] Remove unused class JobCloseFeedback (Ashutosh Chauhan via Thejas Nair) [brock] HIVE-5894 - Fix minor PTest2 issues (Brock Noland) Changes for Build #2475 [brock] HIVE-5755 - Fix hadoop2 execution environment Milestone 1 (Vikram Dixit K via Brock Noland) Changes for Build #2476 [xuefu] HIVE-5893: hive-schema-0.13.0.mysql.sql contains reference to nonexistent column (Carl via Xuefu) [xuefu] HIVE-5684: Serde support for char (Jason via Xuefu) Changes for Build #2477 Changes for Build #2478 Changes for Build #2479 Changes for Build #2480 [brock] HIVE-5441 - Async query execution doesn't return resultset status (Prasad Mujumdar via Thejas M Nair) [brock] HIVE-5880 - Rename HCatalog HBase Storage Handler artifact id (Brock Noland reviewed by Prasad Mujumdar) Changes for Build #2481 Changes for Build #2482 [ehans] HIVE-5581: Implement vectorized year/month/day... etc. 
for string arguments (Teddy Choi via Eric Hanson) Changes for Build #2483 [rhbutani] HIVE-5898 Make fetching of column statistics configurable (Prasanth Jayachandran via Harish Butani) Changes for Build #2484 [brock] HIVE-5880 - (Rename HCatalog HBase Storage Handler artifact id) breaks packaging (Xuefu Zhang via Brock Noland) Changes for Build #2485 [xuefu] HIVE-5866: Hive divide operator generates wrong results in certain cases (reviewed by Prasad) [ehans] HIVE-5877: Implement vectorized support for IN as boolean-valued expression (Eric Hanson) Changes for Build #2486 [ehans] HIVE-5895: vectorization handles division by zero differently from normal execution (Sergey Shelukhin via Eric Hanson) [hashutosh] HIVE-5938 : Remove apache.mina dependency for test (Navis via Ashutosh Chauhan) [xuefu] HIVE-5912: Show partition command doesn't support db.table (Yu Zhao via Xuefu) [brock] HIVE-5906 - TestGenericUDFPower should use delta to compare doubles (Szehon Ho via Brock Noland) [brock] HIVE-5855 - Add deprecated methods back to ColumnProjectionUtils (Brock Noland reviewed by Navis) [brock] HIVE-5915 - Shade Kryo dependency (Brock Noland reviewed by Ashutosh Chauhan) Changes for Build #2487 [hashutosh] HIVE-5916 : No need to aggregate statistics collected via counter mechanism (Ashutosh Chauhan via Navis) [xuefu] HIVE-5947: Fix test failure in decimal_udf.q (reviewed by Brock) [thejas] HIVE-5550 : Import fails for tables created with default text, sequence and orc file formats using HCatalog API (Sushanth Sowmyan via Thejas Nair) Changes for Build #2488 [hashutosh] HIVE-5935 : hive.query.string is not provided to FetchTask (Navis via Ashutosh Chauhan) [navis] HIVE-3455 : ANSI CORR(X,Y) is incorrect (Maxim Bolotin via Navis) [hashutosh] HIVE-5921 : Better heuristics for worst case statistics estimates for join, limit and filter operator (Prasanth J via Harish Butani) [rhbutani] HIVE-5899 NPE during explain extended with char/varchar columns (Jason Dere via Harish Butani) 
Changes for Build #2489 [xuefu] HIVE-3181: getDatabaseMajor/Minor version does not return values (Szehon via Xuefu, reviewed by Navis) [brock] HIVE-5641 - BeeLineOpts ignores Throwable (Brock Noland reviewed by Prasad and Thejas) [hashutosh] HIVE-5909 : locate and instr throw java.nio.BufferUnderflowException when empty string as substring (Navis via Ashutosh Chauhan) [hashutosh] HIVE-5686 : partition column type validation doesn't quite work for dates (Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5887 : metastore direct sql doesn't work with oracle (Sergey Shelukhin via Ashutosh Chauhan) Changes for Build #2490 Changes for Build #2491 Changes for Build #2492 [brock] HIVE-5981 - Add hive-unit back to itests pom (Brock Noland reviewed by Prasad) Changes for Build #2493 [xuefu] HIVE-5872: Make UDAFs such as GenericUDAFSum report accurate precision/scale for decimal types (reviewed by Sergey Shelukhin) [hashutosh] HIVE-5978 : Rollups not supported in vector mode. (Jitendra Nath Pandey via Ashutosh Chauhan) [hashutosh] HIVE-5830 : SubQuery: Not In subqueries should check if subquery contains nulls in matching column (Harish Butani
Re: Review Request 16172: ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16172/#review30612 --- ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java https://reviews.apache.org/r/16172/#comment58622 Seems it is not an error? If so, let's not put it in the ErrorMsg. ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverCommonJoin.java https://reviews.apache.org/r/16172/#comment58623 Is this one necessary? - Yin Huai On Dec. 18, 2013, 5:04 a.m., Navis Ryu wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16172/ --- (Updated Dec. 18, 2013, 5:04 a.m.) Review request for hive. Bugs: HIVE-5945 https://issues.apache.org/jira/browse/HIVE-5945 Repository: hive-git Description --- Here is an example {code} select i_item_id, s_state, avg(ss_quantity) agg1, avg(ss_list_price) agg2, avg(ss_coupon_amt) agg3, avg(ss_sales_price) agg4 FROM store_sales JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk) JOIN item on (store_sales.ss_item_sk = item.i_item_sk) JOIN customer_demographics on (store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk) JOIN store on (store_sales.ss_store_sk = store.s_store_sk) where cd_gender = 'F' and cd_marital_status = 'U' and cd_education_status = 'Primary' and d_year = 2002 and s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL') group by i_item_id, s_state order by i_item_id, s_state limit 100; {code} I turned off noconditionaltask. So, I expected that there would be 4 Map-only jobs for this query. However, I got 1 Map-only job (joining store_sales and date_dim) and 3 MR jobs (for the reduce joins). So, I checked the conditional task determining the plan of the join involving item. In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask, aliasToFileSizeMap contains all input tables used in this query and the intermediate table generated by joining store_sales and date_dim. 
So, when we sum the size of all small tables, the size of store_sales (which is around 45GB in my test) will also be counted. Diffs - ql/src/java/org/apache/hadoop/hive/ql/ErrorMsg.java 45acc2b ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 9afc80b ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/CommonJoinTaskDispatcher.java 2efa7c2 ql/src/java/org/apache/hadoop/hive/ql/plan/ConditionalResolverCommonJoin.java faf2f9b ql/src/test/org/apache/hadoop/hive/ql/plan/TestConditionalResolverCommonJoin.java 67203c9 ql/src/test/results/clientpositive/auto_join25.q.out 7427239 ql/src/test/results/clientpositive/infer_bucket_sort_convert_join.q.out 7d06739 ql/src/test/results/clientpositive/mapjoin_hook.q.out d60d16e Diff: https://reviews.apache.org/r/16172/diff/ Testing --- Thanks, Navis Ryu
[jira] [Commented] (HIVE-5945) ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task.
[ https://issues.apache.org/jira/browse/HIVE-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851756#comment-13851756 ] Yin Huai commented on HIVE-5945: Two minor comments in the review board. Two additional comments. When we find {code} bigTableFileAlias != null {code} can we also log sumOfOthers and the threshold of the size of small tables? So, the log entry will show the size of the big table, the total size of the other small tables, and the threshold of the size of small tables. Also, can you add a unit test? Thanks :) ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task. - Key: HIVE-5945 URL: https://issues.apache.org/jira/browse/HIVE-5945 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0 Reporter: Yin Huai Assignee: Navis Priority: Critical Attachments: HIVE-5945.1.patch.txt, HIVE-5945.2.patch.txt, HIVE-5945.3.patch.txt Here is an example {code} select i_item_id, s_state, avg(ss_quantity) agg1, avg(ss_list_price) agg2, avg(ss_coupon_amt) agg3, avg(ss_sales_price) agg4 FROM store_sales JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk) JOIN item on (store_sales.ss_item_sk = item.i_item_sk) JOIN customer_demographics on (store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk) JOIN store on (store_sales.ss_store_sk = store.s_store_sk) where cd_gender = 'F' and cd_marital_status = 'U' and cd_education_status = 'Primary' and d_year = 2002 and s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL') group by i_item_id, s_state order by i_item_id, s_state limit 100; {code} I turned off noconditionaltask. So, I expected that there would be 4 Map-only jobs for this query. However, I got 1 Map-only job (joining store_sales and date_dim) and 3 MR jobs (for the reduce joins). So, I checked the conditional task determining the plan of the join involving item. 
In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask, aliasToFileSizeMap contains all input tables used in this query and the intermediate table generated by joining store_sales and date_dim. So, when we sum the size of all small tables, the size of store_sales (which is around 45GB in my test) will also be counted. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
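The fix Yin describes amounts to summing only the aliases that actually participate in this conditional task, instead of everything in aliasToFileSizeMap. A standalone sketch with plain collections, not Hive's resolver classes (class, method, and alias names are illustrative):

```java
import java.util.Map;
import java.util.Set;

public class SmallTableSum {
    // Sum candidate small-table sizes. The reported bug is that the
    // participant filter was effectively missing, so unrelated inputs
    // (e.g. the ~45GB store_sales scan feeding another join) inflated
    // the sum and blocked the map-join conversion.
    static long sumSmallTables(Map<String, Long> aliasToFileSize,
                               Set<String> participantAliases,
                               String bigTableAlias) {
        long sum = 0;
        for (Map.Entry<String, Long> e : aliasToFileSize.entrySet()) {
            if (!participantAliases.contains(e.getKey())) continue; // skip non-participants
            if (e.getKey().equals(bigTableAlias)) continue;         // skip the big table
            sum += e.getValue();
        }
        return sum;
    }
}
```

With the filter in place, only the small side (here `item`) contributes to the sum compared against the small-table size threshold.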
[jira] [Updated] (HIVE-5945) ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task.
[ https://issues.apache.org/jira/browse/HIVE-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-5945: --- Status: Open (was: Patch Available) ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task. - Key: HIVE-5945 URL: https://issues.apache.org/jira/browse/HIVE-5945 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0, 0.11.0, 0.10.0, 0.9.0, 0.8.0, 0.13.0 Reporter: Yin Huai Assignee: Navis Priority: Critical Attachments: HIVE-5945.1.patch.txt, HIVE-5945.2.patch.txt, HIVE-5945.3.patch.txt Here is an example {code} select i_item_id, s_state, avg(ss_quantity) agg1, avg(ss_list_price) agg2, avg(ss_coupon_amt) agg3, avg(ss_sales_price) agg4 FROM store_sales JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk) JOIN item on (store_sales.ss_item_sk = item.i_item_sk) JOIN customer_demographics on (store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk) JOIN store on (store_sales.ss_store_sk = store.s_store_sk) where cd_gender = 'F' and cd_marital_status = 'U' and cd_education_status = 'Primary' and d_year = 2002 and s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL') group by i_item_id, s_state order by i_item_id, s_state limit 100; {code} I turned off noconditionaltask. So, I expected that there would be 4 Map-only jobs for this query. However, I got 1 Map-only job (joining store_sales and date_dim) and 3 MR jobs (for the reduce joins). So, I checked the conditional task determining the plan of the join involving item. In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask, aliasToFileSizeMap contains all input tables used in this query and the intermediate table generated by joining store_sales and date_dim. So, when we sum the size of all small tables, the size of store_sales (which is around 45GB in my test) will also be counted. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6050) JDBC backward compatibility is broken
[ https://issues.apache.org/jira/browse/HIVE-6050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-6050: --- Priority: Blocker (was: Major) I think this should be a blocker. Changed to blocker status. JDBC backward compatibility is broken - Key: HIVE-6050 URL: https://issues.apache.org/jira/browse/HIVE-6050 Project: Hive Issue Type: Bug Components: JDBC Reporter: Szehon Ho Priority: Blocker Connect from JDBC driver of Hive 0.13 (TProtocolVersion=v4) to HiveServer2 of Hive 0.10 (TProtocolVersion=v1), will return the following exception: {noformat} java.sql.SQLException: Could not establish connection to jdbc:hive2://localhost:1/default: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null) at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:336) at org.apache.hive.jdbc.HiveConnection.init(HiveConnection.java:158) at org.apache.hive.jdbc.HiveDriver.connect(HiveDriver.java:105) at java.sql.DriverManager.getConnection(DriverManager.java:571) at java.sql.DriverManager.getConnection(DriverManager.java:187) at org.apache.hive.jdbc.MyTestJdbcDriver2.getConnection(MyTestJdbcDriver2.java:73) at org.apache.hive.jdbc.MyTestJdbcDriver2.<init>(MyTestJdbcDriver2.java:49) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) at java.lang.reflect.Constructor.newInstance(Constructor.java:513) at org.junit.runners.BlockJUnit4ClassRunner.createTest(BlockJUnit4ClassRunner.java:187) at org.junit.runners.BlockJUnit4ClassRunner$1.runReflectiveCall(BlockJUnit4ClassRunner.java:236) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15) at org.junit.runners.BlockJUnit4ClassRunner.methodBlock(BlockJUnit4ClassRunner.java:233) at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222) at org.junit.runners.ParentRunner.run(ParentRunner.java:300) at junit.framework.JUnit4TestAdapter.run(JUnit4TestAdapter.java:39) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:523) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:1063) at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:914) Caused by: org.apache.thrift.TApplicationException: Required field 'client_protocol' is unset! Struct:TOpenSessionReq(client_protocol:null) at org.apache.thrift.TApplicationException.read(TApplicationException.java:108) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71) at org.apache.hive.service.cli.thrift.TCLIService$Client.recv_OpenSession(TCLIService.java:160) at org.apache.hive.service.cli.thrift.TCLIService$Client.OpenSession(TCLIService.java:147) at org.apache.hive.jdbc.HiveConnection.openSession(HiveConnection.java:327) ... 37 more {noformat} On code analysis, it looks like the 'client_protocol' scheme is a ThriftEnum, which doesn't seem to be backward-compatible. Look at the code path in the generated file 'TOpenSessionReq.java', method TOpenSessionReqStandardScheme.read(): 1. The method will call 'TProtocolVersion.findValue()' on the thrift protocol's byte stream, which returns null if the client is sending an enum value unknown to the server. (v4 is unknown to server) 2. 
The method will then call struct.validate(), which will throw the above exception because of the null version. So it doesn't look like the current backward-compatibility scheme will work. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
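The failure mode described above can be sketched in isolation. The following is a hypothetical, simplified stand-in for the generated Thrift code — ProtocolVersion and OpenSessionReq mirror TProtocolVersion and TOpenSessionReq, but are not the real classes:

```java
// Hypothetical, simplified stand-in for the generated Thrift classes; the
// names mirror TProtocolVersion / TOpenSessionReq but this is not Hive code.
import java.util.Arrays;

public class EnumCompatSketch {
    // Server-side view of the protocol enum: an old server only knows V1.
    enum ProtocolVersion {
        V1(0);
        final int value;
        ProtocolVersion(int value) { this.value = value; }
        // Mirrors TProtocolVersion.findValue(): returns null for wire values
        // the server's generated code does not know about.
        static ProtocolVersion findValue(int value) {
            return Arrays.stream(values())
                         .filter(v -> v.value == value)
                         .findFirst().orElse(null);
        }
    }

    static class OpenSessionReq {
        ProtocolVersion clientProtocol;
        // Mirrors struct.validate(): a required field must be set.
        void validate() {
            if (clientProtocol == null) {
                throw new IllegalStateException(
                    "Required field 'client_protocol' is unset!");
            }
        }
    }

    public static void main(String[] args) {
        // A newer client sends a wire value this server has never heard of;
        // findValue() maps it to null, and validate() then rejects the struct.
        OpenSessionReq req = new OpenSessionReq();
        req.clientProtocol = ProtocolVersion.findValue(3); // unknown -> null
        try {
            req.validate();
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

This is why enum-typed required fields are fragile for version negotiation: the enum decode step discards unknown values before the application ever sees them.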
Hive-trunk-hadoop2 - Build # 610 - Still Failing
Changes for Build #572 [brock] HIVE-4741 - Add Hive config API to modify the restrict list (Prasad Mujumdar, Navis via Brock Noland) Changes for Build #573 [navis] HIVE-5827 : Incorrect location of logs for failed tests (Vikram Dixit K and Szehon Ho via Navis) [thejas] HIVE-4485 : beeline prints null as empty strings (Thejas Nair reviewed by Ashutosh Chauhan) [brock] HIVE-5704 - A couple of generic UDFs are not in the right folder/package (Xuefu Zhang via Brock Noland) [brock] HIVE-5706 - Move a few numeric UDFs to generic implementations (Xuefu Zhang via Brock Noland) [hashutosh] HIVE-5817 : column name to index mapping in VectorizationContext is broken (Remus Rusanu, Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5876 : Split elimination in ORC breaks for partitioned tables (Prasanth J via Ashutosh Chauhan) [hashutosh] HIVE-5886 : [Refactor] Remove unused class JobCloseFeedback (Ashutosh Chauhan via Thejas Nair) [brock] HIVE-5894 - Fix minor PTest2 issues (Brock Noland) Changes for Build #574 [brock] HIVE-5755 - Fix hadoop2 execution environment Milestone 1 (Vikram Dixit K via Brock Noland) Changes for Build #575 [xuefu] HIVE-5893: hive-schema-0.13.0.mysql.sql contains reference to nonexistent column (Carl via Xuefu) [xuefu] HIVE-5684: Serde support for char (Jason via Xuefu) Changes for Build #576 Changes for Build #577 Changes for Build #578 Changes for Build #579 [brock] HIVE-5441 - Async query execution doesn't return resultset status (Prasad Mujumdar via Thejas M Nair) [brock] HIVE-5880 - Rename HCatalog HBase Storage Handler artifact id (Brock Noland reviewed by Prasad Mujumdar) Changes for Build #580 [ehans] HIVE-5581: Implement vectorized year/month/day... etc. 
for string arguments (Teddy Choi via Eric Hanson) Changes for Build #581 [rhbutani] HIVE-5898 Make fetching of column statistics configurable (Prasanth Jayachandran via Harish Butani) Changes for Build #582 [brock] HIVE-5880 - (Rename HCatalog HBase Storage Handler artifact id) breaks packaging (Xuefu Zhang via Brock Noland) Changes for Build #583 [xuefu] HIVE-5866: Hive divide operator generates wrong results in certain cases (reviewed by Prasad) [ehans] HIVE-5877: Implement vectorized support for IN as boolean-valued expression (Eric Hanson) Changes for Build #584 [thejas] HIVE-5550 : Import fails for tables created with default text, sequence and orc file formats using HCatalog API (Sushanth Sowmyan via Thejas Nair) [ehans] HIVE-5895: vectorization handles division by zero differently from normal execution (Sergey Shelukhin via Eric Hanson) [hashutosh] HIVE-5938 : Remove apache.mina dependency for test (Navis via Ashutosh Chauhan) [xuefu] HIVE-5912: Show partition command doesn't support db.table (Yu Zhao via Xuefu) [brock] HIVE-5906 - TestGenericUDFPower should use delta to compare doubles (Szehon Ho via Brock Noland) [brock] HIVE-5855 - Add deprecated methods back to ColumnProjectionUtils (Brock Noland reviewed by Navis) [brock] HIVE-5915 - Shade Kryo dependency (Brock Noland reviewed by Ashutosh Chauhan) Changes for Build #585 [hashutosh] HIVE-5916 : No need to aggregate statistics collected via counter mechanism (Ashutosh Chauhan via Navis) [xuefu] HIVE-5947: Fix test failure in decimal_udf.q (reviewed by Brock) Changes for Build #586 [hashutosh] HIVE-5935 : hive.query.string is not provided to FetchTask (Navis via Ashutosh Chauhan) [navis] HIVE-3455 : ANSI CORR(X,Y) is incorrect (Maxim Bolotin via Navis) [hashutosh] HIVE-5921 : Better heuristics for worst case statistics estimates for join, limit and filter operator (Prasanth J via Harish Butani) [rhbutani] HIVE-5899 NPE during explain extended with char/varchar columns (Jason Dere via Harish Butani) 
Changes for Build #587 [xuefu] HIVE-3181: getDatabaseMajor/Minor version does not return values (Szehon via Xuefu, reviewed by Navis) [brock] HIVE-5641 - BeeLineOpts ignores Throwable (Brock Noland reviewed by Prasad and Thejas) [hashutosh] HIVE-5909 : locate and instr throw java.nio.BufferUnderflowException when empty string as substring (Navis via Ashutosh Chauhan) [hashutosh] HIVE-5686 : partition column type validation doesn't quite work for dates (Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5887 : metastore direct sql doesn't work with oracle (Sergey Shelukhin via Ashutosh Chauhan) Changes for Build #588 Changes for Build #589 Changes for Build #590 [brock] HIVE-5981 - Add hive-unit back to itests pom (Brock Noland reviewed by Prasad) Changes for Build #591 [xuefu] HIVE-5872: Make UDAFs such as GenericUDAFSum report accurate precision/scale for decimal types (reviewed by Sergey Shelukhin) [hashutosh] HIVE-5978 : Rollups not supported in vector mode. (Jitendra Nath Pandey via Ashutosh Chauhan) [hashutosh] HIVE-5830 : SubQuery: Not In subqueries should check if subquery contains nulls in matching column (Harish Butani via Ashutosh Chauhan) [hashutosh] HIVE-5598 :
[jira] [Commented] (HIVE-5958) SQL std auth - disable statements that work with paths
[ https://issues.apache.org/jira/browse/HIVE-5958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851777#comment-13851777 ] Brock Noland commented on HIVE-5958: Hi Thejas, I feel disabling all commands which have a path will result in an unusable system since so many users use LOAD DATA ... and external tables. You agree that URI is essential: bq. URI authorization is very essential [source|https://issues.apache.org/jira/browse/HIVE-5837?focusedCommentId=13850885page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13850885] Therefore I assume "in the first pass" means in the very first iteration, and that URI support will be included in the first release? SQL std auth - disable statements that work with paths -- Key: HIVE-5958 URL: https://issues.apache.org/jira/browse/HIVE-5958 Project: Hive Issue Type: Sub-task Components: Authorization Reporter: Thejas M Nair Original Estimate: 24h Remaining Estimate: 24h In the first pass, statements such as create table and alter table that specify a path URI will get an authorization error under SQL std auth. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5065) Create proper (i.e.: non .q file based) junit tests for DagUtils and TezTask
[ https://issues.apache.org/jira/browse/HIVE-5065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-5065: - Attachment: HIVE-5065-part-5.1.patch Create proper (i.e.: non .q file based) junit tests for DagUtils and TezTask Key: HIVE-5065 URL: https://issues.apache.org/jira/browse/HIVE-5065 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Priority: Blocker Fix For: tez-branch Attachments: HIVE-5065-part-1.1.patch, HIVE-5065-part-3.1.patch, HIVE-5065-part-4.1.patch, HIVE-5065-part-4.2.patch, HIVE-5065-part-5.1.patch, HIVE-5065-part2.1.patch -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: Review Request 16299: HIVE-6013: Supporting Quoted Identifiers in Column Names
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16299/ --- (Updated Dec. 18, 2013, 5:42 p.m.) Review request for hive, Ashutosh Chauhan and Alan Gates. Bugs: HIVE-6013 https://issues.apache.org/jira/browse/HIVE-6013 Repository: hive-git Description --- Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: Introduce 'standard' quoted identifiers for columns only. At the language level this is turned on by a flag. At the metadata level we relax the constraint on column names. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fa3e048 itests/qtest/pom.xml 971c5d3 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 5b75ef3 ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java eb26e7f ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 321759b ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java dbf3f91 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g ed9917d ql/src/java/org/apache/hadoop/hive/ql/parse/ParseDriver.java 1e6826f ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b9cd65c ql/src/java/org/apache/hadoop/hive/ql/parse/UnparseTranslator.java 8fe2262 ql/src/test/queries/clientnegative/invalid_columns.q f8be8c8 ql/src/test/queries/clientpositive/quotedid_alter.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_basic.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_partition.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_skew.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_smb.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_tblproperty.q PRE-CREATION ql/src/test/results/clientnegative/invalid_columns.q.out 3311b0a
ql/src/test/results/clientpositive/quotedid_alter.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_basic.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_partition.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_skew.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_smb.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_tblproperty.q.out PRE-CREATION Diff: https://reviews.apache.org/r/16299/diff/ Testing --- added new tests for create, alter, delete, query with columns containing special characters. Tests start with quotedid Thanks, Harish Butani
Re: Review Request 16299: HIVE-6013: Supporting Quoted Identifiers in Column Names
On Dec. 18, 2013, 12:59 a.m., Ashutosh Chauhan wrote: common/src/java/org/apache/hadoop/hive/conf/HiveConf.java, line 872 https://reviews.apache.org/r/16299/diff/2/?file=398469#file398469line872 class PatternValidator was recently introduced in HiveConf, which doesn't let the user specify an invalid value for a config key. Using that here will be useful. done On Dec. 18, 2013, 12:59 a.m., Ashutosh Chauhan wrote: metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java, line 484 https://reviews.apache.org/r/16299/diff/2/?file=398471#file398471line484 Shall we remove this if() altogether and thus also the above newly introduced method? I kept the validateColumnName method around in case we decide to change the validation logic in the future. But if you feel strongly about it, I will remove it. On Dec. 18, 2013, 12:59 a.m., Ashutosh Chauhan wrote: ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java, line 283 https://reviews.apache.org/r/16299/diff/2/?file=398472#file398472line283 conf should never be null here. If it is null, then it's a bug. Also, returning null in those cases seems incorrect. Let's remove this null conf check. done On Dec. 18, 2013, 12:59 a.m., Ashutosh Chauhan wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g, line 34 https://reviews.apache.org/r/16299/diff/2/?file=398475#file398475line34 There can never be the case that hiveconf == null. That would be a bug. Let's remove this null check. done On Dec. 18, 2013, 12:59 a.m., Ashutosh Chauhan wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g, line 400 https://reviews.apache.org/r/16299/diff/2/?file=398475#file398475line400 It will be good to document all the places Identifier is used. Can be lifted straight from your html document. done On Dec. 
18, 2013, 12:59 a.m., Ashutosh Chauhan wrote: ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g, line 403 https://reviews.apache.org/r/16299/diff/2/?file=398475#file398475line403 Good to add a note here saying QuotedIdentifier is only optionally available for columns as of now. done - Harish --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16299/#review30570 --- On Dec. 18, 2013, 5:42 p.m., Harish Butani wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16299/ --- (Updated Dec. 18, 2013, 5:42 p.m.) Review request for hive, Ashutosh Chauhan and Alan Gates. Bugs: HIVE-6013 https://issues.apache.org/jira/browse/HIVE-6013 Repository: hive-git Description --- Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: Introduce 'standard' quoted identifiers for columns only. At the language level this is turned on by a flag. At the metadata level we relax the constraint on column names. 
Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fa3e048 itests/qtest/pom.xml 971c5d3 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 5b75ef3 ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java eb26e7f ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 321759b ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java dbf3f91 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g ed9917d ql/src/java/org/apache/hadoop/hive/ql/parse/ParseDriver.java 1e6826f ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b9cd65c ql/src/java/org/apache/hadoop/hive/ql/parse/UnparseTranslator.java 8fe2262 ql/src/test/queries/clientnegative/invalid_columns.q f8be8c8 ql/src/test/queries/clientpositive/quotedid_alter.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_basic.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_partition.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_skew.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_smb.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_tblproperty.q PRE-CREATION ql/src/test/results/clientnegative/invalid_columns.q.out 3311b0a ql/src/test/results/clientpositive/quotedid_alter.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_basic.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_partition.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_skew.q.out PRE-CREATION
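The PatternValidator mentioned in the review above can be sketched as follows. This is a hypothetical, simplified version of the idea — a config value is accepted only if it matches one of a fixed set of patterns — not the actual HiveConf implementation:

```java
// Hypothetical simplified sketch of a pattern-based config validator;
// not the actual HiveConf.PatternValidator class.
import java.util.Arrays;
import java.util.regex.Pattern;

public class ValidatorSketch {
    static class PatternValidator {
        private final Pattern[] patterns;
        PatternValidator(String... regexes) {
            patterns = Arrays.stream(regexes)
                             .map(Pattern::compile)
                             .toArray(Pattern[]::new);
        }
        // Returns null when the value is valid, an error message otherwise.
        String validate(String value) {
            for (Pattern p : patterns) {
                if (p.matcher(value).matches()) {
                    return null;
                }
            }
            return "Invalid value: " + value;
        }
    }

    public static void main(String[] args) {
        // e.g. hive.support.quoted.identifiers accepts only "none" or "column".
        PatternValidator v = new PatternValidator("none", "column");
        System.out.println(v.validate("column")); // null: accepted
        System.out.println(v.validate("row"));    // rejected with a message
    }
}
```

Centralizing validation this way rejects a bad value at `set` time instead of letting it silently disable the feature later.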
[jira] [Updated] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-6013: Attachment: HIVE-6013.4.patch Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, HIVE-6013.2.patch, HIVE-6013.3.patch, HIVE-6013.4.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851966#comment-13851966 ] Harish Butani commented on HIVE-6013: - 1. there is a .q that covers this: regex_col.q 2. Oh, yes. I forgot about the jdbc metadata apis. Just looked at the jdbc project. Currently a lot of the methods in the DBMetadata class just throw SQLException(Method not supported). Who should I talk to about this? Can this be done in a follow-up jira? Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, HIVE-6013.2.patch, HIVE-6013.3.patch, HIVE-6013.4.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851972#comment-13851972 ] Ashutosh Chauhan commented on HIVE-6013: 1. But that test case doesn't have {{set hive.support.quoted.identifiers=column;}} 2. Doing jdbc changes in a follow-up is ok. Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, HIVE-6013.2.patch, HIVE-6013.3.patch, HIVE-6013.4.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13851982#comment-13851982 ] Harish Butani commented on HIVE-6013: - You mean set hive.support.quoted.identifiers=none; right? When it is set to 'column', it will treat it as a literal. And the query would be: select `a.*` from t1; You need the back-ticks. Otherwise this will not get past the lexer. Since 'none' is the default setting, I thought the existing test was enough. Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, HIVE-6013.2.patch, HIVE-6013.3.patch, HIVE-6013.4.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
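The two interpretations under discussion — `a.*` as a regular expression under the default 'none' versus a literal column name under 'column' — can be illustrated with a small sketch. This is a hypothetical helper, not the actual SemanticAnalyzer logic:

```java
// Hypothetical illustration of the two interpretations of a backticked name;
// not the actual Hive SemanticAnalyzer code.
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class QuotedIdSketch {
    static List<String> resolve(String backtickedName, List<String> columns,
                                boolean quotedIdsAreColumns) {
        if (quotedIdsAreColumns) {
            // hive.support.quoted.identifiers=column: a literal column name.
            return columns.stream()
                          .filter(c -> c.equals(backtickedName))
                          .collect(Collectors.toList());
        }
        // hive.support.quoted.identifiers=none (default): legacy behavior,
        // the backticked name is a regular expression over column names.
        Pattern p = Pattern.compile(backtickedName);
        return columns.stream()
                      .filter(c -> p.matcher(c).matches())
                      .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> cols = List.of("a1", "a2", "b1");
        // select `a.*` from t1;
        System.out.println(resolve("a.*", cols, false)); // regex: matches a1, a2
        System.out.println(resolve("a.*", cols, true));  // literal: no column named "a.*"
    }
}
```

This is why regex_col.q only exercises the default setting: under 'column' the same backticked string stops matching by pattern entirely.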
Hive-trunk-hadoop2 - Build # 611 - Still Failing
Hive-trunk-h0.21 - Build # 2512 - Still Failing
Changes for Build #2473 [brock] HIVE-4741 - Add Hive config API to modify the restrict list (Prasad Mujumdar, Navis via Brock Noland) Changes for Build #2474 [navis] HIVE-5827 : Incorrect location of logs for failed tests (Vikram Dixit K and Szehon Ho via Navis) [thejas] HIVE-4485 : beeline prints null as empty strings (Thejas Nair reviewed by Ashutosh Chauhan) [brock] HIVE-5704 - A couple of generic UDFs are not in the right folder/package (Xuefu Zhang via Brock Noland) [brock] HIVE-5706 - Move a few numeric UDFs to generic implementations (Xuefu Zhang via Brock Noland) [hashutosh] HIVE-5817 : column name to index mapping in VectorizationContext is broken (Remus Rusanu, Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5876 : Split elimination in ORC breaks for partitioned tables (Prasanth J via Ashutosh Chauhan) [hashutosh] HIVE-5886 : [Refactor] Remove unused class JobCloseFeedback (Ashutosh Chauhan via Thejas Nair) [brock] HIVE-5894 - Fix minor PTest2 issues (Brock Noland) Changes for Build #2475 [brock] HIVE-5755 - Fix hadoop2 execution environment Milestone 1 (Vikram Dixit K via Brock Noland) Changes for Build #2476 [xuefu] HIVE-5893: hive-schema-0.13.0.mysql.sql contains reference to nonexistent column (Carl via Xuefu) [xuefu] HIVE-5684: Serde support for char (Jason via Xuefu) Changes for Build #2477 Changes for Build #2478 Changes for Build #2479 Changes for Build #2480 [brock] HIVE-5441 - Async query execution doesn't return resultset status (Prasad Mujumdar via Thejas M Nair) [brock] HIVE-5880 - Rename HCatalog HBase Storage Handler artifact id (Brock Noland reviewed by Prasad Mujumdar) Changes for Build #2481 Changes for Build #2482 [ehans] HIVE-5581: Implement vectorized year/month/day... etc. 
for string arguments (Teddy Choi via Eric Hanson) Changes for Build #2483 [rhbutani] HIVE-5898 Make fetching of column statistics configurable (Prasanth Jayachandran via Harish Butani) Changes for Build #2484 [brock] HIVE-5880 - (Rename HCatalog HBase Storage Handler artifact id) breaks packaging (Xuefu Zhang via Brock Noland) Changes for Build #2485 [xuefu] HIVE-5866: Hive divide operator generates wrong results in certain cases (reviewed by Prasad) [ehans] HIVE-5877: Implement vectorized support for IN as boolean-valued expression (Eric Hanson) Changes for Build #2486 [ehans] HIVE-5895: vectorization handles division by zero differently from normal execution (Sergey Shelukhin via Eric Hanson) [hashutosh] HIVE-5938 : Remove apache.mina dependency for test (Navis via Ashutosh Chauhan) [xuefu] HIVE-5912: Show partition command doesn't support db.table (Yu Zhao via Xuefu) [brock] HIVE-5906 - TestGenericUDFPower should use delta to compare doubles (Szehon Ho via Brock Noland) [brock] HIVE-5855 - Add deprecated methods back to ColumnProjectionUtils (Brock Noland reviewed by Navis) [brock] HIVE-5915 - Shade Kryo dependency (Brock Noland reviewed by Ashutosh Chauhan) Changes for Build #2487 [hashutosh] HIVE-5916 : No need to aggregate statistics collected via counter mechanism (Ashutosh Chauhan via Navis) [xuefu] HIVE-5947: Fix test failure in decimal_udf.q (reviewed by Brock) [thejas] HIVE-5550 : Import fails for tables created with default text, sequence and orc file formats using HCatalog API (Sushanth Sowmyan via Thejas Nair) Changes for Build #2488 [hashutosh] HIVE-5935 : hive.query.string is not provided to FetchTask (Navis via Ashutosh Chauhan) [navis] HIVE-3455 : ANSI CORR(X,Y) is incorrect (Maxim Bolotin via Navis) [hashutosh] HIVE-5921 : Better heuristics for worst case statistics estimates for join, limit and filter operator (Prasanth J via Harish Butani) [rhbutani] HIVE-5899 NPE during explain extended with char/varchar columns (Jason Dere via Harish Butani) 
Changes for Build #2489 [xuefu] HIVE-3181: getDatabaseMajor/Minor version does not return values (Szehon via Xuefu, reviewed by Navis) [brock] HIVE-5641 - BeeLineOpts ignores Throwable (Brock Noland reviewed by Prasad and Thejas) [hashutosh] HIVE-5909 : locate and instr throw java.nio.BufferUnderflowException when empty string as substring (Navis via Ashutosh Chauhan) [hashutosh] HIVE-5686 : partition column type validation doesn't quite work for dates (Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5887 : metastore direct sql doesn't work with oracle (Sergey Shelukhin via Ashutosh Chauhan) Changes for Build #2490 Changes for Build #2491 Changes for Build #2492 [brock] HIVE-5981 - Add hive-unit back to itests pom (Brock Noland reviewed by Prasad) Changes for Build #2493 [xuefu] HIVE-5872: Make UDAFs such as GenericUDAFSum report accurate precision/scale for decimal types (reviewed by Sergey Shelukhin) [hashutosh] HIVE-5978 : Rollups not supported in vector mode. (Jitendra Nath Pandey via Ashutosh Chauhan) [hashutosh] HIVE-5830 : SubQuery: Not In subqueries should check if subquery contains nulls in matching column (Harish Butani
[jira] [Created] (HIVE-6055) Cleanup aisle tez
Gunther Hagleitner created HIVE-6055: Summary: Cleanup aisle tez Key: HIVE-6055 URL: https://issues.apache.org/jira/browse/HIVE-6055 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Some of the past merges have led to some dead code. Need to remove this from the tez branch. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6021) Problem in GroupByOperator for handling distinct aggregations
[ https://issues.apache.org/jira/browse/HIVE-6021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-6021: -- Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks to Sun Rui for the contribution. Problem in GroupByOperator for handling distinct aggregations Key: HIVE-6021 URL: https://issues.apache.org/jira/browse/HIVE-6021 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Reporter: Sun Rui Assignee: Sun Rui Fix For: 0.13.0 Attachments: HIVE-6021.1.patch, HIVE-6021.2.patch Use the following test case with HIVE 0.12: {code:sql} create table src(key int, value string); load data local inpath 'src/data/files/kv1.txt' overwrite into table src; set hive.map.aggr=false; select count(key),count(distinct value) from src group by key; {code} We will get an ArrayIndexOutOfBoundsException from GroupByOperator: {code} java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:485) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88) ... 
5 more Caused by: java.lang.RuntimeException: Reduce operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:159) ... 10 more Caused by: java.lang.ArrayIndexOutOfBoundsException: 1 at org.apache.hadoop.hive.ql.exec.GroupByOperator.initializeOp(GroupByOperator.java:281) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:377) at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.configure(ExecReducer.java:152) ... 10 more {code} explain select count(key),count(distinct value) from src group by key; {code} STAGE PLANS: Stage: Stage-1 Map Reduce Alias - Map Operator Tree: src TableScan alias: src Select Operator expressions: expr: key type: int expr: value type: string outputColumnNames: key, value Reduce Output Operator key expressions: expr: key type: int expr: value type: string sort order: ++ Map-reduce partition columns: expr: key type: int tag: -1 Reduce Operator Tree: Group By Operator aggregations: expr: count(KEY._col0) // The parameter causes this problem ^^^ expr: count(DISTINCT KEY._col1:0._col0) bucketGroup: false keys: expr: KEY._col0 type: int mode: complete outputColumnNames: _col0, _col1, _col2 Select Operator expressions: expr: _col1 type: bigint expr: _col2 type: bigint outputColumnNames: _col0, _col1 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Stage: Stage-0 Fetch Operator limit: -1 {code} The root cause is within GroupByOperator.initializeOp(). The method fails to handle the case where a query with distinct aggregations has an aggregation function whose parameter is a group-by key column but not a distinct key column. {code} if (unionExprEval != null) { String[] names =
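The mechanics of that failure can be illustrated with a toy Python model (an assumption about the behavior, deliberately simplified; GroupByOperator.initializeOp() is the real Java code): when only the distinct-key columns are registered in the lookup array, a plain group-by key used as an aggregation parameter indexes past the end of it.

```python
# Toy model of the reported bug, NOT Hive's actual code: the operator's
# field lookup holds only the DISTINCT value columns.
distinct_fields = ["KEY._col1:0._col0"]  # column for count(DISTINCT value)

def resolve_aggregation_param(index):
    # Pre-fix behavior in this model: assume every aggregation parameter
    # lives in the distinct-field array. count(KEY._col0)'s parameter is a
    # group-by key, not a distinct key, so its index (1) is out of range.
    return distinct_fields[index]

try:
    resolve_aggregation_param(1)
except IndexError:
    print("analogue of ArrayIndexOutOfBoundsException: 1")
```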
[jira] [Commented] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852018#comment-13852018 ] Hive QA commented on HIVE-6013: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619370/HIVE-6013.4.patch {color:red}ERROR:{color} -1 due to 48 failed/errored test(s), 4795 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_or_replace_view org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_create_view_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_database_drop org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_describe_formatted_view_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_describe_formatted_view_partitioned_json org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auth org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_empty org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_file_format org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_mult_tables_compact org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_multiple org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_self_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_unused org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_auto_update org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap2 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_auto org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_auto_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_compression org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_rc org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compact_binary_search org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_compression org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_creation org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_serde org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_stale org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_stale_partitioned org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_infer_bucket_sort_convert_join org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_input3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ql_rewrite_gbtoidx org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_indexes_edge_cases org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_indexes_syntax org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_view org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_alter_view_as_select_with_partition org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_alter_view_failure4 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_alter_view_failure6 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_create_or_replace_view1 
org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_create_or_replace_view2 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_index_bitmap_no_map_aggr org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_index_compact_entry_limit org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_index_compact_size_limit {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/689/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/689/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 48 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12619370 Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013
[jira] [Updated] (HIVE-6055) Cleanup aisle tez
[ https://issues.apache.org/jira/browse/HIVE-6055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6055: - Attachment: HIVE-6055.1.patch Cleanup aisle tez - Key: HIVE-6055 URL: https://issues.apache.org/jira/browse/HIVE-6055 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Attachments: HIVE-6055.1.patch Some of the past merges have led to some dead code. Need to remove this from the tez branch. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5761) Implement vectorized support for the DATE data type
[ https://issues.apache.org/jira/browse/HIVE-5761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852027#comment-13852027 ] Eric Hanson commented on HIVE-5761: --- Please see my comments on ReviewBoard Implement vectorized support for the DATE data type --- Key: HIVE-5761 URL: https://issues.apache.org/jira/browse/HIVE-5761 Project: Hive Issue Type: Sub-task Reporter: Eric Hanson Assignee: Teddy Choi Attachments: HIVE-5761.1.patch Add support to allow queries referencing DATE columns and expression results to run efficiently in vectorized mode. This should re-use the code for the integer/timestamp types to the extent possible and beneficial. Include unit tests and end-to-end tests. Consider re-using or extending existing end-to-end tests for vectorized integer and/or timestamp operations. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
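The reason DATE can reuse the integer code paths is that a date reduces to a single primitive: days since the Unix epoch. A short Python sketch of that idea (the encoding is standard; the function and variable names here are illustrative, not Hive's classes):

```python
# Sketch: a DATE column stored as days-since-epoch integers, so batch
# operations are plain integer-vector work, as with BIGINT columns.
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)

def to_days_vector(dates):
    # Encode a column of dates as one primitive int per row.
    return [(d - EPOCH).days for d in dates]

def vectorized_year(days_vector):
    # Decode per row; a real vectorized expression would loop over the
    # primitive array without creating per-row objects.
    return [(EPOCH + timedelta(days=n)).year for n in days_vector]

col = to_days_vector([date(2013, 12, 18), date(1969, 12, 31)])
print(col)                   # [16057, -1]
print(vectorized_year(col))  # [2013, 1969]
```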
[jira] [Commented] (HIVE-6045) Beeline hivevars is broken for more than one hivevar
[ https://issues.apache.org/jira/browse/HIVE-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852102#comment-13852102 ] Xuefu Zhang commented on HIVE-6045: --- In addition, please add a test case if it's not already in TestBeelineWithArgs. Beeline hivevars is broken for more than one hivevar Key: HIVE-6045 URL: https://issues.apache.org/jira/browse/HIVE-6045 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-6045.1.patch, HIVE-6045.patch HIVE-4568 introduced --hivevar flag. But if you specify more than one hivevar, for example {code} beeline --hivevar file1=/user/szehon/file1 --hivevar file2=/user/szehon/file2 {code} then the variables during runtime get mangled to evaluate to: {code} file1=/user/szehon/file1file2=/user/szehon/file2 {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6045) Beeline hivevars is broken for more than one hivevar
[ https://issues.apache.org/jira/browse/HIVE-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852101#comment-13852101 ] Xuefu Zhang commented on HIVE-6045: --- [~szehon], I believe this (with '' as separator) used to work fine. Do you know what change broke this? TestBeelineWithArgs used to pass also. I think it makes sense to fix it and include it in the test suite? Beeline hivevars is broken for more than one hivevar Key: HIVE-6045 URL: https://issues.apache.org/jira/browse/HIVE-6045 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-6045.1.patch, HIVE-6045.patch HIVE-4568 introduced --hivevar flag. But if you specify more than one hivevar, for example {code} beeline --hivevar file1=/user/szehon/file1 --hivevar file2=/user/szehon/file2 {code} then the variables during runtime get mangled to evaluate to: {code} file1=/user/szehon/file1file2=/user/szehon/file2 {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
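The class of bug described above can be reproduced in a few lines of Python (a hypothetical illustration, not Beeline's actual code): repeated key=value options collected into a list and then joined with an empty separator run together into one token, which is exactly the mangled output shown in the report.

```python
# Illustration of the mangling: joining repeated --hivevar pairs with no
# separator vs. keeping each pair distinct.
hivevars = ["file1=/user/szehon/file1", "file2=/user/szehon/file2"]

broken = "".join(hivevars)  # values run together, as in the bug report
fixed = dict(v.split("=", 1) for v in hivevars)  # one entry per variable

print(broken)          # file1=/user/szehon/file1file2=/user/szehon/file2
print(fixed["file2"])  # /user/szehon/file2
```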
[jira] [Created] (HIVE-6056) The AvroSerDe gives out BadSchemaException if a partition is added to the table
Rushil Gupta created HIVE-6056: -- Summary: The AvroSerDe gives out BadSchemaException if a partition is added to the table Key: HIVE-6056 URL: https://issues.apache.org/jira/browse/HIVE-6056 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.11.0 Environment: amazon EMR (hadoop Amazon 1.0.3), avro-1.7.5 Reporter: Rushil Gupta While creating an external table if I do not add a partition, I am able to read files using the following format: CREATE external TABLE event ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 's3n://test-event/input/2013/14/10' TBLPROPERTIES ('avro.schema.literal' = '..some schema..'); but if I add a partition based on date CREATE external TABLE event PARTITIONED BY (ds STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 's3n://test-event/input/' TBLPROPERTIES ('avro.schema.literal' = '..some schema..'); ALTER TABLE event ADD IF NOT EXISTS PARTITION (ds = '2013_12_16') LOCATION '2013/12/16/'; I get the following exception: java.io.IOException:org.apache.hadoop.hive.serde2.avro.BadSchemaException -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-6013: Attachment: HIVE-6013.5.patch Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, HIVE-6013.2.patch, HIVE-6013.3.patch, HIVE-6013.4.patch, HIVE-6013.5.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-6013: Status: Patch Available (was: Open) Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, HIVE-6013.2.patch, HIVE-6013.3.patch, HIVE-6013.4.patch, HIVE-6013.5.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-6013: Status: Open (was: Patch Available) Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, HIVE-6013.2.patch, HIVE-6013.3.patch, HIVE-6013.4.patch, HIVE-6013.5.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: Review Request 16299: HIVE-6013: Supporting Quoted Identifiers in Column Names
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16299/ --- (Updated Dec. 18, 2013, 8:38 p.m.) Review request for hive, Ashutosh Chauhan and Alan Gates. Changes --- the null check in HiveUtils.unparseIdentifier is needed. Most existing invocations (for everything other than columns) invoke the old function (line 273), which doesn't take a context object. Bugs: HIVE-6013 https://issues.apache.org/jira/browse/HIVE-6013 Repository: hive-git Description --- Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: Introduce 'standard' quoted identifiers for columns only. At the language level this is turned on by a flag. At the metadata level we relax the constraint on column names. Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fa3e048 itests/qtest/pom.xml 971c5d3 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreUtils.java 5b75ef3 ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveUtils.java eb26e7f ql/src/java/org/apache/hadoop/hive/ql/metadata/Table.java 321759b ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java dbf3f91 ql/src/java/org/apache/hadoop/hive/ql/parse/HiveLexer.g ed9917d ql/src/java/org/apache/hadoop/hive/ql/parse/ParseDriver.java 1e6826f ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java b9cd65c ql/src/java/org/apache/hadoop/hive/ql/parse/UnparseTranslator.java 8fe2262 ql/src/test/queries/clientnegative/invalid_columns.q f8be8c8 ql/src/test/queries/clientpositive/quotedid_alter.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_basic.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_partition.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_skew.q PRE-CREATION 
ql/src/test/queries/clientpositive/quotedid_smb.q PRE-CREATION ql/src/test/queries/clientpositive/quotedid_tblproperty.q PRE-CREATION ql/src/test/results/clientnegative/invalid_columns.q.out 3311b0a ql/src/test/results/clientpositive/quotedid_alter.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_basic.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_partition.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_skew.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_smb.q.out PRE-CREATION ql/src/test/results/clientpositive/quotedid_tblproperty.q.out PRE-CREATION Diff: https://reviews.apache.org/r/16299/diff/ Testing --- added new tests for create, alter, delete, query with columns containing special characters. Tests start with quotedid Thanks, Harish Butani
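The core of "standard" quoted-identifier unparsing can be sketched in a few lines (an assumption-level sketch: HiveUtils.unparseIdentifier in the patch is the real implementation, and its exact escaping rules may differ). A column name is wrapped in backticks, and any embedded backtick is doubled so the quoted form round-trips:

```python
# Sketch of quoted-identifier unparsing for column names (hypothetical
# simplification of the patch's HiveUtils.unparseIdentifier behavior).
def unparse_identifier(name, quoted_ids_enabled=True):
    if not quoted_ids_enabled:
        # Legacy mode: emit the identifier bare.
        return name
    # Double embedded backticks, then wrap the whole name in backticks.
    return "`" + name.replace("`", "``") + "`"

print(unparse_identifier("x+y"))  # `x+y`
print(unparse_identifier("a`b"))  # `a``b`
```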
[jira] [Resolved] (HIVE-6056) The AvroSerDe gives out BadSchemaException if a partition is added to the table
[ https://issues.apache.org/jira/browse/HIVE-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushil Gupta resolved HIVE-6056. Resolution: Fixed The AvroSerDe gives out BadSchemaException if a partition is added to the table --- Key: HIVE-6056 URL: https://issues.apache.org/jira/browse/HIVE-6056 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.11.0 Environment: amazon EMR (hadoop Amazon 1.0.3), avro-1.7.5 Reporter: Rushil Gupta While creating an external table if I do not add a partition, I am able to read files using the following format: CREATE external TABLE event ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 's3n://test-event/input/2013/14/10' TBLPROPERTIES ('avro.schema.literal' = '..some schema..'); but if I add a partition based on date CREATE external TABLE event PARTITIONED BY (ds STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 's3n://test-event/input/' TBLPROPERTIES ('avro.schema.literal' = '..some schema..'); ALTER TABLE event ADD IF NOT EXISTS PARTITION (ds = '2013_12_16') LOCATION '2013/12/16/'; I get the following exception: java.io.IOException:org.apache.hadoop.hive.serde2.avro.BadSchemaException -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6056) The AvroSerDe gives out BadSchemaException if a partition is added to the table
[ https://issues.apache.org/jira/browse/HIVE-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852157#comment-13852157 ] Rushil Gupta commented on HIVE-6056: This bug is fixed as a part of hive 0.12 (https://issues.apache.org/jira/browse/HIVE-3953) The AvroSerDe gives out BadSchemaException if a partition is added to the table --- Key: HIVE-6056 URL: https://issues.apache.org/jira/browse/HIVE-6056 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.11.0 Environment: amazon EMR (hadoop Amazon 1.0.3), avro-1.7.5 Reporter: Rushil Gupta While creating an external table if I do not add a partition, I am able to read files using the following format: CREATE external TABLE event ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 's3n://test-event/input/2013/14/10' TBLPROPERTIES ('avro.schema.literal' = '..some schema..'); but if I add a partition based on date CREATE external TABLE event PARTITIONED BY (ds STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 's3n://test-event/input/' TBLPROPERTIES ('avro.schema.literal' = '..some schema..'); ALTER TABLE event ADD IF NOT EXISTS PARTITION (ds = '2013_12_16') LOCATION '2013/12/16/'; I get the following exception: java.io.IOException:org.apache.hadoop.hive.serde2.avro.BadSchemaException -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6045) Beeline hivevars is broken for more than one hivevar
[ https://issues.apache.org/jira/browse/HIVE-6045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852167#comment-13852167 ] Szehon Ho commented on HIVE-6045: - I don't see any immediate changes. Do you remember if the parsing used to expect '' character? I believe that would be the only way it would have worked, but I guess there was never a unit test for this case. I will take a look at effort to move the test to /itests/hive-unit , there are a few tests about null as empty string that seem broken when I ran it, I will have to take a look. Beeline hivevars is broken for more than one hivevar Key: HIVE-6045 URL: https://issues.apache.org/jira/browse/HIVE-6045 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.0 Reporter: Szehon Ho Assignee: Szehon Ho Attachments: HIVE-6045.1.patch, HIVE-6045.patch HIVE-4568 introduced --hivevar flag. But if you specify more than one hivevar, for example {code} beeline --hivevar file1=/user/szehon/file1 --hivevar file2=/user/szehon/file2 {code} then the variables during runtime get mangled to evaluate to: {code} file1=/user/szehon/file1file2=/user/szehon/file2 {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Hive-trunk-h0.21 - Build # 2513 - Still Failing
Changes for Build #2474 [navis] HIVE-5827 : Incorrect location of logs for failed tests (Vikram Dixit K and Szehon Ho via Navis) [thejas] HIVE-4485 : beeline prints null as empty strings (Thejas Nair reviewed by Ashutosh Chauhan) [brock] HIVE-5704 - A couple of generic UDFs are not in the right folder/package (Xuefu Zhang via Brock Noland) [brock] HIVE-5706 - Move a few numeric UDFs to generic implementations (Xuefu Zhang via Brock Noland) [hashutosh] HIVE-5817 : column name to index mapping in VectorizationContext is broken (Remus Rusanu, Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5876 : Split elimination in ORC breaks for partitioned tables (Prasanth J via Ashutosh Chauhan) [hashutosh] HIVE-5886 : [Refactor] Remove unused class JobCloseFeedback (Ashutosh Chauhan via Thejas Nair) [brock] HIVE-5894 - Fix minor PTest2 issues (Brock Noland) Changes for Build #2475 [brock] HIVE-5755 - Fix hadoop2 execution environment Milestone 1 (Vikram Dixit K via Brock Noland) Changes for Build #2476 [xuefu] HIVE-5893: hive-schema-0.13.0.mysql.sql contains reference to nonexistent column (Carl via Xuefu) [xuefu] HIVE-5684: Serde support for char (Jason via Xuefu) Changes for Build #2477 Changes for Build #2478 Changes for Build #2479 Changes for Build #2480 [brock] HIVE-5441 - Async query execution doesn't return resultset status (Prasad Mujumdar via Thejas M Nair) [brock] HIVE-5880 - Rename HCatalog HBase Storage Handler artifact id (Brock Noland reviewed by Prasad Mujumdar) Changes for Build #2481 Changes for Build #2482 [ehans] HIVE-5581: Implement vectorized year/month/day... etc. 
for string arguments (Teddy Choi via Eric Hanson) Changes for Build #2483 [rhbutani] HIVE-5898 Make fetching of column statistics configurable (Prasanth Jayachandran via Harish Butani) Changes for Build #2484 [brock] HIVE-5880 - (Rename HCatalog HBase Storage Handler artifact id) breaks packaging (Xuefu Zhang via Brock Noland) Changes for Build #2485 [xuefu] HIVE-5866: Hive divide operator generates wrong results in certain cases (reviewed by Prasad) [ehans] HIVE-5877: Implement vectorized support for IN as boolean-valued expression (Eric Hanson) Changes for Build #2486 [ehans] HIVE-5895: vectorization handles division by zero differently from normal execution (Sergey Shelukhin via Eric Hanson) [hashutosh] HIVE-5938 : Remove apache.mina dependency for test (Navis via Ashutosh Chauhan) [xuefu] HIVE-5912: Show partition command doesn't support db.table (Yu Zhao via Xuefu) [brock] HIVE-5906 - TestGenericUDFPower should use delta to compare doubles (Szehon Ho via Brock Noland) [brock] HIVE-5855 - Add deprecated methods back to ColumnProjectionUtils (Brock Noland reviewed by Navis) [brock] HIVE-5915 - Shade Kryo dependency (Brock Noland reviewed by Ashutosh Chauhan) Changes for Build #2487 [hashutosh] HIVE-5916 : No need to aggregate statistics collected via counter mechanism (Ashutosh Chauhan via Navis) [xuefu] HIVE-5947: Fix test failure in decimal_udf.q (reviewed by Brock) [thejas] HIVE-5550 : Import fails for tables created with default text, sequence and orc file formats using HCatalog API (Sushanth Sowmyan via Thejas Nair) Changes for Build #2488 [hashutosh] HIVE-5935 : hive.query.string is not provided to FetchTask (Navis via Ashutosh Chauhan) [navis] HIVE-3455 : ANSI CORR(X,Y) is incorrect (Maxim Bolotin via Navis) [hashutosh] HIVE-5921 : Better heuristics for worst case statistics estimates for join, limit and filter operator (Prasanth J via Harish Butani) [rhbutani] HIVE-5899 NPE during explain extended with char/varchar columns (Jason Dere via Harish Butani) 
Changes for Build #2489 [xuefu] HIVE-3181: getDatabaseMajor/Minor version does not return values (Szehon via Xuefu, reviewed by Navis) [brock] HIVE-5641 - BeeLineOpts ignores Throwable (Brock Noland reviewed by Prasad and Thejas) [hashutosh] HIVE-5909 : locate and instr throw java.nio.BufferUnderflowException when empty string as substring (Navis via Ashutosh Chauhan) [hashutosh] HIVE-5686 : partition column type validation doesn't quite work for dates (Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5887 : metastore direct sql doesn't work with oracle (Sergey Shelukhin via Ashutosh Chauhan) Changes for Build #2490 Changes for Build #2491 Changes for Build #2492 [brock] HIVE-5981 - Add hive-unit back to itests pom (Brock Noland reviewed by Prasad) Changes for Build #2493 [xuefu] HIVE-5872: Make UDAFs such as GenericUDAFSum report accurate precision/scale for decimal types (reviewed by Sergey Shelukhin) [hashutosh] HIVE-5978 : Rollups not supported in vector mode. (Jitendra Nath Pandey via Ashutosh Chauhan) [hashutosh] HIVE-5830 : SubQuery: Not In subqueries should check if subquery contains nulls in matching column (Harish Butani via Ashutosh Chauhan) [hashutosh] HIVE-5598 : Remove dummy new line at the end of non-sql commands (Navis via Ashutosh Chauhan) Changes
[jira] [Updated] (HIVE-6006) Add UDF to calculate distance between geographic coordinates
[ https://issues.apache.org/jira/browse/HIVE-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kostiantyn Kudriavtsev updated HIVE-6006: - Status: Open (was: Patch Available) Add UDF to calculate distance between geographic coordinates Key: HIVE-6006 URL: https://issues.apache.org/jira/browse/HIVE-6006 Project: Hive Issue Type: New Feature Components: UDF Affects Versions: 0.13.0 Reporter: Kostiantyn Kudriavtsev Priority: Minor Fix For: 0.13.0 Original Estimate: 336h Remaining Estimate: 336h It would be nice to have a Hive UDF to calculate the distance between two points on Earth. The Haversine formula seems good enough for this purpose. The following function is proposed: HaversineDistance(lat1, lon1, lat2, lon2) - calculate the Haversine distance between 2 points with coordinates (lat1, lon1) and (lat2, lon2) -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6006) Add UDF to calculate distance between geographic coordinates
[ https://issues.apache.org/jira/browse/HIVE-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kostiantyn Kudriavtsev updated HIVE-6006: - Attachment: (was: hive-6006.patch) Add UDF to calculate distance between geographic coordinates Key: HIVE-6006 URL: https://issues.apache.org/jira/browse/HIVE-6006 Project: Hive Issue Type: New Feature Components: UDF Affects Versions: 0.13.0 Reporter: Kostiantyn Kudriavtsev Priority: Minor Fix For: 0.13.0 Original Estimate: 336h Remaining Estimate: 336h It would be nice to have a Hive UDF to calculate the distance between two points on Earth. The Haversine formula seems good enough for this purpose. The following function is proposed: HaversineDistance(lat1, lon1, lat2, lon2) - calculate the Haversine distance between 2 points with coordinates (lat1, lon1) and (lat2, lon2) -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5957) Fix HCatalog Unit tests on Windows
[ https://issues.apache.org/jira/browse/HIVE-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852201#comment-13852201 ] Daniel Dai commented on HIVE-5957: -- Do we still need the change in TestMultiOutputFormat.java? Seems it is for debugging? Fix HCatalog Unit tests on Windows -- Key: HIVE-5957 URL: https://issues.apache.org/jira/browse/HIVE-5957 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-5957.patch org.apache.hcatalog.hbase.TestHBaseHCatStorageHandler fails on Windows. It generates java.lang.IllegalStateException: Failed to setup cluster at org.apache.hcatalog.hbase.ManyMiniCluster.start(ManyMiniCluster.java:119). Digging further there is Please fix invalid configuration for hbase.rootdir file://C:/tmp/build/test/data/test_default_573990410261077827/hbase java.lang.IllegalArgumentException: Wrong FS: file://C:/tmp/build/test/data/test_default_573990410261077827/hbase, expected: file:/// at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:642) This was fixed in HIVE-5015 (9/3/13) and got clobbered by HIVE-5261 (9/12/13). -- This message was sent by Atlassian JIRA (v6.1.4#6159)
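The "Wrong FS: file://C:/..." failure above comes from pasting a Windows path into a URI as file://C:/..., which makes "C:" parse as the URI authority rather than part of the path; the well-formed form uses an empty authority, file:///C:/.... A small sketch of the correct conversion (using Python's pathlib here for illustration, not Hadoop's Path class):

```python
# Build a well-formed file URI from a Windows path: note the three slashes,
# i.e. an empty authority before the drive letter.
from pathlib import PureWindowsPath

root = PureWindowsPath("C:/tmp/build/test/data/hbase")
print(root.as_uri())  # file:///C:/tmp/build/test/data/hbase
```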
[jira] [Commented] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852205#comment-13852205 ] Hive QA commented on HIVE-6013: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619403/HIVE-6013.5.patch {color:green}SUCCESS:{color} +1 4796 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/690/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/690/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12619403 Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, HIVE-6013.2.patch, HIVE-6013.3.patch, HIVE-6013.4.patch, HIVE-6013.5.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. Quoted Identifier (using backticks) has a special interpretation for Select expressions(as Regular Expressions). Have documented current behavior and proposed a solution in attached doc. Summary of solution is: - Introduce 'standard' quoted identifiers for columns only. - At the langauage level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-6006) Add UDF to calculate distance between geographic coordinates
[ https://issues.apache.org/jira/browse/HIVE-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kostiantyn Kudriavtsev updated HIVE-6006: - Status: Patch Available (was: Open) Add UDF to calculate distance between geographic coordinates Key: HIVE-6006 URL: https://issues.apache.org/jira/browse/HIVE-6006 Project: Hive Issue Type: New Feature Components: UDF Affects Versions: 0.13.0 Reporter: Kostiantyn Kudriavtsev Priority: Minor Fix For: 0.13.0 Attachments: hive-6006.patch Original Estimate: 336h Remaining Estimate: 336h It would be nice to have a Hive UDF to calculate the distance between two points on Earth. The Haversine formula seems good enough for this purpose. The following function is proposed: HaversineDistance(lat1, lon1, lat2, lon2) - calculates the Haversine distance between 2 points with coordinates (lat1, lon1) and (lat2, lon2) -- This message was sent by Atlassian JIRA (v6.1.4#6159)
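The Haversine computation the ticket proposes can be sketched as follows. This is a minimal illustration of the formula only, not the attached patch; the class and method names are hypothetical, and the real contribution would be packaged as a Hive UDF.

```java
// Illustrative sketch of the great-circle (Haversine) distance proposed in
// HIVE-6006. Class and method names are hypothetical, not Hive's actual API.
public class HaversineSketch {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    // Distance in kilometers between (lat1, lon1) and (lat2, lon2), in degrees.
    public static double haversineDistance(double lat1, double lon1,
                                           double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // Identical points yield zero distance.
        System.out.println(haversineDistance(50.0, 30.0, 50.0, 30.0));
    }
}
```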
[jira] [Updated] (HIVE-6006) Add UDF to calculate distance between geographic coordinates
[ https://issues.apache.org/jira/browse/HIVE-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kostiantyn Kudriavtsev updated HIVE-6006: - Attachment: hive-6006.patch
[jira] [Commented] (HIVE-5957) Fix HCatalog Unit tests on Windows
[ https://issues.apache.org/jira/browse/HIVE-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852240#comment-13852240 ] Eugene Koifman commented on HIVE-5957: -- [~daijy] Not sure I understand. It seems like a proper change.
Hive-trunk-hadoop2 - Build # 612 - Still Failing
Changes for Build #573 [navis] HIVE-5827 : Incorrect location of logs for failed tests (Vikram Dixit K and Szehon Ho via Navis) [thejas] HIVE-4485 : beeline prints null as empty strings (Thejas Nair reviewed by Ashutosh Chauhan) [brock] HIVE-5704 - A couple of generic UDFs are not in the right folder/package (Xuefu Zhang via Brock Noland) [brock] HIVE-5706 - Move a few numeric UDFs to generic implementations (Xuefu Zhang via Brock Noland) [hashutosh] HIVE-5817 : column name to index mapping in VectorizationContext is broken (Remus Rusanu, Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5876 : Split elimination in ORC breaks for partitioned tables (Prasanth J via Ashutosh Chauhan) [hashutosh] HIVE-5886 : [Refactor] Remove unused class JobCloseFeedback (Ashutosh Chauhan via Thejas Nair) [brock] HIVE-5894 - Fix minor PTest2 issues (Brock Noland) Changes for Build #574 [brock] HIVE-5755 - Fix hadoop2 execution environment Milestone 1 (Vikram Dixit K via Brock Noland) Changes for Build #575 [xuefu] HIVE-5893: hive-schema-0.13.0.mysql.sql contains reference to nonexistent column (Carl via Xuefu) [xuefu] HIVE-5684: Serde support for char (Jason via Xuefu) Changes for Build #576 Changes for Build #577 Changes for Build #578 Changes for Build #579 [brock] HIVE-5441 - Async query execution doesn't return resultset status (Prasad Mujumdar via Thejas M Nair) [brock] HIVE-5880 - Rename HCatalog HBase Storage Handler artifact id (Brock Noland reviewed by Prasad Mujumdar) Changes for Build #580 [ehans] HIVE-5581: Implement vectorized year/month/day... etc. 
for string arguments (Teddy Choi via Eric Hanson) Changes for Build #581 [rhbutani] HIVE-5898 Make fetching of column statistics configurable (Prasanth Jayachandran via Harish Butani) Changes for Build #582 [brock] HIVE-5880 - (Rename HCatalog HBase Storage Handler artifact id) breaks packaging (Xuefu Zhang via Brock Noland) Changes for Build #583 [xuefu] HIVE-5866: Hive divide operator generates wrong results in certain cases (reviewed by Prasad) [ehans] HIVE-5877: Implement vectorized support for IN as boolean-valued expression (Eric Hanson) Changes for Build #584 [thejas] HIVE-5550 : Import fails for tables created with default text, sequence and orc file formats using HCatalog API (Sushanth Sowmyan via Thejas Nair) [ehans] HIVE-5895: vectorization handles division by zero differently from normal execution (Sergey Shelukhin via Eric Hanson) [hashutosh] HIVE-5938 : Remove apache.mina dependency for test (Navis via Ashutosh Chauhan) [xuefu] HIVE-5912: Show partition command doesn't support db.table (Yu Zhao via Xuefu) [brock] HIVE-5906 - TestGenericUDFPower should use delta to compare doubles (Szehon Ho via Brock Noland) [brock] HIVE-5855 - Add deprecated methods back to ColumnProjectionUtils (Brock Noland reviewed by Navis) [brock] HIVE-5915 - Shade Kryo dependency (Brock Noland reviewed by Ashutosh Chauhan) Changes for Build #585 [hashutosh] HIVE-5916 : No need to aggregate statistics collected via counter mechanism (Ashutosh Chauhan via Navis) [xuefu] HIVE-5947: Fix test failure in decimal_udf.q (reviewed by Brock) Changes for Build #586 [hashutosh] HIVE-5935 : hive.query.string is not provided to FetchTask (Navis via Ashutosh Chauhan) [navis] HIVE-3455 : ANSI CORR(X,Y) is incorrect (Maxim Bolotin via Navis) [hashutosh] HIVE-5921 : Better heuristics for worst case statistics estimates for join, limit and filter operator (Prasanth J via Harish Butani) [rhbutani] HIVE-5899 NPE during explain extended with char/varchar columns (Jason Dere via Harish Butani) 
Changes for Build #587 [xuefu] HIVE-3181: getDatabaseMajor/Minor version does not return values (Szehon via Xuefu, reviewed by Navis) [brock] HIVE-5641 - BeeLineOpts ignores Throwable (Brock Noland reviewed by Prasad and Thejas) [hashutosh] HIVE-5909 : locate and instr throw java.nio.BufferUnderflowException when empty string as substring (Navis via Ashutosh Chauhan) [hashutosh] HIVE-5686 : partition column type validation doesn't quite work for dates (Sergey Shelukhin via Ashutosh Chauhan) [hashutosh] HIVE-5887 : metastore direct sql doesn't work with oracle (Sergey Shelukhin via Ashutosh Chauhan) Changes for Build #588 Changes for Build #589 Changes for Build #590 [brock] HIVE-5981 - Add hive-unit back to itests pom (Brock Noland reviewed by Prasad) Changes for Build #591 [xuefu] HIVE-5872: Make UDAFs such as GenericUDAFSum report accurate precision/scale for decimal types (reviewed by Sergey Shelukhin) [hashutosh] HIVE-5978 : Rollups not supported in vector mode. (Jitendra Nath Pandey via Ashutosh Chauhan) [hashutosh] HIVE-5830 : SubQuery: Not In subqueries should check if subquery contains nulls in matching column (Harish Butani via Ashutosh Chauhan) [hashutosh] HIVE-5598 : Remove dummy new line at the end of non-sql commands (Navis via Ashutosh Chauhan) Changes for Build #592 [hashutosh] HIVE-5982 :
[jira] [Commented] (HIVE-6006) Add UDF to calculate distance between geographic coordinates
[ https://issues.apache.org/jira/browse/HIVE-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852260#comment-13852260 ] Hive QA commented on HIVE-6006: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619415/hive-6006.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 4796 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_functions org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_haversinedistance {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/691/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/691/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12619415
[jira] [Updated] (HIVE-6006) Add UDF to calculate distance between geographic coordinates
[ https://issues.apache.org/jira/browse/HIVE-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kostiantyn Kudriavtsev updated HIVE-6006: - Status: Open (was: Patch Available)
[jira] [Updated] (HIVE-6006) Add UDF to calculate distance between geographic coordinates
[ https://issues.apache.org/jira/browse/HIVE-6006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kostiantyn Kudriavtsev updated HIVE-6006: - Attachment: (was: hive-6006.patch)
[jira] [Reopened] (HIVE-6056) The AvroSerDe gives out BadSchemaException if a partition is added to the table
[ https://issues.apache.org/jira/browse/HIVE-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rushil Gupta reopened HIVE-6056: Thought this was fixed in hive-0.12. However when I tried that, it failed with the error: java.lang.NoSuchMethodError: org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory.getPrimitiveJavaObjectInspector(Lorg/apache/hadoop/hive/serde2/typeinfo/PrimitiveTypeSpec;)Lorg/apache/hadoop/hive/serde2/objectinspector/primitive/AbstractPrimitiveJavaObjectInspector; The AvroSerDe gives out BadSchemaException if a partition is added to the table --- Key: HIVE-6056 URL: https://issues.apache.org/jira/browse/HIVE-6056 Project: Hive Issue Type: Bug Components: Database/Schema Affects Versions: 0.11.0 Environment: amazon EMR (hadoop Amazon 1.0.3), avro-1.7.5 Reporter: Rushil Gupta While creating an external table if I do not add a partition, I am able to read files using following format: CREATE external TABLE event ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 's3n://test-event/input/2013/14/10' TBLPROPERTIES ('avro.schema.literal' = '..some schema..'); but if I add a partition based on date CREATE external TABLE event PARTITIONED BY (ds STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 's3n://test-event/input/' TBLPROPERTIES ('avro.schema.literal' = '..some schema..'); ALTER TABLE event ADD IF NOT EXISTS PARTITION (ds = '2013_12_16') LOCATION '2013/12/16/'; I get the following exception: java.io.IOException:org.apache.hadoop.hive.serde2.avro.BadSchemaException -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6056) The AvroSerDe gives out BadSchemaException if a partition is added to the table
[ https://issues.apache.org/jira/browse/HIVE-6056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852392#comment-13852392 ] Rushil Gupta commented on HIVE-6056: Here is the full stacktrace: at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:95) at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspector(AvroObjectInspectorGenerator.java:81) at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.init(AvroObjectInspectorGenerator.java:55) at org.apache.hadoop.hive.serde2.avro.AvroSerDe.initialize(AvroSerDe.java:69) at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:203) at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:260) at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:253) at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:490) at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:518) at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:3305) at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:242) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:134) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1326) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1118) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:689) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
[jira] [Commented] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852406#comment-13852406 ] Harish Butani commented on HIVE-6013: - Had a discussion with [~ashutoshc]. Leaning towards setting 'hive.support.quoted.identifiers' to support quoted identifiers by default. This is a backward incompatible change. The assumption is that the regex feature with backticks is an obscure feature; it makes more sense to have this feature on by default. Will document the incompatible change. If anybody strongly disagrees with this, please voice your concerns.
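The pre-change behavior this thread discusses — a backticked string in a SELECT list acting as a regular expression over column names rather than as a single identifier — can be illustrated outside Hive. This standalone helper is purely hypothetical and is not Hive's resolver code; it only demonstrates the expansion semantics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Illustration of Hive's historical backtick semantics in SELECT: the quoted
// string is treated as a regex and expands to every column whose name fully
// matches it. For example `(key|value)` selects both `key` and `value`.
public class ColumnRegexSketch {
    public static List<String> expand(String quoted, List<String> columns) {
        Pattern p = Pattern.compile(quoted);
        List<String> matched = new ArrayList<>();
        for (String col : columns) {
            if (p.matcher(col).matches()) { // full-name match, not substring
                matched.add(col);
            }
        }
        return matched;
    }
}
```

Under the proposed flag, the same backticked string would instead name exactly one column, which is why the change is backward incompatible.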
Re: Review Request 16184: Hive should be able to skip header and footer rows when reading data file for a table (HIVE-5795)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16184/ --- (Updated Dec. 19, 2013, 12:40 a.m.) Review request for hive, Eric Hanson and Thejas Nair. Bugs: hive-5795 https://issues.apache.org/jira/browse/hive-5795 Repository: hive-git Description --- Hive should be able to skip header and footer rows when reading data file for a table (follow up with review https://reviews.apache.org/r/15663/diff/#index_header) Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fa3e048 conf/hive-default.xml.template 1b30d19 data/files/header_footer_table_1/0001.txt PRE-CREATION data/files/header_footer_table_1/0002.txt PRE-CREATION data/files/header_footer_table_1/0003.txt PRE-CREATION data/files/header_footer_table_2/2012/01/01/0001.txt PRE-CREATION data/files/header_footer_table_2/2012/01/02/0002.txt PRE-CREATION data/files/header_footer_table_2/2012/01/03/0003.txt PRE-CREATION itests/qtest/pom.xml 971c5d3 ql/src/java/org/apache/hadoop/hive/ql/exec/FetchOperator.java fc9b7e4 ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java 9afc80b ql/src/java/org/apache/hadoop/hive/ql/io/HiveContextAwareRecordReader.java dd5cb6b ql/src/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java 974a5d6 ql/src/test/org/apache/hadoop/hive/ql/io/TestHiveBinarySearchRecordReader.java 85dd975 ql/src/test/org/apache/hadoop/hive/ql/io/TestSymlinkTextInputFormat.java 0686d9b ql/src/test/queries/clientnegative/file_with_header_footer_negative.q PRE-CREATION ql/src/test/queries/clientpositive/file_with_header_footer.q PRE-CREATION ql/src/test/results/clientnegative/file_with_header_footer_negative.q.out PRE-CREATION ql/src/test/results/clientpositive/file_with_header_footer.q.out PRE-CREATION serde/if/serde.thrift 2ceb572 serde/src/gen/thrift/gen-javabean/org/apache/hadoop/hive/serde/serdeConstants.java 22a6168 Diff: https://reviews.apache.org/r/16184/diff/ Testing --- Thanks, Shuaishuai Nie
[jira] [Updated] (HIVE-5795) Hive should be able to skip header and footer rows when reading data file for a table
[ https://issues.apache.org/jira/browse/HIVE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaishuai Nie updated HIVE-5795: - Attachment: HIVE-5795.3.patch Hive should be able to skip header and footer rows when reading data file for a table - Key: HIVE-5795 URL: https://issues.apache.org/jira/browse/HIVE-5795 Project: Hive Issue Type: Bug Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Attachments: HIVE-5795.1.patch, HIVE-5795.2.patch, HIVE-5795.3.patch Hive should be able to skip header and footer lines when reading a data file for a table. This way, users don't need to preprocess data generated by other applications with a header or footer, and can use the file directly for table operations. To implement this, the idea is to add new table properties that define the number of header and footer lines, and to skip those lines when reading records from the record reader. A DDL example for creating a table with a header and footer: {code} Create external table testtable (name string, message string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/testtable' tblproperties ('skip.header.number'='1', 'skip.footer.number'='2'); {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
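The skip behavior the ticket describes can be sketched as a simple slice over a file's lines: drop the first N header lines and the last M footer lines before handing rows on. This is a simplified standalone illustration, not the actual record-reader change in the patch.

```java
import java.util.List;

// Simplified sketch of the HIVE-5795 idea: given all lines of a data file,
// drop the configured number of header and footer lines. The real patch does
// this inside Hive's record reader; this helper is only illustrative.
public class SkipHeaderFooterSketch {
    public static List<String> skip(List<String> lines, int header, int footer) {
        int from = Math.min(header, lines.size());            // skip header rows
        int to = Math.max(from, lines.size() - footer);       // trim footer rows
        return lines.subList(from, to);
    }
}
```

Note that footer skipping is the harder part in practice: a streaming reader cannot know a line is within the last M lines until it has read ahead, which is why the real implementation buffers records rather than slicing a list.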
Re: Incompatible Changes affecting Serdes and UDFS
Incompatible in what sense, or with what -- previous releases? Thanks. -- Lefty On Tue, Dec 17, 2013 at 7:06 AM, Brock Noland br...@cloudera.com wrote: Hi, Hive 0.12 made some incompatible changes which impacts Serdes and it appears 0.13 makes more incompatible changes. I created HIVE-6043 to track this, if you know of any more changes than what is described there, please do add them. Thanks! Brock
[jira] [Updated] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-6013: Status: Open (was: Patch Available)
[jira] [Updated] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-6013: Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Harish Butani updated HIVE-6013: Attachment: HIVE-6013.6.patch
[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Steinbach updated HIVE-3159: - Status: Open (was: Patch Available) Update AvroSerde to determine schema of new tables -- Key: HIVE-3159 URL: https://issues.apache.org/jira/browse/HIVE-3159 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Jakob Homan Assignee: Mohammad Kamrul Islam Attachments: HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159v1.patch Currently when writing tables to Avro one must manually provide an Avro schema that matches what is being delivered by Hive. It'd be better to have the serde infer this schema by converting the table's TypeInfo into an appropriate AvroSchema. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5795) Hive should be able to skip header and footer rows when reading data file for a table
[ https://issues.apache.org/jira/browse/HIVE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852448#comment-13852448 ] Eric Hanson commented on HIVE-5795: --- The coding and comment style looks good now. Thanks. Did you accidentally pick up some changes to conf/hive-default.xml.template? There are many capitalization and spelling changes in there. Maybe you need to diff again from a clean latest version of trunk.
[jira] [Updated] (HIVE-4216) TestHBaseMinimrCliDriver throws weird error with HBase 0.94.5 and Hadoop 23 and test is stuck infinitely
[ https://issues.apache.org/jira/browse/HIVE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-4216: - Attachment: HIVE-4216.1.patch Cool, that did allow the test case to pass. Attaching patch, which also updates hbase_bulk.m to set the right properties settings and to fix some mask issues. TestHBaseMinimrCliDriver throws weird error with HBase 0.94.5 and Hadoop 23 and test is stuck infinitely Key: HIVE-4216 URL: https://issues.apache.org/jira/browse/HIVE-4216 Project: Hive Issue Type: Bug Components: StorageHandler Affects Versions: 0.9.0 Reporter: Viraj Bhat Attachments: HIVE-4216.1.patch After upgrading to Hadoop 23 and HBase 0.94.5 compiled for Hadoop 23. The TestHBaseMinimrCliDriver, fails after performing the following steps Update hbase_bulk.m with the following properties set mapreduce.totalorderpartitioner.naturalorder=false; set mapreduce.totalorderpartitioner.path=/tmp/hbpartition.lst; Otherwise I keep seeing: _partition.lst not found exception in the mappers, even though set total.order.partitioner.path=/tmp/hbpartition.lst is set. When the test runs, the 3 reducer phase of the second query fails with the following error, but the MiniMRCluster keeps spinning up new reducer and the test is stuck infinitely. 
{code} insert overwrite table hbsort select distinct value, case when key=103 then cast(null as string) else key end, case when key=103 then '' else cast(key+1 as string) end from src cluster by value; {code} The stack trace I see in the syslog for the Node Manager is the following: == 13-03-20 16:26:48,942 FATAL [IPC Server handler 17 on 55996] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1363821864968_0003_r_02_0 - exited : java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_200},value:{_col0:val_200,_col1:200,_col2:201.0},alias:0} at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:268) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:448) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:399) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_200},value:{_col0:val_200,_col1:200,_col2:201.0},alias:0} at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:256) ... 
7 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:237) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:477) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:525) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:247) ... 7 more Caused by: java.lang.NullPointerException at org.apache.hadoop.mapreduce.TaskID$CharTaskTypeMaps.getRepresentingCharacter(TaskID.java:265) at org.apache.hadoop.mapreduce.TaskID.appendTo(TaskID.java:153) at org.apache.hadoop.mapreduce.TaskAttemptID.appendTo(TaskAttemptID.java:119) at org.apache.hadoop.mapreduce.TaskAttemptID.toString(TaskAttemptID.java:151) at java.lang.String.valueOf(String.java:2826) at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.getTaskAttemptPath(FileOutputCommitter.java:209) at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.init(FileOutputCommitter.java:69) at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat.getRecordWriter(HFileOutputFormat.java:90) at org.apache.hadoop.hive.hbase.HiveHFileOutputFormat.getFileWriter(HiveHFileOutputFormat.java:67)
[jira] [Updated] (HIVE-4216) TestHBaseMinimrCliDriver throws weird error with HBase 0.94.5 and Hadoop 23 and test is stuck infinitely
[ https://issues.apache.org/jira/browse/HIVE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Dere updated HIVE-4216: - Status: Patch Available (was: Open)
[jira] [Commented] (HIVE-4216) TestHBaseMinimrCliDriver throws weird error with HBase 0.94.5 and Hadoop 23 and test is stuck infinitely
[ https://issues.apache.org/jira/browse/HIVE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852454#comment-13852454 ] Viraj Bhat commented on HIVE-4216: -- Hi Jason, Andrey and Sheng, Thanks for finding the issue and fixing it in the patch. Let me test the patch and run it on our test clusters. Viraj
[jira] [Updated] (HIVE-4216) TestHBaseMinimrCliDriver throws weird error with HBase 0.94.5 and Hadoop 23 and test is stuck infinitely
[ https://issues.apache.org/jira/browse/HIVE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viraj Bhat updated HIVE-4216: - Environment: Hadoop 23.X Affects Version/s: 0.11.0, 0.12.0 Fix Version/s: 0.13.0
[jira] [Commented] (HIVE-5795) Hive should be able to skip header and footer rows when reading data file for a table
[ https://issues.apache.org/jira/browse/HIVE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852467#comment-13852467 ] Hive QA commented on HIVE-5795: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619445/HIVE-5795.3.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 4794 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_file_with_header_footer org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_nonmr_fetch org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_file_with_header_footer_negative {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/692/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/692/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12619445 Hive should be able to skip header and footer rows when reading data file for a table - Key: HIVE-5795 URL: https://issues.apache.org/jira/browse/HIVE-5795 Project: Hive Issue Type: Bug Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Attachments: HIVE-5795.1.patch, HIVE-5795.2.patch, HIVE-5795.3.patch Hive should be able to skip header and footer lines when reading a data file for a table. This way, users don't need to preprocess data generated by other applications with a header or footer, and can use the file directly for table operations. To implement this, the idea is to add new table properties that define the number of header and footer lines, and to skip those lines when reading records from the record reader. 
A DDL example for creating a table with header and footer looks like this: {code} Create external table testtable (name string, message string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/testtable' tblproperties (skip.header.number=1, skip.footer.number=2); {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
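The proposed record-reader behavior can be sketched outside Hive. The Python generator below is illustrative only (the name and signature are not from the patch): it drops the first header_count lines up front and holds back footer_count lines so the trailing footer is never emitted.

```python
def skip_header_footer(lines, header_count, footer_count):
    """Yield data rows from an iterable of lines, dropping the first
    header_count lines and the last footer_count lines."""
    it = iter(lines)
    # Skip the header lines outright.
    for _ in range(header_count):
        next(it, None)
    # Buffer footer_count lines; only emit a line once we are sure
    # it cannot be part of the footer.
    buffer = []
    for line in it:
        buffer.append(line)
        if len(buffer) > footer_count:
            yield buffer.pop(0)
```

With header_count=1 and footer_count=2, a file of lines ['h', 'a', 'b', 'c', 'f1', 'f2'] yields just ['a', 'b', 'c'].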
[jira] [Updated] (HIVE-6052) metastore JDO filter pushdown for integers may produce unexpected results with non-normalized integer columns
[ https://issues.apache.org/jira/browse/HIVE-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated HIVE-6052: --- Attachment: HIVE-6052.01.patch updated patch metastore JDO filter pushdown for integers may produce unexpected results with non-normalized integer columns - Key: HIVE-6052 URL: https://issues.apache.org/jira/browse/HIVE-6052 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6052.01.patch, HIVE-6052.patch If integer partition columns have values stored in non-canonical form, for example with leading zeroes, the integer filter doesn't work. That is because JDO pushdown uses substrings to compare for equality, and SQL pushdown is intentionally crippled to do the same to produce the same results. Since both SQL pushdown and integer pushdown are just perf optimizations, we can probably remove it for JDO (or make it configurable and disabled by default), and uncripple SQL.
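The failure mode is easy to reproduce outside the metastore. Below, string_equal is an illustrative stand-in for the pushdown's string-based equality check (not the actual metastore code): a partition value stored with a leading zero no longer matches its numeric filter, while numeric comparison would.

```python
def string_equal(partition_value, filter_value):
    # Stand-in for the pushdown's string-based equality check.
    return partition_value == str(filter_value)

# Canonical form matches; the same logical value with a leading zero does not.
canonical = string_equal("11", 11)   # matches
padded = string_equal("011", 11)     # does not match
numeric = int("011") == 11           # numeric comparison still matches
```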
[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-3159: Status: Patch Available (was: Open) Update AvroSerde to determine schema of new tables -- Key: HIVE-3159 URL: https://issues.apache.org/jira/browse/HIVE-3159 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Jakob Homan Assignee: Mohammad Kamrul Islam Attachments: HIVE-3159.4.patch, HIVE-3159.5.patch, HIVE-3159.6.patch, HIVE-3159v1.patch Currently when writing tables to Avro one must manually provide an Avro schema that matches what is being delivered by Hive. It'd be better to have the serde infer this schema by converting the table's TypeInfo into an appropriate AvroSchema.
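The requested inference amounts to mapping the table's column types onto an Avro record schema. A minimal sketch under stated assumptions: hive_to_avro and its primitive table are hypothetical, covering only a few primitive types; a real implementation must also handle complex types, unions, and nullability.

```python
# Hypothetical Hive-primitive-to-Avro-type table (illustrative subset).
PRIMITIVES = {
    "string": "string",
    "int": "int",
    "bigint": "long",
    "float": "float",
    "double": "double",
    "boolean": "boolean",
}

def hive_to_avro(columns, record_name="hive_table"):
    """Build an Avro record schema (as a dict) from (name, hive_type) pairs."""
    return {
        "type": "record",
        "name": record_name,
        "fields": [{"name": n, "type": PRIMITIVES[t]} for n, t in columns],
    }
```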
[jira] [Updated] (HIVE-3159) Update AvroSerde to determine schema of new tables
[ https://issues.apache.org/jira/browse/HIVE-3159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-3159: Attachment: HIVE-3159.6.patch
Re: Review Request 16339: HIVE-6052 metastore JDO filter pushdown for integers may produce unexpected results with non-normalized integer columns
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16339/ --- (Updated Dec. 19, 2013, 2:17 a.m.) Review request for hive and Ashutosh Chauhan. Repository: hive-git Description --- see JIRA Diffs (updated) - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java fa3e048 metastore/src/java/org/apache/hadoop/hive/metastore/MetaStoreDirectSql.java a98d9d1 metastore/src/java/org/apache/hadoop/hive/metastore/ObjectStore.java 04d399f metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java 93e9942 ql/src/test/queries/clientpositive/alter_partition_coltype.q 5479afb ql/src/test/queries/clientpositive/annotate_stats_part.q 83510e3 ql/src/test/queries/clientpositive/dynamic_partition_skip_default.q 397a220 ql/src/test/results/clientpositive/alter_partition_coltype.q.out 27b1fbc ql/src/test/results/clientpositive/annotate_stats_part.q.out 87fb980 ql/src/test/results/clientpositive/dynamic_partition_skip_default.q.out baee525 Diff: https://reviews.apache.org/r/16339/diff/ Testing --- Thanks, Sergey Shelukhin
[jira] [Updated] (HIVE-5957) Fix HCatalog Unit tests on Windows
[ https://issues.apache.org/jira/browse/HIVE-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-5957: - Status: Open (was: Patch Available) Fix HCatalog Unit tests on Windows -- Key: HIVE-5957 URL: https://issues.apache.org/jira/browse/HIVE-5957 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.12.0 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-5957.patch org.apache.hcatalog.hbase.TestHBaseHCatStorageHandler fails on Windows. It generates java.lang.IllegalStateException: Failed to setup cluster at org.apache.hcatalog.hbase.ManyMiniCluster.start(ManyMiniCluster.java:119). Digging further, there is: Please fix invalid configuration for hbase.rootdir file://C:/tmp/build/test/data/test_default_573990410261077827/hbase java.lang.IllegalArgumentException: Wrong FS: file://C:/tmp/build/test/data/test_default_573990410261077827/hbase, expected: file:/// at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:642) This was fixed in HIVE-5015 (9/3/13) and got clobbered by HIVE-5261 (9/12/13).
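The Wrong FS error comes from gluing file:// directly onto a raw Windows path, so the drive letter C: is parsed as a URI authority instead of part of the path. The Python sketch below merely illustrates the two URI shapes involved; the actual fix lives in the test setup, not in a helper like this.

```python
from pathlib import PureWindowsPath
from urllib.parse import urlparse

def windows_path_to_uri(path):
    """Render an absolute Windows path as a well-formed file:/// URI."""
    return PureWindowsPath(path).as_uri()

# Naive concatenation turns the drive letter into a URI host ("C:"):
broken = urlparse("file://C:/tmp/hbase")
# The three-slash form keeps the authority empty, as a local FS expects:
fixed = urlparse(windows_path_to_uri(r"C:\tmp\hbase"))
```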
[jira] [Updated] (HIVE-5957) Fix HCatalog Unit tests on Windows
[ https://issues.apache.org/jira/browse/HIVE-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-5957: - Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-5957) Fix HCatalog Unit tests on Windows
[ https://issues.apache.org/jira/browse/HIVE-5957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eugene Koifman updated HIVE-5957: - Attachment: HIVE-5957.2.patch HIVE-5957.2.patch to address Daniel's comments
[jira] [Updated] (HIVE-5795) Hive should be able to skip header and footer rows when reading data file for a table
[ https://issues.apache.org/jira/browse/HIVE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaishuai Nie updated HIVE-5795: - Attachment: (was: HIVE-5795.3.patch)
Re: Review Request 15654: Rewrite Trim and Pad UDFs based on GenericUDF
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15654/#review30669 --- ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseTrim.java https://reviews.apache.org/r/15654/#comment58764 Would you please further explain on this? preferably with an example. - Mohammad Islam On Dec. 18, 2013, 3:16 a.m., Mohammad Islam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15654/ --- (Updated Dec. 18, 2013, 3:16 a.m.) Review request for hive, Ashutosh Chauhan, Carl Steinbach, and Jitendra Pandey. Bugs: HIVE-5829 https://issues.apache.org/jira/browse/HIVE-5829 Repository: hive-git Description --- Rewrite the UDFS *pads and *trim using GenericUDF. Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java a895d65 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java bca1f26 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLTrim.java dc00cf9 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLpad.java d1da19a ql/src/java/org/apache/hadoop/hive/ql/udf/UDFRTrim.java 2bcc5fa ql/src/java/org/apache/hadoop/hive/ql/udf/UDFRpad.java 9652ce2 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFTrim.java 490886d ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBasePad.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLpad.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRpad.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTrim.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java eff251f ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFLTrim.java PRE-CREATION 
ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFLpad.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFRTrim.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFRpad.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFTrim.java PRE-CREATION Diff: https://reviews.apache.org/r/15654/diff/ Testing --- Thanks, Mohammad Islam
[jira] [Updated] (HIVE-5795) Hive should be able to skip header and footer rows when reading data file for a table
[ https://issues.apache.org/jira/browse/HIVE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaishuai Nie updated HIVE-5795: - Attachment: HIVE-5795.3.patch
Re: Review Request 15654: Rewrite Trim and Pad UDFs based on GenericUDF
On Dec. 18, 2013, 5:37 a.m., Xuefu Zhang wrote: ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java, line 774 https://reviews.apache.org/r/15654/diff/4/?file=399245#file399245line774 I don't think we need the bridge udf for generic UDFs. This change is only to replace UDFLTrim with new GenericUDFLTrim used in this test case. Generic bridge UDF is already there. On Dec. 18, 2013, 5:37 a.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBasePad.java, line 109 https://reviews.apache.org/r/15654/diff/4/?file=399238#file399238line109 I'm not sure if this is intentional, but the logic here means that any of the three input can have a type of INT. If INT is okay, then why not BYTE, SHORT, or LONG? It's probably better to check each argument's type separately. will do. On Dec. 18, 2013, 5:37 a.m., Xuefu Zhang wrote: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBasePad.java, line 48 https://reviews.apache.org/r/15654/diff/4/?file=399238#file399238line48 Msg doesn't match the if condition. will correct. - Mohammad --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15654/#review30604 --- On Dec. 18, 2013, 3:16 a.m., Mohammad Islam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15654/ --- (Updated Dec. 18, 2013, 3:16 a.m.) Review request for hive, Ashutosh Chauhan, Carl Steinbach, and Jitendra Pandey. Bugs: HIVE-5829 https://issues.apache.org/jira/browse/HIVE-5829 Repository: hive-git Description --- Rewrite the UDFS *pads and *trim using GenericUDF. 
Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java a895d65 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java bca1f26 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLTrim.java dc00cf9 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLpad.java d1da19a ql/src/java/org/apache/hadoop/hive/ql/udf/UDFRTrim.java 2bcc5fa ql/src/java/org/apache/hadoop/hive/ql/udf/UDFRpad.java 9652ce2 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFTrim.java 490886d ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBasePad.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLpad.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRpad.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTrim.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java eff251f ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFLTrim.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFLpad.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFRTrim.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFRpad.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFTrim.java PRE-CREATION Diff: https://reviews.apache.org/r/15654/diff/ Testing --- Thanks, Mohammad Islam
Re: Review Request 15654: Rewrite Trim and Pad UDFs based on GenericUDF
On Dec. 18, 2013, 10:58 a.m., Jason Dere wrote: ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBasePad.java, line 78 https://reviews.apache.org/r/15654/diff/4/?file=399238#file399238line78 Having gone through some pain with Hive on Windows, the bytes returned by String.getBytes() will not be in utf-8 if the default encoding is something other than utf-8. Would be safer here to either use getBytes(UTF-8), or Text.encode() if you want to get bytes from the string. Or just do the padding as Strings. str is of type Text. It doesn't have getBytes(UTF-8), only getBytes(). - Mohammad --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15654/#review30610 --- On Dec. 18, 2013, 3:16 a.m., Mohammad Islam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15654/ --- (Updated Dec. 18, 2013, 3:16 a.m.) Review request for hive, Ashutosh Chauhan, Carl Steinbach, and Jitendra Pandey. Bugs: HIVE-5829 https://issues.apache.org/jira/browse/HIVE-5829 Repository: hive-git Description --- Rewrite the UDFs *pad and *trim using GenericUDF. 
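Jason's encoding concern can be reproduced with plain JDK calls: String.getBytes() with no argument uses the JVM's default charset, so its result is platform-dependent, while passing StandardCharsets.UTF_8 explicitly is deterministic (Hadoop's static Text.encode(String) is another UTF-8 option). A minimal sketch, independent of the Hive code under review:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PadBytes {
    // String.getBytes() depends on the platform default charset; the same
    // string can yield different byte sequences on different machines.
    // An explicit charset makes the encoding deterministic.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);  // always UTF-8
    }

    public static void main(String[] args) {
        String s = "héllo";  // 'é' is 2 bytes in UTF-8, 1 byte in ISO-8859-1
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("default bytes:   " + s.getBytes().length);  // varies by platform
        System.out.println("utf-8 bytes:     " + utf8(s).length);       // 6 on every JVM
    }
}
```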
Diffs - ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java a895d65 ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java bca1f26 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLTrim.java dc00cf9 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLpad.java d1da19a ql/src/java/org/apache/hadoop/hive/ql/udf/UDFRTrim.java 2bcc5fa ql/src/java/org/apache/hadoop/hive/ql/udf/UDFRpad.java 9652ce2 ql/src/java/org/apache/hadoop/hive/ql/udf/UDFTrim.java 490886d ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBasePad.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFBaseTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFLpad.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRTrim.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRpad.java PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFTrim.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorizationContext.java eff251f ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFLTrim.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFLpad.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFRTrim.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFRpad.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/udf/generic/TestGenericUDFTrim.java PRE-CREATION Diff: https://reviews.apache.org/r/15654/diff/ Testing --- Thanks, Mohammad Islam
[jira] [Updated] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-5829: Attachment: (was: HIVE-5829.3.patch) Rewrite Trim and Pad UDFs based on GenericUDF - Key: HIVE-5829 URL: https://issues.apache.org/jira/browse/HIVE-5829 Project: Hive Issue Type: Bug Components: UDF Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: HIVE-5829.1.patch, HIVE-5829.2.patch, tmp.HIVE-5829.patch This JIRA includes the following UDFs: 1. trim() 2. ltrim() 3. rtrim() 4. lpad() 5. rpad() -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-5829: Attachment: HIVE-5829.3.patch Rewrite Trim and Pad UDFs based on GenericUDF - Key: HIVE-5829 URL: https://issues.apache.org/jira/browse/HIVE-5829 Project: Hive Issue Type: Bug Components: UDF Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: HIVE-5829.1.patch, HIVE-5829.2.patch, HIVE-5829.3.patch, tmp.HIVE-5829.patch This JIRA includes the following UDFs: 1. trim() 2. ltrim() 3. rtrim() 4. lpad() 5. rpad() -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Updated] (HIVE-5829) Rewrite Trim and Pad UDFs based on GenericUDF
[ https://issues.apache.org/jira/browse/HIVE-5829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mohammad Kamrul Islam updated HIVE-5829: Status: Patch Available (was: Open) Rewrite Trim and Pad UDFs based on GenericUDF - Key: HIVE-5829 URL: https://issues.apache.org/jira/browse/HIVE-5829 Project: Hive Issue Type: Bug Components: UDF Reporter: Mohammad Kamrul Islam Assignee: Mohammad Kamrul Islam Attachments: HIVE-5829.1.patch, HIVE-5829.2.patch, HIVE-5829.3.patch, tmp.HIVE-5829.patch This JIRA includes the following UDFs: 1. trim() 2. ltrim() 3. rtrim() 4. lpad() 5. rpad() -- This message was sent by Atlassian JIRA (v6.1.4#6159)
Re: Review Request 16329: HIVE-6039: Round, AVG and SUM functions reject char/varch input while accepting string input
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16329/#review30672 --- Looks good to go. very minor comments added. ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java https://reviews.apache.org/r/16329/#comment58769 Will it be better to explicitly add char and varchar as accepted type in the message? ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java https://reviews.apache.org/r/16329/#comment58770 same here. ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRound.java https://reviews.apache.org/r/16329/#comment58771 Same. more informative message could be helpful. - Mohammad Islam On Dec. 17, 2013, 9:26 p.m., Xuefu Zhang wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/16329/ --- (Updated Dec. 17, 2013, 9:26 p.m.) Review request for hive and Prasad Mujumdar. Bugs: HIVE-6039 https://issues.apache.org/jira/browse/HIVE-6039 Repository: hive-git Description --- Allow input to these UDFs for char and varchar. Diffs - data/files/char_varchar_udf.txt PRE-CREATION ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFAverage.java 4b219bd ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDAFSum.java 41d5efd ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFRound.java fc9c1b2 ql/src/test/queries/clientpositive/char_varchar_udf.q PRE-CREATION ql/src/test/results/clientpositive/char_varchar_udf.q.out PRE-CREATION Diff: https://reviews.apache.org/r/16329/diff/ Testing --- Unit tested. New test added. Test suite passed. Thanks, Xuefu Zhang
[jira] [Commented] (HIVE-6013) Supporting Quoted Identifiers in Column Names
[ https://issues.apache.org/jira/browse/HIVE-6013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852539#comment-13852539 ] Hive QA commented on HIVE-6013: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619449/HIVE-6013.6.patch {color:red}ERROR:{color} -1 due to 14 failed/errored test(s), 4796 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_ambiguous_col org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_join_reordering_values org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_escape_clusterby1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_escape_distributeby1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_escape_orderby1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_escape_sortby1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_index_bitmap_auto org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_quote1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_show_tablestatus org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_udf_index org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_ambiguous_col1 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_ambiguous_col2 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_invalidate_view1 {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/693/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/693/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 14 tests failed {noformat} This message is 
automatically generated. ATTACHMENT ID: 12619449 Supporting Quoted Identifiers in Column Names - Key: HIVE-6013 URL: https://issues.apache.org/jira/browse/HIVE-6013 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Harish Butani Assignee: Harish Butani Fix For: 0.13.0 Attachments: HIVE-6013.1.patch, HIVE-6013.2.patch, HIVE-6013.3.patch, HIVE-6013.4.patch, HIVE-6013.5.patch, HIVE-6013.6.patch, QuotedIdentifier.html Hive's current behavior on Quoted Identifiers is different from the normal interpretation. A Quoted Identifier (using backticks) has a special interpretation for Select expressions (as Regular Expressions). Current behavior and a proposed solution are documented in the attached doc. Summary of the solution: - Introduce 'standard' quoted identifiers for columns only. - At the language level this is turned on by a flag. - At the metadata level we relax the constraint on column names. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-5992) Hive inconsistently converts timestamp in AVG and SUM UDAF's
[ https://issues.apache.org/jira/browse/HIVE-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852540#comment-13852540 ] Mohammad Kamrul Islam commented on HIVE-5992: - Looks good. +1 Hive inconsistently converts timestamp in AVG and SUM UDAF's Key: HIVE-5992 URL: https://issues.apache.org/jira/browse/HIVE-5992 Project: Hive Issue Type: Bug Components: UDF Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-5992.patch {code} hive select t, sum(t), count(*), sum(t)/count(*), avg(t) from ts group by t; ... OK 1977-03-15 12:34:22.345678 227306062 1 227306062 2.27306062345678E8 {code} As it can be seen, timestamp value (1977-03-15 12:34:22.345678) is converted with fractional part ignored in sum, while preserved in avg. As a further result, sum()/count() is not equivalent to avg. -- This message was sent by Atlassian JIRA (v6.1.4#6159)
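The behavior in the HIVE-5992 output is what one would expect if sum() accumulates the timestamp as an integral number of seconds while avg() works on a double. This is an illustrative assumption about the mechanism, not code from the patch; the 2.27306062345678E8 value is taken from the query output quoted above:

```java
public class TimestampFraction {
    public static void main(String[] args) {
        // Seconds since epoch for 1977-03-15 12:34:22.345678,
        // as printed by the avg() column in the JIRA description.
        double seconds = 2.27306062345678E8;

        long asLong = (long) seconds;   // fractional seconds truncated (sum-like)
        double asDouble = seconds;      // fractional seconds preserved (avg-like)

        System.out.println(asLong);     // 227306062, matching the sum(t) column
        System.out.println(asDouble);   // 2.27306062345678E8, matching avg(t)
    }
}
```

The truncation means sum(t)/count(*) and avg(t) disagree whenever any timestamp has a nonzero fractional part.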
[jira] [Commented] (HIVE-4216) TestHBaseMinimrCliDriver throws weird error with HBase 0.94.5 and Hadoop 23 and test is stuck infinitely
[ https://issues.apache.org/jira/browse/HIVE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852578#comment-13852578 ] Sheng Liu commented on HIVE-4216: - Hi! Thanks all for finding the issue and fixing it. I find that the path in the patch, shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java, does not exist in releases 0.11.0 and 0.12.0; should it be shims/src/0.23/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java? TestHBaseMinimrCliDriver throws weird error with HBase 0.94.5 and Hadoop 23 and test is stuck infinitely Key: HIVE-4216 URL: https://issues.apache.org/jira/browse/HIVE-4216 Project: Hive Issue Type: Bug Components: StorageHandler Affects Versions: 0.9.0, 0.11.0, 0.12.0 Environment: Hadoop 23.X Reporter: Viraj Bhat Fix For: 0.13.0 Attachments: HIVE-4216.1.patch After upgrading to Hadoop 23 and HBase 0.94.5 compiled for Hadoop 23. The TestHBaseMinimrCliDriver, fails after performing the following steps Update hbase_bulk.m with the following properties set mapreduce.totalorderpartitioner.naturalorder=false; set mapreduce.totalorderpartitioner.path=/tmp/hbpartition.lst; Otherwise I keep seeing: _partition.lst not found exception in the mappers, even though set total.order.partitioner.path=/tmp/hbpartition.lst is set. When the test runs, the 3 reducer phase of the second query fails with the following error, but the MiniMRCluster keeps spinning up new reducer and the test is stuck infinitely. 
{code} insert overwrite table hbsort select distinct value, case when key=103 then cast(null as string) else key end, case when key=103 then '' else cast(key+1 as string) end from src cluster by value; {code} The stack trace I see in the syslog for the Node Manager is the following: == 13-03-20 16:26:48,942 FATAL [IPC Server handler 17 on 55996] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1363821864968_0003_r_02_0 - exited : java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_200},value:{_col0:val_200,_col1:200,_col2:201.0},alias:0} at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:268) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:448) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:399) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_200},value:{_col0:val_200,_col1:200,_col2:201.0},alias:0} at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:256) ... 
7 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:237) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:477) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:525) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:247) ... 7 more Caused by: java.lang.NullPointerException at org.apache.hadoop.mapreduce.TaskID$CharTaskTypeMaps.getRepresentingCharacter(TaskID.java:265) at org.apache.hadoop.mapreduce.TaskID.appendTo(TaskID.java:153) at org.apache.hadoop.mapreduce.TaskAttemptID.appendTo(TaskAttemptID.java:119) at org.apache.hadoop.mapreduce.TaskAttemptID.toString(TaskAttemptID.java:151) at java.lang.String.valueOf(String.java:2826) at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.getTaskAttemptPath(FileOutputCommitter.java:209) at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.init(FileOutputCommitter.java:69)
[jira] [Commented] (HIVE-4216) TestHBaseMinimrCliDriver throws weird error with HBase 0.94.5 and Hadoop 23 and test is stuck infinitely
[ https://issues.apache.org/jira/browse/HIVE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852577#comment-13852577 ] Hive QA commented on HIVE-4216: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619451/HIVE-4216.1.patch {color:green}SUCCESS:{color} +1 4792 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/694/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/694/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12619451 TestHBaseMinimrCliDriver throws weird error with HBase 0.94.5 and Hadoop 23 and test is stuck infinitely Key: HIVE-4216 URL: https://issues.apache.org/jira/browse/HIVE-4216 Project: Hive Issue Type: Bug Components: StorageHandler Affects Versions: 0.9.0, 0.11.0, 0.12.0 Environment: Hadoop 23.X Reporter: Viraj Bhat Fix For: 0.13.0 Attachments: HIVE-4216.1.patch After upgrading to Hadoop 23 and HBase 0.94.5 compiled for Hadoop 23. The TestHBaseMinimrCliDriver, fails after performing the following steps Update hbase_bulk.m with the following properties set mapreduce.totalorderpartitioner.naturalorder=false; set mapreduce.totalorderpartitioner.path=/tmp/hbpartition.lst; Otherwise I keep seeing: _partition.lst not found exception in the mappers, even though set total.order.partitioner.path=/tmp/hbpartition.lst is set. When the test runs, the 3 reducer phase of the second query fails with the following error, but the MiniMRCluster keeps spinning up new reducer and the test is stuck infinitely. 
{code} insert overwrite table hbsort select distinct value, case when key=103 then cast(null as string) else key end, case when key=103 then '' else cast(key+1 as string) end from src cluster by value; {code} The stack trace I see in the syslog for the Node Manager is the following: == 13-03-20 16:26:48,942 FATAL [IPC Server handler 17 on 55996] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1363821864968_0003_r_02_0 - exited : java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_200},value:{_col0:val_200,_col1:200,_col2:201.0},alias:0} at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:268) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:448) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:399) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_200},value:{_col0:val_200,_col1:200,_col2:201.0},alias:0} at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:256) ... 
7 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:237) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:477) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:525) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:247) ... 7 more Caused by: java.lang.NullPointerException at org.apache.hadoop.mapreduce.TaskID$CharTaskTypeMaps.getRepresentingCharacter(TaskID.java:265) at org.apache.hadoop.mapreduce.TaskID.appendTo(TaskID.java:153) at
[jira] [Commented] (HIVE-6048) Hive load data command rejects file with '+' in the name
[ https://issues.apache.org/jira/browse/HIVE-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852594#comment-13852594 ] Xuefu Zhang commented on HIVE-6048: --- [~ashutoshc] I agree with you about the need for HIVE-6024. However, I think solving that problem doesn't necessarily address the problem described here, as the same issue also occurs with non-local files. {code} hive dfs -ls /tmp/files/t+est.txt; Found 1 items -rw-r--r-- 1 xzhang supergroup 9 2013-12-18 20:20 /tmp/files/t+est.txt hive load data inpath '/tmp/files/t+est.txt' into table test; Loading data to table default.test Failed with exception Wrong file format. Please check the file's format. FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask {code} Therefore, it might make sense to solve them separately. Hive load data command rejects file with '+' in the name Key: HIVE-6048 URL: https://issues.apache.org/jira/browse/HIVE-6048 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-6048.patch '+' is a valid character in a file name on Linux and HDFS. However, loading data from such a file into a table results in the following error: {code} hive load data local inpath './t+est' into table test; FAILED: SemanticException Line 1:23 Invalid path ''./t+est'': No files matching path file:/home/xzhang/apache/hive7/t%20est {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
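One plausible mechanism for the mangled names (an assumption suggested by the "t est.txt" and "t%20est" strings in the error output, not confirmed from the patch) is that the path is run through form-style URL decoding somewhere, where '+' denotes a space:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class PlusInPath {
    public static void main(String[] args) {
        // Under application/x-www-form-urlencoded rules, '+' decodes to a
        // space. A file path must never be decoded with these rules, since
        // '+' is a perfectly legal path character on Linux and HDFS.
        String decoded = URLDecoder.decode("t+est.txt", StandardCharsets.UTF_8);
        System.out.println(decoded);   // t est.txt
    }
}
```

Note that the reverse direction shows the same mismatch: a space percent-encodes to "%20" in a URI path but to "+" in a query string, which would account for both mangled forms seen in the two error messages.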
[jira] [Updated] (HIVE-6048) Hive load data command rejects file with '+' in the name
[ https://issues.apache.org/jira/browse/HIVE-6048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated HIVE-6048: -- Description: '+' is a valid character in a file name on Linux and HDFS. However, loading data from such a file into a table results in the following error: {code} hive load data local inpath '/home/xzhang/temp/t+est.txt' into table test; Copying data from file:/home/xzhang/temp/t est.txt No files matching path: file:/home/xzhang/temp/t est.txt FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.CopyTask {code} was: '+' is a valid character in a file name on Linux and HDFS. However, loading data from such a file into a table results in the following error: {code} hive load data local inpath './t+est' into table test; FAILED: SemanticException Line 1:23 Invalid path ''./t+est'': No files matching path file:/home/xzhang/apache/hive7/t%20est {code} Hive load data command rejects file with '+' in the name Key: HIVE-6048 URL: https://issues.apache.org/jira/browse/HIVE-6048 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.12.0 Reporter: Xuefu Zhang Assignee: Xuefu Zhang Attachments: HIVE-6048.patch '+' is a valid character in a file name on Linux and HDFS. However, loading data from such a file into a table results in the following error: {code} hive load data local inpath '/home/xzhang/temp/t+est.txt' into table test; Copying data from file:/home/xzhang/temp/t est.txt No files matching path: file:/home/xzhang/temp/t est.txt FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.CopyTask {code} -- This message was sent by Atlassian JIRA (v6.1.4#6159)
[jira] [Commented] (HIVE-6052) metastore JDO filter pushdown for integers may produce unexpected results with non-normalized integer columns
[ https://issues.apache.org/jira/browse/HIVE-6052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13852604#comment-13852604 ] Hive QA commented on HIVE-6052: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619453/HIVE-6052.01.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 4792 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_alter_partition_coltype org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_dynamic_partition_skip_default {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/695/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/695/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12619453 metastore JDO filter pushdown for integers may produce unexpected results with non-normalized integer columns - Key: HIVE-6052 URL: https://issues.apache.org/jira/browse/HIVE-6052 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.13.0 Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Attachments: HIVE-6052.01.patch, HIVE-6052.patch If integer partition columns have values stored in non-canonical form, for example with leading zeroes, the integer filter doesn't work. That is because JDO pushdown uses substrings to compare for equality, and SQL pushdown is intentionally crippled to do the same to produce the same results. Probably, since both SQL pushdown and integer pushdown are just perf optimizations, we can remove it for JDO (or make it configurable and disabled by default), and uncripple SQL. 
-- This message was sent by Atlassian JIRA (v6.1.4#6159)
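The non-canonical-form problem HIVE-6052 describes is ordinary string-versus-numeric comparison: once integer partition values are compared as substrings, both equality and ordering break. A minimal standalone illustration (not code from the patch):

```java
public class StringIntCompare {
    public static void main(String[] args) {
        // Equality: a leading zero defeats string equality even though
        // the integer values are identical.
        System.out.println("007".equals("7"));        // false, yet 007 == 7 as ints

        // Ordering: lexicographic order disagrees with numeric order,
        // so string-based range filters misbehave too.
        System.out.println("9".compareTo("10") > 0);  // true, yet 9 < 10 as ints

        // Normalizing to integers before comparing restores correctness.
        System.out.println(Integer.parseInt("007") == Integer.parseInt("7"));  // true
    }
}
```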