[jira] [Commented] (HIVE-5901) Query cancel should stop running MR tasks
[ https://issues.apache.org/jira/browse/HIVE-5901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861369#comment-13861369 ] Hive QA commented on HIVE-5901: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12621020/HIVE-5901.4.patch.txt {color:green}SUCCESS:{color} +1 4873 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/788/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/788/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12621020 Query cancel should stop running MR tasks - Key: HIVE-5901 URL: https://issues.apache.org/jira/browse/HIVE-5901 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-5901.1.patch.txt, HIVE-5901.2.patch.txt, HIVE-5901.3.patch.txt, HIVE-5901.4.patch.txt Currently, query canceling does not stop the running MR jobs immediately. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
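The direction HIVE-5901 describes can be sketched as follows. This is a hypothetical illustration, not the actual patch: the driver keeps handles to the MR jobs it launched, and a cancel request kills each still-running job instead of only setting a flag that is checked between jobs. The `JobHandle` interface below is a stand-in for a real handle such as `org.apache.hadoop.mapred.RunningJob`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of query cancellation killing running MR jobs.
// JobHandle stands in for org.apache.hadoop.mapred.RunningJob; the
// class name and methods are illustrative, not Hive's actual code.
public class QueryCancelSketch {
    interface JobHandle {
        boolean isComplete();
        void kill();
    }

    private final List<JobHandle> running = new ArrayList<>();

    // called by each MR task as it submits its job
    public void register(JobHandle job) {
        running.add(job);
    }

    // called on query cancel; returns how many jobs were killed
    public int cancel() {
        int killed = 0;
        for (JobHandle job : running) {
            if (!job.isComplete()) {
                job.kill();
                killed++;
            }
        }
        return killed;
    }
}
```

The key design point is that cancellation becomes an active signal sent to the cluster, rather than a passive flag that only takes effect once the current job finishes on its own.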
[jira] [Commented] (HIVE-5032) Enable hive creating external table at the root directory of DFS
[ https://issues.apache.org/jira/browse/HIVE-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861385#comment-13861385 ] Lefty Leverenz commented on HIVE-5032: -- Is this going to need any documentation? (I'm guessing not.) Enable hive creating external table at the root directory of DFS Key: HIVE-5032 URL: https://issues.apache.org/jira/browse/HIVE-5032 Project: Hive Issue Type: Bug Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Attachments: HIVE-5032.1.patch, HIVE-5032.2.patch, HIVE-5032.3.patch Creating an external table in Hive with a location pointing to the root directory of DFS will fail because the function HiveFileFormatUtils#doGetPartitionDescFromPath treats the authority of the path the same as a folder and cannot find a match in the pathToPartitionInfo table when doing the prefix match. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
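The failure mode in the issue description can be illustrated with a small sketch. The class and method names here are hypothetical, not Hive's actual `HiveFileFormatUtils` code: if a prefix match folds the URI authority into the directory walk, a table located at the DFS root never matches, whereas comparing the authority separately and then prefix-matching only the path component handles the root location correctly.

```java
import java.net.URI;

// Illustrative sketch of the prefix-match pitfall (hypothetical names,
// not Hive's actual implementation). Matching a file against a table
// location must compare the authority ("host:port") separately and
// prefix-match only the path, or a location at the DFS root ("/")
// can never be matched against pathToPartitionInfo entries.
public class PrefixMatch {
    public static boolean matchesLocation(String filePath, String tableLocation) {
        URI file = URI.create(filePath);
        URI loc = URI.create(tableLocation);
        // authority compared on its own, never treated as a folder
        return String.valueOf(file.getAuthority()).equals(String.valueOf(loc.getAuthority()))
            && file.getPath().startsWith(loc.getPath());
    }
}
```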
[jira] [Commented] (HIVE-5941) SQL std auth - support 'show all roles'
[ https://issues.apache.org/jira/browse/HIVE-5941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861412#comment-13861412 ] Hive QA commented on HIVE-5941: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12621029/HIVE-5941.2.patch.txt {color:green}SUCCESS:{color} +1 4874 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/789/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/789/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12621029 SQL std auth - support 'show all roles' --- Key: HIVE-5941 URL: https://issues.apache.org/jira/browse/HIVE-5941 Project: Hive Issue Type: Sub-task Components: Authorization Reporter: Thejas M Nair Assignee: Navis Attachments: HIVE-5941.1.patch.txt, HIVE-5941.2.patch.txt Original Estimate: 24h Remaining Estimate: 24h SHOW ALL ROLES - This will list all currently existing roles. This will be available only to the superuser. This task includes parser changes. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5945) ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task.
[ https://issues.apache.org/jira/browse/HIVE-5945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861465#comment-13861465 ] Hive QA commented on HIVE-5945: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12621023/HIVE-5945.6.patch.txt {color:green}SUCCESS:{color} +1 4873 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/790/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/790/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12621023 ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask also sums those tables which are not used in the child of this conditional task. 
- Key: HIVE-5945 URL: https://issues.apache.org/jira/browse/HIVE-5945 Project: Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.12.0, 0.13.0 Reporter: Yin Huai Assignee: Navis Priority: Critical Attachments: HIVE-5945.1.patch.txt, HIVE-5945.2.patch.txt, HIVE-5945.3.patch.txt, HIVE-5945.4.patch.txt, HIVE-5945.5.patch.txt, HIVE-5945.6.patch.txt Here is an example: {code} select i_item_id, s_state, avg(ss_quantity) agg1, avg(ss_list_price) agg2, avg(ss_coupon_amt) agg3, avg(ss_sales_price) agg4 FROM store_sales JOIN date_dim on (store_sales.ss_sold_date_sk = date_dim.d_date_sk) JOIN item on (store_sales.ss_item_sk = item.i_item_sk) JOIN customer_demographics on (store_sales.ss_cdemo_sk = customer_demographics.cd_demo_sk) JOIN store on (store_sales.ss_store_sk = store.s_store_sk) where cd_gender = 'F' and cd_marital_status = 'U' and cd_education_status = 'Primary' and d_year = 2002 and s_state in ('GA','PA', 'LA', 'SC', 'MI', 'AL') group by i_item_id, s_state order by i_item_id, s_state limit 100; {code} I turned off noconditionaltask. So, I expected that there would be 4 Map-only jobs for this query. However, I got 1 Map-only job (joining store_sales and date_dim) and 3 MR jobs (for reduce joins). So, I checked the conditional task determining the plan of the join involving item. In ql.plan.ConditionalResolverCommonJoin.resolveMapJoinTask, aliasToFileSizeMap contains all input tables used in this query and the intermediate table generated by joining store_sales and date_dim. So, when we sum the size of all small tables, the size of store_sales (which is around 45GB in my test) will also be counted. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
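The fix direction implied by the report can be sketched as follows. The names below are illustrative, not Hive's actual `ql.plan.ConditionalResolverCommonJoin` API: before summing candidate small-table sizes, restrict the alias-to-size map to the aliases that actually participate in this conditional task's child join, so an unrelated large input (store_sales in the report) cannot inflate the total and push the plan away from a map join.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (not Hive's actual resolveMapJoinTask code):
// the size check for choosing a map join should only sum aliases
// that participate in this conditional task's child join.
public class MapJoinSizeCheck {
    public static long sumParticipatingAliases(Map<String, Long> aliasToFileSize,
                                               Set<String> participants) {
        long total = 0;
        for (Map.Entry<String, Long> e : aliasToFileSize.entrySet()) {
            if (participants.contains(e.getKey())) { // skip unrelated tables
                total += e.getValue();
            }
        }
        return total;
    }
}
```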
[jira] [Commented] (HIVE-5757) Implement vectorized support for CASE
[ https://issues.apache.org/jira/browse/HIVE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861522#comment-13861522 ] Hive QA commented on HIVE-5757: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12621119/HIVE-5757.4.patch {color:green}SUCCESS:{color} +1 4874 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/791/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/791/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12621119 Implement vectorized support for CASE - Key: HIVE-5757 URL: https://issues.apache.org/jira/browse/HIVE-5757 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.0 Reporter: Eric Hanson Assignee: Eric Hanson Attachments: HIVE-5757.1.patch, HIVE-5757.2.patch, HIVE-5757.3.patch, HIVE-5757.4.patch Implement support for CASE in vectorized mode. The approach is to use the vectorized UDF adaptor internally. A higher-performance version that used VectorExpression subclasses was considered but not done due to complexity. Such a version potentially could be done in the future if it's important enough. This is high priority because CASE is a fairly popular expression. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6051) Create DecimalColumnVector and a representative VectorExpression for decimal
[ https://issues.apache.org/jira/browse/HIVE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861584#comment-13861584 ] Hive QA commented on HIVE-6051: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12619194/HIVE-6051.01.patch Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/794/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/794/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]] + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-794/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . 
Reverted 'shims/0.23/src/main/java/org/apache/hadoop/hive/shims/Hadoop23Shims.java' Reverted 'hbase-handler/src/test/results/negative/cascade_dbdrop.q.out' Reverted 'hbase-handler/src/test/results/positive/hbase_bulk.m.out' Reverted 'hbase-handler/src/test/queries/positive/hbase_bulk.m' ++ awk '{print $2}' ++ egrep -v '^X|^Performing status on external' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/storage-handlers/hbase/target hcatalog/server-extensions/target hcatalog/core/target hcatalog/webhcat/svr/target hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target hwi/target common/target common/src/gen contrib/target service/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1555121. At revision 1555121. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. 
ATTACHMENT ID: 12619194 Create DecimalColumnVector and a representative VectorExpression for decimal Key: HIVE-6051 URL: https://issues.apache.org/jira/browse/HIVE-6051 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.0 Reporter: Eric Hanson Assignee: Eric Hanson Fix For: 0.13.0 Attachments: HIVE-6051.01.patch Create a DecimalColumnVector to use as a basis for vectorized decimal operations. Include a representative VectorExpression on decimal (e.g. column-column addition) to demonstrate its use. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-4216) TestHBaseMinimrCliDriver throws weird error with HBase 0.94.5 and Hadoop 23 and test is stuck infinitely
[ https://issues.apache.org/jira/browse/HIVE-4216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861583#comment-13861583 ] Hive QA commented on HIVE-4216: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12621157/HIVE-4216.2.patch {color:green}SUCCESS:{color} +1 4873 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/792/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/792/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12621157 TestHBaseMinimrCliDriver throws weird error with HBase 0.94.5 and Hadoop 23 and test is stuck infinitely Key: HIVE-4216 URL: https://issues.apache.org/jira/browse/HIVE-4216 Project: Hive Issue Type: Bug Components: StorageHandler Affects Versions: 0.9.0, 0.11.0, 0.12.0 Environment: Hadoop 23.X Reporter: Viraj Bhat Fix For: 0.13.0 Attachments: HIVE-4216.1.patch, HIVE-4216.2.patch After upgrading to Hadoop 23 and HBase 0.94.5 compiled for Hadoop 23, TestHBaseMinimrCliDriver fails after performing the following steps. Update hbase_bulk.m with the following properties: set mapreduce.totalorderpartitioner.naturalorder=false; set mapreduce.totalorderpartitioner.path=/tmp/hbpartition.lst; Otherwise I keep seeing a _partition.lst not found exception in the mappers, even though set total.order.partitioner.path=/tmp/hbpartition.lst is set. When the test runs, the 3-reducer phase of the second query fails with the following error, but the MiniMRCluster keeps spinning up new reducers and the test is stuck infinitely. 
{code} insert overwrite table hbsort select distinct value, case when key=103 then cast(null as string) else key end, case when key=103 then '' else cast(key+1 as string) end from src cluster by value; {code} The stack trace I see in the syslog for the Node Manager is the following: == 13-03-20 16:26:48,942 FATAL [IPC Server handler 17 on 55996] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1363821864968_0003_r_02_0 - exited : java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_200},value:{_col0:val_200,_col1:200,_col2:201.0},alias:0} at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:268) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:448) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:399) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1212) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {key:{reducesinkkey0:val_200},value:{_col0:val_200,_col1:200,_col2:201.0},alias:0} at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:256) ... 
7 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:237) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:477) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:525) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:247) ... 7 more Caused by: java.lang.NullPointerException at org.apache.hadoop.mapreduce.TaskID$CharTaskTypeMaps.getRepresentingCharacter(TaskID.java:265) at org.apache.hadoop.mapreduce.TaskID.appendTo(TaskID.java:153) at
[jira] [Commented] (HIVE-3553) Support binary qualifiers for Hive/HBase integration
[ https://issues.apache.org/jira/browse/HIVE-3553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861605#comment-13861605 ] Brock Noland commented on HIVE-3553: Ahh ok, that makes sense. So the change to Bytes.toBytesBinary() will break some users and we'll need to create our own utility method to do the conversion. Support binary qualifiers for Hive/HBase integration Key: HIVE-3553 URL: https://issues.apache.org/jira/browse/HIVE-3553 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.9.0 Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-3553.1.patch.txt Along with regular qualifiers, we should support binary HBase qualifiers as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
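A project-owned conversion utility of the kind the comment proposes could look like the hedged sketch below. The behavior is modeled on HBase's `\xNN` escape convention for binary qualifiers; the class name and exact semantics are assumptions for illustration, not Hive's actual implementation.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch of a Hive-owned replacement for
// Bytes.toBytesBinary(): decodes "\xNN" hex escapes into raw bytes
// and copies everything else through as-is. Illustrative only.
public class BinaryQualifier {
    public static byte[] toBytesBinary(String in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < in.length(); i++) {
            char ch = in.charAt(i);
            boolean escape = ch == '\\'
                && i + 3 < in.length()
                && (in.charAt(i + 1) == 'x' || in.charAt(i + 1) == 'X');
            if (escape) {
                // consume "\xNN" and emit the decoded byte
                out.write(Integer.parseInt(in.substring(i + 2, i + 4), 16));
                i += 3;
            } else {
                out.write(ch); // plain character, low 8 bits
            }
        }
        return out.toByteArray();
    }
}
```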
[jira] [Commented] (HIVE-3553) Support binary qualifiers for Hive/HBase integration
[ https://issues.apache.org/jira/browse/HIVE-3553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861612#comment-13861612 ] Swarnim Kulkarni commented on HIVE-3553: Sounds good to me as well. I'll make the change and try to have an updated patch soon. Support binary qualifiers for Hive/HBase integration Key: HIVE-3553 URL: https://issues.apache.org/jira/browse/HIVE-3553 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.9.0 Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-3553.1.patch.txt Along with regular qualifiers, we should support binary HBase qualifiers as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-3553) Support binary qualifiers for Hive/HBase integration
[ https://issues.apache.org/jira/browse/HIVE-3553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861633#comment-13861633 ] Brock Noland commented on HIVE-3553: Sounds good! And thank you for the test :) Support binary qualifiers for Hive/HBase integration Key: HIVE-3553 URL: https://issues.apache.org/jira/browse/HIVE-3553 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.9.0 Reporter: Swarnim Kulkarni Assignee: Swarnim Kulkarni Attachments: HIVE-3553.1.patch.txt Along with regular qualifiers, we should support binary HBase qualifiers as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Hive-trunk-hadoop2 - Build # 645 - Still Failing
Changes for Build #640 Changes for Build #641 [navis] HIVE-5414 : The result of show grant is not visible via JDBC (Navis reviewed by Thejas M Nair) [navis] HIVE-4257 : java.sql.SQLNonTransientConnectionException on JDBCStatsAggregator (Teddy Choi via Navis, reviewed by Ashutosh) Changes for Build #642 Changes for Build #643 [ehans] HIVE-6017: Contribute Decimal128 high-performance decimal(p, s) package from Microsoft to Hive (Hideaki Kumura via Eric Hanson) Changes for Build #644 [cws] HIVE-5911: Recent change to schema upgrade scripts breaks file naming conventions (Sergey Shelukhin via cws) [cws] HIVE-3746: Fix HS2 ResultSet Serialization Performance Regression II (Navis via cws) [cws] HIVE-3746: Fix HS2 ResultSet Serialization Performance Regression (Navis via cws) [jitendra] HIVE-6010: TestCompareCliDriver enables tests that would ensure vectorization produces same results as non-vectorized execution (Sergey Shelukhin via Jitendra Pandey) Changes for Build #645 No tests ran. The Apache Jenkins build system has built Hive-trunk-hadoop2 (build #645) Status: Still Failing Check console output at https://builds.apache.org/job/Hive-trunk-hadoop2/645/ to view the results.
[jira] [Commented] (HIVE-5923) SQL std auth - parser changes
[ https://issues.apache.org/jira/browse/HIVE-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861655#comment-13861655 ] Brock Noland commented on HIVE-5923: +1 SQL std auth - parser changes - Key: HIVE-5923 URL: https://issues.apache.org/jira/browse/HIVE-5923 Project: Hive Issue Type: Sub-task Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-5923.1.patch, HIVE-5923.2.patch, HIVE-5923.3.patch, HIVE-5923.4.patch Original Estimate: 96h Time Spent: 72h Remaining Estimate: 12h There are new access control statements proposed in the functional spec in HIVE-5837 . It also proposes some small changes to the existing query syntax (mostly extensions and some optional keywords). The syntax supported should depend on the current authorization mode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5446) Hive can CREATE an external table but not SELECT from it when file path have spaces
[ https://issues.apache.org/jira/browse/HIVE-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861663#comment-13861663 ] Hive QA commented on HIVE-5446: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12621210/HIVE-5446.2.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 4873 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_single_sourced_multi_insert org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_reduce_deduplicate {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/795/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/795/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12621210 Hive can CREATE an external table but not SELECT from it when file path have spaces --- Key: HIVE-5446 URL: https://issues.apache.org/jira/browse/HIVE-5446 Project: Hive Issue Type: Bug Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Attachments: HIVE-5446.1.patch, HIVE-5446.2.patch Create external table table1 (age int, gender string, totBil float, dirBill float, alkphos int, sgpt int, sgot int, totProt float, aLB float, aG float, sel int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION 'hdfs://namenodehost:9000/hive newtable'; select * from table1; returns nothing even though there are files in the target folder -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Hive-trunk-h0.21 - Build # 2545 - Still Failing
Changes for Build #2539 Changes for Build #2540 [navis] HIVE-5414 : The result of show grant is not visible via JDBC (Navis reviewed by Thejas M Nair) Changes for Build #2541 Changes for Build #2542 [ehans] HIVE-6017: Contribute Decimal128 high-performance decimal(p, s) package from Microsoft to Hive (Hideaki Kumura via Eric Hanson) Changes for Build #2543 [cws] HIVE-3746: Fix HS2 ResultSet Serialization Performance Regression II (Navis via cws) [cws] HIVE-3746: Fix HS2 ResultSet Serialization Performance Regression (Navis via cws) [jitendra] HIVE-6010: TestCompareCliDriver enables tests that would ensure vectorization produces same results as non-vectorized execution (Sergey Shelukhin via Jitendra Pandey) Changes for Build #2544 [cws] HIVE-5911: Recent change to schema upgrade scripts breaks file naming conventions (Sergey Shelukhin via cws) Changes for Build #2545 No tests ran. The Apache Jenkins build system has built Hive-trunk-h0.21 (build #2545) Status: Still Failing Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/2545/ to view the results.
[jira] [Commented] (HIVE-6017) Contribute Decimal128 high-performance decimal(p, s) package from Microsoft to Hive
[ https://issues.apache.org/jira/browse/HIVE-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861701#comment-13861701 ] Eric Hanson commented on HIVE-6017: --- I don't think this needs end-user documentation. This is an internal performance enhancement. The user-visible type system won't change. Contribute Decimal128 high-performance decimal(p, s) package from Microsoft to Hive --- Key: HIVE-6017 URL: https://issues.apache.org/jira/browse/HIVE-6017 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.0 Reporter: Eric Hanson Assignee: Eric Hanson Fix For: 0.13.0 Attachments: HIVE-6017.01.patch, HIVE-6017.02.patch, HIVE-6017.03.patch, HIVE-6017.04.patch Contribute the Decimal128 high-performance decimal package developed by Microsoft to Hive. This was originally written for Microsoft PolyBase by Hideaki Kimura. This code is about 8X more efficient than Java BigDecimal for typical operations. It uses a finite (128 bit) precision and can handle up to decimal(38, X). It is also mutable so you can change the contents of an existing object. This helps reduce the cost of new() and garbage collection. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5446) Hive can CREATE an external table but not SELECT from it when file path have spaces
[ https://issues.apache.org/jira/browse/HIVE-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861700#comment-13861700 ] Xuefu Zhang commented on HIVE-5446: --- With a series of path-standardization changes in Hive, such as HIVE-6048 and HIVE-6121, URI decoding might become unnecessary. At a minimum, it's worth reproducing the problem with the latest trunk and fixing it accordingly if it remains. Hive can CREATE an external table but not SELECT from it when file path have spaces --- Key: HIVE-5446 URL: https://issues.apache.org/jira/browse/HIVE-5446 Project: Hive Issue Type: Bug Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Attachments: HIVE-5446.1.patch, HIVE-5446.2.patch Create external table table1 (age int, gender string, totBil float, dirBill float, alkphos int, sgpt int, sgot int, totProt float, aLB float, aG float, sel int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION 'hdfs://namenodehost:9000/hive newtable'; select * from table1; returns nothing even though there are files in the target folder -- This message was sent by Atlassian JIRA (v6.1.5#6160)
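One plausible failure mode, consistent with the URI-decoding discussion above (an assumption for illustration, not a confirmed trace of Hive's code): a location containing a space round-trips through URI encoding as `%20`, and code that compares the encoded string against the decoded filesystem path fails to match, so the SELECT silently returns no rows. The tiny helper below shows the decoding step that reconciles the two forms.

```java
import java.net.URI;

// Illustrative helper (not Hive's actual code): java.net.URI decodes
// percent-escapes, so the encoded location and the on-disk path with a
// literal space compare equal once both are reduced to the decoded path.
public class PathSpaces {
    public static String decodedPath(String location) {
        return URI.create(location).getPath(); // "%20" becomes " "
    }
}
```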
[jira] [Commented] (HIVE-2599) Support Composit/Compound Keys with HBaseStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861705#comment-13861705 ] Brock Noland commented on HIVE-2599: Hi Swarnim, This looks pretty good! Am I correct that the patch takes care of both selects and inserts? Hi Nick, Do you have a simple example? Brock Support Composit/Compound Keys with HBaseStorageHandler --- Key: HIVE-2599 URL: https://issues.apache.org/jira/browse/HIVE-2599 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.8.0 Reporter: Hans Uhlig Assignee: Swarnim Kulkarni Attachments: HIVE-2599.1.patch.txt, HIVE-2599.2.patch.txt, HIVE-2599.2.patch.txt It would be really nice for hive to be able to understand composite keys from an underlying HBase schema. Currently we have to store key fields twice to be able to both key the rows and make the data available. I noticed John Sichi mentioned in HIVE-1228 that this would be a separate issue but I can't find any follow-up. How feasible is this in the HBaseStorageHandler? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6134) Merging small files based on file size only works for CTAS queries
Eric Chu created HIVE-6134: -- Summary: Merging small files based on file size only works for CTAS queries Key: HIVE-6134 URL: https://issues.apache.org/jira/browse/HIVE-6134 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, 0.11.0, 0.10.0, 0.8.0 Reporter: Eric Chu According to the documentation, if we set hive.merge.mapfiles to true, Hive will launch an additional MR job to merge the small output files at the end of a map-only job when the average output file size is smaller than hive.merge.smallfiles.avgsize. Similarly, by setting hive.merge.mapredfiles to true, Hive will merge the output files of a map-reduce job. My expectation is that this is true for all MR queries. However, my observation is that this is only true for CTAS queries. In GenMRFileSink1.java, HIVEMERGEMAPFILES and HIVEMERGEMAPREDFILES are only used if ((ctx.getMvTask() != null) && (!ctx.getMvTask().isEmpty())). So, for a regular SELECT query that doesn't have move tasks, these properties are not used. Is my understanding correct and if so, what's the reasoning behind not supporting this for regular SELECT queries? It seems to me that this should be supported for regular SELECT queries as well. One scenario where this hits us hard is when users try to download the result in HUE, and HUE times out because there are thousands of output files. The workaround is to re-run the query as CTAS, but it's a significant time sink. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
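The documented merge behavior reduces to a simple predicate, sketched below with illustrative names (this is not the GenMRFileSink1 code): launch the extra merge job when the average output file size is below hive.merge.smallfiles.avgsize. The report's complaint is that this predicate is only ever evaluated when the query has move tasks.

```java
// Hedged sketch of the merge decision (names are illustrative, not
// Hive's actual implementation): merge when the average output file
// size falls below the hive.merge.smallfiles.avgsize threshold.
public class MergeDecision {
    public static boolean shouldMerge(long totalOutputBytes, int numFiles, long avgSizeThreshold) {
        if (numFiles == 0) {
            return false; // nothing to merge
        }
        return (totalOutputBytes / numFiles) < avgSizeThreshold;
    }
}
```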
[jira] [Commented] (HIVE-2599) Support Composit/Compound Keys with HBaseStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861727#comment-13861727 ] Swarnim Kulkarni commented on HIVE-2599: {quote} Am I correct that the patch takes care of both selects and inserts? {quote} Unfortunately, no. This one allows querying of custom composite keys but currently doesn't support writing them back to HBase. Do you want me to include that support as part of this patch itself or open up a separate issue for that? Support Composit/Compound Keys with HBaseStorageHandler --- Key: HIVE-2599 URL: https://issues.apache.org/jira/browse/HIVE-2599 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.8.0 Reporter: Hans Uhlig Assignee: Swarnim Kulkarni Attachments: HIVE-2599.1.patch.txt, HIVE-2599.2.patch.txt, HIVE-2599.2.patch.txt It would be really nice for hive to be able to understand composite keys from an underlying HBase schema. Currently we have to store key fields twice to be able to both key the rows and make the data available. I noticed John Sichi mentioned in HIVE-1228 that this would be a separate issue but I can't find any follow-up. How feasible is this in the HBaseStorageHandler? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-2599) Support Composit/Compound Keys with HBaseStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861736#comment-13861736 ] Brock Noland commented on HIVE-2599: What happens if an insert is tried? We can address that in a follow-on JIRA as long as the results of an insert aren't data corruption or a terrible error message. Support Composit/Compound Keys with HBaseStorageHandler --- Key: HIVE-2599 URL: https://issues.apache.org/jira/browse/HIVE-2599 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.8.0 Reporter: Hans Uhlig Assignee: Swarnim Kulkarni Attachments: HIVE-2599.1.patch.txt, HIVE-2599.2.patch.txt, HIVE-2599.2.patch.txt It would be really nice for hive to be able to understand composite keys from an underlying HBase schema. Currently we have to store key fields twice to be able to both key the rows and make the data available. I noticed John Sichi mentioned in HIVE-1228 that this would be a separate issue but I can't find any follow-up. How feasible is this in the HBaseStorageHandler? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5446) Hive can CREATE an external table but not SELECT from it when file path have spaces
[ https://issues.apache.org/jira/browse/HIVE-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861744#comment-13861744 ] Shuaishuai Nie commented on HIVE-5446: -- Hi [~xuefuz], thanks for the advice. I ran the unit test against the latest trunk and the problem still exists, so I think we still need the fix. I have also verified that the failed tests in Hive QA are not related to the patch. Hive can CREATE an external table but not SELECT from it when file path have spaces --- Key: HIVE-5446 URL: https://issues.apache.org/jira/browse/HIVE-5446 Project: Hive Issue Type: Bug Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Attachments: HIVE-5446.1.patch, HIVE-5446.2.patch Create external table table1 (age int, gender string, totBil float, dirBill float, alkphos int, sgpt int, sgot int, totProt float, aLB float, aG float, sel int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION 'hdfs://namenodehost:9000/hive newtable'; select * from table1; returns nothing even though there are files in the target folder -- This message was sent by Atlassian JIRA (v6.1.5#6160)
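The failure mode reported above comes down to URI handling: a literal space is not legal inside a URI, so a location like the one in the CREATE TABLE has to be percent-encoded somewhere along the way. A minimal illustration using plain java.net.URI (this is not Hive's actual path-handling code, and the class name is made up for the sketch):

```java
import java.net.URI;

public class SpacePathDemo {
    public static void main(String[] args) {
        // A raw space is not legal inside a URI, so a location such as
        // 'hdfs://namenodehost:9000/hive newtable' cannot pass through
        // URI-based code unescaped.
        boolean rawParses;
        try {
            URI.create("hdfs://namenodehost:9000/hive newtable");
            rawParses = true;
        } catch (IllegalArgumentException e) {
            rawParses = false;
        }
        // Percent-encoding the space yields a valid URI whose decoded
        // path still contains the literal space.
        URI escaped = URI.create("hdfs://namenodehost:9000/hive%20newtable");
        System.out.println(rawParses + " -> " + escaped.getPath());
        // prints: false -> /hive newtable
    }
}
```

Any code path that round-trips the location through unescaped URI strings will therefore drop or reject files under such a directory, which matches the "CREATE works but SELECT returns nothing" symptom.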
[jira] [Commented] (HIVE-5032) Enable hive creating external table at the root directory of DFS
[ https://issues.apache.org/jira/browse/HIVE-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861745#comment-13861745 ] Shuaishuai Nie commented on HIVE-5032: -- Yes [~leftylev], I don't think documentation is necessary Enable hive creating external table at the root directory of DFS Key: HIVE-5032 URL: https://issues.apache.org/jira/browse/HIVE-5032 Project: Hive Issue Type: Bug Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Attachments: HIVE-5032.1.patch, HIVE-5032.2.patch, HIVE-5032.3.patch Creating an external table in Hive with a location pointing to the root directory of DFS will fail because HiveFileFormatUtils#doGetPartitionDescFromPath treats the authority of the path as if it were a folder and cannot find a match in the pathToPartitionInfo table when doing a prefix match. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
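The authority-vs-folder confusion described above can be pictured with plain java.net.URI: splitting the raw location string on "/" makes the authority look like a directory component, whereas the URI path component of a root location is simply "/", which is a valid prefix of every absolute file path. A simplified sketch, not Hive's actual matching code:

```java
import java.net.URI;
import java.util.Arrays;

public class RootLocationDemo {
    public static void main(String[] args) {
        String location = "hdfs://namenodehost:9000/";
        // Splitting the raw location string on "/" makes the authority
        // ("namenodehost:9000") look like just another folder component...
        String[] rawParts = location.split("/");
        // ...whereas the URI path component of a root location is simply
        // "/", a valid prefix of every absolute file path under it.
        String rootPath = URI.create(location).getPath();
        boolean matches = "/user/warehouse/part-00000".startsWith(rootPath);
        System.out.println(Arrays.toString(rawParts)
            + " | path=" + rootPath + " | " + matches);
        // prints: [hdfs:, , namenodehost:9000] | path=/ | true
    }
}
```

Matching on the path component alone, rather than the full URI string, is the kind of change that makes a root-directory location work like any other prefix.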
[jira] [Commented] (HIVE-5224) When creating table with AVRO serde, the avro.schema.url should be about to load serde schema from file system beside HDFS
[ https://issues.apache.org/jira/browse/HIVE-5224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861746#comment-13861746 ] Hive QA commented on HIVE-5224: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12621212/HIVE-5224.4.patch {color:green}SUCCESS:{color} +1 4873 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/796/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/796/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12621212 When creating table with AVRO serde, the avro.schema.url should be about to load serde schema from file system beside HDFS Key: HIVE-5224 URL: https://issues.apache.org/jira/browse/HIVE-5224 Project: Hive Issue Type: Bug Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Attachments: HIVE-5224.1.patch, HIVE-5224.2.patch, HIVE-5224.4.patch, Hive-5224.3.patch Currently, when loading the schema for a table with the Avro serde, the file system is hard-coded to HDFS in AvroSerdeUtils.java. This should enable loading the schema from file systems besides HDFS. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
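The fix boils down to letting the schema URL name its own file system instead of assuming HDFS. A hypothetical sketch of the idea using only java.net.URI (the helper, class name, and .avsc paths are made up; in Hadoop terms this corresponds to resolving a Path via Path#getFileSystem(conf) rather than a hard-coded FileSystem.get(conf)):

```java
import java.net.URI;

public class SchemaUrlDemo {
    // Hypothetical helper: read the scheme off the schema URL itself
    // instead of assuming HDFS, so any configured file system can serve
    // the Avro schema.
    static String schemeOf(String schemaUrl) {
        String scheme = URI.create(schemaUrl).getScheme();
        return scheme != null ? scheme : "(default fs)";
    }

    public static void main(String[] args) {
        System.out.println(schemeOf("hdfs://namenodehost:9000/schemas/events.avsc"));
        System.out.println(schemeOf("file:///tmp/events.avsc"));
        // prints: hdfs
        //         file
    }
}
```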
[jira] [Updated] (HIVE-5757) Implement vectorized support for CASE
[ https://issues.apache.org/jira/browse/HIVE-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-5757: -- Resolution: Implemented Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Committed to trunk. Implement vectorized support for CASE - Key: HIVE-5757 URL: https://issues.apache.org/jira/browse/HIVE-5757 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.0 Reporter: Eric Hanson Assignee: Eric Hanson Fix For: 0.13.0 Attachments: HIVE-5757.1.patch, HIVE-5757.2.patch, HIVE-5757.3.patch, HIVE-5757.4.patch Implement support for CASE in vectorized mode. The approach is to use the vectorized UDF adaptor internally. A higher-performance version that used VectorExpression subclasses was considered but not done due to complexity. Such a version potentially could be done in the future if it's important enough. This is high priority because CASE is a fairly popular expression. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6125) Tez: Refactoring changes
[ https://issues.apache.org/jira/browse/HIVE-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861751#comment-13861751 ] Gunther Hagleitner commented on HIVE-6125: -- failure seems unrelated. passes locally. Tez: Refactoring changes Key: HIVE-6125 URL: https://issues.apache.org/jira/browse/HIVE-6125 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6125.1.patch In order to facilitate merge back I've separated out all the changes that don't require Tez. These changes introduce new interfaces, move code etc. In preparation of the Tez specific classes. This should help show what changes have been made that affect the MR codepath as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6125) Tez: Refactoring changes
[ https://issues.apache.org/jira/browse/HIVE-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6125: - Attachment: HIVE-6125.2.patch No change in .2 other than rebasing it. Tez: Refactoring changes Key: HIVE-6125 URL: https://issues.apache.org/jira/browse/HIVE-6125 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6125.1.patch, HIVE-6125.2.patch In order to facilitate merge back I've separated out all the changes that don't require Tez. These changes introduce new interfaces, move code etc. In preparation of the Tez specific classes. This should help show what changes have been made that affect the MR codepath as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6125) Tez: Refactoring changes
[ https://issues.apache.org/jira/browse/HIVE-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861754#comment-13861754 ] Gunther Hagleitner commented on HIVE-6125: -- This one is without golden files (easier to navigate): https://reviews.apache.org/r/16611/ Tez: Refactoring changes Key: HIVE-6125 URL: https://issues.apache.org/jira/browse/HIVE-6125 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6125.1.patch, HIVE-6125.2.patch In order to facilitate merge back I've separated out all the changes that don't require Tez. These changes introduce new interfaces, move code etc. In preparation of the Tez specific classes. This should help show what changes have been made that affect the MR codepath as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Hive-trunk-hadoop2 - Build # 646 - Still Failing
Changes for Build #640 Changes for Build #641 [navis] HIVE-5414 : The result of show grant is not visible via JDBC (Navis reviewed by Thejas M Nair) [navis] HIVE-4257 : java.sql.SQLNonTransientConnectionException on JDBCStatsAggregator (Teddy Choi via Navis, reviewed by Ashutosh) Changes for Build #642 Changes for Build #643 [ehans] HIVE-6017: Contribute Decimal128 high-performance decimal(p, s) package from Microsoft to Hive (Hideaki Kumura via Eric Hanson) Changes for Build #644 [cws] HIVE-5911: Recent change to schema upgrade scripts breaks file naming conventions (Sergey Shelukhin via cws) [cws] HIVE-3746: Fix HS2 ResultSet Serialization Performance Regression II (Navis via cws) [cws] HIVE-3746: Fix HS2 ResultSet Serialization Performance Regression (Navis via cws) [jitendra] HIVE-6010: TestCompareCliDriver enables tests that would ensure vectorization produces same results as non-vectorized execution (Sergey Shelukhin via Jitendra Pandey) Changes for Build #645 Changes for Build #646 [ehans] HIVE-5757: Implement vectorized support for CASE (Eric Hanson) No tests ran. The Apache Jenkins build system has built Hive-trunk-hadoop2 (build #646) Status: Still Failing Check console output at https://builds.apache.org/job/Hive-trunk-hadoop2/646/ to view the results.
[jira] [Updated] (HIVE-6051) Create DecimalColumnVector and a representative VectorExpression for decimal
[ https://issues.apache.org/jira/browse/HIVE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-6051: -- Attachment: HIVE-6051.02.patch Create DecimalColumnVector and a representative VectorExpression for decimal Key: HIVE-6051 URL: https://issues.apache.org/jira/browse/HIVE-6051 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.0 Reporter: Eric Hanson Assignee: Eric Hanson Fix For: 0.13.0 Attachments: HIVE-6051.01.patch, HIVE-6051.02.patch Create a DecimalColumnVector to use as a basis for vectorized decimal operations. Include a representative VectorExpression on decimal (e.g. column-column addition) to demonstrate its use. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6051) Create DecimalColumnVector and a representative VectorExpression for decimal
[ https://issues.apache.org/jira/browse/HIVE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861768#comment-13861768 ] Eric Hanson commented on HIVE-6051: --- Fixed minor formatting issue to allow patch to apply to trunk. Create DecimalColumnVector and a representative VectorExpression for decimal Key: HIVE-6051 URL: https://issues.apache.org/jira/browse/HIVE-6051 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.0 Reporter: Eric Hanson Assignee: Eric Hanson Fix For: 0.13.0 Attachments: HIVE-6051.01.patch, HIVE-6051.02.patch Create a DecimalColumnVector to use as a basis for vectorized decimal operations. Include a representative VectorExpression on decimal (e.g. column-column addition) to demonstrate its use. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Hive-trunk-h0.21 - Build # 2546 - Still Failing
Changes for Build #2539 Changes for Build #2540 [navis] HIVE-5414 : The result of show grant is not visible via JDBC (Navis reviewed by Thejas M Nair) Changes for Build #2541 Changes for Build #2542 [ehans] HIVE-6017: Contribute Decimal128 high-performance decimal(p, s) package from Microsoft to Hive (Hideaki Kumura via Eric Hanson) Changes for Build #2543 [cws] HIVE-3746: Fix HS2 ResultSet Serialization Performance Regression II (Navis via cws) [cws] HIVE-3746: Fix HS2 ResultSet Serialization Performance Regression (Navis via cws) [jitendra] HIVE-6010: TestCompareCliDriver enables tests that would ensure vectorization produces same results as non-vectorized execution (Sergey Shelukhin via Jitendra Pandey) Changes for Build #2544 [cws] HIVE-5911: Recent change to schema upgrade scripts breaks file naming conventions (Sergey Shelukhin via cws) Changes for Build #2545 Changes for Build #2546 [ehans] HIVE-5757: Implement vectorized support for CASE (Eric Hanson) No tests ran. The Apache Jenkins build system has built Hive-trunk-h0.21 (build #2546) Status: Still Failing Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/2546/ to view the results.
[jira] [Commented] (HIVE-5946) DDL authorization task factory should be better tested
[ https://issues.apache.org/jira/browse/HIVE-5946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861793#comment-13861793 ] Brock Noland commented on HIVE-5946: Sounds good, I will rebase this patch after committing the other change. DDL authorization task factory should be better tested -- Key: HIVE-5946 URL: https://issues.apache.org/jira/browse/HIVE-5946 Project: Hive Issue Type: Improvement Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-5946.patch Thejas is working on various authorization issues and one element that might be useful in that effort and increase test coverage and testability would be perform authorization task creation in a factory. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-2599) Support Composit/Compound Keys with HBaseStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861802#comment-13861802 ] Nick Dimiduk commented on HIVE-2599: bq. Do you have a simple example? The best documentation available is still in the [unit tests|https://github.com/apache/hbase/blob/trunk/hbase-common/src/test/java/org/apache/hadoop/hbase/types/TestStruct.java]. I will do a proper writeup of using this feature, it's just not a priority for me as of late. Support Composit/Compound Keys with HBaseStorageHandler --- Key: HIVE-2599 URL: https://issues.apache.org/jira/browse/HIVE-2599 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.8.0 Reporter: Hans Uhlig Assignee: Swarnim Kulkarni Attachments: HIVE-2599.1.patch.txt, HIVE-2599.2.patch.txt, HIVE-2599.2.patch.txt It would be really nice for hive to be able to understand composite keys from an underlying HBase schema. Currently we have to store key fields twice to be able to both key and make data available. I noticed John Sichi mentioned in HIVE-1228 that this would be a separate issue but I cant find any follow up. How feasible is this in the HBaseStorageHandler? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6135) Fix merge error on tez branch (TestCompareCliDriver)
Gunther Hagleitner created HIVE-6135: Summary: Fix merge error on tez branch (TestCompareCliDriver) Key: HIVE-6135 URL: https://issues.apache.org/jira/browse/HIVE-6135 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-2599) Support Composit/Compound Keys with HBaseStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861808#comment-13861808 ] Brock Noland commented on HIVE-2599: Cool, sounds good. I think we can address this in a follow on JIRA since Swarnim has a working patch here for a common use case. Support Composit/Compound Keys with HBaseStorageHandler --- Key: HIVE-2599 URL: https://issues.apache.org/jira/browse/HIVE-2599 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.8.0 Reporter: Hans Uhlig Assignee: Swarnim Kulkarni Attachments: HIVE-2599.1.patch.txt, HIVE-2599.2.patch.txt, HIVE-2599.2.patch.txt It would be really nice for hive to be able to understand composite keys from an underlying HBase schema. Currently we have to store key fields twice to be able to both key and make data available. I noticed John Sichi mentioned in HIVE-1228 that this would be a separate issue but I cant find any follow up. How feasible is this in the HBaseStorageHandler? -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6135) Fix merge error on tez branch (TestCompareCliDriver)
[ https://issues.apache.org/jira/browse/HIVE-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6135: - Attachment: HIVE-6135.1.patch Fix merge error on tez branch (TestCompareCliDriver) Key: HIVE-6135 URL: https://issues.apache.org/jira/browse/HIVE-6135 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Attachments: HIVE-6135.1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HIVE-6135) Fix merge error on tez branch (TestCompareCliDriver)
[ https://issues.apache.org/jira/browse/HIVE-6135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner resolved HIVE-6135. -- Resolution: Fixed Committed to branch. Fix merge error on tez branch (TestCompareCliDriver) Key: HIVE-6135 URL: https://issues.apache.org/jira/browse/HIVE-6135 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Attachments: HIVE-6135.1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6134) Merging small files based on file size only works for CTAS queries
[ https://issues.apache.org/jira/browse/HIVE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861818#comment-13861818 ] Eric Chu commented on HIVE-6134: [~brocknoland] and [~xuefuz]: I was talking to Yin Huai about this issue and he suggested I ping you on this, especially on how it affects HUE UX as mentioned above. Merging small files based on file size only works for CTAS queries -- Key: HIVE-6134 URL: https://issues.apache.org/jira/browse/HIVE-6134 Project: Hive Issue Type: Bug Affects Versions: 0.8.0, 0.10.0, 0.11.0, 0.12.0 Reporter: Eric Chu According to the documentation, if we set hive.merge.mapfiles to true, Hive will launch an additional MR job to merge the small output files at the end of a map-only job when the average output file size is smaller than hive.merge.smallfiles.avgsize. Similarly, by setting hive.merge.mapredfiles to true, Hive will merge the output files of a map-reduce job. My expectation is that this is true for all MR queries. However, my observation is that this is only true for CTAS queries. In GenMRFileSink1.java, HIVEMERGEMAPFILES and HIVEMERGEMAPREDFILES are only used if ((ctx.getMvTask() != null) && (!ctx.getMvTask().isEmpty())). So, for a regular SELECT query that doesn't have move tasks, these properties are not used. Is my understanding correct and if so, what's the reasoning behind the logic of not supporting this for regular SELECT queries? It seems to me that this should be supported for regular SELECT queries as well. One scenario where this hits us hard is when users try to download the result in HUE, and HUE times out b/c there are thousands of output files. The workaround is to re-run the query as CTAS, but it's a significant time sink. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
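For reference, the three properties the description refers to, as they would be set in a Hive session (values are illustrative only; per the description, they currently take effect only when the query plan carries a move task, e.g. CTAS):

```sql
SET hive.merge.mapfiles=true;                -- merge small outputs of map-only jobs
SET hive.merge.mapredfiles=true;             -- merge small outputs of map-reduce jobs
SET hive.merge.smallfiles.avgsize=16000000;  -- avg output size (bytes) below which the merge job runs
```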
[jira] [Created] (HIVE-6136) Hive metastore configured with DB2 LUW doesn't work
Thomas Friedrich created HIVE-6136: -- Summary: Hive metastore configured with DB2 LUW doesn't work Key: HIVE-6136 URL: https://issues.apache.org/jira/browse/HIVE-6136 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.12.0 Reporter: Thomas Friedrich Hive 0.12 with datanucleus 3.2.1 generates invalid SQL syntax if the metastore is configured with DB2. To reproduce the issue, simply create a table and drop it using Hive CLI: create table test(i1 int); drop table test; Drop will fail and this is the stacktrace: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-206, SQLSTATE=42703, SQLERRMC=SUBQ.A0.CREATE_TIME, DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127) at com.ibm.db2.jcc.am.to.c(to.java:2771) at com.ibm.db2.jcc.am.to.d(to.java:2759) at com.ibm.db2.jcc.am.to.a(to.java:2192) at com.ibm.db2.jcc.am.uo.a(uo.java:7827) at com.ibm.db2.jcc.t4.ab.h(ab.java:141) at com.ibm.db2.jcc.t4.ab.b(ab.java:41) at com.ibm.db2.jcc.t4.o.a(o.java:32) at com.ibm.db2.jcc.t4.tb.i(tb.java:145) at com.ibm.db2.jcc.am.to.kb(to.java:2161) at com.ibm.db2.jcc.am.uo.wc(uo.java:3657) at com.ibm.db2.jcc.am.uo.b(uo.java:4454) at com.ibm.db2.jcc.am.uo.jc(uo.java:760) at com.ibm.db2.jcc.am.uo.executeQuery(uo.java:725) at com.jolbox.bonecp.PreparedStatementHandle.executeQuery(PreparedStatementHandle.java:172) at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeQuery(ParamLoggingPreparedStatement.java:381) at org.datanucleus.store.rdbms.SQLController.executeStatementQuery(SQLController.java:504) at org.datanucleus.store.rdbms.query.JDOQLQuery.performExecute(JDOQLQuery.java:637) at org.datanucleus.store.query.Query.executeQuery(Query.java:1786) at org.datanucleus.store.query.Query.executeWithArray(Query.java:1672) at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:266) at org.apache.hadoop.hive.metastore.ObjectStore.listMPartitions(ObjectStore.java:1698) at 
org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsInternal(ObjectStore.java:1428) at org.apache.hadoop.hive.metastore.ObjectStore.getPartitions(ObjectStore.java:1402) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37) at java.lang.reflect.Method.invoke(Method.java:611) at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124) at com.sun.proxy.$Proxy7.getPartitions(Unknown Source) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.dropPartitionsAndGetLocations(HiveMetaStore.java:1286) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_core(HiveMetaStore.java:1189) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_with_environment_context(HiveMetaStore.java:1328) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.j at java.lang.reflect.Method.invoke(Method.java:611) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler. 
at com.sun.proxy.$Proxy8.drop_table_with_environment_context(Unknown Source) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropTable(HiveMetaStoreCl at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropTable(HiveMetaStoreCl at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.j at java.lang.reflect.Method.invoke(Method.java:611) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaSt at com.sun.proxy.$Proxy9.dropTable(Unknown Source) at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:869) at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:836) at org.apache.hadoop.hive.ql.exec.DDLTask.dropTable(DDLTask.java:3329) at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:277) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65) at
[jira] [Updated] (HIVE-6136) Hive metastore configured with DB2 LUW doesn't work
[ https://issues.apache.org/jira/browse/HIVE-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Friedrich updated HIVE-6136: --- Attachment: hive.log Hive log output for failed ALTER statement Hive metastore configured with DB2 LUW doesn't work --- Key: HIVE-6136 URL: https://issues.apache.org/jira/browse/HIVE-6136 Project: Hive Issue Type: Bug Components: Metastore Affects Versions: 0.12.0 Reporter: Thomas Friedrich Attachments: hive.log Hive 0.12 with datanucleus 3.2.1 generates invalid SQL syntax if the metastore is configured with DB2. To reproduce the issue, simply create a table and drop it using Hive CLI: create table test(i1 int); drop table test; Drop will fail and this is the stacktrace: com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-206, SQLSTATE=42703, SQLERRMC=SUBQ.A0.CREATE_TIME, DRIVER=4.16.53 at com.ibm.db2.jcc.am.fd.a(fd.java:739) at com.ibm.db2.jcc.am.fd.a(fd.java:60) at com.ibm.db2.jcc.am.fd.a(fd.java:127) at com.ibm.db2.jcc.am.to.c(to.java:2771) at com.ibm.db2.jcc.am.to.d(to.java:2759) at com.ibm.db2.jcc.am.to.a(to.java:2192) at com.ibm.db2.jcc.am.uo.a(uo.java:7827) at com.ibm.db2.jcc.t4.ab.h(ab.java:141) at com.ibm.db2.jcc.t4.ab.b(ab.java:41) at com.ibm.db2.jcc.t4.o.a(o.java:32) at com.ibm.db2.jcc.t4.tb.i(tb.java:145) at com.ibm.db2.jcc.am.to.kb(to.java:2161) at com.ibm.db2.jcc.am.uo.wc(uo.java:3657) at com.ibm.db2.jcc.am.uo.b(uo.java:4454) at com.ibm.db2.jcc.am.uo.jc(uo.java:760) at com.ibm.db2.jcc.am.uo.executeQuery(uo.java:725) at com.jolbox.bonecp.PreparedStatementHandle.executeQuery(PreparedStatementHandle.java:172) at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeQuery(ParamLoggingPreparedStatement.java:381) at org.datanucleus.store.rdbms.SQLController.executeStatementQuery(SQLController.java:504) at org.datanucleus.store.rdbms.query.JDOQLQuery.performExecute(JDOQLQuery.java:637) at org.datanucleus.store.query.Query.executeQuery(Query.java:1786) at 
org.datanucleus.store.query.Query.executeWithArray(Query.java:1672) at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:266) at org.apache.hadoop.hive.metastore.ObjectStore.listMPartitions(ObjectStore.java:1698) at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsInternal(ObjectStore.java:1428) at org.apache.hadoop.hive.metastore.ObjectStore.getPartitions(ObjectStore.java:1402) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37) at java.lang.reflect.Method.invoke(Method.java:611) at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124) at com.sun.proxy.$Proxy7.getPartitions(Unknown Source) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.dropPartitionsAndGetLocations(HiveMetaStore.java:1286) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_core(HiveMetaStore.java:1189) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_with_environment_context(HiveMetaStore.java:1328) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.j at java.lang.reflect.Method.invoke(Method.java:611) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler. 
at com.sun.proxy.$Proxy8.drop_table_with_environment_context(Unknown Source) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropTable(HiveMetaStoreCl at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropTable(HiveMetaStoreCl at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.j at java.lang.reflect.Method.invoke(Method.java:611) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaSt at com.sun.proxy.$Proxy9.dropTable(Unknown Source) at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:869) at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:836)
[jira] [Commented] (HIVE-6098) Merge Tez branch into trunk
[ https://issues.apache.org/jira/browse/HIVE-6098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861831#comment-13861831 ] Thejas M Nair commented on HIVE-6098: - FYI, Gunther has created HIVE-6125 which has part of the tez changes, that involve refactoring of the existing hive code. Merge Tez branch into trunk --- Key: HIVE-6098 URL: https://issues.apache.org/jira/browse/HIVE-6098 Project: Hive Issue Type: New Feature Affects Versions: 0.12.0 Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6098.1.patch, HIVE-6098.2.patch, HIVE-6098.3.patch, HIVE-6098.4.patch, HIVE-6098.5.patch, HIVE-6098.6.patch, HIVE-6098.7.patch, hive-on-tez-conf.txt I think the Tez branch is at a point where we can consider merging it back into trunk after review. Tez itself has had its first release, most hive features are available on Tez and the test coverage is decent. There are a few known limitations, all of which can be handled in trunk as far as I can tell (i.e.: None of them are large disruptive changes that still require a branch.) Limitations: - Union all is not yet supported on Tez - SMB is not yet supported on Tez - Bucketed map-join is executed as broadcast join (bucketing is ignored) Since the user is free to toggle hive.optimize.tez, it's obviously possible to just run these on MR. I am hoping to follow the approach that was taken with vectorization and shoot for a merge instead of single commit. This would retain history of the branch. Also in vectorization we required at least three +1s before merge, I'm hoping to go with that as well. I will add a combined patch to this ticket for review purposes (not for commit). I'll also attach instructions to run on a cluster if anyone wants to try. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6136) Hive metastore configured with DB2 LUW doesn't work
[ https://issues.apache.org/jira/browse/HIVE-6136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-6136: --- Description: Hive 0.12 with DataNucleus 3.2.1 generates invalid SQL syntax if the metastore is configured with DB2. To reproduce the issue, simply create a table and drop it using the Hive CLI: create table test(i1 int); drop table test; The drop will fail, and this is the stack trace:
{noformat}
com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-206, SQLSTATE=42703, SQLERRMC=SUBQ.A0.CREATE_TIME, DRIVER=4.16.53
at com.ibm.db2.jcc.am.fd.a(fd.java:739)
at com.ibm.db2.jcc.am.fd.a(fd.java:60)
at com.ibm.db2.jcc.am.fd.a(fd.java:127)
at com.ibm.db2.jcc.am.to.c(to.java:2771)
at com.ibm.db2.jcc.am.to.d(to.java:2759)
at com.ibm.db2.jcc.am.to.a(to.java:2192)
at com.ibm.db2.jcc.am.uo.a(uo.java:7827)
at com.ibm.db2.jcc.t4.ab.h(ab.java:141)
at com.ibm.db2.jcc.t4.ab.b(ab.java:41)
at com.ibm.db2.jcc.t4.o.a(o.java:32)
at com.ibm.db2.jcc.t4.tb.i(tb.java:145)
at com.ibm.db2.jcc.am.to.kb(to.java:2161)
at com.ibm.db2.jcc.am.uo.wc(uo.java:3657)
at com.ibm.db2.jcc.am.uo.b(uo.java:4454)
at com.ibm.db2.jcc.am.uo.jc(uo.java:760)
at com.ibm.db2.jcc.am.uo.executeQuery(uo.java:725)
at com.jolbox.bonecp.PreparedStatementHandle.executeQuery(PreparedStatementHandle.java:172)
at org.datanucleus.store.rdbms.ParamLoggingPreparedStatement.executeQuery(ParamLoggingPreparedStatement.java:381)
at org.datanucleus.store.rdbms.SQLController.executeStatementQuery(SQLController.java:504)
at org.datanucleus.store.rdbms.query.JDOQLQuery.performExecute(JDOQLQuery.java:637)
at org.datanucleus.store.query.Query.executeQuery(Query.java:1786)
at org.datanucleus.store.query.Query.executeWithArray(Query.java:1672)
at org.datanucleus.api.jdo.JDOQuery.execute(JDOQuery.java:266)
at org.apache.hadoop.hive.metastore.ObjectStore.listMPartitions(ObjectStore.java:1698)
at org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsInternal(ObjectStore.java:1428)
at org.apache.hadoop.hive.metastore.ObjectStore.getPartitions(ObjectStore.java:1402)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
at java.lang.reflect.Method.invoke(Method.java:611)
at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:124)
at com.sun.proxy.$Proxy7.getPartitions(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.dropPartitionsAndGetLocations(HiveMetaStore.java:1286)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_core(HiveMetaStore.java:1189)
at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_with_environment_context(HiveMetaStore.java:1328)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.j
at java.lang.reflect.Method.invoke(Method.java:611)
at org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.
at com.sun.proxy.$Proxy8.drop_table_with_environment_context(Unknown Source)
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropTable(HiveMetaStoreCl
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropTable(HiveMetaStoreCl
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.j
at java.lang.reflect.Method.invoke(Method.java:611)
at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaSt
at com.sun.proxy.$Proxy9.dropTable(Unknown Source)
at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:869)
at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:836)
at org.apache.hadoop.hive.ql.exec.DDLTask.dropTable(DDLTask.java:3329)
at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:277)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:151)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:65)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1437)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1215)
at
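The SQLCODE=-206 (SQLSTATE 42703) failure class is a reference to a column the subquery does not expose. As an illustration only, the same class of error can be reproduced with SQLite standing in for DB2; the table and column names below are illustrative, not the actual DataNucleus-generated SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table parts (part_id int, create_time int)")

# Valid: the outer query references a column the subquery projects.
ok = conn.execute(
    "select subq.part_id from (select part_id from parts) subq").fetchall()

# Invalid, analogous to the SQL DB2 rejects with SQLCODE=-206
# (SUBQ.A0.CREATE_TIME): the outer query references subq.create_time,
# which the subquery does not project.
try:
    conn.execute(
        "select subq.create_time from (select part_id from parts) subq")
    failed = False
except sqlite3.OperationalError:
    failed = True
```

The fix on the Hive/DataNucleus side amounts to ensuring every column the outer query references is projected by the generated subquery.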
[jira] [Commented] (HIVE-6089) Add metrics to HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-6089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861841#comment-13861841 ] Thiruvel Thirumoolan commented on HIVE-6089: [~jaideepdhok] Thanks for the feedback. As this is the first metrics patch, I will add everything that's straightforward, and will add the others in a followup JIRA. Add metrics to HiveServer2 -- Key: HIVE-6089 URL: https://issues.apache.org/jira/browse/HIVE-6089 Project: Hive Issue Type: Improvement Components: Diagnosability, HiveServer2 Affects Versions: 0.12.0 Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Fix For: 0.13.0 Attachments: HIVE-6089_prototype.patch Would like to collect metrics about HiveServer2's usage, such as active connections, total requests, etc. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
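The kind of server-side counters discussed here (active connections, total requests) can be sketched as a minimal thread-safe registry. This is a hypothetical illustration, not the actual HiveServer2 metrics implementation, and all names are made up:

```python
import threading

class MetricsRegistry:
    """Minimal counter/gauge registry, in the spirit of the metrics
    proposed for HiveServer2 (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counters = {}

    def incr(self, name, delta=1):
        with self._lock:
            self._counters[name] = self._counters.get(name, 0) + delta

    def decr(self, name, delta=1):
        self.incr(name, -delta)

    def get(self, name):
        with self._lock:
            return self._counters.get(name, 0)

registry = MetricsRegistry()
# A connection opens: bump both the active gauge and the total counter.
registry.incr("active_connections")
registry.incr("total_requests")
# The connection closes.
registry.decr("active_connections")
```

A real implementation would additionally expose these values over JMX or an HTTP endpoint so they can be scraped for diagnosability.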
[jira] [Commented] (HIVE-6134) Merging small files based on file size only works for CTAS queries
[ https://issues.apache.org/jira/browse/HIVE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861846#comment-13861846 ] Xuefu Zhang commented on HIVE-6134: --- It seems reasonable to me that these flags kick in only for CTAS, or other queries that result in a new table. In other words, the functionality of merging small files for a table should be applied to the table (upon request) rather than taking effect for any query that touches the table. I think what is missing is a new command/query, something like MERGE FILES FOR TABLE table_name. This might be further automated in a scheduled fashion in HiveServer2. Of course, the scope is much larger. Merging small files based on file size only works for CTAS queries -- Key: HIVE-6134 URL: https://issues.apache.org/jira/browse/HIVE-6134 Project: Hive Issue Type: Bug Affects Versions: 0.8.0, 0.10.0, 0.11.0, 0.12.0 Reporter: Eric Chu According to the documentation, if we set hive.merge.mapfiles to true, Hive will launch an additional MR job to merge the small output files at the end of a map-only job when the average output file size is smaller than hive.merge.smallfiles.avgsize. Similarly, by setting hive.merge.mapredfiles to true, Hive will merge the output files of a map-reduce job. My expectation is that this is true for all MR queries. However, my observation is that this is only true for CTAS queries. In GenMRFileSink1.java, HIVEMERGEMAPFILES and HIVEMERGEMAPREDFILES are only used if ((ctx.getMvTask() != null) && (!ctx.getMvTask().isEmpty())). So, for a regular SELECT query that doesn't have move tasks, these properties are not used. Is my understanding correct, and if so, what's the reasoning behind not supporting this for regular SELECT queries? It seems to me that this should be supported for regular SELECT queries as well. 
One scenario where this hits us hard is when users try to download the result in HUE, and HUE times out b/c there are thousands of output files. The workaround is to re-run the query as CTAS, but it's a significant time sink. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
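The trigger condition described in the report (merge when the average output file size falls below hive.merge.smallfiles.avgsize) can be sketched as follows. This is an illustrative model of the decision, not Hive's actual code path in GenMRFileSink1.java:

```python
def should_merge(file_sizes, avgsize_threshold=16 * 1024 * 1024,
                 merge_enabled=True):
    """Return True if Hive-style small-file merging would kick in:
    merging is enabled (hive.merge.mapfiles / hive.merge.mapredfiles)
    and the average output file size is below the configured threshold
    (the default here mirrors hive.merge.smallfiles.avgsize = 16MB).
    Simplified sketch, for illustration only."""
    if not merge_enabled or not file_sizes:
        return False
    avg = sum(file_sizes) / len(file_sizes)
    return avg < avgsize_threshold

# Thousands of tiny 4KB result files: average is far below 16MB,
# so a merge job would be appended.
print(should_merge([4096] * 1000))
# A couple of large files: no merge needed.
print(should_merge([512 * 1024 * 1024, 256 * 1024 * 1024]))
```

The complaint in this issue is that even when this condition holds, the merge job is only appended when a move task exists (i.e., the query materializes a table), so plain SELECT output never gets merged.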
[jira] [Updated] (HIVE-5795) Hive should be able to skip header and footer rows when reading data file for a table
[ https://issues.apache.org/jira/browse/HIVE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5795: Resolution: Fixed Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks for the contribution [~shuainie]! [~leftylev] [~shuainie] Should we create a followup jira for adding documentation to the wiki? Hive should be able to skip header and footer rows when reading data file for a table - Key: HIVE-5795 URL: https://issues.apache.org/jira/browse/HIVE-5795 Project: Hive Issue Type: Bug Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Attachments: HIVE-5795.1.patch, HIVE-5795.2.patch, HIVE-5795.3.patch, HIVE-5795.4.patch, HIVE-5795.5.patch Hive should be able to skip header and footer lines when reading a data file for a table. That way, users don't need to preprocess data generated by other applications with a header or footer, and can use the file directly for table operations. To implement this, the idea is to add new properties to the table description that define the number of header and footer lines, and to skip those lines when reading records from the record reader. A DDL example for creating a table with a header and footer: {code} create external table testtable (name string, message string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/testtable' tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="2"); {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
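The record-reader behavior the two table properties describe can be sketched in a few lines. This is a simplified model (it reads the whole file into memory; Hive's actual reader has to handle splits and buffering), purely for illustration:

```python
import os
import tempfile

def read_data_lines(path, header_count=0, footer_count=0):
    """Yield the data lines of a file, skipping the first header_count
    and last footer_count lines, mirroring the semantics of
    skip.header.line.count and skip.footer.line.count."""
    with open(path) as f:
        lines = f.read().splitlines()
    end = len(lines) - footer_count if footer_count else len(lines)
    return lines[header_count:end]

# Demo: a tab-delimited file with 1 header line and 2 footer lines,
# matching the DDL example above.
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as f:
    f.write("name\tmessage\n")     # header
    f.write("alice\thi\n")         # data
    f.write("bob\tbye\n")          # data
    f.write("generated by X\n")    # footer
    f.write("row count: 2\n")      # footer
    path = f.name

rows = read_data_lines(path, header_count=1, footer_count=2)
os.unlink(path)
```

With header_count=1 and footer_count=2, only the two data rows survive, which is exactly what a SELECT over such a table should return.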
[jira] [Updated] (HIVE-5795) Hive should be able to skip header and footer rows when reading data file for a table
[ https://issues.apache.org/jira/browse/HIVE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5795: Issue Type: New Feature (was: Bug) Hive should be able to skip header and footer rows when reading data file for a table - Key: HIVE-5795 URL: https://issues.apache.org/jira/browse/HIVE-5795 Project: Hive Issue Type: New Feature Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Fix For: 0.13.0 Attachments: HIVE-5795.1.patch, HIVE-5795.2.patch, HIVE-5795.3.patch, HIVE-5795.4.patch, HIVE-5795.5.patch Hive should be able to skip header and footer lines when reading a data file for a table. That way, users don't need to preprocess data generated by other applications with a header or footer, and can use the file directly for table operations. To implement this, the idea is to add new properties to the table description that define the number of header and footer lines, and to skip those lines when reading records from the record reader. A DDL example for creating a table with a header and footer: {code} create external table testtable (name string, message string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/testtable' tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="2"); {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5795) Hive should be able to skip header and footer rows when reading data file for a table
[ https://issues.apache.org/jira/browse/HIVE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5795: Fix Version/s: 0.13.0 Hive should be able to skip header and footer rows when reading data file for a table - Key: HIVE-5795 URL: https://issues.apache.org/jira/browse/HIVE-5795 Project: Hive Issue Type: Bug Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Fix For: 0.13.0 Attachments: HIVE-5795.1.patch, HIVE-5795.2.patch, HIVE-5795.3.patch, HIVE-5795.4.patch, HIVE-5795.5.patch Hive should be able to skip header and footer lines when reading a data file for a table. That way, users don't need to preprocess data generated by other applications with a header or footer, and can use the file directly for table operations. To implement this, the idea is to add new properties to the table description that define the number of header and footer lines, and to skip those lines when reading records from the record reader. A DDL example for creating a table with a header and footer: {code} create external table testtable (name string, message string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/testtable' tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="2"); {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5032) Enable hive creating external table at the root directory of DFS
[ https://issues.apache.org/jira/browse/HIVE-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861858#comment-13861858 ] Hive QA commented on HIVE-5032: --- {color:red}Overall{color}: -1 at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12621218/HIVE-5032.3.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 4873 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_infer_bucket_sort_map_operators {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/797/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/797/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12621218 Enable hive creating external table at the root directory of DFS Key: HIVE-5032 URL: https://issues.apache.org/jira/browse/HIVE-5032 Project: Hive Issue Type: Bug Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Attachments: HIVE-5032.1.patch, HIVE-5032.2.patch, HIVE-5032.3.patch Creating an external table in Hive with a location pointing to the root directory of DFS will fail because HiveFileFormatUtils#doGetPartitionDescFromPath treats the authority of the path as if it were a folder and therefore cannot find a match in the pathToPartitionInfo table when doing a prefix match. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
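The failure mode described, where the scheme/authority prefix defeats the pathToPartitionInfo lookup, can be illustrated with a toy prefix match. The helper below is hypothetical and only sketches the idea of the fix (compare path components only, so the authority such as hdfs://namenode:8020 cannot be mistaken for a folder); it is not Hive's actual code:

```python
from urllib.parse import urlparse

def find_partition_desc(path, path_to_partition_info):
    """Walk up the directory components of `path` looking for a prefix
    registered in path_to_partition_info. Comparing only the path part
    of the URI means a table located at the filesystem root ("/") can
    still be matched. Illustrative sketch only."""
    p = urlparse(path).path or "/"
    while True:
        if p in path_to_partition_info:
            return path_to_partition_info[p]
        if p == "/":
            return None  # no registered prefix matched
        # Drop the last path component and try again.
        p = p.rsplit("/", 1)[0] or "/"

info = {"/": "root-table-desc"}
# A table located at the DFS root is found even though the full URI
# carries an authority component.
desc = find_partition_desc("hdfs://namenode:8020/part-00000", info)
```

If the authority were treated as another folder component, the walk would never reach a registered prefix for a root-located table, which is the bug this issue describes.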
[jira] [Commented] (HIVE-6134) Merging small files based on file size only works for CTAS queries
[ https://issues.apache.org/jira/browse/HIVE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861870#comment-13861870 ] Eric Chu commented on HIVE-6134: Thanks, Xuefu, for the quick response! A few questions/comments:
1. Could you elaborate on why you think it makes sense to merge small files only for queries resulting in a new table? Alternatively, what are the issues with supporting these properties for regular queries? I'd love to have this support for regular queries, unless there's a strong reason against it.
2. If these properties are indeed designed only for queries resulting in a new table, then we should mention that in the documentation. Currently it's misleading: it sounds like they'd work for regular queries as well.
3. The main pain point here is that users won't know that there are many output files until AFTER the query is run. Imagine analysts who don't know these details and for whom HUE is the only query interface. It's frustrating and time-consuming to run a long-running query in HUE, only to find out they can't get the results b/c HUE times out trying to read these many small files, so they have to run the query again as CTAS. Creating a table just so they can download the result seems like overkill.
4. Do you have a suggestion for the aforementioned HUE issue? HUE starts timing out when the query results in thousands of small output files. This is a major pain point for our analysts today.
Merging small files based on file size only works for CTAS queries -- Key: HIVE-6134 URL: https://issues.apache.org/jira/browse/HIVE-6134 Project: Hive Issue Type: Bug Affects Versions: 0.8.0, 0.10.0, 0.11.0, 0.12.0 Reporter: Eric Chu According to the documentation, if we set hive.merge.mapfiles to true, Hive will launch an additional MR job to merge the small output files at the end of a map-only job when the average output file size is smaller than hive.merge.smallfiles.avgsize. Similarly, by setting hive.merge.mapredfiles to true, Hive will merge the output files of a map-reduce job. My expectation is that this is true for all MR queries. However, my observation is that this is only true for CTAS queries. In GenMRFileSink1.java, HIVEMERGEMAPFILES and HIVEMERGEMAPREDFILES are only used if ((ctx.getMvTask() != null) && (!ctx.getMvTask().isEmpty())). So, for a regular SELECT query that doesn't have move tasks, these properties are not used. Is my understanding correct, and if so, what's the reasoning behind not supporting this for regular SELECT queries? It seems to me that this should be supported for regular SELECT queries as well. One scenario where this hits us hard is when users try to download the result in HUE, and HUE times out b/c there are thousands of output files. The workaround is to re-run the query as CTAS, but it's a significant time sink. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6125) Tez: Refactoring changes
[ https://issues.apache.org/jira/browse/HIVE-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6125: - Status: Patch Available (was: Open) Tez: Refactoring changes Key: HIVE-6125 URL: https://issues.apache.org/jira/browse/HIVE-6125 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6125.1.patch, HIVE-6125.2.patch, HIVE-6125.3.patch In order to facilitate the merge back, I've separated out all the changes that don't require Tez. These changes introduce new interfaces, move code, etc., in preparation for the Tez-specific classes. This should also help show which of the changes affect the MR codepath. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6125) Tez: Refactoring changes
[ https://issues.apache.org/jira/browse/HIVE-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6125: - Status: Open (was: Patch Available) Tez: Refactoring changes Key: HIVE-6125 URL: https://issues.apache.org/jira/browse/HIVE-6125 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6125.1.patch, HIVE-6125.2.patch, HIVE-6125.3.patch In order to facilitate the merge back, I've separated out all the changes that don't require Tez. These changes introduce new interfaces, move code, etc., in preparation for the Tez-specific classes. This should also help show which of the changes affect the MR codepath. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6125) Tez: Refactoring changes
[ https://issues.apache.org/jira/browse/HIVE-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6125: - Attachment: HIVE-6125.3.patch .3 is another rebase Tez: Refactoring changes Key: HIVE-6125 URL: https://issues.apache.org/jira/browse/HIVE-6125 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6125.1.patch, HIVE-6125.2.patch, HIVE-6125.3.patch In order to facilitate the merge back, I've separated out all the changes that don't require Tez. These changes introduce new interfaces, move code, etc., in preparation for the Tez-specific classes. This should also help show which of the changes affect the MR codepath. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6134) Merging small files based on file size only works for CTAS queries
[ https://issues.apache.org/jira/browse/HIVE-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861903#comment-13861903 ] Xuefu Zhang commented on HIVE-6134: --- [~ericchu30] I guess my comments above were a little off topic. I thought the problem you mentioned was about too many small files for a table (which my comments above were mostly about), but now I realize that the problem is about a query resulting in too many small files. Thanks for your clarifications. The two problems are different yet seemingly related. I'm wondering if problem #2 (too many small files from a query) is root-caused by problem #1 (too many small files for a table). I cannot imagine a case of that (besides too many mappers or reducers), but I'd appreciate it if you could share your case. If the answer is yes, then the proposal I outlined above may prevent problem #2 from happening. If no, then it may make sense to have both. For information only: HIVE-439, which originally introduced the merge feature, seems to target only small files from mappers, without mentioning whether it covers query results or table files. However, the comments did mention the move task, which may be related to the code you saw. For the HUE issue you mentioned, I'd think that getting rid of the small files one way or the other seems reasonable. Merging small files based on file size only works for CTAS queries -- Key: HIVE-6134 URL: https://issues.apache.org/jira/browse/HIVE-6134 Project: Hive Issue Type: Bug Affects Versions: 0.8.0, 0.10.0, 0.11.0, 0.12.0 Reporter: Eric Chu According to the documentation, if we set hive.merge.mapfiles to true, Hive will launch an additional MR job to merge the small output files at the end of a map-only job when the average output file size is smaller than hive.merge.smallfiles.avgsize. Similarly, by setting hive.merge.mapredfiles to true, Hive will merge the output files of a map-reduce job. My expectation is that this is true for all MR queries. However, my observation is that this is only true for CTAS queries. In GenMRFileSink1.java, HIVEMERGEMAPFILES and HIVEMERGEMAPREDFILES are only used if ((ctx.getMvTask() != null) && (!ctx.getMvTask().isEmpty())). So, for a regular SELECT query that doesn't have move tasks, these properties are not used. Is my understanding correct, and if so, what's the reasoning behind not supporting this for regular SELECT queries? It seems to me that this should be supported for regular SELECT queries as well. One scenario where this hits us hard is when users try to download the result in HUE, and HUE times out b/c there are thousands of output files. The workaround is to re-run the query as CTAS, but it's a significant time sink. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5795) Hive should be able to skip header and footer rows when reading data file for a table
[ https://issues.apache.org/jira/browse/HIVE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaishuai Nie updated HIVE-5795: - Release Note: hive.file.max.footer (default value: 100): maximum number of footer lines a user can set for a table file. skip.header.line.count (default value: 0): number of header lines in the table file. skip.footer.line.count (default value: 0): number of footer lines in the table file. Hive should be able to skip header and footer rows when reading data file for a table - Key: HIVE-5795 URL: https://issues.apache.org/jira/browse/HIVE-5795 Project: Hive Issue Type: New Feature Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Fix For: 0.13.0 Attachments: HIVE-5795.1.patch, HIVE-5795.2.patch, HIVE-5795.3.patch, HIVE-5795.4.patch, HIVE-5795.5.patch Hive should be able to skip header and footer lines when reading a data file for a table. That way, users don't need to preprocess data generated by other applications with a header or footer, and can use the file directly for table operations. To implement this, the idea is to add new properties to the table description that define the number of header and footer lines, and to skip those lines when reading records from the record reader. A DDL example for creating a table with a header and footer: {code} create external table testtable (name string, message string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/testtable' tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="2"); {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5795) Hive should be able to skip header and footer rows when reading data file for a table
[ https://issues.apache.org/jira/browse/HIVE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861911#comment-13861911 ] Shuaishuai Nie commented on HIVE-5795: -- Thanks [~leftylev] [~thejas]. It seems I don't have permission to edit the wiki directly, so I have added the configuration details in the release note. Hive should be able to skip header and footer rows when reading data file for a table - Key: HIVE-5795 URL: https://issues.apache.org/jira/browse/HIVE-5795 Project: Hive Issue Type: New Feature Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Fix For: 0.13.0 Attachments: HIVE-5795.1.patch, HIVE-5795.2.patch, HIVE-5795.3.patch, HIVE-5795.4.patch, HIVE-5795.5.patch Hive should be able to skip header and footer lines when reading a data file for a table. That way, users don't need to preprocess data generated by other applications with a header or footer, and can use the file directly for table operations. To implement this, the idea is to add new properties to the table description that define the number of header and footer lines, and to skip those lines when reading records from the record reader. A DDL example for creating a table with a header and footer: {code} create external table testtable (name string, message string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/testtable' tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="2"); {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Hive-trunk-hadoop2 - Build # 647 - Still Failing
Changes for Build #640
Changes for Build #641
[navis] HIVE-5414 : The result of show grant is not visible via JDBC (Navis reviewed by Thejas M Nair)
[navis] HIVE-4257 : java.sql.SQLNonTransientConnectionException on JDBCStatsAggregator (Teddy Choi via Navis, reviewed by Ashutosh)
Changes for Build #642
Changes for Build #643
[ehans] HIVE-6017: Contribute Decimal128 high-performance decimal(p, s) package from Microsoft to Hive (Hideaki Kumura via Eric Hanson)
Changes for Build #644
[cws] HIVE-5911: Recent change to schema upgrade scripts breaks file naming conventions (Sergey Shelukhin via cws)
[cws] HIVE-3746: Fix HS2 ResultSet Serialization Performance Regression II (Navis via cws)
[cws] HIVE-3746: Fix HS2 ResultSet Serialization Performance Regression (Navis via cws)
[jitendra] HIVE-6010: TestCompareCliDriver enables tests that would ensure vectorization produces same results as non-vectorized execution (Sergey Shelukhin via Jitendra Pandey)
Changes for Build #645
Changes for Build #646
[ehans] HIVE-5757: Implement vectorized support for CASE (Eric Hanson)
Changes for Build #647
[thejas] HIVE-5795 : Hive should be able to skip header and footer rows when reading data file for a table (Shuaishuai Nie via Thejas Nair)
No tests ran. The Apache Jenkins build system has built Hive-trunk-hadoop2 (build #647) Status: Still Failing Check console output at https://builds.apache.org/job/Hive-trunk-hadoop2/647/ to view the results.
Hive-trunk-h0.21 - Build # 2547 - Still Failing
Changes for Build #2539
Changes for Build #2540
[navis] HIVE-5414 : The result of show grant is not visible via JDBC (Navis reviewed by Thejas M Nair)
Changes for Build #2541
Changes for Build #2542
[ehans] HIVE-6017: Contribute Decimal128 high-performance decimal(p, s) package from Microsoft to Hive (Hideaki Kumura via Eric Hanson)
Changes for Build #2543
[cws] HIVE-3746: Fix HS2 ResultSet Serialization Performance Regression II (Navis via cws)
[cws] HIVE-3746: Fix HS2 ResultSet Serialization Performance Regression (Navis via cws)
[jitendra] HIVE-6010: TestCompareCliDriver enables tests that would ensure vectorization produces same results as non-vectorized execution (Sergey Shelukhin via Jitendra Pandey)
Changes for Build #2544
[cws] HIVE-5911: Recent change to schema upgrade scripts breaks file naming conventions (Sergey Shelukhin via cws)
Changes for Build #2545
Changes for Build #2546
[ehans] HIVE-5757: Implement vectorized support for CASE (Eric Hanson)
Changes for Build #2547
[thejas] HIVE-5795 : Hive should be able to skip header and footer rows when reading data file for a table (Shuaishuai Nie via Thejas Nair)
No tests ran. The Apache Jenkins build system has built Hive-trunk-h0.21 (build #2547) Status: Still Failing Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/2547/ to view the results.
[jira] [Updated] (HIVE-5795) Hive should be able to skip header and footer rows when reading data file for a table
[ https://issues.apache.org/jira/browse/HIVE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaishuai Nie updated HIVE-5795: - Release Note: hive.file.max.footer (default value: 100): maximum number of footer lines a user can set for a table file. skip.header.line.count (default value: 0): number of header lines in the table file. skip.footer.line.count (default value: 0): number of footer lines in the table file. skip.footer.line.count and skip.header.line.count should be specified as table properties when creating the table. The following example shows the usage of these two properties: Create external table testtable (name string, message string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/testtable' tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="2"); was: hive.file.max.footer (default value: 100): maximum number of footer lines a user can set for a table file. skip.header.line.count (default value: 0): number of header lines in the table file. skip.footer.line.count (default value: 0): number of footer lines in the table file. skip.footer.line.count and skip.header.line.count should be specified as table properties when creating the table. The following example shows the usage of these two properties: {code} Create external table testtable (name string, message string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/testtable' tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="2"); {code} Hive should be able to skip header and footer rows when reading data file for a table - Key: HIVE-5795 URL: https://issues.apache.org/jira/browse/HIVE-5795 Project: Hive Issue Type: New Feature Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Fix For: 0.13.0 Attachments: HIVE-5795.1.patch, HIVE-5795.2.patch, HIVE-5795.3.patch, HIVE-5795.4.patch, HIVE-5795.5.patch Hive should be able to skip header and footer lines when reading a data file for a table. That way, users don't need to preprocess data generated by other applications with a header or footer, and can use the file directly for table operations. To implement this, the idea is to add new properties to the table description that define the number of header and footer lines, and to skip those lines when reading records from the record reader. A DDL example for creating a table with a header and footer: {code} create external table testtable (name string, message string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/testtable' tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="2"); {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5795) Hive should be able to skip header and footer rows when reading data file for a table
[ https://issues.apache.org/jira/browse/HIVE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuaishuai Nie updated HIVE-5795: - Release Note: hive.file.max.footer (default value: 100): maximum number of footer lines a user can set for a table file. skip.header.line.count (default value: 0): number of header lines in the table file. skip.footer.line.count (default value: 0): number of footer lines in the table file. skip.footer.line.count and skip.header.line.count should be specified as table properties when creating the table. The following example shows the usage of these two properties: {code} Create external table testtable (name string, message string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/testtable' tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="2"); {code} was: hive.file.max.footer (default value: 100): maximum number of footer lines a user can set for a table file. skip.header.line.count (default value: 0): number of header lines in the table file. skip.footer.line.count (default value: 0): number of footer lines in the table file. Hive should be able to skip header and footer rows when reading data file for a table - Key: HIVE-5795 URL: https://issues.apache.org/jira/browse/HIVE-5795 Project: Hive Issue Type: New Feature Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Fix For: 0.13.0 Attachments: HIVE-5795.1.patch, HIVE-5795.2.patch, HIVE-5795.3.patch, HIVE-5795.4.patch, HIVE-5795.5.patch Hive should be able to skip header and footer lines when reading a data file for a table. That way, users don't need to preprocess data generated by other applications with a header or footer, and can use the file directly for table operations. To implement this, the idea is to add new properties to the table description that define the number of header and footer lines, and to skip those lines when reading records from the record reader. A DDL example for creating a table with a header and footer: {code} create external table testtable (name string, message string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/testtable' tblproperties ("skip.header.line.count"="1", "skip.footer.line.count"="2"); {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-4773) webhcat intermittently fail to commit output to file system
[ https://issues.apache.org/jira/browse/HIVE-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861922#comment-13861922 ] Hive QA commented on HIVE-4773: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12621220/HIVE-4773.4.patch {color:green}SUCCESS:{color} +1 4874 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/798/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/798/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12621220 webhcat intermittently fail to commit output to file system --- Key: HIVE-4773 URL: https://issues.apache.org/jira/browse/HIVE-4773 Project: Hive Issue Type: Bug Components: WebHCat Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Attachments: HIVE-4773.1.patch, HIVE-4773.2.patch, HIVE-4773.3.patch, HIVE-4773.4.patch With ASV as a default FS, we saw instances where output is not fully flushed to storage before the Templeton controller process exits. This results in stdout and stderr being empty even though the job completed successfully. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6125) Tez: Refactoring changes
[ https://issues.apache.org/jira/browse/HIVE-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861945#comment-13861945 ] Thejas M Nair commented on HIVE-6125: - LGTM. Just some minor comments in review board, otherwise +1. Tez: Refactoring changes Key: HIVE-6125 URL: https://issues.apache.org/jira/browse/HIVE-6125 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6125.1.patch, HIVE-6125.2.patch, HIVE-6125.3.patch In order to facilitate merge back I've separated out all the changes that don't require Tez. These changes introduce new interfaces, move code etc. In preparation of the Tez specific classes. This should help show what changes have been made that affect the MR codepath as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5795) Hive should be able to skip header and footer rows when reading data file for a table
[ https://issues.apache.org/jira/browse/HIVE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861962#comment-13861962 ] Lefty Leverenz commented on HIVE-5795: -- Got it, thanks [~shuainie]. One doc question: can skip.footer.line.count and skip.header.line.count be changed, or specified for the first time, with ALTER TABLE tbl SET TBLPROPERTIES, and if so, would any problems ensue? (Hm, that's two or three questions. Here's another: can the values vary by partition?) [~thejas], a followup jira isn't needed to get the doc task done, because it's already on my to-do list. I'll post a comment here when the doc is ready for review. TL;DR: This jira has a doc release note, so that covers the record-keeping requirement. The new config parameter and table properties are named here, so search capability is covered. The only question is whether we want all doc tasks to have separate jiras. I don't see any immediate advantage to that policy, although we might want to move in that direction eventually. Hive should be able to skip header and footer rows when reading data file for a table - Key: HIVE-5795 URL: https://issues.apache.org/jira/browse/HIVE-5795 Project: Hive Issue Type: New Feature Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Fix For: 0.13.0 Attachments: HIVE-5795.1.patch, HIVE-5795.2.patch, HIVE-5795.3.patch, HIVE-5795.4.patch, HIVE-5795.5.patch Hive should be able to skip header and footer lines when reading a data file for a table. This way, users don't need to preprocess data generated by other applications with a header or footer and can use the file directly for table operations. To implement this, the idea is to add new properties to the table description that define the number of header and footer lines, and to skip those lines when reading records from the record reader.
A DDL example for creating a table with a header and footer: {code} Create external table testtable (name string, message string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/testtable' tblproperties (skip.header.line.count=1, skip.footer.line.count=2); {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5795) Hive should be able to skip header and footer rows when reading data file for a table
[ https://issues.apache.org/jira/browse/HIVE-5795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861969#comment-13861969 ] Shuaishuai Nie commented on HIVE-5795: -- Hi [~leftylev], this property name cannot be changed, and since it is a table-level property, it applies to all partitions of the table. Users cannot set this property at the partition level. It should also work with an ALTER TABLE statement if it is not set when creating the table. Hive should be able to skip header and footer rows when reading data file for a table - Key: HIVE-5795 URL: https://issues.apache.org/jira/browse/HIVE-5795 Project: Hive Issue Type: New Feature Reporter: Shuaishuai Nie Assignee: Shuaishuai Nie Fix For: 0.13.0 Attachments: HIVE-5795.1.patch, HIVE-5795.2.patch, HIVE-5795.3.patch, HIVE-5795.4.patch, HIVE-5795.5.patch Hive should be able to skip header and footer lines when reading a data file for a table. This way, users don't need to preprocess data generated by other applications with a header or footer and can use the file directly for table operations. To implement this, the idea is to add new properties to the table description that define the number of header and footer lines, and to skip those lines when reading records from the record reader. A DDL example for creating a table with a header and footer: {code} Create external table testtable (name string, message string) row format delimited fields terminated by '\t' lines terminated by '\n' location '/testtable' tblproperties (skip.header.line.count=1, skip.footer.line.count=2); {code} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-4887) hive should have an option to disable non sql commands that impose security risk
[ https://issues.apache.org/jira/browse/HIVE-4887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861993#comment-13861993 ] Jason Dere commented on HIVE-4887: -- Hi [~thejas], [~brocknoland], on the subject of UDFs/JARs, for HIVE-6047 I was proposing that Hive have registered sets of jars which would be referenced by the UDFs, as opposed to URIs. It would still be similar in that privileges could be defined so that users can only create UDFs using jar sets they can access. Let me know if you guys would be ok with that. hive should have an option to disable non sql commands that impose security risk Key: HIVE-4887 URL: https://issues.apache.org/jira/browse/HIVE-4887 Project: Hive Issue Type: Sub-task Components: Authorization, Security Reporter: Thejas M Nair Original Estimate: 72h Remaining Estimate: 72h Hive's RDBMS style of authorization (using grant/revoke) relies on all data access being done through Hive select queries. But Hive also supports running dfs commands, shell commands (e.g. !cat file), and shell commands through Hive streaming. This creates problems in securing a Hive server using this authorization model. UDFs are another way to write custom code that can compromise security, but you can control that by restricting users to access Hive only through a JDBC connection to HiveServer(2). (Note that there are other major problems, such as HIVE-3271.) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
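The kind of restriction the issue proposes — refusing dfs and shell-escape commands when a server-side switch is on — reduces to a simple gate. This is a minimal sketch; the flag name and helper are hypothetical, not Hive's actual configuration:

```python
RESTRICTED = True  # hypothetical stand-in for a server-side config flag

def allow_command(line, restricted=RESTRICTED):
    """Reject non-SQL escape commands (shell via '!', dfs) when the
    restriction flag is set; plain SQL statements pass through."""
    stripped = line.strip()
    if restricted and (stripped.startswith("!") or
                       stripped.lower().startswith("dfs ")):
        return False
    return True

ok = allow_command("select * from t")
blocked = allow_command("!cat /etc/passwd")
```

The gate leaves SQL untouched while closing the command channels the authorization model cannot see.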
[jira] [Commented] (HIVE-2599) Support Composit/Compound Keys with HBaseStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13861998#comment-13861998 ] Hive QA commented on HIVE-2599: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12597571/HIVE-2599.2.patch.txt Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/800/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/800/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Tests exited with: NonZeroExitCodeException Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]] + export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m ' + ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m ' + export 'M2_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + M2_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128' + cd /data/hive-ptest/working/ + tee /data/hive-ptest/logs/PreCommit-HIVE-Build-800/source-prep.txt + [[ false == \t\r\u\e ]] + mkdir -p maven ivy + [[ svn = \s\v\n ]] + [[ -n '' ]] + [[ -d apache-svn-trunk-source ]] + [[ ! -d apache-svn-trunk-source/.svn ]] + [[ ! -d apache-svn-trunk-source ]] + cd apache-svn-trunk-source + svn revert -R . 
Reverted 'ql/src/test/results/clientnegative/exchange_partition_neg_partition_exists2.q.out' Reverted 'ql/src/test/results/clientnegative/exchange_partition_neg_partition_exists.q.out' Reverted 'ql/src/test/results/clientnegative/exchange_partition_neg_partition_exists3.q.out' Reverted 'ql/src/test/results/clientnegative/exchange_partition_neg_incomplete_partition.q.out' Reverted 'ql/src/test/results/clientpositive/exchange_partition3.q.out' Reverted 'ql/src/test/results/clientpositive/exchange_partition.q.out' Reverted 'ql/src/test/results/clientpositive/exchange_partition2.q.out' Reverted 'ql/src/test/queries/clientnegative/exchange_partition_neg_incomplete_partition.q' Reverted 'ql/src/test/queries/clientnegative/exchange_partition_neg_partition_exists2.q' Reverted 'ql/src/test/queries/clientnegative/exchange_partition_neg_partition_exists3.q' Reverted 'ql/src/test/queries/clientnegative/exchange_partition_neg_partition_missing.q' Reverted 'ql/src/test/queries/clientnegative/exchange_partition_neg_partition_exists.q' Reverted 'ql/src/test/queries/clientpositive/exchange_partition.q' Reverted 'ql/src/test/queries/clientpositive/exchange_partition2.q' Reverted 'ql/src/test/queries/clientpositive/exchange_partition3.q' Reverted 'ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java' ++ egrep -v '^X|^Performing status on external' ++ awk '{print $2}' ++ svn status --no-ignore + rm -rf target datanucleus.log ant/target shims/target shims/0.20/target shims/0.20S/target shims/0.23/target shims/aggregator/target shims/common/target shims/common-secure/target packaging/target hbase-handler/target testutils/target jdbc/target metastore/target itests/target itests/hcatalog-unit/target itests/test-serde/target itests/qtest/target itests/hive-unit/target itests/custom-serde/target itests/util/target hcatalog/target hcatalog/storage-handlers/hbase/target hcatalog/server-extensions/target hcatalog/core/target hcatalog/webhcat/svr/target 
hcatalog/webhcat/java-client/target hcatalog/hcatalog-pig-adapter/target hwi/target common/target common/src/gen contrib/target service/target serde/target beeline/target odbc/target cli/target ql/dependency-reduced-pom.xml ql/target + svn update Fetching external item into 'hcatalog/src/test/e2e/harness' External at revision 1555274. At revision 1555274. + patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh + patchFilePath=/data/hive-ptest/working/scratch/build.patch + [[ -f /data/hive-ptest/working/scratch/build.patch ]] + chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh + /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch The patch does not appear to apply with p0, p1, or p2 + exit 1 ' {noformat} This message is automatically generated. ATTACHMENT ID: 12597571 Support Composit/Compound Keys with HBaseStorageHandler --- Key: HIVE-2599 URL: https://issues.apache.org/jira/browse/HIVE-2599 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.8.0 Reporter: Hans Uhlig Assignee: Swarnim Kulkarni Attachments: HIVE-2599.1.patch.txt, HIVE-2599.2.patch.txt, HIVE-2599.2.patch.txt It would be really nice for hive to be able to understand composite keys from an underlying HBase schema. Currently we have to store key fields twice to be able to both key
[jira] [Commented] (HIVE-6129) alter exchange is implemented in inverted manner
[ https://issues.apache.org/jira/browse/HIVE-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13861995#comment-13861995 ] Hive QA commented on HIVE-6129: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12621248/HIVE-6129.1.patch.txt {color:green}SUCCESS:{color} +1 4874 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/799/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/799/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12621248 alter exchange is implemented in inverted manner Key: HIVE-6129 URL: https://issues.apache.org/jira/browse/HIVE-6129 Project: Hive Issue Type: Bug Reporter: Navis Assignee: Navis Priority: Critical Attachments: HIVE-6129.1.patch.txt see https://issues.apache.org/jira/browse/HIVE-4095?focusedCommentId=13819885&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13819885 alter exchange should be implemented according to the document at https://cwiki.apache.org/confluence/display/Hive/Exchange+Partition, i.e. {code} alter table T1 exchange partition (ds='1') with table T2 {code} should (after creating T1@ds=1) {quote} move the data from T2 to T1@ds=1 {quote} -- This message was sent by Atlassian JIRA (v6.1.5#6160)
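The documented direction of the exchange can be modeled with a toy in-memory move — data leaves the source table T2 and lands in the target's ds='1' partition. This is purely illustrative of the intended semantics, not Hive's metastore code:

```python
def exchange_partition(target, source, partition):
    """Model of ALTER TABLE T1 EXCHANGE PARTITION (ds='1') WITH TABLE T2:
    per the wiki, data moves FROM the source table INTO the target
    table's partition (the inverse of what the bug describes)."""
    target[partition] = source.pop("data")
    return target

t1 = {}                           # target table; partition created empty
t2 = {"data": ["row1", "row2"]}   # source table holding the rows
exchange_partition(t1, t2, ("ds", "1"))
```

After the exchange, T1@ds=1 holds the rows and T2 is empty, matching the documented behavior.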
[jira] [Updated] (HIVE-6124) Support basic Decimal arithmetic in vector mode (+, -, *)
[ https://issues.apache.org/jira/browse/HIVE-6124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-6124: -- Attachment: HIVE-6124.02.patch Finished patch, parked here for safe-keeping. Supports col-col, scalar-col, and col-scalar decimal operations. Unit tests included. Currently includes DecimalColumnVector, which needs to be removed before commit (after commit of HIVE-6051). Support basic Decimal arithmetic in vector mode (+, -, *) - Key: HIVE-6124 URL: https://issues.apache.org/jira/browse/HIVE-6124 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.0 Reporter: Eric Hanson Assignee: Eric Hanson Attachments: HIVE-6124.01.patch, HIVE-6124.02.patch Create support for basic decimal arithmetic (+, -, * but not /, %) based on templates for column-scalar, scalar-column, and column-column operations. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
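The three operation shapes the patch covers — col-col, scalar-col, and col-scalar — can be sketched over Python's decimal type. This models only the shapes of the vectorized operations, not Hive's templated Java implementation:

```python
from decimal import Decimal

def col_col_add(a, b):
    """Column-column: element-wise addition of two decimal columns."""
    return [x + y for x, y in zip(a, b)]

def scalar_col_mul(s, col):
    """Scalar-column: multiply every element of a column by a scalar."""
    return [s * x for x in col]

col_a = [Decimal("1.50"), Decimal("2.25")]
col_b = [Decimal("0.50"), Decimal("0.75")]
summed = col_col_add(col_a, col_b)
scaled = scalar_col_mul(Decimal("2"), col_a)
```

Col-scalar is symmetric with scalar-col; the vectorized versions in the patch operate on whole column batches in the same element-wise fashion.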
[jira] [Updated] (HIVE-5923) SQL std auth - parser changes
[ https://issues.apache.org/jira/browse/HIVE-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5923: Resolution: Fixed Fix Version/s: 0.13.0 Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks for the review Brock! (Regarding commit wait period - The original version of the patch was reviewed yesterday, and new version reviewed again today has only minor changes. ) SQL std auth - parser changes - Key: HIVE-5923 URL: https://issues.apache.org/jira/browse/HIVE-5923 Project: Hive Issue Type: Sub-task Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.13.0 Attachments: HIVE-5923.1.patch, HIVE-5923.2.patch, HIVE-5923.3.patch, HIVE-5923.4.patch Original Estimate: 96h Time Spent: 72h Remaining Estimate: 12h There are new access control statements proposed in the functional spec in HIVE-5837 . It also proposes some small changes to the existing query syntax (mostly extensions and some optional keywords). The syntax supported should depend on the current authorization mode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6137) Hive should report that the file/path doesn’t exist when it doesn’t (it now reports SocketTimeoutException)
Hari Sankar Sivarama Subramaniyan created HIVE-6137: --- Summary: Hive should report that the file/path doesn’t exist when it doesn’t (it now reports SocketTimeoutException) Key: HIVE-6137 URL: https://issues.apache.org/jira/browse/HIVE-6137 Project: Hive Issue Type: Bug Reporter: Hari Sankar Sivarama Subramaniyan Assignee: Hari Sankar Sivarama Subramaniyan Hive should report that the file/path doesn’t exist when it doesn’t (it now reports SocketTimeoutException): Execute a Hive DDL query with a reference to a non-existent blob (such as CREATE EXTERNAL TABLE...) and check Hive logs (stderr): FAILED: Error in metadata: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask This error message is not intuitive. If a file doesn't exist, Hive should report FileNotFoundException. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
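The behavior the issue asks for — a path-specific error instead of a transport timeout — amounts to failing fast on a missing location. A minimal sketch of that pre-check follows; the helper name is hypothetical, not a Hive API:

```python
import os

def validate_location(path):
    """Fail fast with a descriptive error when a table location is
    missing, rather than letting a lower layer time out with an
    unrelated transport exception."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"Table location does not exist: {path}")
    return path

try:
    validate_location("/no/such/blob")
except FileNotFoundError as e:
    msg = str(e)
```

The caller gets an actionable message naming the missing path instead of a SocketTimeoutException.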
[jira] [Updated] (HIVE-6125) Tez: Refactoring changes
[ https://issues.apache.org/jira/browse/HIVE-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6125: - Status: Patch Available (was: Open) Tez: Refactoring changes Key: HIVE-6125 URL: https://issues.apache.org/jira/browse/HIVE-6125 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6125.1.patch, HIVE-6125.2.patch, HIVE-6125.3.patch, HIVE-6125.4.patch In order to facilitate merge back I've separated out all the changes that don't require Tez. These changes introduce new interfaces, move code etc. In preparation of the Tez specific classes. This should help show what changes have been made that affect the MR codepath as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6125) Tez: Refactoring changes
[ https://issues.apache.org/jira/browse/HIVE-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6125: - Status: Open (was: Patch Available) Tez: Refactoring changes Key: HIVE-6125 URL: https://issues.apache.org/jira/browse/HIVE-6125 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6125.1.patch, HIVE-6125.2.patch, HIVE-6125.3.patch, HIVE-6125.4.patch In order to facilitate merge back I've separated out all the changes that don't require Tez. These changes introduce new interfaces, move code etc. In preparation of the Tez specific classes. This should help show what changes have been made that affect the MR codepath as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6125) Tez: Refactoring changes
[ https://issues.apache.org/jira/browse/HIVE-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6125: - Attachment: HIVE-6125.4.patch .4 addresses the review comments. Tez: Refactoring changes Key: HIVE-6125 URL: https://issues.apache.org/jira/browse/HIVE-6125 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6125.1.patch, HIVE-6125.2.patch, HIVE-6125.3.patch, HIVE-6125.4.patch In order to facilitate merge back I've separated out all the changes that don't require Tez. These changes introduce new interfaces, move code etc. In preparation of the Tez specific classes. This should help show what changes have been made that affect the MR codepath as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Hive-trunk-hadoop2 - Build # 648 - Still Failing
Changes for Build #640 Changes for Build #641 [navis] HIVE-5414 : The result of show grant is not visible via JDBC (Navis reviewed by Thejas M Nair) [navis] HIVE-4257 : java.sql.SQLNonTransientConnectionException on JDBCStatsAggregator (Teddy Choi via Navis, reviewed by Ashutosh) Changes for Build #642 Changes for Build #643 [ehans] HIVE-6017: Contribute Decimal128 high-performance decimal(p, s) package from Microsoft to Hive (Hideaki Kumura via Eric Hanson) Changes for Build #644 [cws] HIVE-5911: Recent change to schema upgrade scripts breaks file naming conventions (Sergey Shelukhin via cws) [cws] HIVE-3746: Fix HS2 ResultSet Serialization Performance Regression II (Navis via cws) [cws] HIVE-3746: Fix HS2 ResultSet Serialization Performance Regression (Navis via cws) [jitendra] HIVE-6010: TestCompareCliDriver enables tests that would ensure vectorization produces same results as non-vectorized execution (Sergey Shelukhin via Jitendra Pandey) Changes for Build #645 Changes for Build #646 [ehans] HIVE-5757: Implement vectorized support for CASE (Eric Hanson) Changes for Build #647 [thejas] HIVE-5795 : Hive should be able to skip header and footer rows when reading data file for a table (Shuaishuai Nie via Thejas Nair) Changes for Build #648 [thejas] HIVE-5923 : SQL std auth - parser changes (Thejas Nair, reviewed by Brock Noland) No tests ran. The Apache Jenkins build system has built Hive-trunk-hadoop2 (build #648) Status: Still Failing Check console output at https://builds.apache.org/job/Hive-trunk-hadoop2/648/ to view the results.
Re: [DISCUSS] Proposed Changes to the Apache Hive Project Bylaws
One other benefit in rotating chairs is that it exposes more of Hive's PMC members to the board and other Apache old-timers. This is helpful in getting better integrated into Apache and becoming a candidate for Apache membership. It is also an excellent education in the Apache Way for those who serve. Alan. On Dec 31, 2013, at 3:30 PM, Lefty Leverenz leftylever...@gmail.com wrote: Okay, I'm convinced that one-year terms for the chair are reasonable. Thanks for the reassurance, Edward and Thejas. Is the 24h rule needed at all? In other projects, I've seen patches simply reverted by the author (or someone else). It's a rare occurrence, and it should be possible to revert a patch if someone -1s it after commit, esp. within the same 24 hours when not many other changes are in. Sergey makes a good point, but the 24h rule seems helpful in prioritizing tasks. We're all deadline-driven, right? I'm the chief culprit of seeing patch available and ignoring it until it has been committed. Then if I find some minor typo or doc issue, I'm embarrassed at posting a comment after the commit because nobody wants to revert a patch just for documentation. -- Lefty On Sun, Dec 29, 2013 at 12:06 PM, Thejas Nair the...@hortonworks.com wrote: On Sun, Dec 29, 2013 at 12:06 AM, Lefty Leverenz leftylever...@gmail.com wrote: Let's discuss annual rotation of the PMC chair a bit more. Although I agree with the points made in favor, I wonder about frequent loss of expertise and needing to establish new relationships. What's the ramp-up time? The ramp-up time is not significant, as you can see from the list of responsibilities mentioned here - http://www.apache.org/dev/pmc.html#chair . We have enough people in the PMC who have been involved with Apache projects for a long time and are familiar with Apache bylaws and ways of doing things. Also, the former PMC chairs are likely to be around to help as needed. Could a current chair be chosen for another consecutive term? 
Could two chairs alternate years indefinitely? I would take the meaning of rotation to be that we have a new chair for the next term. I think it should be OK to have the same chair in alternate years. 2 years is a long time and it sounds reasonable given the size of the community ! :) Do many other projects have annual rotations? Yes, at least the Hadoop and Pig projects have that. I could not find the bylaws pages easily for other projects. Would it be inconvenient to change chairs in the middle of a release? No. The PMC Chair position does not have any special role in a release. And now to trivialize my comments: while making other changes, let's fix this typo: Membership of the PMC can be revoked by an unanimous vote ... *(should be a unanimous ... just like a university because the rule is based on sound, not spelling)*. I think you should feel free to fix such typos in this wiki without a vote on it ! :) -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: How do you run single query test(s) after mavenization?
The rest of the ant instances are okay because the MVN section afterwards gives the alternative, but should we keep ant or make the replacements? - 9. Now you can run the ant 'thriftif' target ... - 11. ant thriftif -Dthrift.home=... - 15. ant thriftif - 18. ant clean package - The maven equivalent of ant thriftif is: mvn clean install -Pthriftif -DskipTests -Dthrift.home=/usr/local I have not generated the thrift stuff recently. It would be great if Alan or someone else who has would update this section. I can take a look at this. It works with pretty minimal changes. Alan.
[jira] [Commented] (HIVE-5155) Support secure proxy user access to HiveServer2
[ https://issues.apache.org/jira/browse/HIVE-5155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13862038#comment-13862038 ] Shivaraju Gowda commented on HIVE-5155: --- It is important to note that the middleware server does not have access to the Principal's credentials. All it has is a javax.security.auth.Subject (Subject) from the end-user (Principal), and it can do a Subject.doAs() to connect to HiveServer2. In Proposal 2, the middleware server is expected to have access to a Hadoop-level super-user's credentials (by doing kinit), or it has the Subject from a Hadoop-level super-user which has been passed on to it. In the code I have attached above, I am trying to show that any end-user's Subject can be effectively used to connect to HiveServer2 using Subject.doAs() in the middleware server. This will allow multi-user Kerberos access through the middleware server without the additional requirement of proxy access. I might have overlooked or be unaware of some limitations of such an approach, so I am soliciting feedback to check that. Support secure proxy user access to HiveServer2 --- Key: HIVE-5155 URL: https://issues.apache.org/jira/browse/HIVE-5155 Project: Hive Issue Type: Improvement Components: Authentication, HiveServer2, JDBC Affects Versions: 0.12.0 Reporter: Prasad Mujumdar Assignee: Prasad Mujumdar Attachments: HIVE-5155-1-nothrift.patch, HIVE-5155-noThrift.2.patch, HIVE-5155-noThrift.4.patch, HIVE-5155-noThrift.5.patch, HIVE-5155-noThrift.6.patch, HIVE-5155.1.patch, HIVE-5155.2.patch, HIVE-5155.3.patch, ProxyAuth.java, ProxyAuth.out, TestKERBEROS_Hive_JDBC.java HiveServer2 can authenticate a client via Kerberos and impersonate the connecting user with underlying secure Hadoop. This becomes a gateway for a remote client to access a secure Hadoop cluster. Now this works fine when the client obtains a Kerberos ticket and directly connects to HiveServer2. 
There's another big use case for middleware tools where the end user wants to access Hive via another server. For example, an Oozie action, Hue submitting queries, or a BI tool server accessing HiveServer2. In these cases, the third-party server doesn't have the end user's Kerberos credentials and hence can't submit queries to HiveServer2 on behalf of the end user. This ticket is for enabling proxy access to HiveServer2 for third-party tools on behalf of end users. There are two parts to the solution proposed in this ticket: 1) Delegation token based connection for Oozie (OOZIE-1457) This is the common mechanism for Hadoop ecosystem components. The Hive Remote Metastore and HCatalog already support this. It is suitable for a tool like Oozie that submits MR jobs as actions on behalf of its client. Oozie already uses a similar mechanism for Metastore/HCatalog access. 2) Direct proxy access for privileged hadoop users The delegation token implementation can be a challenge for non-Hadoop (especially non-Java) components. This second part enables a privileged user to directly specify an alternate session user during the connection. If the connecting user has Hadoop-level privilege to impersonate the requested userid, then HiveServer2 will run the session as that requested user. For example, user Hue is allowed to impersonate user Bob (via core-site.xml proxy user configuration). Then user Hue can connect to HiveServer2 and specify Bob as the session user via a session property. HiveServer2 will verify Hue's proxy user privilege and then impersonate user Bob instead of Hue. This will enable any third-party tool to impersonate an alternate userid without having to implement a delegation token connection. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
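Part 2 of the proposal — a privileged user requesting an alternate session user — reduces to a proxy-privilege check like the following sketch. The mapping stands in for core-site.xml proxy-user entries; the names and helper are illustrative, not Hive code:

```python
# Illustrative stand-in for core-site.xml hadoop.proxyuser.* configuration:
PROXY_PRIVILEGES = {"hue": {"bob", "alice"}}

def resolve_session_user(connecting_user, requested_user=None):
    """Run the session as requested_user only if connecting_user holds
    proxy privileges for that user; otherwise refuse the connection."""
    if requested_user is None or requested_user == connecting_user:
        return connecting_user
    if requested_user in PROXY_PRIVILEGES.get(connecting_user, set()):
        return requested_user
    raise PermissionError(
        f"{connecting_user} may not impersonate {requested_user}")

session = resolve_session_user("hue", "bob")
```

Here Hue connects and asks for a session as Bob; the check passes because Hue's proxy entry covers Bob, mirroring the HiveServer2 verification described above.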
[jira] [Created] (HIVE-6138) Tez: Add some additional comments to clarify intent
Gunther Hagleitner created HIVE-6138: Summary: Tez: Add some additional comments to clarify intent Key: HIVE-6138 URL: https://issues.apache.org/jira/browse/HIVE-6138 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6138) Tez: Add some additional comments to clarify intent
[ https://issues.apache.org/jira/browse/HIVE-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-6138: - Attachment: HIVE-6138.1.patch Tez: Add some additional comments to clarify intent --- Key: HIVE-6138 URL: https://issues.apache.org/jira/browse/HIVE-6138 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Attachments: HIVE-6138.1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Resolved] (HIVE-6138) Tez: Add some additional comments to clarify intent
[ https://issues.apache.org/jira/browse/HIVE-6138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner resolved HIVE-6138. -- Resolution: Fixed Committed to branch. Tez: Add some additional comments to clarify intent --- Key: HIVE-6138 URL: https://issues.apache.org/jira/browse/HIVE-6138 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Attachments: HIVE-6138.1.patch -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6017) Contribute Decimal128 high-performance decimal(p, s) package from Microsoft to Hive
[ https://issues.apache.org/jira/browse/HIVE-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862044#comment-13862044 ] Lefty Leverenz commented on HIVE-6017: -- Okay, thanks Eric. Contribute Decimal128 high-performance decimal(p, s) package from Microsoft to Hive --- Key: HIVE-6017 URL: https://issues.apache.org/jira/browse/HIVE-6017 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.0 Reporter: Eric Hanson Assignee: Eric Hanson Fix For: 0.13.0 Attachments: HIVE-6017.01.patch, HIVE-6017.02.patch, HIVE-6017.03.patch, HIVE-6017.04.patch Contribute the Decimal128 high-performance decimal package developed by Microsoft to Hive. This was originally written for Microsoft PolyBase by Hideaki Kimura. This code is about 8X more efficient than Java BigDecimal for typical operations. It uses a finite (128 bit) precision and can handle up to decimal(38, X). It is also mutable so you can change the contents of an existing object. This helps reduce the cost of new() and garbage collection. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
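The mutability benefit described — writing results back into an existing object to avoid allocation and garbage collection — can be sketched with a toy mutable holder. This models the idea only; Decimal128 itself is a Java class with 128-bit fixed precision:

```python
from decimal import Decimal, getcontext

getcontext().prec = 38  # mirror Decimal128's decimal(38, x) ceiling

class MutableDecimal:
    """Toy model of a mutable decimal: results are written back into
    the existing object instead of allocating a new one, which is the
    allocation-saving idea behind Decimal128."""
    def __init__(self, value):
        self.value = Decimal(value)

    def add_in_place(self, other):
        self.value += Decimal(other)
        return self

acc = MutableDecimal("0")
for v in ("1.1", "2.2", "3.3"):
    acc.add_in_place(v)  # same object reused across the whole loop
```

One accumulator object services the entire loop, whereas immutable BigDecimal-style arithmetic would allocate a fresh object per addition.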
Hive-trunk-h0.21 - Build # 2548 - Still Failing
Changes for Build #2539 Changes for Build #2540 [navis] HIVE-5414 : The result of show grant is not visible via JDBC (Navis reviewed by Thejas M Nair) Changes for Build #2541 Changes for Build #2542 [ehans] HIVE-6017: Contribute Decimal128 high-performance decimal(p, s) package from Microsoft to Hive (Hideaki Kumura via Eric Hanson) Changes for Build #2543 [cws] HIVE-3746: Fix HS2 ResultSet Serialization Performance Regression II (Navis via cws) [cws] HIVE-3746: Fix HS2 ResultSet Serialization Performance Regression (Navis via cws) [jitendra] HIVE-6010: TestCompareCliDriver enables tests that would ensure vectorization produces same results as non-vectorized execution (Sergey Shelukhin via Jitendra Pandey) Changes for Build #2544 [cws] HIVE-5911: Recent change to schema upgrade scripts breaks file naming conventions (Sergey Shelukhin via cws) Changes for Build #2545 Changes for Build #2546 [ehans] HIVE-5757: Implement vectorized support for CASE (Eric Hanson) Changes for Build #2547 [thejas] HIVE-5795 : Hive should be able to skip header and footer rows when reading data file for a table (Shuaishuai Nie via Thejas Nair) Changes for Build #2548 [thejas] HIVE-5923 : SQL std auth - parser changes (Thejas Nair, reviewed by Brock Noland) No tests ran. The Apache Jenkins build system has built Hive-trunk-h0.21 (build #2548) Status: Still Failing Check console output at https://builds.apache.org/job/Hive-trunk-h0.21/2548/ to view the results.
[jira] [Commented] (HIVE-5923) SQL std auth - parser changes
[ https://issues.apache.org/jira/browse/HIVE-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862053#comment-13862053 ] Lefty Leverenz commented on HIVE-5923: -- Should this be documented now or should we wait for the umbrella jira (HIVE-5837) or release 0.13.0, whichever comes first? SQL std auth - parser changes - Key: HIVE-5923 URL: https://issues.apache.org/jira/browse/HIVE-5923 Project: Hive Issue Type: Sub-task Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.13.0 Attachments: HIVE-5923.1.patch, HIVE-5923.2.patch, HIVE-5923.3.patch, HIVE-5923.4.patch Original Estimate: 96h Time Spent: 72h Remaining Estimate: 12h There are new access control statements proposed in the functional spec in HIVE-5837 . It also proposes some small changes to the existing query syntax (mostly extensions and some optional keywords). The syntax supported should depend on the current authorization mode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-5771) Constant propagation optimizer for Hive
[ https://issues.apache.org/jira/browse/HIVE-5771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862064#comment-13862064 ] Eric Hanson commented on HIVE-5771: --- Where does this patch stand? Ted, are you going to move it forward? Constant propagation optimizer for Hive --- Key: HIVE-5771 URL: https://issues.apache.org/jira/browse/HIVE-5771 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Ted Xu Assignee: Ted Xu Attachments: HIVE-5771.1.patch, HIVE-5771.2.patch, HIVE-5771.3.patch, HIVE-5771.4.patch, HIVE-5771.patch Currently there is no constant folding/propagation optimizer, all expressions are evaluated at runtime. HIVE-2470 did a great job on evaluating constants on UDF initializing phase, however, it is still a runtime evaluation and it doesn't propagate constants from a subquery to outside. It may reduce I/O and accelerate process if we introduce such an optimizer. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6139) Implement vectorized decimal division and modulo
Eric Hanson created HIVE-6139: - Summary: Implement vectorized decimal division and modulo Key: HIVE-6139 URL: https://issues.apache.org/jira/browse/HIVE-6139 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.0 Reporter: Eric Hanson -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-6139) Implement vectorized decimal division and modulo
[ https://issues.apache.org/jira/browse/HIVE-6139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eric Hanson updated HIVE-6139: -- Description: Support column-scalar, scalar-column, and column-column versions for division and modulo. Include unit tests. Implement vectorized decimal division and modulo Key: HIVE-6139 URL: https://issues.apache.org/jira/browse/HIVE-6139 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.0 Reporter: Eric Hanson Support column-scalar, scalar-column, and column-column versions for division and modulo. Include unit tests. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
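The three operand shapes named in the description (column-scalar, scalar-column, column-column) can be sketched as batch loops over primitive arrays. This is a hedged illustration of the pattern only: the real Hive vectorized expressions are template-generated classes with null handling, isRepeating short-cuts, and decimal scale arithmetic, all omitted here, and values are held as plain longs for brevity.

```java
// Hedged sketch of the operand shapes from the description; all names are
// illustrative, not from the Hive codebase.
public final class DecimalDivModSketch {

    // column / scalar: one call processes a whole batch of n rows.
    public static void divideColScalar(long[] col, long scalar, long[] out, int n) {
        for (int i = 0; i < n; i++) out[i] = col[i] / scalar;
    }

    // scalar % column
    public static void moduloScalarCol(long scalar, long[] col, long[] out, int n) {
        for (int i = 0; i < n; i++) out[i] = scalar % col[i];
    }

    // column / column
    public static void divideColCol(long[] a, long[] b, long[] out, int n) {
        for (int i = 0; i < n; i++) out[i] = a[i] / b[i];
    }
}
```

The per-batch method call (instead of one virtual call per row) is the core of what makes the vectorized versions fast.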
checking progress of automated tests for a patch
Is there a way to check the progress of the automated tests after you've uploaded a patch? If so, how? Thanks, Eric
[jira] [Commented] (HIVE-6125) Tez: Refactoring changes
[ https://issues.apache.org/jira/browse/HIVE-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862080#comment-13862080 ] Hive QA commented on HIVE-6125: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12621386/HIVE-6125.3.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 4874 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_bucket_num_reducers {noformat} Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/802/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/802/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12621386 Tez: Refactoring changes Key: HIVE-6125 URL: https://issues.apache.org/jira/browse/HIVE-6125 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6125.1.patch, HIVE-6125.2.patch, HIVE-6125.3.patch, HIVE-6125.4.patch In order to facilitate merge back I've separated out all the changes that don't require Tez. These changes introduce new interfaces, move code etc. In preparation of the Tez specific classes. This should help show what changes have been made that affect the MR codepath as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5904) HiveServer2 JDBC connect to non-default database
[ https://issues.apache.org/jira/browse/HIVE-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5904: Assignee: Matt Tucker HiveServer2 JDBC connect to non-default database Key: HIVE-5904 URL: https://issues.apache.org/jira/browse/HIVE-5904 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.12.0 Reporter: Matt Tucker Assignee: Matt Tucker Attachments: HIVE-5904.patch When connecting to HiveServer2 via the following URLs, the session uses the 'default' database, instead of the intended database. jdbc://localhost:1/customDb jdbc:///customDb -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5904) HiveServer2 JDBC connect to non-default database
[ https://issues.apache.org/jira/browse/HIVE-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5904: Resolution: Duplicate Status: Resolved (was: Patch Available) The fix for this got committed through HIVE-4256 . [~matucker], Sorry, didn't notice that you had a patch for same issue here. I have added you as a contributor on jira so that you can assign yourself jiras. HiveServer2 JDBC connect to non-default database Key: HIVE-5904 URL: https://issues.apache.org/jira/browse/HIVE-5904 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.12.0 Reporter: Matt Tucker Assignee: Matt Tucker Attachments: HIVE-5904.patch When connecting to HiveServer2 via the following URLs, the session uses the 'default' database, instead of the intended database. jdbc://localhost:1/customDb jdbc:///customDb -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (HIVE-5904) HiveServer2 JDBC connect to non-default database
[ https://issues.apache.org/jira/browse/HIVE-5904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862098#comment-13862098 ] Thejas M Nair edited comment on HIVE-5904 at 1/4/14 12:58 AM: -- The fix for this got committed through HIVE-4256 . [~matucker], Sorry, didn't notice that you had a patch for same issue here. I have added you as a contributor on jira so that you can assign jiras to yourself. was (Author: thejas): The fix for this got committed through HIVE-4256 . [~matucker], Sorry, didn't notice that you had a patch for same issue here. I have added you as a contributor on jira so that you can assign yourself jiras. HiveServer2 JDBC connect to non-default database Key: HIVE-5904 URL: https://issues.apache.org/jira/browse/HIVE-5904 Project: Hive Issue Type: Bug Components: HiveServer2, JDBC Affects Versions: 0.12.0 Reporter: Matt Tucker Assignee: Matt Tucker Attachments: HIVE-5904.patch When connecting to HiveServer2 via the following URLs, the session uses the 'default' database, instead of the intended database. jdbc://localhost:1/customDb jdbc:///customDb -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (HIVE-6140) trim udf is very slow
Thejas M Nair created HIVE-6140: --- Summary: trim udf is very slow Key: HIVE-6140 URL: https://issues.apache.org/jira/browse/HIVE-6140 Project: Hive Issue Type: Bug Components: UDF Reporter: Thejas M Nair Paraphrasing what was reported by [~cartershanklin] - I used the attached Perl script to generate 500 million two-character strings which always included a space. I loaded it using: create table letters (l string); load data local inpath '/home/sandbox/data.csv' overwrite into table letters; Then I ran this SQL script: select count(l) from letters where l = 'l '; select count(l) from letters where trim(l) = 'l'; First query = 170 seconds Second query = 514 seconds -- This message was sent by Atlassian JIRA (v6.1.5#6160)
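The Perl generator itself is attached to the issue and not shown above; the following is a hypothetical Java equivalent (file name, seed, and letter distribution are all assumptions) that produces the same kind of two-character rows with a trailing space for the letters table.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Random;

// Hypothetical stand-in for the attached Perl script: emits two-character
// rows (a lowercase letter plus a space), e.g. "l ", one per line.
public class GenLetters {

    // One row: random letter followed by a single space.
    static String row(Random r) {
        return (char) ('a' + r.nextInt(26)) + " ";
    }

    public static void main(String[] args) throws IOException {
        // Row count as the first argument; the report used 500 million rows.
        long rows = args.length > 0 ? Long.parseLong(args[0]) : 1000L;
        Random r = new Random(42);
        try (PrintWriter w = new PrintWriter(new BufferedWriter(new FileWriter("data.csv")))) {
            for (long i = 0; i < rows; i++) {
                w.println(row(r));
            }
        }
    }
}
```

With data like this, every row matches `l = 'l '` only by exact comparison, while `trim(l) = 'l'` forces a per-row UDF call and string allocation, which is consistent with the reported slowdown.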
Re: How do you run single query test(s) after mavenization?
Ok, I’ve updated it to just have the maven instructions, since I’m assuming no one cares about the ant ones anymore. Alan. On Jan 3, 2014, at 3:46 PM, Alan Gates ga...@hortonworks.com wrote: The rest of the ant instances are okay because the MVN section afterwards gives the alternative, but should we keep ant or make the replacements? - 9. Now you can run the ant 'thriftif' target ... - 11. ant thriftif -Dthrift.home=... - 15. ant thriftif - 18. ant clean package - The maven equivalent of ant thriftif is: mvn clean install -Pthriftif -DskipTests -Dthrift.home=/usr/local I have not generated the thrift stuff recently. It would be great if Alan or someone else who has would update this section. I can take a look at this. It works with pretty minimal changes. Alan.
[jira] [Assigned] (HIVE-6140) trim udf is very slow
[ https://issues.apache.org/jira/browse/HIVE-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anandha L Ranganathan reassigned HIVE-6140: --- Assignee: Anandha L Ranganathan trim udf is very slow - Key: HIVE-6140 URL: https://issues.apache.org/jira/browse/HIVE-6140 Project: Hive Issue Type: Bug Components: UDF Reporter: Thejas M Nair Assignee: Anandha L Ranganathan Paraphrasing what was reported by [~cartershanklin] - I used the attached Perl script to generate 500 million two-character strings which always included a space. I loaded it using: create table letters (l string); load data local inpath '/home/sandbox/data.csv' overwrite into table letters; Then I ran this SQL script: select count(l) from letters where l = 'l '; select count(l) from letters where trim(l) = 'l'; First query = 170 seconds Second query = 514 seconds -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5923) SQL std auth - parser changes
[ https://issues.apache.org/jira/browse/HIVE-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5923: Release Note: Grant privilege and revoke privilege statements need to be changed to remove the requirement (but not the option) for the noise word TABLE. In the SQL specification, table is the assumed default for grant and revoke statements. Today Hive’s syntax is GRANT action ON TABLE table TO grantee. It should be GRANT action ON [TABLE] table TO grantee. Grant role and revoke role statements have been changed to remove the need for the keyword ROLE. Support for WITH ADMIN OPTION needs to be added to grant role and revoke role statement syntax. SQL std auth - parser changes - Key: HIVE-5923 URL: https://issues.apache.org/jira/browse/HIVE-5923 Project: Hive Issue Type: Sub-task Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.13.0 Attachments: HIVE-5923.1.patch, HIVE-5923.2.patch, HIVE-5923.3.patch, HIVE-5923.4.patch Original Estimate: 96h Time Spent: 72h Remaining Estimate: 12h There are new access control statements proposed in the functional spec in HIVE-5837 . It also proposes some small changes to the existing query syntax (mostly extensions and some optional keywords). The syntax supported should depend on the current authorization mode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (HIVE-5923) SQL std auth - parser changes
[ https://issues.apache.org/jira/browse/HIVE-5923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-5923: Release Note: Grant privilege and revoke privilege statements no longer require (but still allow) the noise word TABLE. TABLE is the assumed default for grant and revoke statements. Hive’s syntax changes from GRANT action ON TABLE table TO grantee to GRANT action ON [TABLE] table TO grantee. Grant role and revoke role statements have been changed to remove the need for the keyword ROLE. Support for WITH ADMIN OPTION has been added to grant role and revoke role statement syntax. was: Grant privilege and revoke privilege statements need to be changed to remove the requirement (but not the option) for the noise word TABLE. In the SQL specification table is the assumed default for grant and revoke statements. Today Hive’s syntax is GRANT action ON TABLE table TO grantee. It should be GRANT action ON [TABLE] table TO grantee. Grant role and revoke role statements has been changed to remove the need for keyword ROLE. Support for WITH ADMIN OPTION needs to be added to grant role and revoke role statement syntax. SQL std auth - parser changes - Key: HIVE-5923 URL: https://issues.apache.org/jira/browse/HIVE-5923 Project: Hive Issue Type: Sub-task Components: Authorization Reporter: Thejas M Nair Assignee: Thejas M Nair Fix For: 0.13.0 Attachments: HIVE-5923.1.patch, HIVE-5923.2.patch, HIVE-5923.3.patch, HIVE-5923.4.patch Original Estimate: 96h Time Spent: 72h Remaining Estimate: 12h There are new access control statements proposed in the functional spec in HIVE-5837 . It also proposes some small changes to the existing query syntax (mostly extensions and some optional keywords). The syntax supported should depend on the current authorization mode. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
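Based on the release note above, the before/after syntax looks roughly like this (the table, user, and role names are made up for illustration; the exact grammar is defined by the HIVE-5923 parser patch):

```sql
-- Before the change: the TABLE keyword was required.
GRANT SELECT ON TABLE sales TO USER alice;

-- After the change: TABLE is optional and remains the assumed default.
GRANT SELECT ON sales TO USER alice;
REVOKE SELECT ON sales FROM USER alice;

-- Role statements without the ROLE noise word, plus the new WITH ADMIN OPTION.
GRANT analyst TO USER alice WITH ADMIN OPTION;
```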
[jira] [Commented] (HIVE-6140) trim udf is very slow
[ https://issues.apache.org/jira/browse/HIVE-6140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862177#comment-13862177 ] Anandha L Ranganathan commented on HIVE-6140: - [~thejas]/[~cartershanklin] Could you provide data.csv file that caused the problem. Otherwise provide example of the data. trim udf is very slow - Key: HIVE-6140 URL: https://issues.apache.org/jira/browse/HIVE-6140 Project: Hive Issue Type: Bug Components: UDF Reporter: Thejas M Nair Assignee: Anandha L Ranganathan Paraphrasing what was reported by [~cartershanklin] - I used the attached Perl script to generate 500 million two-character strings which always included a space. I loaded it using: create table letters (l string); load data local inpath '/home/sandbox/data.csv' overwrite into table letters; Then I ran this SQL script: select count(l) from letters where l = 'l '; select count(l) from letters where trim(l) = 'l'; First query = 170 seconds Second query = 514 seconds -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-6051) Create DecimalColumnVector and a representative VectorExpression for decimal
[ https://issues.apache.org/jira/browse/HIVE-6051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862191#comment-13862191 ] Hive QA commented on HIVE-6051: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12621359/HIVE-6051.02.patch {color:green}SUCCESS:{color} +1 4876 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/803/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/803/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12621359 Create DecimalColumnVector and a representative VectorExpression for decimal Key: HIVE-6051 URL: https://issues.apache.org/jira/browse/HIVE-6051 Project: Hive Issue Type: Sub-task Affects Versions: 0.13.0 Reporter: Eric Hanson Assignee: Eric Hanson Fix For: 0.13.0 Attachments: HIVE-6051.01.patch, HIVE-6051.02.patch Create a DecimalColumnVector to use as a basis for vectorized decimal operations. Include a representative VectorExpression on decimal (e.g. column-column addition) to demonstrate its use. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
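A simplified sketch of what such a structure and representative expression could look like. This is not the patch's actual DecimalColumnVector, which carries null flags, an isRepeating optimization, and Hive's own decimal type; the stand-in below uses BigDecimal and keeps only the batch-of-values layout plus a column-column add.

```java
import java.math.BigDecimal;

// Simplified, hypothetical sketch of a decimal column vector: one column's
// batch of values, processed a whole batch at a time.
public class DecimalColumnVectorSketch {
    final BigDecimal[] vector;

    DecimalColumnVectorSketch(int batchSize) {
        vector = new BigDecimal[batchSize];
    }

    // Representative vectorized expression: out[i] = a[i] + b[i] for n rows,
    // one method call per batch instead of one per row.
    static void addColCol(DecimalColumnVectorSketch a, DecimalColumnVectorSketch b,
                          DecimalColumnVectorSketch out, int n) {
        for (int i = 0; i < n; i++) {
            out.vector[i] = a.vector[i].add(b.vector[i]);
        }
    }
}
```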
Re: checking progress of automated tests for a patch
Yes, you can, it's here: http://bigtop01.cloudera.org:8080/view/Hive/job/PreCommit-HIVE-Build/ Unfortunately the version of jenkins they are using doesn't support putting the JIRA number in the build description so it's kind of opaque. But if you view the full console for a build ( http://bigtop01.cloudera.org:8080/view/Hive/job/PreCommit-HIVE-Build/804/consoleFull) at the top you'll see something like: ISSUE_NUM=6125 which means that build is for HIVE-6125. All the logs for that build (#804) can be seen here: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/ On Fri, Jan 3, 2014 at 6:25 PM, Eric Hanson (BIG DATA) eric.n.han...@microsoft.com wrote: Is there a way to check the progress of the automated tests after you've uploaded a patch? If so, how? Thanks, Eric -- Apache MRUnit - Unit testing MapReduce - http://mrunit.apache.org
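The lookup described above can be scripted. The sketch below works on a saved copy of the console output (the sample log lines are made up) rather than fetching the real consoleFull URL.

```shell
# Stand-in for a fetched consoleFull page; in practice you would save it with
# something like: curl -s <consoleFull URL> > console.log
cat > console.log <<'EOF'
Building remotely on hive-ptest-slave
ISSUE_NUM=6125
Executing org.apache.hive.ptest.execution.PrepPhase
EOF

# Extract the JIRA number the build is testing (here 6125, i.e. HIVE-6125).
grep -o 'ISSUE_NUM=[0-9]*' console.log | cut -d= -f2
```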
[jira] [Commented] (HIVE-6125) Tez: Refactoring changes
[ https://issues.apache.org/jira/browse/HIVE-6125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862212#comment-13862212 ] Hive QA commented on HIVE-6125: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12621408/HIVE-6125.4.patch {color:green}SUCCESS:{color} +1 4875 tests passed Test results: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/804/testReport Console output: http://bigtop01.cloudera.org:8080/job/PreCommit-HIVE-Build/804/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12621408 Tez: Refactoring changes Key: HIVE-6125 URL: https://issues.apache.org/jira/browse/HIVE-6125 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Attachments: HIVE-6125.1.patch, HIVE-6125.2.patch, HIVE-6125.3.patch, HIVE-6125.4.patch In order to facilitate merge back I've separated out all the changes that don't require Tez. These changes introduce new interfaces, move code etc. In preparation of the Tez specific classes. This should help show what changes have been made that affect the MR codepath as well. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HIVE-2599) Support Composit/Compound Keys with HBaseStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13862231#comment-13862231 ] Swarnim Kulkarni commented on HIVE-2599: Attached is the latest patch rebased with the master. The patch should apply cleanly now. [~brocknoland] On your question about inserts, I think I might have misunderstood you a little bit. I ran the following queries to test inserts on composite keys and was able to do it successfully. {noformat} CREATE EXTERNAL TABLE test_table_1(key struct<personId:string,value:string>, data string) ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:value") TBLPROPERTIES ("hbase.table.name" = "hbase_test_table_1", "hbase.composite.key.class" = "com.test.hive.TestHBaseCompositeKey") select * from test_table_1; {"personid":"person1","value":"value1"} 1385435417948 {"personid":"person2","value":"value2"} 1386691798261 {"personid":"person3","value":"value3"} 1387481795304 {"personid":"person4","value":"value4"} 1386705359123 {"personid":"person5","value":"value5"} 1386972894836 ..
CREATE TABLE test_table_2(key struct<personId:string,value:string>, value string) ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:val") TBLPROPERTIES ("hbase.table.name" = "hbase_test_table_2"); INSERT OVERWRITE TABLE test_table_2 select key,data from test_table_1; 14/01/04 00:32:33 INFO ql.Driver: Launching Job 1 out of 1 14/01/04 00:32:58 INFO exec.Task: 2014-01-04 00:32:58,720 Stage-0 map = 0%, reduce = 0% 2014-01-04 00:33:29,930 Stage-0 map = 100%, reduce = 100%, Cumulative CPU 5.48 sec select * from test_table_2; {"personid":"person1","value":"value1"} 1385435417948 {"personid":"person2","value":"value2"} 1386691798261 {"personid":"person3","value":"value3"} 1387481795304 {"personid":"person4","value":"value4"} 1386705359123 {"personid":"person5","value":"value5"} 1386972894836 .. {noformat} If this is what you meant, then yes, the patch will handle both selects and inserts. If not, please let me know and I will log a new bug and tackle it accordingly. Support Composit/Compound Keys with HBaseStorageHandler --- Key: HIVE-2599 URL: https://issues.apache.org/jira/browse/HIVE-2599 Project: Hive Issue Type: Improvement Components: HBase Handler Affects Versions: 0.8.0 Reporter: Hans Uhlig Assignee: Swarnim Kulkarni Attachments: HIVE-2599.1.patch.txt, HIVE-2599.2.patch.txt, HIVE-2599.2.patch.txt It would be really nice for Hive to be able to understand composite keys from an underlying HBase schema. Currently we have to store key fields twice to be able to both key and make data available. I noticed John Sichi mentioned in HIVE-1228 that this would be a separate issue but I can't find any follow-up. How feasible is this in the HBaseStorageHandler? -- This message was sent by Atlassian JIRA (v6.1.5#6160)