[jira] [Commented] (HIVE-7341) Support for Table replication across HCatalog instances
[ https://issues.apache.org/jira/browse/HIVE-7341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101912#comment-14101912 ] Lefty Leverenz commented on HIVE-7341: -- Thanks for the doc note, [~sushanth]. When you say "should mostly be covered by javadocs and the bug report", that leaves a little wiggle room for user docs, although I don't see a good place for this in the HCatalog wikidocs. Would this only be done by external systems such as Falcon, or could it also be done directly by a Hive/HCat administrator? Support for Table replication across HCatalog instances --- Key: HIVE-7341 URL: https://issues.apache.org/jira/browse/HIVE-7341 Project: Hive Issue Type: New Feature Components: HCatalog Affects Versions: 0.13.1 Reporter: Mithun Radhakrishnan Assignee: Mithun Radhakrishnan Fix For: 0.14.0 Attachments: HIVE-7341.1.patch, HIVE-7341.2.patch, HIVE-7341.3.patch, HIVE-7341.4.patch, HIVE-7341.5.patch The HCatClient currently doesn't provide much support for replicating HCatTable definitions between two HCatalog server (i.e. Hive metastore) instances. Systems similar to Apache Falcon might need to replicate partition data between two clusters and keep the HCatalog metadata in sync between the two. This poses a couple of problems: # The definition of the source table might change (in column schema, I/O formats, record formats, serde parameters, etc.). The system will need a way to diff two tables and update the target metastore with the changes. E.g. {code} targetTable.resolve( sourceTable, targetTable.diff(sourceTable) ); hcatClient.updateTableSchema(dbName, tableName, targetTable); {code} # The current {{HCatClient.addPartitions()}} API requires that the partition's schema be derived from the table's schema, thereby requiring that the table schema be resolved *before* partitions with the new schema are added to the table.
This is problematic, because it introduces race conditions when two partitions with differing column schemas (e.g. right after a schema change) are copied in parallel. This can be avoided if each HCatAddPartitionDesc tracked the partition's schema in flight. # The source and target metastores might be running different/incompatible versions of Hive. The impending patch attempts to address these concerns (with some caveats): # {{HCatTable}} now has ## a {{diff()}} method, to compare against another HCatTable instance ## a {{resolve(diff)}} method, to copy specified table attributes over from another HCatTable ## a serialize/deserialize mechanism (via {{HCatClient.serializeTable()}} and {{HCatClient.deserializeTable()}}), so that HCatTable instances constructed in other class-loaders may be used for comparison # {{HCatPartition}} now provides finer-grained control over a partition's column schema, StorageDescriptor settings, etc. This allows partitions to be copied completely from the source, with the ability to override specific properties if required (e.g. location). # {{HCatClient.updateTableSchema()}} can now update the entire table definition, not just the column schema. # I've cleaned up and removed most of the redundancy between HCatTable, HCatCreateTableDesc and HCatCreateTableDesc.Builder. The prior API failed to separate the table attributes from the add-table operation's attributes. By providing fluent interfaces in HCatTable and composing an HCatTable instance in HCatCreateTableDesc, the interfaces are cleaner(ish). The old setters are deprecated, in favour of those in HCatTable. Likewise for HCatPartition and HCatAddPartitionDesc. I'll post a patch for trunk shortly. -- This message was sent by Atlassian JIRA (v6.2#6252)
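The diff/resolve flow above can be sketched with plain Java maps standing in for HCatTable attributes. This is a toy model for illustration only, not the actual HCatalog API; the method names mirror the {{diff()}}/{{resolve()}} pattern the issue describes, and the "keep location cluster-local" override is the example the issue itself gives.

```java
import java.util.*;

public class TableSyncSketch {
    // Return the keys whose values differ between source and target.
    static Set<String> diff(Map<String, String> source, Map<String, String> target) {
        Set<String> changed = new HashSet<>();
        for (Map.Entry<String, String> e : source.entrySet()) {
            if (!Objects.equals(e.getValue(), target.get(e.getKey()))) {
                changed.add(e.getKey());
            }
        }
        return changed;
    }

    // Copy differing attributes from source onto target, except those the
    // caller wants to keep cluster-local (e.g. "location").
    static Map<String, String> resolve(Map<String, String> source,
                                       Map<String, String> target,
                                       Set<String> keepLocal) {
        Map<String, String> resolved = new HashMap<>(target);
        for (String key : diff(source, target)) {
            if (!keepLocal.contains(key)) {
                resolved.put(key, source.get(key));
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, String> source = new HashMap<>();
        source.put("serde", "LazySimpleSerDe");
        source.put("location", "hdfs://source/warehouse/t");

        Map<String, String> target = new HashMap<>();
        target.put("serde", "OldSerDe");
        target.put("location", "hdfs://target/warehouse/t");

        Map<String, String> resolved =
            resolve(source, target, Collections.singleton("location"));
        System.out.println(resolved); // serde synced from source, location kept local
    }
}
```

In the real API the target table would then be pushed back with updateTableSchema(), as in the {code} snippet above.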
[jira] [Created] (HIVE-7777) add CSV support for Serde
Ferdinand Xu created HIVE-7777: -- Summary: add CSV support for Serde Key: HIVE-7777 URL: https://issues.apache.org/jira/browse/HIVE-7777 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Reporter: Ferdinand Xu Assignee: Ferdinand Xu There is no official CSV SerDe support in Hive, although there is an open-source project on GitHub (https://github.com/ogrodnek/csv-serde). CSV is a very frequently used data format. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7778) hive deal with sql which has whitespace character
peter zhao created HIVE-7778: Summary: hive deal with sql which has whitespace character Key: HIVE-7778 URL: https://issues.apache.org/jira/browse/HIVE-7778 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.1 Reporter: peter zhao Priority: Minor I run the SQL set hive.exec.dynamic.partition.mode=nonstrict through iBATIS. Because iBATIS uses an XML file to hold the SQL string, the statement keeps the XML formatting, so the Hive server receives the SQL as \t set hive.exec.dynamic.partition.mode=nonstrict. The org.apache.hive.service.cli.operation.HiveCommandOperation.run() method does not handle the leading \t well: the generated variable key becomes set hive.exec.dynamic.partition.mode, while the right key should be hive.exec.dynamic.partition.mode, so my next SELECT by partition throws a strict-mode exception. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7778) hive deal with sql which has whitespace character
[ https://issues.apache.org/jira/browse/HIVE-7778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] peter zhao updated HIVE-7778: - Description: I run the SQL set hive.exec.dynamic.partition.mode=nonstrict through iBATIS. Because iBATIS uses an XML file to hold the SQL string, the statement keeps the XML formatting, so the Hive server receives the SQL as \t set hive.exec.dynamic.partition.mode=nonstrict. The org.apache.hive.service.cli.operation.HiveCommandOperation.run() method does not handle the leading \t (or any other whitespace characters) well: the generated variable key becomes set hive.exec.dynamic.partition.mode, while the right key should be hive.exec.dynamic.partition.mode, so my next SELECT by partition throws a strict-mode exception. String command = getStatement().trim(); String[] tokens = statement.split("\\s"); // this should be command.split("\\s") String commandArgs = command.substring(tokens[0].length()).trim(); was: I run the SQL set hive.exec.dynamic.partition.mode=nonstrict through iBATIS. Because iBATIS uses an XML file to hold the SQL string, the statement keeps the XML formatting, so the Hive server receives the SQL as \t set hive.exec.dynamic.partition.mode=nonstrict. The org.apache.hive.service.cli.operation.HiveCommandOperation.run() method does not handle the leading \t well: the generated variable key becomes set hive.exec.dynamic.partition.mode, while the right key should be hive.exec.dynamic.partition.mode, so my next SELECT by partition throws a strict-mode exception.
hive deal with sql which has whitespace character - Key: HIVE-7778 URL: https://issues.apache.org/jira/browse/HIVE-7778 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.1 Reporter: peter zhao Priority: Minor I run the SQL set hive.exec.dynamic.partition.mode=nonstrict through iBATIS. Because iBATIS uses an XML file to hold the SQL string, the statement keeps the XML formatting, so the Hive server receives the SQL as \t set hive.exec.dynamic.partition.mode=nonstrict. The org.apache.hive.service.cli.operation.HiveCommandOperation.run() method does not handle the leading \t (or any other whitespace characters) well: the generated variable key becomes set hive.exec.dynamic.partition.mode, while the right key should be hive.exec.dynamic.partition.mode, so my next SELECT by partition throws a strict-mode exception. String command = getStatement().trim(); String[] tokens = statement.split("\\s"); // this should be command.split("\\s") String commandArgs = command.substring(tokens[0].length()).trim(); -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7778) hive deal with sql which has whitespace character
[ https://issues.apache.org/jira/browse/HIVE-7778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] peter zhao updated HIVE-7778: - Description: I run the SQL set hive.exec.dynamic.partition.mode=nonstrict through iBATIS. Because iBATIS uses an XML file to hold the SQL string, the statement keeps the XML formatting, so the Hive server receives the SQL as \t set hive.exec.dynamic.partition.mode=nonstrict. The org.apache.hive.service.cli.operation.HiveCommandOperation.run() method does not handle the leading \t (or any other whitespace characters) well: the generated variable key becomes set hive.exec.dynamic.partition.mode, while the right key should be hive.exec.dynamic.partition.mode, so my next SELECT by partition throws a strict-mode exception. String command = getStatement().trim(); String[] tokens = statement.split("\\s"); // this should be command.split("\\s") String commandArgs = command.substring(tokens[0].length()).trim(); was: I run the SQL set hive.exec.dynamic.partition.mode=nonstrict through iBATIS. Because iBATIS uses an XML file to hold the SQL string, the statement keeps the XML formatting, so the Hive server receives the SQL as \t set hive.exec.dynamic.partition.mode=nonstrict. The org.apache.hive.service.cli.operation.HiveCommandOperation.run() method does not handle the leading \t (or any other whitespace characters) well: the generated variable key becomes set hive.exec.dynamic.partition.mode, while the right key should be hive.exec.dynamic.partition.mode, so my next SELECT by partition throws a strict-mode exception.
String command = getStatement().trim(); String[] tokens = statement.split("\\s"); // this should be command.split("\\s") String commandArgs = command.substring(tokens[0].length()).trim(); hive deal with sql which has whitespace character - Key: HIVE-7778 URL: https://issues.apache.org/jira/browse/HIVE-7778 Project: Hive Issue Type: Bug Components: CLI Affects Versions: 0.13.1 Reporter: peter zhao Priority: Minor I run the SQL set hive.exec.dynamic.partition.mode=nonstrict through iBATIS. Because iBATIS uses an XML file to hold the SQL string, the statement keeps the XML formatting, so the Hive server receives the SQL as \t set hive.exec.dynamic.partition.mode=nonstrict. The org.apache.hive.service.cli.operation.HiveCommandOperation.run() method does not handle the leading \t (or any other whitespace characters) well: the generated variable key becomes set hive.exec.dynamic.partition.mode, while the right key should be hive.exec.dynamic.partition.mode, so my next SELECT by partition throws a strict-mode exception. String command = getStatement().trim(); String[] tokens = statement.split("\\s"); // this should be command.split("\\s") String commandArgs = command.substring(tokens[0].length()).trim(); -- This message was sent by Atlassian JIRA (v6.2#6252)
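The bug the reporter quotes comes down to tokenizing the untrimmed statement: with a leading tab, tokens[0] is an empty string, so the command word is never stripped from the argument string. A self-contained sketch of the buggy versus suggested extraction (plain Java; the method names here are illustrative, not Hive's):

```java
public class WhitespaceBugDemo {
    // Buggy extraction: tokens come from the raw statement, so a leading
    // tab yields an empty tokens[0] and substring(0) strips nothing.
    static String buggyArgs(String statement) {
        String command = statement.trim();
        String[] tokens = statement.split("\\s");
        return command.substring(tokens[0].length()).trim();
    }

    // Suggested fix: tokenize the trimmed command instead, so tokens[0]
    // is the command word ("set") and gets stripped off the front.
    static String fixedArgs(String statement) {
        String command = statement.trim();
        String[] tokens = command.split("\\s");
        return command.substring(tokens[0].length()).trim();
    }

    public static void main(String[] args) {
        String sql = "\t set hive.exec.dynamic.partition.mode=nonstrict";
        // Buggy path still contains the "set" keyword, which then ends up
        // inside the generated variable key.
        System.out.println(buggyArgs(sql)); // set hive.exec.dynamic.partition.mode=nonstrict
        System.out.println(fixedArgs(sql)); // hive.exec.dynamic.partition.mode=nonstrict
    }
}
```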
[jira] [Commented] (HIVE-7513) Add ROW__ID VirtualColumn
[ https://issues.apache.org/jira/browse/HIVE-7513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101926#comment-14101926 ] Lefty Leverenz commented on HIVE-7513: -- Is this just behind-the-scenes or does it need some user doc? Add ROW__ID VirtualColumn - Key: HIVE-7513 URL: https://issues.apache.org/jira/browse/HIVE-7513 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: 0.13.1 Reporter: Eugene Koifman Assignee: Eugene Koifman Fix For: 0.14.0 Attachments: HIVE-7513.10.patch, HIVE-7513.11.patch, HIVE-7513.12.patch, HIVE-7513.13.patch, HIVE-7513.14.patch, HIVE-7513.3.patch, HIVE-7513.4.patch, HIVE-7513.5.patch, HIVE-7513.8.patch, HIVE-7513.9.patch, HIVE-7513.codeOnly.txt In order to support Update/Delete we need to read rowId from AcidInputFormat and pass that along through the operator pipeline (built from the WHERE clause of the SQL statement) so that it can be written to the delta file by the update/delete (sink) operators. The parser will add this column to the projection list to make sure it's passed along. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6329) Support column level encryption/decryption
[ https://issues.apache.org/jira/browse/HIVE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101937#comment-14101937 ] Hive QA commented on HIVE-6329: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12662662/HIVE-6329.10.patch.txt {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5821 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestHBaseCliDriver.testCliDriver_hbase_queries org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/395/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/395/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-395/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12662662 Support column level encryption/decryption -- Key: HIVE-6329 URL: https://issues.apache.org/jira/browse/HIVE-6329 Project: Hive Issue Type: New Feature Components: Security, Serializers/Deserializers Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-6329.1.patch.txt, HIVE-6329.10.patch.txt, HIVE-6329.2.patch.txt, HIVE-6329.3.patch.txt, HIVE-6329.4.patch.txt, HIVE-6329.5.patch.txt, HIVE-6329.6.patch.txt, HIVE-6329.7.patch.txt, HIVE-6329.8.patch.txt, HIVE-6329.9.patch.txt Receiving some requirements on encryption recently but hive is not supporting it. Before the full implementation via HIVE-5207, this might be useful for some cases. 
{noformat} hive> create table encode_test(id int, name STRING, phone STRING, address STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='phone,address', 'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly') STORED AS TEXTFILE; OK Time taken: 0.584 seconds hive> insert into table encode_test select 100,'navis','010-0000-0000','Seoul, Seocho' from src tablesample (1 rows); .. OK Time taken: 5.121 seconds hive> select * from encode_test; OK 100 navis MDEwLTAwMDAtMDAwMA== U2VvdWwsIFNlb2Nobw== Time taken: 0.078 seconds, Fetched: 1 row(s) hive> {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
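As a sanity check on the sample output above, the stored values are plain Base64 of the column text; a minimal sketch using the standard java.util.Base64 encoder (not the Hive serde itself, which wraps this kind of encoding behind the SerDe interface):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64ColumnDemo {
    // Encode a column value the way a Base64 write-only encoder would
    // store it on disk.
    static String encode(String value) {
        return Base64.getEncoder()
                     .encodeToString(value.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Matches the encoded phone and address in the session above.
        System.out.println(encode("010-0000-0000")); // MDEwLTAwMDAtMDAwMA==
        System.out.println(encode("Seoul, Seocho")); // U2VvdWwsIFNlb2Nobw==
    }
}
```

Decoding the two Base64 strings from the query output recovers the original phone and address, which is why the serde is "write-only": the data is obfuscated at rest but trivially reversible.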
[jira] [Commented] (HIVE-7728) Enable q-tests for TABLESAMPLE feature [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101940#comment-14101940 ] Hive QA commented on HIVE-7728: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12662698/HIVE-7728.1-spark.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5927 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/60/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/60/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-60/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12662698 Enable q-tests for TABLESAMPLE feature [Spark Branch] -- Key: HIVE-7728 URL: https://issues.apache.org/jira/browse/HIVE-7728 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7728.1-spark.patch Enable q-tests for TABLESAMPLE feature since automatic test environment is ready. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7779) Support windowing and analytic functions.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7779: Issue Type: Sub-task (was: Task) Parent: HIVE-7292 Support windowing and analytic functions.[Spark Branch] --- Key: HIVE-7779 URL: https://issues.apache.org/jira/browse/HIVE-7779 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Verify the functionality and fix found issues, which should include: # windowing functions # the OVER clause # analytic functions -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7779) Support windowing and analytic functions.[Spark Branch]
Chengxiang Li created HIVE-7779: --- Summary: Support windowing and analytic functions.[Spark Branch] Key: HIVE-7779 URL: https://issues.apache.org/jira/browse/HIVE-7779 Project: Hive Issue Type: Task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Verify the functionality and fix found issues, which should include: # windowing functions # the OVER clause # analytic functions -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7780) Query with OVER clause returns duplicate results [Spark Branch]
Chengxiang Li created HIVE-7780: --- Summary: Query with OVER clause returns duplicate results [Spark Branch] Key: HIVE-7780 URL: https://issues.apache.org/jira/browse/HIVE-7780 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li A simple query with the OVER clause returns duplicate results. {code:sql} hive> select address, count(id) over(partition by address) from test; Query ID = root_2014081915_f5506fcc-4950-424b-a134-56fc5b06d6eb Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapreduce.job.reduces=<number> OK QD 1 SH 2 SH 2 SZ 2 SZ 2 {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5799) session/operation timeout for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101968#comment-14101968 ] Lars Francke commented on HIVE-5799: Thanks for getting to this. It's needed badly! The patch looks mostly good; I have a couple of minor comments regarding style/checkstyle. If you're interested in them, could you please update the RB? session/operation timeout for hiveserver2 - Key: HIVE-5799 URL: https://issues.apache.org/jira/browse/HIVE-5799 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-5799.1.patch.txt, HIVE-5799.10.patch.txt, HIVE-5799.11.patch.txt, HIVE-5799.2.patch.txt, HIVE-5799.3.patch.txt, HIVE-5799.4.patch.txt, HIVE-5799.5.patch.txt, HIVE-5799.6.patch.txt, HIVE-5799.7.patch.txt, HIVE-5799.8.patch.txt, HIVE-5799.9.patch.txt Need some timeout facility for preventing resource leakages from unstable or bad clients. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6329) Support column level encryption/decryption
[ https://issues.apache.org/jira/browse/HIVE-6329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14101969#comment-14101969 ] Navis commented on HIVE-6329: - Cannot reproduce the failure of testCliDriver_hbase_queries, in either hadoop-1 or hadoop-2. Support column level encryption/decryption -- Key: HIVE-6329 URL: https://issues.apache.org/jira/browse/HIVE-6329 Project: Hive Issue Type: New Feature Components: Security, Serializers/Deserializers Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-6329.1.patch.txt, HIVE-6329.10.patch.txt, HIVE-6329.2.patch.txt, HIVE-6329.3.patch.txt, HIVE-6329.4.patch.txt, HIVE-6329.5.patch.txt, HIVE-6329.6.patch.txt, HIVE-6329.7.patch.txt, HIVE-6329.8.patch.txt, HIVE-6329.9.patch.txt Receiving some requirements on encryption recently but hive is not supporting it. Before the full implementation via HIVE-5207, this might be useful for some cases. {noformat} hive> create table encode_test(id int, name STRING, phone STRING, address STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' WITH SERDEPROPERTIES ('column.encode.columns'='phone,address', 'column.encode.classname'='org.apache.hadoop.hive.serde2.Base64WriteOnly') STORED AS TEXTFILE; OK Time taken: 0.584 seconds hive> insert into table encode_test select 100,'navis','010-0000-0000','Seoul, Seocho' from src tablesample (1 rows); .. OK Time taken: 5.121 seconds hive> select * from encode_test; OK 100 navis MDEwLTAwMDAtMDAwMA== U2VvdWwsIFNlb2Nobw== Time taken: 0.078 seconds, Fetched: 1 row(s) hive> {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7733) Ambiguous column reference error on query
[ https://issues.apache.org/jira/browse/HIVE-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7733: Status: Patch Available (was: Open) Ambiguous column reference error on query - Key: HIVE-7733 URL: https://issues.apache.org/jira/browse/HIVE-7733 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Jason Dere Attachments: HIVE-7733.1.patch.txt {noformat} CREATE TABLE agg1 ( col0 INT, col1 STRING, col2 DOUBLE ); explain SELECT single_use_subq11.a1 AS a1, single_use_subq11.a2 AS a2 FROM (SELECT Sum(agg1.col2) AS a1 FROM agg1 GROUP BY agg1.col0) single_use_subq12 JOIN (SELECT alias.a2 AS a0, alias.a1 AS a1, alias.a1 AS a2 FROM (SELECT agg1.col1 AS a0, '42' AS a1, agg1.col0 AS a2 FROM agg1 UNION ALL SELECT agg1.col1 AS a0, '41' AS a1, agg1.col0 AS a2 FROM agg1) alias GROUP BY alias.a2, alias.a1) single_use_subq11 ON ( single_use_subq11.a0 = single_use_subq11.a0 ); {noformat} Gets the following error: FAILED: SemanticException [Error 10007]: Ambiguous column reference a2 It looks like this query had been working in 0.12 but started failing with this error in 0.13 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7733) Ambiguous column reference error on query
[ https://issues.apache.org/jira/browse/HIVE-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7733: Attachment: HIVE-7733.1.patch.txt Ambiguous column reference error on query - Key: HIVE-7733 URL: https://issues.apache.org/jira/browse/HIVE-7733 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Jason Dere Attachments: HIVE-7733.1.patch.txt {noformat} CREATE TABLE agg1 ( col0 INT, col1 STRING, col2 DOUBLE ); explain SELECT single_use_subq11.a1 AS a1, single_use_subq11.a2 AS a2 FROM (SELECT Sum(agg1.col2) AS a1 FROM agg1 GROUP BY agg1.col0) single_use_subq12 JOIN (SELECT alias.a2 AS a0, alias.a1 AS a1, alias.a1 AS a2 FROM (SELECT agg1.col1 AS a0, '42' AS a1, agg1.col0 AS a2 FROM agg1 UNION ALL SELECT agg1.col1 AS a0, '41' AS a1, agg1.col0 AS a2 FROM agg1) alias GROUP BY alias.a2, alias.a1) single_use_subq11 ON ( single_use_subq11.a0 = single_use_subq11.a0 ); {noformat} Gets the following error: FAILED: SemanticException [Error 10007]: Ambiguous column reference a2 It looks like this query had been working in 0.12 but started failing with this error in 0.13 -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7771) ORC PPD fails for some decimal predicates
[ https://issues.apache.org/jira/browse/HIVE-7771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14101990#comment-14101990 ] Hive QA commented on HIVE-7771: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12662665/HIVE-7771.1.patch {color:green}SUCCESS:{color} +1 5819 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/396/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/396/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-396/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12662665 ORC PPD fails for some decimal predicates - Key: HIVE-7771 URL: https://issues.apache.org/jira/browse/HIVE-7771 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7771.1.patch Some queries like {code} select * from table where dcol=11.22BD; {code} fails when ORC predicate pushdown is enabled. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-7780) Query with OVER clause returns duplicate results [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li resolved HIVE-7780. - Resolution: Not a Problem Query with OVER clause returns duplicate results [Spark Branch] - Key: HIVE-7780 URL: https://issues.apache.org/jira/browse/HIVE-7780 Project: Hive Issue Type: Bug Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li A simple query with the OVER clause returns duplicate results. {code:sql} hive> select address, count(id) over(partition by address) from test; Query ID = root_2014081915_f5506fcc-4950-424b-a134-56fc5b06d6eb Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapreduce.job.reduces=<number> OK QD 1 SH 2 SH 2 SZ 2 SZ 2 {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
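The "Not a Problem" resolution makes sense: a windowed aggregate keeps one output row per input row, each carrying its partition's total, so repeated addresses legitimately repeat the same count. A plain-Java mock of what count(id) over (partition by address) computes (illustrative only, not Hive code):

```java
import java.util.*;

public class WindowCountDemo {
    // For each input row, emit the row's address plus the count of all rows
    // in the same partition. Unlike GROUP BY, no rows are collapsed.
    static List<String> partitionCounts(List<String> addresses) {
        // First pass: count rows per partition key.
        Map<String, Long> counts = new HashMap<>();
        for (String a : addresses) {
            counts.merge(a, 1L, Long::sum);
        }
        // Second pass: one output row per input row.
        List<String> rows = new ArrayList<>();
        for (String a : addresses) {
            rows.add(a + "\t" + counts.get(a));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Same data shape as the query in the issue: one QD row, two SH, two SZ.
        for (String row : partitionCounts(Arrays.asList("QD", "SH", "SH", "SZ", "SZ"))) {
            System.out.println(row);
        }
        // QD 1 / SH 2 / SH 2 / SZ 2 / SZ 2 -- the "duplicates" are expected.
    }
}
```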
[jira] [Created] (HIVE-7781) Enable windowing and analytic function qtests.[Spark Branch]
Chengxiang Li created HIVE-7781: --- Summary: Enable windowing and analytic function qtests.[Spark Branch] Key: HIVE-7781 URL: https://issues.apache.org/jira/browse/HIVE-7781 Project: Hive Issue Type: Task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7781) Enable windowing and analytic function qtests.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7781: Issue Type: Sub-task (was: Task) Parent: HIVE-7292 Enable windowing and analytic function qtests.[Spark Branch] Key: HIVE-7781 URL: https://issues.apache.org/jira/browse/HIVE-7781 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-5799) session/operation timeout for hiveserver2
[ https://issues.apache.org/jira/browse/HIVE-5799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102040#comment-14102040 ] Hive QA commented on HIVE-5799: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12662668/HIVE-5799.11.patch.txt {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5820 tests executed *Failed tests:* {noformat} org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection org.apache.hive.jdbc.miniHS2.TestHiveServer2SessionTimeout.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/397/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/397/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-397/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12662668 session/operation timeout for hiveserver2 - Key: HIVE-5799 URL: https://issues.apache.org/jira/browse/HIVE-5799 Project: Hive Issue Type: Improvement Components: HiveServer2 Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-5799.1.patch.txt, HIVE-5799.10.patch.txt, HIVE-5799.11.patch.txt, HIVE-5799.2.patch.txt, HIVE-5799.3.patch.txt, HIVE-5799.4.patch.txt, HIVE-5799.5.patch.txt, HIVE-5799.6.patch.txt, HIVE-5799.7.patch.txt, HIVE-5799.8.patch.txt, HIVE-5799.9.patch.txt Need some timeout facility for preventing resource leakages from unstable or bad clients. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7781) Enable windowing and analytic function qtests.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7781: Status: Patch Available (was: Open) Enable windowing and analytic function qtests.[Spark Branch] Key: HIVE-7781 URL: https://issues.apache.org/jira/browse/HIVE-7781 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7781.1-spark.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7781) Enable windowing and analytic function qtests.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li updated HIVE-7781: Attachment: HIVE-7781.1-spark.patch ptf.q and ptf_streaming.q are left out, as they depend on join operations. Enable windowing and analytic function qtests.[Spark Branch] Key: HIVE-7781 URL: https://issues.apache.org/jira/browse/HIVE-7781 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7781.1-spark.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-7779) Support windowing and analytic functions.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chengxiang Li resolved HIVE-7779. - Resolution: Fixed Verified through qtests and in a real test environment; no issues found. Support windowing and analytic functions.[Spark Branch] --- Key: HIVE-7779 URL: https://issues.apache.org/jira/browse/HIVE-7779 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Verify the functionality and fix found issues, which should include: # windowing functions # the OVER clause # analytic functions -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7664) VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized execution and takes 25% CPU
[ https://issues.apache.org/jira/browse/HIVE-7664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7664: Status: Patch Available (was: Open) VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized execution and takes 25% CPU - Key: HIVE-7664 URL: https://issues.apache.org/jira/browse/HIVE-7664 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Fix For: 0.14.0 Attachments: HIVE-7664.1.patch.txt In a group-by-heavy vectorized reducer vertex, 25% of CPU is spent in VectorizedBatchUtil.addRowToBatchFrom(). Looking at the code of VectorizedBatchUtil.addRowToBatchFrom, it does not appear to be optimized for vectorized processing. addRowToBatchFrom is called for every row, and for each row and every column in the batch getPrimitiveCategory is called to figure out the column's type; column types are stored in a HashMap. For VectorGroupByOperator, column types won't change between batches, so column types shouldn't be looked up for every row. I recommend storing the column type in StructObjectInspector so that other components can leverage this optimization. addRowToBatchFrom also has a case statement for every row and every column used for type casting; I recommend encapsulating the type logic in templatized methods. {code}
Stack Trace                                              Sample Count  Percentage(%)
VectorizedBatchUtil.addRowToBatchFrom                    86            26.543
AbstractPrimitiveObjectInspector.getPrimitiveCategory()  34            10.494
LazyBinaryStructObjectInspector.getStructFieldData       25            7.716
StandardStructObjectInspector.getStructFieldData         4             1.235
{code} The query used: {code} select ss_sold_date_sk from store_sales where ss_sold_date between '1998-01-01' and '1998-06-01' group by ss_item_sk, ss_customer_sk, ss_sold_date_sk having sum(ss_list_price) > 50; {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7664) VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized execution and takes 25% CPU
[ https://issues.apache.org/jira/browse/HIVE-7664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Navis updated HIVE-7664: Attachment: HIVE-7664.1.patch.txt Preliminary test VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized execution and takes 25% CPU - Key: HIVE-7664 URL: https://issues.apache.org/jira/browse/HIVE-7664 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Fix For: 0.14.0 Attachments: HIVE-7664.1.patch.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7774) Issues with location path for temporary external tables
[ https://issues.apache.org/jira/browse/HIVE-7774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102089#comment-14102089 ] Hive QA commented on HIVE-7774: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12662678/HIVE-7774.1.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5819 tests executed *Failed tests:* {noformat} org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/398/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/398/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-398/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12662678 Issues with location path for temporary external tables --- Key: HIVE-7774 URL: https://issues.apache.org/jira/browse/HIVE-7774 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-7774.1.patch Depending on the location string passed in for a temporary external table, a query requiring a map/reduce job will fail. 
Example:
{noformat}
create temporary external table tmp1 (c1 string) location '/tmp/tmp1';
describe extended tmp1;
select count(*) from tmp1;
{noformat}
Will result in the following error:
{noformat}
Diagnostic Messages for this Task:
Error: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
	at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
	at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
	at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38)
	... 14 more
Caused by: java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
	... 17 more
Caused by: java.lang.RuntimeException: Map operator initialization failed
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:154)
	... 22 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent
	at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404)
	at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:123)
	... 22 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent
	at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:398)
	... 23 more
FAILED: Execution
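One common cause of "Configuration and input path are inconsistent" is that the location recorded for the table (here the schemeless '/tmp/tmp1') does not compare equal to the fully qualified input path used at runtime. The sketch below illustrates the general idea of qualifying a raw location against a default filesystem URI; it is a simplified illustration using java.net.URI, not Hive's actual fix (Hive does this through Hadoop's Path/FileSystem APIs), and the class and method names are hypothetical.

```java
import java.net.URI;

// Hypothetical sketch: qualify a raw table location against the default
// filesystem so that metadata paths and runtime input paths compare equal.
public class QualifyLocation {
    static String qualify(String rawLocation, URI defaultFs) {
        URI loc = URI.create(rawLocation);
        if (loc.getScheme() != null) {
            // Already fully qualified, e.g. hdfs://nn:8020/tmp/tmp1
            return loc.toString();
        }
        // Prepend scheme and authority from the default filesystem.
        return defaultFs.resolve(loc).toString();
    }
}
```

With this normalization, '/tmp/tmp1' and 'hdfs://nn:8020/tmp/tmp1' resolve to the same string, so a plain path comparison no longer misfires.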
[jira] [Commented] (HIVE-7781) Enable windowing and analytic function qtests.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102093#comment-14102093 ] Hive QA commented on HIVE-7781: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12662728/HIVE-7781.1-spark.patch {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5925 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/61/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/61/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-61/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12662728 Enable windowing and analytic function qtests.[Spark Branch] Key: HIVE-7781 URL: https://issues.apache.org/jira/browse/HIVE-7781 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7781.1-spark.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
Error with Meta Data from SELECT 0 as mx
Hi, it seems there is a bug in the trunk version. When I run this query:

create table mag_new as
SELECT row_sequence() + tbl.mx, nommagasin, idsociete, idregion, codepostal
from magasin, (select 0 as mx) as tbl;

the Metastore complains about a missing table: _dummy_database._dummy_table. Complete log:

2014-08-19 12:05:02,673 ERROR [pool-5-thread-3]: stats.StatsUtils (StatsUtils.java:getTableColumnStats(474)) - Failed to retrieve table statistics: org.apache.hadoop.hive.ql.metadata.HiveException: NoSuchObjectException(message:Specified database/table does not exist : _dummy_database._dummy_table)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTableColumnStatistics(Hive.java:2563)
	at org.apache.hadoop.hive.ql.stats.StatsUtils.getTableColumnStats(StatsUtils.java:470)
	at org.apache.hadoop.hive.ql.stats.StatsUtils.collectStatistics(StatsUtils.java:147)
	at org.apache.hadoop.hive.ql.optimizer.stats.annotation.StatsRulesProcFactory$TableScanStatsRule.process(StatsRulesProcFactory.java:100)
	at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:94)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:78)
	at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:54)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:109)
	at org.apache.hadoop.hive.ql.optimizer.stats.annotation.AnnotateWithStatistics.transform(AnnotateWithStatistics.java:78)
	at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:149)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:9484)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:208)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:413)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:309)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1003)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:997)
	at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:99)
	at org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:170)
	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:306)
	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:293)
	at sun.reflect.GeneratedMethodAccessor19.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:79)
	at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:37)
	at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:64)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.hive.shims.HadoopShimsSecure.doAs(HadoopShimsSecure.java:508)
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:60)
	at com.sun.proxy.$Proxy21.executeStatementAsync(Unknown Source)
	at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:259)
	at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:346)
	at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
	at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)

-- Damien CAROL * tel: +33 (0)4 74 96 88 14 * fax: +33 (0)4 74 96 31 88 * email: dca...@blitzbs.com BLITZ BUSINESS SERVICE
[jira] [Commented] (HIVE-7646) Modify parser to support new grammar for Insert,Update,Delete
[ https://issues.apache.org/jira/browse/HIVE-7646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102116#comment-14102116 ] Hive QA commented on HIVE-7646: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12662702/HIVE-7646.1.patch {color:red}ERROR:{color} -1 due to 25 failed/errored test(s), 5831 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_1 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_10 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_11 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_12 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_13 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_14 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_15 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_16 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_17 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_18 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_19 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_2 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_20 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_21 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_22 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_23 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_24 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_25 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_3 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_4 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_5 
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_6 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_7 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_8 org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_union_remove_9 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/399/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/399/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-399/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 25 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12662702 Modify parser to support new grammar for Insert,Update,Delete - Key: HIVE-7646 URL: https://issues.apache.org/jira/browse/HIVE-7646 Project: Hive Issue Type: Sub-task Components: Query Processor Affects Versions: 0.13.1 Reporter: Eugene Koifman Assignee: Eugene Koifman Attachments: HIVE-7646.1.patch, HIVE-7646.patch The parser needs to recognize constructs such as:
{code:sql}
INSERT INTO Cust (Customer_Number, Balance, Address) VALUES (101, 50.00, '123 Main Street'), (102, 75.00, '123 Pine Ave');
{code}
{code:sql}
DELETE FROM Cust WHERE Balance > 5.0
{code}
{code:sql}
UPDATE Cust SET column1=value1,column2=value2,... WHERE some_column=some_value
{code}
Also useful:
{code:sql}
select a,b from values((1,2),(3,4)) as FOO(a,b)
{code}
This makes writing tests easier. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7773) Union all query finished with errors [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Li updated HIVE-7773: - Attachment: HIVE-7773.spark.patch I found that the problem is that IOContext is used to store and retrieve the input path for the operators. IOContext is a singleton when the query is submitted via the Hive CLI. Since Spark tasks are threads within a single JVM, the input path in IOContext will get clobbered when concurrent tasks have different input paths. In my test case, two map works run concurrently for two different tables. This patch makes sure we always use a thread-local IOContext. Union all query finished with errors [Spark Branch] --- Key: HIVE-7773 URL: https://issues.apache.org/jira/browse/HIVE-7773 Project: Hive Issue Type: Bug Components: Spark Reporter: Rui Li Priority: Critical Attachments: HIVE-7773.spark.patch When I run a union all query, I find the following error in the Spark log (the query finishes with correct results, though):
{noformat}
java.lang.RuntimeException: Map operator initialization failed
	at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127)
	at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52)
	at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
	at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164)
	at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
	at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:54)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent
	at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404)
	at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93)
	... 16 more
{noformat}
Judging from the log, I think we don't properly handle the input paths when cloning the job conf, so it may also affect other queries with multiple maps or reduces. -- This message was sent by Atlassian JIRA (v6.2#6252)
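The thread-local approach described in the comment can be sketched as follows. IOContextHolder and its inner class are simplified stand-ins for Hive's IOContext, not the actual patch; the point is that each task thread gets its own instance, so concurrent tasks in one executor JVM can no longer overwrite each other's input path.

```java
// Hypothetical sketch of the fix: replace a JVM-wide singleton holding the
// current input path with a thread-local instance, so concurrent Spark tasks
// (which run as threads inside one executor JVM) stay isolated.
public class IOContextHolder {
    public static class IOContext {
        private String inputPath;
        public void setInputPath(String p) { inputPath = p; }
        public String getInputPath() { return inputPath; }
    }

    // One IOContext per thread instead of one per JVM.
    private static final ThreadLocal<IOContext> CONTEXT =
            ThreadLocal.withInitial(IOContext::new);

    public static IOContext get() { return CONTEXT.get(); }
}
```

Each task thread calls IOContextHolder.get().setInputPath(...) and later reads back its own value, even if another task with a different input path runs at the same time.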
[jira] [Commented] (HIVE-7733) Ambiguous column reference error on query
[ https://issues.apache.org/jira/browse/HIVE-7733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102153#comment-14102153 ] Hive QA commented on HIVE-7733: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12662715/HIVE-7733.1.patch.txt {color:red}ERROR:{color} -1 due to 3 failed/errored test(s), 5819 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_ambiguous_col org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/400/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/400/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-400/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 3 tests failed {noformat} This message is automatically generated. 
ATTACHMENT ID: 12662715 Ambiguous column reference error on query - Key: HIVE-7733 URL: https://issues.apache.org/jira/browse/HIVE-7733 Project: Hive Issue Type: Bug Affects Versions: 0.13.0 Reporter: Jason Dere Attachments: HIVE-7733.1.patch.txt
{noformat}
CREATE TABLE agg1 (
  col0 INT,
  col1 STRING,
  col2 DOUBLE
);

explain
SELECT single_use_subq11.a1 AS a1,
       single_use_subq11.a2 AS a2
FROM   (SELECT Sum(agg1.col2) AS a1
        FROM   agg1
        GROUP BY agg1.col0) single_use_subq12
       JOIN (SELECT alias.a2 AS a0,
                    alias.a1 AS a1,
                    alias.a1 AS a2
             FROM   (SELECT agg1.col1 AS a0,
                            '42' AS a1,
                            agg1.col0 AS a2
                     FROM   agg1
                     UNION ALL
                     SELECT agg1.col1 AS a0,
                            '41' AS a1,
                            agg1.col0 AS a2
                     FROM   agg1) alias
             GROUP BY alias.a2, alias.a1) single_use_subq11
         ON ( single_use_subq11.a0 = single_use_subq11.a0 );
{noformat}
Gets the following error: FAILED: SemanticException [Error 10007]: Ambiguous column reference a2 Looks like this query had been working in 0.12 but started failing with this error in 0.13. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7664) VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized execution and takes 25% CPU
[ https://issues.apache.org/jira/browse/HIVE-7664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102188#comment-14102188 ] Hive QA commented on HIVE-7664: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12662732/HIVE-7664.1.patch.txt {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 5819 tests executed *Failed tests:* {noformat} org.apache.hive.hcatalog.pig.TestOrcHCatLoader.testReadDataPrimitiveTypes {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/401/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/401/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-401/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12662732 VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized execution and takes 25% CPU - Key: HIVE-7664 URL: https://issues.apache.org/jira/browse/HIVE-7664 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Fix For: 0.14.0 Attachments: HIVE-7664.1.patch.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7664) VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized execution and takes 25% CPU
[ https://issues.apache.org/jira/browse/HIVE-7664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102204#comment-14102204 ] Remus Rusanu commented on HIVE-7664: shouldn't there be a case for DECIMAL primitive category? I see a DecimalAccessor, but no case covering it in the BatchAccessor ctor. VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized execution and takes 25% CPU - Key: HIVE-7664 URL: https://issues.apache.org/jira/browse/HIVE-7664 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Fix For: 0.14.0 Attachments: HIVE-7664.1.patch.txt -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7762) Enhancement while getting partitions via webhcat client
[ https://issues.apache.org/jira/browse/HIVE-7762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suhas Vasu updated HIVE-7762: - Status: Patch Available (was: Open) Enhancement while getting partitions via webhcat client --- Key: HIVE-7762 URL: https://issues.apache.org/jira/browse/HIVE-7762 Project: Hive Issue Type: Improvement Components: WebHCat Reporter: Suhas Vasu Priority: Minor Attachments: HIVE-7762.2.patch, HIVE-7762.patch HCatalog creates partitions in lower case, but getting partitions from HCatalog via the webhcat client doesn't handle this, so the client throws exceptions. Ex:

CREATE EXTERNAL TABLE in_table (word STRING, cnt INT) PARTITIONED BY (Year STRING, Month STRING, Date STRING, Hour STRING, Minute STRING) STORED AS TEXTFILE LOCATION '/user/suhas/hcat-data/in/';

Then I try to get partitions by:
{noformat}
String inputTableName = "in_table";
String database = "default";
Map<String, String> partitionSpec = new HashMap<String, String>();
partitionSpec.put("Year", "2014");
partitionSpec.put("Month", "08");
partitionSpec.put("Date", "11");
partitionSpec.put("Hour", "00");
partitionSpec.put("Minute", "00");
HCatClient client = get(catalogUrl);
HCatPartition hCatPartition = client.getPartition(database, inputTableName, partitionSpec);
{noformat}
This throws up saying:
{noformat}
Exception in thread "main" org.apache.hcatalog.common.HCatException : 9001 : Exception occurred while processing HCat request : Invalid partition-key specified: year
	at org.apache.hcatalog.api.HCatClientHMSImpl.getPartition(HCatClientHMSImpl.java:366)
	at com.inmobi.demo.HcatPartitions.main(HcatPartitions.java:34)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
{noformat}
The same code works if I do:
{noformat}
partitionSpec.put("year", "2014");
partitionSpec.put("month", "08");
partitionSpec.put("date", "11");
partitionSpec.put("hour", "00");
partitionSpec.put("minute", "00");
{noformat}
-- This message was sent by Atlassian JIRA (v6.2#6252)
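The mismatch is a key-normalization problem: the metastore stores partition keys in lower case, while the caller's spec uses mixed case. A client-side workaround can be sketched with a minimal helper; the class and method names below are hypothetical, not the attached patch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch: normalize partition-spec keys to lower case so they match the
// lower-cased partition keys the metastore actually stores.
public class PartitionSpecs {
    static Map<String, String> lowerCaseKeys(Map<String, String> spec) {
        Map<String, String> normalized = new HashMap<>();
        for (Map.Entry<String, String> e : spec.entrySet()) {
            normalized.put(e.getKey().toLowerCase(Locale.ROOT), e.getValue());
        }
        return normalized;
    }
}
```

Passing lowerCaseKeys(partitionSpec) to client.getPartition(...) makes a "Year" entry match the stored "year" key; the same normalization inside the client itself would be the more robust fix.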
[jira] [Updated] (HIVE-7762) Enhancement while getting partitions via webhcat client
[ https://issues.apache.org/jira/browse/HIVE-7762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suhas Vasu updated HIVE-7762: - Attachment: HIVE-7762.2.patch Rebasing the patch Enhancement while getting partitions via webhcat client --- Key: HIVE-7762 URL: https://issues.apache.org/jira/browse/HIVE-7762 Project: Hive Issue Type: Improvement Components: WebHCat Reporter: Suhas Vasu Priority: Minor Attachments: HIVE-7762.2.patch, HIVE-7762.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7762) Enhancement while getting partitions via webhcat client
[ https://issues.apache.org/jira/browse/HIVE-7762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suhas Vasu updated HIVE-7762: - Status: Open (was: Patch Available) Enhancement while getting partitions via webhcat client --- Key: HIVE-7762 URL: https://issues.apache.org/jira/browse/HIVE-7762 Project: Hive Issue Type: Improvement Components: WebHCat Reporter: Suhas Vasu Priority: Minor Attachments: HIVE-7762.2.patch, HIVE-7762.patch HCatalog creates partitions in lower case, whereas getting partitions from HCatalog via the WebHCat client doesn't handle this, so the client starts throwing exceptions. Ex: CREATE EXTERNAL TABLE in_table (word STRING, cnt INT) PARTITIONED BY (Year STRING, Month STRING, Date STRING, Hour STRING, Minute STRING) STORED AS TEXTFILE LOCATION '/user/suhas/hcat-data/in/'; Then I try to get partitions by: {noformat} String inputTableName = "in_table"; String database = "default"; Map<String, String> partitionSpec = new HashMap<String, String>(); partitionSpec.put("Year", "2014"); partitionSpec.put("Month", "08"); partitionSpec.put("Date", "11"); partitionSpec.put("Hour", "00"); partitionSpec.put("Minute", "00"); HCatClient client = get(catalogUrl); HCatPartition hCatPartition = client.getPartition(database, inputTableName, partitionSpec); {noformat} This throws up saying: {noformat} Exception in thread "main" org.apache.hcatalog.common.HCatException : 9001 : Exception occurred while processing HCat request : Invalid partition-key specified: year at org.apache.hcatalog.api.HCatClientHMSImpl.getPartition(HCatClientHMSImpl.java:366) at com.inmobi.demo.HcatPartitions.main(HcatPartitions.java:34) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) {noformat} The same code works if I do {noformat} partitionSpec.put("year", "2014"); partitionSpec.put("month", "08"); partitionSpec.put("date", "11"); partitionSpec.put("hour", "00"); partitionSpec.put("minute", "00"); {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7770) Undo backward-incompatible behaviour change introduced by HIVE-7341
[ https://issues.apache.org/jira/browse/HIVE-7770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102304#comment-14102304 ] Mithun Radhakrishnan commented on HIVE-7770: Yikes, will post a patch shortly. Undo backward-incompatible behaviour change introduced by HIVE-7341 --- Key: HIVE-7770 URL: https://issues.apache.org/jira/browse/HIVE-7770 Project: Hive Issue Type: Bug Components: HCatalog Affects Versions: 0.14.0 Reporter: Sushanth Sowmyan Assignee: Mithun Radhakrishnan Labels: regression HIVE-7341 introduced a backward-incompatibility regression in Exception signatures for HCatPartition.getColumns() that breaks compilation for external tools like Falcon. This bug tracks a scrub of any other issues we discover, so we can put them back to how it used to be. This bug needs resolution in the same release as HIVE-7341, and thus, must be resolved in 0.14.0. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7762) Enhancement while getting partitions via webhcat client
[ https://issues.apache.org/jira/browse/HIVE-7762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102306#comment-14102306 ] Hive QA commented on HIVE-7762: --- {color:green}Overall{color}: +1 all checks pass Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12662754/HIVE-7762.2.patch {color:green}SUCCESS:{color} +1 5819 tests passed Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/402/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/402/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-402/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase {noformat} This message is automatically generated. ATTACHMENT ID: 12662754 Enhancement while getting partitions via webhcat client --- Key: HIVE-7762 URL: https://issues.apache.org/jira/browse/HIVE-7762 Project: Hive Issue Type: Improvement Components: WebHCat Reporter: Suhas Vasu Priority: Minor Attachments: HIVE-7762.2.patch, HIVE-7762.patch Hcatalog creates partitions in lower case, whereas getting partitions from hcatalog via webhcat client doesn't handle this. So the client starts throwing exceptions. 
Ex: CREATE EXTERNAL TABLE in_table (word STRING, cnt INT) PARTITIONED BY (Year STRING, Month STRING, Date STRING, Hour STRING, Minute STRING) STORED AS TEXTFILE LOCATION '/user/suhas/hcat-data/in/'; Then I try to get partitions by: {noformat} String inputTableName = "in_table"; String database = "default"; Map<String, String> partitionSpec = new HashMap<String, String>(); partitionSpec.put("Year", "2014"); partitionSpec.put("Month", "08"); partitionSpec.put("Date", "11"); partitionSpec.put("Hour", "00"); partitionSpec.put("Minute", "00"); HCatClient client = get(catalogUrl); HCatPartition hCatPartition = client.getPartition(database, inputTableName, partitionSpec); {noformat} This throws up saying: {noformat} Exception in thread "main" org.apache.hcatalog.common.HCatException : 9001 : Exception occurred while processing HCat request : Invalid partition-key specified: year at org.apache.hcatalog.api.HCatClientHMSImpl.getPartition(HCatClientHMSImpl.java:366) at com.inmobi.demo.HcatPartitions.main(HcatPartitions.java:34) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) {noformat} The same code works if I do {noformat} partitionSpec.put("year", "2014"); partitionSpec.put("month", "08"); partitionSpec.put("date", "11"); partitionSpec.put("hour", "00"); partitionSpec.put("minute", "00"); {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7664) VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized execution and takes 25% CPU
[ https://issues.apache.org/jira/browse/HIVE-7664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102305#comment-14102305 ] Mostafa Mokhtar commented on HIVE-7664: --- [~navis] Can you please add a code review? VectorizedBatchUtil.addRowToBatchFrom is not optimized for Vectorized execution and takes 25% CPU - Key: HIVE-7664 URL: https://issues.apache.org/jira/browse/HIVE-7664 Project: Hive Issue Type: Bug Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Fix For: 0.14.0 Attachments: HIVE-7664.1.patch.txt In a Group by heavy vectorized Reducer vertex, 25% of CPU is spent in VectorizedBatchUtil.addRowToBatchFrom(). Looking at the code of VectorizedBatchUtil.addRowToBatchFrom, it doesn't appear to be optimized for Vectorized processing. addRowToBatchFrom is called for every row, and for each row getPrimitiveCategory is called on every column in the batch to figure out that column's type. Column types are stored in a HashMap; for VectorGroupByOperator, column types won't change between batches, so column types shouldn't be looked up for every row. I recommend storing the column type in StructObjectInspector so that other components can leverage this optimization. addRowToBatchFrom also runs a case statement for every row and every column to do type casting; I recommend encapsulating the type logic in templatized methods. {code}
Stack Trace                                                Sample Count  Percentage(%)
VectorizedBatchUtil.addRowToBatchFrom                                86          26.543
  AbstractPrimitiveObjectInspector.getPrimitiveCategory()            34          10.494
  LazyBinaryStructObjectInspector.getStructFieldData                 25           7.716
  StandardStructObjectInspector.getStructFieldData                    4           1.235
{code} The query used: {code} select ss_sold_date_sk from store_sales where ss_sold_date between '1998-01-01' and '1998-06-01' group by ss_item_sk, ss_customer_sk, ss_sold_date_sk having sum(ss_list_price) > 50; {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
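The fix the reporter suggests — looking column types up once per batch instead of once per row — can be sketched in plain Java. {{Category}} and {{processBatch}} are hypothetical stand-ins for Hive's PrimitiveCategory and addRowToBatchFrom, used only to show the hoisting pattern:

```java
import java.util.Arrays;

public class ColumnTypeCache {
    // Hypothetical stand-in for Hive's PrimitiveCategory; in the real code the
    // per-column category comes from the column's ObjectInspector.
    enum Category { LONG, STRING }

    // Processes a batch of rows. The per-column categories are looked up once,
    // before the row loop, instead of once per row per column as
    // addRowToBatchFrom does today.
    static int processBatch(Category[] schema, Object[][] rows) {
        Category[] cached = Arrays.copyOf(schema, schema.length); // one lookup per batch
        int cells = 0;
        for (Object[] row : rows) {
            for (int c = 0; c < cached.length; c++) {
                switch (cached[c]) { // no per-row getPrimitiveCategory() call
                    case LONG:   cells++; break; // would append to a long column vector
                    case STRING: cells++; break; // would append to a bytes column vector
                }
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        Category[] schema = { Category.LONG, Category.STRING };
        Object[][] rows = { { 1L, "a" }, { 2L, "b" }, { 3L, "c" } };
        System.out.println(processBatch(schema, rows)); // 6 cells = 3 rows x 2 columns
    }
}
```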
[jira] [Assigned] (HIVE-6930) Beeline should nicely format timestamps when displaying results
[ https://issues.apache.org/jira/browse/HIVE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ferdinand Xu reassigned HIVE-6930: -- Assignee: Ferdinand Xu Beeline should nicely format timestamps when displaying results --- Key: HIVE-6930 URL: https://issues.apache.org/jira/browse/HIVE-6930 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.12.0 Reporter: Gwen Shapira Assignee: Ferdinand Xu When I have a timestamp column in my query, I get the results back as the bigint with number of seconds since epoch. Not very user friendly or readable. This means that all my queries need to include stuff like: select from_unixtime(cast(round(transaction_ts/1000) as bigint))... which is not too readable either :) Other SQL query tools automatically convert timestamps to some standard readable date format. They even let users specify the default formatting by setting a parameter (for example NLS_DATE_FORMAT for Oracle). I'd love to see something like that in beeline. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-6930) Beeline should nicely format timestamps when displaying results
[ https://issues.apache.org/jira/browse/HIVE-6930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102326#comment-14102326 ] Lars Francke commented on HIVE-6930: If this is implemented then it should be an optional thing, disabled by default. Otherwise you'd run into issues with timezones etc. Beeline should nicely format timestamps when displaying results --- Key: HIVE-6930 URL: https://issues.apache.org/jira/browse/HIVE-6930 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.12.0 Reporter: Gwen Shapira Assignee: Ferdinand Xu When I have a timestamp column in my query, I get the results back as the bigint with number of seconds since epoch. Not very user friendly or readable. This means that all my queries need to include stuff like: select from_unixtime(cast(round(transaction_ts/1000) as bigint))... which is not too readable either :) Other SQL query tools automatically convert timestamps to some standard readable date format. They even let users specify the default formatting by setting a parameter (for example NLS_DATE_FORMAT for Oracle). I'd love to see something like that in beeline. -- This message was sent by Atlassian JIRA (v6.2#6252)
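Combining the request with Lars's caveat, a sketch of opt-in formatting: disabled by default (raw value passes through untouched), and when enabled, rendered against an explicit timezone to avoid the surprises he mentions. The helper and property name are hypothetical — Beeline defines no such option today:

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class TimestampFormatter {
    // Hypothetical setting, in the spirit of Oracle's NLS_DATE_FORMAT;
    // not an existing Beeline property.
    // e.g. !set timestampformat yyyy-MM-dd HH:mm:ss

    static String render(long epochMillis, String pattern) {
        if (pattern == null) {
            // Disabled by default: show the raw epoch value unchanged.
            return Long.toString(epochMillis);
        }
        SimpleDateFormat fmt = new SimpleDateFormat(pattern);
        // Pin the timezone explicitly so output doesn't depend on the client JVM.
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        return fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        long ts = 0L; // the epoch
        System.out.println(render(ts, null));                    // raw: 0
        System.out.println(render(ts, "yyyy-MM-dd HH:mm:ss"));   // 1970-01-01 00:00:00
    }
}
```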
[jira] [Created] (HIVE-7782) tez default engine not overridden by hive.execution.engine=mr in hive cli session
Hari Sekhon created HIVE-7782: - Summary: tez default engine not overridden by hive.execution.engine=mr in hive cli session Key: HIVE-7782 URL: https://issues.apache.org/jira/browse/HIVE-7782 Project: Hive Issue Type: Bug Components: CLI, Tez Environment: HDP2.1 Reporter: Hari Sekhon Priority: Minor I've deployed hive.execution.engine=tez as the default on my secondary HDP cluster. I find that hive cli interactive sessions where I do {code} set hive.execution.engine=mr {code} still execute with Tez, as shown in the Resource Manager applications view. Now this may make sense since it's connected to a Tez session by that point, but it's also misleading because the job progress output in the cli changes to look like MapReduce rather than Tez, and the query time is increased, although only to 15-16 secs rather than the 25-30+ secs I usually see with MR. The Resource Manager shows both of these jobs as TEZ application type. Is this a bug in the way Hive is submitting the job (Tez vs MR) or a bug in the way the RM is reporting it?
{code}
$ hive
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
hive> select count(*) from sample_07;
Query ID = hari_20140819164848_c03824c7-0e76-4507-b619-6a22cb0fbc4c
Total jobs = 1
Launching Job 1 out of 1
Status: Running (application id: application_1408444369445_0031)
Map 1: -/- Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 0/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 0/1
Map 1: 1/1 Reducer 2: 1/1
Status: Finished successfully
OK
823
Time taken: 8.492 seconds, Fetched: 1 row(s)
hive> set hive.execution.engine=mr;
hive> select count(*) from sample_07;
Query ID = hari_20140819164848_b620d990-b405-479c-be5b-d9616527cefe
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number>
In order to set a constant number of reducers: set mapreduce.job.reduces=<number>
Starting Job = job_1408444369445_0032, Tracking URL = http://lonsl1101827-data.uk.net.intra:8088/proxy/application_1408444369445_0032/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1408444369445_0032
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2014-08-19 16:48:35,242 Stage-1 map = 0%, reduce = 0%
2014-08-19 16:48:40,539 Stage-1 map = 100%, reduce = 0%
2014-08-19 16:48:44,676 Stage-1 map = 100%, reduce = 100%
Ended Job = job_1408444369445_0032
MapReduce Jobs Launched: Job 0: HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
823
Time taken: 16.579 seconds, Fetched: 1 row(s)
{code}
If I exit the hive shell and restart it instead using {code}--hiveconf hive.execution.engine=mr{code} to set the engine before the session is established, then it does a proper MapReduce job according to the RM, and it also takes the expected longer 25 secs, instead of the 8 secs with Tez or the 15 secs when trying to do MR inside a Tez session.
-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7782) tez default engine not overridden by hive.execution.engine=mr in hive cli session
[ https://issues.apache.org/jira/browse/HIVE-7782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hari Sekhon updated HIVE-7782: -- Description: I've deployed hive.execution.engine=tez as the default on my secondary HDP cluster. I find that hive cli interactive sessions where I do {code} set hive.execution.engine=mr {code} still execute with Tez, as shown in the Resource Manager applications view. Now this may make sense since it's connected to a Tez session by that point, but it's also misleading because the job progress output in the cli changes to look like MapReduce rather than Tez, and the query time is increased from 8 to 15-16 secs, but still less than the 25-30+ secs I usually see with MR. The Resource Manager shows both of these jobs as TEZ application type regardless of setting hive.execution.engine=mr. Is this a bug in the way Hive is submitting the job (Tez vs MR) or a bug in the way the RM is reporting it? {code} hive Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties hive> select count(*) from sample_07; Query ID = hari_20140819164848_c03824c7-0e76-4507-b619-6a22cb0fbc4c Total jobs = 1 Launching Job 1 out of 1 Status: Running (application id: application_1408444369445_0031) Map 1: -/- Reducer 2: 0/1 Map 1: 0/1 Reducer 2: 0/1 Map 1: 0/1 Reducer 2: 0/1 Map 1: 1/1 Reducer 2: 0/1 Map 1: 1/1 Reducer 2: 1/1 Status: Finished successfully OK 823 Time taken: 8.492 seconds, Fetched: 1 row(s) hive> set hive.execution.engine=mr; hive> select count(*) from sample_07; Query ID = hari_20140819164848_b620d990-b405-479c-be5b-d9616527cefe Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapreduce.job.reduces=<number> Starting Job = job_1408444369445_0032, Tracking URL = http://lonsl1101827-data.uk.net.intra:8088/proxy/application_1408444369445_0032/ Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1408444369445_0032 Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0 2014-08-19 16:48:35,242 Stage-1 map = 0%, reduce = 0% 2014-08-19 16:48:40,539 Stage-1 map = 100%, reduce = 0% 2014-08-19 16:48:44,676 Stage-1 map = 100%, reduce = 100% Ended Job = job_1408444369445_0032 MapReduce Jobs Launched: Job 0: HDFS Read: 0 HDFS Write: 0 SUCCESS Total MapReduce CPU Time Spent: 0 msec OK 823 Time taken: 16.579 seconds, Fetched: 1 row(s) {code} If I exit the hive shell and restart it instead using {code}--hiveconf hive.execution.engine=mr{code} to set the engine before the session is established, then it does a proper MapReduce job according to the RM, and it also takes the expected longer 25 secs, instead of the 8 secs with Tez or the 15 secs when trying to do MR inside a Tez session. was: I've deployed hive.execution.engine=tez as the default on my secondary HDP cluster. I find that hive cli interactive sessions where I do {code} set hive.execution.engine=mr {code} still execute with Tez, as shown in the Resource Manager applications view. Now this may make sense since it's connected to a Tez session by that point, but it's also misleading because the job progress output in the cli changes to look like MapReduce rather than Tez, and the query time is increased, although only to 15-16 secs rather than the 25-30+ secs I usually see with MR. The Resource Manager shows both of these jobs as TEZ application type. Is this a bug in the way Hive is submitting the job (Tez vs MR) or a bug in the way the RM is reporting it?
{code} hive Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties hive select count(*) from sample_07; Query ID = hari_20140819164848_c03824c7-0e76-4507-b619-6a22cb0fbc4c Total jobs = 1 Launching Job 1 out of 1 Status: Running (application id: application_1408444369445_0031) Map 1: -/- Reducer 2: 0/1 Map 1: 0/1 Reducer 2: 0/1 Map 1: 0/1 Reducer 2: 0/1 Map 1: 1/1 Reducer 2: 0/1 Map 1: 1/1 Reducer 2: 1/1 Status: Finished successfully OK 823 Time taken: 8.492 seconds, Fetched: 1 row(s) hive set hive.execution.engine=mr; hive select count(*) from sample_07; Query ID = hari_20140819164848_b620d990-b405-479c-be5b-d9616527cefe Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=number In order to limit the maximum number of reducers: set hive.exec.reducers.max=number In order to set a constant number of reducers: set mapreduce.job.reduces=number Starting Job = job_1408444369445_0032, Tracking URL = http://lonsl1101827-data.uk.net.intra:8088/proxy/application_1408444369445_0032/ Kill
Re: Review Request 24293: HIVE-4629: HS2 should support an API to retrieve query logs
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24293/#review50977 --- Hi, This patch looks really good! I was not clear when I said how we should define the new fetchResults method. I hope my response below is clear; if not, please let me know! service/src/java/org/apache/hive/service/cli/CLIServiceClient.java https://reviews.apache.org/r/24293/#comment88863 Thank you very much for removing the thrift enum! That resolves the thrift enum compatibility issue! I should have been more clear on the other issue I was describing. I have felt for some time that we should change the way we do RPC in Hive. Today we define specific methods for the use case at hand. This causes method explosion. For example, after this patch we would have three method signatures which fetch results. Going forward, I think we should define methods differently. For example, for this method I think we should define the classes FetchResultsRequest and FetchResultsResponse and then have a new method: FetchResultsResponse fetchResults(FetchResultsRequest request) throws HiveSQLException and then all of the arguments can be defined inside FetchResultsRequest. That way, every time we add an argument, we don't need to define a new public RPC method. I have described this approach in this mail here: http://mail-archives.apache.org/mod_mbox/hive-dev/201403.mbox/%3CCAFukC=6xss1kjgad7hv2v4wwoigjzctm1rujcczsocdj8x3...@mail.gmail.com%3E - Brock Noland On Aug. 14, 2014, 3:09 p.m., Dong Chen wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24293/ --- (Updated Aug. 14, 2014, 3:09 p.m.) Review request for hive. Repository: hive-git Description --- HIVE-4629: HS2 should support an API to retrieve query logs HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress.
Providing an API to retrieve query logs will help report progress to the client. Diffs - common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 3bfc681 service/if/TCLIService.thrift 80086b4 service/src/gen/thrift/gen-cpp/TCLIService_types.h 1b37fb5 service/src/gen/thrift/gen-cpp/TCLIService_types.cpp d5f98a8 service/src/gen/thrift/gen-javabean/org/apache/hive/service/cli/thrift/TFetchResultsReq.java 808b73f service/src/gen/thrift/gen-py/TCLIService/ttypes.py 2cbbdd8 service/src/gen/thrift/gen-rb/t_c_l_i_service_types.rb 93f9a81 service/src/java/org/apache/hive/service/cli/CLIService.java add37a1 service/src/java/org/apache/hive/service/cli/CLIServiceClient.java 87c10b9 service/src/java/org/apache/hive/service/cli/EmbeddedCLIServiceClient.java f665146 service/src/java/org/apache/hive/service/cli/FetchType.java PRE-CREATION service/src/java/org/apache/hive/service/cli/ICLIService.java c569796 service/src/java/org/apache/hive/service/cli/operation/GetCatalogsOperation.java c9fd5f9 service/src/java/org/apache/hive/service/cli/operation/GetColumnsOperation.java caf413d service/src/java/org/apache/hive/service/cli/operation/GetFunctionsOperation.java fd4e94d service/src/java/org/apache/hive/service/cli/operation/GetSchemasOperation.java ebca996 service/src/java/org/apache/hive/service/cli/operation/GetTableTypesOperation.java 05991e0 service/src/java/org/apache/hive/service/cli/operation/GetTablesOperation.java 315dbea service/src/java/org/apache/hive/service/cli/operation/GetTypeInfoOperation.java 0ec2543 service/src/java/org/apache/hive/service/cli/operation/HiveCommandOperation.java 3d3fddc service/src/java/org/apache/hive/service/cli/operation/LogDivertAppender.java PRE-CREATION service/src/java/org/apache/hive/service/cli/operation/MetadataOperation.java e0d17a1 service/src/java/org/apache/hive/service/cli/operation/Operation.java 45fbd61 service/src/java/org/apache/hive/service/cli/operation/OperationLog.java PRE-CREATION 
service/src/java/org/apache/hive/service/cli/operation/OperationManager.java 21c33bc service/src/java/org/apache/hive/service/cli/operation/SQLOperation.java de54ca1 service/src/java/org/apache/hive/service/cli/session/HiveSession.java 9785e95 service/src/java/org/apache/hive/service/cli/session/HiveSessionBase.java 4c3164e service/src/java/org/apache/hive/service/cli/session/HiveSessionImpl.java b39d64d service/src/java/org/apache/hive/service/cli/session/SessionManager.java 816bea4
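The request/response wrapper pattern Brock describes can be sketched as follows. The class and field names are illustrative, not the actual thrift-generated types; the point is that a new argument becomes a new field on the request class rather than a new overload of the RPC method:

```java
import java.util.ArrayList;
import java.util.List;

public class FetchResultsRpc {
    // Illustrative request type: adding a field later (e.g. fetchType) does
    // not break existing callers or force a new RPC method signature.
    static class FetchResultsRequest {
        String operationHandle;
        int maxRows = 100;
        String fetchType = "QUERY_OUTPUT";
    }

    // Illustrative response type: results and pagination state travel together.
    static class FetchResultsResponse {
        boolean hasMoreRows;
        List<String> rows = new ArrayList<String>();
    }

    // The single stable entry point; all variation lives in the request object.
    static FetchResultsResponse fetchResults(FetchResultsRequest req) {
        FetchResultsResponse resp = new FetchResultsResponse();
        resp.rows.add("row for " + req.operationHandle); // stub result
        resp.hasMoreRows = false;
        return resp;
    }

    public static void main(String[] args) {
        FetchResultsRequest req = new FetchResultsRequest();
        req.operationHandle = "op-1";
        FetchResultsResponse resp = fetchResults(req);
        System.out.println(resp.rows.size() + " " + resp.hasMoreRows);
    }
}
```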
[jira] [Commented] (HIVE-6093) table creation should fail when user does not have permissions on db
[ https://issues.apache.org/jira/browse/HIVE-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102368#comment-14102368 ] Thiruvel Thirumoolan commented on HIVE-6093: Thanks [~thejas] table creation should fail when user does not have permissions on db Key: HIVE-6093 URL: https://issues.apache.org/jira/browse/HIVE-6093 Project: Hive Issue Type: Bug Components: Authorization, HCatalog, Metastore Affects Versions: 0.12.0, 0.13.0 Reporter: Thiruvel Thirumoolan Assignee: Thiruvel Thirumoolan Priority: Minor Labels: authorization, metastore, security Fix For: 0.14.0 Attachments: HIVE-6093-1.patch, HIVE-6093.1.patch, HIVE-6093.1.patch, HIVE-6093.patch It's possible to create a table under a database where the user does not have write permission. It can be done by specifying a LOCATION where the user has write access (say /tmp/foo). This should be restricted. HdfsAuthorizationProvider (which typically runs on the client) checks the database directory during table creation, but StorageBasedAuthorizationProvider does not. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7773) Union all query finished with errors [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7773: --- Attachment: HIVE-7773.2-spark.patch Same patch, I just removed the section commented out in IOContext Union all query finished with errors [Spark Branch] --- Key: HIVE-7773 URL: https://issues.apache.org/jira/browse/HIVE-7773 Project: Hive Issue Type: Bug Components: Spark Reporter: Rui Li Priority: Critical Attachments: HIVE-7773.2-spark.patch, HIVE-7773.spark.patch When I run a union all query, I found the following error in spark log (the query finished with correct results though): {noformat} java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: 
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93) ... 16 more {noformat} Judging from the log, I think we don't properly handle the input paths when cloning the job conf, so it may also affect other queries with multiple maps or reduces. -- This message was sent by Atlassian JIRA (v6.2#6252)
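The suspected bug — cloned job confs keeping the original input paths, so conf and path disagree at map init — suggests a clone-then-override pattern. A sketch using plain java.util.Properties as a stand-in for Hadoop's JobConf (property name and paths are illustrative):

```java
import java.util.Properties;

public class ClonedConfPaths {
    // Plain java.util.Properties stands in for Hadoop's JobConf here; the
    // point is only the pattern: clone the base conf, THEN override the input
    // path for each union branch, so no clone inherits another branch's paths.
    static Properties[] cloneWithPaths(Properties base, String[] branchPaths) {
        Properties[] clones = new Properties[branchPaths.length];
        for (int i = 0; i < branchPaths.length; i++) {
            Properties clone = new Properties();
            clone.putAll(base);                                    // copy everything
            clone.setProperty("mapred.input.dir", branchPaths[i]); // then override
            clones[i] = clone;
        }
        return clones;
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("mapred.input.dir", "/warehouse/t1");
        Properties[] clones =
            cloneWithPaths(base, new String[] { "/warehouse/t1", "/warehouse/t2" });
        System.out.println(clones[1].getProperty("mapred.input.dir")); // /warehouse/t2
    }
}
```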
[jira] [Commented] (HIVE-7773) Union all query finished with errors [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102374#comment-14102374 ] Brock Noland commented on HIVE-7773: Hi [~lirui], yes thank you very much for updating IOContext. I have removed the section of code which was commented out. I also hit that issue when looking at joins! FYI [~szehon] +1 pending tests Union all query finished with errors [Spark Branch] --- Key: HIVE-7773 URL: https://issues.apache.org/jira/browse/HIVE-7773 Project: Hive Issue Type: Bug Components: Spark Reporter: Rui Li Priority: Critical Attachments: HIVE-7773.2-spark.patch, HIVE-7773.spark.patch When I run a union all query, I found the following error in spark log (the query finished with correct results though): {noformat} java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93) ... 16 more {noformat} Judging from the log, I think we don't properly handle the input paths when cloning the job conf, so it may also affect other queries with multiple maps or reduces. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7769) add --SORT_BEFORE_DIFF to union all .q tests
[ https://issues.apache.org/jira/browse/HIVE-7769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102383#comment-14102383 ] Brock Noland commented on HIVE-7769: +1 add --SORT_BEFORE_DIFF to union all .q tests Key: HIVE-7769 URL: https://issues.apache.org/jira/browse/HIVE-7769 Project: Hive Issue Type: Bug Reporter: Na Yang Assignee: Na Yang Attachments: HIVE-7769.patch Some union all test cases do not generate deterministic ordered result. We need to add --SORT_BEFORE_DIFF to those .q tests -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7773) Union all query finished with errors [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7773: --- Issue Type: Sub-task (was: Bug) Parent: HIVE-7292 Union all query finished with errors [Spark Branch] --- Key: HIVE-7773 URL: https://issues.apache.org/jira/browse/HIVE-7773 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Priority: Critical Attachments: HIVE-7773.2-spark.patch, HIVE-7773.spark.patch When I run a union all query, I found the following error in spark log (the query finished with correct results though): {noformat} java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: 
org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93) ... 16 more {noformat} Judging from the log, I think we don't properly handle the input paths when cloning the job conf, so it may also affect other queries with multiple maps or reduces. -- This message was sent by Atlassian JIRA (v6.2#6252)
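[Editor's note] The "Configuration and input path are inconsistent" error above is thrown when the map-side operator cannot find the current input path in the path-to-alias mapping carried by the (cloned) job conf. A minimal sketch of that lookup follows; the class and map below are illustrative stand-ins, not Hive's actual `MapOperator` code:

```java
import java.util.HashMap;
import java.util.Map;

class AliasLookup {
    // Simplified stand-in for the pathToAliases table the map operator consults.
    static final Map<String, String> PATH_TO_ALIAS = new HashMap<>();
    static {
        PATH_TO_ALIAS.put("/warehouse/db/t1", "t1");
        PATH_TO_ALIAS.put("/warehouse/db/t2", "t2");
    }

    // Returns the alias for an input path, or throws the kind of error seen in
    // the log when a cloned conf was not updated with that clone's input paths.
    static String resolveAlias(String inputPath) {
        String alias = PATH_TO_ALIAS.get(inputPath);
        if (alias == null) {
            throw new IllegalStateException(
                "Configuration and input path are inconsistent: " + inputPath);
        }
        return alias;
    }
}
```

If each branch of the union gets a cloned conf but the clones all keep the original path list, one branch's lookup fails exactly this way, which is consistent with Rui's diagnosis.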
[jira] [Comment Edited] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102460#comment-14102460 ] Brock Noland edited comment on HIVE-4629 at 8/19/14 4:58 PM: - Hi Dong, I tried posting this on RB but it went down. Thank you very much for removing the thrift enum compatibility problem! I had another comment with regards to the method signature which I think I did not explain well. I think the new method should be: {noformat} FetchResultsResponse fetchResults(FetchResultsRequest) throws ... {noformat} The problem with how we've defined RPC methods to date has led to an explosion of RPC methods which is problematic. This is described in [more detail in this thread|http://mail-archives.apache.org/mod_mbox/hive-dev/201403.mbox/%3CCAFukC=6xss1kjgad7hv2v4wwoigjzctm1rujcczsocdj8x3...@mail.gmail.com%3E]. Let me know what you think!! Cheers, Brock was (Author: brocknoland): Hi Dong, I tried posting this on RB but it went down. Thank you very much for removing the thrift enum compatibility problem! I had another comment with regards to the method signature which I think I did not explain well. I think the new method should be: {noformat} FetchResultsResponse fetchResults(FetchResultsRequest) throws ... {noformat} The problem with how we've defined RPC methods to date has led to an explosion of RPC methods which is problematic. This is described in [more detail in this thread|http://mail-archives.apache.org/mod_mbox/hive-dev/201403.mbox/%3CCAFukC=6xss1kjgad7hv2v4wwoigjzctm1rujcczsocdj8x3...@mail.gmail.com%3E ]. Let me know what you think!! 
Cheers, Brock HS2 should support an API to retrieve query logs Key: HIVE-4629 URL: https://issues.apache.org/jira/browse/HIVE-4629 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Shreepadma Venugopalan Assignee: Dong Chen Attachments: HIVE-4629-no_thrift.1.patch, HIVE-4629.1.patch, HIVE-4629.2.patch, HIVE-4629.3.patch.txt, HIVE-4629.4.patch, HIVE-4629.5.patch, HIVE-4629.6.patch HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client. -- This message was sent by Atlassian JIRA (v6.2#6252)
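[Editor's note] The request/response shape Brock proposes avoids adding a new RPC method every time a parameter is added: new knobs become new optional fields on the request struct, while the method signature stays stable. A hedged Java sketch of the pattern (`FetchResultsRequest`/`FetchResultsResponse` are the proposed names; the fields and row counts here are purely illustrative):

```java
class FetchRpcSketch {
    // All call parameters live on one struct; adding a field later
    // does not change the RPC method signature.
    static class FetchResultsRequest {
        String operationHandle;
        int maxRows = 100;          // new knobs become new fields, not new methods
        boolean fetchLogs = false;
    }

    static class FetchResultsResponse {
        int rowsReturned;
        boolean hasMoreRows;
    }

    // Single stable entry point, mirroring the proposed
    // "FetchResultsResponse fetchResults(FetchResultsRequest)".
    static FetchResultsResponse fetchResults(FetchResultsRequest req) {
        FetchResultsResponse resp = new FetchResultsResponse();
        int available = 42;                         // pretend 42 rows exist
        resp.rowsReturned = Math.min(req.maxRows, available);
        resp.hasMoreRows = available > req.maxRows;
        return resp;
    }
}
```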
[jira] [Commented] (HIVE-4629) HS2 should support an API to retrieve query logs
[ https://issues.apache.org/jira/browse/HIVE-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102460#comment-14102460 ] Brock Noland commented on HIVE-4629: Hi Dong, I tried posting this on RB but it went down. Thank you very much for removing the thrift enum compatibility problem! I had another comment with regards to the method signature which I think I did not explain well. I think the new method should be: {noformat} FetchResultsResponse fetchResults(FetchResultsRequest) throws ... {noformat} The problem with how we've defined RPC methods to date has led to an explosion of RPC methods which is problematic. This is described in [more detail in this thread|http://mail-archives.apache.org/mod_mbox/hive-dev/201403.mbox/%3CCAFukC=6xss1kjgad7hv2v4wwoigjzctm1rujcczsocdj8x3...@mail.gmail.com%3E ]. Let me know what you think!! Cheers, Brock HS2 should support an API to retrieve query logs Key: HIVE-4629 URL: https://issues.apache.org/jira/browse/HIVE-4629 Project: Hive Issue Type: Sub-task Components: HiveServer2 Reporter: Shreepadma Venugopalan Assignee: Dong Chen Attachments: HIVE-4629-no_thrift.1.patch, HIVE-4629.1.patch, HIVE-4629.2.patch, HIVE-4629.3.patch.txt, HIVE-4629.4.patch, HIVE-4629.5.patch, HIVE-4629.6.patch HiveServer2 should support an API to retrieve query logs. This is particularly relevant because HiveServer2 supports async execution but doesn't provide a way to report progress. Providing an API to retrieve query logs will help report progress to the client. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HIVE-7783) QTests throw exception during cleanup
Ashish Kumar Singh created HIVE-7783: Summary: QTests throw exception during cleanup Key: HIVE-7783 URL: https://issues.apache.org/jira/browse/HIVE-7783 Project: Hive Issue Type: Bug Reporter: Ashish Kumar Singh qTests during cleanup try to drop read-only tables and throw exceptions. -- This message was sent by Atlassian JIRA (v6.2#6252)

[jira] [Assigned] (HIVE-7783) QTests throw exception during cleanup
[ https://issues.apache.org/jira/browse/HIVE-7783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Kumar Singh reassigned HIVE-7783: Assignee: Ashish Kumar Singh QTests throw exception during cleanup - Key: HIVE-7783 URL: https://issues.apache.org/jira/browse/HIVE-7783 Project: Hive Issue Type: Bug Reporter: Ashish Kumar Singh Assignee: Ashish Kumar Singh qTests during cleanup try to drop read-only tables and throw exceptions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7734) Join stats annotation rule is not updating columns statistics correctly
[ https://issues.apache.org/jira/browse/HIVE-7734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7734: - Resolution: Fixed Fix Version/s: (was: 0.13.0) 0.14.0 Status: Resolved (was: Patch Available) Patch committed to trunk. Thanks Gunther and Brock for the review. Join stats annotation rule is not updating columns statistics correctly --- Key: HIVE-7734 URL: https://issues.apache.org/jira/browse/HIVE-7734 Project: Hive Issue Type: Sub-task Components: Query Processor, Statistics Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Fix For: 0.14.0 Attachments: HIVE-7734.1.patch, HIVE-7734.2.patch HIVE-7679 is not doing the correct thing. The scale-down/up factor used to update column stats was wrong, as ratio = newRowCount/oldRowCount is always infinite (oldRowCount = 0). The old row count should be retrieved from the parent corresponding to the current column whose statistics are being updated. -- This message was sent by Atlassian JIRA (v6.2#6252)
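[Editor's note] The core of the fix above is where the denominator comes from: scaling column stats by ratio = newRowCount/oldRowCount is meaningless when oldRowCount is 0, and the old count must come from the parent operator feeding the column. A hedged sketch of guarded NDV scaling (not the actual StatsRulesProcFactory code; method name and clamping policy are illustrative):

```java
class StatsScale {
    // Scales a column's distinct-value count by newRowCount/oldRowCount,
    // guarding against a zero denominator (which yields Infinity/NaN).
    static long scaleNdv(long ndv, long oldRowCount, long newRowCount) {
        if (oldRowCount <= 0) {
            return Math.min(ndv, newRowCount); // no usable ratio; just clamp
        }
        double ratio = (double) newRowCount / (double) oldRowCount;
        long scaled = (long) Math.ceil(ndv * ratio);
        // NDV can never exceed the number of rows, and never drops below 1.
        return Math.max(1, Math.min(scaled, newRowCount));
    }
}
```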
Review Request 24853: HIVE-7783: QTests throw exception during cleanup
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24853/ --- Review request for hive. Repository: hive-git Description --- HIVE-7783: QTests throw exception during cleanup Diffs - itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java af4a3e575ca5aea2746a5b51862ff178b59e403d Diff: https://reviews.apache.org/r/24853/diff/ Testing --- Ran a couple of qTests locally. Thanks, Ashish Singh
Re: Review Request 24853: HIVE-7783: QTests throw exception during cleanup
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24853/ --- (Updated Aug. 19, 2014, 5:29 p.m.) Review request for hive. Changes --- Link Hive JIRA. Bugs: HIVE-7783 https://issues.apache.org/jira/browse/HIVE-7783 Repository: hive-git Description --- HIVE-7783: QTests throw exception during cleanup Diffs - itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java af4a3e575ca5aea2746a5b51862ff178b59e403d Diff: https://reviews.apache.org/r/24853/diff/ Testing --- Ran a couple of qTests locally. Thanks, Ashish Singh
[jira] [Updated] (HIVE-7571) RecordUpdater should read virtual columns from row
[ https://issues.apache.org/jira/browse/HIVE-7571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-7571: - Status: Patch Available (was: Open) RecordUpdater should read virtual columns from row -- Key: HIVE-7571 URL: https://issues.apache.org/jira/browse/HIVE-7571 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7571.WIP.patch, HIVE-7571.patch Currently RecordUpdater.update and delete take rowid and original transaction as parameters. These values are already present in the row as part of the new ROW__ID virtual column in HIVE-7513, and thus can be read by the writer from there. And the writer will already have to handle skipping ROW__ID when writing, so it needs to be aware of that column anyway. We could instead read the values from ROW__ID and then remove it from the object inspector in FileSinkOperator, but this will be hard in the vectorization case where rows are being dealt with 10k at a time. For these reasons it makes more sense to do this work in the writer. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7783) QTests throw exception during cleanup
[ https://issues.apache.org/jira/browse/HIVE-7783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Kumar Singh updated HIVE-7783: - Attachment: HIVE-7783.patch RB: https://reviews.apache.org/r/24853/ QTests throw exception during cleanup - Key: HIVE-7783 URL: https://issues.apache.org/jira/browse/HIVE-7783 Project: Hive Issue Type: Bug Reporter: Ashish Kumar Singh Assignee: Ashish Kumar Singh Attachments: HIVE-7783.patch qTests during cleanup try to drop read-only tables and throw exceptions. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7571) RecordUpdater should read virtual columns from row
[ https://issues.apache.org/jira/browse/HIVE-7571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated HIVE-7571: - Attachment: HIVE-7571.patch A patch that changes the RecordUpdater interface to assume that the ROWID is passed as a virtual column. This changes the update and delete calls to no longer explicitly ask for transaction id and row id in the interface. RecordUpdater should read virtual columns from row -- Key: HIVE-7571 URL: https://issues.apache.org/jira/browse/HIVE-7571 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7571.WIP.patch, HIVE-7571.patch Currently RecordUpdater.update and delete take rowid and original transaction as parameters. These values are already present in the row as part of the new ROW__ID virtual column in HIVE-7513, and thus can be read by the writer from there. And the writer will already have to handle skipping ROW__ID when writing, so it needs to be aware of that column anyway. We could instead read the values from ROW__ID and then remove it from the object inspector in FileSinkOperator, but this will be hard in the vectorization case where rows are being dealt with 10k at a time. For these reasons it makes more sense to do this work in the writer. -- This message was sent by Atlassian JIRA (v6.2#6252)
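[Editor's note] The interface change described above can be pictured as moving the original transaction id and row id from explicit parameters into the row itself, where the writer reads them back out of the ROW__ID struct. A simplified sketch (the real interface lives in Hive's ql module; the row layout and types here are stand-ins, not Hive's actual classes):

```java
class RecordUpdaterSketch {
    // Simplified stand-in for the ROW__ID virtual column from HIVE-7513.
    static class RowId {
        final long originalTxn;
        final long rowId;
        RowId(long originalTxn, long rowId) {
            this.originalTxn = originalTxn;
            this.rowId = rowId;
        }
    }

    // Model a row as [ROW__ID, col1, ...]; helper for building rows.
    static Object[] makeRow(long originalTxn, long rowId, Object value) {
        return new Object[] { new RowId(originalTxn, rowId), value };
    }

    // New-style call: the writer digs the ids out of the row instead of
    // taking them as explicit update(txn, originalTxn, rowId, row) parameters.
    static long update(long currentTxn, Object[] row) {
        RowId id = (RowId) row[0];
        // A real writer would rewrite the record and skip ROW__ID on output.
        return id.rowId;
    }
}
```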
[jira] [Created] (HIVE-7784) Created the needed indexes on Hive.PART_COL_STATS for CBO
Mostafa Mokhtar created HIVE-7784: - Summary: Created the needed indexes on Hive.PART_COL_STATS for CBO Key: HIVE-7784 URL: https://issues.apache.org/jira/browse/HIVE-7784 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Mostafa Mokhtar Fix For: 0.14.0 With CBO we need the correct set of indexes to provide an efficient Read/Write access. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24853: HIVE-7783: QTests throw exception during cleanup
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24853/#review50985 --- itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java https://reviews.apache.org/r/24853/#comment88870 Isn't this fixed by HIVE-7684? Are you still seeing errors on trunk? - Venki Korukanti On Aug. 19, 2014, 5:29 p.m., Ashish Singh wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24853/ --- (Updated Aug. 19, 2014, 5:29 p.m.) Review request for hive. Bugs: HIVE-7783 https://issues.apache.org/jira/browse/HIVE-7783 Repository: hive-git Description --- HIVE-7783: QTests throw exception during cleanup Diffs - itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java af4a3e575ca5aea2746a5b51862ff178b59e403d Diff: https://reviews.apache.org/r/24853/diff/ Testing --- Ran a couple of qTests locally. Thanks, Ashish Singh
[jira] [Updated] (HIVE-7784) Created the needed indexes on Hive.PART_COL_STATS for CBO
[ https://issues.apache.org/jira/browse/HIVE-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-7784: -- Description: With CBO we need the correct set of indexes to provide an efficient Read/Write access. These indexes improve the performance of explain plan and analyze table by 60% and 300%, respectively. {code} MySQL CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS (DB_NAME,TABLE_NAME,COLUMN_NAME) USING BTREE; MsSQL CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS (DB_NAME,TABLE_NAME,COLUMN_NAME); Oracle CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS (DB_NAME,TABLE_NAME,COLUMN_NAME); Postgres CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS USING btree (DB_NAME,TABLE_NAME,COLUMN_NAME); {code} was: With CBO we need the correct set of indexes to provide an efficient Read/Write access. Created the needed indexes on Hive.PART_COL_STATS for CBO -- Key: HIVE-7784 URL: https://issues.apache.org/jira/browse/HIVE-7784 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Mostafa Mokhtar Fix For: 0.14.0 With CBO we need the correct set of indexes to provide an efficient Read/Write access. These indexes improve the performance of explain plan and analyze table by 60% and 300%, respectively. {code} MySQL CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS (DB_NAME,TABLE_NAME,COLUMN_NAME) USING BTREE; MsSQL CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS (DB_NAME,TABLE_NAME,COLUMN_NAME); Oracle CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS (DB_NAME,TABLE_NAME,COLUMN_NAME); Postgres CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS USING btree (DB_NAME,TABLE_NAME,COLUMN_NAME); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24853: HIVE-7783: QTests throw exception during cleanup
On Aug. 19, 2014, 5:37 p.m., Venki Korukanti wrote: itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java, line 629 https://reviews.apache.org/r/24853/diff/1/?file=664330#file664330line629 Isn't this fixed by HIVE-7684? Are you still seeing errors on trunk? Ahh.. I had an older version of trunk. Good to know, this is already fixed. Thanks. - Ashish --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24853/#review50985 --- On Aug. 19, 2014, 5:29 p.m., Ashish Singh wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24853/ --- (Updated Aug. 19, 2014, 5:29 p.m.) Review request for hive. Bugs: HIVE-7783 https://issues.apache.org/jira/browse/HIVE-7783 Repository: hive-git Description --- HIVE-7783: QTests throw exception during cleanup Diffs - itests/util/src/main/java/org/apache/hadoop/hive/ql/QTestUtil.java af4a3e575ca5aea2746a5b51862ff178b59e403d Diff: https://reviews.apache.org/r/24853/diff/ Testing --- Ran a couple of qTests locally. Thanks, Ashish Singh
[jira] [Updated] (HIVE-7771) ORC PPD fails for some decimal predicates
[ https://issues.apache.org/jira/browse/HIVE-7771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7771: - Attachment: HIVE-7771.2.patch [~daijy] I added BigDecimal support in SARG creation in this patch. ORC PPD fails for some decimal predicates - Key: HIVE-7771 URL: https://issues.apache.org/jira/browse/HIVE-7771 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7771.1.patch, HIVE-7771.2.patch Some queries like {code} select * from table where dcol=11.22BD; {code} fail when ORC predicate pushdown is enabled. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HIVE-7783) QTests throw exception during cleanup
[ https://issues.apache.org/jira/browse/HIVE-7783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Kumar Singh resolved HIVE-7783. -- Resolution: Duplicate Fixed by HIVE-7684. QTests throw exception during cleanup - Key: HIVE-7783 URL: https://issues.apache.org/jira/browse/HIVE-7783 Project: Hive Issue Type: Bug Reporter: Ashish Kumar Singh Assignee: Ashish Kumar Singh Attachments: HIVE-7783.patch qTests during cleanup try to drop read-only tables and throw exceptions. -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: Review Request 24834: HIVE-7771: ORC PPD fails for some decimal predicates
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24834/ --- (Updated Aug. 19, 2014, 5:51 p.m.) Review request for hive, Gopal V and Gunther Hagleitner. Changes --- Added support for BigDecimal in SARG construction. Repository: hive-git Description --- Some queries like {code} select * from table where dcol=11.22BD; {code} fail when ORC predicate pushdown is enabled. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java f5023bb ql/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java 2c53f65 ql/src/test/queries/clientpositive/orc_ppd_decimal.q a93590e ql/src/test/results/clientpositive/orc_ppd_decimal.q.out 0c11ea8 Diff: https://reviews.apache.org/r/24834/diff/ Testing --- Thanks, Prasanth_J
[jira] [Commented] (HIVE-7771) ORC PPD fails for some decimal predicates
[ https://issues.apache.org/jira/browse/HIVE-7771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102532#comment-14102532 ] Prasanth J commented on HIVE-7771: -- [~daijy] The predicate object will support BigDecimal now. However, during SARG evaluation, BigDecimal will be converted to HiveDecimal to match the type of decimal column statistics in ORC. ORC PPD fails for some decimal predicates - Key: HIVE-7771 URL: https://issues.apache.org/jira/browse/HIVE-7771 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7771.1.patch, HIVE-7771.2.patch Some queries like {code} select * from table where dcol=11.22BD; {code} fail when ORC predicate pushdown is enabled. -- This message was sent by Atlassian JIRA (v6.2#6252)
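[Editor's note] Normalizing the predicate literal to one decimal representation before evaluation matters because decimal types do not compare the way object equality suggests. With java.math.BigDecimal, for instance, equals() is scale-sensitive while compareTo() is not, which is exactly the kind of mismatch a SARG evaluator must avoid when comparing a literal against column statistics (this illustrates the general pitfall, not Hive's HiveDecimal code):

```java
import java.math.BigDecimal;

class DecimalCompare {
    // equals() distinguishes 11.22 from 11.220; compareTo() does not.
    static boolean sameValue(BigDecimal a, BigDecimal b) {
        return a.compareTo(b) == 0;
    }
}
```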
Dropping support for JDK6 in Apache Hadoop
[Apologies for the wide distribution.] Dear HBase/Hive/Pig/Oozie communities, We, over at Hadoop, are considering dropping support for JDK6 this year. As you may be aware, we just released hadoop-2.5.0 and are now considering making the next release, i.e. hadoop-2.6.0, the *last* release of Apache Hadoop which supports JDK6. This means, from hadoop-2.7.0 onwards we will not support JDK6 anymore and we *may* start relying on JDK7-specific apis. Now, the above is a proposal and we do not want to pull the trigger without talking to projects downstream - hence the request for your feedback. Please feel free to forward this to other communities you might deem to be at risk from this too. thanks, Arun -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
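[Editor's note] For downstream projects gauging the impact, "JDK7-specific APIs" means things like try-with-resources and the java.nio.file package, neither of which compiles on JDK6. A small illustration of both (this is a sketch of what such APIs look like, not a statement of which APIs Hadoop will actually adopt):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class Jdk7Features {
    // Both features here require JDK7+: try-with-resources
    // (auto-closes the writer) and the java.nio.file API.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("jdk7-demo", ".txt");
            try (BufferedWriter w = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
                w.write(text);
            }
            String back = new String(Files.readAllBytes(tmp), StandardCharsets.UTF_8);
            Files.deleteIfExists(tmp);
            return back;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }
}
```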
Re: Review Request 24834: HIVE-7771: ORC PPD fails for some decimal predicates
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24834/ --- (Updated Aug. 19, 2014, 5:59 p.m.) Review request for hive, Gopal V and Gunther Hagleitner. Changes --- Added unit test for big decimal support in search argument. Repository: hive-git Description --- Some queries like {code} select * from table where dcol=11.22BD; {code} fail when ORC predicate pushdown is enabled. Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/io/orc/RecordReaderImpl.java f5023bb ql/src/java/org/apache/hadoop/hive/ql/io/sarg/SearchArgumentImpl.java 2c53f65 ql/src/test/org/apache/hadoop/hive/ql/io/sarg/TestSearchArgumentImpl.java b1524f7 ql/src/test/queries/clientpositive/orc_ppd_decimal.q a93590e ql/src/test/results/clientpositive/orc_ppd_decimal.q.out 0c11ea8 Diff: https://reviews.apache.org/r/24834/diff/ Testing --- Thanks, Prasanth_J
[jira] [Updated] (HIVE-7771) ORC PPD fails for some decimal predicates
[ https://issues.apache.org/jira/browse/HIVE-7771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth J updated HIVE-7771: - Attachment: HIVE-7771.3.patch Added unit test for big decimal support in search argument. ORC PPD fails for some decimal predicates - Key: HIVE-7771 URL: https://issues.apache.org/jira/browse/HIVE-7771 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7771.1.patch, HIVE-7771.2.patch, HIVE-7771.3.patch Some queries like {code} select * from table where dcol=11.22BD; {code} fail when ORC predicate pushdown is enabled. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6361) Un-fork Sqlline
[ https://issues.apache.org/jira/browse/HIVE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian Hyde updated HIVE-6361: -- Attachment: HIVE-6361.patch Un-fork Sqlline --- Key: HIVE-6361 URL: https://issues.apache.org/jira/browse/HIVE-6361 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.12.0 Reporter: Julian Hyde Assignee: Julian Hyde Attachments: HIVE-6361.patch I propose to merge the two development forks of sqlline: Hive's beeline module, and the fork at https://github.com/julianhyde/sqlline. How did the forks come about? Hive’s SQL command-line interface Beeline was created by forking Sqlline (see HIVE-987, HIVE-3100), which at the time was a useful but low-activity project languishing on SourceForge without an active owner. Around the same time, Julian Hyde independently started a github repo based on the same code base. Now several projects are using Julian Hyde's sqlline, including Apache Drill, Apache Phoenix, Cascading Lingual and Optiq. Merging these two forks will allow us to pool our resources. (Case in point: Drill issue DRILL-327 had already been fixed in a later version of sqlline; it still exists in beeline.) I propose the following steps: 1. Copy Julian Hyde's sqlline as a new Hive module, hive-sqlline. 2. Port fixes to hive-beeline into hive-sqlline. 3. Make hive-beeline depend on hive-sqlline, and remove code that is identical. What remains in the hive-beeline module is Beeline.java (a derived class of Sqlline.java) and Hive-specific extensions. 4. Make the hive-sqlline the official successor to Julian Hyde's sqlline. This achieves continuity for Hive’s users, gives the users of the non-Hive sqlline a version with minimal dependencies, unifies the two code lines, and brings everything under the Apache roof. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6361) Un-fork Sqlline
[ https://issues.apache.org/jira/browse/HIVE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian Hyde updated HIVE-6361: -- Attachment: (was: HIVE-6361.patch) Un-fork Sqlline --- Key: HIVE-6361 URL: https://issues.apache.org/jira/browse/HIVE-6361 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.12.0 Reporter: Julian Hyde Assignee: Julian Hyde I propose to merge the two development forks of sqlline: Hive's beeline module, and the fork at https://github.com/julianhyde/sqlline. How did the forks come about? Hive’s SQL command-line interface Beeline was created by forking Sqlline (see HIVE-987, HIVE-3100), which at the time was a useful but low-activity project languishing on SourceForge without an active owner. Around the same time, Julian Hyde independently started a github repo based on the same code base. Now several projects are using Julian Hyde's sqlline, including Apache Drill, Apache Phoenix, Cascading Lingual and Optiq. Merging these two forks will allow us to pool our resources. (Case in point: Drill issue DRILL-327 had already been fixed in a later version of sqlline; it still exists in beeline.) I propose the following steps: 1. Copy Julian Hyde's sqlline as a new Hive module, hive-sqlline. 2. Port fixes to hive-beeline into hive-sqlline. 3. Make hive-beeline depend on hive-sqlline, and remove code that is identical. What remains in the hive-beeline module is Beeline.java (a derived class of Sqlline.java) and Hive-specific extensions. 4. Make the hive-sqlline the official successor to Julian Hyde's sqlline. This achieves continuity for Hive’s users, gives the users of the non-Hive sqlline a version with minimal dependencies, unifies the two code lines, and brings everything under the Apache roof. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6361) Un-fork Sqlline
[ https://issues.apache.org/jira/browse/HIVE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian Hyde updated HIVE-6361: -- Status: Open (was: Patch Available) Un-fork Sqlline --- Key: HIVE-6361 URL: https://issues.apache.org/jira/browse/HIVE-6361 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.12.0 Reporter: Julian Hyde Assignee: Julian Hyde I propose to merge the two development forks of sqlline: Hive's beeline module, and the fork at https://github.com/julianhyde/sqlline. How did the forks come about? Hive’s SQL command-line interface Beeline was created by forking Sqlline (see HIVE-987, HIVE-3100), which at the time was a useful but low-activity project languishing on SourceForge without an active owner. Around the same time, Julian Hyde independently started a github repo based on the same code base. Now several projects are using Julian Hyde's sqlline, including Apache Drill, Apache Phoenix, Cascading Lingual and Optiq. Merging these two forks will allow us to pool our resources. (Case in point: Drill issue DRILL-327 had already been fixed in a later version of sqlline; it still exists in beeline.) I propose the following steps: 1. Copy Julian Hyde's sqlline as a new Hive module, hive-sqlline. 2. Port fixes to hive-beeline into hive-sqlline. 3. Make hive-beeline depend on hive-sqlline, and remove code that is identical. What remains in the hive-beeline module is Beeline.java (a derived class of Sqlline.java) and Hive-specific extensions. 4. Make the hive-sqlline the official successor to Julian Hyde's sqlline. This achieves continuity for Hive’s users, gives the users of the non-Hive sqlline a version with minimal dependencies, unifies the two code lines, and brings everything under the Apache roof. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6361) Un-fork Sqlline
[ https://issues.apache.org/jira/browse/HIVE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian Hyde updated HIVE-6361: -- Status: Patch Available (was: Open) Attaching re-based patch. Commit db5e7181d329331d76d6d53741beb87b44f5263f, parent commit 253a869dc62c7d36b1020a70932ddd35cb44cb81. Un-fork Sqlline --- Key: HIVE-6361 URL: https://issues.apache.org/jira/browse/HIVE-6361 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.12.0 Reporter: Julian Hyde Assignee: Julian Hyde I propose to merge the two development forks of sqlline: Hive's beeline module, and the fork at https://github.com/julianhyde/sqlline. How did the forks come about? Hive’s SQL command-line interface Beeline was created by forking Sqlline (see HIVE-987, HIVE-3100), which at the time was a useful but low-activity project languishing on SourceForge without an active owner. Around the same time, Julian Hyde independently started a github repo based on the same code base. Now several projects are using Julian Hyde's sqlline, including Apache Drill, Apache Phoenix, Cascading Lingual and Optiq. Merging these two forks will allow us to pool our resources. (Case in point: Drill issue DRILL-327 had already been fixed in a later version of sqlline; it still exists in beeline.) I propose the following steps: 1. Copy Julian Hyde's sqlline as a new Hive module, hive-sqlline. 2. Port fixes to hive-beeline into hive-sqlline. 3. Make hive-beeline depend on hive-sqlline, and remove code that is identical. What remains in the hive-beeline module is Beeline.java (a derived class of Sqlline.java) and Hive-specific extensions. 4. Make the hive-sqlline the official successor to Julian Hyde's sqlline. This achieves continuity for Hive’s users, gives the users of the non-Hive sqlline a version with minimal dependencies, unifies the two code lines, and brings everything under the Apache roof. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7771) ORC PPD fails for some decimal predicates
[ https://issues.apache.org/jira/browse/HIVE-7771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102564#comment-14102564 ] Daniel Dai commented on HIVE-7771: -- +1, works for me now. ORC PPD fails for some decimal predicates - Key: HIVE-7771 URL: https://issues.apache.org/jira/browse/HIVE-7771 Project: Hive Issue Type: Bug Affects Versions: 0.14.0 Reporter: Prasanth J Assignee: Prasanth J Attachments: HIVE-7771.1.patch, HIVE-7771.2.patch, HIVE-7771.3.patch Some queries like {code} select * from table where dcol=11.22BD; {code} fail when ORC predicate pushdown is enabled. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7747) Submitting a query to Spark from HiveServer2 fails [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102589#comment-14102589 ] Venki Korukanti commented on HIVE-7747: --- When I run the spark job locally (spark.master=local), it completes successfully. It repros only on the Spark cluster. Similar exception is seen in HIVE-7437. HIVE-7437 suggested shading jetty/servlet classes, but I still see the same exception. Submitting a query to Spark from HiveServer2 fails [Spark Branch] - Key: HIVE-7747 URL: https://issues.apache.org/jira/browse/HIVE-7747 Project: Hive Issue Type: Bug Components: Spark Affects Versions: spark-branch Reporter: Venki Korukanti Assignee: Venki Korukanti Fix For: spark-branch {{spark.serializer}} is set to {{org.apache.spark.serializer.KryoSerializer}}. Same configuration works fine from Hive CLI. Spark tasks fails with following error: {code} Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 9, 192.168.168.216): java.lang.IllegalStateException: unread block data java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2421) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1382) java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990) java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915) java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798) java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350) java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62) org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:84) org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181) java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) java.lang.Thread.run(Thread.java:744) {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7774) Issues with location path for temporary external tables
[ https://issues.apache.org/jira/browse/HIVE-7774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102593#comment-14102593 ] Ashutosh Chauhan commented on HIVE-7774: +1 Issues with location path for temporary external tables --- Key: HIVE-7774 URL: https://issues.apache.org/jira/browse/HIVE-7774 Project: Hive Issue Type: Bug Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-7774.1.patch Depending on the location string passed into temp external table, a query requiring a map/reduce job will fail. Example: {noformat} create temporary external table tmp1 (c1 string) location '/tmp/tmp1'; describe extended tmp1; select count(*) from tmp1; {noformat} Will result in the following error: {noformat} Diagnostic Messages for this Task: Error: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:426) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342) at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:415) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614) at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163) Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 
9 more Caused by: java.lang.RuntimeException: Error in configuring object at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:38) ... 14 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106) ... 17 more Caused by: java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:154) ... 22 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404) at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.configure(ExecMapper.java:123) ... 22 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:398) ... 
23 more FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask {noformat} If the location is set to 'hdfs:/tmp/tmp1', it gets the following error: {noformat} java.io.IOException: cannot find dir = hdfs://node-1.example.com:8020/tmp/tmp1/tmp1.txt in pathToPartitionInfo: [hdfs:/tmp/tmp1] at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:344) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:306) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.init(CombineHiveInputFormat.java:108) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:455) at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:520) at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:512) at
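The second failure mode comes from the table location lacking an authority (host:port), so it can never prefix-match the fully qualified split path in pathToPartitionInfo. A minimal self-contained sketch of that mismatch, using plain java.net.URI rather than Hive's actual path-matching code (the helper name sharesAuthority is hypothetical):

```java
import java.net.URI;
import java.util.Objects;

public class PathPrefixCheck {
    // Hypothetical helper: compares only the authority component, which is
    // the part that differs between 'hdfs:/tmp/tmp1' and a fully qualified path.
    static boolean sharesAuthority(String tableLocation, String filePath) {
        URI a = URI.create(tableLocation);
        URI b = URI.create(filePath);
        return Objects.equals(a.getAuthority(), b.getAuthority());
    }

    public static void main(String[] args) {
        // 'hdfs:/tmp/tmp1' parses with a null authority, so it cannot match
        // 'hdfs://node-1.example.com:8020/tmp/tmp1/tmp1.txt'.
        System.out.println(sharesAuthority("hdfs:/tmp/tmp1",
                "hdfs://node-1.example.com:8020/tmp/tmp1/tmp1.txt")); // false
        // A fully qualified location matches.
        System.out.println(sharesAuthority("hdfs://node-1.example.com:8020/tmp/tmp1",
                "hdfs://node-1.example.com:8020/tmp/tmp1/tmp1.txt")); // true
    }
}
```

This suggests why normalizing the user-supplied location to a fully qualified path (scheme plus authority) at table-creation time avoids the lookup failure.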
[jira] [Updated] (HIVE-6361) Un-fork Sqlline
[ https://issues.apache.org/jira/browse/HIVE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian Hyde updated HIVE-6361: -- Status: Open (was: Patch Available) Un-fork Sqlline --- Key: HIVE-6361 URL: https://issues.apache.org/jira/browse/HIVE-6361 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.12.0 Reporter: Julian Hyde Assignee: Julian Hyde Attachments: HIVE-6361.patch I propose to merge the two development forks of sqlline: Hive's beeline module, and the fork at https://github.com/julianhyde/sqlline. How did the forks come about? Hive’s SQL command-line interface Beeline was created by forking Sqlline (see HIVE-987, HIVE-3100), which at the time was a useful but low-activity project languishing on SourceForge without an active owner. Around the same time, Julian Hyde independently started a GitHub repo based on the same code base. Now several projects are using Julian Hyde's sqlline, including Apache Drill, Apache Phoenix, Cascading Lingual and Optiq. Merging these two forks will allow us to pool our resources. (Case in point: Drill issue DRILL-327 had already been fixed in a later version of sqlline; it still exists in beeline.) I propose the following steps: 1. Copy Julian Hyde's sqlline as a new Hive module, hive-sqlline. 2. Port fixes to hive-beeline into hive-sqlline. 3. Make hive-beeline depend on hive-sqlline, and remove code that is identical. What remains in the hive-beeline module is Beeline.java (a derived class of Sqlline.java) and Hive-specific extensions. 4. Make hive-sqlline the official successor to Julian Hyde's sqlline. This achieves continuity for Hive’s users, gives the users of the non-Hive sqlline a version with minimal dependencies, unifies the two code lines, and brings everything under the Apache roof. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-6361) Un-fork Sqlline
[ https://issues.apache.org/jira/browse/HIVE-6361?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julian Hyde updated HIVE-6361: -- Status: Patch Available (was: Open) Un-fork Sqlline --- Key: HIVE-6361 URL: https://issues.apache.org/jira/browse/HIVE-6361 Project: Hive Issue Type: Improvement Components: CLI Affects Versions: 0.12.0 Reporter: Julian Hyde Assignee: Julian Hyde Attachments: HIVE-6361.patch I propose to merge the two development forks of sqlline: Hive's beeline module, and the fork at https://github.com/julianhyde/sqlline. How did the forks come about? Hive’s SQL command-line interface Beeline was created by forking Sqlline (see HIVE-987, HIVE-3100), which at the time was a useful but low-activity project languishing on SourceForge without an active owner. Around the same time, Julian Hyde independently started a GitHub repo based on the same code base. Now several projects are using Julian Hyde's sqlline, including Apache Drill, Apache Phoenix, Cascading Lingual and Optiq. Merging these two forks will allow us to pool our resources. (Case in point: Drill issue DRILL-327 had already been fixed in a later version of sqlline; it still exists in beeline.) I propose the following steps: 1. Copy Julian Hyde's sqlline as a new Hive module, hive-sqlline. 2. Port fixes to hive-beeline into hive-sqlline. 3. Make hive-beeline depend on hive-sqlline, and remove code that is identical. What remains in the hive-beeline module is Beeline.java (a derived class of Sqlline.java) and Hive-specific extensions. 4. Make hive-sqlline the official successor to Julian Hyde's sqlline. This achieves continuity for Hive’s users, gives the users of the non-Hive sqlline a version with minimal dependencies, unifies the two code lines, and brings everything under the Apache roof. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7593) Instantiate SparkClient per user session [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinna Rao Lalam updated HIVE-7593: --- Attachment: HIVE-7593.1-spark.patch Instantiate SparkClient per user session [Spark Branch] --- Key: HIVE-7593 URL: https://issues.apache.org/jira/browse/HIVE-7593 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Xuefu Zhang Assignee: Chinna Rao Lalam Attachments: HIVE-7593-spark.patch, HIVE-7593.1-spark.patch SparkContext is the main class via which Hive talks to the Spark cluster. SparkClient encapsulates a SparkContext instance. Currently all user sessions share a single SparkClient instance in HiveServer2. While this is good enough for a POC, even for our first two milestones, it is not desirable in a multi-tenancy environment and gives the least flexibility to Hive users. Here is what we propose: 1. Have a SparkClient instance per user session. The SparkClient instance is created when the user executes the first query in the session. It gets destroyed when the user session ends. 2. The SparkClient is instantiated based on the Spark configurations that are available to the user, including those defined at the global level and those overwritten by the user (through the set command, for instance). 3. Ideally, when the user changes any Spark configuration during the session, the old SparkClient instance should be destroyed and a new one created based on the new configurations. This may turn out to be a little hard, and thus it's a nice-to-have. If not implemented, we need to document that subsequent configuration changes will not take effect in the current session. Please note that there is a thread-safety issue on the Spark side where multiple SparkContext instances cannot coexist in the same JVM (SPARK-2243). We need to work with the Spark community to get this addressed. Besides the above functional requirements, avoiding potential issues is also a consideration. 
For instance, sharing a SparkContext among users is bad, as resources (such as jars for UDFs) would also be shared, which is problematic. On the other hand, one SparkContext per job seems too expensive, as the resources would need to be re-rendered even when there isn't any change. -- This message was sent by Atlassian JIRA (v6.2#6252)
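Proposal item 1 above (one lazily created client per session, destroyed with the session) can be sketched as a session-keyed registry. This is a minimal illustration, not Hive's actual SparkClient or session code; the Client class here is a hypothetical stand-in:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SessionClientRegistry {
    // Hypothetical stand-in for SparkClient; the real class wraps a SparkContext.
    static class Client {
        final String sessionId;
        Client(String sessionId) { this.sessionId = sessionId; }
    }

    private final Map<String, Client> clients = new ConcurrentHashMap<>();

    // Lazily create exactly one client per session on the first query.
    Client clientFor(String sessionId) {
        return clients.computeIfAbsent(sessionId, Client::new);
    }

    // Drop the client when the session ends (or when its configuration changes,
    // per proposal item 3, so the next query builds a fresh one).
    void closeSession(String sessionId) {
        clients.remove(sessionId);
    }
}
```

The computeIfAbsent call gives the atomic create-on-first-use behavior; note it does not address the SPARK-2243 constraint that multiple SparkContext instances cannot coexist in one JVM.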
[jira] [Commented] (HIVE-7281) DbTxnManager acquiring wrong level of lock for dynamic partitioning
[ https://issues.apache.org/jira/browse/HIVE-7281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102621#comment-14102621 ] Ashutosh Chauhan commented on HIVE-7281: Patch looks fine. But I wonder if the Entity type DummyPartition makes sense at all. It seems this entity is created only in the dynamic-partitioning case, to be used for locking and authorization purposes. Since in the locking case (as argued in this ticket), as well as probably the auth case, it makes sense to use the Table entity, I don't see what useful purpose DummyPartition serves. On the contrary, it results in confusion like the topic of this JIRA. Shall we just delete the DummyPartition entity? cc: [~thejas] DbTxnManager acquiring wrong level of lock for dynamic partitioning --- Key: HIVE-7281 URL: https://issues.apache.org/jira/browse/HIVE-7281 Project: Hive Issue Type: Bug Components: Locking, Transactions Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7281.patch Currently DbTxnManager.acquireLocks() locks the DUMMY_PARTITION for dynamic partitioning. But this is not adequate: it will not prevent drop operations on partitions being written to. The lock should be at the table level. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7784) Created the needed indexes on Hive.PART_COL_STATS for CBO
[ https://issues.apache.org/jira/browse/HIVE-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-7784: -- Attachment: HIVE-7784.1.patch Created the needed indexes on Hive.PART_COL_STATS for CBO -- Key: HIVE-7784 URL: https://issues.apache.org/jira/browse/HIVE-7784 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Mostafa Mokhtar Fix For: 0.14.0 Attachments: HIVE-7784.1.patch With CBO we need the correct set of indexes to provide efficient read/write access. These indexes improve the performance of Explain plan and Analyze Table by 60% and 300%, respectively. {code} MySQL CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS (DB_NAME,TABLE_NAME,COLUMN_NAME) USING BTREE; MsSQL CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS (DB_NAME,TABLE_NAME,COLUMN_NAME); Oracle CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS (DB_NAME,TABLE_NAME,COLUMN_NAME); Postgres CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS USING btree (DB_NAME,TABLE_NAME,COLUMN_NAME); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7735) Implement Char, Varchar in ParquetSerDe
[ https://issues.apache.org/jira/browse/HIVE-7735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102633#comment-14102633 ] Szehon Ho commented on HIVE-7735: - Hi Mohit, the patch became stale after change to Virtual Column. Can you please rebase? Implement Char, Varchar in ParquetSerDe --- Key: HIVE-7735 URL: https://issues.apache.org/jira/browse/HIVE-7735 Project: Hive Issue Type: Sub-task Components: Serializers/Deserializers Reporter: Mohit Sabharwal Assignee: Mohit Sabharwal Labels: Parquet Attachments: HIVE-7735.1.patch, HIVE-7735.1.patch, HIVE-7735.2.patch, HIVE-7735.2.patch, HIVE-7735.patch This JIRA is to implement CHAR and VARCHAR support in Parquet SerDe. Both are represented in Parquet as PrimitiveType binary and OriginalType UTF8. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7769) add --SORT_BEFORE_DIFF to union all .q tests
[ https://issues.apache.org/jira/browse/HIVE-7769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7769: --- Resolution: Fixed Fix Version/s: 0.14.0 Status: Resolved (was: Patch Available) Thank you for your contribution! I have committed this to trunk! add --SORT_BEFORE_DIFF to union all .q tests Key: HIVE-7769 URL: https://issues.apache.org/jira/browse/HIVE-7769 Project: Hive Issue Type: Bug Reporter: Na Yang Assignee: Na Yang Fix For: 0.14.0 Attachments: HIVE-7769.patch Some union all test cases do not generate deterministic ordered result. We need to add --SORT_BEFORE_DIFF to those .q tests -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7571) RecordUpdater should read virtual columns from row
[ https://issues.apache.org/jira/browse/HIVE-7571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102668#comment-14102668 ] Hive QA commented on HIVE-7571: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12662789/HIVE-7571.patch {color:red}ERROR:{color} -1 due to 2 failed/errored test(s), 5819 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_auto_sortmerge_join_8 org.apache.hive.jdbc.miniHS2.TestHiveServer2.testConnection {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/403/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-TRUNK-Build/403/console Test logs: http://ec2-174-129-184-35.compute-1.amazonaws.com/logs/PreCommit-HIVE-TRUNK-Build-403/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 2 tests failed {noformat} This message is automatically generated. ATTACHMENT ID: 12662789 RecordUpdater should read virtual columns from row -- Key: HIVE-7571 URL: https://issues.apache.org/jira/browse/HIVE-7571 Project: Hive Issue Type: Sub-task Components: Transactions Affects Versions: 0.13.0 Reporter: Alan Gates Assignee: Alan Gates Attachments: HIVE-7571.WIP.patch, HIVE-7571.patch Currently RecordUpdater.update and delete take the rowid and original transaction as parameters. These values are already present in the row as part of the new ROW__ID virtual column from HIVE-7513, and thus can be read by the writer from there. The writer will already have to handle skipping ROW__ID when writing, so it needs to be aware of that column anyway. 
We could instead read the values from ROW__ID and then remove it from the object inspector in FileSinkOperator, but this will be hard in the vectorization case where rows are being dealt with 10k at a time. For these reasons it makes more sense to do this work in the writer. -- This message was sent by Atlassian JIRA (v6.2#6252)
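The API shift being argued for can be sketched as follows. This is purely illustrative: the RowId class and map-based row below are hypothetical stand-ins, not Hive's actual RecordIdentifier or ObjectInspector machinery.

```java
import java.util.HashMap;
import java.util.Map;

public class RowIdFromRow {
    // Illustrative stand-in for the ROW__ID struct (original transaction, bucket, row id).
    static class RowId {
        final long originalTxn;
        final int bucket;
        final long rowId;
        RowId(long originalTxn, int bucket, long rowId) {
            this.originalTxn = originalTxn;
            this.bucket = bucket;
            this.rowId = rowId;
        }
    }

    // Current shape: the ids are passed alongside the row.
    static String updateOld(long originalTxn, long rowId, Map<String, Object> row) {
        return "update txn=" + originalTxn + " row=" + rowId;
    }

    // Proposed shape: the writer reads the ids from the ROW__ID column itself,
    // then skips that column when writing the remaining fields.
    static String updateNew(Map<String, Object> row) {
        RowId id = (RowId) row.get("ROW__ID");
        return "update txn=" + id.originalTxn + " row=" + id.rowId;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("ROW__ID", new RowId(5, 0, 7));
        row.put("value", "abc");
        System.out.println(updateNew(row)); // update txn=5 row=7
    }
}
```

Pushing the extraction into the writer, as above, avoids reshaping the row in FileSinkOperator, which matters for the vectorized path where rows arrive in batches of 10k.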
[jira] [Commented] (HIVE-7717) Add .q tests coverage for union all [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102675#comment-14102675 ] Brock Noland commented on HIVE-7717: I merged HIVE-7769 into the branch! Add .q tests coverage for union all [Spark Branch] Key: HIVE-7717 URL: https://issues.apache.org/jira/browse/HIVE-7717 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Na Yang Assignee: Na Yang Attachments: HIVE-7717.1-spark.patch, HIVE-7717.2-spark.patch Add automation test coverage for union all, by searching through the q-tests in ql/src/test/queries/clientpositive/ for union tests (like union*.q) and verifying/enabling them on Spark. Steps to do: 1. Enable a qtest q-test-name.q in itests/src/test/resources/testconfiguration.properties by adding the .q test files to spark.query.files. 2. Run mvn test -Dtest=TestSparkCliDriver -Dqfile=q-test-name.q -Dtest.output.overwrite=true -Phadoop-2 to generate the output (located in ql/src/test/results/clientpositive/spark). The file will be called q-test-name.q.out. 3. Check that the generated output is good by verifying the results. For comparison, check the MR version in ql/src/test/results/clientpositive/q-test-name.q.out. The reason it's separate is that the explain plan outputs are different for Spark/MR. 4. Check in the modification to testconfiguration.properties, and the generated q.out file as well. You only have to generate the output once. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7773) Union all query finished with errors [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7773: --- Assignee: Rui Li Status: Patch Available (was: Open) Marking Patch Available Union all query finished with errors [Spark Branch] --- Key: HIVE-7773 URL: https://issues.apache.org/jira/browse/HIVE-7773 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Priority: Critical Attachments: HIVE-7773.2-spark.patch, HIVE-7773.spark.patch When I run a union all query, I found the following error in spark log (the query finished with correct results though): {noformat} java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: 
org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404) at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93) ... 16 more {noformat} Judging from the log, I think we don't properly handle the input paths when cloning the job conf, so it may also affect other queries with multiple maps or reduces. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7781) Enable windowing and analytic function qtests.[Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102694#comment-14102694 ] Brock Noland commented on HIVE-7781: +1 Enable windowing and analytic function qtests.[Spark Branch] Key: HIVE-7781 URL: https://issues.apache.org/jira/browse/HIVE-7781 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7781.1-spark.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7779) Support windowing and analytic functions [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7779: --- Summary: Support windowing and analytic functions [Spark Branch] (was: Support windowing and analytic functions.[Spark Branch]) Support windowing and analytic functions [Spark Branch] --- Key: HIVE-7779 URL: https://issues.apache.org/jira/browse/HIVE-7779 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Verify the functionality and fix found issues, which should include: # windowing functions # the OVER clause # analytic functions -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7781) Enable windowing and analytic function qtests [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7781: --- Summary: Enable windowing and analytic function qtests [Spark Branch] (was: Enable windowing and analytic function qtests.[Spark Branch]) Enable windowing and analytic function qtests [Spark Branch] Key: HIVE-7781 URL: https://issues.apache.org/jira/browse/HIVE-7781 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Attachments: HIVE-7781.1-spark.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7784) Created the needed indexes on Hive.PART_COL_STATS for CBO
[ https://issues.apache.org/jira/browse/HIVE-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-7784: -- Attachment: HIVE-7784.2.patch Created the needed indexes on Hive.PART_COL_STATS for CBO -- Key: HIVE-7784 URL: https://issues.apache.org/jira/browse/HIVE-7784 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Mostafa Mokhtar Fix For: 0.14.0 Attachments: HIVE-7784.1.patch, HIVE-7784.2.patch With CBO we need the correct set of indexes to provide efficient read/write access. These indexes improve the performance of Explain plan and Analyze Table by 60% and 300%, respectively. {code} MySQL CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS (DB_NAME,TABLE_NAME,COLUMN_NAME) USING BTREE; MsSQL CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS (DB_NAME,TABLE_NAME,COLUMN_NAME); Oracle CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS (DB_NAME,TABLE_NAME,COLUMN_NAME); Postgres CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS USING btree (DB_NAME,TABLE_NAME,COLUMN_NAME); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7784) Created the needed indexes on Hive.PART_COL_STATS for CBO
[ https://issues.apache.org/jira/browse/HIVE-7784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102698#comment-14102698 ] Mostafa Mokhtar commented on HIVE-7784: --- [~ashutoshc] Code review link https://reviews.apache.org/r/24861/diff/# Created the needed indexes on Hive.PART_COL_STATS for CBO -- Key: HIVE-7784 URL: https://issues.apache.org/jira/browse/HIVE-7784 Project: Hive Issue Type: Bug Components: CBO Affects Versions: 0.14.0 Reporter: Mostafa Mokhtar Assignee: Mostafa Mokhtar Fix For: 0.14.0 Attachments: HIVE-7784.1.patch, HIVE-7784.2.patch With CBO we need the correct set of indexes to provide efficient read/write access. These indexes improve the performance of Explain plan and Analyze Table by 60% and 300%, respectively. {code} MySQL CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS (DB_NAME,TABLE_NAME,COLUMN_NAME) USING BTREE; MsSQL CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS (DB_NAME,TABLE_NAME,COLUMN_NAME); Oracle CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS (DB_NAME,TABLE_NAME,COLUMN_NAME); Postgres CREATE INDEX PART_COL_STATS_N50 ON PART_COL_STATS USING btree (DB_NAME,TABLE_NAME,COLUMN_NAME); {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7781) Enable windowing and analytic function qtests [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7781: --- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Thank you so much for your contribution! I have committed this to spark!! Enable windowing and analytic function qtests [Spark Branch] Key: HIVE-7781 URL: https://issues.apache.org/jira/browse/HIVE-7781 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Fix For: spark-branch Attachments: HIVE-7781.1-spark.patch -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7717) Add .q tests coverage for union all [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102697#comment-14102697 ] Brock Noland commented on HIVE-7717: Also FYI I committed HIVE-7781 so you'll need to pull the latest HEAD. Add .q tests coverage for union all [Spark Branch] Key: HIVE-7717 URL: https://issues.apache.org/jira/browse/HIVE-7717 Project: Hive Issue Type: Sub-task Components: Spark Affects Versions: spark-branch Reporter: Na Yang Assignee: Na Yang Attachments: HIVE-7717.1-spark.patch, HIVE-7717.2-spark.patch Add automation test coverage for union all, by searching through the q-tests in ql/src/test/queries/clientpositive/ for union tests (like union*.q) and verifying/enabling them on Spark. Steps to do: 1. Enable a qtest q-test-name.q in itests/src/test/resources/testconfiguration.properties by adding the .q test files to spark.query.files. 2. Run mvn test -Dtest=TestSparkCliDriver -Dqfile=q-test-name.q -Dtest.output.overwrite=true -Phadoop-2 to generate the output (located in ql/src/test/results/clientpositive/spark). The file will be called q-test-name.q.out. 3. Check that the generated output is good by verifying the results. For comparison, check the MR version in ql/src/test/results/clientpositive/q-test-name.q.out. The reason it's separate is that the explain plan outputs are different for Spark/MR. 4. Check in the modification to testconfiguration.properties, and the generated q.out file as well. You only have to generate the output once. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7728) Enable q-tests for TABLESAMPLE feature [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-7728: --- Resolution: Fixed Fix Version/s: spark-branch Status: Resolved (was: Patch Available) Thank you very much for your contribution! I have committed this to spark! Enable q-tests for TABLESAMPLE feature [Spark Branch] -- Key: HIVE-7728 URL: https://issues.apache.org/jira/browse/HIVE-7728 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Chengxiang Li Assignee: Chengxiang Li Fix For: spark-branch Attachments: HIVE-7728.1-spark.patch Enable q-tests for TABLESAMPLE feature since automatic test environment is ready. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7702) Start running .q file tests on spark [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14102719#comment-14102719 ] Brock Noland commented on HIVE-7702: Let's try and add the following tests in this JIRA: {noformat} enforce_order.q,\ filter_join_breaktask.q,\ filter_join_breaktask2.q,\ groupby1.q,\ groupby2.q,\ groupby3.q,\ having.q,\ insert1.q,\ insert_into1.q,\ insert_into2.q,\ {noformat} Start running .q file tests on spark [Spark Branch] --- Key: HIVE-7702 URL: https://issues.apache.org/jira/browse/HIVE-7702 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Brock Noland Assignee: Chinna Rao Lalam Spark can currently only support a few queries, however there are some .q file tests which will pass today. The basic idea is that we should get some number of these actually working (10-20) so we can actually start testing the project. A good starting point might be the udf*, varchar*, or alter* tests: https://github.com/apache/hive/tree/spark/ql/src/test/queries/clientpositive To generate the output file for test XXX.q, you'd do: {noformat} mvn clean install -DskipTests -Phadoop-2 cd itests mvn clean install -DskipTests -Phadoop-2 cd qtest-spark mvn test -Dtest=TestCliDriver -Dqfile=XXX.q -Dtest.output.overwrite=true -Phadoop-2 {noformat} which would generate XXX.q.out which we can check-in to source control as a golden file. Multiple tests can be run at a give time as so: {noformat} mvn test -Dtest=TestCliDriver -Dqfile=X1.q,X2.q -Dtest.output.overwrite=true -Phadoop-2 {noformat} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to inefficient collection used to find a matching ReadEntity
[ https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mostafa Mokhtar updated HIVE-7723: -- Attachment: HIVE-7723.5.patch Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity Key: HIVE-7723 URL: https://issues.apache.org/jira/browse/HIVE-7723 Project: Hive Issue Type: Bug Components: CLI, Physical Optimizer Affects Versions: 0.13.1 Reporter: Mostafa Mokhtar Assignee: Mostafa Mokhtar Fix For: 0.14.0 Attachments: HIVE-7723.1.patch, HIVE-7723.2.patch, HIVE-7723.3.patch, HIVE-7723.4.patch, HIVE-7723.5.patch Explain on TPC-DS query 64 took 11 seconds, when the CLI was profiled it showed that ReadEntity.equals is taking ~40% of the CPU. ReadEntity.equals is called from the snippet below. Again and again the set is iterated over to get the actual match, a HashMap is a better option for this case as Set doesn't have a Get method. Also for ReadEntity equals is case-insensitive while hash is , which is an undesired behavior. {code} public static ReadEntity addInput(SetReadEntity inputs, ReadEntity newInput) { // If the input is already present, make sure the new parent is added to the input. 
if (inputs.contains(newInput)) { for (ReadEntity input : inputs) { if (input.equals(newInput)) { if ((newInput.getParents() != null) && (!newInput.getParents().isEmpty())) { input.getParents().addAll(newInput.getParents()); input.setDirect(input.isDirect() || newInput.isDirect()); } return input; } } assert false; } else { inputs.add(newInput); return newInput; } // make compile happy return null; } {code} This is the query used: {code} select cs1.product_name ,cs1.store_name ,cs1.store_zip ,cs1.b_street_number ,cs1.b_streen_name ,cs1.b_city ,cs1.b_zip ,cs1.c_street_number ,cs1.c_street_name ,cs1.c_city ,cs1.c_zip ,cs1.syear ,cs1.cnt ,cs1.s1 ,cs1.s2 ,cs1.s3 ,cs2.s1 ,cs2.s2 ,cs2.s3 ,cs2.syear ,cs2.cnt from (select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3 FROM store_sales JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number = store_returns.sr_ticket_number JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk JOIN store ON store_sales.ss_store_sk = store.s_store_sk JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk = cd1.cd_demo_sk JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = cd2.cd_demo_sk JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = hd1.hd_demo_sk
JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = hd2.hd_demo_sk JOIN customer_address ad1 ON store_sales.ss_addr_sk = ad1.ca_address_sk JOIN customer_address ad2 ON customer.c_current_addr_sk = ad2.ca_address_sk JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk JOIN item ON store_sales.ss_item_sk = item.i_item_sk JOIN (select cs_item_sk ,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit) as refund from catalog_sales JOIN catalog_returns ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk and catalog_sales.cs_order_number = catalog_returns.cr_order_number group by cs_item_sk having sum(cs_ext_list_price) > 2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)) cs_ui ON store_sales.ss_item_sk = cs_ui.cs_item_sk WHERE cd1.cd_marital_status <> cd2.cd_marital_status and i_color in
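The HashMap-based lookup proposed in the issue can be sketched as follows. This is an illustrative simplification, not Hive's actual ReadEntity class: the Entity and InputRegistry names are hypothetical stand-ins, and the key shows one way to keep the lookup consistent with a case-insensitive notion of equality.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Hive's ReadEntity; not the real class.
class Entity {
    final String name;
    boolean direct;

    Entity(String name, boolean direct) {
        this.name = name;
        this.direct = direct;
    }

    // One normalized key used for both lookup and storage, so map lookups
    // behave like a case-insensitive equals with a consistent hash.
    String key() {
        return name.toLowerCase();
    }
}

class InputRegistry {
    private final Map<String, Entity> inputs = new HashMap<>();

    // O(1) map lookup replaces the O(n) scan over a Set: merge into the
    // existing entry if present, otherwise register the new one.
    Entity addInput(Entity newInput) {
        Entity existing = inputs.get(newInput.key());
        if (existing != null) {
            // Merge: the input stays direct if either copy was direct.
            existing.direct = existing.direct || newInput.direct;
            return existing;
        }
        inputs.put(newInput.key(), newInput);
        return newInput;
    }

    int size() {
        return inputs.size();
    }
}
```

The design point is the one the reporter makes: `Set` has no get method, so finding the canonical existing element forces a full iteration, while a map keyed on the entity's identity returns it directly.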
[jira] [Commented] (HIVE-7723) Explain plan for complex query with lots of partitions is slow due to in-efficient collection used to find a matching ReadEntity
[ https://issues.apache.org/jira/browse/HIVE-7723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102728#comment-14102728 ] Mostafa Mokhtar commented on HIVE-7723: --- [~gopalv] Link to code review https://reviews.apache.org/r/24864/diff/#
[jira] [Created] (HIVE-7785) CBO: Projection Pruning needs to handle cross Joins
Laljo John Pullokkaran created HIVE-7785: Summary: CBO: Projection Pruning needs to handle cross Joins Key: HIVE-7785 URL: https://issues.apache.org/jira/browse/HIVE-7785 Project: Hive Issue Type: Sub-task Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Projection pruning needs to handle cross joins. Ex: select r1.x from r1 join r2. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HIVE-7773) Union all query finished with errors [Spark Branch]
[ https://issues.apache.org/jira/browse/HIVE-7773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102755#comment-14102755 ] Hive QA commented on HIVE-7773: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12662771/HIVE-7773.2-spark.patch {color:red}ERROR:{color} -1 due to 9 failed/errored test(s), 5925 tests executed *Failed tests:* {noformat} org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_sample_islocalmode_hook org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_dynpart_sort_opt_vectorization org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union2 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union3 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union5 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union7 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union8 org.apache.hadoop.hive.cli.TestMiniTezCliDriver.testCliDriver_union9 org.apache.hadoop.hive.cli.TestNegativeCliDriver.testNegativeCliDriver_fs_default_name2 {noformat} Test results: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/62/testReport Console output: http://ec2-174-129-184-35.compute-1.amazonaws.com/jenkins/job/PreCommit-HIVE-SPARK-Build/62/console Test logs: http://ec2-54-176-176-199.us-west-1.compute.amazonaws.com/logs/PreCommit-HIVE-SPARK-Build-62/ Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests exited with: TestsFailedException: 9 tests failed {noformat} This message is automatically generated.
ATTACHMENT ID: 12662771 Union all query finished with errors [Spark Branch] --- Key: HIVE-7773 URL: https://issues.apache.org/jira/browse/HIVE-7773 Project: Hive Issue Type: Sub-task Components: Spark Reporter: Rui Li Assignee: Rui Li Priority: Critical Attachments: HIVE-7773.2-spark.patch, HIVE-7773.spark.patch When I run a union all query, I found the following error in spark log (the query finished with correct results though): {noformat} java.lang.RuntimeException: Map operator initialization failed at org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:127) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:52) at org.apache.hadoop.hive.ql.exec.spark.HiveMapFunction.call(HiveMapFunction.java:30) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$7$1.apply(JavaRDDLike.scala:164) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.RDD$$anonfun$13.apply(RDD.scala:596) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) at org.apache.spark.scheduler.Task.run(Task.scala:54) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:199) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:744) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: Configuration and input path are inconsistent at org.apache.hadoop.hive.ql.exec.MapOperator.setChildren(MapOperator.java:404) at 
org.apache.hadoop.hive.ql.exec.spark.SparkMapRecordHandler.init(SparkMapRecordHandler.java:93) ... 16 more {noformat} Judging from the log, I think we don't properly handle the input paths when cloning the job conf, so it may also affect other queries with multiple maps or reduces. -- This message was sent by Atlassian JIRA (v6.2#6252)
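The analysis above points at input paths being shared rather than snapshotted when the job conf is cloned. A minimal, generic sketch of that pitfall, using plain java.util.Properties rather than Hive's or Hadoop's actual configuration classes, shows why a read-through "copy" becomes inconsistent once the original is mutated for the next map work:

```java
import java.util.Properties;

// Generic illustration of the cloning pitfall: a "clone" that still reads
// through to the original configuration picks up later mutations, so the
// clone's input path no longer matches the work it was created for.
public class ConfCloneDemo {
    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("mapred.input.dir", "/data/tableA");

        // Defaults-chained "copy": keys not set locally read through to base.
        Properties shallow = new Properties(base);

        // A real snapshot copies every entry eagerly at clone time.
        Properties deep = new Properties();
        for (String k : base.stringPropertyNames()) {
            deep.setProperty(k, base.getProperty(k));
        }

        // Mutating the original afterwards (e.g. for the next map work):
        base.setProperty("mapred.input.dir", "/data/tableB");

        System.out.println(shallow.getProperty("mapred.input.dir")); // /data/tableB
        System.out.println(deep.getProperty("mapred.input.dir"));    // /data/tableA
    }
}
```

Under this reading, a "Configuration and input path are inconsistent" failure is what you would expect when one clone per map is intended but the clones still see a shared, later-mutated input path.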