[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
[ https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13722235#comment-13722235 ] Alexey Zotov commented on HIVE-3442: I had a problem with HA mode for NameNodes. For NameNode's HA mode you can specify the nameservice instead of an active NameNode as _avro.schema.url_: {noformat} http://some_datanode_address:50075/streamFile/path/to/file/schema.json?nnaddr=nameservice1:8020 {noformat}

AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
--- Key: HIVE-3442 URL: https://issues.apache.org/jira/browse/HIVE-3442 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Zhenxiao Luo Assignee: Zhenxiao Luo Fix For: 0.10.0

After creating a table and loading data into it, I could check that the table was created successfully and the data is inside:

DROP TABLE IF EXISTS ml_items;
CREATE TABLE ml_items(id INT, title STRING, release_date STRING, video_release_date STRING, imdb_url STRING, unknown_genre TINYINT, action TINYINT, adventure TINYINT, animation TINYINT, children TINYINT, comedy TINYINT, crime TINYINT, documentary TINYINT, drama TINYINT, fantasy TINYINT, film_noir TINYINT, horror TINYINT, musical TINYINT, mystery TINYINT, romance TINYINT, sci_fi TINYINT, thriller TINYINT, war TINYINT, western TINYINT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '../data/files/avro_items' INTO TABLE ml_items;
select * from ml_items ORDER BY id ASC;

However, the following create external table with AvroSerDe is not working:

DROP TABLE IF EXISTS ml_items_as_avro;
CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'schema.url'='${system:test.src.data.dir}/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:${system:test.tmp.dir}/hive-ml-items';
describe ml_items_as_avro;
INSERT OVERWRITE TABLE ml_items_as_avro SELECT id, title, imdb_url, unknown_genre, action, adventure, animation, children, comedy, crime, documentary, drama, fantasy, film_noir, horror, musical, mystery, romance, sci_fi, thriller, war, western FROM ml_items;

ml_items_as_avro is not created with the expected schema, as shown in the describe ml_items_as_avro output below:

PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
PREHOOK: type: DROPTABLE
POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
POSTHOOK: type: DROPTABLE
PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
PREHOOK: type: CREATETABLE
POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
POSTHOOK: type: CREATETABLE
POSTHOOK: Output: default@ml_items_as_avro
PREHOOK: query: describe ml_items_as_avro
PREHOOK: type: DESCTABLE
POSTHOOK: query: describe ml_items_as_avro
POSTHOOK: type: DESCTABLE
error_error_error_error_error_error_error string from deserializer
cannot_determine_schema string from deserializer
check string from deserializer
schema string from deserializer
url string from deserializer
and string from
[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
[ https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13722238#comment-13722238 ] Alexey Zotov commented on HIVE-3442: Please remove my previous comment. I had a problem with HA mode for NameNodes (_namenode1_address_ and _namenode2_address_). At first I specified the following url as avro.schema.url: {noformat} http://some_datanode_address:50075/streamFile/path/to/file/schema.json?nnaddr=namenode1_address:8020 {noformat} But I couldn't get data from Hive when _namenode1_address_ was the standby NameNode, so I had to change the link manually. After some time I found out how to fix it; I want to post it here and hope it will help someone: {noformat} http://some_datanode_address:50075/streamFile/path/to/file/schema.json?nnaddr=nameservice1:8020 {noformat} So, for NameNode's HA mode you can specify the nameservice instead of an active NameNode. AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table --- Key: HIVE-3442 URL: https://issues.apache.org/jira/browse/HIVE-3442 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Zhenxiao Luo Assignee: Zhenxiao Luo Fix For: 0.10.0
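The HA workaround above can also be written directly into the table DDL. A minimal sketch, assuming a cluster whose HDFS nameservice is named nameservice1 (the table name, location, and schema path are hypothetical placeholders, not from this thread), and using an hdfs:// nameservice URL as an alternative to the HTTP streamFile link:

{code}
-- Point avro.schema.url at the HDFS nameservice rather than a specific
-- NameNode host, so the schema stays reachable after an HA failover.
CREATE EXTERNAL TABLE episodes
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES (
  'avro.schema.url'='hdfs://nameservice1/path/to/file/schema.json')
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/user/hive/episodes';
{code}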
[jira] [Commented] (HIVE-3256) Update asm version in Hive
[ https://issues.apache.org/jira/browse/HIVE-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13722245#comment-13722245 ] Andy Jefferson commented on HIVE-3256: -- https://issues.apache.org/jira/browse/HIVE-3632 upgraded Hive to use DN 3.2.x. This comes with its own repackaged ASM internally, so you don't need any ASM for DataNucleus any longer. Consequently, any DN-utilising system can use whichever version of ASM it requires. Update asm version in Hive -- Key: HIVE-3256 URL: https://issues.apache.org/jira/browse/HIVE-3256 Project: Hive Issue Type: Bug Reporter: Zhenxiao Luo Assignee: Zhenxiao Luo Hive trunk is currently using ASM version 3.1; Hadoop trunk is on 3.2. Any objections to bumping the Hive version to 3.2 to be in line with Hadoop? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2137) JDBC driver doesn't encode string properly.
[ https://issues.apache.org/jira/browse/HIVE-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kousuke Saruta updated HIVE-2137: - Attachment: HIVE-2137.patch Hi, I've refined the patch I uploaded before, and added test code and test data. JDBC driver doesn't encode string properly. --- Key: HIVE-2137 URL: https://issues.apache.org/jira/browse/HIVE-2137 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.9.0 Reporter: Jin Adachi Fix For: 0.12.0 Attachments: HIVE-2137.patch, HIVE-2137.patch The JDBC driver for HiveServer1 decodes strings using the client-side default encoding, which depends on the operating system unless another encoding is specified; it ignores the server-side encoding. For example, when the server-side operating system and encoding are Linux (UTF-8) and the client-side operating system and encoding are Windows (Shift_JIS, a Japanese charset), character corruption happens in the client. In the current implementation of Hive, UTF-8 appears to be expected on the server side, so the client side should encode/decode strings as UTF-8.
[jira] [Commented] (HIVE-2137) JDBC driver doesn't encode string properly.
[ https://issues.apache.org/jira/browse/HIVE-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13722303#comment-13722303 ] Hive QA commented on HIVE-2137: --- {color:red}Overall{color}: -1, at least one test failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12594645/HIVE-2137.patch {color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2736 tests executed *Failed tests:* {noformat} org.apache.hcatalog.pig.TestE2EScenarios.testReadOrcAndRCFromPig {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/220/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/220/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 1 tests failed {noformat} This message is automatically generated. JDBC driver doesn't encode string properly. --- Key: HIVE-2137 URL: https://issues.apache.org/jira/browse/HIVE-2137 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.9.0 Reporter: Jin Adachi Fix For: 0.12.0 Attachments: HIVE-2137.patch, HIVE-2137.patch
[jira] [Updated] (HIVE-305) Port Hadoop streaming's counters/status reporters to Hive Transforms
[ https://issues.apache.org/jira/browse/HIVE-305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-305: -- Release Note: (was: I use the trunk to create this patch . http://svn.apache.org/repos/asf/hive/trunk ) Port Hadoop streaming's counters/status reporters to Hive Transforms Key: HIVE-305 URL: https://issues.apache.org/jira/browse/HIVE-305 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Venky Iyer Assignee: Guo Hongjie Attachments: HIVE-305.1.patch, HIVE-305.2.patch, hive-305.3.diff.txt, HIVE-305.patch.txt https://issues.apache.org/jira/browse/HADOOP-1328 introduced a way for a streaming process to update global counters and status by emitting information on the stderr stream. Use reporter:counter:group,counter,amount to update a counter. Use reporter:status:message to update status.
[jira] [Updated] (HIVE-305) Port Hadoop streaming's counters/status reporters to Hive Transforms
[ https://issues.apache.org/jira/browse/HIVE-305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-305: -- Resolution: Fixed Fix Version/s: 0.12.0 Status: Resolved (was: Patch Available) Committed to trunk! Thank you for your contribution Guo and Edward! Port Hadoop streaming's counters/status reporters to Hive Transforms Key: HIVE-305 URL: https://issues.apache.org/jira/browse/HIVE-305 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Venky Iyer Assignee: Guo Hongjie Fix For: 0.12.0 Attachments: HIVE-305.1.patch, HIVE-305.2.patch, hive-305.3.diff.txt, HIVE-305.patch.txt
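The reporter protocol from HIVE-305 works from a Hive TRANSFORM query the same way it does in Hadoop streaming: the transform script writes specially formatted lines to its standard error stream. A minimal sketch (the table src and the script name counter_demo.py are hypothetical placeholders, not from this thread):

{code}
-- The transform script can update a counter by writing a line such as
--   reporter:counter:MyGroup,RowsSeen,1
-- and report status by writing
--   reporter:status:processing rows
-- to stderr; Hive interprets these lines instead of logging them verbatim.
ADD FILE counter_demo.py;
SELECT TRANSFORM (key, value)
       USING 'python counter_demo.py'
       AS (key, value)
FROM src;
{code}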
[jira] [Updated] (HIVE-4943) An explode function that includes the item's position in the array
[ https://issues.apache.org/jira/browse/HIVE-4943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niko Stahl updated HIVE-4943: - Component/s: Query Processor An explode function that includes the item's position in the array -- Key: HIVE-4943 URL: https://issues.apache.org/jira/browse/HIVE-4943 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Niko Stahl Labels: patch Original Estimate: 8h Remaining Estimate: 8h A function that explodes an array and includes an output column with the position of each item in the original array.
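A usage sketch of the requested feature. The function name posexplode, the table, and the column names are assumptions for illustration (posexplode is the name under which such a function later appeared in Hive), not something fixed by this issue:

{code}
-- Explode the array column `items`, keeping each element's 0-based
-- position alongside its value.
SELECT t.pos, t.val
FROM mytable
LATERAL VIEW posexplode(items) t AS pos, val;
{code}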
[jira] [Updated] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability
[ https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4838: --- Status: Open (was: Patch Available) Forgot to reface junit. Refactor MapJoin HashMap code to improve testability and readability Key: HIVE-4838 URL: https://issues.apache.org/jira/browse/HIVE-4838 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch MapJoin is an essential component for high performance joins in Hive, and the current code has done great service for many years. However, the code is showing its age and currently suffers from the following issues:
* Uses static state via the MapJoinMetaData class to pass serialization metadata to the Key and Row classes.
* The API of a logical table container is not defined, and therefore it's unclear which APIs HashMapWrapper needs to publicize. Additionally, HashMapWrapper has many unused public methods.
* HashMapWrapper contains logic to serialize, test memory bounds, and implement the table container. Ideally these logical units could be separated.
* HashTableSinkObjectCtx has unused fields and unused methods.
* CommonJoinOperator and children use ArrayList on the left-hand side when only List is required.
* There are unused classes (MRU, DCLLItem) and classes which duplicate functionality (MapJoinSingleKey and MapJoinDoubleKeys).
[jira] [Resolved] (HIVE-2906) Support providing some table properties by user via SQL
[ https://issues.apache.org/jira/browse/HIVE-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo resolved HIVE-2906. --- Resolution: Fixed Fix Version/s: 0.12.0 Support providing some table properties by user via SQL --- Key: HIVE-2906 URL: https://issues.apache.org/jira/browse/HIVE-2906 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Fix For: 0.12.0 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2906.D2499.1.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2906.D2499.2.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2906.D2499.3.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2906.D2499.4.patch, ASF.LICENSE.NOT.GRANTED--HIVE-2906.D2499.5.patch, HIVE-2906.D2499.6.patch, HIVE-2906.D2499.7.patch Some properties need to be provided to the StorageHandler by the user at runtime. This might be an address for a remote resource, a retry count for access, a maximum version count (for HBase), etc. For example, {code} select emp.empno, emp.ename from hbase_emp ('max.version'='3') emp; {code}
[jira] [Commented] (HIVE-2906) Support providing some table properties by user via SQL
[ https://issues.apache.org/jira/browse/HIVE-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13722511#comment-13722511 ] Edward Capriolo commented on HIVE-2906: --- Committed. Thanks Navis. Support providing some table properties by user via SQL --- Key: HIVE-2906 URL: https://issues.apache.org/jira/browse/HIVE-2906 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis
Re: HCatalog (from Hive 0.11) and Hadoop 2
There is a build scheduled on Jenkins for Hive trunk which is failing. I will give it a try on my local machine for Hive 0.11. There is another build which does the ptests, but it is disabled due to lots of test case failures. https://builds.apache.org/job/Hive-trunk-hadoop2/ I will update you if I can build it. On Mon, Jul 29, 2013 at 8:07 PM, Rodrigo Trujillo rodrigo.truji...@linux.vnet.ibm.com wrote: Hi, is it possible to build Hive 0.11 and HCatalog with Hadoop 2 (2.0.4-alpha)? Regards, Rodrigo -- Nitin Pawar
[jira] [Commented] (HIVE-4934) ntile function has to be the last thing in the select list
[ https://issues.apache.org/jira/browse/HIVE-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13722548#comment-13722548 ] Lars Francke commented on HIVE-4934: I see. A misunderstanding on my side then, I guess. So at most it's a documentation issue. ntile function has to be the last thing in the select list -- Key: HIVE-4934 URL: https://issues.apache.org/jira/browse/HIVE-4934 Project: Hive Issue Type: Bug Reporter: Lars Francke Priority: Minor {code}
CREATE TABLE test (foo INT);
SELECT ntile(10), foo OVER (PARTITION BY foo) FROM test;
FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: Only COMPLETE mode supported for NTile function
SELECT foo, ntile(10) OVER (PARTITION BY foo) FROM test;
...works...
{code} I'm not sure if that is a bug or necessary. Either way, the error message is not helpful, as it's not documented anywhere what {{COMPLETE}} mode is. A cursory glance at the code didn't help me either.
[jira] [Updated] (HIVE-4825) Separate MapredWork into MapWork and ReduceWork
[ https://issues.apache.org/jira/browse/HIVE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated HIVE-4825: --- Resolution: Fixed Fix Version/s: 0.12.0 Status: Resolved (was: Patch Available) Committed to trunk. Thanks, Gunther! Separate MapredWork into MapWork and ReduceWork --- Key: HIVE-4825 URL: https://issues.apache.org/jira/browse/HIVE-4825 Project: Hive Issue Type: Improvement Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Priority: Minor Fix For: 0.12.0 Attachments: HIVE-4825.1.patch, HIVE-4825.2.code.patch, HIVE-4825.2.testfiles.patch, HIVE-4825.3.testfiles.patch, HIVE-4825.4.patch, HIVE-4825.5.patch, HIVE-4825.6.patch Right now all the information needed to run an MR job is captured in MapredWork. This class has aliases, tagging info, table descriptors, etc. For Tez and MRR it will be useful to break this into map- and reduce-specific pieces. The separation is natural and I think has value in itself; it makes the code easier to understand. However, it will also allow us to reuse these abstractions in Tez, where you'll have a graph of these instead of just 1M and 0-1R.
[jira] [Updated] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability
[ https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4838: --- Status: Patch Available (was: Open) Refactor MapJoin HashMap code to improve testability and readability Key: HIVE-4838 URL: https://issues.apache.org/jira/browse/HIVE-4838 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch
[jira] [Updated] (HIVE-4388) HBase tests fail against Hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4388: --- Attachment: HIVE-4388.patch Attaching patch/marking patch available to get a full test run. HBase tests fail against Hadoop 2 - Key: HIVE-4388 URL: https://issues.apache.org/jira/browse/HIVE-4388 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Brock Noland Attachments: HIVE-4388.patch, HIVE-4388-wip.txt Currently we're building by default against HBase 0.92. When you run against Hadoop 2 (-Dhadoop.mr.rev=23), builds fail because of HBASE-5963. HIVE-3861 upgrades the version of HBase used; this will get you past the problem in HBASE-5963 (which was fixed in 0.94.1), but it fails with HBASE-6396.
[jira] [Updated] (HIVE-4388) HBase tests fail against Hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brock Noland updated HIVE-4388: --- Status: Patch Available (was: Open) HBase tests fail against Hadoop 2 - Key: HIVE-4388 URL: https://issues.apache.org/jira/browse/HIVE-4388 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Brock Noland Attachments: HIVE-4388.patch, HIVE-4388-wip.txt
[jira] [Updated] (HIVE-4794) Unit e2e tests for vectorization
[ https://issues.apache.org/jira/browse/HIVE-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tony Murphy updated HIVE-4794: -- Attachment: HIVE-4794.1.patch The patch depends on: HIVE-4525, HIVE-4922, HIVE-4931 Unit e2e tests for vectorization Key: HIVE-4794 URL: https://issues.apache.org/jira/browse/HIVE-4794 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Tony Murphy Fix For: vectorization-branch Attachments: HIVE-4794.1.patch, hive-4794.patch
[jira] [Commented] (HIVE-4734) Use custom ObjectInspectors for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13722635#comment-13722635 ] Jakob Homan commented on HIVE-4734: --- +1. Looks good. Use custom ObjectInspectors for AvroSerde - Key: HIVE-4734 URL: https://issues.apache.org/jira/browse/HIVE-4734 Project: Hive Issue Type: Improvement Components: Serializers/Deserializers Reporter: Mark Wagner Assignee: Mark Wagner Fix For: 0.12.0 Attachments: HIVE-4734.1.patch, HIVE-4734.2.patch Currently, the AvroSerde recursively copies all fields of a record from the GenericRecord to a List row object and provides the standard ObjectInspectors. Performance can be improved by providing ObjectInspectors to the Avro record itself.
[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table
[ https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722643#comment-13722643 ] Swarnim Kulkarni commented on HIVE-3442: [~azotcsit] This seems like useful information. Would you mind doing a post about it on the hive users group for a larger audience? I am sure it will be much appreciated. Thanks! AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table --- Key: HIVE-3442 URL: https://issues.apache.org/jira/browse/HIVE-3442 Project: Hive Issue Type: Bug Affects Versions: 0.10.0 Reporter: Zhenxiao Luo Assignee: Zhenxiao Luo Fix For: 0.10.0 After creating a table and load data into it, I could check that the table is created successfully, and data is inside: DROP TABLE IF EXISTS ml_items; CREATE TABLE ml_items(id INT, title STRING, release_date STRING, video_release_date STRING, imdb_url STRING, unknown_genre TINYINT, action TINYINT, adventure TINYINT, animation TINYINT, children TINYINT, comedy TINYINT, crime TINYINT, documentary TINYINT, drama TINYINT, fantasy TINYINT, film_noir TINYINT, horror TINYINT, musical TINYINT, mystery TINYINT, romance TINYINT, sci_fi TINYINT, thriller TINYINT, war TINYINT, western TINYINT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' STORED AS TEXTFILE; LOAD DATA LOCAL INPATH '../data/files/avro_items' INTO TABLE ml_items; select * from ml_items ORDER BY id ASC; While, the following create external table with AvroSerDe is not working: DROP TABLE IF EXISTS ml_items_as_avro; CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'schema.url'='${system:test.src.data.dir}/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:${system:test.tmp.dir}/hive-ml-items'; describe 
ml_items_as_avro; INSERT OVERWRITE TABLE ml_items_as_avro SELECT id, title, imdb_url, unknown_genre, action, adventure, animation, children, comedy, crime, documentary, drama, fantasy, film_noir, horror, musical, mystery, romance, sci_fi, thriller, war, western FROM ml_items; ml_items_as_avro is not created with expected schema, as shown in the describe ml_items_as_avro output. The output is below: PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro PREHOOK: type: DROPTABLE POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro POSTHOOK: type: DROPTABLE PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items' PREHOOK: type: CREATETABLE POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ( 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc') STORED as INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items' POSTHOOK: type: CREATETABLE POSTHOOK: Output: default@ml_items_as_avro PREHOOK: query: describe ml_items_as_avro PREHOOK: type: DESCTABLE POSTHOOK: query: describe ml_items_as_avro POSTHOOK: type: DESCTABLE error_error_error_error_error_error_error string from deserializer cannot_determine_schema string from deserializer check string from deserializer schema string from deserializer url string from deserializer and string from deserializer literal string from deserializer FAILED: SemanticException [Error 10044]: Line 
3:23 Cannot insert into target table because column
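A likely cause of the failure above: the CREATE TABLE passes the property name 'schema.url', while AvroSerDe reads its schema from 'avro.schema.url' (or 'avro.schema.literal'). The placeholder error_error_... columns in the describe output are the SerDe's signal that no schema could be resolved. A minimal sketch of the corrected DDL, reusing the paths from the log above:

```sql
-- Sketch only: identical to the failing statement except for the property
-- name AvroSerDe actually looks up ('avro.schema.url' instead of 'schema.url').
CREATE EXTERNAL TABLE ml_items_as_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
WITH SERDEPROPERTIES (
  'avro.schema.url'='file:///home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items';
```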
[jira] [Commented] (HIVE-3256) Update asm version in Hive
[ https://issues.apache.org/jira/browse/HIVE-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722646#comment-13722646 ] Ashutosh Chauhan commented on HIVE-3256: Thanks, Andy, for the update. I think we can now remove the asm dependency from the Hive build altogether. I don't think we use it anywhere else. Update asm version in Hive -- Key: HIVE-3256 URL: https://issues.apache.org/jira/browse/HIVE-3256 Project: Hive Issue Type: Bug Reporter: Zhenxiao Luo Assignee: Zhenxiao Luo Hive trunk is currently using asm version 3.1; Hadoop trunk is on 3.2. Any objections to bumping the Hive version to 3.2 to be in line with Hadoop? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4794) Unit e2e tests for vectorization
[ https://issues.apache.org/jira/browse/HIVE-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722647#comment-13722647 ] Tony Murphy commented on HIVE-4794: --- https://issues.apache.org/jira/browse/HIVE-4794 Unit e2e tests for vectorization Key: HIVE-4794 URL: https://issues.apache.org/jira/browse/HIVE-4794 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Tony Murphy Fix For: vectorization-branch Attachments: HIVE-4794.1.patch, hive-4794.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4794) Unit e2e tests for vectorization
[ https://issues.apache.org/jira/browse/HIVE-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722657#comment-13722657 ] Edward Capriolo commented on HIVE-4794: --- You're not using the proper code conventions; code not conforming to http://uima.apache.org/codeConventions.html cannot be committed. Unit e2e tests for vectorization Key: HIVE-4794 URL: https://issues.apache.org/jira/browse/HIVE-4794 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Tony Murphy Fix For: vectorization-branch Attachments: HIVE-4794.1.patch, hive-4794.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Deleted] (HIVE-4794) Unit e2e tests for vectorization
[ https://issues.apache.org/jira/browse/HIVE-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-4794: -- Comment: was deleted (was: Your not using the proper code conventions code, not conforming to http://uima.apache.org/codeConventions.html can not be committed.) Unit e2e tests for vectorization Key: HIVE-4794 URL: https://issues.apache.org/jira/browse/HIVE-4794 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Tony Murphy Assignee: Tony Murphy Fix For: vectorization-branch Attachments: HIVE-4794.1.patch, hive-4794.patch -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3264) Add support for binary datatype to AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722704#comment-13722704 ] Ashutosh Chauhan commented on HIVE-3264: +1 Add support for binary datatype to AvroSerde --- Key: HIVE-3264 URL: https://issues.apache.org/jira/browse/HIVE-3264 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.9.0 Reporter: Jakob Homan Assignee: Eli Reisman Labels: patch Fix For: 0.12.0 Attachments: HIVE-3264-1.patch, HIVE-3264-2.patch, HIVE-3264-3.patch, HIVE-3264-4.patch, HIVE-3264-5.patch, HIVE-3264.6.patch, HIVE-3264.7.patch When the AvroSerde was written, Hive didn't have a binary type, so Avro's byte array type is converted to an array of small ints. Now that HIVE-2380 is in, this step isn't necessary and we can convert both Avro's bytes type and probably its fixed type to Hive's binary type. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
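As an illustrative sketch of what the patch changes (the table name and schema literal below are made up): with HIVE-3264 applied, an Avro "bytes" field should surface as Hive BINARY instead of an array of small ints.

```sql
-- Hypothetical table; before HIVE-3264, DESCRIBE would report payload as
-- array<tinyint>, afterwards as binary.
CREATE TABLE avro_bytes_demo
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.literal'='{
  "type": "record", "name": "demo",
  "fields": [ {"name": "payload", "type": "bytes"} ]}');
```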
Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11925/#review24149 --- There is still no test covering a map-reduce job on an already existing, non-Avro table into an Avro table, i.e., create a text table, populate it, run a CTAS to manipulate the data into an Avro table. ql/src/test/queries/clientpositive/avro_create_as_select.q https://reviews.apache.org/r/11925/#comment47977 This is testing that one can copy data into an already existing table, but doesn't verify that the already existing, non-Avro data is converted correctly. - Jakob Homan On July 23, 2013, 2:51 a.m., Mohammad Islam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11925/ --- (Updated July 23, 2013, 2:51 a.m.) Review request for hive, Ashutosh Chauhan and Jakob Homan. Bugs: HIVE-3159 https://issues.apache.org/jira/browse/HIVE-3159 Repository: hive-git Description --- Problem: Hive doesn't support creating an Avro-based table using the HQL CREATE TABLE command. It currently requires specifying an Avro schema literal or schema file name. In many cases this is very inconvenient for the user. Some of the unsupported use cases: 1. Create table ... Avro-SERDE etc. as SELECT ... from NON-AVRO FILE 2. Create table ... Avro-SERDE etc. as SELECT from AVRO TABLE 3. Create table without specifying Avro schema.
Diffs - ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_create_as_select2.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 13848b6 serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 010f614 serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION Diff: https://reviews.apache.org/r/11925/diff/ Testing --- Wrote a new java Test class for a new Java class. Added a new test case into existing java test class. In addition, there are 4 .q file for testing multiple use-cases. Thanks, Mohammad Islam
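The first unsupported use case above can be sketched as follows (table and column names are illustrative; with the patch, the Avro schema would be derived from the table definition rather than from a schema literal or URL):

```sql
-- Hypothetical CTAS from a non-Avro (text) table into an Avro table,
-- with no avro.schema.literal or avro.schema.url supplied.
CREATE TABLE items_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
AS SELECT id, title FROM items_text;
```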
Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors
Hi All: We are running into a frequent problem using HCatalog 0.4.1 (Hive Metastore Server 0.9) where we get connection reset or connection timeout errors. The hive metastore server has been allocated enough (12G) memory. This is a critical problem for us, and we would appreciate it if anyone has any pointers. We did add retry logic in our client, which seems to help, but I am just wondering how we can narrow down the root cause of this problem. Could this be a hiccup in networking which causes the hive server to get into an unresponsive state? Thanks Agateaaa Example Connection reset error: === org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_ugi(ThriftHiveMetastore.java:2136) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_ugi(ThriftHiveMetastore.java:2122) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openStore(HiveMetaStoreClient.java:286) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:197) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:157) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102) at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:888) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTableAddParts(DDLSemanticAnalyzer.java:1817) at
org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:297) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:642) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: java.net.SocketException: Connection reset at java.net.SocketInputStream.read(SocketInputStream.java:168) at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127) ... 
30 more Example Connection timeout error: == org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_ugi(ThriftHiveMetastore.java:2136) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_ugi(ThriftHiveMetastore.java:2122) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openStore(HiveMetaStoreClient.java:286) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:197) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:157) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102) at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:888) at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:830) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:954) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7524) at
[jira] [Commented] (HIVE-4928) Date literals do not work properly in partition spec clause
[ https://issues.apache.org/jira/browse/HIVE-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722718#comment-13722718 ] Phabricator commented on HIVE-4928: --- ashutoshc has accepted the revision HIVE-4928 [jira] Date literals do not work properly in partition spec clause. +1 REVISION DETAIL https://reviews.facebook.net/D11871 BRANCH HIVE-4928.2 ARCANIST PROJECT hive To: JIRA, ashutoshc, jdere Date literals do not work properly in partition spec clause --- Key: HIVE-4928 URL: https://issues.apache.org/jira/browse/HIVE-4928 Project: Hive Issue Type: Bug Components: Query Processor Reporter: Jason Dere Assignee: Jason Dere Attachments: HIVE-4928.1.patch.txt, HIVE-4928.D11871.1.patch The partition spec parsing doesn't do any real evaluation of the values in the partition spec, instead just taking the text value of the ASTNode representing the partition value. This works fine for string/numeric literals (expression tree below): (TOK_PARTVAL region 99) But not for Date literals, which are of the form DATE 'yyyy-mm-dd' (expression tree below): (TOK_DATELITERAL '1999-12-31') In this case the parser/analyzer uses TOK_DATELITERAL as the partition column value, when it should really get the value of the child of the DATELITERAL token. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
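The kind of statement affected can be sketched as follows (table and values are illustrative): before the fix, the partition value recorded for a DATE literal was the literal token rather than the date string itself.

```sql
-- A string or numeric partition value parses fine, but prior to HIVE-4928
-- a DATE literal in the partition spec was not evaluated to its child value.
CREATE TABLE sales (id INT) PARTITIONED BY (sale_date DATE);
ALTER TABLE sales ADD PARTITION (sale_date = DATE '1999-12-31');
```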
Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables
On June 29, 2013, 7:43 p.m., Ashutosh Chauhan wrote: serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java, line 70 https://reviews.apache.org/r/11925/diff/2/?file=307412#file307412line70 I think determining the schema from the table definition should be the default. There are multiple ways of determining the schema. I think the order should be: a) Try table definition. b) Try schema literal in properties. c) Try from hdfs. d) Try from url. This is a big change. Avro tables have always been defined via a property. This change is to support a small use case; why switch the entire order? - Jakob --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11925/#review22571 --- On July 23, 2013, 2:51 a.m., Mohammad Islam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11925/ --- (Updated July 23, 2013, 2:51 a.m.) Review request for hive, Ashutosh Chauhan and Jakob Homan. Bugs: HIVE-3159 https://issues.apache.org/jira/browse/HIVE-3159 Repository: hive-git Description --- Problem: Hive doesn't support creating an Avro-based table using the HQL CREATE TABLE command. It currently requires specifying an Avro schema literal or schema file name. In many cases this is very inconvenient for the user. Some of the unsupported use cases: 1. Create table ... Avro-SERDE etc. as SELECT ... from NON-AVRO FILE 2. Create table ... Avro-SERDE etc. as SELECT from AVRO TABLE 3. Create table without specifying Avro schema.
Diffs - ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_create_as_select2.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 13848b6 serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 010f614 serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION Diff: https://reviews.apache.org/r/11925/diff/ Testing --- Wrote a new java Test class for a new Java class. Added a new test case into existing java test class. In addition, there are 4 .q file for testing multiple use-cases. Thanks, Mohammad Islam
Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors
Is there any chance you can do an update on a test environment with hcat-0.5 and hive-0.11 (or 0.10) and see if you can reproduce the issue? We used to see this error when there was load on the hcat server or some network issue connecting to the server (the second one was a rare occurrence). On Mon, Jul 29, 2013 at 11:13 PM, agateaaa agate...@gmail.com wrote: Hi All: We are running into a frequent problem using HCatalog 0.4.1 (Hive Metastore Server 0.9) where we get connection reset or connection timeout errors. The hive metastore server has been allocated enough (12G) memory. This is a critical problem for us, and we would appreciate it if anyone has any pointers. We did add retry logic in our client, which seems to help, but I am just wondering how we can narrow down the root cause of this problem. Could this be a hiccup in networking which causes the hive server to get into an unresponsive state? Thanks Agateaaa
Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors
Thanks Nitin! We have a similar setup (identical hcatalog and hive server versions) on another production environment and don't see any errors (it's been running ok for a few months). Unfortunately we won't be able to move to hcat 0.5 and hive 0.11 or hive 0.10 soon. I did see, the last time we ran into this problem, doing a netstat -ntp | grep :1, that the server was holding on to one socket connection in CLOSE_WAIT state for a long time (the hive metastore server is running on port 1). Don't know if that's relevant here or not. Can you suggest any hive configuration settings we can tweak, or networking tools/tips we can use to narrow this down? Thanks Agateaaa On Mon, Jul 29, 2013 at 11:02 AM, Nitin Pawar nitinpawar...@gmail.com wrote: Is there any chance you can do an update on a test environment with hcat-0.5 and hive-0.11 (or 0.10) and see if you can reproduce the issue? We used to see this error when there was load on the hcat server or some network issue connecting to the server (the second one was a rare occurrence). On Mon, Jul 29, 2013 at 11:13 PM, agateaaa agate...@gmail.com wrote: Hi All: We are running into a frequent problem using HCatalog 0.4.1 (Hive Metastore Server 0.9) where we get connection reset or connection timeout errors. The hive metastore server has been allocated enough (12G) memory. This is a critical problem for us, and we would appreciate it if anyone has any pointers. We did add retry logic in our client, which seems to help, but I am just wondering how we can narrow down the root cause of this problem. Could this be a hiccup in networking which causes the hive server to get into an unresponsive state? Thanks Agateaaa
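On the question of configuration settings to tweak: two client-side metastore knobs that may be worth probing are sketched below. The property names come from the Hive metastore client; the values are illustrative starting points, not recommendations.

```sql
-- Hypothetical session settings: lengthen the client socket timeout
-- (in seconds) and allow more connection attempts before failing.
SET hive.metastore.client.socket.timeout=600;
SET hive.metastore.connect.retries=5;
```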
[jira] [Commented] (HIVE-2137) JDBC driver doesn't encode string properly.
[ https://issues.apache.org/jira/browse/HIVE-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722789#comment-13722789 ] Kousuke Saruta commented on HIVE-2137: -- I wonder if my change really affects TestE2EScenarios.testReadOrcAndRCFromPig. I've only added test code and test data for HiveQueryResultSet after the build finished successfully. JDBC driver doesn't encode string properly. --- Key: HIVE-2137 URL: https://issues.apache.org/jira/browse/HIVE-2137 Project: Hive Issue Type: Bug Components: JDBC Affects Versions: 0.9.0 Reporter: Jin Adachi Fix For: 0.12.0 Attachments: HIVE-2137.patch, HIVE-2137.patch The JDBC driver for HiveServer1 decodes strings with the client-side default encoding, which depends on the operating system unless another encoding is specified. It ignores the server-side encoding. For example, when the server-side operating system and encoding are Linux (UTF-8) and the client-side operating system and encoding are Windows (Shift-JIS, a Japanese charset), character corruption happens in the client. In the current implementation of Hive, UTF-8 appears to be expected on the server side, so the client side should encode/decode strings as UTF-8. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)
Laljo John Pullokkaran created HIVE-4950: Summary: Hive childSuspend is broken (debugging local hadoop jobs) Key: HIVE-4950 URL: https://issues.apache.org/jira/browse/HIVE-4950 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.11.1 Hive --debug has an option to suspend child JVMs, which seems to be broken currently. Note that this mode may be useful only when running in local mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)
[ https://issues.apache.org/jira/browse/HIVE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4950: - Status: Patch Available (was: Open) Hive childSuspend is broken (debugging local hadoop jobs) - Key: HIVE-4950 URL: https://issues.apache.org/jira/browse/HIVE-4950 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.11.1 Attachments: HIVE-4950.patch Hive --debug has an option to suspend child JVMs, which seems to be broken currently. Note that this mode may be useful only when running in local mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)
[ https://issues.apache.org/jira/browse/HIVE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4950: - Attachment: HIVE-4950.patch Hive childSuspend is broken (debugging local hadoop jobs) - Key: HIVE-4950 URL: https://issues.apache.org/jira/browse/HIVE-4950 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.11.1 Attachments: HIVE-4950.patch Hive --debug has an option to suspend child JVMs, which seems to be broken currently. Note that this mode may be useful only when running in local mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)
[ https://issues.apache.org/jira/browse/HIVE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4950: - Description: Hive debug has an option to suspend child JVMs, which seems to be broken currently (--debug childSuspend=y). Note that this mode may be useful only when running in local mode. (was: Hive --debug has an option to suspend child JVMs, which seems to be broken currently. Note that this mode may be useful only when running in local mode.) Hive childSuspend is broken (debugging local hadoop jobs) - Key: HIVE-4950 URL: https://issues.apache.org/jira/browse/HIVE-4950 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Fix For: 0.11.1 Attachments: HIVE-4950.patch Hive debug has an option to suspend child JVMs, which seems to be broken currently (--debug childSuspend=y). Note that this mode may be useful only when running in local mode. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4916) Add TezWork
[ https://issues.apache.org/jira/browse/HIVE-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722819#comment-13722819 ] Gunther Hagleitner commented on HIVE-4916: -- [~appodictic] Will do. I was looking for a way to keep the pre-commit from running. .txt is much better though. Add TezWork --- Key: HIVE-4916 URL: https://issues.apache.org/jira/browse/HIVE-4916 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Attachments: HIVE-4916.1.patch.branch TezWork is the class that encapsulates all the info needed to execute a single Tez job (i.e.: a dag of map or reduce work). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4916) Add TezWork
[ https://issues.apache.org/jira/browse/HIVE-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4916: - Attachment: HIVE-4916.2.patch.txt Changing name to .txt Add TezWork --- Key: HIVE-4916 URL: https://issues.apache.org/jira/browse/HIVE-4916 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Attachments: HIVE-4916.1.patch.branch, HIVE-4916.2.patch.txt TezWork is the class that encapsulates all the info needed to execute a single Tez job (i.e.: a dag of map or reduce work). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4917) Tez Job Monitoring
[ https://issues.apache.org/jira/browse/HIVE-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4917: - Attachment: HIVE-4917.2.patch.txt Renaming patch to .txt Tez Job Monitoring -- Key: HIVE-4917 URL: https://issues.apache.org/jira/browse/HIVE-4917 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Attachments: HIVE-4917.1.patch.branch, HIVE-4917.2.patch.txt TezJobMonitor handles monitoring the execution of a Tez dag -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability
[ https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-4843: - Attachment: HIVE-4843.3.patch Latest patch based on trunk. Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability --- Key: HIVE-4843 URL: https://issues.apache.org/jira/browse/HIVE-4843 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-4843.1.patch, HIVE-4843.2.patch, HIVE-4843.3.patch Currently, there are static apis in multiple locations in ExecDriver and MapRedTask that can be leveraged if put in the already existing utility class in the exec package. This would help making the code more maintainable, readable and also re-usable by other run-time infra such as tez. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4826) Setup build infrastructure for tez
[ https://issues.apache.org/jira/browse/HIVE-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-4826: - Attachment: HIVE-4826.2.patch Latest update based on trunk/branch. Setup build infrastructure for tez -- Key: HIVE-4826 URL: https://issues.apache.org/jira/browse/HIVE-4826 Project: Hive Issue Type: New Feature Components: Tez Affects Versions: tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Fix For: tez-branch Attachments: HIVE-4826.2.patch, HIVE-4826.patch Address changes required in ivy and build xml files to support tez. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4918) Tez job submission
[ https://issues.apache.org/jira/browse/HIVE-4918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722835#comment-13722835 ] Gunther Hagleitner commented on HIVE-4918: -- Thanks Ed. The createHashTables still has to be built. I need to find the right code in hive for this. Let me also look into createScratchDir. You're right. Hive should do this already. Tez job submission -- Key: HIVE-4918 URL: https://issues.apache.org/jira/browse/HIVE-4918 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Attachments: HIVE-4918.1.patch.branch This patch is to create infrastructure to submit a tez dag. (i.e.: TezTask + utils to convert work into a tez dag). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4918) Tez job submission
[ https://issues.apache.org/jira/browse/HIVE-4918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4918: - Attachment: HIVE-4918.2.patch.txt Renaming patch to .txt. Tez job submission -- Key: HIVE-4918 URL: https://issues.apache.org/jira/browse/HIVE-4918 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Attachments: HIVE-4918.1.patch.branch, HIVE-4918.2.patch.txt This patch is to create infrastructure to submit a tez dag. (i.e.: TezTask + utils to convert work into a tez dag). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request 13021: Vectorization Tests
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/13021/ --- (Updated July 29, 2013, 9:04 p.m.) Review request for hive, Eric Hanson, Jitendra Pandey, Remus Rusanu, and Sarvesh Sakalanaga. Changes --- updated for style fixes Bugs: HIVE-4794 https://issues.apache.org/jira/browse/HIVE-4794 Repository: hive-git Description --- These tests cover all types, aggregates, and operators currently supported for vectorization. The queries are executed over a specially crafted data set which covers all the interesting classes of batch for each type: all nulls, repeating value, no nulls, and random values, to fully exercise the vectorization stack. The queries were stabilized against a text test oracle in order to validate results. This patch depends on: HIVE-4525 HIVE-4922 HIVE-4931 Diffs (updated) - ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistory.java 97436c5 ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java bdeabe0 ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/AllVectorTypesRecord.java PRE-CREATION ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/OrcFileGenerator.java PRE-CREATION ql/src/test/queries/clientpositive/vectorization_0.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_1.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_10.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_11.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_12.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_13.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_14.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_15.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_16.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_2.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_3.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_4.q PRE-CREATION 
ql/src/test/queries/clientpositive/vectorization_5.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_6.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_7.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_8.q PRE-CREATION ql/src/test/queries/clientpositive/vectorization_9.q PRE-CREATION ql/src/test/results/clientpositive/vectorization_0.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_1.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_10.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_11.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_12.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_13.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_14.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_15.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_16.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_2.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_3.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_4.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_5.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_6.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_7.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_8.q.out PRE-CREATION ql/src/test/results/clientpositive/vectorization_9.q.out PRE-CREATION Diff: https://reviews.apache.org/r/13021/diff/ Testing --- Thanks, tony murphy
[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722984#comment-13722984 ] Ashutosh Chauhan commented on HIVE-4331: [~viraj] If it's not too hard, can you separate the current patch into two issues: one dealing with HivePassThroughFormat and a second about merging the storage handlers? The first seems to be a prerequisite for the second. I want to understand that change a little better, since it may have implications for other storage handler writers and output format writers for Hive. Integrated StorageHandler for Hive and HCat using the HiveStorageHandler Key: HIVE-4331 URL: https://issues.apache.org/jira/browse/HIVE-4331 Project: Hive Issue Type: Task Components: HCatalog Affects Versions: 0.11.0, 0.12.0 Reporter: Ashutosh Chauhan Assignee: Viraj Bhat Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. These will continue to function, but internally they will use the DefaultStorageHandler from Hive. They will be removed in a future release of Hive. 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will bypass the HiveOutputFormat. We will use this class in Hive's HBaseStorageHandler instead of the HiveHBaseTableOutputFormat. 3) Write new unit tests in HCat's storagehandler so that systems such as Pig and Map Reduce can use Hive's HBaseStorageHandler instead of the HCatHBaseStorageHandler. 4) Make sure all the old and new unit tests pass without backward compatibility issues (except known issues as described in the Design Document). 5) Replace all instances of the HCat source code which point to HCatStorageHandler to use the HiveStorageHandler, including the FosterStorageHandler. I have attached the design document for the same and will attach a patch to this Jira. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Review Request 13032: HIVE-4826 Setup build infrastructure for tez
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/13032/ --- Review request for hive, Ashutosh Chauhan and Gunther Hagleitner. Bugs: HIVE-4826 https://issues.apache.org/jira/browse/HIVE-4826 Repository: hive-git Description --- Setup build infrastructure for tez. Diffs - build-common.xml 0807827 build.xml 016d363 eclipse-templates/.classpath 7114b90 ivy/libraries.properties 4a8edce ql/ivy.xml bfb3116 shims/ivy.xml 04ef641 Diff: https://reviews.apache.org/r/13032/diff/ Testing --- All unit tests pass. Thanks, Vikram Dixit Kumaraswamy
[jira] [Created] (HIVE-4951) combine2_win.q.out needs update for HIVE-3253 (increasing nesting levels)
Thejas M Nair created HIVE-4951: --- Summary: combine2_win.q.out needs update for HIVE-3253 (increasing nesting levels) Key: HIVE-4951 URL: https://issues.apache.org/jira/browse/HIVE-4951 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Thejas M Nair combine2.q was updated in HIVE-3253, but the corresponding change is missing in combine2_win.q, causing it to fail on Windows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables
On July 29, 2013, 5:41 p.m., Jakob Homan wrote: There is still no text covering a map-reduce job on an already existing, non-Avro table into an Avro table, i.e., create a text table, populate it, run a CTAS to manipulate the data into an Avro table. In general, Hive creates internal column names such as col0, col1 etc. Because of this, I wasn't able to copy non-Avro data into an Avro table and run a SELECT query. The only option is to change the current behavior to reuse the provided column names. A separate JIRA for this could be an option. On July 29, 2013, 5:41 p.m., Jakob Homan wrote: ql/src/test/queries/clientpositive/avro_create_as_select.q, line 3 https://reviews.apache.org/r/11925/diff/4/?file=325386#file325386line3 This is testing that one can copy data into an already existing table, but doesn't verify that the already existing, non-Avro data is converted correctly. Same as above. - Mohammad --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11925/#review24149 --- On July 23, 2013, 9:51 a.m., Mohammad Islam wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/11925/ --- (Updated July 23, 2013, 9:51 a.m.) Review request for hive, Ashutosh Chauhan and Jakob Homan. Bugs: HIVE-3159 https://issues.apache.org/jira/browse/HIVE-3159 Repository: hive-git Description --- Problem: Hive doesn't support creating an Avro-based table using the HQL create table command. It currently requires specifying the Avro schema literal or schema file name. In many cases this is very inconvenient for the user. Some of the unsupported use cases: 1. Create table ... Avro-SERDE etc. as SELECT ... from NON-AVRO FILE 2. Create table ... Avro-SERDE etc. as SELECT from AVRO TABLE 3. Create table without specifying Avro schema. 
Diffs - ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_create_as_select2.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 13848b6 serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java PRE-CREATION serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 010f614 serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java PRE-CREATION Diff: https://reviews.apache.org/r/11925/diff/ Testing --- Wrote a new java Test class for a new Java class. Added a new test case into existing java test class. In addition, there are 4 .q file for testing multiple use-cases. Thanks, Mohammad Islam
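To illustrate the kind of conversion a TypeInfoToSchema-style converter performs for primitive Hive types, here is a minimal sketch. The class and method names are invented for illustration, and the mappings are assumptions based on standard Avro primitive types, not the actual HIVE-3159 code:

```java
// Hypothetical sketch of mapping primitive Hive type names to Avro
// primitive type names (not the actual TypeInfoToSchema implementation).
public class HiveToAvroType {

    static String avroTypeFor(String hiveType) {
        switch (hiveType) {
            case "string":   return "string";
            case "int":      return "int";
            case "bigint":   return "long";
            case "float":    return "float";
            case "double":   return "double";
            case "boolean":  return "boolean";
            case "tinyint":
            case "smallint": return "int";   // Avro has no 1/2-byte int types
            default:         return "bytes"; // placeholder for unhandled types
        }
    }

    public static void main(String[] args) {
        System.out.println(avroTypeFor("bigint"));  // long
        System.out.println(avroTypeFor("tinyint")); // int
    }
}
```

A real converter must additionally handle complex types (arrays, maps, structs, unions) and nullability, which is where most of the work in such a patch lies.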
[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723023#comment-13723023 ] Ashutosh Chauhan commented on HIVE-4331: If I am reading this right, the motivation of HivePassThroughFormat is that storage handler writers no longer need to write a custom OF (to implement Hive's OF) and can thus use their existing OF unmodified, with all the necessary plumbing done in the storage handler? Integrated StorageHandler for Hive and HCat using the HiveStorageHandler Key: HIVE-4331 URL: https://issues.apache.org/jira/browse/HIVE-4331 Project: Hive Issue Type: Task Components: HCatalog Affects Versions: 0.11.0, 0.12.0 Reporter: Ashutosh Chauhan Assignee: Viraj Bhat Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. These will continue to function, but internally they will use the DefaultStorageHandler from Hive. They will be removed in a future release of Hive. 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will bypass the HiveOutputFormat. We will use this class in Hive's HBaseStorageHandler instead of the HiveHBaseTableOutputFormat. 3) Write new unit tests in HCat's storagehandler so that systems such as Pig and Map Reduce can use Hive's HBaseStorageHandler instead of the HCatHBaseStorageHandler. 4) Make sure all the old and new unit tests pass without backward compatibility issues (except known issues as described in the Design Document). 5) Replace all instances of the HCat source code which point to HCatStorageHandler to use the HiveStorageHandler, including the FosterStorageHandler. I have attached the design document for the same and will attach a patch to this Jira. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4951) combine2_win.q.out needs update for HIVE-3253 (increasing nesting levels)
[ https://issues.apache.org/jira/browse/HIVE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-4951: Attachment: HIVE-4951.1.patch combine2_win.q.out needs update for HIVE-3253 (increasing nesting levels) - Key: HIVE-4951 URL: https://issues.apache.org/jira/browse/HIVE-4951 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4951.1.patch combine2.q was updated in HIVE-3253, the corresponding change is missing in combine2_win.q, causing it to fail on windows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723029#comment-13723029 ] Viraj Bhat commented on HIVE-4331: -- Hi Ashutosh, That is right. They do not need to write their custom OF to implement the Hive OF. If it makes it easier to review, I can split the patch based on HCat (contrib) and core Hive. Viraj Integrated StorageHandler for Hive and HCat using the HiveStorageHandler Key: HIVE-4331 URL: https://issues.apache.org/jira/browse/HIVE-4331 Project: Hive Issue Type: Task Components: HCatalog Affects Versions: 0.11.0, 0.12.0 Reporter: Ashutosh Chauhan Assignee: Viraj Bhat Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. These will continue to function, but internally they will use the DefaultStorageHandler from Hive. They will be removed in a future release of Hive. 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will bypass the HiveOutputFormat. We will use this class in Hive's HBaseStorageHandler instead of the HiveHBaseTableOutputFormat. 3) Write new unit tests in HCat's storagehandler so that systems such as Pig and Map Reduce can use Hive's HBaseStorageHandler instead of the HCatHBaseStorageHandler. 4) Make sure all the old and new unit tests pass without backward compatibility issues (except known issues as described in the Design Document). 5) Replace all instances of the HCat source code which point to HCatStorageHandler to use the HiveStorageHandler, including the FosterStorageHandler. I have attached the design document for the same and will attach a patch to this Jira. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4951) combine2_win.q.out needs update for HIVE-3253 (increasing nesting levels)
[ https://issues.apache.org/jira/browse/HIVE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated HIVE-4951: Status: Patch Available (was: Open) combine2_win.q.out needs update for HIVE-3253 (increasing nesting levels) - Key: HIVE-4951 URL: https://issues.apache.org/jira/browse/HIVE-4951 Project: Hive Issue Type: Bug Affects Versions: 0.11.0 Reporter: Thejas M Nair Assignee: Thejas M Nair Attachments: HIVE-4951.1.patch combine2.q was updated in HIVE-3253, the corresponding change is missing in combine2_win.q, causing it to fail on windows. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
[ https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723055#comment-13723055 ] Ashutosh Chauhan commented on HIVE-4331: Yeah.. it will make it easier for review. Division in terms of core Hive and HCatalog sounds good. Integrated StorageHandler for Hive and HCat using the HiveStorageHandler Key: HIVE-4331 URL: https://issues.apache.org/jira/browse/HIVE-4331 Project: Hive Issue Type: Task Components: HCatalog Affects Versions: 0.11.0, 0.12.0 Reporter: Ashutosh Chauhan Assignee: Viraj Bhat Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. These will now continue to function but internally they will use the DefaultStorageHandler from Hive. They will be removed in future release of Hive. 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will bypass the HiveOutputFormat. We will use this class in Hive's HBaseStorageHandler instead of the HiveHBaseTableOutputFormat. 3) Write new unit tests in the HCat's storagehandler so that systems such as Pig and Map Reduce can use the Hive's HBaseStorageHandler instead of the HCatHBaseStorageHandler. 4) Make sure all the old and new unit tests pass without backward compatibility (except known issues as described in the Design Document). 5) Replace all instances of the HCat source code, which point to HCatStorageHandler to use theHiveStorageHandler including the FosterStorageHandler. I have attached the design document for the same and will attach a patch to this Jira. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results
Yin Huai created HIVE-4952: -- Summary: When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results Key: HIVE-4952 URL: https://issues.apache.org/jira/browse/HIVE-4952 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Yin Huai Assignee: Yin Huai If we have a query like this ... {code:sql} SELECT xx.key, xx.cnt, yy.key FROM (SELECT x.key as key, count(1) as cnt FROM src1 x JOIN src1 y ON (x.key = y.key) group by x.key) xx JOIN src yy ON xx.key=yy.key; {\code} After the Correlation Optimizer, the operator tree in the reducer will be {code} JOIN2 | | MUX / \ / \ GBY | | | JOIN1| \ / \ / DEMUX {\code} For JOIN2, the right table will arrive at this operator first. If hive.join.emit.interval is small, e.g. 1, JOIN2 will output results even though it has not received any rows from the left table. The logic related to hive.join.emit.interval in JoinOperator assumes that inputs will be ordered by the tag. But, if a query has been optimized by the Correlation Optimizer, this assumption may not hold for those JoinOperators inside the reducer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
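The tag-ordering assumption described in HIVE-4952 can be illustrated with a minimal sketch. This is NOT Hive's actual JoinOperator; the class, method, and row encoding are invented for illustration of why an emit-interval flush is only safe when all left (tag-0) rows arrive before right (tag-1) rows:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: a streaming two-input equi-join that flushes its
// right-side buffer every 'emitInterval' rows, assuming the left side has
// been fully seen by then -- valid only for tag-ordered input.
public class EmitIntervalSketch {

    /** rows: {tag, key} pairs (tag 0 = left, tag 1 = right); returns matches emitted. */
    static int matchesEmitted(int[][] rows, int emitInterval) {
        Set<Integer> leftKeys = new HashSet<>();
        List<Integer> rightBuffer = new ArrayList<>();
        int emitted = 0;
        for (int[] r : rows) {
            if (r[0] == 0) {
                leftKeys.add(r[1]);
            } else {
                rightBuffer.add(r[1]);
                if (rightBuffer.size() >= emitInterval) {
                    // Early flush: correct only if the left side is complete.
                    for (int k : rightBuffer) {
                        if (leftKeys.contains(k)) emitted++;
                    }
                    rightBuffer.clear();
                }
            }
        }
        for (int k : rightBuffer) {     // final flush
            if (leftKeys.contains(k)) emitted++;
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Tag-ordered input (the assumption holds): both matches are found.
        int[][] tagOrdered = {{0, 1}, {0, 2}, {1, 1}, {1, 2}};
        // Right rows arrive first, as after the Correlation Optimizer: with
        // emitInterval=1 the buffer is flushed before any left row arrives.
        int[][] rightFirst = {{1, 1}, {1, 2}, {0, 1}, {0, 2}};
        System.out.println(matchesEmitted(tagOrdered, 1)); // 2
        System.out.println(matchesEmitted(rightFirst, 1)); // 0
    }
}
```

The interleaved arrival order is exactly what the MUX/DEMUX plan produces, which is why the premature flush drops matches.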
[jira] [Updated] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results
[ https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-4952: --- Attachment: replay.txt to replay the problem. Apply 'replay.txt' and then run {code} ant test -Dtestcase=TestCliDriver -Dqfile=correlationoptimizer15.q -Dtest.silent=false {\code} When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results Key: HIVE-4952 URL: https://issues.apache.org/jira/browse/HIVE-4952 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Attachments: replay.txt If we have a query like this ... {code:sql} SELECT xx.key, xx.cnt, yy.key FROM (SELECT x.key as key, count(1) as cnt FROM src1 x JOIN src1 y ON (x.key = y.key) group by x.key) xx JOIN src yy ON xx.key=yy.key; {\code} After Correlation Optimizer, the operator tree in the reducer will be {code} JOIN2 | | MUX / \ / \ GBY | | | JOIN1| \ / \ / DEMUX {\code} For JOIN2, the right table will arrive at this operator first. If hive.join.emit.interval is small, e.g. 1, JOIN2 will output the results even it has not got any row from the left table. The logic related hive.join.emit.interval in JoinOperator assumes that inputs will be ordered by the tag. But, if a query has been optimized by Correlation Optimizer, this assumption may not hold for those JoinOperators inside the reducer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2702) listPartitionsByFilter only supports string partitions for equals
[ https://issues.apache.org/jira/browse/HIVE-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-2702: -- Attachment: HIVE-2702.D11847.2.patch sershe updated the revision HIVE-2702 [jira] listPartitionsByFilter only supports string partitions for equals. Adding the query change. Fetching partition dt=100x for query dt = 100 seems incorrect Reviewers: ashutoshc, JIRA REVISION DETAIL https://reviews.facebook.net/D11847 CHANGE SINCE LAST DIFF https://reviews.facebook.net/D11847?vs=36303id=36483#toc BRANCH HIVE-2702-2 ARCANIST PROJECT hive AFFECTED FILES metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java ql/src/test/results/clientpositive/alter_partition_coltype.q.out To: JIRA, ashutoshc, sershe listPartitionsByFilter only supports string partitions - Key: HIVE-2702 URL: https://issues.apache.org/jira/browse/HIVE-2702 Project: Hive Issue Type: Bug Affects Versions: 0.8.1 Reporter: Aniket Mokashi Assignee: Sergey Shelukhin Fix For: 0.12.0 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2702.D2043.1.patch, HIVE-2702.1.patch, HIVE-2702.D11715.1.patch, HIVE-2702.D11715.2.patch, HIVE-2702.D11715.3.patch, HIVE-2702.D11847.1.patch, HIVE-2702.D11847.2.patch, HIVE-2702.patch, HIVE-2702-v0.patch listPartitionsByFilter supports only string partitions. This is because it is explicitly specified in generateJDOFilterOverPartitions in ExpressionTree.java. //Can only support partitions whose types are string if( ! table.getPartitionKeys().get(partitionColumnIndex). 
getType().equals(org.apache.hadoop.hive.serde.Constants.STRING_TYPE_NAME) ) { throw new MetaException("Filtering is supported only on partition keys of type string"); } -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
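The check quoted above rejects every non-string partition key outright. A sketch of the more permissive, type-aware version such a patch moves toward might look like the following (the method name and the exact set of admitted types are assumptions for illustration, not the actual HIVE-2702 change):

```java
// Hypothetical sketch: decide whether a partition filter can be pushed to
// the metastore based on the partition key's type, rather than rejecting
// every non-string key.
public class PartitionFilterCheck {

    static boolean canPushFilter(String partitionKeyType) {
        switch (partitionKeyType) {
            case "string":
            case "tinyint":
            case "smallint":
            case "int":
            case "bigint":
                return true;
            default:
                // e.g. double, date: fall back to client-side pruning
                return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canPushFilter("string")); // true
        System.out.println(canPushFilter("double")); // false
    }
}
```

Keys that cannot be pushed would still be filtered, just on the client after fetching all partitions, which is correct but slower.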
[jira] [Updated] (HIVE-4917) Tez Job Monitoring
[ https://issues.apache.org/jira/browse/HIVE-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gunther Hagleitner updated HIVE-4917: - Description: TezJobMonitor handles monitoring the execution of a Tez dag NO PRECOMMIT TESTS (this is wip for the tez branch) was:TezJobMonitor handles monitoring the execution of a Tez dag Tez Job Monitoring -- Key: HIVE-4917 URL: https://issues.apache.org/jira/browse/HIVE-4917 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Attachments: HIVE-4917.1.patch.branch, HIVE-4917.2.patch.txt TezJobMonitor handles monitoring the execution of a Tez dag NO PRECOMMIT TESTS (this is wip for the tez branch) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4916) Add TezWork
[ https://issues.apache.org/jira/browse/HIVE-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723132#comment-13723132 ] Gunther Hagleitner commented on HIVE-4916: -- Thanks [~brocknoland] I've added the ALL CAPS prop to all the relevant descriptions. That's really nice to have and much better than fiddling with the name. Add TezWork --- Key: HIVE-4916 URL: https://issues.apache.org/jira/browse/HIVE-4916 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Gunther Hagleitner Fix For: tez-branch Attachments: HIVE-4916.1.patch.branch, HIVE-4916.2.patch.txt TezWork is the class that encapsulates all the info needed to execute a single Tez job (i.e.: a dag of map or reduce work). NO PRECOMMIT TESTS (this is wip for the tez branch) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4870) Explain Extended to show partition info for Fetch Task
[ https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723150#comment-13723150 ] Laljo John Pullokkaran commented on HIVE-4870: -- [~brocknoland]: Brock, I am seeing pre-commit test failures in auto_sortmerge_join_1.q and auto_sortmerge_join_7.q. I cannot reproduce these in my Linux or Mac OS X env (when run stand-alone). Wondering if this is a known issue with the new pre-commit test framework. Thanks John Explain Extended to show partition info for Fetch Task -- Key: HIVE-4870 URL: https://issues.apache.org/jira/browse/HIVE-4870 Project: Hive Issue Type: Bug Components: Query Processor, Tests Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-4870.patch Explain extended does not include partition information for Fetch Task (FetchWork). Map Reduce Task (MapredWork) already does this. Patch includes Partition Description info to Fetch Task. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4870) Explain Extended to show partition info for Fetch Task
[ https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723160#comment-13723160 ] Brock Noland commented on HIVE-4870: Hey, I haven't seen those fail. If you upload the patch again you could let it run a second time and see if they fail again. Explain Extended to show partition info for Fetch Task -- Key: HIVE-4870 URL: https://issues.apache.org/jira/browse/HIVE-4870 Project: Hive Issue Type: Bug Components: Query Processor, Tests Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-4870.patch Explain extended does not include partition information for Fetch Task (FetchWork). Map Reduce Task (MapredWork)already does this. Patch includes Partition Description info to Fetch Task. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HIVE-4953) Regression: Hive does not build offline anymore
Edward Capriolo created HIVE-4953: - Summary: Regression: Hive does not build offline anymore Key: HIVE-4953 URL: https://issues.apache.org/jira/browse/HIVE-4953 Project: Hive Issue Type: Bug Reporter: Edward Capriolo -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4870) Explain Extended to show partition info for Fetch Task
[ https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723185#comment-13723185 ] Laljo John Pullokkaran commented on HIVE-4870: -- Ok, let me try it again. Explain Extended to show partition info for Fetch Task -- Key: HIVE-4870 URL: https://issues.apache.org/jira/browse/HIVE-4870 Project: Hive Issue Type: Bug Components: Query Processor, Tests Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-4870.patch Explain extended does not include partition information for Fetch Task (FetchWork). Map Reduce Task (MapredWork)already does this. Patch includes Partition Description info to Fetch Task. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4870) Explain Extended to show partition info for Fetch Task
[ https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4870: - Attachment: (was: HIVE-4870.patch) Explain Extended to show partition info for Fetch Task -- Key: HIVE-4870 URL: https://issues.apache.org/jira/browse/HIVE-4870 Project: Hive Issue Type: Bug Components: Query Processor, Tests Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-4870.patch Explain extended does not include partition information for Fetch Task (FetchWork). Map Reduce Task (MapredWork)already does this. Patch includes Partition Description info to Fetch Task. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4870) Explain Extended to show partition info for Fetch Task
[ https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4870: - Attachment: HIVE-4870.patch Explain Extended to show partition info for Fetch Task -- Key: HIVE-4870 URL: https://issues.apache.org/jira/browse/HIVE-4870 Project: Hive Issue Type: Bug Components: Query Processor, Tests Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-4870.patch Explain extended does not include partition information for Fetch Task (FetchWork). Map Reduce Task (MapredWork)already does this. Patch includes Partition Description info to Fetch Task. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4870) Explain Extended to show partition info for Fetch Task
[ https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Laljo John Pullokkaran updated HIVE-4870: - Status: Patch Available (was: Open) Explain Extended to show partition info for Fetch Task -- Key: HIVE-4870 URL: https://issues.apache.org/jira/browse/HIVE-4870 Project: Hive Issue Type: Bug Components: Query Processor, Tests Affects Versions: 0.11.0 Reporter: Laljo John Pullokkaran Assignee: Laljo John Pullokkaran Attachments: HIVE-4870.patch Explain extended does not include partition information for Fetch Task (FetchWork). Map Reduce Task (MapredWork)already does this. Patch includes Partition Description info to Fetch Task. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors
Looking at the hive metastore server logs we see errors like these: 2013-07-26 06:34:52,853 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run(182)) - Error occurred during processing of message. java.lang.NullPointerException at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:183) at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:79) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) at approximately the same time as we see the timeout or connection reset errors. Don't know if this is the cause or a side effect of the connection timeout/connection reset errors. Does anybody have any pointers or suggestions? Thanks On Mon, Jul 29, 2013 at 11:29 AM, agateaaa agate...@gmail.com wrote: Thanks Nitin! We have a similar setup (identical hcatalog and hive server versions) on another production environment and don't see any errors (it's been running ok for a few months). Unfortunately we won't be able to move to hcat 0.5 and hive 0.11 or hive 0.10 soon. I did see that the last time we ran into this problem, doing a netstat -ntp | grep :1 showed that the server was holding on to one socket connection in CLOSE_WAIT state for a long time (hive metastore server is running on port 1). Don't know if that's relevant here or not. Can you suggest any hive configuration settings we can tweak, or networking tools/tips we can use to narrow this down? Thanks Agateaaa On Mon, Jul 29, 2013 at 11:02 AM, Nitin Pawar nitinpawar...@gmail.com wrote: Is there any chance you can do an update on a test environment with hcat-0.5 and hive-0(11 or 10) and see if you can reproduce the issue?
We used to see this error when there was load on the hcat server or some network issue connecting to the server (the second one was a rare occurrence) On Mon, Jul 29, 2013 at 11:13 PM, agateaaa agate...@gmail.com wrote: Hi All: We are running into a frequent problem using HCatalog 0.4.1 (Hive Metastore Server 0.9) where we get connection reset or connection timeout errors. The hive metastore server has been allocated enough (12G) memory. This is a critical problem for us and we would appreciate it if anyone has any pointers. We did add retry logic in our client, which seems to help, but I am just wondering how we can narrow down the root cause of this problem. Could this be a hiccup in networking which causes the hive server to get into an unresponsive state? Thanks Agateaaa Example Connection reset error: === org.apache.thrift.transport.TTransportException: java.net.SocketException: Connection reset at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129) at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84) at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378) at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297) at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204) at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_ugi(ThriftHiveMetastore.java:2136) at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_ugi(ThriftHiveMetastore.java:2122) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openStore(HiveMetaStoreClient.java:286) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:197) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:157) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092) at
org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102) at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:888) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTableAddParts(DDLSemanticAnalyzer.java:1817) at org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:297) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431) at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:642)
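Agateaaa mentions adding retry logic in the client. A minimal sketch of such a wrapper might look like the following; the class and parameter names (MetastoreRetry, maxAttempts, backoffMillis) are illustrative assumptions, not Hive's actual HiveMetaStoreClient code:

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical sketch of client-side retry logic around flaky metastore
 * calls; illustrative only, not Hive's real client implementation.
 */
public class MetastoreRetry {
    public static <T> T withRetries(Callable<T> call, int maxAttempts, long backoffMillis) {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(ie);
            } catch (Exception e) {          // e.g. java.net.SocketException: Connection reset
                last = e;
                if (attempt < maxAttempts) {
                    try {
                        // linear backoff before the next attempt
                        Thread.sleep(backoffMillis * attempt);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new RuntimeException(ie);
                    }
                }
            }
        }
        throw new RuntimeException("call failed after " + maxAttempts + " attempts", last);
    }
}
```

Retrying masks the symptom but, as noted above, does not explain the CLOSE_WAIT sockets on the server side.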
[jira] [Commented] (HIVE-4388) HBase tests fail against Hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723204#comment-13723204 ] Hive QA commented on HIVE-4388: --- {color:red}Overall{color}: -1 at least one tests failed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12594727/HIVE-4388.patch {color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 1694 tests executed *Failed tests:* {noformat} junit.framework.TestSuite.org.apache.hcatalog.hbase.snapshot.TestIDGenerator junit.framework.TestSuite.org.apache.hcatalog.hbase.snapshot.TestRevisionManagerEndpoint org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeII junit.framework.TestSuite.org.apache.hcatalog.hbase.TestHBaseInputFormat junit.framework.TestSuite.org.apache.hcatalog.hbase.TestHBaseBulkOutputFormat org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeI org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_serde_user_properties org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeWithColumnPrefixes org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeWithHiveMapToHBaseColumnFamilyII org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeWithTimestamp {noformat} Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/222/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/222/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Executing org.apache.hive.ptest.execution.ReportingPhase Tests failed with: TestsFailedException: 10 tests failed {noformat} This message is automatically generated. HBase tests fail against Hadoop 2 - Key: HIVE-4388 URL: https://issues.apache.org/jira/browse/HIVE-4388 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Brock Noland Attachments: HIVE-4388.patch, HIVE-4388-wip.txt Currently we're building by default against 0.92. 
When you run against hadoop 2 (-Dhadoop.mr.rev=23) builds fail because of: HBASE-5963. HIVE-3861 upgrades the version of hbase used. This will get you past the problem in HBASE-5963 (which was fixed in 0.94.1) but fails with: HBASE-6396. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4541) Run check-style on the branch and fix style issues.
[ https://issues.apache.org/jira/browse/HIVE-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723213#comment-13723213 ] Jitendra Nath Pandey commented on HIVE-4541: We will need to break the style fixes into multiple patches, otherwise patch size will be too big. Run check-style on the branch and fix style issues. --- Key: HIVE-4541 URL: https://issues.apache.org/jira/browse/HIVE-4541 Project: Hive Issue Type: Sub-task Affects Versions: vectorization-branch Reporter: Jitendra Nath Pandey Assignee: Jitendra Nath Pandey Fix For: vectorization-branch Attachments: HIVE-4541.1.patch We should run check style on the entire branch and fix issues before the branch is merged back to the trunk. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4954) PTFTranslator hardcodes ranking functions
[ https://issues.apache.org/jira/browse/HIVE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-4954: -- Attachment: HIVE-4954.1.patch.txt PTFTranslator hardcodes ranking functions - Key: HIVE-4954 URL: https://issues.apache.org/jira/browse/HIVE-4954 Project: Hive Issue Type: Sub-task Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-4954.1.patch.txt protected static final ArrayList<String> RANKING_FUNCS = new ArrayList<String>(); static { RANKING_FUNCS.add("rank"); RANKING_FUNCS.add("dense_rank"); RANKING_FUNCS.add("percent_rank"); RANKING_FUNCS.add("cume_dist"); }; Move this logic to annotations -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
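A minimal sketch of the annotation-driven registration the ticket proposes might look like this; the RankingFunction annotation and the evaluator classes below are illustrative assumptions, not Hive's real window-function API:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: instead of a hardcoded RANKING_FUNCS list, each
// function class declares itself via an annotation and the translator
// collects the names by reflection.
public class RankingRegistry {
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface RankingFunction {
        String name();
    }

    @RankingFunction(name = "rank")       public static class Rank {}
    @RankingFunction(name = "dense_rank") public static class DenseRank {}

    // Collect names from annotated classes (candidate list stands in for a classpath scan).
    public static List<String> rankingNames(Class<?>... candidates) {
        List<String> names = new ArrayList<String>();
        for (Class<?> c : candidates) {
            RankingFunction ann = c.getAnnotation(RankingFunction.class);
            if (ann != null) {
                names.add(ann.name());
            }
        }
        return names;
    }
}
```

The design advantage is that adding a new ranking function no longer requires touching the translator, only annotating the new class.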
[jira] [Updated] (HIVE-4954) PTFTranslator hardcodes ranking functions
[ https://issues.apache.org/jira/browse/HIVE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-4954: -- Status: Patch Available (was: Open) PTFTranslator hardcodes ranking functions - Key: HIVE-4954 URL: https://issues.apache.org/jira/browse/HIVE-4954 Project: Hive Issue Type: Sub-task Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-4954.1.patch.txt protected static final ArrayList<String> RANKING_FUNCS = new ArrayList<String>(); static { RANKING_FUNCS.add("rank"); RANKING_FUNCS.add("dense_rank"); RANKING_FUNCS.add("percent_rank"); RANKING_FUNCS.add("cume_dist"); }; Move this logic to annotations -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4388) HBase tests fail against Hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723244#comment-13723244 ] Prasanth J commented on HIVE-4388: -- Hi Brock, I was using this patch to make hive work with hbase 0.95 and found that there are some unit test failures in TestHBaseSerDe. There are a few assertions that still check for Put.class where they should check for PutWritable.class. The following methods need to be fixed in TestHBaseSerDe {code} deserializeAndSerialize() deserializeAndSerializeHiveMapHBaseColumnFamilyII() {code} Also, can you please let me know how to test the readFields() and write() interfaces in ResultWritable/PutWritable? Are there any tests/.q files that make use of these interfaces? HBase tests fail against Hadoop 2 - Key: HIVE-4388 URL: https://issues.apache.org/jira/browse/HIVE-4388 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Brock Noland Attachments: HIVE-4388.patch, HIVE-4388-wip.txt Currently we're building by default against 0.92. When you run against hadoop 2 (-Dhadoop.mr.rev=23) builds fail because of: HBASE-5963. HIVE-3861 upgrades the version of hbase used. This will get you past the problem in HBASE-5963 (which was fixed in 0.94.1) but fails with: HBASE-6396. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4388) HBase tests fail against Hadoop 2
[ https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723247#comment-13723247 ] Brock Noland commented on HIVE-4388: Hi, Yes, that patch is not even close to being ready for use. I was just uploading it to get the unit tests to run. The serde tests, in addition to TestHBaseCliDriver, should exercise those interfaces. Brock HBase tests fail against Hadoop 2 - Key: HIVE-4388 URL: https://issues.apache.org/jira/browse/HIVE-4388 Project: Hive Issue Type: Bug Reporter: Gunther Hagleitner Assignee: Brock Noland Attachments: HIVE-4388.patch, HIVE-4388-wip.txt Currently we're building by default against 0.92. When you run against hadoop 2 (-Dhadoop.mr.rev=23) builds fail because of: HBASE-5963. HIVE-3861 upgrades the version of hbase used. This will get you past the problem in HBASE-5963 (which was fixed in 0.94.1) but fails with: HBASE-6396. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4954) PTFTranslator hardcodes ranking functions
[ https://issues.apache.org/jira/browse/HIVE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-4954: -- Issue Type: Improvement (was: Sub-task) Parent: (was: HIVE-4937) PTFTranslator hardcodes ranking functions - Key: HIVE-4954 URL: https://issues.apache.org/jira/browse/HIVE-4954 Project: Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: HIVE-4954.1.patch.txt protected static final ArrayList<String> RANKING_FUNCS = new ArrayList<String>(); static { RANKING_FUNCS.add("rank"); RANKING_FUNCS.add("dense_rank"); RANKING_FUNCS.add("percent_rank"); RANKING_FUNCS.add("cume_dist"); }; Move this logic to annotations -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4223) LazySimpleSerDe will throw IndexOutOfBoundsException in nested structs of hive table
[ https://issues.apache.org/jira/browse/HIVE-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723276#comment-13723276 ] Chaoyu Tang commented on HIVE-4223: --- [~java8964] I was not able to reproduce the said problem in hive-0.9.0 and am wondering if it might be related to the data? Here is my test case: 1. create table bcd (col1 array<struct<col1:string, col2:string, col3:string, col4:string, col5:string, col6:string, col7:string, col8:array<struct<col1:string, col2:string, col3:string, col4:string, col5:string, col6:string, col7:string, col8:string, col9:string>>>>) row format delimited fields terminated by '\001' collection items terminated by '\002' lines terminated by '\n' stored as textfile; ** should be the same as you described 2. load data local inpath '/root/nest_struct.data' overwrite into table bcd; ** see attached nest_struct.data 3. select col1 from bcd; ** got: [{"col1":"c1v","col2":"c2v","col3":"c3v","col4":"c4v","col5":"c5v","col6":"c6v","col7":"c7v","col8":[{"col1":"c11v","col2":"c22v","col3":"c33v","col4":"c44v","col5":"c55v","col6":"c66v","col7":"c77v","col8":"c88v","col9":"c99v"}]}] Did you see anything different from your case? Could you please update your case and probably I can have a try. LazySimpleSerDe will throw IndexOutOfBoundsException in nested structs of hive table Key: HIVE-4223 URL: https://issues.apache.org/jira/browse/HIVE-4223 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.9.0 Environment: Hive 0.9.0 Reporter: Yong Zhang Attachments: nest_struct.data The LazySimpleSerDe will throw IndexOutOfBoundsException if the column structure is a struct containing an array of structs.
I have a table with one column defined like this: columnA array<struct<col1:primiType, col2:primiType, col3:primiType, col4:primiType, col5:primiType, col6:primiType, col7:primiType, col8:array<struct<col1:primiType, col2:primiType, col3:primiType, col4:primiType, col5:primiType, col6:primiType, col7:primiType, col8:primiType, col9:primiType>>>> In this example, the outside struct has 8 columns (including the array), and the inner struct has 9 columns. As long as the outside struct has a LOWER column count than the inner struct, I think we will get the following exception as a stack trace in LazySimpleSerDe when it tries to serialize a row: Caused by: java.lang.IndexOutOfBoundsException: Index: 8, Size: 8 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:485) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:443) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:381) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:365) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:568) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:132) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83) at
org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531) ... 9 more I am not very sure about the exact reason for this problem. I believe that the public static void serialize(ByteStream.Output out, Object obj, ObjectInspector objInspector, byte[] separators, int level, Text nullSequence, boolean escaped, byte escapeChar, boolean[] needsEscape) is recursively invoking itself when facing a nested structure. But for the nested struct structure, the list reference will get messed up, and size() will return wrong data. In the
[jira] [Updated] (HIVE-4223) LazySimpleSerDe will throw IndexOutOfBoundsException in nested structs of hive table
[ https://issues.apache.org/jira/browse/HIVE-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chaoyu Tang updated HIVE-4223: -- Attachment: nest_struct.data data file for my test case -- chaoyu LazySimpleSerDe will throw IndexOutOfBoundsException in nested structs of hive table Key: HIVE-4223 URL: https://issues.apache.org/jira/browse/HIVE-4223 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.9.0 Environment: Hive 0.9.0 Reporter: Yong Zhang Attachments: nest_struct.data The LazySimpleSerDe will throw IndexOutOfBoundsException if the column structure is a struct containing an array of structs. I have a table with one column defined like this: columnA array<struct<col1:primiType, col2:primiType, col3:primiType, col4:primiType, col5:primiType, col6:primiType, col7:primiType, col8:array<struct<col1:primiType, col2:primiType, col3:primiType, col4:primiType, col5:primiType, col6:primiType, col7:primiType, col8:primiType, col9:primiType>>>> In this example, the outside struct has 8 columns (including the array), and the inner struct has 9 columns.
As long as the outside struct has a LOWER column count than the inner struct, I think we will get the following exception as a stack trace in LazySimpleSerDe when it tries to serialize a row: Caused by: java.lang.IndexOutOfBoundsException: Index: 8, Size: 8 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:485) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:443) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:381) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:365) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:568) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:132) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531) ... 9 more I am not very sure about the exact reason for this problem.
I believe that the public static void serialize(ByteStream.Output out, Object obj, ObjectInspector objInspector, byte[] separators, int level, Text nullSequence, boolean escaped, byte escapeChar, boolean[] needsEscape) is recursively invoking itself when facing a nested structure. But for the nested struct structure, the list reference will get messed up, and size() will return wrong data. In the above example case I faced, for these 2 lines: List<? extends StructField> fields = soi.getAllStructFieldRefs(); list = soi.getStructFieldsDataAsList(obj); my StructObjectInspector (soi) will return the CORRECT data from the getAllStructFieldRefs() and getStructFieldsDataAsList() methods. For example, for one row, for the outer 8-column struct, I have 2 elements in the inner array of structs, and each element will have 9 columns (as there are 9 columns in the inner struct). During runtime, after I added more logging in the LazySimpleSerDe, I see the following behavior in the logging: for the 8 outside columns, loop; for the 9 inside columns, loop for serialize; for the 9 inside columns, loop for serialize; the code breaks here, as the outside loop will try to access the 9th element, which does not exist in the outside struct, as you can see in the stack trace where it tried to access location 8 of a size-8 list. What I did is to change the
[jira] [Commented] (HIVE-4223) LazySimpleSerDe will throw IndexOutOfBoundsException in nested structs of hive table
[ https://issues.apache.org/jira/browse/HIVE-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723303#comment-13723303 ] Chaoyu Tang commented on HIVE-4223: --- The previous comments were not in the right format; re-posting: I was not able to reproduce the said problem in hive-0.9.0 and am wondering if it might be related to the data? Here is my test case: 1. create table bcd (col1 array<struct<col1:string, col2:string, col3:string, col4:string, col5:string, col6:string, col7:string, col8:array<struct<col1:string, col2:string, col3:string, col4:string, col5:string, col6:string, col7:string, col8:string, col9:string>>>>) row format delimited fields terminated by '\001' collection items terminated by '\002' lines terminated by '\n' stored as textfile; -- same as the case described in this JIRA 2. load data local inpath '/root/nest_struct.data' overwrite into table bcd; -- see attached nest_struct.data 3. select col1 from bcd; -- got expected result {code} [{"col1":"c1v","col2":"c2v","col3":"c3v","col4":"c4v","col5":"c5v","col6":"c6v","col7":"c7v","col8":[{"col1":"c11v","col2":"c22v","col3":"c33v","col4":"c44v","col5":"c55v","col6":"c66v","col7":"c77v","col8":"c88v","col9":"c99v"}]}] {code} LazySimpleSerDe will throw IndexOutOfBoundsException in nested structs of hive table Key: HIVE-4223 URL: https://issues.apache.org/jira/browse/HIVE-4223 Project: Hive Issue Type: Bug Components: Serializers/Deserializers Affects Versions: 0.9.0 Environment: Hive 0.9.0 Reporter: Yong Zhang Attachments: nest_struct.data The LazySimpleSerDe will throw IndexOutOfBoundsException if the column structure is a struct containing an array of structs.
I have a table with one column defined like this: columnA array<struct<col1:primiType, col2:primiType, col3:primiType, col4:primiType, col5:primiType, col6:primiType, col7:primiType, col8:array<struct<col1:primiType, col2:primiType, col3:primiType, col4:primiType, col5:primiType, col6:primiType, col7:primiType, col8:primiType, col9:primiType>>>> In this example, the outside struct has 8 columns (including the array), and the inner struct has 9 columns. As long as the outside struct has a LOWER column count than the inner struct, I think we will get the following exception as a stack trace in LazySimpleSerDe when it tries to serialize a row: Caused by: java.lang.IndexOutOfBoundsException: Index: 8, Size: 8 at java.util.ArrayList.RangeCheck(ArrayList.java:547) at java.util.ArrayList.get(ArrayList.java:322) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:485) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:443) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:381) at org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:365) at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:568) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:132) at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83) at
org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531) ... 9 more I am not very sure about the exact reason for this problem. I believe that the public static void serialize(ByteStream.Output out, Object obj, ObjectInspector objInspector, byte[] separators, int level, Text nullSequence, boolean escaped, byte escapeChar, boolean[] needsEscape) is recursively invoking itself when facing a nested structure. But for the nested struct structure, the list reference will get messed up, and size() will return wrong data. In the above example case I faced, for these
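The failure mode Yong Zhang describes (the outer loop indexing past the 8-field struct once a shared list reference has been repointed at the 9-field inner struct by recursion) can be reproduced in miniature. This is an illustrative sketch of the bug pattern only, not the actual LazySimpleSerDe code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative reconstruction of the suspected bug: a recursive serializer
// that stashes the current struct's field list in a shared variable has that
// variable clobbered by the nested call, so the outer loop bound goes wrong.
public class NestedStructBug {
    private static List<Object> fields;   // shared across recursion levels: the bug

    @SuppressWarnings("unchecked")
    public static int serializeBuggy(List<Object> struct) {
        fields = struct;
        int written = 0;
        for (int i = 0; i < fields.size(); i++) {   // bound re-read from the shared ref
            Object f = struct.get(i);               // throws Index: 8, Size: 8 after recursion
            if (f instanceof List) {
                written += serializeBuggy((List<Object>) f);  // repoints 'fields' at the inner struct
            } else {
                written++;
            }
        }
        return written;
    }

    @SuppressWarnings("unchecked")
    public static int serializeFixed(List<Object> struct) {
        List<Object> local = struct;                // a local reference survives recursion
        int written = 0;
        for (int i = 0; i < local.size(); i++) {
            Object f = local.get(i);
            if (f instanceof List) {
                written += serializeFixed((List<Object>) f);
            } else {
                written++;
            }
        }
        return written;
    }

    // outer struct: 8 fields, the last one an inner struct with 9 fields
    public static List<Object> sampleRow() {
        List<Object> inner = new ArrayList<Object>(
                Arrays.asList("i1", "i2", "i3", "i4", "i5", "i6", "i7", "i8", "i9"));
        List<Object> outer = new ArrayList<Object>(
                Arrays.asList("o1", "o2", "o3", "o4", "o5", "o6", "o7"));
        outer.add(inner);
        return outer;
    }
}
```

With this shape, the buggy version fails with exactly "Index: 8, Size: 8", matching the reported stack trace, while the local-variable version serializes all 16 scalar fields.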
[jira] [Updated] (HIVE-4879) Window functions that imply order can only be registered at compile time
[ https://issues.apache.org/jira/browse/HIVE-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-4879: -- Attachment: HIVE-4879.3.patch.txt Third time is a charm? Window functions that imply order can only be registered at compile time Key: HIVE-4879 URL: https://issues.apache.org/jira/browse/HIVE-4879 Project: Hive Issue Type: Improvement Affects Versions: 0.11.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.12.0 Attachments: HIVE-4879.1.patch.txt, HIVE-4879.2.patch.txt, HIVE-4879.3.patch.txt Adding an annotation for impliesOrder -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4879) Window functions that imply order can only be registered at compile time
[ https://issues.apache.org/jira/browse/HIVE-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723310#comment-13723310 ] Edward Capriolo commented on HIVE-4879: --- This patch is cumulative with HIVE-4954, so if you apply this first you do not need to apply that. Window functions that imply order can only be registered at compile time Key: HIVE-4879 URL: https://issues.apache.org/jira/browse/HIVE-4879 Project: Hive Issue Type: Improvement Affects Versions: 0.11.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.12.0 Attachments: HIVE-4879.1.patch.txt, HIVE-4879.2.patch.txt, HIVE-4879.3.patch.txt Adding an annotation for impliesOrder -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query
[ https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723315#comment-13723315 ] Edward Capriolo commented on HIVE-4002: --- [~navis] Sorry I dropped the ball on this review. Can you rebase? Fetch task aggregation for simple group by query Key: HIVE-4002 URL: https://issues.apache.org/jira/browse/HIVE-4002 Project: Hive Issue Type: Improvement Components: Query Processor Reporter: Navis Assignee: Navis Priority: Minor Attachments: HIVE-4002.D8739.1.patch, HIVE-4002.D8739.2.patch Aggregation queries with no group-by clause (for example, select count(*) from src) execute the final aggregation in a single reduce task. But the input is too small even for a single reducer, because most UDAFs generate just a single row per map-side aggregation. If the final fetch task can aggregate the outputs from the map tasks, the shuffle time can be removed. This optimization transforms an operator tree like TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK into TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS) With the patch, the time taken for the auto_join_filters.q test is reduced to 6 min (from 10 min). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3256) Update asm version in Hive
[ https://issues.apache.org/jira/browse/HIVE-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723325#comment-13723325 ] Hive QA commented on HIVE-3256: --- {color:red}Overall{color}: -1 no tests executed Here are the results of testing the latest attachment: https://issues.apache.org/jira/secure/attachment/12594735/HIVE-3256.patch Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/223/testReport Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/223/console Messages: {noformat} Executing org.apache.hive.ptest.execution.PrepPhase Executing org.apache.hive.ptest.execution.ExecutionPhase Tests failed with: ExecutionException: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException: resource batch-exec.vm not found. {noformat} This message is automatically generated. Update asm version in Hive -- Key: HIVE-3256 URL: https://issues.apache.org/jira/browse/HIVE-3256 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Zhenxiao Luo Assignee: Ashutosh Chauhan Attachments: HIVE-3256.patch Hive trunk is currently using asm version 3.1; Hadoop trunk is on 3.2. Any objections to bumping the Hive version to 3.2 to be in line with Hadoop? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2608) Do not require AS a,b,c part in LATERAL VIEW
[ https://issues.apache.org/jira/browse/HIVE-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-2608: -- Attachment: HIVE-2608.8.patch.txt Re-upload Navis' patch Do not require AS a,b,c part in LATERAL VIEW Key: HIVE-2608 URL: https://issues.apache.org/jira/browse/HIVE-2608 Project: Hive Issue Type: Improvement Components: Query Processor, UDF Reporter: Igor Kabiljo Assignee: Navis Priority: Minor Attachments: HIVE-2608.8.patch.txt, HIVE-2608.D4317.5.patch, HIVE-2608.D4317.6.patch Currently, it is required to state column names when LATERAL VIEW is used. That shouldn't be necessary, since a UDTF returns a struct that contains column names - and these should be used by default. For example, it would be great if this was possible: SELECT t.*, t.key1 + t.key4 FROM some_table LATERAL VIEW JSON_TUPLE(json, 'key1', 'key2', 'key3', 'key3') t; -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-2608) Do not require AS a,b,c part in LATERAL VIEW
[ https://issues.apache.org/jira/browse/HIVE-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723332#comment-13723332 ] Edward Capriolo commented on HIVE-2608: --- +1 if tests pass Do not require AS a,b,c part in LATERAL VIEW Key: HIVE-2608 URL: https://issues.apache.org/jira/browse/HIVE-2608 Project: Hive Issue Type: Improvement Components: Query Processor, UDF Reporter: Igor Kabiljo Assignee: Navis Priority: Minor Attachments: HIVE-2608.8.patch.txt, HIVE-2608.D4317.5.patch, HIVE-2608.D4317.6.patch Currently, it is required to state column names when LATERAL VIEW is used. That shouldn't be necessary, since a UDTF returns a struct that contains column names - and these should be used by default. For example, it would be great if this was possible: SELECT t.*, t.key1 + t.key4 FROM some_table LATERAL VIEW JSON_TUPLE(json, 'key1', 'key2', 'key3', 'key3') t; -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4953) Regression: Hive does not build offline anymore
[ https://issues.apache.org/jira/browse/HIVE-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-4953: -- Description: BUILD FAILED /home/edward/Documents/java/hive-trunk/build.xml:233: java.net.UnknownHostException: repo2.maven.org Both ant -Doffline=true and eclipse can no longer build offline Regression: Hive does not build offline anymore --- Key: HIVE-4953 URL: https://issues.apache.org/jira/browse/HIVE-4953 Project: Hive Issue Type: Bug Reporter: Edward Capriolo BUILD FAILED /home/edward/Documents/java/hive-trunk/build.xml:233: java.net.UnknownHostException: repo2.maven.org Both ant -Doffline=true and eclipse can no longer build offline -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4953) Regression: Hive does not build offline anymore
[ https://issues.apache.org/jira/browse/HIVE-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-4953: -- Release Note: (was: BUILD FAILED /home/edward/Documents/java/hive-trunk/build.xml:233: java.net.UnknownHostException: repo2.maven.org Both ant -Doffline=true and eclipse no longer can build offline) Regression: Hive does not build offline anymore --- Key: HIVE-4953 URL: https://issues.apache.org/jira/browse/HIVE-4953 Project: Hive Issue Type: Bug Reporter: Edward Capriolo -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (HIVE-3976) Support specifying scale and precision with Hive decimal type
[ https://issues.apache.org/jira/browse/HIVE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang reassigned HIVE-3976: - Assignee: Xuefu Zhang Support specifying scale and precision with Hive decimal type - Key: HIVE-3976 URL: https://issues.apache.org/jira/browse/HIVE-3976 Project: Hive Issue Type: Improvement Components: Query Processor, Types Reporter: Mark Grover Assignee: Xuefu Zhang HIVE-2693 introduced support for Decimal datatype in Hive. However, the current implementation has unlimited precision and provides no way to specify precision and scale when creating the table. For example, MySQL allows users to specify scale and precision of the decimal datatype when creating the table: {code} CREATE TABLE numbers (a DECIMAL(20,2)); {code} Hive should support something similar too. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-3976) Support specifying scale and precision with Hive decimal type
[ https://issues.apache.org/jira/browse/HIVE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723365#comment-13723365 ] Xuefu Zhang commented on HIVE-3976: --- I have started working on this issue. Any comments or suggestions are welcome. Thanks. Support specifying scale and precision with Hive decimal type - Key: HIVE-3976 URL: https://issues.apache.org/jira/browse/HIVE-3976 Project: Hive Issue Type: Improvement Components: Query Processor, Types Reporter: Mark Grover Assignee: Xuefu Zhang HIVE-2693 introduced support for Decimal datatype in Hive. However, the current implementation has unlimited precision and provides no way to specify precision and scale when creating the table. For example, MySQL allows users to specify scale and precision of the decimal datatype when creating the table: {code} CREATE TABLE numbers (a DECIMAL(20,2)); {code} Hive should support something similar too. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
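For illustration, enforcing a DECIMAL(precision, scale) bound mostly amounts to rounding a value to the declared scale and checking that the result still fits. The sketch below uses java.math.BigDecimal; the HALF_UP rounding, the null-on-overflow behavior, and the simplified precision check are assumptions for this example, not the semantics Hive ultimately adopted.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalBounds {

    // Round a value to the declared scale and reject it (return null) if
    // its significant digits exceed the declared precision. Simplified:
    // BigDecimal.precision() counts significant digits, which differs from
    // SQL DECIMAL digit counting for some small fractions.
    static BigDecimal enforce(BigDecimal v, int precision, int scale) {
        BigDecimal rounded = v.setScale(scale, RoundingMode.HALF_UP);
        if (rounded.precision() > precision) {
            return null; // does not fit in DECIMAL(precision, scale)
        }
        return rounded;
    }

    public static void main(String[] args) {
        System.out.println(enforce(new BigDecimal("12345.678"), 20, 2)); // 12345.68
        System.out.println(enforce(new BigDecimal("12345.678"), 6, 2));  // null
    }
}
```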
Re: Tez branch and tez based patches
At ~25:00 There is a working prototype of hive which is using tez as the targeted runtime Can I get a look at that code? Is it on github? Edward On Wed, Jul 17, 2013 at 3:35 PM, Alan Gates ga...@hortonworks.com wrote: Answers to some of your questions inlined. Alan. On Jul 16, 2013, at 10:20 PM, Edward Capriolo wrote: There are some points I want to bring up. First, I am on the PMC. Here is something I find relevant: http://www.apache.org/foundation/how-it-works.html -- The role of the PMC from a Foundation perspective is oversight. The main role of the PMC is not code and not coding - but to ensure that all legal issues are addressed, that procedure is followed, and that each and every release is the product of the community as a whole. That is key to our litigation protection mechanisms. Secondly the role of the PMC is to further the long term development and health of the community as a whole, and to ensure that balanced and wide scale peer review and collaboration does happen. Within the ASF we worry about any community which centers around a few individuals who are working virtually uncontested. We believe that this is detrimental to quality, stability, and robustness of both code and long term social structures. https://blogs.apache.org/comdev/entry/what_makes_apache_projects_different - All other decisions happen on the dev list, discussions on the private list are kept to a minimum. If it didn't happen on the dev list, it didn't happen - which leads to: a) Elections of committers and PMC members are published on the dev list once finalized. b) Out-of-band discussions (IRC etc.) are summarized on the dev list as soon as they have impact on the project, code or community. - https://issues.apache.org/jira/browse/HIVE-4660 ironically titled Let their be Tez has not been +1'ed by any committer. It was never discussed on the dev or the user list (as far as I can tell).
As all JIRA creations and updates are sent to dev@hive, creating a JIRA is de facto posting to the list. As a PMC member I feel we need more discussion on Tez on the dev list along with a wiki-fied design document. Topics of discussion should include: I talked with Gunther and he's working on posting a design doc on the wiki. He has a PDF on the JIRA but he doesn't have write permissions yet on the wiki. 1) What is tez? In Hadoop 2.0, YARN opens up the ability to have multiple execution frameworks in Hadoop. Hadoop apps are no longer tied to MapReduce as the only execution option. Tez is an effort to build an execution engine that is optimized for relational data processing, such as Hive and Pig. The biggest change here is to move away from only Map and Reduce as processing options and to allow alternate combinations of processing, such as map - reduce - reduce or tasks that take multiple inputs or shuffles that avoid sorting when it isn't needed. For a good intro to Tez, see Arun's presentation on it at the recent Hadoop summit (video http://www.youtube.com/watch?v=9ZLLzlsz7h8 slides http://www.slideshare.net/Hadoop_Summit/murhty-saha-june26255pmroom212) 2) How is tez different from oozie, http://code.google.com/p/hop/, http://cs.brown.edu/~backman/cmr.html , and other DAG and/or streaming map reduce tools/frameworks? Why should we use this and not those? Oozie is a completely different thing. Oozie is a workflow engine and a scheduler. Its core competencies are the ability to coordinate workflows of disparate job types (MR, Pig, Hive, etc.) and to schedule them. It is not intended as an execution engine for apps such as Pig and Hive. I am not familiar with these other engines, but the short answer is that Tez is built to work on YARN, which works well for Hive since it is tied to Hadoop. 3) When can we expect the first tez release? I don't know, but I hope sometime this fall. 4) How much effort is involved in integrating hive and tez? Covered in the design doc.
5) Who is ready to commit to this effort? I'll let people speak for themselves on that one. 6) can we expect this work to be done in one hive release? Unlikely. Initial integration will be done in one release, but as Tez is a new project I expect it will be adding features in the future that Hive will want to take advantage of. In my opinion we should not start any work on this tez-hive until these questions are answered to the satisfaction of the hive developers. Can we change this to not commit patches? We can't tell willing people not to work on it. On Mon, Jul 15, 2013 at 9:51 PM, Edward Capriolo edlinuxg...@gmail.com wrote: The Hive bylaws, https://cwiki.apache.org/confluence/display/Hive/Bylaws , lay out what votes are needed for
[jira] [Commented] (HIVE-3976) Support specifying scale and precision with Hive decimal type
[ https://issues.apache.org/jira/browse/HIVE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723372#comment-13723372 ] Edward Capriolo commented on HIVE-3976: --- We currently do not have qualifiers like (20,2) for types in the Hive language. This sounds like a fairly involved change; I am very curious how these will interact with the existing system. Support specifying scale and precision with Hive decimal type - Key: HIVE-3976 URL: https://issues.apache.org/jira/browse/HIVE-3976 Project: Hive Issue Type: Improvement Components: Query Processor, Types Reporter: Mark Grover Assignee: Xuefu Zhang HIVE-2693 introduced support for Decimal datatype in Hive. However, the current implementation has unlimited precision and provides no way to specify precision and scale when creating the table. For example, MySQL allows users to specify scale and precision of the decimal datatype when creating the table: {code} CREATE TABLE numbers (a DECIMAL(20,2)); {code} Hive should support something similar too. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Tez branch and tez based patches
Also watched http://www.ustream.tv/recorded/36323173 I definitely see the win in being able to stream inter-stage output. I see some cases where small intermediate results can be kept in memory. But I was somewhat under the impression that the map reduce spill settings kept stuff in memory - isn't that what the spill settings are for? There are a few bullet points that came up repeatedly that I do not follow: Something was said to the effect of Container reuse makes X faster. Hadoop has JVM reuse; I am not following what the difference is here. Not everyone has a 10K node cluster. Joins in map reduce are hard Really? I mean some of them are I guess, but the typical join is very easy. Just shuffle by the join key. There were not really enough low-level details here saying why joins are better in tez. Choosing the number of maps and reduces is hard Really? I do not find it that hard, I think there are times when it's not perfect but I do not find it hard. The talk did not really offer anything here technical on how tez makes this better other than that it could make it better. The presentations mentioned streaming data, how do two nodes stream data between tasks, and how is it reliable? If the sender or receiver dies does the entire process have to start again? Again one of the talks implied there is a prototype out there that launches hive jobs into tez. I would like to see that, it might answer more questions than a PowerPoint, and I could profile some common queries. Random late night thoughts over, Ed On Tue, Jul 30, 2013 at 12:02 AM, Edward Capriolo edlinuxg...@gmail.com wrote: At ~25:00 There is a working prototype of hive which is using tez as the targeted runtime Can I get a look at that code? Is it on github? Edward On Wed, Jul 17, 2013 at 3:35 PM, Alan Gates ga...@hortonworks.com wrote: Answers to some of your questions inlined. Alan. On Jul 16, 2013, at 10:20 PM, Edward Capriolo wrote: There are some points I want to bring up. First, I am on the PMC.
Here is something I find relevant: http://www.apache.org/foundation/how-it-works.html -- The role of the PMC from a Foundation perspective is oversight. The main role of the PMC is not code and not coding - but to ensure that all legal issues are addressed, that procedure is followed, and that each and every release is the product of the community as a whole. That is key to our litigation protection mechanisms. Secondly the role of the PMC is to further the long term development and health of the community as a whole, and to ensure that balanced and wide scale peer review and collaboration does happen. Within the ASF we worry about any community which centers around a few individuals who are working virtually uncontested. We believe that this is detrimental to quality, stability, and robustness of both code and long term social structures. https://blogs.apache.org/comdev/entry/what_makes_apache_projects_different - All other decisions happen on the dev list, discussions on the private list are kept to a minimum. If it didn't happen on the dev list, it didn't happen - which leads to: a) Elections of committers and PMC members are published on the dev list once finalized. b) Out-of-band discussions (IRC etc.) are summarized on the dev list as soon as they have impact on the project, code or community. - https://issues.apache.org/jira/browse/HIVE-4660 ironically titled Let their be Tez has not be +1 ed by any committer. It was never discussed on the dev or the user list (as far as I can tell). As all JIRA creations and updates are sent to dev@hive, creating a JIRA is de facto posting to the list. As a PMC member I feel we need more discussion on Tez on the dev list along with a wiki-fied design document. Topics of discussion should include: I talked with Gunther and he's working on posting a design doc on the wiki. He has a PDF on the JIRA but he doesn't have write permissions yet on the wiki. 1) What is tez? 
In Hadoop 2.0, YARN opens up the ability to have multiple execution frameworks in Hadoop. Hadoop apps are no longer tied to MapReduce as the only execution option. Tez is an effort to build an execution engine that is optimized for relational data processing, such as Hive and Pig. The biggest change here is to move away from only Map and Reduce as processing options and to allow alternate combinations of processing, such as map - reduce - reduce or tasks that take multiple inputs or shuffles that avoid sorting when it isn't needed. For a good intro to Tez, see Arun's presentation on it at the recent Hadoop summit (video http://www.youtube.com/watch?v=9ZLLzlsz7h8 slides http://www.slideshare.net/Hadoop_Summit/murhty-saha-june26255pmroom212) 2) How is tez different from oozie,
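The "just shuffle by the join key" point raised in the thread above can be shown with a plain-Java sketch of a reduce-side join, with the shuffle phase simulated by grouping records per key. This is illustrative code, not Hadoop's API; record shapes and names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleJoin {

    // Simulate the shuffle: group each record's value under its join key.
    static Map<String, List<String>> shuffle(List<String[]> records) {
        Map<String, List<String>> byKey = new TreeMap<>();
        for (String[] kv : records) {
            byKey.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return byKey;
    }

    // Reduce side: for each key present in both inputs, emit the cross
    // product of the two value lists (an inner equi-join).
    static List<String> join(List<String[]> left, List<String[]> right) {
        Map<String, List<String>> l = shuffle(left);
        Map<String, List<String>> r = shuffle(right);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : l.entrySet()) {
            List<String> rightValues = r.get(e.getKey());
            if (rightValues == null) {
                continue; // key only on the left: no output for inner join
            }
            for (String a : e.getValue()) {
                for (String b : rightValues) {
                    out.add(e.getKey() + "\t" + a + "\t" + b);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> users = List.of(new String[]{"1", "alice"},
                                       new String[]{"2", "bob"});
        List<String[]> orders = List.of(new String[]{"1", "o-17"},
                                        new String[]{"1", "o-42"});
        join(users, orders).forEach(System.out::println);
    }
}
```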
[jira] [Commented] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability
[ https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723403#comment-13723403 ] Edward Capriolo commented on HIVE-4838: --- Hey, I think I may have mistakenly come to the conclusion that https://issues.apache.org/jira/browse/HIVE-2906 passed tests when it did not. We might be best off reverting 2906 if it is a problem. Refactor MapJoin HashMap code to improve testability and readability Key: HIVE-4838 URL: https://issues.apache.org/jira/browse/HIVE-4838 Project: Hive Issue Type: Bug Reporter: Brock Noland Assignee: Brock Noland Attachments: HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch MapJoin is an essential component for high performance joins in Hive and the current code has done great service for many years. However, the code is showing its age and currently suffers from the following issues: * Uses static state via the MapJoinMetaData class to pass serialization metadata to the Key, Row classes. * The api of a logical Table Container is not defined and therefore it's unclear what apis HashMapWrapper needs to publicize. Additionally HashMapWrapper has many unused public methods. * HashMapWrapper contains logic to serialize, test memory bounds, and implement the table container. Ideally these logical units could be separated * HashTableSinkObjectCtx has unused fields and unused methods * CommonJoinOperator and children use ArrayList on left hand side when only List is required * There are unused classes (MRU, DCLLItem) and classes which duplicate functionality (MapJoinSingleKey and MapJoinDoubleKeys) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results
[ https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723407#comment-13723407 ] Yin Huai commented on HIVE-4952: To fix this bug, Demux will be modified to be aware that rows associated with a key are ordered by the tag. When Demux sees a row with a new tag coming, it will know that rows with tags less than this incoming tag can be processed. Taking the example in the description, with this fix, the inputs of JOIN2 will be ordered by the tag. When Demux sees a row with tag 1, it will ask GBY to process its buffer, and then GBY will ask JOIN1 to process its buffer. Before Demux forwards a new row with the tag of 1 to JOIN2, all rows with the tag of 0 will have been forwarded into JOIN2. When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results Key: HIVE-4952 URL: https://issues.apache.org/jira/browse/HIVE-4952 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Attachments: replay.txt If we have a query like this ... {code:sql} SELECT xx.key, xx.cnt, yy.key FROM (SELECT x.key as key, count(1) as cnt FROM src1 x JOIN src1 y ON (x.key = y.key) group by x.key) xx JOIN src yy ON xx.key=yy.key; {\code} After Correlation Optimizer, the operator tree in the reducer will be {code} JOIN2 | | MUX / \ / \ GBY | | | JOIN1| \ / \ / DEMUX {\code} For JOIN2, the right table will arrive at this operator first. If hive.join.emit.interval is small, e.g. 1, JOIN2 will output the results even if it has not got any row from the left table. The logic related to hive.join.emit.interval in JoinOperator assumes that inputs will be ordered by the tag. But, if a query has been optimized by Correlation Optimizer, this assumption may not hold for those JoinOperators inside the reducer. -- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
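The flush rule described in the comment above can be sketched in plain Java (illustrative only, not Hive's DemuxOperator): because rows for a key arrive ordered by tag, seeing a row with a larger tag proves that every buffered row with a smaller tag is complete and can be forwarded downstream first. Class and method names here are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class TagOrderedDemux {

    private final List<List<String>> buffers = new ArrayList<>(); // one per tag
    private final List<String> forwarded = new ArrayList<>();
    private int currentTag = 0;

    TagOrderedDemux(int numTags) {
        for (int i = 0; i < numTags; i++) {
            buffers.add(new ArrayList<>());
        }
    }

    // Rows for a key arrive ordered by tag, so a row with a larger tag
    // proves that all smaller-tag buffers are complete: flush them before
    // buffering anything with the new tag.
    void process(int tag, String row) {
        if (tag > currentTag) {
            for (int t = currentTag; t < tag; t++) {
                forwarded.addAll(buffers.get(t));
                buffers.get(t).clear();
            }
            currentTag = tag;
        }
        buffers.get(tag).add(row);
    }

    // End of the key group: flush whatever is still buffered, in tag order.
    void close() {
        for (List<String> b : buffers) {
            forwarded.addAll(b);
            b.clear();
        }
    }

    static List<String> demo() {
        TagOrderedDemux d = new TagOrderedDemux(2);
        d.process(0, "left-row-1");
        d.process(0, "left-row-2");
        d.process(1, "right-row-1"); // forces both tag-0 rows out first
        d.close();
        return d.forwarded;
    }

    public static void main(String[] args) {
        System.out.println(demo());
        // prints [left-row-1, left-row-2, right-row-1]
    }
}
```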
[jira] [Updated] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability
[ https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-4843: - Attachment: HIVE-4843.4.patch Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability --- Key: HIVE-4843 URL: https://issues.apache.org/jira/browse/HIVE-4843 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-4843.1.patch, HIVE-4843.2.patch, HIVE-4843.3.patch, HIVE-4843.4.patch Currently, there are static apis in multiple locations in ExecDriver and MapRedTask that can be leveraged if put in the already existing utility class in the exec package. This would help making the code more maintainable, readable and also re-usable by other run-time infra such as tez. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability
[ https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vikram Dixit K updated HIVE-4843: - Status: Patch Available (was: Open) Latest iteration after addressing comments. Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability --- Key: HIVE-4843 URL: https://issues.apache.org/jira/browse/HIVE-4843 Project: Hive Issue Type: Bug Affects Versions: 0.12.0, tez-branch Reporter: Vikram Dixit K Assignee: Vikram Dixit K Attachments: HIVE-4843.1.patch, HIVE-4843.2.patch, HIVE-4843.3.patch, HIVE-4843.4.patch Currently, there are static apis in multiple locations in ExecDriver and MapRedTask that can be leveraged if put in the already existing utility class in the exec package. This would help making the code more maintainable, readable and also re-usable by other run-time infra such as tez. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results
[ https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Phabricator updated HIVE-4952: -- Attachment: HIVE-4952.D11889.1.patch yhuai requested code review of HIVE-4952 [jira] When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results. Reviewers: JIRA fix If we have a query like this ... SELECT xx.key, xx.cnt, yy.key FROM (SELECT x.key as key, count(1) as cnt FROM src1 x JOIN src1 y ON (x.key = y.key) group by x.key) xx JOIN src yy ON xx.key=yy.key; After Correlation Optimizer, the operator tree in the reducer will be JOIN2 | | MUX / \ / \ GBY | | | JOIN1| \ / \ / DEMUX For JOIN2, the right table will arrive at this operator first. If hive.join.emit.interval is small, e.g. 1, JOIN2 will output the results even it has not got any row from the left table. The logic related hive.join.emit.interval in JoinOperator assumes that inputs will be ordered by the tag. But, if a query has been optimized by Correlation Optimizer, this assumption may not hold for those JoinOperators inside the reducer. TEST PLAN EMPTY REVISION DETAIL https://reviews.facebook.net/D11889 AFFECTED FILES ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java ql/src/test/queries/clientpositive/correlationoptimizer15.q ql/src/test/results/clientpositive/correlationoptimizer15.q.out MANAGE HERALD RULES https://reviews.facebook.net/herald/view/differential/ WHY DID I GET THIS EMAIL? 
https://reviews.facebook.net/herald/transcript/28311/ To: JIRA, yhuai When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results Key: HIVE-4952 URL: https://issues.apache.org/jira/browse/HIVE-4952 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-4952.D11889.1.patch, replay.txt If we have a query like this ... {code:sql} SELECT xx.key, xx.cnt, yy.key FROM (SELECT x.key as key, count(1) as cnt FROM src1 x JOIN src1 y ON (x.key = y.key) group by x.key) xx JOIN src yy ON xx.key=yy.key; {\code} After Correlation Optimizer, the operator tree in the reducer will be {code} JOIN2 | | MUX / \ / \ GBY | | | JOIN1| \ / \ / DEMUX {\code} For JOIN2, the right table will arrive at this operator first. If hive.join.emit.interval is small, e.g. 1, JOIN2 will output the results even it has not got any row from the left table. The logic related hive.join.emit.interval in JoinOperator assumes that inputs will be ordered by the tag. But, if a query has been optimized by Correlation Optimizer, this assumption may not hold for those JoinOperators inside the reducer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results
[ https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai updated HIVE-4952: --- Status: Patch Available (was: Open) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results Key: HIVE-4952 URL: https://issues.apache.org/jira/browse/HIVE-4952 Project: Hive Issue Type: Bug Affects Versions: 0.12.0 Reporter: Yin Huai Assignee: Yin Huai Attachments: HIVE-4952.D11889.1.patch, replay.txt If we have a query like this ... {code:sql} SELECT xx.key, xx.cnt, yy.key FROM (SELECT x.key as key, count(1) as cnt FROM src1 x JOIN src1 y ON (x.key = y.key) group by x.key) xx JOIN src yy ON xx.key=yy.key; {\code} After Correlation Optimizer, the operator tree in the reducer will be {code} JOIN2 | | MUX / \ / \ GBY | | | JOIN1| \ / \ / DEMUX {\code} For JOIN2, the right table will arrive at this operator first. If hive.join.emit.interval is small, e.g. 1, JOIN2 will output the results even it has not got any row from the left table. The logic related hive.join.emit.interval in JoinOperator assumes that inputs will be ordered by the tag. But, if a query has been optimized by Correlation Optimizer, this assumption may not hold for those JoinOperators inside the reducer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability
[ https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723419#comment-13723419 ]

Edward Capriolo commented on HIVE-4843:
---------------------------------------
{code}
List<Path> inputPaths = Utilities.getInputPaths(newJob, selectTask.getWork().getMapWork(),
    emptyScratchDir.toString(), ctx);
{code}
Can we remove any Path/File toString() and just pass the Path if possible?

Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability
---------------------------------------------------------------------------------------

                 Key: HIVE-4843
                 URL: https://issues.apache.org/jira/browse/HIVE-4843
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.12.0, tez-branch
            Reporter: Vikram Dixit K
            Assignee: Vikram Dixit K
         Attachments: HIVE-4843.1.patch, HIVE-4843.2.patch, HIVE-4843.3.patch, HIVE-4843.4.patch

Currently, there are static APIs in multiple locations in ExecDriver and MapRedTask that could be moved into the already existing utility class in the exec package. This would help make the code more maintainable, readable, and re-usable by other run-time infrastructure such as Tez.
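The review comment's point can be sketched with `java.nio.file.Path` standing in for Hadoop's `org.apache.hadoop.fs.Path` (the method names below are illustrative, not Hive APIs): keeping the typed Path end to end lets the Path API handle joining and normalization, instead of stringifying and re-concatenating by hand.

```java
import java.nio.file.Path;

// Illustrative only: java.nio.file.Path stands in for Hadoop's
// org.apache.hadoop.fs.Path.
public class PathPassing {

    // Preferred: accept and return Path; joining stays in the Path API.
    static Path scratchFile(Path scratchDir, String name) {
        return scratchDir.resolve(name);
    }

    // The pattern the review flags: stringify, then re-join by hand.
    // This invites bugs (separator handling, relative paths, schemes in
    // the Hadoop case) and forces callers to re-parse the string later.
    static String scratchFileViaString(Path scratchDir, String name) {
        return scratchDir.toString() + "/" + name;
    }
}
```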
[jira] [Commented] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)
[ https://issues.apache.org/jira/browse/HIVE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723421#comment-13723421 ]

Hive QA commented on HIVE-4950:
-------------------------------
{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12594755/HIVE-4950.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2736 tests executed

*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_serde_user_properties
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/226/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/226/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

Hive childSuspend is broken (debugging local hadoop jobs)
---------------------------------------------------------

                 Key: HIVE-4950
                 URL: https://issues.apache.org/jira/browse/HIVE-4950
             Project: Hive
          Issue Type: Bug
    Affects Versions: 0.11.0
            Reporter: Laljo John Pullokkaran
            Assignee: Laljo John Pullokkaran
             Fix For: 0.11.1
         Attachments: HIVE-4950.patch

Hive debug has an option to suspend child JVMs, which seems to be broken currently (--debug childSuspend=y). Note that this mode may be useful only when running in local mode.
[jira] [Commented] (HIVE-4734) Use custom ObjectInspectors for AvroSerde
[ https://issues.apache.org/jira/browse/HIVE-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723422#comment-13723422 ]

Hive QA commented on HIVE-4734:
-------------------------------
{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12594789/HIVE-4734.3.patch

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/228/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/228/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests failed with: NonZeroExitCodeException: Command 'bash /data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and output '+ [[ -n '' ]]
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost -Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-228/source-prep.txt
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'bin/ext/debug.sh'
Reverted 'bin/hive'
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
++ svn status --no-ignore
+ rm -rf build hcatalog/build hcatalog/core/build hcatalog/storage-handlers/hbase/build hcatalog/server-extensions/build hcatalog/webhcat/svr/build hcatalog/webhcat/java-client/build hcatalog/hcatalog-pig-adapter/build common/src/gen
+ svn update
Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1508304.
At revision 1508304.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh /data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0 to p2
+ exit 1
'
{noformat}

This message is automatically generated.

Use custom ObjectInspectors for AvroSerde
-----------------------------------------

                 Key: HIVE-4734
                 URL: https://issues.apache.org/jira/browse/HIVE-4734
             Project: Hive
          Issue Type: Improvement
          Components: Serializers/Deserializers
            Reporter: Mark Wagner
            Assignee: Mark Wagner
             Fix For: 0.12.0
         Attachments: HIVE-4734.1.patch, HIVE-4734.2.patch, HIVE-4734.3.patch

Currently, the AvroSerde recursively copies all fields of a record from the GenericRecord to a List row object and provides the standard ObjectInspectors. Performance can be improved by providing ObjectInspectors that work on the Avro record itself.
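The improvement's core idea can be sketched without any real Hive or Avro classes (a Map stands in for Avro's GenericRecord; all class and method names below are illustrative): instead of eagerly copying every field into a List row object, a custom inspector keeps a reference to the record and resolves fields only when a downstream operator actually asks for them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hedged sketch, using plain Java stand-ins for Hive/Avro types.
public class LazyInspectorSketch {

    // Eager approach (what the issue describes as current behavior):
    // copy all fields into a row list up front, even fields that no
    // operator in the plan will ever read.
    static List<Object> copyRow(Map<String, Object> record, List<String> fields) {
        List<Object> row = new ArrayList<>();
        for (String f : fields) row.add(record.get(f));
        return row;
    }

    // Lazy approach (the proposed direction): the "inspector" holds a
    // reference to the record and resolves a field on demand, so
    // unread fields are never copied.
    static final class RecordInspector {
        private final Map<String, Object> record;

        RecordInspector(Map<String, Object> record) {
            this.record = record;
        }

        Object getField(String name) {
            return record.get(name);
        }
    }
}
```

For wide records where queries touch only a few columns, the lazy form skips the per-row copy entirely, which is the performance win the issue is after.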