[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table

2013-07-29 Thread Alexey Zotov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722235#comment-13722235
 ] 

Alexey Zotov commented on HIVE-3442:


I have a problem with HA mode for NameNodes. For NameNode's HA mode you can 
specify the nameservice instead of an active NameNode as _avro.schema.url_:
{noformat}
http://some_datanode_address:50075/streamFile/path/to/file/schema.json?nnaddr=nameservice1:8020
{noformat}

 AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating 
 external table
 ---

 Key: HIVE-3442
 URL: https://issues.apache.org/jira/browse/HIVE-3442
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Zhenxiao Luo
Assignee: Zhenxiao Luo
 Fix For: 0.10.0


 After creating a table and loading data into it, I could check that the table 
 was created successfully and the data is inside:
 DROP TABLE IF EXISTS ml_items;
 CREATE TABLE ml_items(id INT,
   title STRING,
   release_date STRING,
   video_release_date STRING,
   imdb_url STRING,
   unknown_genre TINYINT,
   action TINYINT,
   adventure TINYINT,
   animation TINYINT,
   children TINYINT,
   comedy TINYINT,
   crime TINYINT,
   documentary TINYINT,
   drama TINYINT,
   fantasy TINYINT,
   film_noir TINYINT,
   horror TINYINT,
   musical TINYINT,
   mystery TINYINT,
   romance TINYINT,
   sci_fi TINYINT,
   thriller TINYINT,
   war TINYINT,
   western TINYINT)
   ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
   STORED AS TEXTFILE;
 LOAD DATA LOCAL INPATH '../data/files/avro_items' INTO TABLE ml_items;
 select * from ml_items ORDER BY id ASC;
 However, the following CREATE EXTERNAL TABLE with AvroSerDe is not working:
 DROP TABLE IF EXISTS ml_items_as_avro;
 CREATE EXTERNAL TABLE ml_items_as_avro
   ROW FORMAT SERDE
   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
   WITH SERDEPROPERTIES (
 'schema.url'='${system:test.src.data.dir}/files/avro_items_schema.avsc')
   STORED as INPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
   OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
   LOCATION 'file:${system:test.tmp.dir}/hive-ml-items';
 describe ml_items_as_avro;
 INSERT OVERWRITE TABLE ml_items_as_avro
   SELECT id, title,
 imdb_url, unknown_genre, action, adventure, animation, children, comedy, 
 crime,
 documentary, drama, fantasy, film_noir, horror, musical, mystery, romance,
 sci_fi, thriller, war, western
   FROM ml_items;
 ml_items_as_avro is not created with the expected schema, as shown in the 
 describe ml_items_as_avro output below:
 PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
 PREHOOK: type: DROPTABLE
 POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
 POSTHOOK: type: DROPTABLE
 PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro
   ROW FORMAT SERDE
   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
   WITH SERDEPROPERTIES (
 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
   STORED as INPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
   OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
   LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
 PREHOOK: type: CREATETABLE
 POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro
   ROW FORMAT SERDE
   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
   WITH SERDEPROPERTIES (
 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
   STORED as INPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
   OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
   LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
 POSTHOOK: type: CREATETABLE
 POSTHOOK: Output: default@ml_items_as_avro
 PREHOOK: query: describe ml_items_as_avro
 PREHOOK: type: DESCTABLE
 POSTHOOK: query: describe ml_items_as_avro
 POSTHOOK: type: DESCTABLE
 error_error_error_error_error_error_error   string  from deserializer
 cannot_determine_schema string  from deserializer
 check   string  from deserializer
 schema  string  from deserializer
 url string  from deserializer
 and string  from 

[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table

2013-07-29 Thread Alexey Zotov (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722238#comment-13722238
 ] 

Alexey Zotov commented on HIVE-3442:


Please remove my previous comment.

I had a problem with HA mode for NameNodes (_namenode1_address_ and 
_namenode2_address_). At first I specified the following url as avro.schema.url:
{noformat}
http://some_datanode_address:50075/streamFile/path/to/file/schema.json?nnaddr=namenode1_address:8020
{noformat}
But I couldn't get data from Hive when _namenode1_address_ was the standby 
NameNode, so I had to change the link manually. 

After some time I found out how to fix it. I want to post it here in the hope 
it helps someone:
{noformat}
http://some_datanode_address:50075/streamFile/path/to/file/schema.json?nnaddr=nameservice1:8020
{noformat}
So, for NameNode's HA mode you can specify the nameservice instead of an active 
NameNode.
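As a small illustration of the difference, here is a hypothetical Python helper (the helper name, hosts, and ports are placeholders taken from the URLs above, not anything Hive ships) that builds the streamFile URL against the nameservice rather than a specific NameNode:

```python
def schema_url(datanode, path, nn, dn_port=50075, nn_port=8020):
    """Build a streamFile URL for avro.schema.url.

    Passing the HA nameservice (e.g. "nameservice1") as `nn` keeps the URL
    valid across NameNode failovers; passing a concrete NameNode host
    breaks whenever that node becomes the standby.
    """
    return "http://%s:%d/streamFile%s?nnaddr=%s:%d" % (
        datanode, dn_port, path, nn, nn_port)

# Fragile: tied to namenode1_address, fails when it is the standby.
fragile = schema_url("some_datanode_address", "/path/to/file/schema.json",
                     "namenode1_address")
# Failover-safe: tied to the nameservice.
robust = schema_url("some_datanode_address", "/path/to/file/schema.json",
                    "nameservice1")
```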

 AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating 
 external table
 ---

 Key: HIVE-3442
 URL: https://issues.apache.org/jira/browse/HIVE-3442
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Zhenxiao Luo
Assignee: Zhenxiao Luo
 Fix For: 0.10.0



[jira] [Commented] (HIVE-3256) Update asm version in Hive

2013-07-29 Thread Andy Jefferson (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722245#comment-13722245
 ] 

Andy Jefferson commented on HIVE-3256:
--

https://issues.apache.org/jira/browse/HIVE-3632 upgraded Hive to use DN 3.2.x. 
This comes with its own repackaged ASM internally, so you no longer need any 
ASM for DataNucleus. Consequently, any DN-utilising system can use whichever 
version of ASM it requires.

 Update asm version in Hive
 --

 Key: HIVE-3256
 URL: https://issues.apache.org/jira/browse/HIVE-3256
 Project: Hive
  Issue Type: Bug
Reporter: Zhenxiao Luo
Assignee: Zhenxiao Luo

 Hive trunk is currently using ASM version 3.1, while Hadoop trunk is on 3.2. Any
 objections to bumping the Hive version to 3.2 to be in line with Hadoop?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2137) JDBC driver doesn't encode string properly.

2013-07-29 Thread Kousuke Saruta (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated HIVE-2137:
-

Attachment: HIVE-2137.patch

Hi,
I've refined the patch uploaded before, and added a test case and test data.

 JDBC driver doesn't encode string properly.
 ---

 Key: HIVE-2137
 URL: https://issues.apache.org/jira/browse/HIVE-2137
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.9.0
Reporter: Jin Adachi
 Fix For: 0.12.0

 Attachments: HIVE-2137.patch, HIVE-2137.patch


 The JDBC driver for HiveServer1 decodes strings using the client-side default 
 encoding, which depends on the operating system unless another encoding is 
 specified. It ignores the server-side encoding. 
 For example, when the server-side operating system and encoding are Linux 
 (UTF-8) and the client-side operating system and encoding are Windows 
 (Shift_JIS, a Japanese charset), character corruption happens in the client.
 In the current implementation of Hive, UTF-8 appears to be expected on the 
 server side, so the client side should encode/decode strings as UTF-8.
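The corruption described above is easy to reproduce outside Hive. A short Python sketch of the failure mode: the server's UTF-8 bytes decoded with a Shift_JIS client default turn into mojibake, while decoding with UTF-8 round-trips cleanly (which is what the fix makes the driver do):

```python
# A Japanese string as the HiveServer1 side would store it: UTF-8 bytes.
original = "日本語"
utf8_bytes = original.encode("utf-8")

# A Windows client whose default charset is Shift_JIS decodes the same
# bytes with the wrong codec and gets mojibake, not an error.
corrupted = utf8_bytes.decode("shift_jis", errors="replace")
assert corrupted != original

# Decoding with the encoding the server actually used round-trips cleanly.
assert utf8_bytes.decode("utf-8") == original
```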



[jira] [Commented] (HIVE-2137) JDBC driver doesn't encode string properly.

2013-07-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722303#comment-13722303
 ] 

Hive QA commented on HIVE-2137:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12594645/HIVE-2137.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2736 tests executed
*Failed tests:*
{noformat}
org.apache.hcatalog.pig.TestE2EScenarios.testReadOrcAndRCFromPig
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/220/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/220/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

 JDBC driver doesn't encode string properly.
 ---

 Key: HIVE-2137
 URL: https://issues.apache.org/jira/browse/HIVE-2137
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.9.0
Reporter: Jin Adachi
 Fix For: 0.12.0

 Attachments: HIVE-2137.patch, HIVE-2137.patch





[jira] [Updated] (HIVE-305) Port Hadoop streaming's counters/status reporters to Hive Transforms

2013-07-29 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-305:
--

Release Note:   (was: I use the trunk to create this patch .  
http://svn.apache.org/repos/asf/hive/trunk )

 Port Hadoop streaming's counters/status reporters to Hive Transforms
 

 Key: HIVE-305
 URL: https://issues.apache.org/jira/browse/HIVE-305
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer
Assignee: Guo Hongjie
 Attachments: HIVE-305.1.patch, HIVE-305.2.patch, hive-305.3.diff.txt, 
 HIVE-305.patch.txt


 https://issues.apache.org/jira/browse/HADOOP-1328
  Introduced a way for a streaming process to update global counters and 
 status by emitting information on the stderr stream. Use 
 reporter:counter:group,counter,amount to update a counter. Use 
 reporter:status:message to update status. 
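The stderr protocol quoted above is simple enough to model; here is a minimal Python emitter/parser sketch (the helper names are illustrative, not part of Hive or Hadoop):

```python
import sys

def emit_counter(group, counter, amount, stream=sys.stderr):
    """Emit a Hadoop-streaming counter update on stderr."""
    stream.write("reporter:counter:%s,%s,%d\n" % (group, counter, amount))

def emit_status(message, stream=sys.stderr):
    """Emit a Hadoop-streaming status update on stderr."""
    stream.write("reporter:status:%s\n" % message)

def parse_reporter_line(line):
    """Return (kind, payload) for a reporter directive, or None for
    ordinary stderr output that should just be logged."""
    if not line.startswith("reporter:"):
        return None
    _, kind, payload = line.rstrip("\n").split(":", 2)
    return kind, payload
```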



[jira] [Updated] (HIVE-305) Port Hadoop streaming's counters/status reporters to Hive Transforms

2013-07-29 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-305:
--

   Resolution: Fixed
Fix Version/s: 0.12.0
   Status: Resolved  (was: Patch Available)

Committed to trunk! Thank you for your contribution Guo and Edward!

 Port Hadoop streaming's counters/status reporters to Hive Transforms
 

 Key: HIVE-305
 URL: https://issues.apache.org/jira/browse/HIVE-305
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer
Assignee: Guo Hongjie
 Fix For: 0.12.0

 Attachments: HIVE-305.1.patch, HIVE-305.2.patch, hive-305.3.diff.txt, 
 HIVE-305.patch.txt





[jira] [Updated] (HIVE-4943) An explode function that includes the item's position in the array

2013-07-29 Thread Niko Stahl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niko Stahl updated HIVE-4943:
-

Component/s: Query Processor

 An explode function that includes the item's position in the array
 --

 Key: HIVE-4943
 URL: https://issues.apache.org/jira/browse/HIVE-4943
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Niko Stahl
  Labels: patch
   Original Estimate: 8h
  Remaining Estimate: 8h

 A function that explodes an array and includes an output column with the 
 position of each item in the original array.
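The requested semantics amount to enumerating each array; a hedged Python model of the proposed UDTF's output (the function name is illustrative, not the eventual Hive name):

```python
def explode_with_position(items):
    """Explode one array cell into (pos, item) rows, adding the
    position column that a plain explode lacks."""
    return list(enumerate(items))

rows = explode_with_position(["a", "b", "c"])
# rows == [(0, "a"), (1, "b"), (2, "c")]
```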



[jira] [Updated] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability

2013-07-29 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4838:
---

Status: Open  (was: Patch Available)

Forgot to reface junit.

 Refactor MapJoin HashMap code to improve testability and readability
 

 Key: HIVE-4838
 URL: https://issues.apache.org/jira/browse/HIVE-4838
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch, 
 HIVE-4838.patch


 MapJoin is an essential component for high-performance joins in Hive, and the 
 current code has done great service for many years. However, the code is 
 showing its age and currently suffers from the following issues:
 * Uses static state via the MapJoinMetaData class to pass serialization 
 metadata to the Key, Row classes.
 * The API of a logical Table Container is not defined, and therefore it's 
 unclear which APIs HashMapWrapper 
 needs to publicize. Additionally, HashMapWrapper has many unused public methods.
 * HashMapWrapper contains logic to serialize, test memory bounds, and 
 implement the table container. Ideally these logical units could be separated.
 * HashTableSinkObjectCtx has unused fields and unused methods.
 * CommonJoinOperator and children use ArrayList on the left-hand side when only 
 List is required.
 * There are unused classes (MRU, DCLLItemm) and classes which duplicate 
 functionality (MapJoinSingleKey and MapJoinDoubleKeys).



[jira] [Resolved] (HIVE-2906) Support providing some table properties by user via SQL

2013-07-29 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo resolved HIVE-2906.
---

   Resolution: Fixed
Fix Version/s: 0.12.0

 Support providing some table properties by user via SQL
 ---

 Key: HIVE-2906
 URL: https://issues.apache.org/jira/browse/HIVE-2906
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
 Fix For: 0.12.0

 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2906.D2499.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2906.D2499.2.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2906.D2499.3.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2906.D2499.4.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2906.D2499.5.patch, HIVE-2906.D2499.6.patch, 
 HIVE-2906.D2499.7.patch


 Some properties need to be provided to the StorageHandler by the user at 
 runtime. It might be an address for a remote resource, a retry count for 
 access, or a maximum version count (for HBase), etc.
 For example,  
 {code}
 select emp.empno, emp.ename from hbase_emp ('max.version'='3') emp;
 {code}



[jira] [Commented] (HIVE-2906) Support providing some table properties by user via SQL

2013-07-29 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722511#comment-13722511
 ] 

Edward Capriolo commented on HIVE-2906:
---

Committed. Thanks Navis.

 Support providing some table properties by user via SQL
 ---

 Key: HIVE-2906
 URL: https://issues.apache.org/jira/browse/HIVE-2906
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2906.D2499.1.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2906.D2499.2.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2906.D2499.3.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2906.D2499.4.patch, 
 ASF.LICENSE.NOT.GRANTED--HIVE-2906.D2499.5.patch, HIVE-2906.D2499.6.patch, 
 HIVE-2906.D2499.7.patch





Re: HCatalog (from Hive 0.11) and Hadoop 2

2013-07-29 Thread Nitin Pawar
There is a build scheduled on Jenkins for Hive trunk which is failing.
I will give it a try on my local machine for Hive 0.11; there is another build
which does the ptests, but it is disabled due to lots of test case failures.

https://builds.apache.org/job/Hive-trunk-hadoop2/

I will update you if I can build it.




On Mon, Jul 29, 2013 at 8:07 PM, Rodrigo Trujillo 
rodrigo.truji...@linux.vnet.ibm.com wrote:

 Hi,

 is it possible to build Hive 0.11 and HCatalog with Hadoop 2 (2.0.4-alpha)
 ??

 Regards,

 Rodrigo




-- 
Nitin Pawar


[jira] [Commented] (HIVE-4934) ntile function has to be the last thing in the select list

2013-07-29 Thread Lars Francke (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722548#comment-13722548
 ] 

Lars Francke commented on HIVE-4934:


I see. A misunderstanding on my side then, I guess. So at most it's a 
documentation issue.

 ntile function has to be the last thing in the select list
 --

 Key: HIVE-4934
 URL: https://issues.apache.org/jira/browse/HIVE-4934
 Project: Hive
  Issue Type: Bug
Reporter: Lars Francke
Priority: Minor

 {code}
 CREATE TABLE test (foo INT);
 SELECT ntile(10), foo OVER (PARTITION BY foo) FROM test;
 FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: 
 Only COMPLETE mode supported for NTile function
 SELECT foo, ntile(10) OVER (PARTITION BY foo) FROM test;
 ...works...
 {code}
 I'm not sure if that is a bug or necessary. Either way the error message is 
 not helpful as it's not documented anywhere what {{COMPLETE}} mode is. A 
 cursory glance at the code didn't help me either.
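For reference, the bucket numbering NTILE performs is itself simple; a small Python model of the standard SQL semantics (not Hive's implementation):

```python
def ntile(n, num_rows):
    """Assign each of num_rows ordered rows in a partition to one of n
    buckets, larger buckets first, as SQL's NTILE(n) does."""
    base, extra = divmod(num_rows, n)
    assignments = []
    for bucket in range(1, n + 1):
        size = base + (1 if bucket <= extra else 0)
        assignments.extend([bucket] * size)
    return assignments

ntile(4, 10)  # -> [1, 1, 1, 2, 2, 2, 3, 3, 4, 4]
```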



[jira] [Updated] (HIVE-4825) Separate MapredWork into MapWork and ReduceWork

2013-07-29 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated HIVE-4825:
---

   Resolution: Fixed
Fix Version/s: 0.12.0
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks, Gunther!

 Separate MapredWork into MapWork and ReduceWork
 ---

 Key: HIVE-4825
 URL: https://issues.apache.org/jira/browse/HIVE-4825
 Project: Hive
  Issue Type: Improvement
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
Priority: Minor
 Fix For: 0.12.0

 Attachments: HIVE-4825.1.patch, HIVE-4825.2.code.patch, 
 HIVE-4825.2.testfiles.patch, HIVE-4825.3.testfiles.patch, HIVE-4825.4.patch, 
 HIVE-4825.5.patch, HIVE-4825.6.patch


 Right now all the information needed to run an MR job is captured in 
 MapredWork. This class has aliases, tagging info, table descriptors etc.
 For Tez and MRR it will be useful to break this into map and reduce specific 
 pieces. The separation is natural and I think has value in itself, it makes 
 the code easier to understand. However, it will also allow us to reuse these 
 abstractions in Tez where you'll have a graph of these instead of just 1M and 
 0-1R.



[jira] [Updated] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability

2013-07-29 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4838:
---

Status: Patch Available  (was: Open)

 Refactor MapJoin HashMap code to improve testability and readability
 

 Key: HIVE-4838
 URL: https://issues.apache.org/jira/browse/HIVE-4838
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch, 
 HIVE-4838.patch, HIVE-4838.patch





[jira] [Updated] (HIVE-4388) HBase tests fail against Hadoop 2

2013-07-29 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4388:
---

Attachment: HIVE-4388.patch

Attaching patch/marking Patch Available to get a full test run.

 HBase tests fail against Hadoop 2
 -

 Key: HIVE-4388
 URL: https://issues.apache.org/jira/browse/HIVE-4388
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Brock Noland
 Attachments: HIVE-4388.patch, HIVE-4388-wip.txt


 Currently we're building by default against 0.92. When you run against hadoop 
 2 (-Dhadoop.mr.rev=23) builds fail because of: HBASE-5963.
 HIVE-3861 upgrades the version of hbase used. This will get you past the 
 problem in HBASE-5963 (which was fixed in 0.94.1) but fails with: HBASE-6396.



[jira] [Updated] (HIVE-4388) HBase tests fail against Hadoop 2

2013-07-29 Thread Brock Noland (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brock Noland updated HIVE-4388:
---

Status: Patch Available  (was: Open)

 HBase tests fail against Hadoop 2
 -

 Key: HIVE-4388
 URL: https://issues.apache.org/jira/browse/HIVE-4388
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Brock Noland
 Attachments: HIVE-4388.patch, HIVE-4388-wip.txt





[jira] [Updated] (HIVE-4794) Unit e2e tests for vectorization

2013-07-29 Thread Tony Murphy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tony Murphy updated HIVE-4794:
--

Attachment: HIVE-4794.1.patch

The patch depends on:
HIVE-4525
HIVE-4922
HIVE-4931

 Unit e2e tests for vectorization
 

 Key: HIVE-4794
 URL: https://issues.apache.org/jira/browse/HIVE-4794
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Tony Murphy
Assignee: Tony Murphy
 Fix For: vectorization-branch

 Attachments: HIVE-4794.1.patch, hive-4794.patch






[jira] [Commented] (HIVE-4734) Use custom ObjectInspectors for AvroSerde

2013-07-29 Thread Jakob Homan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722635#comment-13722635
 ] 

Jakob Homan commented on HIVE-4734:
---

+1.  Looks good.

 Use custom ObjectInspectors for AvroSerde
 -

 Key: HIVE-4734
 URL: https://issues.apache.org/jira/browse/HIVE-4734
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mark Wagner
 Fix For: 0.12.0

 Attachments: HIVE-4734.1.patch, HIVE-4734.2.patch


 Currently, the AvroSerde recursively copies all fields of a record from the 
 GenericRecord to a List row object and provides the standard 
 ObjectInspectors. Performance can be improved by providing ObjectInspectors 
 that work directly on the Avro record itself.
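The direction of the change can be sketched in a few lines: instead of eagerly copying every field out of the record, wrap the record and resolve fields on demand (a simplified Python analogy, not the actual ObjectInspector code):

```python
class EagerRow:
    """Current behavior described above: copy every field at
    deserialization time, whether or not the query reads it."""
    def __init__(self, record):
        self.fields = dict(record)          # full up-front copy

class LazyRow:
    """Proposed direction: keep a reference and resolve fields only
    when an inspector asks, so untouched fields cost nothing."""
    def __init__(self, record):
        self._record = record               # no copy
    def get(self, name):
        return self._record[name]
```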



[jira] [Commented] (HIVE-3442) AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating external table

2013-07-29 Thread Swarnim Kulkarni (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722643#comment-13722643
 ] 

Swarnim Kulkarni commented on HIVE-3442:


[~azotcsit] This seems like useful information. Would you mind doing a post 
about it on the hive users group for a larger audience? I am sure it will be 
much appreciated. Thanks!

 AvroSerDe WITH SERDEPROPERTIES 'schema.url' is not working when creating 
 external table
 ---

 Key: HIVE-3442
 URL: https://issues.apache.org/jira/browse/HIVE-3442
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Zhenxiao Luo
Assignee: Zhenxiao Luo
 Fix For: 0.10.0


 After creating a table and loading data into it, I could check that the table
 was created successfully, and the data is inside:
 DROP TABLE IF EXISTS ml_items;
 CREATE TABLE ml_items(id INT,
   title STRING,
   release_date STRING,
   video_release_date STRING,
   imdb_url STRING,
   unknown_genre TINYINT,
   action TINYINT,
   adventure TINYINT,
   animation TINYINT,
   children TINYINT,
   comedy TINYINT,
   crime TINYINT,
   documentary TINYINT,
   drama TINYINT,
   fantasy TINYINT,
   film_noir TINYINT,
   horror TINYINT,
   musical TINYINT,
   mystery TINYINT,
   romance TINYINT,
   sci_fi TINYINT,
   thriller TINYINT,
   war TINYINT,
   western TINYINT)
   ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n'
   STORED AS TEXTFILE;
 LOAD DATA LOCAL INPATH '../data/files/avro_items' INTO TABLE ml_items;
 select * from ml_items ORDER BY id ASC;
 Meanwhile, the following CREATE EXTERNAL TABLE with the AvroSerDe is not working:
 DROP TABLE IF EXISTS ml_items_as_avro;
 CREATE EXTERNAL TABLE ml_items_as_avro
   ROW FORMAT SERDE
   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
   WITH SERDEPROPERTIES (
 'schema.url'='${system:test.src.data.dir}/files/avro_items_schema.avsc')
   STORED as INPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
   OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
   LOCATION 'file:${system:test.tmp.dir}/hive-ml-items';
 describe ml_items_as_avro;
 INSERT OVERWRITE TABLE ml_items_as_avro
   SELECT id, title,
 imdb_url, unknown_genre, action, adventure, animation, children, comedy, 
 crime,
 documentary, drama, fantasy, film_noir, horror, musical, mystery, romance,
 sci_fi, thriller, war, western
   FROM ml_items;
 ml_items_as_avro is not created with the expected schema, as shown in the 
 describe ml_items_as_avro output below:
 PREHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
 PREHOOK: type: DROPTABLE
 POSTHOOK: query: DROP TABLE IF EXISTS ml_items_as_avro
 POSTHOOK: type: DROPTABLE
 PREHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro
   ROW FORMAT SERDE
   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
   WITH SERDEPROPERTIES (
 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
   STORED as INPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
   OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
   LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
 PREHOOK: type: CREATETABLE
 POSTHOOK: query: CREATE EXTERNAL TABLE ml_items_as_avro
   ROW FORMAT SERDE
   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
   WITH SERDEPROPERTIES (
 'schema.url'='/home/cloudera/Code/hive/data/files/avro_items_schema.avsc')
   STORED as INPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
   OUTPUTFORMAT
   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
   LOCATION 'file:/home/cloudera/Code/hive/build/ql/tmp/hive-ml-items'
 POSTHOOK: type: CREATETABLE
 POSTHOOK: Output: default@ml_items_as_avro
 PREHOOK: query: describe ml_items_as_avro
 PREHOOK: type: DESCTABLE
 POSTHOOK: query: describe ml_items_as_avro
 POSTHOOK: type: DESCTABLE
 error_error_error_error_error_error_error   string  from deserializer
 cannot_determine_schema string  from deserializer
 check   string  from deserializer
 schema  string  from deserializer
 url string  from deserializer
 and string  from deserializer
 literal string  from deserializer
 FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target 
 table because column 

[jira] [Commented] (HIVE-3256) Update asm version in Hive

2013-07-29 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722646#comment-13722646
 ] 

Ashutosh Chauhan commented on HIVE-3256:


Thanks, Andy, for the update. I think we can now remove the asm dependency from 
the Hive build altogether. I don't think we use it anywhere else.

 Update asm version in Hive
 --

 Key: HIVE-3256
 URL: https://issues.apache.org/jira/browse/HIVE-3256
 Project: Hive
  Issue Type: Bug
Reporter: Zhenxiao Luo
Assignee: Zhenxiao Luo

 Hive trunk is currently using asm version 3.1, while Hadoop trunk is on 3.2. Any
 objections to bumping the Hive version to 3.2 to be in line with Hadoop?



[jira] [Commented] (HIVE-4794) Unit e2e tests for vectorization

2013-07-29 Thread Tony Murphy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722647#comment-13722647
 ] 

Tony Murphy commented on HIVE-4794:
---

https://issues.apache.org/jira/browse/HIVE-4794

 Unit e2e tests for vectorization
 

 Key: HIVE-4794
 URL: https://issues.apache.org/jira/browse/HIVE-4794
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Tony Murphy
Assignee: Tony Murphy
 Fix For: vectorization-branch

 Attachments: HIVE-4794.1.patch, hive-4794.patch






[jira] [Commented] (HIVE-4794) Unit e2e tests for vectorization

2013-07-29 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722657#comment-13722657
 ] 

Edward Capriolo commented on HIVE-4794:
---

You're not using the proper code conventions; code not conforming to 
http://uima.apache.org/codeConventions.html cannot be committed.

 Unit e2e tests for vectorization
 

 Key: HIVE-4794
 URL: https://issues.apache.org/jira/browse/HIVE-4794
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Tony Murphy
Assignee: Tony Murphy
 Fix For: vectorization-branch

 Attachments: HIVE-4794.1.patch, hive-4794.patch






[jira] [Issue Comment Deleted] (HIVE-4794) Unit e2e tests for vectorization

2013-07-29 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-4794:
--

Comment: was deleted

(was: You're not using the proper code conventions; code not conforming to 
http://uima.apache.org/codeConventions.html cannot be committed.)

 Unit e2e tests for vectorization
 

 Key: HIVE-4794
 URL: https://issues.apache.org/jira/browse/HIVE-4794
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Tony Murphy
Assignee: Tony Murphy
 Fix For: vectorization-branch

 Attachments: HIVE-4794.1.patch, hive-4794.patch






[jira] [Commented] (HIVE-3264) Add support for binary dataype to AvroSerde

2013-07-29 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722704#comment-13722704
 ] 

Ashutosh Chauhan commented on HIVE-3264:


+1

 Add support for binary dataype to AvroSerde
 ---

 Key: HIVE-3264
 URL: https://issues.apache.org/jira/browse/HIVE-3264
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.9.0
Reporter: Jakob Homan
Assignee: Eli Reisman
  Labels: patch
 Fix For: 0.12.0

 Attachments: HIVE-3264-1.patch, HIVE-3264-2.patch, HIVE-3264-3.patch, 
 HIVE-3264-4.patch, HIVE-3264-5.patch, HIVE-3264.6.patch, HIVE-3264.7.patch


 When the AvroSerde was written, Hive didn't have a binary type, so Avro's 
 byte array type is converted to an array of small ints.  Now that HIVE-2380 is 
 in, this step isn't necessary and we can convert both Avro's bytes type and 
 probably its fixed type to Hive's binary type.
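
The core of that conversion can be sketched with the JDK alone (a hedged sketch, not the actual patch): Avro hands back "bytes" values as a java.nio.ByteBuffer, while Hive's BINARY type wants a plain byte[], so only the readable region of the buffer needs copying out.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class AvroBytesSketch {
    // Copy the readable region of an Avro "bytes" ByteBuffer into a
    // byte[] suitable for Hive's BINARY type.
    static byte[] toBinary(ByteBuffer buf) {
        byte[] out = new byte[buf.remaining()];
        buf.duplicate().get(out); // duplicate() leaves the caller's position untouched
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.wrap(new byte[] {1, 2, 3});
        System.out.println(Arrays.toString(toBinary(b))); // prints [1, 2, 3]
    }
}
```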



Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

2013-07-29 Thread Jakob Homan

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/#review24149
---


There is still no test covering a map-reduce job from an already existing, 
non-Avro table into an Avro table; i.e., create a text table, populate it, and 
run a CTAS to manipulate the data into an Avro table.


ql/src/test/queries/clientpositive/avro_create_as_select.q
https://reviews.apache.org/r/11925/#comment47977

This is testing that one can copy data into an already existing table, but it 
doesn't verify that the already existing, non-Avro data is converted correctly.


- Jakob Homan


On July 23, 2013, 2:51 a.m., Mohammad Islam wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/11925/
 ---
 
 (Updated July 23, 2013, 2:51 a.m.)
 
 
 Review request for hive, Ashutosh Chauhan and Jakob Homan.
 
 
 Bugs: HIVE-3159
 https://issues.apache.org/jira/browse/HIVE-3159
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Problem:
 Hive doesn't support creating an Avro-based table using the HQL CREATE TABLE 
 command. It currently requires specifying an Avro schema literal or a schema 
 file name, which is very inconvenient for many use cases.
 Some of the unsupported use cases:
 1. CREATE TABLE ... with the Avro SerDe etc. AS SELECT ... from a NON-AVRO FILE
 2. CREATE TABLE ... with the Avro SerDe etc. AS SELECT from an AVRO TABLE
 3. CREATE TABLE without specifying an Avro schema.
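
The new TypeInfoToSchema class presumably derives an Avro schema from the Hive column definitions. A toy sketch of that idea, using only the JDK and a hypothetical mapping that covers just a few primitive types (the real class also handles complex and nullable types):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TypeInfoToSchemaSketch {
    // Toy mapping of a few Hive primitive type names to Avro type names.
    static final Map<String, String> HIVE_TO_AVRO = Map.of(
            "int", "int", "bigint", "long", "string", "string",
            "double", "double", "boolean", "boolean");

    // Build an Avro record schema (as JSON) from column name -> Hive type.
    static String toAvroSchema(String recordName, LinkedHashMap<String, String> cols) {
        StringBuilder sb = new StringBuilder(
                "{\"type\":\"record\",\"name\":\"" + recordName + "\",\"fields\":[");
        boolean first = true;
        for (Map.Entry<String, String> c : cols.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append("{\"name\":\"").append(c.getKey())
              .append("\",\"type\":\"").append(HIVE_TO_AVRO.get(c.getValue()))
              .append("\"}");
        }
        return sb.append("]}").toString();
    }

    public static void main(String[] args) {
        LinkedHashMap<String, String> cols = new LinkedHashMap<>();
        cols.put("id", "int");
        cols.put("title", "string");
        System.out.println(toAvroSchema("ml_items", cols));
    }
}
```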
 
 
 Diffs
 -
 
   ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION 
   ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION 
   ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION 
   ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION 
   ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION 
   ql/src/test/results/clientpositive/avro_create_as_select2.q.out 
 PRE-CREATION 
   ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION 
   ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION 
   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 
 13848b6 
   serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java 
 PRE-CREATION 
   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 
 010f614 
   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/11925/diff/
 
 
 Testing
 ---
 
 Wrote a new Java test class for a new Java class. Added a new test case to an 
 existing Java test class. In addition, there are 4 .q files for testing 
 multiple use cases.
 
 
 Thanks,
 
 Mohammad Islam
 




Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors

2013-07-29 Thread agateaaa
Hi All:

We are running into a frequent problem using HCatalog 0.4.1 (Hive Metastore
Server 0.9) where we get connection reset or connection timeout errors.

The hive metastore server has been allocated enough (12G) memory.

This is a critical problem for us, and we would appreciate it if anyone has any
pointers.

We did add retry logic in our client, which seems to help, but I am just
wondering how we can narrow down the root cause of this problem. Could this be
a hiccup in networking which causes the hive server to get into an
unresponsive state?

Thanks

Agateaaa
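
The retry logic mentioned above might look roughly like the following generic sketch (hedged: the actual client code is not shown in this thread, and the failing call here is simulated rather than a real Thrift call):

```java
import java.util.concurrent.Callable;

public class MetastoreRetrySketch {
    // Retry a flaky call a few times, backing off a little longer
    // between each attempt; rethrow the last failure if all attempts fail.
    static <T> T withRetries(Callable<T> call, int maxAttempts, long backoffMs)
            throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (Exception e) { // e.g. a TTransportException from a reset socket
                last = e;
                if (attempt < maxAttempts) Thread.sleep(backoffMs * attempt);
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Fails twice, then succeeds - mimics a transient connection reset.
        String result = withRetries(() -> {
            if (++calls[0] < 3) throw new java.io.IOException("Connection reset");
            return "ok";
        }, 5, 1L);
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```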


Example Connection reset error:
===

org.apache.thrift.transport.TTransportException: java.net.SocketException:
Connection reset
at
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
 at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
 at
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
 at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_ugi(ThriftHiveMetastore.java:2136)
 at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_ugi(ThriftHiveMetastore.java:2122)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openStore(HiveMetaStoreClient.java:286)
 at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:197)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:157)
 at
org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102)
 at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:888)
at
org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTableAddParts(DDLSemanticAnalyzer.java:1817)
 at
org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:297)
at
org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
 at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
 at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:642)
 at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:557)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.net.SocketException: Connection reset
at java.net.SocketInputStream.read(SocketInputStream.java:168)
at
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
 ... 30 more




Example Connection timeout error:
==

org.apache.thrift.transport.TTransportException:
java.net.SocketTimeoutException: Read timed out
at
org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
 at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at
org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
 at
org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
at
org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
 at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_ugi(ThriftHiveMetastore.java:2136)
 at
org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_ugi(ThriftHiveMetastore.java:2122)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openStore(HiveMetaStoreClient.java:286)
 at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:197)
at
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:157)
 at
org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092)
at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102)
 at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:888)
at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:830)
 at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.getMetaData(SemanticAnalyzer.java:954)
at
org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7524)
 at

[jira] [Commented] (HIVE-4928) Date literals do not work properly in partition spec clause

2013-07-29 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722718#comment-13722718
 ] 

Phabricator commented on HIVE-4928:
---

ashutoshc has accepted the revision HIVE-4928 [jira] Date literals do not work 
properly in partition spec clause.

  +1

REVISION DETAIL
  https://reviews.facebook.net/D11871

BRANCH
  HIVE-4928.2

ARCANIST PROJECT
  hive

To: JIRA, ashutoshc, jdere


 Date literals do not work properly in partition spec clause
 ---

 Key: HIVE-4928
 URL: https://issues.apache.org/jira/browse/HIVE-4928
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Jason Dere
Assignee: Jason Dere
 Attachments: HIVE-4928.1.patch.txt, HIVE-4928.D11871.1.patch


 The partition spec parsing doesn't do any real evaluation of the values in 
 the partition spec, instead just taking the text value of the ASTNode 
 representing the partition value. This works fine for string/numeric 
 literals (expression tree below):
 (TOK_PARTVAL region 99)
 But not for date literals, which are of the form DATE 'yyyy-mm-dd' (expression 
 tree below):
 (TOK_DATELITERAL '1999-12-31')
 In this case the parser/analyzer uses TOK_DATELITERAL as the partition 
 column value, when it should really get the value of the child of the 
 DATELITERAL token.
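
The bug and its fix can be sketched against a minimal stand-in for the parser's AST node (hypothetical class; Hive's real ASTNode is far richer):

```java
import java.util.List;

public class PartValSketch {
    // Minimal stand-in for an AST node: a token text plus children.
    static final class Node {
        final String text;
        final List<Node> children;
        Node(String text, List<Node> children) { this.text = text; this.children = children; }
    }

    // Buggy behaviour described above: take the node's own text, which for
    // a date literal is the token name, not the value.
    static String partValueBroken(Node n) {
        return n.text;
    }

    // Fixed behaviour: for TOK_DATELITERAL, descend to the child token
    // that carries the actual 'yyyy-mm-dd' string.
    static String partValueFixed(Node n) {
        if (n.text.equals("TOK_DATELITERAL")) {
            return n.children.get(0).text;
        }
        return n.text;
    }

    public static void main(String[] args) {
        Node date = new Node("TOK_DATELITERAL",
                List.of(new Node("'1999-12-31'", List.of())));
        System.out.println(partValueBroken(date)); // the token name - wrong
        System.out.println(partValueFixed(date));  // the date value
    }
}
```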



Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

2013-07-29 Thread Jakob Homan


 On June 29, 2013, 7:43 p.m., Ashutosh Chauhan wrote:
  serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java, line 
  70
  https://reviews.apache.org/r/11925/diff/2/?file=307412#file307412line70
 
  I think determining schema from the table definition should be the default. 
  There are multiple ways of determining the schema. I think the order should be:
  a) Try the table definition.
  b) Try the schema literal in the properties.
  c) Try from HDFS.
  d) Try from the URL.
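
The order proposed above amounts to a fallback chain, which could be sketched like this (a generic, hypothetical sketch using Supplier, not the actual AvroSerdeUtils code):

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

public class SchemaLookupSketch {
    // Try each schema source in order; the first one that yields a value wins.
    static Optional<String> resolve(List<Supplier<Optional<String>>> sources) {
        for (Supplier<Optional<String>> s : sources) {
            Optional<String> schema = s.get();
            if (schema.isPresent()) return schema;
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // a) table definition, b) schema literal, c) hdfs, d) url - as proposed.
        List<Supplier<Optional<String>>> sources = List.of(
                Optional::empty,                            // no table-definition schema
                () -> Optional.of("{\"type\":\"record\"}"), // schema literal is set
                Optional::empty,                            // hdfs not consulted
                Optional::empty);                           // url not consulted
        System.out.println(resolve(sources).orElse("<no schema>"));
    }
}
```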

This is a big change.  Avro tables have always been defined via a property.  
This change is to support a small use case; why switch the entire order?


- Jakob


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/#review22571
---






Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors

2013-07-29 Thread Nitin Pawar
Is there any chance you can do an update on a test environment with hcat-0.5
and hive-0.11 (or 0.10) and see if you can reproduce the issue?

We used to see this error when there was load on the hcat server or some
network issue connecting to the server (the second one was a rare occurrence).



Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors

2013-07-29 Thread agateaaa
Thanks Nitin!

We have a similar setup (identical hcatalog and hive server versions) in
another production environment and don't see any errors (it's been running OK
for a few months).

Unfortunately we won't be able to move to hcat 0.5 and hive 0.11 or hive
0.10 soon.

I did see, the last time we ran into this problem, doing a netstat -ntp |
grep :1 that the server was holding on to one socket connection in
CLOSE_WAIT state for a long time
(the hive metastore server is running on port 1). Don't know if that's
relevant here or not.

Can you suggest any hive configuration settings we can tweak, or networking
tools/tips we can use to narrow this down?

Thanks
Agateaaa





[jira] [Commented] (HIVE-2137) JDBC driver doesn't encode string properly.

2013-07-29 Thread Kousuke Saruta (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722789#comment-13722789
 ] 

Kousuke Saruta commented on HIVE-2137:
--

I wonder if my change really affects TestE2EScenarios.testReadOrcAndRCFromPig.
I've only added test code and test data for HiveQueryResultSet after the 
build finished successfully.

 JDBC driver doesn't encode string properly.
 ---

 Key: HIVE-2137
 URL: https://issues.apache.org/jira/browse/HIVE-2137
 Project: Hive
  Issue Type: Bug
  Components: JDBC
Affects Versions: 0.9.0
Reporter: Jin Adachi
 Fix For: 0.12.0

 Attachments: HIVE-2137.patch, HIVE-2137.patch


 The JDBC driver for HiveServer1 decodes strings using the client-side default 
 encoding, which depends on the operating system unless another encoding is 
 specified. It ignores the server-side encoding. 
 For example, when the server-side operating system and encoding are Linux 
 (UTF-8) and the client-side operating system and encoding are Windows 
 (Shift-JIS, a Japanese charset), character corruption happens in the client. 
 In the current implementation of Hive, UTF-8 appears to be expected on the 
 server side, so the client side should encode/decode strings as UTF-8.
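The mismatch described here can be illustrated with a small sketch (made-up sample text, not Hive's JDBC code): a UTF-8 string sent by the server is corrupted when the client decodes it with a platform-default charset such as Shift-JIS, and recovered when the client decodes it explicitly as UTF-8.

```python
# Sketch of the HIVE-2137 encoding mismatch. The sample text is hypothetical;
# this is not Hive's actual JDBC driver code.
server_bytes = "日本語".encode("utf-8")  # bytes as a UTF-8 server would send them

# Decoding with a Windows client's default charset (Shift-JIS) corrupts the text.
corrupted = server_bytes.decode("shift_jis", errors="replace")
# Decoding with an explicit UTF-8 charset recovers it.
recovered = server_bytes.decode("utf-8")

assert recovered == "日本語"
assert corrupted != "日本語"
```

This is why hard-coding UTF-8 on the client side, rather than relying on the OS default, avoids the corruption.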

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)

2013-07-29 Thread Laljo John Pullokkaran (JIRA)
Laljo John Pullokkaran created HIVE-4950:


 Summary: Hive childSuspend is broken (debugging local hadoop jobs)
 Key: HIVE-4950
 URL: https://issues.apache.org/jira/browse/HIVE-4950
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 0.11.1


Hive --debug has an option to suspend child JVMs, which seems to be broken 
currently. Note that this mode may be useful only when running in local mode.



[jira] [Updated] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)

2013-07-29 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-4950:
-

Status: Patch Available  (was: Open)

 Hive childSuspend is broken (debugging local hadoop jobs)
 -

 Key: HIVE-4950
 URL: https://issues.apache.org/jira/browse/HIVE-4950
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 0.11.1

 Attachments: HIVE-4950.patch


 Hive --debug has an option to suspend child JVMs, which seems to be broken 
 currently. Note that this mode may be useful only when running in local mode.



[jira] [Updated] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)

2013-07-29 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-4950:
-

Attachment: HIVE-4950.patch

 Hive childSuspend is broken (debugging local hadoop jobs)
 -

 Key: HIVE-4950
 URL: https://issues.apache.org/jira/browse/HIVE-4950
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 0.11.1

 Attachments: HIVE-4950.patch


 Hive --debug has an option to suspend child JVMs, which seems to be broken 
 currently. Note that this mode may be useful only when running in local mode.



[jira] [Updated] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)

2013-07-29 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-4950:
-

Description: Hive debug has an option to suspend child JVMs, which seems to 
be broken currently (--debug childSuspend=y). Note that this mode may be useful 
only when running in local mode.  (was: Hive --debug has an option to suspend 
child JVMs, which seems to be broken currently. Note that this mode may be 
useful only when running in local mode.)

 Hive childSuspend is broken (debugging local hadoop jobs)
 -

 Key: HIVE-4950
 URL: https://issues.apache.org/jira/browse/HIVE-4950
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 0.11.1

 Attachments: HIVE-4950.patch


 Hive debug has an option to suspend child JVMs, which seems to be broken 
 currently (--debug childSuspend=y). Note that this mode may be useful only 
 when running in local mode.



[jira] [Commented] (HIVE-4916) Add TezWork

2013-07-29 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722819#comment-13722819
 ] 

Gunther Hagleitner commented on HIVE-4916:
--

[~appodictic] Will do. I was looking for a way to keep the pre-commit build 
from running. .txt is much better though.

 Add TezWork
 ---

 Key: HIVE-4916
 URL: https://issues.apache.org/jira/browse/HIVE-4916
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: tez-branch

 Attachments: HIVE-4916.1.patch.branch


 TezWork is the class that encapsulates all the info needed to execute a 
 single Tez job (i.e.: a dag of map or reduce work).
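As a rough illustration of "a dag of map or reduce work", here is a minimal sketch of a DAG of work units with a topological ordering; the class and node names are hypothetical assumptions, not Hive's actual TezWork API.

```python
# Hypothetical sketch of a DAG of work units (illustrative only, not TezWork).
from collections import defaultdict, deque

class WorkDag:
    def __init__(self):
        self.edges = defaultdict(list)   # parent work -> child works
        self.indeg = defaultdict(int)    # number of unfinished parents per work
        self.nodes = set()

    def connect(self, parent, child):
        self.nodes.update((parent, child))
        self.edges[parent].append(child)
        self.indeg[child] += 1

    def topological_order(self):
        """An order in which work units could be handed to the execution engine."""
        indeg = {n: self.indeg[n] for n in self.nodes}
        q = deque(n for n in self.nodes if indeg[n] == 0)
        order = []
        while q:
            n = q.popleft()
            order.append(n)
            for c in self.edges[n]:
                indeg[c] -= 1
                if indeg[c] == 0:
                    q.append(c)
        return order

dag = WorkDag()
dag.connect("Map 1", "Reducer 2")
dag.connect("Map 3", "Reducer 2")
print(dag.topological_order())  # both map works precede "Reducer 2"
```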



[jira] [Updated] (HIVE-4916) Add TezWork

2013-07-29 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-4916:
-

Attachment: HIVE-4916.2.patch.txt

Changing name to .txt

 Add TezWork
 ---

 Key: HIVE-4916
 URL: https://issues.apache.org/jira/browse/HIVE-4916
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: tez-branch

 Attachments: HIVE-4916.1.patch.branch, HIVE-4916.2.patch.txt


 TezWork is the class that encapsulates all the info needed to execute a 
 single Tez job (i.e.: a dag of map or reduce work).



[jira] [Updated] (HIVE-4917) Tez Job Monitoring

2013-07-29 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-4917:
-

Attachment: HIVE-4917.2.patch.txt

Renaming patch to .txt

 Tez Job Monitoring
 --

 Key: HIVE-4917
 URL: https://issues.apache.org/jira/browse/HIVE-4917
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: tez-branch

 Attachments: HIVE-4917.1.patch.branch, HIVE-4917.2.patch.txt


 TezJobMonitor handles monitoring the execution of a Tez dag



[jira] [Updated] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability

2013-07-29 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-4843:
-

Attachment: HIVE-4843.3.patch

Latest patch based on trunk.

 Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and 
 readability
 ---

 Key: HIVE-4843
 URL: https://issues.apache.org/jira/browse/HIVE-4843
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0, tez-branch
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-4843.1.patch, HIVE-4843.2.patch, HIVE-4843.3.patch


 Currently, there are static apis in multiple locations in ExecDriver and 
 MapRedTask that can be leveraged if put in the already existing utility class 
 in the exec package. This would help making the code more maintainable, 
 readable and also re-usable by other run-time infra such as tez.



[jira] [Updated] (HIVE-4826) Setup build infrastructure for tez

2013-07-29 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-4826:
-

Attachment: HIVE-4826.2.patch

Latest update based on trunk/branch.

 Setup build infrastructure for tez
 --

 Key: HIVE-4826
 URL: https://issues.apache.org/jira/browse/HIVE-4826
 Project: Hive
  Issue Type: New Feature
  Components: Tez
Affects Versions: tez-branch
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Fix For: tez-branch

 Attachments: HIVE-4826.2.patch, HIVE-4826.patch


 Address changes required in ivy and build xml files to support tez.



[jira] [Commented] (HIVE-4918) Tez job submission

2013-07-29 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722835#comment-13722835
 ] 

Gunther Hagleitner commented on HIVE-4918:
--

Thanks Ed. The createHashTables still has to be built. I need to find the right 
code in hive for this.

Let me also look into createScratchDir. You're right. Hive should do this 
already.

 Tez job submission
 --

 Key: HIVE-4918
 URL: https://issues.apache.org/jira/browse/HIVE-4918
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: tez-branch

 Attachments: HIVE-4918.1.patch.branch


 This patch is to create infrastructure to submit a tez dag. (i.e.: TezTask + 
 utils to convert work into a tez dag).



[jira] [Updated] (HIVE-4918) Tez job submission

2013-07-29 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-4918:
-

Attachment: HIVE-4918.2.patch.txt

Renaming patch to .txt.

 Tez job submission
 --

 Key: HIVE-4918
 URL: https://issues.apache.org/jira/browse/HIVE-4918
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: tez-branch

 Attachments: HIVE-4918.1.patch.branch, HIVE-4918.2.patch.txt


 This patch is to create infrastructure to submit a tez dag. (i.e.: TezTask + 
 utils to convert work into a tez dag).



Re: Review Request 13021: Vectorization Tests

2013-07-29 Thread tony murphy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13021/
---

(Updated July 29, 2013, 9:04 p.m.)


Review request for hive, Eric Hanson, Jitendra Pandey, Remus Rusanu, and 
Sarvesh Sakalanaga.


Changes
---

updated for style fixes


Bugs: HIVE-4794
https://issues.apache.org/jira/browse/HIVE-4794


Repository: hive-git


Description
---

These test cover all types, aggregates, and operators currently supported for 
vectorization. The queries are executed over a specially crafted data set which 
covers all the interesting classes of batch for each type: all nulls, repeating 
value, no nulls, and random values, to fully exercise the vectorization stack. 
The queries were stabilized against a text test oracle in order to validate 
results.

This patch depends on: 
HIVE-4525
HIVE-4922
HIVE-4931


Diffs (updated)
-

  ql/src/java/org/apache/hadoop/hive/ql/history/HiveHistory.java 97436c5 
  ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java bdeabe0 
  
ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/AllVectorTypesRecord.java
 PRE-CREATION 
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/OrcFileGenerator.java 
PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_0.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_1.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_10.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_11.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_12.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_13.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_14.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_15.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_16.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_2.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_3.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_4.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_5.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_6.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_7.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_8.q PRE-CREATION 
  ql/src/test/queries/clientpositive/vectorization_9.q PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_0.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_1.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_10.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_11.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_12.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_13.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_14.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_15.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_16.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_2.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_3.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_4.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_5.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_6.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_7.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_8.q.out PRE-CREATION 
  ql/src/test/results/clientpositive/vectorization_9.q.out PRE-CREATION 

Diff: https://reviews.apache.org/r/13021/diff/


Testing
---


Thanks,

tony murphy



[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler

2013-07-29 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722984#comment-13722984
 ] 

Ashutosh Chauhan commented on HIVE-4331:


[~viraj] If it's not too hard, can you separate the current patch into two issues: 
one dealing with HivePassThroughFormat and a second about merging storage 
handlers? It seems the first is a prerequisite for the second. I want to understand 
that change a little better, since it may have implications for other storage 
handler writers and output format writers for Hive.

 Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
 

 Key: HIVE-4331
 URL: https://issues.apache.org/jira/browse/HIVE-4331
 Project: Hive
  Issue Type: Task
  Components: HCatalog
Affects Versions: 0.11.0, 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Viraj Bhat
 Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf


 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. 
 These will now continue to function but internally they will use the 
 DefaultStorageHandler from Hive. They will be removed in future release of 
 Hive.
 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will 
 bypass the HiveOutputFormat. We will use this class in Hive's 
 HBaseStorageHandler instead of the HiveHBaseTableOutputFormat.
 3) Write new unit tests in the HCat's storagehandler so that systems such 
 as Pig and Map Reduce can use the Hive's HBaseStorageHandler instead of the 
 HCatHBaseStorageHandler.
 4) Make sure all the old and new unit tests pass without backward 
 compatibility (except known issues as described in the Design Document).
 5) Replace all instances of the HCat source code, which point to 
 HCatStorageHandler to use theHiveStorageHandler including the 
 FosterStorageHandler.
 I have attached the design document for the same and will attach a patch to 
 this Jira.



Review Request 13032: HIVE-4826 Setup build infrastructure for tez

2013-07-29 Thread Vikram Dixit Kumaraswamy

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13032/
---

Review request for hive, Ashutosh Chauhan and Gunther Hagleitner.


Bugs: HIVE-4826
https://issues.apache.org/jira/browse/HIVE-4826


Repository: hive-git


Description
---

Setup build infrastructure for tez.


Diffs
-

  build-common.xml 0807827 
  build.xml 016d363 
  eclipse-templates/.classpath 7114b90 
  ivy/libraries.properties 4a8edce 
  ql/ivy.xml bfb3116 
  shims/ivy.xml 04ef641 

Diff: https://reviews.apache.org/r/13032/diff/


Testing
---

All unit tests pass.


Thanks,

Vikram Dixit Kumaraswamy



[jira] [Created] (HIVE-4951) combine2_win.q.out needs update for HIVE-3253 (increasing nesting levels)

2013-07-29 Thread Thejas M Nair (JIRA)
Thejas M Nair created HIVE-4951:
---

 Summary: combine2_win.q.out needs update for HIVE-3253 (increasing 
nesting levels)
 Key: HIVE-4951
 URL: https://issues.apache.org/jira/browse/HIVE-4951
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair


combine2.q was updated in HIVE-3253, the corresponding change is missing in 
combine2_win.q, causing it to fail on windows.




Re: Review Request 11925: Hive-3159 Update AvroSerde to determine schema of new tables

2013-07-29 Thread Mohammad Islam


 On July 29, 2013, 5:41 p.m., Jakob Homan wrote:
  There is still no text covering a map-reduce job on an already existing, 
  non-Avro table into an avro table.  ie, create a text table, populate it, 
  run a CTAS to manipulate the data into an Avro table.

In general, Hive creates internal column names such as col0, col1, etc. Because of 
this, I was not able to copy non-Avro data into an Avro table and run a SELECT 
query. The only option is to change the current behavior to reuse the provided 
column names. A separate JIRA for this could be an option.


 On July 29, 2013, 5:41 p.m., Jakob Homan wrote:
  ql/src/test/queries/clientpositive/avro_create_as_select.q, line 3
  https://reviews.apache.org/r/11925/diff/4/?file=325386#file325386line3
 
  This is testing that one can copy data into an already existing table, 
  but doesn't verify that the already existing, non-avro data is converted 
  correctly.

same as above.


- Mohammad


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11925/#review24149
---


On July 23, 2013, 9:51 a.m., Mohammad Islam wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/11925/
 ---
 
 (Updated July 23, 2013, 9:51 a.m.)
 
 
 Review request for hive, Ashutosh Chauhan and Jakob Homan.
 
 
 Bugs: HIVE-3159
 https://issues.apache.org/jira/browse/HIVE-3159
 
 
 Repository: hive-git
 
 
 Description
 ---
 
 Problem:
 Hive doesn't support creating an Avro-based table using the HQL CREATE TABLE 
 command. It currently requires specifying an Avro schema literal or a schema 
 file name, which is very inconvenient for users in many cases.
 Some of the unsupported use cases:
 1. CREATE TABLE ... Avro-SerDe etc. AS SELECT ... from a NON-AVRO FILE
 2. CREATE TABLE ... Avro-SerDe etc. AS SELECT from an AVRO TABLE
 3. CREATE TABLE without specifying an Avro schema.
 
 
 Diffs
 -
 
   ql/src/test/queries/clientpositive/avro_create_as_select.q PRE-CREATION 
   ql/src/test/queries/clientpositive/avro_create_as_select2.q PRE-CREATION 
   ql/src/test/queries/clientpositive/avro_no_schema_test.q PRE-CREATION 
   ql/src/test/queries/clientpositive/avro_without_schema.q PRE-CREATION 
   ql/src/test/results/clientpositive/avro_create_as_select.q.out PRE-CREATION 
   ql/src/test/results/clientpositive/avro_create_as_select2.q.out 
 PRE-CREATION 
   ql/src/test/results/clientpositive/avro_no_schema_test.q.out PRE-CREATION 
   ql/src/test/results/clientpositive/avro_without_schema.q.out PRE-CREATION 
   serde/src/java/org/apache/hadoop/hive/serde2/avro/AvroSerdeUtils.java 
 13848b6 
   serde/src/java/org/apache/hadoop/hive/serde2/avro/TypeInfoToSchema.java 
 PRE-CREATION 
   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestAvroSerdeUtils.java 
 010f614 
   serde/src/test/org/apache/hadoop/hive/serde2/avro/TestTypeInfoToSchema.java 
 PRE-CREATION 
 
 Diff: https://reviews.apache.org/r/11925/diff/
 
 
 Testing
 ---
 
 Wrote a new Java test class for the new Java class and added a new test case to 
 an existing Java test class. In addition, there are four .q files testing 
 multiple use cases.
 
 
 Thanks,
 
 Mohammad Islam
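The schema-derivation idea under review (a TypeInfoToSchema-style mapping from Hive columns to an Avro record schema) can be sketched as follows; the type table and function name are illustrative assumptions, not the patch's actual code.

```python
# Hedged sketch: derive an Avro record schema from Hive column names/types so
# that CREATE TABLE ... AS SELECT needs no explicit schema. The mapping below
# is illustrative, not Hive's actual TypeInfoToSchema implementation.
import json

HIVE_TO_AVRO = {"int": "int", "bigint": "long", "string": "string",
                "double": "double", "boolean": "boolean", "tinyint": "int"}

def columns_to_avro_schema(name, columns):
    # Fields are made nullable, since Hive columns can hold NULL.
    fields = [{"name": col, "type": ["null", HIVE_TO_AVRO[typ]], "default": None}
              for col, typ in columns]
    return json.dumps({"type": "record", "name": name, "fields": fields})

schema = columns_to_avro_schema("ml_items", [("id", "int"), ("title", "string")])
```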
 




[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler

2013-07-29 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723023#comment-13723023
 ] 

Ashutosh Chauhan commented on HIVE-4331:


If I am reading this right, is the motivation of HivePassThroughFormat that 
storage handler writers no longer need to write a custom OF (to implement Hive 
OF), and can thus use their existing OF unmodified, with all the necessary 
plumbing done in the storage handler?
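The pass-through idea can be sketched as plain delegation (hypothetical class and method names, not Hive's actual HivePassThroughOutputFormat): the wrapper exposes the interface the engine expects while delegating to an unmodified third-party output format.

```python
# Illustrative delegation sketch; names are hypothetical, not Hive's API.
class ThirdPartyOutputFormat:
    """Stands in for an existing output format that knows nothing about Hive."""
    def get_record_writer(self, path):
        return f"writer:{path}"

class PassThroughOutputFormat:
    def __init__(self, wrapped):
        self.wrapped = wrapped  # any existing output format, used as-is

    def get_record_writer(self, path):
        # Any engine-specific plumbing would live here; `wrapped` stays unmodified.
        return self.wrapped.get_record_writer(path)

of = PassThroughOutputFormat(ThirdPartyOutputFormat())
assert of.get_record_writer("/tmp/out") == "writer:/tmp/out"
```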

 Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
 

 Key: HIVE-4331
 URL: https://issues.apache.org/jira/browse/HIVE-4331
 Project: Hive
  Issue Type: Task
  Components: HCatalog
Affects Versions: 0.11.0, 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Viraj Bhat
 Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf


 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. 
 These will now continue to function but internally they will use the 
 DefaultStorageHandler from Hive. They will be removed in future release of 
 Hive.
 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will 
 bypass the HiveOutputFormat. We will use this class in Hive's 
 HBaseStorageHandler instead of the HiveHBaseTableOutputFormat.
 3) Write new unit tests in the HCat's storagehandler so that systems such 
 as Pig and Map Reduce can use the Hive's HBaseStorageHandler instead of the 
 HCatHBaseStorageHandler.
 4) Make sure all the old and new unit tests pass without backward 
 compatibility (except known issues as described in the Design Document).
 5) Replace all instances of the HCat source code, which point to 
 HCatStorageHandler to use theHiveStorageHandler including the 
 FosterStorageHandler.
 I have attached the design document for the same and will attach a patch to 
 this Jira.



[jira] [Updated] (HIVE-4951) combine2_win.q.out needs update for HIVE-3253 (increasing nesting levels)

2013-07-29 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4951:


Attachment: HIVE-4951.1.patch

 combine2_win.q.out needs update for HIVE-3253 (increasing nesting levels)
 -

 Key: HIVE-4951
 URL: https://issues.apache.org/jira/browse/HIVE-4951
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4951.1.patch


 combine2.q was updated in HIVE-3253, the corresponding change is missing in 
 combine2_win.q, causing it to fail on windows.



[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler

2013-07-29 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723029#comment-13723029
 ] 

Viraj Bhat commented on HIVE-4331:
--

Hi Ashutosh,
 That is right. They do not need to write a custom OF to implement Hive OF.
If it makes it easier to review, I can split the patch based on HCat (contrib) 
and core Hive.
Viraj

 Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
 

 Key: HIVE-4331
 URL: https://issues.apache.org/jira/browse/HIVE-4331
 Project: Hive
  Issue Type: Task
  Components: HCatalog
Affects Versions: 0.11.0, 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Viraj Bhat
 Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf


 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. 
 These will now continue to function but internally they will use the 
 DefaultStorageHandler from Hive. They will be removed in future release of 
 Hive.
 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will 
 bypass the HiveOutputFormat. We will use this class in Hive's 
 HBaseStorageHandler instead of the HiveHBaseTableOutputFormat.
 3) Write new unit tests in the HCat's storagehandler so that systems such 
 as Pig and Map Reduce can use the Hive's HBaseStorageHandler instead of the 
 HCatHBaseStorageHandler.
 4) Make sure all the old and new unit tests pass without backward 
 compatibility (except known issues as described in the Design Document).
 5) Replace all instances of the HCat source code, which point to 
 HCatStorageHandler to use theHiveStorageHandler including the 
 FosterStorageHandler.
 I have attached the design document for the same and will attach a patch to 
 this Jira.



[jira] [Updated] (HIVE-4951) combine2_win.q.out needs update for HIVE-3253 (increasing nesting levels)

2013-07-29 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated HIVE-4951:


Status: Patch Available  (was: Open)

 combine2_win.q.out needs update for HIVE-3253 (increasing nesting levels)
 -

 Key: HIVE-4951
 URL: https://issues.apache.org/jira/browse/HIVE-4951
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: HIVE-4951.1.patch


 combine2.q was updated in HIVE-3253, the corresponding change is missing in 
 combine2_win.q, causing it to fail on windows.



[jira] [Commented] (HIVE-4331) Integrated StorageHandler for Hive and HCat using the HiveStorageHandler

2013-07-29 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723055#comment-13723055
 ] 

Ashutosh Chauhan commented on HIVE-4331:


Yeah, that will make it easier to review. Dividing the patch into core Hive and 
HCatalog parts sounds good.

 Integrated StorageHandler for Hive and HCat using the HiveStorageHandler
 

 Key: HIVE-4331
 URL: https://issues.apache.org/jira/browse/HIVE-4331
 Project: Hive
  Issue Type: Task
  Components: HCatalog
Affects Versions: 0.11.0, 0.12.0
Reporter: Ashutosh Chauhan
Assignee: Viraj Bhat
 Attachments: HIVE4331_07-17.patch, StorageHandlerDesign_HIVE4331.pdf


 1) Deprecate the HCatHBaseStorageHandler and RevisionManager from HCatalog. 
 These will now continue to function but internally they will use the 
 DefaultStorageHandler from Hive. They will be removed in future release of 
 Hive.
 2) Design a HivePassThroughFormat so that any new StorageHandler in Hive will 
 bypass the HiveOutputFormat. We will use this class in Hive's 
 HBaseStorageHandler instead of the HiveHBaseTableOutputFormat.
 3) Write new unit tests in the HCat's storagehandler so that systems such 
 as Pig and Map Reduce can use the Hive's HBaseStorageHandler instead of the 
 HCatHBaseStorageHandler.
 4) Make sure all the old and new unit tests pass without backward 
 compatibility (except known issues as described in the Design Document).
 5) Replace all instances of the HCat source code, which point to 
 HCatStorageHandler to use theHiveStorageHandler including the 
 FosterStorageHandler.
 I have attached the design document for the same and will attach a patch to 
 this Jira.



[jira] [Created] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results

2013-07-29 Thread Yin Huai (JIRA)
Yin Huai created HIVE-4952:
--

 Summary: When hive.join.emit.interval is small, queries optimized 
by Correlation Optimizer may generate wrong results
 Key: HIVE-4952
 URL: https://issues.apache.org/jira/browse/HIVE-4952
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai


If we have a query like this ...
{code:sql}
SELECT xx.key, xx.cnt, yy.key
FROM
(SELECT x.key as key, count(1) as cnt FROM src1 x JOIN src1 y ON (x.key = 
y.key) group by x.key) xx
JOIN src yy
ON xx.key=yy.key;
{code}

After the Correlation Optimizer runs, the operator tree in the reducer will be:
{code}
      JOIN2
        |
        |
       MUX
      /   \
     /     \
   GBY      |
    |       |
  JOIN1     |
     \     /
      \   /
      DEMUX
{code}
For JOIN2, the right table's rows arrive at this operator first. If 
hive.join.emit.interval is small, e.g. 1, JOIN2 will emit results even though it 
has not yet received any rows from the left table. The logic related to 
hive.join.emit.interval in JoinOperator assumes that inputs are ordered by 
the tag. But if a query has been optimized by the Correlation Optimizer, this 
assumption may not hold for the JoinOperators inside the reducer.
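
The ordering assumption can be illustrated with a toy model (illustrative Java only, not Hive's actual JoinOperator; all names here are made up): a join that flushes its buffers every `emitInterval` rows silently loses matches when all right-side rows arrive before any left-side rows and the interval is 1.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a join that flushes its buffers every `emitInterval`
// input rows. Each input row is {tag, value}: tag "0" for the left
// table, tag "1" for the right table.
class EmitIntervalSketch {
    static List<String> joinWithEmitInterval(List<String[]> taggedRows, int emitInterval) {
        List<String> left = new ArrayList<>();
        List<String> right = new ArrayList<>();
        List<String> out = new ArrayList<>();
        int seen = 0;
        for (String[] row : taggedRows) {
            if (row[0].equals("0")) left.add(row[1]); else right.add(row[1]);
            if (++seen % emitInterval == 0) {
                // Premature flush: pairs up only what has arrived so far,
                // then discards the buffers.
                for (String l : left) for (String r : right) out.add(l + "," + r);
                left.clear();
                right.clear();
            }
        }
        // Final flush for whatever remains buffered.
        for (String l : left) for (String r : right) out.add(l + "," + r);
        return out;
    }
}
```

With rows arriving right-first, an interval of 1 flushes against an empty left buffer and the match is lost, while a large interval buffers both sides and produces the correct pair.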

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results

2013-07-29 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-4952:
---

Attachment: replay.txt

To reproduce the problem, apply 'replay.txt' and then run
{code}
ant test -Dtestcase=TestCliDriver -Dqfile=correlationoptimizer15.q 
-Dtest.silent=false
{code}

 When hive.join.emit.interval is small, queries optimized by Correlation 
 Optimizer may generate wrong results
 

 Key: HIVE-4952
 URL: https://issues.apache.org/jira/browse/HIVE-4952
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: replay.txt


 If we have a query like this ...
 {code:sql}
 SELECT xx.key, xx.cnt, yy.key
 FROM
 (SELECT x.key as key, count(1) as cnt FROM src1 x JOIN src1 y ON (x.key = 
 y.key) group by x.key) xx
 JOIN src yy
 ON xx.key=yy.key;
 {code}
 After Correlation Optimizer, the operator tree in the reducer will be 
 {code}
    JOIN2
      |
      |
     MUX
    /    \
   /      \
  GBY      |
   |       |
 JOIN1     |
   \      /
    \    /
    DEMUX
 {code}
 For JOIN2, the right table's rows arrive at this operator first. If 
 hive.join.emit.interval is small, e.g. 1, JOIN2 will emit results even though 
 it has not yet received any rows from the left table. The logic related to 
 hive.join.emit.interval in JoinOperator assumes that inputs are ordered 
 by the tag. But if a query has been optimized by the Correlation Optimizer, 
 this assumption may not hold for the JoinOperators inside the reducer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-2702) listPartitionsByFilter only supports string partitions for equals

2013-07-29 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-2702:
--

Attachment: HIVE-2702.D11847.2.patch

sershe updated the revision "HIVE-2702 [jira] listPartitionsByFilter only 
supports string partitions for equals".

  Adding the query change. Fetching partition dt=100x for the query dt = 100 
seems incorrect.

Reviewers: ashutoshc, JIRA

REVISION DETAIL
  https://reviews.facebook.net/D11847

CHANGE SINCE LAST DIFF
  https://reviews.facebook.net/D11847?vs=36303&id=36483#toc

BRANCH
  HIVE-2702-2

ARCANIST PROJECT
  hive

AFFECTED FILES
  metastore/src/java/org/apache/hadoop/hive/metastore/parser/ExpressionTree.java
  metastore/src/java/org/apache/hadoop/hive/metastore/parser/Filter.g
  metastore/src/test/org/apache/hadoop/hive/metastore/TestHiveMetaStore.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Utilities.java
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java
  ql/src/test/results/clientpositive/alter_partition_coltype.q.out

To: JIRA, ashutoshc, sershe


 listPartitionsByFilter only supports string partitions for equals
 -

 Key: HIVE-2702
 URL: https://issues.apache.org/jira/browse/HIVE-2702
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.8.1
Reporter: Aniket Mokashi
Assignee: Sergey Shelukhin
 Fix For: 0.12.0

 Attachments: ASF.LICENSE.NOT.GRANTED--HIVE-2702.D2043.1.patch, 
 HIVE-2702.1.patch, HIVE-2702.D11715.1.patch, HIVE-2702.D11715.2.patch, 
 HIVE-2702.D11715.3.patch, HIVE-2702.D11847.1.patch, HIVE-2702.D11847.2.patch, 
 HIVE-2702.patch, HIVE-2702-v0.patch


 listPartitionsByFilter supports only string partitions. This is because 
 it is explicitly specified in generateJDOFilterOverPartitions in 
 ExpressionTree.java:
 //Can only support partitions whose types are string
 if (!table.getPartitionKeys().get(partitionColumnIndex)
     .getType().equals(org.apache.hadoop.hive.serde.Constants.STRING_TYPE_NAME)) {
   throw new MetaException(
       "Filtering is supported only on partition keys of type string");
 }
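
A quick standalone illustration (not Hive code) of why filtering on non-string partition keys is risky when the underlying comparison is string-typed: lexicographic comparison gives wrong answers for numeric values.

```java
// Lexicographic (string) comparison disagrees with numeric comparison,
// so a string-typed filter on a numeric partition key can select the
// wrong partitions.
public class StringVsNumericCompare {
    public static void main(String[] args) {
        boolean stringSaysGreater = "9".compareTo("100") > 0;    // "9" sorts after "100"
        boolean numberSaysGreater = Integer.compare(9, 100) > 0; // 9 < 100 numerically
        System.out.println(stringSaysGreater + " " + numberSaysGreater);
    }
}
```

This mismatch is one reason the guard above restricted filter pushdown to string partition keys until typed comparison was supported.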

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4917) Tez Job Monitoring

2013-07-29 Thread Gunther Hagleitner (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gunther Hagleitner updated HIVE-4917:
-

Description: 
TezJobMonitor handles monitoring the execution of a Tez dag

NO PRECOMMIT TESTS (this is wip for the tez branch)

  was:TezJobMonitor handles monitoring the execution of a Tez dag


 Tez Job Monitoring
 --

 Key: HIVE-4917
 URL: https://issues.apache.org/jira/browse/HIVE-4917
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: tez-branch

 Attachments: HIVE-4917.1.patch.branch, HIVE-4917.2.patch.txt


 TezJobMonitor handles monitoring the execution of a Tez dag
 NO PRECOMMIT TESTS (this is wip for the tez branch)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4916) Add TezWork

2013-07-29 Thread Gunther Hagleitner (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723132#comment-13723132
 ] 

Gunther Hagleitner commented on HIVE-4916:
--

Thanks [~brocknoland]. I've added the ALL CAPS prop to all the relevant 
descriptions. That's really nice to have and much better than fiddling with the 
name.

 Add TezWork
 ---

 Key: HIVE-4916
 URL: https://issues.apache.org/jira/browse/HIVE-4916
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Gunther Hagleitner
 Fix For: tez-branch

 Attachments: HIVE-4916.1.patch.branch, HIVE-4916.2.patch.txt


 TezWork is the class that encapsulates all the info needed to execute a 
 single Tez job (i.e.: a dag of map or reduce work).
 NO PRECOMMIT TESTS (this is wip for the tez branch)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4870) Explain Extended to show partition info for Fetch Task

2013-07-29 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723150#comment-13723150
 ] 

Laljo John Pullokkaran commented on HIVE-4870:
--

[~brocknoland]: Brock, I am seeing pre-commit test failures in 
auto_sortmerge_join_1.q and auto_sortmerge_join_7.q. I cannot reproduce 
these in my Linux or Mac OS X env (when run standalone). Wondering if this is 
a known issue with the new pre-commit test framework.

Thanks
John

 Explain Extended to show partition info for Fetch Task
 --

 Key: HIVE-4870
 URL: https://issues.apache.org/jira/browse/HIVE-4870
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Tests
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-4870.patch


 Explain extended does not include partition information for the Fetch Task 
 (FetchWork); the Map Reduce Task (MapredWork) already does this. 
 The patch adds Partition Description info to the Fetch Task.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4870) Explain Extended to show partition info for Fetch Task

2013-07-29 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723160#comment-13723160
 ] 

Brock Noland commented on HIVE-4870:


Hey, I haven't seen those fail. If you upload the patch again, it will run a 
second time and you can see if they fail again.

 Explain Extended to show partition info for Fetch Task
 --

 Key: HIVE-4870
 URL: https://issues.apache.org/jira/browse/HIVE-4870
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Tests
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-4870.patch


 Explain extended does not include partition information for the Fetch Task 
 (FetchWork); the Map Reduce Task (MapredWork) already does this. 
 The patch adds Partition Description info to the Fetch Task.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HIVE-4953) Regression: Hive does not build offline anymore

2013-07-29 Thread Edward Capriolo (JIRA)
Edward Capriolo created HIVE-4953:
-

 Summary: Regression: Hive does not build offline anymore
 Key: HIVE-4953
 URL: https://issues.apache.org/jira/browse/HIVE-4953
 Project: Hive
  Issue Type: Bug
Reporter: Edward Capriolo




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4870) Explain Extended to show partition info for Fetch Task

2013-07-29 Thread Laljo John Pullokkaran (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723185#comment-13723185
 ] 

Laljo John Pullokkaran commented on HIVE-4870:
--

Ok, let me try it again.

 Explain Extended to show partition info for Fetch Task
 --

 Key: HIVE-4870
 URL: https://issues.apache.org/jira/browse/HIVE-4870
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Tests
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-4870.patch


 Explain extended does not include partition information for the Fetch Task 
 (FetchWork); the Map Reduce Task (MapredWork) already does this. 
 The patch adds Partition Description info to the Fetch Task.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4870) Explain Extended to show partition info for Fetch Task

2013-07-29 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-4870:
-

Attachment: (was: HIVE-4870.patch)

 Explain Extended to show partition info for Fetch Task
 --

 Key: HIVE-4870
 URL: https://issues.apache.org/jira/browse/HIVE-4870
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Tests
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-4870.patch


 Explain extended does not include partition information for the Fetch Task 
 (FetchWork); the Map Reduce Task (MapredWork) already does this. 
 The patch adds Partition Description info to the Fetch Task.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4870) Explain Extended to show partition info for Fetch Task

2013-07-29 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-4870:
-

Attachment: HIVE-4870.patch

 Explain Extended to show partition info for Fetch Task
 --

 Key: HIVE-4870
 URL: https://issues.apache.org/jira/browse/HIVE-4870
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Tests
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-4870.patch


 Explain extended does not include partition information for the Fetch Task 
 (FetchWork); the Map Reduce Task (MapredWork) already does this. 
 The patch adds Partition Description info to the Fetch Task.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4870) Explain Extended to show partition info for Fetch Task

2013-07-29 Thread Laljo John Pullokkaran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laljo John Pullokkaran updated HIVE-4870:
-

Status: Patch Available  (was: Open)

 Explain Extended to show partition info for Fetch Task
 --

 Key: HIVE-4870
 URL: https://issues.apache.org/jira/browse/HIVE-4870
 Project: Hive
  Issue Type: Bug
  Components: Query Processor, Tests
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Attachments: HIVE-4870.patch


 Explain extended does not include partition information for the Fetch Task 
 (FetchWork); the Map Reduce Task (MapredWork) already does this. 
 The patch adds Partition Description info to the Fetch Task.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Hive Metastore Server 0.9 Connection Reset and Connection Timeout errors

2013-07-29 Thread agateaaa
Looking at the hive metastore server logs, we see errors like these:

2013-07-26 06:34:52,853 ERROR server.TThreadPoolServer
(TThreadPoolServer.java:run(182)) - Error occurred during processing of
message.
java.lang.NullPointerException
at
org.apache.hadoop.hive.metastore.TUGIBasedProcessor.setIpAddress(TUGIBasedProcessor.java:183)
at
org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:79)
at
org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:176)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)

These occur at approximately the same time as we see the timeout or connection 
reset errors.

Don't know if this is the cause or a side effect of the connection 
timeout/connection reset errors. Does anybody have any pointers or 
suggestions?

Thanks


On Mon, Jul 29, 2013 at 11:29 AM, agateaaa agate...@gmail.com wrote:

 Thanks Nitin!

 We have a similar setup (identical HCatalog and Hive server versions) on 
 another production environment and don't see any errors (it's been running OK 
 for a few months).

 Unfortunately we won't be able to move to HCat 0.5 and Hive 0.11 or Hive 
 0.10 soon.

 I did see that the last time we ran into this problem, doing a netstat -ntp 
 | grep :1 showed that the server was holding on to one socket connection in 
 CLOSE_WAIT state for a long time
  (the hive metastore server is running on port 1). Don't know if that's 
 relevant here or not.

 Can you suggest any Hive configuration settings we can tweak, or networking 
 tools/tips we can use to narrow this down?

 Thanks
 Agateaaa




 On Mon, Jul 29, 2013 at 11:02 AM, Nitin Pawar nitinpawar...@gmail.comwrote:

 Is there any chance you can do an update on a test environment with hcat-0.5 
 and hive-0.11 (or 0.10) and see if you can reproduce the issue?

 We used to see this error when there was load on the hcat server or some 
 network issue connecting to the server (the second one was a rare occurrence).


 On Mon, Jul 29, 2013 at 11:13 PM, agateaaa agate...@gmail.com wrote:

 Hi All:

 We are running into a frequent problem using HCatalog 0.4.1 (Hive Metastore 
 Server 0.9) where we get connection reset or connection timeout errors.

 The hive metastore server has been allocated enough (12G) memory.

 This is a critical problem for us and would appreciate if anyone has any
 pointers.

 We did add retry logic in our client, which seems to help, but I am just 
 wondering how we can narrow down the root cause of this problem. Could this 
 be a hiccup in networking which causes the hive server to get into an 
 unresponsive state?

 Thanks

 Agateaaa


 Example Connection reset error:
 ===

 org.apache.thrift.transport.TTransportException:
 java.net.SocketException:
 Connection reset
 at

 org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
  at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
 at

 org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:378)
  at

 org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:297)
 at

 org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:204)
  at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
 at

 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_set_ugi(ThriftHiveMetastore.java:2136)
  at

 org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.set_ugi(ThriftHiveMetastore.java:2122)
 at

 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.openStore(HiveMetaStoreClient.java:286)
  at

 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:197)
 at

 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.init(HiveMetaStoreClient.java:157)
  at

 org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2092)
 at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2102)
  at org.apache.hadoop.hive.ql.metadata.Hive.getTable(Hive.java:888)
 at

 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeAlterTableAddParts(DDLSemanticAnalyzer.java:1817)
  at

 org.apache.hadoop.hive.ql.parse.DDLSemanticAnalyzer.analyzeInternal(DDLSemanticAnalyzer.java:297)
 at

 org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
  at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:431)
 at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:336)
  at org.apache.hadoop.hive.ql.Driver.run(Driver.java:909)
 at
 org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:258)
  at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:215)
 at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:406)
  at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:341)
 at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:642)
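
The client-side retry logic mentioned earlier in this thread can be sketched as a generic wrapper (illustrative Java only; this is not a Hive or Thrift API, just a pattern for wrapping a flaky metastore call):

```java
import java.util.concurrent.Callable;

// Generic retry wrapper: retries a flaky call (e.g. a Thrift metastore
// request that hits a connection reset) a few times with linear backoff
// before rethrowing the last failure.
class Retry {
    static <T> T withRetries(Callable<T> call, int attempts, long backoffMs) throws Exception {
        Exception last = null;
        for (int i = 0; i < attempts; i++) {
            try {
                return call.call();
            } catch (Exception e) {
                last = e;                        // remember the failure
                Thread.sleep(backoffMs * (i + 1)); // back off before retrying
            }
        }
        throw last; // all attempts exhausted
    }
}
```

A workaround like this masks the symptom; it does not address the underlying NullPointerException or socket issue on the server side.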
  

[jira] [Commented] (HIVE-4388) HBase tests fail against Hadoop 2

2013-07-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723204#comment-13723204
 ] 

Hive QA commented on HIVE-4388:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12594727/HIVE-4388.patch

{color:red}ERROR:{color} -1 due to 10 failed/errored test(s), 1694 tests 
executed
*Failed tests:*
{noformat}
junit.framework.TestSuite.org.apache.hcatalog.hbase.snapshot.TestIDGenerator
junit.framework.TestSuite.org.apache.hcatalog.hbase.snapshot.TestRevisionManagerEndpoint
org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeII
junit.framework.TestSuite.org.apache.hcatalog.hbase.TestHBaseInputFormat
junit.framework.TestSuite.org.apache.hcatalog.hbase.TestHBaseBulkOutputFormat
org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeI
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_serde_user_properties
org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeWithColumnPrefixes
org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeWithHiveMapToHBaseColumnFamilyII
org.apache.hadoop.hive.hbase.TestHBaseSerDe.testHBaseSerDeWithTimestamp
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/222/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/222/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 10 tests failed
{noformat}

This message is automatically generated.

 HBase tests fail against Hadoop 2
 -

 Key: HIVE-4388
 URL: https://issues.apache.org/jira/browse/HIVE-4388
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Brock Noland
 Attachments: HIVE-4388.patch, HIVE-4388-wip.txt


 Currently we're building by default against 0.92. When you run against hadoop 
 2 (-Dhadoop.mr.rev=23) builds fail because of: HBASE-5963.
 HIVE-3861 upgrades the version of hbase used. This will get you past the 
 problem in HBASE-5963 (which was fixed in 0.94.1) but fails with: HBASE-6396.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4541) Run check-style on the branch and fix style issues.

2013-07-29 Thread Jitendra Nath Pandey (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723213#comment-13723213
 ] 

Jitendra Nath Pandey commented on HIVE-4541:


We will need to break the style fixes into multiple patches; otherwise the 
patch size will be too big.

 Run check-style on the branch and fix style issues.
 ---

 Key: HIVE-4541
 URL: https://issues.apache.org/jira/browse/HIVE-4541
 Project: Hive
  Issue Type: Sub-task
Affects Versions: vectorization-branch
Reporter: Jitendra Nath Pandey
Assignee: Jitendra Nath Pandey
 Fix For: vectorization-branch

 Attachments: HIVE-4541.1.patch


 We should run check style on the entire branch and fix issues before the 
 branch is merged back to the trunk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4954) PTFTranslator hardcodes ranking functions

2013-07-29 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-4954:
--

Attachment: HIVE-4954.1.patch.txt

 PTFTranslator hardcodes ranking functions
 -

 Key: HIVE-4954
 URL: https://issues.apache.org/jira/browse/HIVE-4954
 Project: Hive
  Issue Type: Sub-task
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-4954.1.patch.txt


   protected static final ArrayList<String> RANKING_FUNCS = new 
 ArrayList<String>();
   static {
 RANKING_FUNCS.add("rank");
 RANKING_FUNCS.add("dense_rank");
 RANKING_FUNCS.add("percent_rank");
 RANKING_FUNCS.add("cume_dist");
   };
 Move this logic to annotations
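
The annotation-driven approach suggested here can be sketched as follows (the annotation name and attribute below are hypothetical stand-ins, not Hive's actual API): each window function declares the ranking property on itself, and the translator discovers it via reflection instead of consulting a hardcoded list of names.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Hypothetical annotation carrying the property that PTFTranslator
// currently hardcodes in RANKING_FUNCS.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface WindowFunctionDescription {
    boolean rankingFunction() default false;
}

// A ranking UDAF declares the property on its own class ...
@WindowFunctionDescription(rankingFunction = true)
class GenericUDAFRankSketch { }

// ... and the translator reads it reflectively, so adding a new ranking
// function requires no change to the translator itself.
class TranslatorSketch {
    static boolean isRankingFunction(Class<?> udafClass) {
        WindowFunctionDescription d =
            udafClass.getAnnotation(WindowFunctionDescription.class);
        return d != null && d.rankingFunction();
    }
}
```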

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4954) PTFTranslator hardcodes ranking functions

2013-07-29 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-4954:
--

Status: Patch Available  (was: Open)

 PTFTranslator hardcodes ranking functions
 -

 Key: HIVE-4954
 URL: https://issues.apache.org/jira/browse/HIVE-4954
 Project: Hive
  Issue Type: Sub-task
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-4954.1.patch.txt


    protected static final ArrayList<String> RANKING_FUNCS = new 
  ArrayList<String>();
    static {
  RANKING_FUNCS.add("rank");
  RANKING_FUNCS.add("dense_rank");
  RANKING_FUNCS.add("percent_rank");
  RANKING_FUNCS.add("cume_dist");
    };
 Move this logic to annotations

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4388) HBase tests fail against Hadoop 2

2013-07-29 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723244#comment-13723244
 ] 

Prasanth J commented on HIVE-4388:
--

Hi Brock

I was using this patch to make hive work with hbase 0.95 and found that there 
are some unit test failures in TestHBaseSerDe

There are a few assertions that still check for Put.class where they should 
check for PutWritable.class.
The following methods need to be fixed in TestHBaseSerDe:
{code}
deserializeAndSerialize()
deserializeAndSerializeHiveMapHBaseColumnFamilyII()
{code}

Also, can you please let me know how to test the readFields() and write() 
interfaces in ResultWritable/PutWritable? Are there any tests/.q files that 
make use of these interfaces? 

 HBase tests fail against Hadoop 2
 -

 Key: HIVE-4388
 URL: https://issues.apache.org/jira/browse/HIVE-4388
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Brock Noland
 Attachments: HIVE-4388.patch, HIVE-4388-wip.txt


 Currently we're building by default against 0.92. When you run against hadoop 
 2 (-Dhadoop.mr.rev=23) builds fail because of: HBASE-5963.
 HIVE-3861 upgrades the version of hbase used. This will get you past the 
 problem in HBASE-5963 (which was fixed in 0.94.1) but fails with: HBASE-6396.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4388) HBase tests fail against Hadoop 2

2013-07-29 Thread Brock Noland (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723247#comment-13723247
 ] 

Brock Noland commented on HIVE-4388:


Hi,

Yes, that patch is not even close to being ready for use. I was just uploading 
it to get the unit tests to run. The serde tests, in addition to 
TestHBaseCliDriver, should exercise those code paths.

Brock

 HBase tests fail against Hadoop 2
 -

 Key: HIVE-4388
 URL: https://issues.apache.org/jira/browse/HIVE-4388
 Project: Hive
  Issue Type: Bug
Reporter: Gunther Hagleitner
Assignee: Brock Noland
 Attachments: HIVE-4388.patch, HIVE-4388-wip.txt


 Currently we're building by default against 0.92. When you run against hadoop 
 2 (-Dhadoop.mr.rev=23) builds fail because of: HBASE-5963.
 HIVE-3861 upgrades the version of hbase used. This will get you past the 
 problem in HBASE-5963 (which was fixed in 0.94.1) but fails with: HBASE-6396.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-4954) PTFTranslator hardcodes ranking functions

2013-07-29 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4954?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-4954:
--

Issue Type: Improvement  (was: Sub-task)
Parent: (was: HIVE-4937)

 PTFTranslator hardcodes ranking functions
 -

 Key: HIVE-4954
 URL: https://issues.apache.org/jira/browse/HIVE-4954
 Project: Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: HIVE-4954.1.patch.txt


    protected static final ArrayList<String> RANKING_FUNCS = new 
  ArrayList<String>();
    static {
  RANKING_FUNCS.add("rank");
  RANKING_FUNCS.add("dense_rank");
  RANKING_FUNCS.add("percent_rank");
  RANKING_FUNCS.add("cume_dist");
    };
 Move this logic to annotations

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4223) LazySimpleSerDe will throw IndexOutOfBoundsException in nested structs of hive table

2013-07-29 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723276#comment-13723276
 ] 

Chaoyu Tang commented on HIVE-4223:
---

[~java8964] I was not able to reproduce the said problem in hive-0.9.0 and 
wonder if it might be related to the data. Here is my test case:
1. create table bcd (col1 array<struct<col1:string,col2:string,col3:string,col4:string,col5:string,col6:string,col7:string,col8:array<struct<col1:string,col2:string,col3:string,col4:string,col5:string,col6:string,col7:string,col8:string,col9:string>>>>)
 row format delimited fields terminated by '\001' collection items terminated 
by '\002' lines terminated by '\n' stored as textfile;
** should be the same as you described
2. load data local inpath '/root/nest_struct.data' overwrite into table bcd;
** see attached nest_struct.data
3. select col1 from bcd;
** got:
[{"col1":"c1v","col2":"c2v","col3":"c3v","col4":"c4v","col5":"c5v","col6":"c6v","col7":"c7v","col8":[{"col1":"c11v","col2":"c22v","col3":"c33v","col4":"c44v","col5":"c55v","col6":"c66v","col7":"c77v","col8":"c88v","col9":"c99v"}]}]

Did you see anything different in your case?
Could you please update your case, and then I can have a try.

 

 LazySimpleSerDe will throw IndexOutOfBoundsException in nested structs of 
 hive table
 

 Key: HIVE-4223
 URL: https://issues.apache.org/jira/browse/HIVE-4223
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.9.0
 Environment: Hive 0.9.0
Reporter: Yong Zhang
 Attachments: nest_struct.data


 The LazySimpleSerDe will throw IndexOutOfBoundsException if the column 
 structure is a struct containing an array of structs. 
 I have a table with one column defined like this:
 columnA
 array<
  struct<
    col1:primiType,
    col2:primiType,
    col3:primiType,
    col4:primiType,
    col5:primiType,
    col6:primiType,
    col7:primiType,
    col8:array<
     struct<
       col1:primiType,
       col2:primiType,
       col3:primiType,
       col4:primiType,
       col5:primiType,
       col6:primiType,
       col7:primiType,
       col8:primiType,
       col9:primiType
     >
    >
  >
 >
 In this example, the outer struct has 8 columns (including the array), and 
 the inner struct has 9 columns. As long as the outer struct has a LOWER column 
 count than the inner struct's column count, I think we will get the following 
 exception as a stack trace in LazySimpleSerDe when it tries to serialize a row:
 Caused by: java.lang.IndexOutOfBoundsException: Index: 8, Size: 8
 at java.util.ArrayList.RangeCheck(ArrayList.java:547)
 at java.util.ArrayList.get(ArrayList.java:322)
 at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:485)
 at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:443)
 at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:381)
 at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:365)
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:568)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
 at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
 at 
 org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:132)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
 at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
 at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531)
 ... 9 more
 I am not sure of the exact cause of this problem. I believe that 
 public static void serialize(ByteStream.Output out, Object obj, 
 ObjectInspector objInspector, byte[] separators, int level, Text 
 nullSequence, boolean escaped, byte escapeChar, boolean[] needsEscape) 
 recursively invokes itself for nested structures. But for the nested 
 struct structure, the list reference gets messed up, and size() returns 
 the wrong value.
 In the 

[jira] [Updated] (HIVE-4223) LazySimpleSerDe will throw IndexOutOfBoundsException in nested structs of hive table

2013-07-29 Thread Chaoyu Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoyu Tang updated HIVE-4223:
--

Attachment: nest_struct.data

data file to my test case -- chaoyu

 LazySimpleSerDe will throw IndexOutOfBoundsException in nested structs of 
 hive table
 

 Key: HIVE-4223
 URL: https://issues.apache.org/jira/browse/HIVE-4223
 Project: Hive
  Issue Type: Bug
  Components: Serializers/Deserializers
Affects Versions: 0.9.0
 Environment: Hive 0.9.0
Reporter: Yong Zhang
 Attachments: nest_struct.data


 The LazySimpleSerDe will throw IndexOutOfBoundsException if the column 
 structure is struct containing array of struct. 
 I have a table with one column defined like this:
 columnA array<
   struct<
     col1:primiType,
     col2:primiType,
     col3:primiType,
     col4:primiType,
     col5:primiType,
     col6:primiType,
     col7:primiType,
     col8:array<
       struct<
         col1:primiType,
         col2:primiType,
         col3:primiType,
         col4:primiType,
         col5:primiType,
         col6:primiType,
         col7:primiType,
         col8:primiType,
         col9:primiType
       >
     >
   >
 >
 In this example, the outer struct has 8 columns (including the array), and 
 the inner struct has 9 columns. As long as the outer struct has a SMALLER 
 column count than the inner struct, I think we will get the following 
 exception and stack trace from LazySimpleSerDe when it tries to serialize a row:
 Caused by: java.lang.IndexOutOfBoundsException: Index: 8, Size: 8
 at java.util.ArrayList.RangeCheck(ArrayList.java:547)
 at java.util.ArrayList.get(ArrayList.java:322)
 at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:485)
 at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:443)
 at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serializeField(LazySimpleSerDe.java:381)
 at 
 org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe.serialize(LazySimpleSerDe.java:365)
 at 
 org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:568)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
 at 
 org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
 at 
 org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:132)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
 at 
 org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:83)
 at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
 at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
 at 
 org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:531)
 ... 9 more
 I am not sure of the exact cause of this problem. I believe that 
 public static void serialize(ByteStream.Output out, Object obj, 
 ObjectInspector objInspector, byte[] separators, int level, Text 
 nullSequence, boolean escaped, byte escapeChar, boolean[] needsEscape) 
 recursively invokes itself for nested structures. But for the nested 
 struct structure, the list reference gets messed up, and size() returns 
 the wrong value.
 In the above example case I faced, 
 for these 2 lines:
   List<? extends StructField> fields = soi.getAllStructFieldRefs();
   list = soi.getStructFieldsDataAsList(obj);
 my StructObjectInspector (soi) returns the CORRECT data from 
 getAllStructFieldRefs() and getStructFieldsDataAsList(). For example, for 
 one row of the outer 8-column struct, I have 2 elements in the inner array 
 of struct, and each element has 9 columns (as there are 9 columns in the 
 inner struct). At runtime, after I added more logging to LazySimpleSerDe, 
 I see the following behavior in the logging:
 for the 8 outer columns, loop
 for the 9 inner columns, loop for serialize
 for the 9 inner columns, loop for serialize
 The code breaks here: the outer loop tries to access the 9th element, 
 which does not exist in the outer struct, as you can see in the stack 
 trace where it tried to access index 8 of a list of size 8.
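The failure mode described above can be reproduced in miniature. The sketch below is illustrative Python, not Hive's actual LazySimpleSerDe code: it shows how a serializer that keeps one shared fields reference across recursion levels fails in exactly this way once the inner struct is wider than the outer one.

```python
class BuggySerDe:
    """Illustrative sketch, not Hive's code: one shared `fields` reference
    is reused across recursion levels, as the report hypothesizes."""

    def __init__(self):
        self.fields = None

    def serialize(self, row, schema):
        # schema: list of (name, None) for primitives, or
        #         (name, inner_schema) for array<struct<...>> fields
        self.fields = schema            # BUG: clobbered by nested calls
        out, i = [], 0
        while i < len(self.fields):     # re-reads the clobbered reference
            name, inner = schema[i]     # IndexError once i >= len(schema)
            if inner is None:
                out.append(str(row[i]))
            else:                       # recurse once per array element
                out.append(",".join(self.serialize(e, inner) for e in row[i]))
            i += 1
        return "\x01".join(out)

inner = [(f"col{k}", None) for k in range(1, 10)]                     # 9 fields
outer = [(f"col{k}", None) for k in range(1, 8)] + [("col8", inner)]  # 8 fields
inner_row = [f"c{k}{k}v" for k in range(1, 10)]
outer_row = [f"c{k}v" for k in range(1, 8)] + [[inner_row]]

try:
    BuggySerDe().serialize(outer_row, outer)
except IndexError as e:
    print("IndexError, as in the report:", e)
# Using a local `fields = schema` per call (no shared state) removes the error.
```

After returning from the 9-field inner struct, the shared reference still points at the inner field list, so the outer loop runs one iteration too many, matching "Index: 8, Size: 8".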
 What I did is to change the 

[jira] [Commented] (HIVE-4223) LazySimpleSerDe will throw IndexOutOfBoundsException in nested structs of hive table

2013-07-29 Thread Chaoyu Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723303#comment-13723303
 ] 

Chaoyu Tang commented on HIVE-4223:
---

The previous comment was not in the right format; re-posting:

I was not able to reproduce the reported problem in hive-0.9.0 and wonder if it 
might be related to the data. Here is my test case:
1. create table bcd (col1 array<struct<col1:string, col2:string, 
col3:string,col4:string,col5:string,col6:string,col7:string,col8:array<struct<col1:string,col2:string,col3:string,col4:string,col5:string,col6:string,col7:string,col8:string,col9:string>>>>)
 row format delimited fields terminated by '\001' collection items terminated 
by '\002' lines terminated by '\n' stored as textfile;
-- same as the case described in this JIRA
2. load data local inpath '/root/nest_struct.data' overwrite into table bcd;
-- see attached nest_struct.data
3. select col1 from bcd;
-- got expected result
{code}
[{col1:c1v,col2:c2v,col3:c3v,col4:c4v,col5:c5v,col6:c6v,col7:c7v,col8:[{col1:c11v,col2:c22v,col3:c33v,col4:c44v,col5:c55v,col6:c66v,col7:c77v,col8:c88v,col9:c99v}]}]
{code}
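For reference, the '\001'/'\002' delimiters in the DDL above only cover the first two nesting levels. A hypothetical reconstruction of one such data row follows; the level-to-separator mapping (successive control characters per nesting level) is my assumption about LazySimpleSerDe's defaults, and the row is built from the values shown in the query result, not taken from the attached nest_struct.data file.

```python
# Hypothetical reconstruction of one data row for the bcd table above.
# Assumption: successive control characters separate successive nesting
# levels: \x01 top-level columns, \x02 array items, \x03 struct fields,
# \x04 inner array items, \x05 inner struct fields.
SEP = ["\x01", "\x02", "\x03", "\x04", "\x05"]

inner_struct = SEP[4].join(f"c{k}{k}v" for k in range(1, 10))  # c11v..c99v
inner_array  = inner_struct               # a single-element array<struct>
outer_struct = SEP[2].join([*(f"c{k}v" for k in range(1, 8)), inner_array])
row = outer_struct                        # col1 holds a one-element array

# 7 field separators for the 8 outer columns, 8 for the 9 inner columns
print(row.count("\x03"), row.count("\x05"))
```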


[jira] [Updated] (HIVE-4879) Window functions that imply order can only be registered at compile time

2013-07-29 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-4879:
--

Attachment: HIVE-4879.3.patch.txt

Third time is a charm?

 Window functions that imply order can only be registered at compile time
 

 Key: HIVE-4879
 URL: https://issues.apache.org/jira/browse/HIVE-4879
 Project: Hive
  Issue Type: Improvement
Affects Versions: 0.11.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.12.0

 Attachments: HIVE-4879.1.patch.txt, HIVE-4879.2.patch.txt, 
 HIVE-4879.3.patch.txt


 Adding an annotation for impliesOrder

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4879) Window functions that imply order can only be registered at compile time

2013-07-29 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723310#comment-13723310
 ] 

Edward Capriolo commented on HIVE-4879:
---

This patch is cumulative with HIVE-4954, so if you apply this one first you do 
not need to apply that one.




[jira] [Commented] (HIVE-4002) Fetch task aggregation for simple group by query

2013-07-29 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723315#comment-13723315
 ] 

Edward Capriolo commented on HIVE-4002:
---

[~navis] Sorry I dropped the ball on this review. Can you rebase?

 Fetch task aggregation for simple group by query
 

 Key: HIVE-4002
 URL: https://issues.apache.org/jira/browse/HIVE-4002
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Reporter: Navis
Assignee: Navis
Priority: Minor
 Attachments: HIVE-4002.D8739.1.patch, HIVE-4002.D8739.2.patch


 Aggregation queries with no group-by clause (for example, select count(*) 
 from src) execute their final aggregation in a single reduce task. But that 
 task is too small even for a single reducer, because most UDAFs generate 
 just a single row per map-side aggregation. If the final fetch task can 
 aggregate the outputs from the map tasks, the shuffling time can be removed.
 This optimization transforms an operator tree like
 TS-FIL-SEL-GBY1-RS-GBY2-SEL-FS + FETCH-TASK
 into
 TS-FIL-SEL-GBY1-FS + FETCH-TASK(GBY2-SEL-LS)
 With the patch, the time taken for the auto_join_filters.q test dropped to 
 6 min (from 10 min before).
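The transformation described above can be sketched in miniature (illustrative Python, not Hive code): each map task runs the partial aggregation (GBY1), and the client-side fetch task performs the final merge (GBY2) that previously required shuffling to a single reducer.

```python
# Illustrative sketch, not Hive code: each map task emits one partial
# aggregate row (GBY1); the fetch task merges them (GBY2), so no
# shuffle/reduce phase is needed for a query like SELECT count(*) FROM src.

def map_side_partial_count(rows):
    # GBY1: hash-aggregate within one map task -> a single partial row
    return {"count": len(rows)}

def fetch_task_merge(partials):
    # GBY2 folded into the fetch task: merge the per-map partial rows
    return {"count": sum(p["count"] for p in partials)}

splits = [["r1", "r2", "r3"], ["r4"], ["r5", "r6"]]   # rows per map task
partials = [map_side_partial_count(s) for s in splits]
result = fetch_task_merge(partials)
print(result)  # {'count': 6}
```

The win is that each partial is a single tiny row, so merging them in the fetch task is cheap, while spinning up a reduce stage for it is not.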



[jira] [Commented] (HIVE-3256) Update asm version in Hive

2013-07-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723325#comment-13723325
 ] 

Hive QA commented on HIVE-3256:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12594735/HIVE-3256.patch

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/223/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/223/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Tests failed with: ExecutionException: java.util.concurrent.ExecutionException: 
java.lang.IllegalArgumentException: resource batch-exec.vm not found.
{noformat}

This message is automatically generated.

 Update asm version in Hive
 --

 Key: HIVE-3256
 URL: https://issues.apache.org/jira/browse/HIVE-3256
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Zhenxiao Luo
Assignee: Ashutosh Chauhan
 Attachments: HIVE-3256.patch


 Hive trunk is currently using asm version 3.1; Hadoop trunk is on 3.2. Any
 objections to bumping the Hive version to 3.2 to be in line with Hadoop?



[jira] [Updated] (HIVE-2608) Do not require AS a,b,c part in LATERAL VIEW

2013-07-29 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-2608:
--

Attachment: HIVE-2608.8.patch.txt

Re-upload Navis' patch

 Do not require AS a,b,c part in LATERAL VIEW
 

 Key: HIVE-2608
 URL: https://issues.apache.org/jira/browse/HIVE-2608
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor, UDF
Reporter: Igor Kabiljo
Assignee: Navis
Priority: Minor
 Attachments: HIVE-2608.8.patch.txt, HIVE-2608.D4317.5.patch, 
 HIVE-2608.D4317.6.patch


 Currently, it is required to state column names when LATERAL VIEW is used.
 That shouldn't be necessary, since the UDTF returns a struct which contains 
 column names - and they should be used by default.
 For example, it would be great if this was possible:
 SELECT t.*, t.key1 + t.key4
 FROM some_table
 LATERAL VIEW JSON_TUPLE(json, 'key1', 'key2', 'key3', 'key4') t;



[jira] [Commented] (HIVE-2608) Do not require AS a,b,c part in LATERAL VIEW

2013-07-29 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723332#comment-13723332
 ] 

Edward Capriolo commented on HIVE-2608:
---

+1 if tests pass




[jira] [Updated] (HIVE-4953) Regression: Hive does not build offline anymore

2013-07-29 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-4953:
--

Description: 
BUILD FAILED
/home/edward/Documents/java/hive-trunk/build.xml:233: 
java.net.UnknownHostException: repo2.maven.org

Neither ant -Doffline=true nor eclipse can build offline anymore.

 Regression: Hive does not build offline anymore
 ---

 Key: HIVE-4953
 URL: https://issues.apache.org/jira/browse/HIVE-4953
 Project: Hive
  Issue Type: Bug
Reporter: Edward Capriolo

 BUILD FAILED
 /home/edward/Documents/java/hive-trunk/build.xml:233: 
 java.net.UnknownHostException: repo2.maven.org
 Neither ant -Doffline=true nor eclipse can build offline anymore.



[jira] [Updated] (HIVE-4953) Regression: Hive does not build offline anymore

2013-07-29 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-4953:
--

Release Note:   (was: BUILD FAILED
/home/edward/Documents/java/hive-trunk/build.xml:233: 
java.net.UnknownHostException: repo2.maven.org

Both ant -Doffline=true and eclipse no longer can build offline)






[jira] [Assigned] (HIVE-3976) Support specifying scale and precision with Hive decimal type

2013-07-29 Thread Xuefu Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xuefu Zhang reassigned HIVE-3976:
-

Assignee: Xuefu Zhang

 Support specifying scale and precision with Hive decimal type
 -

 Key: HIVE-3976
 URL: https://issues.apache.org/jira/browse/HIVE-3976
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor, Types
Reporter: Mark Grover
Assignee: Xuefu Zhang

 HIVE-2693 introduced support for Decimal datatype in Hive. However, the 
 current implementation has unlimited precision and provides no way to specify 
 precision and scale when creating the table.
 For example, MySQL allows users to specify scale and precision of the decimal 
 datatype when creating the table:
 {code}
 CREATE TABLE numbers (a DECIMAL(20,2));
 {code}
 Hive should support something similar too.
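The requested DECIMAL(p, s) semantics can be sketched with Python's decimal module. This is illustrative only; the function name and the overflow behavior are my assumptions, not Hive's eventual implementation.

```python
from decimal import Decimal, ROUND_HALF_UP

def enforce_decimal(value, precision, scale):
    """Round to `scale` fractional digits and reject values whose
    integer part needs more than `precision - scale` digits,
    mirroring SQL DECIMAL(precision, scale)."""
    q = Decimal(str(value)).quantize(Decimal(1).scaleb(-scale),
                                     rounding=ROUND_HALF_UP)
    # adjusted() + 1 is the number of digits before the decimal point
    if q != 0 and q.adjusted() + 1 > precision - scale:
        raise ValueError(f"value {value} overflows DECIMAL({precision},{scale})")
    return q

print(enforce_decimal(123.456, 20, 2))   # 123.46
print(enforce_decimal(-7, 20, 2))        # -7.00
# enforce_decimal(10**19, 20, 2) would raise: 20 integer digits > 18 allowed
```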



[jira] [Commented] (HIVE-3976) Support specifying scale and precision with Hive decimal type

2013-07-29 Thread Xuefu Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723365#comment-13723365
 ] 

Xuefu Zhang commented on HIVE-3976:
---

I have started working on this issue. Any comments or suggestions are welcome. 
Thanks.




Re: Tez branch and tez based patches

2013-07-29 Thread Edward Capriolo
At ~25:00:

"There is a working prototype of hive which is using tez as the targeted
runtime"

Can I get a look at that code? Is it on github?

Edward


On Wed, Jul 17, 2013 at 3:35 PM, Alan Gates ga...@hortonworks.com wrote:

 Answers to some of your questions inlined.

 Alan.

 On Jul 16, 2013, at 10:20 PM, Edward Capriolo wrote:

  There are some points I want to bring up. First, I am on the PMC. Here is
  something I find relevant:
 
  http://www.apache.org/foundation/how-it-works.html
 
  --
 
  The role of the PMC from a Foundation perspective is oversight. The main
  role of the PMC is not code and not coding - but to ensure that all legal
  issues are addressed, that procedure is followed, and that each and every
  release is the product of the community as a whole. That is key to our
  litigation protection mechanisms.
 
  Secondly the role of the PMC is to further the long term development and
  health of the community as a whole, and to ensure that balanced and wide
  scale peer review and collaboration does happen. Within the ASF we worry
  about any community which centers around a few individuals who are
 working
  virtually uncontested. We believe that this is detrimental to quality,
  stability, and robustness of both code and long term social structures.
 
  
 
 
 https://blogs.apache.org/comdev/entry/what_makes_apache_projects_different
 
  -
 
  All other decisions happen on the dev list, discussions on the private
 list
  are kept to a minimum.
 
  If it didn't happen on the dev list, it didn't happen - which leads to:
 
  a) Elections of committers and PMC members are published on the dev list
  once finalized.
 
  b) Out-of-band discussions (IRC etc.) are summarized on the dev list as
  soon as they have impact on the project, code or community.
  -
 
  https://issues.apache.org/jira/browse/HIVE-4660 ironically titled "Let
  there be Tez" has not been +1'ed by any committer. It was never discussed
  on the dev or the user list (as far as I can tell).

 As all JIRA creations and updates are sent to dev@hive, creating a JIRA
 is de facto posting to the list.

 
  As a PMC member I feel we need more discussion on Tez on the dev list
 along
  with a wiki-fied design document. Topics of discussion should include:

 I talked with Gunther and he's working on posting a design doc on the
 wiki.  He has a PDF on the JIRA but he doesn't have write permissions yet
 on the wiki.

 
  1) What is tez?
 In Hadoop 2.0, YARN opens up the ability to have multiple execution
 frameworks in Hadoop.  Hadoop apps are no longer tied to MapReduce as the
 only execution option.  Tez is an effort to build an execution engine that
 is optimized for relational data processing, such as Hive and Pig.

 The biggest change here is to move away from only Map and Reduce as
 processing options and to allow alternate combinations of processing, such
 as map - reduce - reduce or tasks that take multiple inputs or shuffles
 that avoid sorting when it isn't needed.

 For a good intro to Tez, see Arun's presentation on it at the recent
 Hadoop summit (video http://www.youtube.com/watch?v=9ZLLzlsz7h8 slides
 http://www.slideshare.net/Hadoop_Summit/murhty-saha-june26255pmroom212)
 
  2) How is tez different from oozie, http://code.google.com/p/hop/,
  http://cs.brown.edu/~backman/cmr.html , and other DAG and or streaming
 map
  reduce tools/frameworks? Why should we use this and not those?

 Oozie is a completely different thing.  Oozie is a workflow engine and a
 scheduler.  Its core competencies are the ability to coordinate workflows
 of disparate job types (MR, Pig, Hive, etc.) and to schedule them.  It is
 not intended as an execution engine for apps such as Pig and Hive.

 I am not familiar with these other engines, but the short answer is that
 Tez is built to work on YARN, which works well for Hive since it is tied to
 Hadoop.
 
  3) When can we expect the first tez release?
 I don't know, but I hope sometime this fall.

 
  4) How much effort is involved in integrating hive and tez?
 Covered in the design doc.

 
  5) Who is ready to commit to this effort?
 I'll let people speak for themselves on that one.

 
  6) can we expect this work to be done in one hive release?
 Unlikely.  Initial integration will be done in one release, but as Tez is
 a new project I expect it will be adding features in the future that Hive
 will want to take advantage of.

 
  In my opinion we should not start any work on this tez-hive until these
  questions are answered to the satisfaction of the hive developers.

 Can we change this to not commit patches?  We can't tell willing people
 not to work on it.
 
 
 
 
 
 
 
 
  On Mon, Jul 15, 2013 at 9:51 PM, Edward Capriolo edlinuxg...@gmail.com
 wrote:
 
 
  The Hive bylaws,
  https://cwiki.apache.org/confluence/display/Hive/Bylaws , lay out what
  votes are needed for 

[jira] [Commented] (HIVE-3976) Support specifying scale and precision with Hive decimal type

2013-07-29 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723372#comment-13723372
 ] 

Edward Capriolo commented on HIVE-3976:
---

We currently do not have qualifiers like (20,2) for types in the hive language. 
This sounds like a fairly involved change; I am very curious how the qualifiers 
will interact with the already existing system.




Re: Tez branch and tez based patches

2013-07-29 Thread Edward Capriolo
Also watched http://www.ustream.tv/recorded/36323173

I definitely see the win in being able to stream inter-stage output.

I see some cases where small intermediate results can be kept in memory.
But I was somewhat under the impression that the map reduce spill settings
kept stuff in memory - isn't that what spill settings are for?

There are a few bullet points that came up repeatedly that I do not follow:

Something was said to the effect of "Container reuse makes X faster."
Hadoop has JVM reuse; I am not following what the difference is here. Not
everyone has a 10K-node cluster.

"Joins in map reduce are hard." Really? I mean some of them are, I guess, but
the typical join is very easy: just shuffle by the join key. There was not
really enough low-level detail here on why joins are better in tez.

"Choosing the number of maps and reduces is hard." Really? I do not find it
that hard. I think there are times when it is not perfect, but I do not find
it hard. The talk did not really offer anything technical here on how tez
makes this better, other than that it could make it better.

The presentations mentioned streaming data; how do two nodes stream data
between tasks, and how is it reliable? If the sender or receiver dies, does
the entire process have to start again?

Again, one of the talks implied there is a prototype out there that launches
hive jobs on tez. I would like to see that; it might answer more
questions than a PowerPoint, and I could profile some common queries.

Random late night thoughts over,
Ed






On Tue, Jul 30, 2013 at 12:02 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 At ~25:00

 There is a working prototype of hive which is using tez as the targeted
 runtime

 Can I get a look at that code? Is it on github?

 Edward


 On Wed, Jul 17, 2013 at 3:35 PM, Alan Gates ga...@hortonworks.com wrote:

 Answers to some of your questions inlined.

 Alan.

 On Jul 16, 2013, at 10:20 PM, Edward Capriolo wrote:

  There are some points I want to bring up. First, I am on the PMC. Here
 is
  something I find relevant:
 
  http://www.apache.org/foundation/how-it-works.html
 
  --
 
  The role of the PMC from a Foundation perspective is oversight. The main
  role of the PMC is not code and not coding - but to ensure that all
 legal
  issues are addressed, that procedure is followed, and that each and
 every
  release is the product of the community as a whole. That is key to our
  litigation protection mechanisms.
 
  Secondly the role of the PMC is to further the long term development and
  health of the community as a whole, and to ensure that balanced and wide
  scale peer review and collaboration does happen. Within the ASF we worry
  about any community which centers around a few individuals who are
 working
  virtually uncontested. We believe that this is detrimental to quality,
  stability, and robustness of both code and long term social structures.
 
  
 
 
 https://blogs.apache.org/comdev/entry/what_makes_apache_projects_different
 
  -
 
  All other decisions happen on the dev list, discussions on the private
 list
  are kept to a minimum.
 
  If it didn't happen on the dev list, it didn't happen - which leads
 to:
 
  a) Elections of committers and PMC members are published on the dev list
  once finalized.
 
  b) Out-of-band discussions (IRC etc.) are summarized on the dev list as
  soon as they have impact on the project, code or community.
  -
 
  https://issues.apache.org/jira/browse/HIVE-4660, ironically titled Let
  there be Tez, has not been +1'ed by any committer. It was never discussed on
  the dev or the user list (as far as I can tell).

 As all JIRA creations and updates are sent to dev@hive, creating a JIRA
 is de facto posting to the list.

 
  As a PMC member I feel we need more discussion on Tez on the dev list
 along
  with a wiki-fied design document. Topics of discussion should include:

 I talked with Gunther and he's working on posting a design doc on the
 wiki.  He has a PDF on the JIRA but he doesn't have write permissions yet
 on the wiki.

 
  1) What is tez?
 In Hadoop 2.0, YARN opens up the ability to have multiple execution
 frameworks in Hadoop.  Hadoop apps are no longer tied to MapReduce as the
 only execution option.  Tez is an effort to build an execution engine that
 is optimized for relational data processing, such as Hive and Pig.

 The biggest change here is to move away from only Map and Reduce as
 processing options and to allow alternate combinations of processing, such
 as map -> reduce -> reduce, or tasks that take multiple inputs, or shuffles
 that avoid sorting when it isn't needed.
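
 To make "alternate combinations of processing" concrete, here is a toy DAG
 model in plain Java (illustrative only; the names are hypothetical and this
 is not the real Tez API) showing a map -> reduce -> reduce pipeline that a
 single plain MapReduce job cannot express:

 {code}
import java.util.ArrayList;
import java.util.List;

// Toy model of a processing DAG (hypothetical; not the Tez API).
class Vertex {
    final String name;
    final List<Vertex> downstream = new ArrayList<>();
    Vertex(String name) { this.name = name; }
    // Connect this vertex's output to another vertex's input.
    Vertex feeds(Vertex next) { downstream.add(next); return next; }
}

public class DagSketch {
    public static void main(String[] args) {
        // Plain MapReduce forces map -> reduce pairs, with an HDFS write
        // between jobs. A DAG engine can chain stages directly:
        Vertex map = new Vertex("map");
        Vertex reduce1 = new Vertex("reduce1");
        Vertex reduce2 = new Vertex("reduce2");
        map.feeds(reduce1).feeds(reduce2); // map -> reduce -> reduce
        System.out.println(map.name + " -> " + map.downstream.get(0).name
            + " -> " + reduce2.name); // prints "map -> reduce1 -> reduce2"
    }
}
 {code}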

 For a good intro to Tez, see Arun's presentation on it at the recent
 Hadoop summit (video http://www.youtube.com/watch?v=9ZLLzlsz7h8 slides
 http://www.slideshare.net/Hadoop_Summit/murhty-saha-june26255pmroom212)
 
  2) How is tez different from oozie, 

[jira] [Commented] (HIVE-4838) Refactor MapJoin HashMap code to improve testability and readability

2013-07-29 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4838?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723403#comment-13723403
 ] 

Edward Capriolo commented on HIVE-4838:
---

Hey, I think I may have mistakenly come to the conclusion that 
https://issues.apache.org/jira/browse/HIVE-2906
passed tests when it did not. We might be best off reverting HIVE-2906 if it 
is a problem.


 Refactor MapJoin HashMap code to improve testability and readability
 

 Key: HIVE-4838
 URL: https://issues.apache.org/jira/browse/HIVE-4838
 Project: Hive
  Issue Type: Bug
Reporter: Brock Noland
Assignee: Brock Noland
 Attachments: HIVE-4838.patch, HIVE-4838.patch, HIVE-4838.patch, 
 HIVE-4838.patch, HIVE-4838.patch


 MapJoin is an essential component for high-performance joins in Hive, and the 
 current code has done great service for many years. However, the code is 
 showing its age and currently suffers from the following issues:
 * Uses static state via the MapJoinMetaData class to pass serialization 
 metadata to the Key and Row classes.
 * The API of a logical table container is not defined, and therefore it's 
 unclear which APIs HashMapWrapper 
 needs to publicize. Additionally, HashMapWrapper has many unused public methods.
 * HashMapWrapper contains logic to serialize, test memory bounds, and 
 implement the table container. Ideally these logical units could be separated.
 * HashTableSinkObjectCtx has unused fields and unused methods.
 * CommonJoinOperator and children use ArrayList on the left-hand side when only 
 List is required.
 * There are unused classes (MRU, DCLLItem) and classes which duplicate 
 functionality (MapJoinSingleKey and MapJoinDoubleKeys).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results

2013-07-29 Thread Yin Huai (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723407#comment-13723407
 ] 

Yin Huai commented on HIVE-4952:


To fix this bug, Demux will be modified to be aware that rows associated with a 
key are ordered by the tag. When Demux sees a row with a new tag, it will 
know that rows with tags less than the incoming tag can be processed.

Taking the example in the description: with this fix, the inputs of JOIN2 will be 
ordered by the tag. When Demux sees a row with tag 1, it will ask GBY to process 
its buffer, and GBY will then ask JOIN1 to process its buffer. Before Demux 
forwards a new row with tag 1 to JOIN2, all rows with tag 0 will have 
been forwarded to JOIN2.
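
A sketch of that flushing rule (simplified, with hypothetical names; the real 
change would live in DemuxOperator): track the largest tag seen so far, and 
when a larger tag arrives, flush every buffer holding a smaller tag before 
buffering the new row:

{code}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified sketch of the proposed Demux flushing rule (hypothetical).
public class DemuxSketch {
    private final Map<Integer, List<String>> buffers = new TreeMap<>();
    private final List<String> forwarded = new ArrayList<>();
    private int currentTag = -1;

    // Rows for a key arrive ordered by tag; a new, larger tag means all
    // buffered rows with smaller tags can now be processed downstream.
    public void process(int tag, String row) {
        if (tag > currentTag) {
            flushTagsBelow(tag);
            currentTag = tag;
        }
        buffers.computeIfAbsent(tag, t -> new ArrayList<>()).add(row);
    }

    private void flushTagsBelow(int tag) {
        Iterator<Map.Entry<Integer, List<String>>> it = buffers.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Integer, List<String>> e = it.next();
            if (e.getKey() < tag) { forwarded.addAll(e.getValue()); it.remove(); }
        }
    }

    // End of the key group: everything remaining can be flushed.
    public List<String> close() { flushTagsBelow(Integer.MAX_VALUE); return forwarded; }
}
{code}

Feeding rows (0, a), (0, b), (1, c): when the row with tag 1 arrives, a and b 
are forwarded first, so a downstream operator never sees a tag-1 row before 
all tag-0 rows for that key.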

 When hive.join.emit.interval is small, queries optimized by Correlation 
 Optimizer may generate wrong results
 

 Key: HIVE-4952
 URL: https://issues.apache.org/jira/browse/HIVE-4952
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: replay.txt


 If we have a query like this ...
 {code:sql}
 SELECT xx.key, xx.cnt, yy.key
 FROM
 (SELECT x.key as key, count(1) as cnt FROM src1 x JOIN src1 y ON (x.key = 
 y.key) group by x.key) xx
 JOIN src yy
 ON xx.key=yy.key;
 {code}
 After Correlation Optimizer, the operator tree in the reducer will be 
 {code}
     JOIN2
       |
       |
      MUX
     /   \
    /     \
  GBY      |
   |       |
 JOIN1     |
    \     /
     \   /
     DEMUX
 {code}
 For JOIN2, the right table will arrive at this operator first. If 
 hive.join.emit.interval is small, e.g. 1, JOIN2 will output results even 
 though it has not received any row from the left table. The logic related to 
 hive.join.emit.interval in JoinOperator assumes that inputs will be ordered 
 by the tag. But if a query has been optimized by the Correlation Optimizer, 
 this assumption may not hold for the JoinOperators inside the reducer.



[jira] [Updated] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability

2013-07-29 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-4843:
-

Attachment: HIVE-4843.4.patch

 Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and 
 readability
 ---

 Key: HIVE-4843
 URL: https://issues.apache.org/jira/browse/HIVE-4843
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0, tez-branch
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-4843.1.patch, HIVE-4843.2.patch, HIVE-4843.3.patch, 
 HIVE-4843.4.patch


 Currently, there are static apis in multiple locations in ExecDriver and 
 MapRedTask that can be leveraged if put in the already existing utility class 
 in the exec package. This would help making the code more maintainable, 
 readable and also re-usable by other run-time infra such as tez.



[jira] [Updated] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability

2013-07-29 Thread Vikram Dixit K (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vikram Dixit K updated HIVE-4843:
-

Status: Patch Available  (was: Open)

Latest iteration after addressing comments.

 Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and 
 readability
 ---

 Key: HIVE-4843
 URL: https://issues.apache.org/jira/browse/HIVE-4843
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0, tez-branch
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-4843.1.patch, HIVE-4843.2.patch, HIVE-4843.3.patch, 
 HIVE-4843.4.patch





[jira] [Updated] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results

2013-07-29 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-4952:
--

Attachment: HIVE-4952.D11889.1.patch

yhuai requested code review of HIVE-4952 [jira] When hive.join.emit.interval 
is small, queries optimized by Correlation Optimizer may generate wrong 
results.

Reviewers: JIRA

fix

If we have a query like this ...

SELECT xx.key, xx.cnt, yy.key
FROM
(SELECT x.key as key, count(1) as cnt FROM src1 x JOIN src1 y ON (x.key = 
y.key) group by x.key) xx
JOIN src yy
ON xx.key=yy.key;

After Correlation Optimizer, the operator tree in the reducer will be

    JOIN2
      |
      |
     MUX
    /   \
   /     \
 GBY      |
  |       |
JOIN1     |
   \     /
    \   /
    DEMUX

For JOIN2, the right table will arrive at this operator first. If 
hive.join.emit.interval is small, e.g. 1, JOIN2 will output results even 
though it has not received any row from the left table. The logic related to 
hive.join.emit.interval in JoinOperator assumes that inputs will be ordered by 
the tag. But if a query has been optimized by the Correlation Optimizer, this 
assumption may not hold for the JoinOperators inside the reducer.
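
As a toy illustration of why that assumption matters (this is not Hive's 
JoinOperator, just a sketch of the emit-interval idea): a joiner that emits as 
soon as it has seen emitInterval rows of the last tag, joining them against 
whatever tag-0 rows are already buffered, silently drops matches when the 
right table arrives first:

{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy joiner mimicking the emit-interval assumption (not Hive's code).
public class EmitIntervalSketch {
    // rows are {tag, key}; tag 0 is the left (buffered) side.
    static List<String> join(List<int[]> rows, int emitInterval) {
        List<Integer> left = new ArrayList<>();
        List<String> out = new ArrayList<>();
        int pending = 0;
        for (int[] r : rows) {
            if (r[0] == 0) {
                left.add(r[1]);
            } else if (++pending >= emitInterval) {
                // Emit now, assuming all tag-0 rows have already arrived.
                for (int l : left) {
                    if (l == r[1]) out.add(l + "=" + r[1]);
                }
                pending = 0;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Tag order respected: the match is found.
        System.out.println(join(Arrays.asList(new int[]{0, 7}, new int[]{1, 7}), 1)); // [7=7]
        // Right table first (as after the Correlation Optimizer rewrite):
        // the left buffer is empty when the emit fires, so the match is lost.
        System.out.println(join(Arrays.asList(new int[]{1, 7}, new int[]{0, 7}), 1)); // []
    }
}
{code}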

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D11889

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/DemuxOperator.java
  
ql/src/java/org/apache/hadoop/hive/ql/optimizer/correlation/QueryPlanTreeTransformation.java
  ql/src/test/queries/clientpositive/correlationoptimizer15.q
  ql/src/test/results/clientpositive/correlationoptimizer15.q.out

MANAGE HERALD RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/28311/

To: JIRA, yhuai


 When hive.join.emit.interval is small, queries optimized by Correlation 
 Optimizer may generate wrong results
 

 Key: HIVE-4952
 URL: https://issues.apache.org/jira/browse/HIVE-4952
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-4952.D11889.1.patch, replay.txt





[jira] [Updated] (HIVE-4952) When hive.join.emit.interval is small, queries optimized by Correlation Optimizer may generate wrong results

2013-07-29 Thread Yin Huai (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-4952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yin Huai updated HIVE-4952:
---

Status: Patch Available  (was: Open)

 When hive.join.emit.interval is small, queries optimized by Correlation 
 Optimizer may generate wrong results
 

 Key: HIVE-4952
 URL: https://issues.apache.org/jira/browse/HIVE-4952
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0
Reporter: Yin Huai
Assignee: Yin Huai
 Attachments: HIVE-4952.D11889.1.patch, replay.txt





[jira] [Commented] (HIVE-4843) Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and readability

2013-07-29 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723419#comment-13723419
 ] 

Edward Capriolo commented on HIVE-4843:
---

{code}
List<Path> inputPaths = Utilities.getInputPaths(newJob, 
selectTask.getWork().getMapWork(), emptyScratchDir.toString(), ctx);
{code}

Can we remove any Path/File toString() and just pass the Path if possible?
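
A small sketch of that suggestion (using java.nio.file.Path as a stand-in for 
org.apache.hadoop.fs.Path; the method names are hypothetical): keeping the 
typed Path end to end avoids the lossy String round-trip and lets the 
compiler catch mismatched arguments:

{code}
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch of the review suggestion: pass Path objects, not strings.
// (java.nio.file.Path stands in for org.apache.hadoop.fs.Path here.)
public class PathPassing {
    // Before: the callee re-parses a stringified path.
    static Path resolveScratch(String scratchDir) {
        return Paths.get(scratchDir, "inputs");
    }

    // After: the typed Path flows through unchanged.
    static Path resolveScratch(Path scratchDir) {
        return scratchDir.resolve("inputs");
    }

    public static void main(String[] args) {
        Path scratch = Paths.get("/tmp/hive-scratch");
        // Both overloads produce the same result, but the Path overload
        // cannot be handed an arbitrary non-path string by mistake.
        System.out.println(resolveScratch(scratch.toString())
            .equals(resolveScratch(scratch))); // prints "true"
    }
}
{code}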

 Refactoring MapRedTask and ExecDriver for better re-usability (for tez) and 
 readability
 ---

 Key: HIVE-4843
 URL: https://issues.apache.org/jira/browse/HIVE-4843
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.12.0, tez-branch
Reporter: Vikram Dixit K
Assignee: Vikram Dixit K
 Attachments: HIVE-4843.1.patch, HIVE-4843.2.patch, HIVE-4843.3.patch, 
 HIVE-4843.4.patch





[jira] [Commented] (HIVE-4950) Hive childSuspend is broken (debugging local hadoop jobs)

2013-07-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723421#comment-13723421
 ] 

Hive QA commented on HIVE-4950:
---



{color:red}Overall{color}: -1 at least one tests failed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12594755/HIVE-4950.patch

{color:red}ERROR:{color} -1 due to 1 failed/errored test(s), 2736 tests executed
*Failed tests:*
{noformat}
org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_serde_user_properties
{noformat}

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/226/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/226/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
Tests failed with: TestsFailedException: 1 tests failed
{noformat}

This message is automatically generated.

 Hive childSuspend is broken (debugging local hadoop jobs)
 -

 Key: HIVE-4950
 URL: https://issues.apache.org/jira/browse/HIVE-4950
 Project: Hive
  Issue Type: Bug
Affects Versions: 0.11.0
Reporter: Laljo John Pullokkaran
Assignee: Laljo John Pullokkaran
 Fix For: 0.11.1

 Attachments: HIVE-4950.patch


 Hive debug has an option to suspend child JVMs, which seems to be broken 
 currently (--debug childSuspend=y). Note that this mode may be useful only 
 when running in local mode.



[jira] [Commented] (HIVE-4734) Use custom ObjectInspectors for AvroSerde

2013-07-29 Thread Hive QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-4734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13723422#comment-13723422
 ] 

Hive QA commented on HIVE-4734:
---



{color:red}Overall{color}: -1 no tests executed

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12594789/HIVE-4734.3.patch

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/228/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/228/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Tests failed with: NonZeroExitCodeException: Command 'bash 
/data/hive-ptest/working/scratch/source-prep.sh' failed with exit status 1 and 
output '+ [[ -n '' ]]
+ export 'ANT_OPTS=-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ ANT_OPTS='-Xmx1g -XX:MaxPermSize=256m -Dhttp.proxyHost=localhost 
-Dhttp.proxyPort=3128'
+ cd /data/hive-ptest/working/
+ tee /data/hive-ptest/logs/PreCommit-HIVE-Build-228/source-prep.txt
+ mkdir -p maven ivy
+ [[ svn = \s\v\n ]]
+ [[ -n '' ]]
+ [[ -d apache-svn-trunk-source ]]
+ [[ ! -d apache-svn-trunk-source/.svn ]]
+ [[ ! -d apache-svn-trunk-source ]]
+ cd apache-svn-trunk-source
+ svn revert -R .
Reverted 'bin/ext/debug.sh'
Reverted 'bin/hive'
++ egrep -v '^X|^Performing status on external'
++ awk '{print $2}'
++ svn status --no-ignore
+ rm -rf build hcatalog/build hcatalog/core/build 
hcatalog/storage-handlers/hbase/build hcatalog/server-extensions/build 
hcatalog/webhcat/svr/build hcatalog/webhcat/java-client/build 
hcatalog/hcatalog-pig-adapter/build common/src/gen
+ svn update

Fetching external item into 'hcatalog/src/test/e2e/harness'
External at revision 1508304.

At revision 1508304.
+ patchCommandPath=/data/hive-ptest/working/scratch/smart-apply-patch.sh
+ patchFilePath=/data/hive-ptest/working/scratch/build.patch
+ [[ -f /data/hive-ptest/working/scratch/build.patch ]]
+ chmod +x /data/hive-ptest/working/scratch/smart-apply-patch.sh
+ /data/hive-ptest/working/scratch/smart-apply-patch.sh 
/data/hive-ptest/working/scratch/build.patch
The patch does not appear to apply with p0 to p2
+ exit 1
'
{noformat}

This message is automatically generated.

 Use custom ObjectInspectors for AvroSerde
 -

 Key: HIVE-4734
 URL: https://issues.apache.org/jira/browse/HIVE-4734
 Project: Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Reporter: Mark Wagner
Assignee: Mark Wagner
 Fix For: 0.12.0

 Attachments: HIVE-4734.1.patch, HIVE-4734.2.patch, HIVE-4734.3.patch


 Currently, the AvroSerde recursively copies all fields of a record from the 
 GenericRecord to a List row object and provides the standard 
 ObjectInspectors. Performance can be improved by providing ObjectInspectors 
 for the Avro record itself.
