[jira] [Commented] (HIVE-3472) Build An Analytical SQL Engine for MapReduce

2012-10-12 Thread Shengsheng Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13474838#comment-13474838
 ] 

Shengsheng Huang commented on HIVE-3472:


@Lianhui Thanks for the comment. I read NexR's slides. There is also another 
interesting contribution about SQL window functions, presented at Hadoop Summit 
2012: http://www.slideshare.net/Hadoop_Summit/analytical-queries-with-hive. 
It looks to me like many people are aware that Hive needs improvement to better 
accommodate common OLAP requirements, and many of us share similar ideas about 
which areas to improve: for example, the SQL data type system, OLAP-oriented 
features (rank, rollup, window functions, etc.), and nested scalar subqueries. 

It seems NexR did not open-source their query planner (Hawk), which, as I 
understand it, does most of the SQL syntax transformation, on GitHub (they did 
contribute a set of OLAP UDF implementations, though). We would like to see Hive 
evolve faster into a better open source tool for OLAP analytics, so we opened 
this JIRA to push this forward, and we're willing to contribute our efforts to 
open source. 


 Build An Analytical SQL Engine for MapReduce
 

 Key: HIVE-3472
 URL: https://issues.apache.org/jira/browse/HIVE-3472
 Project: Hive
  Issue Type: New Feature
Affects Versions: 0.10.0
Reporter: Shengsheng Huang
 Attachments: SQL-design.pdf


 While there are continuous efforts to extend Hive’s SQL support (see, e.g., 
 recent examples such as HIVE-2005 and HIVE-2810), many widely used SQL 
 constructs are still not supported in HiveQL, such as selecting from multiple 
 tables and subqueries in WHERE clauses.
 We propose to build a fully SQL-92-compatible engine (for MapReduce-based 
 analytical query processing) as an extension to Hive. 
 The SQL frontend will co-exist with the HiveQL frontend; consequently, one 
 can mix SQL and HiveQL statements in their queries, switching between HiveQL 
 mode and SQL-92 mode by setting a “hive.ql.mode” parameter before each query 
 statement. This way, useful Hive extensions remain accessible to users. 
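To illustrate the proposed mode switch, here is a minimal Python sketch of a driver that routes each statement to one of two frontends according to the most recent value set for “hive.ql.mode”. The mode values `hiveql` and `sql92`, the `set hive.ql.mode=...` statement form, and the parser stubs are hypothetical placeholders, not Hive's actual implementation.

```python
from typing import Callable, Dict, List, Tuple

# Placeholder frontends (hypothetical); a real driver would return an AST or plan.
def parse_hiveql(stmt: str) -> str:
    return f"HiveQL AST for: {stmt}"

def parse_sql92(stmt: str) -> str:
    return f"SQL-92 AST for: {stmt}"

# Hypothetical mode values accepted by "hive.ql.mode".
FRONTENDS: Dict[str, Callable[[str], str]] = {
    "hiveql": parse_hiveql,
    "sql92": parse_sql92,
}

def run_script(statements: List[str], mode: str = "hiveql") -> List[Tuple[str, str]]:
    """Parse each statement with the frontend selected by the current mode."""
    results: List[Tuple[str, str]] = []
    for raw in statements:
        stmt = raw.strip().rstrip(";").strip()
        key, sep, value = stmt.partition("=")
        if sep and key.strip().lower() == "set hive.ql.mode":
            mode = value.strip().lower()
            if mode not in FRONTENDS:
                raise ValueError(f"unknown hive.ql.mode: {mode!r}")
            continue  # the mode switch itself produces no parse result
        results.append((mode, FRONTENDS[mode](stmt)))
    return results
```

Because the mode is just per-session state consulted before each statement, HiveQL-only extensions stay usable in the same script as SQL-92 statements.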

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3564) hivetest.py: revision number and applied patch

2012-10-12 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3564:
-

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed. Thanks, Ivan.

 hivetest.py: revision number and applied patch
 --

 Key: HIVE-3564
 URL: https://issues.apache.org/jira/browse/HIVE-3564
 Project: Hive
  Issue Type: Improvement
  Components: Testing Infrastructure
Reporter: Ivan Gorbachev
Assignee: Ivan Gorbachev
 Attachments: hive-3564.0.patch.txt


 hivetest.py needs a new option that shows the base revision number and the 
 applied patch.
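A minimal sketch of such an option using Python's argparse, assuming a hypothetical `--show-revision` flag and a hypothetical `--patch` argument; the option actually added by hive-3564.0.patch.txt may be named and implemented differently. The base revision is passed in as a string rather than read from version control, to keep the sketch self-contained.

```python
import argparse
from typing import List, Optional

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="hivetest.py")
    # Hypothetical flag names, for illustration only.
    parser.add_argument("--show-revision", action="store_true",
                        help="print the base revision number and the applied patch")
    parser.add_argument("--patch", default=None,
                        help="patch file applied on top of the base revision")
    return parser

def revision_summary(base_revision: str, patch: Optional[str]) -> str:
    applied = patch if patch else "(none)"
    return f"Base revision: {base_revision}\nApplied patch: {applied}"

def main(argv: List[str], base_revision: str) -> Optional[str]:
    """Return the summary text when --show-revision is given, else None."""
    args = build_parser().parse_args(argv)
    if args.show_revision:
        return revision_summary(base_revision, args.patch)
    return None
```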



[jira] [Commented] (HIVE-3556) Test Path - Alias for explain extended

2012-10-12 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475057#comment-13475057
 ] 

Namit Jain commented on HIVE-3556:
--

+1

 Test Path - Alias for explain extended
 -

 Key: HIVE-3556
 URL: https://issues.apache.org/jira/browse/HIVE-3556
 Project: Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-3556.patch.1


 The test framework masks the Path - Alias output of explain extended, which 
 makes it impossible to verify that the output is correct. 
 The design is to add a new entry, Truncated Path - Alias, to MapredWork. It has 
 the same content as Path - Alias, except that the prefix (including the file 
 scheme and the temp dir) is removed. The following config will be used for 
 prefix removal:
 METASTOREWAREHOUSE(hive.metastore.warehouse.dir, /user/hive/warehouse)
 This keeps Path - Alias intact while still allowing its result to be verified.
 The first use case is verifying the results of list bucketing queries.
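A minimal Python sketch of the prefix removal, assuming paths are URIs such as hdfs://host:port/user/hive/warehouse/src and that the prefixes to strip are the warehouse directory plus a scratch directory. The scratch path /tmp/hive-scratch and both function names are hypothetical; the real change lives in MapredWork and the test framework, not in a standalone helper like this.

```python
from typing import Dict, List, Sequence
from urllib.parse import urlparse

def truncate_path(path: str, prefixes: Sequence[str]) -> str:
    """Drop the scheme/authority and the first matching directory prefix."""
    # urlparse("hdfs://nn:8020/a/b").path == "/a/b"; a bare "/a/b" is kept as-is.
    bare = urlparse(path).path
    for prefix in prefixes:
        root = prefix.rstrip("/") + "/"
        if bare.startswith(root):
            return bare[len(root):]
    return bare

def truncated_path_to_alias(path_to_alias: Dict[str, List[str]],
                            prefixes: Sequence[str]) -> Dict[str, List[str]]:
    """Build a 'Truncated Path - Alias' map, leaving the original map untouched."""
    return {truncate_path(p, prefixes): list(aliases)
            for p, aliases in path_to_alias.items()}
```

Because the original map is copied rather than modified, Path - Alias keeps its full, machine-specific paths, while the truncated copy is stable enough to compare in test output.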



[jira] [Updated] (HIVE-3276) optimize union sub-queries

2012-10-12 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3276:
-

Attachment: hive.3276.12.patch

 optimize union sub-queries
 --

 Key: HIVE-3276
 URL: https://issues.apache.org/jira/browse/HIVE-3276
 Project: Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.3276.10.patch, hive.3276.11.patch, 
 hive.3276.12.patch, HIVE-3276.1.patch, hive.3276.2.patch, hive.3276.3.patch, 
 hive.3276.4.patch, hive.3276.5.patch, hive.3276.6.patch, hive.3276.7.patch, 
 hive.3276.8.patch, hive.3276.9.patch


 It might be a good idea to optimize simple union queries in which at least 
 one of the sub-queries contains a map-reduce job. For example, a query like:
 insert overwrite table T1 partition P1
 select * from 
 (
   subq1
 union all
   subq2
 ) u;
 today creates three map-reduce jobs: one for subq1, another for subq2, and a 
 final one for the union. 
 Instead of creating the union task, it might be simpler to create a move task 
 (or something like a move task) that moves the outputs of the sub-queries to 
 the final directory. This extends easily to more than two sub-queries in the 
 union. It is most useful when the union is followed by a select * and a file 
 sink. The optimization is independently useful, and it can also be used to 
 optimize skewed joins: https://cwiki.apache.org/Hive/skewed-join-optimization.html.
 If there is a select or filter between the union and the file sink, the select 
 and the filter can be moved before the union, and the follow-up job can 
 still be removed.
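The move-task idea can be sketched in Python, with local directories standing in for HDFS: instead of running a third job, each sub-query's output files are moved into the target partition directory. Prefixing each file name with the sub-query index is an assumption made here to avoid collisions between identically named files (such as 000000_0) produced by different jobs; it is not necessarily how the patch handles naming.

```python
import os
import shutil
from typing import List

def move_union_outputs(subquery_dirs: List[str], final_dir: str,
                       overwrite: bool = True) -> List[str]:
    """Replace a union job by moving each sub-query's output files into final_dir."""
    if overwrite and os.path.isdir(final_dir):
        shutil.rmtree(final_dir)  # INSERT OVERWRITE semantics: drop old contents
    os.makedirs(final_dir, exist_ok=True)
    moved: List[str] = []
    for index, src_dir in enumerate(subquery_dirs):
        for name in sorted(os.listdir(src_dir)):
            src = os.path.join(src_dir, name)
            if not os.path.isfile(src):
                continue  # only plain output files are moved
            dest_name = f"{index}_{name}"  # index prefix keeps names unique
            shutil.move(src, os.path.join(final_dir, dest_name))
            moved.append(dest_name)
    return moved
```

The loop handles any number of sub-query directories, which matches the point that the approach extends beyond two sub-queries.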



[jira] [Commented] (HIVE-3276) optimize union sub-queries

2012-10-12 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475058#comment-13475058
 ] 

Namit Jain commented on HIVE-3276:
--

@Carl, comments addressed.



[jira] [Commented] (HIVE-3433) Implement CUBE and ROLLUP operators in Hive

2012-10-12 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475060#comment-13475060
 ] 

Namit Jain commented on HIVE-3433:
--

[~kevinwilfong], addressed comments

 Implement CUBE and ROLLUP operators in Hive
 ---

 Key: HIVE-3433
 URL: https://issues.apache.org/jira/browse/HIVE-3433
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Sambavi Muthukrishnan
Assignee: Namit Jain
 Attachments: hive.3433.1.patch, hive.3433.2.patch, hive.3433.3.patch






[jira] [Commented] (HIVE-3472) Build An Analytical SQL Engine for MapReduce

2012-10-12 Thread alex gemini (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475080#comment-13475080
 ] 

alex gemini commented on HIVE-3472:
---

It seems there are two unrelated issues being discussed here: one about analytic 
functions, and one about SQL-92 compatibility.
For analytic features, I think it's OK to build a new parser, since there is no 
standard and each vendor has its own special grammar. We can treat it like a 
new framework, with a parameter controlling whether it is enabled, just as 
HIVE-896 did (we open a new session with a special shell command). A new feature 
that integrates a lot of code into Hive always changes many things, such as the 
metastore, the execution engine, the explain tree, etc., so I guess it's OK to 
have a parameter switch. But why are we using a PL/SQL parser, and exactly which 
analytical features are we talking about? The SQL-92 compatibility issue should 
be discussed in HIVE-3561.



[jira] [Commented] (HIVE-3561) Build a full SQL-compliant parser for Hive

2012-10-12 Thread alex gemini (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475092#comment-13475092
 ] 

alex gemini commented on HIVE-3561:
---

Why do we need a full SQL-compliant parser? Exactly which features are we talking 
about? SQL-92 has features like temporary tables, a call-level interface, 
scrolling cursors, etc. IMO, maybe we should discuss individual features instead 
of discussing whether we need a new parser or not.

 Build a full SQL-compliant parser for Hive
 --

 Key: HIVE-3561
 URL: https://issues.apache.org/jira/browse/HIVE-3561
 Project: Hive
  Issue Type: Sub-task
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Shengsheng Huang

 To build a fully SQL-compliant engine on Hive, we'll need a fully SQL-compliant 
 parser. The current Hive parser is missing many grammar units from standard 
 SQL. To support full SQL, there are four possible approaches:
 1. Extend the existing Hive parser to support full SQL constructs. We would need 
    to modify the current Hive.g, add any missing grammar units, and resolve 
    conflicts. 
 2. Reuse an existing open source SQL-compliant parser and extend it to support 
    Hive extensions. We may need to adapt the semantic analyzers to the new AST 
    structure.  
 3. Reuse an existing SQL-compliant parser and make it co-exist with the 
    existing Hive parser. Both parsers share the same CliDriver interface, and a 
    query mode configuration switches the query mode between SQL and HQL (this 
    is the approach we're currently using in the 0.9.0 demo project).
 4. Reuse an existing SQL-compliant parser and make it co-exist with the 
    existing Hive parser, using a separate xxxCliDriver interface for standard SQL. 

 Let's discuss which approach is best. 



[jira] [Commented] (HIVE-3556) Test Path - Alias for explain extended

2012-10-12 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475104#comment-13475104
 ] 

Namit Jain commented on HIVE-3556:
--

[~gangtimliu], can you upload the new patch?



[jira] [Comment Edited] (HIVE-3556) Test Path - Alias for explain extended

2012-10-12 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475104#comment-13475104
 ] 

Namit Jain edited comment on HIVE-3556 at 10/12/12 4:14 PM:


[~gangtimliu], can you upload the new patch file?

  was (Author: namit):
[~gangtimliu], can you upload the new patch?
  


[jira] [Updated] (HIVE-3556) Test Path - Alias for explain extended

2012-10-12 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3556:
---

Attachment: HIVE-3556.patch.2

 Test Path - Alias for explain extended
 -

 Key: HIVE-3556
 URL: https://issues.apache.org/jira/browse/HIVE-3556
 Project: Hive
  Issue Type: Bug
  Components: Testing Infrastructure
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu
 Attachments: HIVE-3556.patch.1, HIVE-3556.patch.2




[jira] [Commented] (HIVE-3556) Test Path - Alias for explain extended

2012-10-12 Thread Gang Tim Liu (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475160#comment-13475160
 ] 

Gang Tim Liu commented on HIVE-3556:


Yes, I just uploaded one. Thanks.




[jira] [Created] (HIVE-3572) Expressions in distribute/cluster/order/sort by used to work sometimes

2012-10-12 Thread Kevin Wilfong (JIRA)
Kevin Wilfong created HIVE-3572:
---

 Summary: Expressions in distribute/cluster/order/sort by used to 
work sometimes
 Key: HIVE-3572
 URL: https://issues.apache.org/jira/browse/HIVE-3572
 Project: Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Namit Jain


This query used to work:
explain select key from src distribute by (key + 50);

But after HIVE-3268, it fails during compilation.



[jira] [Created] (HIVE-3573) Wrap logic introduced in HIVE-3268 in a config

2012-10-12 Thread Kevin Wilfong (JIRA)
Kevin Wilfong created HIVE-3573:
---

 Summary: Wrap logic introduced in HIVE-3268 in a config
 Key: HIVE-3573
 URL: https://issues.apache.org/jira/browse/HIVE-3573
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Namit Jain


This patch introduces some code which can fundamentally break 
distribute/order/cluster/sort by if there is a bug (as was found).  We should 
add a config around this code so administrators can quickly turn it off if 
needed.



[jira] [Updated] (HIVE-3570) Add/fix facility to collect operator specific statistics in hive + add hash-in/hash-out counter for GroupBy Optr

2012-10-12 Thread Phabricator (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HIVE-3570:
--

Attachment: HIVE-3570.D5985.1.patch

satadru added you to the CC list for the revision HIVE-3570 [jira] Hive 
changes for Optr level stats.
Reviewers: njain

TEST PLAN
  Single box testing

REVISION DETAIL
  https://reviews.facebook.net/D5985

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java
  ql/src/java/org/apache/hadoop/hive/ql/exec/Operator.java

To: JIRA


 Add/fix facility to collect operator specific statistics in hive + add 
 hash-in/hash-out counter for GroupBy Optr
 ---

 Key: HIVE-3570
 URL: https://issues.apache.org/jira/browse/HIVE-3570
 Project: Hive
  Issue Type: Improvement
  Components: Statistics
Affects Versions: 0.9.0
Reporter: Satadru Pan
Assignee: Satadru Pan
Priority: Minor
 Attachments: HIVE-3570.1.patch.txt, HIVE-3570.D5985.1.patch


 Requirement: collect operator-specific stats for Hive queries, using the 
 counter framework available in Hive's Operator.java.
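As a rough illustration of what hash-in/hash-out counters could measure, here is a Python sketch of a map-side hash aggregation (a count per key) that increments one counter for every row entering the hash table and another for every aggregated row flushed out of it. The counter names, the flush-on-size policy, and the class itself are assumptions for illustration; they do not mirror the counter API in Operator.java or the code in GroupByOperator.java.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple

class HashGroupBySketch:
    """Map-side hash aggregation (a count per key) with two operator-level counters.

    hash_in: rows fed into the hash table.
    hash_out: aggregated rows emitted when the hash table is flushed.
    """

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries  # flush once the table holds this many keys
        self.table: Dict[Hashable, int] = {}
        self.counters: Counter = Counter()
        self.output: List[Tuple[Hashable, int]] = []

    def process(self, key: Hashable) -> None:
        self.counters["hash_in"] += 1
        self.table[key] = self.table.get(key, 0) + 1
        if len(self.table) >= self.max_entries:
            self.flush()

    def flush(self) -> None:
        for key, count in self.table.items():
            self.output.append((key, count))
            self.counters["hash_out"] += 1
        self.table.clear()

    def close(self) -> None:
        self.flush()  # emit whatever is still buffered at end of input
```

Feeding the keys a, b, a, c, a, b through an instance with a large `max_entries` gives hash_in = 6 and hash_out = 3; with `max_entries=2` the table flushes so often that hash_out also reaches 6, the kind of poor map-side reduction such counters could surface.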



[jira] [Work started] (HIVE-3570) Add/fix facility to collect operator specific statistics in hive + add hash-in/hash-out counter for GroupBy Optr

2012-10-12 Thread Satadru Pan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HIVE-3570 started by Satadru Pan.



[jira] [Updated] (HIVE-3570) Add/fix facility to collect operator specific statistics in hive + add hash-in/hash-out counter for GroupBy Optr

2012-10-12 Thread Satadru Pan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Satadru Pan updated HIVE-3570:
--

Status: Patch Available  (was: In Progress)



[jira] [Commented] (HIVE-3573) Wrap logic introduced in HIVE-3268 in a config

2012-10-12 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475277#comment-13475277
 ] 

Namit Jain commented on HIVE-3573:
--

https://reviews.facebook.net/D6015



[jira] [Updated] (HIVE-3573) Wrap logic introduced in HIVE-3268 in a config

2012-10-12 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3573:
-

Attachment: hive.3573.1.patch

 Wrap logic introduced in HIVE-3268 in a config
 --

 Key: HIVE-3573
 URL: https://issues.apache.org/jira/browse/HIVE-3573
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Namit Jain
 Attachments: hive.3573.1.patch




[jira] [Updated] (HIVE-3573) Revert HIVE-3268

2012-10-12 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3573:


Summary: Revert HIVE-3268  (was: Wrap logic introduced in HIVE-3268 in a 
config)

 Revert HIVE-3268
 

 Key: HIVE-3573
 URL: https://issues.apache.org/jira/browse/HIVE-3573
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Namit Jain
 Attachments: hive.3573.1.patch




[jira] [Updated] (HIVE-3573) Revert HIVE-3268

2012-10-12 Thread Kevin Wilfong (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Wilfong updated HIVE-3573:


Description: This patch introduces code that breaks 
distribute/order/cluster/sort by. We should revert this code until it can be 
fixed (HIVE-3572).  (was: This patch introduces some code which can 
fundamentally break distribute/order/cluster/sort by if there is a bug (as was 
found).  We should add a config around this code so administrators can quickly 
turn it off if needed.)

 Revert HIVE-3268
 

 Key: HIVE-3573
 URL: https://issues.apache.org/jira/browse/HIVE-3573
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Namit Jain
 Attachments: hive.3573.1.patch


 This patch introduces code that breaks 
 distribute/order/cluster/sort by. We should revert this code until it can be 
 fixed (HIVE-3572).



[jira] [Commented] (HIVE-3573) Revert HIVE-3268

2012-10-12 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475284#comment-13475284
 ] 

Kevin Wilfong commented on HIVE-3573:
-

If you plan to remove the config once the bug is fixed, we'd probably be better 
off just reverting it.



[jira] [Updated] (HIVE-3073) Hive List Bucketing - DML support

2012-10-12 Thread Gang Tim Liu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gang Tim Liu updated HIVE-3073:
---

Summary: Hive List Bucketing - DML support   (was: Hive List Bucketing - 
DML support (single column/manual load))

 Hive List Bucketing - DML support 
 --

 Key: HIVE-3073
 URL: https://issues.apache.org/jira/browse/HIVE-3073
 Project: Hive
  Issue Type: New Feature
  Components: SQL
Reporter: Gang Tim Liu
Assignee: Gang Tim Liu

 If a Hive table column has skewed keys, query performance on non-skewed keys 
 is always impacted. The Hive List Bucketing feature will address this:
 https://cwiki.apache.org/Hive/listbucketing.html
 This JIRA issue tracks the DML changes for the feature:
 1. single skewed column
 2. manual load data
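To make the idea concrete, here is a minimal Python sketch of list bucketing on a single skewed column: rows whose value is one of the declared skewed values go to their own directory, and all other rows go to a shared default directory, so a query on a non-skewed key only needs to read the default directory. The directory naming (`col=value`, `default`) and the helper names are hypothetical; the wiki page above describes the actual design.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Set

DEFAULT_DIR = "default"  # hypothetical name for the catch-all directory

Row = Mapping[str, object]

def list_bucket(rows: Iterable[Row], skewed_col: str,
                skewed_values: Set[object]) -> Dict[str, List[Row]]:
    """Route a row to a per-value directory if its key is skewed, else to DEFAULT_DIR."""
    buckets: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        value = row[skewed_col]
        if value in skewed_values:
            buckets[f"{skewed_col}={value}"].append(row)
        else:
            buckets[DEFAULT_DIR].append(row)
    return dict(buckets)

def dirs_to_scan(skewed_col: str, skewed_values: Set[object], wanted: object) -> List[str]:
    """Directories a query with the predicate skewed_col = wanted needs to read."""
    if wanted in skewed_values:
        return [f"{skewed_col}={wanted}"]
    return [DEFAULT_DIR]
```

The pruning in `dirs_to_scan` is what lets a lookup on a non-skewed key skip the large skewed directories entirely.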



[jira] [Updated] (HIVE-3573) Revert HIVE-3268

2012-10-12 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3573:
-

Attachment: hive.3573.2.patch

 Revert HIVE-3268
 

 Key: HIVE-3573
 URL: https://issues.apache.org/jira/browse/HIVE-3573
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Namit Jain
 Attachments: hive.3573.1.patch, hive.3573.2.patch


 This patch introduces some code which can break 
 distribute/order/cluster/sort by.  We should revert this code until it can be 
 fixed (HIVE-3572).



[jira] [Updated] (HIVE-3573) Revert HIVE-3268

2012-10-12 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3573:
-

Status: Patch Available  (was: Open)

 Revert HIVE-3268
 

 Key: HIVE-3573
 URL: https://issues.apache.org/jira/browse/HIVE-3573
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Namit Jain
 Attachments: hive.3573.1.patch, hive.3573.2.patch


 This patch introduces some code which can break 
 distribute/order/cluster/sort by.  We should revert this code until it can be 
 fixed (HIVE-3572).



[jira] [Created] (HIVE-3574) Allow Hive to Submit MapReduce jobs via the MapReduce API (instead of using Hadoop BIN)

2012-10-12 Thread Jeremy A. Lucas (JIRA)
Jeremy A. Lucas created HIVE-3574:
-

 Summary: Allow Hive to Submit MapReduce jobs via the MapReduce API 
(instead of using Hadoop BIN)
 Key: HIVE-3574
 URL: https://issues.apache.org/jira/browse/HIVE-3574
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor, SQL
Affects Versions: 0.9.0, 0.8.1, 0.8.0, 0.7.1, 0.7.0, 0.6.0, 0.5.0, 0.4.1, 
0.4.0, 0.3.0, 0.10.0, 0.9.1
 Environment: All environments would be affected by this
Reporter: Jeremy A. Lucas
Priority: Minor


Having Hive submit generated jobs to an M/R cluster via the MapReduce API 
would allow for potentially greater compatibility across platforms, in addition 
to allowing for these jobs to be run easily against pseudo-clusters in tests 
(think MiniMRCluster).

This kind of change could involve something as simple as using a Hadoop 
Configuration object with a generic ToolRunner or something similar to run jobs.

Specifically, this kind of change would most likely occur in the execute() 
method of org.apache.hadoop.hive.ql.exec.MapRedTask (note that this class 
already has access to a JobConf object as well, which could serve in itself as 
a Configuration object).
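One way to read this proposal is as a seam in the task: if MapRedTask talked to an interface rather than to the hadoop script, a test could inject an implementation backed by an in-process cluster such as MiniMRCluster. The Java sketch below is hypothetical. `JobSubmitter`, `RecordingSubmitter`, and `SketchMapRedTask` are illustrative names, not Hive or Hadoop APIs, and the real Hadoop-backed implementation is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical seam: how a task hands a compiled plan and its job properties to a cluster. */
interface JobSubmitter {
    /** Returns 0 on success, non-zero on failure (mirroring a process exit code). */
    int submit(String planPath, Map<String, String> jobProps) throws Exception;
}

/**
 * Test double that records submissions in memory. A production implementation
 * could call the Hadoop client API with a Configuration/JobConf, and another
 * could keep today's behavior of launching the hadoop script as a child process.
 */
final class RecordingSubmitter implements JobSubmitter {
    final List<Map<String, String>> submissions = new ArrayList<>();

    @Override
    public int submit(String planPath, Map<String, String> jobProps) {
        Map<String, String> copy = new TreeMap<>(jobProps);
        copy.put("plan", planPath); // hypothetical key, only for inspection in tests
        submissions.add(copy);
        return 0;
    }
}

/** Hypothetical task: depends only on the interface, never on a hadoop binary being installed. */
final class SketchMapRedTask {
    private final JobSubmitter submitter;

    SketchMapRedTask(JobSubmitter submitter) {
        this.submitter = submitter;
    }

    int execute(String planPath, Map<String, String> jobProps) {
        try {
            return submitter.submit(planPath, jobProps);
        } catch (Exception e) {
            return 1; // treat any submission error as task failure
        }
    }
}
```

In a unit test the task is constructed with the recording double, so nothing depends on what is installed on the machine.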



[jira] [Updated] (HIVE-3574) Allow Hive to Submit MapReduce jobs via the MapReduce API (instead of using Hadoop BIN)

2012-10-12 Thread Jeremy A. Lucas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy A. Lucas updated HIVE-3574:
--

Description: 
Having Hive submit generated jobs to an M/R cluster via the MapReduce API 
would allow for potentially greater compatibility across platforms, in addition 
to allowing for these jobs to be run easily against pseudo-clusters in tests 
(think MiniMRCluster).

This kind of change could involve something as simple as using a Hadoop 
Configuration object with a generic ToolRunner or something similar to run jobs.

Specifically, this kind of change would most likely occur in the execute() 
method of org.apache.hadoop.hive.ql.exec.MapRedTask.

  was:
Having Hive submit generated jobs to an M/R cluster via the MapReduce API 
would allow for potentially greater compatibility across platforms, in addition 
to allowing for these jobs to be run easily against pseudo-clusters in tests 
(think MiniMRCluster).

This kind of change could involve something as simple as using a Hadoop 
Configuration object with a generic ToolRunner or something similar to run jobs.

Specifically, this kind of change would most likely occur in the execute() 
method of org.apache.hadoop.hive.ql.exec.MapRedTask (note that this class 
already has access to a JobConf object as well, which could serve in itself as 
a Configuration object).


 Allow Hive to Submit MapReduce jobs via the MapReduce API (instead of using 
 Hadoop BIN)
 ---

 Key: HIVE-3574
 URL: https://issues.apache.org/jira/browse/HIVE-3574
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor, SQL
Affects Versions: 0.3.0, 0.4.0, 0.4.1, 0.5.0, 0.6.0, 0.7.0, 0.7.1, 0.8.0, 
 0.8.1, 0.9.0, 0.10.0, 0.9.1
 Environment: All environments would be affected by this
Reporter: Jeremy A. Lucas
Priority: Minor
  Labels: feature, test

 Having Hive submit generated jobs to an M/R cluster via the MapReduce API 
 would allow for potentially greater compatibility across platforms, in 
 addition to allowing for these jobs to be run easily against pseudo-clusters 
 in tests (think MiniMRCluster).
 This kind of change could involve something as simple as using a Hadoop 
 Configuration object with a generic ToolRunner or something similar to run 
 jobs.
 Specifically, this kind of change would most likely occur in the execute() 
 method of org.apache.hadoop.hive.ql.exec.MapRedTask.



[jira] [Commented] (HIVE-3570) Add/fix facility to collect operator specific statistics in hive + add hash-in/hash-out counter for GroupBy Optr

2012-10-12 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475379#comment-13475379
 ] 

Phabricator commented on HIVE-3570:
---

njain has commented on the revision HIVE-3570 [jira] Hive changes for Optr 
level stats.

  Please add a test.

  Add a simple hook which prints the number of hashed rows.

INLINE COMMENTS
  ql/src/java/org/apache/hadoop/hive/ql/exec/GroupByOperator.java:1000 
NUM_INPUT_ROWS is present in every operator.
  You don't need it again.

REVISION DETAIL
  https://reviews.facebook.net/D5985

To: njain, satadru
Cc: JIRA


 Add/fix facility to collect operator specific statistics in hive + add 
 hash-in/hash-out counter for GroupBy Optr
 ---

 Key: HIVE-3570
 URL: https://issues.apache.org/jira/browse/HIVE-3570
 Project: Hive
  Issue Type: Improvement
  Components: Statistics
Affects Versions: 0.9.0
Reporter: Satadru Pan
Assignee: Satadru Pan
Priority: Minor
 Attachments: HIVE-3570.1.patch.txt, HIVE-3570.D5985.1.patch


 Requirement: Collect operator-specific stats for Hive queries. Use the 
 counter framework available in Hive's Operator.java to accomplish that.
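To illustrate what a hash-in/hash-out pair measures, here is a hypothetical Java sketch of a hash-based aggregation (COUNT(*) per key) that keeps both counts. The class and method names are illustrative, not Hive's GroupByOperator or its counter API. Per the review comment above, an input-row count (NUM_INPUT_ROWS) is already tracked for every operator, so presumably only the hash-specific counters would be new in Hive.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of hash-in/hash-out counters for a map-side, hash-based
 * GROUP BY computing COUNT(*) per key. "Hash in" counts rows fed into the hash
 * table; "hash out" counts aggregated rows emitted (one per distinct key per flush).
 */
final class CountingHashAggregator {
    private final Map<String, Long> table = new HashMap<>();
    private long hashIn;
    private long hashOut;

    void process(String key) {
        hashIn++;
        table.merge(key, 1L, Long::sum); // COUNT(*) per key
    }

    /** Emits one aggregated row per key and clears the table, as a flush would. */
    Map<String, Long> flush() {
        Map<String, Long> rows = new TreeMap<>(table);
        hashOut += rows.size();
        table.clear();
        return rows;
    }

    long hashIn() { return hashIn; }

    long hashOut() { return hashOut; }

    /** Fraction of input rows collapsed by hash aggregation (0 when nothing was processed). */
    double reduction() {
        return hashIn == 0 ? 0.0 : 1.0 - (double) hashOut / hashIn;
    }
}
```

A reduction close to 0 suggests map-side hashing is not collapsing many rows for that query, which is the kind of signal such counters could expose.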



[jira] [Updated] (HIVE-3570) Add/fix facility to collect operator specific statistics in hive + add hash-in/hash-out counter for GroupBy Optr

2012-10-12 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3570:
-

Status: Open  (was: Patch Available)

Comments on Phabricator.

 Add/fix facility to collect operator specific statistics in hive + add 
 hash-in/hash-out counter for GroupBy Optr
 ---

 Key: HIVE-3570
 URL: https://issues.apache.org/jira/browse/HIVE-3570
 Project: Hive
  Issue Type: Improvement
  Components: Statistics
Affects Versions: 0.9.0
Reporter: Satadru Pan
Assignee: Satadru Pan
Priority: Minor
 Attachments: HIVE-3570.1.patch.txt, HIVE-3570.D5985.1.patch


 Requirement: Collect operator-specific stats for Hive queries. Use the 
 counter framework available in Hive's Operator.java to accomplish that.



[jira] [Updated] (HIVE-3574) Allow Hive to Submit MapReduce jobs via the MapReduce API (instead of using Hadoop BIN)

2012-10-12 Thread Jeremy A. Lucas (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy A. Lucas updated HIVE-3574:
--

Description: 
The current behavior of the MapRedTask is to start a process that invokes the 
hadoop jar command, passing each additional jobconf property as an argument 
to this Hadoop CLI.

Having Hive submit generated jobs to an M/R cluster via the MapReduce API 
would allow for potentially greater compatibility across platforms, in addition 
to allowing for these jobs to be run easily against pseudo-clusters in tests 
(think MiniMRCluster).

This kind of change could involve something as simple as using a Hadoop 
Configuration object with a generic ToolRunner or something similar to run jobs.

Specifically, this kind of change would most likely occur in the execute() 
method of org.apache.hadoop.hive.ql.exec.MapRedTask.

  was:
Having Hive submit generated jobs to an M/R cluster via the MapReduce API 
would allow for potentially greater compatibility across platforms, in addition 
to allowing for these jobs to be run easily against pseudo-clusters in tests 
(think MiniMRCluster).

This kind of change could involve something as simple as using a Hadoop 
Configuration object with a generic ToolRunner or something similar to run jobs.

Specifically, this kind of change would most likely occur in the execute() 
method of org.apache.hadoop.hive.ql.exec.MapRedTask.


 Allow Hive to Submit MapReduce jobs via the MapReduce API (instead of using 
 Hadoop BIN)
 ---

 Key: HIVE-3574
 URL: https://issues.apache.org/jira/browse/HIVE-3574
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor, SQL
Affects Versions: 0.3.0, 0.4.0, 0.4.1, 0.5.0, 0.6.0, 0.7.0, 0.7.1, 0.8.0, 
 0.8.1, 0.9.0, 0.10.0, 0.9.1
 Environment: All environments would be affected by this
Reporter: Jeremy A. Lucas
Priority: Minor
  Labels: feature, test

 The current behavior of the MapRedTask is to start a process that invokes the 
 hadoop jar command, passing each additional jobconf property as an argument 
 to this Hadoop CLI.
 Having Hive submit generated jobs to an M/R cluster via the MapReduce API 
 would allow for potentially greater compatibility across platforms, in 
 addition to allowing for these jobs to be run easily against pseudo-clusters 
 in tests (think MiniMRCluster).
 This kind of change could involve something as simple as using a Hadoop 
 Configuration object with a generic ToolRunner or something similar to run 
 jobs.
 Specifically, this kind of change would most likely occur in the execute() 
 method of org.apache.hadoop.hive.ql.exec.MapRedTask.
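For contrast with API-based submission, the following sketch shows what "passing each additional jobconf property as an argument" to the hadoop jar command amounts to. It is a hypothetical Java illustration: `HadoopJarCommand`, the jar and class names, and the use of Hadoop's generic `-D key=value` option form are assumptions, not the exact command line MapRedTask builds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of the behavior described above: turn extra job
 * properties into command-line arguments for an external `hadoop jar` process.
 * The argument layout is illustrative, not the exact one MapRedTask uses.
 */
final class HadoopJarCommand {
    static List<String> build(String hadoopBin, String jar, String mainClass,
                              Map<String, String> extraProps) {
        List<String> cmd = new ArrayList<>(Arrays.asList(hadoopBin, "jar", jar, mainClass));
        // One "-D key=value" pair per property; keys sorted for a deterministic command line.
        for (Map.Entry<String, String> e : new TreeMap<>(extraProps).entrySet()) {
            cmd.add("-D");
            cmd.add(e.getKey() + "=" + e.getValue());
        }
        return cmd;
    }

    /** Prepares, but does not start, the child process; it only works if hadoopBin is on the PATH. */
    static ProcessBuilder processFor(List<String> cmd) {
        return new ProcessBuilder(cmd).inheritIO();
    }
}
```

Every property becomes text on an argument list, and the job only runs if a compatible `hadoop` executable is installed; submitting through the API would instead hand the Configuration object to the client library directly.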



[jira] [Commented] (HIVE-3518) QTestUtil side-effects

2012-10-12 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475393#comment-13475393
 ] 

Namit Jain commented on HIVE-3518:
--

+1

 QTestUtil side-effects
 --

 Key: HIVE-3518
 URL: https://issues.apache.org/jira/browse/HIVE-3518
 Project: Hive
  Issue Type: Bug
  Components: Testing Infrastructure, Tests
Reporter: Ivan Gorbachev
Assignee: Navis
 Attachments: HIVE-3518.D5865.1.patch, HIVE-3518.D5865.2.patch, 
 metadata_export_drop.q


 It seems that QTestUtil has side-effects. This test 
 ([^metadata_export_drop.q]) causes other tests to fail at the cleanup stage:
 {quote}
 Exception: java.lang.IllegalArgumentException: java.net.URISyntaxException: 
 Relative path in absolute URI: 
 file:../build/ql/test/data/exports/HIVE-3427/src.2012-09-28-11-38-17
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
 path in absolute URI: 
 file:../build/ql/test/data/exports/HIVE-3427/src.2012-09-28-11-38-17
 at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:845)
 at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:821)
 at org.apache.hadoop.hive.ql.QTestUtil.cleanUp(QTestUtil.java:445)
 at org.apache.hadoop.hive.ql.QTestUtil.shutdown(QTestUtil.java:300)
 at org.apache.hadoop.hive.cli.TestCliDriver.tearDown(TestCliDriver.java:87)
 at junit.framework.TestCase.runBare(TestCase.java:140)
 at junit.framework.TestResult$1.protect(TestResult.java:110)
 at junit.framework.TestResult.runProtected(TestResult.java:128)
 at junit.framework.TestResult.run(TestResult.java:113)
 at junit.framework.TestCase.run(TestCase.java:124)
 at junit.framework.TestSuite.runTest(TestSuite.java:232)
 at junit.framework.TestSuite.run(TestSuite.java:227)
 at 
 org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
 at 
 org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
 Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: 
 Relative path in absolute URI: 
 file:../build/ql/test/data/exports/HIVE-3427/src.2012-09-28-11-38-17
 at org.apache.hadoop.fs.Path.initialize(Path.java:140)
 at org.apache.hadoop.fs.Path.<init>(Path.java:132)
 at 
 org.apache.hadoop.fs.ProxyFileSystem.swizzleParamPath(ProxyFileSystem.java:56)
 at org.apache.hadoop.fs.ProxyFileSystem.mkdirs(ProxyFileSystem.java:214)
 at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:183)
 at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1120)
 at 
 org.apache.hadoop.hive.ql.parse.MetaDataExportListener.export_meta_data(MetaDataExportListener.java:81)
 at 
 org.apache.hadoop.hive.ql.parse.MetaDataExportListener.onEvent(MetaDataExportListener.java:106)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_core(HiveMetaStore.java:1024)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table(HiveMetaStore.java:1185)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropTable(HiveMetaStoreClient.java:566)
 at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:839)
 ... 17 more
 Caused by: java.net.URISyntaxException: Relative path in absolute URI: 
 file:../build/ql/test/data/exports/HIVE-3427/src.2012-09-28-11-38-17
 at java.net.URI.checkPath(URI.java:1787)
 at java.net.URI.<init>(URI.java:735)
 at org.apache.hadoop.fs.Path.initialize(Path.java:137)
 ... 28 more
 {quote}
 Resetting 'hive.metastore.pre.event.listeners' to an empty string solves the 
 issue. While debugging, I found that this property wasn't cleared for other 
 tests after it was set in metadata_export_drop.q.
 How to reproduce:
 {code} ant test -Dtestcase=TestCliDriver -Dqfile=metadata_export_drop.q,some 
 test.q{code}
 where "some test.q" stands for any test that contains a CREATE statement, for 
 example sample10.q.
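The fix described above resets the one leaking property. A more general pattern, sketched here under assumptions, is to snapshot configuration in test setup and restore it in teardown so no q-file can leak settings into the next one. This is hypothetical Java: `ConfSnapshot` is not part of QTestUtil, and `java.util.Properties` stands in for HiveConf (which is a Hadoop Configuration, not a Properties object).

```java
import java.util.Properties;

/**
 * Hypothetical sketch: capture configuration before a test and restore it
 * afterwards, so a property set by one q-file cannot leak into later tests.
 * java.util.Properties stands in for HiveConf here.
 */
final class ConfSnapshot {
    private final Properties saved = new Properties();

    private ConfSnapshot(Properties conf) {
        saved.putAll(conf); // copies explicitly set entries, not a defaults chain
    }

    static ConfSnapshot take(Properties conf) {
        return new ConfSnapshot(conf);
    }

    /** Returns conf to the captured state: keys added since are removed, changed keys reverted. */
    void restore(Properties conf) {
        conf.clear();
        conf.putAll(saved);
    }
}
```

Taking the snapshot in setUp() and calling restore() in tearDown() would cover hive.metastore.pre.event.listeners and any other property a q-file sets.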



[jira] [Commented] (HIVE-3518) QTestUtil side-effects

2012-10-12 Thread Phabricator (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475394#comment-13475394
 ] 

Phabricator commented on HIVE-3518:
---

njain has accepted the revision HIVE-3518 [jira] QTestUtil side-effects.

REVISION DETAIL
  https://reviews.facebook.net/D5865

BRANCH
  DPAL-1907

To: JIRA, njain, navis


 QTestUtil side-effects
 --

 Key: HIVE-3518
 URL: https://issues.apache.org/jira/browse/HIVE-3518
 Project: Hive
  Issue Type: Bug
  Components: Testing Infrastructure, Tests
Reporter: Ivan Gorbachev
Assignee: Navis
 Attachments: HIVE-3518.D5865.1.patch, HIVE-3518.D5865.2.patch, 
 metadata_export_drop.q


 It seems that QTestUtil has side-effects. This test 
 ([^metadata_export_drop.q]) causes other tests to fail at the cleanup stage:
 {quote}
 Exception: java.lang.IllegalArgumentException: java.net.URISyntaxException: 
 Relative path in absolute URI: 
 file:../build/ql/test/data/exports/HIVE-3427/src.2012-09-28-11-38-17
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative 
 path in absolute URI: 
 file:../build/ql/test/data/exports/HIVE-3427/src.2012-09-28-11-38-17
 at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:845)
 at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:821)
 at org.apache.hadoop.hive.ql.QTestUtil.cleanUp(QTestUtil.java:445)
 at org.apache.hadoop.hive.ql.QTestUtil.shutdown(QTestUtil.java:300)
 at org.apache.hadoop.hive.cli.TestCliDriver.tearDown(TestCliDriver.java:87)
 at junit.framework.TestCase.runBare(TestCase.java:140)
 at junit.framework.TestResult$1.protect(TestResult.java:110)
 at junit.framework.TestResult.runProtected(TestResult.java:128)
 at junit.framework.TestResult.run(TestResult.java:113)
 at junit.framework.TestCase.run(TestCase.java:124)
 at junit.framework.TestSuite.runTest(TestSuite.java:232)
 at junit.framework.TestSuite.run(TestSuite.java:227)
 at 
 org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
 at 
 org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:460)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:673)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:386)
 at 
 org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:196)
 Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: 
 Relative path in absolute URI: 
 file:../build/ql/test/data/exports/HIVE-3427/src.2012-09-28-11-38-17
 at org.apache.hadoop.fs.Path.initialize(Path.java:140)
 at org.apache.hadoop.fs.Path.<init>(Path.java:132)
 at 
 org.apache.hadoop.fs.ProxyFileSystem.swizzleParamPath(ProxyFileSystem.java:56)
 at org.apache.hadoop.fs.ProxyFileSystem.mkdirs(ProxyFileSystem.java:214)
 at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:183)
 at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1120)
 at 
 org.apache.hadoop.hive.ql.parse.MetaDataExportListener.export_meta_data(MetaDataExportListener.java:81)
 at 
 org.apache.hadoop.hive.ql.parse.MetaDataExportListener.onEvent(MetaDataExportListener.java:106)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table_core(HiveMetaStore.java:1024)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.drop_table(HiveMetaStore.java:1185)
 at 
 org.apache.hadoop.hive.metastore.HiveMetaStoreClient.dropTable(HiveMetaStoreClient.java:566)
 at org.apache.hadoop.hive.ql.metadata.Hive.dropTable(Hive.java:839)
 ... 17 more
 Caused by: java.net.URISyntaxException: Relative path in absolute URI: 
 file:../build/ql/test/data/exports/HIVE-3427/src.2012-09-28-11-38-17
 at java.net.URI.checkPath(URI.java:1787)
 at java.net.URI.<init>(URI.java:735)
 at org.apache.hadoop.fs.Path.initialize(Path.java:137)
 ... 28 more
 {quote}
 Resetting 'hive.metastore.pre.event.listeners' to an empty string solves the 
 issue. While debugging, I found that this property wasn't cleared for other 
 tests after it was set in metadata_export_drop.q.
 How to reproduce:
 {code} ant test -Dtestcase=TestCliDriver -Dqfile=metadata_export_drop.q,some 
 test.q{code}
 where "some test.q" stands for any test that contains a CREATE statement, for 
 example sample10.q.



[jira] [Assigned] (HIVE-3471) Implement grouping sets/grouping_id in hive

2012-10-12 Thread Ivan Gorbachev (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ivan Gorbachev reassigned HIVE-3471:


Assignee: Ivan Gorbachev  (was: Namit Jain)

 Implement grouping sets/grouping_id in hive
 ---

 Key: HIVE-3471
 URL: https://issues.apache.org/jira/browse/HIVE-3471
 Project: Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain
Assignee: Ivan Gorbachev





[jira] [Commented] (HIVE-3574) Allow Hive to Submit MapReduce jobs via the MapReduce API (instead of using Hadoop BIN)

2012-10-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475405#comment-13475405
 ] 

Ashutosh Chauhan commented on HIVE-3574:


+1 on the idea. I have also been hit by this. Relying on the hadoop script to 
launch MR jobs is not cool.

 Allow Hive to Submit MapReduce jobs via the MapReduce API (instead of using 
 Hadoop BIN)
 ---

 Key: HIVE-3574
 URL: https://issues.apache.org/jira/browse/HIVE-3574
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor, SQL
Affects Versions: 0.3.0, 0.4.0, 0.4.1, 0.5.0, 0.6.0, 0.7.0, 0.7.1, 0.8.0, 
 0.8.1, 0.9.0, 0.10.0, 0.9.1
 Environment: All environments would be affected by this
Reporter: Jeremy A. Lucas
Priority: Minor
  Labels: feature, test

 The current behavior of the MapRedTask is to start a process that invokes the 
 hadoop jar command, passing each additional jobconf property as an argument 
 to this Hadoop CLI.
 Having Hive submit generated jobs to an M/R cluster via the MapReduce API 
 would allow for potentially greater compatibility across platforms, in 
 addition to allowing for these jobs to be run easily against pseudo-clusters 
 in tests (think MiniMRCluster).
 This kind of change could involve something as simple as using a Hadoop 
 Configuration object with a generic ToolRunner or something similar to run 
 jobs.
 Specifically, this kind of change would most likely occur in the execute() 
 method of org.apache.hadoop.hive.ql.exec.MapRedTask.



[jira] [Created] (HIVE-3575) maintain dependency between views/view partitions and tables/partitions

2012-10-12 Thread Namit Jain (JIRA)
Namit Jain created HIVE-3575:


 Summary: maintain dependency between views/view partitions and 
tables/partitions
 Key: HIVE-3575
 URL: https://issues.apache.org/jira/browse/HIVE-3575
 Project: Hive
  Issue Type: New Feature
  Components: Metastore, Query Processor
Reporter: Namit Jain






[jira] [Commented] (HIVE-3575) maintain dependency between views/view partitions and tables/partitions

2012-10-12 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475420#comment-13475420
 ] 

Namit Jain commented on HIVE-3575:
--

Hive supports both partitioned and unpartitioned views. Let us consider a 
specific example:
create table T (key string, value string) partitioned by (ds string, hr string);
insert overwrite table T partition (ds='1', hr='1') ...;
..
insert overwrite table T partition (ds='1', hr='24') ...;
T is partitioned by date and hour, and Tview is a view that conceptually 
represents table T partitioned by ds only.
create view Tview partitioned on (ds) as 
select key, value, ds from T;
When all the hourly partitions for a day (ds='1') have been created, the 
corresponding partition can be added to Tview:
alter view Tview add partition (ds='1');
There is an implicit dependency between Tview@ds=1 and T@ds=1/hr=1, T@ds=1/hr=2, 
..., T@ds=1/hr=24, but that dependency is not captured anywhere
in the metastore. It would be useful to create that dependency explicitly; it 
could then be used for all kinds of auditing purposes. With the dependency 
recorded, the table partition T@ds=1/hr=1 cannot be dropped unless the view 
partition Tview@ds=1 is dropped first.
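The dependency described here could be kept as a reverse index from each table partition to the view partitions built on it. The following Java sketch does that in memory, under stated assumptions: `PartitionDependencies` and its methods are hypothetical names, not metastore APIs, and a real implementation would persist these edges in the metastore rather than in a HashMap.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch: record view-partition -> table-partition dependencies
 * and refuse to drop a table partition that a view partition still uses.
 * Partition names follow the T@ds=1/hr=1 notation used in the comment.
 */
final class PartitionDependencies {
    // Reverse index: table partition -> view partitions built on top of it.
    private final Map<String, Set<String>> dependents = new HashMap<>();

    /** Records that viewPartition is derived from every partition in tablePartitions. */
    void addViewPartition(String viewPartition, Collection<String> tablePartitions) {
        for (String tp : tablePartitions) {
            dependents.computeIfAbsent(tp, k -> new TreeSet<>()).add(viewPartition);
        }
    }

    /** Removes viewPartition from every edge, releasing the partitions it depended on. */
    void dropViewPartition(String viewPartition) {
        for (Set<String> views : dependents.values()) {
            views.remove(viewPartition);
        }
        dependents.values().removeIf(Set::isEmpty);
    }

    /** Throws if any view partition still depends on tablePartition; otherwise the drop may proceed. */
    void checkDropAllowed(String tablePartition) {
        Set<String> views = dependentsOf(tablePartition);
        if (!views.isEmpty()) {
            throw new IllegalStateException(
                "cannot drop " + tablePartition + ": still used by " + views);
        }
    }

    /** Read-only view of the view partitions that depend on tablePartition. */
    Set<String> dependentsOf(String tablePartition) {
        Set<String> views = dependents.get(tablePartition);
        return views == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(views);
    }
}
```

The same index answers the auditing question in the other direction: `dependentsOf("T@ds=1/hr=5")` lists every view partition that would be affected if that hour were rewritten.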



 maintain dependency between views/view partitions and tables/partitions
 ---

 Key: HIVE-3575
 URL: https://issues.apache.org/jira/browse/HIVE-3575
 Project: Hive
  Issue Type: New Feature
  Components: Metastore, Query Processor
Reporter: Namit Jain





[jira] [Commented] (HIVE-3573) Revert HIVE-3268

2012-10-12 Thread Kevin Wilfong (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475422#comment-13475422
 ] 

Kevin Wilfong commented on HIVE-3573:
-

+1 Thanks Namit

 Revert HIVE-3268
 

 Key: HIVE-3573
 URL: https://issues.apache.org/jira/browse/HIVE-3573
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.10.0
Reporter: Kevin Wilfong
Assignee: Namit Jain
 Attachments: hive.3573.1.patch, hive.3573.2.patch


 This patch introduces some code which can break 
 distribute/order/cluster/sort by.  We should revert this code until it can be 
 fixed (HIVE-3572).



[jira] [Commented] (HIVE-3574) Allow Hive to Submit MapReduce jobs via the MapReduce API (instead of using Hadoop BIN)

2012-10-12 Thread Andrew Look (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475437#comment-13475437
 ] 

Andrew Look commented on HIVE-3574:
---

+1. I find it to be a much more robust and flexible approach to rely on what's 
in the classpath rather than what's installed on the system.

 Allow Hive to Submit MapReduce jobs via the MapReduce API (instead of using 
 Hadoop BIN)
 ---

 Key: HIVE-3574
 URL: https://issues.apache.org/jira/browse/HIVE-3574
 Project: Hive
  Issue Type: Improvement
  Components: Query Processor, SQL
Affects Versions: 0.3.0, 0.4.0, 0.4.1, 0.5.0, 0.6.0, 0.7.0, 0.7.1, 0.8.0, 
 0.8.1, 0.9.0, 0.10.0, 0.9.1
 Environment: All environments would be affected by this
Reporter: Jeremy A. Lucas
Priority: Minor
  Labels: feature, test

 The current behavior of the MapRedTask is to start a process that invokes the 
 hadoop jar command, passing each additional jobconf property as an argument 
 to this Hadoop CLI.
 Having Hive submit generated jobs to an M/R cluster via the MapReduce API 
 would allow for potentially greater compatibility across platforms, in 
 addition to allowing for these jobs to be run easily against pseudo-clusters 
 in tests (think MiniMRCluster).
 This kind of change could involve something as simple as using a Hadoop 
 Configuration object with a generic ToolRunner or something similar to run 
 jobs.
 Specifically, this kind of change would most likely occur in the execute() 
 method of org.apache.hadoop.hive.ql.exec.MapRedTask.



[jira] [Commented] (HIVE-2935) Implement HiveServer2

2012-10-12 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-2935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475454#comment-13475454
 ] 

Alan Gates commented on HIVE-2935:
--

Carl, we'd like to start helping get these patches in shape for commit to the 
trunk.  If we start posting patches on top of the existing patches we'll have a 
mess.  Does it make sense to commit these quickly to a branch so we can 
collaborate and then merge them into trunk when they're solid?  

 Implement HiveServer2
 -

 Key: HIVE-2935
 URL: https://issues.apache.org/jira/browse/HIVE-2935
 Project: Hive
  Issue Type: New Feature
  Components: Server Infrastructure
Reporter: Carl Steinbach
Assignee: Carl Steinbach
  Labels: HiveServer2
 Attachments: beelinepositive.tar.gz, HIVE-2935.1.notest.patch.txt, 
 HIVE-2935.2.notest.patch.txt, HIVE-2935.2.nothrift.patch.txt






[jira] [Commented] (HIVE-3377) ant model-jar command fails in metastore

2012-10-12 Thread Krish (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475526#comment-13475526
 ] 

Krish commented on HIVE-3377:
-

If someone could share how Eclipse was set up, that would be great. I followed 
the steps below, but the last command fails with the error message "gen-test does 
not exist in the project hive". Please help.

$ ant clean package eclipse-files
$ cd metastore
$ ant model-jar
$ cd ..
$ ant gen-test  --- Error with the command

 ant model-jar command fails in metastore
 

 Key: HIVE-3377
 URL: https://issues.apache.org/jira/browse/HIVE-3377
 Project: Hive
  Issue Type: Bug
  Components: Metastore
Affects Versions: 0.10.0
Reporter: Vandana Ayyalasomayajula
Priority: Minor
  Labels: build

 Running the ant model-jar command to set up the Eclipse dev environment, as 
 described on the following wiki page:
 https://cwiki.apache.org/Hive/gettingstarted-eclipsesetup.html
 fails with the following message:
 BUILD FAILED
 **/workspace/hive-trunk/metastore/build.xml:22: The following error occurred 
 while executing this line:
 **/workspace/hive-trunk/build-common.xml:112: Problem: failed to create task 
 or type osfamily
 Cause: The name is undefined.
 Action: Check the spelling.
 Action: Check that any custom tasks/types have been declared.
 Action: Check that any presetdef/macrodef declarations have taken place.
