[jira] Updated: (HIVE-352) Make Hive support column based storage

2009-04-16 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-352:
--

Attachment: hive-352-2009-4-17.patch

Fixed the select problem and refactored the TestRCFile class.

> Make Hive support column based storage
> --
>
> Key: HIVE-352
> URL: https://issues.apache.org/jira/browse/HIVE-352
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
> Attachments: hive-352-2009-4-15.patch, hive-352-2009-4-16.patch, 
> hive-352-2009-4-17.patch, HIve-352-draft-2009-03-28.patch, 
> Hive-352-draft-2009-03-30.patch
>
>
> Column-based storage has been proven to be a better storage layout for OLAP. 
> Hive does a great job on raw row-oriented storage. In this issue, we will 
> enhance Hive to support column-based storage. 
> Actually, we have done some work on column-based storage on top of HDFS; I 
> think it will need some review and refactoring to port it to Hive.
> Any thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-416) Get rid of backtrack in Hive.g

2009-04-16 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700013#action_12700013
 ] 

Zheng Shao commented on HIVE-416:
-

@Raghu: I just had a second thought on that approach. The new production you 
add is left-recursive, which is not permitted in LL(k), but it's possible to 
use precedence rules to fix that.
However, given all the changes involved, including the flattening, it seems to 
me that's too much work for very little benefit - who cares about the optional 
brackets? For usage it's exactly the same.
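
As a side note, here is a minimal recursive-descent sketch in plain Java 
(hypothetical, not Hive's generated parser) of the usual LL-friendly rewrite: 
the left-recursive "expr -> expr ',' expr" becomes "expr (',' expr)*", i.e. a 
loop, which needs no backtracking and yields a flat list directly:
{{{
import java.util.ArrayList;
import java.util.List;

public class ExprListParser {
    private final String[] tokens;
    private int pos = 0;

    ExprListParser(String[] tokens) { this.tokens = tokens; }

    // exprList : expr (',' expr)* ;
    List<String> parseExprList() {
        List<String> exprs = new ArrayList<String>();
        exprs.add(parseExpr());
        while (pos < tokens.length && tokens[pos].equals(",")) {
            pos++;                    // consume ','
            exprs.add(parseExpr());
        }
        return exprs;                 // already flat: no comma-tree to unwind
    }

    // expr : Identifier ;  (placeholder for a real expression rule)
    private String parseExpr() {
        return tokens[pos++];
    }

    public static void main(String[] args) {
        // "a , b , c" parses to [a, b, c] without backtracking
        System.out.println(new ExprListParser(
                new String[] {"a", ",", "b", ",", "c"}).parseExprList());
    }
}
}}}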

For Venky's case, it's a separate problem. Venky's case is more like supporting 
"(a)" and "a". We should be able to support it easily once we allow omitting 
the sub-query alias. 

> Get rid of backtrack in Hive.g
> --
>
> Key: HIVE-416
> URL: https://issues.apache.org/jira/browse/HIVE-416
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 0.4.0
>
> Attachments: HIVE-416.1.1.patch, HIVE-416.1.patch
>
>
> Hive.g still uses "backtrack=true". "backtrack" not only slows down the 
> parsing in case of an error, it can also produce wrong syntax error messages 
> (usually based on the last try of the backtracking).
> We should follow 
> http://www.antlr.org/wiki/display/ANTLR3/How+to+remove+global+backtracking+from+your+grammar
> to remove the need for backtracking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-402) Create regexp_extract udf

2009-04-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1272#action_1272
 ] 

Namit Jain commented on HIVE-402:
-

1. Can you remove LOG.warn("here please"); from evaluate?
2. Do you want to make the last parameter, extractIndex, optional?

Otherwise, it looks good. 
Do we need to backport it to branch-0.3 as well?
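
For reference, a self-contained Java sketch of what the UDF is doing 
(illustrative only, not the patch's source; making extractIndex optional would 
presumably mean giving it a default such as 1):
{{{
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexpExtractDemo {
    // Roughly the contract under review: return the group at extractIndex
    // from the first match of regex in s, or null if nothing matches.
    static String regexpExtract(String s, String regex, int extractIndex) {
        Matcher m = Pattern.compile(regex).matcher(s);
        return m.find() ? m.group(extractIndex) : null;
    }

    public static void main(String[] args) {
        // e.g. regexp_extract('foo123bar', 'foo(\d+)bar', 1) -> 123
        System.out.println(regexpExtract("foo123bar", "foo(\\d+)bar", 1));
    }
}
}}}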

> Create regexp_extract udf
> -
>
> Key: HIVE-402
> URL: https://issues.apache.org/jira/browse/HIVE-402
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Raghotham Murthy
>Assignee: Raghotham Murthy
> Attachments: hive-402.1.patch
>
>
> This will allow users to extract substrings from a string based on a regular 
> expression.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-428) Implement Map-side Hash-Join in Hive

2009-04-16 Thread He Yongqiang (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699987#action_12699987
 ] 

He Yongqiang commented on HIVE-428:
---

Oh, sorry.
I am OK with closing this as a duplicate or merging them.

> Implement Map-side Hash-Join in Hive
> 
>
> Key: HIVE-428
> URL: https://issues.apache.org/jira/browse/HIVE-428
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>
> There are many situations in which a join will perform much better if a 
> map-side hash join is used. We ran a small test with a simple equi-join of 
> two tables: a plain MR join with no map-side hash join takes about 50 seconds 
> on a 6-node cluster (8 cores and 4 GB of memory per node), while with the 
> map-side hash join applied it needs only about 15 seconds.
> The map-side hash join can only be used when one table is small enough to be 
> replicated to each mapper. It can be co-executed with the map-side filter.
> For example, 
> select A.a, A.c, B.b from A, B where A.a = B.d and A.a < 12 and B.b = 10
> In our experiment, this statement can be translated into three different 
> plans if both A and B are plain data files (with no special compression).
> Plan 1
> Map-Reduce
> Both A and B are input for the map; the shuffle data involved is very large.
> Plan 2
> 1) First filter B on B.b into a temp file B1 -- this is a separate map-only 
> job.
> 2) Replicate B1 to each mapper, filter A, and join them in the map.
> No reduce is used.
> Plan 3
> Produce a job in which each mapper filters A (so the mappers are assigned 
> with regard to A only), and replicate B directly to each mapper.
> Before each mapper starts filtering A, filter B and load the rows that pass 
> into memory. Then start the mapper and join in memory.
> Plan 3 performs better in our experiment because it saves a separate map-only 
> job. But Plan 2 is suitable when B's original file is very large but its 
> filtered file is much smaller.
> This is the basic idea of the map-side hash join.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-428) Implement Map-side Hash-Join in Hive

2009-04-16 Thread Prasad Chakka (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699978#action_12699978
 ] 

Prasad Chakka commented on HIVE-428:


I think there is already a JIRA open for this, and a patch already exists:

https://issues.apache.org/jira/browse/HIVE-195

> Implement Map-side Hash-Join in Hive
> 
>
> Key: HIVE-428
> URL: https://issues.apache.org/jira/browse/HIVE-428
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
>
> There are many situations in which a join will perform much better if a 
> map-side hash join is used. We ran a small test with a simple equi-join of 
> two tables: a plain MR join with no map-side hash join takes about 50 seconds 
> on a 6-node cluster (8 cores and 4 GB of memory per node), while with the 
> map-side hash join applied it needs only about 15 seconds.
> The map-side hash join can only be used when one table is small enough to be 
> replicated to each mapper. It can be co-executed with the map-side filter.
> For example, 
> select A.a, A.c, B.b from A, B where A.a = B.d and A.a < 12 and B.b = 10
> In our experiment, this statement can be translated into three different 
> plans if both A and B are plain data files (with no special compression).
> Plan 1
> Map-Reduce
> Both A and B are input for the map; the shuffle data involved is very large.
> Plan 2
> 1) First filter B on B.b into a temp file B1 -- this is a separate map-only 
> job.
> 2) Replicate B1 to each mapper, filter A, and join them in the map.
> No reduce is used.
> Plan 3
> Produce a job in which each mapper filters A (so the mappers are assigned 
> with regard to A only), and replicate B directly to each mapper.
> Before each mapper starts filtering A, filter B and load the rows that pass 
> into memory. Then start the mapper and join in memory.
> Plan 3 performs better in our experiment because it saves a separate map-only 
> job. But Plan 2 is suitable when B's original file is very large but its 
> filtered file is much smaller.
> This is the basic idea of the map-side hash join.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-428) Implement Map-side Hash-Join in Hive

2009-04-16 Thread He Yongqiang (JIRA)
Implement Map-side Hash-Join in Hive


 Key: HIVE-428
 URL: https://issues.apache.org/jira/browse/HIVE-428
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: He Yongqiang


There are many situations in which a join will perform much better if a 
map-side hash join is used. We ran a small test with a simple equi-join of two 
tables: a plain MR join with no map-side hash join takes about 50 seconds on a 
6-node cluster (8 cores and 4 GB of memory per node), while with the map-side 
hash join applied it needs only about 15 seconds.

The map-side hash join can only be used when one table is small enough to be 
replicated to each mapper. It can be co-executed with the map-side filter.

For example, 
select A.a, A.c, B.b from A, B where A.a = B.d and A.a < 12 and B.b = 10
In our experiment, this statement can be translated into three different plans 
if both A and B are plain data files (with no special compression).

Plan 1
Map-Reduce
Both A and B are input for the map; the shuffle data involved is very large.

Plan 2
1) First filter B on B.b into a temp file B1 -- this is a separate map-only job.
2) Replicate B1 to each mapper, filter A, and join them in the map.
No reduce is used.

Plan 3
Produce a job in which each mapper filters A (so the mappers are assigned with 
regard to A only), and replicate B directly to each mapper.
Before each mapper starts filtering A, filter B and load the rows that pass 
into memory. Then start the mapper and join in memory.

Plan 3 performs better in our experiment because it saves a separate map-only 
job. But Plan 2 is suitable when B's original file is very large but its 
filtered file is much smaller.

This is the basic idea of the map-side hash join.
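
To make the idea concrete, here is a toy, self-contained sketch in plain Java 
(no Hadoop API; table contents and names are made up): the filtered small 
table B is held in an in-memory hash table keyed on the join column, and each 
row of A is probed against it in the map phase, so no shuffle or reduce is 
needed:
{{{
import java.util.HashMap;
import java.util.Map;

public class MapSideHashJoinDemo {
    public static void main(String[] args) {
        // B(d, b), already filtered on b = 10 and replicated to this mapper
        Map<Integer, Integer> bByD = new HashMap<Integer, Integer>();
        bByD.put(3, 10);
        bByD.put(7, 10);

        // A(a, c): the mapper streams its split of A
        int[][] a = { {3, 100}, {7, 200}, {9, 300} };

        // select A.a, A.c, B.b from A, B
        //   where A.a = B.d and A.a < 12 and B.b = 10
        for (int[] row : a) {
            if (row[0] < 12 && bByD.containsKey(row[0])) {
                System.out.println(row[0] + "\t" + row[1] + "\t"
                        + bByD.get(row[0]));
            }
        }
    }
}
}}}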

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-416) Get rid of backtrack in Hive.g

2009-04-16 Thread Raghotham Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699957#action_12699957
 ] 

Raghotham Murthy commented on HIVE-416:
---

How about the following:

Add the production:
{{{
expr -> expr ',' expr
}}}
With this production and expr -> '(' expr ')', we can support arbitrarily 
nested parentheses. The issue is that the new production will create a 
left-deep tree of comma-expressions. We could implement a method that takes 
such a tree and flattens the comma-expressions out into expression lists.
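
Something like this (a hypothetical Java sketch, not against Hive's actual AST 
classes): walk down the left spine of the left-deep tree collecting the 
right-hand expressions, then reverse to restore source order:
{{{
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class FlattenCommasDemo {
    static class Node {
        final String label;       // "," marks a comma-expression node
        final Node left, right;   // null for leaf expressions
        Node(String label, Node left, Node right) {
            this.label = label; this.left = left; this.right = right;
        }
    }

    static List<Node> flattenCommas(Node n) {
        List<Node> out = new ArrayList<Node>();
        while (",".equals(n.label)) {   // descend the left spine
            out.add(n.right);
            n = n.left;
        }
        out.add(n);                     // the leftmost expression
        Collections.reverse(out);       // we collected right-to-left
        return out;
    }

    public static void main(String[] args) {
        // ((a , b) , c): the left-deep tree the new production builds
        Node a = new Node("a", null, null), b = new Node("b", null, null),
             c = new Node("c", null, null);
        Node tree = new Node(",", new Node(",", a, b), c);
        for (Node e : flattenCommas(tree)) System.out.print(e.label + " ");
        // prints: a b c
    }
}
}}}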

Also, I remember Venky asking for arbitrarily nested parentheses around queries 
for his query authoring tool. We could do something similar and create 
comma-query-expressions.

> Get rid of backtrack in Hive.g
> --
>
> Key: HIVE-416
> URL: https://issues.apache.org/jira/browse/HIVE-416
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 0.4.0
>
> Attachments: HIVE-416.1.1.patch, HIVE-416.1.patch
>
>
> Hive.g still uses "backtrack=true". "backtrack" not only slows down the 
> parsing in case of an error, it can also produce wrong syntax error messages 
> (usually based on the last try of the backtracking).
> We should follow 
> http://www.antlr.org/wiki/display/ANTLR3/How+to+remove+global+backtracking+from+your+grammar
> to remove the need for backtracking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



JIRA_hive.427.1.patch_UNIT_TEST_FAILED

2009-04-16 Thread Murli Varadachari

ERROR: UNIT TEST using PATCH hive.427.1.patch FAILED!!

[junit] Test org.apache.hadoop.hive.cli.TestCliDriver FAILED
BUILD FAILED


JIRA_HIVE-416.1.1.patch_UNIT_TEST_SUCCEEDED

2009-04-16 Thread Murli Varadachari

SUCCESS: BUILD AND UNIT TEST using PATCH HIVE-416.1.1.patch PASSED!!



[jira] Updated: (HIVE-427) configuration parameters missing in hive-default.xml

2009-04-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-427:


Status: Patch Available  (was: Open)

> configuration parameters missing in hive-default.xml
> -
>
> Key: HIVE-427
> URL: https://issues.apache.org/jira/browse/HIVE-427
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.427.1.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-427) configuration parameters missing in hive-default.xml

2009-04-16 Thread Namit Jain (JIRA)
configuration parameters missing in hive-default.xml
-

 Key: HIVE-427
 URL: https://issues.apache.org/jira/browse/HIVE-427
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Namit Jain
Assignee: Namit Jain
 Attachments: hive.427.1.patch



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-427) configuration parameters missing in hive-default.xml

2009-04-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-427:


Attachment: hive.427.1.patch

> configuration parameters missing in hive-default.xml
> -
>
> Key: HIVE-427
> URL: https://issues.apache.org/jira/browse/HIVE-427
> Project: Hadoop Hive
>  Issue Type: Bug
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.427.1.patch
>
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-416) Get rid of backtrack in Hive.g

2009-04-16 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699898#action_12699898
 ] 

Zheng Shao commented on HIVE-416:
-

> About the comment on optional brackets - clearly these are optional in 
> expressions. So how do we support those expressions - e.g., (a+b) and a+b are 
> both valid SQL expressions - if we cannot support this without backtracking?

We do support "(a+b)" and "a+b".

The problem is that there is no easy way of supporting both "(((a))), b)" and 
"(a, b)". No matter what k is, it's not possible to determine whether the 
first "(" is the optional bracket for the expression list, or just part of the 
first expression.

I will need to go over the ANTLR book to learn more about semantic/syntactic 
predicates to know whether that is possible.

> Identifier DOT Identifier.
Treating it as a lexical rule won't allow both T.a.b and a.b. I am making the 
first Identifier a TOK_TABLE_OR_COL and will let the SemanticAnalyzer decide 
whether it is a table name or a column name. Not sure whether that should go 
into the same transaction or not, since it's a much bigger change.


> Get rid of backtrack in Hive.g
> --
>
> Key: HIVE-416
> URL: https://issues.apache.org/jira/browse/HIVE-416
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 0.4.0
>
> Attachments: HIVE-416.1.1.patch, HIVE-416.1.patch
>
>
> Hive.g still uses "backtrack=true". "backtrack" not only slows down the 
> parsing in case of an error, it can also produce wrong syntax error messages 
> (usually based on the last try of the backtracking).
> We should follow 
> http://www.antlr.org/wiki/display/ANTLR3/How+to+remove+global+backtracking+from+your+grammar
> to remove the need for backtracking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-416) Get rid of backtrack in Hive.g

2009-04-16 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-416:


Attachment: HIVE-416.1.1.patch

Extracted the common prefix for ALTER TABLE, and removed all "{k=5}".

> Get rid of backtrack in Hive.g
> --
>
> Key: HIVE-416
> URL: https://issues.apache.org/jira/browse/HIVE-416
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 0.4.0
>
> Attachments: HIVE-416.1.1.patch, HIVE-416.1.patch
>
>
> Hive.g still uses "backtrack=true". "backtrack" not only slows down the 
> parsing in case of an error, it can also produce wrong syntax error messages 
> (usually based on the last try of the backtracking).
> We should follow 
> http://www.antlr.org/wiki/display/ANTLR3/How+to+remove+global+backtracking+from+your+grammar
> to remove the need for backtracking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-416) Get rid of backtrack in Hive.g

2009-04-16 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699873#action_12699873
 ] 

Ashish Thusoo commented on HIVE-416:


I think for Identifier DOT Identifier we should probably treat it as a lexical 
rule rather than a grammar rule. It will also make it much simpler to support 
optional aliasing with complex types. 

Right now

select T.a.b FROM T

and 

select a.b FROM T

are very hard to handle in the SemanticAnalyzer, as the grammar treats a as a 
table alias instead of a complex column name.


About the comment on optional brackets - clearly these are optional in 
expressions. So how do we support those expressions - e.g., (a+b) and a+b are 
both valid SQL expressions - if we cannot support this without backtracking?


> Get rid of backtrack in Hive.g
> --
>
> Key: HIVE-416
> URL: https://issues.apache.org/jira/browse/HIVE-416
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 0.4.0
>
> Attachments: HIVE-416.1.patch
>
>
> Hive.g still uses "backtrack=true". "backtrack" not only slows down the 
> parsing in case of an error, it can also produce wrong syntax error messages 
> (usually based on the last try of the backtracking).
> We should follow 
> http://www.antlr.org/wiki/display/ANTLR3/How+to+remove+global+backtracking+from+your+grammar
> to remove the need for backtracking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-416) Get rid of backtrack in Hive.g

2009-04-16 Thread Ashish Thusoo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699874#action_12699874
 ] 

Ashish Thusoo commented on HIVE-416:


Can we use semantic/syntactic predicates to support the optional brackets?

> Get rid of backtrack in Hive.g
> --
>
> Key: HIVE-416
> URL: https://issues.apache.org/jira/browse/HIVE-416
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 0.4.0
>
> Attachments: HIVE-416.1.patch
>
>
> Hive.g still uses "backtrack=true". "backtrack" not only slows down the 
> parsing in case of an error, it can also produce wrong syntax error messages 
> (usually based on the last try of the backtracking).
> We should follow 
> http://www.antlr.org/wiki/display/ANTLR3/How+to+remove+global+backtracking+from+your+grammar
> to remove the need for backtracking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-404) Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"

2009-04-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699867#action_12699867
 ] 

Namit Jain commented on HIVE-404:
-

That's right - that's what genReduceSinkPlan does.

After the change, if a sorting/clustering column is present, the second 
map-reduce job will sort/cluster by those columns, so that we get the global 
order.

> Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"
> 
>
> Key: HIVE-404
> URL: https://issues.apache.org/jira/browse/HIVE-404
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.3.0, 0.4.0
>Reporter: Zheng Shao
>Assignee: Namit Jain
> Attachments: hive.404.1.patch, hive.404.2.patch
>
>
> Unless the user specifies "set mapred.reduce.tasks=1;", he will see 
> unexpected results with the query "SELECT * FROM t SORT BY col1 LIMIT 100".
> Basically, in the first map-reduce job, each reducer will get sorted data and 
> keep only the first 100. In the second map-reduce job, we distribute and sort 
> the data randomly before feeding it into a single reducer that outputs the 
> first 100.
> In short, the query will output 100 random records out of the N * 100 top 
> records from the reducers of the first map-reduce job.
> This contradicts what people expect.
> We should propagate the SORT BY columns to the second map-reduce job.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-404) Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"

2009-04-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699868#action_12699868
 ] 

Namit Jain commented on HIVE-404:
-

The second map-reduce job will have only one reducer, with the sorting columns 
preserved - so it will do exactly what you are saying.

> Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"
> 
>
> Key: HIVE-404
> URL: https://issues.apache.org/jira/browse/HIVE-404
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.3.0, 0.4.0
>Reporter: Zheng Shao
>Assignee: Namit Jain
> Attachments: hive.404.1.patch, hive.404.2.patch
>
>
> Unless the user specifies "set mapred.reduce.tasks=1;", he will see 
> unexpected results with the query "SELECT * FROM t SORT BY col1 LIMIT 100".
> Basically, in the first map-reduce job, each reducer will get sorted data and 
> keep only the first 100. In the second map-reduce job, we distribute and sort 
> the data randomly before feeding it into a single reducer that outputs the 
> first 100.
> In short, the query will output 100 random records out of the N * 100 top 
> records from the reducers of the first map-reduce job.
> This contradicts what people expect.
> We should propagate the SORT BY columns to the second map-reduce job.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-426) nondeterministic query results since aliasToWork in mapredWork is a HashMap

2009-04-16 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-426:


   Resolution: Fixed
Fix Version/s: 0.4.0
 Release Note: HIVE-426. Fix nondeterministic query plan because of 
aliasToWork. (Namit Jain via zshao)
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Namit!
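
For the record, the underlying issue in one self-contained Java snippet 
(generic illustration, not Hive's MapredWork code): HashMap iteration order is 
unspecified and need not match insertion order, so plan generation that 
iterates over aliasToWork can change when the keys or the map's internals 
change; a LinkedHashMap preserves insertion order and keeps the iteration 
deterministic:
{{{
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class IterationOrderDemo {
    public static void main(String[] args) {
        Map<String, String> hash = new HashMap<String, String>();
        Map<String, String> linked = new LinkedHashMap<String, String>();
        for (String alias : new String[] {"src", "srcpart", "srcbucket"}) {
            hash.put(alias, "work-for-" + alias);
            linked.put(alias, "work-for-" + alias);
        }
        // unspecified order; may differ from insertion order
        System.out.println(hash.keySet());
        // always [src, srcpart, srcbucket]
        System.out.println(linked.keySet());
    }
}
}}}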

> nondeterministic query results since aliasToWork in mapredWork is a HashMap
> --
>
> Key: HIVE-426
> URL: https://issues.apache.org/jira/browse/HIVE-426
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Fix For: 0.4.0
>
> Attachments: hive.426.1.patch
>
>
> nondeterministic query results since aliasToWork in mapredWork is a HashMap

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-404) Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"

2009-04-16 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699863#action_12699863
 ] 

Zheng Shao commented on HIVE-404:
-

I think users would expect the results of LIMIT to be sorted in total order - 
if the user says "SORT BY key LIMIT 10", he probably wants the global top 10, 
no matter how many reducers we have.

I think it's necessary to have the second map-reduce job in the case of "SORT 
BY/CLUSTER BY", but we also want the second map-reduce job to have the right 
sort columns across the map-reduce boundary, so we can get the global top ones.
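
A toy, self-contained Java sketch of that two-stage top-N (data and names made 
up): each first-stage reducer emits its local top n, and the second job must 
sort the candidates on the same columns to recover the global top n:
{{{
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class GlobalTopNDemo {
    public static void main(String[] args) {
        int n = 3;
        // what each first-stage reducer holds
        List<List<Integer>> reducers = Arrays.asList(
                Arrays.asList(5, 9, 1, 7), Arrays.asList(2, 8, 6, 4));

        // stage 1: each reducer sorts its partition and keeps its local top n
        List<Integer> candidates = new ArrayList<Integer>();
        for (List<Integer> part : reducers) {
            List<Integer> sorted = new ArrayList<Integer>(part);
            Collections.sort(sorted);
            candidates.addAll(sorted.subList(0, Math.min(n, sorted.size())));
        }

        // stage 2: the single reducer must sort on the same columns again;
        // taking the first n candidates in arbitrary order would return a
        // random subset of the local winners instead of the global top n
        Collections.sort(candidates);
        System.out.println(candidates.subList(0, n));   // [1, 2, 4]
    }
}
}}}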



> Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"
> 
>
> Key: HIVE-404
> URL: https://issues.apache.org/jira/browse/HIVE-404
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.3.0, 0.4.0
>Reporter: Zheng Shao
>Assignee: Namit Jain
> Attachments: hive.404.1.patch, hive.404.2.patch
>
>
> Unless the user specifies "set mapred.reduce.tasks=1;", he will see 
> unexpected results with the query "SELECT * FROM t SORT BY col1 LIMIT 100".
> Basically, in the first map-reduce job, each reducer will get sorted data and 
> keep only the first 100. In the second map-reduce job, we distribute and sort 
> the data randomly before feeding it into a single reducer that outputs the 
> first 100.
> In short, the query will output 100 random records out of the N * 100 top 
> records from the reducers of the first map-reduce job.
> This contradicts what people expect.
> We should propagate the SORT BY columns to the second map-reduce job.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Hive-trunk-h0.19 #65

2009-04-16 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/65/changes

Changes:

[zshao] HIVE-423. Change launch templates to use hive_model.jar. (Raghotham 
Murthy via zshao)

[zshao] HIVE-421. Fix union followed by multi-table insert. (Namit Jain via 
zshao).

--
[...truncated 29575 lines...]
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/ws/hive/build/ql/test/logs/negative/unknown_column2.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/ws/hive/ql/src/test/results/compiler/errors/unknown_column2.q.out
 
[junit] Done query: unknown_column2.q
[junit] Begin query: unknown_column3.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/ws/hive/build/ql/test/logs/negative/unknown_column3.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/ws/hive/ql/src/test/results/compiler/errors/unknown_column3.q.out
 
[junit] Done query: unknown_column3.q
[junit] Begin query: unknown_column4.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/ws/hive/build/ql/test/logs/negative/unknown_column4.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/ws/hive/ql/src/test/results/compiler/errors/unknown_column4.q.out
 
[junit] Done query: unknown_column4.q
[junit] Begin query: unknown_column5.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/ws/hive/build/ql/test/logs/negative/unknown_column5.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/ws/hive/ql/src/test/results/compiler/errors/unknown_column5.q.out
 
[junit] Done query: unknown_column5.q
[junit] Begin query: unknown_column6.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/ws/hive/build/ql/test/logs/negative/unknown_column6.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/ws/hive/ql/src/test/results/compiler/errors/unknown_column6.q.out
 
[junit] Done query: unknown_column6.q
[junit] Begin query: unknown_function1.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=1

[jira] Commented: (HIVE-404) Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"

2009-04-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699858#action_12699858
 ] 

Namit Jain commented on HIVE-404:
-

Forgot to clarify the FetchTask issue. FetchTask does not perform any merge - 
it opens the files one by one until the limit is reached (if a limit is 
specified). It is the responsibility of the server to have the data 
appropriately sorted.

> Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"
> 
>
> Key: HIVE-404
> URL: https://issues.apache.org/jira/browse/HIVE-404
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.3.0, 0.4.0
>Reporter: Zheng Shao
>Assignee: Namit Jain
> Attachments: hive.404.1.patch, hive.404.2.patch
>
>
> Unless the user specifies "set mapred.reduce.tasks=1;", he will see 
> unexpected results with the query "SELECT * FROM t SORT BY col1 LIMIT 100".
> Basically, in the first map-reduce job, each reducer will get sorted data and 
> keep only the first 100. In the second map-reduce job, we distribute and sort 
> the data randomly before feeding it into a single reducer that outputs the 
> first 100.
> In short, the query will output 100 random records out of the N * 100 top 
> records from the reducers of the first map-reduce job.
> This contradicts what people expect.
> We should propagate the SORT BY columns to the second map-reduce job.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-416) Get rid of backtrack in Hive.g

2009-04-16 Thread Zheng Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699853#action_12699853
 ] 

Zheng Shao commented on HIVE-416:
-

1. I checked the generated code for {k=5;}; it's a nested if, so there is no 
performance penalty. But I agree most grammars have k up to 3, and it should be 
easy to extract the common prefix, so I will do it.

2. Optional brackets won't be possible with an LL(k) parser for any k (without 
backtracking), because I can construct an arbitrarily long string like 
"(((a+b..." for which it's not possible to know whether the first "(" is the 
optional bracket or not.

Most people who have been using "SELECT TRANSFORM" are adding the brackets, 
while those using "MAP/REDUCE" are probably not (they think of "MAP"/"REDUCE" 
as similar to "SELECT"); that's why I made the choice that way. We can discuss 
this more if needed.


> Get rid of backtrack in Hive.g
> --
>
> Key: HIVE-416
> URL: https://issues.apache.org/jira/browse/HIVE-416
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 0.4.0
>
> Attachments: HIVE-416.1.patch
>
>
> Hive.g still uses "backtrack=true". "backtrack" not only slows down the 
> parsing in case of an error, it can also produce wrong syntax error messages 
> (usually based on the last try of the backtracking).
> We should follow 
> http://www.antlr.org/wiki/display/ANTLR3/How+to+remove+global+backtracking+from+your+grammar
> to remove the need for backtracking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-421) union followed by multi-table insert does not work properly

2009-04-16 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-421:


   Resolution: Fixed
Fix Version/s: 0.4.0
   0.3.1
 Release Note: HIVE-421. Fix union followed by multi-table insert. (Namit 
Jain via zshao)
 Hadoop Flags: [Reviewed]
   Status: Resolved  (was: Patch Available)

Committed to branch-0.3. Thanks Namit!

> union followed by multi-table insert does not work properly
> ---
>
> Key: HIVE-421
> URL: https://issues.apache.org/jira/browse/HIVE-421
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
>Priority: Critical
> Fix For: 0.3.1, 0.4.0
>
> Attachments: hive.421.1.patch, hive.421.2.branch.patch, 
> hive.421.2.patch
>
>
> Like HIVE-413, multi-table inserts have some problems with unions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Hive-trunk-h0.18 #67

2009-04-16 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/67/changes

Changes:

[zshao] HIVE-423. Change launch templates to use hive_model.jar. (Raghotham 
Murthy via zshao)

[zshao] HIVE-421. Fix union followed by multi-table insert. (Namit Jain via 
zshao).

--
[...truncated 30413 lines...]
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/build/ql/test/logs/negative/unknown_column2.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/ql/src/test/results/compiler/errors/unknown_column2.q.out
 
[junit] Done query: unknown_column2.q
[junit] Begin query: unknown_column3.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/build/ql/test/logs/negative/unknown_column3.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/ql/src/test/results/compiler/errors/unknown_column3.q.out
 
[junit] Done query: unknown_column3.q
[junit] Begin query: unknown_column4.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/build/ql/test/logs/negative/unknown_column4.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/ql/src/test/results/compiler/errors/unknown_column4.q.out
 
[junit] Done query: unknown_column4.q
[junit] Begin query: unknown_column5.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/build/ql/test/logs/negative/unknown_column5.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/ql/src/test/results/compiler/errors/unknown_column5.q.out
 
[junit] Done query: unknown_column5.q
[junit] Begin query: unknown_column6.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/build/ql/test/logs/negative/unknown_column6.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/ws/hive/ql/src/test/results/compiler/errors/unknown_column6.q.out
 
[junit] Done query: unknown_column6.q
[junit] Begin query: unknown_function1.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=1

JIRA_hive.426.1.patch_UNIT_TEST_SUCCEEDED

2009-04-16 Thread Murli Varadachari

SUCCESS: BUILD AND UNIT TEST using PATCH hive.426.1.patch PASSED!!



Build failed in Hudson: Hive-trunk-h0.17 #64

2009-04-16 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/64/changes

Changes:

[zshao] HIVE-423. Change launch templates to use hive_model.jar. (Raghotham 
Murthy via zshao)

[zshao] HIVE-421. Fix union followed by multi-table insert. (Namit Jain via 
zshao).

--
[...truncated 25113 lines...]
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/ql/test/logs/negative/unknown_column2.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/ql/src/test/results/compiler/errors/unknown_column2.q.out
 
[junit] Done query: unknown_column2.q
[junit] Begin query: unknown_column3.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/ql/test/logs/negative/unknown_column3.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/ql/src/test/results/compiler/errors/unknown_column3.q.out
 
[junit] Done query: unknown_column3.q
[junit] Begin query: unknown_column4.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/ql/test/logs/negative/unknown_column4.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/ql/src/test/results/compiler/errors/unknown_column4.q.out
 
[junit] Done query: unknown_column4.q
[junit] Begin query: unknown_column5.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/ql/test/logs/negative/unknown_column5.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/ql/src/test/results/compiler/errors/unknown_column5.q.out
 
[junit] Done query: unknown_column5.q
[junit] Begin query: unknown_column6.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=12}
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table srcbucket
[junit] OK
[junit] Loading data to table src
[junit] OK
[junit] diff 
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/build/ql/test/logs/negative/unknown_column6.q.out
  
http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/ws/hive/ql/src/test/results/compiler/errors/unknown_column6.q.out
 
[junit] Done query: unknown_column6.q
[junit] Begin query: unknown_function1.q
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-08, hr=12}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=11}
[junit] OK
[junit] Loading data to table srcpart partition {ds=2008-04-09, hr=1

[jira] Commented: (HIVE-416) Get rid of backtrack in Hive.g

2009-04-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699840#action_12699840
 ] 

Namit Jain commented on HIVE-416:
-

I had some questions.

1. Won't it be better to factor out the common part into a separate rule 
instead of providing the look-ahead?

For example, instead of:

alterStatement
options {k=5;}
@init { msgs.push("alter statement"); }
@after { msgs.pop(); }
: alterStatementRename
| alterStatementAddCol
| alterStatementDropPartitions
| alterStatementAddPartitions
| alterStatementProperties
| alterStatementSerdeProperties
;


wouldn't it be better to factor out <ALTER TABLE identifier> into a common 
rule and then have the remaining rules?


2. On the same lines, I could not understand the reason for brackets around 
SELECT TRANSFORM and no brackets around MAP/REDUCE.


Instead of this:

selectClause
@init { msgs.push("select clause"); }
@after { msgs.pop(); }
:
KW_SELECT (KW_ALL | dist=KW_DISTINCT)?
selectList -> {$dist == null}? ^(TOK_SELECT selectList)
   ->  ^(TOK_SELECTDI selectList)
|
trfmClause  ->^(TOK_SELECT ^(TOK_SELEXPR trfmClause) )
;



if we factor out KW_SELECT for both the first part and the transform clause, 
the brackets should become optional.


Am I missing something here?

> Get rid of backtrack in Hive.g
> --
>
> Key: HIVE-416
> URL: https://issues.apache.org/jira/browse/HIVE-416
> Project: Hadoop Hive
>  Issue Type: Improvement
>  Components: Query Processor
>Affects Versions: 0.4.0
>Reporter: Zheng Shao
>Assignee: Zheng Shao
> Fix For: 0.4.0
>
> Attachments: HIVE-416.1.patch
>
>
> Hive.g still uses "backtrack=true". "backtrack" not only slows down the 
> parsing in case of an error, it can also produce wrong syntax error messages 
> (usually based on the last try of the backtracking).
> We should follow 
> http://www.antlr.org/wiki/display/ANTLR3/How+to+remove+global+backtracking+from+your+grammar
> to remove the need for backtracking.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



JIRA_hive.421.2.branch.patch_FAILED_TO_APPLY_PATCH

2009-04-16 Thread Murli Varadachari

Summary: This patch from JIRA, hive.421.2.branch.patch, failed to apply to the 
Apache Hive sources.

17 out of 32 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/join2.q.xml.rej
15 out of 16 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/input2.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/join3.q.xml.rej
17 out of 18 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/input3.q.xml.rej
9 out of 18 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/join4.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/input4.q.xml.rej
9 out of 18 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/join5.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/input5.q.xml.rej
9 out of 18 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/join6.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/input_testxpath2.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/input6.q.xml.rej
4 out of 14 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/join7.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/input7.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/input8.q.xml.rej
9 out of 18 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/join8.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/input_testsequencefile.q.xml.rej
5 out of 6 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/union.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/input9.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/udf1.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/udf4.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/input_testxpath.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/udf6.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/input_part1.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/groupby1.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/groupby2.q.xml.rej
5 out of 6 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/subq.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/groupby3.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/groupby4.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/groupby5.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/groupby6.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/case_sensitivity.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/input20.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/sample1.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/sample2.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/sample3.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/sample4.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/sample5.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/sample6.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/sample7.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/cast1.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/join1.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file 
ql/src/test/results/compiler/plan/input1.q.xml.rej
Reversed (or previously applied) patch detected!  Skipping patch.
6 out of 6 hunks ignored -- saving rejects to file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRProcContext.java.rej
Reversed (or previously applied) patch detected!  Skipping patch.
4 out of 4 hunks ignored -- saving rejects to file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java.rej
Reversed (or previously applied) patch detected!  Skipping patch.
1 out of 1 hunk ignored -- saving rejects to file 
ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRRe

[jira] Commented: (HIVE-192) Cannot create table with timestamp type column

2009-04-16 Thread Shyam Sundar Sarkar (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699818#action_12699818
 ] 

Shyam Sundar Sarkar commented on HIVE-192:
--

I was following the Hive Developer Guide and found that one important section 
is missing: section "3.4. Adding new unit tests" has no instructions about how 
to add a new unit test. I had to go through trial and error (with velocity 
templates) to add a new unit test to the test suite.

I request that someone from the original test suite design team write a few 
words for this important subsection.

Regards,
shyam_sar...@yahoo.com


> Cannot create table with timestamp type column
> --
>
> Key: HIVE-192
> URL: https://issues.apache.org/jira/browse/HIVE-192
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.2.0
>Reporter: Johan Oskarsson
> Fix For: 0.4.0
>
> Attachments: create_2.q.txt, TIMESTAMP_specification.txt
>
>
> create table something2 (test timestamp);
> ERROR: DDL specifying type timestamp which has not been defined
> java.lang.RuntimeException: specifying type timestamp which has not been 
> defined
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.FieldType(thrift_grammar.java:1879)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Field(thrift_grammar.java:1545)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.FieldList(thrift_grammar.java:1501)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Struct(thrift_grammar.java:1171)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.TypeDefinition(thrift_grammar.java:497)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Definition(thrift_grammar.java:439)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Start(thrift_grammar.java:101)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe.initialize(DynamicSerDe.java:97)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:180)
>   at org.apache.hadoop.hive.ql.metadata.Table.initSerDe(Table.java:141)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:202)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:641)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:98)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:215)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:174)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:207)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:305)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-192) Cannot create table with timestamp type column

2009-04-16 Thread Shyam Sundar Sarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shyam Sundar Sarkar updated HIVE-192:
-

Comment: was deleted

(was: This is the diff file showing the changes in the Hive.g grammar with the 
new TimestampType added.

Thanks,
shyam_sar...@yahoo.com)

> Cannot create table with timestamp type column
> --
>
> Key: HIVE-192
> URL: https://issues.apache.org/jira/browse/HIVE-192
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.2.0
>Reporter: Johan Oskarsson
> Fix For: 0.4.0
>
> Attachments: create_2.q.txt, TIMESTAMP_specification.txt
>
>
> create table something2 (test timestamp);
> ERROR: DDL specifying type timestamp which has not been defined
> java.lang.RuntimeException: specifying type timestamp which has not been 
> defined
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.FieldType(thrift_grammar.java:1879)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Field(thrift_grammar.java:1545)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.FieldList(thrift_grammar.java:1501)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Struct(thrift_grammar.java:1171)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.TypeDefinition(thrift_grammar.java:497)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Definition(thrift_grammar.java:439)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Start(thrift_grammar.java:101)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe.initialize(DynamicSerDe.java:97)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:180)
>   at org.apache.hadoop.hive.ql.metadata.Table.initSerDe(Table.java:141)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:202)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:641)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:98)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:215)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:174)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:207)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:305)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-192) Cannot create table with timestamp type column

2009-04-16 Thread Shyam Sundar Sarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shyam Sundar Sarkar updated HIVE-192:
-

Comment: was deleted

(was: Functional test for Timestamp.)

> Cannot create table with timestamp type column
> --
>
> Key: HIVE-192
> URL: https://issues.apache.org/jira/browse/HIVE-192
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.2.0
>Reporter: Johan Oskarsson
> Fix For: 0.4.0
>
> Attachments: create_2.q.txt, TIMESTAMP_specification.txt
>
>
> create table something2 (test timestamp);
> ERROR: DDL specifying type timestamp which has not been defined
> java.lang.RuntimeException: specifying type timestamp which has not been 
> defined
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.FieldType(thrift_grammar.java:1879)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Field(thrift_grammar.java:1545)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.FieldList(thrift_grammar.java:1501)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Struct(thrift_grammar.java:1171)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.TypeDefinition(thrift_grammar.java:497)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Definition(thrift_grammar.java:439)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Start(thrift_grammar.java:101)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe.initialize(DynamicSerDe.java:97)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:180)
>   at org.apache.hadoop.hive.ql.metadata.Table.initSerDe(Table.java:141)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:202)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:641)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:98)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:215)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:174)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:207)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:305)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-192) Cannot create table with timestamp type column

2009-04-16 Thread Shyam Sundar Sarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shyam Sundar Sarkar updated HIVE-192:
-

Comment: was deleted

(was: Can someone please help me find out why I am getting an exception in the 
setUp() method inside the TestCliTimestampDriver.java file (attached)?
I followed all the lines and methods from the existing CliDriver test class in 
Hive and modified them just to test TIMESTAMP syntax in a few queries.
I stepped through setUp() under the debugger, and it gave an error in QTestUtil 
at the line:

private String tmpdir =  System.getProperty("user.dir")+"/../build/ql/tmp";

where "user.dir" was the Hive home directory (not inside the build directory).

If I run the general CliDriver tests and then try to run my test for TIMESTAMP, 
the above exception does not show up.
However, I then get an exception at the line:

testFiles = conf.get("test.data.files").replace('\\', '/').replace("c:", "");

inside the QTestUtil constructor.

My question: why am I getting a setUp() exception when I do not need a data 
file? Can someone suggest a specific step that I am missing?

Thanks,
shyam_sar...@yahoo.com)
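
As a hedged aside on the failures described above: both values QTestUtil reads, 
"user.dir" and "test.data.files", are ordinary JVM system properties that the 
ant test targets normally inject, so a standalone run can seed them before 
constructing QTestUtil. Only the two property names and the quoted failing 
lines are taken from the comment above; the bootstrap class and the fallback 
path below are illustrative assumptions, not the test suite's documented setup.

{noformat}
// Hedged sketch: pre-seed the JVM properties that QTestUtil reads, so a
// standalone run (outside ant) does not NPE in the constructor.
// "user.dir" and "test.data.files" are quoted from the comment above;
// the fallback path is an illustrative assumption, not Hive's default.
public class QTestUtilBootstrap {
  public static void main(String[] args) {
    if (System.getProperty("test.data.files") == null) {
      // ant normally injects this; supply it explicitly when running alone
      System.setProperty("test.data.files",
          System.getProperty("user.dir") + "/data/files");
    }
    // conf.get("test.data.files") in QTestUtil then returns a non-null
    // path, and the .replace('\\', '/') call no longer throws.
    System.out.println("test.data.files = "
        + System.getProperty("test.data.files"));
  }
}
{noformat}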

> Cannot create table with timestamp type column
> --
>
> Key: HIVE-192
> URL: https://issues.apache.org/jira/browse/HIVE-192
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.2.0
>Reporter: Johan Oskarsson
> Fix For: 0.4.0
>
> Attachments: create_2.q.txt, TIMESTAMP_specification.txt
>
>
> create table something2 (test timestamp);
> ERROR: DDL specifying type timestamp which has not been defined
> java.lang.RuntimeException: specifying type timestamp which has not been 
> defined
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.FieldType(thrift_grammar.java:1879)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Field(thrift_grammar.java:1545)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.FieldList(thrift_grammar.java:1501)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Struct(thrift_grammar.java:1171)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.TypeDefinition(thrift_grammar.java:497)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Definition(thrift_grammar.java:439)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Start(thrift_grammar.java:101)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe.initialize(DynamicSerDe.java:97)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:180)
>   at org.apache.hadoop.hive.ql.metadata.Table.initSerDe(Table.java:141)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:202)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:641)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:98)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:215)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:174)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:207)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:305)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-192) Cannot create table with timestamp type column

2009-04-16 Thread Shyam Sundar Sarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shyam Sundar Sarkar updated HIVE-192:
-

Comment: was deleted

(was: I added functional test cases for TIMESTAMP. Can someone suggest more 
test cases?

The Java code for the test driver is attached:

(/hive/build/ql/test/src/org/apache/hadoop/hive/cli/TestCliTimestampDriver.java)

Can someone please tell me how I can get results and logs for the following 
call:

qt = new QTestUtil("/home/ssarkar/hive/ql/src/test/results/clientpositive", 
"/home/ssarkar/hive/build/ql/test/logs/clientpositive");

I am getting an exception.

At this point, can I add arbitrary results and log files?

Thanks,
shyam_sar...@yahoo.com

)
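
One hedged guess at that exception: the absolute /home/ssarkar paths only 
exist on one machine, and the log directory may not exist yet when QTestUtil 
tries to write into it. Below is a sketch that derives both directories from 
the checkout root and creates the log directory first. Only the two-argument 
QTestUtil constructor is taken from the call above; everything else, including 
the package name in the import, is an assumption.

{noformat}
// Hedged sketch: derive the results/logs directories instead of hard-coding
// /home/ssarkar/..., and ensure the log directory exists before QTestUtil
// uses it. Only the two-argument constructor is taken from the comment
// above; the path layout and package name are assumed.
import java.io.File;

import org.apache.hadoop.hive.ql.QTestUtil;  // package assumed

public class TimestampQTestSetup {
  public static QTestUtil newQTestUtil() throws Exception {
    String root = System.getProperty("user.dir");  // hive checkout root
    String results = root + "/ql/src/test/results/clientpositive";
    String logs = root + "/build/ql/test/logs/clientpositive";
    new File(logs).mkdirs();  // result/log files can then be written here
    return new QTestUtil(results, logs);
  }
}
{noformat}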

> Cannot create table with timestamp type column
> --
>
> Key: HIVE-192
> URL: https://issues.apache.org/jira/browse/HIVE-192
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.2.0
>Reporter: Johan Oskarsson
> Fix For: 0.4.0
>
> Attachments: create_2.q.txt, TIMESTAMP_specification.txt
>
>
> create table something2 (test timestamp);
> ERROR: DDL specifying type timestamp which has not been defined
> java.lang.RuntimeException: specifying type timestamp which has not been 
> defined
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.FieldType(thrift_grammar.java:1879)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Field(thrift_grammar.java:1545)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.FieldList(thrift_grammar.java:1501)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Struct(thrift_grammar.java:1171)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.TypeDefinition(thrift_grammar.java:497)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Definition(thrift_grammar.java:439)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Start(thrift_grammar.java:101)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe.initialize(DynamicSerDe.java:97)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:180)
>   at org.apache.hadoop.hive.ql.metadata.Table.initSerDe(Table.java:141)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:202)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:641)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:98)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:215)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:174)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:207)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:305)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-192) Cannot create table with timestamp type column

2009-04-16 Thread Shyam Sundar Sarkar (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shyam Sundar Sarkar updated HIVE-192:
-

Attachment: (was: TestCliTimestampDriver.java.txt)

> Cannot create table with timestamp type column
> --
>
> Key: HIVE-192
> URL: https://issues.apache.org/jira/browse/HIVE-192
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.2.0
>Reporter: Johan Oskarsson
> Fix For: 0.4.0
>
> Attachments: create_2.q.txt, TIMESTAMP_specification.txt
>
>
> create table something2 (test timestamp);
> ERROR: DDL specifying type timestamp which has not been defined
> java.lang.RuntimeException: specifying type timestamp which has not been 
> defined
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.FieldType(thrift_grammar.java:1879)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Field(thrift_grammar.java:1545)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.FieldList(thrift_grammar.java:1501)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Struct(thrift_grammar.java:1171)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.TypeDefinition(thrift_grammar.java:497)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Definition(thrift_grammar.java:439)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Start(thrift_grammar.java:101)
>   at 
> org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe.initialize(DynamicSerDe.java:97)
>   at 
> org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:180)
>   at org.apache.hadoop.hive.ql.metadata.Table.initSerDe(Table.java:141)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:202)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:641)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:98)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:215)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:174)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:207)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:305)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-426) undeterministic query results since aliasToWork in mapredWork is a hashmap

2009-04-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-426:


Attachment: hive.426.1.patch

> undeterministic query results since aliasToWork in mapredWork is a hashmap
> --
>
> Key: HIVE-426
> URL: https://issues.apache.org/jira/browse/HIVE-426
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.426.1.patch
>
>
> undeterministic query results since aliasToWork in mapredWork is a hashmap

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-426) undeterministic query results since aliasToWork in mapredWork is a hashmap

2009-04-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-426:


Status: Patch Available  (was: Open)

> undeterministic query results since aliasToWork in mapredWork is a hashmap
> --
>
> Key: HIVE-426
> URL: https://issues.apache.org/jira/browse/HIVE-426
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
> Attachments: hive.426.1.patch
>
>
> undeterministic query results since aliasToWork in mapredWork is a hashmap

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-421) union followed by multi-table insert does not work properly

2009-04-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-421:


Attachment: hive.421.2.branch.patch

> union followed by multi-table insert does not work properly
> ---
>
> Key: HIVE-421
> URL: https://issues.apache.org/jira/browse/HIVE-421
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Reporter: Namit Jain
>Assignee: Namit Jain
>Priority: Critical
> Attachments: hive.421.1.patch, hive.421.2.branch.patch, 
> hive.421.2.patch
>
>
> Like jira 413, multi-table inserts has some problems with unions.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-426) undeterministic query results since aliasToWork in mapredWork is a hashmap

2009-04-16 Thread Namit Jain (JIRA)
undeterministic query results since aliasToWork in mapredWork is a hashmap
--

 Key: HIVE-426
 URL: https://issues.apache.org/jira/browse/HIVE-426
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Reporter: Namit Jain
Assignee: Namit Jain


undeterministic query results since aliasToWork in mapredWork is a hashmap
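
The root cause is a familiar one: java.util.HashMap makes no guarantee about 
iteration order, so any code that walks aliasToWork may visit the plan's 
operators in a different order from run to run. Below is a minimal, 
self-contained illustration; using LinkedHashMap as the order-preserving 
replacement is an assumption about the fix, not something taken from the 
attached patch.

{noformat}
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// HashMap iteration order depends on hash buckets, not insertion order,
// so traversing aliasToWork can differ between runs and JVM versions.
// LinkedHashMap preserves insertion order, which makes traversal stable.
public class AliasOrderDemo {
  public static void main(String[] args) {
    Map<String, String> hash = new HashMap<String, String>();
    Map<String, String> linked = new LinkedHashMap<String, String>();
    for (String alias : new String[] {"src", "b", "a2", "subq1"}) {
      hash.put(alias, "work-for-" + alias);
      linked.put(alias, "work-for-" + alias);
    }
    System.out.println(hash.keySet());    // bucket order, e.g. [b, a2, subq1, src]
    System.out.println(linked.keySet());  // always [src, b, a2, subq1]
  }
}
{noformat}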

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



JIRA_hive.404.2.patch_UNIT_TEST_SUCCEEDED

2009-04-16 Thread Murli Varadachari

SUCCESS: BUILD AND UNIT TEST using PATCH hive.404.2.patch PASSED!!



[jira] Created: (HIVE-425) HWI JSP pages should be compiled at build-time instead of run-time

2009-04-16 Thread Alex Loddengaard (JIRA)
HWI JSP pages should be compiled at build-time instead of run-time
--

 Key: HIVE-425
 URL: https://issues.apache.org/jira/browse/HIVE-425
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Web UI
Reporter: Alex Loddengaard


HWI JSP pages are compiled via the ant jar at run-time.  Compiling at run-time 
requires ant as a dependency and also makes development slightly trickier, as 
compiler errors are not discovered until HWI is deployed and running.  HWI 
should be instrumented so that the JSP pages are compiled by ant at build-time 
instead, just as the Hadoop status pages are.
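
For a sense of what build-time compilation could look like, here is a rough 
sketch driving Jasper's JspC (the compiler behind ant's jspc task) 
programmatically. The setter and execute() names follow Tomcat's JspC API and 
the paths are illustrative; treat all of it as assumptions to verify against 
whatever Jasper version Hive would bundle.

{noformat}
// Hedged sketch: precompile HWI's JSP pages at build time via Jasper's
// JspC instead of compiling them when the server starts. Paths are
// illustrative; the JspC method names are assumed from Tomcat's Jasper
// and should be verified against the bundled version.
import org.apache.jasper.JspC;

public class PrecompileHwiJsp {
  public static void main(String[] args) throws Exception {
    JspC jspc = new JspC();
    jspc.setUriroot("hwi/web");          // directory holding the .jsp sources
    jspc.setOutputDir("build/hwi/jsp");  // generated .java servlets go here
    jspc.setCompile(true);               // also compile them to .class files
    jspc.execute();                      // fails the build on JSP errors
  }
}
{noformat}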

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-404) Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"

2009-04-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-404:


Status: Patch Available  (was: Open)

incorporated Zheng's comments

> Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"
> 
>
> Key: HIVE-404
> URL: https://issues.apache.org/jira/browse/HIVE-404
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.3.0, 0.4.0
>Reporter: Zheng Shao
>Assignee: Namit Jain
> Attachments: hive.404.1.patch, hive.404.2.patch
>
>
> Unless the user specifies "set mapred.reduce.tasks=1;", he will see unexpected 
> results with the query "SELECT * FROM t SORT BY col1 LIMIT 100".
> Basically, in the first map-reduce job, each reducer will get sorted data and 
> keep only the first 100 rows. In the second map-reduce job, we distribute and 
> sort the data randomly before feeding it into a single reducer that outputs 
> the first 100.
> In short, the query will output 100 random records out of the N * 100 top 
> records (100 from each reducer of the first map-reduce job).
> This contradicts what people expect.
> We should propagate the SORT BY columns to the second map-reduce job.
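
To make the failure mode concrete, here is a minimal stand-alone simulation 
with hypothetical data (it is not Hive code): each first-stage reducer emits 
its sorted top 100, and a second stage that shuffles instead of re-sorting 
returns an arbitrary subset of those candidates rather than the true top 100.

{noformat}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hedged simulation (hypothetical data, not Hive code) of the bug above:
// each first-stage reducer emits its sorted top N; a second stage that
// redistributes randomly before LIMIT N returns an arbitrary subset,
// while re-sorting on the SORT BY columns returns the true top N.
public class SortByLimitDemo {
  public static void main(String[] args) {
    int n = 100, reducers = 4;
    Random rnd = new Random();
    List<Integer> candidates = new ArrayList<Integer>();
    for (int r = 0; r < reducers; r++) {
      List<Integer> partition = new ArrayList<Integer>();
      for (int i = 0; i < 1000; i++) partition.add(rnd.nextInt(1000000));
      Collections.sort(partition);                 // per-reducer SORT BY
      candidates.addAll(partition.subList(0, n));  // per-reducer LIMIT N
    }
    List<Integer> buggy = new ArrayList<Integer>(candidates);
    Collections.shuffle(buggy, rnd);               // random redistribution
    buggy = new ArrayList<Integer>(buggy.subList(0, n));
    List<Integer> fixed = new ArrayList<Integer>(candidates);
    Collections.sort(fixed);                       // propagate SORT BY cols
    fixed = new ArrayList<Integer>(fixed.subList(0, n));
    Collections.sort(buggy);  // compare the two as sets of rows
    System.out.println("same top-" + n + " rows? " + buggy.equals(fixed));
  }
}
{noformat}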

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-404) Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"

2009-04-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-404:


Status: Open  (was: Patch Available)

> Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"
> 
>
> Key: HIVE-404
> URL: https://issues.apache.org/jira/browse/HIVE-404
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.3.0, 0.4.0
>Reporter: Zheng Shao
>Assignee: Namit Jain
> Attachments: hive.404.1.patch, hive.404.2.patch
>
>
> Unless the user specifies "set mapred.reduce.tasks=1;", he will see unexpected 
> results with the query "SELECT * FROM t SORT BY col1 LIMIT 100".
> Basically, in the first map-reduce job, each reducer will get sorted data and 
> keep only the first 100 rows. In the second map-reduce job, we distribute and 
> sort the data randomly before feeding it into a single reducer that outputs 
> the first 100.
> In short, the query will output 100 random records out of the N * 100 top 
> records (100 from each reducer of the first map-reduce job).
> This contradicts what people expect.
> We should propagate the SORT BY columns to the second map-reduce job.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-404) Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"

2009-04-16 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-404:


Attachment: hive.404.2.patch

> Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"
> 
>
> Key: HIVE-404
> URL: https://issues.apache.org/jira/browse/HIVE-404
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.3.0, 0.4.0
>Reporter: Zheng Shao
>Assignee: Namit Jain
> Attachments: hive.404.1.patch, hive.404.2.patch
>
>
> Unless the user specifies "set mapred.reduce.tasks=1;", he will see unexpected 
> results with the query "SELECT * FROM t SORT BY col1 LIMIT 100".
> Basically, in the first map-reduce job, each reducer will get sorted data and 
> keep only the first 100 rows. In the second map-reduce job, we distribute and 
> sort the data randomly before feeding it into a single reducer that outputs 
> the first 100.
> In short, the query will output 100 random records out of the N * 100 top 
> records (100 from each reducer of the first map-reduce job).
> This contradicts what people expect.
> We should propagate the SORT BY columns to the second map-reduce job.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-404) Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"

2009-04-16 Thread Namit Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699737#action_12699737
 ] 

Namit Jain commented on HIVE-404:
-

1. Will do - the distributeBy check is not needed.
2. It creates a second map-reduce job if it is not a query, so inserts are not 
a problem. The sort order is propagated to the second map-reduce job.
I don't think FetchTask needs any change - the current change adds a 
second map-reduce job, which should fix the problem.

I will make the changes and reload the patch.


> Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"
> 
>
> Key: HIVE-404
> URL: https://issues.apache.org/jira/browse/HIVE-404
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.3.0, 0.4.0
>Reporter: Zheng Shao
>Assignee: Namit Jain
> Attachments: hive.404.1.patch
>
>
> Unless the user specifies "set mapred.reduce.tasks=1;", he will see unexpected 
> results with the query "SELECT * FROM t SORT BY col1 LIMIT 100".
> Basically, in the first map-reduce job, each reducer will get sorted data and 
> keep only the first 100 rows. In the second map-reduce job, we distribute and 
> sort the data randomly before feeding it into a single reducer that outputs 
> the first 100.
> In short, the query will output 100 random records out of the N * 100 top 
> records (100 from each reducer of the first map-reduce job).
> This contradicts what people expect.
> We should propagate the SORT BY columns to the second map-reduce job.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-352) Make Hive support column based storage

2009-04-16 Thread He Yongqiang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

He Yongqiang updated HIVE-352:
--

Attachment: hive-352-2009-4-16.patch

against the latest trunk.

1) Added a simple rcfile_columar.q test file:
{noformat}
DROP TABLE columnTable;
CREATE table columnTable (key STRING, value STRING)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.ColumnarSerDe'
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat';

FROM src
INSERT OVERWRITE TABLE columnTable SELECT src.key, src.value LIMIT 10;
describe columnTable;

SELECT columnTable.* FROM columnTable;
{noformat}

2) Made ColumnarSerDe's serialize return BytesRefArrayWritable instead of Text 
(a hedged sketch of the returned structure follows below).

BTW, it seems rcfile_columar.q.out does not contain the results of SELECT 
columnTable.* FROM columnTable;, 
but after the test I saw the file 
ql/test/data/warehouse/columntable/attempt_local_0001_r_00_0, and it did 
contain the inserted data. 
Why did the select return nothing?
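
To make change 2) concrete, here is a minimal sketch of the row container that 
serialize now hands back: one BytesRefWritable byte range per column, collected 
in a BytesRefArrayWritable so RCFileOutputFormat can regroup the cells by 
column. The class and package names follow later Hive trunk and may differ in 
this particular patch; treat them as assumptions.

{noformat}
// Hedged sketch: build the per-row, per-column container that
// ColumnarSerDe.serialize is said to return. Class/package names are
// assumed from later Hive trunk and may differ in this patch.
import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
import org.apache.hadoop.hive.serde2.columnar.BytesRefWritable;

public class ColumnarRowSketch {
  public static BytesRefArrayWritable toRow(byte[][] columnBytes) {
    BytesRefArrayWritable row = new BytesRefArrayWritable(columnBytes.length);
    for (int i = 0; i < columnBytes.length; i++) {
      // each cell is a byte range; RCFileOutputFormat groups cells by column
      row.set(i, new BytesRefWritable(columnBytes[i]));
    }
    return row;
  }
}
{noformat}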

> Make Hive support column based storage
> --
>
> Key: HIVE-352
> URL: https://issues.apache.org/jira/browse/HIVE-352
> Project: Hadoop Hive
>  Issue Type: New Feature
>Reporter: He Yongqiang
> Attachments: hive-352-2009-4-15.patch, hive-352-2009-4-16.patch, 
> HIve-352-draft-2009-03-28.patch, Hive-352-draft-2009-03-30.patch
>
>
> Column based storage has been proven a better storage layout for OLAP. 
> Hive does a great job on raw row-oriented storage. In this issue, we will 
> enhance Hive to support column based storage. 
> Actually, we have done some work on column based storage on top of HDFS; I 
> think it will need some review and refactoring to port it to Hive.
> Any thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-424) PartitionPruner fails with "WHERE ds = '2009-03-01'"

2009-04-16 Thread Zheng Shao (JIRA)
PartitionPruner fails with "WHERE ds = '2009-03-01'"


 Key: HIVE-424
 URL: https://issues.apache.org/jira/browse/HIVE-424
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Query Processor
Affects Versions: 0.3.1, 0.4.0
Reporter: Zheng Shao


The PartitionPruner will output an "Unknown Exception: null" when the condition 
in the WHERE clause contains fields with no table aliases.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-423) change launch templates to use hive_model.jar

2009-04-16 Thread Zheng Shao (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated HIVE-423:


  Resolution: Fixed
Release Note: HIVE-423. Change launch templates to use hive_model.jar. 
(Raghotham Murthy via zshao)
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Committed to both trunk and branch-0.3. Thanks Raghu!

> change launch templates to use hive_model.jar
> -
>
> Key: HIVE-423
> URL: https://issues.apache.org/jira/browse/HIVE-423
> Project: Hadoop Hive
>  Issue Type: Bug
>  Components: Testing Infrastructure
>Reporter: Raghotham Murthy
>Assignee: Raghotham Murthy
>Priority: Minor
> Fix For: 0.4.0
>
> Attachments: hive-423.1.patch
>
>
> the model-jar target now builds hive_model.jar instead of 
> metastore_model.jar. This causes the launch files (used to run tests) to no 
> longer work in Eclipse.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



JIRA_hive-423.1.patch_UNIT_TEST_FAILED

2009-04-16 Thread Murli Varadachari

ERROR: UNIT TEST using PATCH hive-423.1.patch FAILED!!

[junit] Test org.apache.hadoop.hive.cli.TestCliDriver FAILED
[junit] < FAILED: Execution Error, return code 2 from 
org.apache.hadoop.hive.ql.exec.MapRedTask
[junit] Test org.apache.hadoop.hive.ql.TestMTQueries FAILED
BUILD FAILED


*UNIT TEST FAILURE for apache HIVE* Hadoop.Version=0.17.1 based on SVN Rev# 765482.123

2009-04-16 Thread Murli Varadachari
[junit] < FAILED: Parse Error: line 1:32 cannot recognize input '.' in 
table column identifier
[junit] > FAILED: Parse Error: line 1:32 cannot recognize input '.' in 
expression specification
[junit] Test org.apache.hadoop.hive.cli.TestNegativeCliDriver FAILED
BUILD FAILED
[junit] Test org.apache.hadoop.hive.cli.TestCliDriver FAILED
[junit] < FAILED: Parse Error: line 1:32 cannot recognize input '.' in 
table column identifier
[junit] > FAILED: Parse Error: line 1:32 cannot recognize input '.' in 
expression specification
[junit] Test org.apache.hadoop.hive.cli.TestNegativeCliDriver FAILED
BUILD FAILED