[jira] Updated: (HIVE-352) Make Hive support column based storage
[ https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-352:
------------------------------
    Attachment: hive-352-2009-4-17.patch

Fixed the select problem and refactored the TestRCFile class.

> Make Hive support column based storage
> --
>
>                 Key: HIVE-352
>                 URL: https://issues.apache.org/jira/browse/HIVE-352
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: He Yongqiang
>         Attachments: hive-352-2009-4-15.patch, hive-352-2009-4-16.patch, hive-352-2009-4-17.patch, HIve-352-draft-2009-03-28.patch, Hive-352-draft-2009-03-30.patch
>
> Column based storage has been proven to be a better storage layout for OLAP. Hive does a great job on raw row-oriented storage. In this issue, we will enhance Hive to support column based storage.
> Actually we have done some work on column based storage on top of HDFS; I think it will need some review and refactoring to port it to Hive.
> Any thoughts?
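For readers new to the idea, a toy sketch of what column based storage buys: within a row group, the values of one column are laid out contiguously, so an OLAP scan that touches a single column reads one run of bytes instead of skipping through every row. The class here is invented for illustration and is not the layout in the attached patches.

{{{
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.List;

// Toy row-group writer: values are written in column-major order, so all
// values of column c sit next to each other on disk and a one-column scan
// reads a single contiguous region.
public class ColumnGroupWriter {
    public static void writeGroup(DataOutputStream out, List<String[]> rows,
                                  int numCols) throws IOException {
        out.writeInt(rows.size());               // row count for this group
        for (int c = 0; c < numCols; c++) {      // column-major order
            for (String[] row : rows) {
                out.writeUTF(row[c]);            // all of column c is adjacent
            }
        }
    }
}
}}}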
[jira] Commented: (HIVE-416) Get rid of backtrack in Hive.g
[ https://issues.apache.org/jira/browse/HIVE-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700013#action_12700013 ]

Zheng Shao commented on HIVE-416:
---------------------------------

@Raghu: I just had a second thought on that approach. The new production you add is left-recursive, which is not permitted in LL(k), but it's possible to use precedence rules to fix that. However, given all the changes, including the flattening, it seems to me that's too much work for very little benefit - who cares about the optional brackets? For usage it's exactly the same.

Venky's case is a separate problem. It is more like supporting "(a)" and "a". We should be able to support it easily once we allow omitting the subquery alias.

> Get rid of backtrack in Hive.g
> --
>
>                 Key: HIVE-416
>                 URL: https://issues.apache.org/jira/browse/HIVE-416
>             Project: Hadoop Hive
>          Issue Type: Improvement
>          Components: Query Processor
>    Affects Versions: 0.4.0
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 0.4.0
>
>         Attachments: HIVE-416.1.1.patch, HIVE-416.1.patch
>
> Hive.g still uses "backtrack=true". "backtrack" not only slows down the parsing in case of error, it can also produce wrong syntax error messages (usually based on the last try of the backtracking).
> We should follow http://www.antlr.org/wiki/display/ANTLR3/How+to+remove+global+backtracking+from+your+grammar to remove the need for backtracking.
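For readers following the grammar discussion, a sketch of the two forms being weighed. The rule names are illustrative, not the actual Hive.g rules.

{{{
// Left-recursive form (as proposed): an ANTLR 3 LL(k) parser rejects this
// without precedence tricks, and it builds a left-deep comma tree.
expr : expr ',' expr
     | '(' expr ')'
     | atom
     ;

// Right-factored equivalent: LL(k)-friendly and yields a flat list directly,
// with no separate flattening pass needed.
exprList : expr ( ',' expr )* ;
}}}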
[jira] Commented: (HIVE-402) Create regexp_extract udf
[ https://issues.apache.org/jira/browse/HIVE-402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1272#action_1272 ]

Namit Jain commented on HIVE-402:
---------------------------------

1. Can you remove LOG.warn("here please"); from evaluate?
2. Do you want to make the last parameter, extractIndex, optional?

Otherwise, it looks good. Do we need to backport it to branch-0.3 also?

> Create regexp_extract udf
> --
>
>                 Key: HIVE-402
>                 URL: https://issues.apache.org/jira/browse/HIVE-402
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Raghotham Murthy
>            Assignee: Raghotham Murthy
>         Attachments: hive-402.1.patch
>
> This will allow users to extract substrings from a string based on a regular expression.
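For reference, a minimal plain-Java sketch of the UDF's semantics with the third parameter made optional, as question 2 suggests. This is illustrative, not the code in hive-402.1.patch; the empty-string fallback on no match is an assumption of the sketch.

{{{
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of regexp_extract semantics with an optional extractIndex:
// the two-argument overload defaults to capture group 1.
public class RegExpExtractSketch {
    public static String evaluate(String s, String regex) {
        return evaluate(s, regex, 1);    // default: first capture group
    }

    public static String evaluate(String s, String regex, int extractIndex) {
        if (s == null || regex == null) {
            return null;
        }
        Matcher m = Pattern.compile(regex).matcher(s);
        // Assumption for this sketch: no match yields an empty string.
        return m.find() ? m.group(extractIndex) : "";
    }
}
}}}

With such a default, evaluate("foo100", "foo(\\d+)") would behave like evaluate("foo100", "foo(\\d+)", 1) and return "100".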
[jira] Commented: (HIVE-428) Implement Map-side Hash-Join in Hive
[ https://issues.apache.org/jira/browse/HIVE-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699987#action_12699987 ]

He Yongqiang commented on HIVE-428:
-----------------------------------

Ohh, sorry. I am OK with closing this as a duplicate, or with merging them.
[jira] Commented: (HIVE-428) Implement Map-side Hash-Join in Hive
[ https://issues.apache.org/jira/browse/HIVE-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699978#action_12699978 ]

Prasad Chakka commented on HIVE-428:
------------------------------------

I think there is already a JIRA open for this, and a patch already exists... https://issues.apache.org/jira/browse/HIVE-195
[jira] Created: (HIVE-428) Implement Map-side Hash-Join in Hive
Implement Map-side Hash-Join in Hive
------------------------------------

                Key: HIVE-428
                URL: https://issues.apache.org/jira/browse/HIVE-428
            Project: Hadoop Hive
         Issue Type: New Feature
           Reporter: He Yongqiang

There are many situations where a join will perform much better if a map-side hash join is used. We ran a small test with a simple equi-join of two tables: a plain MR join with no map-side hash join takes about 50 seconds on a 6-node cluster (each node with 8 cores and 4 GB of memory). With the map-side hash join applied, it needs only about 15 seconds.

The map-side hash join can only be used when one input consists of small files that can be replicated to each map task. The map-side hash join can be co-executed with the map-side filter.

For example:

select A.a, A.c, B.b from A, B where A.a = B.d and A.a < 12 and B.b = 10

In our experiment, this statement can be translated into three different plans if both A and B are plain data files (with no special compression).

Plan 1: Map-Reduce. Both A and B are input for the map; the shuffle data involved is very large.

Plan 2:
1) First filter B on B.b into a temp file B1 - a separate map-only job.
2) Replicate B1 to each map task, filter A, and join them in the map. No reduce is used.

Plan 3: Produce a single job whose mappers filter A (so the mappers are assigned with regard to A only), and replicate B directly to each mapper. Before each mapper starts filtering A, filter B and load the rows that pass into memory; then start the mapper and join in memory.

Plan 3 performs better in our experiment because it saves a separate map-only job. But Plan 2 is suitable when B's original file is very large but its filtered output is much smaller.

This is the basic idea of the map-side hash join.
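A minimal sketch of the mapper-side mechanics shared by Plans 2 and 3: the small (already filtered) side B is loaded into an in-memory hash table keyed on the join column before any rows of A are read, and each mapper joins by probing that table, with no shuffle and no reduce phase. The class, method, and field names are invented; real Hive operators work on serialized rows rather than string arrays.

{{{
import java.util.HashMap;
import java.util.Map;

public class MapSideHashJoinSketch {
    // B.d -> B.b for rows that already passed the filter B.b = 10
    private final Map<String, String> smallTable = new HashMap<String, String>();

    // Called once per mapper, before filtering A starts.
    public void loadSmallTable(Iterable<String[]> filteredB) {
        for (String[] row : filteredB) {
            smallTable.put(row[0], row[1]);
        }
    }

    // Called for each row of A that passed the map-side filter A.a < 12;
    // row[0] holds A.a and row[1] holds A.c. Returns null when there is no match.
    public String joinRow(String[] row) {
        String b = smallTable.get(row[0]);   // probe on A.a = B.d
        return b == null ? null : row[0] + "\t" + row[1] + "\t" + b;
    }
}
}}}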
[jira] Commented: (HIVE-416) Get rid of backtrack in Hive.g
[ https://issues.apache.org/jira/browse/HIVE-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699957#action_12699957 ]

Raghotham Murthy commented on HIVE-416:
---------------------------------------

How about the following: add the production

{{{ expr -> expr ',' expr }}}

With this production and expr -> '(' expr ')' we can support arbitrarily nested parentheses. The issue is that the new production will create a left-deep tree of comma-expressions. We could implement a method which takes such a tree and flattens the comma expressions out into expression lists.

Also, I remember Venky asking for arbitrarily nested parentheses around queries for his query authoring tool. We could do something similar and create comma-query-expressions.
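A sketch of the flattening method mentioned above, assuming an invented Node type in place of the real AST classes: a left-deep tree of comma nodes such as ((a , b) , c) is walked down its left spine and emitted as the list [a, b, c].

{{{
import java.util.ArrayList;
import java.util.List;

public class CommaFlattenerSketch {
    static final int COMMA = 0;   // illustrative token type

    static class Node {
        int type;                 // COMMA for interior nodes
        Node left, right;         // children of a comma node
        String text;              // leaf expression
    }

    public static List<Node> flatten(Node root) {
        List<Node> exprs = new ArrayList<Node>();
        Node cur = root;
        while (cur != null && cur.type == COMMA) {
            exprs.add(0, cur.right);   // right child is always a plain expression
            cur = cur.left;            // descend the left spine
        }
        exprs.add(0, cur);             // leftmost leaf
        return exprs;
    }
}
}}}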
JIRA_hive.427.1.patch_UNIT_TEST_FAILED
ERROR: UNIT TEST using PATCH hive.427.1.patch FAILED!!

[junit] Test org.apache.hadoop.hive.cli.TestCliDriver FAILED

BUILD FAILED
JIRA_HIVE-416.1.1.patch_UNIT_TEST_SUCCEEDED
SUCCESS: BUILD AND UNIT TEST using PATCH HIVE-416.1.1.patch PASSED!!
[jira] Updated: (HIVE-427) configuration parameters missing in hive-default.xml
[ https://issues.apache.org/jira/browse/HIVE-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-427:
----------------------------
    Status: Patch Available  (was: Open)
[jira] Created: (HIVE-427) configuration parameters missing in hive-default.xml
configuration parameters missing in hive-default.xml
-----------------------------------------------------

                Key: HIVE-427
                URL: https://issues.apache.org/jira/browse/HIVE-427
            Project: Hadoop Hive
         Issue Type: Bug
           Reporter: Namit Jain
           Assignee: Namit Jain
        Attachments: hive.427.1.patch
[jira] Updated: (HIVE-427) configuration parameters missing in hive-default.xml
[ https://issues.apache.org/jira/browse/HIVE-427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-427:
----------------------------
    Attachment: hive.427.1.patch
[jira] Commented: (HIVE-416) Get rid of backtrack in Hive.g
[ https://issues.apache.org/jira/browse/HIVE-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699898#action_12699898 ]

Zheng Shao commented on HIVE-416:
---------------------------------

> About the comment on optional brackets - clearly these are optional in expressions. So how do we support those expressions, e.g. (a+b) and a+b are both valid SQL expressions, if we cannot support this without backtracking...

We do support "(a+b)" and "a+b". The problem is that there is no easy way of supporting both "(((a))), b)" and "a, b". No matter what k is, it's not possible to determine whether the first "(" is the optional bracket for the expression list, or just part of the first expression. I will need to go over the ANTLR book to learn more about semantic/syntactic predicates before I know whether that is possible.

> Identifier DOT Identifier.

Treating it as a lexical rule won't allow both T.a.b and a.b. I am making the first Identifier a TOK_TABLE_OR_COL, and I will let the SemanticAnalyzer decide whether it is a table name or a column name. Not sure whether that should go into the same patch or not, since it's a much bigger change.
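A sketch of the disambiguation the SemanticAnalyzer would perform on a TOK_TABLE_OR_COL node followed by a DOT access. The class and inputs are invented for illustration; the real analyzer consults its row resolver rather than a plain set of aliases.

{{{
import java.util.Set;

// Given "first.second": if "first" is a known table alias, the reference is
// column "second" of that table (as in T.a); otherwise it is field "second"
// of the complex-typed column "first" (as in a.b).
public class TableOrColResolverSketch {
    public static String resolve(String first, String second,
                                 Set<String> tableAliases) {
        if (tableAliases.contains(first)) {
            return "column '" + second + "' of table alias '" + first + "'";
        }
        return "field '" + second + "' of complex column '" + first + "'";
    }
}
}}}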
[jira] Updated: (HIVE-416) Get rid of backtrack in Hive.g
[ https://issues.apache.org/jira/browse/HIVE-416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-416:
----------------------------
    Attachment: HIVE-416.1.1.patch

Extracted the common prefix for ALTER TABLE, and removed all "{k=5}" options.
[jira] Commented: (HIVE-416) Get rid of backtrack in Hive.g
[ https://issues.apache.org/jira/browse/HIVE-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699873#action_12699873 ]

Ashish Thusoo commented on HIVE-416:
------------------------------------

I think for Identifier DOT Identifier we should probably treat it as a lexical rule rather than a grammar rule. It will also make it much simpler to support optional aliasing with complex types. Right now, select T.a.b FROM T and select a.b FROM T are very hard to handle in the SemanticAnalyzer, as the grammar treats a as a table alias instead of a complex column name.

About the comment on optional brackets - clearly these are optional in expressions. So how do we support those expressions - e.g. (a+b) and a+b are both valid SQL expressions - if we cannot support this without backtracking...
[jira] Commented: (HIVE-416) Get rid of backtrack in Hive.g
[ https://issues.apache.org/jira/browse/HIVE-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699874#action_12699874 ]

Ashish Thusoo commented on HIVE-416:
------------------------------------

Can we use semantic/syntactic predicates to support the optional brackets?
[jira] Commented: (HIVE-404) Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"
[ https://issues.apache.org/jira/browse/HIVE-404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699867#action_12699867 ]

Namit Jain commented on HIVE-404:
---------------------------------

That's right - that's what genReduceSinkPlan does. After the change, if a sorting/clustering column is present, a second map-reduce job will sort/cluster by those columns, so that we can get the global order.

> Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"
> --
>
>                 Key: HIVE-404
>                 URL: https://issues.apache.org/jira/browse/HIVE-404
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.3.0, 0.4.0
>            Reporter: Zheng Shao
>            Assignee: Namit Jain
>         Attachments: hive.404.1.patch, hive.404.2.patch
>
> Unless the user specifies "set mapred.reduce.tasks=1;", he will see unexpected results with the query "SELECT * FROM t SORT BY col1 LIMIT 100".
> Basically, in the first map-reduce job, each reducer will get sorted data and only keep the first 100. In the second map-reduce job, we will distribute and sort the data randomly before feeding it into a single reducer that outputs the first 100.
> In short, the query will output 100 random records out of the N * 100 top records (100 from each reducer of the first map-reduce job).
> This contradicts what people expect.
> We should propagate the SORT BY columns to the second map-reduce job.
[jira] Commented: (HIVE-404) Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"
[ https://issues.apache.org/jira/browse/HIVE-404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699868#action_12699868 ]

Namit Jain commented on HIVE-404:
---------------------------------

The second map-reduce job will have only 1 reducer, with the sorting columns preserved - so that will do exactly what you are saying.
[jira] Updated: (HIVE-426) undeterministic query results since aliasToWork in mapredWork is a hashmap
[ https://issues.apache.org/jira/browse/HIVE-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-426:
----------------------------
       Resolution: Fixed
    Fix Version/s: 0.4.0
     Release Note: HIVE-426. Fix undeterministic query plan because of aliasToWork. (Namit Jain via zshao)
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

Committed to trunk. Thanks Namit!

> undeterministic query results since aliasToWork in mapredWork is a hashmap
> --
>
>                 Key: HIVE-426
>                 URL: https://issues.apache.org/jira/browse/HIVE-426
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>             Fix For: 0.4.0
>
>         Attachments: hive.426.1.patch
>
> undeterministic query results since aliasToWork in mapredWork is a hashmap
[jira] Commented: (HIVE-404) Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"
[ https://issues.apache.org/jira/browse/HIVE-404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699863#action_12699863 ]

Zheng Shao commented on HIVE-404:
---------------------------------

I think users would expect the results of LIMIT to be sorted in total order - if the user says "SORT BY key LIMIT 10", he probably wants the global top 10, no matter how many reducers we have.

I think it's necessary to have the second map-reduce job in the case of SORT BY/CLUSTER BY, but we also want the second map-reduce job to have the right sort columns across the map-reduce boundary, so we can get the global top ones.
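A concrete instance of the problem and the fix, with numbers chosen only for illustration: suppose the first job runs 3 reducers, each emitting its local top 100 sorted by col1, i.e. 300 candidate rows in total. If the second job shuffles those 300 rows on random keys, its single reducer applies LIMIT 100 to an arbitrarily ordered stream and returns 100 random rows out of the 300. If the second job instead sorts on col1 again, the single reducer sees all 300 rows in order, and its first 100 rows are exactly the global top 100.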
Build failed in Hudson: Hive-trunk-h0.19 #65
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.19/65/changes

Changes:

[zshao] HIVE-423. Change launch templates to use hive_model.jar. (Raghotham Murthy via zshao)
[zshao] HIVE-421. Fix union followed by multi-table insert. (Namit Jain via zshao).

--
[...truncated 29575 lines...]
[jira] Commented: (HIVE-404) Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"
[ https://issues.apache.org/jira/browse/HIVE-404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699858#action_12699858 ]

Namit Jain commented on HIVE-404:
---------------------------------

Forgot to clarify the FetchTask issue. FetchTask does not perform any merge - it opens the files one by one until the limit is reached (if a limit is specified). It is the responsibility of the server to have the data appropriately sorted.
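A minimal sketch of that fetch behaviour, with invented names; the real FetchTask reads tables and partitions through SerDes rather than plain text files.

{{{
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.List;

// Opens files one by one and emits rows until the limit is hit. No merge of
// per-file sort orders happens here, so the output is globally sorted only
// if the job that produced the files already arranged it that way.
public class LimitFetcherSketch {
    public static int fetch(List<String> files, int limit) throws IOException {
        int emitted = 0;
        for (String f : files) {
            BufferedReader r = new BufferedReader(new FileReader(f));
            try {
                String line;
                while ((line = r.readLine()) != null) {
                    System.out.println(line);
                    if (++emitted >= limit) {
                        return emitted;    // stop as soon as the limit is reached
                    }
                }
            } finally {
                r.close();
            }
        }
        return emitted;
    }
}
}}}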
[jira] Commented: (HIVE-416) Get rid of backtrack in Hive.g
[ https://issues.apache.org/jira/browse/HIVE-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699853#action_12699853 ]

Zheng Shao commented on HIVE-416:
---------------------------------

1. I checked the generated code for {k=5;} - it's a nested if, so there is no performance penalty. But I agree most grammars have k up to 3, and it should be easy to extract the common prefix, so I will do it.

2. Optional brackets won't be possible with an LL(k) parser for any k (without backtracking), because I can construct an arbitrarily long string like "(((a+b..." and it's not possible to know whether the first "(" is the optional bracket or not. Most people who have been using "SELECT TRANSFORM" are adding the brackets, while those using "MAP/REDUCE" are probably not (think of "MAP"/"REDUCE" as similar to "SELECT") - that's why I made the choice like that. We can discuss this more if needed.
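To make the lookahead argument concrete, an illustrative grammar fragment (not from Hive.g) showing why no fixed k suffices:

{{{
selectList : '(' selExpr ( ',' selExpr )* ')'   // with the optional brackets
           | selExpr ( ',' selExpr )*           // without them
           ;
// On input beginning "(((a+b", both alternatives remain viable after any
// fixed number k of '(' tokens, so an LL(k) parser cannot choose between
// them without backtracking.
}}}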
[jira] Updated: (HIVE-421) union followed by multi-table insert does not work properly
[ https://issues.apache.org/jira/browse/HIVE-421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-421:
----------------------------
       Resolution: Fixed
    Fix Version/s: 0.4.0
                   0.3.1
     Release Note: HIVE-421. Fix union followed by multi-table insert. (Namit Jain via zshao)
     Hadoop Flags: [Reviewed]
           Status: Resolved  (was: Patch Available)

Committed to branch-0.3. Thanks Namit!

> union followed by multi-table insert does not work properly
> --
>
>                 Key: HIVE-421
>                 URL: https://issues.apache.org/jira/browse/HIVE-421
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: Namit Jain
>            Priority: Critical
>             Fix For: 0.3.1, 0.4.0
>
>         Attachments: hive.421.1.patch, hive.421.2.branch.patch, hive.421.2.patch
>
> Like JIRA 413, multi-table inserts have some problems with unions.
Build failed in Hudson: Hive-trunk-h0.18 #67
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/67/changes

Changes:

[zshao] HIVE-423. Change launch templates to use hive_model.jar. (Raghotham Murthy via zshao)
[zshao] HIVE-421. Fix union followed by multi-table insert. (Namit Jain via zshao).

--
[...truncated 30413 lines...]
JIRA_hive.426.1.patch_UNIT_TEST_SUCCEEDED
SUCCESS: BUILD AND UNIT TEST using PATCH hive.426.1.patch PASSED!!
Build failed in Hudson: Hive-trunk-h0.17 #64
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.17/64/changes

Changes:

[zshao] HIVE-423. Change launch templates to use hive_model.jar. (Raghotham Murthy via zshao)
[zshao] HIVE-421. Fix union followed by multi-table insert. (Namit Jain via zshao).

--
[...truncated 25113 lines...]
[jira] Commented: (HIVE-416) Get rid of backtrack in Hive.g
[ https://issues.apache.org/jira/browse/HIVE-416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699840#action_12699840 ]

Namit Jain commented on HIVE-416:
---------------------------------

I had some questions.

1. Won't it be better to factor out the common part into a separate rule instead of providing the lookahead? For example, instead of:

{{{
alterStatement
options {k=5;}
@init { msgs.push("alter statement"); }
@after { msgs.pop(); }
    : alterStatementRename
    | alterStatementAddCol
    | alterStatementDropPartitions
    | alterStatementAddPartitions
    | alterStatementProperties
    | alterStatementSerdeProperties
    ;
}}}

won't it be better to factor out <ALTER TABLE identifier> into a common rule and then have the remaining rules?

2. On the same lines, I could not understand the reason for brackets around SELECT TRANSFORM and no brackets around MAP/REDUCE. Instead of this:

{{{
selectClause
@init { msgs.push("select clause"); }
@after { msgs.pop(); }
    : KW_SELECT (KW_ALL | dist=KW_DISTINCT)? selectList
      -> {$dist == null}? ^(TOK_SELECT selectList)
      -> ^(TOK_SELECTDI selectList)
    | trfmClause -> ^(TOK_SELECT ^(TOK_SELEXPR trfmClause))
    ;
}}}

if we factor out KW_SELECT for the first part and the transform clause, the brackets should become optional. Am I missing something here?
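A sketch of the factoring suggested in question 1. The suffix rule names are invented for illustration; the actual refactoring in HIVE-416.1.1.patch may differ.

{{{
// The common "ALTER TABLE identifier" prefix is consumed once, so the parser
// no longer needs k=5 lookahead to choose among the alternatives.
alterStatement
    : KW_ALTER KW_TABLE Identifier alterStatementSuffix
    ;

alterStatementSuffix
    : alterStatementSuffixRename
    | alterStatementSuffixAddCol
    | alterStatementSuffixAddPartitions
    | alterStatementSuffixDropPartitions
    | alterStatementSuffixProperties
    | alterStatementSuffixSerdeProperties
    ;
}}}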
JIRA_hive.421.2.branch.patch_FAILED_TO_APPLY_PATCH
Summary: This patch from JIRA hive.421.2.branch.patch failed to apply to the Apache Hive sources.

17 out of 32 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/join2.q.xml.rej
15 out of 16 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/input2.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/join3.q.xml.rej
17 out of 18 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/input3.q.xml.rej
9 out of 18 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/join4.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/input4.q.xml.rej
9 out of 18 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/join5.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/input5.q.xml.rej
9 out of 18 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/join6.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/input_testxpath2.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/input6.q.xml.rej
4 out of 14 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/join7.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/input7.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/input8.q.xml.rej
9 out of 18 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/join8.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/input_testsequencefile.q.xml.rej
5 out of 6 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/union.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/input9.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/udf1.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/udf4.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/input_testxpath.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/udf6.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/input_part1.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/groupby1.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/groupby2.q.xml.rej
5 out of 6 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/subq.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/groupby3.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/groupby4.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/groupby5.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/groupby6.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/case_sensitivity.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/input20.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/sample1.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/sample2.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/sample3.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/sample4.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/sample5.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/sample6.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/sample7.q.xml.rej
4 out of 5 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/cast1.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/join1.q.xml.rej
7 out of 8 hunks FAILED -- saving rejects to file ql/src/test/results/compiler/plan/input1.q.xml.rej
Reversed (or previously applied) patch detected! Skipping patch.
6 out of 6 hunks ignored -- saving rejects to file ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRProcContext.java.rej
Reversed (or previously applied) patch detected! Skipping patch.
4 out of 4 hunks ignored -- saving rejects to file ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRUnion1.java.rej
Reversed (or previously applied) patch detected! Skipping patch.
1 out of 1 hunk ignored -- saving rejects to file ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRRe
[jira] Commented: (HIVE-192) Cannot create table with timestamp type column
[ https://issues.apache.org/jira/browse/HIVE-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699818#action_12699818 ]

Shyam Sundar Sarkar commented on HIVE-192:
------------------------------------------

I was following the Hive Developer Guide and found that one important section is missing. Section "3.4. Adding new unit tests" has no instructions about how to add a new unit test. I had to go through trial and error (with Velocity templates) to add a new unit test to the test suite. I request that someone from the original test suite design team write a few words for this important subsection.

Regards,
shyam_sar...@yahoo.com

> Cannot create table with timestamp type column
> --
>
>                 Key: HIVE-192
>                 URL: https://issues.apache.org/jira/browse/HIVE-192
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.2.0
>            Reporter: Johan Oskarsson
>             Fix For: 0.4.0
>
>         Attachments: create_2.q.txt, TIMESTAMP_specification.txt
>
> create table something2 (test timestamp);
> ERROR: DDL specifying type timestamp which has not been defined
> java.lang.RuntimeException: specifying type timestamp which has not been defined
>   at org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.FieldType(thrift_grammar.java:1879)
>   at org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Field(thrift_grammar.java:1545)
>   at org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.FieldList(thrift_grammar.java:1501)
>   at org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Struct(thrift_grammar.java:1171)
>   at org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.TypeDefinition(thrift_grammar.java:497)
>   at org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Definition(thrift_grammar.java:439)
>   at org.apache.hadoop.hive.serde2.dynamic_type.thrift_grammar.Start(thrift_grammar.java:101)
>   at org.apache.hadoop.hive.serde2.dynamic_type.DynamicSerDe.initialize(DynamicSerDe.java:97)
>   at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:180)
>   at org.apache.hadoop.hive.ql.metadata.Table.initSerDe(Table.java:141)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:202)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:641)
>   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:98)
>   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:215)
>   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:174)
>   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:207)
>   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:305)
[jira] Updated: (HIVE-192) Cannot create table with timestamp type column
[ https://issues.apache.org/jira/browse/HIVE-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shyam Sundar Sarkar updated HIVE-192:
-------------------------------------
    Comment: was deleted

(was: This is the diff file showing the changes in the Hive.g grammar with the new TimestampType added. Thanks, shyam_sar...@yahoo.com)
[jira] Updated: (HIVE-192) Cannot create table with timestamp type column
[ https://issues.apache.org/jira/browse/HIVE-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shyam Sundar Sarkar updated HIVE-192:
-------------------------------------
    Comment: was deleted

(was: Functional test for Timestamp.)
[jira] Updated: (HIVE-192) Cannot create table with timestamp type column
[ https://issues.apache.org/jira/browse/HIVE-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shyam Sundar Sarkar updated HIVE-192: - Comment: was deleted (was: Can someone please help me find out why I am getting an exception in the setUp() method of the attached TestCliTimestampDriver.java file? I followed the existing CliDriver test class in Hive and modified it just to test TIMESTAMP syntax in some queries. I stepped through setUp() in debug mode and it failed in QTestUtil at the line: private String tmpdir = System.getProperty("user.dir")+"/../build/ql/tmp"; where "user.dir" was the Hive home directory (not inside the build directory). If I run the general CliDriver tests and then try to run my test for TIMESTAMP, the above exception does not show up. However, I then get an exception at the line: testFiles = conf.get("test.data.files").replace('\\', '/').replace("c:", ""); inside the QTestUtil constructor. My question: why am I getting a setUp() exception when I do not need a data file? Can someone suggest a specific step that I am missing? Thanks, shyam_sar...@yahoo.com) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
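Both failures described above stem from test configuration rather than the TIMESTAMP change itself: "user.dir" is the JVM launch directory, and "test.data.files" is only set by the ant test target. A hedged sketch of the distinction (the "build.dir" property and fallback are assumptions for illustration, not Hive's actual harness code):
{noformat}
import java.io.File;

public class QTestPaths {
    // "user.dir" is wherever the JVM was launched, so a path built relative to it
    // only works when the test is started from inside build/.
    static File tmpDir() {
        String buildDir = System.getProperty("build.dir"); // assumption: set by the test harness
        if (buildDir != null) {
            return new File(buildDir, "ql/tmp");
        }
        return new File(System.getProperty("user.dir"), "../build/ql/tmp");
    }

    // conf.get("test.data.files") returns null unless the ant test target set it,
    // which turns the .replace(...) chain into a NullPointerException.
    static String testFiles(String fromConf) {
        if (fromConf == null) {
            throw new IllegalStateException(
                "test.data.files not set; run the test via the ant target that defines it");
        }
        return fromConf.replace('\\', '/').replace("c:", "");
    }
}
{noformat}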
[jira] Updated: (HIVE-192) Cannot create table with timestamp type column
[ https://issues.apache.org/jira/browse/HIVE-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shyam Sundar Sarkar updated HIVE-192: - Comment: was deleted (was: I added functional test cases for TIMESTAMP. Can someone suggest more test cases? The Java code for the test driver is attached (/hive/build/ql/test/src/org/apache/hadoop/hive/cli/TestCliTimestampDriver.java). Can someone please tell me how I get results and logs for the following call: qt = new QTestUtil("/home/ssarkar/hive/ql/src/test/results/clientpositive", "/home/ssarkar/hive/build/ql/test/logs/clientpositive"); I am getting an Exception. At this point, can I add arbitrary results and log files? Thanks, shyam_sar...@yahoo.com) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-192) Cannot create table with timestamp type column
[ https://issues.apache.org/jira/browse/HIVE-192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shyam Sundar Sarkar updated HIVE-192: - Attachment: (was: TestCliTimestampDriver.java.txt) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-426) nondeterministic query results since aliasToWork in mapredWork is a HashMap
[ https://issues.apache.org/jira/browse/HIVE-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-426: Attachment: hive.426.1.patch > nondeterministic query results since aliasToWork in mapredWork is a HashMap > -- > > Key: HIVE-426 > URL: https://issues.apache.org/jira/browse/HIVE-426 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Namit Jain >Assignee: Namit Jain > Attachments: hive.426.1.patch > > > nondeterministic query results since aliasToWork in mapredWork is a HashMap -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-426) nondeterministic query results since aliasToWork in mapredWork is a HashMap
[ https://issues.apache.org/jira/browse/HIVE-426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-426: Status: Patch Available (was: Open) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-421) union followed by multi-table insert does not work properly
[ https://issues.apache.org/jira/browse/HIVE-421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-421: Attachment: hive.421.2.branch.patch > union followed by multi-table insert does not work properly > --- > > Key: HIVE-421 > URL: https://issues.apache.org/jira/browse/HIVE-421 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Reporter: Namit Jain >Assignee: Namit Jain >Priority: Critical > Attachments: hive.421.1.patch, hive.421.2.branch.patch, > hive.421.2.patch > > > Like HIVE-413, multi-table inserts have some problems with unions. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-426) nondeterministic query results since aliasToWork in mapredWork is a HashMap
nondeterministic query results since aliasToWork in mapredWork is a HashMap -- Key: HIVE-426 URL: https://issues.apache.org/jira/browse/HIVE-426 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Namit Jain nondeterministic query results since aliasToWork in mapredWork is a HashMap -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
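The nondeterminism here is plain Java behavior, not anything Hive-specific: HashMap guarantees no iteration order, so code that walks aliasToWork can visit the operator trees in a different order from run to run, while a LinkedHashMap iterates in insertion order. A standalone illustration (not Hive code):
{noformat}
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class MapIterationOrder {
    public static void main(String[] args) {
        Map<String, String> hash = new HashMap<String, String>();
        Map<String, String> linked = new LinkedHashMap<String, String>();
        for (String alias : new String[] {"src", "src1", "dest1", "a", "b"}) {
            hash.put(alias, "work-for-" + alias);
            linked.put(alias, "work-for-" + alias);
        }
        // HashMap order depends on hash codes and table capacity; it is not
        // insertion order and may change across JVM versions or map sizes.
        System.out.println("HashMap:       " + hash.keySet());
        // LinkedHashMap always yields insertion order: [src, src1, dest1, a, b]
        System.out.println("LinkedHashMap: " + linked.keySet());
    }
}
{noformat}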
JIRA_hive.404.2.patch_UNIT_TEST_SUCCEEDED
SUCCESS: BUILD AND UNIT TEST using PATCH hive.404.2.patch PASSED!!
[jira] Created: (HIVE-425) HWI JSP pages should be compiled at build-time instead of run-time
HWI JSP pages should be compiled at build-time instead of run-time -- Key: HIVE-425 URL: https://issues.apache.org/jira/browse/HIVE-425 Project: Hadoop Hive Issue Type: Improvement Components: Web UI Reporter: Alex Loddengaard HWI JSP pages are compiled via the ant jar at run-time. This makes ant a run-time dependency and also makes development trickier, since compiler errors are not discovered until HWI is deployed and running. HWI should instead compile its JSP pages with ant at build-time, just as the Hadoop status pages are. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-404) Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"
[ https://issues.apache.org/jira/browse/HIVE-404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-404: Status: Patch Available (was: Open) Incorporated Zheng's comments > Problems in "SELECT * FROM t SORT BY col1 LIMIT 100" > > > Key: HIVE-404 > URL: https://issues.apache.org/jira/browse/HIVE-404 > Project: Hadoop Hive > Issue Type: Bug > Components: Query Processor >Affects Versions: 0.3.0, 0.4.0 >Reporter: Zheng Shao >Assignee: Namit Jain > Attachments: hive.404.1.patch, hive.404.2.patch > > > Unless the user specifies "set mapred.reduce.tasks=1;", the query "SELECT * FROM t SORT BY col1 LIMIT 100" will return unexpected results. > Basically, in the first map-reduce job, each reducer gets sorted data and keeps only the first 100 rows. In the second map-reduce job, the data is distributed and sorted randomly before being fed into a single reducer that outputs the first 100 rows. > In short, the query outputs 100 random records out of the N * 100 top records produced by the reducers of the first map-reduce job. > This contradicts what people expect. > We should propagate the SORT BY columns to the second map-reduce job. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-404) Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"
[ https://issues.apache.org/jira/browse/HIVE-404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-404: Status: Open (was: Patch Available) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-404) Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"
[ https://issues.apache.org/jira/browse/HIVE-404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Namit Jain updated HIVE-404: Attachment: hive.404.2.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-404) Problems in "SELECT * FROM t SORT BY col1 LIMIT 100"
[ https://issues.apache.org/jira/browse/HIVE-404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699737#action_12699737 ] Namit Jain commented on HIVE-404: - 1. Will do - the distributeBy check is not needed. 2. It creates a second map-reduce job if it is not a query, so inserts are not a problem. The sort order is propagated to the second map-reduce job. I don't think FetchTask needs any change - the current change performs a second map-reduce job, which should fix the problem. I will make the changes and upload the patch again. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
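To see why propagating the SORT BY columns matters: each first-stage reducer emits an already-sorted top 100, and the second stage must merge those runs by the same key instead of shuffling them randomly. A plain-Java sketch of the intended merge semantics (illustrative only, not the patch itself):
{noformat}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopKMerge {
    // Merge k sorted runs (one per first-stage reducer) and keep the global top `limit`.
    static List<Integer> mergeTopK(final List<List<Integer>> runs, int limit) {
        // Heap entries are {runIndex, offset}, ordered by the value they point at,
        // so poll() always returns the globally smallest remaining record.
        PriorityQueue<int[]> heads = new PriorityQueue<int[]>(Math.max(1, runs.size()),
            new Comparator<int[]>() {
                public int compare(int[] a, int[] b) {
                    return runs.get(a[0]).get(a[1]).compareTo(runs.get(b[0]).get(b[1]));
                }
            });
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) {
                heads.add(new int[] {i, 0});
            }
        }
        List<Integer> out = new ArrayList<Integer>();
        while (out.size() < limit && !heads.isEmpty()) {
            int[] h = heads.poll();
            out.add(runs.get(h[0]).get(h[1]));
            if (h[1] + 1 < runs.get(h[0]).size()) {
                heads.add(new int[] {h[0], h[1] + 1});
            }
        }
        return out;
    }
}
{noformat}
Sampling 100 records at random from the concatenated runs - which is effectively what the unpatched second job does - gives a different, and wrong, answer whenever the runs overlap.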
[jira] Updated: (HIVE-352) Make Hive support column based storage
[ https://issues.apache.org/jira/browse/HIVE-352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] He Yongqiang updated HIVE-352: -- Attachment: hive-352-2009-4-16.patch Against the latest trunk. 1) Added a simple rcfile_columar.q file for testing. {noformat} DROP TABLE columnTable; CREATE table columnTable (key STRING, value STRING) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.ColumnarSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'; FROM src INSERT OVERWRITE TABLE columnTable SELECT src.key, src.value LIMIT 10; describe columnTable; SELECT columnTable.* FROM columnTable; {noformat} 2) Made ColumnarSerDe's serialize return BytesRefArrayWritable instead of Text. BTW, it seems rcfile_columar.q.out does not contain the results of SELECT columnTable.* FROM columnTable; but after the test, I saw the file ql/test/data/warehouse/columntable/attempt_local_0001_r_00_0, and it did contain the inserted data. Why did the select return nothing? > Make Hive support column based storage > -- > > Key: HIVE-352 > URL: https://issues.apache.org/jira/browse/HIVE-352 > Project: Hadoop Hive > Issue Type: New Feature >Reporter: He Yongqiang > Attachments: hive-352-2009-4-15.patch, hive-352-2009-4-16.patch, > HIve-352-draft-2009-03-28.patch, Hive-352-draft-2009-03-30.patch > > > Column based storage has been proven a better storage layout for OLAP. > Hive does a great job on raw row oriented storage. In this issue, we will > enhance Hive to support column based storage. > Actually we have done some work on column based storage on top of HDFS; I > think it will need some review and refactoring to port it to Hive. > Any thoughts? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
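For context on what the test above exercises: a row-oriented layout interleaves columns record by record, while an RCFile-style row group stores each column's values contiguously, so a scan that needs only one column can skip the other columns' bytes entirely. A toy regrouping in plain Java (illustrative only; not the RCFile or ColumnarSerDe implementation):
{noformat}
import java.util.Arrays;

public class ColumnarLayout {
    public static void main(String[] args) {
        // Row-oriented: each record stores (key, value) together.
        String[][] rows = {
            {"238", "val_238"},
            {"86",  "val_86"},
            {"311", "val_311"},
        };
        // Column-oriented: within a row group, all keys are stored contiguously,
        // then all values; a query reading only `key` never touches value bytes.
        String[] keyColumn = new String[rows.length];
        String[] valueColumn = new String[rows.length];
        for (int i = 0; i < rows.length; i++) {
            keyColumn[i] = rows[i][0];
            valueColumn[i] = rows[i][1];
        }
        System.out.println("keys:   " + Arrays.toString(keyColumn));
        System.out.println("values: " + Arrays.toString(valueColumn));
    }
}
{noformat}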
[jira] Created: (HIVE-424) PartitionPruner fails with "WHERE ds = '2009-03-01'"
PartitionPruner fails with "WHERE ds = '2009-03-01'" Key: HIVE-424 URL: https://issues.apache.org/jira/browse/HIVE-424 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.3.1, 0.4.0 Reporter: Zheng Shao The PartitionPruner will output an "Unknown Exception: null" when the condition in the WHERE clause contains fields with no table aliases. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
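The "Unknown Exception: null" is the signature of an unchecked null lookup: a column written without a table alias arrives with no alias to resolve. A hypothetical sketch of the defensive resolution (names and structure are illustrative, not the actual PartitionPruner code):
{noformat}
import java.util.Map;

public class AliasResolution {
    // `tabAlias` may be null when the user writes "WHERE ds = '2009-03-01'"
    // instead of "WHERE t.ds = '2009-03-01'".
    static String resolveAlias(String tabAlias, Map<String, String> tablesInScope) {
        if (tabAlias != null) {
            return tabAlias;
        }
        // With exactly one table in scope, the unqualified column is unambiguous.
        if (tablesInScope.size() == 1) {
            return tablesInScope.keySet().iterator().next();
        }
        // Otherwise fail with a real message instead of a NullPointerException.
        throw new IllegalArgumentException(
            "Column has no table alias and cannot be resolved unambiguously");
    }
}
{noformat}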
[jira] Updated: (HIVE-423) change launch templates to use hive_model.jar
[ https://issues.apache.org/jira/browse/HIVE-423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zheng Shao updated HIVE-423: Resolution: Fixed Release Note: HIVE-423. Change launch templates to use hive_model.jar. (Raghotham Murthy via zshao) Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Committed to both trunk and branch-0.3. Thanks Raghu! > change launch templates to use hive_model.jar > - > > Key: HIVE-423 > URL: https://issues.apache.org/jira/browse/HIVE-423 > Project: Hadoop Hive > Issue Type: Bug > Components: Testing Infrastructure >Reporter: Raghotham Murthy >Assignee: Raghotham Murthy >Priority: Minor > Fix For: 0.4.0 > > Attachments: hive-423.1.patch > > > The model-jar target now builds hive_model.jar instead of metastore_model.jar. > This causes the launch files (used to run tests) to no longer work in Eclipse. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
JIRA_hive-423.1.patch_UNIT_TEST_FAILED
ERROR: UNIT TEST using PATCH hive-423.1.patch FAILED!! [junit] Test org.apache.hadoop.hive.cli.TestCliDriver FAILED [junit] < FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask [junit] Test org.apache.hadoop.hive.ql.TestMTQueries FAILED BUILD FAILED
*UNIT TEST FAILURE for apache HIVE* Hadoop.Version=0.17.1 based on SVN Rev# 765482.123
[junit] Test org.apache.hadoop.hive.cli.TestCliDriver FAILED [junit] < FAILED: Parse Error: line 1:32 cannot recognize input '.' in table column identifier [junit] > FAILED: Parse Error: line 1:32 cannot recognize input '.' in expression specification [junit] Test org.apache.hadoop.hive.cli.TestNegativeCliDriver FAILED BUILD FAILED