[jira] Commented: (HIVE-307) LOAD DATA LOCAL INPATH fails when the table already contains a file of the same name
[ https://issues.apache.org/jira/browse/HIVE-307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910799#action_12910799 ]

Ashish Thusoo commented on HIVE-307:

Hi Kirk,

Thanks for the contribution. Can you add a simple test case with your patch?

Ashish

LOAD DATA LOCAL INPATH fails when the table already contains a file of the same name
-------------------------------------------------------------------------------------

Key: HIVE-307
URL: https://issues.apache.org/jira/browse/HIVE-307
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.5.0
Reporter: Zheng Shao
Priority: Critical
Attachments: HIVE-307.patch

Failed with exception checkPaths: /user/zshao/warehouse/tmp_user_msg_history/test_user_msg_history already exists
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
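For context, a minimal sketch of how the failure can be reproduced (the local file path is hypothetical; the table and file names follow the error message above):

{code}
CREATE TABLE tmp_user_msg_history (msg STRING);

-- The first load succeeds and places test_user_msg_history in the warehouse directory.
LOAD DATA LOCAL INPATH '/tmp/test_user_msg_history' INTO TABLE tmp_user_msg_history;

-- Loading a file with the same name again (without OVERWRITE) fails:
--   Failed with exception checkPaths: .../tmp_user_msg_history/test_user_msg_history already exists
LOAD DATA LOCAL INPATH '/tmp/test_user_msg_history' INTO TABLE tmp_user_msg_history;
{code}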
[jira] Updated: (HIVE-307) LOAD DATA LOCAL INPATH fails when the table already contains a file of the same name
[ https://issues.apache.org/jira/browse/HIVE-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-307:

Status: Open (was: Patch Available)
Assignee: Kirk True

Cancelling the patch because of a missing test case. Kirk, it would be great if you can resubmit with the test case. Otherwise the code looks fine to me.

Ashish

LOAD DATA LOCAL INPATH fails when the table already contains a file of the same name
-------------------------------------------------------------------------------------

Key: HIVE-307
URL: https://issues.apache.org/jira/browse/HIVE-307
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.5.0
Reporter: Zheng Shao
Assignee: Kirk True
Priority: Critical
Attachments: HIVE-307.patch

Failed with exception checkPaths: /user/zshao/warehouse/tmp_user_msg_history/test_user_msg_history already exists
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
RE: [VOTE] Hive as a TLP
With 10 +1 votes this vote passes. Owen, please forward this to the Apache board.

Thanks,
Ashish

-----Original Message-----
From: Tom White [mailto:t...@cloudera.com]
Sent: Friday, August 27, 2010 10:24 AM
To: gene...@hadoop.apache.org
Subject: Re: [VOTE] Hive as a TLP

+1

Tom

On Thu, Aug 26, 2010 at 1:01 PM, Ashish Thusoo athu...@facebook.com wrote:

The Hive development community voted and passed the following resolution. The details of the vote are at http://www.bit.ly/aJogyU

The PMC will comprise the current committers on Hive (as of 8/24/2010), with Namit Jain being the chair. Please vote on sending this resolution to the Apache Board.

Thanks,
Ashish

Draft Resolution to be sent to the Apache Board
-----------------------------------------------

Establish the Apache Hive Project

WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software related to parallel analysis of large data sets for distribution at no charge to the public.

NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the Apache Hive Project, be and hereby is established pursuant to Bylaws of the Foundation; and be it further

RESOLVED, that the Apache Hive Project be and hereby is responsible for the creation and maintenance of software related to parallel analysis of large data sets; and be it further

RESOLVED, that the office of Vice President, Apache Hive be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache Hive Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache Hive Project; and be it further

RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache Hive Project:

* Namit Jain (na...@apache.org)
* John Sichi (j...@apache.org)
* Zheng Shao (zs...@apache.org)
* Edward Capriolo (appodic...@apache.org)
* Raghotham Murthy (r...@apache.org)
* Ning Zhang (nzh...@apache.org)
* Paul Yang (pa...@apache.org)
* He Yongqiang (heyongqi...@apache.org)
* Prasad Chakka (pras...@apache.org)
* Joydeep Sen Sarma (jsensa...@apache.org)
* Ashish Thusoo (athu...@apache.org)

NOW, THEREFORE, BE IT FURTHER RESOLVED, that Namit Jain be appointed to the office of Vice President, Apache Hive, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed; and be it further

RESOLVED, that the initial Apache Hive PMC be and hereby is tasked with the creation of a set of bylaws intended to encourage open development and increased participation in the Apache Hive Project; and be it further

RESOLVED, that the Apache Hive Project be and hereby is tasked with the migration and rationalization of the Apache Hadoop Hive sub-project; and be it further

RESOLVED, that all responsibilities pertaining to the Apache Hive sub-project encumbered upon the Apache Hadoop Project are hereafter discharged.
[VOTE] Draft Resolution to make Hive a TLP
Folks,

I am going to make the following proposal at gene...@hadoop.apache.org. In summary this proposal does the following things:

1. Establishes the PMC as comprising the current committers of Hive (as of today - 8/24/2010).
2. Proposes Namit Jain as the chair of the project (PMC chairs have no more power than other PMC members, but they are responsible for writing regular reports for the Apache board, assigning rights to new committers, etc.)
3. Tasks the PMC to come up with the bylaws for governance of the project.

Please vote on this as soon as possible (yes, I should have done this as part of the earlier vote, but please bear with me), so that we can get the ball rolling on this...

Thanks,
Ashish

Draft Resolution to be sent to the Apache Board
-----------------------------------------------

Establish the Apache Hive Project

WHEREAS, the Board of Directors deems it to be in the best interests of the Foundation and consistent with the Foundation's purpose to establish a Project Management Committee charged with the creation and maintenance of open-source software related to parallel analysis of large data sets for distribution at no charge to the public.

NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee (PMC), to be known as the Apache Hive Project, be and hereby is established pursuant to Bylaws of the Foundation; and be it further

RESOLVED, that the Apache Hive Project be and hereby is responsible for the creation and maintenance of software related to parallel analysis of large data sets; and be it further

RESOLVED, that the office of Vice President, Apache Hive be and hereby is created, the person holding such office to serve at the direction of the Board of Directors as the chair of the Apache Hive Project, and to have primary responsibility for management of the projects within the scope of responsibility of the Apache Hive Project; and be it further

RESOLVED, that the persons listed immediately below be and hereby are appointed to serve as the initial members of the Apache Hive Project:

* Namit Jain (na...@apache.org)
* John Sichi (j...@apache.org)
* Zheng Shao (zs...@apache.org)
* Edward Capriolo (appodic...@apache.org)
* Raghotham Murthy (r...@apache.org)
* Ning Zhang (nzh...@apache.org)
* Paul Yang (pa...@apache.org)
* He Yongqiang (heyongqi...@apache.org)
* Prasad Chakka (pras...@apache.org)
* Joydeep Sen Sarma (jsensa...@apache.org)
* Ashish Thusoo (athu...@apache.org)

NOW, THEREFORE, BE IT FURTHER RESOLVED, that Namit Jain be appointed to the office of Vice President, Apache Hive, to serve in accordance with and subject to the direction of the Board of Directors and the Bylaws of the Foundation until death, resignation, retirement, removal or disqualification, or until a successor is appointed; and be it further

RESOLVED, that the initial Apache Hive PMC be and hereby is tasked with the creation of a set of bylaws intended to encourage open development and increased participation in the Apache Hive Project; and be it further

RESOLVED, that the Apache Hive Project be and hereby is tasked with the migration and rationalization of the Apache Hadoop Hive sub-project; and be it further

RESOLVED, that all responsibilities pertaining to the Apache Hive sub-project encumbered upon the Apache Hadoop Project are hereafter discharged.
RE: [DISCUSSION] Move to become a TLP
Thanks to everyone who voted. Looks like this is unanimous at this point. I will start the proceedings in the Hadoop PMC to make Hive a TLP.

Ashish

-----Original Message-----
From: Paul Yang [mailto:py...@facebook.com]
Sent: Thursday, August 19, 2010 4:05 PM
To: hive-dev@hadoop.apache.org
Subject: RE: [DISCUSSION] Move to become a TLP

+1

-----Original Message-----
From: Joydeep Sen Sarma [mailto:jssa...@facebook.com]
Sent: Thursday, August 19, 2010 3:30 PM
To: hive-dev@hadoop.apache.org
Subject: RE: [DISCUSSION] Move to become a TLP

+1

-----Original Message-----
From: Carl Steinbach [mailto:c...@cloudera.com]
Sent: Thursday, August 19, 2010 3:18 PM
To: hive-dev@hadoop.apache.org
Subject: Re: [DISCUSSION] Move to become a TLP

+1

On Thu, Aug 19, 2010 at 3:15 PM, Ning Zhang nzh...@facebook.com wrote:

+1 as well.

On Aug 19, 2010, at 3:06 PM, Zheng Shao wrote:

+1.

Zheng

On Mon, Aug 16, 2010 at 11:58 AM, John Sichi jsi...@facebook.com wrote:

+1 from me. The momentum on cross-company collaboration we're seeing now, plus big integration contributions such as the new storage handlers (HyperTable and Cassandra), are all signs that Hive is growing up fast. HBase recently took the same route, so I'm going to have a chat with Jonathan Gray to find out what that involved for them.

JVS

On Aug 14, 2010, at 4:42 PM, Jeff Hammerbacher wrote:

Yes, I think Hive is ready to become a TLP.

On Fri, Aug 13, 2010 at 1:36 PM, Ashish Thusoo athu...@facebook.com wrote:

Nice one Ed...

Folks, please chime in. I think we should close this out next week one way or the other. We can consider this a vote at this point, so please vote on this issue.

Thanks,
Ashish

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Thursday, August 12, 2010 8:05 AM
To: hive-dev@hadoop.apache.org
Subject: Re: [DISCUSSION] Move to become a TLP

On Wed, Aug 11, 2010 at 9:15 PM, Ashish Thusoo athu...@facebook.com wrote:

Folks, this question has come up in the PMC once again and it would be great to hear once more on this topic. What do people think? Are we ready to become a TLP?

Thanks,
Ashish

I thought of one more benefit. We can rename our packages from org.apache.hadoop.hive.* to org.apache.hive.* :)

--
Yours, Zheng
http://www.linkedin.com/in/zshao
RE: [DISCUSSION] Move to become a TLP
Nice one Ed...

Folks, please chime in. I think we should close this out next week one way or the other. We can consider this a vote at this point, so please vote on this issue.

Thanks,
Ashish

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Thursday, August 12, 2010 8:05 AM
To: hive-dev@hadoop.apache.org
Subject: Re: [DISCUSSION] Move to become a TLP

On Wed, Aug 11, 2010 at 9:15 PM, Ashish Thusoo athu...@facebook.com wrote:

Folks, this question has come up in the PMC once again and it would be great to hear once more on this topic. What do people think? Are we ready to become a TLP?

Thanks,
Ashish

I thought of one more benefit. We can rename our packages from org.apache.hadoop.hive.* to org.apache.hive.* :)
[DISCUSSION] Move to become a TLP
Folks,

This question has come up in the PMC once again and it would be great to hear once more on this topic. What do people think? Are we ready to become a TLP?

Thanks,
Ashish
RE: Hive should start moving to the new hadoop mapreduce api.
+1 to this.

Ashish

-----Original Message-----
From: yongqiang he [mailto:heyongqiang...@gmail.com]
Sent: Thursday, July 29, 2010 10:54 AM
To: hive-dev@hadoop.apache.org
Subject: Hive should start moving to the new hadoop mapreduce api.

Hi all,

In offline discussions while fixing HIVE-1492, we thought it may be good now to start moving Hive to the new MapReduce context API, and also to start deprecating Hadoop 0.17.0 support in Hive. Basically the new MapReduce API gives Hive more control at runtime.

Any thoughts on this?

Thanks
RE: Hive should start moving to the new hadoop mapreduce api.
Yes, these are mutually exclusive.

Ashish

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Thursday, July 29, 2010 11:20 AM
To: hive-dev@hadoop.apache.org
Subject: Re: Hive should start moving to the new hadoop mapreduce api.

Aren't these things mutually exclusive? The new MapReduce API appeared in 0.20. Deprecating 0.17 seems reasonable, but we still have to support the old API for 0.18 and 0.19, correct?

On Thu, Jul 29, 2010 at 2:11 PM, Ashish Thusoo athu...@facebook.com wrote:

+1 to this.

Ashish

-----Original Message-----
From: yongqiang he [mailto:heyongqiang...@gmail.com]
Sent: Thursday, July 29, 2010 10:54 AM
To: hive-dev@hadoop.apache.org
Subject: Hive should start moving to the new hadoop mapreduce api.

Hi all,

In offline discussions while fixing HIVE-1492, we thought it may be good now to start moving Hive to the new MapReduce context API, and also to start deprecating Hadoop 0.17.0 support in Hive. Basically the new MapReduce API gives Hive more control at runtime.

Any thoughts on this?

Thanks
RE: Hive should start moving to the new hadoop mapreduce api.
Before deciding that, we should poll the user list to see if this would be too disruptive for anyone.

Ashish

-----Original Message-----
From: Ning Zhang [mailto:nzh...@facebook.com]
Sent: Thursday, July 29, 2010 12:18 PM
To: hive-dev@hadoop.apache.org
Subject: Re: Hive should start moving to the new hadoop mapreduce api.

Maybe we should make hive-0.7 the last branch to support the pre-0.20 hadoop API, and later branches of Hive will be switched to the new hadoop API?

On Jul 29, 2010, at 11:53 AM, Ashish Thusoo wrote:

Yes, these are mutually exclusive.

Ashish

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Thursday, July 29, 2010 11:20 AM
To: hive-dev@hadoop.apache.org
Subject: Re: Hive should start moving to the new hadoop mapreduce api.

Aren't these things mutually exclusive? The new MapReduce API appeared in 0.20. Deprecating 0.17 seems reasonable, but we still have to support the old API for 0.18 and 0.19, correct?

On Thu, Jul 29, 2010 at 2:11 PM, Ashish Thusoo athu...@facebook.com wrote:

+1 to this.

Ashish

-----Original Message-----
From: yongqiang he [mailto:heyongqiang...@gmail.com]
Sent: Thursday, July 29, 2010 10:54 AM
To: hive-dev@hadoop.apache.org
Subject: Hive should start moving to the new hadoop mapreduce api.

Hi all,

In offline discussions while fixing HIVE-1492, we thought it may be good now to start moving Hive to the new MapReduce context API, and also to start deprecating Hadoop 0.17.0 support in Hive. Basically the new MapReduce API gives Hive more control at runtime.

Any thoughts on this?

Thanks
RE: [howldev] Initial thoughts on authorization in howl
Hi Pradeep,

I get from this note that the authorization you are talking about here is basically the management of the permissions on the hdfs directories corresponding to the tables and the partitions. So from that angle this sounds good to me.

There is a whole set of permissions/authorizations with regard to the metadata operations themselves, e.g. who should be able to run an alter table add column or describe table etc. I presume that would be beyond the scope of this change and would come in later? I am thinking more in terms of the permissions model that is supported in SQL using GRANT statements etc.

I also presume that by conf variables you mean the key/value properties that Hive can store in the metadata, and not the hive conf variables, right?

Ashish

-----Original Message-----
From: John Sichi [mailto:jsi...@facebook.com]
Sent: Wednesday, July 28, 2010 2:22 PM
To: hive-dev@hadoop.apache.org
Subject: Fwd: [howldev] Initial thoughts on authorization in howl

Begin forwarded message:

From: Pradeep Kamath prade...@yahoo-inc.com
Date: July 27, 2010 4:38:42 PM PDT
To: howl...@yahoogroups.com
Subject: [howldev] Initial thoughts on authorization in howl
Reply-To: howl...@yahoogroups.com

The initial thoughts on authorization in howl are to model authorization (for DDL ops like create table/drop table/add partition etc.) after hdfs permissions. To be able to do this, we would like to extend createTable() to add the ability to record a different group from the user's primary group and to record the complete unix permissions on the table directory. Also, we would like to have a way for partition directories to inherit permissions and group information based on the table directory. To keep the metastore backward compatible for use with hive, I propose having conf variables to achieve these objectives:

- table.group.name - the value will indicate the name of the unix group for the table directory. This will be used by createTable() to perform a chgrp to the value provided. This property will give the user the ability to choose which of the many unix groups he is part of to associate with the table.

- table.permissions - the value will be of the form rwxrwxrwx to indicate read-write-execute permissions on the table directory. This will be used by createTable() to perform a chmod to the value provided. This will let the user decide what permissions he wants on the table.

- partitions.inherit.permissions - a value of true will indicate that partitions inherit the group name and permissions of the table-level directory. This will be used by addPartition() to perform a chgrp and chmod to the values as on the table directory.

I favor conf properties over API changes since the complete authorization design for hive is not finalized yet. These properties can be deprecated/removed when that is in place. These properties would also be useful to some installations of vanilla hive, since at least DFS-level authorization can then be achieved by hive without the user having to manually perform chgrp and chmod operations on DFS.

I would like to hear from hive developers/committers whether this would be acceptable for hive, and also thoughts from others.
Pradeep
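To make the proposal concrete, a hypothetical session using the proposed properties might look like the sketch below. The property names come from the proposal above; the group name, permission bits, and table are illustrative, and the exact syntax was not finalized.

{code}
-- Proposed (not final) conf properties, set before creating the table:
SET table.group.name=etlusers;            -- createTable() would chgrp the table dir to this group
SET table.permissions=rwxrwx---;          -- createTable() would chmod the table dir to these bits
SET partitions.inherit.permissions=true;  -- addPartition() would copy group/permissions from the table dir

CREATE TABLE page_views (user_id INT, url STRING) PARTITIONED BY (ds STRING);

-- The new partition directory would inherit etlusers/rwxrwx--- from the table directory:
ALTER TABLE page_views ADD PARTITION (ds='2010-07-27');
{code}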
RE: Hive Web Interface Broken YET AGAIN!
Can you point to the JIRA that introduced this problem?

Ashish

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Thursday, July 29, 2010 7:38 AM
To: hive-dev@hadoop.apache.org
Subject: Hive Web Interface Broken YET AGAIN!

All,

While the web interface is not as widely used as the cli, people do use it. Its init process has been broken three times that I can remember: once by the shims, once by adding version numbers to the jars, and now it is affected by the libjars change.

[r...@etl02 ~]# hive --service hwi
Exception in thread "main" java.io.IOException: Error opening job jar: -libjars
        at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.util.zip.ZipException: error in opening zip file
        at java.util.zip.ZipFile.open(Native Method)
        at java.util.zip.ZipFile.<init>(ZipFile.java:114)
        at java.util.jar.JarFile.<init>(JarFile.java:133)
        at java.util.jar.JarFile.<init>(JarFile.java:70)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:88)

I notice someone patched the cli to deal with this. There is no test coverage for the shell scripts, but it seems like only some of the scripts were repaired:

bin/ext/cli.sh
bin/ext/lineage.sh
bin/ext/metastore.sh

I wonder why only half the scripts were repaired? In general, if something changes in hive or hadoop that causes the cli to break, we should fix it across the board. I feel like every time a release is coming up, I test drive the web interface and find that a simple script problem stops it from running.

Edward
[jira] Commented: (HIVE-417) Implement Indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892932#action_12892932 ]

Ashish Thusoo commented on HIVE-417:

Started looking at this. One initial question I had - why is the VirtualColumn class in the serde2 package?

Implement Indexing in Hive
--------------------------

Key: HIVE-417
URL: https://issues.apache.org/jira/browse/HIVE-417
Project: Hadoop Hive
Issue Type: New Feature
Components: Metastore, Query Processor
Affects Versions: 0.3.0, 0.3.1, 0.4.0, 0.6.0
Reporter: Prasad Chakka
Assignee: He Yongqiang
Attachments: hive-417.proto.patch, hive-417-2009-07-18.patch, hive-indexing-8-thrift-metastore-remodel.patch, hive-indexing.3.patch, hive-indexing.5.thrift.patch, hive.indexing.11.patch, idx2.png, indexing_with_ql_rewrites_trunk_953221.patch

Implement indexing on Hive so that lookup and range queries are efficient.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-417) Implement Indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892939#action_12892939 ]

Ashish Thusoo commented on HIVE-417:

Also, how is the file name populated? Is that not done through the IOContext?

Implement Indexing in Hive
--------------------------

Key: HIVE-417
URL: https://issues.apache.org/jira/browse/HIVE-417
Project: Hadoop Hive
Issue Type: New Feature
Components: Metastore, Query Processor
Affects Versions: 0.3.0, 0.3.1, 0.4.0, 0.6.0
Reporter: Prasad Chakka
Assignee: He Yongqiang
Attachments: hive-417.proto.patch, hive-417-2009-07-18.patch, hive-indexing-8-thrift-metastore-remodel.patch, hive-indexing.3.patch, hive-indexing.5.thrift.patch, hive.indexing.11.patch, idx2.png, indexing_with_ql_rewrites_trunk_953221.patch

Implement indexing on Hive so that lookup and range queries are efficient.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HIVE-1264) Make Hive work with Hadoop security
[ https://issues.apache.org/jira/browse/HIVE-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo reassigned HIVE-1264:

Assignee: Venkatesh S

Make Hive work with Hadoop security
-----------------------------------

Key: HIVE-1264
URL: https://issues.apache.org/jira/browse/HIVE-1264
Project: Hadoop Hive
Issue Type: Improvement
Reporter: Jeff Hammerbacher
Assignee: Venkatesh S
Attachments: HiveHadoop20S_patch.patch

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1264) Make Hive work with Hadoop security
[ https://issues.apache.org/jira/browse/HIVE-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892517#action_12892517 ]

Ashish Thusoo commented on HIVE-1264:

Can these changes be packed into the shims layer? That way all the calls can be replaced with a call to shims, with the shim for 20.1xx doing the right thing.

Make Hive work with Hadoop security
-----------------------------------

Key: HIVE-1264
URL: https://issues.apache.org/jira/browse/HIVE-1264
Project: Hadoop Hive
Issue Type: Improvement
Reporter: Jeff Hammerbacher
Assignee: Venkatesh S
Attachments: HiveHadoop20S_patch.patch

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-417) Implement Indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884685#action_12884685 ]

Ashish Thusoo commented on HIVE-417:

Looked at the code and have some questions...

Can you explain how the metastore object model is laid out? It seems that the table names of the index are stored in the key/value properties of the table that the index is created on. Is that correct? Would it be better to put a key reference from the index table to the base table instead (similar to what is done for partitions)?

Also, how would this be used to query the table? Can you give an example? Is the idea here to select from the index and then pass the offsets to another query to look up the table? An example or a test which shows the query on the base table would be useful.

Implement Indexing in Hive
--------------------------

Key: HIVE-417
URL: https://issues.apache.org/jira/browse/HIVE-417
Project: Hadoop Hive
Issue Type: New Feature
Components: Metastore, Query Processor
Affects Versions: 0.3.0, 0.3.1, 0.4.0, 0.6.0
Reporter: Prasad Chakka
Assignee: He Yongqiang
Attachments: hive-417.proto.patch, hive-417-2009-07-18.patch, hive-indexing.3.patch, hive-indexing.5.thrift.patch, indexing_with_ql_rewrites_trunk_953221.patch

Implement indexing on Hive so that lookup and range queries are efficient.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
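For readers following the thread: the two-step usage pattern being asked about might look roughly like the sketch below. The index table name and its _bucketname/_offsets columns are illustrative assumptions, since the on-disk layout was still being discussed in this JIRA.

{code}
-- Hypothetical sketch: an index table that stores, per key value, the
-- file names and offsets of the matching blocks in the base table.
SELECT `_bucketname`, `_offsets`
FROM base_table_key_index
WHERE key = 100;

-- The offsets returned by the first query would then be used to restrict
-- the scan of the base table (e.g. via a filtering input format) before
-- running the actual query:
SELECT * FROM base_table WHERE key = 100;
{code}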
[jira] Commented: (HIVE-287) count distinct on multiple columns does not work
[ https://issues.apache.org/jira/browse/HIVE-287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884699#action_12884699 ]

Ashish Thusoo commented on HIVE-287:

@John
Another disadvantage of doing C that I can think of is the fact that count would become a keyword, and then any column named count would have to be quoted. Not a big deal, but just something that would be a side effect of going with C.

count distinct on multiple columns does not work
------------------------------------------------

Key: HIVE-287
URL: https://issues.apache.org/jira/browse/HIVE-287
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: Namit Jain
Assignee: Arvind Prabhakar
Attachments: HIVE-287-1.patch, HIVE-287-2.patch, HIVE-287-3.patch, HIVE-287-4.patch

The following query does not work:

select count(distinct col1, col2) from Tbl

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
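As an aside for readers hitting this today: until the fix lands, the multi-column distinct count can be rewritten with a subquery. A sketch, reusing the table and columns from the report:

{code}
-- Fails before this fix:
--   select count(distinct col1, col2) from Tbl;

-- Workaround: deduplicate first, then count the distinct pairs.
-- (The two forms treat rows containing NULLs slightly differently.)
SELECT count(1)
FROM (SELECT DISTINCT col1, col2 FROM Tbl) t;
{code}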
[jira] Created: (HIVE-1449) Table aliases in order by clause lead to semantic analysis failure
Table aliases in order by clause lead to semantic analysis failure
-------------------------------------------------------------------

Key: HIVE-1449
URL: https://issues.apache.org/jira/browse/HIVE-1449
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo
Fix For: 0.7.0

A simple statement of the form

select a.account_id, count(1) from tmp_ash_test2 a group by a.account_id order by a.account_id;

throws a semantic analysis exception, whereas

select a.account_id, count(1) from tmp_ash_test2 a group by a.account_id order by account_id;

works fine (the second query does not have the table alias a in the order by clause).

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1428) ALTER TABLE ADD PARTITION fails with a remote Thrift metastore
[ https://issues.apache.org/jira/browse/HIVE-1428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884401#action_12884401 ]

Ashish Thusoo commented on HIVE-1428:

Is your question about the fact that build.dir is an empty string? build.dir gets defined in build-common.xml, which in turn picks up properties from build.properties. The build.xml in the metastore directory includes build-common.xml, so it should be getting build.dir. How are you running this test?

ALTER TABLE ADD PARTITION fails with a remote Thrift metastore
--------------------------------------------------------------

Key: HIVE-1428
URL: https://issues.apache.org/jira/browse/HIVE-1428
Project: Hadoop Hive
Issue Type: Bug
Components: Metastore
Affects Versions: 0.6.0, 0.7.0
Reporter: Paul Yang
Attachments: HIVE-1428.patch, TestHiveMetaStoreRemote.java

If the hive cli is configured to use a remote metastore, ALTER TABLE ... ADD PARTITION commands will fail with an error similar to the following:

[prade...@chargesize:~/dev/howl] hive --auxpath ult-serde.jar -e "ALTER TABLE mytable add partition(datestamp = '20091101', srcid = '10',action) location '/user/pradeepk/mytable/20091101/10';"
10/06/16 17:08:59 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
Hive history file=/tmp/pradeepk/hive_job_log_pradeepk_201006161709_1934304805.txt
FAILED: Error in metadata: org.apache.thrift.TApplicationException: get_partition failed: unknown result
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
[prade...@chargesize:~/dev/howl]

This is due to a check that tries to retrieve the partition to see if it exists. If it does not, an attempt is made to pass a null value back from the metastore. Since thrift does not support null return values, an exception is thrown.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
RE: predicate pushdown, HIVE-1395, and HIVE-1342
I will look into those.

Ashish

-----Original Message-----
From: John Sichi [mailto:jsi...@facebook.com]
Sent: Monday, June 28, 2010 4:54 PM
To: hive-dev@hadoop.apache.org
Subject: predicate pushdown, HIVE-1395, and HIVE-1342

Could the person who originally developed predicate pushdown take a look at these two bugs and add hints?

Thanks,
JVS
[jira] Updated: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch
[ https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-1271:

Status: Resolved (was: Patch Available)
Resolution: Fixed

Committed. Thanks Arvind!

Case sensitiveness of type information specified when using custom reducer causes type mismatch
------------------------------------------------------------------------------------------------

Key: HIVE-1271
URL: https://issues.apache.org/jira/browse/HIVE-1271
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.5.0
Reporter: Dilip Joseph
Assignee: Arvind Prabhakar
Fix For: 0.6.0
Attachments: HIVE-1271-1.patch, HIVE-1271.patch

Type information specified while using a custom reduce script is converted to lower case, and causes a type mismatch during query semantic analysis. The following REDUCE query, where a field is named userId, failed:

hive> CREATE TABLE SS (a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>>);
OK
hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
INSERT OVERWRITE TABLE SS
REDUCE * USING 'myreduce.py'
AS (a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>>);
FAILED: Error in semantic analysis: line 2:27 Cannot insert into target table because column number/types are different SS: Cannot convert column 2 from array<struct<userId:int,y:string>> to array<struct<userid:int,y:string>>.

The same query worked fine after changing userId to userid.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
RE: 6.0 and trunk look broken to me
Not sure if this is just my env, but on 0.6.0 when I run the unit tests I get a bunch of errors of the following form:

[junit] Begin query: alter3.q
[junit] java.lang.NoSuchFieldError: HIVESESSIONSILENT
[junit]         at org.apache.hadoop.hive.ql.exec.ExecDriver.main(ExecDriver.java:1052)
[junit]         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit]         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
[junit]         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
[junit]         at java.lang.reflect.Method.invoke(Method.java:597)
[junit]         at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
[junit]         at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194)
[junit]         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
[junit]         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
[junit]         at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220)

-----Original Message-----
From: John Sichi [mailto:jsi...@facebook.com]
Sent: Wednesday, June 23, 2010 2:15 PM
To: hive-dev@hadoop.apache.org
Subject: Re: 6.0 and trunk look broken to me

(You mean 0.6, right?)

I'm not able to reproduce this (just tested with latest trunk on Linux and Mac). Is anyone else seeing it?

JVS

On Jun 23, 2010, at 1:51 PM, Edward Capriolo wrote:

Trunk and 0.6 both show this in hadoop local mode and hadoop distributed mode.

[edw...@ec dist]$ export HADOOP_HOME=/home/edward/hadoop/hadoop-0.20.2_local
[edw...@ec dist]$ bin/hive
Hive history file=/tmp/edward/hive_job_log_edward_201006231647_1723542005.txt
hive> show tables;
FAILED: Parse Error: line 0:-1 cannot recognize input '<EOF>'

[edw...@ec dist]$ more /tmp/edward/hive.log
2010-06-23 16:41:00,749 ERROR ql.Driver (SessionState.java:printError(277)) - FAILED: Parse Error: line 0:-1 cannot recognize input '<EOF>'
org.apache.hadoop.hive.ql.parse.ParseException: line 0:-1 cannot recognize input '<EOF>'
        at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:401)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:299)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:379)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:302)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch
[ https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881998#action_12881998 ]

Ashish Thusoo commented on HIVE-1271:

I have committed this to trunk and will commit to 0.6.0 soon. One thing I did overlook though: we should add a test case for this. Can you do that as part of another JIRA, as this one is already partially committed?

Thanks,
Ashish

Case sensitiveness of type information specified when using custom reducer causes type mismatch
------------------------------------------------------------------------------------------------

Key: HIVE-1271
URL: https://issues.apache.org/jira/browse/HIVE-1271
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.5.0
Reporter: Dilip Joseph
Assignee: Arvind Prabhakar
Fix For: 0.6.0
Attachments: HIVE-1271-1.patch, HIVE-1271.patch

Type information specified while using a custom reduce script is converted to lower case, and causes a type mismatch during query semantic analysis. The following REDUCE query, where a field is named userId, failed:

hive> CREATE TABLE SS (a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>>);
OK
hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
INSERT OVERWRITE TABLE SS
REDUCE * USING 'myreduce.py'
AS (a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>>);
FAILED: Error in semantic analysis: line 2:27 Cannot insert into target table because column number/types are different SS: Cannot convert column 2 from array<struct<userId:int,y:string>> to array<struct<userid:int,y:string>>.

The same query worked fine after changing userId to userid.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
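For whoever picks up the follow-up JIRA: the regression test could be a small qfile along these lines. The file name is hypothetical; the query is adapted from the bug report, using the standard src test table and 'cat' as an identity reducer.

{code}
-- ql/src/test/queries/clientpositive/reduce_case_sensitivity.q (hypothetical name)
CREATE TABLE SS (a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>>);

FROM (SELECT * FROM src DISTRIBUTE BY key SORT BY key) s
INSERT OVERWRITE TABLE SS
REDUCE * USING 'cat' AS (a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>>);

-- Before the fix, compilation failed with "Cannot convert column 2 from
-- array<struct<userId:int,y:string>> to array<struct<userid:int,y:string>>".
SELECT * FROM SS;
{code}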
[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch
[ https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881306#action_12881306 ]

Ashish Thusoo commented on HIVE-1271:

I am looking at this.

Case sensitiveness of type information specified when using custom reducer causes type mismatch
------------------------------------------------------------------------------------------------

Key: HIVE-1271
URL: https://issues.apache.org/jira/browse/HIVE-1271
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.5.0
Reporter: Dilip Joseph
Assignee: Arvind Prabhakar
Fix For: 0.6.0
Attachments: HIVE-1271-1.patch, HIVE-1271.patch

Type information specified while using a custom reduce script is converted to lower case, and causes a type mismatch during query semantic analysis. The following REDUCE query, where a field is named userId, failed:

hive> CREATE TABLE SS (a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>>);
OK
hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
INSERT OVERWRITE TABLE SS
REDUCE * USING 'myreduce.py'
AS (a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>>);
FAILED: Error in semantic analysis: line 2:27 Cannot insert into target table because column number/types are different SS: Cannot convert column 2 from array<struct<userId:int,y:string>> to array<struct<userid:int,y:string>>.

The same query worked fine after changing userId to userid.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1271) Case sensitiveness of type information specified when using custom reducer causes type mismatch
[ https://issues.apache.org/jira/browse/HIVE-1271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881319#action_12881319 ]

Ashish Thusoo commented on HIVE-1271:

Looks good to me. However, why remove the check on Category? Also, why drop the default implementation of the equals method for TypeInfo?

Case sensitiveness of type information specified when using custom reducer causes type mismatch
------------------------------------------------------------------------------------------------

Key: HIVE-1271
URL: https://issues.apache.org/jira/browse/HIVE-1271
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.5.0
Reporter: Dilip Joseph
Assignee: Arvind Prabhakar
Fix For: 0.6.0
Attachments: HIVE-1271-1.patch, HIVE-1271.patch

Type information specified while using a custom reduce script is converted to lower case, and causes a type mismatch during query semantic analysis. The following REDUCE query, where a field is named userId, failed:

hive> CREATE TABLE SS (a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>>);
OK
hive> FROM (select * from srcTable DISTRIBUTE BY id SORT BY id) s
INSERT OVERWRITE TABLE SS
REDUCE * USING 'myreduce.py'
AS (a INT, b INT, vals ARRAY<STRUCT<userId:INT, y:STRING>>);
FAILED: Error in semantic analysis: line 2:27 Cannot insert into target table because column number/types are different SS: Cannot convert column 2 from array<struct<userId:int,y:string>> to array<struct<userid:int,y:string>>.

The same query worked fine after changing userId to userid.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
RE: Vertical partitioning
If you are querying this data again and again, you could just create another table which has only those 10 columns (more like a materialized view approach, though that is not in Hive yet). This of course uses up some space as compared to vertical partitioning, but if the rcfile performance is not good enough, this could be the workaround for now. Also, do you see a lot more time spent on I/O in your queries?

Ashish

-----Original Message-----
From: Edward Capriolo [mailto:edlinuxg...@gmail.com]
Sent: Thursday, June 17, 2010 9:02 AM
To: hive-dev@hadoop.apache.org
Subject: Re: Vertical partitioning

On Thu, Jun 17, 2010 at 3:00 AM, jaydeep vishwakarma jaydeep.vishwaka...@mkhoj.com wrote:

Just looking at the opportunity and feasibility for it. One of my tables has more than 20 fields, where most of the time I need only the 10 main fields. We rarely need the other fields for day-to-day analysis.

Regards,
Jaydeep

Ning Zhang wrote:

Hive supports columnar storage (RCFile) but not vertical partitioning. Is there any use case for vertical partitioning?

On Jun 16, 2010, at 6:41 AM, jaydeep vishwakarma wrote:

Hi,

Does hive support vertical partitioning?

Regards,
Jaydeep

Vertical partitioning is just as practical in a traditional RDBMS as it would be in hive. Normally you would do it for a few reasons:

1) You have some rarely used columns and you want to reduce the table/row size.
2) Your DBMS has terrible blob/clob/text support and the only way to get large objects out of the way is to put them in other tables.

If you go the route of vertical partitioning in hive, you may have to join to select the columns you need. I do not consider row serialization and deserialization to be the majority of a hive job, and in most cases hadoop handles one large file better than two smaller ones. Then again, we have some tables with 140+ columns, so I can see vertical partitioning helping with those tables, but it doubles the management.
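A sketch of the workaround Ashish describes, assuming a wide table named wide_table and an illustrative trio of hot columns (the real case would list the 10 frequently used ones):

{code}
-- Narrow copy holding only the frequently used columns; RCFile keeps it columnar.
CREATE TABLE wide_table_hot (c1 STRING, c2 STRING, c3 STRING) STORED AS RCFILE;

-- Refresh whenever the base table changes; this is the extra space and
-- maintenance cost of the materialized-view-style approach.
INSERT OVERWRITE TABLE wide_table_hot
SELECT c1, c2, c3 FROM wide_table;
{code}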
RE: how to set the debug parameters of hive?
I think if you just pass the java parameters on the command line it should just work: so bin/hive followed by your parameters. I have not tried it though; mostly I am just able to debug using eclipse. (You can create the related eclipse files by doing:

cd metastore
ant model-jar
cd ..
ant eclipse-files
)

Ashish

-----Original Message-----
From: Zhou Shuaifeng [mailto:zhoushuaif...@huawei.com]
Sent: Friday, June 11, 2010 12:00 AM
To: hive-dev@hadoop.apache.org
Cc: ac.pi...@huawei.com
Subject: how to set the debug parameters of hive?

Hi,

I want to debug hive remotely; how do I set the config? E.g. debugging hdfs is done by setting DEBUG_PARAMETERS in the file 'bin/hadoop', so how do I set the debug parameters of hive?

Thanks a lot.
[jira] Updated: (HIVE-1373) Missing connection pool plugin in Eclipse classpath
[ https://issues.apache.org/jira/browse/HIVE-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-1373:

Status: Resolved (was: Patch Available)
Hadoop Flags: [Reviewed]
Fix Version/s: 0.6.0
Resolution: Fixed

Committed. Thanks Vinithra!!

Missing connection pool plugin in Eclipse classpath
---------------------------------------------------

Key: HIVE-1373
URL: https://issues.apache.org/jira/browse/HIVE-1373
Project: Hadoop Hive
Issue Type: Bug
Components: Build Infrastructure
Environment: Eclipse, Linux
Reporter: Vinithra Varadharajan
Assignee: Vinithra Varadharajan
Fix For: 0.6.0
Attachments: HIVE-1373.patch

In a recent checkin, a connection pool dependency was introduced but the eclipse .classpath file was not updated. This causes launch configurations from within Eclipse to fail.

{code}
hive> show tables;
10/05/26 14:59:46 INFO parse.ParseDriver: Parsing command: show tables
10/05/26 14:59:46 INFO parse.ParseDriver: Parse Completed
10/05/26 14:59:46 INFO ql.Driver: Semantic Analysis Completed
10/05/26 14:59:46 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
10/05/26 14:59:46 INFO ql.Driver: query plan = file:/tmp/vinithra/hive_2010-05-26_14-59-46_058_1636674338194744357/queryplan.xml
10/05/26 14:59:46 INFO ql.Driver: Starting command: show tables
10/05/26 14:59:46 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
10/05/26 14:59:46 INFO metastore.ObjectStore: ObjectStore, initialize called
FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables: java.lang.reflect.InvocationTargetException
10/05/26 14:59:47 ERROR exec.DDLTask: FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables: java.lang.reflect.InvocationTargetException
org.apache.hadoop.hive.ql.metadata.HiveException: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:491)
        at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:472)
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllTables(Hive.java:458)
        at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:504)
        at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:176)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:631)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:504)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:382)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303)
Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables: java.lang.reflect.InvocationTargetException
        at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:395)
        at org.datanucleus.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:547)
        at org.datanucleus.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:175)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at javax.jdo.JDOHelper$16.run(JDOHelper.java:1956)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.jdo.JDOHelper.invoke(JDOHelper.java:1951)
        at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:803)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:698)
        at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:191)
        at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:208)
        at org.apache.hadoop.hive.metastore.ObjectStore.initialize
[jira] Commented: (HIVE-1397) histogram() UDAF for a numerical column
[ https://issues.apache.org/jira/browse/HIVE-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877233#action_12877233 ]

Ashish Thusoo commented on HIVE-1397:

+1. This would be a cool contribution.

histogram() UDAF for a numerical column
----------------------------------------

Key: HIVE-1397
URL: https://issues.apache.org/jira/browse/HIVE-1397
Project: Hadoop Hive
Issue Type: New Feature
Components: Query Processor
Affects Versions: 0.6.0
Reporter: Mayank Lahiri
Assignee: Mayank Lahiri
Fix For: 0.6.0

A histogram() UDAF to generate an approximate histogram of a numerical (byte, short, double, long, etc.) column. The result is returned as a map of (x,y) histogram pairs, and can be plotted in Gnuplot using impulses (for example). The algorithm is currently adapted from "A streaming parallel decision tree algorithm" by Ben-Haim and Tom-Tov, JMLR 11 (2010), and uses space proportional to the number of histogram bins specified. It has no approximation guarantees, but seems to work well when there is a lot of data and a large number (e.g. 50-100) of histogram bins specified. A typical call might be:

SELECT histogram(val, 10) FROM some_table;

where the result would be a histogram with 10 bins, returned as a Hive map object.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1139) GroupByOperator sometimes throws OutOfMemory error when there are too many distinct keys
[ https://issues.apache.org/jira/browse/HIVE-1139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877232#action_12877232 ]

Ashish Thusoo commented on HIVE-1139:

Arvind,

I thought the whole point of this JIRA was to make HashMapWrapper support java.util.Map, no? If that would be a separate JIRA, what would this one be for? Sorry for being a bit dense here, but if you could clarify that would be great.

Thanks,
Ashish

GroupByOperator sometimes throws OutOfMemory error when there are too many distinct keys
------------------------------------------------------------------------------------------

Key: HIVE-1139
URL: https://issues.apache.org/jira/browse/HIVE-1139
Project: Hadoop Hive
Issue Type: Bug
Reporter: Ning Zhang
Assignee: Arvind Prabhakar

When a partial aggregation is performed on a mapper, a HashMap is created to keep all distinct keys in main memory. This can lead to an OOM exception when there are too many distinct keys for a particular mapper. A workaround is to set the map split size smaller so that each mapper takes a smaller number of rows. A better solution is to use the persistent HashMapWrapper (currently used in CommonJoinOperator) to spill overflow rows to disk.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1398) Support union all without an outer select *
Support union all without an outer select *
--------------------------------------------

Key: HIVE-1398
URL: https://issues.apache.org/jira/browse/HIVE-1398
Project: Hadoop Hive
Issue Type: Improvement
Components: Query Processor
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo

In Hive, union all queries have to be wrapped in a subquery, as shown below:

select * from (select c1 from t1 union all select c2 from t2);

This JIRA proposes to fix that, to support:

select c1 from t1 union all select c2 from t2;

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-417) Implement Indexing in Hive
[ https://issues.apache.org/jira/browse/HIVE-417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877236#action_12877236 ]

Ashish Thusoo commented on HIVE-417:

A couple of comments on this:

A complication that arises from doing a rewrite just after parse is that you lose the ability to report back errors that correspond to the original query. Also, the metadata that you need to do the rewrite is only available after phase 1 of semantic analysis. So in my opinion the rewrite should be done after semantic analysis but before plan generation. Is that what you had in mind... so something like:

[Query parser] -> [Query semantic analysis] -> [Query optimization] -> ...

Implement Indexing in Hive
--------------------------

Key: HIVE-417
URL: https://issues.apache.org/jira/browse/HIVE-417
Project: Hadoop Hive
Issue Type: New Feature
Components: Metastore, Query Processor
Affects Versions: 0.3.0, 0.3.1, 0.4.0, 0.6.0
Reporter: Prasad Chakka
Assignee: He Yongqiang
Attachments: hive-417.proto.patch, hive-417-2009-07-18.patch, hive-indexing.3.patch

Implement indexing on Hive so that lookup and range queries are efficient.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1373) Missing connection pool plugin in Eclipse classpath
[ https://issues.apache.org/jira/browse/HIVE-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872962#action_12872962 ]

Ashish Thusoo commented on HIVE-1373:

One copy is anyway done from lib to dist/lib for these jars. If we go directly to ivy, we would copy things from the ivy cache to dist/lib. So the number of copies in the build process would remain the same, no? There is of course the first-time overhead of downloading these jars from their repos to the ivy cache.

Missing connection pool plugin in Eclipse classpath
---------------------------------------------------

Key: HIVE-1373
URL: https://issues.apache.org/jira/browse/HIVE-1373
Project: Hadoop Hive
Issue Type: Bug
Components: Build Infrastructure
Environment: Eclipse, Linux
Reporter: Vinithra Varadharajan
Assignee: Vinithra Varadharajan
Priority: Minor
Attachments: HIVE-1373.patch

In a recent checkin, a connection pool dependency was introduced but the eclipse .classpath file was not updated. This causes launch configurations from within Eclipse to fail.

{code}
hive> show tables;
10/05/26 14:59:46 INFO parse.ParseDriver: Parsing command: show tables
10/05/26 14:59:46 INFO parse.ParseDriver: Parse Completed
10/05/26 14:59:46 INFO ql.Driver: Semantic Analysis Completed
10/05/26 14:59:46 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
10/05/26 14:59:46 INFO ql.Driver: query plan = file:/tmp/vinithra/hive_2010-05-26_14-59-46_058_1636674338194744357/queryplan.xml
10/05/26 14:59:46 INFO ql.Driver: Starting command: show tables
10/05/26 14:59:46 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
10/05/26 14:59:46 INFO metastore.ObjectStore: ObjectStore, initialize called
FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables: java.lang.reflect.InvocationTargetException
10/05/26 14:59:47 ERROR exec.DDLTask: FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables: java.lang.reflect.InvocationTargetException
org.apache.hadoop.hive.ql.metadata.HiveException: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:491)
        at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:472)
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllTables(Hive.java:458)
        at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:504)
        at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:176)
        at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
        at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
        at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:631)
        at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:504)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:382)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303)
Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables: java.lang.reflect.InvocationTargetException
        at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:395)
        at org.datanucleus.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:547)
        at org.datanucleus.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:175)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at javax.jdo.JDOHelper$16.run(JDOHelper.java:1956)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.jdo.JDOHelper.invoke(JDOHelper.java:1951)
        at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:803)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:698
[jira] Assigned: (HIVE-1368) Hive JDBC Integration with SQuirrel SQL Client support Enhanced
[ https://issues.apache.org/jira/browse/HIVE-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo reassigned HIVE-1368: --- Assignee: Sunil Kumar Sunil, I have added you as a contributor so you can assign JIRAs to yourself. Hive JDBC Integration with SQuirrel SQL Client support Enhanced --- Key: HIVE-1368 URL: https://issues.apache.org/jira/browse/HIVE-1368 Project: Hadoop Hive Issue Type: Improvement Components: Clients Affects Versions: 0.5.0 Environment: ubuntu8.04, jdk-6,hive-0.5.0, hadoop-0.20.1 Reporter: Sunil Kumar Assignee: Sunil Kumar Fix For: 0.5.0 Attachments: Hive JDBC Integration with SQuirrel SQL Client support Enhanced.doc, SQLClient_support.patch Hive JDBC Integration with SQuirrel SQL Client support Enhanced: The Hive JDBC Client was enhanced to browse hive default schema tables through the Squirrel SQL Client. This enhancement helps to browse a hive table's structure, i.e. the table's columns and their data types, in the Squirrel SQL client interface, and SQL queries can also be performed on the tables through the Squirrel SQL client. To enable this, the following Hive JDBC Java files were modified or added: 1. Methods of org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData.java are updated. 2. Hive org.apache.hadoop.hive.jdbc.ResultSet.java is updated and extended (org.apache.hadoop.hive.jdbc.ExtendedHiveResultSet.java) to support additional JDBC metadata. 3. Methods of org.apache.hadoop.hive.jdbc.HiveResultSetMetaData are updated. 4. Methods of org.apache.hadoop.hive.jdbc.HiveConnection are updated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1368) Hive JDBC Integration with SQuirrel SQL Client support Enhanced
[ https://issues.apache.org/jira/browse/HIVE-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872965#action_12872965 ] Ashish Thusoo commented on HIVE-1368: - In my opinion the best approach would be to attach this patch to HIVE-1126, label it for 0.5.0 in case others want to use it for 0.5.0, and mark this JIRA as a duplicate of that one. Hive JDBC Integration with SQuirrel SQL Client support Enhanced --- Key: HIVE-1368 URL: https://issues.apache.org/jira/browse/HIVE-1368 Project: Hadoop Hive Issue Type: Improvement Components: Clients Affects Versions: 0.5.0 Environment: ubuntu8.04, jdk-6,hive-0.5.0, hadoop-0.20.1 Reporter: Sunil Kumar Assignee: Sunil Kumar Fix For: 0.5.0 Attachments: Hive JDBC Integration with SQuirrel SQL Client support Enhanced.doc, SQLClient_support.patch Hive JDBC Integration with SQuirrel SQL Client support Enhanced: The Hive JDBC Client was enhanced to browse hive default schema tables through the Squirrel SQL Client. This enhancement helps to browse a hive table's structure, i.e. the table's columns and their data types, in the Squirrel SQL client interface, and SQL queries can also be performed on the tables through the Squirrel SQL client. To enable this, the following Hive JDBC Java files were modified or added: 1. Methods of org.apache.hadoop.hive.jdbc.HiveDatabaseMetaData.java are updated. 2. Hive org.apache.hadoop.hive.jdbc.ResultSet.java is updated and extended (org.apache.hadoop.hive.jdbc.ExtendedHiveResultSet.java) to support additional JDBC metadata. 3. Methods of org.apache.hadoop.hive.jdbc.HiveResultSetMetaData are updated. 4. Methods of org.apache.hadoop.hive.jdbc.HiveConnection are updated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HIVE-1346) Table column name changed to _col1,_col2 ..._coln when where clause used in the select query statement
[ https://issues.apache.org/jira/browse/HIVE-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo reassigned HIVE-1346: --- Assignee: Sunil Kumar Table column name changed to _col1,_col2 ..._coln when where clause used in the select query statement -- Key: HIVE-1346 URL: https://issues.apache.org/jira/browse/HIVE-1346 Project: Hadoop Hive Issue Type: Bug Components: Clients Affects Versions: 0.5.0 Environment: ubuntu8.04, jdk-6,hive-0.5.0, hadoop-0.20.1 Reporter: Sunil Kumar Assignee: Sunil Kumar Priority: Minor Attachments: HIVE-1346_patch.patch, HIVE-1346_patch.patch, HIVE-1346_patch.patch When a where clause is used in the Hive query, ResultSetMetaData does not give the original table column names; without a where clause, ResultSetMetaData gives the original table column names. I have used the following code: String tableName = "user"; String sql = "select * from " + tableName + " where id=1"; result = stmt.executeQuery(sql); ResultSetMetaData metaData = result.getMetaData(); int columnCount = metaData.getColumnCount(); for (int i = 1; i <= columnCount; i++) { System.out.println("Column name: " + metaData.getColumnName(i)); } Executing the above code I got the following result: Column name: _col1 Column name: _col2 while the original user table column names were (id, name). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1346) Table column name changed to _col1,_col2 ..._coln when where clause used in the select query statement
[ https://issues.apache.org/jira/browse/HIVE-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872968#action_12872968 ] Ashish Thusoo commented on HIVE-1346: - Hi Sunil, Have you created this patch on the 0.5.0 branch or on trunk? Are you proposing that this goes into both 0.5.1 and trunk? Table column name changed to _col1,_col2 ..._coln when where clause used in the select query statement -- Key: HIVE-1346 URL: https://issues.apache.org/jira/browse/HIVE-1346 Project: Hadoop Hive Issue Type: Bug Components: Clients Affects Versions: 0.5.0 Environment: ubuntu8.04, jdk-6,hive-0.5.0, hadoop-0.20.1 Reporter: Sunil Kumar Assignee: Sunil Kumar Priority: Minor Attachments: HIVE-1346_patch.patch, HIVE-1346_patch.patch, HIVE-1346_patch.patch When a where clause is used in the Hive query, ResultSetMetaData does not give the original table column names; without a where clause, ResultSetMetaData gives the original table column names. I have used the following code: String tableName = "user"; String sql = "select * from " + tableName + " where id=1"; result = stmt.executeQuery(sql); ResultSetMetaData metaData = result.getMetaData(); int columnCount = metaData.getColumnCount(); for (int i = 1; i <= columnCount; i++) { System.out.println("Column name: " + metaData.getColumnName(i)); } Executing the above code I got the following result: Column name: _col1 Column name: _col2 while the original user table column names were (id, name). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1346) Table column name changed to _col1,_col2 ..._coln when where clause used in the select query statement
[ https://issues.apache.org/jira/browse/HIVE-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872975#action_12872975 ] Ashish Thusoo commented on HIVE-1346: - @Namit, in what cases would colAlias ever be null? There seems to be code which checks for this around line 3314 in the trunk branch. But afaik we should always be generating a colAlias (at least the default ones). Just wanted to make sure that we are covering all the bases with this fix. Ashish Table column name changed to _col1,_col2 ..._coln when where clause used in the select query statement -- Key: HIVE-1346 URL: https://issues.apache.org/jira/browse/HIVE-1346 Project: Hadoop Hive Issue Type: Bug Components: Clients Affects Versions: 0.5.0 Environment: ubuntu8.04, jdk-6,hive-0.5.0, hadoop-0.20.1 Reporter: Sunil Kumar Assignee: Sunil Kumar Priority: Minor Attachments: HIVE-1346_patch.patch, HIVE-1346_patch.patch, HIVE-1346_patch.patch When a where clause is used in the Hive query, ResultSetMetaData does not give the original table column names; without a where clause, ResultSetMetaData gives the original table column names. I have used the following code: String tableName = "user"; String sql = "select * from " + tableName + " where id=1"; result = stmt.executeQuery(sql); ResultSetMetaData metaData = result.getMetaData(); int columnCount = metaData.getColumnCount(); for (int i = 1; i <= columnCount; i++) { System.out.println("Column name: " + metaData.getColumnName(i)); } Executing the above code I got the following result: Column name: _col1 Column name: _col2 while the original user table column names were (id, name). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1375) dynamic partitions should not create some of the partitions if the query fails
[ https://issues.apache.org/jira/browse/HIVE-1375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872976#action_12872976 ] Ashish Thusoo commented on HIVE-1375: - An example would be great to help explain this problem better. Thanks, Ashish dynamic partitions should not create some of the partitions if the query fails -- Key: HIVE-1375 URL: https://issues.apache.org/jira/browse/HIVE-1375 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Namit Jain Assignee: Ning Zhang Fix For: 0.6.0 Currently, if a bad row exists which cannot be part of a partitioning column, the query fails - but some of the partitions may already have been created. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
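An illustrative example of the failure mode described above (table and column names are made up for illustration, and the configuration properties are assumed from the dynamic partition work, not taken from this JIRA):

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class DynamicPartitionExample {
  public static void main(String[] args) throws Exception {
    Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
    Connection con =
        DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
    Statement stmt = con.createStatement();
    stmt.executeQuery("set hive.exec.dynamic.partition=true");
    stmt.executeQuery("set hive.exec.dynamic.partition.mode=nonstrict");
    // Suppose staging has rows with ds = '2010-05-30', ds = '2010-05-31', and
    // one row whose ds value cannot be used as a partition key. The query
    // fails on the bad row, but partitions for the first two values may
    // already have been created by then, which is the partial state this
    // JIRA is about.
    stmt.executeQuery(
        "INSERT OVERWRITE TABLE logs PARTITION (ds) SELECT msg, ds FROM staging");
    con.close();
  }
}
{code}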
[jira] Commented: (HIVE-1374) Query compile-only option
[ https://issues.apache.org/jira/browse/HIVE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872981#action_12872981 ] Ashish Thusoo commented on HIVE-1374: - Is doing explain on the query enough? Is the proposal to convert queries into explains when run with the -c option? Also consider the following example in a query.hql script: create table foo(bar string); insert overwrite table foo select c1 from old_foo; What would happen to the create statement in this compile-only option? Maybe it is better to provide a switch to do parse-only checks? Query compile-only option - Key: HIVE-1374 URL: https://issues.apache.org/jira/browse/HIVE-1374 Project: Hadoop Hive Issue Type: New Feature Affects Versions: 0.6.0 Reporter: Paul Yang Assignee: Paul Yang A compile-only option might be useful for helping users quickly prototype queries, fix errors, and do test runs. The proposed change would be adding a -c switch that behaves like -e but only compiles the specified query. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
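A rough sketch of what a compile-only path might look like, assuming it is built on the embedded org.apache.hadoop.hive.ql.Driver (an assumption about the implementation, not the proposed patch). The CREATE TABLE would be checked but never executed, which is why the INSERT that depends on it could still fail to compile:

{code}
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.Driver;
import org.apache.hadoop.hive.ql.session.SessionState;

public class CompileOnly {
  public static void main(String[] args) {
    HiveConf conf = new HiveConf(CompileOnly.class);
    SessionState.start(new SessionState(conf));
    Driver driver = new Driver(conf);
    // compile() parses, analyzes and plans the query without running it
    int ret = driver.compile("insert overwrite table foo select c1 from old_foo");
    System.exit(ret); // non-zero on compile errors
  }
}
{code}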
[jira] Updated: (HIVE-1372) New algorithm for variance() UDAF
[ https://issues.apache.org/jira/browse/HIVE-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1372: Status: Patch Available (was: Open) Hi Mayank, Thanks for the contribution. Please do a submit patch when you put up a patch for a JIRA. Thanks, Ashish New algorithm for variance() UDAF - Key: HIVE-1372 URL: https://issues.apache.org/jira/browse/HIVE-1372 Project: Hadoop Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.6.0 Reporter: Mayank Lahiri Priority: Minor Fix For: 0.6.0 Attachments: HIVE-1372.patch A new algorithm for the UDAF that computes variance. This is pretty much a drop-in replacement for the current UDAF, and has two benefits: provably numerically stable (reference included in comments), and reduces arithmetic operations by about half. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HIVE-1372) New algorithm for variance() UDAF
[ https://issues.apache.org/jira/browse/HIVE-1372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo reassigned HIVE-1372: --- Assignee: Mayank Lahiri Also I have added you as a contributor, so you should be able to assign JIRAs to yourself. Thanks, Ashish New algorithm for variance() UDAF - Key: HIVE-1372 URL: https://issues.apache.org/jira/browse/HIVE-1372 Project: Hadoop Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.6.0 Reporter: Mayank Lahiri Assignee: Mayank Lahiri Priority: Minor Fix For: 0.6.0 Attachments: HIVE-1372.patch A new algorithm for the UDAF that computes variance. This is pretty much a drop-in replacement for the current UDAF, and has two benefits: provably numerically stable (reference included in comments), and reduces arithmetic operations by about half. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
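For context, one widely used numerically stable formulation (Welford's online update); whether the patch uses exactly this recurrence is an assumption, but it illustrates both claimed benefits, stability and fewer arithmetic operations per row:

{code}
public class StableVariance {
  private long n;
  private double mean;
  private double m2; // running sum of squared deviations from the current mean

  public void add(double x) {
    n++;
    double delta = x - mean;
    mean += delta / n;
    m2 += delta * (x - mean); // note: uses the already-updated mean
  }

  public double variance() {
    return n > 1 ? m2 / n : 0.0; // population variance
  }

  public static void main(String[] args) {
    StableVariance v = new StableVariance();
    for (double x : new double[] {1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16}) {
      v.add(x);
    }
    // Prints 22.5; the naive sum-of-squares formula loses precision badly
    // on data with a large common offset like this.
    System.out.println(v.variance());
  }
}
{code}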
[jira] Commented: (HIVE-1359) Unit test should be shim-aware
[ https://issues.apache.org/jira/browse/HIVE-1359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872990#action_12872990 ] Ashish Thusoo commented on HIVE-1359: - +1 to all the great suggestions in this discussion... I have one more thing to add. Would it be more maintainable to associate the include/exclude information with the test as the key, as opposed to the version being the key? I.e., instead of 0.20.0 include - test1.q, test2.q .. exclude - test3.q 0.17.0 include - test3.q exclude - test1.q we do test1.q exclude - 0.17.0 test2.q include - >= 0.17.0 or something along those lines... this may make adding tests to versions fairly easy. Unit test should be shim-aware -- Key: HIVE-1359 URL: https://issues.apache.org/jira/browse/HIVE-1359 Project: Hadoop Hive Issue Type: New Feature Reporter: Ning Zhang Assignee: Ning Zhang Attachments: unit_tests.txt Some features in Hive only work for certain Hadoop versions through shim. However the unit test structure is not shim-aware in that there is only one set of queries and expected outputs for all Hadoop versions. This may not be sufficient when we will have different output for different Hadoop versions. One example is CombineHiveInputFormat which is only available from Hadoop 0.20. The plan using CombineHiveInputFormat and HiveInputFormat may be different. Another example is archival partitions (HAR) which is also only available from 0.20. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1265) Function Registry should auto-detect UDFs from UDF Description
[ https://issues.apache.org/jira/browse/HIVE-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872993#action_12872993 ] Ashish Thusoo commented on HIVE-1265: - Can you explain more about what you mean by it picking up the test class path? When you get the classes for a package, it should return all the classes in that package irrespective of the location. +1 to the general approach here. Function Registry should auto-detect UDFs from UDF Description -- Key: HIVE-1265 URL: https://issues.apache.org/jira/browse/HIVE-1265 Project: Hadoop Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: hive-1265-patch.diff We should be able to register functions dynamically. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
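A minimal sketch of the general approach being +1'd, assuming a @Description-style annotation that carries the function name (the annotation below is a stand-in, and discovery of candidate classes is deliberately stubbed out, since which classpath gets scanned is exactly the question raised above):

{code}
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.HashMap;
import java.util.Map;

public class AutoRegistry {
  @Retention(RetentionPolicy.RUNTIME)
  public @interface Description {
    String name();
  }

  private final Map<String, Class<?>> udfs = new HashMap<String, Class<?>>();

  // Called once per candidate class found by whatever scan is chosen; if the
  // scan walks the whole classpath, test classes would show up here too.
  public void registerIfUdf(Class<?> cls) {
    Description d = cls.getAnnotation(Description.class);
    if (d != null) {
      udfs.put(d.name().toLowerCase(), cls);
    }
  }

  public Class<?> lookup(String name) {
    return udfs.get(name.toLowerCase());
  }
}
{code}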
[jira] Updated: (HIVE-1371) remove blank in rcfilecat
[ https://issues.apache.org/jira/browse/HIVE-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1371: Status: Patch Available (was: Open) Hi Yongqiang, Please do a submit patch when putting up a patch. Thanks, Ashish remove blank in rcfilecat - Key: HIVE-1371 URL: https://issues.apache.org/jira/browse/HIVE-1371 Project: Hadoop Hive Issue Type: Bug Reporter: He Yongqiang Assignee: He Yongqiang Attachments: hive.1371.1.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1371) remove blank in rcfilecat
[ https://issues.apache.org/jira/browse/HIVE-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872995#action_12872995 ] Ashish Thusoo commented on HIVE-1371: - +1. Will commit. remove blank in rcfilecat - Key: HIVE-1371 URL: https://issues.apache.org/jira/browse/HIVE-1371 Project: Hadoop Hive Issue Type: Bug Reporter: He Yongqiang Assignee: He Yongqiang Attachments: hive.1371.1.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1369) LazySimpleSerDe should be able to read classes that support some form of toString()
[ https://issues.apache.org/jira/browse/HIVE-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872997#action_12872997 ] Ashish Thusoo commented on HIVE-1369: - I do not see any drawbacks here. I think another requirement from this was that the serialization be such that it is order preserving wherever there is a notion of order, as this serde could also be used to serialize between map/reduce boundaries. So if the implementation takes care of that and does not introduce overhead, I think this would be good. Others, what do you think about this? Ashish LazySimpleSerDe should be able to read classes that support some form of toString() --- Key: HIVE-1369 URL: https://issues.apache.org/jira/browse/HIVE-1369 Project: Hadoop Hive Issue Type: Improvement Reporter: Alex Kozlov Priority: Minor Original Estimate: 2h Remaining Estimate: 2h Currently LazySimpleSerDe is able to deserialize only BytesWritable or Text objects. It should be pretty easy to extend the class to read any object that implements the toString() method. Ideas or concerns? Alex K -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
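A small sketch of the extension under discussion, not LazySimpleSerDe itself: accept Text and BytesWritable as today, and fall back to toString() for anything else. The order-preservation caveat raised above is visible here, because text produced this way only sorts correctly if toString() itself is order preserving (e.g. zero-padded numbers), which is a requirement on the row object rather than on the serde:

{code}
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;

public class ToStringRows {
  public static Text toText(Object row) {
    if (row instanceof Text) {
      return (Text) row;
    }
    if (row instanceof BytesWritable) {
      BytesWritable bw = (BytesWritable) row;
      Text t = new Text();
      t.set(bw.getBytes(), 0, bw.getLength());
      return t;
    }
    // Any other class: rely on whatever toString() the row object provides.
    return new Text(row.toString());
  }
}
{code}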
[jira] Assigned: (HIVE-1373) Missing connection pool plugin in Eclipse classpath
[ https://issues.apache.org/jira/browse/HIVE-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo reassigned HIVE-1373: --- Assignee: Vinithra Varadharajan Have added you to the contributors so you should be able to assign things to yourself now. Thx. Missing connection pool plugin in Eclipse classpath --- Key: HIVE-1373 URL: https://issues.apache.org/jira/browse/HIVE-1373 Project: Hadoop Hive Issue Type: Bug Components: Build Infrastructure Environment: Eclipse, Linux Reporter: Vinithra Varadharajan Assignee: Vinithra Varadharajan Priority: Minor Attachments: HIVE-1373.patch In a recent checkin, connection pool dependency was introduced but eclipse .classpath file was not updated. This causes launch configurations from within Eclipse to fail. {code} hive show tables; show tables; 10/05/26 14:59:46 INFO parse.ParseDriver: Parsing command: show tables 10/05/26 14:59:46 INFO parse.ParseDriver: Parse Completed 10/05/26 14:59:46 INFO ql.Driver: Semantic Analysis Completed 10/05/26 14:59:46 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null) 10/05/26 14:59:46 INFO ql.Driver: query plan = file:/tmp/vinithra/hive_2010-05-26_14-59-46_058_1636674338194744357/queryplan.xml 10/05/26 14:59:46 INFO ql.Driver: Starting command: show tables 10/05/26 14:59:46 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore 10/05/26 14:59:46 INFO metastore.ObjectStore: ObjectStore, initialize called FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory NestedThrowables: java.lang.reflect.InvocationTargetException 10/05/26 14:59:47 ERROR exec.DDLTask: FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory NestedThrowables: java.lang.reflect.InvocationTargetException org.apache.hadoop.hive.ql.metadata.HiveException: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory NestedThrowables: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:491) at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:472) at org.apache.hadoop.hive.ql.metadata.Hive.getAllTables(Hive.java:458) at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:504) at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:176) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:631) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:504) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:382) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303) Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory NestedThrowables: java.lang.reflect.InvocationTargetException at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:395) at org.datanucleus.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:547) at 
org.datanucleus.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:175) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at javax.jdo.JDOHelper$16.run(JDOHelper.java:1956) at java.security.AccessController.doPrivileged(Native Method) at javax.jdo.JDOHelper.invoke(JDOHelper.java:1951) at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159) at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:803) at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:698) at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:191) at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:208) at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:153
[jira] Commented: (HIVE-1373) Missing connection pool plugin in Eclipse classpath
[ https://issues.apache.org/jira/browse/HIVE-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872237#action_12872237 ] Ashish Thusoo commented on HIVE-1373: - +1. Looks good to me. I think in future we should move all the lib dependencies in the eclipse files to come from build/dist/lib as that will help us migrate more stuff over to ivy. Will run tests and commit once the tests pass. Missing connection pool plugin in Eclipse classpath --- Key: HIVE-1373 URL: https://issues.apache.org/jira/browse/HIVE-1373 Project: Hadoop Hive Issue Type: Bug Components: Build Infrastructure Environment: Eclipse, Linux Reporter: Vinithra Varadharajan Assignee: Vinithra Varadharajan Priority: Minor Attachments: HIVE-1373.patch In a recent checkin, connection pool dependency was introduced but eclipse .classpath file was not updated. This causes launch configurations from within Eclipse to fail. {code} hive show tables; show tables; 10/05/26 14:59:46 INFO parse.ParseDriver: Parsing command: show tables 10/05/26 14:59:46 INFO parse.ParseDriver: Parse Completed 10/05/26 14:59:46 INFO ql.Driver: Semantic Analysis Completed 10/05/26 14:59:46 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null) 10/05/26 14:59:46 INFO ql.Driver: query plan = file:/tmp/vinithra/hive_2010-05-26_14-59-46_058_1636674338194744357/queryplan.xml 10/05/26 14:59:46 INFO ql.Driver: Starting command: show tables 10/05/26 14:59:46 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore 10/05/26 14:59:46 INFO metastore.ObjectStore: ObjectStore, initialize called FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory NestedThrowables: java.lang.reflect.InvocationTargetException 10/05/26 14:59:47 ERROR exec.DDLTask: FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory NestedThrowables: java.lang.reflect.InvocationTargetException org.apache.hadoop.hive.ql.metadata.HiveException: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory NestedThrowables: java.lang.reflect.InvocationTargetException at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:491) at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:472) at org.apache.hadoop.hive.ql.metadata.Hive.getAllTables(Hive.java:458) at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:504) at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:176) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:631) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:504) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:382) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303) Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory NestedThrowables: java.lang.reflect.InvocationTargetException at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:395) at 
org.datanucleus.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:547) at org.datanucleus.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:175) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at javax.jdo.JDOHelper$16.run(JDOHelper.java:1956) at java.security.AccessController.doPrivileged(Native Method) at javax.jdo.JDOHelper.invoke(JDOHelper.java:1951) at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159) at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:803) at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:698) at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:191
[jira] Commented: (HIVE-802) Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it
[ https://issues.apache.org/jira/browse/HIVE-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872238#action_12872238 ] Ashish Thusoo commented on HIVE-802: Should we just mark this as a duplicate of 1176 in that case? Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it - Key: HIVE-802 URL: https://issues.apache.org/jira/browse/HIVE-802 Project: Hadoop Hive Issue Type: Bug Components: Build Infrastructure Reporter: Todd Lipcon Assignee: Arvind Prabhakar There's a bug in DataNucleus that causes this issue: http://www.jpox.org/servlet/jira/browse/NUCCORE-371 To reproduce, simply put your hive source tree in a directory that contains a '+' character. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-80) Allow Hive Server to run multiple queries simultaneously
[ https://issues.apache.org/jira/browse/HIVE-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872240#action_12872240 ] Ashish Thusoo commented on HIVE-80: --- Yes, I think what Ning is saying is correct. We should however add a test case to the unit tests to check that. I am not sure that we added a test case for the parallel execution stuff. Allow Hive Server to run multiple queries simultaneously Key: HIVE-80 URL: https://issues.apache.org/jira/browse/HIVE-80 Project: Hadoop Hive Issue Type: Improvement Components: Server Infrastructure Reporter: Raghotham Murthy Assignee: Neil Conway Priority: Critical Attachments: hive_input_format_race-2.patch Can use one driver object per query. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
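A minimal sketch of the "one driver object per query" idea from the issue description, assuming the server-side handler owns a HiveConf (result fetching and error handling omitted for brevity):

{code}
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.Driver;

public class PerQueryDriver {
  private final HiveConf conf;

  public PerQueryDriver(HiveConf conf) {
    this.conf = conf;
  }

  public int execute(String query) {
    // A fresh Driver per query keeps per-query state (plan, fetch task, etc.)
    // from being shared across concurrent requests.
    Driver driver = new Driver(conf);
    return driver.run(query); // returns 0 on success in this era's Driver API
  }
}
{code}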
RE: [DISCUSSION] To be (or not to be) a TLP - that is the question
What is the advantage of becoming a TLP to the project itself? I have heard that it is something that apache wants, but considering that we are very comfortable with how Hive interacts with the Hadoop ecosystem as a sub project of Hadoop, there has to be some big incentive for the project to be a TLP, and nowhere have I seen how this would benefit Hive. Any thoughts on that? Ashish From: Jeff Hammerbacher [mailto:ham...@cloudera.com] Sent: Wednesday, April 21, 2010 7:35 PM To: hive-dev@hadoop.apache.org Cc: Ashish Thusoo Subject: Re: [DISCUSSION] To be (or not to be) a TLP - that is the question Hive already does the work to run on multiple versions of Hadoop, and the release cycle is independent of Hadoop's. I don't see why it should remain a subproject. I'm +1 on Hive becoming a TLP. On Tue, Apr 20, 2010 at 2:03 PM, Zheng Shao zsh...@gmail.com wrote: As a Hive committer, I don't feel the benefit we get from becoming a TLP is big enough (compared with the cost) to make Hive a TLP. From Chris's comment I see that the cost is not that big, but I still wonder what benefit we will get from that. Also I didn't get the idea of the joke (In fact, one could argue that Pig opting not to be TLP yet is why Hive should go TLP). I don't see any reason that applies to Pig but not Hive. We should continue the discussion here, but anything in Pig's discussion should also be considered here. Zheng On Mon, Apr 19, 2010 at 5:48 PM, Amr Awadallah a...@cloudera.com wrote: I am personally +1 on Hive being a TLP, I think it did reach the community adoption and maturity level required for that. In fact, one could argue that Pig opting not to be TLP yet is why Hive should go TLP :) (jk). The real question to ask is whether there is a volunteer to take care of the administrative tasks, which isn't a ton of work afaiu (I am willing to volunteer if nobody else is up to the task, but I am not a committer and only contributed a minor patch for bash/cygwin). BTW, here is a very nice summary from Yahoo's Chris Douglas on TLP tradeoffs. I happen to agree with all he says, and frankly I couldn't have written it better myself. I highlight certain parts from his message, but I recommend you read the whole thing. -- Forwarded message -- From: Chris Douglas cdoug...@apache.org Date: Tue, Apr 13, 2010 at 11:46 PM Subject: Subprojects and TLP status To: gene...@hadoop.apache.org, priv...@hadoop.apache.org Most of Hadoop's subprojects have discussed becoming top-level Apache projects (TLPs) in the last few weeks. Most have expressed a desire to remain in Hadoop. The salient parts of the discussions I've read tend to focus on three aspects: a technical dependence on Hadoop, additional overhead as a TLP, and visibility both within the Hadoop ecosystem and in the open source community generally. Life as a TLP: this is not much harder than being a Hadoop subproject, and the Apache preferences being tossed around - particularly insufficiently diverse - are not blockers. Every subproject needs to write a section of the report Hadoop sends to the board; almost the same report, sent to a new address. The initial cost is similarly light: copy bylaws, send a few notes to INFRA, and follow some directions. I think the estimated costs are far higher than they will be in practice. Inertia is a powerful force, but it should be overcome.
The directions are here, and should not be intimidating: http://apache.org/dev/project-creation.html Visibility: the Hadoop site does not need to change. For each subproject, we can literally change the hyperlinks to point to the new page and be done. Long-term, linking to all ASF projects that run on Hadoop from a prominent page is something we all want. So particularly in the medium-term that most are considering: visibility through the website will not change. Each subproject will still be linked from the front page. Hadoop would not be nearly as popular as it is without Zookeeper, HBase, Hive, and Pig. All statistics on work in shared MapReduce clusters show that users vastly prefer running Pig and Hive queries to writing MapReduce jobs. HBase continues to push features in HDFS that increase its adoption and relevance outside MapReduce, while sharing some of its NoSQL limelight. Zookeeper is not only a linchpin in real workloads, but many proposals for future features require it. The bottom line is that MapReduce and HDFS need these projects for visibility and adoption in precisely the same way. I don't think separate TLPs will uncouple the broader community from one another. Technical dependence: this has two dimensions. First, influencing MapReduce and HDFS. This is nonsense. Earning influence by contributing to a subproject is the only way to push code changes
[jira] Commented: (HIVE-987) Hive CLI Omnibus Improvement ticket
[ https://issues.apache.org/jira/browse/HIVE-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12859956#action_12859956 ] Ashish Thusoo commented on HIVE-987: I am +1 on this. I think this can open up good possibilities. I have not looked at the sqlline code, but how much does it depend on the actual SQL dialect? Plus, how easy is it to extend to hdfs-related commands? E.g., the CLI today has commands that can set conf variables. It also supports the hadoop dfs commands, which talk directly to hdfs. I am not sure if too many people use them, but I do. Would be great to get them integrated with sqlline if that is possible. Hive CLI Omnibus Improvement ticket --- Key: HIVE-987 URL: https://issues.apache.org/jira/browse/HIVE-987 Project: Hadoop Hive Issue Type: Improvement Reporter: Carl Steinbach Attachments: HIVE-987.1.patch, sqlline-1.0.8_eb.jar Add the following features to the Hive CLI: * Command History * ReadLine support ** HIVE-120: Add readline support/support for alt-based commands in the CLI ** Java-ReadLine is LGPL, but it depends on GPL readline library. We probably need to use JLine instead. * Tab completion ** HIVE-97: tab completion for hive cli * Embedded/Standalone CLI modes, and ability to connect to different Hive Server instances. ** HIVE-818: Create a Hive CLI that connects to hive ThriftServer * .hiverc configuration file ** HIVE-920: .hiverc doesnt work * Improved support for comments. ** HIVE-430: Ability to comment desired for hive query files * Different output formats ** HIVE-49: display column header on CLI ** XML output format For additional inspiration we may want to look at the Postgres psql shell: http://www.postgresql.org/docs/8.1/static/app-psql.html Finally, it would be really cool if we implemented this in a generic fashion and spun it off as an apache-commons shell framework. It seems like most of the Apache Hadoop projects have their own shells, and I'm sure the same is true for non-Hadoop Apache projects as well. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1320) NPE with lineage in a query of union alls on joins.
NPE with lineage in a query of union alls on joins. --- Key: HIVE-1320 URL: https://issues.apache.org/jira/browse/HIVE-1320 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo The following query generates a NPE in the lineage ctx code EXPLAIN INSERT OVERWRITE TABLE dest_l1 SELECT j.* FROM (SELECT t1.key, p1.value FROM src1 t1 LEFT OUTER JOIN src p1 ON (t1.key = p1.key) UNION ALL SELECT t2.key, p2.value FROM src1 t2 LEFT OUTER JOIN src p2 ON (t2.key = p2.key)) j; The stack trace is: FAILED: Hive Internal Error: java.lang.NullPointerException(null) java.lang.NullPointerException at org.apache.hadoop.hive.ql.optimizer.lineage.LineageCtx$Index.mergeDependency(LineageCtx.java:116) at org.apache.hadoop.hive.ql.optimizer.lineage.OpProcFactory$UnionLineage.process(OpProcFactory.java:396) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:54) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102) at org.apache.hadoop.hive.ql.optimizer.lineage.Generator.transform(Generator.java:72) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:83) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5976) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:48) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1320) NPE with lineage in a query of union alls on joins.
[ https://issues.apache.org/jira/browse/HIVE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1320: Attachment: HIVE-1320.patch Fixed the NPE. The cause was that we were not checking whether inp_dep was null in the union all code path. We have to do that for all operators that have more than one parent. NPE with lineage in a query of union alls on joins. --- Key: HIVE-1320 URL: https://issues.apache.org/jira/browse/HIVE-1320 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1320.patch The following query generates a NPE in the lineage ctx code EXPLAIN INSERT OVERWRITE TABLE dest_l1 SELECT j.* FROM (SELECT t1.key, p1.value FROM src1 t1 LEFT OUTER JOIN src p1 ON (t1.key = p1.key) UNION ALL SELECT t2.key, p2.value FROM src1 t2 LEFT OUTER JOIN src p2 ON (t2.key = p2.key)) j; The stack trace is: FAILED: Hive Internal Error: java.lang.NullPointerException(null) java.lang.NullPointerException at org.apache.hadoop.hive.ql.optimizer.lineage.LineageCtx$Index.mergeDependency(LineageCtx.java:116) at org.apache.hadoop.hive.ql.optimizer.lineage.OpProcFactory$UnionLineage.process(OpProcFactory.java:396) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:54) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102) at org.apache.hadoop.hive.ql.optimizer.lineage.Generator.transform(Generator.java:72) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:83) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5976) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:48) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
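The shape of the fix described above, as a self-contained sketch with made-up types (only the inp_dep name comes from the comment): when an operator has multiple parents, such as the UNION ALL over two joins in this query, a parent may contribute no dependency, and that null has to be skipped instead of merged:

{code}
import java.util.List;

public class MergeSketch {
  static class Dependency {
    static Dependency merge(Dependency a, Dependency b) {
      return a == null ? b : a; // placeholder for the real merge logic
    }
  }

  static Dependency mergeParents(List<Dependency> parentDeps) {
    Dependency dep = null;
    for (Dependency inp_dep : parentDeps) {
      if (inp_dep != null) { // the missing null check that caused the NPE
        dep = Dependency.merge(dep, inp_dep);
      }
    }
    return dep;
  }
}
{code}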
[jira] Updated: (HIVE-1320) NPE with lineage in a query of union alls on joins.
[ https://issues.apache.org/jira/browse/HIVE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1320: Status: Patch Available (was: Open) Affects Version/s: 0.6.0 Fix Version/s: 0.6.0 NPE with lineage in a query of union alls on joins. --- Key: HIVE-1320 URL: https://issues.apache.org/jira/browse/HIVE-1320 Project: Hadoop Hive Issue Type: Bug Components: Query Processor Affects Versions: 0.6.0 Reporter: Ashish Thusoo Assignee: Ashish Thusoo Fix For: 0.6.0 Attachments: HIVE-1320.patch The following query generates a NPE in the lineage ctx code EXPLAIN INSERT OVERWRITE TABLE dest_l1 SELECT j.* FROM (SELECT t1.key, p1.value FROM src1 t1 LEFT OUTER JOIN src p1 ON (t1.key = p1.key) UNION ALL SELECT t2.key, p2.value FROM src1 t2 LEFT OUTER JOIN src p2 ON (t2.key = p2.key)) j; The stack trace is: FAILED: Hive Internal Error: java.lang.NullPointerException(null) java.lang.NullPointerException at org.apache.hadoop.hive.ql.optimizer.lineage.LineageCtx$Index.mergeDependency(LineageCtx.java:116) at org.apache.hadoop.hive.ql.optimizer.lineage.OpProcFactory$UnionLineage.process(OpProcFactory.java:396) at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:54) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59) at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102) at org.apache.hadoop.hive.ql.optimizer.lineage.Generator.transform(Generator.java:72) at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:83) at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5976) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126) at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:48) at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[DISCUSSION] To be (or not to be) a TLP - that is the question
Hi Folks, Recently the Apache Board asked the Hadoop PMC if some sub projects can become top level projects. In the opinion of the board, big umbrella projects make it difficult to monitor the health of the communities within the sub projects. If Hive does become a TLP, then we would have to elect our own PMC and take on all the administrative tasks that the Hadoop PMC does for us. So there is definitely more administrative work involved as a TLP. So the question is whether we should take on this additional task at this time and what tangible advantages and disadvantages such a move would entail for the project. Would like to hear what the community thinks on this issue. Thanks, Ashish PS: As some reference to what is happening in the other subprojects, at this time PIG and Zookeeper have decided NOT to become TLPs whereas HBase and Avro have decided to become TLPs.
[jira] Commented: (HIVE-1293) Concurrency Model for Hive
[ https://issues.apache.org/jira/browse/HIVE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12857110#action_12857110 ] Ashish Thusoo commented on HIVE-1293: - I would vote for versioning. Since we do not have to deal with the complexity of a buffer cache, I think this would be much simpler to implement than what is possible in traditional databases. At the same time, for locks we will have to do a lease based mechanism anyway in order to protect against locks leaking because of client crashes. And when you account for that, it seems that locking would not be significantly simpler to implement than versioning. Concurrency Model for Hive - Key: HIVE-1293 URL: https://issues.apache.org/jira/browse/HIVE-1293 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Concurrency model for Hive: Currently, hive does not provide a good concurrency model. The only guarantee provided in case of concurrent readers and writers is that a reader will not see partial data from the old version (before the write) and partial data from the new version (after the write). This has come across as a big problem, especially for background processes performing maintenance operations. The following possible solutions come to mind. 1. Locks: Acquire read/write locks - they can be acquired at the beginning of the query or the write locks can be delayed till the move task (when the directory is actually moved). Care needs to be taken for deadlocks. 2. Versioning: The writer can create a new version if the current version is being read. Note that it is not equivalent to snapshots; the old version can only be accessed by the current readers, and will be deleted when all of them have finished. Comments. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
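A toy sketch of the versioning option as described in the issue (nothing here is from an actual Hive patch): readers pin the version that was current when they started, a writer publishes a new version without touching pinned ones, and an old version is deleted once its last reader finishes. A real implementation would also need to close the small window between fetching and pinning a version, and combine this with the lease idea above to survive reader crashes:

{code}
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class VersionedTable {
  public static final class Version {
    final String dataLocation;
    final AtomicInteger readers = new AtomicInteger();
    Version(String dataLocation) { this.dataLocation = dataLocation; }
  }

  private final AtomicReference<Version> current =
      new AtomicReference<Version>(new Version("/warehouse/t/v0"));

  public Version openForRead() {
    Version v = current.get();
    v.readers.incrementAndGet(); // pin: deletion waits for this reader
    return v;
  }

  public void closeRead(Version v) {
    if (v.readers.decrementAndGet() == 0 && current.get() != v) {
      delete(v); // last reader of a superseded version cleans it up
    }
  }

  public void publish(String newLocation) {
    Version old = current.getAndSet(new Version(newLocation));
    if (old.readers.get() == 0) {
      delete(old); // nobody was reading the old version
    }
  }

  private void delete(Version v) {
    System.out.println("deleting " + v.dataLocation);
  }
}
{code}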
[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1131: Attachment: HIVE-1131_8.patch Another one with test fixes. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, HIVE-1131_4.patch, HIVE-1131_5.patch, HIVE-1131_6.patch, HIVE-1131_7.patch, HIVE-1131_8.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1131: Attachment: HIVE-1131_6.patch With fixes to tests and with null dropped. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, HIVE-1131_4.patch, HIVE-1131_5.patch, HIVE-1131_6.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1131: Attachment: HIVE-1131_7.patch Another patch which fixes the QueryPlan to have LinkedHashMaps as that was also creating instability in the tests. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, HIVE-1131_4.patch, HIVE-1131_5.patch, HIVE-1131_6.patch, HIVE-1131_7.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1131: Release Note: This changes the signature of PostExecute.java Hadoop Flags: [Incompatible change] Status: Patch Available (was: Open) submitting. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, HIVE-1131_4.patch, HIVE-1131_5.patch, HIVE-1131_6.patch, HIVE-1131_7.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852013#action_12852013 ] Ashish Thusoo commented on HIVE-1131: - I looked at the ExecutionCtx stuff. There are at least 3 different unrelated fields in SessionState that we should also move to the ExecutionCtx. I will file a follow-up JIRA for it, but I think we should get this one in. I did see some test failures due to using HashMaps and the consequent change in ordering after I refreshed. Will fix that and upload a new patch. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, HIVE-1131_4.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1131: Attachment: HIVE-1131_5.patch Added a more centralized function to decide the dependency type. Also reduced the number of dependency types to SIMPLE, EXPRESSION and SCRIPT. SIMPLE = a copy of the column, EXPRESSION = UDF, UDAF, UDTF or union all, SCRIPT = if a user script is used. Also fixed the HashMap to LinkedHashMap. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, HIVE-1131_4.patch, HIVE-1131_5.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
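The three dependency types as just described, rendered as an enum for clarity (the enum form is a sketch, not the patch itself):

{code}
public enum DependencyType {
  SIMPLE,     // the destination column is a plain copy of a source column
  EXPRESSION, // produced by a UDF, UDAF, UDTF, or a union all
  SCRIPT      // produced through a user script
}
{code}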
[jira] Commented: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851674#action_12851674 ] Ashish Thusoo commented on HIVE-1131: - Look at the DataContainer class. That has a partition in it. And the Dependency has a mapping from Partition to the dependencies. Can you explain your concerns about inefficiency in more detail? For S6, the query plan is actually the wrong place to store the lineage info. Because of the dynamic partitioning work that Ning is doing, I have to generate the partition to dependency mapping at run time. So I would rather store it in a run time structure as opposed to a compile time structure. SessionState fits that bill, though I think we should have another structure called ExecutionCtx for this. But otherwise I think we want to store this in a runtime structure. For S2, I will add some more comments. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, HIVE-1131_4.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
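A rough rendering of the structure being described, with assumed field shapes: a DataContainer identifies the written table or partition, and the lineage index maps each (container, column) pair to its dependency, which is why partition-level entries can only be filled in at run time once dynamic partitions are known:

{code}
import java.util.LinkedHashMap;
import java.util.Map;

public class LineageSketch {
  static class DataContainer {
    final String table;
    final String partition; // null for unpartitioned writes
    DataContainer(String table, String partition) {
      this.table = table;
      this.partition = partition;
    }
  }

  static class Dependency {} // dependency type plus its base columns

  // LinkedHashMap keeps iteration order deterministic, the same concern that
  // motivated the HashMap to LinkedHashMap fixes in the later patches.
  private final Map<DataContainer, Map<String, Dependency>> index =
      new LinkedHashMap<DataContainer, Map<String, Dependency>>();
}
{code}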
[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1131: Attachment: HIVE-1131_2.patch Patch with all the review comments incorporated. This is just the source patch. Will be uploading the fixed tests shortly. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch, HIVE-1131_2.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849624#action_12849624 ] Ashish Thusoo commented on HIVE-1131: - Comment 3 from Raghu and comments S2-S4 from Zheng are not yet incorporated. The new patch overhauls things a bit to support partition-level lineage and does this in a post-execute hook. It gets rid of the visits and the iterator classes. Will fix the other comments in the patch with the test cases. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch, HIVE-1131_2.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1131: Attachment: HIVE-1131_3.patch This fixes all the review comments. Will post the patch with tests separately. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849633#action_12849633 ] Ashish Thusoo commented on HIVE-1131: - Also, I did not find any instance of S3 in the code. Perhaps you just mentioned it for completeness, but in case you do find an instance, please let me know the offending file. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1131: Attachment: HIVE-1131_4.patch This patch has all the tests updated as well. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch, HIVE-1131_2.patch, HIVE-1131_3.patch, HIVE-1131_4.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: [ANNOUNCEMENT] Contributor Workshop at Yahoo!
Sounds like a good idea to me. If anyone @FB wants to join, maybe they could do it with you. Ashish -Original Message- From: Carl Steinbach [mailto:c...@cloudera.com] Sent: Thursday, March 25, 2010 2:09 PM To: hive-dev@hadoop.apache.org Subject: Re: [ANNOUNCEMENT] Contributor Workshop at Yahoo! I'm happy to organize this if no one else wants to. Let me know if there are any objections. Otherwise I will send an email to the Y! at the end of the day. Thanks. Carl On Thu, Mar 25, 2010 at 11:14 AM, Jeff Hammerbacher ham...@cloudera.com wrote: Has someone already emailed about a Hive workshop? On Thu, Mar 25, 2010 at 10:33 AM, Owen O'Malley o...@yahoo-inc.com wrote: Yahoo is organizing Contributor's Workshops on the day after the Hadoop Summit (10 June 2010) for both Hadoop Core (HDFS and MapReduce) and Pig. We would be happy to provide space for any of the other Hadoop sub-projects as well! If you are interested in organizing such a workshop for one of the Hadoop sub-projects, please email us at hadoopcontributorr...@yahoo-inc.com with WORKSHOP ORGANIZER (project) in the subject line. See you all at the Hadoop Summit - June 29th, http://www.hadoopsummit.org/ Thanks, Owen O'Malley Eric Baldeschwieler
[jira] Commented: (HIVE-1117) Make QueryPlan serializable
[ https://issues.apache.org/jira/browse/HIVE-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12833117#action_12833117 ] Ashish Thusoo commented on HIVE-1117: - What would be the advantage of using Avro here? We do not really have a requirement for cross-language clients for this, do we? To me, throwing Avro into the mix just adds another dependency that is not really needed... no? Make QueryPlan serializable --- Key: HIVE-1117 URL: https://issues.apache.org/jira/browse/HIVE-1117 Project: Hadoop Hive Issue Type: Improvement Reporter: Zheng Shao Assignee: Zheng Shao Fix For: 0.6.0 We need to make QueryPlan serializable so that we can resume the query some time later. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
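The trade-off being weighed here: plain JDK serialization already handles an arbitrary object graph with zero added dependencies, at the cost of cross-language readability and brittle versioning -- which is exactly what Avro would buy. A minimal JDK-only sketch with a toy stand-in class (the real QueryPlan is far richer):
{code}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a query plan; the real class holds tasks, stages, etc.
class ToyPlan implements Serializable {
  private static final long serialVersionUID = 1L;
  String queryString;
  List<String> tasks = new ArrayList<String>();
}

public class PlanSerDemo {
  public static void main(String[] args) throws Exception {
    ToyPlan plan = new ToyPlan();
    plan.queryString = "select key from src";
    plan.tasks.add("MapRedTask-1");

    // Serialize: what "make QueryPlan serializable" buys without new deps.
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    ObjectOutputStream out = new ObjectOutputStream(buf);
    out.writeObject(plan);
    out.close();

    // Deserialize later -- e.g. to resume the query in another process.
    ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()));
    ToyPlan resumed = (ToyPlan) in.readObject();
    System.out.println(resumed.queryString + " / " + resumed.tasks);
  }
}
{code}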
[jira] Updated: (HIVE-1131) Add column lineage information to the pre execution hooks
[ https://issues.apache.org/jira/browse/HIVE-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1131: Attachment: HIVE-1131.patch This is just the source patch. Will publish the test patch soon. Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo Attachments: HIVE-1131.patch We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1123) Checkstyle fixes
[ https://issues.apache.org/jira/browse/HIVE-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829816#action_12829816 ] Ashish Thusoo commented on HIVE-1123: - Apart from the indentation of the throws clause, is there any other major sticking point? Personally speaking, I don't have a strong preference for the indentation of throws. Going with 2 indents probably makes it easier for Eclipse to catch this. @Carl I do think that there is value in publishing the entire set of rules that you have used. Checkstyle fixes Key: HIVE-1123 URL: https://issues.apache.org/jira/browse/HIVE-1123 Project: Hadoop Hive Issue Type: Task Reporter: Carl Steinbach Assignee: Carl Steinbach Attachments: HIVE-1123.checkstyle.patch, HIVE-1123.cli.2.patch, HIVE-1123.cli.patch, HIVE-1123.common.2.patch, HIVE-1123.common.patch, HIVE-1123.contrib.2.patch, HIVE-1123.contrib.patch, HIVE-1123.hwi.2.patch, HIVE-1123.hwi.patch, HIVE-1123.jdbc.2.patch, HIVE-1123.jdbc.patch, HIVE-1123.metastore.2.patch, HIVE-1123.metastore.patch, HIVE-1123.ql.2.patch, HIVE-1123.ql.patch, HIVE-1123.serde.2.patch, HIVE-1123.serde.patch, HIVE-1123.service.2.patch, HIVE-1123.service.patch, HIVE-1123.shims.2.patch, HIVE-1123.shims.patch Fix checkstyle errors. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
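For readers without the Checkstyle configuration at hand, the sticking point is where a wrapped throws clause lands. An illustration of the two-indent continuation style being discussed, assuming two-space indent units (the actual ruleset was not published in this thread):
{code}
public class ThrowsIndentDemo {
  // The continuation line below sits at two indent units (four spaces),
  // which is mechanical enough for Eclipse and Checkstyle to enforce.
  static void loadPartition(String table, String part)
      throws IllegalArgumentException, IllegalStateException {
    if (table == null || part == null) {
      throw new IllegalArgumentException("table and part must be non-null");
    }
  }

  public static void main(String[] args) {
    loadPartition("srcpart", "ds=2008-04-08");
    System.out.println("ok");
  }
}
{code}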
[jira] Created: (HIVE-1131) Add column lineage information to the pre execution hooks
Add column lineage information to the pre execution hooks - Key: HIVE-1131 URL: https://issues.apache.org/jira/browse/HIVE-1131 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo We need a mechanism to pass the lineage information of the various columns of a table to a pre execution hook so that applications can use that for: - auditing - dependency checking and many other applications. The proposal is to expose this through a bunch of classes to the pre execution hook interface to the clients and put in the necessary transformation logic in the optimizer to generate this information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
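The hook mechanism this issue builds on: Hive lets a deployment register classes that the driver invokes before running a query, and the proposal is to hand those classes the lineage structures. A hypothetical sketch of an auditing client -- the interface and method names here are illustrative, not the real Hive API (see the patches on this issue for that):
{code}
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical hook shape; not the actual Hive interface.
interface PreExecHook {
  // lineage: output column -> human-readable source expression
  void run(String queryId, Map<String, String> lineage) throws Exception;
}

// An auditing client of the kind the description mentions.
public class AuditHook implements PreExecHook {
  public void run(String queryId, Map<String, String> lineage) {
    for (Map.Entry<String, String> e : lineage.entrySet()) {
      // A real hook might persist this to an audit store, or refuse to
      // run the query if a dependency check fails.
      System.out.println(queryId + ": " + e.getKey() + " <- " + e.getValue());
    }
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> lineage = new LinkedHashMap<String, String>();
    lineage.put("dest.key", "src.key");
    lineage.put("dest.cnt", "count(src.value)");
    new AuditHook().run("query_0001", lineage);
  }
}
{code}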
RE: HIVE-49 and other forms of CLI niceness
Looks like a good suggestion. Ideally, the driver code should return a structure that encodes the columns separately, as opposed to the single serialized string of today, and the formatting logic should all be in the CliDriver. Ashish -Original Message- From: Edward Capriolo [mailto:edlinuxg...@gmail.com] Sent: Wednesday, January 27, 2010 1:00 PM To: hive-u...@hadoop.apache.org Subject: HIVE-49 and other forms of CLI niceness All, Some simple features in Hive can really bring down the learning curve for new users. I am teaching someone how to use Hive. A buddy of mine did this. hive> select * from mt_date_test; OK a 2010-01-01 NULL b 2009-12-31 NULL c 2010-01-27 NULL hive> select * from mt_date_test where my_date > '2010-01-01'; 2010-01-27 08:18:27,008 map = 100%, reduce = 100% Ended Job = job_200909171715_20264 OK I instantly suspected 1) whitespace 2) delimiters hive> select key from mt_date_test; OK a 2010-01-01 b 2009-12-31 c 2010-01-27 !!BINGO!! Should we use a pipe | or some other column delimiter like the mysql CLI does? and have this be a property that is on by default hive.cli.columnseparator='\t' hive.cli.columnseparator='|' In its current state the user understandably made the assumption that '>' does not work on strings. Should we expose the format of the results in Driver so that the CLI can effectively split the rows by column?
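The refactoring suggested in Ashish's reply above -- the driver returns columns, the CLI owns formatting -- is easy to sketch. Everything below is hypothetical, including treating hive.cli.columnseparator as a real setting (it is only Edward's proposal):
{code}
import java.util.Arrays;
import java.util.List;

public class CliFormatSketch {
  // What the driver would return: columns, not a pre-joined string.
  static List<String> fetchRow() {
    return Arrays.asList("a", "2010-01-01");
  }

  // The CLI joins columns with whatever separator is configured.
  static String format(List<String> row, String separator) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < row.size(); i++) {
      if (i > 0) {
        sb.append(separator);
      }
      sb.append(row.get(i));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // Default '\t' reproduces today's output; '|' makes column
    // boundaries visible, as in the mysql CLI.
    System.out.println(format(fetchRow(), "\t"));
    System.out.println(format(fetchRow(), "|"));
  }
}
{code}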
RE: Hive in maven
Yes, you should open a JIRA for this. Ashish -Original Message- From: Gerrit [mailto:gvanvuu...@specificmedia.com] Sent: Friday, January 22, 2010 7:33 AM To: hive-dev@hadoop.apache.org Subject: Re: Hive in maven Hi, Yes, I'll start on creating the pom.xml. The fastest and recommended way of doing this is by having a maven repo and syncing it with the official maven repos (one-way). Also, future hive releases would be just a matter of uploading to this repo, and it is automatically synced with the official maven repo. If a manual upload is requested it takes more time (it says that on their website). Shall I open a jira for this? Cheers, Gerrit On Thu, 2010-01-21 at 12:28 -0800, Yongqiang He wrote: Hi Gerrit, Can you help uploading to maven? Thanks Yongqiang On 1/20/10 2:21 AM, Gerrit gvanvuu...@specificmedia.com wrote: Yep: The main maven page is: http://maven.apache.org/guides/mini/guide-central-repository-upload.html (see section Sync'ing your own repository to the central repository automatically) For groupId and artifactId conventions see: http://maven.apache.org/guides/mini/guide-naming-conventions.html I have been a maven user for some time now and can help out to make the pom, document how to set up and deploy, if you need help. For internal repos you could use: http://nexus.sonatype.org/ http://www.jfrog.org/products.php On Tue, 2010-01-19 at 23:31 -0800, Zheng Shao wrote: This is a good idea. Can you point us to some references on how to upload it to maven? Zheng On Mon, Jan 18, 2010 at 1:20 PM, Gerrit gvanvuu...@specificmedia.com wrote: Hi guys, Would it be possible to add the hive jars to the main maven repo? If there are no objections I can make the request to the main repo if you agree. The reason for this need is that I've created a Loader for the pig project to read HiveRCTables (https://issues.apache.org/jira/browse/PIG-1117) and currently use ant to directly download the libraries from the apache site using: <get verbose="true" src="${apache.dist.site}/${hive.groupId}/${hive.artifactId}/${hive.artifactId}-${hive.version}/${hive.artifactId}-${hive.version}-hadoop-${hadoop.version}-bin.tar.gz" dest="lib-hivedeps/${hive.artifactId}-${hive.version}-hadoop-${hadoop.version}-bin.tar.gz"/> I would much prefer using ivy or maven and it makes this much cleaner. Thanks, Gerrit
RE: Unit test result depends on platform.
Can you file a JIRA and give us the unit tests that fail? That would be very helpful. I suspect some of the test queries may be missing a sort by clause, so they could have different sort orders compared to the expected output. Ashish -Original Message- From: Mafish Liu [mailto:maf...@gmail.com] Sent: Monday, January 18, 2010 5:30 PM To: hive-dev@hadoop.apache.org Subject: Re: Unit test result depends on platform. Attachments are listing programs. -- maf...@gmail.com
RE: New Hive committer Ning Zhang
Congrats!! Ashish -Original Message- From: Zheng Shao [mailto:zsh...@gmail.com] Sent: Monday, January 11, 2010 11:51 AM To: hive-dev@hadoop.apache.org Subject: New Hive committer Ning Zhang Ning has done a lot of work on Hive. Hadoop PMC recently approved Ning Zhang as a new committer to Hive. Congratulations Ning! -- Yours, Zheng
[jira] Commented: (HIVE-972) support views
[ https://issues.apache.org/jira/browse/HIVE-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12793711#action_12793711 ] Ashish Thusoo commented on HIVE-972: Pretty comprehensive writeup :) Here are my comments: 1. It may be better to just go with a flat model to keep things simple. Also, whenever we do materialized views in the future, you do have an object that is part table and part view, and you may just need the flat model anyway at that point. The primary reason to go with the flat model, though, would be simplicity and a less severe migration of the metastore schema. 2. For dependency tracking, there is already code in Hive that uses pre-execution hooks to track lineage. That could easily be used to extract view dependencies (table-level dependencies) when you create the view metadata. Raghu also did some work on column lineage, and perhaps that can be used to capture column lineage. I think for the first cut we should just go with table dependencies and leave the column stuff for later. We should have the lenient dependency invalidation scheme (perhaps for both drops and alters) because at least that way users can inspect view definitions and then fix them later; see the sketch after this comment. Accordingly, we would need a flag to mark an invalidated view and maybe some way of looking at that list. I think we can punt the cascade option for now as that seems to be an optimization in the user workflow and could be added later. Thoughts? The restrict option, though, is probably more useful. We could have that be the default in the strict mode (Hive has a strict mode which disallows queries on partitioned tables in case a where clause on the partition column was not specified). Not sure what we should do about temporary functions, but if we use views to transform our internal logs to another schema (nectar imps - context) then we may need it. 3. I am not sure if supporting limit is important, but I can see good use of order by when we do materialized views. The sorted property could be helpful there and would be good to capture. We already capture those for tables. 4. I think the fast path should work seamlessly once the fast path with filters is done, no? 5. I think we can punt view modification for now if we support ways of inspecting the view SQL for folks. support views - Key: HIVE-972 URL: https://issues.apache.org/jira/browse/HIVE-972 Project: Hadoop Hive Issue Type: New Feature Components: Metastore, Query Processor Reporter: Namit Jain Assignee: John Sichi Hive currently does not support views. It would be a very nice feature to have. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
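To make point 2 above concrete, lenient invalidation means a DROP TABLE never blocks on dependent views; it just flags them so users can inspect and repair them later. A toy model of that bookkeeping -- every name here is illustrative, and nothing is metastore API:
{code}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ViewInvalidationSketch {
  // view name -> tables it reads (table-level dependencies only, per point 2)
  private final Map<String, List<String>> deps =
      new HashMap<String, List<String>>();
  // views flagged invalid, kept around so users can list and fix them
  private final List<String> invalid = new ArrayList<String>();

  void createView(String view, String... tables) {
    deps.put(view, Arrays.asList(tables));
  }

  // Lenient drop: never fails, just marks dependent views invalid.
  void dropTable(String table) {
    for (Map.Entry<String, List<String>> e : deps.entrySet()) {
      if (e.getValue().contains(table) && !invalid.contains(e.getKey())) {
        invalid.add(e.getKey());
      }
    }
  }

  public static void main(String[] args) {
    ViewInvalidationSketch m = new ViewInvalidationSketch();
    m.createView("v_sessions", "raw_logs", "users");
    m.dropTable("raw_logs");
    System.out.println("invalid views: " + m.invalid); // [v_sessions]
  }
}
{code}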
RE: [VOTE] hive release candidate 0.4.1-rc3
+1 on the basis of tests run on the dev tar ball. Ashish -Original Message- From: Zheng Shao [mailto:zsh...@gmail.com] Sent: Monday, November 30, 2009 11:37 AM To: hive-dev@hadoop.apache.org Subject: Re: [VOTE] hive release candidate 0.4.1-rc3 I tried binary tarball with both hadoop 0.17 and 0.20 and both worked. Please vote. Zheng On Fri, Nov 27, 2009 at 7:32 AM, Zheng Shao zsh...@gmail.com wrote: One more modification to the Tarballs: Location moved to http://people.apache.org/~zshao/hive-0.4.1-candidate-3/ I also made both the source tarball and binary tarball. Zheng On Sat, Nov 21, 2009 at 11:03 AM, Zheng Shao zsh...@gmail.com wrote: I forgot to modify the version in build.properties and make the tarball. Here it is: svn: https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.1-rc3/ Tarball: http://people.apache.org/~zshao/hive-0.4.1-dev.tar.gz Please vote. Zheng On Sun, Nov 15, 2009 at 1:14 AM, Ashish Thusoo athu...@facebook.com wrote: Zheng, I cannot find the tar ball. What is the location? Ashish From: Zheng Shao [zsh...@gmail.com] Sent: Thursday, November 12, 2009 4:14 PM To: hive-dev@hadoop.apache.org Subject: Re: [VOTE] hive release candidate 0.4.1-rc2 Please vote. We would like release 0.4.1 to go out as soon as possible since it fixed some critical bugs in 0.4.0. Zheng On Wed, Nov 11, 2009 at 6:34 AM, Zheng Shao zsh...@gmail.com wrote: I have made a release candidate 0.4.1-rc2. We've fixed several critical bugs to hive release 0.4.0. We need hive release 0.4.1 out asap. Here are the list of changes: HIVE-884. Metastore Server should call System.exit() on error. (Zheng Shao via pchakka) HIVE-864. Fix map-join memory-leak. (Namit Jain via zshao) HIVE-878. Update the hash table entry before flushing in Group By hash aggregation (Zheng Shao via namit) HIVE-882. Create a new directory every time for scratch. (Namit Jain via zshao) HIVE-890. Fix cli.sh for detecting Hadoop versions. (Paul Huff via zshao) HIVE-892. Hive to kill hadoop jobs using POST. (Dhruba Borthakur via zshao) HIVE-883. URISyntaxException when partition value contains special chars. (Zheng Shao via namit) * HIVE-902. Fix cli.sh to work with hadoop versions less than 20. (Carl Steinbach via zshao) *: New since release candidate 0.4.1-rc0. Please vote. -- Yours, Zheng -- Yours, Zheng -- Yours, Zheng -- Yours, Zheng -- Yours, Zheng
[jira] Created: (HIVE-939) Extend hive streaming to support counter updates similar to hadoop streaming.
Extend hive streaming to support counter updates similar to hadoop streaming. - Key: HIVE-939 URL: https://issues.apache.org/jira/browse/HIVE-939 Project: Hadoop Hive Issue Type: Improvement Components: Query Processor Reporter: Ashish Thusoo Assignee: Ashish Thusoo The code to update hadoop counters needs to be ported from hadoop streaming to the streaming code in Hive. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: [VOTE] hive release candidate 0.4.1-rc2
Zheng, I cannot find the tar ball. What is the location? Ashish From: Zheng Shao [zsh...@gmail.com] Sent: Thursday, November 12, 2009 4:14 PM To: hive-dev@hadoop.apache.org Subject: Re: [VOTE] hive release candidate 0.4.1-rc2 Please vote. We would like release 0.4.1 to go out as soon as possible since it fixed some critical bugs in 0.4.0. Zheng On Wed, Nov 11, 2009 at 6:34 AM, Zheng Shao zsh...@gmail.com wrote: I have made a release candidate 0.4.1-rc2. We've fixed several critical bugs to hive release 0.4.0. We need hive release 0.4.1 out asap. Here are the list of changes: HIVE-884. Metastore Server should call System.exit() on error. (Zheng Shao via pchakka) HIVE-864. Fix map-join memory-leak. (Namit Jain via zshao) HIVE-878. Update the hash table entry before flushing in Group By hash aggregation (Zheng Shao via namit) HIVE-882. Create a new directory every time for scratch. (Namit Jain via zshao) HIVE-890. Fix cli.sh for detecting Hadoop versions. (Paul Huff via zshao) HIVE-892. Hive to kill hadoop jobs using POST. (Dhruba Borthakur via zshao) HIVE-883. URISyntaxException when partition value contains special chars. (Zheng Shao via namit) * HIVE-902. Fix cli.sh to work with hadoop versions less than 20. (Carl Steinbach via zshao) *: New since release candidate 0.4.1-rc0. Please vote. -- Yours, Zheng -- Yours, Zheng
RE: Hive Performance
There are a bunch of optimizations that deal with skewed data in Hive as well. The optimizer is rule-based and the user has to hint the query - similar to what is done in an RDBMS. We have mostly done our performance work on the benchmark published in the SIGMOD paper. Ashish -Original Message- From: Edward Capriolo [mailto:edlinuxg...@gmail.com] Sent: Saturday, November 07, 2009 11:19 AM To: hive-dev@hadoop.apache.org Subject: Re: Hive Performance A friend and I were discussing Pig vs Hive in general yesterday. On the surface, Hive is an SQL-like language; Pig is its own language, 'Pig Latin'. However, in the end I think they both end up doing column projections, joins, etc. In the end it is a similar operation happening on the same cluster, so performance-wise I expect the performance will eventually be similar. Now, Pig offering more SQL support is a large undertaking. While Pig looks very versatile, I recently emulated the example on Cloudera's blog for geo-IP locating traffic in Pig. I did this in Hive with an external Perl script using map/transform (it did not take a page-long Pig program). I also think the Hive UDF framework can be used in place of many Piggybank functions. Also, unless I am missing something, a UDF is native Java. Seems like Piggybank functions are going to be piping/streaming output; I can't see that performing better. To backtrack: if Pig adds SQL, will we need Hive? If Hive adds something like T-SQL, will we need Pig? On 11/7/09, Rob Stewart robstewar...@googlemail.com wrote: Hi there. I'm in the process of writing a paper, and part of it I aim to write (yet another) comparative study on various interfaces with Hadoop. This will almost certainly include Pig and Hive, probably MapReduce, and maybe JAQL. I have read the papers published on the Hive JIRA (pig vs hive vs MapReduce for 2 queries, an aggregation, and a join). I am, however, wanting to know a bit from the Hive community. 1. Do you guys (the Hive developers) have a standardized benchmarking tool to use prior to each Hive release? I am thinking of something similar to PigMix, used by the Pig developers. In case you don't know, PigMix is a set of 12 designed queries, implemented in Pig and Java Hadoop, and comparisons are made on execution time. Does the Hive community have something similar? 2. The Pig wiki points out some unique features of Pig that allow optimal execution performance. For instance, they have methods to optimize queries on skewed data (by taking samples of the data for reduce key allocations). Is there something about the implementation of Hive that gives it some functionality not found in other interfaces? And better still, would there be some Hive implementation that could work as a proof of concept to show any optimized features of Hive? 3. One section suggested for investigation within the Pig development team is to create a SQL-like language that could be compiled down through Pig to MR jobs. If such a project was to achieve parity with Hive's SQL-like interface, where would the distinction be between Pig and Hive? Certainly, from a user's perspective, there would be very little difference. If the only difference turns out to be the execution performance achieved by one interface over another, where would this put the inferior interface (be that either Pig or Hive) in terms of its relevance in the Hadoop software stack? Many thanks, Rob Stewart
RE: Make me as a member of hive developer
Hi Mohan, The instructions to subscribe to the mailing list are here... http://hadoop.apache.org/hive/mailing_lists.html#Developers Ashish -Original Message- From: Mohan Agarwal [mailto:mohan.agarwa...@gmail.com] Sent: Monday, November 02, 2009 8:45 AM To: hive-dev@hadoop.apache.org Subject: Make me as a member of hive developer
[jira] Commented: (HIVE-884) Metastore Server should exit if error happens
[ https://issues.apache.org/jira/browse/HIVE-884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12766966#action_12766966 ] Ashish Thusoo commented on HIVE-884: Can we add a test case? Otherwise, the changes look good. Metastore Server should exit if error happens - Key: HIVE-884 URL: https://issues.apache.org/jira/browse/HIVE-884 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.4.1, 0.5.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: HIVE-884.1.patch Currently, HiveMetaStore (the thrift server) is not exiting when the main thread sees an Exception. The process should exit when that happens. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-885) Better error messages for debugging serde problem at reducer input
[ https://issues.apache.org/jira/browse/HIVE-885?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12766983#action_12766983 ] Ashish Thusoo commented on HIVE-885: Will values.next() always return a BytesWritable? Better error messages for debugging serde problem at reducer input -- Key: HIVE-885 URL: https://issues.apache.org/jira/browse/HIVE-885 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.5.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: HIVE-885.1.patch Sometimes we are seeing serde exceptions at the reducer side with hadoop 0.20. This should help debug the problem. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[ANNOUNCE] Hive 0.4.0 released
Hi Folks, We have released the rc2 candidate that Namit had generated as Hive 0.4.0. You can download it from the download page: http://hadoop.apache.org/hive/releases.html#Download Thanks, Ashish
RE: Hive and MapReduce
Adding the hive-user and hive-dev lists, and removing the common mailing list. Can you elaborate a bit on the data size? By default, Hive should just be relying on Hadoop to give you the number of mappers depending on the number of splits you have in your data. Ashish -Original Message- From: Touretsky, Gregory [mailto:gregory.touret...@intel.com] Sent: Monday, October 12, 2009 3:02 AM To: Touretsky, Gregory; common-u...@hadoop.apache.org Subject: RE: Hive and MapReduce Ok, the patch below actually works. Re-built the Hadoop cluster and everything works now. Now I have to understand how to force Hive to run more than 1 mapper for a complicated query on the large table... From: Touretsky, Gregory Sent: Sunday, October 11, 2009 4:39 PM To: common-u...@hadoop.apache.org Cc: Touretsky, Gregory Subject: Hive and MapReduce Hi, I'm running Hadoop 0.20.1 and Hive (checked out revision 824063). Direct MapReduce task succeeds, but Map task created by Hive fails: hive> select * from pokes where foo > 100; Total MapReduce jobs = 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_200910111626_0001, Tracking URL = http://itstl0016.iil.intel.com:50030/jobdetails.jsp?jobid=job_200910111626_0001 Kill Command = /nfs/iil/disks/rep_tests_gtouret01/hadoop/bin/hadoop job -Dmapred.job.tracker=itstl0016.iil.intel.com:9001 -kill job_200910111626_0001 2009-10-11 04:26:57,844 map = 100%, reduce = 100% Ended Job = job_200910111626_0001 with errors FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.ExecDriver From the logs/hadoop--jobtracker-.iil.intel.com.log: 2009-10-11 16:26:56,829 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_200910111626_0001 2009-10-11 16:26:57,091 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_200910111626_0001 = 13. Number of splits = 1 2009-10-11 16:26:57,225 ERROR org.apache.hadoop.mapred.JobTracker: Job initialization failed: java.lang.IllegalArgumentException: Network location name contains /: /IDC1-DC201/WE/34 (I've had the same issue with the /default_rack) at org.apache.hadoop.net.NodeBase.set(NodeBase.java:75) at org.apache.hadoop.net.NodeBase.<init>(NodeBase.java:57) at org.apache.hadoop.mapred.JobTracker.addHostToNodeMapping(JobTracker.java:2390) at org.apache.hadoop.mapred.JobTracker.resolveAndAddToTopology(JobTracker.java:2384) at org.apache.hadoop.mapred.JobInProgress.createCache(JobInProgress.java:349) at org.apache.hadoop.mapred.JobInProgress.initTasks(JobInProgress.java:450) at org.apache.hadoop.mapred.JobTracker.initJob(JobTracker.java:3147) at org.apache.hadoop.mapred.EagerTaskInitializationListener$InitJob.run(EagerTaskInitializationListener.java:79) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) 2009-10-11 16:26:57,225 INFO org.apache.hadoop.mapred.JobTracker: Failing job job_200910111626_0001 2009-10-11 16:26:57,866 INFO org.apache.hadoop.mapred.JobTracker: Killing job job_200910111626_0001 Any suggestion? I saw patches in https://issues.apache.org/jira/browse/HADOOP-5759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12712524#action_12712524, but I can't apply all of them cleanly to my Hadoop sources... Thanks, Gregory
[jira] Updated: (HIVE-805) Session level metastore
[ https://issues.apache.org/jira/browse/HIVE-805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-805: --- Attachment: HIVE-805-1.patch Incorporated Prasad's review comments. I have not yet disabled this for partitioned tables, though. Session level metastore --- Key: HIVE-805 URL: https://issues.apache.org/jira/browse/HIVE-805 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.2.0 Reporter: Ashish Thusoo Assignee: Ashish Thusoo Fix For: 0.5.0 Attachments: HIVE-805-1.patch, HIVE-805.patch Implement a shadow metastore that is in memory and runs for a session. This can contain definitions for session specific views that can be used to implement data flow variables in Hive. It can also be used for testing scripts. First we will support the latter use case, wherein all the DDL statements in the session create objects in the session metastore and all the queries are converted to explain internal. Any thoughts on load commands? This feature is enabled when set hive.session.test = true is done in the session. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: [VOTE] vote for release candidate for hive
+1. Also sending this to the PMC for approval. Hi PMC, The release candidate that Namit prepared can be found at the following location: http://people.apache.org/~namit/hive-0.4.0-candidate-2/ It has the Hive 0.4.0 releases for Hadoop 0.17, 0.18, 0.19 and 0.20. Please try it out and vote on it. Thanks, Ashish From: Min Zhou [coderp...@gmail.com] Sent: Tuesday, September 29, 2009 6:35 PM To: hive-dev@hadoop.apache.org Subject: Re: [VOTE] vote for release candidate for hive I saw it, +1 for all tests passed. On Wed, Sep 30, 2009 at 1:59 AM, Namit Jain nj...@facebook.com wrote: I did find the files: [nj...@dev029 /tmp]$ ls -lrt hive-0.4.0-dev-hadoop-0.19.0/src total 33580 drwxr-xr-x 4 njain users 4096 Aug 11 16:41 docs drwxr-xr-x 7 njain users 4096 Aug 11 16:41 data -rw-r--r-- 1 njain users 15675 Aug 11 16:41 README.txt -rw-r--r-- 1 njain users 2810 Sep 2 10:44 TestTruncate.launch -rw-r--r-- 1 njain users 2804 Sep 2 10:44 TestMTQueries.launch -rw-r--r-- 1 njain users 2807 Sep 2 10:44 TestJdbc.launch -rw-r--r-- 1 njain users 2808 Sep 2 10:44 TestHive.launch -rw-r--r-- 1 njain users 2805 Sep 2 10:44 TestCliDriver.launch -rw-r--r-- 1 njain users 17045 Sep 10 15:16 build.xml -rw-r--r-- 1 njain users 850 Sep 10 15:16 build.properties -rw-r--r-- 1 njain users 12520 Sep 10 15:16 build-common.xml -rw-r--r-- 1 njain users 33431 Sep 17 18:15 CHANGES.txt -rw-r--r-- 1 njain users 1071 Sep 18 13:26 runscr -rw-r--r-- 1 njain users 23392371 Sep 18 13:26 hive-0.4.0-hadoop-0.20.0-dev.tar.gz -rw-r--r-- 1 njain users 10735695 Sep 18 13:27 hive-0.4.0-hadoop-0.20.0-bin.tar.gz drwxr-xr-x 3 njain users 4096 Sep 29 10:54 jdbc drwxr-xr-x 2 njain users 4096 Sep 29 10:54 ivy drwxr-xr-x 4 njain users 4096 Sep 29 10:54 hwi drwxr-xr-x 4 njain users 4096 Sep 29 10:54 eclipse-templates drwxr-xr-x 3 njain users 4096 Sep 29 10:54 contrib drwxr-xr-x 2 njain users 4096 Sep 29 10:54 conf drwxr-xr-x 3 njain users 4096 Sep 29 10:54 common drwxr-xr-x 4 njain users 4096 Sep 29 10:54 cli drwxr-xr-x 3 njain users 4096 Sep 29 10:54 ant drwxr-xr-x 2 njain users 4096 Sep 29 10:54 testutils drwxr-xr-x 2 njain users 4096 Sep 29 10:54 testlibs drwxr-xr-x 3 njain users 4096 Sep 29 10:54 shims drwxr-xr-x 6 njain users 4096 Sep 29 10:54 service drwxr-xr-x 4 njain users 4096 Sep 29 10:54 serde drwxr-xr-x 5 njain users 4096 Sep 29 10:54 ql drwxr-xr-x 4 njain users 4096 Sep 29 10:54 odbc drwxr-xr-x 6 njain users 4096 Sep 29 10:54 metastore drwxr-xr-x 2 njain users 4096 Sep 29 10:54 lib drwxr-xr-x 3 njain users 4096 Sep 29 10:54 bin I have attached the output. -Original Message- From: Min Zhou [mailto:coderp...@gmail.com] Sent: Tuesday, September 22, 2009 6:29 PM To: hive-dev@hadoop.apache.org Subject: Re: [VOTE] vote for release candidate for hive Hi Namit I meant http://people.apache.org/~namit/hive-0.4.0-candidate-2/hive-0.4.0-hadoop-0.19.0-dev.tar.gz Min On Wed, Sep 23, 2009 at 5:31 AM, Namit Jain nj...@facebook.com wrote: Which one are you looking at ? I downloaded just now from: http://people.apache.org/~namit/hive-0.4.0-candidate-2/hive-0.4.0-hadoop-0.20.0-dev.tar.gz and it contains CHANGES.txt and build.xml etc. Did you download the binary tarball ?
Thanks, -namit -Original Message- From: Min Zhou [mailto:coderp...@gmail.com] Sent: Monday, September 21, 2009 7:46 PM To: hive-dev@hadoop.apache.org Subject: Re: [VOTE] vote for release candidate for hive Hi Namit, I haven't found build.xml, CHANGES.txt from your tarball. They must be included so that we can test it and check the changes, I think. Thanks, Min On Sat, Sep 19, 2009 at 4:42 AM, Namit Jain nj...@facebook.com wrote: It is available from http://people.apache.org/~namit/ Thanks, -namit -Original Message- From: Ashish Thusoo Sent: Thursday, September 17, 2009 11:55 PM To: hive-dev@hadoop.apache.org; Namit Jain Subject: RE: [VOTE] vote for release candidate for hive Namit, Can you make it available from http://people.apache.org/~njain/ That way people who do not have access to the apache machines will also be able to try the candidate. Thanks, Ashish
[ANNOUNCE] Edward Capriolo as a Hive committer
Hi Folks, We are happy to add Edward as a committer to the Hive project. Edward has made many contributions to Hive over the last year, including the Hive Web Interface. My heartiest congratulations and a warm welcome to him in the Hive committers group. Cheers, Ashish
RE: [VOTE] vote for release candidate for hive
Namit, Can you make it available from http://people.apache.org/~njain/ That way people who do not have access to the apache machines will also be able to try the candidate. Thanks, Ashish From: Namit Jain [nj...@facebook.com] Sent: Thursday, September 17, 2009 6:32 PM To: Namit Jain; hive-dev@hadoop.apache.org Subject: [VOTE] vote for release candidate for hive Following the convention -Original Message- From: Namit Jain Sent: Thursday, September 17, 2009 6:31 PM To: hive-dev@hadoop.apache.org Subject: vote for release candidate for hive I have created another release candidate for Hive. https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc2/ Let me know if it is OK to publish this release candidate. The only change from the previous candidate (https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc1/) is the fix for https://issues.apache.org/jira/browse/HIVE-838 The tar ball can be found at: people.apache.org /home/namit/public_html/hive-0.4.0-candidate-2/hive-0.4.0-dev.tar.gz* Thanks, -namit
[jira] Commented: (HIVE-78) Authentication infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-78?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12756823#action_12756823 ] Ashish Thusoo commented on HIVE-78: --- @Min I agree with Edward's thoughts here. We have to foster a collaborative environment and not be dismissive of each other's ideas and approaches. Much of the work in the community happens on a volunteer basis, and whatever time anyone puts into the project is a bonus and should be respected by all. It does make sense to keep authentication separate from authorization because in most environments there are already directories which deal with the former. Creating yet another store for passwords just leads to an administration nightmare, as the account administrators have to create accounts for new users in multiple places. So let's just focus on authorization and let the directory infrastructure deal with authentication. Will look at your patch as well. Authentication infrastructure for Hive -- Key: HIVE-78 URL: https://issues.apache.org/jira/browse/HIVE-78 Project: Hadoop Hive Issue Type: New Feature Components: Server Infrastructure Reporter: Ashish Thusoo Assignee: Edward Capriolo Attachments: hive-78-metadata-v1.patch, hive-78-syntax-v1.patch, hive-78.diff Allow hive to integrate with existing user repositories for authentication and authorization information. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: vote for a release candidate
Clearly, even with the fix, it is still dangerous for them to use LOAD INTO unless they understand the consistency implications or have put workarounds in place to address some reader crashes. I agree, though, that since this is a regression, we should get the functionality back to what it was in 0.3. Ashish -Original Message- From: Todd Lipcon [mailto:t...@cloudera.com] Sent: Saturday, September 12, 2009 3:45 PM To: hive-dev@hadoop.apache.org Subject: Re: vote for a release candidate Hi Namit, Yes, we have customers who are using LOAD INTO without OVERWRITE. The use case is for collecting session data into a table partitioned by the hour of session start time. Since sessions are of varying lengths, incremental loads are necessary as sessions finish up. There are a couple of possible workarounds, but all of them have drawbacks. -Todd On Thu, Sep 10, 2009 at 6:58 PM, Namit Jain nj...@facebook.com wrote: I am not sure 718 is a valid requirement. I think it got in by legacy. Should we even support LOAD INTO ? We only support INSERT OVERWRITE, similarly, we should only support LOAD OVERWRITE INTO. Is anyone using LOAD INTO without OVERWRITE ? Thanks, -namit -Original Message- From: Todd Lipcon [mailto:t...@cloudera.com] Sent: Thursday, September 10, 2009 4:28 PM To: hive-dev@hadoop.apache.org Subject: Re: vote for a release candidate What do you guys think the feasibility of HIVE-718 being fixed for 0.4.0 is? I think a completely correct solution is likely to be very tough to achieve, but as is it's a regression from 0.3.0 in that the functionality silently fails. -Todd On Thu, Sep 10, 2009 at 3:24 PM, Namit Jain nj...@facebook.com wrote: I have created a release candidate for Hive. https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc0/ Let me know if it is OK to publish this release candidate. Thanks, -namit
[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file
[ https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755292#action_12755292 ] Ashish Thusoo commented on HIVE-718: +1 Looks good to me. Load data inpath into a new partition without overwrite does not move the file -- Key: HIVE-718 URL: https://issues.apache.org/jira/browse/HIVE-718 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.4.0 Reporter: Zheng Shao Assignee: Namit Jain Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt, hive.718.1.patch The bug can be reproduced as following. Note that it only happens for partitioned tables. The select after the first load returns nothing, while the second returns the data correctly. insert.txt in the current local directory contains 3 lines: a, b and c. {code} create table tmp_insert_test (value string) stored as textfile; load data local inpath 'insert.txt' into table tmp_insert_test; select * from tmp_insert_test; a b c create table tmp_insert_test_p ( value string) partitioned by (ds string) stored as textfile; load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01'); select * from tmp_insert_test_p where ds= '2009-08-01'; load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01'); select * from tmp_insert_test_p where ds= '2009-08-01'; a 2009-08-01 b 2009-08-01 d 2009-08-01 {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file
[ https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12754058#action_12754058 ] Ashish Thusoo commented on HIVE-718: Apologies for not following this earlier. It caught my attention as Todd brought up whether we should get this into the 0.4.0 release as this is a regression when compared to 0.3.0. I checked the code on 0.3.0 and it seems to be the same as that in 0.4.0, so I am not sure if this is a regression. If this is not a regression then potentially we can go out with 0.4.0 without this and document this? As is evident from this discussion, LOAD INTO and its cousin INSERT INTO (when we have it) are very tricky. Almost all our code has been written with the overwrite semantics. Appending new data to an existing partition would need more work to get right, and I feel we should punt on it and document that INSERT INTO is not reliable - I think it has never been reliable. In order to safely implement the INSERT INTO and LOAD INTO semantics, one approach is to introduce a notion of versions on the DML commands which is encoded in the directory structure, i.e. instead of storing things as xyz/part- we store the files as xyz/v1/part- and so on. We store the latest created version in the metastore entry for that table. When a reader comes in, it first looks at this entry and then finds the corresponding version in the table. The versions themselves could be garbage collected by deleting version directories that are older than, say, some configurable duration, and this could either be done lazily by a writer on the table or by an active garbage collector in the background. These are of course somewhat involved changes, and they would solve the isolation and atomicity problems. The latter because v1 is a directory, so moving data to that directory would be a rename and hence atomic. Thoughts? Load data inpath into a new partition without overwrite does not move the file -- Key: HIVE-718 URL: https://issues.apache.org/jira/browse/HIVE-718 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.4.0 Reporter: Zheng Shao Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt The bug can be reproduced as following. Note that it only happens for partitioned tables. The select after the first load returns nothing, while the second returns the data correctly. insert.txt in the current local directory contains 3 lines: a, b and c. {code} create table tmp_insert_test (value string) stored as textfile; load data local inpath 'insert.txt' into table tmp_insert_test; select * from tmp_insert_test; a b c create table tmp_insert_test_p ( value string) partitioned by (ds string) stored as textfile; load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01'); select * from tmp_insert_test_p where ds= '2009-08-01'; load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01'); select * from tmp_insert_test_p where ds= '2009-08-01'; a 2009-08-01 b 2009-08-01 d 2009-08-01 {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
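The versioning proposal in the comment above reduces to a small protocol: a writer materializes version N+1 off to the side, then publishes it with a single atomic step; readers resolve the latest-version pointer before touching any files; a collector reclaims old version directories. A sketch of that protocol against the local filesystem, with an in-memory stand-in for the metastore pointer -- this is the idea, not Hive code:
{code}
import java.io.File;

public class VersionedTableSketch {
  private final File tableDir;
  private volatile int latest = 0; // stand-in for the metastore entry

  VersionedTableSketch(File tableDir) {
    this.tableDir = tableDir;
    tableDir.mkdirs();
  }

  // Writer: build v(N+1) completely, then publish with one pointer bump,
  // so readers never observe a half-written directory.
  void appendLoad() {
    int next = latest + 1;
    File versionDir = new File(tableDir, "v" + next);
    versionDir.mkdir();
    // ... copy the previous version's files plus the newly loaded
    // files into versionDir here ...
    latest = next; // the atomic publish step
  }

  // Reader: resolve the pointer first, then read only that directory.
  // Old versions stay readable until garbage collection removes them.
  File snapshot() {
    return new File(tableDir, "v" + latest);
  }

  public static void main(String[] args) {
    VersionedTableSketch t =
        new VersionedTableSketch(new File("/tmp/xyz_versioned_demo"));
    t.appendLoad();
    t.appendLoad();
    System.out.println("readers see: " + t.snapshot()); // .../v2
  }
}
{code}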
RE: vote for a release candidate
Just replied on the JIRA. Is this really a regression - the code in 0.3.0 and 0.4.0 seems similar... Ashish From: Namit Jain [nj...@facebook.com] Sent: Thursday, September 10, 2009 6:58 PM To: hive-dev@hadoop.apache.org Subject: RE: vote for a release candidate I am not sure 718 is a valid requirement. I think it got in by legacy. Should we even support LOAD INTO ? We only support INSERT OVERWRITE, similarly, we should only support LOAD OVERWRITE INTO. Is anyone using LOAD INTO without OVERWRITE ? Thanks, -namit -Original Message- From: Todd Lipcon [mailto:t...@cloudera.com] Sent: Thursday, September 10, 2009 4:28 PM To: hive-dev@hadoop.apache.org Subject: Re: vote for a release candidate What do you guys think the feasibility of HIVE-718 being fixed for 0.4.0 is? I think a completely correct solution is likely to be very tough to achieve, but as is it's a regression from 0.3.0 in that the functionality silently fails. -Todd On Thu, Sep 10, 2009 at 3:24 PM, Namit Jain nj...@facebook.com wrote: I have created a release candidate for Hive. https://svn.apache.org/repos/asf/hadoop/hive/tags/release-0.4.0-rc0/ Let me know if it is OK to publish this release candidate. Thanks, -namit
[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file
[ https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12754397#action_12754397 ] Ashish Thusoo commented on HIVE-718: @prasad, can you explain your comment about the external process stuff? Load data inpath into a new partition without overwrite does not move the file -- Key: HIVE-718 URL: https://issues.apache.org/jira/browse/HIVE-718 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.4.0 Reporter: Zheng Shao Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt The bug can be reproduced as following. Note that it only happens for partitioned tables. The select after the first load returns nothing, while the second returns the data correctly. insert.txt in the current local directory contains 3 lines: a, b and c. {code} create table tmp_insert_test (value string) stored as textfile; load data local inpath 'insert.txt' into table tmp_insert_test; select * from tmp_insert_test; a b c create table tmp_insert_test_p ( value string) partitioned by (ds string) stored as textfile; load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01'); select * from tmp_insert_test_p where ds= '2009-08-01'; load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01'); select * from tmp_insert_test_p where ds= '2009-08-01'; a 2009-08-01 b 2009-08-01 d 2009-08-01 {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.