[jira] Commented: (HIVE-987) Hive CLI Omnibus Improvement ticket
[ https://issues.apache.org/jira/browse/HIVE-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859751#action_12859751 ]

Raghotham Murthy commented on HIVE-987:
---------------------------------------

jdbc/src/test/org/apache/hadoop/hive/jdbc/TestJdbcDriver.java shows using Hive's JDBC driver in embedded mode. Doesn't it run as part of ant test?

Hive CLI Omnibus Improvement ticket
-----------------------------------

                 Key: HIVE-987
                 URL: https://issues.apache.org/jira/browse/HIVE-987
             Project: Hadoop Hive
          Issue Type: Improvement
            Reporter: Carl Steinbach
         Attachments: HIVE-987.1.patch, sqlline-1.0.8_eb.jar

Add the following features to the Hive CLI:

* Command History
* ReadLine support
** HIVE-120: Add readline support/support for alt-based commands in the CLI
** Java-ReadLine is LGPL, but it depends on the GPL readline library. We probably need to use JLine instead.
* Tab completion
** HIVE-97: tab completion for hive cli
* Embedded/Standalone CLI modes, and the ability to connect to different Hive Server instances.
** HIVE-818: Create a Hive CLI that connects to hive ThriftServer
* .hiverc configuration file
** HIVE-920: .hiverc doesn't work
* Improved support for comments.
** HIVE-430: Ability to comment desired for hive query files
* Different output formats
** HIVE-49: display column header on CLI
** XML output format

For additional inspiration we may want to look at the Postgres psql shell: http://www.postgresql.org/docs/8.1/static/app-psql.html

Finally, it would be really cool if we implemented this in a generic fashion and spun it off as an apache-commons shell framework. It seems like most of the Apache Hadoop projects have their own shells, and I'm sure the same is true for non-Hadoop Apache projects as well.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
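The embedded-mode usage that TestJdbcDriver exercises can be sketched with plain JDBC. This is a hedged illustration, not code from the ticket: the driver class `org.apache.hadoop.hive.jdbc.HiveDriver` and the `jdbc:hive://` URL (empty host/port selects embedded mode) appear later in this thread, while the class name, the `SHOW TABLES` query, and the URL-building helper are assumptions made for the example.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class EmbeddedHiveSketch {

    // Builds a Hive JDBC URL; an empty host selects embedded mode,
    // where the driver runs Hive in-process instead of contacting a server.
    static String hiveUrl(String host, int port, String db) {
        if (host == null || host.isEmpty()) {
            return "jdbc:hive://";
        }
        return "jdbc:hive://" + host + ":" + port + "/" + db;
    }

    public static void main(String[] args) throws Exception {
        // Requires the Hive JDBC jars (and their dependencies) on the classpath.
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(hiveUrl("", 0, ""));
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```

The same `main` works against a standalone server by passing a real host and port to `hiveUrl`, which is what distinguishes the embedded and standalone CLI modes discussed in this ticket.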
Re: [DISCUSSION] To be (or not to be) a TLP - that is the question
On Wed, Apr 21, 2010 at 10:35 PM, Jeff Hammerbacher <ham...@cloudera.com> wrote:

> Hive already does the work to run on multiple versions of Hadoop, and the release cycle is independent of Hadoop's. I don't see why it should remain a subproject. I'm +1 on Hive becoming a TLP.
>
> On Tue, Apr 20, 2010 at 2:03 PM, Zheng Shao <zsh...@gmail.com> wrote:
>
>> As a Hive committer, I don't feel the benefit we get from becoming a TLP is big enough (compared with the cost) to make Hive a TLP. From Chris's comment I see that the cost is not that big, but I still wonder what benefit we will get from it. Also, I didn't get the idea of the joke ("In fact, one could argue that Pig opting not to be TLP yet is why Hive should go TLP"). I don't see any reason that applies to Pig but not Hive. We should continue the discussion here, but anything in Pig's discussion should also be considered here.
>>
>> Zheng
>>
>> On Mon, Apr 19, 2010 at 5:48 PM, Amr Awadallah <a...@cloudera.com> wrote:
>>
>>> I am personally +1 on Hive being a TLP; I think it has reached the community adoption and maturity level required for that. In fact, one could argue that Pig opting not to be TLP yet is why Hive should go TLP :) (jk).
>>>
>>> The real question to ask is whether there is a volunteer to take care of the administrative tasks, which isn't a ton of work afaiu (I am willing to volunteer if nobody else is up to the task, but I am not a committer and have only contributed a minor patch for bash/cygwin).
>>>
>>> BTW, here is a very nice summary from Yahoo's Chris Douglas on TLP tradeoffs. I happen to agree with everything he says, and frankly I couldn't have written it better myself. I highlight certain parts of his message, but I recommend you read the whole thing.
>>>
>>> ---------- Forwarded message ----------
>>> From: Chris Douglas <cdoug...@apache.org>
>>> Date: Tue, Apr 13, 2010 at 11:46 PM
>>> Subject: Subprojects and TLP status
>>> To: gene...@hadoop.apache.org, priv...@hadoop.apache.org
>>>
>>> Most of Hadoop's subprojects have discussed becoming top-level Apache projects (TLPs) in the last few weeks. Most have expressed a desire to remain in Hadoop. The salient parts of the discussions I've read tend to focus on three aspects: a technical dependence on Hadoop, additional overhead as a TLP, and visibility both within the Hadoop ecosystem and in the open source community generally.
>>>
>>> Life as a TLP: this is not much harder than being a Hadoop subproject, and the Apache preferences being tossed around - particularly "insufficiently diverse" - are not blockers. Every subproject needs to write a section of the report Hadoop sends to the board; as a TLP it is almost the same report, sent to a new address. The initial cost is similarly light: copy bylaws, send a few notes to INFRA, and follow some directions. I think the estimated costs are far higher than they will be in practice. Inertia is a powerful force, but it should be overcome. The directions are here, and should not be intimidating: http://apache.org/dev/project-creation.html
>>>
>>> Visibility: the Hadoop site does not need to change. For each subproject, we can literally change the hyperlinks to point to the new page and be done. Long-term, linking to all ASF projects that run on Hadoop from a prominent page is something we all want. So particularly in the medium term that most are considering: visibility through the website will not change. Each subproject will still be linked from the front page. Hadoop would not be nearly as popular as it is without Zookeeper, HBase, Hive, and Pig. All statistics on work in shared MapReduce clusters show that users vastly prefer running Pig and Hive queries to writing MapReduce jobs. HBase continues to push features in HDFS that increase its adoption and relevance outside MapReduce, while sharing some of its NoSQL limelight. Zookeeper is not only a linchpin in real workloads, but many proposals for future features require it. The bottom line is that MapReduce and HDFS need these projects for visibility and adoption in precisely the same way. I don't think separate TLPs will uncouple the broader community from one another.
>>>
>>> Technical dependence: this has two dimensions. First, influencing MapReduce and HDFS. This is nonsense. Earning influence by contributing to a subproject is the only way to push code changes; nobody from any of these projects has violated that by unilaterally committing to HDFS or MapReduce, anyway. And anyone cynical enough to believe that MapReduce and HDFS would deliberately screw over or ignore dependent projects because they don't have PMC members is plainly unsuited to community-driven development. I understand that these projects need to protect their users, but lobbying rights are not an actual benefit. Second, being a coherent part of the Hadoop ecosystem. It is (mostly) true that
Hudson build is back to normal : Hive-trunk-h0.18 #421
See http://hudson.zones.apache.org/hudson/job/Hive-trunk-h0.18/421/changes
RE: [DISCUSSION] To be (or not to be) a TLP - that is the question
What is the advantage of becoming a TLP to the project itself? I have heard that it is something that Apache wants, but considering that we are very comfortable with how Hive interacts with the Hadoop ecosystem as a subproject of Hadoop, there has to be some big incentive for the project to be a TLP, and nowhere have I seen how this would benefit Hive. Any thoughts on that?

Ashish

From: Jeff Hammerbacher [mailto:ham...@cloudera.com]
Sent: Wednesday, April 21, 2010 7:35 PM
To: hive-dev@hadoop.apache.org
Cc: Ashish Thusoo
Subject: Re: [DISCUSSION] To be (or not to be) a TLP - that is the question

> Hive already does the work to run on multiple versions of Hadoop, and the release cycle is independent of Hadoop's. I don't see why it should remain a subproject. I'm +1 on Hive becoming a TLP.
Re: [DISCUSSION] To be (or not to be) a TLP - that is the question
I am definitely against moving Hive out of Hadoop. There is appreciable representation of Hive inside the Hadoop PMC and, as far as I can say, there is no additional burden on the Hadoop PMC in keeping Hive inside Hadoop. I respect Jeff's and Amr's comments on their viewpoints, but I beg to differ. I really do not see any benefit in moving Hive out of Hadoop.

thanks,
dhruba

On Thu, Apr 22, 2010 at 10:09 AM, Ashish Thusoo <athu...@facebook.com> wrote:

> What is the advantage of becoming a TLP to the project itself? I have heard that it is something that Apache wants, but considering that we are very comfortable with how Hive interacts with the Hadoop ecosystem as a subproject of Hadoop, there has to be some big incentive for the project to be a TLP, and nowhere have I seen how this would benefit Hive. Any thoughts on that?
>
> Ashish
[jira] Commented: (HIVE-987) Hive CLI Omnibus Improvement ticket
[ https://issues.apache.org/jira/browse/HIVE-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859949#action_12859949 ]

John Sichi commented on HIVE-987:
---------------------------------

@Raghu: you are right. I screwed up when I was testing the embedded mode; it actually works fine already.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1318) External Tables: Selecting a partition that does not exist produces errors
External Tables: Selecting a partition that does not exist produces errors
--------------------------------------------------------------------------

                 Key: HIVE-1318
                 URL: https://issues.apache.org/jira/browse/HIVE-1318
             Project: Hadoop Hive
          Issue Type: Bug
    Affects Versions: 0.5.0
            Reporter: Edward Capriolo
         Attachments: partdoom.q

{noformat}
dfs -mkdir /tmp/a;
dfs -mkdir /tmp/a/b;
dfs -mkdir /tmp/a/c;

create external table abc (key string, val string)
partitioned by (part int)
location '/tmp/a/';

alter table abc ADD PARTITION (part=1) LOCATION 'b';
alter table abc ADD PARTITION (part=2) LOCATION 'c';

select key from abc where part=1;
select key from abc where part=70;
{noformat}

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1318) External Tables: Selecting a partition that does not exist produces errors
[ https://issues.apache.org/jira/browse/HIVE-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-1318:
----------------------------------

    Attachment: partdoom.q

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-987) Hive CLI Omnibus Improvement ticket
[ https://issues.apache.org/jira/browse/HIVE-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859956#action_12859956 ]

Ashish Thusoo commented on HIVE-987:
------------------------------------

I am +1 on this. I think this can open up good possibilities. I have not looked at the sqlline code, but how much does it depend on the actual SQL dialect? Plus, how easy is it to extend to HDFS-related commands? E.g. the CLI today has commands that can set conf variables, and it also supports the hadoop dfs commands, which talk directly to HDFS. I am not sure if too many people use them, but I do. It would be great to get them integrated with sqlline if that is possible.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1319) Alter table add partition fails if ADD PARTITION is not in upper case
Alter table add partition fails if ADD PARTITION is not in upper case
---------------------------------------------------------------------

                 Key: HIVE-1319
                 URL: https://issues.apache.org/jira/browse/HIVE-1319
             Project: Hadoop Hive
          Issue Type: Bug
    Affects Versions: 0.5.0
            Reporter: Edward Capriolo

{noformat}
dfs -mkdir /tmp/a;
dfs -mkdir /tmp/a/b;
dfs -mkdir /tmp/a/c;

create external table abc (key string, val string)
partitioned by (part int)
location '/tmp/a/';

alter table abc ADD PARTITION (part=1) LOCATION 'b';
alter table abc add partition (part=2) LOCATION 'c';

select key from abc where part=1;
select key from abc where part=70;
{noformat}

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-987) Hive CLI Omnibus Improvement ticket
[ https://issues.apache.org/jira/browse/HIVE-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859976#action_12859976 ]

John Sichi commented on HIVE-987:
---------------------------------

sqlline is agnostic as to SQL dialect, so commands such as show/describe/dfs just work. (The one exception I have found so far is the set command, which is throwing an NPE; probably something about the result set we return to list all the settings. Shouldn't be hard to fix.)

sqlline has some of its own commands such as !help and !quit; these are always prefixed with a bang. Anything else it just sends through, with the exception of comments, which it strips off before sending.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-987) Hive CLI Omnibus Improvement ticket
[ https://issues.apache.org/jira/browse/HIVE-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859978#action_12859978 ]

John Sichi commented on HIVE-987:
---------------------------------

Change SQLLINE_OPTS to this to use embedded mode:

{noformat}
SQLLINE_OPTS='-u jdbc:hive:// -d org.apache.hadoop.hive.jdbc.HiveDriver -n sa'
{noformat}

Options such as these can also be overridden on the command line when invoking, e.g. to connect to a particular server:

{noformat}
hive --service beeline -u jdbc:hive://theirserver:10001/default
{noformat}

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1320) NPE with lineage in a query of union alls on joins.
NPE with lineage in a query of union alls on joins.
---------------------------------------------------

                 Key: HIVE-1320
                 URL: https://issues.apache.org/jira/browse/HIVE-1320
             Project: Hadoop Hive
          Issue Type: Bug
          Components: Query Processor
            Reporter: Ashish Thusoo
            Assignee: Ashish Thusoo

The following query generates an NPE in the lineage ctx code:

{noformat}
EXPLAIN
INSERT OVERWRITE TABLE dest_l1
SELECT j.*
FROM (SELECT t1.key, p1.value
      FROM src1 t1 LEFT OUTER JOIN src p1 ON (t1.key = p1.key)
      UNION ALL
      SELECT t2.key, p2.value
      FROM src1 t2 LEFT OUTER JOIN src p2 ON (t2.key = p2.key)) j;
{noformat}

The stack trace is:

{noformat}
FAILED: Hive Internal Error: java.lang.NullPointerException(null)
java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.optimizer.lineage.LineageCtx$Index.mergeDependency(LineageCtx.java:116)
        at org.apache.hadoop.hive.ql.optimizer.lineage.OpProcFactory$UnionLineage.process(OpProcFactory.java:396)
        at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88)
        at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:54)
        at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
        at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
        at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
        at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
        at org.apache.hadoop.hive.ql.lib.PreOrderWalker.walk(PreOrderWalker.java:59)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102)
        at org.apache.hadoop.hive.ql.optimizer.lineage.Generator.transform(Generator.java:72)
        at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:83)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:5976)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126)
        at org.apache.hadoop.hive.ql.parse.ExplainSemanticAnalyzer.analyzeInternal(ExplainSemanticAnalyzer.java:48)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:126)
{noformat}

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1320) NPE with lineage in a query of union alls on joins.
[ https://issues.apache.org/jira/browse/HIVE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-1320:
--------------------------------

    Attachment: HIVE-1320.patch

Fixed the NPE. The cause was that we were not checking for inp_dep being null in the union all code path. We have to do that for all operators that have more than one parent.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
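The shape of that fix can be sketched in isolation. This is a hypothetical, simplified model, not the actual LineageCtx code: the real `mergeDependency` merges Dependency objects rather than sets of column names, and the class and method signatures here are invented for illustration. The point is the guard against a null incoming dependency.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical, simplified stand-in for LineageCtx.Index.mergeDependency.
// An operator with multiple parents (e.g. the UNION ALL over two joins in
// this ticket) can be processed while one parent has not yet produced a
// dependency, so the incoming value may be null and must be skipped
// rather than dereferenced.
public class LineageMergeSketch {

    static Set<String> mergeDependency(Set<String> existing, Set<String> incoming) {
        if (incoming == null) {
            return existing;   // the fix: ignore parents with no dependency yet
        }
        Set<String> merged =
            (existing == null) ? new LinkedHashSet<>() : new LinkedHashSet<>(existing);
        merged.addAll(incoming);   // union the column dependencies from this parent
        return merged;
    }
}
```

Without the null check, the `merged.addAll(incoming)` line is where a multi-parent operator would throw the NullPointerException reported above.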
[jira] Updated: (HIVE-1320) NPE with lineage in a query of union alls on joins.
[ https://issues.apache.org/jira/browse/HIVE-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashish Thusoo updated HIVE-1320: Status: Patch Available (was: Open) Affects Version/s: 0.6.0 Fix Version/s: 0.6.0

NPE with lineage in a query of union alls on joins.
---
Key: HIVE-1320
URL: https://issues.apache.org/jira/browse/HIVE-1320
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Affects Versions: 0.6.0
Reporter: Ashish Thusoo
Assignee: Ashish Thusoo
Fix For: 0.6.0
Attachments: HIVE-1320.patch

(The issue description and stack trace repeat the earlier HIVE-1320 notification verbatim and are omitted here.)

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12860061#action_12860061 ] John Sichi commented on HIVE-259: -

I couldn't see the point of having two competing UDF guide pages, so I renamed the XPath-specific one as such and linked it from the main one. Just housekeeping to reduce confusion; I did not actually add the percentile info.

Add PERCENTILE aggregate function
-
Key: HIVE-259
URL: https://issues.apache.org/jira/browse/HIVE-259
Project: Hadoop Hive
Issue Type: New Feature
Components: Query Processor
Reporter: Venky Iyer
Assignee: Jerome Boulon
Fix For: 0.6.0
Attachments: HIVE-259-2.patch, HIVE-259-3.patch, HIVE-259.1.patch, HIVE-259.4.patch, HIVE-259.5.patch, HIVE-259.patch, jb2.txt, Percentile.xlsx

Compute at least the 25th, 50th, and 75th percentiles.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
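For reference, the percentiles requested in HIVE-259 can be computed with linear interpolation between closest ranks. This is one common convention, sketched below as a standalone example; it is not taken from the HIVE-259 patches, and Hive's actual UDAF may use a different interpolation rule.

```java
import java.util.Arrays;

public class PercentileSketch {
    // Percentile of a sorted sample, with linear interpolation between
    // the two closest ranks. p is a fraction in [0, 1].
    static double percentile(double[] sorted, double p) {
        double rank = p * (sorted.length - 1);   // 0-based fractional rank
        int lo = (int) Math.floor(rank);
        int hi = (int) Math.ceil(rank);
        return sorted[lo] + (rank - lo) * (sorted[hi] - sorted[lo]);
    }

    public static void main(String[] args) {
        double[] data = {35, 20, 15, 50, 40};
        Arrays.sort(data); // percentile() requires sorted input
        System.out.println(percentile(data, 0.25)); // 25th percentile -> 20.0
        System.out.println(percentile(data, 0.50)); // median          -> 35.0
        System.out.println(percentile(data, 0.75)); // 75th percentile -> 40.0
    }
}
```

For the five-element sample above, the fractional ranks for p = 0.25, 0.5, 0.75 land exactly on elements 1, 2, and 3 of the sorted array, so no interpolation is needed; a sample like {1, 2, 3, 4} with p = 0.25 interpolates to 1.75.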