[jira] Commented: (HIVE-1611) Add alternative search-provider to Hive site
[ https://issues.apache.org/jira/browse/HIVE-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917016#action_12917016 ]

Edward Capriolo commented on HIVE-1611:
---------------------------------------

Now that Hive is a TLP, we likely have to get the ball rolling and cut the cord with Hadoop. I will contact infra and see what our options are. We have a few issues:
- We need to move the SVN repository from a Hadoop subproject to a top-level location.
- After we do that, we need to move the Forrest docs into Hive; then we can change the search box.

If we want to see the skinconf change done first, I believe we should open/transfer this ticket to core.

Add alternative search-provider to Hive site
--------------------------------------------
                Key: HIVE-1611
                URL: https://issues.apache.org/jira/browse/HIVE-1611
            Project: Hadoop Hive
         Issue Type: Improvement
           Reporter: Alex Baranau
           Assignee: Alex Baranau
           Priority: Minor
        Attachments: HIVE-1611.patch

Use the search-hadoop.com service to make search across Hive sources, mailing lists, wiki, etc. available. This was initially proposed on the user mailing list. The search service was already added to the site's skin (common for all Hadoop-related projects), so this issue is about enabling it for Hive. The ultimate goal is to use it at all Hadoop sub-projects' sites.

--
This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1668) Move HWI out to Github
[ https://issues.apache.org/jira/browse/HIVE-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914584#action_12914584 ]

Edward Capriolo commented on HIVE-1668:
---------------------------------------

Jeff, I disagree. The build and test errors are not insurmountable. In fact, some if not most of the errors were cascading changes that were not tested properly. For example:
- https://issues.apache.org/jira/browse/HIVE-1183 was a fix I had to do because someone broke it there.
- https://issues.apache.org/jira/browse/HIVE-978: someone wanted all jars to be named whatever.${version} and did not bother to look across all the shell scripts that start up Hive.
- https://issues.apache.org/jira/browse/HIVE-1294: again, someone changed some shell scripts and only tested the CLI.
- https://issues.apache.org/jira/browse/HIVE-752: again, someone broke HWI without testing it.
- https://issues.apache.org/jira/browse/HIVE-1615: not really anyone's fault, but there is no API stability across Hive. I do not see why one method went away and another similar method took its place.

I have of course been talking for a while about moving HWI to Wicket; moving from JSP to servlet/Java code will fix errors, but the little time I do have I usually have to spend detecting and cleaning up other breakages. Hue and Beeswax I honestly do not know, but it sounds like you need extra magical stuff to make them work, while HWI works with Hive on its own (unless people break it).

Move HWI out to Github
----------------------
                Key: HIVE-1668
                URL: https://issues.apache.org/jira/browse/HIVE-1668
            Project: Hadoop Hive
         Issue Type: Improvement
         Components: Web UI
           Reporter: Jeff Hammerbacher

I have seen HWI cause a number of build and test errors, and it's now going to cost us some extra work for integration with security. We've worked on hundreds of clusters at Cloudera and I've never seen anyone use HWI. With the Beeswax UI available in Hue, it's unlikely that anyone would prefer to stick with HWI. I think it's time to move it out to Github.
[jira] Commented: (HIVE-1668) Move HWI out to Github
[ https://issues.apache.org/jira/browse/HIVE-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914605#action_12914605 ]

Edward Capriolo commented on HIVE-1668:
---------------------------------------

Plus, not to get too far off topic, but there is a huge portion of the Hadoop community that thinks: Security? So what? Who cares? I am not going to run Active Directory or Kerberos just so I can say my Hadoop is secure. It adds latency to many processes, complexity to the overall design of Hadoop, and does not even encrypt data in transit. Many people are going to elect not to use Hadoop security for those reasons.

Is extra work a reason not to do something? Are we going to move the Hive Thrift server out to Github too because of the burden of extra work? It is a lot of extra work for me when Hadoop renames all its JMX counters or tells me all my code is deprecated because of our new slick mapreduce.* API. I have learned to roll with the punches.

Move HWI out to Github
----------------------
                Key: HIVE-1668
                URL: https://issues.apache.org/jira/browse/HIVE-1668
[jira] Commented: (HIVE-1668) Move HWI out to Github
[ https://issues.apache.org/jira/browse/HIVE-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914669#action_12914669 ]

Edward Capriolo commented on HIVE-1668:
---------------------------------------

{quote}That's not a great argument for keeping code that's onerous to maintain in trunk.{quote}

It's not onerous to maintain. As you can see from the tickets I pointed out, it broke because it was not tested. For example, in https://issues.apache.org/jira/browse/HIVE-752, when designing shim classes that specify a class name in a string, one has to make sure they get the class name correct. I know it was an oversight, but I am sure someone fired up the CLI and made sure the class name was correct. As for https://issues.apache.org/jira/browse/HIVE-978, I specifically mentioned in the patch how to test this and why it should be tested, and it still turned out not to work right.

Pragmatic is the perfect word. HWI was never made to be fancy. Anyone who has Hive can build and run the web interface, with no extra dependencies. To use Beeswax, it looks like you need Hue, which means you need to go somewhere else to get it and install it. It seems you need to patch or load extra plugins into your NameNode and DataNode, like org.apache.hadoop.thriftfs.NamenodePlugin, and it looks like (http://archive.cloudera.com/cdh/3/hue/manual.html#_install_hue) you need:
- gcc / gcc
- libxml2-devel / libxml2-dev
- libxslt-devel / libxslt-dev
- mysql-devel / libmysqlclient-dev
- python-devel / python-dev
- python-setuptools / python-setuptools
- sqlite-devel / libsqlite3-dev

The pragmatic approach is to use the web interface provided by Hive: users do not need anything external like Python, and do not have to make any changes to their environment. That is why I think we should stay part of the Hive distribution. I'm -1 on taking it out.
Move HWI out to Github
----------------------
                Key: HIVE-1668
                URL: https://issues.apache.org/jira/browse/HIVE-1668
[jira] Commented: (HIVE-1668) Move HWI out to Github
[ https://issues.apache.org/jira/browse/HIVE-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914741#action_12914741 ]

Edward Capriolo commented on HIVE-1668:
---------------------------------------

{quote}It should also help mature the product for eventual inclusion in trunk.{quote}

Why would we move something from Hive out to Github, just to move it back to Hive?

{quote}Empirically, they don't. The value of the web interface to users is not nearly as high as the pain it causes the developers for maintenance.{quote}

Who are these developers who maintain it? Has anyone ever added a feature besides me? I'm not complaining. http://blog.milford.io/2010/06/getting-the-hive-web-interface-hwi-to-work-on-centos/

{quote}The Hive Web Interface is a pretty sweet deal.{quote}

Sounds like people like it. Why are we debating the past state of HWI? It works now. If someone reports a bug, I typically investigate and patch it that same day. I challenge anyone to open a ticket on core-user called "move the NameNode web interface out to Github" and try to argue that a better NameNode interface using Python is now available. The ticket would instantly get a RESOLVED: WON'T FIX. Why is this any different?

Move HWI out to Github
----------------------
                Key: HIVE-1668
                URL: https://issues.apache.org/jira/browse/HIVE-1668
[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913741#action_12913741 ]

Edward Capriolo commented on HIVE-842:
--------------------------------------

What is meant by "attack the Web UI separately"? Will it be broken or non-functional at any phase here? That is what I find happens often. Some of it is really the web UI's fault for using JSP and not servlets, but there is no simple way to get code coverage of the web UI and all the different ways it gets broken.

Authentication Infrastructure for Hive
--------------------------------------
                Key: HIVE-842
                URL: https://issues.apache.org/jira/browse/HIVE-842
            Project: Hadoop Hive
         Issue Type: New Feature
         Components: Server Infrastructure
           Reporter: Edward Capriolo
           Assignee: Todd Lipcon
        Attachments: HiveSecurityThoughts.pdf

This issue deals with the authentication (user name, password) infrastructure, not the authorization components that specify what a user should be able to do.
[jira] Commented: (HIVE-268) Insert Overwrite Directory to accept configurable table row format
[ https://issues.apache.org/jira/browse/HIVE-268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910656#action_12910656 ]

Edward Capriolo commented on HIVE-268:
--------------------------------------

Still not exactly what you want, but with CTAS you can essentially get a folder in /user/hive/warehouse/tableIWant with the format you want.

Insert Overwrite Directory to accept configurable table row format
------------------------------------------------------------------
                Key: HIVE-268
                URL: https://issues.apache.org/jira/browse/HIVE-268
            Project: Hadoop Hive
         Issue Type: New Feature
         Components: Query Processor
           Reporter: Zheng Shao
           Assignee: Paul Yang

There is no way for users to control the file format when they are outputting the result into a directory. We should allow:
{code}
INSERT OVERWRITE DIRECTORY '/user/zshao/result'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '9'
SELECT tablea.* FROM tablea;
{code}
[jira] Updated: (HIVE-1615) Web Interface JSP needs Refactoring for removed meta store methods
[ https://issues.apache.org/jira/browse/HIVE-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-1615:
----------------------------------
    Fix Version/s: 0.7.0 (was: 0.6.0)
    Affects Version/s: 0.6.0 (was: 0.5.1)

Web Interface JSP needs Refactoring for removed meta store methods
------------------------------------------------------------------
                Key: HIVE-1615
                URL: https://issues.apache.org/jira/browse/HIVE-1615
            Project: Hadoop Hive
         Issue Type: Bug
         Components: Web UI
   Affects Versions: 0.6.0
           Reporter: Edward Capriolo
           Assignee: Edward Capriolo
           Priority: Blocker
            Fix For: 0.7.0
        Attachments: hive-1615.patch.2.txt, hive-1615.patch.txt

Some meta store methods being called from JSP have been removed. We really should prioritize compiling JSP into servlet code again.
[jira] Updated: (HIVE-1613) hive --service jar looks for hadoop version but was not defined
[ https://issues.apache.org/jira/browse/HIVE-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-1613:
----------------------------------
    Fix Version/s: 0.6.0 (was: 0.7.0)
    Affects Version/s: 0.5.1 (was: 0.6.0)
    Priority: Blocker (was: Major)

I think we should patch this as well; functionality was broken.

hive --service jar looks for hadoop version but was not defined
---------------------------------------------------------------
                Key: HIVE-1613
                URL: https://issues.apache.org/jira/browse/HIVE-1613
            Project: Hadoop Hive
         Issue Type: Bug
         Components: Clients
   Affects Versions: 0.5.1
           Reporter: Edward Capriolo
           Assignee: Edward Capriolo
           Priority: Blocker
            Fix For: 0.6.0
        Attachments: hive-1613.patch.txt

hive --service jar fails. I have to open another ticket to clean up the scripts and unify functions like version detection.
[jira] Created: (HIVE-1613) hive --service jar looks for hadoop version but was not defined
hive --service jar looks for hadoop version but was not defined
---------------------------------------------------------------
                Key: HIVE-1613
                URL: https://issues.apache.org/jira/browse/HIVE-1613
            Project: Hadoop Hive
         Issue Type: Bug
         Components: Clients
   Affects Versions: 0.6.0
           Reporter: Edward Capriolo
           Assignee: Edward Capriolo
            Fix For: 0.7.0

hive --service jar fails. I have to open another ticket to clean up the scripts and unify functions like version detection.
[jira] Updated: (HIVE-1613) hive --service jar looks for hadoop version but was not defined
[ https://issues.apache.org/jira/browse/HIVE-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-1613:
----------------------------------
    Status: Patch Available (was: Open)

hive --service jar looks for hadoop version but was not defined
---------------------------------------------------------------
                Key: HIVE-1613
                URL: https://issues.apache.org/jira/browse/HIVE-1613
[jira] Updated: (HIVE-1613) hive --service jar looks for hadoop version but was not defined
[ https://issues.apache.org/jira/browse/HIVE-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-1613:
----------------------------------
    Attachment: hive-1613.patch.txt

hive --service jar looks for hadoop version but was not defined
---------------------------------------------------------------
                Key: HIVE-1613
                URL: https://issues.apache.org/jira/browse/HIVE-1613
[jira] Updated: (HIVE-1615) Web Interface JSP needs Refactoring for deprecated meta store methods
[ https://issues.apache.org/jira/browse/HIVE-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-1615:
----------------------------------
    Attachment: hive-1615.patch.txt

Web Interface JSP needs Refactoring for deprecated meta store methods
---------------------------------------------------------------------
                Key: HIVE-1615
                URL: https://issues.apache.org/jira/browse/HIVE-1615
[jira] Updated: (HIVE-1615) Web Interface JSP needs Refactoring for removed meta store methods
[ https://issues.apache.org/jira/browse/HIVE-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-1615:
----------------------------------
    Summary: Web Interface JSP needs Refactoring for removed meta store methods (was: Web Interface JSP needs Refactoring for deprecated meta store methods)

Web Interface JSP needs Refactoring for removed meta store methods
------------------------------------------------------------------
                Key: HIVE-1615
                URL: https://issues.apache.org/jira/browse/HIVE-1615
[jira] Updated: (HIVE-1615) Web Interface JSP needs Refactoring for removed meta store methods
[ https://issues.apache.org/jira/browse/HIVE-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-1615:
----------------------------------
    Attachment: hive-1615.patch.2.txt

Web Interface JSP needs Refactoring for removed meta store methods
------------------------------------------------------------------
                Key: HIVE-1615
                URL: https://issues.apache.org/jira/browse/HIVE-1615
[jira] Updated: (HIVE-471) A UDF for simple reflection
[ https://issues.apache.org/jira/browse/HIVE-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-471:
---------------------------------
    Attachment: HIVE-471.6.patch.txt

A UDF for simple reflection
---------------------------
                Key: HIVE-471
                URL: https://issues.apache.org/jira/browse/HIVE-471
            Project: Hadoop Hive
         Issue Type: New Feature
         Components: Query Processor
   Affects Versions: 0.6.0
           Reporter: Edward Capriolo
           Assignee: Edward Capriolo
           Priority: Minor
            Fix For: 0.7.0
        Attachments: hive-471-gen.diff, HIVE-471.1.patch, HIVE-471.2.patch, HIVE-471.3.patch, HIVE-471.4.patch, HIVE-471.5.patch, HIVE-471.6.patch.txt, hive-471.diff

There are many methods in Java that are static and have no arguments or can be invoked with one simple parameter. More complicated functions will require a UDF, but one generic one can work as a poor man's UDF.
{noformat}
SELECT reflect('java.lang.String', 'valueOf', 1),
       reflect('java.lang.String', 'isEmpty')
FROM src LIMIT 1;
{noformat}
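The reflect() calls in the example above map directly onto java.lang.reflect. As a minimal illustrative sketch of what such a generic UDF does under the hood (this class is invented for the demo and is not Hive's actual implementation):

```java
import java.lang.reflect.Method;

public class ReflectSketch {
    public static void main(String[] args) throws Exception {
        // reflect('java.lang.String', 'valueOf', 1): resolve the class by
        // name, look up the static method by name and parameter type, and
        // invoke it with a null receiver.
        Method valueOf = Class.forName("java.lang.String")
                              .getMethod("valueOf", int.class);
        System.out.println(valueOf.invoke(null, 1));   // prints 1

        // reflect('java.lang.String', 'isEmpty'): a zero-argument instance
        // method, invoked here on a sample receiver.
        Method isEmpty = String.class.getMethod("isEmpty");
        System.out.println(isEmpty.invoke(""));        // prints true
    }
}
```

A real UDF would additionally have to map Hive column types onto Java parameter types and choose among overloads, which is where most of the complexity lives.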
[jira] Updated: (HIVE-471) A UDF for simple reflection
[ https://issues.apache.org/jira/browse/HIVE-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-471:
---------------------------------
    Status: Patch Available (was: Open)

A UDF for simple reflection
---------------------------
                Key: HIVE-471
                URL: https://issues.apache.org/jira/browse/HIVE-471
[jira] Updated: (HIVE-471) A UDF for simple reflection
[ https://issues.apache.org/jira/browse/HIVE-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-471:
---------------------------------
    Status: Patch Available (was: Open)
    Affects Version/s: 0.6.0 (was: 0.5.1)
    Fix Version/s: 0.7.0

A UDF for simple reflection
---------------------------
                Key: HIVE-471
                URL: https://issues.apache.org/jira/browse/HIVE-471
[jira] Commented: (HIVE-1434) Cassandra Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902616#action_12902616 ]

Edward Capriolo commented on HIVE-1434:
---------------------------------------

Maven: I am on the fence about it. We actually do not need all the libs I included. Having them in a tarball sounds good, but making a Maven repo for only this purpose seems to be a lot of work.

{quote}Should we attempt to factor out the HBase commonality immediately, or commit the overlapping code and then do refactoring as a followup? I'm fine either way; I can give suggestions on how to create the reusable abstract bases and where to package+name them.{quote}

If you can point out specific instances, then sure. The code may be 99% the same, but that one nuance is going to make the abstractions confusing and useless. I await further review.

Cassandra Storage Handler
-------------------------
                Key: HIVE-1434
                URL: https://issues.apache.org/jira/browse/HIVE-1434
            Project: Hadoop Hive
         Issue Type: New Feature
   Affects Versions: 0.7.0
           Reporter: Edward Capriolo
           Assignee: Edward Capriolo
            Fix For: 0.7.0
        Attachments: cas-handle.tar.gz, hive-1434-1.txt, hive-1434-2-patch.txt, hive-1434-3-patch.txt, hive-1434-4-patch.txt

Add a Cassandra storage handler.
[jira] Commented: (HIVE-1505) Support non-UTF8 data
[ https://issues.apache.org/jira/browse/HIVE-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900697#action_12900697 ]

Edward Capriolo commented on HIVE-1505:
---------------------------------------

Maybe you should fork Hive and call it Chive. On a serious note: great job. Would you consider editing cli.xml in the xdocs to explain this feature? I think it would be very helpful; look in docs/xdocs/.

Support non-UTF8 data
---------------------
                Key: HIVE-1505
                URL: https://issues.apache.org/jira/browse/HIVE-1505
            Project: Hadoop Hive
         Issue Type: New Feature
         Components: Serializers/Deserializers
   Affects Versions: 0.5.0
           Reporter: bc Wong
           Assignee: Ted Xu
        Attachments: trunk-encoding.patch

I'd like to work with non-UTF8 data easily. Suppose I have data in latin1. Currently, doing a SELECT * will return the upper ASCII characters as '\xef\xbf\xbd', which is the replacement character '\ufffd' encoded in UTF-8. It would be nice for Hive to understand different encodings, or to have a concept of a byte string.
[jira] Commented: (HIVE-1555) JDBC Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900039#action_12900039 ]

Edward Capriolo commented on HIVE-1555:
---------------------------------------

I wonder if this could end up being a very effective way to query shared data stores. I think I saw something like this in Futurama: "Don't worry about querying blank, let me worry about querying blank." http://www.youtube.com/watch?v=B5cAwTEEGNE

JDBC Storage Handler
--------------------
                Key: HIVE-1555
                URL: https://issues.apache.org/jira/browse/HIVE-1555
            Project: Hadoop Hive
         Issue Type: New Feature
   Affects Versions: 0.5.0
           Reporter: Bob Robertson
  Original Estimate: 24h
 Remaining Estimate: 24h

With the Cassandra and HBase storage handlers, I thought it would make sense to include a generic JDBC RDBMS storage handler so that you could import a standard DB table into Hive. Many people must want to perform HiveQL joins, etc. against tables in other systems.
[jira] Updated: (HIVE-1434) Cassandra Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-1434:
----------------------------------
    Attachment: hive-1434-4-patch.txt

Refactored the code, added xdoc, more extensive testing.

Cassandra Storage Handler
-------------------------
                Key: HIVE-1434
                URL: https://issues.apache.org/jira/browse/HIVE-1434
[jira] Commented: (HIVE-1530) Include hive-default.xml and hive-log4j.properties in hive-common JAR
[ https://issues.apache.org/jira/browse/HIVE-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898030#action_12898030 ]

Edward Capriolo commented on HIVE-1530:
---------------------------------------

I like the default XML. Hive has many undocumented options, and new ones are being added often. Are end users going to know which jar hive-default.xml is in? Users would have to extract a jar just to get the conf out of it to read the description of a setting. As for what Hadoop does: I personally find it annoying to have to navigate to hadoop/src/mapred/mapred-default.xml or hadoop/src/hdfs/hdfs-default.xml to figure out what options I have for settings. So I do not really think we should do it just to be like Hadoop if it makes people's lives harder. If anything, please keep it as hive-site.xml.sample.

Include hive-default.xml and hive-log4j.properties in hive-common JAR
---------------------------------------------------------------------
                Key: HIVE-1530
                URL: https://issues.apache.org/jira/browse/HIVE-1530
            Project: Hadoop Hive
         Issue Type: Improvement
         Components: Configuration
           Reporter: Carl Steinbach

hive-common-*.jar should include hive-default.xml and hive-log4j.properties, and similarly hive-exec-*.jar should include hive-exec-log4j.properties. The hive-default.xml file that currently sits in the conf/ directory should be removed. Motivations for this change:
* We explicitly tell users that they should never modify hive-default.xml, yet give them the opportunity to do so by placing the file in the conf dir.
* Many users are familiar with the Hadoop configuration mechanism that does not require *-default.xml files to be present in HADOOP_CONF_DIR, and assume that the same is true for HIVE_CONF_DIR.
[jira] Updated: (HIVE-1434) Cassandra Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-1434:
----------------------------------
    Attachment: hive-1434-3-patch.txt

Cassandra Storage Handler
-------------------------
                Key: HIVE-1434
                URL: https://issues.apache.org/jira/browse/HIVE-1434
[jira] Updated: (HIVE-1434) Cassandra Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-1434:
----------------------------------
    Status: Patch Available (was: Open)

This patch has full read/write functionality. I am going to do another patch later today with xdocs, but do not expect any code changes.

Cassandra Storage Handler
-------------------------
                Key: HIVE-1434
                URL: https://issues.apache.org/jira/browse/HIVE-1434
[jira] Commented: (HIVE-1511) Hive plan serialization is slow
[ https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895352#action_12895352 ]

Edward Capriolo commented on HIVE-1511:
---------------------------------------

There is also possibly a clever way to remove duplicate expressions that evaluate to the same result, such as multiple key=0 predicates.

Hive plan serialization is slow
-------------------------------
                Key: HIVE-1511
                URL: https://issues.apache.org/jira/browse/HIVE-1511
            Project: Hadoop Hive
         Issue Type: Improvement
   Affects Versions: 0.7.0
           Reporter: Ning Zhang

As reported by Edward Capriolo. For reference, I did this as a test case:
{noformat}
SELECT * FROM src WHERE key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 OR key=0 ...(100 more of these)
{noformat}
No OOM, but I gave up after the test case did not go anywhere for about 2 minutes.
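The duplicate-removal idea in the comment can be sketched outside of Hive: since OR is idempotent, repeated disjuncts like key=0 OR key=0 OR ... can be collapsed to one copy each before the plan is built and serialized. A toy sketch under the simplifying assumption that predicates are plain strings (real Hive predicates are expression trees, and this class is invented for illustration):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

public class DedupDisjuncts {
    // Drop duplicate disjuncts while keeping first-seen order; the result
    // is logically equivalent because x OR x == x.
    static List<String> dedup(List<String> disjuncts) {
        return new ArrayList<>(new LinkedHashSet<>(disjuncts));
    }

    public static void main(String[] args) {
        List<String> preds = new ArrayList<>();
        for (int i = 0; i < 100; i++) preds.add("key=0");
        preds.add("key=1");
        System.out.println(dedup(preds));   // prints [key=0, key=1]
    }
}
```

In practice the string comparison would be replaced by structural equality on expression trees, so that semantically identical predicates compare equal.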
[jira] Updated: (HIVE-1434) Cassandra Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward Capriolo updated HIVE-1434:
----------------------------------
    Attachment: hive-1434-2-patch.txt

Closing in on this one. This patch sets up the build environment correctly, with proper test infrastructure, and is much cleaner. Still working on serializing/deserializing correctly, so it is not very functional yet. 80%, I think.

Cassandra Storage Handler
-------------------------
                Key: HIVE-1434
                URL: https://issues.apache.org/jira/browse/HIVE-1434
[jira] Commented: (HIVE-1441) Extend ivy offline mode to cover metastore downloads
[ https://issues.apache.org/jira/browse/HIVE-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12894133#action_12894133 ] Edward Capriolo commented on HIVE-1441: --- Fresh checkout, before and after the patch. Still looking into it.
{noformat}
</properties>
<testcase classname="org.apache.hadoop.hive.metastore.TestHiveMetaStoreRemote" name="testPartition" time="8.242">
<error message="Could not connect to meta store using any of the URIs provided" type="org.apache.hadoop.hive.metastore.api.MetaException">MetaException(message:Could not connect to meta store using any of the URIs provided)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:160)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:128)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:71)
    at org.apache.hadoop.hive.metastore.TestHiveMetaStoreRemote.setUp(TestHiveMetaStoreRemote.java:64)
    at junit.framework.TestCase.runBare(TestCase.java:125)
    at junit.framework.TestResult$1.protect(TestResult.java:106)
    at junit.framework.TestResult.runProtected(TestResult.java:124)
    at junit.framework.TestResult.run(TestResult.java:109)
    at junit.framework.TestCase.run(TestCase.java:118)
    at junit.framework.TestSuite.runTest(TestSuite.java:208)
    at junit.framework.TestSuite.run(TestSuite.java:203)
    at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:422)
    at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:931)
    at org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:785)
</error>
</testcase>
<system-out><![CDATA[Running metastore!
]]></system-out>
<system-err><![CDATA[]]></system-err>
</testsuite>

From eclipse:

Running metastore!
MetaException(message:hive.metastore.warehouse.dir is not set in the config or blank)
    at org.apache.hadoop.hive.metastore.Warehouse.<init>(Warehouse.java:58)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:155)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:125)
    at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:1965)
    at org.apache.hadoop.hive.metastore.TestHiveMetaStoreRemote$RunMS.run(TestHiveMetaStoreRemote.java:39)
    at java.lang.Thread.run(Thread.java:619)
10/07/30 16:03:22 ERROR metastore.HiveMetaStore: Metastore Thrift Server threw an exception. Exiting...
10/07/30 16:03:22 ERROR metastore.HiveMetaStore: MetaException(message:hive.metastore.warehouse.dir is not set in the config or blank)
    at org.apache.hadoop.hive.metastore.Warehouse.<init>(Warehouse.java:58)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:155)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:125)
    at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:1965)
    at org.apache.hadoop.hive.metastore.TestHiveMetaStoreRemote$RunMS.run(TestHiveMetaStoreRemote.java:39)
    at java.lang.Thread.run(Thread.java:619)
{noformat}
Extend ivy offline mode to cover metastore downloads Key: HIVE-1441 URL: https://issues.apache.org/jira/browse/HIVE-1441 Project: Hadoop Hive Issue Type: Improvement Components: Build Infrastructure Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.7.0 Attachments: HIVE-1441.1.patch We recently started downloading datanucleus jars via ivy, and the existing ivy offline mode doesn't cover this, so we still end up trying to contact the ivy repository even with offline mode enabled. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1294) HIVE_AUX_JARS_PATH interferes with startup of Hive Web Interface
[ https://issues.apache.org/jira/browse/HIVE-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1294: -- Priority: Blocker (was: Minor) HIVE_AUX_JARS_PATH interferes with startup of Hive Web Interface Key: HIVE-1294 URL: https://issues.apache.org/jira/browse/HIVE-1294 Project: Hadoop Hive Issue Type: Bug Components: Web UI Affects Versions: 0.5.0 Reporter: Dilip Joseph Assignee: Edward Capriolo Priority: Blocker The Hive web server fails to start up with the following error message if the HIVE_AUX_JARS_PATH environment variable is set (it works fine if unset):
{noformat}
$ build/dist/bin/hive --service hwi
Exception in thread "main" java.io.IOException: Error opening job jar: -libjars
    at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.util.zip.ZipException: error in opening zip file
    at java.util.zip.ZipFile.open(Native Method)
    at java.util.zip.ZipFile.<init>(ZipFile.java:114)
    at java.util.jar.JarFile.<init>(JarFile.java:133)
    at java.util.jar.JarFile.<init>(JarFile.java:70)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
{noformat}
Slightly modifying the command line that launches hadoop in hwi.sh solves the problem:
{noformat}
$ diff bin/ext/hwi.sh /tmp/new-hwi.sh
28c28
< exec $HADOOP jar $AUX_JARS_CMD_LINE ${HWI_JAR_FILE} $CLASS $HIVE_OPTS $@
---
> exec $HADOOP jar ${HWI_JAR_FILE} $CLASS $AUX_JARS_CMD_LINE $HIVE_OPTS $@
{noformat}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (HIVE-1294) HIVE_AUX_JARS_PATH interferes with startup of Hive Web Interface
[ https://issues.apache.org/jira/browse/HIVE-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo reassigned HIVE-1294: - Assignee: Edward Capriolo HIVE_AUX_JARS_PATH interferes with startup of Hive Web Interface Key: HIVE-1294 URL: https://issues.apache.org/jira/browse/HIVE-1294 Project: Hadoop Hive Issue Type: Bug Components: Web UI Affects Versions: 0.5.0 Reporter: Dilip Joseph Assignee: Edward Capriolo Priority: Minor The Hive web server fails to start up with the following error message if the HIVE_AUX_JARS_PATH environment variable is set (it works fine if unset):
{noformat}
$ build/dist/bin/hive --service hwi
Exception in thread "main" java.io.IOException: Error opening job jar: -libjars
    at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.util.zip.ZipException: error in opening zip file
    at java.util.zip.ZipFile.open(Native Method)
    at java.util.zip.ZipFile.<init>(ZipFile.java:114)
    at java.util.jar.JarFile.<init>(JarFile.java:133)
    at java.util.jar.JarFile.<init>(JarFile.java:70)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
{noformat}
Slightly modifying the command line that launches hadoop in hwi.sh solves the problem:
{noformat}
$ diff bin/ext/hwi.sh /tmp/new-hwi.sh
28c28
< exec $HADOOP jar $AUX_JARS_CMD_LINE ${HWI_JAR_FILE} $CLASS $HIVE_OPTS $@
---
> exec $HADOOP jar ${HWI_JAR_FILE} $CLASS $AUX_JARS_CMD_LINE $HIVE_OPTS $@
{noformat}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1294) HIVE_AUX_JARS_PATH interferes with startup of Hive Web Interface
[ https://issues.apache.org/jira/browse/HIVE-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1294: -- Attachment: hive-1294.patch.txt HIVE_AUX_JARS_PATH interferes with startup of Hive Web Interface Key: HIVE-1294 URL: https://issues.apache.org/jira/browse/HIVE-1294 Project: Hadoop Hive Issue Type: Bug Components: Web UI Affects Versions: 0.5.0 Reporter: Dilip Joseph Assignee: Edward Capriolo Priority: Blocker Attachments: hive-1294.patch.txt The Hive web server fails to start up with the following error message if the HIVE_AUX_JARS_PATH environment variable is set (it works fine if unset):
{noformat}
$ build/dist/bin/hive --service hwi
Exception in thread "main" java.io.IOException: Error opening job jar: -libjars
    at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.util.zip.ZipException: error in opening zip file
    at java.util.zip.ZipFile.open(Native Method)
    at java.util.zip.ZipFile.<init>(ZipFile.java:114)
    at java.util.jar.JarFile.<init>(JarFile.java:133)
    at java.util.jar.JarFile.<init>(JarFile.java:70)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
{noformat}
Slightly modifying the command line that launches hadoop in hwi.sh solves the problem:
{noformat}
$ diff bin/ext/hwi.sh /tmp/new-hwi.sh
28c28
< exec $HADOOP jar $AUX_JARS_CMD_LINE ${HWI_JAR_FILE} $CLASS $HIVE_OPTS $@
---
> exec $HADOOP jar ${HWI_JAR_FILE} $CLASS $AUX_JARS_CMD_LINE $HIVE_OPTS $@
{noformat}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1294) HIVE_AUX_JARS_PATH interferes with startup of Hive Web Interface
[ https://issues.apache.org/jira/browse/HIVE-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1294: -- Status: Patch Available (was: Open) Fix Version/s: 0.6.0 HWI does not start correctly without this patch. HIVE_AUX_JARS_PATH interferes with startup of Hive Web Interface Key: HIVE-1294 URL: https://issues.apache.org/jira/browse/HIVE-1294 Project: Hadoop Hive Issue Type: Bug Components: Web UI Affects Versions: 0.5.0 Reporter: Dilip Joseph Assignee: Edward Capriolo Priority: Blocker Fix For: 0.6.0 Attachments: hive-1294.patch.txt The Hive web server fails to start up with the following error message if the HIVE_AUX_JARS_PATH environment variable is set (it works fine if unset):
{noformat}
$ build/dist/bin/hive --service hwi
Exception in thread "main" java.io.IOException: Error opening job jar: -libjars
    at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.util.zip.ZipException: error in opening zip file
    at java.util.zip.ZipFile.open(Native Method)
    at java.util.zip.ZipFile.<init>(ZipFile.java:114)
    at java.util.jar.JarFile.<init>(JarFile.java:133)
    at java.util.jar.JarFile.<init>(JarFile.java:70)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
{noformat}
Slightly modifying the command line that launches hadoop in hwi.sh solves the problem:
{noformat}
$ diff bin/ext/hwi.sh /tmp/new-hwi.sh
28c28
< exec $HADOOP jar $AUX_JARS_CMD_LINE ${HWI_JAR_FILE} $CLASS $HIVE_OPTS $@
---
> exec $HADOOP jar ${HWI_JAR_FILE} $CLASS $AUX_JARS_CMD_LINE $HIVE_OPTS $@
{noformat}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1492) FileSinkOperator should remove duplicated files from the same task based on file sizes
[ https://issues.apache.org/jira/browse/HIVE-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12893772#action_12893772 ] Edward Capriolo commented on HIVE-1492: --- "the largest file is the correct file": is that generally true, or an absolute fact? FileSinkOperator should remove duplicated files from the same task based on file sizes -- Key: HIVE-1492 URL: https://issues.apache.org/jira/browse/HIVE-1492 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.7.0 Reporter: Ning Zhang Assignee: Ning Zhang Fix For: 0.7.0 Attachments: HIVE-1492.patch, HIVE-1492_branch-0.6.patch FileSinkOperator.jobClose() calls Utilities.removeTempOrDuplicateFiles() to retain only one file for each task. A task could produce multiple files due to failed attempts or speculative runs. The largest file should be retained rather than the first file for each task. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
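[Editorial sketch] The size-based selection the issue describes can be sketched like this. It is a hypothetical stand-in for Utilities.removeTempOrDuplicateFiles, with an assumed attempt-ID naming scheme where retries differ only in the trailing attempt number.

```python
import re

def retain_largest_per_task(files):
    """Keep only the largest output file for each task.

    `files` maps file name -> size in bytes. Names are assumed to look
    like 'attempt_201007301603_0001_m_000003_0', where the final _N is
    the attempt number; files from retries or speculative runs of the
    same task share everything before it.
    """
    best = {}
    for name, size in files.items():
        match = re.match(r"(.*_\d+)_\d+$", name)
        task = match.group(1) if match else name
        if task not in best or size > best[task][1]:
            best[task] = (name, size)
    return sorted(name for name, _ in best.values())

files = {
    "attempt_201007301603_0001_m_000003_0": 10,      # failed attempt, truncated
    "attempt_201007301603_0001_m_000003_1": 2500,    # successful retry
    "attempt_201007301603_0001_m_000004_0": 7,
}
print(retain_largest_per_task(files))
```

This also shows why Edward's question matters: the policy silently assumes a shorter file is always an artifact of a failed attempt, not valid smaller output.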
[jira] Updated: (HIVE-1434) Cassandra Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1434: -- Attachment: cas-handle.tar.gz This is not a quality patch yet. I am still experimenting with some ideas. Everything is free-form and will likely change before the final patch. There are a few junk files (HiveIColumn, etc.) which will not be part of the release. Thus far: CassandraSplit.java, HiveCassandraTableInputFormat.java, CassandraSerDe.java, TestColumnFamilyInputFormat.java, and TestCassandraPut.java are working and can give you an idea of where the code is going. Cassandra Storage Handler - Key: HIVE-1434 URL: https://issues.apache.org/jira/browse/HIVE-1434 Project: Hadoop Hive Issue Type: New Feature Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: cas-handle.tar.gz, hive-1434-1.txt Add a Cassandra storage handler. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1414) automatically invoke .hiverc init script
[ https://issues.apache.org/jira/browse/HIVE-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1414: -- Attachment: hive-1414-2.txt New version only reads hiverc if -i option is not specified. Includes xdocs. automatically invoke .hiverc init script Key: HIVE-1414 URL: https://issues.apache.org/jira/browse/HIVE-1414 Project: Hadoop Hive Issue Type: Improvement Components: Clients Affects Versions: 0.5.0 Reporter: John Sichi Assignee: Edward Capriolo Fix For: 0.7.0 Attachments: hive-1414-2.txt, hive-1414-patch-1.txt Similar to .bashrc but run Hive SQL commands. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1414) automatically invoke .hiverc init script
[ https://issues.apache.org/jira/browse/HIVE-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1414: -- Status: Patch Available (was: Open) automatically invoke .hiverc init script Key: HIVE-1414 URL: https://issues.apache.org/jira/browse/HIVE-1414 Project: Hadoop Hive Issue Type: Improvement Components: Clients Affects Versions: 0.5.0 Reporter: John Sichi Assignee: Edward Capriolo Fix For: 0.7.0 Attachments: hive-1414-2.txt, hive-1414-patch-1.txt Similar to .bashrc but run Hive SQL commands. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-471) A UDF for simple reflection
[ https://issues.apache.org/jira/browse/HIVE-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-471: - Attachment: HIVE-471.4.patch A UDF for simple reflection --- Key: HIVE-471 URL: https://issues.apache.org/jira/browse/HIVE-471 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.5.1 Reporter: Edward Capriolo Assignee: Edward Capriolo Priority: Minor Attachments: hive-471-gen.diff, HIVE-471.1.patch, HIVE-471.2.patch, HIVE-471.3.patch, HIVE-471.4.patch, hive-471.diff There are many methods in Java that are static and have no arguments or can be invoked with one simple parameter. More complicated functions will require a UDF, but one generic UDF can work as a poor-man's UDF. {noformat} SELECT reflect("java.lang.String", "valueOf", 1), reflect("java.lang.String", "isEmpty") FROM src LIMIT 1; {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-471) A UDF for simple reflection
[ https://issues.apache.org/jira/browse/HIVE-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-471: - Status: Patch Available (was: Open) A UDF for simple reflection --- Key: HIVE-471 URL: https://issues.apache.org/jira/browse/HIVE-471 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.5.1 Reporter: Edward Capriolo Assignee: Edward Capriolo Priority: Minor Attachments: hive-471-gen.diff, HIVE-471.1.patch, HIVE-471.2.patch, HIVE-471.3.patch, HIVE-471.4.patch, hive-471.diff There are many methods in Java that are static and have no arguments or can be invoked with one simple parameter. More complicated functions will require a UDF, but one generic UDF can work as a poor-man's UDF. {noformat} SELECT reflect("java.lang.String", "valueOf", 1), reflect("java.lang.String", "isEmpty") FROM src LIMIT 1; {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
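[Editorial sketch] The contract of the proposed reflect() UDF, resolving a method by name at runtime and invoking it, can be mimicked in Python for illustration. The real UDF uses Java reflection (Class.forName / Method.invoke); this analogue is not part of the patch.

```python
import importlib

def reflect(class_name, method_name, *args):
    """Python analogue of reflect(): look up a class by its dotted name,
    fetch a method by name, and invoke it with the given arguments."""
    module_name, _, simple_name = class_name.rpartition(".")
    cls = getattr(importlib.import_module(module_name), simple_name)
    return getattr(cls, method_name)(*args)

# Analogous to SELECT reflect("java.lang.String", ...) in the example query.
print(reflect("builtins.str", "upper", "hive"))  # -> HIVE
```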
[jira] Updated: (HIVE-1446) Move Hive Documentation from the wiki to version control
[ https://issues.apache.org/jira/browse/HIVE-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1446: -- Attachment: hive-1446-part-1.diff Got most of the language manual Move Hive Documentation from the wiki to version control Key: HIVE-1446 URL: https://issues.apache.org/jira/browse/HIVE-1446 Project: Hadoop Hive Issue Type: Task Components: Documentation Reporter: Carl Steinbach Assignee: Carl Steinbach Fix For: 0.6.0, 0.7.0 Attachments: hive-1446-part-1.diff, hive-1446.diff, hive-logo-wide.png Move the Hive Language Manual (and possibly some other documents) from the Hive wiki to version control. This work needs to be coordinated with the hive-dev and hive-user community in order to avoid missing any edits as well as to avoid or limit unavailability of the docs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1096) Hive Variables
[ https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886912#action_12886912 ] Edward Capriolo commented on HIVE-1096: --- I am having trouble uploading with the update-diff function of the review board. As I mentioned several times, I really had one simple requirement: {noformat} hive -hiveconf DAY=5 -e "LOAD DATA INFILE '/tmp/${DAY}' into logs partition=${DAY}" {noformat} I am all for doing things 100% correctly, but this is such a simple thing that I am getting worn out by the endless revisions and by doing lots of fancy things just because someone might want to do ${x${y}bla}. Really, I would like to get this ticket to +1 and get on with something more interesting. Hive Variables -- Key: HIVE-1096 URL: https://issues.apache.org/jira/browse/HIVE-1096 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0, 0.7.0 Attachments: 1096-9.diff, hive-1096-10-patch.txt, hive-1096-11-patch.txt, hive-1096-12.patch.txt, hive-1096-2.diff, hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff From mailing list: "Amazon Elastic MapReduce version of Hive seems to have a nice feature called Variables. Basically you can define a variable via command-line while invoking hive with -d DT=2009-12-09 and then refer to the variable via ${DT} within the hive queries. This could be extremely useful. I can't seem to find this feature even on trunk. Is this feature currently anywhere in the roadmap?" This could be implemented in many places. A simple place to put this is in Driver.compile or Driver.run; we can do string substitutions at that level, and further downstream need not be affected. There could be some benefits to doing this further downstream (parser, plan), but based on the simple needs we may not need to overthink this. I will get started on implementing in compile unless someone wants to discuss this more. 
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1096) Hive Variables
[ https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1096: -- Status: Patch Available (was: Open) Hive Variables -- Key: HIVE-1096 URL: https://issues.apache.org/jira/browse/HIVE-1096 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0, 0.7.0 Attachments: 1096-9.diff, hive-1096-10-patch.txt, hive-1096-11-patch.txt, hive-1096-12.patch.txt, hive-1096-2.diff, hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff From mailing list: "Amazon Elastic MapReduce version of Hive seems to have a nice feature called Variables. Basically you can define a variable via command-line while invoking hive with -d DT=2009-12-09 and then refer to the variable via ${DT} within the hive queries. This could be extremely useful. I can't seem to find this feature even on trunk. Is this feature currently anywhere in the roadmap?" This could be implemented in many places. A simple place to put this is in Driver.compile or Driver.run; we can do string substitutions at that level, and further downstream need not be affected. There could be some benefits to doing this further downstream (parser, plan), but based on the simple needs we may not need to overthink this. I will get started on implementing in compile unless someone wants to discuss this more. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1096) Hive Variables
[ https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1096: -- Attachment: hive-1096-12.patch.txt Changed interpolate to substitute. Added the substitution logic to file, dfs, set, and the query processor. Hive Variables -- Key: HIVE-1096 URL: https://issues.apache.org/jira/browse/HIVE-1096 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0, 0.7.0 Attachments: 1096-9.diff, hive-1096-10-patch.txt, hive-1096-11-patch.txt, hive-1096-12.patch.txt, hive-1096-2.diff, hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff From mailing list: "Amazon Elastic MapReduce version of Hive seems to have a nice feature called Variables. Basically you can define a variable via command-line while invoking hive with -d DT=2009-12-09 and then refer to the variable via ${DT} within the hive queries. This could be extremely useful. I can't seem to find this feature even on trunk. Is this feature currently anywhere in the roadmap?" This could be implemented in many places. A simple place to put this is in Driver.compile or Driver.run; we can do string substitutions at that level, and further downstream need not be affected. There could be some benefits to doing this further downstream (parser, plan), but based on the simple needs we may not need to overthink this. I will get started on implementing in compile unless someone wants to discuss this more. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
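[Editorial sketch] The simple substitution this ticket asks for can be sketched as repeated regex expansion. This illustrates the idea only; it is not the code in the attached patches, and it deliberately punts on nested forms like ${x${y}bla}, which the comment above argues are out of scope.

```python
import re

VAR = re.compile(r"\$\{([^}]+)\}")

def substitute(text, variables, max_passes=25):
    """Expand ${name} references in `text` from the `variables` dict.

    Passes repeat so that a value may itself contain a reference.
    Unknown names are left untouched, and a pass cap guards against
    cyclic definitions like A -> ${B}, B -> ${A}.
    """
    for _ in range(max_passes):
        expanded = VAR.sub(
            lambda m: str(variables.get(m.group(1), m.group(0))), text)
        if expanded == text:
            return expanded
        text = expanded
    raise ValueError("too many substitution passes; cyclic reference?")

print(substitute("LOAD DATA INFILE '/tmp/${DAY}'", {"DAY": "5"}))
# -> LOAD DATA INFILE '/tmp/5'
```

Hooking something like this into Driver.compile (as the description proposes) keeps everything downstream of the driver unaware that substitution happened.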
[jira] Commented: (HIVE-1434) Cassandra Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884384#action_12884384 ] Edward Capriolo commented on HIVE-1434: --- I actually got pretty far with this by simply duplicating the logic in the HBase storage handler. Unfortunately I hit a snafu. Cassandra is not using the deprecated mapred.* API; their input format uses mapreduce.*. I have seen a few tickets for this, and as far as I know Hive is 100% mapred. So to get this done we either have to wait until Hive is converted to mapreduce, or I have to make an old-school mapred-based input format for Cassandra. @John, am I wrong? Is there a way to work with mapreduce input formats that I am not understanding? Cassandra Storage Handler - Key: HIVE-1434 URL: https://issues.apache.org/jira/browse/HIVE-1434 Project: Hadoop Hive Issue Type: New Feature Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: hive-1434-1.txt Add a Cassandra storage handler. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1135) Use Anakia for version controlled documentation
[ https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884141#action_12884141 ] Edward Capriolo commented on HIVE-1135: --- Carl, thank you for the assist! Use Anakia for version controlled documentation --- Key: HIVE-1135 URL: https://issues.apache.org/jira/browse/HIVE-1135 Project: Hadoop Hive Issue Type: Task Components: Documentation Affects Versions: 0.5.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: hive-1135-3-patch.txt, hive-1135-4-patch.txt, hive-1135-5-patch.txt, hive-1135-6-patch.txt, hive-1335-1.patch.txt, hive-1335-2.patch.txt, jdom-1.1.jar, jdom-1.1.LICENSE, wtf.png Currently the Hive Language Manual and many other critical pieces of documentation are on the Hive wiki. Right now we count on the author of a patch to follow up and add wiki entries. While we do a decent job with this, new features can be missed, and users running older/newer branches cannot locate relevant documentation for their branch. An example of a perception I do not think we want to give off: http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy We should generate our documentation inline, the way Hadoop and HBase do, using Forrest. I would like to take the lead on this, but we need a lot of consensus on doing this properly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1446) Move Hive Documentation from the wiki to version control
[ https://issues.apache.org/jira/browse/HIVE-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884142#action_12884142 ] Edward Capriolo commented on HIVE-1446: --- I will make an xdoc of the CLI page. Move Hive Documentation from the wiki to version control Key: HIVE-1446 URL: https://issues.apache.org/jira/browse/HIVE-1446 Project: Hadoop Hive Issue Type: Task Components: Documentation Reporter: Carl Steinbach Assignee: Carl Steinbach Fix For: 0.6.0, 0.7.0 Move the Hive Language Manual (and possibly some other documents) from the Hive wiki to version control. This work needs to be coordinated with the hive-dev and hive-user community in order to avoid missing any edits as well as to avoid or limit unavailability of the docs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1446) Move Hive Documentation from the wiki to version control
[ https://issues.apache.org/jira/browse/HIVE-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1446: -- Attachment: hive-logo-wide.png We need this wide logo to fix the alignment of the generated docs. Move Hive Documentation from the wiki to version control Key: HIVE-1446 URL: https://issues.apache.org/jira/browse/HIVE-1446 Project: Hadoop Hive Issue Type: Task Components: Documentation Reporter: Carl Steinbach Assignee: Carl Steinbach Fix For: 0.6.0, 0.7.0 Attachments: hive-logo-wide.png Move the Hive Language Manual (and possibly some other documents) from the Hive wiki to version control. This work needs to be coordinated with the hive-dev and hive-user community in order to avoid missing any edits as well as to avoid or limit unavailability of the docs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1446) Move Hive Documentation from the wiki to version control
[ https://issues.apache.org/jira/browse/HIVE-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1446: -- Attachment: hive-1446.diff Includes the image in the vsl to fix the alignment Move Hive Documentation from the wiki to version control Key: HIVE-1446 URL: https://issues.apache.org/jira/browse/HIVE-1446 Project: Hadoop Hive Issue Type: Task Components: Documentation Reporter: Carl Steinbach Assignee: Carl Steinbach Fix For: 0.6.0, 0.7.0 Attachments: hive-1446.diff, hive-logo-wide.png Move the Hive Language Manual (and possibly some other documents) from the Hive wiki to version control. This work needs to be coordinated with the hive-dev and hive-user community in order to avoid missing any edits as well as to avoid or limit unavailability of the docs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1434) Cassandra Storage Handler
[ https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1434: -- Attachment: hive-1434-1.txt Just a start. (To prove that I am doing something with this ticket) Cassandra Storage Handler - Key: HIVE-1434 URL: https://issues.apache.org/jira/browse/HIVE-1434 Project: Hadoop Hive Issue Type: New Feature Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: hive-1434-1.txt Add a cassandra storage handler. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1135) Use Anakia for version controlled documentation
[ https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882349#action_12882349 ] Edward Capriolo commented on HIVE-1135: --- Bump: I will fix the formatting later. Can we commit this? We do not really need any unit tests here. Use Anakia for version controlled documentation --- Key: HIVE-1135 URL: https://issues.apache.org/jira/browse/HIVE-1135 Project: Hadoop Hive Issue Type: Task Components: Documentation Affects Versions: 0.5.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: hive-1135-3-patch.txt, hive-1135-4-patch.txt, hive-1135-5-patch.txt, hive-1135-6-patch.txt, hive-1335-1.patch.txt, hive-1335-2.patch.txt, jdom-1.1.jar, jdom-1.1.LICENSE, wtf.png Currently the Hive Language Manual and many other critical pieces of documentation are on the Hive wiki. Right now we count on the author of a patch to follow up and add wiki entries. While we do a decent job with this, new features can be missed, and users running older/newer branches cannot locate relevant documentation for their branch. An example of a perception I do not think we want to give off: http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy We should generate our documentation inline, the way Hadoop and HBase do, using Forrest. I would like to take the lead on this, but we need a lot of consensus on doing this properly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1434) Cassandra Storage Handler
Cassandra Storage Handler - Key: HIVE-1434 URL: https://issues.apache.org/jira/browse/HIVE-1434 Project: Hadoop Hive Issue Type: New Feature Reporter: Edward Capriolo Assignee: Edward Capriolo Add a cassandra storage handler. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1096) Hive Variables
[ https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1096: -- Status: Patch Available (was: Open) Hive Variables -- Key: HIVE-1096 URL: https://issues.apache.org/jira/browse/HIVE-1096 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.5.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: 1096-9.diff, hive-1096-10-patch.txt, hive-1096-2.diff, hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff From mailing list: "Amazon Elastic MapReduce version of Hive seems to have a nice feature called Variables. Basically you can define a variable via command-line while invoking hive with -d DT=2009-12-09 and then refer to the variable via ${DT} within the hive queries. This could be extremely useful. I can't seem to find this feature even on trunk. Is this feature currently anywhere in the roadmap?" This could be implemented in many places. A simple place to put this is in Driver.compile or Driver.run; we can do string substitutions at that level, and further downstream need not be affected. There could be some benefits to doing this further downstream (parser, plan), but based on the simple needs we may not need to overthink this. I will get started on implementing in compile unless someone wants to discuss this more. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1096) Hive Variables
[ https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1096: -- Attachment: hive-1096-11-patch.txt Was not interpolating system:vars. Fixed with a better test case. Hive Variables -- Key: HIVE-1096 URL: https://issues.apache.org/jira/browse/HIVE-1096 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0, 0.7.0 Attachments: 1096-9.diff, hive-1096-10-patch.txt, hive-1096-11-patch.txt, hive-1096-2.diff, hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff From mailing list: --Amazon Elastic MapReduce version of Hive seems to have a nice feature called Variables. Basically you can define a variable via command-line while invoking hive with -d DT=2009-12-09 and then refer to the variable via ${DT} within the hive queries. This could be extremely useful. I can't seem to find this feature even on trunk. Is this feature currently anywhere in the roadmap?-- This could be implemented in many places. A simple place to put this is in Driver.compile or Driver.run: we can do string substitutions at that level, and further downstream need not be affected. There could be some benefits to doing this further downstream (parser, plan), but based on the simple needs we may not need to overthink this. I will get started on implementing in compile unless someone wants to discuss this more. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1431) Hive CLI can't handle query files that begin with comments
[ https://issues.apache.org/jira/browse/HIVE-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882024#action_12882024 ] Edward Capriolo commented on HIVE-1431: --- We have a few tickets open; we really need to move all this stuff to a real parser so we can properly deal with things like ';', comments like this, and so on. It is painfully hard to work around all these types of things, and we never get to the root of the problem. Hive CLI can't handle query files that begin with comments -- Key: HIVE-1431 URL: https://issues.apache.org/jira/browse/HIVE-1431 Project: Hadoop Hive Issue Type: Bug Components: CLI Reporter: Carl Steinbach Fix For: 0.6.0, 0.7.0 {code} % cat test.q -- This is a comment, followed by a command set -v; -- -- Another comment -- show tables; -- Last comment (master) [ ~/Projects/hive ] % hive test.q Hive history file=/tmp/carl/hive_job_log_carl_201006231606_1140875653.txt hive -- This is a comment, followed by a command set -v; FAILED: Parse Error: line 2:0 cannot recognize input 'set' hive -- -- Another comment -- show tables; OK rawchunks Time taken: 5.334 seconds hive -- Last comment (master) [ ~/Projects/hive ] % {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
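Pending the real parser the comment above asks for, a stopgap for scripts that begin with comments would be dropping full-line `--` comments before the CLI splits input on ';'. A minimal Python sketch of that idea (not the CLI's actual code path, which is Java):

```python
def strip_comments(script: str) -> str:
    """Drop lines that are only a '--' comment. A real fix needs a
    parser that also understands ';' inside string literals and
    trailing comments, which this deliberately does not handle."""
    kept = []
    for line in script.splitlines():
        if line.lstrip().startswith("--"):
            continue  # whole-line comment: skip it
        kept.append(line)
    return "\n".join(kept)
```

Applied to the failing test.q above, the `set -v;` line would reach the parser without the preceding comment line, avoiding the "cannot recognize input 'set'" error.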
[jira] Commented: (HIVE-1419) Policy on deserialization errors
[ https://issues.apache.org/jira/browse/HIVE-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880908#action_12880908 ] Edward Capriolo commented on HIVE-1419: --- I am looking through this and trying to wrap my head around it. Offhand, do you know what happens in this situation? We have a table that we have added columns to over time: create table tab (a int, b int); Over time we have added more columns: alter table tab add columns (c int). This works fine for us, as selecting column c on older data returns null for that column. Will this behaviour be preserved? Policy on deserialization errors Key: HIVE-1419 URL: https://issues.apache.org/jira/browse/HIVE-1419 Project: Hadoop Hive Issue Type: Improvement Components: Serializers/Deserializers Affects Versions: 0.5.0 Reporter: Vladimir Klimontovich Assignee: Vladimir Klimontovich Priority: Minor Fix For: 0.5.1, 0.6.0 Attachments: corrupted_records_0.5.patch, corrupted_records_0.5_ver2.patch, corrupted_records_trunk.patch, corrupted_records_trunk_ver2.patch When the deserializer throws an exception, the whole map task fails (see MapOperator.java). This is not always a convenient behavior, especially on huge datasets where several corrupted lines can be a normal practice. Proposed solution: 1) Have a counter of corrupted records. 2) When the counter exceeds a limit (configurable via the hive.max.deserializer.errors property, 0 by default), throw an exception. Otherwise just log the exception at WARN level. Patches for the 0.5 branch and trunk are attached. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
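The proposed policy (count corrupted records, warn until a configurable limit, then fail the task) is simple to state as code. This is an illustration of the semantics in Python, not MapOperator's actual Java implementation; the class name is made up:

```python
class DeserializeErrorPolicy:
    """Tolerate up to max_errors corrupted records, then raise.
    Mirrors the proposed hive.max.deserializer.errors semantics:
    with the default of 0, the first bad record still fails the task,
    preserving today's behavior unless the user opts in."""

    def __init__(self, max_errors: int = 0):
        self.max_errors = max_errors
        self.errors = 0

    def on_error(self, record, exc):
        self.errors += 1
        if self.errors > self.max_errors:
            raise RuntimeError(
                f"too many corrupted records: {self.errors}") from exc
        print(f"WARN: skipping corrupted record {record!r}: {exc}")
```

Note this addresses records the deserializer rejects outright; the schema-evolution case in the comment above (old rows missing a newly added column) is handled by the SerDe returning null for missing trailing columns, not by this error path.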
[jira] Commented: (HIVE-1405) Implement a .hiverc startup file
[ https://issues.apache.org/jira/browse/HIVE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880207#action_12880207 ] Edward Capriolo commented on HIVE-1405: --- I was thinking we just look for hive_rc in the user's home directory and/or in hive_home/bin. If we find that file, we read it line by line and process it just like other hive commands. We could restrict this to just set or add commands, but there is no reason it could not have a full query. Implement a .hiverc startup file Key: HIVE-1405 URL: https://issues.apache.org/jira/browse/HIVE-1405 Project: Hadoop Hive Issue Type: New Feature Reporter: Jonathan Chang Assignee: John Sichi When deploying hive, it would be nice to have a .hiverc file containing statements that would be automatically run whenever hive is launched. This way, we can automatically add JARs, create temporary functions, set flags, etc. for all users quickly. This should ideally be set up like .bashrc and the like with a global version and a user-local version. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
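The mechanism described, a global rc file under HIVE_HOME/bin plus a user-local one in the home directory, each replayed line by line through the normal command processor, can be sketched as follows. This is a Python illustration under assumed names (`run_hiverc`, the `process_cmd` hook); the actual CLI is Java:

```python
import os

def run_hiverc(process_cmd, home=None, hive_home=None):
    """Source the global then the user-local init file, mimicking
    .bashrc ordering. Every non-empty, non-comment line goes through
    the normal command processor, so full queries work, not just
    'set' or 'add' commands."""
    candidates = []
    if hive_home:
        candidates.append(os.path.join(hive_home, "bin", ".hiverc"))
    if home:
        candidates.append(os.path.join(home, ".hiverc"))
    for path in candidates:
        if not os.path.isfile(path):
            continue  # missing rc files are simply skipped
        with open(path) as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("--"):
                    process_cmd(line)
```

Sourcing the global file first lets a user-local .hiverc override site-wide settings, the same layering .bashrc users expect.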
[jira] Commented: (HIVE-1135) Use Anakia for version controlled documentation
[ https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880303#action_12880303 ] Edward Capriolo commented on HIVE-1135: --- Great on ivy. As for the wiki, I think we should just put a note at the top of the pages we have migrated that says: Do not edit me. Edit xdocs instead. I want to do about a page every other day, so it should be done soon enough. I actually have commit access, but I usually leave the commits up to the experts. Also, since I worked on this ticket, I really should not be the commit person. Anyone else? Use Anakia for version controlled documentation --- Key: HIVE-1135 URL: https://issues.apache.org/jira/browse/HIVE-1135 Project: Hadoop Hive Issue Type: Task Components: Documentation Affects Versions: 0.5.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: hive-1135-3-patch.txt, hive-1135-4-patch.txt, hive-1135-5-patch.txt, hive-1135-6-patch.txt, hive-1335-1.patch.txt, hive-1335-2.patch.txt, jdom-1.1.jar, jdom-1.1.LICENSE, wtf.png Currently the Hive Language Manual and many other critical pieces of documentation are on the Hive wiki. Right now we count on the author of a patch to follow up and add wiki entries. While we do a decent job with this, new features can be missed, and users running older/newer branches cannot locate relevant documentation for their branch. ..example of a perception I do not think we want to give off... http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy We should generate our documentation the way hadoop hbase does: inline, using forest. I would like to take the lead on this, but we need a lot of consensus on doing this properly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1414) automatically invoke .hiverc init script
[ https://issues.apache.org/jira/browse/HIVE-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1414: -- Attachment: hive-1414-patch-1.txt First attempt at patch. automatically invoke .hiverc init script Key: HIVE-1414 URL: https://issues.apache.org/jira/browse/HIVE-1414 Project: Hadoop Hive Issue Type: Improvement Components: Clients Affects Versions: 0.5.0 Reporter: John Sichi Assignee: Edward Capriolo Attachments: hive-1414-patch-1.txt Similar to .bashrc but run Hive SQL commands. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1414) automatically invoke .hiverc init script
[ https://issues.apache.org/jira/browse/HIVE-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880414#action_12880414 ] Edward Capriolo commented on HIVE-1414: --- Files automatically sourced by ql: env[HIVE_HOME]/bin/.hiverc, property(user.home)/.hiverc. I think only the CLI needs these features. Users of the hive service are accessing the session through code, so repetition is not a problem; the same is true with JDBC. CLI users get the most benefit from the .hiverc. What do you think? automatically invoke .hiverc init script Key: HIVE-1414 URL: https://issues.apache.org/jira/browse/HIVE-1414 Project: Hadoop Hive Issue Type: Improvement Components: Clients Affects Versions: 0.5.0 Reporter: John Sichi Assignee: Edward Capriolo Attachments: hive-1414-patch-1.txt Similar to .bashrc but run Hive SQL commands. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1405) Implement a .hiverc startup file
[ https://issues.apache.org/jira/browse/HIVE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880028#action_12880028 ] Edward Capriolo commented on HIVE-1405: --- I like Carl's approach. The entire point of the hiverc is not to have to invoke anything explicit to add jars. Implement a .hiverc startup file Key: HIVE-1405 URL: https://issues.apache.org/jira/browse/HIVE-1405 Project: Hadoop Hive Issue Type: New Feature Reporter: Jonathan Chang Assignee: John Sichi When deploying hive, it would be nice to have a .hiverc file containing statements that would be automatically run whenever hive is launched. This way, we can automatically add JARs, create temporary functions, set flags, etc. for all users quickly. This should ideally be set up like .bashrc and the like with a global version and a user-local version. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1405) Implement a .hiverc startup file
[ https://issues.apache.org/jira/browse/HIVE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880053#action_12880053 ] Edward Capriolo commented on HIVE-1405: --- {noformat} [edw...@ec dist]$ echo show tables > a.sql [edw...@ec dist]$ bin/hive [edw...@ec dist]$ chmod a+x a.sql [edw...@ec dist]$ bin/hive Hive history file=/tmp/edward/hive_job_log_edward_201006172223_1189860304.txt [edw...@ec dist]$ pwd /mnt/data/hive/hive/build/dist [edw...@ec dist]$ bin/hive Hive history file=/tmp/edward/hive_job_log_edward_201006172223_310534855.txt hive ! /mnt/data/hive/hive/build/dist/a.sql; /mnt/data/hive/hive/build/dist/a.sql: line 1: show: command not found Command failed with exit code = 127 {noformat} ! seems to execute bash commands. Don't we want to execute hive commands inside hive, like add jar? Implement a .hiverc startup file Key: HIVE-1405 URL: https://issues.apache.org/jira/browse/HIVE-1405 Project: Hadoop Hive Issue Type: New Feature Reporter: Jonathan Chang Assignee: John Sichi When deploying hive, it would be nice to have a .hiverc file containing statements that would be automatically run whenever hive is launched. This way, we can automatically add JARs, create temporary functions, set flags, etc. for all users quickly. This should ideally be set up like .bashrc and the like with a global version and a user-local version. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1135) Use Anakia for version controlled documentation
[ https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1135: -- Attachment: wtf.png Cool on adding the logo. But something went wrong here: unless I applied the patch wrong, the left table looks wrong now. Check the screenshot. Use Anakia for version controlled documentation --- Key: HIVE-1135 URL: https://issues.apache.org/jira/browse/HIVE-1135 Project: Hadoop Hive Issue Type: Task Components: Documentation Affects Versions: 0.5.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: hive-1135-3-patch.txt, hive-1135-4-patch.txt, hive-1335-1.patch.txt, hive-1335-2.patch.txt, jdom-1.1.jar, jdom-1.1.LICENSE, wtf.png Currently the Hive Language Manual and many other critical pieces of documentation are on the Hive wiki. Right now we count on the author of a patch to follow up and add wiki entries. While we do a decent job with this, new features can be missed, and users running older/newer branches cannot locate relevant documentation for their branch. ..example of a perception I do not think we want to give off... http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy We should generate our documentation the way hadoop hbase does: inline, using forest. I would like to take the lead on this, but we need a lot of consensus on doing this properly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1135) Move hive language manual and tutorial to version control
[ https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1135: -- Attachment: hive-1135-3-patch.txt Fixed all items. Move hive language manual and tutorial to version control - Key: HIVE-1135 URL: https://issues.apache.org/jira/browse/HIVE-1135 Project: Hadoop Hive Issue Type: Task Components: Documentation Affects Versions: 0.5.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: hive-1135-3-patch.txt, hive-1335-1.patch.txt, hive-1335-2.patch.txt, jdom-1.1.jar Currently the Hive Language Manual and many other critical pieces of documentation are on the Hive wiki. Right now we count on the author of a patch to follow up and add wiki entries. While we do a decent job with this, new features can be missed. Or using running older/newer branches can not locate relevant documentation for their branch. ..example of a perception I do not think we want to give off... http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy We should generate our documentation in the way hadoop hbase does, inline using forest. I would like to take the lead on this, but we need a lot of consensus on doing this properly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1135) Move hive language manual and tutorial to version control
[ https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1135: -- Attachment: jdom-1.1.LICENSE Move hive language manual and tutorial to version control - Key: HIVE-1135 URL: https://issues.apache.org/jira/browse/HIVE-1135 Project: Hadoop Hive Issue Type: Task Components: Documentation Affects Versions: 0.5.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: hive-1135-3-patch.txt, hive-1335-1.patch.txt, hive-1335-2.patch.txt, jdom-1.1.jar, jdom-1.1.LICENSE Currently the Hive Language Manual and many other critical pieces of documentation are on the Hive wiki. Right now we count on the author of a patch to follow up and add wiki entries. While we do a decent job with this, new features can be missed. Or using running older/newer branches can not locate relevant documentation for their branch. ..example of a perception I do not think we want to give off... http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy We should generate our documentation in the way hadoop hbase does, inline using forest. I would like to take the lead on this, but we need a lot of consensus on doing this properly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1135) Move hive language manual and all wiki based documentation to forest
[ https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1135: -- Attachment: hive-1335-2.patch.txt Edited the build.xml a bit to deal with some incorrect paths. Also added the HiveDataDefinitionStatements wiki page for reference. Is everyone ok with using anakia and this structure. I would like to get this cleaned up with the current docs in place. Then I will do some follow up tickets and add some wiki pages, let me know if everyone is happy with the overall xdocs-docs process. Move hive language manual and all wiki based documentation to forest Key: HIVE-1135 URL: https://issues.apache.org/jira/browse/HIVE-1135 Project: Hadoop Hive Issue Type: Task Components: Documentation Affects Versions: 0.5.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: hive-1335-1.patch.txt, hive-1335-2.patch.txt, jdom-1.1.jar Currently the Hive Language Manual and many other critical pieces of documentation are on the Hive wiki. Right now we count on the author of a patch to follow up and add wiki entries. While we do a decent job with this, new features can be missed. Or using running older/newer branches can not locate relevant documentation for their branch. ..example of a perception I do not think we want to give off... http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy We should generate our documentation in the way hadoop hbase does, inline using forest. I would like to take the lead on this, but we need a lot of consensus on doing this properly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1135) Move hive language manual and all wiki based documentation to forest
[ https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878816#action_12878816 ] Edward Capriolo commented on HIVE-1135: --- FMI: what is the hbase review board? "Stash this stuff under docs/ instead of creating another top level directory (xdocs/)": it seems like xdocs is the convention; I also did not want to step on whatever is in docs. I will see if they both can live in docs happily. JDOM ivy: right now most/all the stuff in /lib comes from ivy. We should open another ticket and convert the entire project to ivy. velocity.log: yes, my local version (the next patch already fixes that). * Limit the initial import to the contents of the Hive Language Manual: I think some things should actually stay on the wiki, but the language manual is definitely one of those things that we want to have in VCS. I agree the initial import should come from the Hive Language Manual only. To me a wiki just screams: I did not have time to write a full, complete doc. Generalization coming: 99% of the things in the wiki should be in xdocs. Users only want one place for authoritative information. Wikis and xdoc will fall out of sync, and confusion follows. Move hive language manual and all wiki based documentation to forest Key: HIVE-1135 URL: https://issues.apache.org/jira/browse/HIVE-1135 Project: Hadoop Hive Issue Type: Task Components: Documentation Affects Versions: 0.5.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: hive-1335-1.patch.txt, hive-1335-2.patch.txt, jdom-1.1.jar Currently the Hive Language Manual and many other critical pieces of documentation are on the Hive wiki. Right now we count on the author of a patch to follow up and add wiki entries. While we do a decent job with this, new features can be missed, and users running older/newer branches cannot locate relevant documentation for their branch. ..example of a perception I do not think we want to give off... 
http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy We should generate our documentation in the way hadoop hbase does, inline using forest. I would like to take the lead on this, but we need a lot of consensus on doing this properly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1135) Move hive language manual and all wiki based documentation to forest
[ https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878818#action_12878818 ] Edward Capriolo commented on HIVE-1135: --- Correction on JDOM ivy: right now most/all the stuff in /lib DOES NOT come from ivy. We should open another ticket and convert the entire project to ivy. Move hive language manual and all wiki based documentation to forest Key: HIVE-1135 URL: https://issues.apache.org/jira/browse/HIVE-1135 Project: Hadoop Hive Issue Type: Task Components: Documentation Affects Versions: 0.5.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: hive-1335-1.patch.txt, hive-1335-2.patch.txt, jdom-1.1.jar Currently the Hive Language Manual and many other critical pieces of documentation are on the Hive wiki. Right now we count on the author of a patch to follow up and add wiki entries. While we do a decent job with this, new features can be missed, and users running older/newer branches cannot locate relevant documentation for their branch. ..example of a perception I do not think we want to give off... http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy We should generate our documentation the way hadoop hbase does: inline, using forest. I would like to take the lead on this, but we need a lot of consensus on doing this properly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1135) Move hive language manual and all wiki based documentation to forest
[ https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1135: -- Attachment: jdom-1.1.jar Move hive language manual and all wiki based documentation to forest Key: HIVE-1135 URL: https://issues.apache.org/jira/browse/HIVE-1135 Project: Hadoop Hive Issue Type: Task Components: Documentation Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: jdom-1.1.jar Currently the Hive Language Manual and many other critical pieces of documentation are on the Hive wiki. Right now we count on the author of a patch to follow up and add wiki entries. While we do a decent job with this, new features can be missed. Or using running older/newer branches can not locate relevant documentation for their branch. ..example of a perception I do not think we want to give off... http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy We should generate our documentation in the way hadoop hbase does, inline using forest. I would like to take the lead on this, but we need a lot of consensus on doing this properly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1135) Move hive language manual and all wiki based documentation to forest
[ https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1135: -- Attachment: hive-1335-1.patch.txt Patch: the docs command runs anakia from the xdocs directory. Move hive language manual and all wiki based documentation to forest Key: HIVE-1135 URL: https://issues.apache.org/jira/browse/HIVE-1135 Project: Hadoop Hive Issue Type: Task Components: Documentation Affects Versions: 0.5.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: hive-1335-1.patch.txt, jdom-1.1.jar Currently the Hive Language Manual and many other critical pieces of documentation are on the Hive wiki. Right now we count on the author of a patch to follow up and add wiki entries. While we do a decent job with this, new features can be missed, and users running older/newer branches cannot locate relevant documentation for their branch. ..example of a perception I do not think we want to give off... http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy We should generate our documentation the way hadoop hbase does: inline, using forest. I would like to take the lead on this, but we need a lot of consensus on doing this properly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1135) Move hive language manual and all wiki based documentation to forest
[ https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1135: -- Fix Version/s: 0.6.0 Affects Version/s: 0.5.0 Move hive language manual and all wiki based documentation to forest Key: HIVE-1135 URL: https://issues.apache.org/jira/browse/HIVE-1135 Project: Hadoop Hive Issue Type: Task Components: Documentation Affects Versions: 0.5.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: hive-1335-1.patch.txt, jdom-1.1.jar Currently the Hive Language Manual and many other critical pieces of documentation are on the Hive wiki. Right now we count on the author of a patch to follow up and add wiki entries. While we do a decent job with this, new features can be missed. Or using running older/newer branches can not locate relevant documentation for their branch. ..example of a perception I do not think we want to give off... http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy We should generate our documentation in the way hadoop hbase does, inline using forest. I would like to take the lead on this, but we need a lot of consensus on doing this properly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1401) Web Interface can only browse default
Web Interface can only browse default Key: HIVE-1401 URL: https://issues.apache.org/jira/browse/HIVE-1401 Project: Hadoop Hive Issue Type: New Feature Components: Web UI Affects Versions: 0.5.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1401) Web Interface can only browse default
[ https://issues.apache.org/jira/browse/HIVE-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1401: -- Attachment: HIVE-1401-1-patch.txt Web Interface can only browse default Key: HIVE-1401 URL: https://issues.apache.org/jira/browse/HIVE-1401 Project: Hadoop Hive Issue Type: New Feature Components: Web UI Affects Versions: 0.5.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: HIVE-1401-1-patch.txt -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1397) histogram() UDAF for a numerical column
[ https://issues.apache.org/jira/browse/HIVE-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12877220#action_12877220 ] Edward Capriolo commented on HIVE-1397: --- Looks great. Can not wait. histogram() UDAF for a numerical column --- Key: HIVE-1397 URL: https://issues.apache.org/jira/browse/HIVE-1397 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.6.0 Reporter: Mayank Lahiri Assignee: Mayank Lahiri Fix For: 0.6.0 A histogram() UDAF to generate an approximate histogram of a numerical (byte, short, double, long, etc.) column. The result is returned as a map of (x,y) histogram pairs, and can be plotted in Gnuplot using impulses (for example). The algorithm is currently adapted from A streaming parallel decision tree algorithm by Ben-Haim and Tom-Tov, JMLR 11 (2010), and uses space proportional to the number of histogram bins specified. It has no approximation guarantees, but seems to work well when there is a lot of data and a large number (e.g. 50-100) of histogram bins specified. A typical call might be: SELECT histogram(val, 10) FROM some_table; where the result would be a histogram with 10 bins, returned as a Hive map object. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
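The Ben-Haim/Tom-Tov sketch behind histogram() keeps at most B (center, count) bins and, when an insert overflows that budget, merges the two bins with the closest centers into their weighted mean. A minimal Python illustration of that merge rule (not the UDAF's actual Java code; the function name is made up):

```python
import bisect

def add_point(bins, x, max_bins):
    """bins: sorted list of [center, count] pairs. Insert x, then if we
    exceed max_bins, merge the adjacent pair with the smallest gap into
    its count-weighted mean (the Ben-Haim/Tom-Tov update rule)."""
    centers = [b[0] for b in bins]
    i = bisect.bisect_left(centers, x)
    if i < len(bins) and bins[i][0] == x:
        bins[i][1] += 1          # exact center already present
    else:
        bins.insert(i, [x, 1])   # new singleton bin
    if len(bins) > max_bins:
        # adjacent pair with the smallest center gap
        j = min(range(len(bins) - 1),
                key=lambda k: bins[k + 1][0] - bins[k][0])
        (c1, n1), (c2, n2) = bins[j], bins[j + 1]
        merged = [(c1 * n1 + c2 * n2) / (n1 + n2), n1 + n2]
        bins[j:j + 2] = [merged]
    return bins
```

Space stays proportional to the bin count, which is why the issue notes it works best with lots of data and a generous number of bins (50 to 100) despite having no approximation guarantees.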
[jira] Commented: (HIVE-1373) Missing connection pool plugin in Eclipse classpath
[ https://issues.apache.org/jira/browse/HIVE-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12876250#action_12876250 ] Edward Capriolo commented on HIVE-1373: --- {quote} 1 copy is anyway done from lib to dist/lib for these jars. If we go directly to ivy we would copy things from the ivy cache to dist/lib. So the number of copies in the build process would remain the same, no? There is of course the first time overhead of downloading these jars from their repos to the ivy cache. {quote} I follow what you are thinking. Currently the code I did takes specific jars from the metastore ivy downloads. We could probably have ivy download directly to build/lib. I just think we should watch to make sure many unneeded jars do not appear. Missing connection pool plugin in Eclipse classpath --- Key: HIVE-1373 URL: https://issues.apache.org/jira/browse/HIVE-1373 Project: Hadoop Hive Issue Type: Bug Components: Build Infrastructure Environment: Eclipse, Linux Reporter: Vinithra Varadharajan Assignee: Vinithra Varadharajan Attachments: HIVE-1373.patch In a recent checkin, a connection pool dependency was introduced but the Eclipse .classpath file was not updated. This causes launch configurations from within Eclipse to fail. 
{code}
hive> show tables;
show tables;
10/05/26 14:59:46 INFO parse.ParseDriver: Parsing command: show tables
10/05/26 14:59:46 INFO parse.ParseDriver: Parse Completed
10/05/26 14:59:46 INFO ql.Driver: Semantic Analysis Completed
10/05/26 14:59:46 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
10/05/26 14:59:46 INFO ql.Driver: query plan = file:/tmp/vinithra/hive_2010-05-26_14-59-46_058_1636674338194744357/queryplan.xml
10/05/26 14:59:46 INFO ql.Driver: Starting command: show tables
10/05/26 14:59:46 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
10/05/26 14:59:46 INFO metastore.ObjectStore: ObjectStore, initialize called
FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables:
java.lang.reflect.InvocationTargetException
10/05/26 14:59:47 ERROR exec.DDLTask: FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables:
java.lang.reflect.InvocationTargetException
org.apache.hadoop.hive.ql.metadata.HiveException: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables:
java.lang.reflect.InvocationTargetException
	at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:491)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:472)
	at org.apache.hadoop.hive.ql.metadata.Hive.getAllTables(Hive.java:458)
	at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:504)
	at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:176)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:631)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:504)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:382)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303)
Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables:
java.lang.reflect.InvocationTargetException
	at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:395)
	at org.datanucleus.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:547)
	at org.datanucleus.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:175)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at javax.jdo.JDOHelper$16.run(JDOHelper.java:1956)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.jdo.JDOHelper.invoke(JDOHelper.java:1951)
	at
[jira] Commented: (HIVE-1369) LazySimpleSerDe should be able to read classes that support some form of toString()
[ https://issues.apache.org/jira/browse/HIVE-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12875909#action_12875909 ] Edward Capriolo commented on HIVE-1369: --- This seems interesting. In some cases toString() output can change between Java versions for some objects. Do we need to compensate for that? LazySimpleSerDe should be able to read classes that support some form of toString() --- Key: HIVE-1369 URL: https://issues.apache.org/jira/browse/HIVE-1369 Project: Hadoop Hive Issue Type: Improvement Reporter: Alex Kozlov Assignee: Alex Kozlov Priority: Minor Attachments: HIVE-1369.patch Original Estimate: 2h Remaining Estimate: 2h Currently LazySimpleSerDe is able to deserialize only BytesWritable or Text objects. It should be pretty easy to extend the class to read any object that implements the toString() method. Ideas or concerns? Alex K -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
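The toString() fallback proposed above can be sketched in a few lines. This is an illustration only, not LazySimpleSerDe code; the helper name asText is hypothetical. The idea is that any field object can be reduced to the text form the SerDe already parses, using Hive's default null marker for nulls.

```java
public class ToStringFallback {
    // Hypothetical helper: reduce any field object to its text form.
    static String asText(Object field) {
        if (field == null) {
            return "\\N";            // Hive's default null marker
        }
        return field.toString();     // works for Text, numbers, anything
    }

    public static void main(String[] args) {
        System.out.println(asText(42));                      // 42
        System.out.println(asText(new StringBuilder("abc"))); // abc
        System.out.println(asText(null));                     // \N
    }
}
```

As the comment above notes, toString() output is only as stable as the JVM and the classes involved, so a real implementation would need to pin down the expected formats.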
[jira] Commented: (HIVE-1265) Function Registry should auto-detect UDFs from UDF Description
[ https://issues.apache.org/jira/browse/HIVE-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872996#action_12872996 ] Edward Capriolo commented on HIVE-1265: --- {noformat}
public static List<Class> getClassesForPackage(String packageName, Class classType) {
+  List<Class> matchingClasses = new ArrayList<Class>();
+  File directory = null;
+  System.out.println(packageName.replace('.', File.separatorChar));
+  URL u = Thread.currentThread().getContextClassLoader()
+  //URL u = new Object().getClass().c
+  .getResource(packageName.replace('.', File.separatorChar));
{noformat} It seems like this section of code only picks up classes in ql/test/org.apache.hadoop.hive.ql.udf. This must have something to do with classloaders/threads and getResource(). It seems like getResource is unaware that two folders could be responsible for the same resource. Or I have to find a better way to do this. Function Registry should auto-detect UDFs from UDF Description -- Key: HIVE-1265 URL: https://issues.apache.org/jira/browse/HIVE-1265 Project: Hadoop Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: hive-1265-patch.diff We should be able to register functions dynamically. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
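One likely explanation for the behavior described above: ClassLoader.getResource() returns only the first classpath entry that contains the package, while getResources() (plural) enumerates all of them, which matters when two folders (e.g. main and test build output) provide the same package. Note also that resource names always use '/', so File.separatorChar is unsafe on Windows. A minimal sketch, with the hypothetical helper name allRoots:

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

public class PackageScan {
    // Hypothetical helper: collect every classpath root that contains the
    // package, instead of only the first one getResource() would return.
    static List<URL> allRoots(String packageName) throws Exception {
        // Resource names always use '/', not File.separatorChar.
        String path = packageName.replace('.', '/');
        List<URL> roots = new ArrayList<URL>();
        Enumeration<URL> urls =
            Thread.currentThread().getContextClassLoader().getResources(path);
        while (urls.hasMoreElements()) {
            roots.add(urls.nextElement());
        }
        return roots;
    }

    public static void main(String[] args) throws Exception {
        // META-INF typically resolves to several jars on a real classpath.
        for (URL root : allRoots("META-INF")) {
            System.out.println(root);
        }
    }
}
```

Each returned URL can then be walked for .class files, so classes duplicated across build/classes and build/test/classes are all discovered.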
[jira] Commented: (HIVE-802) Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it
[ https://issues.apache.org/jira/browse/HIVE-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872244#action_12872244 ] Edward Capriolo commented on HIVE-802: -- I just did a patch that adds connection pooling to DataNucleus. (Sorry that I jumped ahead of you.) It should be easy to update now: just bump the versions in metastore/ivy.xml. Please make sure the version you pick works with the connection pooling libs, as ivy fetches versions and dependencies that do not work well together. Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it - Key: HIVE-802 URL: https://issues.apache.org/jira/browse/HIVE-802 Project: Hadoop Hive Issue Type: Bug Components: Build Infrastructure Reporter: Todd Lipcon Assignee: Arvind Prabhakar There's a bug in DataNucleus that causes this issue: http://www.jpox.org/servlet/jira/browse/NUCCORE-371 To reproduce, simply put your hive source tree in a directory that contains a '+' character. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1373) Missing connection pool plugin in Eclipse classpath
[ https://issues.apache.org/jira/browse/HIVE-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872337#action_12872337 ] Edward Capriolo commented on HIVE-1373: --- I was thinking to move everything that came from ivy to build/lib. I see the benefit, but I saw this technique adding more copies and moves into the ant process. I tried different approaches and found none of them was better than the next. All involved doing more work here and less there, or changing this classpath instead of putting a file into X folder. I was kinda confused on the best way to handle that. I would be interested to see what you come up with. Missing connection pool plugin in Eclipse classpath --- Key: HIVE-1373 URL: https://issues.apache.org/jira/browse/HIVE-1373 Project: Hadoop Hive Issue Type: Bug Components: Build Infrastructure Environment: Eclipse, Linux Reporter: Vinithra Varadharajan Assignee: Vinithra Varadharajan Priority: Minor Attachments: HIVE-1373.patch In a recent checkin, a connection pool dependency was introduced but the Eclipse .classpath file was not updated. This causes launch configurations from within Eclipse to fail. 
{code}
hive> show tables;
show tables;
10/05/26 14:59:46 INFO parse.ParseDriver: Parsing command: show tables
10/05/26 14:59:46 INFO parse.ParseDriver: Parse Completed
10/05/26 14:59:46 INFO ql.Driver: Semantic Analysis Completed
10/05/26 14:59:46 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
10/05/26 14:59:46 INFO ql.Driver: query plan = file:/tmp/vinithra/hive_2010-05-26_14-59-46_058_1636674338194744357/queryplan.xml
10/05/26 14:59:46 INFO ql.Driver: Starting command: show tables
10/05/26 14:59:46 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
10/05/26 14:59:46 INFO metastore.ObjectStore: ObjectStore, initialize called
FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables:
java.lang.reflect.InvocationTargetException
10/05/26 14:59:47 ERROR exec.DDLTask: FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables:
java.lang.reflect.InvocationTargetException
org.apache.hadoop.hive.ql.metadata.HiveException: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables:
java.lang.reflect.InvocationTargetException
	at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:491)
	at org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:472)
	at org.apache.hadoop.hive.ql.metadata.Hive.getAllTables(Hive.java:458)
	at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:504)
	at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:176)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:631)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:504)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:382)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303)
Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory
NestedThrowables:
java.lang.reflect.InvocationTargetException
	at org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:395)
	at org.datanucleus.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:547)
	at org.datanucleus.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:175)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at javax.jdo.JDOHelper$16.run(JDOHelper.java:1956)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.jdo.JDOHelper.invoke(JDOHelper.java:1951)
	at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159)
	at
[jira] Commented: (HIVE-1335) DataNucleus should use connection pooling
[ https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12870477#action_12870477 ] Edward Capriolo commented on HIVE-1335: --- Can we go +1? DataNucleus should use connection pooling - Key: HIVE-1335 URL: https://issues.apache.org/jira/browse/HIVE-1335 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.5.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: commons-dbcp-1.2.2.jar, commons-dbcp.LICENSE, commons-pool-1.2.jar, commons-pool.LICENSE, datanucleus-connectionpool-1.0.2.jar, datanucleus-connectionpool.LICENSE, hive-1335-1.patch.txt, hive-1335-2.patch.txt, hive-1335-3.patch.txt, hive-1335.patch.txt Currently each Data Nucleus operation disconnects and reconnects to the MetaStore over jdbc. Queries fail to even explain properly in cases where a table has many partitions. This is fixed by enabling one parameter and including several jars. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-471) A UDF for simple reflection
[ https://issues.apache.org/jira/browse/HIVE-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-471: - Fix Version/s: 0.6.0 Affects Version/s: 0.5.1 (was: 0.6.0) Should be good for trunk. A UDF for simple reflection --- Key: HIVE-471 URL: https://issues.apache.org/jira/browse/HIVE-471 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Affects Versions: 0.5.1 Reporter: Edward Capriolo Assignee: Edward Capriolo Priority: Minor Fix For: 0.6.0 Attachments: hive-471-gen.diff, HIVE-471.1.patch, HIVE-471.2.patch, HIVE-471.3.patch, hive-471.diff There are many methods in java that are static and have no arguments or can be invoked with one simple parameter. More complicated functions will require a UDF but one generic one can work as a poor man's UDF. {noformat} SELECT reflect("java.lang.String", "valueOf", 1), reflect("java.lang.String", "isEmpty") FROM src LIMIT 1; {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
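The core of such a generic reflect UDF is a couple of lines of java.lang.reflect. This is a hedged sketch of the idea, not the actual HIVE-471 patch; the helper name and the primitive-type handling are assumptions:

```java
import java.lang.reflect.Method;

public class SimpleReflect {
    // Hypothetical sketch of what a generic "reflect" UDF does internally:
    // look up a static method by name and invoke it with the given argument.
    static Object reflect(String className, String methodName, Object arg)
            throws Exception {
        Class<?> cls = Class.forName(className);
        // Match the primitive parameter type for the common boxed-int case.
        Class<?> argType = (arg instanceof Integer) ? int.class : arg.getClass();
        Method m = cls.getMethod(methodName, argType);
        return m.invoke(null, arg);   // null receiver => static method
    }

    public static void main(String[] args) throws Exception {
        // Mirrors the query example: String.valueOf(1)
        System.out.println(reflect("java.lang.String", "valueOf", 1)); // 1
    }
}
```

A full UDF would also need overload resolution across all the primitive wrappers, which is where most of the real complexity lives.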
[jira] Commented: (HIVE-1096) Hive Variables
[ https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12870480#action_12870480 ] Edward Capriolo commented on HIVE-1096: --- I am back on this one. Keep your eye out for the next patch. Hive Variables -- Key: HIVE-1096 URL: https://issues.apache.org/jira/browse/HIVE-1096 Project: Hadoop Hive Issue Type: New Feature Reporter: Edward Capriolo Assignee: Edward Capriolo Attachments: 1096-9.diff, hive-1096-2.diff, hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff From mailing list: "Amazon Elastic MapReduce version of Hive seems to have a nice feature called Variables. Basically you can define a variable via command-line while invoking hive with -d DT=2009-12-09 and then refer to the variable via ${DT} within the hive queries. This could be extremely useful. I can't seem to find this feature even on trunk. Is this feature currently anywhere in the roadmap?" This could be implemented in many places. A simple place to put this is in Driver.compile or Driver.run: we can do string substitutions at that level, and further downstream need not be affected. There could be some benefits to doing this further downstream (parser, plan), but based on the simple needs we may not need to overthink this. I will get started on implementing in compile unless someone wants to discuss this more. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
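A minimal sketch of the Driver-level substitution described above, assuming the -d options have already been collected into a map. The class and method names here are hypothetical, not the patch's actual code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VarSubst {
    private static final Pattern VAR = Pattern.compile("\\$\\{(\\w+)\\}");

    // Replace each ${NAME} with its value before the query reaches the
    // parser; downstream components never see the variables.
    static String substitute(String query, Map<String, String> vars) {
        Matcher m = VAR.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String val = vars.get(m.group(1));
            // Leave unknown variables untouched rather than failing.
            m.appendReplacement(out,
                Matcher.quoteReplacement(val != null ? val : m.group(0)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<>();
        vars.put("DT", "2009-12-09");
        // prints: SELECT * FROM logs WHERE ds = '2009-12-09'
        System.out.println(substitute(
            "SELECT * FROM logs WHERE ds = '${DT}'", vars));
    }
}
```

Doing this in Driver.compile keeps the change in one place, at the cost of substitution happening even inside string literals, which a parser-level approach could avoid.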
[jira] Commented: (HIVE-1351) Tool to cat rcfiles
[ https://issues.apache.org/jira/browse/HIVE-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869284#action_12869284 ] Edward Capriolo commented on HIVE-1351: --- As Ning mentioned, why move the cli code? If anything, more of the code should be moving up into the main script rather than into smaller scripts. I see people making changes to only the cli. We have to make sure that fixes for things like cygwin get propagated to all files, or shared code gets shared. Also, rcfilecat is just a debug util, but it should have a unit test, right? Just cat two files to make sure it works? Tool to cat rcfiles --- Key: HIVE-1351 URL: https://issues.apache.org/jira/browse/HIVE-1351 Project: Hadoop Hive Issue Type: New Feature Reporter: Namit Jain Assignee: He Yongqiang Attachments: hive.1351.1.patch, hive.1351.2.patch It will be useful for debugging -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1351) Tool to cat rcfiles
[ https://issues.apache.org/jira/browse/HIVE-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869344#action_12869344 ] Edward Capriolo commented on HIVE-1351: --- This is so nitpicky, but {noformat}
+--rcfilecat)
+  SERVICE=rcfilecat
+  shift
+  ;;
{noformat} I do not think we should do this. We are just giving alternate invocations that end up being more confusing. Why should you be able to do this: {noformat} hive --rcfilecat {noformat} but not {noformat} hive --hwi {noformat} ? As for execHiveCmd: if you want to share this, why not move it up into bin/hive? We do not need to add a file to shared when subs specified in bin/hive are already shared. Tool to cat rcfiles --- Key: HIVE-1351 URL: https://issues.apache.org/jira/browse/HIVE-1351 Project: Hadoop Hive Issue Type: New Feature Reporter: Namit Jain Assignee: He Yongqiang Fix For: 0.6.0 Attachments: hive.1351.1.patch, hive.1351.2.patch It will be useful for debugging -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1335) DataNucleus should use connection pooling
[ https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1335: -- Attachment: hive-1335-3.patch.txt DataNucleus should use connection pooling - Key: HIVE-1335 URL: https://issues.apache.org/jira/browse/HIVE-1335 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.5.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: commons-dbcp-1.2.2.jar, commons-dbcp.LICENSE, commons-pool-1.2.jar, commons-pool.LICENSE, datanucleus-connectionpool-1.0.2.jar, datanucleus-connectionpool.LICENSE, hive-1335-1.patch.txt, hive-1335-2.patch.txt, hive-1335-3.patch.txt, hive-1335.patch.txt Currently each Data Nucleus operation disconnects and reconnects to the MetaStore over jdbc. Queries fail to even explain properly in cases where a table has many partitions. This is fixed by enabling one parameter and including several jars. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1335) DataNucleus should use connection pooling
[ https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1335: -- Status: Patch Available (was: Open) Affects Version/s: 0.5.0 DataNucleus should use connection pooling - Key: HIVE-1335 URL: https://issues.apache.org/jira/browse/HIVE-1335 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.5.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: commons-dbcp-1.2.2.jar, commons-dbcp.LICENSE, commons-pool-1.2.jar, commons-pool.LICENSE, datanucleus-connectionpool-1.0.2.jar, datanucleus-connectionpool.LICENSE, hive-1335-1.patch.txt, hive-1335.patch.txt Currently each Data Nucleus operation disconnects and reconnects to the MetaStore over jdbc. Queries fail to even explain properly in cases where a table has many partitions. This is fixed by enabling one parameter and including several jars. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1335) DataNucleus should use connection pooling
[ https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1335: -- Attachment: hive-1335-1.patch.txt DataNucleus should use connection pooling - Key: HIVE-1335 URL: https://issues.apache.org/jira/browse/HIVE-1335 Project: Hadoop Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: commons-dbcp-1.2.2.jar, commons-dbcp.LICENSE, commons-pool-1.2.jar, commons-pool.LICENSE, datanucleus-connectionpool-1.0.2.jar, datanucleus-connectionpool.LICENSE, hive-1335-1.patch.txt, hive-1335.patch.txt Currently each Data Nucleus operation disconnects and reconnects to the MetaStore over jdbc. Queries fail to even explain properly in cases where a table has many partitions. This is fixed by enabling one parameter and including several jars. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1335) DataNucleus should use connection pooling
[ https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12867538#action_12867538 ] Edward Capriolo commented on HIVE-1335: --- Just a strange side-note: why is the classpath specified in both build.xml and build-common.xml? Do we need it defined in both places? DataNucleus should use connection pooling - Key: HIVE-1335 URL: https://issues.apache.org/jira/browse/HIVE-1335 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.5.0 Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: commons-dbcp-1.2.2.jar, commons-dbcp.LICENSE, commons-pool-1.2.jar, commons-pool.LICENSE, datanucleus-connectionpool-1.0.2.jar, datanucleus-connectionpool.LICENSE, hive-1335-1.patch.txt, hive-1335.patch.txt Currently each Data Nucleus operation disconnects and reconnects to the MetaStore over jdbc. Queries fail to even explain properly in cases where a table has many partitions. This is fixed by enabling one parameter and including several jars. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1335) DataNucleus should use connection pooling
[ https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864564#action_12864564 ] Edward Capriolo commented on HIVE-1335: --- {noformat}
[ivy:resolve] :: Ivy 2.1.0 - 20090925235825 :: http://ant.apache.org/ivy/ ::
[ivy:resolve] :: loading settings :: file = /mnt/data/hive/hive/ivy/ivysettings.xml
[ivy:resolve]
[ivy:resolve] :: problems summary ::
[ivy:resolve] WARNINGS
[ivy:resolve] module not found: proxool#proxool;0.9.0RC3
[ivy:resolve] hadoop-source: tried
[ivy:resolve] -- artifact proxool#proxool;0.9.0RC3!proxool.jar:
[ivy:resolve] http://mirror.facebook.net/facebook/hive-deps/hadoop/core/proxool-0.9.0RC3/proxool-0.9.0RC3.jar
[ivy:resolve] apache-snapshot: tried
[ivy:resolve] https://repository.apache.org/content/repositories/snapshots/proxool/proxool/0.9.0RC3/proxool-0.9.0RC3.pom
[ivy:resolve] -- artifact proxool#proxool;0.9.0RC3!proxool.jar:
[ivy:resolve] https://repository.apache.org/content/repositories/snapshots/proxool/proxool/0.9.0RC3/proxool-0.9.0RC3.jar
[ivy:resolve] maven2: tried
[ivy:resolve] http://repo1.maven.org/maven2/proxool/proxool/0.9.0RC3/proxool-0.9.0RC3.pom
[ivy:resolve] -- artifact proxool#proxool;0.9.0RC3!proxool.jar:
[ivy:resolve] http://repo1.maven.org/maven2/proxool/proxool/0.9.0RC3/proxool-0.9.0RC3.jar
[ivy:resolve] ::
[ivy:resolve] :: UNRESOLVED DEPENDENCIES ::
[ivy:resolve] ::
[ivy:resolve] :: proxool#proxool;0.9.0RC3: not found
[ivy:resolve] ::
[ivy:resolve]
[ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
{noformat} Lol, wonderful. Ivy decides to go after a bunch of things I do not really need, and of course one of them fails. The future is here. 
DataNucleus should use connection pooling - Key: HIVE-1335 URL: https://issues.apache.org/jira/browse/HIVE-1335 Project: Hadoop Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: commons-dbcp-1.2.2.jar, commons-dbcp.LICENSE, commons-pool-1.2.jar, commons-pool.LICENSE, datanucleus-connectionpool-1.0.2.jar, datanucleus-connectionpool.LICENSE, hive-1335.patch.txt Currently each Data Nucleus operation disconnects and reconnects to the MetaStore over jdbc. Queries fail to even explain properly in cases where a table has many partitions. This is fixed by enabling one parameter and including several jars. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1335) DataNucleus should use connection pooling
[ https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864622#action_12864622 ] Edward Capriolo commented on HIVE-1335: --- No joy. {noformat}
+<dependency org="commons-dbcp" name="commons-dbcp" rev="1.2.2"/>
+<dependency org="commons-pool" name="commons-pool" rev="1.2"/>
+<dependency org="org.datanucleus" name="datanucleus-connectionpool" rev="1.0.2">
+  <exclude module="proxool"/>
+  <exclude module="c3p0"/>
+</dependency>
{noformat} Unfortunately datanucleus-connectionpool refuses to honor my request for commons-pool 1.2 and instead fetches 1.3, which, you guessed it, does not work. I am going to submit the original; figuring out what ivy is trying to do is taking too long. DataNucleus should use connection pooling - Key: HIVE-1335 URL: https://issues.apache.org/jira/browse/HIVE-1335 Project: Hadoop Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: commons-dbcp-1.2.2.jar, commons-dbcp.LICENSE, commons-pool-1.2.jar, commons-pool.LICENSE, datanucleus-connectionpool-1.0.2.jar, datanucleus-connectionpool.LICENSE, hive-1335.patch.txt Currently each Data Nucleus operation disconnects and reconnects to the MetaStore over jdbc. Queries fail to even explain properly in cases where a table has many partitions. This is fixed by enabling one parameter and including several jars. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1335) DataNucleus should use connection pooling
[ https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1335: -- Status: Patch Available (was: Open) DataNucleus should use connection pooling - Key: HIVE-1335 URL: https://issues.apache.org/jira/browse/HIVE-1335 Project: Hadoop Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: commons-dbcp-1.2.2.jar, commons-dbcp.LICENSE, commons-pool-1.2.jar, commons-pool.LICENSE, datanucleus-connectionpool-1.0.2.jar, datanucleus-connectionpool.LICENSE, hive-1335.patch.txt Currently each Data Nucleus operation disconnects and reconnects to the MetaStore over jdbc. Queries fail to even explain properly in cases where a table has many partitions. This is fixed by enabling one parameter and including several jars. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1335) DataNucleus should use connection pooling
[ https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1335: -- Attachment: hive-1335.patch.txt commons-dbcp-1.2.2.jar commons-pool-1.2.jar DataNucleus should use connection pooling - Key: HIVE-1335 URL: https://issues.apache.org/jira/browse/HIVE-1335 Project: Hadoop Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: commons-dbcp-1.2.2.jar, commons-pool-1.2.jar, datanucleus-connectionpool-1.0.2.jar, hive-1335.patch.txt Currently each Data Nucleus operation disconnects and reconnects to the MetaStore over jdbc. Queries fail to even explain properly in cases where a table has many partitions. This is fixed by enabling one parameter and including several jars. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1335) DataNucleus should use connection pooling
[ https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1335: -- Attachment: datanucleus-connectionpool-1.0.2.jar DataNucleus should use connection pooling - Key: HIVE-1335 URL: https://issues.apache.org/jira/browse/HIVE-1335 Project: Hadoop Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: commons-dbcp-1.2.2.jar, commons-pool-1.2.jar, datanucleus-connectionpool-1.0.2.jar, hive-1335.patch.txt Currently each Data Nucleus operation disconnects and reconnects to the MetaStore over jdbc. Queries fail to even explain properly in cases where a table has many partitions. This is fixed by enabling one parameter and including several jars. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1335) DataNucleus should use connection pooling
[ https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1335: -- Status: Patch Available (was: Open) The jars should be placed in hive-home/lib. Patch should be applied to trunk. No unit test needed; as long as existing unit tests continue to function, all is well. DataNucleus should use connection pooling - Key: HIVE-1335 URL: https://issues.apache.org/jira/browse/HIVE-1335 Project: Hadoop Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: commons-dbcp-1.2.2.jar, commons-pool-1.2.jar, datanucleus-connectionpool-1.0.2.jar, hive-1335.patch.txt Currently each Data Nucleus operation disconnects and reconnects to the MetaStore over jdbc. Queries fail to even explain properly in cases where a table has many partitions. This is fixed by enabling one parameter and including several jars. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1335) DataNucleus should use connection pooling
[ https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo updated HIVE-1335: -- Attachment: commons-dbcp.LICENSE commons-pool.LICENSE datanucleus-connectionpool.LICENSE DataNucleus should use connection pooling - Key: HIVE-1335 URL: https://issues.apache.org/jira/browse/HIVE-1335 Project: Hadoop Hive Issue Type: Improvement Reporter: Edward Capriolo Assignee: Edward Capriolo Fix For: 0.6.0 Attachments: commons-dbcp-1.2.2.jar, commons-dbcp.LICENSE, commons-pool-1.2.jar, commons-pool.LICENSE, datanucleus-connectionpool-1.0.2.jar, datanucleus-connectionpool.LICENSE, hive-1335.patch.txt Currently each Data Nucleus operation disconnects and reconnects to the MetaStore over jdbc. Queries fail to even explain properly in cases where a table has many partitions. This is fixed by enabling one parameter and including several jars. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-610) move all properties from jpox.properties to hive-site.xml
[ https://issues.apache.org/jira/browse/HIVE-610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862797#action_12862797 ] Edward Capriolo commented on HIVE-610: -- Does this mean jpox.properties is now ignored? If so, how do we set other JPOX variables? move all properties from jpox.properties to hive-site.xml -- Key: HIVE-610 URL: https://issues.apache.org/jira/browse/HIVE-610 Project: Hadoop Hive Issue Type: Improvement Components: Metastore Affects Versions: 0.4.0 Reporter: Prasad Chakka Assignee: Prasad Chakka Fix For: 0.4.0 Attachments: hive-610.patch There are some properties in jpox.properties and some in hive-site.xml; move all to the latter file. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-1333) javax.jdo.option.NonTransactionalRead ignored?
javax.jdo.option.NonTransactionalRead ignored? -- Key: HIVE-1333 URL: https://issues.apache.org/jira/browse/HIVE-1333 Project: Hadoop Hive Issue Type: Bug Components: Metastore, Query Processor Reporter: Edward Capriolo {noformat}
<property>
  <name>javax.jdo.option.NonTransactionalRead</name>
  <value>true</value>
  <description>reads outside of transactions</description>
</property>
{noformat} hive> show tables {noformat}
100430 14:41:39 1874 Connect hiv...@localhost on
1874 Init DB m6_
1874 Query SHOW SESSION VARIABLES
1874 Query SHOW COLLATION
1874 Query SET character_set_results = NULL
1874 Query SET autocommit=1
1874 Query SET sql_mode='STRICT_TRANS_TABLES'
1874 Query SET autocommit=0
1874 Query SELECT @@session.tx_isolation
1874 Query SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED
1874 Query SELECT `THIS`.`TBL_NAME` FROM `TBLS` `THIS` LEFT OUTER JOIN `DBS` `THIS_DATABASE_NAME` ON `THIS`.`DB_ID` = `THIS_DATABASE_NAME`.`DB_ID` WHERE `THIS_DATABASE_NAME`.`NAME` = 'default' AND (LOWER(`THIS`.`TBL_NAME`) LIKE '_%' ESCAPE '\\' )
1874 Query commit
1874 Query rollback
1874 Quit
{noformat} now set to false {noformat}
100430 14:46:59 1889 Connect hiv...@localhost on
1889 Init DB m6_rshive
1889 Query SHOW SESSION VARIABLES
1889 Query SHOW COLLATION
1889 Query SET character_set_results = NULL
1889 Query SET autocommit=1
1889 Query SET sql_mode='STRICT_TRANS_TABLES'
1889 Query SET autocommit=0
1889 Query SELECT @@session.tx_isolation
1889 Query SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED
1889 Query SELECT `THIS`.`TBL_NAME` FROM `TBLS` `THIS` LEFT OUTER JOIN `DBS` `THIS_DATABASE_NAME` ON `THIS`.`DB_ID` = `THIS_DATABASE_NAME`.`DB_ID` WHERE `THIS_DATABASE_NAME`.`NAME` = 'default' AND (LOWER(`THIS`.`TBL_NAME`) LIKE '_%' ESCAPE '\\' )
1889 Query commit
1889 Query rollback
1889 Quit
{noformat} Unless I misunderstand the property, it looks like the reads are still inside a transaction. Also, why does this transaction call commit as well as rollback? 
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1328) make mapred.input.dir.recursive work for select *
[ https://issues.apache.org/jira/browse/HIVE-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862217#action_12862217 ] Edward Capriolo commented on HIVE-1328: --- I find external partitions to be pretty badly broken now. I am circling around one or two other bugs in them that I am about to report. Users (including myself) are frustrated because rather than working with data they have to work around bugs like HIVE-1318. I understand everyone has their own priorities. Call it what you will (inconsistency/feature), we are adding to the capability of external tables while current features do not even work well. In particular, HIVE-1318 is brutal. When working with my data I can make no assumptions when querying. I have to do all types of shell scripting to ensure that partitions exist before I query them, adding extra where clauses to carefully select ranges of partitions. If you are using external partitions at Facebook, I wonder how you work around HIVE-1318, and I am also curious whether you experience HIVE-1303 or if this is just something in my environment. The handful of users I have constantly have issues; does everyone there just 'suck it up'? make mapred.input.dir.recursive work for select * - Key: HIVE-1328 URL: https://issues.apache.org/jira/browse/HIVE-1328 Project: Hadoop Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.6.0 For the script below, we would like the behavior from MAPREDUCE-1501 to apply so that the select * returns two rows instead of none.
{noformat}
create table fact_daily(x int) partitioned by (ds string);
create table fact_tz(x int) partitioned by (ds string, hr string, gmtoffset string);
alter table fact_tz add partition (ds='2010-01-03', hr='1', gmtoffset='-8');
insert overwrite table fact_tz partition (ds='2010-01-03', hr='1', gmtoffset='-8') select key+11 from src where key=484;
alter table fact_tz add partition (ds='2010-01-03', hr='2', gmtoffset='-7');
insert overwrite table fact_tz partition (ds='2010-01-03', hr='2', gmtoffset='-7') select key+12 from src where key=484;
alter table fact_daily set tblproperties('EXTERNAL'='TRUE');
alter table fact_daily add partition (ds='2010-01-03') location '/user/hive/warehouse/fact_tz/ds=2010-01-03';
set mapred.input.dir.recursive=true;
select * from fact_daily where ds='2010-01-03';
{noformat}
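The final select only returns the two rows when input listing recurses into the nested hr=/gmtoffset= subdirectories under the fact_daily partition location. A minimal shell sketch of that difference, using an illustrative local layout mirroring the fact_tz directories above:

```shell
# Illustrative local stand-in for the fact_tz partition layout: the data
# files live two levels below the directory the fact_daily partition points at.
dir=$(mktemp -d)
mkdir -p "$dir/ds=2010-01-03/hr=1/gmtoffset=-8" "$dir/ds=2010-01-03/hr=2/gmtoffset=-7"
echo 495 > "$dir/ds=2010-01-03/hr=1/gmtoffset=-8/000000_0"
echo 496 > "$dir/ds=2010-01-03/hr=2/gmtoffset=-7/000000_0"

# Non-recursive listing (old behavior): only direct children, so no data files.
flat=$(find "$dir/ds=2010-01-03" -maxdepth 1 -type f | wc -l)

# Recursive listing (mapred.input.dir.recursive=true): both data files found.
deep=$(find "$dir/ds=2010-01-03" -type f | wc -l)

echo "non-recursive files: $flat, recursive files: $deep"
rm -rf "$dir"
```

With the flat listing the query sees zero input files and returns nothing; with recursion it sees both files, hence the two expected rows.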
[jira] Commented: (HIVE-1328) make mapred.input.dir.recursive work for select *
[ https://issues.apache.org/jira/browse/HIVE-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862074#action_12862074 ] Edward Capriolo commented on HIVE-1328: --- Can we look at HIVE-1318 and maybe HIVE-1303 first? The external partitions already seem to have bugs; can we get them working properly before more features are added? make mapred.input.dir.recursive work for select * - Key: HIVE-1328 URL: https://issues.apache.org/jira/browse/HIVE-1328 Project: Hadoop Hive Issue Type: Improvement Components: Query Processor Affects Versions: 0.6.0 Reporter: John Sichi Assignee: John Sichi Fix For: 0.6.0 For the script below, we would like the behavior from MAPREDUCE-1501 to apply so that the select * returns two rows instead of none.
{noformat}
create table fact_daily(x int) partitioned by (ds string);
create table fact_tz(x int) partitioned by (ds string, hr string, gmtoffset string);
alter table fact_tz add partition (ds='2010-01-03', hr='1', gmtoffset='-8');
insert overwrite table fact_tz partition (ds='2010-01-03', hr='1', gmtoffset='-8') select key+11 from src where key=484;
alter table fact_tz add partition (ds='2010-01-03', hr='2', gmtoffset='-7');
insert overwrite table fact_tz partition (ds='2010-01-03', hr='2', gmtoffset='-7') select key+12 from src where key=484;
alter table fact_daily set tblproperties('EXTERNAL'='TRUE');
alter table fact_daily add partition (ds='2010-01-03') location '/user/hive/warehouse/fact_tz/ds=2010-01-03';
set mapred.input.dir.recursive=true;
select * from fact_daily where ds='2010-01-03';
{noformat}
[jira] Resolved: (HIVE-377) Some ANT jars should be included into hive
[ https://issues.apache.org/jira/browse/HIVE-377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Edward Capriolo resolved HIVE-377. -- Resolution: Won't Fix With newer releases of Hadoop and Hive, Jetty and Ant are packaged differently and this is no longer an issue. Some ANT jars should be included into hive -- Key: HIVE-377 URL: https://issues.apache.org/jira/browse/HIVE-377 Project: Hadoop Hive Issue Type: Improvement Components: Web UI Affects Versions: 0.3.0, 0.6.0 Reporter: Edward Capriolo Fix For: 0.4.2 The Web UI requires:
{noformat}
HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:/opt/ant/lib/ant.jar
HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:/opt/ant/lib/ant-launcher.jar
{noformat}
Right now the start script does this:
{noformat}
# hwi requires ant jars
# if [ "$ANT_LIB" = "" ] ; then
#   ANT_LIB=/opt/ant/libs
# fi
# for f in ${ANT_LIB}/*.jar; do
#   if [[ ! -f $f ]]; then
#     continue;
#   fi
#   HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
# done
{noformat}
Can we add these jars? This will add 1.4 MB to the Hive distribution. If we do not want to add these, I would like to make the startup script fail if the environment variable is not correct.
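The "fail if the environment variable is not correct" idea can be sketched as a fail-fast version of the commented-out fragment. This is only an illustration, not the actual start script: the function name and the default /opt/ant/lib path are assumptions.

```shell
# Sketch: refuse to build the HWI classpath when ant.jar is missing,
# instead of silently continuing with a broken classpath.
# build_hwi_classpath and the default path are illustrative.
build_hwi_classpath() {
  ant_lib="${1:-/opt/ant/lib}"
  if [ ! -f "$ant_lib/ant.jar" ]; then
    echo "HWI requires ant jars: no ant.jar in $ant_lib (set ANT_LIB)" >&2
    return 1
  fi
  cp=""
  for f in "$ant_lib"/*.jar; do
    [ -f "$f" ] || continue    # skip the unexpanded glob when dir is empty
    cp="$cp:$f"
  done
  echo "$cp"
}
```

Failing at startup with a clear message is cheaper than debugging a ClassNotFoundException from Jetty/JSP compilation later.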
[jira] Commented: (HIVE-1326) RowContainer uses hard-coded '/tmp/' path for temporary files
[ https://issues.apache.org/jira/browse/HIVE-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12860929#action_12860929 ] Edward Capriolo commented on HIVE-1326: --- I am +1 on the concept. Many times /tmp can be a ramdisk or mounted with some size restrictions. This type of bug can be very painful to track down when it happens. RowContainer uses hard-coded '/tmp/' path for temporary files - Key: HIVE-1326 URL: https://issues.apache.org/jira/browse/HIVE-1326 Project: Hadoop Hive Issue Type: Bug Environment: Hadoop 0.19.2 with Hive trunk. We're using FreeBSD 7.0, but that doesn't seem relevant. Reporter: Michael Klatt Attachments: rowcontainer.patch In our production hadoop environment, the /tmp/ partition is actually pretty small, and we encountered a problem when a query used the RowContainer class and filled up the /tmp/ partition. I tracked down the cause to the RowContainer class putting temporary files in the '/tmp/' path instead of using the configured Hadoop temporary path. I've attached a patch to fix this.
Here's the traceback:
{noformat}
2010-04-25 12:05:05,120 INFO org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created temp file /tmp/hive-rowcontainer-1244151903/RowContainer7816.tmp
2010-04-25 12:05:06,326 INFO ExecReducer: ExecReducer: processing 1000 rows: used memory = 385520312
2010-04-25 12:05:08,513 INFO ExecReducer: ExecReducer: processing 1100 rows: used memory = 341780472
2010-04-25 12:05:10,697 INFO ExecReducer: ExecReducer: processing 1200 rows: used memory = 301446768
2010-04-25 12:05:12,837 INFO ExecReducer: ExecReducer: processing 1300 rows: used memory = 399208768
2010-04-25 12:05:15,085 INFO ExecReducer: ExecReducer: processing 1400 rows: used memory = 364507216
2010-04-25 12:05:17,260 INFO ExecReducer: ExecReducer: processing 1500 rows: used memory = 332907280
2010-04-25 12:05:19,580 INFO ExecReducer: ExecReducer: processing 1600 rows: used memory = 298774096
2010-04-25 12:05:21,629 INFO ExecReducer: ExecReducer: processing 1700 rows: used memory = 396505408
2010-04-25 12:05:23,830 INFO ExecReducer: ExecReducer: processing 1800 rows: used memory = 362477288
2010-04-25 12:05:25,914 INFO ExecReducer: ExecReducer: processing 1900 rows: used memory = 327229744
2010-04-25 12:05:27,978 INFO ExecReducer: ExecReducer: processing 2000 rows: used memory = 296051904
2010-04-25 12:05:28,155 FATAL ExecReducer: org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
    at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:199)
    at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:346)
    at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
    at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
    at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
    at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
    at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
    at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1013)
    at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:977)
    at org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat$1.write(HiveSequenceFileOutputFormat.java:70)
    at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.spillBlock(RowContainer.java:343)
    at org.apache.hadoop.hive.ql.exec.persistence.RowContainer.add(RowContainer.java:163)
    at org.apache.hadoop.hive.ql.exec.JoinOperator.processOp(JoinOperator.java:118)
    at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:456)
    at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:244)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
    at org.apache.hadoop.mapred.Child.main(Child.java:158)
Caused by: java.io.IOException: No space left on device
    at java.io.FileOutputStream.writeBytes(Native Method)
    at java.io.FileOutputStream.write(FileOutputStream.java:260)
    at
{noformat}
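The idea behind the attached patch (which the trace's `/tmp/hive-rowcontainer-...` path motivates) is to derive the spill directory from configuration rather than hard-coding `/tmp`. A minimal shell sketch of that pattern; `HIVE_SCRATCH_DIR` is an illustrative stand-in for the Hadoop/Hive temp-dir setting, not an actual Hive variable:

```shell
# Spill files go under a configurable scratch directory, falling back to
# TMPDIR and only then to /tmp. HIVE_SCRATCH_DIR is illustrative.
scratch="${HIVE_SCRATCH_DIR:-${TMPDIR:-/tmp}}"
spill=$(mktemp "$scratch/hive-rowcontainer-XXXXXX")
echo "spill file created under: $scratch"
rm -f "$spill"
```

Operators can then point the scratch location at a partition with enough space, instead of filling a small (or ramdisk-backed) /tmp.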
[jira] Created: (HIVE-1318) External Tables: Selecting a partition that does not exist produces errors
External Tables: Selecting a partition that does not exist produces errors -- Key: HIVE-1318 URL: https://issues.apache.org/jira/browse/HIVE-1318 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.5.0 Reporter: Edward Capriolo Attachments: partdoom.q
{noformat}
dfs -mkdir /tmp/a;
dfs -mkdir /tmp/a/b;
dfs -mkdir /tmp/a/c;
create external table abc(
  key string,
  val string
) partitioned by (part int) location '/tmp/a/';
alter table abc ADD PARTITION (part=1) LOCATION 'b';
alter table abc ADD PARTITION (part=2) LOCATION 'c';
select key from abc where part=1;
select key from abc where part=70;
{noformat}
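The failing select against part=70 is the scenario the shell-scripted guards mentioned in the HIVE-1328 discussion work around: check that a partition's location exists before querying it. A local sketch of that guard; the part_dir mapping is illustrative (in Hive the partition-to-location mapping lives in the metastore):

```shell
# Partitions 1 and 2 have backing directories (b and c); part 70 does not.
# Everything here is a local stand-in for the DFS paths in the report.
base=$(mktemp -d)
mkdir "$base/b" "$base/c"    # locations for part=1 and part=2
part_dir() {
  case "$1" in
    1) echo "$base/b" ;;
    2) echo "$base/c" ;;
    *) echo "$base/no-such-partition-$1" ;;
  esac
}
for p in 1 70; do
  if [ -d "$(part_dir "$p")" ]; then
    echo "part=$p: location exists, safe to query"
  else
    echo "part=$p: location missing, skipping query"
  fi
done
```

This is exactly the kind of boilerplate users should not need; the bug is that Hive errors out instead of returning an empty result.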