[jira] Commented: (HIVE-1611) Add alternative search-provider to Hive site

2010-10-01 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917016#action_12917016
 ] 

Edward Capriolo commented on HIVE-1611:
---

Now that Hive is a TLP, we likely have to get the ball rolling and cut the cord 
with Hadoop. I will contact infra and see what our options are. We have a few 
issues: 
-we need to move the SVN repository from a Hadoop subproject to a top-level one. 
-after we do that, we need to take the Forrest docs and move them into Hive; 
then we can change the search box

If we want to see the skinconf change done first, I believe we should 
open/transfer this ticket to core.


 Add alternative search-provider to Hive site
 

 Key: HIVE-1611
 URL: https://issues.apache.org/jira/browse/HIVE-1611
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Alex Baranau
Assignee: Alex Baranau
Priority: Minor
 Attachments: HIVE-1611.patch


 Use the search-hadoop.com service to make search available in Hive sources, 
 mailing lists, wiki, etc.
 This was initially proposed on user mailing list. The search service was 
 already added in site's skin (common for all Hadoop related projects) before 
 so this issue is about enabling it for Hive. The ultimate goal is to use it 
 at all Hadoop's sub-projects' sites.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1668) Move HWI out to Github

2010-09-24 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914584#action_12914584
 ] 

Edward Capriolo commented on HIVE-1668:
---

Jeff,
I disagree. The build and test errors are not insurmountable. In fact, some if 
not most of the errors were caused by cascading changes that were not tested 
properly. For example:

https://issues.apache.org/jira/browse/HIVE-1183 was a fix I had to do because 
someone broke it. https://issues.apache.org/jira/browse/HIVE-978 happened because 
someone wanted all jars to be named whatever.${version} and did not bother to 
look across all the shell script files that start up Hive. 

https://issues.apache.org/jira/browse/HIVE-1294: again, someone changed some 
shell scripts and only tested the CLI.

https://issues.apache.org/jira/browse/HIVE-752: again, someone broke HWI without 
testing it.

https://issues.apache.org/jira/browse/HIVE-1615: not really anyone's fault, but 
there is no API stability across Hive. I do not see why one method went away and 
another similar method took its place.

I have of course been talking about moving HWI to Wicket for a while; moving 
from JSP to servlet/Java code will fix errors, but the little time I do have I 
usually have to spend detecting and cleaning up other breakages.

Hue and Beeswax I honestly do not know, but it sounds like you need extra 
magical stuff to make them work, while HWI works with Hive on its own (unless 
people break it).

 Move HWI out to Github
 --

 Key: HIVE-1668
 URL: https://issues.apache.org/jira/browse/HIVE-1668
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Web UI
Reporter: Jeff Hammerbacher

 I have seen HWI cause a number of build and test errors, and it's now going 
 to cost us some extra work for integration with security. We've worked on 
 hundreds of clusters at Cloudera and I've never seen anyone use HWI. With the 
 Beeswax UI available in Hue, it's unlikely that anyone would prefer to stick 
 with HWI. I think it's time to move it out to Github.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1668) Move HWI out to Github

2010-09-24 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914605#action_12914605
 ] 

Edward Capriolo commented on HIVE-1668:
---

Plus, not to get too far off topic, but there is a huge portion of the Hadoop 
community that thinks: Security? So what? Who cares? I am not going to run 
Active Directory or Kerberos just so I can say my Hadoop is secure. It adds 
latency to many processes, adds complexity to the overall design of Hadoop, and 
does not even encrypt data in transit. Many people are going to elect not to 
use Hadoop security for those reasons. Is extra work a reason not to do 
something? Are we going to move the Hive Thrift server out to Github too 
because of the burden of extra work? It is a lot of extra work for me when 
Hadoop renames all its JMX counters or tells me all my code is deprecated 
because of the new slick mapreduce.* API. I have learned to roll with the 
punches.

 Move HWI out to Github
 --

 Key: HIVE-1668
 URL: https://issues.apache.org/jira/browse/HIVE-1668
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Web UI
Reporter: Jeff Hammerbacher

 I have seen HWI cause a number of build and test errors, and it's now going 
 to cost us some extra work for integration with security. We've worked on 
 hundreds of clusters at Cloudera and I've never seen anyone use HWI. With the 
 Beeswax UI available in Hue, it's unlikely that anyone would prefer to stick 
 with HWI. I think it's time to move it out to Github.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1668) Move HWI out to Github

2010-09-24 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914669#action_12914669
 ] 

Edward Capriolo commented on HIVE-1668:
---

{quote}That's not a great argument for keeping code that's onerous to maintain 
in trunk.{quote}
It's not onerous to maintain. As you can see from the tickets I pointed out, it 
broke because it was not tested. 

For example, in 
https://issues.apache.org/jira/browse/HIVE-752: when designing shim classes that 
specify a class name in a string, one has to make sure the class name is 
correct. I know it was an oversight, but I am sure someone fired up the CLI and 
made sure the class name was correct there.

As for https://issues.apache.org/jira/browse/HIVE-978, I specifically mentioned 
in the patch how to test this and why it should be tested, and it still turned 
out not to work right. 

"Pragmatic" is the perfect word. HWI was never made to be fancy. Anyone who has 
Hive can build and run the web interface, with no extra dependencies. To use 
Beeswax, it looks like you need Hue, which means you need to go somewhere else, 
get it, and install it. It seems like you need to patch or load extra plugins 
into your NameNode and DataNode, like org.apache.hadoop.thriftfs.NamenodePlugin. 
It looks like (http://archive.cloudera.com/cdh/3/hue/manual.html#_install_hue) 
you need: 
RedHat-based        Debian-based
gcc                 gcc
libxml2-devel       libxml2-dev
libxslt-devel       libxslt-dev
mysql-devel         libmysqlclient-dev
python-devel        python-dev
python-setuptools   python-setuptools
sqlite-devel        libsqlite3-dev 

The pragmatic approach is to use the web interface provided by Hive. Users do 
not need anything external like Python, and do not have to make any changes to 
their environment. That is why I think we should stay part of the Hive 
distribution. 
 
I'm -1 on taking it out.  

 Move HWI out to Github
 --

 Key: HIVE-1668
 URL: https://issues.apache.org/jira/browse/HIVE-1668
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Web UI
Reporter: Jeff Hammerbacher

 I have seen HWI cause a number of build and test errors, and it's now going 
 to cost us some extra work for integration with security. We've worked on 
 hundreds of clusters at Cloudera and I've never seen anyone use HWI. With the 
 Beeswax UI available in Hue, it's unlikely that anyone would prefer to stick 
 with HWI. I think it's time to move it out to Github.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1668) Move HWI out to Github

2010-09-24 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914741#action_12914741
 ] 

Edward Capriolo commented on HIVE-1668:
---

{quote}It should also help mature the product for eventual inclusion in 
trunk.{quote}
Why would we move something from hive out to github, just to move it back to 
hive?

{quote}Empirically, they don't. The value of the web interface to users is not 
nearly as high as the pain it causes the developers for maintenance.{quote}
Who are these developers who maintain it? Has anyone ever added a feature 
besides me? I'm not complaining.

http://blog.milford.io/2010/06/getting-the-hive-web-interface-hwi-to-work-on-centos/
{quote}The Hive Web Interface is a pretty sweet deal.{quote} 
Sounds like people like it. 

Why are we debating the past state of HWI? It works now. If someone reports a 
bug, I typically investigate and patch it the same day.

I challenge anyone to open a ticket on core called "Remove the NameNode web 
interface to Github" and try to argue that some other project now offers a 
better NameNode interface using Python. The ticket would instantly get a 
RESOLVED: WILL NOT FIX. Why is this any different? 










 Move HWI out to Github
 --

 Key: HIVE-1668
 URL: https://issues.apache.org/jira/browse/HIVE-1668
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Web UI
Reporter: Jeff Hammerbacher

 I have seen HWI cause a number of build and test errors, and it's now going 
 to cost us some extra work for integration with security. We've worked on 
 hundreds of clusters at Cloudera and I've never seen anyone use HWI. With the 
 Beeswax UI available in Hue, it's unlikely that anyone would prefer to stick 
 with HWI. I think it's time to move it out to Github.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

2010-09-22 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913741#action_12913741
 ] 

Edward Capriolo commented on HIVE-842:
--

What is meant by attacking the Web UI separately? Will it be broken or 
non-functional at any phase here? That is what I find happens often. Some of it 
is really the WUI's fault for using JSP and not servlets, but there is no 
simple way to code-cover the WUI and all the different ways it gets broken. 

 Authentication Infrastructure for Hive
 --

 Key: HIVE-842
 URL: https://issues.apache.org/jira/browse/HIVE-842
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Server Infrastructure
Reporter: Edward Capriolo
Assignee: Todd Lipcon
 Attachments: HiveSecurityThoughts.pdf


 This issue deals with the authentication (user name,password) infrastructure. 
 Not the authorization components that specify what a user should be able to 
 do.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-268) Insert Overwrite Directory to accept configurable table row format

2010-09-17 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910656#action_12910656
 ] 

Edward Capriolo commented on HIVE-268:
--

Still not exactly what you want, but with CTAS you can essentially get a folder 
in /user/hive/warehouse/tableIWant with the format you want.
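As a rough illustration of that workaround (the table name tableIWant and the 
src source table are hypothetical, and the delimiter is just an example), a 
CTAS statement can choose the row format for the files it writes:

{code}
-- Illustrative sketch only: create a new table whose backing files under
-- the warehouse directory use a tab-delimited text layout.
CREATE TABLE tableIWant
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
AS
SELECT * FROM src;
{code}

The files backing tableIWant can then be read straight out of the warehouse 
directory, which approximates a formatted INSERT OVERWRITE DIRECTORY.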

 Insert Overwrite Directory to accept configurable table row format
 

 Key: HIVE-268
 URL: https://issues.apache.org/jira/browse/HIVE-268
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Zheng Shao
Assignee: Paul Yang

 There is no way for the users to control the file format when they are 
 outputting the result into a directory.
 We should allow:
 {code}
 INSERT OVERWRITE DIRECTORY '/user/zshao/result'
 ROW FORMAT DELIMITED
 FIELDS TERMINATED BY '9'
 SELECT tablea.* from tablea;
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1615) Web Interface JSP needs Refactoring for removed meta store methods

2010-09-08 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1615:
--

Fix Version/s: 0.7.0
   (was: 0.6.0)
Affects Version/s: 0.6.0
   (was: 0.5.1)

 Web Interface JSP needs Refactoring for removed meta store methods
 --

 Key: HIVE-1615
 URL: https://issues.apache.org/jira/browse/HIVE-1615
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Web UI
Affects Versions: 0.6.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Blocker
 Fix For: 0.7.0

 Attachments: hive-1615.patch.2.txt, hive-1615.patch.txt


 Some meta store methods being called from JSP have been removed. Really 
 should prioritize compiling jsp into servlet code again.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1613) hive --service jar looks for hadoop version but was not defined

2010-09-07 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1613:
--

Fix Version/s: 0.6.0
   (was: 0.7.0)
Affects Version/s: 0.5.1
   (was: 0.6.0)
 Priority: Blocker  (was: Major)

I think we should patch this as well, since functionality was broken.

 hive --service jar looks for hadoop version but was not defined
 ---

 Key: HIVE-1613
 URL: https://issues.apache.org/jira/browse/HIVE-1613
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Clients
Affects Versions: 0.5.1
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Blocker
 Fix For: 0.6.0

 Attachments: hive-1613.patch.txt


 hive --service jar fails. I have to open another ticket to clean up the 
 scripts and unify functions like version detection.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HIVE-1613) hive --service jar looks for hadoop version but was not defined

2010-09-03 Thread Edward Capriolo (JIRA)
hive --service jar looks for hadoop version but was not defined
---

 Key: HIVE-1613
 URL: https://issues.apache.org/jira/browse/HIVE-1613
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Clients
Affects Versions: 0.6.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.7.0


hive --service jar fails. I have to open another ticket to clean up the scripts 
and unify functions like version detection.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1613) hive --service jar looks for hadoop version but was not defined

2010-09-03 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1613:
--

Status: Patch Available  (was: Open)

 hive --service jar looks for hadoop version but was not defined
 ---

 Key: HIVE-1613
 URL: https://issues.apache.org/jira/browse/HIVE-1613
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Clients
Affects Versions: 0.6.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.7.0

 Attachments: hive-1613.patch.txt


 hive --service jar fails. I have to open another ticket to clean up the 
 scripts and unify functions like version detection.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1613) hive --service jar looks for hadoop version but was not defined

2010-09-03 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1613:
--

Attachment: hive-1613.patch.txt

 hive --service jar looks for hadoop version but was not defined
 ---

 Key: HIVE-1613
 URL: https://issues.apache.org/jira/browse/HIVE-1613
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Clients
Affects Versions: 0.6.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.7.0

 Attachments: hive-1613.patch.txt


 hive --service jar fails. I have to open another ticket to clean up the 
 scripts and unify functions like version detection.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1615) Web Interface JSP needs Refactoring for deprecated meta store methods

2010-09-03 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1615:
--

Attachment: hive-1615.patch.txt

 Web Interface JSP needs Refactoring for deprecated meta store methods
 -

 Key: HIVE-1615
 URL: https://issues.apache.org/jira/browse/HIVE-1615
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Web UI
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: hive-1615.patch.txt


 Some meta store methods being called from JSP have been removed. Really 
 should prioritize compiling jsp into servlet code again.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1615) Web Interface JSP needs Refactoring for removed meta store methods

2010-09-03 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1615:
--

Summary: Web Interface JSP needs Refactoring for removed meta store methods 
 (was: Web Interface JSP needs Refactoring for deprecated meta store methods)

 Web Interface JSP needs Refactoring for removed meta store methods
 --

 Key: HIVE-1615
 URL: https://issues.apache.org/jira/browse/HIVE-1615
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Web UI
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: hive-1615.patch.txt


 Some meta store methods being called from JSP have been removed. Really 
 should prioritize compiling jsp into servlet code again.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1615) Web Interface JSP needs Refactoring for removed meta store methods

2010-09-03 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1615:
--

Attachment: hive-1615.patch.2.txt

 Web Interface JSP needs Refactoring for removed meta store methods
 --

 Key: HIVE-1615
 URL: https://issues.apache.org/jira/browse/HIVE-1615
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Web UI
Affects Versions: 0.7.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.7.0

 Attachments: hive-1615.patch.2.txt, hive-1615.patch.txt


 Some meta store methods being called from JSP have been removed. Really 
 should prioritize compiling jsp into servlet code again.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-471) A UDF for simple reflection

2010-08-28 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-471:
-

Attachment: HIVE-471.6.patch.txt

 A UDF for simple reflection
 ---

 Key: HIVE-471
 URL: https://issues.apache.org/jira/browse/HIVE-471
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Minor
 Fix For: 0.7.0

 Attachments: hive-471-gen.diff, HIVE-471.1.patch, HIVE-471.2.patch, 
 HIVE-471.3.patch, HIVE-471.4.patch, HIVE-471.5.patch, HIVE-471.6.patch.txt, 
 hive-471.diff


 There are many methods in java that are static and have no arguments or can 
 be invoked with one simple parameter. More complicated functions will require 
 a UDF but one generic one can work as a poor-mans UDF.
 {noformat}
 SELECT reflect("java.lang.String", "valueOf", 1), reflect("java.lang.String", 
 "isEmpty")
 FROM src LIMIT 1;
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-471) A UDF for simple reflection

2010-08-28 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-471:
-

Status: Patch Available  (was: Open)

 A UDF for simple reflection
 ---

 Key: HIVE-471
 URL: https://issues.apache.org/jira/browse/HIVE-471
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Minor
 Fix For: 0.7.0

 Attachments: hive-471-gen.diff, HIVE-471.1.patch, HIVE-471.2.patch, 
 HIVE-471.3.patch, HIVE-471.4.patch, HIVE-471.5.patch, HIVE-471.6.patch.txt, 
 hive-471.diff


 There are many methods in java that are static and have no arguments or can 
 be invoked with one simple parameter. More complicated functions will require 
 a UDF but one generic one can work as a poor-mans UDF.
 {noformat}
 SELECT reflect("java.lang.String", "valueOf", 1), reflect("java.lang.String", 
 "isEmpty")
 FROM src LIMIT 1;
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-471) A UDF for simple reflection

2010-08-25 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-471:
-

   Status: Patch Available  (was: Open)
Affects Version/s: 0.6.0
   (was: 0.5.1)
Fix Version/s: 0.7.0

 A UDF for simple reflection
 ---

 Key: HIVE-471
 URL: https://issues.apache.org/jira/browse/HIVE-471
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Minor
 Fix For: 0.7.0

 Attachments: hive-471-gen.diff, HIVE-471.1.patch, HIVE-471.2.patch, 
 HIVE-471.3.patch, HIVE-471.4.patch, HIVE-471.5.patch, hive-471.diff


 There are many methods in java that are static and have no arguments or can 
 be invoked with one simple parameter. More complicated functions will require 
 a UDF but one generic one can work as a poor-mans UDF.
 {noformat}
 SELECT reflect("java.lang.String", "valueOf", 1), reflect("java.lang.String", 
 "isEmpty")
 FROM src LIMIT 1;
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1434) Cassandra Storage Handler

2010-08-25 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902616#action_12902616
 ] 

Edward Capriolo commented on HIVE-1434:
---

Maven, I am on the fence about. We actually do not need all the libs I 
included. Having them in a tarball sounds good, but making a Maven repo for 
only this purpose seems like a lot of work.

{quote}
Should we attempt to factor out the HBase commonality immediately, or commit 
the overlapping code and then do refactoring as a followup? I'm fine either 
way; I can give suggestions on how to create the reusable abstract bases and 
where to package+name them.{quote}
If you can point to specific instances, then sure. The code may be 99% the same, 
but that one nuance is going to make the abstractions confusing and useless. 

I await further review.

 Cassandra Storage Handler
 -

 Key: HIVE-1434
 URL: https://issues.apache.org/jira/browse/HIVE-1434
 Project: Hadoop Hive
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.7.0

 Attachments: cas-handle.tar.gz, hive-1434-1.txt, 
 hive-1434-2-patch.txt, hive-1434-3-patch.txt, hive-1434-4-patch.txt


 Add a cassandra storage handler.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1505) Support non-UTF8 data

2010-08-20 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900697#action_12900697
 ] 

Edward Capriolo commented on HIVE-1505:
---

"Maybe you should fork hive and call it chive." 

On a serious note: great job. Would you consider editing cli.xml in the xdocs 
to explain this feature? I think it would be very helpful; look in 
docs/xdocs/.

 Support non-UTF8 data
 -

 Key: HIVE-1505
 URL: https://issues.apache.org/jira/browse/HIVE-1505
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Serializers/Deserializers
Affects Versions: 0.5.0
Reporter: bc Wong
Assignee: Ted Xu
 Attachments: trunk-encoding.patch


 I'd like to work with non-UTF8 data easily.
 Suppose I have data in latin1. Currently, doing a select * will return the 
 upper ascii characters in '\xef\xbf\xbd', which is the replacement character 
 '\ufffd' encoded in UTF-8. Would be nice for Hive to understand different 
 encodings, or to have a concept of byte string.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1555) JDBC Storage Handler

2010-08-18 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900039#action_12900039
 ] 

Edward Capriolo commented on HIVE-1555:
---

I wonder if this could end up being a very effective way to query shared data 
stores. 

I think I saw something like this in Futurama: "Don't worry about querying 
blank. Let me worry about querying blank."
 
http://www.youtube.com/watch?v=B5cAwTEEGNE

 JDBC Storage Handler
 

 Key: HIVE-1555
 URL: https://issues.apache.org/jira/browse/HIVE-1555
 Project: Hadoop Hive
  Issue Type: New Feature
Affects Versions: 0.5.0
Reporter: Bob Robertson
   Original Estimate: 24h
  Remaining Estimate: 24h

 With the Cassandra and HBase Storage Handlers I thought it would make sense 
 to include a generic JDBC RDBMS Storage Handler so that you could import a 
 standard DB table into Hive. Many people must want to perform HiveQL joins, 
 etc against tables in other systems etc.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1434) Cassandra Storage Handler

2010-08-15 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1434:
--

Attachment: hive-1434-4-patch.txt

Refactored the code, added xdoc, more extensive testing.

 Cassandra Storage Handler
 -

 Key: HIVE-1434
 URL: https://issues.apache.org/jira/browse/HIVE-1434
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: cas-handle.tar.gz, hive-1434-1.txt, 
 hive-1434-2-patch.txt, hive-1434-3-patch.txt, hive-1434-4-patch.txt


 Add a cassandra storage handler.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1530) Include hive-default.xml and hive-log4j.properties in hive-common JAR

2010-08-12 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898030#action_12898030
 ] 

Edward Capriolo commented on HIVE-1530:
---

I like the default XML. Hive has many undocumented options, and new ones are 
being added often. Are end users going to know which jar the default XML is in? 
Do users really want to extract a jar just to get the conf out of it to read 
the description of a setting?

As for what Hadoop does... I personally find it annoying to have to navigate to 
hadoop/src/mapred/mapred-default.xml or to hadoop/src/hdfs/hdfs-default.xml to 
figure out what options I have for settings. So I do not really think we should 
do it just to be like Hadoop if it makes people's lives harder.

If anything, please keep it as hive-site.xml.sample.


 Include hive-default.xml and hive-log4j.properties in hive-common JAR
 -

 Key: HIVE-1530
 URL: https://issues.apache.org/jira/browse/HIVE-1530
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Configuration
Reporter: Carl Steinbach

 hive-common-*.jar should include hive-default.xml and hive-log4j.properties,
 and similarly hive-exec-*.jar should include hive-exec-log4j.properties. The
 hive-default.xml file that currently sits in the conf/ directory should be 
 removed.
 Motivations for this change:
 * We explicitly tell users that they should never modify hive-default.xml yet 
 give them the opportunity to do so by placing the file in the conf dir.
 * Many users are familiar with the Hadoop configuration mechanism that does 
 not require *-default.xml files to be present in the HADOOP_CONF_DIR, and 
 assume that the same is true for HIVE_CONF_DIR.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1434) Cassandra Storage Handler

2010-08-05 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1434:
--

Attachment: hive-1434-3-patch.txt

 Cassandra Storage Handler
 -

 Key: HIVE-1434
 URL: https://issues.apache.org/jira/browse/HIVE-1434
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: cas-handle.tar.gz, hive-1434-1.txt, 
 hive-1434-2-patch.txt, hive-1434-3-patch.txt


 Add a cassandra storage handler.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1434) Cassandra Storage Handler

2010-08-05 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1434:
--

Status: Patch Available  (was: Open)

This patch has full read/write functionality. I am going to do another patch 
later today with xdocs, but do not expect any code changes.

 Cassandra Storage Handler
 -

 Key: HIVE-1434
 URL: https://issues.apache.org/jira/browse/HIVE-1434
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: cas-handle.tar.gz, hive-1434-1.txt, 
 hive-1434-2-patch.txt, hive-1434-3-patch.txt


 Add a cassandra storage handler.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1511) Hive plan serialization is slow

2010-08-04 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12895352#action_12895352
 ] 

Edward Capriolo commented on HIVE-1511:
---

There may also be a clever way to remove duplicate expressions that evaluate to 
the same result, such as multiple key=0 terms.
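A hedged sketch of that idea; the class and method names are invented for illustration, and a real planner would compare expression trees rather than strings:

```java
import java.util.*;

// Collapse OR-ed predicates that render to the same canonical form, so
// "key=0 OR key=0 OR ..." shrinks to a single term before plan serialization.
public class DedupePredicates {
    static List<String> dedupe(List<String> predicates) {
        // LinkedHashSet keeps first-seen order while dropping repeats
        LinkedHashSet<String> unique = new LinkedHashSet<>();
        for (String p : predicates) {
            unique.add(p.trim());  // naive canonicalization; ASTs in practice
        }
        return new ArrayList<>(unique);
    }

    public static void main(String[] args) {
        List<String> preds = new ArrayList<>(Collections.nCopies(100, "key=0"));
        preds.add("key=1");
        System.out.println(dedupe(preds));  // prints [key=0, key=1]
    }
}
```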

 Hive plan serialization is slow
 ---

 Key: HIVE-1511
 URL: https://issues.apache.org/jira/browse/HIVE-1511
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Ning Zhang

 As reported by Edward Capriolo:
 For reference I did this as a test case
 SELECT * FROM src where
 key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
 OR key=0 OR key=0 OR key=0 OR
 key=0 OR key=0 OR key=0 OR  key=0 OR key=0 OR key=0 OR key=0 OR key=0
 OR key=0 OR key=0 OR key=0 OR
 ...(100 more of these)
 No OOM but I gave up after the test case did not go anywhere for about
 2 minutes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1434) Cassandra Storage Handler

2010-08-03 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1434:
--

Attachment: hive-1434-2-patch.txt

Closing in on this one. This patch sets up the build environment correctly, 
with proper test infrastructure, and is much cleaner. Still working on 
serializing/deserializing correctly, so it is not very functional yet. 80% there, I think.

 Cassandra Storage Handler
 -

 Key: HIVE-1434
 URL: https://issues.apache.org/jira/browse/HIVE-1434
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: cas-handle.tar.gz, hive-1434-1.txt, hive-1434-2-patch.txt


 Add a cassandra storage handler.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1441) Extend ivy offline mode to cover metastore downloads

2010-07-30 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12894133#action_12894133
 ] 

Edward Capriolo commented on HIVE-1441:
---

Results are from a fresh checkout, both before and after the patch. Still looking into it.
{noformat}
 </properties>
  <testcase 
classname="org.apache.hadoop.hive.metastore.TestHiveMetaStoreRemote" 
name="testPartition" time="8.242">
<error message="Could not connect to meta store using any of the URIs 
provided" 
type="org.apache.hadoop.hive.metastore.api.MetaException">MetaException(message:Could not connect to meta store using any of the URIs 
provided)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:160)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.&lt;init&gt;(HiveMetaStoreClient.java:128)
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.&lt;init&gt;(HiveMetaStoreClient.java:71)
at 
org.apache.hadoop.hive.metastore.TestHiveMetaStoreRemote.setUp(TestHiveMetaStoreRemote.java:64)
at junit.framework.TestCase.runBare(TestCase.java:125)
at junit.framework.TestResult$1.protect(TestResult.java:106)
at junit.framework.TestResult.runProtected(TestResult.java:124)
at junit.framework.TestResult.run(TestResult.java:109)
at junit.framework.TestCase.run(TestCase.java:118)
at junit.framework.TestSuite.runTest(TestSuite.java:208)
at junit.framework.TestSuite.run(TestSuite.java:203)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.run(JUnitTestRunner.java:422)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.launch(JUnitTestRunner.java:931)
at 
org.apache.tools.ant.taskdefs.optional.junit.JUnitTestRunner.main(JUnitTestRunner.java:785)
</error>
  </testcase>
  <system-out><![CDATA[Running metastore!
]]></system-out>
  <system-err><![CDATA[]]></system-err>
</testsuite>


From eclipse:
Running metastore!
MetaException(message:hive.metastore.warehouse.dir is not set in the config or 
blank)
at org.apache.hadoop.hive.metastore.Warehouse.<init>(Warehouse.java:58)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:155)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:125)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:1965)
at 
org.apache.hadoop.hive.metastore.TestHiveMetaStoreRemote$RunMS.run(TestHiveMetaStoreRemote.java:39)
at java.lang.Thread.run(Thread.java:619)
10/07/30 16:03:22 ERROR metastore.HiveMetaStore: Metastore Thrift Server threw 
an exception. Exiting...
10/07/30 16:03:22 ERROR metastore.HiveMetaStore: 
MetaException(message:hive.metastore.warehouse.dir is not set in the config or 
blank)
at org.apache.hadoop.hive.metastore.Warehouse.<init>(Warehouse.java:58)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:155)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:125)
at 
org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:1965)
at 
org.apache.hadoop.hive.metastore.TestHiveMetaStoreRemote$RunMS.run(TestHiveMetaStoreRemote.java:39)
at java.lang.Thread.run(Thread.java:619)
{noformat}
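For reference, a minimal hive-site.xml fragment supplying the property the second trace complains about; the path shown is only the conventional example value, not anything mandated by the test setup:

```xml
<!-- hive-site.xml: the metastore aborts if this resolves to empty -->
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
</property>
```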

 Extend ivy offline mode to cover metastore downloads
 

 Key: HIVE-1441
 URL: https://issues.apache.org/jira/browse/HIVE-1441
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Build Infrastructure
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: John Sichi
 Fix For: 0.7.0

 Attachments: HIVE-1441.1.patch


 We recently started downloading datanucleus jars via ivy, and the existing 
 ivy offilne mode doesn't cover this, so we still end up trying to contact the 
 ivy repository even with offline mode enabled.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1294) HIVE_AUX_JARS_PATH interferes with startup of Hive Web Interface

2010-07-29 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1294:
--

Priority: Blocker  (was: Minor)

 HIVE_AUX_JARS_PATH interferes with startup of Hive Web Interface
 

 Key: HIVE-1294
 URL: https://issues.apache.org/jira/browse/HIVE-1294
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Web UI
Affects Versions: 0.5.0
Reporter: Dilip Joseph
Assignee: Edward Capriolo
Priority: Blocker

 The Hive Webserver fails to startup with the following error message, if 
 HIVE_AUX_JARS_PATH environment variable is set (works fine if unset).   
 $ build/dist/bin/hive --service hwi
 Exception in thread "main" java.io.IOException: Error opening job jar: 
 -libjars
at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
 Caused by: java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:114)
at java.util.jar.JarFile.<init>(JarFile.java:133)
at java.util.jar.JarFile.<init>(JarFile.java:70)
at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
 Slightly modifying the command line to launch hadoop in hwi.sh solves the 
 problem:
 $ diff bin/ext/hwi.sh /tmp/new-hwi.sh
 28c28
 < exec $HADOOP jar $AUX_JARS_CMD_LINE ${HWI_JAR_FILE} $CLASS $HIVE_OPTS "$@"
 ---
 > exec $HADOOP jar ${HWI_JAR_FILE} $CLASS $AUX_JARS_CMD_LINE $HIVE_OPTS "$@"

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HIVE-1294) HIVE_AUX_JARS_PATH interferes with startup of Hive Web Interface

2010-07-29 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo reassigned HIVE-1294:
-

Assignee: Edward Capriolo

 HIVE_AUX_JARS_PATH interferes with startup of Hive Web Interface
 

 Key: HIVE-1294
 URL: https://issues.apache.org/jira/browse/HIVE-1294
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Web UI
Affects Versions: 0.5.0
Reporter: Dilip Joseph
Assignee: Edward Capriolo
Priority: Minor

 The Hive Webserver fails to startup with the following error message, if 
 HIVE_AUX_JARS_PATH environment variable is set (works fine if unset).   
 $ build/dist/bin/hive --service hwi
 Exception in thread "main" java.io.IOException: Error opening job jar: 
 -libjars
at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
 Caused by: java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:114)
at java.util.jar.JarFile.<init>(JarFile.java:133)
at java.util.jar.JarFile.<init>(JarFile.java:70)
at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
 Slightly modifying the command line to launch hadoop in hwi.sh solves the 
 problem:
 $ diff bin/ext/hwi.sh /tmp/new-hwi.sh
 28c28
 < exec $HADOOP jar $AUX_JARS_CMD_LINE ${HWI_JAR_FILE} $CLASS $HIVE_OPTS "$@"
 ---
 > exec $HADOOP jar ${HWI_JAR_FILE} $CLASS $AUX_JARS_CMD_LINE $HIVE_OPTS "$@"

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1294) HIVE_AUX_JARS_PATH interferes with startup of Hive Web Interface

2010-07-29 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1294:
--

Attachment: hive-1294.patch.txt

 HIVE_AUX_JARS_PATH interferes with startup of Hive Web Interface
 

 Key: HIVE-1294
 URL: https://issues.apache.org/jira/browse/HIVE-1294
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Web UI
Affects Versions: 0.5.0
Reporter: Dilip Joseph
Assignee: Edward Capriolo
Priority: Blocker
 Attachments: hive-1294.patch.txt


 The Hive Webserver fails to startup with the following error message, if 
 HIVE_AUX_JARS_PATH environment variable is set (works fine if unset).   
 $ build/dist/bin/hive --service hwi
 Exception in thread "main" java.io.IOException: Error opening job jar: 
 -libjars
at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
 Caused by: java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:114)
at java.util.jar.JarFile.<init>(JarFile.java:133)
at java.util.jar.JarFile.<init>(JarFile.java:70)
at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
 Slightly modifying the command line to launch hadoop in hwi.sh solves the 
 problem:
 $ diff bin/ext/hwi.sh /tmp/new-hwi.sh
 28c28
 < exec $HADOOP jar $AUX_JARS_CMD_LINE ${HWI_JAR_FILE} $CLASS $HIVE_OPTS "$@"
 ---
 > exec $HADOOP jar ${HWI_JAR_FILE} $CLASS $AUX_JARS_CMD_LINE $HIVE_OPTS "$@"

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1294) HIVE_AUX_JARS_PATH interferes with startup of Hive Web Interface

2010-07-29 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1294:
--

   Status: Patch Available  (was: Open)
Fix Version/s: 0.6.0

HWI does not start correctly without this patch.

 HIVE_AUX_JARS_PATH interferes with startup of Hive Web Interface
 

 Key: HIVE-1294
 URL: https://issues.apache.org/jira/browse/HIVE-1294
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Web UI
Affects Versions: 0.5.0
Reporter: Dilip Joseph
Assignee: Edward Capriolo
Priority: Blocker
 Fix For: 0.6.0

 Attachments: hive-1294.patch.txt


 The Hive Webserver fails to startup with the following error message, if 
 HIVE_AUX_JARS_PATH environment variable is set (works fine if unset).   
 $ build/dist/bin/hive --service hwi
 Exception in thread "main" java.io.IOException: Error opening job jar: 
 -libjars
at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
 Caused by: java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:114)
at java.util.jar.JarFile.<init>(JarFile.java:133)
at java.util.jar.JarFile.<init>(JarFile.java:70)
at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
 Slightly modifying the command line to launch hadoop in hwi.sh solves the 
 problem:
 $ diff bin/ext/hwi.sh /tmp/new-hwi.sh
 28c28
 < exec $HADOOP jar $AUX_JARS_CMD_LINE ${HWI_JAR_FILE} $CLASS $HIVE_OPTS "$@"
 ---
 > exec $HADOOP jar ${HWI_JAR_FILE} $CLASS $AUX_JARS_CMD_LINE $HIVE_OPTS "$@"

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1492) FileSinkOperator should remove duplicated files from the same task based on file sizes

2010-07-29 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12893772#action_12893772
 ] 

Edward Capriolo commented on HIVE-1492:
---

> the largest file is the correct file

Is that generally true, or an absolute fact?

 FileSinkOperator should remove duplicated files from the same task based on 
 file sizes
 --

 Key: HIVE-1492
 URL: https://issues.apache.org/jira/browse/HIVE-1492
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Ning Zhang
Assignee: Ning Zhang
 Fix For: 0.7.0

 Attachments: HIVE-1492.patch, HIVE-1492_branch-0.6.patch


 FileSinkOperator.jobClose() calls Utilities.removeTempOrDuplicateFiles() to 
 retain only one file for each task. A task could produce multiple files due 
 to failed attempts or speculative runs. The largest file should be retained 
 rather than the first file for each task. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1434) Cassandra Storage Handler

2010-07-28 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1434:
--

Attachment: cas-handle.tar.gz

This is not a quality patch yet. I am still experimenting with some ideas. 
Everything is free-form and will likely change before the final patch. There are 
a few junk files (HiveIColumn, etc.) which will not be part of the release.
Thus far:
CassandraSplit.java
HiveCassandraTableInputFormat.java
CassandraSerDe.java
TestColumnFamilyInputFormat.java
TestCassandraPut.java
TestColumnFamilyInputFormat.java

Are working and can give you an idea of where the code is going.

 Cassandra Storage Handler
 -

 Key: HIVE-1434
 URL: https://issues.apache.org/jira/browse/HIVE-1434
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: cas-handle.tar.gz, hive-1434-1.txt


 Add a cassandra storage handler.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1414) automatically invoke .hiverc init script

2010-07-21 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1414:
--

Attachment: hive-1414-2.txt

New version only reads hiverc if -i option is not specified. Includes xdocs.

 automatically invoke .hiverc init script
 

 Key: HIVE-1414
 URL: https://issues.apache.org/jira/browse/HIVE-1414
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Clients
Affects Versions: 0.5.0
Reporter: John Sichi
Assignee: Edward Capriolo
 Fix For: 0.7.0

 Attachments: hive-1414-2.txt, hive-1414-patch-1.txt


 Similar to .bashrc but run Hive SQL commands.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1414) automatically invoke .hiverc init script

2010-07-21 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1414:
--

Status: Patch Available  (was: Open)

 automatically invoke .hiverc init script
 

 Key: HIVE-1414
 URL: https://issues.apache.org/jira/browse/HIVE-1414
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Clients
Affects Versions: 0.5.0
Reporter: John Sichi
Assignee: Edward Capriolo
 Fix For: 0.7.0

 Attachments: hive-1414-2.txt, hive-1414-patch-1.txt


 Similar to .bashrc but run Hive SQL commands.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-471) A UDF for simple reflection

2010-07-14 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-471:
-

Attachment: HIVE-471.4.patch

 A UDF for simple reflection
 ---

 Key: HIVE-471
 URL: https://issues.apache.org/jira/browse/HIVE-471
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.5.1
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Minor
 Attachments: hive-471-gen.diff, HIVE-471.1.patch, HIVE-471.2.patch, 
 HIVE-471.3.patch, HIVE-471.4.patch, hive-471.diff


 There are many methods in java that are static and have no arguments or can 
 be invoked with one simple parameter. More complicated functions will require 
 a UDF but one generic one can work as a poor-mans UDF.
 {noformat}
 SELECT reflect("java.lang.String", "valueOf", 1), reflect("java.lang.String", 
 "isEmpty")
 FROM src LIMIT 1;
 {noformat}
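A hedged sketch of what a reflect()-style UDF does internally: resolve the class and method by name, then invoke. The class name and the naive type mapping below are mine; a real UDF must map Hive types to Java parameter types properly.

```java
import java.lang.reflect.Method;

public class ReflectSketch {
    // Invoke a static method located by class name, method name, and args.
    static Object call(String className, String methodName, Object... args)
            throws Exception {
        Class<?>[] types = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            // naive mapping: treat Integer arguments as primitive int
            types[i] = (args[i] instanceof Integer) ? int.class : args[i].getClass();
        }
        Method m = Class.forName(className).getMethod(methodName, types);
        return m.invoke(null, args);  // null receiver: static methods only
    }

    public static void main(String[] args) throws Exception {
        System.out.println(call("java.lang.String", "valueOf", 1));  // prints 1
    }
}
```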

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-471) A UDF for simple reflection

2010-07-14 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-471:
-

Status: Patch Available  (was: Open)

 A UDF for simple reflection
 ---

 Key: HIVE-471
 URL: https://issues.apache.org/jira/browse/HIVE-471
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.5.1
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Minor
 Attachments: hive-471-gen.diff, HIVE-471.1.patch, HIVE-471.2.patch, 
 HIVE-471.3.patch, HIVE-471.4.patch, hive-471.diff


 There are many methods in java that are static and have no arguments or can 
 be invoked with one simple parameter. More complicated functions will require 
 a UDF but one generic one can work as a poor-mans UDF.
 {noformat}
 SELECT reflect("java.lang.String", "valueOf", 1), reflect("java.lang.String", 
 "isEmpty")
 FROM src LIMIT 1;
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1446) Move Hive Documentation from the wiki to version control

2010-07-13 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1446:
--

Attachment: hive-1446-part-1.diff

Got most of the language manual

 Move Hive Documentation from the wiki to version control
 

 Key: HIVE-1446
 URL: https://issues.apache.org/jira/browse/HIVE-1446
 Project: Hadoop Hive
  Issue Type: Task
  Components: Documentation
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Fix For: 0.6.0, 0.7.0

 Attachments: hive-1446-part-1.diff, hive-1446.diff, hive-logo-wide.png


 Move the Hive Language Manual (and possibly some other documents) from the 
 Hive wiki to version control. This work needs to be coordinated with the 
 hive-dev and hive-user community in order to avoid missing any edits as well 
 as to avoid or limit unavailability of the docs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1096) Hive Variables

2010-07-09 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886912#action_12886912
 ] 

Edward Capriolo commented on HIVE-1096:
---

I am having trouble uploading with the update diff function of the review 
board.  As I mentioned several times, I really had one simple requirement

{noformat}
hive -hiveconf DAY=5 -e "LOAD DATA INFILE '/tmp/${DAY}' into logs 
partition=${DAY}"
{noformat}

I am all for doing things 100% correctly, but this is such a simple thing that I 
am really getting worn out by the endless revisions and the fancy extras added 
just because someone might want to do ${x${y}bla}. 
Really, I would like to get this ticket to +1 and get on with something more 
interesting.
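The simple requirement above amounts to one-level ${VAR} substitution driven by -hiveconf definitions. A hedged sketch; the class and method names are invented, and unresolved names are left intact rather than raising an error:

```java
import java.util.*;
import java.util.regex.*;

public class VarSubst {
    // Replace each ${NAME} with its value from the conf map, if present.
    static String substitute(String query, Map<String, String> conf) {
        Matcher m = Pattern.compile("\\$\\{(\\w+)\\}").matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = conf.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(substitute("LOAD DATA INFILE '/tmp/${DAY}'",
                                      Map.of("DAY", "5")));
        // prints: LOAD DATA INFILE '/tmp/5'
    }
}
```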

 Hive Variables
 --

 Key: HIVE-1096
 URL: https://issues.apache.org/jira/browse/HIVE-1096
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0, 0.7.0

 Attachments: 1096-9.diff, hive-1096-10-patch.txt, 
 hive-1096-11-patch.txt, hive-1096-12.patch.txt, hive-1096-2.diff, 
 hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff


 From mailing list:
 --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
 called Variables. Basically you can define a variable via command-line 
 while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
 ${DT} within the hive queries. This could be extremely useful. I can't seem 
 to find this feature even on trunk. Is this feature currently anywhere in the 
 roadmap?--
 This could be implemented in many places.
 A simple place to put this is 
 in Driver.compile or Driver.run we can do string substitutions at that level, 
 and further downstream need not be effected. 
 There could be some benefits to doing this further downstream, parser,plan. 
 but based on the simple needs we may not need to overthink this.
 I will get started on implementing in compile unless someone wants to discuss 
 this more.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1096) Hive Variables

2010-07-08 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1096:
--

Status: Patch Available  (was: Open)

 Hive Variables
 --

 Key: HIVE-1096
 URL: https://issues.apache.org/jira/browse/HIVE-1096
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0, 0.7.0

 Attachments: 1096-9.diff, hive-1096-10-patch.txt, 
 hive-1096-11-patch.txt, hive-1096-12.patch.txt, hive-1096-2.diff, 
 hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff


 From mailing list:
 --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
 called Variables. Basically you can define a variable via command-line 
 while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
 ${DT} within the hive queries. This could be extremely useful. I can't seem 
 to find this feature even on trunk. Is this feature currently anywhere in the 
 roadmap?--
 This could be implemented in many places.
 A simple place to put this is 
 in Driver.compile or Driver.run we can do string substitutions at that level, 
 and further downstream need not be effected. 
 There could be some benefits to doing this further downstream, parser,plan. 
 but based on the simple needs we may not need to overthink this.
 I will get started on implementing in compile unless someone wants to discuss 
 this more.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1096) Hive Variables

2010-07-08 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1096:
--

Attachment: hive-1096-12.patch.txt

Changed interpolate to substitute. Added the substitution logic to the file, 
dfs, and set commands, and to the query processor.

 Hive Variables
 --

 Key: HIVE-1096
 URL: https://issues.apache.org/jira/browse/HIVE-1096
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0, 0.7.0

 Attachments: 1096-9.diff, hive-1096-10-patch.txt, 
 hive-1096-11-patch.txt, hive-1096-12.patch.txt, hive-1096-2.diff, 
 hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff


 From mailing list:
 --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
 called Variables. Basically you can define a variable via command-line 
 while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
 ${DT} within the hive queries. This could be extremely useful. I can't seem 
 to find this feature even on trunk. Is this feature currently anywhere in the 
 roadmap?--
 This could be implemented in many places.
 A simple place to put this is 
 in Driver.compile or Driver.run we can do string substitutions at that level, 
 and further downstream need not be effected. 
 There could be some benefits to doing this further downstream, parser,plan. 
 but based on the simple needs we may not need to overthink this.
 I will get started on implementing in compile unless someone wants to discuss 
 this more.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1434) Cassandra Storage Handler

2010-07-01 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884384#action_12884384
 ] 

Edward Capriolo commented on HIVE-1434:
---

I actually got pretty far with this simply by duplicating the logic in the HBase 
storage handler. Unfortunately I hit a snafu: Cassandra is not using the 
deprecated mapred.* API; their input format uses mapreduce.*. I have seen a few 
tickets for this, and as far as I know Hive is 100% mapred. So to get this done 
we either have to wait until Hive is converted to mapreduce, or I have to write 
an old-school mapred-based input format for Cassandra. 

@John, am I wrong? Is there a way to work with mapreduce input formats that I am 
not understanding?
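A toy illustration of the gap: both Hadoop interfaces are reduced here to invented single-method forms (the real ones carry more methods, types, and checked exceptions), but the shape is the same — the old mapred-style contract returns an array, the new mapreduce-style one returns a list, and a shim can delegate between them.

```java
import java.util.Arrays;
import java.util.List;

public class ApiShim {
    interface NewStyle { List<String> getSplits(); }      // mapreduce.* shape
    interface OldStyle { String[] getSplits(int hint); }  // mapred.* shape

    // Wrap a new-API implementation so old-API callers can consume it.
    static OldStyle adapt(NewStyle inner) {
        return hint -> inner.getSplits().toArray(new String[0]);
    }

    public static void main(String[] args) {
        NewStyle cassandraLike = () -> Arrays.asList("split-0", "split-1");
        System.out.println(Arrays.toString(adapt(cassandraLike).getSplits(1)));
        // prints [split-0, split-1]
    }
}
```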



 Cassandra Storage Handler
 -

 Key: HIVE-1434
 URL: https://issues.apache.org/jira/browse/HIVE-1434
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: hive-1434-1.txt


 Add a cassandra storage handler.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1135) Use Anakia for version controlled documentation

2010-06-30 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884141#action_12884141
 ] 

Edward Capriolo commented on HIVE-1135:
---

Carl, thank you for the assist!

 Use Anakia for version controlled documentation
 ---

 Key: HIVE-1135
 URL: https://issues.apache.org/jira/browse/HIVE-1135
 Project: Hadoop Hive
  Issue Type: Task
  Components: Documentation
Affects Versions: 0.5.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: hive-1135-3-patch.txt, hive-1135-4-patch.txt, 
 hive-1135-5-patch.txt, hive-1135-6-patch.txt, hive-1335-1.patch.txt, 
 hive-1335-2.patch.txt, jdom-1.1.jar, jdom-1.1.LICENSE, wtf.png


 Currently the Hive Language Manual and many other critical pieces of 
 documentation are on the Hive wiki. 
 Right now we count on the author of a patch to follow up and add wiki 
 entries. While we do a decent job with this, new features can be missed, or 
 users running older/newer branches cannot locate relevant documentation for 
 their branch. 
 ..an example of a perception I do not think we want to give off...
 http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
 We should generate our documentation the way hadoop and hbase do, inline 
 using forest. I would like to take the lead on this, but we need a lot of 
 consensus on doing this properly. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1446) Move Hive Documentation from the wiki to version control

2010-06-30 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12884142#action_12884142
 ] 

Edward Capriolo commented on HIVE-1446:
---

I will make an xdoc for the CLI page.

 Move Hive Documentation from the wiki to version control
 

 Key: HIVE-1446
 URL: https://issues.apache.org/jira/browse/HIVE-1446
 Project: Hadoop Hive
  Issue Type: Task
  Components: Documentation
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Fix For: 0.6.0, 0.7.0


 Move the Hive Language Manual (and possibly some other documents) from the 
 Hive wiki to version control. This work needs to be coordinated with the 
 hive-dev and hive-user community in order to avoid missing any edits as well 
 as to avoid or limit unavailability of the docs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1446) Move Hive Documentation from the wiki to version control

2010-06-30 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1446:
--

Attachment: hive-logo-wide.png

We need this wide logo to fix the alignment of the generated docs.

 Move Hive Documentation from the wiki to version control
 

 Key: HIVE-1446
 URL: https://issues.apache.org/jira/browse/HIVE-1446
 Project: Hadoop Hive
  Issue Type: Task
  Components: Documentation
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Fix For: 0.6.0, 0.7.0

 Attachments: hive-logo-wide.png


 Move the Hive Language Manual (and possibly some other documents) from the 
 Hive wiki to version control. This work needs to be coordinated with the 
 hive-dev and hive-user community in order to avoid missing any edits as well 
 as to avoid or limit unavailability of the docs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-1446) Move Hive Documentation from the wiki to version control

2010-06-30 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1446:
--

Attachment: hive-1446.diff

Includes the image in the vsl to fix the alignment

 Move Hive Documentation from the wiki to version control
 

 Key: HIVE-1446
 URL: https://issues.apache.org/jira/browse/HIVE-1446
 Project: Hadoop Hive
  Issue Type: Task
  Components: Documentation
Reporter: Carl Steinbach
Assignee: Carl Steinbach
 Fix For: 0.6.0, 0.7.0

 Attachments: hive-1446.diff, hive-logo-wide.png


 Move the Hive Language Manual (and possibly some other documents) from the 
 Hive wiki to version control. This work needs to be coordinated with the 
 hive-dev and hive-user community in order to avoid missing any edits as well 
 as to avoid or limit unavailability of the docs.




[jira] Updated: (HIVE-1434) Cassandra Storage Handler

2010-06-29 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1434:
--

Attachment: hive-1434-1.txt

Just a start. (To prove that I am doing something with this ticket)

 Cassandra Storage Handler
 -

 Key: HIVE-1434
 URL: https://issues.apache.org/jira/browse/HIVE-1434
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: hive-1434-1.txt


 Add a cassandra storage handler.




[jira] Commented: (HIVE-1135) Use Anakia for version controlled documentation

2010-06-24 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882349#action_12882349
 ] 

Edward Capriolo commented on HIVE-1135:
---

Bump: I will fix the formatting later. Can we commit this? We do not really 
need any unit tests here.

 Use Anakia for version controlled documentation
 ---

 Key: HIVE-1135
 URL: https://issues.apache.org/jira/browse/HIVE-1135
 Project: Hadoop Hive
  Issue Type: Task
  Components: Documentation
Affects Versions: 0.5.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: hive-1135-3-patch.txt, hive-1135-4-patch.txt, 
 hive-1135-5-patch.txt, hive-1135-6-patch.txt, hive-1335-1.patch.txt, 
 hive-1335-2.patch.txt, jdom-1.1.jar, jdom-1.1.LICENSE, wtf.png


 Currently the Hive Language Manual and many other critical pieces of 
 documentation are on the Hive wiki. 
 Right now we count on the author of a patch to follow up and add wiki 
 entries. While we do a decent job with this, new features can be missed. Or 
 users running older/newer branches cannot locate relevant documentation for 
 their branch. 
 ..example of a perception I do not think we want to give off...
 http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
 We should generate our documentation in the way hadoop & hbase do, inline 
 using forest. I would like to take the lead on this, but we need a lot of 
 consensus on doing this properly. 




[jira] Created: (HIVE-1434) Cassandra Storage Handler

2010-06-24 Thread Edward Capriolo (JIRA)
Cassandra Storage Handler
-

 Key: HIVE-1434
 URL: https://issues.apache.org/jira/browse/HIVE-1434
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Edward Capriolo
Assignee: Edward Capriolo


Add a cassandra storage handler.




[jira] Updated: (HIVE-1096) Hive Variables

2010-06-23 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1096:
--

Status: Patch Available  (was: Open)

 Hive Variables
 --

 Key: HIVE-1096
 URL: https://issues.apache.org/jira/browse/HIVE-1096
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.5.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: 1096-9.diff, hive-1096-10-patch.txt, hive-1096-2.diff, 
 hive-1096-7.diff, hive-1096-8.diff, hive-1096.diff


 From mailing list:
 --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
 called Variables. Basically you can define a variable via command-line 
 while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
 ${DT} within the hive queries. This could be extremely useful. I can't seem 
 to find this feature even on trunk. Is this feature currently anywhere in the 
 roadmap?--
 This could be implemented in many places.
 A simple place to put this is 
 in Driver.compile or Driver.run we can do string substitutions at that level, 
 and further downstream need not be affected. 
 There could be some benefits to doing this further downstream (parser, plan), 
 but based on the simple needs we may not need to overthink this.
 I will get started on implementing in compile unless someone wants to discuss 
 this more.
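The substitution described above amounts to a plain string pass over the query text before compilation. The following is a hypothetical sketch only — the class and method names are invented and this is not Hive's actual Driver code:

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (not Hive's actual code) of ${var} substitution
// performed on the query string before it reaches the parser.
public class VariableSubstitution {
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replace each ${name} with its value; unknown variables are left as-is.
    public static String substitute(String query, Map<String, String> vars) {
        Matcher m = VAR.matcher(query);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = vars.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(substitute("SELECT * FROM logs WHERE ds = '${DT}'",
                Map.of("DT", "2009-12-09")));
    }
}
```

With this approach, invoking the CLI with -d DT=2009-12-09 would populate the map, and the rewritten text is what the compiler sees, so nothing downstream changes.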




[jira] Updated: (HIVE-1096) Hive Variables

2010-06-23 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1096:
--

Attachment: hive-1096-11-patch.txt

Was not interpolating system:vars. Fixed with better test case.

 Hive Variables
 --

 Key: HIVE-1096
 URL: https://issues.apache.org/jira/browse/HIVE-1096
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0, 0.7.0

 Attachments: 1096-9.diff, hive-1096-10-patch.txt, 
 hive-1096-11-patch.txt, hive-1096-2.diff, hive-1096-7.diff, hive-1096-8.diff, 
 hive-1096.diff


 From mailing list:
 --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
 called Variables. Basically you can define a variable via command-line 
 while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
 ${DT} within the hive queries. This could be extremely useful. I can't seem 
 to find this feature even on trunk. Is this feature currently anywhere in the 
 roadmap?--
 This could be implemented in many places.
 A simple place to put this is 
 in Driver.compile or Driver.run we can do string substitutions at that level, 
 and further downstream need not be affected. 
 There could be some benefits to doing this further downstream (parser, plan), 
 but based on the simple needs we may not need to overthink this.
 I will get started on implementing in compile unless someone wants to discuss 
 this more.




[jira] Commented: (HIVE-1431) Hive CLI can't handle query files that begin with comments

2010-06-23 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12882024#action_12882024
 ] 

Edward Capriolo commented on HIVE-1431:
---

We have a few tickets open; we really need to move all this stuff to a real 
parser so we can properly deal with things like ';' or comments like this. It 
is painfully hard to work around all these types of things, and we never get 
to the root of the problem.
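As a rough illustration of what such pre-parsing has to handle, here is a hypothetical splitter that strips `--` line comments and only treats an unquoted ';' as a statement terminator (invented names, not Hive code):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split a script into statements, skipping "--" line
// comments and ignoring ';' inside single-quoted strings.
public class ScriptSplitter {
    public static List<String> split(String script) {
        List<String> stmts = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuote = false, inComment = false;
        for (int i = 0; i < script.length(); i++) {
            char c = script.charAt(i);
            if (inComment) {               // swallow the rest of the comment line
                if (c == '\n') inComment = false;
                continue;
            }
            if (c == '\'') inQuote = !inQuote;
            if (!inQuote && c == '-' && i + 1 < script.length()
                    && script.charAt(i + 1) == '-') {
                inComment = true;          // start of a "--" comment
                i++;
                continue;
            }
            if (c == ';' && !inQuote) {    // unquoted ';' ends a statement
                if (!cur.toString().trim().isEmpty()) stmts.add(cur.toString().trim());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        if (!cur.toString().trim().isEmpty()) stmts.add(cur.toString().trim());
        return stmts;
    }

    public static void main(String[] args) {
        System.out.println(split("-- a comment\nset -v;\nshow tables;"));
    }
}
```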

 Hive CLI can't handle query files that begin with comments
 --

 Key: HIVE-1431
 URL: https://issues.apache.org/jira/browse/HIVE-1431
 Project: Hadoop Hive
  Issue Type: Bug
  Components: CLI
Reporter: Carl Steinbach
 Fix For: 0.6.0, 0.7.0


 {code}
 % cat test.q
 -- This is a comment, followed by a command
 set -v;
 -- 
 -- Another comment
 --
 show tables;
 -- Last comment
 (master) [ ~/Projects/hive ]
 % hive < test.q
 Hive history file=/tmp/carl/hive_job_log_carl_201006231606_1140875653.txt
 hive> -- This is a comment, followed by a command
  set -v;
 FAILED: Parse Error: line 2:0 cannot recognize input 'set'
 hive> -- 
  -- Another comment
  --
  show tables;
 OK
 rawchunks
 Time taken: 5.334 seconds
 hive> -- Last comment
  (master) [ ~/Projects/hive ]
 % 
 {code}




[jira] Commented: (HIVE-1419) Policy on deserialization errors

2010-06-21 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880908#action_12880908
 ] 

Edward Capriolo commented on HIVE-1419:
---

I am looking through this and trying to wrap my head around it. Offhand, do you 
know what happens in this situation:

We have a table that we have added columns to over time

create table tab (a int, b int);

Over time we have added more columns

alter table tab add columns (c int);

This works fine for us, as selecting column c on older data returns null for 
that column. Will this behaviour be preserved?
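The null-for-missing-columns behavior described above boils down to padding rows that have fewer fields than the current schema. A hypothetical sketch, assuming Hive's default ^A field delimiter (names invented):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: a row written before "alter table ... add columns"
// has fewer fields than the current schema; the missing trailing columns
// come back as null, which is what SELECTing the new column shows.
public class RowPadding {
    public static List<String> pad(String line, int schemaWidth) {
        // "\u0001" (^A) is Hive's default field delimiter
        List<String> fields = new ArrayList<>(Arrays.asList(line.split("\u0001", -1)));
        while (fields.size() < schemaWidth) {
            fields.add(null); // column added after this row was written
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(pad("1\u00012", 3));
    }
}
```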

 Policy on deserialization errors
 

 Key: HIVE-1419
 URL: https://issues.apache.org/jira/browse/HIVE-1419
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Serializers/Deserializers
Affects Versions: 0.5.0
Reporter: Vladimir Klimontovich
Assignee: Vladimir Klimontovich
Priority: Minor
 Fix For: 0.5.1, 0.6.0

 Attachments: corrupted_records_0.5.patch, 
 corrupted_records_0.5_ver2.patch, corrupted_records_trunk.patch, 
 corrupted_records_trunk_ver2.patch


 When the deserializer throws an exception the whole map task fails (see 
 MapOperator.java). This is not always a convenient behavior, especially on 
 huge datasets where several corrupted lines can be normal practice. 
 Proposed solution:
 1) Have a counter of corrupted records.
 2) When the counter exceeds a limit (configurable via the 
 hive.max.deserializer.errors property, 0 by default) throw an exception. 
 Otherwise just log the exception at WARN level.
 Patches for 0.5 branch and trunk are attached
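The proposed counter policy can be sketched as a small helper object; the names here are invented for illustration (the real patch wires this logic into MapOperator):

```java
// Hypothetical sketch of the proposed policy: tolerate corrupted records
// up to a configurable limit before failing the task.
public class DeserializeErrorPolicy {
    private final long maxErrors; // e.g. from hive.max.deserializer.errors
    private long errorCount = 0;

    public DeserializeErrorPolicy(long maxErrors) {
        this.maxErrors = maxErrors;
    }

    // Returns true if processing may continue after a corrupted record;
    // false means the caller should throw and fail the task.
    public boolean onCorruptRecord() {
        errorCount++;
        if (errorCount > maxErrors) {
            return false;
        }
        // below the limit: just log a WARN and skip the record
        return true;
    }

    public static void main(String[] args) {
        DeserializeErrorPolicy p = new DeserializeErrorPolicy(0);
        // with the default limit of 0, the very first error fails the task
        System.out.println(p.onCorruptRecord());
    }
}
```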




[jira] Commented: (HIVE-1405) Implement a .hiverc startup file

2010-06-18 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880207#action_12880207
 ] 

Edward Capriolo commented on HIVE-1405:
---

I was thinking we just look for hive_rc in the user's home directory and/or in 
hive_home/bin. If we find that file, we read it line by line and process 
it just like other hive commands. We could restrict this to just set or add 
commands, but there is no reason it could not have a full query.
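The line-by-line processing described could start from a small pure helper like this hypothetical sketch (invented names; where the file is looked up and how each command is dispatched are assumptions):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: turn the lines of a .hiverc (read from the user's
// home directory and/or HIVE_HOME/bin) into commands to feed through the
// normal CLI command processor, skipping blank lines.
public class HiveRc {
    public static List<String> commandsFrom(List<String> rcLines) {
        List<String> commands = new ArrayList<>();
        for (String line : rcLines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                commands.add(trimmed); // could be "set ...", "add jar ...", or a full query
            }
        }
        return commands;
    }

    public static void main(String[] args) {
        System.out.println(commandsFrom(List.of(
                "add jar /tmp/udfs.jar;", "  ", "set hive.exec.mode.local.auto=true;")));
    }
}
```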

 Implement a .hiverc startup file
 

 Key: HIVE-1405
 URL: https://issues.apache.org/jira/browse/HIVE-1405
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Jonathan Chang
Assignee: John Sichi

 When deploying hive, it would be nice to have a .hiverc file containing 
 statements that would be automatically run whenever hive is launched.  This 
 way, we can automatically add JARs, create temporary functions, set flags, 
 etc. for all users quickly. 
 This should ideally be set up like .bashrc and the like with a global version 
 and a user-local version.




[jira] Commented: (HIVE-1135) Use Anakia for version controlled documentation

2010-06-18 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880303#action_12880303
 ] 

Edward Capriolo commented on HIVE-1135:
---

Great on ivy. 
As for the wiki, I think we should just put a note at the top of each migrated 
page that says Do not edit me. Edit xdocs instead. I 
want to do about a page every other day, so it should be done soon enough. I 
actually have commit access, but I usually leave the commits up to the experts. 
Also, since I worked on this ticket I really should not be the commit person. 
Anyone else?

 Use Anakia for version controlled documentation
 ---

 Key: HIVE-1135
 URL: https://issues.apache.org/jira/browse/HIVE-1135
 Project: Hadoop Hive
  Issue Type: Task
  Components: Documentation
Affects Versions: 0.5.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: hive-1135-3-patch.txt, hive-1135-4-patch.txt, 
 hive-1135-5-patch.txt, hive-1135-6-patch.txt, hive-1335-1.patch.txt, 
 hive-1335-2.patch.txt, jdom-1.1.jar, jdom-1.1.LICENSE, wtf.png


 Currently the Hive Language Manual and many other critical pieces of 
 documentation are on the Hive wiki. 
 Right now we count on the author of a patch to follow up and add wiki 
 entries. While we do a decent job with this, new features can be missed. Or 
 users running older/newer branches cannot locate relevant documentation for 
 their branch. 
 ..example of a perception I do not think we want to give off...
 http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
 We should generate our documentation in the way hadoop & hbase do, inline 
 using forest. I would like to take the lead on this, but we need a lot of 
 consensus on doing this properly. 




[jira] Updated: (HIVE-1414) automatically invoke .hiverc init script

2010-06-18 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1414:
--

Attachment: hive-1414-patch-1.txt

First attempt at patch.

 automatically invoke .hiverc init script
 

 Key: HIVE-1414
 URL: https://issues.apache.org/jira/browse/HIVE-1414
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Clients
Affects Versions: 0.5.0
Reporter: John Sichi
Assignee: Edward Capriolo
 Attachments: hive-1414-patch-1.txt


 Similar to .bashrc but run Hive SQL commands.




[jira] Commented: (HIVE-1414) automatically invoke .hiverc init script

2010-06-18 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880414#action_12880414
 ] 

Edward Capriolo commented on HIVE-1414:
---

Files automatically sourced by ql: env[HIVE_HOME]/bin/.hiverc and 
property(user.home)/.hiverc. I think only the CLI needs these features. Users 
of the hive service are accessing the session through code, so repetition is not a 
problem; the same is true with JDBC. CLI users get the most benefit from the 
.hiverc. What do you think?

 automatically invoke .hiverc init script
 

 Key: HIVE-1414
 URL: https://issues.apache.org/jira/browse/HIVE-1414
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Clients
Affects Versions: 0.5.0
Reporter: John Sichi
Assignee: Edward Capriolo
 Attachments: hive-1414-patch-1.txt


 Similar to .bashrc but run Hive SQL commands.




[jira] Commented: (HIVE-1405) Implement a .hiverc startup file

2010-06-17 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880028#action_12880028
 ] 

Edward Capriolo commented on HIVE-1405:
---

I like Carl's approach. The entire point of the hiverc is to not have to 
invoke anything explicit to add jars.

 Implement a .hiverc startup file
 

 Key: HIVE-1405
 URL: https://issues.apache.org/jira/browse/HIVE-1405
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Jonathan Chang
Assignee: John Sichi

 When deploying hive, it would be nice to have a .hiverc file containing 
 statements that would be automatically run whenever hive is launched.  This 
 way, we can automatically add JARs, create temporary functions, set flags, 
 etc. for all users quickly. 
 This should ideally be set up like .bashrc and the like with a global version 
 and a user-local version.




[jira] Commented: (HIVE-1405) Implement a .hiverc startup file

2010-06-17 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880053#action_12880053
 ] 

Edward Capriolo commented on HIVE-1405:
---

{noformat}
[edw...@ec dist]$ echo show tables > a.sql
[edw...@ec dist]$ bin/hive
[edw...@ec dist]$ chmod a+x a.sql 
[edw...@ec dist]$ bin/hive
Hive history file=/tmp/edward/hive_job_log_edward_201006172223_1189860304.txt
[edw...@ec dist]$ pwd
/mnt/data/hive/hive/build/dist
[edw...@ec dist]$ bin/hive
Hive history file=/tmp/edward/hive_job_log_edward_201006172223_310534855.txt
hive> ! /mnt/data/hive/hive/build/dist/a.sql;
/mnt/data/hive/hive/build/dist/a.sql: line 1: show: command not found
Command failed with exit code = 127
{noformat}

! seems to execute bash commands.

Don't we want to execute hive commands inside hive, like add jar?


 Implement a .hiverc startup file
 

 Key: HIVE-1405
 URL: https://issues.apache.org/jira/browse/HIVE-1405
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Jonathan Chang
Assignee: John Sichi

 When deploying hive, it would be nice to have a .hiverc file containing 
 statements that would be automatically run whenever hive is launched.  This 
 way, we can automatically add JARs, create temporary functions, set flags, 
 etc. for all users quickly. 
 This should ideally be set up like .bashrc and the like with a global version 
 and a user-local version.




[jira] Updated: (HIVE-1135) Use Anakia for version controlled documentation

2010-06-16 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1135:
--

Attachment: wtf.png

Cool on adding the logo. But something went wrong here. Unless I applied the 
patch incorrectly, the left table looks wrong now. Check the screen shot.

 Use Anakia for version controlled documentation
 ---

 Key: HIVE-1135
 URL: https://issues.apache.org/jira/browse/HIVE-1135
 Project: Hadoop Hive
  Issue Type: Task
  Components: Documentation
Affects Versions: 0.5.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: hive-1135-3-patch.txt, hive-1135-4-patch.txt, 
 hive-1335-1.patch.txt, hive-1335-2.patch.txt, jdom-1.1.jar, jdom-1.1.LICENSE, 
 wtf.png


 Currently the Hive Language Manual and many other critical pieces of 
 documentation are on the Hive wiki. 
 Right now we count on the author of a patch to follow up and add wiki 
 entries. While we do a decent job with this, new features can be missed. Or 
 users running older/newer branches cannot locate relevant documentation for 
 their branch. 
 ..example of a perception I do not think we want to give off...
 http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
 We should generate our documentation in the way hadoop & hbase do, inline 
 using forest. I would like to take the lead on this, but we need a lot of 
 consensus on doing this properly. 




[jira] Updated: (HIVE-1135) Move hive language manual and tutorial to version control

2010-06-15 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1135:
--

Attachment: hive-1135-3-patch.txt

Fixed all items.

 Move hive language manual and tutorial to version control
 -

 Key: HIVE-1135
 URL: https://issues.apache.org/jira/browse/HIVE-1135
 Project: Hadoop Hive
  Issue Type: Task
  Components: Documentation
Affects Versions: 0.5.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: hive-1135-3-patch.txt, hive-1335-1.patch.txt, 
 hive-1335-2.patch.txt, jdom-1.1.jar


 Currently the Hive Language Manual and many other critical pieces of 
 documentation are on the Hive wiki. 
 Right now we count on the author of a patch to follow up and add wiki 
 entries. While we do a decent job with this, new features can be missed. Or 
 users running older/newer branches cannot locate relevant documentation for 
 their branch. 
 ..example of a perception I do not think we want to give off...
 http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
 We should generate our documentation in the way hadoop & hbase do, inline 
 using forest. I would like to take the lead on this, but we need a lot of 
 consensus on doing this properly. 




[jira] Updated: (HIVE-1135) Move hive language manual and tutorial to version control

2010-06-15 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1135:
--

Attachment: jdom-1.1.LICENSE

 Move hive language manual and tutorial to version control
 -

 Key: HIVE-1135
 URL: https://issues.apache.org/jira/browse/HIVE-1135
 Project: Hadoop Hive
  Issue Type: Task
  Components: Documentation
Affects Versions: 0.5.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: hive-1135-3-patch.txt, hive-1335-1.patch.txt, 
 hive-1335-2.patch.txt, jdom-1.1.jar, jdom-1.1.LICENSE


 Currently the Hive Language Manual and many other critical pieces of 
 documentation are on the Hive wiki. 
 Right now we count on the author of a patch to follow up and add wiki 
 entries. While we do a decent job with this, new features can be missed. Or 
 users running older/newer branches cannot locate relevant documentation for 
 their branch. 
 ..example of a perception I do not think we want to give off...
 http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
 We should generate our documentation in the way hadoop & hbase do, inline 
 using forest. I would like to take the lead on this, but we need a lot of 
 consensus on doing this properly. 




[jira] Updated: (HIVE-1135) Move hive language manual and all wiki based documentation to forest

2010-06-14 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1135:
--

Attachment: hive-1335-2.patch.txt

Edited the build.xml a bit to deal with some incorrect paths. Also added the 
HiveDataDefinitionStatements wiki page for reference. Is everyone ok with using 
anakia and this structure? I would like to get this cleaned up with the current 
docs in place. Then I will do some follow-up tickets and add some wiki pages; 
let me know if everyone is happy with the overall xdocs-docs process.

 Move hive language manual and all wiki based documentation to forest
 

 Key: HIVE-1135
 URL: https://issues.apache.org/jira/browse/HIVE-1135
 Project: Hadoop Hive
  Issue Type: Task
  Components: Documentation
Affects Versions: 0.5.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: hive-1335-1.patch.txt, hive-1335-2.patch.txt, 
 jdom-1.1.jar


 Currently the Hive Language Manual and many other critical pieces of 
 documentation are on the Hive wiki. 
 Right now we count on the author of a patch to follow up and add wiki 
 entries. While we do a decent job with this, new features can be missed. Or 
 users running older/newer branches cannot locate relevant documentation for 
 their branch. 
 ..example of a perception I do not think we want to give off...
 http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
 We should generate our documentation in the way hadoop & hbase do, inline 
 using forest. I would like to take the lead on this, but we need a lot of 
 consensus on doing this properly. 




[jira] Commented: (HIVE-1135) Move hive language manual and all wiki based documentation to forest

2010-06-14 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878816#action_12878816
 ] 

Edward Capriolo commented on HIVE-1135:
---

FMI. What is the hbase review board? 

 Stash this stuff under docs/ instead of creating another top level directory 
 (xdocs/)
It seems like xdocs is convention, also did not want to step on whatever is in 
docs. I will see if they both can live in docs happily.

JDOM ivy. 

Right now most/all the stuff in /lib comes from ivy. We should open up 
another ticket and convert the entire project to ivy.

velocity.log
Yes my local version (next patch already fixes that)

 * Limit the initial import to the contents of the Hive Language Manual. I 
 think some things should actually stay on the wiki, but the language manual 
 is definitely one of those things that we want to have in VCS.

I agree the initial import should come from the Hive Language Manual only. To 
me wiki just screams, I did not have time to write a full complete doc. 
Generalization coming: 99% of the things in the wiki should be in xdocs. Users 
only want one place for authoritative information. Wikis and xdocs will fall out 
of sync, and confusion follows.

 Move hive language manual and all wiki based documentation to forest
 

 Key: HIVE-1135
 URL: https://issues.apache.org/jira/browse/HIVE-1135
 Project: Hadoop Hive
  Issue Type: Task
  Components: Documentation
Affects Versions: 0.5.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: hive-1335-1.patch.txt, hive-1335-2.patch.txt, 
 jdom-1.1.jar


 Currently the Hive Language Manual and many other critical pieces of 
 documentation are on the Hive wiki. 
 Right now we count on the author of a patch to follow up and add wiki 
 entries. While we do a decent job with this, new features can be missed. Or 
 users running older/newer branches cannot locate relevant documentation for 
 their branch. 
 ..example of a perception I do not think we want to give off...
 http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
 We should generate our documentation in the way hadoop & hbase do, inline 
 using forest. I would like to take the lead on this, but we need a lot of 
 consensus on doing this properly. 




[jira] Commented: (HIVE-1135) Move hive language manual and all wiki based documentation to forest

2010-06-14 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12878818#action_12878818
 ] 

Edward Capriolo commented on HIVE-1135:
---

JDOM ivy.

Right now most/all the stuff in /lib DOES NOT come from ivy. We should open up 
another ticket and convert the entire project to ivy.

 Move hive language manual and all wiki based documentation to forest
 

 Key: HIVE-1135
 URL: https://issues.apache.org/jira/browse/HIVE-1135
 Project: Hadoop Hive
  Issue Type: Task
  Components: Documentation
Affects Versions: 0.5.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: hive-1335-1.patch.txt, hive-1335-2.patch.txt, 
 jdom-1.1.jar


 Currently the Hive Language Manual and many other critical pieces of 
 documentation are on the Hive wiki. 
 Right now we count on the author of a patch to follow up and add wiki 
 entries. While we do a decent job with this, new features can be missed. Or 
 users running older/newer branches cannot locate relevant documentation for 
 their branch. 
 ..example of a perception I do not think we want to give off...
 http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
 We should generate our documentation in the way hadoop & hbase do, inline 
 using forest. I would like to take the lead on this, but we need a lot of 
 consensus on doing this properly. 




[jira] Updated: (HIVE-1135) Move hive language manual and all wiki based documentation to forest

2010-06-11 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1135:
--

Attachment: jdom-1.1.jar

 Move hive language manual and all wiki based documentation to forest
 

 Key: HIVE-1135
 URL: https://issues.apache.org/jira/browse/HIVE-1135
 Project: Hadoop Hive
  Issue Type: Task
  Components: Documentation
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: jdom-1.1.jar


 Currently the Hive Language Manual and many other critical pieces of 
 documentation are on the Hive wiki. 
 Right now we count on the author of a patch to follow up and add wiki 
 entries. While we do a decent job with this, new features can be missed. Or 
 users running older/newer branches cannot locate relevant documentation for 
 their branch. 
 ..example of a perception I do not think we want to give off...
 http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
 We should generate our documentation in the way hadoop & hbase do, inline 
 using forest. I would like to take the lead on this, but we need a lot of 
 consensus on doing this properly. 




[jira] Updated: (HIVE-1135) Move hive language manual and all wiki based documentation to forest

2010-06-11 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1135:
--

Attachment: hive-1335-1.patch.txt

Patch: the docs command begins anakia from the xdocs directory.

 Move hive language manual and all wiki based documentation to forest
 

 Key: HIVE-1135
 URL: https://issues.apache.org/jira/browse/HIVE-1135
 Project: Hadoop Hive
  Issue Type: Task
  Components: Documentation
Affects Versions: 0.5.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: hive-1335-1.patch.txt, jdom-1.1.jar


 Currently the Hive Language Manual and many other critical pieces of 
 documentation are on the Hive wiki. 
 Right now we count on the author of a patch to follow up and add wiki 
 entries. While we do a decent job with this, new features can be missed. Or 
 users running older/newer branches cannot locate relevant documentation for 
 their branch. 
 ..example of a perception I do not think we want to give off...
 http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
 We should generate our documentation in the way hadoop & hbase do, inline 
 using forest. I would like to take the lead on this, but we need a lot of 
 consensus on doing this properly. 




[jira] Updated: (HIVE-1135) Move hive language manual and all wiki based documentation to forest

2010-06-11 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1135:
--

Fix Version/s: 0.6.0
Affects Version/s: 0.5.0

 Move hive language manual and all wiki based documentation to forest
 

 Key: HIVE-1135
 URL: https://issues.apache.org/jira/browse/HIVE-1135
 Project: Hadoop Hive
  Issue Type: Task
  Components: Documentation
Affects Versions: 0.5.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: hive-1335-1.patch.txt, jdom-1.1.jar


 Currently the Hive Language Manual and many other critical pieces of 
 documentation are on the Hive wiki. 
 Right now we count on the author of a patch to follow up and add wiki 
 entries. While we do a decent job with this, new features can be missed, or 
 users running older/newer branches cannot locate relevant documentation for 
 their branch. 
 ..example of a perception I do not think we want to give off...
 http://dev.hubspot.com/bid/30170/Who-Loves-the-Magic-Undocumented-Hive-Mapjoin-This-Guy
 We should generate our documentation the way hadoop and hbase do, inline 
 using forest. I would like to take the lead on this, but we need a lot of 
 consensus on doing this properly. 




[jira] Created: (HIVE-1401) Web Interface can only browse default

2010-06-10 Thread Edward Capriolo (JIRA)
Web Interface can only browse default


 Key: HIVE-1401
 URL: https://issues.apache.org/jira/browse/HIVE-1401
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Web UI
Affects Versions: 0.5.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0







[jira] Updated: (HIVE-1401) Web Interface can only browse default

2010-06-10 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1401:
--

Attachment: HIVE-1401-1-patch.txt

 Web Interface can only browse default
 

 Key: HIVE-1401
 URL: https://issues.apache.org/jira/browse/HIVE-1401
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Web UI
Affects Versions: 0.5.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: HIVE-1401-1-patch.txt







[jira] Commented: (HIVE-1397) histogram() UDAF for a numerical column

2010-06-09 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12877220#action_12877220
 ] 

Edward Capriolo commented on HIVE-1397:
---

Looks great. Can't wait.

 histogram() UDAF for a numerical column
 ---

 Key: HIVE-1397
 URL: https://issues.apache.org/jira/browse/HIVE-1397
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: Mayank Lahiri
Assignee: Mayank Lahiri
 Fix For: 0.6.0


 A histogram() UDAF to generate an approximate histogram of a numerical (byte, 
 short, double, long, etc.) column. The result is returned as a map of (x,y) 
 histogram pairs, and can be plotted in Gnuplot using impulses (for example). 
 The algorithm is currently adapted from "A streaming parallel decision tree 
 algorithm" by Ben-Haim and Tom-Tov, JMLR 11 (2010), and uses space 
 proportional to the number of histogram bins specified. It has no 
 approximation guarantees, but seems to work well when there is a lot of data 
 and a large number (e.g. 50-100) of histogram bins specified.
 A typical call might be:
 SELECT histogram(val, 10) FROM some_table;
 where the result would be a histogram with 10 bins, returned as a Hive map 
 object.
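The closest-bin merging scheme from the Ben-Haim/Tom-Tov paper can be sketched in a few lines. This is an illustrative standalone version (class name and details are mine, not the UDAF's actual code): each value starts as its own bin, and whenever the bin budget is exceeded the two nearest bins are merged at their weighted-average center.

```java
import java.util.TreeMap;

public class StreamingHistogram {
    private final int maxBins;
    // bin center -> count, kept sorted so neighboring bins are adjacent keys
    private final TreeMap<Double, Long> bins = new TreeMap<>();

    public StreamingHistogram(int maxBins) { this.maxBins = maxBins; }

    // add a point as its own bin, then merge the two closest bins while over budget
    public void add(double x) {
        bins.merge(x, 1L, Long::sum);
        while (bins.size() > maxBins) {
            mergeClosest();
        }
    }

    private void mergeClosest() {
        Double prev = null, a = null, b = null;
        double best = Double.POSITIVE_INFINITY;
        for (double c : bins.keySet()) {
            if (prev != null && c - prev < best) { best = c - prev; a = prev; b = c; }
            prev = c;
        }
        long ca = bins.remove(a), cb = bins.remove(b);
        // weighted average keeps the merged bin's center and total mass consistent
        bins.put((a * ca + b * cb) / (ca + cb), ca + cb);
    }

    public TreeMap<Double, Long> bins() { return bins; }
}
```

The result maps naturally onto the (x,y) pairs the UDAF returns.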




[jira] Commented: (HIVE-1373) Missing connection pool plugin in Eclipse classpath

2010-06-07 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12876250#action_12876250
 ] 

Edward Capriolo commented on HIVE-1373:
---


{quote}
1 copy is anyway done from lib to dist/lib for these jars. If we go directly to 
ivy we would copy things from the ivy cache to dist/lib. So the number of 
copies in the build process
would remain the same, no? There is of course the first time overhead of 
downloading these jars from their repos to the ivy cache.
{quote}

I follow what you are thinking. Currently the code I did takes specific jars 
from the metastore ivy downloads. We could probably have ivy download directly to 
build/lib. I just think we should watch to make sure many unneeded jars do not 
appear.

 Missing connection pool plugin in Eclipse classpath
 ---

 Key: HIVE-1373
 URL: https://issues.apache.org/jira/browse/HIVE-1373
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
 Environment: Eclipse, Linux
Reporter: Vinithra Varadharajan
Assignee: Vinithra Varadharajan
 Attachments: HIVE-1373.patch


 In a recent checkin, connection pool dependency was introduced but eclipse 
 .classpath file was not updated.  This causes launch configurations from 
 within Eclipse to fail.
 {code}
 hive> show tables;
 show tables;
 10/05/26 14:59:46 INFO parse.ParseDriver: Parsing command: show tables
 10/05/26 14:59:46 INFO parse.ParseDriver: Parse Completed
 10/05/26 14:59:46 INFO ql.Driver: Semantic Analysis Completed
 10/05/26 14:59:46 INFO ql.Driver: Returning Hive schema: 
 Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from 
 deserializer)], properties:null)
 10/05/26 14:59:46 INFO ql.Driver: query plan = 
 file:/tmp/vinithra/hive_2010-05-26_14-59-46_058_1636674338194744357/queryplan.xml
 10/05/26 14:59:46 INFO ql.Driver: Starting command: show tables
 10/05/26 14:59:46 INFO metastore.HiveMetaStore: 0: Opening raw store with 
 implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
 10/05/26 14:59:46 INFO metastore.ObjectStore: ObjectStore, initialize called
 FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error 
 creating transactional connection factory
 NestedThrowables:
 java.lang.reflect.InvocationTargetException
 10/05/26 14:59:47 ERROR exec.DDLTask: FAILED: Error in metadata: 
 javax.jdo.JDOFatalInternalException: Error creating transactional connection 
 factory
 NestedThrowables:
 java.lang.reflect.InvocationTargetException
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 javax.jdo.JDOFatalInternalException: Error creating transactional connection 
 factory
 NestedThrowables:
 java.lang.reflect.InvocationTargetException
   at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:491)
   at 
 org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:472)
   at org.apache.hadoop.hive.ql.metadata.Hive.getAllTables(Hive.java:458)
   at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:504)
   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:176)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:631)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:504)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:382)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303)
 Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional 
 connection factory
 NestedThrowables:
 java.lang.reflect.InvocationTargetException
   at 
 org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:395)
   at 
 org.datanucleus.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:547)
   at 
 org.datanucleus.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:175)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at javax.jdo.JDOHelper$16.run(JDOHelper.java:1956)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.jdo.JDOHelper.invoke(JDOHelper.java:1951)
   at 
 

[jira] Commented: (HIVE-1369) LazySimpleSerDe should be able to read classes that support some form of toString()

2010-06-05 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12875909#action_12875909
 ] 

Edward Capriolo commented on HIVE-1369:
---

This seems interesting. In some cases toString() output can change between java 
versions for some objects. Do we need to compensate for that?

 LazySimpleSerDe should be able to read classes that support some form of 
 toString()
 ---

 Key: HIVE-1369
 URL: https://issues.apache.org/jira/browse/HIVE-1369
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Alex Kozlov
Assignee: Alex Kozlov
Priority: Minor
 Attachments: HIVE-1369.patch

   Original Estimate: 2h
  Remaining Estimate: 2h

 Currently LazySimpleSerDe is able to deserialize only BytesWritable or Text 
 objects.  It should be pretty easy to extend the class to read any object 
 that implements toString() method.
 Ideas or concerns?
 Alex K




[jira] Commented: (HIVE-1265) Function Registry should auto-detect UDFs from UDF Description

2010-05-28 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872996#action_12872996
 ] 

Edward Capriolo commented on HIVE-1265:
---

{noformat}
 public static List<Class> getClassesForPackage(String packageName, Class classType){
+List<Class> matchingClasses = new ArrayList<Class>();
+File directory = null;
+System.out.println(packageName.replace('.', File.separatorChar));
+URL u = Thread.currentThread().getContextClassLoader()
+//URL u = new Object().getClass().c
+.getResource(packageName.replace('.', File.separatorChar));
{noformat}

It seems like this section of code only picks up classes in 
ql/test/org.apache.hadoop.hive.ql.udf. This must have something to do with 
classloaders/threads and getResource(). It seems like getResource is unaware 
that two folders could be responsible for the same resource, or I have to find 
a better way to do this.
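For what it's worth, `ClassLoader.getResources` (plural) enumerates every classpath entry that provides a given package path, unlike `getResource`, which stops at the first match; that may be the two-folders case described above. A minimal sketch (class and method names are mine, not from the patch):

```java
import java.net.URL;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

public class ResourceScan {
    // getResource() returns only the first classpath entry that matches;
    // getResources() returns every entry that provides the package path,
    // e.g. both ql/src and ql/test for org.apache.hadoop.hive.ql.udf
    public static List<URL> allResources(String packageName) throws Exception {
        String path = packageName.replace('.', '/');
        Enumeration<URL> urls =
            Thread.currentThread().getContextClassLoader().getResources(path);
        return Collections.list(urls);
    }
}
```

Each returned URL can then be scanned for .class files independently.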

 Function Registry should auto-detect UDFs from UDF Description
 --

 Key: HIVE-1265
 URL: https://issues.apache.org/jira/browse/HIVE-1265
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: hive-1265-patch.diff


 We should be able to register functions dynamically.




[jira] Commented: (HIVE-802) Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it

2010-05-27 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872244#action_12872244
 ] 

Edward Capriolo commented on HIVE-802:
--

I just did a patch that adds connection pooling to DataNucleus (sorry that I 
jumped ahead of you). It should be easy to update now: just bump the versions in 
metastore/ivy.xml. Please make sure the version you pick works with the connection 
pooling libs, as ivy fetches versions and dependencies that do not work well 
together.

 Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it
 -

 Key: HIVE-802
 URL: https://issues.apache.org/jira/browse/HIVE-802
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Todd Lipcon
Assignee: Arvind Prabhakar

 There's a bug in DataNucleus that causes this issue:
 http://www.jpox.org/servlet/jira/browse/NUCCORE-371
 To reproduce, simply put your hive source tree in a directory that contains a 
 '+' character.




[jira] Commented: (HIVE-1373) Missing connection pool plugin in Eclipse classpath

2010-05-27 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12872337#action_12872337
 ] 

Edward Capriolo commented on HIVE-1373:
---

I was thinking to move everything that came from ivy to build/lib. I see the 
benefit, but I saw this technique adding more copies and moves to the ant 
process. I tried different approaches and found none of them better than the 
next: all involved doing more work here and less there, or changing this 
classpath instead of putting a file into X folder. I was kind of confused about 
the best way to handle that. I would be interested to see what you come up with.

 Missing connection pool plugin in Eclipse classpath
 ---

 Key: HIVE-1373
 URL: https://issues.apache.org/jira/browse/HIVE-1373
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
 Environment: Eclipse, Linux
Reporter: Vinithra Varadharajan
Assignee: Vinithra Varadharajan
Priority: Minor
 Attachments: HIVE-1373.patch


 In a recent checkin, connection pool dependency was introduced but eclipse 
 .classpath file was not updated.  This causes launch configurations from 
 within Eclipse to fail.
 {code}
 hive> show tables;
 show tables;
 10/05/26 14:59:46 INFO parse.ParseDriver: Parsing command: show tables
 10/05/26 14:59:46 INFO parse.ParseDriver: Parse Completed
 10/05/26 14:59:46 INFO ql.Driver: Semantic Analysis Completed
 10/05/26 14:59:46 INFO ql.Driver: Returning Hive schema: 
 Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from 
 deserializer)], properties:null)
 10/05/26 14:59:46 INFO ql.Driver: query plan = 
 file:/tmp/vinithra/hive_2010-05-26_14-59-46_058_1636674338194744357/queryplan.xml
 10/05/26 14:59:46 INFO ql.Driver: Starting command: show tables
 10/05/26 14:59:46 INFO metastore.HiveMetaStore: 0: Opening raw store with 
 implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
 10/05/26 14:59:46 INFO metastore.ObjectStore: ObjectStore, initialize called
 FAILED: Error in metadata: javax.jdo.JDOFatalInternalException: Error 
 creating transactional connection factory
 NestedThrowables:
 java.lang.reflect.InvocationTargetException
 10/05/26 14:59:47 ERROR exec.DDLTask: FAILED: Error in metadata: 
 javax.jdo.JDOFatalInternalException: Error creating transactional connection 
 factory
 NestedThrowables:
 java.lang.reflect.InvocationTargetException
 org.apache.hadoop.hive.ql.metadata.HiveException: 
 javax.jdo.JDOFatalInternalException: Error creating transactional connection 
 factory
 NestedThrowables:
 java.lang.reflect.InvocationTargetException
   at org.apache.hadoop.hive.ql.metadata.Hive.getTablesForDb(Hive.java:491)
   at 
 org.apache.hadoop.hive.ql.metadata.Hive.getTablesByPattern(Hive.java:472)
   at org.apache.hadoop.hive.ql.metadata.Hive.getAllTables(Hive.java:458)
   at org.apache.hadoop.hive.ql.exec.DDLTask.showTables(DDLTask.java:504)
   at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:176)
   at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:107)
   at 
 org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:55)
   at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:631)
   at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:504)
   at org.apache.hadoop.hive.ql.Driver.run(Driver.java:382)
   at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:138)
   at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:197)
   at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:303)
 Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional 
 connection factory
 NestedThrowables:
 java.lang.reflect.InvocationTargetException
   at 
 org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:395)
   at 
 org.datanucleus.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:547)
   at 
 org.datanucleus.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:175)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
   at java.lang.reflect.Method.invoke(Method.java:597)
   at javax.jdo.JDOHelper$16.run(JDOHelper.java:1956)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.jdo.JDOHelper.invoke(JDOHelper.java:1951)
   at 
 javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1159)
   at 

[jira] Commented: (HIVE-1335) DataNucleus should use connection pooling

2010-05-23 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12870477#action_12870477
 ] 

Edward Capriolo commented on HIVE-1335:
---

Can we go +1?

 DataNucleus should use connection pooling
 -

 Key: HIVE-1335
 URL: https://issues.apache.org/jira/browse/HIVE-1335
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.5.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: commons-dbcp-1.2.2.jar, commons-dbcp.LICENSE, 
 commons-pool-1.2.jar, commons-pool.LICENSE, 
 datanucleus-connectionpool-1.0.2.jar, datanucleus-connectionpool.LICENSE, 
 hive-1335-1.patch.txt, hive-1335-2.patch.txt, hive-1335-3.patch.txt, 
 hive-1335.patch.txt


 Currently each Data Nucleus operation disconnects and reconnects to the 
 MetaStore over jdbc. Queries fail to even explain properly in cases where a 
 table has many partitions. This is fixed by enabling one parameter and 
 including several jars.
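For reference, the "one parameter" is the DataNucleus pooling type; in hive-site.xml it would look roughly like this (the exact property name depends on the DataNucleus version, so treat this as a sketch):

```xml
<property>
  <name>datanucleus.connectionPoolingType</name>
  <value>DBCP</value>
  <description>Pool JDBC connections to the metastore instead of
  reconnecting on every operation</description>
</property>
```

The commons-dbcp, commons-pool, and datanucleus-connectionpool jars attached to this issue must also be on the classpath.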




[jira] Updated: (HIVE-471) A UDF for simple reflection

2010-05-23 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-471:
-

Fix Version/s: 0.6.0
Affects Version/s: 0.5.1
   (was: 0.6.0)

Should be good for trunk.

 A UDF for simple reflection
 ---

 Key: HIVE-471
 URL: https://issues.apache.org/jira/browse/HIVE-471
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Affects Versions: 0.5.1
Reporter: Edward Capriolo
Assignee: Edward Capriolo
Priority: Minor
 Fix For: 0.6.0

 Attachments: hive-471-gen.diff, HIVE-471.1.patch, HIVE-471.2.patch, 
 HIVE-471.3.patch, hive-471.diff


 There are many methods in java that are static and have no arguments or can 
 be invoked with one simple parameter. More complicated functions will require 
 a UDF, but one generic one can work as a poor man's UDF.
 {noformat}
 SELECT reflect("java.lang.String", "valueOf", 1), reflect("java.lang.String", 
 "isEmpty")
 FROM src LIMIT 1;
 {noformat}
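Under the hood such a UDF presumably boils down to ordinary java.lang.reflect calls. A hedged sketch of the static-method case only (class and helper names are hypothetical, and a real UDF would need Hive's type mapping plus instance-method support):

```java
import java.lang.reflect.Method;

public class ReflectSketch {
    // resolve and invoke a static method by class name, roughly what
    // reflect("java.lang.String", "valueOf", 1) asks for
    public static Object call(String className, String methodName, Object... args)
            throws Exception {
        Class<?> c = Class.forName(className);
        Class<?>[] types = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            // assumption: boxed ints map to the int primitive parameter type
            types[i] = (args[i] instanceof Integer) ? int.class : args[i].getClass();
        }
        Method m = c.getMethod(methodName, types);
        return m.invoke(null, args); // null receiver: static invocation
    }
}
```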




[jira] Commented: (HIVE-1096) Hive Variables

2010-05-23 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12870480#action_12870480
 ] 

Edward Capriolo commented on HIVE-1096:
---

I am back on this one. Keep your eye out for the next patch.

 Hive Variables
 --

 Key: HIVE-1096
 URL: https://issues.apache.org/jira/browse/HIVE-1096
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Attachments: 1096-9.diff, hive-1096-2.diff, hive-1096-7.diff, 
 hive-1096-8.diff, hive-1096.diff


 From mailing list:
 --Amazon Elastic MapReduce version of Hive seems to have a nice feature 
 called Variables. Basically you can define a variable via command-line 
 while invoking hive with -d DT=2009-12-09 and then refer to the variable via 
 ${DT} within the hive queries. This could be extremely useful. I can't seem 
 to find this feature even on trunk. Is this feature currently anywhere in the 
 roadmap?--
 This could be implemented in many places.
 A simple place to put this is in Driver.compile or Driver.run: we can do 
 string substitutions at that level, and further downstream need not be affected. 
 There could be some benefits to doing this further downstream (parser, plan), 
 but based on the simple needs we may not need to overthink this.
 I will get started on implementing in compile unless someone wants to discuss 
 this more.
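The Driver.compile-level approach described above amounts to a regex pass over the query text before parsing. A minimal sketch (class name and pattern are assumptions, not the eventual patch):

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class VarSubst {
    // matches ${NAME} tokens like the ${DT} example from the mailing list
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // substitute known variables before the query reaches the parser;
    // unknown variables are left untouched
    public static String substitute(String query, Map<String, String> vars) {
        Matcher m = VAR.matcher(query);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String v = vars.get(m.group(1));
            m.appendReplacement(sb,
                Matcher.quoteReplacement(v == null ? m.group(0) : v));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

Since the rewritten string is what the parser sees, nothing downstream of compile needs to change.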




[jira] Commented: (HIVE-1351) Tool to cat rcfiles

2010-05-19 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869284#action_12869284
 ] 

Edward Capriolo commented on HIVE-1351:
---

As Ning mentioned, why move the cli code? If anything, more of the code should be 
moving up into the main script rather than into smaller scripts. I see people 
making changes to only the cli. We have to make sure that fixes for things like 
cygwin get propagated to all files, or shared code gets shared.

Also, rcfilecat is just a debug util, but it should have a unit test, right? Just 
cat two files to make sure it works?

 Tool to cat rcfiles
 ---

 Key: HIVE-1351
 URL: https://issues.apache.org/jira/browse/HIVE-1351
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Namit Jain
Assignee: He Yongqiang
 Attachments: hive.1351.1.patch, hive.1351.2.patch


 It will be useful for debugging




[jira] Commented: (HIVE-1351) Tool to cat rcfiles

2010-05-19 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12869344#action_12869344
 ] 

Edward Capriolo commented on HIVE-1351:
---

This is so nitpicky, but
{noformat}
+--rcfilecat)
+  SERVICE=rcfilecat
+  shift
+  ;;
{noformat}

I do not think we should do this. We are just giving alternate invocations that 
end up being more confusing.

Why should you be able to do this:
{noformat}
hive --rcfilecat
{noformat}

but not
{noformat} 
hive --hwi
{noformat}
?

as for execHiveCmd: if you want to share this, why not move it up into bin/hive? 
We do not need to add a file to shared when subs specified in bin/hive are 
already shared.

 Tool to cat rcfiles
 ---

 Key: HIVE-1351
 URL: https://issues.apache.org/jira/browse/HIVE-1351
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Namit Jain
Assignee: He Yongqiang
 Fix For: 0.6.0

 Attachments: hive.1351.1.patch, hive.1351.2.patch


 It will be useful for debugging




[jira] Updated: (HIVE-1335) DataNucleus should use connection pooling

2010-05-18 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1335:
--

Attachment: hive-1335-3.patch.txt

 DataNucleus should use connection pooling
 -

 Key: HIVE-1335
 URL: https://issues.apache.org/jira/browse/HIVE-1335
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.5.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: commons-dbcp-1.2.2.jar, commons-dbcp.LICENSE, 
 commons-pool-1.2.jar, commons-pool.LICENSE, 
 datanucleus-connectionpool-1.0.2.jar, datanucleus-connectionpool.LICENSE, 
 hive-1335-1.patch.txt, hive-1335-2.patch.txt, hive-1335-3.patch.txt, 
 hive-1335.patch.txt


 Currently each Data Nucleus operation disconnects and reconnects to the 
 MetaStore over jdbc. Queries fail to even explain properly in cases where a 
 table has many partitions. This is fixed by enabling one parameter and 
 including several jars.




[jira] Updated: (HIVE-1335) DataNucleus should use connection pooling

2010-05-14 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1335:
--

   Status: Patch Available  (was: Open)
Affects Version/s: 0.5.0

 DataNucleus should use connection pooling
 -

 Key: HIVE-1335
 URL: https://issues.apache.org/jira/browse/HIVE-1335
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.5.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: commons-dbcp-1.2.2.jar, commons-dbcp.LICENSE, 
 commons-pool-1.2.jar, commons-pool.LICENSE, 
 datanucleus-connectionpool-1.0.2.jar, datanucleus-connectionpool.LICENSE, 
 hive-1335-1.patch.txt, hive-1335.patch.txt


 Currently each Data Nucleus operation disconnects and reconnects to the 
 MetaStore over jdbc. Queries fail to even explain properly in cases where a 
 table has many partitions. This is fixed by enabling one parameter and 
 including several jars.




[jira] Updated: (HIVE-1335) DataNucleus should use connection pooling

2010-05-14 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1335:
--

Attachment: hive-1335-1.patch.txt

 DataNucleus should use connection pooling
 -

 Key: HIVE-1335
 URL: https://issues.apache.org/jira/browse/HIVE-1335
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: commons-dbcp-1.2.2.jar, commons-dbcp.LICENSE, 
 commons-pool-1.2.jar, commons-pool.LICENSE, 
 datanucleus-connectionpool-1.0.2.jar, datanucleus-connectionpool.LICENSE, 
 hive-1335-1.patch.txt, hive-1335.patch.txt


 Currently each Data Nucleus operation disconnects and reconnects to the 
 MetaStore over jdbc. Queries fail to even explain properly in cases where a 
 table has many partitions. This is fixed by enabling one parameter and 
 including several jars.




[jira] Commented: (HIVE-1335) DataNucleus should use connection pooling

2010-05-14 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12867538#action_12867538
 ] 

Edward Capriolo commented on HIVE-1335:
---

Just a strange side-note: why is the classpath specified in both build.xml and 
build-common.xml? Do we need it defined in both places?

 DataNucleus should use connection pooling
 -

 Key: HIVE-1335
 URL: https://issues.apache.org/jira/browse/HIVE-1335
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.5.0
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: commons-dbcp-1.2.2.jar, commons-dbcp.LICENSE, 
 commons-pool-1.2.jar, commons-pool.LICENSE, 
 datanucleus-connectionpool-1.0.2.jar, datanucleus-connectionpool.LICENSE, 
 hive-1335-1.patch.txt, hive-1335.patch.txt


 Currently each Data Nucleus operation disconnects and reconnects to the 
 MetaStore over jdbc. Queries fail to even explain properly in cases where a 
 table has many partitions. This is fixed by enabling one parameter and 
 including several jars.




[jira] Commented: (HIVE-1335) DataNucleus should use connection pooling

2010-05-05 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864564#action_12864564
 ] 

Edward Capriolo commented on HIVE-1335:
---

{noformat}
[ivy:resolve] :: Ivy 2.1.0 - 20090925235825 :: http://ant.apache.org/ivy/ ::
[ivy:resolve] :: loading settings :: file = 
/mnt/data/hive/hive/ivy/ivysettings.xml
[ivy:resolve] 
[ivy:resolve] :: problems summary ::
[ivy:resolve]  WARNINGS
[ivy:resolve]   module not found: proxool#proxool;0.9.0RC3
[ivy:resolve]    hadoop-source: tried
[ivy:resolve] -- artifact proxool#proxool;0.9.0RC3!proxool.jar:
[ivy:resolve] 
http://mirror.facebook.net/facebook/hive-deps/hadoop/core/proxool-0.9.0RC3/proxool-0.9.0RC3.jar
[ivy:resolve]    apache-snapshot: tried
[ivy:resolve] 
https://repository.apache.org/content/repositories/snapshots/proxool/proxool/0.9.0RC3/proxool-0.9.0RC3.pom
[ivy:resolve] -- artifact proxool#proxool;0.9.0RC3!proxool.jar:
[ivy:resolve] 
https://repository.apache.org/content/repositories/snapshots/proxool/proxool/0.9.0RC3/proxool-0.9.0RC3.jar
[ivy:resolve]    maven2: tried
[ivy:resolve] 
http://repo1.maven.org/maven2/proxool/proxool/0.9.0RC3/proxool-0.9.0RC3.pom
[ivy:resolve] -- artifact proxool#proxool;0.9.0RC3!proxool.jar:
[ivy:resolve] 
http://repo1.maven.org/maven2/proxool/proxool/0.9.0RC3/proxool-0.9.0RC3.jar
[ivy:resolve]   ::
[ivy:resolve]   ::  UNRESOLVED DEPENDENCIES ::
[ivy:resolve]   ::
[ivy:resolve]   :: proxool#proxool;0.9.0RC3: not found
[ivy:resolve]   ::
[ivy:resolve] 
[ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
{noformat}

Lol, wonderful. Ivy goes after a bunch of things I do not really need, and of 
course one of them fails. The future is here.

 DataNucleus should use connection pooling
 -

 Key: HIVE-1335
 URL: https://issues.apache.org/jira/browse/HIVE-1335
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: commons-dbcp-1.2.2.jar, commons-dbcp.LICENSE, 
 commons-pool-1.2.jar, commons-pool.LICENSE, 
 datanucleus-connectionpool-1.0.2.jar, datanucleus-connectionpool.LICENSE, 
 hive-1335.patch.txt


 Currently each Data Nucleus operation disconnects and reconnects to the 
 MetaStore over jdbc. Queries fail to even explain properly in cases where a 
 table has many partitions. This is fixed by enabling one parameter and 
 including several jars.




[jira] Commented: (HIVE-1335) DataNucleus should use connection pooling

2010-05-05 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12864622#action_12864622
 ] 

Edward Capriolo commented on HIVE-1335:
---

No joy.

{noformat}
+<dependency org="commons-dbcp" name="commons-dbcp" rev="1.2.2"/>
+<dependency org="commons-pool" name="commons-pool" rev="1.2"/>
+<dependency org="org.datanucleus" name="datanucleus-connectionpool" rev="1.0.2">
+  <exclude module="proxool" />
+  <exclude module="c3p0" />
+</dependency>
{noformat}

Unfortunately datanucleus-connectionpool refuses to honor my request for 
commons-pool 1.2 and instead fetches 1.3, which, you guessed it, does not work. 
I am going to submit the original patch; figuring out what ivy is trying to do 
is taking too long.
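
For what it is worth, Ivy 2.x has a dependency-mediation mechanism that forces a transitive revision regardless of what datanucleus-connectionpool asks for. A sketch, assuming an Ivy version where {{<override>}} is supported inside {{<dependencies>}} (untested against this build):

{noformat}
<dependencies>
  <!-- Force commons-pool to 1.2 even if a transitive dependency asks for 1.3 -->
  <override org="commons-pool" module="commons-pool" rev="1.2"/>
  <dependency org="org.datanucleus" name="datanucleus-connectionpool" rev="1.0.2">
    <exclude module="proxool"/>
    <exclude module="c3p0"/>
  </dependency>
</dependencies>
{noformat}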

 DataNucleus should use connection pooling
 -

 Key: HIVE-1335
 URL: https://issues.apache.org/jira/browse/HIVE-1335
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: commons-dbcp-1.2.2.jar, commons-dbcp.LICENSE, 
 commons-pool-1.2.jar, commons-pool.LICENSE, 
 datanucleus-connectionpool-1.0.2.jar, datanucleus-connectionpool.LICENSE, 
 hive-1335.patch.txt


 Currently each Data Nucleus operation disconnects and reconnects to the 
 MetaStore over jdbc. Queries fail to even explain properly in cases where a 
 table has many partitions. This is fixed by enabling one parameter and 
 including several jars.




[jira] Updated: (HIVE-1335) DataNucleus should use connection pooling

2010-05-05 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1335:
--

Status: Patch Available  (was: Open)

 DataNucleus should use connection pooling
 -

 Key: HIVE-1335
 URL: https://issues.apache.org/jira/browse/HIVE-1335
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: commons-dbcp-1.2.2.jar, commons-dbcp.LICENSE, 
 commons-pool-1.2.jar, commons-pool.LICENSE, 
 datanucleus-connectionpool-1.0.2.jar, datanucleus-connectionpool.LICENSE, 
 hive-1335.patch.txt


 Currently each Data Nucleus operation disconnects and reconnects to the 
 MetaStore over jdbc. Queries fail to even explain properly in cases where a 
 table has many partitions. This is fixed by enabling one parameter and 
 including several jars.




[jira] Updated: (HIVE-1335) DataNucleus should use connection pooling

2010-05-03 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1335:
--

Attachment: hive-1335.patch.txt
commons-dbcp-1.2.2.jar
commons-pool-1.2.jar

 DataNucleus should use connection pooling
 -

 Key: HIVE-1335
 URL: https://issues.apache.org/jira/browse/HIVE-1335
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: commons-dbcp-1.2.2.jar, commons-pool-1.2.jar, 
 datanucleus-connectionpool-1.0.2.jar, hive-1335.patch.txt


 Currently each Data Nucleus operation disconnects and reconnects to the 
 MetaStore over jdbc. Queries fail to even explain properly in cases where a 
 table has many partitions. This is fixed by enabling one parameter and 
 including several jars.




[jira] Updated: (HIVE-1335) DataNucleus should use connection pooling

2010-05-03 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1335:
--

Attachment: datanucleus-connectionpool-1.0.2.jar

 DataNucleus should use connection pooling
 -

 Key: HIVE-1335
 URL: https://issues.apache.org/jira/browse/HIVE-1335
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: commons-dbcp-1.2.2.jar, commons-pool-1.2.jar, 
 datanucleus-connectionpool-1.0.2.jar, hive-1335.patch.txt


 Currently each Data Nucleus operation disconnects and reconnects to the 
 MetaStore over jdbc. Queries fail to even explain properly in cases where a 
 table has many partitions. This is fixed by enabling one parameter and 
 including several jars.




[jira] Updated: (HIVE-1335) DataNucleus should use connection pooling

2010-05-03 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1335:
--

Status: Patch Available  (was: Open)

The jars should be placed in hive-home/lib. Patch should be applied to trunk. 
No unit test needed; as long as existing unit tests continue to function, all 
is well.
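
The "one parameter" the issue description mentions is DataNucleus's pool-type setting. A minimal hive-site.xml fragment, assuming the DataNucleus 1.x property name {{datanucleus.connectionPoolingType}} and that the DBCP/commons-pool jars are in hive-home/lib, would look roughly like:

{noformat}
<!-- Sketch only: enables DBCP pooling for metastore JDBC connections -->
<property>
  <name>datanucleus.connectionPoolingType</name>
  <value>DBCP</value>
  <description>Pool metastore JDBC connections instead of reconnecting per operation</description>
</property>
{noformat}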

 DataNucleus should use connection pooling
 -

 Key: HIVE-1335
 URL: https://issues.apache.org/jira/browse/HIVE-1335
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: commons-dbcp-1.2.2.jar, commons-pool-1.2.jar, 
 datanucleus-connectionpool-1.0.2.jar, hive-1335.patch.txt


 Currently each Data Nucleus operation disconnects and reconnects to the 
 MetaStore over jdbc. Queries fail to even explain properly in cases where a 
 table has many partitions. This is fixed by enabling one parameter and 
 including several jars.




[jira] Updated: (HIVE-1335) DataNucleus should use connection pooling

2010-05-03 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo updated HIVE-1335:
--

Attachment: commons-dbcp.LICENSE
commons-pool.LICENSE
datanucleus-connectionpool.LICENSE

 DataNucleus should use connection pooling
 -

 Key: HIVE-1335
 URL: https://issues.apache.org/jira/browse/HIVE-1335
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Edward Capriolo
Assignee: Edward Capriolo
 Fix For: 0.6.0

 Attachments: commons-dbcp-1.2.2.jar, commons-dbcp.LICENSE, 
 commons-pool-1.2.jar, commons-pool.LICENSE, 
 datanucleus-connectionpool-1.0.2.jar, datanucleus-connectionpool.LICENSE, 
 hive-1335.patch.txt


 Currently each Data Nucleus operation disconnects and reconnects to the 
 MetaStore over jdbc. Queries fail to even explain properly in cases where a 
 table has many partitions. This is fixed by enabling one parameter and 
 including several jars.




[jira] Commented: (HIVE-610) move all properties from jpox.properties to hive-site.xml

2010-04-30 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862797#action_12862797
 ] 

Edward Capriolo commented on HIVE-610:
--

Does this mean jpox.properties is now ignored? If so, how do we set other 
JPOX variables?
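
If HIVE-610 works the way its summary suggests, any javax.jdo.* (or datanucleus.*) property should now be settable directly in hive-site.xml rather than jpox.properties. A sketch under that assumption:

{noformat}
<!-- Hypothetical hive-site.xml entry replacing a former jpox.properties line -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
</property>
{noformat}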

 move all properties from jpox.properties to hive-site.xml 
 --

 Key: HIVE-610
 URL: https://issues.apache.org/jira/browse/HIVE-610
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Metastore
Affects Versions: 0.4.0
Reporter: Prasad Chakka
Assignee: Prasad Chakka
 Fix For: 0.4.0

 Attachments: hive-610.patch


 There are some properties in jpox.properties and some in hive-site.xml. Move 
 all to the latter file.




[jira] Created: (HIVE-1333) javax.jdo.option.NonTransactionalRead ignored?

2010-04-30 Thread Edward Capriolo (JIRA)
javax.jdo.option.NonTransactionalRead ignored?
--

 Key: HIVE-1333
 URL: https://issues.apache.org/jira/browse/HIVE-1333
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Metastore, Query Processor
Reporter: Edward Capriolo


{noformat}

<property>
  <name>javax.jdo.option.NonTransactionalRead</name>
  <value>true</value>
  <description>reads outside of transactions</description>
</property>

{noformat}
hive> show tables;

{noformat}
100430 14:41:39  1874 Connect   hiv...@localhost on 
 1874 Init DB   m6_
 1874 Query SHOW SESSION VARIABLES
 1874 Query SHOW COLLATION
 1874 Query SET character_set_results = NULL
 1874 Query SET autocommit=1
 1874 Query SET sql_mode='STRICT_TRANS_TABLES'
 1874 Query SET autocommit=0
 1874 Query SELECT @@session.tx_isolation
 1874 Query SET SESSION TRANSACTION ISOLATION LEVEL READ 
COMMITTED
 1874 Query SELECT `THIS`.`TBL_NAME` FROM `TBLS` `THIS` 
LEFT OUTER JOIN `DBS` `THIS_DATABASE_NAME` ON `THIS`.`DB_ID` = 
`THIS_DATABASE_NAME`.`DB_ID` WHERE `THIS_DATABASE_NAME`.`NAME` = 'default' AND 
(LOWER(`THIS`.`TBL_NAME`) LIKE '_%' ESCAPE '\\' )
 1874 Query commit
 1874 Query rollback
 1874 Quit  
{noformat}

Now with the property set to false:


{noformat}
100430 14:46:59  1889 Connect   hiv...@localhost on 
 1889 Init DB   m6_rshive
 1889 Query SHOW SESSION VARIABLES
 1889 Query SHOW COLLATION
 1889 Query SET character_set_results = NULL
 1889 Query SET autocommit=1
 1889 Query SET sql_mode='STRICT_TRANS_TABLES'
 1889 Query SET autocommit=0
 1889 Query SELECT @@session.tx_isolation
 1889 Query SET SESSION TRANSACTION ISOLATION LEVEL READ 
COMMITTED
 1889 Query SELECT `THIS`.`TBL_NAME` FROM `TBLS` `THIS` 
LEFT OUTER JOIN `DBS` `THIS_DATABASE_NAME` ON `THIS`.`DB_ID` = 
`THIS_DATABASE_NAME`.`DB_ID` WHERE `THIS_DATABASE_NAME`.`NAME` = 'default' AND 
(LOWER(`THIS`.`TBL_NAME`) LIKE '_%' ESCAPE '\\' )
 1889 Query commit
 1889 Query rollback
 1889 Quit  

{noformat}

Unless I misunderstand the property, it looks like the reads are still inside a 
transaction. Also, why does this transaction call commit as well as rollback?




[jira] Commented: (HIVE-1328) make mapred.input.dir.recursive work for select *

2010-04-29 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862217#action_12862217
 ] 

Edward Capriolo commented on HIVE-1328:
---

I find external partitions to be pretty badly broken now. I am circling around 
one or two other bugs in them that I am about to report. Users (including 
myself) are frustrated because rather than working with data they have to work 
around bugs like HIVE-1318. I understand everyone has their own priorities. 
Call it what you will (inconsistency/feature), we are adding to the capability 
of external tables while current features do not even work well. 

In particular HIVE-1318 is brutal. When working with my data I can make no 
assumptions when querying. I have to do all types of shell scripting to ensure 
that partitions exist before I query them, adding extra where clauses to 
carefully select ranges of partitions. 

If you are using external partitions at Facebook, I wonder how you work around 
HIVE-1318, and I am also curious whether you experience HIVE-1303 or if that is 
just something in my environment. The handful of users I have constantly have 
issues; does everyone there just 'suck it up'?

 make mapred.input.dir.recursive work for select *
 -

 Key: HIVE-1328
 URL: https://issues.apache.org/jira/browse/HIVE-1328
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: John Sichi
 Fix For: 0.6.0


 For the script below, we would like the behavior from MAPREDUCE-1501 to apply 
 so that the select * returns two rows instead of none.
 create table fact_daily(x int)
 partitioned by (ds string);
 create table fact_tz(x int)
 partitioned by (ds string, hr string, gmtoffset string);
 alter table fact_tz 
 add partition (ds='2010-01-03', hr='1', gmtoffset='-8');
 insert overwrite table fact_tz
 partition (ds='2010-01-03', hr='1', gmtoffset='-8')
 select key+11 from src where key=484;
 alter table fact_tz 
 add partition (ds='2010-01-03', hr='2', gmtoffset='-7');
 insert overwrite table fact_tz
 partition (ds='2010-01-03', hr='2', gmtoffset='-7')
 select key+12 from src where key=484;
 alter table fact_daily
 set tblproperties('EXTERNAL'='TRUE');
 alter table fact_daily
 add partition (ds='2010-01-03')
 location '/user/hive/warehouse/fact_tz/ds=2010-01-03';
 set mapred.input.dir.recursive=true;
 select * from fact_daily where ds='2010-01-03';




[jira] Commented: (HIVE-1328) make mapred.input.dir.recursive work for select *

2010-04-28 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12862074#action_12862074
 ] 

Edward Capriolo commented on HIVE-1328:
---

Can we look at HIVE-1318 and maybe HIVE-1303 first? The external partitions 
already seem to have bugs; can we get them working properly before more 
features are added?

 make mapred.input.dir.recursive work for select *
 -

 Key: HIVE-1328
 URL: https://issues.apache.org/jira/browse/HIVE-1328
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Query Processor
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: John Sichi
 Fix For: 0.6.0


 For the script below, we would like the behavior from MAPREDUCE-1501 to apply 
 so that the select * returns two rows instead of none.
 create table fact_daily(x int)
 partitioned by (ds string);
 create table fact_tz(x int)
 partitioned by (ds string, hr string, gmtoffset string);
 alter table fact_tz 
 add partition (ds='2010-01-03', hr='1', gmtoffset='-8');
 insert overwrite table fact_tz
 partition (ds='2010-01-03', hr='1', gmtoffset='-8')
 select key+11 from src where key=484;
 alter table fact_tz 
 add partition (ds='2010-01-03', hr='2', gmtoffset='-7');
 insert overwrite table fact_tz
 partition (ds='2010-01-03', hr='2', gmtoffset='-7')
 select key+12 from src where key=484;
 alter table fact_daily
 set tblproperties('EXTERNAL'='TRUE');
 alter table fact_daily
 add partition (ds='2010-01-03')
 location '/user/hive/warehouse/fact_tz/ds=2010-01-03';
 set mapred.input.dir.recursive=true;
 select * from fact_daily where ds='2010-01-03';




[jira] Resolved: (HIVE-377) Some ANT jars should be included into hive

2010-04-27 Thread Edward Capriolo (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edward Capriolo resolved HIVE-377.
--

Resolution: Won't Fix

With newer releases of Hadoop and Hive, Jetty and Ant are packaged differently 
and this is no longer an issue. 

 Some ANT jars should be included into hive
 --

 Key: HIVE-377
 URL: https://issues.apache.org/jira/browse/HIVE-377
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Web UI
Affects Versions: 0.3.0, 0.6.0
Reporter: Edward Capriolo
 Fix For: 0.4.2


 The WEB UI requires
 HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:/opt/ant/lib/ant.jar
 HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:/opt/ant/lib/ant-launcher.jar
 Right now the start script does this.
 {noformat}
  #hwi requires ant jars
  # if [ "$ANT_LIB" = "" ] ; then
  #   ANT_LIB=/opt/ant/libs
  # fi
  # for f in ${ANT_LIB}/*.jar; do
  #   if [[ ! -f $f ]]; then
  # continue;
  #   fi
  #   HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:$f
  # done
 {noformat}
 Can we add these jars? This will add 1.4 MB to the Hive distribution. If we 
 do not want to add them, I would like to make the startup script fail when 
 the environment variable is not correct.
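
 A fail-fast variant of the commented-out snippet above might look like the 
 following; `require_ant_lib` is a hypothetical helper name, not something 
 from the actual start script:

```shell
# Hypothetical fail-fast check for the HWI start script: abort with a clear
# message instead of silently continuing when ANT_LIB is unset or not a
# directory.
require_ant_lib() {
  if [ -z "$ANT_LIB" ] || [ ! -d "$ANT_LIB" ]; then
    echo "hwi requires ant.jar and ant-launcher.jar; set ANT_LIB" >&2
    return 1
  fi
  for f in "$ANT_LIB"/*.jar; do
    [ -f "$f" ] || continue          # skip the unexpanded glob when no jars match
    HADOOP_CLASSPATH="${HADOOP_CLASSPATH}:$f"
  done
  export HADOOP_CLASSPATH
}
```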




[jira] Commented: (HIVE-1326) RowContainer uses hard-coded '/tmp/' path for temporary files

2010-04-26 Thread Edward Capriolo (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12860929#action_12860929
 ] 

Edward Capriolo commented on HIVE-1326:
---

I am +1 on the concept. Many times /tmp can be a ramdisk or mounted with some 
size restrictions. This type of bug can be very painful to track down when it 
happens.
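
The direction of the fix can be sketched like this (this is an illustration, not the attached rowcontainer.patch): derive the spill directory from a configured scratch location instead of a hard-coded "/tmp/". A plain Properties object stands in for Hadoop's JobConf here, purely for illustration.

```java
import java.io.File;
import java.util.Properties;

public class SpillDir {
    // Resolve the temp-file base: prefer the configured Hadoop local dir,
    // fall back to the JVM temp dir, never a hard-coded "/tmp/".
    static File spillDir(Properties conf) {
        String base = conf.getProperty("hadoop.tmp.dir",
                                       System.getProperty("java.io.tmpdir"));
        return new File(base, "hive-rowcontainer");
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("hadoop.tmp.dir", "/data/scratch");
        // prints /data/scratch/hive-rowcontainer
        System.out.println(spillDir(conf).getPath());
    }
}
```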

 RowContainer uses hard-coded '/tmp/' path for temporary files
 -

 Key: HIVE-1326
 URL: https://issues.apache.org/jira/browse/HIVE-1326
 Project: Hadoop Hive
  Issue Type: Bug
 Environment: Hadoop 0.19.2 with Hive trunk.  We're using FreeBSD 7.0, 
 but that doesn't seem relevant.
Reporter: Michael Klatt
 Attachments: rowcontainer.patch


 In our production hadoop environment, the /tmp/ is actually pretty small, 
 and we encountered a problem when a query used the RowContainer class and 
 filled up the /tmp/ partition.  I tracked down the cause to the RowContainer 
 class putting temporary files in the '/tmp/' path instead of using the 
 configured Hadoop temporary path.  I've attached a patch to fix this.
 Here's the traceback:
 2010-04-25 12:05:05,120 INFO 
 org.apache.hadoop.hive.ql.exec.persistence.RowContainer: RowContainer created 
 temp file /tmp/hive-rowcontainer-1244151903/RowContainer7816.tmp
 2010-04-25 12:05:06,326 INFO ExecReducer: ExecReducer: processing 1000 
 rows: used memory = 385520312
 2010-04-25 12:05:08,513 INFO ExecReducer: ExecReducer: processing 1100 
 rows: used memory = 341780472
 2010-04-25 12:05:10,697 INFO ExecReducer: ExecReducer: processing 1200 
 rows: used memory = 301446768
 2010-04-25 12:05:12,837 INFO ExecReducer: ExecReducer: processing 1300 
 rows: used memory = 399208768
 2010-04-25 12:05:15,085 INFO ExecReducer: ExecReducer: processing 1400 
 rows: used memory = 364507216
 2010-04-25 12:05:17,260 INFO ExecReducer: ExecReducer: processing 1500 
 rows: used memory = 332907280
 2010-04-25 12:05:19,580 INFO ExecReducer: ExecReducer: processing 1600 
 rows: used memory = 298774096
 2010-04-25 12:05:21,629 INFO ExecReducer: ExecReducer: processing 1700 
 rows: used memory = 396505408
 2010-04-25 12:05:23,830 INFO ExecReducer: ExecReducer: processing 1800 
 rows: used memory = 362477288
 2010-04-25 12:05:25,914 INFO ExecReducer: ExecReducer: processing 1900 
 rows: used memory = 327229744
 2010-04-25 12:05:27,978 INFO ExecReducer: ExecReducer: processing 2000 
 rows: used memory = 296051904
 2010-04-25 12:05:28,155 FATAL ExecReducer: org.apache.hadoop.fs.FSError: 
 java.io.IOException: No space left on device
   at 
 org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:199)
   at 
 java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
   at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
   at 
 org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
   at java.io.DataOutputStream.write(DataOutputStream.java:90)
   at 
 org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:346)
   at 
 org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:150)
   at 
 org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
   at 
 org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
   at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
   at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
   at 
 org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
   at java.io.DataOutputStream.write(DataOutputStream.java:90)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1013)
   at 
 org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:977)
   at 
 org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat$1.write(HiveSequenceFileOutputFormat.java:70)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.RowContainer.spillBlock(RowContainer.java:343)
   at 
 org.apache.hadoop.hive.ql.exec.persistence.RowContainer.add(RowContainer.java:163)
   at 
 org.apache.hadoop.hive.ql.exec.JoinOperator.processOp(JoinOperator.java:118)
   at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:456)
   at 
 org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:244)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:436)
   at org.apache.hadoop.mapred.Child.main(Child.java:158)
 Caused by: java.io.IOException: No space left on device
   at java.io.FileOutputStream.writeBytes(Native Method)
   at java.io.FileOutputStream.write(FileOutputStream.java:260)
   at 
 

[jira] Created: (HIVE-1318) External Tables: Selecting a partition that does not exist produces errors

2010-04-22 Thread Edward Capriolo (JIRA)
External Tables: Selecting a partition that does not exist produces errors
--

 Key: HIVE-1318
 URL: https://issues.apache.org/jira/browse/HIVE-1318
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Edward Capriolo
 Attachments: partdoom.q

{noformat}
dfs -mkdir /tmp/a;
dfs -mkdir /tmp/a/b;
dfs -mkdir /tmp/a/c;
create external table abc( key string, val string  )
partitioned by (part int)
location '/tmp/a/';

alter table abc ADD PARTITION (part=1)  LOCATION 'b';
alter table abc ADD PARTITION (part=2)  LOCATION 'c';

select key from abc where part=1;
select key from abc where part=70;

{noformat}



