[jira] Created: (HIVE-1696) Add delegation token support to metastore

2010-10-06 Thread Todd Lipcon (JIRA)
Add delegation token support to metastore
-

 Key: HIVE-1696
 URL: https://issues.apache.org/jira/browse/HIVE-1696
 Project: Hadoop Hive
  Issue Type: Sub-task
  Components: Metastore
Reporter: Todd Lipcon


As discussed in HIVE-842, Kerberos authentication is sufficient only for 
authenticating a Hive user client to the metastore. There are other cases 
where Thrift calls need to be authenticated even though the caller is running 
in an environment without Kerberos credentials. For example, an MR task 
running as part of a Hive job may want to report statistics to the metastore, 
or a job may be running within the context of Oozie or Hive Server.

This JIRA is to implement support for delegation tokens in the metastore. The 
concept of a delegation token is borrowed from the Hadoop security design - the 
quick summary is that a Kerberos-authenticated client may retrieve a binary 
token from the server. This token can then be passed to other clients, which 
can use it to authenticate as the original user in lieu of a Kerberos 
ticket.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1696) Add delegation token support to metastore

2010-10-06 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12918615#action_12918615
 ] 

Todd Lipcon commented on HIVE-1696:
---

A few of us had a phone call this morning. We briefly discussed a design for 
this, summarized below:

- The metastore should make use of the delegation token facilities in Hadoop 
Common. The classes in Common are already generic since they're used by both MR 
and HDFS for their delegation token types.
- The metastore needs to keep track of active delegation tokens across restarts 
- it probably makes sense to use the existing DB backing store for this.
- The metastore thrift API will need a new call, something like: {{binary 
getDelegationToken(1: string renewer)}} which returns the opaque token.
- We'll need to make some changes to HadoopThriftAuthBridge from HIVE-842 in 
order to support using a delegation token over SASL.
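A hedged sketch of what that Thrift addition might look like in the metastore IDL (the method and field names here are illustrative, not a committed API):

```
service ThriftHiveMetastore {
  // Hypothetical: issue a delegation token to the Kerberos-authenticated
  // caller. `renewer` names the principal allowed to renew the token;
  // the returned bytes are opaque to the client.
  binary get_delegation_token(1: string renewer)
}
```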

In terms of the use cases above, here are some thoughts on how the delegation 
tokens will be used:

h3. MR tasks reporting statistics

When a Hive job is submitted, it will first obtain a DT from the Hive 
metastore. This DT will be passed along with the job, either as a private 
DistributedCache file, or maybe base64-encoded in the jobconf itself. The MR 
tasks themselves will then load the token into the UGI before making calls. 
This is basically the pattern that normal Hadoop MR jobs use to access HDFS 
from within a task.
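The "base64-encoded in the jobconf" option can be sketched with plain JDK pieces. The class and method names below are made up for illustration, and the actual UGI/token wiring is Hadoop-specific and omitted:

```java
import java.util.Base64;

// Hypothetical helper: shuttle opaque delegation-token bytes through a
// string-valued jobconf property. The submitting client calls encode();
// each MR task calls decode() before loading the token into its UGI.
class TokenConfCodec {
    static String encode(byte[] tokenBytes) {
        return Base64.getEncoder().encodeToString(tokenBytes);
    }

    static byte[] decode(String confValue) {
        return Base64.getDecoder().decode(confValue);
    }
}
```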

h3. Oozie or Hive Server jobs

Before Oozie or the Hive Server forks the child process which actually runs the 
job, it will need to obtain a delegation token from the metastore on behalf of 
the user running the job. It will then provide this token to the child process 
using an environment variable or configuration property. In this case, Oozie or 
the Hive Server needs to be configured as a proxy superuser on the metastore - 
i.e. the oozie/_HOST or hiveserver/_HOST principal is allowed to impersonate 
other users in order to obtain delegation tokens for them.
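On the HDFS side, Hadoop expresses this kind of proxy-superuser trust with {{hadoop.proxyuser.*}} properties in core-site.xml; a metastore-side equivalent would presumably need analogous configuration. A sketch with placeholder host and group values:

```xml
<property>
  <name>hadoop.proxyuser.oozie.hosts</name>
  <value>oozie-host.example.com</value>
</property>
<property>
  <name>hadoop.proxyuser.oozie.groups</name>
  <value>hive</value>
</property>
```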





[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

2010-10-06 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12918745#action_12918745
 ] 

Todd Lipcon commented on HIVE-842:
--

Hey Pradeep. It sounds like it might be - I haven't seen that error before, but 
I have also only been testing with actual service principals (i.e. principals 
of the form metastore/hostname).

You can try running both sides with 
{{HADOOP_OPTS=-Dsun.security.krb5.debug=true}} and it should give you some 
extra details.
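For example (the system property is the standard JVM Kerberos debugging flag; the exact command you then run is whatever client or server you are testing):

```shell
# Turn on verbose Kerberos tracing in any Hadoop/Hive JVM that picks up HADOOP_OPTS.
export HADOOP_OPTS="-Dsun.security.krb5.debug=true"
# ...then start the metastore or client as usual; krb5 handshake details go to stdout.
```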

 Authentication Infrastructure for Hive
 --

 Key: HIVE-842
 URL: https://issues.apache.org/jira/browse/HIVE-842
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Server Infrastructure
Reporter: Edward Capriolo
Assignee: Todd Lipcon
 Attachments: hive-842.txt, HiveSecurityThoughts.pdf


 This issue deals with the authentication (user name,password) infrastructure. 
 Not the authorization components that specify what a user should be able to 
 do.




[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

2010-10-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12918076#action_12918076
 ] 

Todd Lipcon commented on HIVE-842:
--

Hey Pradeep. You also need HIVE-1526 which updates Hive to use Thrift 0.4.0.





[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

2010-10-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12918239#action_12918239
 ] 

Todd Lipcon commented on HIVE-842:
--

Seems like the patch that updates Thrift has fallen out of date with trunk. 
I'll try to regenerate it ASAP. You can probably fix the above issues by (a) 
importing StageType in MapRedTask, and (b) replacing StatsTask.getType's return 
value with the StageType enum. (The new version of Thrift uses Java enums 
instead of ints to represent Thrift enums.)
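A hedged sketch of fix (b) - with Thrift now generating Java enums, the task's getType() should return the enum constant rather than an int. The real StageType constants come from Thrift codegen; the names below are stand-ins:

```java
// Stand-in for the Thrift-generated enum (the real one comes from codegen).
enum StageType { MAPRED, STATS }

class StatsTask {
    // Before the Thrift upgrade this returned an int;
    // after it, the generated API expects the enum.
    public StageType getType() {
        return StageType.STATS;
    }
}
```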





[jira] Updated: (HIVE-1264) Make Hive work with Hadoop security

2010-09-30 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HIVE-1264:
--

Attachment: hive-1264.txt

Good catch. This patch updates the build.xml for hbase-handler to include the 
hadoop test jar.

 Make Hive work with Hadoop security
 ---

 Key: HIVE-1264
 URL: https://issues.apache.org/jira/browse/HIVE-1264
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Jeff Hammerbacher
Assignee: Todd Lipcon
 Attachments: hive-1264-fb-mirror.txt, hive-1264.txt, hive-1264.txt, 
 HiveHadoop20S_patch.patch







[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

2010-09-30 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12916687#action_12916687
 ] 

Todd Lipcon commented on HIVE-842:
--

bq.  should there be an option whereby the metastore uses a keytab to 
authenticate to HDFS, but doesn't require users to authenticate to it?
bq. Wouldn't this leave a hole as it currently exists?

Yeah - I think the use case is that you may have some old Thrift clients that 
haven't yet been updated to work with the SASL implementation (e.g. PHP). For 
those clients, perhaps you can provide security based on firewall rules, etc. 
But you would still like to run Hive on top of a secured HDFS.





[jira] Updated: (HIVE-842) Authentication Infrastructure for Hive

2010-09-30 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HIVE-842:
-

Attachment: hive-842.txt

Here's a preview patch of this work. A few notes:
- This checks in a bunch of Thrift classes that are in Thrift trunk. Thrift is 
currently in the RC phase for a 0.5.0 release, so we can drop these Thrift 
classes from Hive as soon as that's out (probably before this patch is even 
ready for commit).
- There are still some javadocs that could be improved a little bit.
- There's currently no integration into the guts of Hive - we simply 
assume the calling user's identity as soon as the RPC is received. I think 
that's OK for the scope of this patch, as discussed above.

There's a bit of a lurking bug, I believe, due to HADOOP-6982, but it 
shouldn't be major.





[jira] Assigned: (HIVE-842) Authentication Infrastructure for Hive

2010-09-22 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned HIVE-842:


Assignee: Todd Lipcon





[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

2010-09-22 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913439#action_12913439
 ] 

Todd Lipcon commented on HIVE-842:
--

As discussed at the last contributor meeting, I am working on authenticating 
access to the metastore by kerberizing the Thrift interface.

Plan is currently:
1) Update the version of Thrift in Hive to 0.4.0
2) Temporarily check in the SASL support from Thrift trunk (this will be in 
0.5.0 release, due out in October some time)
3) Build a bridge between Thrift's SASL support and Hadoop's 
UserGroupInformation classes. Thus, if a user has a current UGI on the client 
side, it will get propagated to the JAAS context on the handler side.
4) In places where the metastore accesses the file system, use the proxy user 
functionality to act on behalf of the authenticated user.
5) When we detect that we are running on a Hadoop version with security 
enabled, enable the above functionality.
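Step 3's handler-side behavior - running the request under the authenticated caller's JAAS context - can be sketched with the JDK's own {{Subject.doAs}}. The handler class and return value here are illustrative, not Hive code:

```java
import javax.security.auth.Subject;
import java.security.PrivilegedAction;

// Hypothetical handler sketch: once the SASL layer has authenticated the
// caller, execute the request inside that caller's Subject so downstream
// code (e.g. file-system access via proxy users) can see the identity.
class HandlerSketch {
    static String handleAs(Subject caller, final String request) {
        return Subject.doAs(caller, (PrivilegedAction<String>) () ->
                "handled " + request);
    }
}
```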

I'd like to attack the Hive Web UI separately.

One open question:
- Do Hive *tasks* ever need to authenticate to the metastore? If so, we will 
have to build a delegation token system into Hive.





[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

2010-09-22 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913691#action_12913691
 ] 

Todd Lipcon commented on HIVE-842:
--

OK. The code in Hadoop Common is somewhat reusable for this, so it shouldn't be 
too hard to implement. If I recall correctly, though, the delegation tokens 
rely on a secret key that the master daemon periodically rotates. We need to 
add some kind of persistent token storage for this to work - I guess in the 
metastore's DB?

To make this easier to review, I'd like to do the straight kerberos first, and 
then add delegation tokens in a second patch/JIRA. Sound good?





[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive

2010-09-22 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12913787#action_12913787
 ] 

Todd Lipcon commented on HIVE-842:
--

I don't anticipate breaking the web UI (or anything else) on non-secure Hadoop 
versions. But it will probably be insecure to run the web UI, which currently 
trusts users to say who they want to be - i.e. I don't plan in the short term 
to integrate an auth layer for the web UI itself.





[jira] Updated: (HIVE-1526) Hive should depend on a release version of Thrift

2010-09-22 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HIVE-1526:
--

Attachment: hive-1526.txt
libthrift.jar
libfb303.jar

Here is a patch along with the newly built jars from Thrift 0.4.0.

I agree that long term we should make codegen part of the build, but I think 
it's enough of a hassle to require everyone to install the same version of 
Thrift that we should punt for now.

 Hive should depend on a release version of Thrift
 -

 Key: HIVE-1526
 URL: https://issues.apache.org/jira/browse/HIVE-1526
 Project: Hadoop Hive
  Issue Type: Task
  Components: Build Infrastructure
Reporter: Carl Steinbach
Assignee: Todd Lipcon
 Attachments: hive-1526.txt, libfb303.jar, libthrift.jar


 Hive should depend on a release version of Thrift, and ideally it should use 
 Ivy to resolve this dependency.
 The Thrift folks are working on adding Thrift artifacts to a maven repository 
 here: https://issues.apache.org/jira/browse/THRIFT-363




[jira] Assigned: (HIVE-1526) Hive should depend on a release version of Thrift

2010-09-20 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned HIVE-1526:
-

Assignee: Todd Lipcon





[jira] Commented: (HIVE-1526) Hive should depend on a release version of Thrift

2010-09-20 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12912768#action_12912768
 ] 

Todd Lipcon commented on HIVE-1526:
---

Er, sorry, not THRIFT-381, but rather THRIFT-907. Too many browser tabs!





[jira] Updated: (HIVE-1264) Make Hive work with Hadoop security

2010-09-16 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HIVE-1264:
--

   Status: Patch Available  (was: Open)
Affects Version/s: 0.7.0





[jira] Commented: (HIVE-1264) Make Hive work with Hadoop security

2010-09-16 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910373#action_12910373
 ] 

Todd Lipcon commented on HIVE-1264:
---

Submitted to RB: https://review.cloudera.org/r/860/

Regarding the snapshot - it's fine by me to pull from there, I think the 
people.apache.org web server is reasonably stable. If it turns out to be flaky 
it's also cool if you want to mirror it - FB is probably more reliable than ASF.





[jira] Updated: (HIVE-1628) Fix Base64TextInputFormat to be compatible with commons codec 1.4

2010-09-10 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HIVE-1628:
--

Status: Patch Available  (was: Open)

 Fix Base64TextInputFormat to be compatible with commons codec 1.4
 -

 Key: HIVE-1628
 URL: https://issues.apache.org/jira/browse/HIVE-1628
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Contrib
Affects Versions: 0.6.0, 0.7.0
Reporter: Todd Lipcon
Assignee: Todd Lipcon
 Attachments: hive-1628-0.5.txt, hive-1628.txt


 Commons-codec 1.4 made an incompatible change to the Base64 class that made 
 line-wrapping default (boo!). This breaks the Base64TextInputFormat in 
 contrib. This patch adds some simple reflection to use the new constructor 
 that uses the old behavior.
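The reflection trick described above can be sketched as follows. {{Probe}} is a stand-in class, since the real patch targets commons-codec's Base64 (whose jar isn't assumed here); the idea is to prefer the 1.4-style int constructor with line length 0 (wrapping off) and fall back to the no-arg constructor on codec 1.3:

```java
// Stand-in for commons-codec's Base64: codec 1.4 added Base64(int lineLength),
// where 0 disables the new line-wrapping default (76-char lines).
class Probe {
    final int lineLength;

    public Probe() { this.lineLength = 76; }  // 1.4 no-arg default: wrapped
    public Probe(int lineLength) { this.lineLength = lineLength; }

    // Reflectively prefer the int constructor; fall back if it doesn't exist.
    static Probe newUnwrapped() {
        try {
            return Probe.class.getConstructor(int.class).newInstance(0);
        } catch (Exception e) {
            return new Probe();  // codec-1.3-style class: never wraps anyway
        }
    }
}
```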




[jira] Updated: (HIVE-1628) Fix Base64TextInputFormat to be compatible with commons codec 1.4

2010-09-10 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HIVE-1628:
--

Attachment: hive-1628.txt
hive-1628-0.5.txt





[jira] Updated: (HIVE-1628) Fix Base64TextInputFormat to be compatible with commons codec 1.4

2010-09-10 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HIVE-1628:
--

Status: Open  (was: Patch Available)

Oops, I just noticed I posted the wrong patch! sorry, one sec...





[jira] Updated: (HIVE-1628) Fix Base64TextInputFormat to be compatible with commons codec 1.4

2010-09-10 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HIVE-1628:
--

Status: Patch Available  (was: Open)





[jira] Updated: (HIVE-1264) Make Hive work with Hadoop security

2010-09-01 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HIVE-1264:
--

Attachment: hive-1264.txt

Here's a patch against trunk which adds shims for secure hadoop.

Since there hasn't been a public tarball release of secure hadoop quite yet, 
I've pointed it at a snapshot of CDH3b3 (not yet released) from my apache.org 
web directory.

I haven't run the unit test suite against secure hadoop yet, but I did a very 
brief test on a secure cluster by creating a table and running a simple MR 
query.





[jira] Commented: (HIVE-1264) Make Hive work with Hadoop security

2010-09-01 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905156#action_12905156
 ] 

Todd Lipcon commented on HIVE-1264:
---

(BTW, if you want to test this against a different tarball, you can use 
{{-Dhadoop.security.url=http://url/of/your/tarball 
-Dhadoop.security.version=0.20.104}}, or whatever.)

 Make Hive work with Hadoop security
 ---

 Key: HIVE-1264
 URL: https://issues.apache.org/jira/browse/HIVE-1264
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Jeff Hammerbacher
Assignee: Todd Lipcon
 Attachments: hive-1264.txt, HiveHadoop20S_patch.patch







[jira] Commented: (HIVE-1476) Hive's metastore when run as a thrift service creates directories as the service user instead of the real user issuing create table/alter table etc.

2010-09-01 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905299#action_12905299
 ] 

Todd Lipcon commented on HIVE-1476:
---

In the absence of making the metastore truly a metadata-only service, it seems 
like what we really want is for the metastore to act on behalf of users.

Could we have the Hive client fetch an HDFS delegation token and pass it 
securely to the metastore, so the metastore can act as the user to perform the 
operations?
Alternatively, could the metastore be set up with an HDFS proxy user principal 
that allows it to impersonate anyone in a hive group?

Although we don't have true authorization in Hive at the moment, we should 
think about how to solve this in a way that at least moves us closer to that 
goal.

 Hive's metastore when run as a thrift service creates directories as the 
 service user instead of the real user issuing create table/alter table etc.
 

 Key: HIVE-1476
 URL: https://issues.apache.org/jira/browse/HIVE-1476
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.6.0, 0.7.0
Reporter: Pradeep Kamath
 Attachments: HIVE-1476.patch, HIVE-1476.patch.2


 If the thrift metastore service is running as the user hive then all table 
 directories as a result of create table are created as that user rather than 
 the user who actually issued the create table command. This is different 
 semantically from non-thrift mode (i.e. local mode) when clients directly 
 connect to the metastore. In the latter case, directories are created as the 
 real user. The thrift mode should do the same.




[jira] Commented: (HIVE-1476) Hive's metastore when run as a thrift service creates directories as the service user instead of the real user issuing create table/alter table etc.

2010-09-01 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12905310#action_12905310
 ] 

Todd Lipcon commented on HIVE-1476:
---

BTW, we are working on a SASL-secured Thrift transport over at THRIFT-876





[jira] Assigned: (HIVE-1264) Make Hive work with Hadoop security

2010-08-31 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon reassigned HIVE-1264:
-

Assignee: Todd Lipcon  (was: Venkatesh S)

 Make Hive work with Hadoop security
 ---

 Key: HIVE-1264
 URL: https://issues.apache.org/jira/browse/HIVE-1264
 Project: Hadoop Hive
  Issue Type: Improvement
Reporter: Jeff Hammerbacher
Assignee: Todd Lipcon
 Attachments: HiveHadoop20S_patch.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HIVE-789) Add get_unique_id() call to metastore

2010-04-12 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon resolved HIVE-789.
--

Resolution: Won't Fix

Resolving as wontfix - this was a proposed part of a solution, but no longer 
applicable.

 Add get_unique_id() call to metastore
 -

 Key: HIVE-789
 URL: https://issues.apache.org/jira/browse/HIVE-789
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Todd Lipcon
 Attachments: hive-789.patch


 As noted in HIVE-718, it can be tough to avoid race conditions when multiple 
 clients are trying to move files into the same directory. This patch adds a 
 get_unique_id() call to the metastore that returns the current value from an 
 incrementing JDO Sequence so that clients can avoid some races without locks.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-25 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12838735#action_12838735
 ] 

Todd Lipcon commented on HIVE-259:
--

Doesn't the autoboxing of Integer types actually allocate objects? I think the 
JVM only flyweights integers for very small ones (IIRC only from -128 to 127).
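A small demonstration of that cache behavior (the -128 to 127 range is the one 
specified by JLS 5.1.7; the class name here is invented for illustration):

```java
// Demonstrates the Integer autobox cache: values in [-128, 127] are
// interned, so autoboxing them returns the same object; larger values
// allocate a fresh Integer on each boxing (on a default HotSpot JVM).
public class BoxingCacheDemo {
    public static void main(String[] args) {
        Integer a = 127, b = 127;   // within the cache: same object
        Integer c = 128, d = 128;   // outside the cache: distinct objects
        System.out.println(a == b); // true  -- reference equality holds
        System.out.println(c == d); // false -- two separate allocations
    }
}
```

So an aggregation loop that autoboxes per-row values above 127 really does 
allocate on every row.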

 Add PERCENTILE aggregate function
 -

 Key: HIVE-259
 URL: https://issues.apache.org/jira/browse/HIVE-259
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer
Assignee: Jerome Boulon
 Attachments: HIVE-259-2.patch, HIVE-259.1.patch, HIVE-259.patch, 
 jb2.txt, Percentile.xlsx


 Compute at least the 25th, 50th, and 75th percentiles

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2010-02-10 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832139#action_12832139
 ] 

Todd Lipcon commented on HIVE-259:
--

Agreed re HashMap. Also, there should be some kind of setting that limits how 
much RAM gets used up. In a later iteration we could do adaptive histogramming 
once we hit the limit. In this version we should just throw up our hands and 
fail with a message that says the user needs to discretize harder.

 Add PERCENTILE aggregate function
 -

 Key: HIVE-259
 URL: https://issues.apache.org/jira/browse/HIVE-259
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer
Assignee: Jerome Boulon
 Attachments: HIVE-259.patch


 Compute at least the 25th, 50th, and 75th percentiles

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1128) Let max/min handle complex types like struct

2010-02-03 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829311#action_12829311
 ] 

Todd Lipcon commented on HIVE-1128:
---

This is clever, but I'd be surprised if a lot of non-programmer users would 
come up with this on their own. Would it be helpful to also provide argmin and 
argmax functions? The statistics community would probably appreciate the 
syntactic sugar.

 Let max/min handle complex types like struct
 

 Key: HIVE-1128
 URL: https://issues.apache.org/jira/browse/HIVE-1128
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Zheng Shao

 A lot of users are interested in doing arg_min and arg_max. Basically, 
 return the value of some other columns when one column's value is the max 
 value.
 The following is an example usage when this is done:
 {code}
 SELECT department, max(struct(salary, employee_name))
 FROM compensations;
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1128) Let max/min handle complex types like struct

2010-02-03 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829338#action_12829338
 ] 

Todd Lipcon commented on HIVE-1128:
---

yep, my suggestion is definitely a separate feature that is orthogonal to this 
JIRA.

 Let max/min handle complex types like struct
 

 Key: HIVE-1128
 URL: https://issues.apache.org/jira/browse/HIVE-1128
 Project: Hadoop Hive
  Issue Type: Improvement
Affects Versions: 0.6.0
Reporter: Zheng Shao
Assignee: Zheng Shao
 Attachments: HIVE-1128.1.sh, HIVE-1128.2.patch


 A lot of users are interested in doing arg_min and arg_max. Basically, 
 return the value of some other columns when one column's value is the max 
 value.
 The following is an example usage when this is done:
 {code}
 SELECT department, max(struct(salary, employee_name))
 FROM compensations;
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1015) Java MapReduce wrapper for TRANSFORM/MAP/REDUCE scripts

2010-01-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796812#action_12796812
 ] 

Todd Lipcon commented on HIVE-1015:
---

I think it's very slightly different:
- UDF - only a 1:1 mapping on a single column
- UDAF - requires implementation of Combiner-like functionality, best I can 
tell (haven't delved into this deeply, so apologies if you can do a 
reducer-only UDAF)
- UDTF - perhaps supports the same functionality, but the syntax is a little 
less obvious than the MAP/REDUCE syntax. I think this feature could be 
implemented by an AST transform and some kind of interface-changing wrapper 
class for UDTF that makes it look more like the usual MR API.

BTW, these thoughts definitely shouldn't block progress on this JIRA. I just 
wanted to throw the idea out there.

 Java MapReduce wrapper for TRANSFORM/MAP/REDUCE scripts
 ---

 Key: HIVE-1015
 URL: https://issues.apache.org/jira/browse/HIVE-1015
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Contrib
Reporter: Carl Steinbach
 Attachments: HIVE-1015.patch


 Larry Ogrodnek has written a set of wrapper classes that make it possible
 to write Hive TRANSFORM/MAP/REDUCE scripts in Java in a style that
 more closely resembles conventional Hadoop MR programs.
 A blog post describing this library can be found here: 
 http://dev.bizo.com/2009/10/hive-map-reduce-in-java.html
 The source code (with Apache license) is available here: 
 http://github.com/ogrodnek/shmrj
 We should add this to contrib.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-1015) Java MapReduce wrapper for TRANSFORM/MAP/REDUCE scripts

2010-01-04 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796559#action_12796559
 ] 

Todd Lipcon commented on HIVE-1015:
---

Related thought: would be nice to be able to write MAP/REDUCE as straight Java 
without having the overhead of streaming and serde. Is there a ticket already 
for this?

 Java MapReduce wrapper for TRANSFORM/MAP/REDUCE scripts
 ---

 Key: HIVE-1015
 URL: https://issues.apache.org/jira/browse/HIVE-1015
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Contrib
Reporter: Carl Steinbach
 Attachments: HIVE-1015.patch


 Larry Ogrodnek has written a set of wrapper classes that make it possible
 to write Hive TRANSFORM/MAP/REDUCE scripts in Java in a style that
 more closely resembles conventional Hadoop MR programs.
 A blog post describing this library can be found here: 
 http://dev.bizo.com/2009/10/hive-map-reduce-in-java.html
 The source code (with Apache license) is available here: 
 http://github.com/ogrodnek/shmrj
 We should add this to contrib.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-837) virtual column support (filename) in hive

2010-01-02 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12795904#action_12795904
 ] 

Todd Lipcon commented on HIVE-837:
--

bq. in that case they are really interested in the actual filename as opposed 
to the directory name. 

+1. I'm currently working with a 200G dataset that has lots of rows that Hive 
is interpreting as NULL. As far as I knew, there are no NULLs in the dataset to 
begin with, so I'd love to do: SELECT FILENAME(), FILEOFFSET() FROM t WHERE 
some_col IS NULL;


 virtual column support (filename) in hive
 -

 Key: HIVE-837
 URL: https://issues.apache.org/jira/browse/HIVE-837
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Namit Jain

 Copying from some mails:
 I am dumping files into a hive partion on five minute intervals. I am using 
 LOAD DATA into a partition.
 weblogs
 web1.00
 web1.05
 web1.10
 ...
 web2.00
 web2.05
 web1.10
 
 Things that would be useful:
 Select files from the folder with a regex or exact name
 select * FROM logs WHERE FILENAME LIKE('WEB1*')
 select * FROM logs WHERE FILENAME='web2.00'
 Also it would be nice to be able to select offsets in a file; this would make 
 sense with appends
 select * from logs WHERE FILENAME='web2.00' FROMOFFSET=454644 [TOOFFSET=]
 select 
 substr(filename, 4, 7) as class_A, 
 substr(filename, 8, 10) as class_B, 
 count(x) as cnt
 from FOO
 group by
 substr(filename, 4, 7), 
 substr(filename, 8, 10);
 Hive should support virtual columns

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-789) Add get_unique_id() call to metastore

2009-12-17 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792249#action_12792249
 ] 

Todd Lipcon commented on HIVE-789:
--

I'd be fine if you closed this as wontfix - it was depended on by some stuff a 
while ago, but if you have better solutions, I'm all for them :)

 Add get_unique_id() call to metastore
 -

 Key: HIVE-789
 URL: https://issues.apache.org/jira/browse/HIVE-789
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Todd Lipcon
 Attachments: hive-789.patch


 As noted in HIVE-718, it can be tough to avoid race conditions when multiple 
 clients are trying to move files into the same directory. This patch adds a 
 get_unique_id() call to the metastore that returns the current value from an 
 incrementing JDO Sequence so that clients can avoid some races without locks.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function

2009-11-24 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782120#action_12782120
 ] 

Todd Lipcon commented on HIVE-259:
--

An easy way to do this that would work for a ton of data sets would be to 
essentially do a counting sort. If you have only a few thousand distinct values 
in the column to be analyzed, just make a hashtable, count up how many of each 
you see, and then in the single reducer use the histogram to figure out the 
percentile. This should work great for datasets like age, and even for sets 
like number of days since the user signed up. For sets that are truly 
continuous, it would be useful when combined with a binning UDF to discretize 
them.

Sadly it's not the general case, but it would be an easy first step.
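As a sketch of that counting approach (class and method names are invented for 
illustration; this is not Hive's actual PERCENTILE implementation), the map 
side builds a value-to-count histogram and the single reducer walks it in 
sorted order:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical counting-sort percentile: feasible when the column has few
// distinct values, since the histogram must fit in memory.
public class CountingPercentile {
    private final TreeMap<Long, Long> counts = new TreeMap<>();
    private long total = 0;

    // Map side: bump the count for this value, O(log d) per row
    // where d is the number of distinct values.
    public void add(long value) {
        counts.merge(value, 1L, Long::sum);
        total++;
    }

    // Reduce side: for p in (0, 1], return the smallest value whose
    // cumulative count covers ceil(p * total) observations.
    public long percentile(double p) {
        long target = (long) Math.ceil(p * total);
        long seen = 0;
        for (Map.Entry<Long, Long> e : counts.entrySet()) {
            seen += e.getValue();
            if (seen >= target) {
                return e.getKey();
            }
        }
        throw new IllegalStateException("no values added");
    }
}
```

For values 1..10 added once each, percentile(0.5) returns 5 and 
percentile(0.75) returns 8; a RAM cap on the histogram size would fail the 
query once too many distinct values appear.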

 Add PERCENTILE aggregate function
 -

 Key: HIVE-259
 URL: https://issues.apache.org/jira/browse/HIVE-259
 Project: Hadoop Hive
  Issue Type: New Feature
  Components: Query Processor
Reporter: Venky Iyer

 Compute at least the 25th, 50th, and 75th percentiles

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-09-15 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755644#action_12755644
 ] 

Todd Lipcon commented on HIVE-718:
--

+1 lgtm. Thanks for getting this in, guys!

 Load data inpath into a new partition without overwrite does not move the file
 --

 Key: HIVE-718
 URL: https://issues.apache.org/jira/browse/HIVE-718
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Zheng Shao
Assignee: Namit Jain
 Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt, 
 hive.718.1.patch


 The bug can be reproduced as following. Note that it only happens for 
 partitioned tables. The select after the first load returns nothing, while 
 the second returns the data correctly.
 insert.txt in the current local directory contains 3 lines: a, b and c.
 {code}
  create table tmp_insert_test (value string) stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test;
  select * from tmp_insert_test;
 a
 b
 c
  create table tmp_insert_test_p ( value string) partitioned by (ds string) 
  stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
 a   2009-08-01
 b   2009-08-01
 d   2009-08-01
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-09-14 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755177#action_12755177
 ] 

Todd Lipcon commented on HIVE-718:
--

Namit: that error is tracked by HIVE-307 marked linked above. I think it's OK 
behavior for 0.4 - we should verify, though, that the file that fails to load 
remains safely in place.

 Load data inpath into a new partition without overwrite does not move the file
 --

 Key: HIVE-718
 URL: https://issues.apache.org/jira/browse/HIVE-718
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Zheng Shao
 Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt


 The bug can be reproduced as following. Note that it only happens for 
 partitioned tables. The select after the first load returns nothing, while 
 the second returns the data correctly.
 insert.txt in the current local directory contains 3 lines: a, b and c.
 {code}
  create table tmp_insert_test (value string) stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test;
  select * from tmp_insert_test;
 a
 b
 c
  create table tmp_insert_test_p ( value string) partitioned by (ds string) 
  stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
 a   2009-08-01
 b   2009-08-01
 d   2009-08-01
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-09-12 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12754632#action_12754632
 ] 

Todd Lipcon commented on HIVE-718:
--

Even in the case of LOAD...OVERWRITE, we currently lack atomicity. The old 
directory is deleted prior to the new directory being moved in. There's a small 
window of time in which neither old nor new data is present. Sure, this is 
probably on the order of a half second, but it is still not correct. There's 
also the general case of a query which has already computed input splits being 
affected by a concurrent LOAD DATA OVERWRITE. The versioning solution I think 
gets us partly there, but will be very tricky to implement correctly while 
maintaining performance since there's no way to do a copy-on-write directory 
snapshot built in to HDFS. So, I think a significant amount of work will have 
to be done in the metastore and we'll definitely have to drop the external 
process load ability that currently exists.

Should we open a new JIRA for the general concurrency control/locking issues 
we're discussing here? It seems like this ticket should be used for the 
0.3-0.4 regression, even if it's just a temporary fix. We can then put a more 
general correct solution on the roadmap for 0.5 or later, since it's looking 
like it will be a complicated project.

 Load data inpath into a new partition without overwrite does not move the file
 --

 Key: HIVE-718
 URL: https://issues.apache.org/jira/browse/HIVE-718
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Zheng Shao
 Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt


 The bug can be reproduced as following. Note that it only happens for 
 partitioned tables. The select after the first load returns nothing, while 
 the second returns the data correctly.
 insert.txt in the current local directory contains 3 lines: a, b and c.
 {code}
  create table tmp_insert_test (value string) stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test;
  select * from tmp_insert_test;
 a
 b
 c
  create table tmp_insert_test_p ( value string) partitioned by (ds string) 
  stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
 a   2009-08-01
 b   2009-08-01
 d   2009-08-01
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-09-11 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12754266#action_12754266
 ] 

Todd Lipcon commented on HIVE-718:
--

Not sure if people are already following the discussion on HADOOP-6240, but 
it's worth checking out -- discussions regarding rename() semantics on HDFS.

 Load data inpath into a new partition without overwrite does not move the file
 --

 Key: HIVE-718
 URL: https://issues.apache.org/jira/browse/HIVE-718
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Zheng Shao
 Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt


 The bug can be reproduced as following. Note that it only happens for 
 partitioned tables. The select after the first load returns nothing, while 
 the second returns the data correctly.
 insert.txt in the current local directory contains 3 lines: a, b and c.
 {code}
  create table tmp_insert_test (value string) stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test;
  select * from tmp_insert_test;
 a
 b
 c
  create table tmp_insert_test_p ( value string) partitioned by (ds string) 
  stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
 a   2009-08-01
 b   2009-08-01
 d   2009-08-01
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-09-11 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12754408#action_12754408
 ] 

Todd Lipcon commented on HIVE-718:
--

Namit: here's a trace from a session on hive 0.3.0:

{noformat}
t...@todd-laptop:~$ cat /tmp/insert.txt 
a
b
c
d
t...@todd-laptop:~$ cat /tmp/insert2.txt 
e
f
g
h
t...@todd-laptop:~$ hive
Hive history file=/tmp/todd/hive_job_log_todd_200909111603_978288634.txt
hive> create table tmp_insert_test_p (value string) partitioned by (ds string);
OK
Time taken: 3.865 seconds
hive> load data local inpath '/tmp/insert.txt' into table tmp_insert_test_p 
partition (ds = '2009-08-01');
Copying data from file:/tmp/insert.txt
Loading data to table tmp_insert_test_p partition {ds=2009-08-01}
OK
Time taken: 0.672 seconds
hive> select * from tmp_insert_test_p where ds = '2009-08-01';
OK
a   2009-08-01
b   2009-08-01
c   2009-08-01
d   2009-08-01
Time taken: 0.374 seconds
hive> load data local inpath '/tmp/insert2.txt' into table tmp_insert_test_p 
partition (ds = '2009-08-01');
Copying data from file:/tmp/insert2.txt
Loading data to table tmp_insert_test_p partition {ds=2009-08-01}
OK
Time taken: 0.261 seconds
hive> select * from tmp_insert_test_p where ds = '2009-08-01';
OK
a   2009-08-01
b   2009-08-01
c   2009-08-01
d   2009-08-01
e   2009-08-01
f   2009-08-01
g   2009-08-01
h   2009-08-01
Time taken: 0.14 seconds
{noformat}

The same session fails on the 0.4 branch:

{noformat}
hive> create table tmp_insert_test_p (value string) partitioned by (ds string);
OK
Time taken: 0.068 seconds
hive> load data local inpath '/tmp/insert.txt' into table tmp_insert_test_p 
partition (ds = '2009-08-01');
Copying data from file:/tmp/insert.txt
Loading data to table tmp_insert_test_p partition {ds=2009-08-01}
OK
Time taken: 0.315 seconds
hive> select * from tmp_insert_test_p where ds = '2009-08-01';
OK
Time taken: 0.523 seconds
{noformat}

 Load data inpath into a new partition without overwrite does not move the file
 --

 Key: HIVE-718
 URL: https://issues.apache.org/jira/browse/HIVE-718
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Zheng Shao
 Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt


 The bug can be reproduced as following. Note that it only happens for 
 partitioned tables. The select after the first load returns nothing, while 
 the second returns the data correctly.
 insert.txt in the current local directory contains 3 lines: a, b and c.
 {code}
  create table tmp_insert_test (value string) stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test;
  select * from tmp_insert_test;
 a
 b
 c
  create table tmp_insert_test_p ( value string) partitioned by (ds string) 
  stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
 a   2009-08-01
 b   2009-08-01
 d   2009-08-01
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-726) Make ant package -Dhadoop.version=0.17.0 work

2009-09-03 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12751170#action_12751170
 ] 

Todd Lipcon commented on HIVE-726:
--

Yea, I think this is a nice-to-have but not complete blocker. Its value is that 
you can ensure that you are still compatible with future releases. Since we 
didn't shim the entirety of Hadoop, it's possible to compile against the 
default fine, but still break at runtime if you're relying on an API that 
doesn't exist in other versions. Occasionally compiling against all of the 
supported versions is a good sanity check - if possible we should probably 
set up Hudson tests that check against all of the supported Hadoop versions on 
a nightly basis.

 Make ant package -Dhadoop.version=0.17.0 work
 ---

 Key: HIVE-726
 URL: https://issues.apache.org/jira/browse/HIVE-726
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Zheng Shao
 Attachments: HIVE-726.1.patch, HIVE-726.2.patch, HIVE-726.3.patch, 
 hive-726.txt


 After HIVE-487, users will have to specify the versions as in shims/ivy.xml 
 to make ant package -Dhadoop.version=&lt;version&gt; work.
 Currently it is only running fine with the following versions (from 
 shims/ivy.xml): 0.17.2.1, 0.18.3, 0.19.0, 0.20.0.
 We used to do ant package -Dhadoop.version=0.17.0 but it's not working any 
 more, although we can specify ant package -Dhadoop.version=0.17.2.1 and the 
 package will probably still work with 0.17.0.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-752) Encountered ClassNotFound exception when trying HWI server

2009-08-28 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748999#action_12748999
 ] 

Todd Lipcon commented on HIVE-752:
--

Hi Edward,

As I think I noticed in the original JIRA, I did not manually test HWI. I 
figured that any necessary tests would already exist in the form of JUnit tests 
- in my opinion every supported feature of Hive should be continuously tested 
by Hudson.

If you can add a test case that passes pre-HIVE-718 and fails now, I will 
happily fix it.

-Todd

 Encountered ClassNotFound exception when trying HWI server
 --

 Key: HIVE-752
 URL: https://issues.apache.org/jira/browse/HIVE-752
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Clients
 Environment: Hadoop 0.18.3
Reporter: Venkat Ramachandran
Assignee: Edward Capriolo

 Encountered ClassNotFound exception (for class: 
 org.apache.jetty.hive.shims.Jetty18Shims) when trying to start HWI server on 
 Hadoop 18.
 It appears that the class ShimLoader 
 (org.apache.hadoop.hive.shims.ShimLoader) is referring to incorrect classes 
 as below:
 static {
   JETTY_SHIM_CLASSES.put("0.17", "org.apache.jetty.hive.shims.Jetty17Shims");
   JETTY_SHIM_CLASSES.put("0.18", "org.apache.jetty.hive.shims.Jetty18Shims");
   JETTY_SHIM_CLASSES.put("0.19", "org.apache.jetty.hive.shims.Jetty19Shims");
   JETTY_SHIM_CLASSES.put("0.20", "org.apache.jetty.hive.shims.Jetty20Shims");
 }
 however, I think it should be as below:
 static {
   JETTY_SHIM_CLASSES.put("0.17", "org.apache.hadoop.hive.shims.Jetty17Shims");
   JETTY_SHIM_CLASSES.put("0.18", "org.apache.hadoop.hive.shims.Jetty18Shims");
   JETTY_SHIM_CLASSES.put("0.19", "org.apache.hadoop.hive.shims.Jetty19Shims");
   JETTY_SHIM_CLASSES.put("0.20", "org.apache.hadoop.hive.shims.Jetty20Shims");
 }
 Wondering if anybody else encountered this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-752) Encountered ClassNotFound exception when trying HWI server

2009-08-28 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749000#action_12749000
 ] 

Todd Lipcon commented on HIVE-752:
--

Sorry, that last one should read HIVE-487

 Encountered ClassNotFound exception when trying HWI server
 --

 Key: HIVE-752
 URL: https://issues.apache.org/jira/browse/HIVE-752
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Clients
 Environment: Hadoop 0.18.3
Reporter: Venkat Ramachandran
Assignee: Edward Capriolo

 Encountered ClassNotFound exception (for class: 
 org.apache.jetty.hive.shims.Jetty18Shims) when trying to start HWI server on 
 Hadoop 18.
 It appears that the class ShimLoader 
 (org.apache.hadoop.hive.shims.ShimLoader) is referring to incorrect classes 
 as below:
 static {
   JETTY_SHIM_CLASSES.put("0.17", "org.apache.jetty.hive.shims.Jetty17Shims");
   JETTY_SHIM_CLASSES.put("0.18", "org.apache.jetty.hive.shims.Jetty18Shims");
   JETTY_SHIM_CLASSES.put("0.19", "org.apache.jetty.hive.shims.Jetty19Shims");
   JETTY_SHIM_CLASSES.put("0.20", "org.apache.jetty.hive.shims.Jetty20Shims");
 }
 however, I think it should be as below:
 static {
   JETTY_SHIM_CLASSES.put("0.17", "org.apache.hadoop.hive.shims.Jetty17Shims");
   JETTY_SHIM_CLASSES.put("0.18", "org.apache.hadoop.hive.shims.Jetty18Shims");
   JETTY_SHIM_CLASSES.put("0.19", "org.apache.hadoop.hive.shims.Jetty19Shims");
   JETTY_SHIM_CLASSES.put("0.20", "org.apache.hadoop.hive.shims.Jetty20Shims");
 }
 Wondering if anybody else encountered this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-752) Encountered ClassNotFound exception when trying HWI server

2009-08-28 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749039#action_12749039
 ] 

Todd Lipcon commented on HIVE-752:
--

bq. Sure, every feature of Hive should be tested by hudson, but I do not think 
JUnit replaces real testing. I personally test everything on a live cluster 
because the world is full of gotchas. 

Unfortunately this increases the barrier to contribution significantly. As 
feature sets grow it becomes impossible to do this without a full time QA 
staff. So, for now, we need at least continuous smoke testing of all features, 
including HWI imho. When a release candidate is rolled it's time for the 
significant manual testing - asking for that on every commit is pretty 
difficult.

bq. I do think the shims should have some independent JUnit tests. For example, 
what happens when someone tries to compile with 0.21? I am hoping a unit test 
failure would be present in the shims.

The shims should be covered by virtue of their use elsewhere. I think in the 
original ticket I suggested that we get separate build running on Hudson that 
tests all of the unit tests against all of the supported Hadoop versions - I'm 
not sure who's in charge of the Hive Hudson, but if Cloudera can help get that 
set up we'd be happy to.

bq. I have much 0_20 angst. 1 because all the JMX stuff got renamed and all my 
cacti templates are broken  2. In general it seems like a lot changed and 
everyone is chasing after it.

+1 :) Also agreed that it is a blocker.

I'm getting on a plane in a couple hours for a week long vacation, but I'll try 
to sneak in some time to get this fixed. In the meantime, it would be great if 
you could write up a manual test plan for HWI. Its existence isn't even 
mentioned in README.txt, so it's difficult for new contributors to figure out 
what functionality they need to manually verify.

 Encountered ClassNotFound exception when trying HWI server
 --

 Key: HIVE-752
 URL: https://issues.apache.org/jira/browse/HIVE-752
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Clients
 Environment: Hadoop 0.18.3
Reporter: Venkat Ramachandran
Assignee: Edward Capriolo
Priority: Blocker
 Attachments: hive-752.diff


 Encountered ClassNotFound exception (for class: 
 org.apache.jetty.hive.shims.Jetty18Shims) when trying to start HWI server on 
 Hadoop 18.
 It appears that the class ShimLoader 
 (org.apache.hadoop.hive.shims.ShimLoader) is referring to incorrect classes 
 as below:
 static {
   JETTY_SHIM_CLASSES.put("0.17",
       "org.apache.jetty.hive.shims.Jetty17Shims");
   JETTY_SHIM_CLASSES.put("0.18",
       "org.apache.jetty.hive.shims.Jetty18Shims");
   JETTY_SHIM_CLASSES.put("0.19",
       "org.apache.jetty.hive.shims.Jetty19Shims");
   JETTY_SHIM_CLASSES.put("0.20",
       "org.apache.jetty.hive.shims.Jetty20Shims");
 }
 however, I think it should be as below:
 static {
   JETTY_SHIM_CLASSES.put("0.17",
       "org.apache.hadoop.hive.shims.Jetty17Shims");
   JETTY_SHIM_CLASSES.put("0.18",
       "org.apache.hadoop.hive.shims.Jetty18Shims");
   JETTY_SHIM_CLASSES.put("0.19",
       "org.apache.hadoop.hive.shims.Jetty19Shims");
   JETTY_SHIM_CLASSES.put("0.20",
       "org.apache.hadoop.hive.shims.Jetty20Shims");
 }
 Wondering if anybody else encountered this.




[jira] Updated: (HIVE-799) final.name should not include hadoop version

2009-08-26 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HIVE-799:
-

Attachment: hive-799.txt

Trivial patch attached

 final.name should not include hadoop version
 

 Key: HIVE-799
 URL: https://issues.apache.org/jira/browse/HIVE-799
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Affects Versions: 0.4.0
Reporter: Todd Lipcon
 Attachments: hive-799.txt


 Now that the shims allow a single build to work with any version of Hadoop, 
 we shouldn't include the hadoop version in the tarball name.




[jira] Created: (HIVE-800) Exclude .git directory from src/ directory in release tarball

2009-08-26 Thread Todd Lipcon (JIRA)
Exclude .git directory from src/ directory in release tarball
-

 Key: HIVE-800
 URL: https://issues.apache.org/jira/browse/HIVE-800
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Todd Lipcon
 Attachments: hive-800.txt

When creating the dist tarball, we should exclude .git from the src/ directory. 
We may also need to exclude some SVN directory?




[jira] Updated: (HIVE-800) Exclude .git directory from src/ directory in release tarball

2009-08-26 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HIVE-800:
-

Attachment: hive-800.txt

Trivial patch

 Exclude .git directory from src/ directory in release tarball
 -

 Key: HIVE-800
 URL: https://issues.apache.org/jira/browse/HIVE-800
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Todd Lipcon
 Attachments: hive-800.txt


 When creating the dist tarball, we should exclude .git from the src/ 
 directory. We may also need to exclude some SVN directory?




[jira] Created: (HIVE-802) Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it

2009-08-26 Thread Todd Lipcon (JIRA)
Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it
-

 Key: HIVE-802
 URL: https://issues.apache.org/jira/browse/HIVE-802
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Todd Lipcon


There's a bug in DataNucleus that causes this issue:

http://www.jpox.org/servlet/jira/browse/NUCCORE-371

To reproduce, simply put your hive source tree in a directory that contains a 
'+' character.




[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-08-25 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12747602#action_12747602
 ] 

Todd Lipcon commented on HIVE-718:
--

I hadn't considered the case of an in-memory metastore. A mktemp-like method 
would be great, but o.a.h.FileSystem gives you nothing of the sort :(

 Load data inpath into a new partition without overwrite does not move the file
 --

 Key: HIVE-718
 URL: https://issues.apache.org/jira/browse/HIVE-718
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Zheng Shao
 Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt


 The bug can be reproduced as following. Note that it only happens for 
 partitioned tables. The select after the first load returns nothing, while 
 the second returns the data correctly.
 insert.txt in the current local directory contains 3 lines: a, b and c.
 {code}
  create table tmp_insert_test (value string) stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test;
  select * from tmp_insert_test;
 a
 b
 c
  create table tmp_insert_test_p ( value string) partitioned by (ds string) 
  stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
 a   2009-08-01
 b   2009-08-01
 d   2009-08-01
 {code}




[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-08-25 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12747668#action_12747668
 ] 

Todd Lipcon commented on HIVE-718:
--

Not sure how that actually helps - if we use an algorithm like:

{code}
for each file to be moved:
  while not successful:
come up with a random name
try to move src file to the random name
if it fails due to dst already existing, try again with a new random name
{code}

then we'd lose the atomicity/isolation - readers would see a partial load 
during the middle of the operation.

We can't use that algorithm with atomic directory renames, since Hadoop has the 
wacky behavior that move(srcdir, dstdir) will create dstdir/srcdir if 
dstdir already exists
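That per-file retry loop can be sketched concretely. The snippet below uses java.nio as a stand-in for org.apache.hadoop.fs.FileSystem (purely an assumption for illustration; the Hadoop API has the different rename semantics noted above) - each move succeeds individually, but nothing makes the *set* of moves atomic, which is exactly the isolation loss being discussed:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

public class RetryMoveSketch {
    // Move every file in srcDir into dstDir under a fresh random name,
    // retrying on collision. Because files land one at a time, a reader
    // listing dstDir mid-operation can observe a partial load.
    static void moveAll(Path srcDir, Path dstDir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(srcDir)) {
            for (Path p : ds) {
                files.add(p);
            }
        }
        for (Path src : files) {
            while (true) {
                // come up with a random name
                Path dst = dstDir.resolve(src.getFileName() + "." + UUID.randomUUID());
                try {
                    Files.move(src, dst);  // fails if dst already exists
                    break;
                } catch (FileAlreadyExistsException e) {
                    // collision with an existing name: retry with a new one
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempDirectory("load-src");
        Path dst = Files.createTempDirectory("load-dst");
        Files.write(src.resolve("part-0"), "a\n".getBytes());
        Files.write(src.resolve("part-1"), "b\n".getBytes());
        moveAll(src, dst);
        int moved = 0;
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dst)) {
            for (Path p : ds) {
                moved++;
            }
        }
        System.out.println(moved);  // 2
    }
}
```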

 Load data inpath into a new partition without overwrite does not move the file
 --

 Key: HIVE-718
 URL: https://issues.apache.org/jira/browse/HIVE-718
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Zheng Shao
 Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt


 The bug can be reproduced as following. Note that it only happens for 
 partitioned tables. The select after the first load returns nothing, while 
 the second returns the data correctly.
 insert.txt in the current local directory contains 3 lines: a, b and c.
 {code}
  create table tmp_insert_test (value string) stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test;
  select * from tmp_insert_test;
 a
 b
 c
  create table tmp_insert_test_p ( value string) partitioned by (ds string) 
  stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
 a   2009-08-01
 b   2009-08-01
 d   2009-08-01
 {code}




[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-08-24 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12747107#action_12747107
 ] 

Todd Lipcon commented on HIVE-718:
--

Here's a proposal: How would you guys feel about an addition to the metastore 
API that is as simple as:

string get_unique_id()

The metastore would simply keep an autoincrement field to hand these out. We 
can then use these safely to allocate race-free names to clients. Should be 
straightforward and a lot more simple than any kind of lock/lease based 
mechanism.
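A toy sketch of the proposed call, with an in-memory AtomicLong standing in for the DB-backed autoincrement field (an assumption; the real metastore would persist the counter so it survives restarts):

```java
import java.util.concurrent.atomic.AtomicLong;

public class UniqueIdSketch {
    // Stand-in for the metastore's persistent autoincrement field.
    private final AtomicLong sequence = new AtomicLong(0);

    // Thrift-style call: every caller gets a distinct value, so two clients
    // racing to stage files can derive collision-free directory names
    // without any lock or lease protocol.
    public String getUniqueId() {
        return Long.toString(sequence.incrementAndGet());
    }

    public static void main(String[] args) {
        UniqueIdSketch metastore = new UniqueIdSketch();
        String a = metastore.getUniqueId();
        String b = metastore.getUniqueId();
        System.out.println(a + " " + b + " " + !a.equals(b));  // 1 2 true
    }
}
```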

 Load data inpath into a new partition without overwrite does not move the file
 --

 Key: HIVE-718
 URL: https://issues.apache.org/jira/browse/HIVE-718
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Zheng Shao
 Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt


 The bug can be reproduced as following. Note that it only happens for 
 partitioned tables. The select after the first load returns nothing, while 
 the second returns the data correctly.
 insert.txt in the current local directory contains 3 lines: a, b and c.
 {code}
  create table tmp_insert_test (value string) stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test;
  select * from tmp_insert_test;
 a
 b
 c
  create table tmp_insert_test_p ( value string) partitioned by (ds string) 
  stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
 a   2009-08-01
 b   2009-08-01
 d   2009-08-01
 {code}




[jira] Created: (HIVE-789) Add get_unique_id() call to metastore

2009-08-24 Thread Todd Lipcon (JIRA)
Add get_unique_id() call to metastore
-

 Key: HIVE-789
 URL: https://issues.apache.org/jira/browse/HIVE-789
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Todd Lipcon
 Fix For: 0.4.0


As noted in HIVE-718, it can be tough to avoid race conditions when multiple 
clients are trying to move files into the same directory. This patch adds a 
get_unique_id() call to the metastore that returns the current value from an 
incrementing JDO Sequence so that clients can avoid some races without locks.




[jira] Updated: (HIVE-789) Add get_unique_id() call to metastore

2009-08-24 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HIVE-789:
-

Attachment: hive-789.patch

This patch implements get_unique_id.

One concern: do I need to do any kind of migration script for metastore 
databases? I don't know how metastore schema migrations are done from release 
to release.

 Add get_unique_id() call to metastore
 -

 Key: HIVE-789
 URL: https://issues.apache.org/jira/browse/HIVE-789
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Todd Lipcon
 Fix For: 0.4.0

 Attachments: hive-789.patch


 As noted in HIVE-718, it can be tough to avoid race conditions when multiple 
 clients are trying to move files into the same directory. This patch adds a 
 get_unique_id() call to the metastore that returns the current value from an 
 incrementing JDO Sequence so that clients can avoid some races without locks.




[jira] Commented: (HIVE-789) Add get_unique_id() call to metastore

2009-08-24 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12747131#action_12747131
 ] 

Todd Lipcon commented on HIVE-789:
--

Forgot to mention - this patch is against branch-0.4.0, though it shouldn't be 
hard to apply the same thing to trunk. Assuming this seems like a reasonable 
way to fix the races in HIVE-718 I'll port it to trunk.

 Add get_unique_id() call to metastore
 -

 Key: HIVE-789
 URL: https://issues.apache.org/jira/browse/HIVE-789
 Project: Hadoop Hive
  Issue Type: Improvement
  Components: Metastore
Reporter: Todd Lipcon
 Fix For: 0.4.0

 Attachments: hive-789.patch


 As noted in HIVE-718, it can be tough to avoid race conditions when multiple 
 clients are trying to move files into the same directory. This patch adds a 
 get_unique_id() call to the metastore that returns the current value from an 
 incrementing JDO Sequence so that clients can avoid some races without locks.




[jira] Commented: (HIVE-792) support add archive in addition to add files and add jars

2009-08-24 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12747218#action_12747218
 ] 

Todd Lipcon commented on HIVE-792:
--

Note that a lot of the time you want to add a jar to the classpath without 
unpacking it. In that case you want addFileToClassPath. Unpacking is 
reasonably slow, as is the recursive delete when you're done.

 support add archive in addition to add files and add jars
 ---

 Key: HIVE-792
 URL: https://issues.apache.org/jira/browse/HIVE-792
 Project: Hadoop Hive
  Issue Type: New Feature
Reporter: Zheng Shao

 In JobClient.java, we have:
 {code}
 if (commandConf != null) {
   files = commandConf.get("tmpfiles");
   libjars = commandConf.get("tmpjars");
   archives = commandConf.get("tmparchives");
 }
 {code}
 The good thing about "tmparchives" is that the TT will automatically unarchive 
 the files (because "tmparchives" goes through DistributedCache.addCacheArchive, 
 while the TT won't do that for "tmpfiles").
 We should have "add archive" which sets "tmparchives".




[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-08-20 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12745554#action_12745554
 ] 

Todd Lipcon commented on HIVE-718:
--

bq. my concern in this case is that, it is possible to corrupt the existing 
partition with only a part of new files and overwrite some of the old files and 
user has no way of knowing that such a thing has happened and it may not 
possible to recover the data.

Can you explain the order of events that causes this? I think even with the 
current patch the operation will not fail silently and should not cause 
unrecoverable loss.

 Load data inpath into a new partition without overwrite does not move the file
 --

 Key: HIVE-718
 URL: https://issues.apache.org/jira/browse/HIVE-718
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Zheng Shao
 Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt


 The bug can be reproduced as following. Note that it only happens for 
 partitioned tables. The select after the first load returns nothing, while 
 the second returns the data correctly.
 insert.txt in the current local directory contains 3 lines: a, b and c.
 {code}
  create table tmp_insert_test (value string) stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test;
  select * from tmp_insert_test;
 a
 b
 c
  create table tmp_insert_test_p ( value string) partitioned by (ds string) 
  stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
 a   2009-08-01
 b   2009-08-01
 d   2009-08-01
 {code}




[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-08-17 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12744143#action_12744143
 ] 

Todd Lipcon commented on HIVE-718:
--

Prasad: this is true, but the atomicity issue was already there - we just 
discovered it while looking at this section of the code. My guess is that there 
are similar bugs elsewhere in Hive - there's no safe way to have multiple 
threads compete for a directory name, so really any movement of files has to be 
coordinated by the metastore to be safe. The hadoop-side issues are:

- mkdirs on an existing directory will return the exact same thing as mkdirs on 
one that didn't previously exist
- rename of one directory to a new name will not give an error if the new name 
exists - rather, it will move your directory inside that one
- checking if (not exists) { do something to a location } is obviously racy
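The third bullet can be made concrete with a small stand-in sketch (java.nio here rather than the Hadoop FileSystem API, as an illustration only): the exists-check pattern leaves a window for another client, while a single atomic primitive guarantees exactly one winner.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CheckThenActSketch {
    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("race");
        Path target = dir.resolve("partition-00");

        // Racy: a competing client may create target between the check
        // and the act, and neither side will ever notice.
        if (!Files.exists(target)) {
            Files.createDirectory(target);
        }

        // Safer: createDirectory is atomic, so exactly one competing
        // client succeeds and every loser gets an exception it can act on.
        boolean won;
        try {
            Files.createDirectory(target);
            won = true;
        } catch (FileAlreadyExistsException e) {
            won = false;  // target was created above, so we lose here
        }
        System.out.println(won);  // false
    }
}
```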

 Load data inpath into a new partition without overwrite does not move the file
 --

 Key: HIVE-718
 URL: https://issues.apache.org/jira/browse/HIVE-718
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Zheng Shao
 Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt


 The bug can be reproduced as following. Note that it only happens for 
 partitioned tables. The select after the first load returns nothing, while 
 the second returns the data correctly.
 insert.txt in the current local directory contains 3 lines: a, b and c.
 {code}
  create table tmp_insert_test (value string) stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test;
  select * from tmp_insert_test;
 a
 b
 c
  create table tmp_insert_test_p ( value string) partitioned by (ds string) 
  stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
 a   2009-08-01
 b   2009-08-01
 d   2009-08-01
 {code}




[jira] Updated: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-08-10 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HIVE-718:
-

Affects Version/s: 0.4.0
   Status: Patch Available  (was: Open)

I think we should go ahead with my patch for now and then open another JIRA for 
fixing the atomicity issue.

 Load data inpath into a new partition without overwrite does not move the file
 --

 Key: HIVE-718
 URL: https://issues.apache.org/jira/browse/HIVE-718
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Zheng Shao
 Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt


 The bug can be reproduced as following. Note that it only happens for 
 partitioned tables. The select after the first load returns nothing, while 
 the second returns the data correctly.
 insert.txt in the current local directory contains 3 lines: a, b and c.
 {code}
  create table tmp_insert_test (value string) stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test;
  select * from tmp_insert_test;
 a
 b
 c
  create table tmp_insert_test_p ( value string) partitioned by (ds string) 
  stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
 a   2009-08-01
 b   2009-08-01
 d   2009-08-01
 {code}




[jira] Commented: (HIVE-726) Make ant package -Dhadoop.version=0.17.0 work

2009-08-05 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12739765#action_12739765
 ] 

Todd Lipcon commented on HIVE-726:
--

I think there is some confusion about how the shims work.

The point of the new shims package as done in HIVE-487 is that, on every build, 
the shims are built for all major versions of Hadoop. The versions in the 
ivy.xml there were picked as the most recent versions, but as Ashish said, we 
could use the .0 releases instead - it's somewhat arbitrary. After the shims 
have been built with each of the hadoop versions, it makes a hive_shims.jar 
which includes *all* of the implementations.

At this point, when Hive is built, it can be built against any version of 
Hadoop using the -Dhadoop.version flag. In actuality this should not matter - 
it may be that there is still some work yet to be done, but in my testing I 
built with the default hadoop.version (0.19.0) and then ran the build products 
against 18 and 20 with no recompile. The ShimLoader class determines the 
current hadoop version at runtime and loads the correct implementation class 
out of hive_shims.jar.

So, -1 on the patch, since it would result in a hive_shims.jar that only 
includes one version of the shims, and thus the build product would only work 
on that version of Hadoop.
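The runtime dispatch described above boils down to a version-keyed lookup. A simplified stand-in (the class names and version parsing below are illustrative, not Hive's actual code):

```java
import java.util.HashMap;
import java.util.Map;

public class ShimLoaderSketch {
    // One jar bundles the shim implementation for every supported major
    // version; this map picks the right class name at runtime.
    private static final Map<String, String> SHIM_CLASSES = new HashMap<>();
    static {
        SHIM_CLASSES.put("0.19", "org.apache.hadoop.hive.shims.Hadoop19Shims");
        SHIM_CLASSES.put("0.20", "org.apache.hadoop.hive.shims.Hadoop20Shims");
    }

    // Reduce a full version string like "0.20.1" to its major-version key.
    static String majorVersion(String version) {
        String[] parts = version.split("\\.");
        return parts[0] + "." + parts[1];
    }

    static String shimClassFor(String runtimeVersion) {
        String cls = SHIM_CLASSES.get(majorVersion(runtimeVersion));
        if (cls == null) {
            throw new IllegalArgumentException(
                "Unsupported Hadoop version: " + runtimeVersion);
        }
        return cls;  // a real loader would Class.forName(cls) and instantiate
    }

    public static void main(String[] args) {
        System.out.println(shimClassFor("0.20.1"));
    }
}
```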

 Make ant package -Dhadoop.version=0.17.0 work
 ---

 Key: HIVE-726
 URL: https://issues.apache.org/jira/browse/HIVE-726
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Zheng Shao
 Attachments: HIVE-726.1.patch, HIVE-726.2.patch


 After HIVE-487, users will have to specify the versions as in shims/ivy.xml 
 to make "ant package -Dhadoop.version=<version>" work.
 Currently it is only running fine with the following versions (from 
 shims/ivy.xml): 0.17.2.1, 0.18.3, 0.19.0, 0.20.0.
 We used to do "ant package -Dhadoop.version=0.17.0" but it's not working any 
 more, although we can specify "ant package -Dhadoop.version=0.17.2.1" and the 
 package will probably still work with 0.17.0.




[jira] Updated: (HIVE-726) Make ant package -Dhadoop.version=0.17.0 work

2009-08-05 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HIVE-726:
-

Attachment: hive-726.txt

I think there was still an issue which you've uncovered. ant's semantics for 
the param element inside antcall are somewhat screwy - you can't override 
something that has been passed on the command line. Since you guys were passing 
hadoop.version on the command line, it was actually compiling the shims for 17 
four times rather than compiling each version, as best I can tell.

Attaching a patch which gets around this by introducing a new ant property 
called hadoop.version.ant-internal. This property is set in build.properties 
to default to hadoop.version, and hadoop.version is left as is. Everywhere in 
the build files that used to reference hadoop.version now references 
hadoop.version.ant-internal. Since we're not specifying this on the command 
line, the antcalls inside shims/build.xml can properly override it.
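Under that scheme the indirection would look roughly like this (the target names and version values are illustrative; only hadoop.version.ant-internal comes from the description above):

```xml
<!-- shims/build.xml (sketch): ant cannot override a property supplied with
     -D on the command line, so each antcall overrides an internal property
     that merely *defaults* to ${hadoop.version} in build.properties. -->
<target name="build_shims">
  <antcall target="compile_shim">
    <param name="hadoop.version.ant-internal" value="0.17.2.1"/>
  </antcall>
  <antcall target="compile_shim">
    <param name="hadoop.version.ant-internal" value="0.20.0"/>
  </antcall>
</target>
```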

One question: I think the condition in build.xml's eclipsefiles target that 
defaults to 0.19 is now dead code, since the property itself is defaulted to 
0.19.0.

Also, can we change ${final.name} to simply ${name}-${version} since one build 
can work with any hadoop?

 Make ant package -Dhadoop.version=0.17.0 work
 ---

 Key: HIVE-726
 URL: https://issues.apache.org/jira/browse/HIVE-726
 Project: Hadoop Hive
  Issue Type: Bug
  Components: Build Infrastructure
Reporter: Zheng Shao
 Attachments: HIVE-726.1.patch, HIVE-726.2.patch, HIVE-726.3.patch, 
 hive-726.txt


 After HIVE-487, users will have to specify the versions as in shims/ivy.xml 
 to make ant package -Dhadoop.version=version.work.
 Currently it is only running fine with the following versions (from 
 shims/ivy.xml): 0.17.2.1, 0.18.3, 0.19.0, 0.20.0.
 We used to do ant package -Dhadoop.version=0.17.0 but it's not working any 
 more, although we can specify ant package -Dhadoop.version=0.17.2.1 and the 
 package will probably still work with 0.17.0.




[jira] Commented: (HIVE-487) Hive does not compile with Hadoop 0.20.0

2009-08-04 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12738855#action_12738855
 ] 

Todd Lipcon commented on HIVE-487:
--

Normal patch -p0 ought to work:

t...@todd-laptop:~/cloudera/cdh/repos/hive$ patch -p0 < /tmp/hive-487-runtime.patch
patching file ant/build.xml
patching file bin/ext/cli.sh
patching file build-common.xml
patching file build.xml
etc...

(from a clean trunk checkout)

 Hive does not compile with Hadoop 0.20.0
 

 Key: HIVE-487
 URL: https://issues.apache.org/jira/browse/HIVE-487
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Aaron Kimball
Assignee: Justin Lynn
Priority: Blocker
 Fix For: 0.4.0

 Attachments: dynamic-proxy.tar.gz, HIVE-487-2.patch, 
 hive-487-jetty-2.diff, hive-487-jetty.patch, hive-487-runtime.patch, 
 hive-487-with-cli-changes.2.patch, hive-487-with-cli-changes.3.patch, 
 hive-487-with-cli-changes.patch, hive-487.3.patch, hive-487.4.patch, 
 HIVE-487.patch, hive-487.txt, hive-487.txt, jetty-patch.patch, 
 junit-patch1.html, patch-487.txt


 Attempting to compile Hive with Hadoop 0.20.0 fails:
 aa...@jargon:~/src/ext/svn/hive-0.3.0$ ant -Dhadoop.version=0.20.0 package
 (several lines elided)
 compile:
  [echo] Compiling: hive
 [javac] Compiling 261 source files to 
 /home/aaron/src/ext/svn/hive-0.3.0/build/ql/classes
 [javac] 
 /home/aaron/src/ext/svn/hive-0.3.0/build/ql/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java:94:
  cannot find symbol
 [javac] symbol  : method getCommandLineConfig()
 [javac] location: class org.apache.hadoop.mapred.JobClient
 [javac]   Configuration commandConf = 
 JobClient.getCommandLineConfig();
 [javac]^
 [javac] 
 /home/aaron/src/ext/svn/hive-0.3.0/build/ql/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java:241:
  cannot find symbol
 [javac] symbol  : method validateInput(org.apache.hadoop.mapred.JobConf)
 [javac] location: interface org.apache.hadoop.mapred.InputFormat
 [javac]   inputFormat.validateInput(newjob);
 [javac]  ^
 [javac] Note: Some input files use or override a deprecated API.
 [javac] Note: Recompile with -Xlint:deprecation for details.
 [javac] Note: Some input files use unchecked or unsafe operations.
 [javac] Note: Recompile with -Xlint:unchecked for details.
 [javac] 2 errors
 BUILD FAILED
 /home/aaron/src/ext/svn/hive-0.3.0/build.xml:145: The following error 
 occurred while executing this line:
 /home/aaron/src/ext/svn/hive-0.3.0/ql/build.xml:135: Compile failed; see the 
 compiler error output for details.




[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-08-04 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12739075#action_12739075
 ] 

Todd Lipcon commented on HIVE-718:
--

I can confirm this bug on trunk - it's a regression since 0.3.0.

However, I'm not sure about the patch - if one of the later renames fails, we 
should undo the previous ones, but in this patch it looks like it's actually 
deleting the previous ones. Why not attempt to move it back to its original 
location?

Also, it seems worth it to add some LOG.error() statements in these cases.

 Load data inpath into a new partition without overwrite does not move the file
 --

 Key: HIVE-718
 URL: https://issues.apache.org/jira/browse/HIVE-718
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Zheng Shao
 Attachments: HIVE-718.1.patch, HIVE-718.2.patch


 The bug can be reproduced as following. Note that it only happens for 
 partitioned tables. The select after the first load returns nothing, while 
 the second returns the data correctly.
 insert.txt in the current local directory contains 3 lines: a, b and c.
 {code}
  create table tmp_insert_test (value string) stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test;
  select * from tmp_insert_test;
 a
 b
 c
  create table tmp_insert_test_p ( value string) partitioned by (ds string) 
  stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
 a   2009-08-01
 b   2009-08-01
 d   2009-08-01
 {code}




[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-08-04 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12739100#action_12739100
 ] 

Todd Lipcon commented on HIVE-718:
--

What if the partition already exists? In that case we couldn't rename the 
staging directory since the destination name would already exist, right? Or can 
we just make another subdirectory of the partition with some unique name?

 Load data inpath into a new partition without overwrite does not move the file
 --

 Key: HIVE-718
 URL: https://issues.apache.org/jira/browse/HIVE-718
 Project: Hadoop Hive
  Issue Type: Bug
Reporter: Zheng Shao
 Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt


 The bug can be reproduced as following. Note that it only happens for 
 partitioned tables. The select after the first load returns nothing, while 
 the second returns the data correctly.
 insert.txt in the current local directory contains 3 lines: a, b and c.
 {code}
  create table tmp_insert_test (value string) stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test;
  select * from tmp_insert_test;
 a
 b
 c
  create table tmp_insert_test_p ( value string) partitioned by (ds string) 
  stored as textfile;
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
  load data local inpath 'insert.txt' into table tmp_insert_test_p partition 
  (ds = '2009-08-01');
  select * from tmp_insert_test_p where ds= '2009-08-01';
 a   2009-08-01
 b   2009-08-01
 c   2009-08-01
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-08-04 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12739109#action_12739109
 ] 

Todd Lipcon commented on HIVE-718:
--

OK. How do we come up with a unique name? Just use timestamp?

Does the metastore need to know this name somehow? Or is the change just 
localized to that method? (Also, should we rename that method? It's called 
copyFiles, but does nothing of the sort)


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-08-04 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12739146#action_12739146
 ] 

Todd Lipcon commented on HIVE-718:
--

In looking through this code, I've found a few more issues:

- In isolation, it looks like copyFiles/replaceFiles are supposed to be able to 
handle a srcf like /foo/* with a directory layout like:

/foo/subdir1/part-0
/foo/subdir2/part-0

I'm assuming this because it first does fs.globStatus on srcf, and then for 
each of the results of the glob, it calls fs.listStatus (implying that they are 
directories).

However, given the example above, this would actually fail, since both files 
are named part-0 and the code would attempt to rename both to 
tmpdir/part-0.

- In fact, using the tmpdir like this looks consistent to an outside observer, 
but it is not atomic. If the renamer crashes in the middle of the operation, 
the files will have been moved out of their original location and into the 
tmpdir, but the tmpdir has not been renamed into the destination. Is this OK? 
I feel like the solution would be to make dstdir/_staging_timestamp, move the 
files one-by-one into there, and then rename _staging_timestamp to the 
destination. This way, if there is a failure in the middle, the client can at 
least determine where their files went without looking through a temporary 
directory.
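That staging-then-rename idea might look like the following. This is a hypothetical sketch (java.nio standing in for Hadoop's FileSystem; all names are invented): the single rename of the staging directory is the step that makes the files visible at the destination.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

// Sketch: move files one by one into a _staging_* sibling of the
// destination, then commit the whole load with one directory rename.
public class StagedLoad {
    public static void loadViaStaging(List<Path> srcs, Path dst) throws IOException {
        Path staging = dst.resolveSibling("_staging_" + System.nanoTime());
        Files.createDirectories(staging);
        for (Path src : srcs) {
            // A crash here leaves files in an obvious _staging_* directory
            // next to the destination, not scattered in a hidden tmpdir.
            Files.move(src, staging.resolve(src.getFileName()));
        }
        Files.move(staging, dst); // the commit point
    }
}
```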


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file

2009-08-04 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12739170#action_12739170
 ] 

Todd Lipcon commented on HIVE-718:
--

bq. I think it's not acceptable for a failed insert to corrupt the original 
data of the table. 

Then we definitely have to move an entire directory of files in at once - 
otherwise an insert can partially succeed.

bq. We never have a table with sub directories (instead of files) inside. We 
will need some testing to make sure it actually works.

This is going to be a necessity to do non-overwrite loads into a 
table/partition, right?

bq. For unique name, maybe we can just prepend the job id.

This isn't always available (e.g. running LOAD DATA from the CLI). I think we're 
stuck with java.util.UUID, as ugly as it may be.

I've spent the last hour or so trying to figure out any other way of generating 
a unique name inside a subdirectory. Because of the semantics of 
FileSystem.mkdirs and FileSystem.rename, I don't believe there's any way of 
doing this. mkdirs doesn't return false in the case that the directory already 
exists, and if you rename(src, dst), and dst already exists as a directory, it 
will move src *inside* of dst.
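Given those mkdirs/rename semantics, the UUID approach reduces to something like the following illustrative sketch (not the patch itself): a client-generated UUID name avoids needing the filesystem to detect collisions at all.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.UUID;

// Sketch: since mkdirs() won't report "already existed" and rename()
// into an existing directory nests src inside it, generate a unique
// subdirectory name on the client side instead.
public class UniqueSubdir {
    public static Path createUnique(Path parent) throws IOException {
        Path p = parent.resolve(UUID.randomUUID().toString());
        Files.createDirectories(p);
        return p;
    }
}
```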


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-487) Hive does not compile with Hadoop 0.20.0

2009-08-03 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HIVE-487:
-

Attachment: hive-487-with-cli-changes.3.patch

 Hive does not compile with Hadoop 0.20.0
 

 Key: HIVE-487
 URL: https://issues.apache.org/jira/browse/HIVE-487
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Aaron Kimball
Assignee: Justin Lynn
Priority: Blocker
 Fix For: 0.4.0

 Attachments: dynamic-proxy.tar.gz, HIVE-487-2.patch, 
 hive-487-jetty-2.diff, hive-487-jetty.patch, 
 hive-487-with-cli-changes.2.patch, hive-487-with-cli-changes.3.patch, 
 hive-487-with-cli-changes.patch, hive-487.3.patch, hive-487.4.patch, 
 HIVE-487.patch, hive-487.txt, hive-487.txt, jetty-patch.patch, 
 junit-patch1.html, patch-487.txt


 Attempting to compile Hive with Hadoop 0.20.0 fails:
 aa...@jargon:~/src/ext/svn/hive-0.3.0$ ant -Dhadoop.version=0.20.0 package
 (several lines elided)
 compile:
  [echo] Compiling: hive
 [javac] Compiling 261 source files to 
 /home/aaron/src/ext/svn/hive-0.3.0/build/ql/classes
 [javac] 
 /home/aaron/src/ext/svn/hive-0.3.0/build/ql/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java:94:
  cannot find symbol
 [javac] symbol  : method getCommandLineConfig()
 [javac] location: class org.apache.hadoop.mapred.JobClient
 [javac]   Configuration commandConf = 
 JobClient.getCommandLineConfig();
 [javac]^
 [javac] 
 /home/aaron/src/ext/svn/hive-0.3.0/build/ql/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java:241:
  cannot find symbol
 [javac] symbol  : method validateInput(org.apache.hadoop.mapred.JobConf)
 [javac] location: interface org.apache.hadoop.mapred.InputFormat
 [javac]   inputFormat.validateInput(newjob);
 [javac]  ^
 [javac] Note: Some input files use or override a deprecated API.
 [javac] Note: Recompile with -Xlint:deprecation for details.
 [javac] Note: Some input files use unchecked or unsafe operations.
 [javac] Note: Recompile with -Xlint:unchecked for details.
 [javac] 2 errors
 BUILD FAILED
 /home/aaron/src/ext/svn/hive-0.3.0/build.xml:145: The following error 
 occurred while executing this line:
 /home/aaron/src/ext/svn/hive-0.3.0/ql/build.xml:135: Compile failed; see the 
 compiler error output for details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-487) Hive does not compile with Hadoop 0.20.0

2009-08-03 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12738596#action_12738596
 ] 

Todd Lipcon commented on HIVE-487:
--

The issue turned out to be that the shim classes weren't getting built into 
hive_exec.jar, which seems to include the built classes of many of the other 
components. I'm not entirely sure why it's designed this way (why not just 
have hive_exec.jar add the other jars to its own classloader at startup?), but 
including build/shims/classes in there fixed the tests. Attaching a new patch 
momentarily.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-487) Hive does not compile with Hadoop 0.20.0

2009-07-30 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12737236#action_12737236
 ] 

Todd Lipcon commented on HIVE-487:
--

Hi Ashish,

That does sound reasonable, though I will likely take it on in the short term, 
as we will be distributing packages for hadoop-0.18 and hadoop-0.20 until the 
majority of the community and our customers have transitioned over. During that 
time period we'd like to have a single hive package which will function with 
either. We can apply my work on top of the 0.4.0 release for our distribution, 
so it shouldn't block it, but I do think it would be nice if this feature were 
upstream in the Apache release.

I've got some time blocked off to work on this - if I get something working 
this week do you think it might be able to go into 0.4.0?

-Todd

 Hive does not compile with Hadoop 0.20.0
 

 Key: HIVE-487
 URL: https://issues.apache.org/jira/browse/HIVE-487
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Aaron Kimball
Assignee: Justin Lynn
Priority: Blocker
 Fix For: 0.4.0

 Attachments: dynamic-proxy.tar.gz, HIVE-487-2.patch, 
 hive-487-jetty-2.diff, hive-487-jetty.patch, hive-487.3.patch, 
 hive-487.4.patch, HIVE-487.patch, hive-487.txt, hive-487.txt, 
 jetty-patch.patch, junit-patch1.html, patch-487.txt


 Attempting to compile Hive with Hadoop 0.20.0 fails:
 aa...@jargon:~/src/ext/svn/hive-0.3.0$ ant -Dhadoop.version=0.20.0 package
 (several lines elided)
 compile:
  [echo] Compiling: hive
 [javac] Compiling 261 source files to 
 /home/aaron/src/ext/svn/hive-0.3.0/build/ql/classes
 [javac] 
 /home/aaron/src/ext/svn/hive-0.3.0/build/ql/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java:94:
  cannot find symbol
 [javac] symbol  : method getCommandLineConfig()
 [javac] location: class org.apache.hadoop.mapred.JobClient
 [javac]   Configuration commandConf = 
 JobClient.getCommandLineConfig();
 [javac]^
 [javac] 
 /home/aaron/src/ext/svn/hive-0.3.0/build/ql/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java:241:
  cannot find symbol
 [javac] symbol  : method validateInput(org.apache.hadoop.mapred.JobConf)
 [javac] location: interface org.apache.hadoop.mapred.InputFormat
 [javac]   inputFormat.validateInput(newjob);
 [javac]  ^
 [javac] Note: Some input files use or override a deprecated API.
 [javac] Note: Recompile with -Xlint:deprecation for details.
 [javac] Note: Some input files use unchecked or unsafe operations.
 [javac] Note: Recompile with -Xlint:unchecked for details.
 [javac] 2 errors
 BUILD FAILED
 /home/aaron/src/ext/svn/hive-0.3.0/build.xml:145: The following error 
 occurred while executing this line:
 /home/aaron/src/ext/svn/hive-0.3.0/ql/build.xml:135: Compile failed; see the 
 compiler error output for details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-487) Hive does not compile with Hadoop 0.20.0

2009-07-27 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12735789#action_12735789
 ] 

Todd Lipcon commented on HIVE-487:
--

Whoops, sorry about that. Simply add an import for o.a.h.mapred.JobClient and it 
compiles. New patch in a second.

 Hive does not compile with Hadoop 0.20.0
 

 Key: HIVE-487
 URL: https://issues.apache.org/jira/browse/HIVE-487
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Aaron Kimball
Assignee: Justin Lynn
Priority: Blocker
 Fix For: 0.4.0

 Attachments: dynamic-proxy.tar.gz, HIVE-487-2.patch, 
 hive-487-jetty-2.diff, hive-487-jetty.patch, hive-487.3.patch, 
 hive-487.4.patch, HIVE-487.patch, hive-487.txt, jetty-patch.patch, 
 junit-patch1.html


 Attempting to compile Hive with Hadoop 0.20.0 fails:
 aa...@jargon:~/src/ext/svn/hive-0.3.0$ ant -Dhadoop.version=0.20.0 package
 (several lines elided)
 compile:
  [echo] Compiling: hive
 [javac] Compiling 261 source files to 
 /home/aaron/src/ext/svn/hive-0.3.0/build/ql/classes
 [javac] 
 /home/aaron/src/ext/svn/hive-0.3.0/build/ql/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java:94:
  cannot find symbol
 [javac] symbol  : method getCommandLineConfig()
 [javac] location: class org.apache.hadoop.mapred.JobClient
 [javac]   Configuration commandConf = 
 JobClient.getCommandLineConfig();
 [javac]^
 [javac] 
 /home/aaron/src/ext/svn/hive-0.3.0/build/ql/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java:241:
  cannot find symbol
 [javac] symbol  : method validateInput(org.apache.hadoop.mapred.JobConf)
 [javac] location: interface org.apache.hadoop.mapred.InputFormat
 [javac]   inputFormat.validateInput(newjob);
 [javac]  ^
 [javac] Note: Some input files use or override a deprecated API.
 [javac] Note: Recompile with -Xlint:deprecation for details.
 [javac] Note: Some input files use unchecked or unsafe operations.
 [javac] Note: Recompile with -Xlint:unchecked for details.
 [javac] 2 errors
 BUILD FAILED
 /home/aaron/src/ext/svn/hive-0.3.0/build.xml:145: The following error 
 occurred while executing this line:
 /home/aaron/src/ext/svn/hive-0.3.0/ql/build.xml:135: Compile failed; see the 
 compiler error output for details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-487) Hive does not compile with Hadoop 0.20.0

2009-07-27 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HIVE-487:
-

Attachment: hive-487.txt

Fixes the missing import. Now compiles with hadoop.version=0.17.0

 Hive does not compile with Hadoop 0.20.0
 

 Key: HIVE-487
 URL: https://issues.apache.org/jira/browse/HIVE-487
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Aaron Kimball
Assignee: Justin Lynn
Priority: Blocker
 Fix For: 0.4.0

 Attachments: dynamic-proxy.tar.gz, HIVE-487-2.patch, 
 hive-487-jetty-2.diff, hive-487-jetty.patch, hive-487.3.patch, 
 hive-487.4.patch, HIVE-487.patch, hive-487.txt, hive-487.txt, 
 jetty-patch.patch, junit-patch1.html


 Attempting to compile Hive with Hadoop 0.20.0 fails:
 aa...@jargon:~/src/ext/svn/hive-0.3.0$ ant -Dhadoop.version=0.20.0 package
 (several lines elided)
 compile:
  [echo] Compiling: hive
 [javac] Compiling 261 source files to 
 /home/aaron/src/ext/svn/hive-0.3.0/build/ql/classes
 [javac] 
 /home/aaron/src/ext/svn/hive-0.3.0/build/ql/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java:94:
  cannot find symbol
 [javac] symbol  : method getCommandLineConfig()
 [javac] location: class org.apache.hadoop.mapred.JobClient
 [javac]   Configuration commandConf = 
 JobClient.getCommandLineConfig();
 [javac]^
 [javac] 
 /home/aaron/src/ext/svn/hive-0.3.0/build/ql/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java:241:
  cannot find symbol
 [javac] symbol  : method validateInput(org.apache.hadoop.mapred.JobConf)
 [javac] location: interface org.apache.hadoop.mapred.InputFormat
 [javac]   inputFormat.validateInput(newjob);
 [javac]  ^
 [javac] Note: Some input files use or override a deprecated API.
 [javac] Note: Recompile with -Xlint:deprecation for details.
 [javac] Note: Some input files use unchecked or unsafe operations.
 [javac] Note: Recompile with -Xlint:unchecked for details.
 [javac] 2 errors
 BUILD FAILED
 /home/aaron/src/ext/svn/hive-0.3.0/build.xml:145: The following error 
 occurred while executing this line:
 /home/aaron/src/ext/svn/hive-0.3.0/ql/build.xml:135: Compile failed; see the 
 compiler error output for details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-487) Hive does not compile with Hadoop 0.20.0

2009-07-24 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HIVE-487:
-

Attachment: hive-487.txt

Here's a patch which adds a project called shims with separate source 
directories for 0.17, 0.18, 0.19, and 0.20. Inside each there is an 
implementation of JettyShims and HadoopShims which encapsulate all of the 
version-dependent code. The build.xml is set in such a way that 
${hadoop.version} determines which one gets compiled.

This probably needs a bit more javadoc before it's committable, but I think it's 
worth considering this approach over reflection.

Also, it seems like hadoop.version may be 0.18.0, 0.18.1, 0.18.2, etc. As long 
as it's kosher by Apache SVN standards, we should put a symlink for each of 
those versions in the shims/src/ directory pointing to 0.18, and same for the 
other minor releases. If symlinks aren't kosher, we need some way of parsing 
out the major version from within ant.

Not being a regular contributor, I don't have a good test environment set up, 
but I've verified that this at least builds in all of the above versions.
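If symlinks turn out not to be kosher, the major-version extraction could also be done in code rather than ant. A minimal hypothetical helper (the class name and regex are assumptions for illustration):

```java
// Sketch: map a full release string like "0.18.1" to its major line
// "0.18", so every minor release shares one shim source directory.
public class ShimVersion {
    public static String majorVersion(String hadoopVersion) {
        return hadoopVersion.replaceAll("^(\\d+\\.\\d+).*$", "$1");
    }
}
```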

 Hive does not compile with Hadoop 0.20.0
 

 Key: HIVE-487
 URL: https://issues.apache.org/jira/browse/HIVE-487
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Aaron Kimball
Assignee: Justin Lynn
 Fix For: 0.4.0

 Attachments: dynamic-proxy.tar.gz, HIVE-487-2.patch, 
 hive-487-jetty-2.diff, hive-487-jetty.patch, hive-487.3.patch, 
 hive-487.4.patch, HIVE-487.patch, hive-487.txt, jetty-patch.patch, 
 junit-patch1.html


 Attempting to compile Hive with Hadoop 0.20.0 fails:
 aa...@jargon:~/src/ext/svn/hive-0.3.0$ ant -Dhadoop.version=0.20.0 package
 (several lines elided)
 compile:
  [echo] Compiling: hive
 [javac] Compiling 261 source files to 
 /home/aaron/src/ext/svn/hive-0.3.0/build/ql/classes
 [javac] 
 /home/aaron/src/ext/svn/hive-0.3.0/build/ql/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java:94:
  cannot find symbol
 [javac] symbol  : method getCommandLineConfig()
 [javac] location: class org.apache.hadoop.mapred.JobClient
 [javac]   Configuration commandConf = 
 JobClient.getCommandLineConfig();
 [javac]^
 [javac] 
 /home/aaron/src/ext/svn/hive-0.3.0/build/ql/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java:241:
  cannot find symbol
 [javac] symbol  : method validateInput(org.apache.hadoop.mapred.JobConf)
 [javac] location: interface org.apache.hadoop.mapred.InputFormat
 [javac]   inputFormat.validateInput(newjob);
 [javac]  ^
 [javac] Note: Some input files use or override a deprecated API.
 [javac] Note: Recompile with -Xlint:deprecation for details.
 [javac] Note: Some input files use unchecked or unsafe operations.
 [javac] Note: Recompile with -Xlint:unchecked for details.
 [javac] 2 errors
 BUILD FAILED
 /home/aaron/src/ext/svn/hive-0.3.0/build.xml:145: The following error 
 occurred while executing this line:
 /home/aaron/src/ext/svn/hive-0.3.0/ql/build.xml:135: Compile failed; see the 
 compiler error output for details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-487) Hive does not compile with Hadoop 0.20.0

2009-07-23 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12734749#action_12734749
 ] 

Todd Lipcon commented on HIVE-487:
--

A couple thoughts:

- Does the same compiled jar truly work in all versions of Hadoop between 0.17 
and 0.19? That is to say, could we use build.xml rules that, depending on the 
value of a hadoop.version variable, swap between two implementations of the 
same .java file (one compatible with Jetty 5, one with Jetty 6)? Then the build 
product could simply include two jars, and the wrapper scripts would swap 
between them based on version. If size is a concern, the variant classes could 
be put in their own jar that would only be a few KB.

- The reflection code in this patch is pretty messy. I mocked up an idea for a 
slightly cleaner way to do it, and will attach it as a tarball momentarily. The 
idea is to define our own interfaces which have the same methods as we need to 
use in Jetty, and use a dynamic proxy to forward those invocations through to 
the actual implementation class. Choosing between the two implementations at 
runtime is straightforward: just check that the method signatures correspond. 
This is still dirty (and a bad role model for CS students ;-) ), but it should 
reduce the number of Class.forName and .getMethod calls in the wrapper class.
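A tiny self-contained illustration of the dynamic-proxy technique (the interface and classes here are invented for the example, not taken from the attached tarball):

```java
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Sketch: define our own interface with the signatures we need, then
// forward each call reflectively to whatever concrete class is
// actually on the classpath.
public class ProxyDemo {
    public interface Greeter {
        String greet(String name);
    }

    // Stands in for a class that exists only in one library version.
    public static class GreeterV1 {
        public String greet(String name) { return "hello " + name; }
    }

    @SuppressWarnings("unchecked")
    public static <T> T wrap(Class<T> iface, Object target) {
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface },
                (proxy, method, args) -> {
                    // Look up the same-signature method on the real class
                    // and invoke it; this is the one reflective step.
                    Method m = target.getClass()
                            .getMethod(method.getName(), method.getParameterTypes());
                    return m.invoke(target, args);
                });
    }
}
```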

 Hive does not compile with Hadoop 0.20.0
 

 Key: HIVE-487
 URL: https://issues.apache.org/jira/browse/HIVE-487
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Aaron Kimball
Assignee: Justin Lynn
 Fix For: 0.4.0

 Attachments: HIVE-487-2.patch, hive-487-jetty-2.diff, 
 hive-487-jetty.patch, hive-487.3.patch, hive-487.4.patch, HIVE-487.patch, 
 jetty-patch.patch, junit-patch1.html


 Attempting to compile Hive with Hadoop 0.20.0 fails:
 aa...@jargon:~/src/ext/svn/hive-0.3.0$ ant -Dhadoop.version=0.20.0 package
 (several lines elided)
 compile:
  [echo] Compiling: hive
 [javac] Compiling 261 source files to 
 /home/aaron/src/ext/svn/hive-0.3.0/build/ql/classes
 [javac] 
 /home/aaron/src/ext/svn/hive-0.3.0/build/ql/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java:94:
  cannot find symbol
 [javac] symbol  : method getCommandLineConfig()
 [javac] location: class org.apache.hadoop.mapred.JobClient
 [javac]   Configuration commandConf = 
 JobClient.getCommandLineConfig();
 [javac]^
 [javac] 
 /home/aaron/src/ext/svn/hive-0.3.0/build/ql/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java:241:
  cannot find symbol
 [javac] symbol  : method validateInput(org.apache.hadoop.mapred.JobConf)
 [javac] location: interface org.apache.hadoop.mapred.InputFormat
 [javac]   inputFormat.validateInput(newjob);
 [javac]  ^
 [javac] Note: Some input files use or override a deprecated API.
 [javac] Note: Recompile with -Xlint:deprecation for details.
 [javac] Note: Some input files use unchecked or unsafe operations.
 [javac] Note: Recompile with -Xlint:unchecked for details.
 [javac] 2 errors
 BUILD FAILED
 /home/aaron/src/ext/svn/hive-0.3.0/build.xml:145: The following error 
 occurred while executing this line:
 /home/aaron/src/ext/svn/hive-0.3.0/ql/build.xml:135: Compile failed; see the 
 compiler error output for details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (HIVE-487) Hive does not compile with Hadoop 0.20.0

2009-07-23 Thread Todd Lipcon (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated HIVE-487:
-

Attachment: dynamic-proxy.tar.gz

Here's a tarball showing the technique mentioned in the comment above. The 
script run.sh will compile and run the example once with v1 on the 
classpath, and a second time with v2 on the classpath. I'm not certain that 
this will cover all the cases that are needed for Jetty, but I figured I would 
throw it out there.

 Hive does not compile with Hadoop 0.20.0
 

 Key: HIVE-487
 URL: https://issues.apache.org/jira/browse/HIVE-487
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Aaron Kimball
Assignee: Justin Lynn
 Fix For: 0.4.0

 Attachments: dynamic-proxy.tar.gz, HIVE-487-2.patch, 
 hive-487-jetty-2.diff, hive-487-jetty.patch, hive-487.3.patch, 
 hive-487.4.patch, HIVE-487.patch, jetty-patch.patch, junit-patch1.html


 Attempting to compile Hive with Hadoop 0.20.0 fails:
 aa...@jargon:~/src/ext/svn/hive-0.3.0$ ant -Dhadoop.version=0.20.0 package
 (several lines elided)
 compile:
  [echo] Compiling: hive
 [javac] Compiling 261 source files to 
 /home/aaron/src/ext/svn/hive-0.3.0/build/ql/classes
 [javac] 
 /home/aaron/src/ext/svn/hive-0.3.0/build/ql/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java:94:
  cannot find symbol
 [javac] symbol  : method getCommandLineConfig()
 [javac] location: class org.apache.hadoop.mapred.JobClient
 [javac]   Configuration commandConf = 
 JobClient.getCommandLineConfig();
 [javac]^
 [javac] 
 /home/aaron/src/ext/svn/hive-0.3.0/build/ql/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java:241:
  cannot find symbol
 [javac] symbol  : method validateInput(org.apache.hadoop.mapred.JobConf)
 [javac] location: interface org.apache.hadoop.mapred.InputFormat
 [javac]   inputFormat.validateInput(newjob);
 [javac]  ^
 [javac] Note: Some input files use or override a deprecated API.
 [javac] Note: Recompile with -Xlint:deprecation for details.
 [javac] Note: Some input files use unchecked or unsafe operations.
 [javac] Note: Recompile with -Xlint:unchecked for details.
 [javac] 2 errors
 BUILD FAILED
 /home/aaron/src/ext/svn/hive-0.3.0/build.xml:145: The following error 
 occurred while executing this line:
 /home/aaron/src/ext/svn/hive-0.3.0/ql/build.xml:135: Compile failed; see the 
 compiler error output for details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HIVE-487) Hive does not compile with Hadoop 0.20.0

2009-07-23 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HIVE-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734795#action_12734795
 ] 

Todd Lipcon commented on HIVE-487:
--

bq. @Todd - Where were you a few weeks ago? 

Chillin' over on the HADOOP jira ;-) We're gearing up for release of our 
distribution that includes Hadoop 0.20.0, so just started watching this one 
more carefully.

bq. The jars are upstream in Hadoop core. I did not look into this closely but 
the talk about 'Sealing exceptions' above led me to believe I should not try 
this.

Sorry, what I meant here is that the hive tarball would include 
lib/hive-0.4.0.jar, lib/jetty-shims/hive-jetty-shim-v6.jar and 
lib/jetty-shims/hive-jetty-shim-v5.jar.

In those jars we'd have two different implementations of the shim. The hive 
wrapper script would then do something like:

{code}
# Expand the glob so HADOOP_JAR holds the path to the actual core jar
HADOOP_JAR=$(ls $HADOOP_HOME/hadoop*core*.jar)
# Hadoop 0.17-0.19 bundle Jetty v5; 0.20+ bundles Jetty v6
if [[ $HADOOP_JAR =~ 0\.1[789] ]]; then
  JETTY_SHIM=lib/jetty-shims/hive-jetty-shim-v5.jar
else
  JETTY_SHIM=lib/jetty-shims/hive-jetty-shim-v6.jar
fi
CLASSPATH=$CLASSPATH:$JETTY_SHIM
{code}

To generate the shim jars at compile time, we'd compile two different 
JettyShim.java files - one against the v5 API, and one against the v6 API.
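To make the shim idea concrete, here's a purely illustrative sketch of the mechanism. The names (JettyShim, Jetty5Shim, Jetty6Shim, ShimLoader) are made up for this example, not Hive's actual classes, and the real implementations would of course call the respective Jetty APIs and live in separate jars, with only one jar on the classpath at runtime:

```java
// Version-neutral interface that the rest of Hive codes against.
interface JettyShim {
    String startedWith(); // stand-in for the real server-startup methods
}

// In the real layout, each of these would be compiled against a different
// Jetty API and packaged into its own shim jar.
class Jetty5Shim implements JettyShim {
    public String startedWith() { return "jetty-v5"; }
}

class Jetty6Shim implements JettyShim {
    public String startedWith() { return "jetty-v6"; }
}

public class ShimLoader {
    // Load whichever implementation the wrapper script put on the classpath.
    // Since only one shim jar is present at runtime, this resolves the
    // version choice without any compile-time dependency on both APIs.
    static JettyShim load(String className) throws Exception {
        return (JettyShim) Class.forName(className)
                .getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        JettyShim shim = load("Jetty6Shim");
        System.out.println(shim.startedWith());
    }
}
```

The point is that compiled client code only ever references the interface, so it links cleanly no matter which Jetty is present.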

As for Eclipse properly completing/warning for the right versions for the right 
files, I haven't the foggiest idea. But I am pretty sure it's not going to warn 
if your reflective calls are broken either ;-)

bq. My only concern is will the ant process cooperate?

I don't see why not - my example build here is just to show how it works in a 
self contained way. The stuff inside v1-classes and v2-classes in the example 
are the equivalent of the two jetty jar versions - we don't have to compile 
them. The only code that has to compile is DynamicProxy.java which is 
completely normal code.
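For anyone who hasn't opened dynamic-proxy.tar.gz, here's a toy sketch of the dynamic-proxy mechanism using java.lang.reflect.Proxy. The names (Greeter, V1Greeter, DynamicProxyDemo) are invented for this example and don't match anything in the attachment; the idea is just that calls on a stable interface are dispatched reflectively to whatever concrete class happens to be on the classpath:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Stable interface the caller codes against.
interface Greeter { String greet(String name); }

// Stands in for a "v1-classes" implementation; note it does NOT
// implement Greeter -- the proxy matches methods by name/signature.
class V1Greeter {
    public String greet(String name) { return "hello " + name; }
}

public class DynamicProxyDemo {
    static Greeter wrap(final Object target) {
        return (Greeter) Proxy.newProxyInstance(
            Greeter.class.getClassLoader(),
            new Class<?>[] { Greeter.class },
            new InvocationHandler() {
                public Object invoke(Object proxy, Method m, Object[] args)
                        throws Exception {
                    // Look up the same-named method on the real class
                    // and forward the call reflectively.
                    Method real = target.getClass()
                        .getMethod(m.getName(), m.getParameterTypes());
                    return real.invoke(target, args);
                }
            });
    }

    public static void main(String[] args) {
        Greeter g = wrap(new V1Greeter());
        System.out.println(g.greet("hive")); // prints "hello hive"
    }
}
```

If the underlying class is missing a method, this fails at call time with NoSuchMethodException rather than at compile time, which is the trade-off versus the compiled-shim approach.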

bq. If you/we can tackle the ant/eclipse issues I would be happy to use the 
'Dynamic Proxy', but maybe we tackle it in a different Jira because this is a 
pretty big blocker and I am sure many people want to see this in the trunk. 

As for committing now and not worrying, that sounds pretty reasonable, as long 
as there's some kind of deprecation timeline set out (e.g. in Hive 0.5.0 we 
will drop support for versions of Hadoop that use Jetty v5, or whatever). As 
someone who isn't a major Hive contributor, I'll defer to you guys completely 
-- I just wanted to throw the idea up on the JIRA.

 Hive does not compile with Hadoop 0.20.0
 

 Key: HIVE-487
 URL: https://issues.apache.org/jira/browse/HIVE-487
 Project: Hadoop Hive
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Aaron Kimball
Assignee: Justin Lynn
 Fix For: 0.4.0
