[jira] Created: (HIVE-1696) Add delegation token support to metastore
Add delegation token support to metastore
------------------------------------------
        Key: HIVE-1696
        URL: https://issues.apache.org/jira/browse/HIVE-1696
    Project: Hadoop Hive
 Issue Type: Sub-task
 Components: Metastore
   Reporter: Todd Lipcon

As discussed in HIVE-842, kerberos authentication is only sufficient for authentication of a hive user client to the metastore. There are other cases where thrift calls need to be authenticated when the caller is running in an environment without kerberos credentials. For example, an MR task running as part of a hive job may want to report statistics to the metastore, or a job may be running within the context of Oozie or Hive Server.

This JIRA is to implement support for delegation tokens in the metastore. The concept of a delegation token is borrowed from the Hadoop security design - the quick summary is that a kerberos-authenticated client may retrieve a binary token from the server. This token can then be passed to other clients, which can use it to authenticate as the original user in lieu of a kerberos ticket.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1696) Add delegation token support to metastore
[ https://issues.apache.org/jira/browse/HIVE-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918615#action_12918615 ]

Todd Lipcon commented on HIVE-1696:
-----------------------------------

A few of us had a phone call this morning. We briefly discussed a design for this, summarized below:

- The metastore should make use of the delegation token facilities in Hadoop Common. The classes in Common are already generic, since they're used by both MR and HDFS for their delegation token types.
- The metastore needs to keep track of active delegation tokens across restarts - it probably makes sense to use the existing DB backing store for this.
- The metastore thrift API will need a new call, something like {{binary getDelegationToken(1: string renewer)}}, which returns the opaque token.
- We'll need to make some changes to HadoopThriftAuthBridge from HIVE-842 in order to support using a delegation token over SASL.

In terms of the use cases above, here are some thoughts on how the delegation tokens will be used:

h3. MR tasks reporting statistics

When a hive job is submitted, it will first obtain a DT from the hive metastore. This DT will be passed with the job, either as a private distributed-cache file, or maybe base64-encoded in the jobconf itself. The MR tasks themselves will then load the token into the UGI before making calls. This is basically the pattern that normal hadoop MR jobs use to access HDFS from within a task.

h3. Oozie or Hive Server jobs

Before Oozie or Hive Server forks the child process which actually runs the job, it will need to obtain a delegation token from the metastore on behalf of the user running the job. It will then provide this to the child process using an environment variable or configuration property.
In this case, Oozie or the Hive Server needs to be configured as a proxy superuser on the metastore - i.e. the oozie/_HOST or hiveserver/_HOST principal is allowed to impersonate other users in order to grab delegation tokens for them.
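The proposed API call can be sketched in Thrift IDL as follows. Only {{getDelegationToken}} is part of the proposal above; the renew/cancel methods are illustrative additions mirroring the Hadoop delegation-token lifecycle, not something the comment commits to:

{code}
service ThriftHiveMetastore {
  // Returns an opaque, serialized delegation token for the
  // kerberos-authenticated caller; "renewer" names the principal
  // allowed to renew it.
  binary getDelegationToken(1: string renewer)

  // Illustrative companions, mirroring the HDFS token lifecycle:
  i64 renewDelegationToken(1: binary token)    // returns the new expiry time
  void cancelDelegationToken(1: binary token)
}
{code}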
[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918745#action_12918745 ]

Todd Lipcon commented on HIVE-842:
----------------------------------

Hey Pradeep. It sounds like it might be - I haven't seen that error before, but I also have only been testing with actual service principals (i.e. principals of the form metastore/hostname). You can try running both sides with HADOOP_OPTS=-Dsun.security.krb5.debug=true and it should give you some extra details.

Authentication Infrastructure for Hive
--------------------------------------
        Key: HIVE-842
        URL: https://issues.apache.org/jira/browse/HIVE-842
    Project: Hadoop Hive
 Issue Type: New Feature
 Components: Server Infrastructure
   Reporter: Edward Capriolo
   Assignee: Todd Lipcon
Attachments: hive-842.txt, HiveSecurityThoughts.pdf

This issue deals with the authentication (user name, password) infrastructure, not the authorization components that specify what a user should be able to do.
[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918076#action_12918076 ]

Todd Lipcon commented on HIVE-842:
----------------------------------

Hey Pradeep. You also need HIVE-1526, which updates Hive to use Thrift 0.4.0.
[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12918239#action_12918239 ]

Todd Lipcon commented on HIVE-842:
----------------------------------

Seems like the patch that updates Thrift has fallen out of date with trunk. I'll try to regenerate it ASAP. You can probably fix the above issues by (a) importing StageType in MapRedTask, and (b) replacing StatsTask.getType's return with the StageType enum. (The new version of Thrift uses java enums instead of ints to represent thrift enums.)
[jira] Updated: (HIVE-1264) Make Hive work with Hadoop security
[ https://issues.apache.org/jira/browse/HIVE-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HIVE-1264:
------------------------------

    Attachment: hive-1264.txt

Good catch. This patch updates the build.xml for hbase-handler to include the hadoop test jar.

Make Hive work with Hadoop security
-----------------------------------
             Key: HIVE-1264
             URL: https://issues.apache.org/jira/browse/HIVE-1264
         Project: Hadoop Hive
      Issue Type: Improvement
Affects Versions: 0.7.0
        Reporter: Jeff Hammerbacher
        Assignee: Todd Lipcon
     Attachments: hive-1264-fb-mirror.txt, hive-1264.txt, hive-1264.txt, HiveHadoop20S_patch.patch
[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916687#action_12916687 ]

Todd Lipcon commented on HIVE-842:
----------------------------------

bq. should there be an option whereby the metastore uses a keytab to authenticate to HDFS, but doesn't require users to authenticate to it?
bq. Wouldn't this leave a hole as it currently exists?

Yea - I think the use case is that you may have some old Thrift clients that haven't yet been updated to work with the SASL implementation (eg PHP). For those clients, perhaps you can provide security based on firewall rules, etc. But you would still like to run Hive on top of a secured HDFS.
[jira] Updated: (HIVE-842) Authentication Infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HIVE-842:
-----------------------------

    Attachment: hive-842.txt

Here's a preview patch of this work. A few notes:

- This checks in a bunch of Thrift classes that are in Thrift trunk. Thrift is currently in rc phase for an 0.5.0 release, so we can ditch these thrift classes from Hive as soon as that's out (probably before this patch is even ready for commit).
- There are still some javadocs that could be improved a little bit.
- There's currently not any integration into the guts of Hive - we simply assume the calling user's identity as soon as the RPC is received. I think that's OK for the scope of this patch, as discussed above.

There's a bit of a lurking bug, I believe, due to HADOOP-6982, but it shouldn't be major.
[jira] Assigned: (HIVE-842) Authentication Infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon reassigned HIVE-842:
--------------------------------

    Assignee: Todd Lipcon
[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913439#action_12913439 ]

Todd Lipcon commented on HIVE-842:
----------------------------------

As discussed at the last contributor meeting, I am working on authenticating access to the metastore by kerberizing the Thrift interface. The plan is currently:

1) Update the version of Thrift in Hive to 0.4.0
2) Temporarily check in the SASL support from Thrift trunk (this will be in the 0.5.0 release, due out in October some time)
3) Build a bridge between Thrift's SASL support and Hadoop's UserGroupInformation classes. Thus, if a user has a current UGI on the client side, it will get propagated to the JAAS context on the handler side.
4) In places where the metastore accesses the file system, use the proxy user functionality to act on behalf of the authenticated user.
5) When we detect that we are running on secure hadoop with security enabled, turn on the above functionality.

I'd like to attack the Hive Web UI separately.

One open question:
- Do Hive *tasks* ever need to authenticate to the metastore? If so, we will have to build a delegation token system into Hive.
[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913691#action_12913691 ]

Todd Lipcon commented on HIVE-842:
----------------------------------

OK. The code in Hadoop Common is somewhat reusable for this, so it shouldn't be too hard to implement. If I recall correctly, though, the delegation tokens rely on a secret key that the master daemon periodically rotates. We need to add some kind of persistent token storage for this to work - I guess in the metastore's DB?

To make this easier to review, I'd like to do the straight kerberos first, and then add delegation tokens in a second patch/JIRA. Sound good?
[jira] Commented: (HIVE-842) Authentication Infrastructure for Hive
[ https://issues.apache.org/jira/browse/HIVE-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12913787#action_12913787 ]

Todd Lipcon commented on HIVE-842:
----------------------------------

I don't anticipate breaking the web UI (or anything) on non-secure Hadoop versions. But it will probably be insecure to run the web UI, which currently trusts users to say who they want to be - i.e. I don't plan in the short term to integrate an auth layer for the web UI itself.
[jira] Updated: (HIVE-1526) Hive should depend on a release version of Thrift
[ https://issues.apache.org/jira/browse/HIVE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HIVE-1526:
------------------------------

    Attachment: hive-1526.txt
                libthrift.jar
                libfb303.jar

Here is a patch along with the newly built jars from Thrift 0.4.0. I agree that long term we should make codegen part of the build, but I think it's enough of a hassle to require everyone to install the same version of thrift that we should punt for now.

Hive should depend on a release version of Thrift
-------------------------------------------------
        Key: HIVE-1526
        URL: https://issues.apache.org/jira/browse/HIVE-1526
    Project: Hadoop Hive
 Issue Type: Task
 Components: Build Infrastructure
   Reporter: Carl Steinbach
   Assignee: Todd Lipcon
Attachments: hive-1526.txt, libfb303.jar, libthrift.jar

Hive should depend on a release version of Thrift, and ideally it should use Ivy to resolve this dependency. The Thrift folks are working on adding Thrift artifacts to a maven repository here: https://issues.apache.org/jira/browse/THRIFT-363
[jira] Assigned: (HIVE-1526) Hive should depend on a release version of Thrift
[ https://issues.apache.org/jira/browse/HIVE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon reassigned HIVE-1526:
---------------------------------

    Assignee: Todd Lipcon
[jira] Commented: (HIVE-1526) Hive should depend on a release version of Thrift
[ https://issues.apache.org/jira/browse/HIVE-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912768#action_12912768 ]

Todd Lipcon commented on HIVE-1526:
-----------------------------------

Er, sorry, not THRIFT-381, but rather THRIFT-907. Too many browser tabs!
[jira] Updated: (HIVE-1264) Make Hive work with Hadoop security
[ https://issues.apache.org/jira/browse/HIVE-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HIVE-1264:
------------------------------

               Status: Patch Available  (was: Open)
    Affects Version/s: 0.7.0
[jira] Commented: (HIVE-1264) Make Hive work with Hadoop security
[ https://issues.apache.org/jira/browse/HIVE-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910373#action_12910373 ]

Todd Lipcon commented on HIVE-1264:
-----------------------------------

Submitted to RB: https://review.cloudera.org/r/860/

Regarding the snapshot - it's fine by me to pull from there, I think the people.apache.org web server is reasonably stable. If it turns out to be flaky it's also cool if you want to mirror it - FB is probably more reliable than ASF.
[jira] Updated: (HIVE-1628) Fix Base64TextInputFormat to be compatible with commons codec 1.4
[ https://issues.apache.org/jira/browse/HIVE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HIVE-1628:
------------------------------

    Status: Patch Available  (was: Open)

Fix Base64TextInputFormat to be compatible with commons codec 1.4
-----------------------------------------------------------------
             Key: HIVE-1628
             URL: https://issues.apache.org/jira/browse/HIVE-1628
         Project: Hadoop Hive
      Issue Type: Bug
      Components: Contrib
Affects Versions: 0.6.0, 0.7.0
        Reporter: Todd Lipcon
        Assignee: Todd Lipcon
     Attachments: hive-1628-0.5.txt, hive-1628.txt

Commons-codec 1.4 made an incompatible change to the Base64 class that made line-wrapping the default (boo!). This breaks the Base64TextInputFormat in contrib. This patch adds some simple reflection to use the new constructor that preserves the old behavior.
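The incompatibility is easy to reproduce without commons-codec itself: the JDK's two Base64 encoders show the same contrast between the old single-line output and the new MIME-style 76-character wrapping that breaks one-record-per-line input formats. This is an illustration only - the actual patch targets org.apache.commons.codec.binary.Base64, not java.util.Base64:

{code}
import java.util.Base64;

public class WrapDemo {
    // commons-codec 1.3's Base64 emitted a single unwrapped line; 1.4's
    // default constructor wraps at 76 characters (MIME style), splitting
    // one logical record across several physical lines.
    static String unwrapped(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    static String mimeWrapped(byte[] data) {
        return new String(Base64.getMimeEncoder().encode(data));
    }

    public static void main(String[] args) {
        byte[] record = new byte[100];  // 100 bytes -> 136 base64 chars
        System.out.println(unwrapped(record).contains("\n"));     // false: one line
        System.out.println(mimeWrapped(record).contains("\r\n")); // true: wrapped
    }
}
{code}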
[jira] Updated: (HIVE-1628) Fix Base64TextInputFormat to be compatible with commons codec 1.4
[ https://issues.apache.org/jira/browse/HIVE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HIVE-1628:
------------------------------

    Attachment: hive-1628.txt
                hive-1628-0.5.txt
[jira] Updated: (HIVE-1628) Fix Base64TextInputFormat to be compatible with commons codec 1.4
[ https://issues.apache.org/jira/browse/HIVE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HIVE-1628:
------------------------------

    Status: Open  (was: Patch Available)

Oops, I just noticed I posted the wrong patch! sorry, one sec...
[jira] Updated: (HIVE-1628) Fix Base64TextInputFormat to be compatible with commons codec 1.4
[ https://issues.apache.org/jira/browse/HIVE-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HIVE-1628:
------------------------------

    Status: Patch Available  (was: Open)
[jira] Updated: (HIVE-1264) Make Hive work with Hadoop security
[ https://issues.apache.org/jira/browse/HIVE-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HIVE-1264:
------------------------------

    Attachment: hive-1264.txt

Here's a patch against trunk which adds shims for secure hadoop. Since there hasn't been a public tarball release of secure hadoop quite yet, I've pointed it at a snapshot of CDH3b3 (not yet released) from my apache.org web directory. I haven't run the unit test suite against secure hadoop yet, but I did a very brief test on a secure cluster by creating a table and running a simple MR query.
[jira] Commented: (HIVE-1264) Make Hive work with Hadoop security
[ https://issues.apache.org/jira/browse/HIVE-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905156#action_12905156 ]

Todd Lipcon commented on HIVE-1264:
-----------------------------------

(btw, if you want to test this against a different tarball, you can use {{-Dhadoop.security.url=http://url/of/your/tarball -Dhadoop.security.version=0.20.104}} or whatever.)
[jira] Commented: (HIVE-1476) Hive's metastore when run as a thrift service creates directories as the service user instead of the real user issuing create table/alter table etc.
[ https://issues.apache.org/jira/browse/HIVE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905299#action_12905299 ]

Todd Lipcon commented on HIVE-1476:
-----------------------------------

In the absence of making the metastore truly a metadata-only service, it seems like what we really want is for the metastore to act on behalf of users. Could we have the hive client fetch an HDFS delegation token and pass it securely to the metastore, so the metastore can act as the user to perform the operations? Alternatively, could the metastore be set up with an HDFS proxy user principal that allows it to impersonate anyone in a hive group?

Although we don't have true authorization, etc, in Hive at the moment, we should think about how to solve this in a way that at least moves us closer to that goal.

Hive's metastore when run as a thrift service creates directories as the service user instead of the real user issuing create table/alter table etc.
-----------------------------------------------------------------------------------------------------------------------------------------------------
             Key: HIVE-1476
             URL: https://issues.apache.org/jira/browse/HIVE-1476
         Project: Hadoop Hive
      Issue Type: Bug
Affects Versions: 0.6.0, 0.7.0
        Reporter: Pradeep Kamath
     Attachments: HIVE-1476.patch, HIVE-1476.patch.2

If the thrift metastore service is running as the user hive, then all table directories created as a result of create table are owned by that user rather than the user who actually issued the create table command. This is semantically different from non-thrift mode (i.e. local mode), where clients directly connect to the metastore. In the latter case, directories are created as the real user. The thrift mode should do the same.
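For reference, the proxy-user alternative is configured on the HDFS side with the standard Hadoop impersonation properties in core-site.xml. A sketch, assuming the metastore runs as a "hive" principal and impersonated users belong to a "hive" group (both values are illustrative):

{code}
<!-- core-site.xml on the NameNode: allow the hive superuser,
     connecting from the metastore host, to impersonate any member
     of the hive group. -->
<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>hive</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.hosts</name>
  <value>metastore-host.example.com</value>
</property>
{code}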
[jira] Commented: (HIVE-1476) Hive's metastore when run as a thrift service creates directories as the service user instead of the real user issuing create table/alter table etc.
[ https://issues.apache.org/jira/browse/HIVE-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905310#action_12905310 ]

Todd Lipcon commented on HIVE-1476:
-----------------------------------

BTW, we are working on a SASL-secured Thrift transport over at THRIFT-876
[jira] Assigned: (HIVE-1264) Make Hive work with Hadoop security
[ https://issues.apache.org/jira/browse/HIVE-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon reassigned HIVE-1264:
---------------------------------

    Assignee: Todd Lipcon  (was: Venkatesh S)
[jira] Resolved: (HIVE-789) Add get_unique_id() call to metastore
[ https://issues.apache.org/jira/browse/HIVE-789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon resolved HIVE-789.
------------------------------

    Resolution: Won't Fix

Resolving as wontfix - this was a proposed part of a solution, but is no longer applicable.

Add get_unique_id() call to metastore
-------------------------------------
        Key: HIVE-789
        URL: https://issues.apache.org/jira/browse/HIVE-789
    Project: Hadoop Hive
 Issue Type: Improvement
 Components: Metastore
   Reporter: Todd Lipcon
Attachments: hive-789.patch

As noted in HIVE-718, it can be tough to avoid race conditions when multiple clients are trying to move files into the same directory. This patch adds a get_unique_id() call to the metastore that returns the current value from an incrementing JDO Sequence so that clients can avoid some races without locks.
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838735#action_12838735 ]

Todd Lipcon commented on HIVE-259:
----------------------------------

Doesn't the autoboxing of Integer types actually allocate objects? I think the JVM only flyweights integers for very small ones (by default only from -128 to 127).

Add PERCENTILE aggregate function
---------------------------------
        Key: HIVE-259
        URL: https://issues.apache.org/jira/browse/HIVE-259
    Project: Hadoop Hive
 Issue Type: New Feature
 Components: Query Processor
   Reporter: Venky Iyer
   Assignee: Jerome Boulon
Attachments: HIVE-259-2.patch, HIVE-259.1.patch, HIVE-259.patch, jb2.txt, Percentile.xlsx

Compute at least the 25th, 50th, and 75th percentiles
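The flyweight behavior referred to above is easy to demonstrate: Integer.valueOf (which autoboxing calls) caches -128 through 127 by default, and anything outside that range is a fresh allocation on every boxing. A minimal sketch:

{code}
public class BoxingDemo {
    public static void main(String[] args) {
        Integer a = 127, b = 127;   // inside the cache: same flyweight object
        Integer c = 128, d = 128;   // outside the cache: two fresh allocations

        System.out.println(a == b); // true  - identical references
        System.out.println(c == d); // false - distinct objects, equal values
    }
}
{code}

This is why boxing loop counters or hash-map keys in a tight aggregation path can generate significant garbage: almost all real values fall outside the cache.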
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12832139#action_12832139 ] Todd Lipcon commented on HIVE-259: -- Agreed re HashMap. Also, there should be some kind of setting that limits how much RAM gets used up. In a later iteration we could do adaptive histogramming once we hit the limit. In this version we should just throw up our hands and fail with a message that says the user needs to discretize harder. Add PERCENTILE aggregate function - Key: HIVE-259 URL: https://issues.apache.org/jira/browse/HIVE-259 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Venky Iyer Assignee: Jerome Boulon Attachments: HIVE-259.patch Compute at least the 25th, 50th, and 75th percentiles -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1128) Let max/min handle complex types like struct
[ https://issues.apache.org/jira/browse/HIVE-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829311#action_12829311 ] Todd Lipcon commented on HIVE-1128: --- This is clever, but I'd be surprised if a lot of non-programmer users would come up with this on their own. Would it be helpful to also provide argmin and argmax functions? The statistics community would probably appreciate the syntactic sugar. Let max/min handle complex types like struct Key: HIVE-1128 URL: https://issues.apache.org/jira/browse/HIVE-1128 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.6.0 Reporter: Zheng Shao A lot of users are interested in doing arg_min and arg_max. Basically, return the value of some other columns when one column's value is the max value. The following is an example usage when this is done: {code} SELECT department, max(struct(salary, employee_name)) FROM compensations; {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
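Why `max(struct(salary, employee_name))` behaves like argmax: structs compare field by field, so the maximum pair carries along whatever payload fields follow the key. A minimal sketch of that lexicographic comparison (the record type and data are hypothetical, not Hive internals):

```java
import java.util.List;

public class ArgMaxSketch {
    // Stand-in for struct(salary, employee_name): compared field by field,
    // salary first, so the max pair's name is the name of the top earner.
    record Comp(int salary, String name) {}

    static Comp maxBySalary(List<Comp> rows) {
        Comp best = null;
        for (Comp c : rows) {
            // lexicographic comparison: salary decides, name breaks ties
            if (best == null
                    || c.salary() > best.salary()
                    || (c.salary() == best.salary()
                        && c.name().compareTo(best.name()) > 0)) {
                best = c;
            }
        }
        return best;
    }
}
```

An explicit argmax UDF would wrap exactly this trick behind friendlier syntax, which is the sugar being proposed.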
[jira] Commented: (HIVE-1128) Let max/min handle complex types like struct
[ https://issues.apache.org/jira/browse/HIVE-1128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829338#action_12829338 ] Todd Lipcon commented on HIVE-1128: --- yep, my suggestion is definitely a separate feature that is orthogonal to this JIRA. Let max/min handle complex types like struct Key: HIVE-1128 URL: https://issues.apache.org/jira/browse/HIVE-1128 Project: Hadoop Hive Issue Type: Improvement Affects Versions: 0.6.0 Reporter: Zheng Shao Assignee: Zheng Shao Attachments: HIVE-1128.1.sh, HIVE-1128.2.patch A lot of users are interested in doing arg_min and arg_max. Basically, return the value of some other columns when one column's value is the max value. The following is an example usage when this is done: {code} SELECT department, max(struct(salary, employee_name)) FROM compensations; {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1015) Java MapReduce wrapper for TRANSFORM/MAP/REDUCE scripts
[ https://issues.apache.org/jira/browse/HIVE-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796812#action_12796812 ] Todd Lipcon commented on HIVE-1015: --- I think it's very slightly different: - UDF - only a 1:1 mapping on a single column - UDAF - requires implementation of Combiner-like functionality, best I can tell (haven't delved into this deeply, so apologies if you can do a reducer-only UDAF) - UDTF - perhaps supports the same functionality, but the syntax is a little less obvious than the MAP/REDUCE syntax. I think this feature could be implemented by an AST transform and some kind of interface-changing wrapper class for UDTF that makes it look more like the usual MR API. BTW, these thoughts definitely shouldn't block progress on this JIRA. I just wanted to throw the idea out there. Java MapReduce wrapper for TRANSFORM/MAP/REDUCE scripts --- Key: HIVE-1015 URL: https://issues.apache.org/jira/browse/HIVE-1015 Project: Hadoop Hive Issue Type: Improvement Components: Contrib Reporter: Carl Steinbach Attachments: HIVE-1015.patch Larry Ogrodnek has written a set of wrapper classes that make it possible to write Hive TRANSFORM/MAP/REDUCE scripts in Java in a style that more closely resembles conventional Hadoop MR programs. A blog post describing this library can be found here: http://dev.bizo.com/2009/10/hive-map-reduce-in-java.html The source code (with Apache license) is available here: http://github.com/ogrodnek/shmrj We should add this to contrib. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-1015) Java MapReduce wrapper for TRANSFORM/MAP/REDUCE scripts
[ https://issues.apache.org/jira/browse/HIVE-1015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12796559#action_12796559 ] Todd Lipcon commented on HIVE-1015: --- Related thought: would be nice to be able to write MAP/REDUCE as straight Java without having the overhead of streaming and serde. Is there a ticket already for this? Java MapReduce wrapper for TRANSFORM/MAP/REDUCE scripts --- Key: HIVE-1015 URL: https://issues.apache.org/jira/browse/HIVE-1015 Project: Hadoop Hive Issue Type: Improvement Components: Contrib Reporter: Carl Steinbach Attachments: HIVE-1015.patch Larry Ogrodnek has written a set of wrapper classes that make it possible to write Hive TRANSFORM/MAP/REDUCE scripts in Java in a style that more closely resembles conventional Hadoop MR programs. A blog post describing this library can be found here: http://dev.bizo.com/2009/10/hive-map-reduce-in-java.html The source code (with Apache license) is available here: http://github.com/ogrodnek/shmrj We should add this to contrib. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-837) virtual column support (filename) in hive
[ https://issues.apache.org/jira/browse/HIVE-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12795904#action_12795904 ] Todd Lipcon commented on HIVE-837: -- bq. in that case they are really interested in the actual filename as opposed to the directory name. +1. I'm currently working with a 200G dataset that has lots of rows that Hive is interpreting as NULL. As far as I knew, there are no NULLs in the dataset to begin with, so I'd love to do: SELECT FILENAME(), FILEOFFSET() FROM t WHERE some_col IS NULL; virtual column support (filename) in hive - Key: HIVE-837 URL: https://issues.apache.org/jira/browse/HIVE-837 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Namit Jain Copying from some mails: I am dumping files into a hive partion on five minute intervals. I am using LOAD DATA into a partition. weblogs web1.00 web1.05 web1.10 ... web2.00 web2.05 web1.10 Things that would be useful.. Select files from the folder with a regex or exact name select * FROM logs where FILENAME LIKE(WEB1*) select * FROM LOGS WHERE FILENAME=web2.00 Also it would be nice to be able to select offsets in a file, this would make sense with appends select * from logs WHERE FILENAME=web2.00 FROMOFFSET=454644 [TOOFFSET=] select substr(filename, 4, 7) as class_A, substr(filename, 8, 10) as class_B count( x ) as cnt from FOO group by substr(filename, 4, 7), substr(filename, 8, 10) ; Hive should support virtual columns -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-789) Add get_unique_id() call to metastore
[ https://issues.apache.org/jira/browse/HIVE-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792249#action_12792249 ] Todd Lipcon commented on HIVE-789: -- I'd be fine if you closed this as wontfix - it was depended on by some stuff a while ago, but if you have better solutions, I'm all for them :) Add get_unique_id() call to metastore - Key: HIVE-789 URL: https://issues.apache.org/jira/browse/HIVE-789 Project: Hadoop Hive Issue Type: Improvement Components: Metastore Reporter: Todd Lipcon Attachments: hive-789.patch As noted in HIVE-718, it can be tough to avoid race conditions when multiple clients are trying to move files into the same directory. This patch adds a get_unique_id() call to the metastore that returns the current value from an incrementing JDO Sequence so that clients can avoid some races without locks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-259) Add PERCENTILE aggregate function
[ https://issues.apache.org/jira/browse/HIVE-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782120#action_12782120 ] Todd Lipcon commented on HIVE-259: -- An easy way to do this that would work for a ton of data sets would be to essentially do a counting sort. If you have only a few thousand distinct values in the column to be analyzed, just make a hashtable, count up how many you see, and then in the single reducer use the histogram to figure out the percentile. This should work great for datasets like age, and even for sets like number of days since user signed up. For sets that are truly continuous, this would be useful when combined with a binning UDF to discretize them. Sadly it's not the general case, but would be an easy first step. Add PERCENTILE aggregate function - Key: HIVE-259 URL: https://issues.apache.org/jira/browse/HIVE-259 Project: Hadoop Hive Issue Type: New Feature Components: Query Processor Reporter: Venky Iyer Compute at least the 25th, 50th, and 75th percentiles -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
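The counting-sort idea above reduces to a small reducer-side routine: tally each distinct value, then walk the sorted histogram until the cumulative count covers the requested fraction. A minimal sketch (class and method names are illustrative, not from the eventual patch):

```java
import java.util.Map;
import java.util.TreeMap;

public class HistogramPercentile {
    // Given value -> count, return the smallest value v such that at
    // least ceil(p * n) of the n observations are <= v. This is the
    // "counting sort" percentile for low-cardinality columns.
    static int percentile(TreeMap<Integer, Long> counts, double p) {
        long n = 0;
        for (long c : counts.values()) n += c;
        long needed = (long) Math.ceil(p * n);
        long seen = 0;
        for (Map.Entry<Integer, Long> e : counts.entrySet()) {
            seen += e.getValue();
            if (seen >= needed) return e.getKey();
        }
        throw new IllegalArgumentException("empty histogram");
    }
}
```

Memory scales with the number of distinct values, not the number of rows, which is exactly why the RAM-limit setting discussed earlier matters for high-cardinality columns.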
[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file
[ https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755644#action_12755644 ] Todd Lipcon commented on HIVE-718: -- +1 lgtm. Thanks for getting this in, guys! Load data inpath into a new partition without overwrite does not move the file -- Key: HIVE-718 URL: https://issues.apache.org/jira/browse/HIVE-718 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.4.0 Reporter: Zheng Shao Assignee: Namit Jain Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt, hive.718.1.patch The bug can be reproduced as following. Note that it only happens for partitioned tables. The select after the first load returns nothing, while the second returns the data correctly. insert.txt in the current local directory contains 3 lines: a, b and c. {code} create table tmp_insert_test (value string) stored as textfile; load data local inpath 'insert.txt' into table tmp_insert_test; select * from tmp_insert_test; a b c create table tmp_insert_test_p ( value string) partitioned by (ds string) stored as textfile; load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01'); select * from tmp_insert_test_p where ds= '2009-08-01'; load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01'); select * from tmp_insert_test_p where ds= '2009-08-01'; a 2009-08-01 b 2009-08-01 d 2009-08-01 {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file
[ https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755177#action_12755177 ] Todd Lipcon commented on HIVE-718: -- Namit: that error is tracked by HIVE-307 marked linked above. I think it's OK behavior for 0.4 - we should verify, though, that the file that fails to load remains safely in place. Load data inpath into a new partition without overwrite does not move the file -- Key: HIVE-718 URL: https://issues.apache.org/jira/browse/HIVE-718 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.4.0 Reporter: Zheng Shao Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt The bug can be reproduced as following. Note that it only happens for partitioned tables. The select after the first load returns nothing, while the second returns the data correctly. insert.txt in the current local directory contains 3 lines: a, b and c. {code} create table tmp_insert_test (value string) stored as textfile; load data local inpath 'insert.txt' into table tmp_insert_test; select * from tmp_insert_test; a b c create table tmp_insert_test_p ( value string) partitioned by (ds string) stored as textfile; load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01'); select * from tmp_insert_test_p where ds= '2009-08-01'; load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01'); select * from tmp_insert_test_p where ds= '2009-08-01'; a 2009-08-01 b 2009-08-01 d 2009-08-01 {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file
[ https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12754632#action_12754632 ] Todd Lipcon commented on HIVE-718: -- Even in the case of LOAD...OVERWRITE, we currently lack atomicity. The old directory is deleted prior to the new directory being moved in. There's a small window of time in which neither old nor new data is present. Sure, this is probably on the order of a half second, but it is still not correct. There's also the general case of a query which has already computed input splits being affected by a concurrent LOAD DATA OVERWRITE. The versioning solution I think gets us partly there, but will be very tricky to implement correctly while maintaining performance since there's no way to do a copy-on-write directory snapshot built in to HDFS. So, I think a significant amount of work will have to be done in the metastore and we'll definitely have to drop the external process load ability that currently exists. Should we open a new JIRA for the general concurrency control/locking issues we're discussing here? It seems like this ticket should be used for the 0.3-0.4 regression, even if it's just a temporary fix. We can then put a more general correct solution on the roadmap for 0.5 or later, since it's looking like it will be a complicated project. Load data inpath into a new partition without overwrite does not move the file -- Key: HIVE-718 URL: https://issues.apache.org/jira/browse/HIVE-718 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.4.0 Reporter: Zheng Shao Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt The bug can be reproduced as following. Note that it only happens for partitioned tables. The select after the first load returns nothing, while the second returns the data correctly. insert.txt in the current local directory contains 3 lines: a, b and c. 
{code} create table tmp_insert_test (value string) stored as textfile; load data local inpath 'insert.txt' into table tmp_insert_test; select * from tmp_insert_test; a b c create table tmp_insert_test_p ( value string) partitioned by (ds string) stored as textfile; load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01'); select * from tmp_insert_test_p where ds= '2009-08-01'; load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01'); select * from tmp_insert_test_p where ds= '2009-08-01'; a 2009-08-01 b 2009-08-01 d 2009-08-01 {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
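The delete-then-rename window described above can be sketched with the local filesystem standing in for HDFS (the method and path names are hypothetical; HDFS has the same problem because it also offers no copy-on-write directory swap):

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class OverwriteWindowDemo {
    // Models the window in LOAD ... OVERWRITE: the old partition
    // directory is deleted, then the new one is renamed into place.
    // Between the two calls a concurrent reader sees *neither*
    // version of the data.
    static void swapNonAtomically(Path oldDir, Path newDir) {
        try {
            Files.delete(oldDir);       // window opens: partition path gone
            Files.move(newDir, oldDir); // window closes: new data visible
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A versioned-directory scheme closes the window for new readers, but as noted it does not protect a query whose input splits were computed before the swap.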
[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file
[ https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12754266#action_12754266 ] Todd Lipcon commented on HIVE-718: -- Not sure if people are already following the discussion on HADOOP-6240, but it's worth checking out -- discussions regarding rename() semantics on HDFS. Load data inpath into a new partition without overwrite does not move the file -- Key: HIVE-718 URL: https://issues.apache.org/jira/browse/HIVE-718 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.4.0 Reporter: Zheng Shao Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt The bug can be reproduced as following. Note that it only happens for partitioned tables. The select after the first load returns nothing, while the second returns the data correctly. insert.txt in the current local directory contains 3 lines: a, b and c. {code} create table tmp_insert_test (value string) stored as textfile; load data local inpath 'insert.txt' into table tmp_insert_test; select * from tmp_insert_test; a b c create table tmp_insert_test_p ( value string) partitioned by (ds string) stored as textfile; load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01'); select * from tmp_insert_test_p where ds= '2009-08-01'; load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01'); select * from tmp_insert_test_p where ds= '2009-08-01'; a 2009-08-01 b 2009-08-01 d 2009-08-01 {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file
[ https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12754408#action_12754408 ] Todd Lipcon commented on HIVE-718: -- Namit: here's a trace from a session on hive 0.3.0: {noformat} t...@todd-laptop:~$ cat /tmp/insert.txt a b c d t...@todd-laptop:~$ cat /tmp/insert2.txt e f g h t...@todd-laptop:~$ hive Hive history file=/tmp/todd/hive_job_log_todd_200909111603_978288634.txt hive> create table tmp_insert_test_p (value string) partitioned by (ds string); OK Time taken: 3.865 seconds hive> load data local inpath '/tmp/insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01'); Copying data from file:/tmp/insert.txt Loading data to table tmp_insert_test_p partition {ds=2009-08-01} OK Time taken: 0.672 seconds hive> select * from tmp_insert_test_p where ds = '2009-08-01'; OK a 2009-08-01 b 2009-08-01 c 2009-08-01 d 2009-08-01 Time taken: 0.374 seconds hive> load data local inpath '/tmp/insert2.txt' into table tmp_insert_test_p partition (ds = '2009-08-01'); Copying data from file:/tmp/insert2.txt Loading data to table tmp_insert_test_p partition {ds=2009-08-01} OK Time taken: 0.261 seconds hive> select * from tmp_insert_test_p where ds = '2009-08-01'; OK a 2009-08-01 b 2009-08-01 c 2009-08-01 d 2009-08-01 e 2009-08-01 f 2009-08-01 g 2009-08-01 h 2009-08-01 Time taken: 0.14 seconds {noformat} The same session fails on the 0.4 branch: {noformat} hive> create table tmp_insert_test_p (value string) partitioned by (ds string); OK Time taken: 0.068 seconds hive> load data local inpath '/tmp/insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01'); Copying data from file:/tmp/insert.txt Loading data to table tmp_insert_test_p partition {ds=2009-08-01} OK Time taken: 0.315 seconds hive> select * from tmp_insert_test_p where ds = '2009-08-01'; OK Time taken: 0.523 seconds {noformat} Load data inpath into a new partition without overwrite does not move the file -- Key: HIVE-718 URL:
https://issues.apache.org/jira/browse/HIVE-718 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.4.0 Reporter: Zheng Shao Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt The bug can be reproduced as following. Note that it only happens for partitioned tables. The select after the first load returns nothing, while the second returns the data correctly. insert.txt in the current local directory contains 3 lines: a, b and c. {code} create table tmp_insert_test (value string) stored as textfile; load data local inpath 'insert.txt' into table tmp_insert_test; select * from tmp_insert_test; a b c create table tmp_insert_test_p ( value string) partitioned by (ds string) stored as textfile; load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01'); select * from tmp_insert_test_p where ds= '2009-08-01'; load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01'); select * from tmp_insert_test_p where ds= '2009-08-01'; a 2009-08-01 b 2009-08-01 d 2009-08-01 {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-726) Make ant package -Dhadoop.version=0.17.0 work
[ https://issues.apache.org/jira/browse/HIVE-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12751170#action_12751170 ] Todd Lipcon commented on HIVE-726: -- Yea, I think this is a nice-to-have but not a complete blocker. Its value is that you can ensure that you are still compatible with future releases. Since we didn't shim the entirety of Hadoop, it's possible to compile against the default fine, but still break at runtime if you're relying on an API that doesn't exist in other versions. Occasionally compiling against all of the supported versions is a good sanity check - if possible we should probably set up Hudson tests that check against all of the supported Hadoop versions on a nightly basis. Make ant package -Dhadoop.version=0.17.0 work --- Key: HIVE-726 URL: https://issues.apache.org/jira/browse/HIVE-726 Project: Hadoop Hive Issue Type: Bug Components: Build Infrastructure Reporter: Zheng Shao Attachments: HIVE-726.1.patch, HIVE-726.2.patch, HIVE-726.3.patch, hive-726.txt After HIVE-487, users will have to specify the versions as in shims/ivy.xml to make ant package -Dhadoop.version=<version> work. Currently it is only running fine with the following versions (from shims/ivy.xml): 0.17.2.1, 0.18.3, 0.19.0, 0.20.0. We used to do ant package -Dhadoop.version=0.17.0 but it's not working any more, although we can specify ant package -Dhadoop.version=0.17.2.1 and the package will probably still work with 0.17.0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-752) Encountered ClassNotFound exception when trying HWI server
[ https://issues.apache.org/jira/browse/HIVE-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12748999#action_12748999 ] Todd Lipcon commented on HIVE-752: -- Hi Edward, As I think I noticed in the original JIRA, I did not manually test HWI. I figured that any necessary tests would already exist in the form of JUnit tests - in my opinion every supported feature of Hive should be continuously tested by Hudson. If you can add a test case that passes pre-HIVE-718 and fails now, I will happily fix it. -Todd Encountered ClassNotFound exception when trying HWI server -- Key: HIVE-752 URL: https://issues.apache.org/jira/browse/HIVE-752 Project: Hadoop Hive Issue Type: Bug Components: Clients Environment: Hadoop 0.18.3 Reporter: Venkat Ramachandran Assignee: Edward Capriolo Encountered ClassNotFound exception (for class: org.apache.jetty.hive.shims.Jetty18Shims) when trying to start HWI server on Hadoop 18. It appears that the class ShimLoader (org.apache.hadoop.hive.shims.ShimLoader) is referring to incorrect classes as below: static { JETTY_SHIM_CLASSES.put("0.17", "org.apache.jetty.hive.shims.Jetty17Shims"); JETTY_SHIM_CLASSES.put("0.18", "org.apache.jetty.hive.shims.Jetty18Shims"); JETTY_SHIM_CLASSES.put("0.19", "org.apache.jetty.hive.shims.Jetty19Shims"); JETTY_SHIM_CLASSES.put("0.20", "org.apache.jetty.hive.shims.Jetty20Shims"); } however, I think it should be as below: static { JETTY_SHIM_CLASSES.put("0.17", "org.apache.hadoop.hive.shims.Jetty17Shims"); JETTY_SHIM_CLASSES.put("0.18", "org.apache.hadoop.hive.shims.Jetty18Shims"); JETTY_SHIM_CLASSES.put("0.19", "org.apache.hadoop.hive.shims.Jetty19Shims"); JETTY_SHIM_CLASSES.put("0.20", "org.apache.hadoop.hive.shims.Jetty20Shims"); } Wondering if anybody else encountered this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-752) Encountered ClassNotFound exception when trying HWI server
[ https://issues.apache.org/jira/browse/HIVE-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749000#action_12749000 ] Todd Lipcon commented on HIVE-752: -- Sorry, that last one should read HIVE-487 Encountered ClassNotFound exception when trying HWI server -- Key: HIVE-752 URL: https://issues.apache.org/jira/browse/HIVE-752 Project: Hadoop Hive Issue Type: Bug Components: Clients Environment: Hadoop 0.18.3 Reporter: Venkat Ramachandran Assignee: Edward Capriolo Encountered ClassNotFound exception (for class: org.apache.jetty.hive.shims.Jetty18Shims) when trying to start HWI server on Hadoop 18. It appears that the class ShimLoader (org.apache.hadoop.hive.shims.ShimLoader) is referring to incorrect classes as below: static { JETTY_SHIM_CLASSES.put("0.17", "org.apache.jetty.hive.shims.Jetty17Shims"); JETTY_SHIM_CLASSES.put("0.18", "org.apache.jetty.hive.shims.Jetty18Shims"); JETTY_SHIM_CLASSES.put("0.19", "org.apache.jetty.hive.shims.Jetty19Shims"); JETTY_SHIM_CLASSES.put("0.20", "org.apache.jetty.hive.shims.Jetty20Shims"); } however, I think it should be as below: static { JETTY_SHIM_CLASSES.put("0.17", "org.apache.hadoop.hive.shims.Jetty17Shims"); JETTY_SHIM_CLASSES.put("0.18", "org.apache.hadoop.hive.shims.Jetty18Shims"); JETTY_SHIM_CLASSES.put("0.19", "org.apache.hadoop.hive.shims.Jetty19Shims"); JETTY_SHIM_CLASSES.put("0.20", "org.apache.hadoop.hive.shims.Jetty20Shims"); } Wondering if anybody else encountered this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-752) Encountered ClassNotFound exception when trying HWI server
[ https://issues.apache.org/jira/browse/HIVE-752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749039#action_12749039 ] Todd Lipcon commented on HIVE-752: -- bq. Sure, every feature of Hive should be tested by hudson, but I do not think JUnit replaces real testing. I personally test everything on a live cluster because the world is full of gotchas. Unfortunately this increases the barrier to contribution significantly. As feature sets grow it becomes impossible to do this without a full time QA staff. So, for now, we need at least continuous smoke testing of all features, including HWI imho. When a release candidate is rolled it's time for the significant manual testing - asking for that on every commit is pretty difficult. bq. I do think the shims should have some independent j-unit tests. Example what happens when someone trys to compile with 0.21.? I am hoping a unit test failure would be present in the shims. The shims should be covered by virtue of their use elsewhere. I think in the original ticket I suggested that we get separate build running on Hudson that tests all of the unit tests against all of the supported Hadoop versions - I'm not sure who's in charge of the Hive Hudson, but if Cloudera can help get that set up we'd be happy to. bq. I have much 0_20 angst. 1 because all the JMX stuff got renamed and all my cacti templates are broken 2. In general it seems like a lot changed and everyone is chasing after it. +1 :) Also agreed that it is a blocker. I'm getting on a plane in a couple hours for a week long vacation, but I'll try to sneak in some time to get this fixed. In the meantime, it would be great if you could write up a manual test plan for HWI. Its existence isn't even mentioned in README.txt, so it's difficult for new contributors to figure out what functionality they need to manually verify. 
Encountered ClassNotFound exception when trying HWI server -- Key: HIVE-752 URL: https://issues.apache.org/jira/browse/HIVE-752 Project: Hadoop Hive Issue Type: Bug Components: Clients Environment: Hadoop 0.18.3 Reporter: Venkat Ramachandran Assignee: Edward Capriolo Priority: Blocker Attachments: hive-752.diff Encountered ClassNotFound exception (for class: org.apache.jetty.hive.shims.Jetty18Shims) when trying to start HWI server on Hadoop 18. It appears that the class ShimLoader (org.apache.hadoop.hive.shims.ShimLoader) is referring to incorrect classes as below: static { JETTY_SHIM_CLASSES.put("0.17", "org.apache.jetty.hive.shims.Jetty17Shims"); JETTY_SHIM_CLASSES.put("0.18", "org.apache.jetty.hive.shims.Jetty18Shims"); JETTY_SHIM_CLASSES.put("0.19", "org.apache.jetty.hive.shims.Jetty19Shims"); JETTY_SHIM_CLASSES.put("0.20", "org.apache.jetty.hive.shims.Jetty20Shims"); } however, I think it should be as below: static { JETTY_SHIM_CLASSES.put("0.17", "org.apache.hadoop.hive.shims.Jetty17Shims"); JETTY_SHIM_CLASSES.put("0.18", "org.apache.hadoop.hive.shims.Jetty18Shims"); JETTY_SHIM_CLASSES.put("0.19", "org.apache.hadoop.hive.shims.Jetty19Shims"); JETTY_SHIM_CLASSES.put("0.20", "org.apache.hadoop.hive.shims.Jetty20Shims"); } Wondering if anybody else encountered this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
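The failure mode being reported is inherent to the ShimLoader pattern: the map holds class *names* as strings, so a typo in a value only surfaces at runtime as a ClassNotFoundException rather than at compile time. A hedged sketch of the pattern (class and method names here are illustrative, not Hive's actual shim classes):

```java
import java.util.HashMap;
import java.util.Map;

public class ShimLookupSketch {
    // Version string -> fully-qualified shim class name. Because values
    // are plain strings, a wrong package (e.g. "org.apache.jetty.hive..."
    // instead of "org.apache.hadoop.hive...") compiles fine and fails
    // only when the loader reflectively resolves the class.
    static final Map<String, String> JETTY_SHIM_CLASSES = new HashMap<>();
    static {
        JETTY_SHIM_CLASSES.put("0.18", "org.apache.hadoop.hive.shims.Jetty18Shims");
        JETTY_SHIM_CLASSES.put("0.20", "org.apache.hadoop.hive.shims.Jetty20Shims");
    }

    // "0.18.3" -> "0.18", then look up the shim class name for that major.
    static String shimClassFor(String hadoopVersion) {
        int firstDot = hadoopVersion.indexOf('.');
        int secondDot = hadoopVersion.indexOf('.', firstDot + 1);
        String major = secondDot < 0 ? hadoopVersion : hadoopVersion.substring(0, secondDot);
        String cls = JETTY_SHIM_CLASSES.get(major);
        if (cls == null) {
            throw new IllegalArgumentException("unsupported Hadoop version: " + hadoopVersion);
        }
        return cls; // the real loader would Class.forName(cls) here, hence the CNFE
    }
}
```

This is also why a smoke test that merely instantiates every shim named in the map would have caught the typo without a live cluster.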
[jira] Updated: (HIVE-799) final.name should not include hadoop version
[ https://issues.apache.org/jira/browse/HIVE-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HIVE-799: - Attachment: hive-799.txt Trivial patch attached final.name should not include hadoop version Key: HIVE-799 URL: https://issues.apache.org/jira/browse/HIVE-799 Project: Hadoop Hive Issue Type: Bug Components: Build Infrastructure Affects Versions: 0.4.0 Reporter: Todd Lipcon Attachments: hive-799.txt Now that the shims allow a single build to work with any version of Hadoop, we shouldn't include the hadoop version in the tarball name. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-800) Exclude .git directory from src/ directory in release tarball
Exclude .git directory from src/ directory in release tarball - Key: HIVE-800 URL: https://issues.apache.org/jira/browse/HIVE-800 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.4.0 Reporter: Todd Lipcon Attachments: hive-800.txt When creating the dist tarball, we should exclude .git from the src/ directory. We may also need to exclude some SVN directory? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-800) Exclude .git directory from src/ directory in release tarball
[ https://issues.apache.org/jira/browse/HIVE-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HIVE-800: - Attachment: hive-800.txt Trivial patch Exclude .git directory from src/ directory in release tarball - Key: HIVE-800 URL: https://issues.apache.org/jira/browse/HIVE-800 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.4.0 Reporter: Todd Lipcon Attachments: hive-800.txt When creating the dist tarball, we should exclude .git from the src/ directory. We may also need to exclude some SVN directory? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (HIVE-802) Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it
Bug in DataNucleus prevents Hive from building if inside a dir with '+' in it - Key: HIVE-802 URL: https://issues.apache.org/jira/browse/HIVE-802 Project: Hadoop Hive Issue Type: Bug Components: Build Infrastructure Reporter: Todd Lipcon There's a bug in DataNucleus that causes this issue: http://www.jpox.org/servlet/jira/browse/NUCCORE-371 To reproduce, simply put your hive source tree in a directory that contains a '+' character. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file
[ https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747602#action_12747602 ] Todd Lipcon commented on HIVE-718: -- I hadn't considered the case of an in-memory metastore. A mktemp-like method would be great, but o.a.h.FileSystem gives you nothing of the sort :(

Load data inpath into a new partition without overwrite does not move the file -- Key: HIVE-718 URL: https://issues.apache.org/jira/browse/HIVE-718 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.4.0 Reporter: Zheng Shao Attachments: HIVE-718.1.patch, HIVE-718.2.patch, hive-718.txt

The bug can be reproduced as follows. Note that it only happens for partitioned tables. The select after the first load returns nothing, while the second returns the data correctly. insert.txt in the current local directory contains 3 lines: a, b and c.

{code}
create table tmp_insert_test (value string) stored as textfile;
load data local inpath 'insert.txt' into table tmp_insert_test;
select * from tmp_insert_test;
a
b
c
create table tmp_insert_test_p (value string) partitioned by (ds string) stored as textfile;
load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01');
select * from tmp_insert_test_p where ds = '2009-08-01';
load data local inpath 'insert.txt' into table tmp_insert_test_p partition (ds = '2009-08-01');
select * from tmp_insert_test_p where ds = '2009-08-01';
a	2009-08-01
b	2009-08-01
d	2009-08-01
{code}
[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file
[ https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747668#action_12747668 ] Todd Lipcon commented on HIVE-718: -- Not sure how that actually helps. If we use an algorithm like:

{code}
for each file to be moved:
  while not successful:
    come up with a random name
    try to move the src file to the random name
    if it fails because the dst already exists, try again with a new random name
{code}

then we'd lose the atomicity/isolation - readers would see a partial load during the middle of the operation. We can't use that algorithm with atomic directory renames, since Hadoop has the wacky behavior that move(srcdir, dstdir) will create dstdir/srcdir if dstdir already exists.
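The per-file retry loop above can be sketched concretely. This is a minimal model using java.nio.file on the local filesystem in place of Hadoop's FileSystem; the class and method names are illustrative, not Hive code. Note that each file lands independently, which is exactly the loss of atomicity described above: a reader scanning the destination mid-loop sees a partial load.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

/** Sketch of the "retry with a random name" move discussed above. */
class RandomNameMove {
    // Moves src into dstDir under a random name, retrying on collision.
    // Files arrive one at a time, so readers can observe a partial load.
    static Path moveWithRandomName(Path src, Path dstDir) throws IOException {
        while (true) {
            Path candidate = dstDir.resolve(src.getFileName() + "_" + UUID.randomUUID());
            try {
                // without REPLACE_EXISTING, move fails if the name is taken
                return Files.move(src, candidate);
            } catch (FileAlreadyExistsException e) {
                // collision: loop and pick a new random name
            }
        }
    }
}
```

An atomic directory rename would avoid the partial-visibility window, but as noted above Hadoop's rename(srcdir, dstdir) nests srcdir inside an existing dstdir instead of failing, so that route is closed there.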
[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file
[ https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747107#action_12747107 ] Todd Lipcon commented on HIVE-718: -- Here's a proposal: how would you guys feel about an addition to the metastore API that is as simple as: string get_unique_id()? The metastore would simply keep an autoincrement field to hand these out. We can then use these to safely allocate race-free names to clients. It should be straightforward and a lot simpler than any kind of lock/lease-based mechanism.
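A toy model of the proposed call, with an AtomicLong standing in for the backing autoincrement field; the class and method names here are invented for illustration, not the actual metastore API:

```java
import java.util.concurrent.atomic.AtomicLong;

/** Toy model of the proposed get_unique_id() metastore call. */
class UniqueIdService {
    // stands in for the server-side autoincrement field
    private final AtomicLong sequence = new AtomicLong();

    // Server side: hand out a monotonically increasing id.
    String getUniqueId() {
        return Long.toString(sequence.incrementAndGet());
    }

    // Client side: compose a collision-free staging name without any
    // filesystem-level locking, since no two clients get the same id.
    String stagingDirFor(String partitionPath) {
        return partitionPath + "/_staging_" + getUniqueId();
    }
}
```

Because every id is handed out exactly once, two clients loading into the same partition can never race for the same staging name, which is the whole point of the proposal.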
[jira] Created: (HIVE-789) Add get_unique_id() call to metastore
Add get_unique_id() call to metastore - Key: HIVE-789 URL: https://issues.apache.org/jira/browse/HIVE-789 Project: Hadoop Hive Issue Type: Improvement Components: Metastore Reporter: Todd Lipcon Fix For: 0.4.0 As noted in HIVE-718, it can be tough to avoid race conditions when multiple clients are trying to move files into the same directory. This patch adds a get_unique_id() call to the metastore that returns the current value from an incrementing JDO Sequence so that clients can avoid some races without locks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-789) Add get_unique_id() call to metastore
[ https://issues.apache.org/jira/browse/HIVE-789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HIVE-789: - Attachment: hive-789.patch This patch implements get_unique_id. One concern: do I need to do any kind of migration script for metastore databases? I don't know how metastore schema migrations are done from release to release.
[jira] Commented: (HIVE-789) Add get_unique_id() call to metastore
[ https://issues.apache.org/jira/browse/HIVE-789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747131#action_12747131 ] Todd Lipcon commented on HIVE-789: -- Forgot to mention - this patch is against branch-0.4.0, though it shouldn't be hard to apply the same thing to trunk. Assuming this seems like a reasonable way to fix the races in HIVE-718, I'll port it to trunk.
[jira] Commented: (HIVE-792) support add archive in addition to add files and add jars
[ https://issues.apache.org/jira/browse/HIVE-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747218#action_12747218 ] Todd Lipcon commented on HIVE-792: -- Note that a lot of the time you want to add a jar to the classpath without unpacking it. In that case you want addFileToClassPath. Unpacking is reasonably slow, as is the recursive delete when you're done.

support add archive in addition to add files and add jars --- Key: HIVE-792 URL: https://issues.apache.org/jira/browse/HIVE-792 Project: Hadoop Hive Issue Type: New Feature Reporter: Zheng Shao In JobClient.java, we have:

{code}
if (commandConf != null) {
  files = commandConf.get("tmpfiles");
  libjars = commandConf.get("tmpjars");
  archives = commandConf.get("tmparchives");
}
{code}

The good thing about tmparchives is that the TT will automatically unarchive the files (because tmparchives goes through DistributedCache.addCacheArchive, while the TT won't do that for tmpfiles). We should have "add archive" which sets tmparchives.
[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file
[ https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745554#action_12745554 ] Todd Lipcon commented on HIVE-718: -- bq. my concern in this case is that, it is possible to corrupt the existing partition with only a part of new files and overwrite some of the old files and user has no way of knowing that such a thing has happened and it may not possible to recover the data.

Can you explain the order of events that causes this? I think even with the current patch the operation will not fail silently and should not cause unrecoverable loss.
[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file
[ https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744143#action_12744143 ] Todd Lipcon commented on HIVE-718: -- Prasad: this is true, but the atomicity issue was already there - we just discovered it while looking at this section of the code. My guess is that there are similar bugs elsewhere in Hive - there's no safe way to have multiple threads compete for a directory name, so really any movement of files has to be coordinated by the metastore to be safe. The hadoop-side issues are:
- mkdirs on an existing directory will return the exact same thing as mkdirs on one that didn't previously exist
- rename of one directory to a new name will not give an error if the new name exists - rather, it will move your directory inside that one
- checking if (not exists) { do something to a location } is obviously racy
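The last issue in the list above - check-then-act being racy - can be contrasted with an atomic claim. A sketch on the local filesystem, with java.nio.file standing in for Hadoop's FileSystem; the helper names are invented for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** The "check then act" race from the list above, modeled locally. */
class DirectoryClaim {
    // Racy: between exists() and createDirectories(), another client can
    // create dst, and a mkdirs-style call gives no error either way.
    static boolean racyClaim(Path dst) throws IOException {
        if (!Files.exists(dst)) {          // check
            Files.createDirectories(dst);  // act - too late to be safe
            return true;
        }
        return false;
    }

    // Safe: rely on a create that fails when dst already exists,
    // so exactly one of any number of competing clients wins.
    static boolean atomicClaim(Path dst) {
        try {
            Files.createDirectory(dst);    // throws if dst exists
            return true;
        } catch (IOException e) {
            return false;
        }
    }
}
```

The comment's point is that Hadoop's FileSystem offers no equivalent of atomicClaim: mkdirs reports success either way, and rename silently nests into an existing destination, so the race cannot be closed at the filesystem level.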
[jira] Updated: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file
[ https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HIVE-718: - Affects Version/s: 0.4.0 Status: Patch Available (was: Open) I think we should go ahead with my patch for now and then open another JIRA for fixing the atomicity issue.
[jira] Commented: (HIVE-726) Make ant package -Dhadoop.version=0.17.0 work
[ https://issues.apache.org/jira/browse/HIVE-726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739765#action_12739765 ] Todd Lipcon commented on HIVE-726: -- I think there is some confusion about how the shims work. The point of the new shims package as done in HIVE-487 is that, on every build, the shims are built for all major versions of Hadoop. The versions in the ivy.xml there were picked as the most recent versions, but as Ashish said, we could use the .0 releases instead - it's somewhat arbitrary.

After the shims have been built with each of the hadoop versions, the build makes a hive_shims.jar which includes *all* of the implementations. At this point, Hive can be built against any version of Hadoop using the -Dhadoop.version flag. In actuality this should not matter - it may be that there is still some work yet to be done, but in my testing I built with the default hadoop.version (0.19.0) and then ran the build products against 18 and 20 with no recompile. The ShimLoader class determines the current hadoop version at runtime and loads the correct implementation class out of hive_shims.jar.

So, -1 on the patch, since it would result in a hive_shims.jar that only includes one version of the shims, and thus the build product would only work on that version of Hadoop.

Make ant package -Dhadoop.version=0.17.0 work --- Key: HIVE-726 URL: https://issues.apache.org/jira/browse/HIVE-726 Project: Hadoop Hive Issue Type: Bug Components: Build Infrastructure Reporter: Zheng Shao Attachments: HIVE-726.1.patch, HIVE-726.2.patch After HIVE-487, users will have to specify the versions as in shims/ivy.xml to make ant package -Dhadoop.version=<version> work. Currently it is only running fine with the following versions (from shims/ivy.xml): 0.17.2.1, 0.18.3, 0.19.0, 0.20.0. We used to do ant package -Dhadoop.version=0.17.0 but it's not working any more, although we can specify ant package -Dhadoop.version=0.17.2.1 and the package will probably still work with 0.17.0.
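The runtime dispatch described above - one jar carrying a shim per major Hadoop version, with the right one chosen from the version string at runtime - can be sketched roughly as follows. This is a toy stand-in, not the actual ShimLoader code, and the shim class names are invented:

```java
import java.util.HashMap;
import java.util.Map;

/** Toy version of the ShimLoader pattern: map a runtime Hadoop version
    string to the shim implementation compiled for that major version. */
class ToyShimLoader {
    private static final Map<String, String> SHIM_CLASSES = new HashMap<>();
    static {
        // one entry per major version whose shims are bundled in the jar
        SHIM_CLASSES.put("0.17", "Hadoop17Shims");
        SHIM_CLASSES.put("0.18", "Hadoop18Shims");
        SHIM_CLASSES.put("0.19", "Hadoop19Shims");
        SHIM_CLASSES.put("0.20", "Hadoop20Shims");
    }

    // Reduce e.g. "0.19.0" to its major version "0.19", then look up the shim.
    static String shimFor(String hadoopVersion) {
        String[] parts = hadoopVersion.split("\\.");
        String major = parts[0] + "." + parts[1];
        String shim = SHIM_CLASSES.get(major);
        if (shim == null) {
            throw new IllegalArgumentException("no shim for " + hadoopVersion);
        }
        return shim;
    }
}
```

This is why the -1 above follows: if the jar only bundled one version's shims, the lookup would fail for every other Hadoop version at runtime.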
[jira] Updated: (HIVE-726) Make ant package -Dhadoop.version=0.17.0 work
[ https://issues.apache.org/jira/browse/HIVE-726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HIVE-726: - Attachment: hive-726.txt I think there was still an issue which you've uncovered. ant's semantics for the param element inside antcall are somewhat screwy - you can't override something that has been passed on the command line. Since you guys were passing hadoop.version on the command line, it was actually compiling the shims for 17 four times rather than compiling each version, as best I can tell.

Attaching a patch which gets around this by introducing a new ant property called hadoop.version.ant-internal. This property is set in build.properties to default to hadoop.version, and hadoop.version is left as is. Everywhere in the build files that used to reference hadoop.version now references hadoop.version.ant-internal. Since we're not specifying this on the command line, the antcalls inside shims/build.xml can properly override it.

One question: I think the condition in build.xml's eclipsefiles target that defaults to 0.19 is now dead code, since the property itself is defaulted to 0.19.0. Also, can we change ${final.name} to simply ${name}-${version}, since one build can work with any hadoop?
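The property indirection described in the comment above can be illustrated roughly like this. This is a hand-written sketch, not the actual patch contents; the target names are invented:

```xml
<!-- Sketch: hadoop.version may be set on the ant command line, which
     makes it un-overridable by antcall params. The internal property
     defaults to it but is never set on the CLI, so overrides work. -->
<property name="hadoop.version.ant-internal" value="${hadoop.version}"/>

<target name="build-one-shim-version">
  <antcall target="compile-shims">
    <!-- this override takes effect precisely because
         hadoop.version.ant-internal was not passed with -D -->
    <param name="hadoop.version.ant-internal" value="0.18.3"/>
  </antcall>
</target>
```

Everything in the build files then reads ${hadoop.version.ant-internal}, so each antcall in shims/build.xml really compiles against a different Hadoop version instead of the CLI-pinned one.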
[jira] Commented: (HIVE-487) Hive does not compile with Hadoop 0.20.0
[ https://issues.apache.org/jira/browse/HIVE-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738855#action_12738855 ] Todd Lipcon commented on HIVE-487: -- Normal patch -p0 ought to work:

{code}
t...@todd-laptop:~/cloudera/cdh/repos/hive$ patch -p0 < /tmp/hive-487-runtime.patch
patching file ant/build.xml
patching file bin/ext/cli.sh
patching file build-common.xml
patching file build.xml
etc...
{code}

(from a clean trunk checkout)

Hive does not compile with Hadoop 0.20.0 Key: HIVE-487 URL: https://issues.apache.org/jira/browse/HIVE-487 Project: Hadoop Hive Issue Type: Bug Affects Versions: 0.3.0 Reporter: Aaron Kimball Assignee: Justin Lynn Priority: Blocker Fix For: 0.4.0 Attachments: dynamic-proxy.tar.gz, HIVE-487-2.patch, hive-487-jetty-2.diff, hive-487-jetty.patch, hive-487-runtime.patch, hive-487-with-cli-changes.2.patch, hive-487-with-cli-changes.3.patch, hive-487-with-cli-changes.patch, hive-487.3.patch, hive-487.4.patch, HIVE-487.patch, hive-487.txt, hive-487.txt, jetty-patch.patch, junit-patch1.html, patch-487.txt

Attempting to compile Hive with Hadoop 0.20.0 fails:

{code}
aa...@jargon:~/src/ext/svn/hive-0.3.0$ ant -Dhadoop.version=0.20.0 package
(several lines elided)
compile:
 [echo] Compiling: hive
 [javac] Compiling 261 source files to /home/aaron/src/ext/svn/hive-0.3.0/build/ql/classes
 [javac] /home/aaron/src/ext/svn/hive-0.3.0/build/ql/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java:94: cannot find symbol
 [javac] symbol  : method getCommandLineConfig()
 [javac] location: class org.apache.hadoop.mapred.JobClient
 [javac]     Configuration commandConf = JobClient.getCommandLineConfig();
 [javac]                                          ^
 [javac] /home/aaron/src/ext/svn/hive-0.3.0/build/ql/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java:241: cannot find symbol
 [javac] symbol  : method validateInput(org.apache.hadoop.mapred.JobConf)
 [javac] location: interface org.apache.hadoop.mapred.InputFormat
 [javac]     inputFormat.validateInput(newjob);
 [javac]                ^
 [javac] Note: Some input files use or override a deprecated API.
 [javac] Note: Recompile with -Xlint:deprecation for details.
 [javac] Note: Some input files use unchecked or unsafe operations.
 [javac] Note: Recompile with -Xlint:unchecked for details.
 [javac] 2 errors
BUILD FAILED
/home/aaron/src/ext/svn/hive-0.3.0/build.xml:145: The following error occurred while executing this line:
/home/aaron/src/ext/svn/hive-0.3.0/ql/build.xml:135: Compile failed; see the compiler error output for details.
{code}
[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file
[ https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739075#action_12739075 ] Todd Lipcon commented on HIVE-718: -- I can confirm this bug on trunk - it's a regression since 0.3.0. However, I'm not sure about the patch - if one of the later renames fails, we should undo the previous ones, but in this patch it looks like it's actually deleting the previous ones. Why not attempt to move them back to their original locations? Also, it seems worth adding some LOG.error() statements in these cases.
[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file
[ https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739100#action_12739100 ] Todd Lipcon commented on HIVE-718: -- What if the partition already exists? In that case we couldn't rename the staging directory since the destination name would already exist, right? Or can we just make another subdirectory of the partition with some unique name?
[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file
[ https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739109#action_12739109 ] Todd Lipcon commented on HIVE-718: -- OK. How do we come up with a unique name? Just use a timestamp? Does the metastore need to know this name somehow? Or is the change just localized to that method? (Also, should we rename that method? It's called copyFiles, but does nothing of the sort.)
[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file
[ https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739146#action_12739146 ] Todd Lipcon commented on HIVE-718: -- In looking through this code, I've found a few more issues:
- In isolation, it looks like copyFiles/replaceFiles are supposed to be able to handle a srcf like /foo/* with a directory layout like /foo/subdir1/part-0, /foo/subdir2/part-0. I'm assuming this because it first does fs.globStatus on srcf, and then for each of the results of the glob, it calls fs.listStatus (implying that they are directories). However, given the example above, this would actually fail, since both files are named part-0 and the code would attempt to rename both to tmpdir/part-0.
- In fact, using the tmpdir like this is consistent from the view of an outside observer, but not atomic. If the renamer crashes in the middle of the operation, the files will have been moved out of the original location and into the tmpdir, but the tmpdir has not been renamed into the destination. Is this OK? I feel like the solution would be to make dstdir/_staging_timestamp, move the files one-by-one into there, and then rename _staging_timestamp to the destination. This way, if there is a failure in the middle, the client can at least determine where their files went without looking through a temporary directory.
[jira] Commented: (HIVE-718) Load data inpath into a new partition without overwrite does not move the file
[ https://issues.apache.org/jira/browse/HIVE-718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12739170#action_12739170 ] Todd Lipcon commented on HIVE-718:
--

bq. I think it's not acceptable for a failed insert to corrupt the original data of the table.

Then we definitely have to move an entire directory of files in at once - otherwise we can have an insert partially succeed.

bq. We never have a table with sub directories (instead of files) inside. We will need some testing to make sure it actually works.

This is going to be a necessity for doing non-overwrite loads into a table/partition, right?

bq. For unique name, maybe we can just prepend the job id.

This isn't always available (e.g. when running LOAD DATA from the cli). I think we're stuck with java.util.UUID, as ugly as it may be. I've spent the last hour or so trying to figure out any other way of generating a unique name inside a subdirectory. Because of the semantics of FileSystem.mkdirs and FileSystem.rename, I don't believe there's any way of doing this: mkdirs doesn't return false in the case that the directory already exists, and if you rename(src, dst) and dst already exists as a directory, it will move src *inside* of dst.
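The UUID-based workaround discussed in this comment could look roughly like the sketch below. It uses java.nio.file locally for the sake of a runnable example (Hadoop's FileSystem.mkdirs/rename have the weaker semantics the comment describes); the `_tmp_` prefix is an illustrative choice, not Hive's actual naming.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.UUID;

public class UniqueStagingDir {
    /**
     * Since FileSystem.mkdirs() cannot signal "already existed" and
     * rename() into an existing directory nests src inside dst rather
     * than failing, a random UUID is the practical way to get a
     * collision-free staging name under a shared parent directory.
     */
    public static Path createUniqueStaging(Path parent) throws IOException {
        Path staging = parent.resolve("_tmp_" + UUID.randomUUID());
        // createDirectory (unlike createDirectories/mkdirs) throws if the
        // path somehow already exists, so a collision cannot go unnoticed.
        Files.createDirectory(staging);
        return staging;
    }
}
```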
[jira] Updated: (HIVE-487) Hive does not compile with Hadoop 0.20.0
[ https://issues.apache.org/jira/browse/HIVE-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HIVE-487:
--
Attachment: hive-487-with-cli-changes.3.patch

Hive does not compile with Hadoop 0.20.0
--
Key: HIVE-487
URL: https://issues.apache.org/jira/browse/HIVE-487
Project: Hadoop Hive
Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Aaron Kimball
Assignee: Justin Lynn
Priority: Blocker
Fix For: 0.4.0
Attachments: dynamic-proxy.tar.gz, HIVE-487-2.patch, hive-487-jetty-2.diff, hive-487-jetty.patch, hive-487-with-cli-changes.2.patch, hive-487-with-cli-changes.3.patch, hive-487-with-cli-changes.patch, hive-487.3.patch, hive-487.4.patch, HIVE-487.patch, hive-487.txt, hive-487.txt, jetty-patch.patch, junit-patch1.html, patch-487.txt

Attempting to compile Hive with Hadoop 0.20.0 fails:

aa...@jargon:~/src/ext/svn/hive-0.3.0$ ant -Dhadoop.version=0.20.0 package
(several lines elided)
compile:
[echo] Compiling: hive
[javac] Compiling 261 source files to /home/aaron/src/ext/svn/hive-0.3.0/build/ql/classes
[javac] /home/aaron/src/ext/svn/hive-0.3.0/build/ql/java/org/apache/hadoop/hive/ql/exec/ExecDriver.java:94: cannot find symbol
[javac] symbol  : method getCommandLineConfig()
[javac] location: class org.apache.hadoop.mapred.JobClient
[javac]     Configuration commandConf = JobClient.getCommandLineConfig();
[javac]                                          ^
[javac] /home/aaron/src/ext/svn/hive-0.3.0/build/ql/java/org/apache/hadoop/hive/ql/io/HiveInputFormat.java:241: cannot find symbol
[javac] symbol  : method validateInput(org.apache.hadoop.mapred.JobConf)
[javac] location: interface org.apache.hadoop.mapred.InputFormat
[javac]     inputFormat.validateInput(newjob);
[javac]                ^
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.
[javac] 2 errors

BUILD FAILED
/home/aaron/src/ext/svn/hive-0.3.0/build.xml:145: The following error occurred while executing this line:
/home/aaron/src/ext/svn/hive-0.3.0/ql/build.xml:135: Compile failed; see the compiler error output for details.
--
This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-487) Hive does not compile with Hadoop 0.20.0
[ https://issues.apache.org/jira/browse/HIVE-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12738596#action_12738596 ] Todd Lipcon commented on HIVE-487:
--

The issue turned out to be that the shim classes weren't getting built into hive_exec.jar, which seems to include the built classes of many of the other components. I'm not entirely sure why it's designed like this (why not just have hive_exec.jar add the other jars to its own classloader at startup?), but including build/shims/classes in there fixed the tests. Attaching a new patch momentarily.
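The classloader alternative floated in the comment above (a thin hive_exec.jar pulling its dependency jars onto a classloader at startup, rather than repackaging their .class files) might look roughly like this sketch; the lib-directory layout and class name are hypothetical, not what Hive actually does.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class AuxJarLoader {
    /**
     * Build a classloader that sees every jar in a lib directory, so a
     * single thin executor jar could resolve its dependencies at runtime
     * instead of bundling their classes into itself.
     */
    public static URLClassLoader forJarsIn(Path libDir) throws Exception {
        List<URL> urls = new ArrayList<>();
        try (DirectoryStream<Path> jars =
                Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : jars) {
                urls.add(jar.toUri().toURL());
            }
        }
        // Parent-delegating loader: classes already on the app classpath win.
        return new URLClassLoader(urls.toArray(new URL[0]),
                AuxJarLoader.class.getClassLoader());
    }
}
```

The trade-off Todd hints at is that this moves classpath assembly from build time to startup time.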
[jira] Commented: (HIVE-487) Hive does not compile with Hadoop 0.20.0
[ https://issues.apache.org/jira/browse/HIVE-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12737236#action_12737236 ] Todd Lipcon commented on HIVE-487:
--

Hi Ashish,

That does sound reasonable, though I will likely take it on in the short term, as we will be distributing packages for hadoop-0.18 and hadoop-0.20 until the majority of the community and our customers have transitioned over. During that time period we'd like to have a single hive package which will function with either. We can apply my work on top of the 0.4.0 release for our distribution, so it shouldn't block the release, but I do think it would be nice if this feature were upstream in the Apache release. I've got some time blocked off to work on this - if I get something working this week, do you think it might be able to go into 0.4.0?

-Todd
[jira] Commented: (HIVE-487) Hive does not compile with Hadoop 0.20.0
[ https://issues.apache.org/jira/browse/HIVE-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12735789#action_12735789 ] Todd Lipcon commented on HIVE-487:
--

Whoops, sorry about that. Simply add an import for o.a.h.mapred.JobClient and it compiles. New patch in a second.
[jira] Updated: (HIVE-487) Hive does not compile with Hadoop 0.20.0
[ https://issues.apache.org/jira/browse/HIVE-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HIVE-487:
--
Attachment: hive-487.txt

Fixes the missing import. Now compiles with hadoop.version=0.17.0.
[jira] Updated: (HIVE-487) Hive does not compile with Hadoop 0.20.0
[ https://issues.apache.org/jira/browse/HIVE-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HIVE-487:
--
Attachment: hive-487.txt

Here's a patch which adds a project called shims, with separate source directories for 0.17, 0.18, 0.19, and 0.20. Inside each there is an implementation of JettyShims and HadoopShims which encapsulates all of the version-dependent code. The build.xml is set up in such a way that ${hadoop.version} determines which one gets compiled. This probably needs a bit more javadoc before it's committable, but I think it's worth considering this approach over reflection. Also, it seems like hadoop.version may be 0.18.0, 0.18.1, 0.18.2, etc. As long as it's kosher by Apache SVN standards, we should put a symlink for each of those versions in the shims/src/ directory pointing to 0.18, and the same for the other minor releases. If symlinks aren't kosher, we need some way of parsing out the major version from within ant. Not being a regular contributor, I don't have a good test environment set up, but I've verified that this at least builds with all of the above versions.
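The per-version shim selection described in the patch could plausibly be wired up as below. This is a sketch under stated assumptions: the class names (ShimLoader, the Hadoop*Shims naming pattern) are illustrative guesses, not the patch's actual identifiers, and the major-version parsing addresses the "0.18.0/0.18.1/0.18.2 all map to 0.18" concern from the comment.

```java
public class ShimLoader {
    /**
     * Map a full Hadoop version string like "0.18.2" to the major version
     * ("0.18") that a shim implementation was compiled against.
     */
    public static String majorVersion(String hadoopVersion) {
        String[] parts = hadoopVersion.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException(
                    "Unparseable Hadoop version: " + hadoopVersion);
        }
        return parts[0] + "." + parts[1];
    }

    /**
     * Hypothetical lookup: one shim class per supported major version,
     * compiled from the matching shims/src/<version> source tree and
     * resolved by name at runtime.
     */
    public static Object loadShims(String hadoopVersion) throws Exception {
        String cls = "org.apache.hadoop.hive.shims.Hadoop"
                + majorVersion(hadoopVersion).replace(".", "_") + "Shims";
        return Class.forName(cls).getDeclaredConstructor().newInstance();
    }
}
```

Parsing in code rather than relying on symlinks also sidesteps the "are symlinks kosher in SVN" question for the runtime side.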
[jira] Commented: (HIVE-487) Hive does not compile with Hadoop 0.20.0
[ https://issues.apache.org/jira/browse/HIVE-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12734749#action_12734749 ] Todd Lipcon commented on HIVE-487:
--

A couple of thoughts:

- Does the same compiled jar truly work on all versions of Hadoop between 0.17 and 0.19? That is to say, can we consider an option in which we use some build.xml rules to swap, depending on the value of a hadoop.version variable, between two implementations of the same .java file (one compatible with Jetty 5, one with Jetty 6)? Then in the build product we could simply include two jars and have the wrapper scripts swap between them based on version. If size is a concern, the variant classes could be put in their own jar that would only be a few KB.
- The reflection code in this patch is pretty messy. I mocked up an idea for a slightly cleaner way to do it, and will attach it as a tarball momentarily. The idea is to define our own interfaces which have the same methods we need to use in Jetty, and use a dynamic proxy to forward those invocations through to the actual implementation class. Dynamically choosing between the two is simple at runtime, by checking that the method signatures correspond.
This is still dirty (and a bad role model for CS students ;-) ), but it should reduce the number of Class.forName and .getMethod calls in the wrapper class.
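The dynamic-proxy idea in the comment above, reduced to a runnable toy: a hand-written interface whose calls are forwarded by name to whichever concrete class happens to be on the classpath. The class names (Greeter, JettyV6Greeter) are illustrative stand-ins, not anything from the attached tarball.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class ForwardingProxy {
    /** Our own interface, mirroring the methods we need from the real
     *  (version-dependent) class. */
    public interface Greeter {
        String greet(String name);
    }

    /** Stand-in for a concrete implementation that would normally be the
     *  Jetty 5 or Jetty 6 class found on the classpath at runtime. */
    public static class JettyV6Greeter {
        public String greet(String name) {
            return "hello " + name;
        }
    }

    /**
     * Wrap any target whose method signatures match the interface: each
     * interface call is looked up once by name and parameter types on the
     * target, then invoked reflectively. All the Class.forName/getMethod
     * plumbing lives here instead of being scattered through callers.
     */
    @SuppressWarnings("unchecked")
    public static <T> T wrap(Class<T> iface, Object target) {
        InvocationHandler handler = (proxy, method, args) -> {
            Method impl = target.getClass()
                    .getMethod(method.getName(), method.getParameterTypes());
            return impl.invoke(target, args);
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, handler);
    }
}
```

Callers then program against Greeter only, and the proxy fails fast at call time if the signatures don't actually correspond.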
[jira] Updated: (HIVE-487) Hive does not compile with Hadoop 0.20.0
[ https://issues.apache.org/jira/browse/HIVE-487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon updated HIVE-487:
--
Attachment: dynamic-proxy.tar.gz

Here's a tarball showing the technique mentioned in the comment above. The script run.sh will compile and run the example once with v1 on the classpath, and a second time with v2 on the classpath. I'm not certain that this will cover all the cases that are needed for Jetty, but I figured I would throw it out there.
[jira] Commented: (HIVE-487) Hive does not compile with Hadoop 0.20.0
[ https://issues.apache.org/jira/browse/HIVE-487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12734795#action_12734795 ] Todd Lipcon commented on HIVE-487:
--

bq. @Todd - Where were you a few weeks ago?

Chillin' over on the HADOOP jira ;-) We're gearing up for a release of our distribution that includes Hadoop 0.20.0, so I just started watching this one more carefully.

bq. The jars are upstream in Hadoop core. I did not look into this closely but the talk about 'Sealing exceptions' above led me to believe I should not try this.

Sorry - what I meant here is that the hive tarball would include lib/hive-0.4.0.jar, lib/jetty-shims/hive-jetty-shim-v6.jar and lib/jetty-shims/hive-jetty-shim-v5.jar. In those jars we'd have two different implementations of the shim. The hive wrapper script would then do something like:

{code}
HADOOP_JAR=$HADOOP_HOME/hadoop*core*jar
if [[ $HADOOP_JAR =~ 0.1[789] ]]; then
  JETTY_SHIM=lib/jetty-shims/jetty-shim-v5.jar
else
  JETTY_SHIM=lib/jetty-shims/jetty-shim-v6.jar
fi
CLASSPATH=$CLASSPATH:$JETTY_SHIM
{code}

To generate the shim jars at compile time, we'd compile two different JettyShim.java files - one against the v5 API, and one against the v6 API. As for eclipse properly completing/warning for the right versions of the right files, I haven't the foggiest idea. But I am pretty sure it's not going to warn if your reflective calls are broken either ;-)

bq. My only concern is will the ant process cooperate?

I don't see why not - my example build here is just to show how it works in a self-contained way. The stuff inside v1-classes and v2-classes in the example is the equivalent of the two jetty jar versions - we don't have to compile them. The only code that has to compile is DynamicProxy.java, which is completely normal code.

bq. If you/we can tackle the ant/eclipse issues I would be happy to use the 'Dynamic Proxy', but maybe we tackle it in a different Jira because this is a pretty big blocker and I am sure many people want to see this in the trunk.

As for committing now and not worrying, that sounds pretty reasonable, as long as there's some kind of deprecation timeline set out (e.g. in Hive 0.5.0 we will drop support for versions of Hadoop that use Jetty v5, or whatever). As someone who isn't a major Hive contributor, I'll defer to you guys completely -- I just wanted to throw the idea up on the JIRA.