[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745568#action_12745568 ] Santhosh Srinivasan commented on PIG-924:

Hadoop has promised "APIs set in stone" forever and has not yet delivered on that promise. Higher layers in the stack have to learn how to cope with an ever-changing lower layer. How this change is managed is a matter of convenience for the owners of the higher layer. I really like the Shims approach, which avoids the cost of branching out Pig every time we make a compatible release. The cost of creating a branch for each version of Hadoop seems too high compared to the cost of the Shims approach. Of course, there are pros and cons to each approach.

The question here is when Hadoop will set its APIs in stone and how many more releases there will be before that happens. If the answer is 12 months and 2 more releases, then we should go with the Shims approach. If the answer is 3-6 months and one more release, then we should stick with our current approach and pay the small penalty of patches supplied to work with a specific release of Hadoop.

Summary: use the Shims patch if the APIs will not be set in stone within a quarter or two and if there is more than one more release of Hadoop.

> Make Pig work with multiple versions of Hadoop
> ----------------------------------------------
>
> Key: PIG-924
> URL: https://issues.apache.org/jira/browse/PIG-924
> Project: Pig
> Issue Type: Bug
> Reporter: Dmitriy V. Ryaboy
> Attachments: pig_924.2.patch, pig_924.3.patch, pig_924.patch
>
> The current Pig build scripts package Hadoop and other dependencies into the pig.jar file. This means that if users upgrade Hadoop, they also need to upgrade Pig. Pig has relatively few dependencies on Hadoop interfaces that changed between 18, 19, and 20. It is possible to write a dynamic shim that allows Pig to use the correct calls for any of the above versions of Hadoop. Unfortunately, the build process precludes doing this at runtime, and forces an unnecessary Pig rebuild even if dynamic shims are created.

--
This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745551#action_12745551 ] Todd Lipcon commented on PIG-924:

Hey guys,

As we understood it, Pig 0.5 wasn't due for quite some time. If 0.5 is a small release on top of 0.4 and will be out in a few weeks, this seems a lot more reasonable. Most likely we'll end up applying this patch to the 0.4 release for our distribution, even if multiple branches are made in SVN. That's fine, though - we've got a process developed for this and are happy to support users on both versions for the next several months as people transition to 0.20 and the new APIs.

Feel free to resolve as wontfix.

-Todd
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745544#action_12745544 ] Dmitriy V. Ryaboy commented on PIG-924:

Arun - it wouldn't suffice for those who want to use Pig 0.4 with Hadoop 0.19.* or 0.20.*. Pig 0.5 isn't due out for 4 to 6 months, which is behind the curve for adoption of 20. Putting in this patch makes compatibility a matter of a compile-time flag. Putting in this patch and restructuring the ant tasks somewhat would make it completely transparent. Waiting until 0.5 means that users wind up with instructions like this for half a year: http://behemoth.strlen.net/~alex/hadoop20-pig-howto.txt
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745540#action_12745540 ] Olga Natkovich commented on PIG-924:

Todd and Dmitriy,

I understand your intention. I am wondering whether, in the current situation, the following might not be the best course of action:

(1) Release Pig 0.4.0. I think we resolved all the blockers and can start the process.
(2) Wait till Hadoop 20.1 is released and release Pig 0.5.0. Owen promised that Hadoop 20.1 will go out for a vote next week.

This means that Pig 0.4.0 and 0.5.0 will be just a couple of weeks apart, which should not be a big issue for users. Meanwhile they can apply PIG-660 to the code bundled with Pig 0.4.0 or the trunk. I am currently working with release engineering to get an official hadoop20.jar that Pig can be built with. I expect to have it in the next couple of days.

The concern with applying the patch is the code complexity it introduces. Also, if there are patches that are version specific, they will not be easy to apply. Multiple branches are something we understand and know how to work with better. We also don't want to set a precedent of supporting Pig releases on multiple versions of Hadoop, because it is not clear that this is something we will be able to maintain going forward.
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745536#action_12745536 ] Arun C Murthy commented on PIG-924:

bq. The fact is, though, that there are a significant number of people running 0.18.x who would like to use Pig 0.4.0, and supporting them out of the box seems worth it. Given that the API is still changing for 0.21, and Pig hasn't adopted the "new" MR APIs yet, it seems like it's premature to leave 18 in the cold.

I believe the plan is for 0.4.0 to work with hadoop-0.18.* anyway... wouldn't that suffice?
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745519#action_12745519 ] Arun C Murthy commented on PIG-924:

I agree with Owen. One conceivable option is for the Pig project to maintain separate branches (per Pig release) to support the various Hadoop versions... several projects are run this way. Clearly it adds to the cost of pushing out a release for the Pig committers, and it is their call.
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745518#action_12745518 ] Dmitriy V. Ryaboy commented on PIG-924:

Owen - I may not have made the intent clear; the idea is that when Pig is rewritten to use the future-proofed APIs, the shims will go away (presumably for 0.5). Right now, Pig is not using the new APIs; even the 20 patch posted by Olga uses the deprecated mapred calls. This is only to make life easier in the transitional period while Pig is using the old, mutating APIs. Check out the pig-user list archives for the motivation for why these shims are needed.
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745517#action_12745517 ] Todd Lipcon commented on PIG-924:

bq. I think this is a bad idea and is totally unmaintainable. In particular, the HadoopShim interface is very specific to the changes in those particular versions. We are trying to stabilize the FileSystem and Map/Reduce interfaces to avoid these problems and that is a much better solution.

Agreed that this is not a long-term solution. Like you said, the long-term solution is stabilized cross-version APIs, which would make this unnecessary. The fact is, though, that there are a significant number of people running 0.18.x who would like to use Pig 0.4.0, and supporting them out of the box seems worth it. This patch is pretty small and easily verifiable, both by eye and by tests. Given that the API is still changing for 0.21, and Pig hasn't adopted the "new" MR APIs yet, it seems premature to leave 18 in the cold.

Do you have an objection to committing this only on the 0.4.0 branch and *not* planning to maintain it in trunk/0.5?
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745513#action_12745513 ] Daniel Dai commented on PIG-924:

Wrapping Hadoop functionality adds extra maintenance cost to adopting new features of Hadoop. We still need to figure out the balance point between usability and maintenance cost. I don't think this issue is a blocker for 0.4.
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745166#action_12745166 ] Todd Lipcon commented on PIG-924:

bq. If existing deployments need a single pig.jar without a hadoop dependency, it might be possible to create a new target (pig-all) that would create a statically bundled jar; but I think the default behavior should be to not bundle, build all the shims, and use whatever hadoop is on the path.

+1 for making the default to *not* bundle Hadoop inside pig.jar, and adding another non-default target for those people who might want it.

bq. The current patch is written as is so that it can be applied to trunk, enabling people to compile statically, and only require a change to the ant build files to switch to a dynamic compile later on (after 0.4, probably)

From the packager's perspective, I'd love it if this change could get in for 0.4. If it doesn't, we'll end up applying the patch ourselves for packaging purposes - we need the Hadoop dependency to be on the user's installed Hadoop, not on whatever happened to get bundled into pig.jar.
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745164#action_12745164 ] Dmitriy V. Ryaboy commented on PIG-924:

Daniel, you've hit the nail on the head. This patch is specifically written to enable us to compile against all the versions of Hadoop, and let the user pick which one he wants at runtime (by virtue of including the right Hadoop on the path - no flags needed). In fact, the default ant task in the shims directory compiles all the shims at once.

The version string hack is safe as long as Hadoop is built correctly (the zebra version is not, as it returns "Unknown", hence the last-resort hack of defaulting to 20). If Hadoop came from its own jar, I could use reflection to get the jar name and use that as a fallback for an Unknown version - but in Pig, Hadoop comes from pig.jar!

Ideally, Pig would compile all the versions of shims into its jars, and the pig jar would not include Hadoop. Then the user would include the right Hadoop on the path (or bin/pig would do it for him), and everything would happen automagically. By bundling Hadoop into the jar, however, switching Hadoop versions on the fly is next to impossible (or at least I don't know how) - we have multiple jars on the classpath, and the classloader will use whatever is latest (or is it earliest?). Finding the right resource becomes fraught with peril.

If existing deployments need a single pig.jar without a Hadoop dependency, it might be possible to create a new target (pig-all) that would create a statically bundled jar; but I think the default behavior should be to not bundle, build all the shims, and use whatever Hadoop is on the path. The current patch is written as-is so that it can be applied to trunk, enabling people to compile statically, and only requires a change to the ant build files to switch to a dynamic compile later on (after 0.4, probably).
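The runtime selection described above can be sketched roughly as follows. This is a hypothetical, self-contained illustration, not the actual PIG-924 patch: the `HadoopShim` interface, the version-keyed class names, and the parsing of a `VersionInfo.getVersion()`-style string (e.g. "0.18.3") are all assumed stand-ins for whatever the real shim layer defines.

```java
// Hypothetical sketch of runtime shim selection by Hadoop version string.
// Names here are illustrative only, not the actual patch.
interface HadoopShim {
    String apiFlavor();
}

class HadoopShim18 implements HadoopShim {
    public String apiFlavor() { return "mapred-0.18"; }
}

class HadoopShim20 implements HadoopShim {
    public String apiFlavor() { return "mapred-0.20"; }
}

public class ShimLoader {
    // Extract the middle component ("18", "20") from a version string
    // such as "0.18.3". Falls back to "20" for unparseable strings,
    // mirroring the "default to 20 on Unknown" hack the thread mentions.
    static String majorVersion(String versionString) {
        String[] parts = versionString.split("\\.");
        if (parts.length < 2 || !parts[1].matches("\\d+")) {
            return "20"; // last resort, e.g. zebra's build reports "Unknown"
        }
        return parts[1];
    }

    // Load the shim matching the Hadoop version found at runtime. With
    // Hadoop unbundled, the version would come from the hadoop jar on the
    // classpath (via VersionInfo.getVersion()); here it is passed in.
    static HadoopShim loadShim(String hadoopVersion) throws Exception {
        String className = "HadoopShim" + majorVersion(hadoopVersion);
        return (HadoopShim) Class.forName(className)
                                 .getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(loadShim("0.18.3").apiFlavor()); // mapred-0.18
        System.out.println(loadShim("Unknown").apiFlavor()); // mapred-0.20
    }
}
```

The key point of the comment survives in the sketch: selection only works if exactly one Hadoop is visible on the classpath, which is why unbundling Hadoop from pig.jar is a precondition.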
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745160#action_12745160 ] Daniel Dai commented on PIG-924:

From your latest patch, shims work this way:
1. The version of shims Pig compiles is controlled by the "hadoop.version" property in build.xml.
2. The version of shims Pig uses is determined dynamically by hacking the string returned by VersionInfo.getVersion.

As your code comment notes, the version string hack is not safe. My thinking is that Pig should use only the bundled Hadoop unless overridden:
1. Pig compiles all versions of shims. There is no conflict between different versions of shims, so why not compile them all? Then the user does not need to recompile the code to use a different external Hadoop.
2. Pig bundles a default Hadoop, specified by hadoop.version in build.xml, and uses that version of shims by default.
3. If the user wants to use an external Hadoop, he/she needs to override the default Hadoop version explicitly, e.g., with "-Dhadoop_version" on the command line.
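The override-then-detect policy Daniel proposes could look roughly like this in code. Everything here is an assumed illustration, not the actual patch: the property name `pig.hadoop.version`, the default of "20", and the shape of the detected string are all hypothetical.

```java
// Illustrative sketch of Daniel's proposal: use the bundled default
// version unless the user explicitly overrides it (e.g. by passing
// -Dpig.hadoop.version=18). Property name and defaults are hypothetical.
public class ShimVersionChooser {
    static final String DEFAULT_VERSION = "20"; // the bundled default

    // detectedVersion is what the VersionInfo.getVersion() string hack
    // would report. The explicit override always wins because, as noted
    // in the thread, the detected string is not always trustworthy
    // (zebra's build reports "Unknown", for example).
    static String chooseShimVersion(String overrideProperty,
                                    String detectedVersion) {
        if (overrideProperty != null && !overrideProperty.isEmpty()) {
            return overrideProperty;          // user said so explicitly
        }
        if (detectedVersion != null && detectedVersion.matches("\\d+")) {
            return detectedVersion;           // detection looks sane
        }
        return DEFAULT_VERSION;               // fall back to the bundle
    }

    public static void main(String[] args) {
        String override = System.getProperty("pig.hadoop.version");
        // prints "18" unless -Dpig.hadoop.version is set on the JVM
        System.out.println(chooseShimVersion(override, "18"));
    }
}
```

The design choice is the one under debate in the thread: detection keeps the common case zero-configuration, while the explicit override gives a safe escape hatch when the version string is unreliable.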
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745109#action_12745109 ] Dmitriy V. Ryaboy commented on PIG-924:

Regarding deprecation - I tried setting it back to off and adding @SuppressWarnings("deprecation") to the shims for 20, but ant complained about deprecation nonetheless. Not sure what its deal is. Adding something like this to the main build.xml works. Does this seem like a reasonable solution?

{code}
[]
{code}
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744953#action_12744953 ] Hadoop QA commented on PIG-924:

+1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12416945/pig_924.3.patch against trunk revision 804406.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 8 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/173/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/173/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/173/console

This message is automatically generated.
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744620#action_12744620 ] Todd Lipcon commented on PIG-924:

A few more comments I missed on the first pass through:
- A few of the shim methods appear unused:
  - fileSystemDeleteOnExit
  - inputFormatValidateInput
  - setTmpFiles
- Is the inner MiniDFSCluster class used? I think it is replaced by the MiniDFSClusterShim, if I understand correctly.
- There still seem to be some unrelated changes to build.xml - the javac.deprecation change, for example.
- If we are now excluding TestHBaseStorage on all platforms, we should get rid of the two lines above it that exclude it only on Windows - it's redundant and confusing.

Thanks
-Todd
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744378#action_12744378 ] Daniel Dai commented on PIG-924:

Hi Dmitriy,

Generally the patch is good. Just as Todd said, we don't want to change anything besides the shims layer. In addition to Todd's comments, Main.java contains the change for "pig.logfile", which you address in PIG-923. Would you please clean things up and resubmit? Thanks.
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744310#action_12744310 ] Todd Lipcon commented on PIG-924:

Gotcha, thanks for explaining. Aside from the nits, the patch looks good to me.
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744307#action_12744307 ]

Dmitriy V. Ryaboy commented on PIG-924:
---------------------------------------

Thanks for looking, Todd -- most of those changes, like the factor of 0.9, deprecation, excluding the HBase test, etc., are consistent with the 0.20 patch posted to PIG-660. Moving junit.hadoop.conf is critical -- there are comments about this in PIG-660 -- without it, resetting hadoop.version doesn't actually work, as some of the information from a previous build sticks around. I'll fix the whitespace; this wasn't a final patch, more of a proof of concept. The point is that this approach could work, but currently can't, because Hadoop is bundled into the jar. I am looking for comments from the core developer team regarding the possibility of un-bundling.
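The "dynamic shim" idea discussed here can be sketched roughly as follows: a common interface hides the version-specific Hadoop calls, and a loader picks the right implementation at runtime based on the Hadoop version found on the classpath. This is only an illustration of the pattern, not the actual patch; the names HadoopShims, ShimLoader, and the string-based version check are hypothetical stand-ins.

```java
// Minimal sketch of a shims layer: one interface, per-version
// implementations, and a runtime selector. All names are hypothetical.
interface HadoopShims {
    String describe();
}

class Hadoop18Shims implements HadoopShims {
    public String describe() { return "0.18 API calls"; }
}

class Hadoop20Shims implements HadoopShims {
    public String describe() { return "0.20 API calls"; }
}

public class ShimLoader {
    // In a real build the version would be read from the Hadoop jar
    // actually on the classpath; it is passed in here so the sketch
    // stays self-contained.
    public static HadoopShims load(String hadoopVersion) {
        if (hadoopVersion.startsWith("0.20")) {
            return new Hadoop20Shims();
        }
        return new Hadoop18Shims();
    }

    public static void main(String[] args) {
        System.out.println(ShimLoader.load("0.20.0").describe());
    }
}
```

This only helps if the Hadoop jar is supplied at launch time rather than baked into pig.jar, which is exactly the un-bundling question raised above.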
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744305#action_12744305 ]

Todd Lipcon commented on PIG-924:
---------------------------------

A couple of notes on the patch:
- you've turned javac.deprecation from "on" to "off" - seems unwise; perhaps you should do this only for the one javac task where you want that behavior
- src.shims.dir.com in the build.xml has a "REMOVE" mark on it - is this still needed? It looks like it is, but it is perhaps better named .common instead of .com
- you've moved junit.hadoop.conf into basedir instead of ${user.home} - this seems reasonable but is orthogonal to this patch; it should be a separate JIRA
- why are we now excluding the HBase storage test?
- some spurious whitespace changes (e.g. TypeCheckingVisitor.java)
- in MRCompiler, a factor of 0.9 seems to have disappeared; the commented-out line should be removed
- some tab characters seem to be introduced
- in MiniCluster, there is also some commented-out code which should be cleaned up
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744273#action_12744273 ]

Daniel Dai commented on PIG-924:
--------------------------------

I am reviewing the patch.
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744216#action_12744216 ]

Todd Lipcon commented on PIG-924:
---------------------------------

Oops, apparently it is Monday and my brain is scrambled. The above should read "pretty important that a single build of *Pig* will work...", of course.
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744209#action_12744209 ]

Todd Lipcon commented on PIG-924:
---------------------------------

Hey guys,

Any word on this? From the packaging perspective it's pretty important that a single build of Hive will work with both Hadoop 18 and Hadoop 20. Obviously packaging isn't the Yahoo team's highest priority, but I think it is very important for community adoption, etc. If we require separate builds for 18 and 20, it's one more thing that can cause confusion for new users.

As I understand it from Dmitriy, for this to work we just need to stop packing the Hadoop JAR into the pig JAR. Instead, the wrapper script just needs to specify the Hadoop JAR on the classpath. Is there some barrier to doing this that I'm unaware of?