[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745513#action_12745513 ]

Daniel Dai commented on PIG-924:
--------------------------------

Wrapping Hadoop functionality adds an extra maintenance cost when adopting new Hadoop features. We still need to find the balance point between usability and maintenance cost. I don't think this issue is a blocker for 0.4.

Make Pig work with multiple versions of Hadoop
----------------------------------------------

                Key: PIG-924
                URL: https://issues.apache.org/jira/browse/PIG-924
            Project: Pig
         Issue Type: Bug
           Reporter: Dmitriy V. Ryaboy
        Attachments: pig_924.2.patch, pig_924.3.patch, pig_924.patch

The current Pig build scripts package Hadoop and other dependencies into the pig.jar file. This means that if users upgrade Hadoop, they also need to upgrade Pig.

Pig has relatively few dependencies on Hadoop interfaces that changed between 18, 19, and 20. It is possible to write a dynamic shim that allows Pig to use the correct calls for any of the above versions of Hadoop. Unfortunately, the build process precludes doing this at runtime, and forces an unnecessary Pig rebuild even if dynamic shims are created.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
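The dynamic shim described above could be sketched roughly as follows. This is a hypothetical illustration, not code from the attached patches: the class and package names (ShimLoader, HadoopShims18, and so on) are invented for the example. The actual approach discussed in the thread keys off the string returned by Hadoop's VersionInfo.getVersion(), which a later comment notes is fragile.

```java
// Hypothetical sketch of the dynamic-shim idea from the issue description.
// All names here are invented for illustration; the attached patches may
// differ. Selection keys off a version string such as "0.18.3" or "0.20.1".
public class ShimLoader {

    /** Extract the minor version ("18", "19", "20") from a string like "0.18.3". */
    static String majorVersion(String hadoopVersion) {
        String[] parts = hadoopVersion.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("unexpected version: " + hadoopVersion);
        }
        return parts[1];
    }

    /** Map a Hadoop version string to a (hypothetical) per-version shim class name. */
    static String shimClassFor(String hadoopVersion) {
        return "org.apache.pig.shims.HadoopShims" + majorVersion(hadoopVersion);
    }

    public static void main(String[] args) {
        // With all shim classes compiled into pig.jar, the right one could be
        // instantiated at runtime via Class.forName(shimClassFor(version)).
        System.out.println(shimClassFor("0.18.3")); // org.apache.pig.shims.HadoopShims18
        System.out.println(shimClassFor("0.20.1")); // org.apache.pig.shims.HadoopShims20
    }
}
```

This only works if every version's shim is present in the jar, which is exactly why the build-time bundling discussed below gets in the way.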
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745517#action_12745517 ]

Todd Lipcon commented on PIG-924:
---------------------------------

bq. I think this is a bad idea and is totally unmaintainable. In particular, the HadoopShim interface is very specific to the changes in those particular versions. We are trying to stabilize the FileSystem and Map/Reduce interfaces to avoid these problems and that is a much better solution.

Agreed that this is not a long-term solution. Like you said, the long-term solution is stabilized cross-version APIs, after which this isn't necessary. The fact is, though, that there are a significant number of people running 0.18.x who would like to use Pig 0.4.0, and supporting them out of the box seems worth it. This patch is pretty small and easily verifiable, both by eye and by tests. Given that the API is still changing for 0.21, and Pig hasn't adopted the new MR APIs yet, it seems premature to leave 18 out in the cold.

Do you have an objection to committing this only on the 0.4.0 branch and *not* planning to maintain it in trunk/0.5?
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745518#action_12745518 ]

Dmitriy V. Ryaboy commented on PIG-924:
---------------------------------------

Owen -- I may not have made the intent clear; the idea is that when Pig is rewritten to use the future-proofed APIs, the shims will go away (presumably for 0.5). Right now, Pig is not using the new APIs; even the 20 patch posted by Olga uses the deprecated mapred calls. This is only to make life easier in the transitional period while Pig is using the old, mutating APIs.

Check out the pig-user list archives for the motivation behind these shims.
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745540#action_12745540 ]

Olga Natkovich commented on PIG-924:
------------------------------------

Todd and Dmitriy, I understand your intention. I am wondering whether, in the current situation, the following might be the best course of action:

(1) Release Pig 0.4.0. I think we have resolved all the blockers and can start the process.
(2) Wait till Hadoop 20.1 is released and then release Pig 0.5.0. Owen promised that Hadoop 20.1 will go out for a vote next week.

This means that Pig 0.4.0 and 0.5.0 will be just a couple of weeks apart, which should not be a big issue for users. Meanwhile, they can apply PIG-660 to the code bundled with Pig 0.4.0 or the trunk. I am currently working with release engineering to get an official hadoop20.jar that Pig can be built with. I expect to have it in the next couple of days.

The concern with applying the patch is the code complexity it introduces. Also, if there are patches that are version specific, they will not be easy to apply. Multiple branches are something we understand and know how to work with better. We also don't want to set a precedent of supporting Pig releases on multiple versions of Hadoop, because it is not clear that this is something we will be able to maintain going forward.
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744953#action_12744953 ]

Hadoop QA commented on PIG-924:
-------------------------------

+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12416945/pig_924.3.patch
against trunk revision 804406.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 8 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/173/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/173/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/173/console

This message is automatically generated.
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745109#action_12745109 ]

Dmitriy V. Ryaboy commented on PIG-924:
---------------------------------------

Regarding deprecation -- I tried setting it back to off, and adding @SuppressWarnings("deprecation") to the shims for 20, but ant complained about deprecation nonetheless. Not sure what its deal is.

Adding something like this to the main build.xml works. Does this seem like a reasonable solution?

{code}
<!-- set deprecation off if hadoop version is greater than or equal to 20 -->
<target name="set_deprecation">
  <condition property="hadoop_is20">
    <equals arg1="${hadoop.version}" arg2="20"/>
  </condition>
  <antcall target="if_hadoop_is20"/>
  <antcall target="if_hadoop_not20"/>
</target>

<target name="if_hadoop_is20" if="hadoop_is20">
  <property name="javac.deprecation" value="off"/>
</target>

<target name="if_hadoop_not20" unless="hadoop_is20">
  <property name="javac.deprecation" value="on"/>
</target>

<target name="init" depends="set_deprecation" [...]
{code}
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745160#action_12745160 ]

Daniel Dai commented on PIG-924:
--------------------------------

From your latest patch, shims work this way:
1. The version of shims Pig compiles is controlled by the hadoop.version property in build.xml.
2. The version of shims Pig uses is determined dynamically by hacking the string returned by VersionInfo.getVersion.

As your code comment notes, the version string hack is not safe. My thinking is that Pig should only use the bundled Hadoop unless overridden:
1. Pig compiles all versions of shims. There is no conflict between different versions of shims, so why not compile them all? Then users do not need to recompile the code if they want to use a different external Hadoop.
2. Pig bundles a default Hadoop, which is specified by hadoop.version in build.xml. Pig uses this version of shims by default.
3. If users want to use an external Hadoop, they need to override the default Hadoop version explicitly, e.g., with -Dhadoop_version on the command line.
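The proposal above (compile every shim, default to the bundled Hadoop version, honor an explicit override) might look roughly like this in code. This is an assumption-laden sketch for illustration only: the class name, the property name, and the bundled default are invented here and are not taken from the patch.

```java
// Hypothetical sketch of the "bundled default plus explicit override"
// selection Daniel describes. Names and the default value are illustrative.
public class ShimSelector {

    // In the real build this would be baked in from build.xml's
    // hadoop.version property; "20" is just an assumed default here.
    static final String BUNDLED_VERSION = "20";

    /** An explicit override (e.g. -Dhadoop_version=18) wins; otherwise use the bundled default. */
    static String selectVersion(String overrideProperty) {
        return (overrideProperty != null && !overrideProperty.isEmpty())
                ? overrideProperty
                : BUNDLED_VERSION;
    }

    public static void main(String[] args) {
        // Since all shim versions are compiled in, switching Hadoop versions
        // needs no recompile, only a different property value at launch.
        System.out.println(selectVersion(System.getProperty("hadoop_version")));
    }
}
```

The appeal of this scheme is that it avoids parsing the runtime version string entirely: the user states the version, and the matching precompiled shim is used.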
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745166#action_12745166 ]

Todd Lipcon commented on PIG-924:
---------------------------------

bq. If existing deployments need a single pig.jar without a hadoop dependency, it might be possible to create a new target (pig-all) that would create a statically bundled jar; but I think the default behavior should be to not bundle, build all the shims, and use whatever hadoop is on the path.

+1 for making the default to *not* bundle hadoop inside pig.jar, and adding another non-default target for those people who might want it.

bq. The current patch is written as is so that it can be applied to trunk, enabling people to compile statically, and only require a change to the ant build files to switch to a dynamic compile later on (after 0.4, probably)

From the packager's perspective, I'd love it if this change could get in for 0.4. If it doesn't, we'll end up applying the patch ourselves for packaging purposes -- we need the hadoop dependency to be on the user's installed hadoop, not on whatever happened to get bundled into pig.jar.
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744378#action_12744378 ]

Daniel Dai commented on PIG-924:
--------------------------------

Hi Dmitriy, generally the patch is good. Just like Todd said, we don't want to change anything else besides the shims layer. In addition to Todd's comments, Main.java contains the change for pig.logfile, which you address in PIG-923. Would you please clean things up and resubmit? Thanks.
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744620#action_12744620 ]

Todd Lipcon commented on PIG-924:
---------------------------------

A few more comments I missed on the first pass through:
- A few of the shim methods appear unused:
  - fileSystemDeleteOnExit
  - inputFormatValidateInput
  - setTmpFiles
- Is the inner MiniDFSCluster class used? I think it is replaced by the MiniDFSClusterShim, if I understand correctly.
- There still seem to be some unrelated changes to build.xml -- the javac.deprecation change, for example.
- If we are now excluding TestHBaseStorage on all platforms, we should get rid of the two lines above it that exclude it only on Windows -- keeping both is redundant and confusing.

Thanks
-Todd
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744216#action_12744216 ]

Todd Lipcon commented on PIG-924:
---------------------------------

Oops, apparently it is Monday and my brain is scrambled. The above should read "pretty important that a single build of *Pig* will work...", of course.
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744273#action_12744273 ]

Daniel Dai commented on PIG-924:
--------------------------------

I am reviewing the patch.
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744305#action_12744305 ]

Todd Lipcon commented on PIG-924:
---------------------------------

A couple of notes on the patch:
- You've turned javac.deprecation from on to off -- this seems unwise. Perhaps you should do this only for the one javac task where you want that behavior.
- src.shims.dir.com in the build.xml has a REMOVE mark on it -- is this still needed? It looks like it is, but perhaps it is better named .common instead of .com.
- You've moved junit.hadoop.conf into basedir instead of ${user.home} -- this seems reasonable but is orthogonal to this patch and should be a separate JIRA.
- Why are we now excluding the HBase storage test?
- Some spurious whitespace changes (e.g. TypeCheckingVisitor.java).
- In MRCompiler, a factor of 0.9 seems to have disappeared; the commented-out line should be removed.
- Some tab characters seem to be introduced.
- In MiniCluster, there is also some commented-out code which should be cleaned up.
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744307#action_12744307 ]

Dmitriy V. Ryaboy commented on PIG-924:
---------------------------------------

Thanks for looking, Todd -- most of those changes, like the factor of 0.9, deprecation, excluding the HBase test, etc., are consistent with the 0.20 patch posted to PIG-660. Moving junit.hadoop.conf is critical -- there are comments about this in 660 -- without it, resetting hadoop.version doesn't actually work, as some of the information from a previous build sticks around. I'll fix the whitespace; this wasn't a final patch, more of a proof of concept. The point is that this could work, but currently it can't, because Hadoop is bundled into the jar. I am looking for comments from the core developer team regarding the possibility of un-bundling.
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744310#action_12744310 ]

Todd Lipcon commented on PIG-924:
---------------------------------

Gotcha, thanks for explaining. Aside from the nits, the patch looks good to me.