[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-20 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745568#action_12745568
 ] 

Santhosh Srinivasan commented on PIG-924:
-

Hadoop has promised "APIs in stone" forever and has not delivered on that 
promise yet. Higher layers in the stack have to learn how to cope with an 
ever-changing lower layer. How this change is managed is a matter of convenience 
for the owners of the higher layer. I really like the Shims approach, which 
avoids the cost of branching Pig every time we make a compatible release. The 
cost of creating a branch for each version of Hadoop seems too high compared to 
the cost of the Shims approach.

Of course, there are pros and cons to each approach. The question here is when 
Hadoop will set its APIs in stone and how many more releases we will have 
before that happens. If the answer is 12 months and two more releases, then we 
should go with the Shims approach. If the answer is 3-6 months and one more 
release, then we should stick with our current approach and pay the small 
penalty of supplying patches to work with the specific release of Hadoop.

Summary: Use the Shims patch if the APIs will not be set in stone within a 
quarter or two and there is more than one Hadoop release in the meantime.

> Make Pig work with multiple versions of Hadoop
> --
>
> Key: PIG-924
> URL: https://issues.apache.org/jira/browse/PIG-924
> Project: Pig
>  Issue Type: Bug
>Reporter: Dmitriy V. Ryaboy
> Attachments: pig_924.2.patch, pig_924.3.patch, pig_924.patch
>
>
> The current Pig build scripts package Hadoop and other dependencies into the 
> pig.jar file.
> This means that if users upgrade Hadoop, they also need to upgrade Pig.
> Pig has relatively few dependencies on Hadoop interfaces that changed between 
> 0.18, 0.19, and 0.20. It is possible to write a dynamic shim that allows Pig 
> to use the correct calls for any of the above versions of Hadoop. 
> Unfortunately, the build process precludes doing this at runtime and forces 
> an unnecessary Pig rebuild even if dynamic shims are created.
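
As a rough illustration of the "dynamic shim" idea described above, here is a 
minimal sketch; the interface, class names, and the method chosen are 
assumptions for illustration, not the contents of the attached patches.

{code}
// Hypothetical sketch of the dynamic-shim idea: a small interface capturing the
// Hadoop calls that differ across versions, with one implementation compiled
// per supported Hadoop release.
import java.io.IOException;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

interface HadoopShim {
    void submitJob(JobConf conf) throws IOException;
}

// One implementation per Hadoop version, e.g. a 0.18-flavoured shim:
class HadoopShim18 implements HadoopShim {
    public void submitJob(JobConf conf) throws IOException {
        // Uses the old org.apache.hadoop.mapred API, which 0.18 through 0.20 expose.
        JobClient.runJob(conf);
    }
}
{code}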




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-20 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745551#action_12745551
 ] 

Todd Lipcon commented on PIG-924:
-

Hey guys,

As we understood it, Pig 0.5 wasn't due for quite some time. If it's the case 
that 0.5 is a small release on top of 0.4 and it should be out in a few weeks, 
this seems a lot more reasonable.

Most likely we'll end up applying this patch to the 0.4 release for our 
distribution, even if there are multiple branches made in SVN. That's fine, 
though - we've got a process developed for this and are happy to support users 
on both versions for the next several months as people transition to 0.20 and 
the new APIs.

Feel free to resolve this as Won't Fix.
-Todd




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-20 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745544#action_12745544
 ] 

Dmitriy V. Ryaboy commented on PIG-924:
---

Arun -- it wouldn't suffice for those who want to use Pig 0.4 with Hadoop 
0.19.x or 0.20.x.

Pig 0.5 isn't due out for 4 to 6 months, which is behind the curve for adoption 
of 0.20. Putting in this patch makes compatibility a matter of a compile-time 
flag. Putting in this patch and restructuring the Ant tasks somewhat would make 
it completely transparent.

Waiting until 0.5 means that users wind up with instructions like this for half 
a year: http://behemoth.strlen.net/~alex/hadoop20-pig-howto.txt




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-20 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745540#action_12745540
 ] 

Olga Natkovich commented on PIG-924:


Todd and Dmitriy, I understand your intention. I am wondering whether, in the 
current situation, the following might be the best course of action:

(1) Release Pig 0.4.0. I think we have resolved all the blockers and can start 
the process.
(2) Wait until Hadoop 0.20.1 is released, then release Pig 0.5.0.

Owen promised that Hadoop 0.20.1 will go out for a vote next week. This means 
that Pig 0.4.0 and 0.5.0 will be just a couple of weeks apart, which should not 
be a big issue for users. Meanwhile, users can apply the PIG-660 patch to the 
code bundled with Pig 0.4.0 or to trunk. I am currently working with release 
engineering to get an official hadoop20.jar that Pig can be built with; I 
expect to have it in the next couple of days.

The concern with applying the patch is the code complexity it introduces. Also, 
if there are patches that are version specific, they will not be easy to apply. 
Multiple branches are something we understand and know how to work with better. 
We also don't want to set a precedent of supporting Pig releases on multiple 
versions of Hadoop, because it is not clear that this is something we will be 
able to maintain going forward.




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-20 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745536#action_12745536
 ] 

Arun C Murthy commented on PIG-924:
---

bq. The fact is, though, that there are a significant number of people running 
0.18.x who would like to use Pig 0.4.0, and supporting them out of the box 
seems worth it. Given that the API is still changing for 0.21, and Pig hasn't 
adopted the "new" MR APIs yet, it seems like it's premature to leave 18 in the 
cold.

I believe the plan is for 0.4.0 to work with hadoop-0.18.* anyway... wouldn't 
that suffice?




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-20 Thread Arun C Murthy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745519#action_12745519
 ] 

Arun C Murthy commented on PIG-924:
---

I agree with Owen.

One conceivable option is for the Pig project to maintain separate branches 
(per Pig release) to support the various Hadoop versions... several projects 
are run this way. Clearly it adds to the cost of pushing out a release for the 
Pig committers, and it is their call.




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-20 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745518#action_12745518
 ] 

Dmitriy V. Ryaboy commented on PIG-924:
---

Owen -- I may not have made the intent clear; the idea is that when Pig is 
rewritten to use the future-proofed APIs, the shims will go away (presumably 
for 0.5). Right now, Pig is not using the new APIs; even the 0.20 patch posted 
by Olga uses the deprecated mapred calls.

This is only to make life easier in the transitional period while Pig is using 
the old, mutating APIs.

Check out the Pig user list archives for the motivation for these shims.




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-20 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745517#action_12745517
 ] 

Todd Lipcon commented on PIG-924:
-

bq. I think this is a bad idea and is totally unmaintainable. In particular, 
the HadoopShim interface is very specific to the changes in those particular 
versions. We are trying to stabilize the FileSystem and Map/Reduce interfaces 
to avoid these problems and that is a much better solution.

Agreed that this is not a long-term solution. Like you said, the long-term 
solution is stabilized cross-version APIs, which would make this unnecessary. 
The fact is, though, that there are a significant number of people running 
0.18.x who would like to use Pig 0.4.0, and supporting them out of the box 
seems worth it. This patch is pretty small and easily verifiable, both by eye 
and by tests. Given that the API is still changing for 0.21, and Pig hasn't 
adopted the "new" MR APIs yet, it seems like it's premature to leave 18 in the 
cold.

Do you have an objection to committing this only on the 0.4.0 branch and *not* 
planning to maintain it in trunk/0.5?




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-20 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745513#action_12745513
 ] 

Daniel Dai commented on PIG-924:


Wrapping Hadoop functionality adds extra maintenance cost when adopting new 
Hadoop features. We still need to find the right balance between usability and 
maintenance cost. I don't think this issue is a blocker for 0.4.




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-19 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745166#action_12745166
 ] 

Todd Lipcon commented on PIG-924:
-

bq. If existing deployments need a single pig.jar without a hadoop dependency, 
it might be possible to create a new target (pig-all) that would create a 
statically bundled jar; but I think the default behavior should be to not 
bundle, build all the shims, and use whatever hadoop is on the path.

+1 for making the default to *not* bundle hadoop inside pig.jar, and adding 
another non-default target for those people who might want it.

bq. The current patch is written as is so that it can be applied to trunk, 
enabling people to compile statically, and only require a change to the ant 
build files to switch to a dynamic compile later on (after 0.4, probably)

From the packager's perspective, I'd love it if this change could get in for 
0.4. If it doesn't, we'll end up applying the patch ourselves for packaging 
purposes -- we need the Hadoop dependency to be on the user's installed Hadoop, 
not on whatever happened to get bundled into pig.jar.




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-19 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745164#action_12745164
 ] 

Dmitriy V. Ryaboy commented on PIG-924:
---

Daniel, you've hit the nail on the head.

This patch is specifically written to enable us to compile against all the 
supported versions of Hadoop and let the user pick which one they want at 
runtime (by putting the right Hadoop on the classpath -- no flags needed). In 
fact, the default Ant task in the shims directory compiles all the shims at 
once.

The version string hack is safe as long as Hadoop is built correctly (the zebra 
version is not: it returns "Unknown", hence the last-resort hack of defaulting 
to 0.20). If Hadoop came from its own jar, I could use reflection to get the 
jar name and use that as a fallback for an "Unknown" version -- but in Pig, 
Hadoop comes from pig.jar!
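
To make the mechanism concrete, here is a minimal sketch of that kind of 
version-string dispatch; the shim class names and package are assumptions for 
illustration, not the actual classes in the patch.

{code}
// Sketch only: choose a shim implementation from the Hadoop version string at
// runtime. org.apache.hadoop.util.VersionInfo is a real Hadoop class; the shim
// class names below are hypothetical.
import org.apache.hadoop.util.VersionInfo;

public final class ShimLoader {
    public static String shimClassName() {
        String v = VersionInfo.getVersion();   // e.g. "0.18.3", "0.20.0", or "Unknown"
        if (v.startsWith("0.18")) return "org.apache.pig.shims.HadoopShim18";
        if (v.startsWith("0.19")) return "org.apache.pig.shims.HadoopShim19";
        // Last-resort default when the version string is unusable ("Unknown"),
        // mirroring the "default to 20" behaviour described above.
        return "org.apache.pig.shims.HadoopShim20";
    }

    public static Object loadShim() throws ReflectiveOperationException {
        // Instantiate reflectively so only the shim matching the running Hadoop
        // ever needs to resolve.
        return Class.forName(shimClassName()).getDeclaredConstructor().newInstance();
    }
}
{code}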

Ideally, Pig would compile all versions of the shims into its jars, and the 
pig jar would not include Hadoop. Then the user would put the right Hadoop on 
the classpath (or bin/pig would do it for them), and everything would happen 
automagically.

By bundling Hadoop into the jar, however, switching Hadoop versions on the fly 
is next to impossible (or at least I don't know how) -- we have multiple jars 
on the classpath, and the classloader will use whatever is the latest (or is it 
the earliest?). Finding the right resource becomes fraught with peril.
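
As a quick way to see the "which copy wins" problem in practice, a diagnostic 
along the following lines (an illustrative sketch, not part of the patch) 
prints which jar the loaded Hadoop classes actually came from.

{code}
// Diagnostic sketch: when several jars on the classpath contain Hadoop classes,
// print which jar the classloader actually resolved VersionInfo from.
public class WhichHadoop {
    public static void main(String[] args) throws ClassNotFoundException {
        Class<?> c = Class.forName("org.apache.hadoop.util.VersionInfo");
        // For a class loaded from a jar this is the jar's URL, which answers the
        // "latest or earliest?" question above empirically.
        System.out.println(c.getProtectionDomain().getCodeSource().getLocation());
    }
}
{code}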

If existing deployments need a single pig.jar without an external Hadoop 
dependency, it might be possible to create a new target (pig-all) that would 
create a statically bundled jar; but I think the default behavior should be to 
not bundle, to build all the shims, and to use whatever Hadoop is on the 
classpath.

The current patch is written as is so that it can be applied to trunk, enabling 
people to compile statically, and would only require a change to the Ant build 
files to switch to a dynamic compile later on (after 0.4, probably).




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-19 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745160#action_12745160
 ] 

Daniel Dai commented on PIG-924:


From your latest patch, the shims work this way:
1. The version of shims Pig compiles is controlled by the "hadoop.version" 
property in build.xml.
2. The version of shims Pig uses is determined dynamically by hacking the 
string returned by VersionInfo.getVersion.

As your code comment notes, the version string hack is not safe. My thinking is 
that Pig should only use the bundled Hadoop unless overridden:
1. Pig compiles all versions of the shims. There is no conflict between 
different versions of the shims, so why not compile them all? Then the user 
does not need to recompile the code to use a different external Hadoop.
2. Pig bundles a default Hadoop, specified by hadoop.version in build.xml, and 
uses that version of the shims by default.
3. If users want to use an external Hadoop, they need to override the default 
Hadoop version explicitly, e.g. with "-Dhadoop_version" on the command line (a 
rough sketch of this override logic follows below).
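
A rough sketch of the override logic proposed above, assuming the override were 
surfaced to Pig as a system property; the property name, default value, and 
class are assumptions for illustration only.

{code}
// Sketch of the proposal above: use the bundled default Hadoop version unless
// the user overrides it explicitly, rather than sniffing the version string.
public final class ShimVersion {
    // Would be baked in at build time from the hadoop.version property in build.xml.
    private static final String BUNDLED_DEFAULT = "0.20";

    public static String resolve() {
        // Explicit override (e.g. supplied on the command line); the property
        // name here is hypothetical.
        return System.getProperty("hadoop_version", BUNDLED_DEFAULT);
    }
}
{code}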




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-19 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745109#action_12745109
 ] 

Dmitriy V. Ryaboy commented on PIG-924:
---

Regarding deprecation -- I tried setting it back to off and adding 
@SuppressWarnings("deprecation") to the shims for 0.20, but Ant complained 
about deprecation nonetheless. Not sure what its deal is.

Adding something like the following to the main build.xml works. Does this seem 
like a reasonable solution?

{code}
(build.xml snippet not preserved in the mail archive)
{code}




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-19 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744953#action_12744953
 ] 

Hadoop QA commented on PIG-924:
---

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12416945/pig_924.3.patch
  against trunk revision 804406.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 8 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/173/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/173/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/173/console

This message is automatically generated.




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-18 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744620#action_12744620
 ] 

Todd Lipcon commented on PIG-924:
-

Few more comments I missed on the first pass through:

- A few of the shim methods appear unused:
  - fileSystemDeleteOnExit
  - inputFormatValidateInput
  - setTmpFiles

- Is the inner MiniDFSCluster class used? I think this is replaced by the 
MiniDFSClusterShim if I understand it correctly.

- There still seem to be some unrelated changes to build.xml -- the 
javac.deprecation change, for example.
- If we are now excluding TestHBaseStorage on all platforms, we should get rid 
of the two lines above it that exclude it only on Windows -- it's redundant and 
confusing.

Thanks
-Todd




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-17 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744378#action_12744378
 ] 

Daniel Dai commented on PIG-924:


Hi Dmitriy,
Generally the patch is good. As Todd said, we don't want to change anything 
else besides the shims layer. In addition to Todd's comments, Main.java 
contains the change for "pig.logfile", which you address in PIG-923. Would you 
please clean things up and resubmit?

Thanks




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-17 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744310#action_12744310
 ] 

Todd Lipcon commented on PIG-924:
-

Gotcha, thanks for explaining. Aside from the nits, patch looks good to me.




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-17 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744307#action_12744307
 ] 

Dmitriy V. Ryaboy commented on PIG-924:
---

Thanks for looking, Todd -- most of those changes, like the factor of 0.9, the 
deprecation setting, and excluding the HBase test, are consistent with the 0.20 
patch posted to PIG-660.
Moving junit.hadoop.conf is critical -- there are comments about this in 
PIG-660 -- because without it, resetting hadoop.version doesn't actually work: 
some of the information from a previous build sticks around.

I'll fix the whitespace; this wasn't a final patch, more of a proof of concept. 
The point is that this approach could work -- but right now it can't, because 
Hadoop is bundled into the jar. I am looking for comments from the core 
developer team regarding the possibility of un-bundling.




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-17 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744305#action_12744305
 ] 

Todd Lipcon commented on PIG-924:
-

A couple of notes on the patch:

- You've turned javac.deprecation from "on" to "off" -- that seems unwise; 
perhaps you should do this only for the one javac task where you want that 
behavior.
- src.shims.dir.com in build.xml has a "REMOVE" mark on it -- is this still 
needed? It looks like it is, but it is perhaps better named .common instead of 
.com.
- You've moved junit.hadoop.conf into basedir instead of ${user.home} -- this 
seems reasonable but is orthogonal to this patch; it should be a separate JIRA.
- Why are we now excluding the HBase storage test?
- Some spurious whitespace changes (e.g. TypeCheckingVisitor.java).
- In MRCompiler, a factor of 0.9 seems to have disappeared; the commented-out 
line should be removed.
- Some tab characters seem to have been introduced.
- In MiniCluster, there is also some commented-out code that should be cleaned 
up.





[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-17 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744273#action_12744273
 ] 

Daniel Dai commented on PIG-924:


I am reviewing the patch.




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-17 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744216#action_12744216
 ] 

Todd Lipcon commented on PIG-924:
-

Oops, apparently it is Monday and my brain is scrambled. Above should read 
"pretty important that a single build of *Pig* will work...", of course.




[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-17 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744209#action_12744209
 ] 

Todd Lipcon commented on PIG-924:
-

Hey guys,

Any word on this? From the packaging perspective it's pretty important that a 
single build of Hive will work with both Hadoop 18 and Hadoop 20. Obviously 
packaging isn't the Yahoo team's highest priority, but I think it is very 
important for community adoption, etc. If we require separate builds for 18 and 
20 it's one more thing that can cause confusion for new users.

As I understand it from Dmitriy, for this to work we just need to stop packing 
the Hadoop JAR into the pig JAR. Instead, the wrapper script just needs to 
specify the hadoop JAR on the classpath. Is there some barrier to doing this 
that I'm unaware of?
