Re: pig 0.11 candidate 2 feedback: Several problems

Olga Natkovich Wed, 20 Feb 2013 09:26:05 -0800

I agree that supporting as much as we can is a good goal. The issue is who is 
going to be testing against all these versions? We found the issues under 
discussion because of a customer report, not because we consistently test 
against all versions. Perhaps when we decide which versions to support for the
next release, we also need to agree on who is going to test and maintain
compatibility with a particular version.


For instance, since Hadoop 23 compatibility is important for us at Yahoo, we
have been maintaining compatibility with this version for 0.9 and 0.10, and
will do the same for 0.11 and going forward. I think we would need others to
step in and claim the versions of their interest.

Olga


________________________________
 From: Kai Londenberg <kai.londenb...@googlemail.com>
To: dev@pig.apache.org 
Sent: Wednesday, February 20, 2013 1:51 AM
Subject: Re: pig 0.11 candidate 2 feedback: Several problems
 
Hi,

I strongly agree with Jonathan here. If there are good reasons why you
can't support an older version of Hadoop any more, that's one thing.
But having to change 2 lines of code doesn't really qualify as such in
my point of view ;)

At least for me, pig support for 0.20.2 is essential - without it, I
can't use it. If it doesn't support it, I'll have to branch pig and
hack it myself, or stop using it.

I guess there are a lot of people still running 0.20.2 clusters. If
you really have lots of data stored on HDFS and a continuously busy
cluster, an upgrade is nothing you do "just because".


2013/2/20 Jonathan Coveney <jcove...@gmail.com>:
> I agree that we shouldn't have to support old versions forever. That said,
> I also don't think we should be too blase about supporting older versions
> where it is not odious to do so. We have a lot of competition in the
> language space and the broader the versions we can support, the better
> (assuming it isn't too odious to do so). In this case, I don't think it
> should be too hard to change ObjectSerializer so that the commons-codec
> code used is compatible with both versions...we could just in-line some of
> the Base64 code, and comment accordingly.
>
> That said, we also should be clear about what versions we support, but 6-12
> months seems short. The upgrade cycles on Hadoop are really, really long.
>
>
> 2013/2/20 Prashant Kommireddi <prash1...@gmail.com>
>
>> Agreed, that makes sense. Probably supporting an older Hadoop version for
>> 1 or 2 Pig releases before moving to a newer/stable version?
>>
>> Having said that, should we use 0.11 period to communicate the same to the
>> community and start moving on 0.12 onwards? I know we are way past 6-12
>> months (1-2 release) time frame with 0.20.2, but we also need to make sure
>> users are aware and plan accordingly.
>>
>> I'd also be interested to hear how other projects (Hive, Oozie) are
>> handling this.
>>
>> -Prashant
>>
>> On Tue, Feb 19, 2013 at 3:22 PM, Olga Natkovich <onatkov...@yahoo.com> wrote:
>>
>> > It seems that for each Pig release we need to agree and clearly state
>> > which Hadoop versions it will support. I guess the main question is how we
>> > decide on this. Perhaps we should say that Pig no longer supports older
>> > Hadoop versions once the newer one is out for at least 6-12 months to make
>> > sure it is stable. I don't think we can support old versions indefinitely.
>> > It is in everybody's interest to keep moving forward.
>> >
>> > Olga
>> >
>> >
>> > ________________________________
>> >  From: Prashant Kommireddi <prash1...@gmail.com>
>> > To: dev@pig.apache.org
>> > Sent: Tuesday, February 19, 2013 10:57 AM
>> > Subject: Re: pig 0.11 candidate 2 feedback: Several problems
>> >
>> > What do you guys feel about the JIRA to do with 0.20.2 compatibility
>> > (PIG-3194)? I am interested in discussing the strategy around backward
>> > compatibility as this is something that would haunt us each time we move to
>> > the next Hadoop version. For example, we might be in a similar situation while
>> > moving to Hadoop 2.0, when some of the stuff might break for 1.0.
>> >
>> > I feel it would be good to get this JIRA fix in for 0.11, as 0.20.2 users
>> > might be caught unaware. Of course, I must admit there is selfish interest
>> > here and it's probably easier for us to have a workaround on Pig rather
>> > than upgrade hadoop in all our production DCs.
>> >
>> > -Prashant
>> >
>> >
>> > On Tue, Feb 19, 2013 at 9:54 AM, Russell Jurney <russell.jur...@gmail.com> wrote:
>> >
>> > > I think someone should step up and fix the easy ones, if possible.
>> > >
>> > >
>> > > On Tue, Feb 19, 2013 at 9:51 AM, Bill Graham <billgra...@gmail.com> wrote:
>> > >
>> > > > Thanks Kai for reporting these.
>> > > >
>> > > > What do people think about the severity of these issues w.r.t. Pig 11? I
>> > > > see a few possible options:
>> > > >
>> > > > 1. We include some or all of these patches in a new Pig 11 rc. We'd want to
>> > > > make sure that they don't destabilize the current branch. This approach
>> > > > makes sense if we think Pig 11 wouldn't be a good release without one or
>> > > > more of these included.
>> > > >
>> > > > 2. We continue with the Pig 11 release without these, but then include one
>> > > > or more in a 0.11.1 release.
>> > > >
>> > > > 3. We continue with the Pig 11 release without these, but then include them
>> > > > in a 0.12 release.
>> > > >
>> > > > Jon has a patch for the MAP issue
>> > > > (PIG-3144<https://issues.apache.org/jira/browse/PIG-3144>)
>> > > > ready, which seems like the most pressing of the three to me.
>> > > >
>> > > > thanks,
>> > > > Bill
>> > > >
>> > > > On Mon, Feb 18, 2013 at 2:27 AM, Kai Londenberg <kai.londenb...@googlemail.com> wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > I just subscribed to the dev mailing list in order to give you some
>> > > > > feedback on pig 0.11 candidate 2.
>> > > > >
>> > > > > The following three issues are currently present in 0.11 candidate 2:
>> > > > >
>> > > > > https://issues.apache.org/jira/browse/PIG-3144 - 'Erroneous map entry
>> > > > > alias resolution leading to "Duplicate schema alias" errors'
>> > > > > https://issues.apache.org/jira/browse/PIG-3194 - Changes to
>> > > > > ObjectSerializer.java break compatibility with Hadoop 0.20.2
>> > > > > https://issues.apache.org/jira/browse/PIG-3195 - Race Condition in
>> > > > > PhysicalOperator leads to ExecException "Error while trying to get
>> > > > > next result in POStream"
>> > > > >
>> > > > > The last two of these are easily solvable (see the tickets for
>> > > > > details on that). The first one is a bit trickier I think, but at
>> > > > > least there is a workaround for it (pass Map fields through a UDF).
>> > > > >
>> > > > > In my personal opinion, each of these problems is pretty severe, but
>> > > > > opinions about the importance of the MAP Datatype and STREAM Operator,
>> > > > > as well as Hadoop 0.20.2 compatibility might differ.
>> > > > >
>> > > > > so far ..
>> > > > >
>> > > > > Kai Londenberg
>> > > > >
>> > > >
>> > > >
>> > > >
>> > > > --
>> > > > *Note that I'm no longer using my Yahoo! email address. Please email me at
>> > > > billgra...@gmail.com going forward.*
>> > > >
>> > >
>> > >
>> > >
>> > > --
>> > > Russell Jurney twitter.com/rjurney russell.jur...@gmail.com
>> > > datasyndrome.com
>> > >
>> >
>>
