Re: pig 0.11 candidate 2 feedback: Several problems

Russell Jurney Wed, 20 Feb 2013 15:34:40 -0800

I stand corrected. Cool, 0.11 is good!


On Wed, Feb 20, 2013 at 1:15 PM, Jarek Jarcec Cecho <[email protected]> wrote:

> Just an unrelated note: CDH3 is closer to Hadoop 1.x than to 0.20.
>
> Jarcec
>
> On Wed, Feb 20, 2013 at 12:04:51PM -0800, Dmitriy Ryaboy wrote:
> > I agree -- this is a good release. The bugs Kai pointed out should be
> > fixed, but as they are not critical regressions, we can fix them in 0.11.1
> > (if someone wants to roll 0.11.1 the minute these fixes are committed, I
> > won't mind and will dutifully vote for the release).
> >
> > I think the Hadoop 0.20.2 incompatibility is unfortunate, but iirc this is
> > fixable by setting HADOOP_USER_CLASSPATH_FIRST=true (was that in 0.20.2?)
> >
> > FWIW Twitter's running CDH3 and this release works in our environment.
> >
> > At this point things that block a release are critical regressions in
> > performance or correctness.
> >
> > D
> >
> >
> > On Wed, Feb 20, 2013 at 11:52 AM, Alan Gates <[email protected]> wrote:
> >
> > > No. Bugs like these are supposed to be found and fixed after we branch
> > > from trunk (which happened several months ago in the case of 0.11). The
> > > point of RCs is to check that it's a good build, licenses are right, etc.
> > > Any bugs found this late in the game have to be seen as failures of
> > > earlier testing.
> > >
> > > Alan.
> > >
> > > On Feb 20, 2013, at 11:33 AM, Russell Jurney wrote:
> > >
> > > > Isn't the point of an RC to find and fix bugs like these?
> > > >
> > > >
> > > > On Wed, Feb 20, 2013 at 11:31 AM, Bill Graham <[email protected]> wrote:
> > > >
> > > >> Regarding Pig 11 rc2, I propose we continue with the current vote as is
> > > >> (which closes today EOD). Patches for 0.20.2 issues can be rolled into a
> > > >> Pig 0.11.1 release whenever they're available and tested.
> > > >>
> > > >>
> > > >>
> > > >> On Wed, Feb 20, 2013 at 9:24 AM, Olga Natkovich <[email protected]> wrote:
> > > >>
> > > >>> I agree that supporting as much as we can is a good goal. The issue is
> > > >>> who is going to be testing against all these versions? We found the
> > > >>> issues under discussion because of a customer report, not because we
> > > >>> consistently test against all versions. Perhaps when we decide which
> > > >>> versions to support for the next release we also need to agree on who is
> > > >>> going to be testing and maintaining compatibility with a particular
> > > >>> version.
> > > >>>
> > > >>> For instance, since Hadoop 0.23 compatibility is important for us at Yahoo,
> > > >>> we have been maintaining compatibility with this version for 0.9 and 0.10,
> > > >>> and will do the same for 0.11 and going forward. I think we would need
> > > >>> others to step in and claim the versions of their interest.
> > > >>>
> > > >>> Olga
> > > >>>
> > > >>>
> > > >>> ________________________________
> > > >>> From: Kai Londenberg <[email protected]>
> > > >>> To: [email protected]
> > > >>> Sent: Wednesday, February 20, 2013 1:51 AM
> > > >>> Subject: Re: pig 0.11 candidate 2 feedback: Several problems
> > > >>>
> > > >>> Hi,
> > > >>>
> > > >>> I strongly agree with Jonathan here. If there are good reasons why you
> > > >>> can't support an older version of Hadoop any more, that's one thing.
> > > >>> But having to change 2 lines of code doesn't really qualify as such in
> > > >>> my point of view ;)
> > > >>>
> > > >>> At least for me, pig support for 0.20.2 is essential - without it, I
> > > >>> can't use it. If it doesn't support it, I'll have to branch pig and
> > > >>> hack it myself, or stop using it.
> > > >>>
> > > >>> I guess there are a lot of people still running 0.20.2 clusters. If
> > > >>> you really have lots of data stored on HDFS and a continuously busy
> > > >>> cluster, an upgrade is nothing you do "just because".
> > > >>>
> > > >>>
> > > >>> 2013/2/20 Jonathan Coveney <[email protected]>:
> > > >>>> I agree that we shouldn't have to support old versions forever. That said,
> > > >>>> I also don't think we should be too blase about supporting older versions
> > > >>>> where it is not odious to do so. We have a lot of competition in the
> > > >>>> language space and the broader the versions we can support, the better
> > > >>>> (assuming it isn't too odious to do so). In this case, I don't think it
> > > >>>> should be too hard to change ObjectSerializer so that the commons-codec
> > > >>>> code used is compatible with both versions...we could just in-line some of
> > > >>>> the Base64 code, and comment accordingly.
> > > >>>>
> > > >>>> That said, we also should be clear about what versions we support, but
> > > >>>> 6-12 months seems short. The upgrade cycles on Hadoop are really, really
> > > >>>> long.
> > > >>>>
> > > >>>>
> > > >>>> 2013/2/20 Prashant Kommireddi <[email protected]>
> > > >>>>
> > > >>>>> Agreed, that makes sense. Probably supporting an older hadoop version for
> > > >>>>> 1 or 2 pig releases before moving to a newer/stable version?
> > > >>>>>
> > > >>>>> Having said that, should we use the 0.11 period to communicate the same to
> > > >>>>> the community and start moving on from 0.12 onwards? I know we are way past
> > > >>>>> the 6-12 months (1-2 releases) time frame with 0.20.2, but we also need to
> > > >>>>> make sure users are aware and plan accordingly.
> > > >>>>>
> > > >>>>> I'd also be interested to hear how other projects (Hive, Oozie) are
> > > >>>>> handling this.
> > > >>>>>
> > > >>>>> -Prashant
> > > >>>>>
> > > >>>>> On Tue, Feb 19, 2013 at 3:22 PM, Olga Natkovich <[email protected]> wrote:
> > > >>>>>
> > > >>>>>> It seems that for each Pig release we need to agree and clearly state
> > > >>>>>> which Hadoop versions it will support. I guess the main question is how
> > > >>>>>> we decide on this. Perhaps we should say that Pig no longer supports
> > > >>>>>> older Hadoop versions once the newer one has been out for at least 6-12
> > > >>>>>> months, to make sure it is stable. I don't think we can support old
> > > >>>>>> versions indefinitely. It is in everybody's interest to keep moving
> > > >>>>>> forward.
> > > >>>>>>
> > > >>>>>> Olga
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> ________________________________
> > > >>>>>> From: Prashant Kommireddi <[email protected]>
> > > >>>>>> To: [email protected]
> > > >>>>>> Sent: Tuesday, February 19, 2013 10:57 AM
> > > >>>>>> Subject: Re: pig 0.11 candidate 2 feedback: Several problems
> > > >>>>>>
> > > >>>>>> What do you guys feel about the JIRA to do with 0.20.2 compatibility
> > > >>>>>> (PIG-3194)? I am interested in discussing the strategy around backward
> > > >>>>>> compatibility as this is something that would haunt us each time we move
> > > >>>>>> to the next hadoop version. For example, we might be in a similar
> > > >>>>>> situation while moving to Hadoop 2.0, when some of the stuff might break
> > > >>>>>> for 1.0.
> > > >>>>>>
> > > >>>>>> I feel it would be good to get this JIRA fix in for 0.11, as 0.20.2 users
> > > >>>>>> might be caught unaware. Of course, I must admit there is selfish interest
> > > >>>>>> here and it's probably easier for us to have a workaround on Pig rather
> > > >>>>>> than upgrade hadoop in all our production DCs.
> > > >>>>>>
> > > >>>>>> -Prashant
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Tue, Feb 19, 2013 at 9:54 AM, Russell Jurney <[email protected]> wrote:
> > > >>>>>>
> > > >>>>>>> I think someone should step up and fix the easy ones, if possible.
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> On Tue, Feb 19, 2013 at 9:51 AM, Bill Graham <[email protected]> wrote:
> > > >>>>>>>
> > > >>>>>>>> Thanks Kai for reporting these.
> > > >>>>>>>>
> > > >>>>>>>> What do people think about the severity of these issues w.r.t. Pig 11?
> > > >>>>>>>> I see a few possible options:
> > > >>>>>>>>
> > > >>>>>>>> 1. We include some or all of these patches in a new Pig 11 rc. We'd want
> > > >>>>>>>> to make sure that they don't destabilize the current branch. This
> > > >>>>>>>> approach makes sense if we think Pig 11 wouldn't be a good release
> > > >>>>>>>> without one or more of these included.
> > > >>>>>>>>
> > > >>>>>>>> 2. We continue with the Pig 11 release without these, but then include
> > > >>>>>>>> one or more in a 0.11.1 release.
> > > >>>>>>>>
> > > >>>>>>>> 3. We continue with the Pig 11 release without these, but then include
> > > >>>>>>>> them in a 0.12 release.
> > > >>>>>>>>
> > > >>>>>>>> Jon has a patch for the MAP issue
> > > >>>>>>>> (PIG-3144<https://issues.apache.org/jira/browse/PIG-3144>)
> > > >>>>>>>> ready, which seems like the most pressing of the three to me.
> > > >>>>>>>>
> > > >>>>>>>> thanks,
> > > >>>>>>>> Bill
> > > >>>>>>>>
> > > >>>>>>>> On Mon, Feb 18, 2013 at 2:27 AM, Kai Londenberg <[email protected]> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> Hi,
> > > >>>>>>>>>
> > > >>>>>>>>> I just subscribed to the dev mailing list in order to give you some
> > > >>>>>>>>> feedback on pig 0.11 candidate 2.
> > > >>>>>>>>>
> > > >>>>>>>>> The following three issues are currently present in 0.11 candidate 2:
> > > >>>>>>>>>
> > > >>>>>>>>> https://issues.apache.org/jira/browse/PIG-3144 - 'Erroneous map entry
> > > >>>>>>>>> alias resolution leading to "Duplicate schema alias" errors'
> > > >>>>>>>>> https://issues.apache.org/jira/browse/PIG-3194 - Changes to
> > > >>>>>>>>> ObjectSerializer.java break compatibility with Hadoop 0.20.2
> > > >>>>>>>>> https://issues.apache.org/jira/browse/PIG-3195 - Race Condition in
> > > >>>>>>>>> PhysicalOperator leads to ExecException "Error while trying to get
> > > >>>>>>>>> next result in POStream"
> > > >>>>>>>>>
> > > >>>>>>>>> The last two of these are easily solvable (see the tickets for
> > > >>>>>>>>> details on that). The first one is a bit trickier, I think, but at
> > > >>>>>>>>> least there is a workaround for it (pass Map fields through a UDF).
> > > >>>>>>>>>
> > > >>>>>>>>> In my personal opinion, each of these problems is pretty severe, but
> > > >>>>>>>>> opinions about the importance of the MAP Datatype and STREAM Operator,
> > > >>>>>>>>> as well as Hadoop 0.20.2 compatibility might differ.
> > > >>>>>>>>>
> > > >>>>>>>>> so far ..
> > > >>>>>>>>>
> > > >>>>>>>>> Kai Londenberg
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> --
> > > >>>>>>>> *Note that I'm no longer using my Yahoo! email address. Please email me
> > > >>>>>>>> at [email protected] going forward.*
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> --
> > > >>>>>>> Russell Jurney twitter.com/rjurney [email protected]
> > > >>>>>>> datasyndrome.com
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> *Note that I'm no longer using my Yahoo! email address. Please email me
> > > >> at [email protected] going forward.*
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > > Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
> > >
> > >
>



-- 
Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
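
Jonathan Coveney's suggestion in the quoted thread, in-lining some Base64 code so that ObjectSerializer no longer depends on which commons-codec version is on the classpath, might look roughly like the sketch below. This is not the PIG-3194 patch and not Pig's actual code: the class name InlineBase64 and its encode/decode methods are hypothetical, and it assumes the incompatibility comes only from commons-codec API differences between Hadoop versions. It uses nothing beyond the JDK.

```java
import java.util.Arrays;

/**
 * Hypothetical, dependency-free Base64 codec (standard alphabet, '=' padding).
 * Illustrative only; not taken from Pig or from commons-codec.
 */
final class InlineBase64 {
    private static final char[] ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".toCharArray();

    // Reverse lookup table: ASCII code -> 6-bit value, or -1 if not in the alphabet.
    private static final int[] REVERSE = new int[128];
    static {
        Arrays.fill(REVERSE, -1);
        for (int i = 0; i < ALPHABET.length; i++) {
            REVERSE[ALPHABET[i]] = i;
        }
    }

    private InlineBase64() {}

    /** Encodes bytes to a padded Base64 string without line breaks. */
    static String encode(byte[] data) {
        StringBuilder out = new StringBuilder(((data.length + 2) / 3) * 4);
        for (int i = 0; i < data.length; i += 3) {
            boolean has1 = i + 1 < data.length;
            boolean has2 = i + 2 < data.length;
            int b0 = data[i] & 0xFF;
            int b1 = has1 ? data[i + 1] & 0xFF : 0;
            int b2 = has2 ? data[i + 2] & 0xFF : 0;
            out.append(ALPHABET[b0 >> 2]);
            out.append(ALPHABET[((b0 & 0x03) << 4) | (b1 >> 4)]);
            out.append(has1 ? ALPHABET[((b1 & 0x0F) << 2) | (b2 >> 6)] : '=');
            out.append(has2 ? ALPHABET[b2 & 0x3F] : '=');
        }
        return out.toString();
    }

    /** Decodes a padded Base64 string; rejects characters outside the alphabet. */
    static byte[] decode(String text) {
        if (text.length() % 4 != 0) {
            throw new IllegalArgumentException("Base64 length must be a multiple of 4");
        }
        int pad = text.endsWith("==") ? 2 : (text.endsWith("=") ? 1 : 0);
        byte[] out = new byte[text.length() / 4 * 3 - pad];
        int o = 0;
        for (int i = 0; i < text.length(); i += 4) {
            int bits = 0;
            for (int j = 0; j < 4; j++) {
                char c = text.charAt(i + j);
                // Padding contributes zero bits; this sketch does not check its position.
                int v = (c == '=') ? 0 : (c < 128 ? REVERSE[c] : -1);
                if (v < 0) {
                    throw new IllegalArgumentException("Invalid Base64 character: " + c);
                }
                bits = (bits << 6) | v;
            }
            // Each group of four characters yields up to three bytes.
            if (o < out.length) out[o++] = (byte) (bits >> 16);
            if (o < out.length) out[o++] = (byte) (bits >> 8);
            if (o < out.length) out[o++] = (byte) bits;
        }
        return out;
    }
}
```

A serializer could then call InlineBase64.encode on its serialized bytes and InlineBase64.decode on the way back, at the cost of maintaining a small piece of codec code in-tree, which is the trade-off the thread weighs against dropping Hadoop 0.20.2 support.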
