Re: pig 0.11 candidate 2 feedback: Several problems

Dmitriy Ryaboy Fri, 01 Mar 2013 19:01:24 -0800

I'd like to get the GC fix in as well, but it looks like Rohini is about to
commit it, so we are good there.


On Mar 1, 2013, at 11:33 AM, Bill Graham <[email protected]> wrote:

> +1 to releasing Pig 0.11.1 when this is addressed. I should be able to help
> with the release again.
> 
> 
> 
> On Fri, Mar 1, 2013 at 11:25 AM, Prashant Kommireddi
> <[email protected]> wrote:
> 
>> Hey Guys,
>> 
>> I wanted to start a conversation on this again. If Kai is not looking at
>> PIG-3194, I can start working on it to make 0.11 compatible with Hadoop
>> 0.20.2. If everyone agrees, we should roll out 0.11.1 sooner than usual,
>> and I volunteer to help with it in any way possible.
>> 
>> Any objections to getting 0.11.1 out soon after 3194 is fixed?
>> 
>> -Prashant
>> 
>> On Wed, Feb 20, 2013 at 3:34 PM, Russell Jurney <[email protected]> wrote:
>> 
>>> I stand corrected. Cool, 0.11 is good!
>>> 
>>> 
>>> On Wed, Feb 20, 2013 at 1:15 PM, Jarek Jarcec Cecho <[email protected]> wrote:
>>> 
>>>> Just an unrelated note: CDH3 is closer to Hadoop 1.x than to 0.20.
>>>> 
>>>> Jarcec
>>>> 
>>>> On Wed, Feb 20, 2013 at 12:04:51PM -0800, Dmitriy Ryaboy wrote:
>>>>> I agree -- this is a good release. The bugs Kai pointed out should be
>>>>> fixed, but as they are not critical regressions, we can fix them in 0.11.1
>>>>> (if someone wants to roll 0.11.1 the minute these fixes are committed, I
>>>>> won't mind and will dutifully vote for the release).
>>>>> 
>>>>> I think the Hadoop 20.2 incompatibility is unfortunate but iirc this is
>>>>> fixable by setting HADOOP_USER_CLASSPATH_FIRST=true (was that in 20.2?)
>>>>> 
>>>>> FWIW Twitter's running CDH3 and this release works in our environment.
>>>>> 
>>>>> At this point things that block a release are critical regressions in
>>>>> performance or correctness.
>>>>> 
>>>>> D
>>>>> 
>>>>> 
>>>>> On Wed, Feb 20, 2013 at 11:52 AM, Alan Gates <[email protected]> wrote:
>>>>> 
>>>>>> No.  Bugs like these are supposed to be found and fixed after we branch
>>>>>> from trunk (which happened several months ago in the case of 0.11). The
>>>>>> point of RCs is to check that it's a good build, licenses are right, etc.
>>>>>> Any bugs found this late in the game have to be seen as failures of
>>>>>> earlier testing.
>>>>>> 
>>>>>> Alan.
>>>>>> 
>>>>>> On Feb 20, 2013, at 11:33 AM, Russell Jurney wrote:
>>>>>> 
>>>>>>> Isn't the point of an RC to find and fix bugs like these?
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Feb 20, 2013 at 11:31 AM, Bill Graham <[email protected]> wrote:
>>>>>>> 
>>>>>>>> Regarding Pig 11 rc2, I propose we continue with the current vote as is
>>>>>>>> (which closes today EOD). Patches for 0.20.2 issues can be rolled into a
>>>>>>>> Pig 0.11.1 release whenever they're available and tested.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Wed, Feb 20, 2013 at 9:24 AM, Olga Natkovich <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> I agree that supporting as much as we can is a good goal. The issue is who
>>>>>>>>> is going to be testing against all these versions? We found the issues
>>>>>>>>> under discussion because of a customer report, not because we consistently
>>>>>>>>> test against all versions. Perhaps when we decide which versions to support
>>>>>>>>> for the next release, we also need to agree on who is going to be testing
>>>>>>>>> and maintaining compatibility with a particular version.
>>>>>>>>> 
>>>>>>>>> For instance, since Hadoop 23 compatibility is important for us at Yahoo, we
>>>>>>>>> have been maintaining compatibility with this version for 0.9, 0.10 and
>>>>>>>>> will do the same for 0.11 and going forward. I think we would need others
>>>>>>>>> to step in and claim the versions of their interest.
>>>>>>>>> 
>>>>>>>>> Olga
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> ________________________________
>>>>>>>>> From: Kai Londenberg <[email protected]>
>>>>>>>>> To: [email protected]
>>>>>>>>> Sent: Wednesday, February 20, 2013 1:51 AM
>>>>>>>>> Subject: Re: pig 0.11 candidate 2 feedback: Several problems
>>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> I strongly agree with Jonathan here. If there are good reasons why you
>>>>>>>>> can't support an older version of Hadoop any more, that's one thing.
>>>>>>>>> But having to change 2 lines of code doesn't really qualify as such in
>>>>>>>>> my point of view ;)
>>>>>>>>> 
>>>>>>>>> At least for me, pig support for 0.20.2 is essential - without it, I
>>>>>>>>> can't use it. If it doesn't support it, I'll have to branch pig and
>>>>>>>>> hack it myself, or stop using it.
>>>>>>>>> 
>>>>>>>>> I guess there are a lot of people still running 0.20.2 clusters. If
>>>>>>>>> you really have lots of data stored on HDFS and a continuously busy
>>>>>>>>> cluster, an upgrade is nothing you do "just because".
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 2013/2/20 Jonathan Coveney <[email protected]>:
>>>>>>>>>> I agree that we shouldn't have to support old versions forever. That said,
>>>>>>>>>> I also don't think we should be too blase about supporting older versions
>>>>>>>>>> where it is not odious to do so. We have a lot of competition in the
>>>>>>>>>> language space and the broader the versions we can support, the better
>>>>>>>>>> (assuming it isn't too odious to do so). In this case, I don't think it
>>>>>>>>>> should be too hard to change ObjectSerializer so that the commons-codec
>>>>>>>>>> code used is compatible with both versions...we could just in-line some of
>>>>>>>>>> the Base64 code, and comment accordingly.
>>>>>>>>>> 
>>>>>>>>>> That said, we also should be clear about what versions we support, but 6-12
>>>>>>>>>> months seems short. The upgrade cycles on Hadoop are really, really long.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 2013/2/20 Prashant Kommireddi <[email protected]>
>>>>>>>>>> 
>>>>>>>>>>> Agreed, that makes sense. Probably supporting an older Hadoop version for 1
>>>>>>>>>>> or 2 Pig releases before moving to a newer/stable version?
>>>>>>>>>>> 
>>>>>>>>>>> Having said that, should we use the 0.11 period to communicate the same to
>>>>>>>>>>> the community and start moving on from 0.12 onwards? I know we are way past
>>>>>>>>>>> the 6-12 month (1-2 release) time frame with 0.20.2, but we also need to
>>>>>>>>>>> make sure users are aware and plan accordingly.
>>>>>>>>>>> 
>>>>>>>>>>> I'd also be interested to hear how other projects (Hive, Oozie) are
>>>>>>>>>>> handling this.
>>>>>>>>>>> 
>>>>>>>>>>> -Prashant
>>>>>>>>>>> 
>>>>>>>>>>> On Tue, Feb 19, 2013 at 3:22 PM, Olga Natkovich <[email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> It seems that for each Pig release we need to agree on and clearly state
>>>>>>>>>>>> which Hadoop versions it will support. I guess the main question is how we
>>>>>>>>>>>> decide on this. Perhaps we should say that Pig no longer supports older
>>>>>>>>>>>> Hadoop versions once the newer one has been out for at least 6-12 months,
>>>>>>>>>>>> to make sure it is stable. I don't think we can support old versions
>>>>>>>>>>>> indefinitely. It is in everybody's interest to keep moving forward.
>>>>>>>>>>>> 
>>>>>>>>>>>> Olga
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> ________________________________
>>>>>>>>>>>> From: Prashant Kommireddi <[email protected]>
>>>>>>>>>>>> To: [email protected]
>>>>>>>>>>>> Sent: Tuesday, February 19, 2013 10:57 AM
>>>>>>>>>>>> Subject: Re: pig 0.11 candidate 2 feedback: Several problems
>>>>>>>>>>>> 
>>>>>>>>>>>> What do you guys feel about the JIRA to do with 0.20.2 compatibility
>>>>>>>>>>>> (PIG-3194)? I am interested in discussing the strategy around backward
>>>>>>>>>>>> compatibility, as this is something that would haunt us each time we move
>>>>>>>>>>>> to the next Hadoop version. For example, we might be in a similar situation
>>>>>>>>>>>> while moving to Hadoop 2.0, when some of the stuff might break for 1.0.
>>>>>>>>>>>> 
>>>>>>>>>>>> I feel it would be good to get this JIRA fix in for 0.11, as 0.20.2 users
>>>>>>>>>>>> might be caught unaware. Of course, I must admit there is selfish interest
>>>>>>>>>>>> here and it's probably easier for us to have a workaround on Pig rather
>>>>>>>>>>>> than upgrade hadoop in all our production DCs.
>>>>>>>>>>>> 
>>>>>>>>>>>> -Prashant
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On Tue, Feb 19, 2013 at 9:54 AM, Russell Jurney <[email protected]> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> I think someone should step up and fix the easy ones, if possible.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Tue, Feb 19, 2013 at 9:51 AM, Bill Graham <[email protected]> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks Kai for reporting these.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> What do people think about the severity of these issues w.r.t. Pig 11? I
>>>>>>>>>>>>>> see a few possible options:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 1. We include some or all of these patches in a new Pig 11 rc. We'd want to
>>>>>>>>>>>>>> make sure that they don't destabilize the current branch. This approach
>>>>>>>>>>>>>> makes sense if we think Pig 11 wouldn't be a good release without one or
>>>>>>>>>>>>>> more of these included.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 2. We continue with the Pig 11 release without these, but then include one
>>>>>>>>>>>>>> or more in a 0.11.1 release.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 3. We continue with the Pig 11 release without these, but then include them
>>>>>>>>>>>>>> in a 0.12 release.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Jon has a patch for the MAP issue
>>>>>>>>>>>>>> (PIG-3144 <https://issues.apache.org/jira/browse/PIG-3144>)
>>>>>>>>>>>>>> ready, which seems like the most pressing of the three to me.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> thanks,
>>>>>>>>>>>>>> Bill
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Mon, Feb 18, 2013 at 2:27 AM, Kai Londenberg <
>>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I just subscribed to the dev mailing list in order to give you some
>>>>>>>>>>>>>>> feedback on pig 0.11 candidate 2.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The following three issues are currently present in 0.11 candidate 2:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/PIG-3144 - 'Erroneous map entry
>>>>>>>>>>>>>>> alias resolution leading to "Duplicate schema alias" errors'
>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/PIG-3194 - Changes to
>>>>>>>>>>>>>>> ObjectSerializer.java break compatibility with Hadoop 0.20.2
>>>>>>>>>>>>>>> https://issues.apache.org/jira/browse/PIG-3195 - Race Condition in
>>>>>>>>>>>>>>> PhysicalOperator leads to ExecException "Error while trying to get
>>>>>>>>>>>>>>> next result in POStream"
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> The last two of these are easily solvable (see the tickets for
>>>>>>>>>>>>>>> details on that). The first one is a bit trickier I think, but at
>>>>>>>>>>>>>>> least there is a workaround for it (pass Map fields through a UDF).
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> In my personal opinion, each of these problems is pretty severe, but
>>>>>>>>>>>>>>> opinions about the importance of the MAP Datatype and STREAM Operator,
>>>>>>>>>>>>>>> as well as Hadoop 0.20.2 compatibility might differ.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> so far ..
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Kai Londenberg
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> *Note that I'm no longer using my Yahoo! email address. Please email me
>>>>>>>>>>>>>> at [email protected] going forward.*
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Russell Jurney twitter.com/rjurney [email protected]
>>>>>>>>>>>>> datasyndrome.com
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --
>>>>>>>> *Note that I'm no longer using my Yahoo! email address. Please email me
>>>>>>>> at [email protected] going forward.*
>>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> --
>>>>>>> Russell Jurney twitter.com/rjurney [email protected]
>>>>>>> datasyndrome.com
>>>>>> 
>>>>>> 
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Russell Jurney twitter.com/rjurney [email protected]
>>> datasyndrome.com
>>> 
>>
