Re: Protocol Buffers version

2015-06-18 Thread Alan Burlison

On 16/06/2015 10:54, Steve Loughran wrote:


>> One reason at least: PB 2.5.0 has no support for Solaris SPARC. 2.6.1 does.
>
> to be ruthless, that's not enough reason to upgrade branch-2, due to the 
> transitive pain it makes all the way down.


I completely get your point, however we are faced with two pretty 
equally unpalatable options, either fork PB 2.5.0 and add support for 
Solaris SPARC or switch to 2.6.1.


Although, as I've found out, even though 2.6.1 claims to support Solaris 
SPARC, it doesn't, and needs a patch (albeit a small one) to get it to 
work :-/ From what I can gather, cross-platform support in PB breaks 
fairly regularly.


--
Alan Burlison
--


Re: Protocol Buffers version

2015-06-16 Thread Allen Wittenauer

On Jun 16, 2015, at 2:54 AM, Steve Loughran  wrote:

 
>> One reason at least: PB 2.5.0 has no support for Solaris SPARC. 2.6.1 does.
> 
> to be ruthless, that's not enough reason to upgrade branch-2, due to the 
> transitive pain it makes all the way down.

Not in branch-2, but certainly in trunk.  

Re: Protocol Buffers version

2015-06-16 Thread Steve Loughran

> On 15 Jun 2015, at 22:31, Colin P. McCabe  wrote:
> 
> On Mon, Jun 15, 2015 at 7:24 AM, Allen Wittenauer  wrote:
>> 
>> On Jun 12, 2015, at 1:03 PM, Alan Burlison  wrote:
>> 
>>> On 14/05/2015 18:41, Chris Nauroth wrote:
>>> 
>>>> As a reminder though, the community probably would want to see a strong
>>>> justification for the upgrade in terms of features or performance or
>>>> something else.  Right now, I'm not seeing a significant benefit for us
>>>> based on my reading of their release notes.  I think it's worthwhile to
>>>> figure this out first.  Otherwise, there is a risk that any testing work
>>>> turns out to be a wasted effort.
>>> 
>>> One reason at least: PB 2.5.0 has no support for Solaris SPARC. 2.6.1 does.

to be ruthless, that's not enough reason to upgrade branch-2, due to the 
transitive pain it makes all the way down.

>> 
>> 
>> That's a pretty good reason.
>> 
>> Some of us had a discussion at Summit about effectively forking 
>> protobuf and making it an Apache TLP.  This would give us a chance to get 
>> out from under Google's blind spot, guarantee better compatibility across 
>> the ecosystem, etc, etc.
>> 
>> It is sounding more and more like that's really what needs to happen.
> 
> I agree that it would be nice if the protobuf project avoided making
> backwards-incompatible API changes within a minor release.  But in
> practice, we have had the same issues with Jackson, Guava, jets3t, and
> other dependencies.  Nearly every important Hadoop dependency has made
> backwards-incompatible API changes within a minor release of the
> dependency... and that's one reason we are using such old versions of
> everything.  I don't think PB deserves to be singled out as much as it
> has been.

I think it does deserve it, as it was such an all-or-nothing change. Guava, well, 
we may keep it at 11.0, but we've made sure there are no classes used which 
aren't in the latest versions. Even where we depend on artifacts which need 
later versions (curator-2.7.1), we've addressed the version problem by verifying 
that you can actually rebuild curator with guava<-11.0 with everything working 
(curator-x-discovery doesn't compile, but we don't use that). So we know that 
unless a bit of curator uses reflection, we can run it against 11.x. And if 
someone wants to use a later version of Guava + hadoop-common, they can swap it 
in and hadoop will still work. Which is important, as on Java 8u45+ you do need 
a recent Guava.

In contrast, protobuf needed a co-ordinated update across everything: every 
project which had checked in their generated protobuf files had to rebuild and 
check in, which guaranteed they could no longer work with protobuf 2.4.

Jackson? Its broken-ness wasn't so obvious: if we'd known, I wouldn't have let 
it be updated. It's now on the risk list and I don't see us updating that for a 
long time.

>  I think the work going on now to implement CLASSPATH
> isolation in Hadoop will really be beneficial here because we will be
> able to upgrade without worrying about these problems.


+1


Re: Protocol Buffers version

2015-06-15 Thread Colin P. McCabe
On Mon, Jun 15, 2015 at 7:24 AM, Allen Wittenauer  wrote:
>
> On Jun 12, 2015, at 1:03 PM, Alan Burlison  wrote:
>
>> On 14/05/2015 18:41, Chris Nauroth wrote:
>>
>>> As a reminder though, the community probably would want to see a strong
>>> justification for the upgrade in terms of features or performance or
>>> something else.  Right now, I'm not seeing a significant benefit for us
>>> based on my reading of their release notes.  I think it's worthwhile to
>>> figure this out first.  Otherwise, there is a risk that any testing work
>>> turns out to be a wasted effort.
>>
>> One reason at least: PB 2.5.0 has no support for Solaris SPARC. 2.6.1 does.
>
>
> That's a pretty good reason.
>
> Some of us had a discussion at Summit about effectively forking 
> protobuf and making it an Apache TLP.  This would give us a chance to get out 
> from under Google's blind spot, guarantee better compatibility across the 
> ecosystem, etc, etc.
>
> It is sounding more and more like that's really what needs to happen.

I agree that it would be nice if the protobuf project avoided making
backwards-incompatible API changes within a minor release.  But in
practice, we have had the same issues with Jackson, Guava, jets3t, and
other dependencies.  Nearly every important Hadoop dependency has made
backwards-incompatible API changes within a minor release of the
dependency... and that's one reason we are using such old versions of
everything.  I don't think PB deserves to be singled out as much as it
has been.  I think the work going on now to implement CLASSPATH
isolation in Hadoop will really be beneficial here because we will be
able to upgrade without worrying about these problems.

cheers,
Colin


Re: Protocol Buffers version

2015-06-15 Thread Roman Shaposhnik
On Mon, Jun 15, 2015 at 8:57 AM, Andrew Purtell  wrote:
> I can't answer the original question but can point out the protostuff (
> https://github.com/protostuff/protostuff) folks have been responsive and
> friendly in the past when we (HBase) were curious about swapping in their
> stuff. Two significant benefits of protostuff, IMHO, are ASL 2 licensing and
> that everything is implemented in Java, including the compiler.

Big +1 to protostuff from community, licensing and implementation perspectives.

Thanks,
Roman.


Re: Protocol Buffers version

2015-06-15 Thread Andrew Purtell
I can't answer the original question but can point out the protostuff (
https://github.com/protostuff/protostuff) folks have been responsive and
friendly in the past when we (HBase) were curious about swapping in their
stuff. Two significant benefits of protostuff, IMHO, are ASL 2 licensing and
that everything is implemented in Java, including the compiler.


On Mon, Jun 15, 2015 at 8:49 AM, Sean Busbey  wrote:

> Anyone have a read on how the protobuf folks would feel about that? Apache
> has a history of not accepting projects that are non-amicable forks.
>
> On Mon, Jun 15, 2015 at 9:24 AM, Allen Wittenauer 
> wrote:
>
> >
> > On Jun 12, 2015, at 1:03 PM, Alan Burlison 
> > wrote:
> >
> > > On 14/05/2015 18:41, Chris Nauroth wrote:
> > >
> > >> As a reminder though, the community probably would want to see a strong
> > >> justification for the upgrade in terms of features or performance or
> > >> something else.  Right now, I'm not seeing a significant benefit for us
> > >> based on my reading of their release notes.  I think it's worthwhile to
> > >> figure this out first.  Otherwise, there is a risk that any testing work
> > >> turns out to be a wasted effort.
> > >
> > > One reason at least: PB 2.5.0 has no support for Solaris SPARC. 2.6.1 does.
> >
> >
> > That's a pretty good reason.
> >
> > Some of us had a discussion at Summit about effectively forking
> > protobuf and making it an Apache TLP.  This would give us a chance to get
> > out from under Google's blind spot, guarantee better compatibility across
> > the ecosystem, etc, etc.
> >
> > It is sounding more and more like that's really what needs to
> > happen.
>
>
>
>
> --
> Sean
>



-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


Re: Protocol Buffers version

2015-06-15 Thread Sean Busbey
Anyone have a read on how the protobuf folks would feel about that? Apache
has a history of not accepting projects that are non-amicable forks.

On Mon, Jun 15, 2015 at 9:24 AM, Allen Wittenauer  wrote:

>
> On Jun 12, 2015, at 1:03 PM, Alan Burlison 
> wrote:
>
> > On 14/05/2015 18:41, Chris Nauroth wrote:
> >
> >> As a reminder though, the community probably would want to see a strong
> >> justification for the upgrade in terms of features or performance or
> >> something else.  Right now, I'm not seeing a significant benefit for us
> >> based on my reading of their release notes.  I think it's worthwhile to
> >> figure this out first.  Otherwise, there is a risk that any testing work
> >> turns out to be a wasted effort.
> >
> > One reason at least: PB 2.5.0 has no support for Solaris SPARC. 2.6.1 does.
>
>
> That's a pretty good reason.
>
> Some of us had a discussion at Summit about effectively forking
> protobuf and making it an Apache TLP.  This would give us a chance to get
> out from under Google's blind spot, guarantee better compatibility across
> the ecosystem, etc, etc.
>
> It is sounding more and more like that's really what needs to
> happen.




-- 
Sean


Re: Protocol Buffers version

2015-06-15 Thread Allen Wittenauer

On Jun 12, 2015, at 1:03 PM, Alan Burlison  wrote:

> On 14/05/2015 18:41, Chris Nauroth wrote:
> 
>> As a reminder though, the community probably would want to see a strong
>> justification for the upgrade in terms of features or performance or
>> something else.  Right now, I'm not seeing a significant benefit for us
>> based on my reading of their release notes.  I think it's worthwhile to
>> figure this out first.  Otherwise, there is a risk that any testing work
>> turns out to be a wasted effort.
> 
> One reason at least: PB 2.5.0 has no support for Solaris SPARC. 2.6.1 does.


That's a pretty good reason.

Some of us had a discussion at Summit about effectively forking 
protobuf and making it an Apache TLP.  This would give us a chance to get out 
from under Google's blind spot, guarantee better compatibility across the 
ecosystem, etc, etc.

It is sounding more and more like that's really what needs to happen.

Re: Protocol Buffers version

2015-06-12 Thread Alan Burlison

On 14/05/2015 18:41, Chris Nauroth wrote:


> As a reminder though, the community probably would want to see a strong
> justification for the upgrade in terms of features or performance or
> something else.  Right now, I'm not seeing a significant benefit for us
> based on my reading of their release notes.  I think it's worthwhile to
> figure this out first.  Otherwise, there is a risk that any testing work
> turns out to be a wasted effort.


One reason at least: PB 2.5.0 has no support for Solaris SPARC. 2.6.1 does.

--
Alan Burlison
--


Re: Protocol Buffers version

2015-05-20 Thread Steve Loughran

> On 19 May 2015, at 17:59, Colin P. McCabe  wrote:
> 
> I agree that the protobuf 2.4.1 -> 2.5.0 transition could have been
> handled a lot better by Google.  Specifically, since it was an
> API-breaking upgrade, it should have been a major version bump for the
> Java library version.  I also feel that removing the download links
> for the old versions of the native libraries was careless, and
> certainly burned some of our Hadoop users.
> 
> However, I don't see any reason to believe that protobuf 2.6 will not
> be wire-compatible with earlier versions.  Google has actually been
> pretty good about preserving wire-compatibility... just not about API
> compatibility.  If we want to get a formal statement from the project,
> we can, but I would be pretty shocked if they decided to change the
> protocol in a backwards-incompatible way in a minor version release.

that's what they have done well: wire formats don't break (though you have the 
freedom to do that by adding new non-optional fields)

Of course, they do have the standard service problems then of (a) downgrading 
if optional fields are omitted and (b) maintaining semantics over time. They 
just have that at a bigger scale than the rest of us.
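
To make the wire-compatibility point concrete, here is a minimal sketch of why an older reader can usually cope with a message from a newer writer. It hand-rolls the two wire types it needs (varint and length-delimited) using only the JDK; it is not the protobuf library, and the class name `WireFormatSketch`, the method names, and the field numbers are all assumptions made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustrative only: a hand-written subset of the protobuf wire format,
// not code from protobuf or Hadoop.
public class WireFormatSketch {
    public static final int WIRETYPE_VARINT = 0;
    public static final int WIRETYPE_LENGTH_DELIMITED = 2;

    // Base-128 varint: 7 payload bits per byte, high bit set on all but the last byte.
    public static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // A field key is (field number << 3) | wire type, itself written as a varint.
    public static void writeTag(ByteArrayOutputStream out, int fieldNumber, int wireType) {
        writeVarint(out, ((long) fieldNumber << 3) | wireType);
    }

    // "Newer" writer: field 1 (varint id) plus a later-added field 2 (string note).
    public static byte[] encodeNew(long id, String note) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeTag(out, 1, WIRETYPE_VARINT);
        writeVarint(out, id);
        byte[] noteBytes = note.getBytes(StandardCharsets.UTF_8);
        writeTag(out, 2, WIRETYPE_LENGTH_DELIMITED);
        writeVarint(out, noteBytes.length);
        out.write(noteBytes, 0, noteBytes.length);
        return out.toByteArray();
    }

    public static long readVarint(byte[] data, int[] pos) {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = data[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // "Older" reader: only knows field 1 and skips any field it does not recognise.
    public static long decodeOld(byte[] data) {
        int[] pos = {0};
        long id = 0;
        while (pos[0] < data.length) {
            long tag = readVarint(data, pos);
            int fieldNumber = (int) (tag >>> 3);
            int wireType = (int) (tag & 7);
            if (fieldNumber == 1 && wireType == WIRETYPE_VARINT) {
                id = readVarint(data, pos);
            } else if (wireType == WIRETYPE_VARINT) {
                readVarint(data, pos);                 // unknown varint field: skip
            } else if (wireType == WIRETYPE_LENGTH_DELIMITED) {
                pos[0] += (int) readVarint(data, pos); // unknown bytes field: skip
            } else {
                throw new IllegalArgumentException("wire type not handled in this sketch: " + wireType);
            }
        }
        return id;
    }

    public static void main(String[] args) {
        byte[] message = encodeNew(150L, "hi");
        System.out.println("old reader sees id = " + decodeOld(message));
    }
}
```

Because each field carries its own number and wire type, the older reader can step over field 2 without knowing what it means. The same mechanism is why a new non-optional field breaks things: a reader that insists on the field being present will reject messages from writers that never send it.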

the 2.4/2.5 switch showed the trouble of using code from a company capable of 
doing a whole-stack rebuild overnight. They can update a dependency 
(protobuf.jar, guava.jar) and have it picked up in the binaries. We don't have 
that luxury.

> 
> I do think there are some potential issues for our users of bumping
> the library version in a minor Hadoop release.  Until we implement
> full dependency isolation for Hadoop, there may be some disruptions to
> end-users from changing Java dependency versions.  Similarly, users
> will need to install a new native protobuf library version as well.
> So I think we should bump the protobuf versions in Hadoop 3.0, but not
> in 2.x.

+1, though I do fear the more things we put off until "3.0", the bigger that 
switch and so the harder the adoption.

FWIW, one area I do find hard with protobuf is trying to set message fields 
through reflection. That is, I want code that will link against, say, the 
Hadoop 2.6 binaries, but if there are the extra fields for a 2.7 message, to 
use them. Deep down in the internals, protobuf should let me do this -but not 
at the java API level.
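
As a rough illustration of the "use the extra field only if it is there" idea, the sketch below uses plain `java.lang.reflect` against two stand-in builder classes. Nothing here is protobuf-generated code or a Hadoop API: `OldRequestBuilder`, `NewRequestBuilder`, `setPriority`, and `trySetString` are hypothetical names invented for the example, and real generated builders may need more care (for instance, non-String field types).

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Illustrative only: the builder classes are hypothetical stand-ins,
// not classes generated by protoc.
public class OptionalFieldSetter {

    /** Stand-in for a builder compiled from the older message definition. */
    public static class OldRequestBuilder {
        private String name;
        public OldRequestBuilder setName(String name) { this.name = name; return this; }
        public String getName() { return name; }
    }

    /** Stand-in for the same builder after a later release added a field. */
    public static class NewRequestBuilder extends OldRequestBuilder {
        private String priority;
        public NewRequestBuilder setPriority(String priority) { this.priority = priority; return this; }
        public String getPriority() { return priority; }
    }

    /**
     * Invokes a public single-argument String setter if the builder has one.
     * Returns true if the setter was found and called, false if it is absent.
     */
    public static boolean trySetString(Object builder, String setterName, String value) {
        try {
            Method setter = builder.getClass().getMethod(setterName, String.class);
            setter.invoke(builder, value);
            return true;
        } catch (NoSuchMethodException e) {
            return false; // older binaries on the classpath: quietly skip the field
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException("setter " + setterName + " failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(trySetString(new OldRequestBuilder(), "setPriority", "high"));
        System.out.println(trySetString(new NewRequestBuilder(), "setPriority", "high"));
    }
}
```

The calling code compiles against only `Object` and a method name, so it links whichever version of the builder is on the classpath. The cost is that a misspelt setter name is silently treated as "field not present", so the names want to live in one place and be covered by tests.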


Re: Protocol Buffers version

2015-05-19 Thread Sangjin Lee
I pushed it out to a github fork:
https://github.com/sjlee/protobuf/tree/2.5.0-incompatibility

We haven't observed other compatibility issues than these.

On Tue, May 19, 2015 at 10:05 PM, Chris Nauroth 
wrote:

> Thanks, Sangjin.  I'd be interested in taking a peek at a personal GitHub
> repo or even just a patch file of those changes.  If there were
> incompatibilities, then that doesn't bode well for an upgrade to 2.6.
>
> --Chris Nauroth
>
>
>
>
> On 5/19/15, 8:40 PM, "Sangjin Lee"  wrote:
>
> >When we moved to Hadoop 2.4, the associated protobuf upgrade (2.4.1 ->
> >2.5.0) proved to be one of the bigger problems. In our case, most of our
> >users were using protobuf 2.4.x or earlier.
> >
> >We identified a couple of places where the backward compatibility was
> >broken, and patched for those issues. We've been running with that patched
> >version of protobuf 2.5.0 since. I can push out those changes to github or
> >something if others are interested FWIW.
> >
> >Regards,
> >Sangjin
> >
> >On Tue, May 19, 2015 at 9:59 AM, Colin P. McCabe 
> >wrote:
> >
> >> I agree that the protobuf 2.4.1 -> 2.5.0 transition could have been
> >> handled a lot better by Google.  Specifically, since it was an
> >> API-breaking upgrade, it should have been a major version bump for the
> >> Java library version.  I also feel that removing the download links
> >> for the old versions of the native libraries was careless, and
> >> certainly burned some of our Hadoop users.
> >>
> >> However, I don't see any reason to believe that protobuf 2.6 will not
> >> be wire-compatible with earlier versions.  Google has actually been
> >> pretty good about preserving wire-compatibility... just not about API
> >> compatibility.  If we want to get a formal statement from the project,
> >> we can, but I would be pretty shocked if they decided to change the
> >> protocol in a backwards-incompatible way in a minor version release.
> >>
> >> I do think there are some potential issues for our users of bumping
> >> the library version in a minor Hadoop release.  Until we implement
> >> full dependency isolation for Hadoop, there may be some disruptions to
> >> end-users from changing Java dependency versions.  Similarly, users
> >> will need to install a new native protobuf library version as well.
> >> So I think we should bump the protobuf versions in Hadoop 3.0, but not
> >> in 2.x.
> >>
> >> cheers,
> >> Colin
> >>
> >> On Fri, May 15, 2015 at 4:55 AM, Alan Burlison
> >>
> >> wrote:
> >> > On 15/05/2015 09:44, Steve Loughran wrote:
> >> >
> >> >> Now: why do you want to use a later version of protobuf.jar? Is it
> >> >> because "it is there"? Or is there a tangible need?
> >> >
> >> >
> >> > No, it's because I'm looking at this from a platform perspective: We have
> >> > other consumers of ProtoBuf beside Hadoop and we'd obviously like to
> >> > minimise the versions of PB that we ship, and preferably just ship the
> >> > latest version. The fact that PB seems to often be incompatible across
> >> > releases is an issue as it makes upgrading and dropping older versions
> >> > problematic.
> >> >
> >> > --
> >> > Alan Burlison
> >> > --
> >>
>
>


Re: Protocol Buffers version

2015-05-19 Thread Chris Nauroth
Thanks, Sangjin.  I'd be interested in taking a peek at a personal GitHub
repo or even just a patch file of those changes.  If there were
incompatibilities, then that doesn't bode well for an upgrade to 2.6.

--Chris Nauroth




On 5/19/15, 8:40 PM, "Sangjin Lee"  wrote:

>When we moved to Hadoop 2.4, the associated protobuf upgrade (2.4.1 ->
>2.5.0) proved to be one of the bigger problems. In our case, most of our
>users were using protobuf 2.4.x or earlier.
>
>We identified a couple of places where the backward compatibility was
>broken, and patched for those issues. We've been running with that patched
>version of protobuf 2.5.0 since. I can push out those changes to github or
>something if others are interested FWIW.
>
>Regards,
>Sangjin
>
>On Tue, May 19, 2015 at 9:59 AM, Colin P. McCabe 
>wrote:
>
>> I agree that the protobuf 2.4.1 -> 2.5.0 transition could have been
>> handled a lot better by Google.  Specifically, since it was an
>> API-breaking upgrade, it should have been a major version bump for the
>> Java library version.  I also feel that removing the download links
>> for the old versions of the native libraries was careless, and
>> certainly burned some of our Hadoop users.
>>
>> However, I don't see any reason to believe that protobuf 2.6 will not
>> be wire-compatible with earlier versions.  Google has actually been
>> pretty good about preserving wire-compatibility... just not about API
>> compatibility.  If we want to get a formal statement from the project,
>> we can, but I would be pretty shocked if they decided to change the
>> protocol in a backwards-incompatible way in a minor version release.
>>
>> I do think there are some potential issues for our users of bumping
>> the library version in a minor Hadoop release.  Until we implement
>> full dependency isolation for Hadoop, there may be some disruptions to
>> end-users from changing Java dependency versions.  Similarly, users
>> will need to install a new native protobuf library version as well.
>> So I think we should bump the protobuf versions in Hadoop 3.0, but not
>> in 2.x.
>>
>> cheers,
>> Colin
>>
>> On Fri, May 15, 2015 at 4:55 AM, Alan Burlison
>>
>> wrote:
>> > On 15/05/2015 09:44, Steve Loughran wrote:
>> >
>> >> Now: why do you want to use a later version of protobuf.jar? Is it
>> >> because "it is there"? Or is there a tangible need?
>> >
>> >
>> > No, it's because I'm looking at this from a platform perspective: We have
>> > other consumers of ProtoBuf beside Hadoop and we'd obviously like to
>> > minimise the versions of PB that we ship, and preferably just ship the
>> > latest version. The fact that PB seems to often be incompatible across
>> > releases is an issue as it makes upgrading and dropping older versions
>> > problematic.
>> >
>> > --
>> > Alan Burlison
>> > --
>>



Re: Protocol Buffers version

2015-05-19 Thread Sangjin Lee
When we moved to Hadoop 2.4, the associated protobuf upgrade (2.4.1 ->
2.5.0) proved to be one of the bigger problems. In our case, most of our
users were using protobuf 2.4.x or earlier.

We identified a couple of places where the backward compatibility was
broken, and patched for those issues. We've been running with that patched
version of protobuf 2.5.0 since. I can push out those changes to github or
something if others are interested FWIW.

Regards,
Sangjin

On Tue, May 19, 2015 at 9:59 AM, Colin P. McCabe  wrote:

> I agree that the protobuf 2.4.1 -> 2.5.0 transition could have been
> handled a lot better by Google.  Specifically, since it was an
> API-breaking upgrade, it should have been a major version bump for the
> Java library version.  I also feel that removing the download links
> for the old versions of the native libraries was careless, and
> certainly burned some of our Hadoop users.
>
> However, I don't see any reason to believe that protobuf 2.6 will not
> be wire-compatible with earlier versions.  Google has actually been
> pretty good about preserving wire-compatibility... just not about API
> compatibility.  If we want to get a formal statement from the project,
> we can, but I would be pretty shocked if they decided to change the
> protocol in a backwards-incompatible way in a minor version release.
>
> I do think there are some potential issues for our users of bumping
> the library version in a minor Hadoop release.  Until we implement
> full dependency isolation for Hadoop, there may be some disruptions to
> end-users from changing Java dependency versions.  Similarly, users
> will need to install a new native protobuf library version as well.
> So I think we should bump the protobuf versions in Hadoop 3.0, but not
> in 2.x.
>
> cheers,
> Colin
>
> On Fri, May 15, 2015 at 4:55 AM, Alan Burlison 
> wrote:
> > On 15/05/2015 09:44, Steve Loughran wrote:
> >
> >> Now: why do you want to use a later version of protobuf.jar? Is it
> >> because "it is there"? Or is there a tangible need?
> >
> >
> > No, it's because I'm looking at this from a platform perspective: We have
> > other consumers of ProtoBuf beside Hadoop and we'd obviously like to
> > minimise the versions of PB that we ship, and preferably just ship the
> > latest version. The fact that PB seems to often be incompatible across
> > releases is an issue as it makes upgrading and dropping older versions
> > problematic.
> >
> > --
> > Alan Burlison
> > --
>


Re: Protocol Buffers version

2015-05-19 Thread Colin P. McCabe
I agree that the protobuf 2.4.1 -> 2.5.0 transition could have been
handled a lot better by Google.  Specifically, since it was an
API-breaking upgrade, it should have been a major version bump for the
Java library version.  I also feel that removing the download links
for the old versions of the native libraries was careless, and
certainly burned some of our Hadoop users.

However, I don't see any reason to believe that protobuf 2.6 will not
be wire-compatible with earlier versions.  Google has actually been
pretty good about preserving wire-compatibility... just not about API
compatibility.  If we want to get a formal statement from the project,
we can, but I would be pretty shocked if they decided to change the
protocol in a backwards-incompatible way in a minor version release.

I do think there are some potential issues for our users of bumping
the library version in a minor Hadoop release.  Until we implement
full dependency isolation for Hadoop, there may be some disruptions to
end-users from changing Java dependency versions.  Similarly, users
will need to install a new native protobuf library version as well.
So I think we should bump the protobuf versions in Hadoop 3.0, but not
in 2.x.

cheers,
Colin

On Fri, May 15, 2015 at 4:55 AM, Alan Burlison  wrote:
> On 15/05/2015 09:44, Steve Loughran wrote:
>
>> Now: why do you want to use a later version of protobuf.jar? Is it
>> because "it is there"? Or is there a tangible need?
>
>
> No, it's because I'm looking at this from a platform perspective: We have
> other consumers of ProtoBuf beside Hadoop and we'd obviously like to
> minimise the versions of PB that we ship, and preferably just ship the
> latest version. The fact that PB seems to often be incompatible across
> releases is an issue as it makes upgrading and dropping older versions
> problematic.
>
> --
> Alan Burlison
> --


Re: Protocol Buffers version

2015-05-15 Thread Alan Burlison

On 15/05/2015 09:44, Steve Loughran wrote:


> Now: why do you want to use a later version of protobuf.jar? Is it
> because "it is there"? Or is there a tangible need?


No, it's because I'm looking at this from a platform perspective: We 
have other consumers of ProtoBuf beside Hadoop and we'd obviously like 
to minimise the versions of PB that we ship, and preferably just ship 
the latest version. The fact that PB seems to often be incompatible 
across releases is an issue as it makes upgrading and dropping older 
versions problematic.


--
Alan Burlison
--


Re: Protocol Buffers version

2015-05-15 Thread Steve Loughran

On 14 May 2015, at 15:23, Alan Burlison wrote:

> I think bundling or forking is the only practical option. I was looking to see 
> if we could provide ProtocolBuffers as an installable option on our platform, 
> if it's a version-compatibility nightmare as you say, that's going to be 
> difficult as we really don't want to have to provide multiple versions.

The problem Hadoop has is that its code, especially the HDFS client code, is 
used in a lot of other applications, and they end up having to be in sync at the 
Java level. Hopefully the protobuf wire format is compatible (that is the whole 
point of the format, after all), but we know from experience that at the JAR 
level it isn't. Having to rebuild every single .proto-derived Java class and then 
switch across the entire dependency tree was the upgrade path there, with about 
a month where getting the trunk versions of two apps to link was pretty hit and 
miss.

I think everyone came out burned from that:
 - scared and unwilling to repeat the experience
 - not believing any further google assertions of library compatibility (see 
also: guava)

What to do?

  1.  Leave it alone and it slowly ages; when an upgrade happens it can be more 
traumatic. But until that time: nothing breaks.
  2.  Upgrade regularly and you can dramatically break things, so people don't 
upgrade Hadoop itself; they stick with old versions (with issues already fixed 
in the later releases), they keep on requesting backported fixes into the 
"working" branch and you end up with two branches of your code to maintain.
  3.  Fork and you take on maintenance costs of your forked library forever; it 
will implicitly age and there's the opportunity cost of that work, i.e. better 
things to waste your time on.
  4.  Rip out protobuf entirely and switch to something else (thrift) that has 
better stability, tag the proto channels as deprecated, etc, etc. You'd better 
trust the successor's stability and security features before going to that 
effort.

Hadoop 2.x has defaulted to option (1).

Now: why do you want to use a later version of protobuf.jar? Is it because "it 
is there"? Or is there a tangible need?

-steve


Re: Protocol Buffers version

2015-05-14 Thread Chris Nauroth
Thanks for that link, Alan.  That looks like a useful site!

Ideally, the Protocol Buffers project would give a clear statement about
wire compatibility between 2.5.0 and 2.6.1.  Unfortunately, I can't find
that anywhere.  If it's not documented, then it's probably worth following
up on the Protocol Buffers support lists to ask them.

One thing we could try is starting up a mix of Hadoop processes using
2.5.0 and 2.6.1 to see how it goes.  We've made a commitment to both
forward and backward compatibility within Hadoop 2.x, so we'd need a 2.5.0
client to be able to talk to a 2.6.1 server, and we'd need a 2.6.1 client
to be able to talk to a 2.5.0 server.  Even if this appears to go well, I
wouldn't consider it a substitute for a formal statement of the
compatibility policy from the Protocol Buffers project.  Otherwise, there
might be some subtle lurking issue that we miss in our initial testing.

As a reminder though, the community probably would want to see a strong
justification for the upgrade in terms of features or performance or
something else.  Right now, I'm not seeing a significant benefit for us
based on my reading of their release notes.  I think it's worthwhile to
figure this out first.  Otherwise, there is a risk that any testing work
turns out to be a wasted effort.

--Chris Nauroth




On 5/14/15, 7:23 AM, "Alan Burlison"  wrote:

>On 13/05/2015 17:13, Chris Nauroth wrote:
>
>> It was important to complete this upgrade before Hadoop 2.x came out of
>> beta.  After that, we committed to a policy of backwards-compatibility
>> within the 2.x release line.  I can't find a statement about whether or
>> not Protocol Buffers 2.6.1 is backwards-compatible with 2.5.0 (both at
>> compile time and on the wire).  Do you know the answer?  If it's
>> backwards-incompatible, then we wouldn't be able to do this upgrade within
>> Hadoop 2.x, though we could consider it for 3.x (trunk).
>
>I'm not sure about the wire format, what's the best way of checking for
>wire format issues?
>
>http://upstream-tracker.org/versions/protobuf.html suggests there are
>some source-level issues which will require investigation.
>
>> In general, we upgrade dependencies when a new release offers a compelling
>> benefit, not solely to keep up with the latest.  In the case of 2.5.0,
>> there was a performance benefit.  Looking at the release notes for 2.6.0
>> and 2.6.1, I don't see anything particularly compelling.  (That's just my
>> opinion though, and others might disagree.)
>
>I think bundling or forking is the only practical option. I was looking
>to see if we could provide ProtocolBuffers as an installable option on
>our platform, if it's a version-compatibility nightmare as you say,
>that's going to be difficult as we really don't want to have to provide
>multiple versions.
>
>> BTW, if anyone is curious, it's possible to try a custom build right now
>> linked against 2.6.1.  You'd pass -Dprotobuf.version=2.6.1 and
>> -Dprotoc.path= when you run the mvn command.
>
>Once I have fixed all the other source portability issues I'll circle
>back around and take a look at this.
>
>-- 
>Alan Burlison
>--



Re: Protocol Buffers version

2015-05-14 Thread Alan Burlison

On 13/05/2015 17:13, Chris Nauroth wrote:


> It was important to complete this upgrade before Hadoop 2.x came out of
> beta.  After that, we committed to a policy of backwards-compatibility
> within the 2.x release line.  I can't find a statement about whether or
> not Protocol Buffers 2.6.1 is backwards-compatible with 2.5.0 (both at
> compile time and on the wire).  Do you know the answer?  If it's
> backwards-incompatible, then we wouldn't be able to do this upgrade within
> Hadoop 2.x, though we could consider it for 3.x (trunk).


I'm not sure about the wire format. What's the best way of checking for 
wire format issues?


http://upstream-tracker.org/versions/protobuf.html suggests there are 
some source-level issues which will require investigation.



> In general, we upgrade dependencies when a new release offers a compelling
> benefit, not solely to keep up with the latest.  In the case of 2.5.0,
> there was a performance benefit.  Looking at the release notes for 2.6.0
> and 2.6.1, I don't see anything particularly compelling.  (That's just my
> opinion though, and others might disagree.)


I think bundling or forking is the only practical option. I was looking 
to see if we could provide ProtocolBuffers as an installable option on 
our platform; if it's a version-compatibility nightmare as you say, 
that's going to be difficult, as we really don't want to have to provide 
multiple versions.



> BTW, if anyone is curious, it's possible to try a custom build right now
> linked against 2.6.1.  You'd pass -Dprotobuf.version=2.6.1 and
> -Dprotoc.path= when you run the mvn command.


Once I have fixed all the other source portability issues I'll circle 
back around and take a look at this.


--
Alan Burlison
--


Re: Protocol Buffers version

2015-05-13 Thread Chris Nauroth
Some additional details...

A few years ago, we moved from Protocol Buffers 2.4.1 to 2.5.0.  There
were some challenges with that upgrade, because 2.5.0 was not
backwards-compatible with 2.4.1.  We needed to coordinate carefully with
projects downstream of Hadoop that receive our protobuf classes through
transitive dependency.  Here are a few issues with more background:

https://issues.apache.org/jira/browse/HADOOP-9845

https://issues.apache.org/jira/browse/HBASE-8165

https://issues.apache.org/jira/browse/HIVE-5112

It was important to complete this upgrade before Hadoop 2.x came out of
beta.  After that, we committed to a policy of backwards-compatibility
within the 2.x release line.  I can't find a statement about whether or
not Protocol Buffers 2.6.1 is backwards-compatible with 2.5.0 (both at
compile time and on the wire).  Do you know the answer?  If it's
backwards-incompatible, then we wouldn't be able to do this upgrade within
Hadoop 2.x, though we could consider it for 3.x (trunk).

In general, we upgrade dependencies when a new release offers a compelling
benefit, not solely to keep up with the latest.  In the case of 2.5.0,
there was a performance benefit.  Looking at the release notes for 2.6.0
and 2.6.1, I don't see anything particularly compelling.  (That's just my
opinion though, and others might disagree.)

https://github.com/google/protobuf/blob/master/CHANGES.txt

BTW, if anyone is curious, it's possible to try a custom build right now
linked against 2.6.1.  You'd pass -Dprotobuf.version=2.6.1 and
-Dprotoc.path= when you run the mvn command.
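
As a rough illustration of that invocation, the sketch below assembles the mvn argument list in Python. The two -D properties are the ones named above; everything else is an assumption made for the example: the helper names build_mvn_command and format_command, the default goals "clean install", and whatever protoc path you pass in (the value after -Dprotoc.path= is left for you to supply, pointing at a locally installed 2.6.1 protoc binary).

```python
import shlex


def build_mvn_command(protobuf_version, protoc_path, goals=("clean", "install")):
    """Return the argv list for a custom build against another protobuf version.

    The goals are an assumed default; only the two -D properties come from
    the message above.
    """
    return [
        "mvn",
        *goals,
        "-Dprotobuf.version={}".format(protobuf_version),
        "-Dprotoc.path={}".format(protoc_path),
    ]


def format_command(argv):
    """Render an argv list as a shell-safe string for display or logging."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

To run the build, pass the list to subprocess.run(argv, check=True) from the top of the source tree; format_command is only there for printing what would be executed.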


--Chris Nauroth




On 5/13/15, 8:59 AM, "Allen Wittenauer"  wrote:

>
>On May 13, 2015, at 5:02 AM, Alan Burlison 
>wrote:
>
>> The current version of Protocol Buffers is 2.6.1 but the current
>>version required by Hadoop is 2.5.0. Is there any reason for this, or
>>should I log a JIRA to get it updated?
>
>   The story of protocol buffers is part of a shameful past where Hadoop
>trusted Google.  This was a terrible mistake, judging by the last time
>the project upgraded.  2.4->2.5 required some source-level,
>non-backward-compatible, and
>completely-avoidable-but-G-made-us-do-it-anyway surgery to make it work.
>This also ended up being a flag day for every single developer who worked
>not only with Hadoop but with all of the downstream projects as well.
>Big disaster.
>
>   The fact that when Google shut down Google Code, they didn't even tag
>previous releases in the GitHub source tree without a significant amount
>of pressure from the open source community was just adding insult to
>injury.  As a result, I believe the collective opinion is to just flat
>out avoid adding any more Google bits into the system.
>
>   See also: guava, which suffers from the same shortsightedness.
>
>   At some point, we'll either upgrade, switch to a different protocol
>serialization format, or fork protobuf.
>



Re: Protocol Buffers version

2015-05-13 Thread Allen Wittenauer

On May 13, 2015, at 5:02 AM, Alan Burlison  wrote:

> The current version of Protocol Buffers is 2.6.1 but the current version 
> required by Hadoop is 2.5.0. Is there any reason for this, or should I log a 
> JIRA to get it updated?

The story of protocol buffers is part of a shameful past where Hadoop 
trusted Google.  This was a terrible mistake, judging by the last time the 
project upgraded.  2.4->2.5 required some source-level, 
non-backward-compatible, and completely-avoidable-but-G-made-us-do-it-anyway 
surgery to make it work. This also ended up being a flag day for every single 
developer who worked not only with Hadoop but with all of the downstream 
projects as well.  Big disaster.

The fact that when Google shut down Google Code, they didn't even tag 
previous releases in the GitHub source tree without a significant amount of 
pressure from the open source community was just adding insult to injury.  As a 
result, I believe the collective opinion is to just flat out avoid adding any 
more Google bits into the system.

See also: guava, which suffers from the same shortsightedness. 

At some point, we'll either upgrade, switch to a different protocol 
serialization format, or fork protobuf.