Re: Protocol Buffers version

2015-06-18 Thread Alan Burlison

On 16/06/2015 10:54, Steve Loughran wrote:


One reason at least: PB 2.5.0 has no support for Solaris SPARC. 2.6.1 does.


to be ruthless, that's not enough reason to upgrade branch-2, due to the 
transitive pain it makes all the way down.


I completely get your point, however we are faced with two pretty much 
equally unpalatable options: either fork PB 2.5.0 and add support for 
Solaris SPARC, or switch to 2.6.1.


Although, as I've found out, even though 2.6.1 claims to support Solaris 
SPARC, it doesn't, and needs a patch (albeit a small one) to get it to 
work :-/ From what I can gather, cross-platform support in PB breaks 
fairly regularly.


--
Alan Burlison
--


Re: Protocol Buffers version

2015-06-16 Thread Allen Wittenauer

On Jun 16, 2015, at 2:54 AM, Steve Loughran ste...@hortonworks.com wrote:

 
 One reason at least: PB 2.5.0 has no support for Solaris SPARC. 2.6.1 does.
 
 to be ruthless, that's not enough reason to upgrade branch-2, due to the 
 transitive pain it makes all the way down.

Not in branch-2, but certainly in trunk.  

Re: Protocol Buffers version

2015-06-16 Thread Steve Loughran

 On 15 Jun 2015, at 22:31, Colin P. McCabe cmcc...@apache.org wrote:
 
 On Mon, Jun 15, 2015 at 7:24 AM, Allen Wittenauer a...@altiscale.com wrote:
 
 On Jun 12, 2015, at 1:03 PM, Alan Burlison alan.burli...@oracle.com wrote:
 
 On 14/05/2015 18:41, Chris Nauroth wrote:
 
 As a reminder though, the community probably would want to see a strong
 justification for the upgrade in terms of features or performance or
 something else.  Right now, I'm not seeing a significant benefit for us
 based on my reading of their release notes.  I think it's worthwhile to
 figure this out first.  Otherwise, there is a risk that any testing work
 turns out to be a wasted effort.
 
 One reason at least: PB 2.5.0 has no support for Solaris SPARC. 2.6.1 does.

to be ruthless, that's not enough reason to upgrade branch-2, due to the 
transitive pain it makes all the way down.

 
 
That's a pretty good reason.
 
Some of us had a discussion at Summit about effectively forking 
 protobuf and making it an Apache TLP.  This would give us a chance to get 
 out from under Google's blind spot, guarantee better compatibility across 
 the ecosystem, etc, etc.
 
It is sounding more and more like that's really what needs to happen.
 
 I agree that it would be nice if the protobuf project avoided making
 backwards-incompatible API changes within a minor release.  But in
 practice, we have had the same issues with Jackson, Guava, jets3t, and
 other dependencies.  Nearly every important Hadoop dependency has made
 backwards-incompatible API changes within a minor release of the
 dependency... and that's one reason we are using such old versions of
 everything.  I don't think PB deserves to be singled out as much as it
 has been.

I think it does deserve it, as it was such an all-or-nothing change. Guava, well, 
we may keep it at 11.0, but we've made sure there are no classes used which 
aren't in the latest versions. Even where we depend on artifacts which need 
later versions (curator-2.7.1) we've addressed the version problem by verifying 
that you can actually rebuild curator with guava-11.0 with everything working 
(curator-x-discovery doesn't compile, but we don't use that). So we know that 
unless a bit of curator uses reflection, we can run it against 11.x. And if 
someone wants to use a later version of Guava + hadoop-common, they can swap it 
in and Hadoop will still work. Which is important, as on Java 8u45+ you do need 
a recent Guava.

In contrast, protobuf needed a coordinated update across everything: every 
project which had checked in its generated protobuf files had to rebuild and 
check in, which guaranteed they could no longer work with protobuf 2.4.

Jackson? Its broken-ness wasn't so obvious: if we'd known, I wouldn't have let 
it be updated. It's now on the risk list and I don't see us updating that for a 
long time.

  I think the work going on now to implement CLASSPATH
 isolation in Hadoop will really be beneficial here because we will be
 able to upgrade without worrying about these problems.


+1


Re: Protocol Buffers version

2015-06-15 Thread Andrew Purtell
I can't answer the original question but can point out the protostuff (
https://github.com/protostuff/protostuff) folks have been responsive and
friendly in the past when we (HBase) were curious about swapping in their
stuff. Two significant benefits of protostuff, IMHO, are ASL 2 licensing and
the fact that everything is implemented in Java, including the compiler.


On Mon, Jun 15, 2015 at 8:49 AM, Sean Busbey bus...@cloudera.com wrote:

 Anyone have a read on how the protobuf folks would feel about that? Apache
 has a history of not accepting projects that are non-amicable forks.

 On Mon, Jun 15, 2015 at 9:24 AM, Allen Wittenauer a...@altiscale.com
 wrote:

 
  On Jun 12, 2015, at 1:03 PM, Alan Burlison alan.burli...@oracle.com
  wrote:
 
   On 14/05/2015 18:41, Chris Nauroth wrote:
  
   As a reminder though, the community probably would want to see a
 strong
   justification for the upgrade in terms of features or performance or
   something else.  Right now, I'm not seeing a significant benefit for
 us
   based on my reading of their release notes.  I think it's worthwhile
 to
   figure this out first.  Otherwise, there is a risk that any testing
 work
   turns out to be a wasted effort.
  
   One reason at least: PB 2.5.0 has no support for Solaris SPARC. 2.6.1
  does.
 
 
  That's a pretty good reason.
 
  Some of us had a discussion at Summit about effectively forking
  protobuf and making it an Apache TLP.  This would give us a chance to get
  out from under Google's blind spot, guarantee better compatibility across
  the ecosystem, etc, etc.
 
  It is sounding more and more like that's really what needs to
  happen.




 --
 Sean




-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)


Re: Protocol Buffers version

2015-06-15 Thread Allen Wittenauer

On Jun 12, 2015, at 1:03 PM, Alan Burlison alan.burli...@oracle.com wrote:

 On 14/05/2015 18:41, Chris Nauroth wrote:
 
 As a reminder though, the community probably would want to see a strong
 justification for the upgrade in terms of features or performance or
 something else.  Right now, I'm not seeing a significant benefit for us
 based on my reading of their release notes.  I think it's worthwhile to
 figure this out first.  Otherwise, there is a risk that any testing work
 turns out to be a wasted effort.
 
 One reason at least: PB 2.5.0 has no support for Solaris SPARC. 2.6.1 does.


That's a pretty good reason.

Some of us had a discussion at Summit about effectively forking 
protobuf and making it an Apache TLP.  This would give us a chance to get out 
from under Google's blind spot, guarantee better compatibility across the 
ecosystem, etc, etc.

It is sounding more and more like that's really what needs to happen.

Re: Protocol Buffers version

2015-06-15 Thread Sean Busbey
Anyone have a read on how the protobuf folks would feel about that? Apache
has a history of not accepting projects that are non-amicable forks.

On Mon, Jun 15, 2015 at 9:24 AM, Allen Wittenauer a...@altiscale.com wrote:


 On Jun 12, 2015, at 1:03 PM, Alan Burlison alan.burli...@oracle.com
 wrote:

  On 14/05/2015 18:41, Chris Nauroth wrote:
 
  As a reminder though, the community probably would want to see a strong
  justification for the upgrade in terms of features or performance or
  something else.  Right now, I'm not seeing a significant benefit for us
  based on my reading of their release notes.  I think it's worthwhile to
  figure this out first.  Otherwise, there is a risk that any testing work
  turns out to be a wasted effort.
 
  One reason at least: PB 2.5.0 has no support for Solaris SPARC. 2.6.1
 does.


 That's a pretty good reason.

 Some of us had a discussion at Summit about effectively forking
 protobuf and making it an Apache TLP.  This would give us a chance to get
 out from under Google's blind spot, guarantee better compatibility across
 the ecosystem, etc, etc.

 It is sounding more and more like that's really what needs to
 happen.




-- 
Sean


Re: Protocol Buffers version

2015-06-15 Thread Roman Shaposhnik
On Mon, Jun 15, 2015 at 8:57 AM, Andrew Purtell apurt...@apache.org wrote:
 I can't answer the original question but can point out the protostuff (
 https://github.com/protostuff/protostuff) folks have been responsive and
 friendly in the past when we (HBase) were curious about swapping in their
 stuff. Two significant benefits of protostuff, IMHO, are ASL 2 licensing and
 the fact that everything is implemented in Java, including the compiler.

Big +1 to protostuff from the community, licensing, and implementation perspectives.

Thanks,
Roman.


Re: Protocol Buffers version

2015-06-15 Thread Colin P. McCabe
On Mon, Jun 15, 2015 at 7:24 AM, Allen Wittenauer a...@altiscale.com wrote:

 On Jun 12, 2015, at 1:03 PM, Alan Burlison alan.burli...@oracle.com wrote:

 On 14/05/2015 18:41, Chris Nauroth wrote:

 As a reminder though, the community probably would want to see a strong
 justification for the upgrade in terms of features or performance or
 something else.  Right now, I'm not seeing a significant benefit for us
 based on my reading of their release notes.  I think it's worthwhile to
 figure this out first.  Otherwise, there is a risk that any testing work
 turns out to be a wasted effort.

 One reason at least: PB 2.5.0 has no support for Solaris SPARC. 2.6.1 does.


 That's a pretty good reason.

 Some of us had a discussion at Summit about effectively forking 
 protobuf and making it an Apache TLP.  This would give us a chance to get out 
 from under Google's blind spot, guarantee better compatibility across the 
 ecosystem, etc, etc.

 It is sounding more and more like that's really what needs to happen.

I agree that it would be nice if the protobuf project avoided making
backwards-incompatible API changes within a minor release.  But in
practice, we have had the same issues with Jackson, Guava, jets3t, and
other dependencies.  Nearly every important Hadoop dependency has made
backwards-incompatible API changes within a minor release of the
dependency... and that's one reason we are using such old versions of
everything.  I don't think PB deserves to be singled out as much as it
has been.  I think the work going on now to implement CLASSPATH
isolation in Hadoop will really be beneficial here because we will be
able to upgrade without worrying about these problems.

cheers,
Colin


Re: Protocol Buffers version

2015-06-12 Thread Alan Burlison

On 14/05/2015 18:41, Chris Nauroth wrote:


As a reminder though, the community probably would want to see a strong
justification for the upgrade in terms of features or performance or
something else.  Right now, I'm not seeing a significant benefit for us
based on my reading of their release notes.  I think it's worthwhile to
figure this out first.  Otherwise, there is a risk that any testing work
turns out to be a wasted effort.


One reason at least: PB 2.5.0 has no support for Solaris SPARC. 2.6.1 does.

--
Alan Burlison
--


Re: Protocol Buffers version

2015-05-20 Thread Steve Loughran

 On 19 May 2015, at 17:59, Colin P. McCabe cmcc...@apache.org wrote:
 
 I agree that the protobuf 2.4.1 -> 2.5.0 transition could have been
 handled a lot better by Google.  Specifically, since it was an
 API-breaking upgrade, it should have been a major version bump for the
 Java library version.  I also feel that removing the download links
 for the old versions of the native libraries was careless, and
 certainly burned some of our Hadoop users.
 
 However, I don't see any reason to believe that protobuf 2.6 will not
 be wire-compatible with earlier versions.  Google has actually been
 pretty good about preserving wire-compatibility... just not about API
 compatibility.  If we want to get a formal statement from the project,
 we can, but I would be pretty shocked if they decided to change the
 protocol in a backwards-incompatible way in a minor version release.

that's what they have done well: wire formats don't break (though you do have the 
freedom to break them yourself, by adding new required fields).

Of course, they do have the standard service problems then of (a) downgrading 
if optional fields are omitted and (b) maintaining semantics over time. They 
just have that at a bigger scale than the rest of us.
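The wire-format stability Steve describes can be sketched without the protobuf library at all: a serialized message is just a sequence of (field-number, wire-type) keys followed by payloads, and a reader that meets a field number it doesn't know simply skips the payload. A minimal hand-rolled illustration (field numbers and names are invented, and all fields are varints for simplicity):

```python
# Sketch of protobuf's wire-level forward compatibility: an "old" reader
# skips fields it doesn't know about. Hand-rolled varints; no protobuf
# library needed. Field numbers/names here are made up for illustration.

def encode_varint(n):
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        out.append(b | (0x80 if n else 0))
        if not n:
            return bytes(out)

def encode_field(field_no, value):
    # key = (field number << 3) | wire type; wire type 0 = varint
    return encode_varint((field_no << 3) | 0) + encode_varint(value)

def decode_varint(buf, i):
    shift = result = 0
    while True:
        b = buf[i]; i += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, i
        shift += 7

def decode(buf, known_fields):
    # An "old" reader: keeps the fields it knows, silently skips the rest.
    msg, i = {}, 0
    while i < len(buf):
        key, i = decode_varint(buf, i)
        value, i = decode_varint(buf, i)   # all fields here are varints
        if (key >> 3) in known_fields:
            msg[known_fields[key >> 3]] = value
    return msg

# A "newer" writer adds field 3; the "older" reader only knows 1 and 2.
wire = encode_field(1, 42) + encode_field(2, 7) + encode_field(3, 99)
old_schema = {1: "id", 2: "count"}
print(decode(wire, old_schema))  # {'id': 42, 'count': 7} -- field 3 skipped
```

This is why mixed-version RPC keeps working even when the generated Java classes are incompatible at the API level.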

the 2.4/2.5 switch showed the trouble of using code from a company capable of 
doing a whole-stack rebuild overnight. They can update a dependency 
(protobuf.jar, guava.jar) and have it picked up in the binaries. We don't have 
that luxury.

 
 I do think there are some potential issues for our users of bumping
 the library version in a minor Hadoop release.  Until we implement
 full dependency isolation for Hadoop, there may be some disruptions to
 end-users from changing Java dependency versions.  Similarly, users
 will need to install a new native protobuf library version as well.
 So I think we should bump the protobuf versions in Hadoop 3.0, but not
 in 2.x.

+1, though I do fear that the more things we put off until 3.0, the bigger that 
switch becomes, and so the harder the adoption.

FWIW, one area I do find hard with protobuf is trying to set message fields 
through reflection. That is, I want code that will link against, say, the 
Hadoop 2.6 binaries, but, if the extra fields for a 2.7 message are present, to 
use them. Deep down in the internals, protobuf should let me do this, but not 
at the Java API level.
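What Steve is after is in fact permitted by the wire format even when the generated classes predate the field: a serialized message is just concatenated key/payload pairs, so you can append a hand-encoded field to bytes produced by older generated code. A hedged sketch (field number 15 and its meaning are invented for illustration, and the base bytes stand in for what `toByteArray()` on an older generated class would produce):

```python
# Sketch: appending a "newer" field at the wire level to a message
# serialized by older generated code. Field number 15 is hypothetical.

def encode_varint(n):
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        out.append(b | (0x80 if n else 0))
        if not n:
            return bytes(out)

def append_varint_field(serialized, field_no, value):
    # Concatenating encoded fields is valid protobuf: a message is a
    # sequence of key/value pairs, so the extra field just rides along.
    key = encode_varint((field_no << 3) | 0)   # wire type 0 = varint
    return serialized + key + encode_varint(value)

base = b"\x08\x2a"                  # e.g. field 1 = 42, from "2.6-era" classes
extended = append_varint_field(base, 15, 1)
print(extended.hex())               # '082a7801' -> field 15, varint, value 1 appended
```

A "2.7-era" reader would see field 15 normally; a "2.6-era" reader would skip it. The catch, as the post says, is that the generated Java API gives you no supported hook for this; you'd be working below it.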


Re: Protocol Buffers version

2015-05-19 Thread Sangjin Lee
I pushed it out to a github fork:
https://github.com/sjlee/protobuf/tree/2.5.0-incompatibility

We haven't observed other compatibility issues than these.

On Tue, May 19, 2015 at 10:05 PM, Chris Nauroth cnaur...@hortonworks.com
wrote:

 Thanks, Sangjin.  I'd be interested in taking a peek at a personal GitHub
 repo or even just a patch file of those changes.  If there were
 incompatibilities, then that doesn't bode well for an upgrade to 2.6.

 --Chris Nauroth




 On 5/19/15, 8:40 PM, Sangjin Lee sj...@apache.org wrote:

When we moved to Hadoop 2.4, the associated protobuf upgrade (2.4.1 ->
2.5.0) proved to be one of the bigger problems. In our case, most of our
 users were using protobuf 2.4.x or earlier.
 
 We identified a couple of places where the backward compatibility was
 broken, and patched for those issues. We've been running with that patched
 version of protobuf 2.5.0 since. I can push out those changes to github or
 something if others are interested FWIW.
 
 Regards,
 Sangjin
 
 On Tue, May 19, 2015 at 9:59 AM, Colin P. McCabe cmcc...@apache.org
 wrote:
 
  I agree that the protobuf 2.4.1 -> 2.5.0 transition could have been
  handled a lot better by Google.  Specifically, since it was an
  API-breaking upgrade, it should have been a major version bump for the
  Java library version.  I also feel that removing the download links
  for the old versions of the native libraries was careless, and
  certainly burned some of our Hadoop users.
 
  However, I don't see any reason to believe that protobuf 2.6 will not
  be wire-compatible with earlier versions.  Google has actually been
  pretty good about preserving wire-compatibility... just not about API
  compatibility.  If we want to get a formal statement from the project,
  we can, but I would be pretty shocked if they decided to change the
  protocol in a backwards-incompatible way in a minor version release.
 
  I do think there are some potential issues for our users of bumping
  the library version in a minor Hadoop release.  Until we implement
  full dependency isolation for Hadoop, there may be some disruptions to
  end-users from changing Java dependency versions.  Similarly, users
  will need to install a new native protobuf library version as well.
  So I think we should bump the protobuf versions in Hadoop 3.0, but not
  in 2.x.
 
  cheers,
  Colin
 
  On Fri, May 15, 2015 at 4:55 AM, Alan Burlison
 alan.burli...@oracle.com
  wrote:
   On 15/05/2015 09:44, Steve Loughran wrote:
  
   Now: why do you want to use a later version of protobuf.jar? Is it
   because it is there? Or is there a tangible need?
  
  
   No, it's because I'm looking at this from a platform perspective: We
 have
   other consumers of ProtoBuf beside Hadoop and we'd obviously like to
   minimise the versions of PB that we ship, and preferably just ship the
   latest version. The fact that PB seems to often be incompatible across
   releases is an issue as it makes upgrading and dropping older versions
   problematic.
  
   --
   Alan Burlison
   --
 




Re: Protocol Buffers version

2015-05-19 Thread Chris Nauroth
Thanks, Sangjin.  I'd be interested in taking a peek at a personal GitHub
repo or even just a patch file of those changes.  If there were
incompatibilities, then that doesn't bode well for an upgrade to 2.6.

--Chris Nauroth




On 5/19/15, 8:40 PM, Sangjin Lee sj...@apache.org wrote:

When we moved to Hadoop 2.4, the associated protobuf upgrade (2.4.1 ->
2.5.0) proved to be one of the bigger problems. In our case, most of our
users were using protobuf 2.4.x or earlier.

We identified a couple of places where the backward compatibility was
broken, and patched for those issues. We've been running with that patched
version of protobuf 2.5.0 since. I can push out those changes to github or
something if others are interested FWIW.

Regards,
Sangjin

On Tue, May 19, 2015 at 9:59 AM, Colin P. McCabe cmcc...@apache.org
wrote:

 I agree that the protobuf 2.4.1 -> 2.5.0 transition could have been
 handled a lot better by Google.  Specifically, since it was an
 API-breaking upgrade, it should have been a major version bump for the
 Java library version.  I also feel that removing the download links
 for the old versions of the native libraries was careless, and
 certainly burned some of our Hadoop users.

 However, I don't see any reason to believe that protobuf 2.6 will not
 be wire-compatible with earlier versions.  Google has actually been
 pretty good about preserving wire-compatibility... just not about API
 compatibility.  If we want to get a formal statement from the project,
 we can, but I would be pretty shocked if they decided to change the
 protocol in a backwards-incompatible way in a minor version release.

 I do think there are some potential issues for our users of bumping
 the library version in a minor Hadoop release.  Until we implement
 full dependency isolation for Hadoop, there may be some disruptions to
 end-users from changing Java dependency versions.  Similarly, users
 will need to install a new native protobuf library version as well.
 So I think we should bump the protobuf versions in Hadoop 3.0, but not
 in 2.x.

 cheers,
 Colin

 On Fri, May 15, 2015 at 4:55 AM, Alan Burlison
alan.burli...@oracle.com
 wrote:
  On 15/05/2015 09:44, Steve Loughran wrote:
 
  Now: why do you want to use a later version of protobuf.jar? Is it
  because it is there? Or is there a tangible need?
 
 
  No, it's because I'm looking at this from a platform perspective: We
have
  other consumers of ProtoBuf beside Hadoop and we'd obviously like to
  minimise the versions of PB that we ship, and preferably just ship the
  latest version. The fact that PB seems to often be incompatible across
  releases is an issue as it makes upgrading and dropping older versions
  problematic.
 
  --
  Alan Burlison
  --




Re: Protocol Buffers version

2015-05-19 Thread Sangjin Lee
When we moved to Hadoop 2.4, the associated protobuf upgrade (2.4.1 ->
2.5.0) proved to be one of the bigger problems. In our case, most of our
users were using protobuf 2.4.x or earlier.

We identified a couple of places where the backward compatibility was
broken, and patched for those issues. We've been running with that patched
version of protobuf 2.5.0 since. I can push out those changes to github or
something if others are interested FWIW.

Regards,
Sangjin

On Tue, May 19, 2015 at 9:59 AM, Colin P. McCabe cmcc...@apache.org wrote:

 I agree that the protobuf 2.4.1 -> 2.5.0 transition could have been
 handled a lot better by Google.  Specifically, since it was an
 API-breaking upgrade, it should have been a major version bump for the
 Java library version.  I also feel that removing the download links
 for the old versions of the native libraries was careless, and
 certainly burned some of our Hadoop users.

 However, I don't see any reason to believe that protobuf 2.6 will not
 be wire-compatible with earlier versions.  Google has actually been
 pretty good about preserving wire-compatibility... just not about API
 compatibility.  If we want to get a formal statement from the project,
 we can, but I would be pretty shocked if they decided to change the
 protocol in a backwards-incompatible way in a minor version release.

 I do think there are some potential issues for our users of bumping
 the library version in a minor Hadoop release.  Until we implement
 full dependency isolation for Hadoop, there may be some disruptions to
 end-users from changing Java dependency versions.  Similarly, users
 will need to install a new native protobuf library version as well.
 So I think we should bump the protobuf versions in Hadoop 3.0, but not
 in 2.x.

 cheers,
 Colin

 On Fri, May 15, 2015 at 4:55 AM, Alan Burlison alan.burli...@oracle.com
 wrote:
  On 15/05/2015 09:44, Steve Loughran wrote:
 
  Now: why do you want to use a later version of protobuf.jar? Is it
  because it is there? Or is there a tangible need?
 
 
  No, it's because I'm looking at this from a platform perspective: We have
  other consumers of ProtoBuf beside Hadoop and we'd obviously like to
  minimise the versions of PB that we ship, and preferably just ship the
  latest version. The fact that PB seems to often be incompatible across
  releases is an issue as it makes upgrading and dropping older versions
  problematic.
 
  --
  Alan Burlison
  --



Re: Protocol Buffers version

2015-05-19 Thread Colin P. McCabe
I agree that the protobuf 2.4.1 -> 2.5.0 transition could have been
handled a lot better by Google.  Specifically, since it was an
API-breaking upgrade, it should have been a major version bump for the
Java library version.  I also feel that removing the download links
for the old versions of the native libraries was careless, and
certainly burned some of our Hadoop users.

However, I don't see any reason to believe that protobuf 2.6 will not
be wire-compatible with earlier versions.  Google has actually been
pretty good about preserving wire-compatibility... just not about API
compatibility.  If we want to get a formal statement from the project,
we can, but I would be pretty shocked if they decided to change the
protocol in a backwards-incompatible way in a minor version release.

I do think there are some potential issues for our users of bumping
the library version in a minor Hadoop release.  Until we implement
full dependency isolation for Hadoop, there may be some disruptions to
end-users from changing Java dependency versions.  Similarly, users
will need to install a new native protobuf library version as well.
So I think we should bump the protobuf versions in Hadoop 3.0, but not
in 2.x.

cheers,
Colin

On Fri, May 15, 2015 at 4:55 AM, Alan Burlison alan.burli...@oracle.com wrote:
 On 15/05/2015 09:44, Steve Loughran wrote:

 Now: why do you want to use a later version of protobuf.jar? Is it
 because it is there? Or is there a tangible need?


 No, it's because I'm looking at this from a platform perspective: We have
 other consumers of ProtoBuf beside Hadoop and we'd obviously like to
 minimise the versions of PB that we ship, and preferably just ship the
 latest version. The fact that PB seems to often be incompatible across
 releases is an issue as it makes upgrading and dropping older versions
 problematic.

 --
 Alan Burlison
 --


Re: Protocol Buffers version

2015-05-15 Thread Steve Loughran

On 14 May 2015, at 15:23, Alan Burlison alan.burli...@oracle.com wrote:

I think bundling or forking is the only practical option. I was looking to see 
if we could provide ProtocolBuffers as an installable option on our platform, 
if it's a version-compatibility nightmare as you say, that's going to be 
difficult as we really don't want to have to provide multiple versions.

The problem Hadoop has is that its code, especially the HDFS client code, is 
used in a lot of other applications, and they all end up having to be in sync at 
the Java level. Hopefully the protobuf wire format is compatible (that is the 
whole point of the format, after all), but we know from experience that at the 
JAR level it isn't. Having to rebuild every single .proto-derived Java class and 
then switch across the entire dependency tree was the upgrade path there, with 
about a month where getting the trunk versions of two apps to link was pretty 
hit and miss.

I think everyone came out burned from that:
- scared and unwilling to repeat the experience
- not believing any further Google assertions of library compatibility (see 
also: guava)

What to do?

  1.  Leave it alone and it slowly ages; when an upgrade happens it can be more 
traumatic. But until that time: nothing breaks.
  2.  Upgrade regularly and you can dramatically break things, so people don't 
upgrade Hadoop itself: they stick with old versions (with issues already fixed 
in the later releases), they keep requesting backported fixes into the working 
branch, and you end up with two branches of your code to maintain.
  3.  Fork, and you take on the maintenance costs of your forked library forever; 
it will implicitly age, and there's the opportunity cost of that work, i.e. 
better things to spend your time on.
  4.  Rip out protobuf entirely and switch to something else (thrift) that has 
better stability; tag the proto channels as deprecated, etc, etc. You'd better 
trust the successor's stability and security features before going to that 
effort.

Hadoop 2.x has defaulted to option (1).

Now: why do you want to use a later version of protobuf.jar? Is it because it 
is there? Or is there a tangible need?

-steve


Re: Protocol Buffers version

2015-05-15 Thread Alan Burlison

On 15/05/2015 09:44, Steve Loughran wrote:


Now: why do you want to use a later version of protobuf.jar? Is it
because it is there? Or is there a tangible need?


No, it's because I'm looking at this from a platform perspective: We 
have other consumers of ProtoBuf beside Hadoop and we'd obviously like 
to minimise the versions of PB that we ship, and preferably just ship 
the latest version. The fact that PB seems to often be incompatible 
across releases is an issue as it makes upgrading and dropping older 
versions problematic.


--
Alan Burlison
--


Re: Protocol Buffers version

2015-05-14 Thread Chris Nauroth
Thanks for that link, Alan.  That looks like a useful site!

Ideally, the Protocol Buffers project would give a clear statement about
wire compatibility between 2.5.0 and 2.6.1.  Unfortunately, I can't find
that anywhere.  If it's not documented, then it's probably worth following
up on the Protocol Buffers support lists to ask them.

One thing we could try is starting up a mix of Hadoop processes using
2.5.0 and 2.6.1 to see how it goes.  We've made a commitment to both
forward and backward compatibility within Hadoop 2.x, so we'd need a 2.5.0
client to be able to talk to a 2.6.1 server, and we'd need a 2.6.1 client
to be able to talk to a 2.5.0 server.  Even if this appears to go well, I
wouldn't consider it a substitute for a formal statement of the
compatibility policy from the Protocol Buffers project.  Otherwise, there
might be some subtle lurking issue that we miss in our initial testing.

As a reminder though, the community probably would want to see a strong
justification for the upgrade in terms of features or performance or
something else.  Right now, I'm not seeing a significant benefit for us
based on my reading of their release notes.  I think it's worthwhile to
figure this out first.  Otherwise, there is a risk that any testing work
turns out to be a wasted effort.

--Chris Nauroth




On 5/14/15, 7:23 AM, Alan Burlison alan.burli...@oracle.com wrote:

On 13/05/2015 17:13, Chris Nauroth wrote:

 It was important to complete this upgrade before Hadoop 2.x came out of
 beta.  After that, we committed to a policy of backwards-compatibility
 within the 2.x release line.  I can't find a statement about whether or
 not Protocol Buffers 2.6.1 is backwards-compatible with 2.5.0 (both at
 compile time and on the wire).  Do you know the answer?  If it's
 backwards-incompatible, then we wouldn't be able to do this upgrade
within
 Hadoop 2.x, though we could consider it for 3.x (trunk).

I'm not sure about the wire format; what's the best way of checking for
wire-format issues?

http://upstream-tracker.org/versions/protobuf.html suggests there are
are some source-level issues which will require investigation.

 In general, we upgrade dependencies when a new release offers a
compelling
 benefit, not solely to keep up with the latest.  In the case of 2.5.0,
 there was a performance benefit.  Looking at the release notes for 2.6.0
 and 2.6.1, I don't see anything particularly compelling.  (That's just
my
 opinion though, and others might disagree.)

I think bundling or forking is the only practical option. I was looking
to see if we could provide ProtocolBuffers as an installable option on
our platform, if it's a version-compatibility nightmare as you say,
that's going to be difficult as we really don't want to have to provide
multiple versions.

 BTW, if anyone is curious, it's possible to try a custom build right now
 linked against 2.6.1.  You'd pass -Dprotobuf.version=2.6.1 and
 -Dprotoc.path=<path to protoc 2.6.1 binary> when you run the mvn
command.

Once I have fixed all the other source portability issues I'll circle
back around and take a look at this.

-- 
Alan Burlison
--



Re: Protocol Buffers version

2015-05-13 Thread Allen Wittenauer

On May 13, 2015, at 5:02 AM, Alan Burlison alan.burli...@oracle.com wrote:

 The current version of Protocol Buffers is 2.6.1 but the current version 
 required by Hadoop is 2.5.0. Is there any reason for this, or should I log a 
 JIRA to get it updated?

The story of protocol buffers is part of a shameful past where Hadoop 
trusted Google.  This was a terrible mistake, based upon the last time the 
project upgraded.  The 2.4 -> 2.5 move required some source-level, 
non-backward-compatible, and completely-avoidable-but-G-made-us-do-it-anyway 
surgery to make it work. This also ended up being a flag day for every single 
developer who not 
only worked with Hadoop but all of the downstream projects as well.  Big 
disaster.

The fact that when Google shut down Google Code, they didn't even tag 
previous releases in the github source tree without a significant amount of 
pressure from the open source community was just adding insult to injury.  As a 
result, I believe the collective opinion is to just flat out avoid adding any 
more Google bits into the system.

See also: guava, which suffers from the same shortsightedness. 

At some point, we'll either upgrade, switch to a different protocol 
serialization format, or fork protobuf. 



Re: Protocol Buffers version

2015-05-13 Thread Chris Nauroth
Some additional details...

A few years ago, we moved from Protocol Buffers 2.4.1 to 2.5.0.  There
were some challenges with that upgrade, because 2.5.0 was not
backwards-compatible with 2.4.1.  We needed to coordinate carefully with
projects downstream of Hadoop that receive our protobuf classes through
transitive dependency.  Here are a few issues with more background:

https://issues.apache.org/jira/browse/HADOOP-9845

https://issues.apache.org/jira/browse/HBASE-8165

https://issues.apache.org/jira/browse/HIVE-5112

It was important to complete this upgrade before Hadoop 2.x came out of
beta.  After that, we committed to a policy of backwards-compatibility
within the 2.x release line.  I can't find a statement about whether or
not Protocol Buffers 2.6.1 is backwards-compatible with 2.5.0 (both at
compile time and on the wire).  Do you know the answer?  If it's
backwards-incompatible, then we wouldn't be able to do this upgrade within
Hadoop 2.x, though we could consider it for 3.x (trunk).

In general, we upgrade dependencies when a new release offers a compelling
benefit, not solely to keep up with the latest.  In the case of 2.5.0,
there was a performance benefit.  Looking at the release notes for 2.6.0
and 2.6.1, I don't see anything particularly compelling.  (That's just my
opinion though, and others might disagree.)

https://github.com/google/protobuf/blob/master/CHANGES.txt

BTW, if anyone is curious, it's possible to try a custom build right now
linked against 2.6.1.  You'd pass -Dprotobuf.version=2.6.1 and
-Dprotoc.path=<path to protoc 2.6.1 binary> when you run the mvn command.


--Chris Nauroth




On 5/13/15, 8:59 AM, Allen Wittenauer a...@altiscale.com wrote:


On May 13, 2015, at 5:02 AM, Alan Burlison alan.burli...@oracle.com
wrote:

 The current version of Protocol Buffers is 2.6.1 but the current
version required by Hadoop is 2.5.0. Is there any reason for this, or
should I log a JIRA to get it updated?

   The story of protocol buffers is part of a shameful past where Hadoop
trusted Google.  This was a terrible mistake, based upon the last time
the project upgraded.  The 2.4 -> 2.5 move required some source level, non-backward
compatible, and completely-avoidable-but-G-made-us-do-it-anyway surgery
to make it work. This also ended up being a flag day for every single
developer who not only worked with Hadoop but all of the downstream
projects as well.  Big disaster.

   The fact that when Google shut down Google Code, they didn't even tag
previous releases in the github source tree without a significant amount
of pressure from the open source community was just adding insult to
injury.  As a result, I believe the collective opinion is to just flat
out avoid adding any more Google bits into the system.

   See also: guava, which suffers from the same shortsightedness.

   At some point, we'll either upgrade, switch to a different protocol
serialization format, or fork protobuf.