Re: Can we update protobuf's version on trunk?

2017-03-30 Thread Andrew Wang
Great. If y'all are satisfied, I am too.

My only other request is that we shade PB even for the non-client JARs,
since empirically there are a lot of downstream projects pulling in our
server-side artifacts.

On Thu, Mar 30, 2017 at 9:55 AM, Stack  wrote:

> On Thu, Mar 30, 2017 at 9:16 AM, Chris Douglas 
> wrote:
>
>> On Wed, Mar 29, 2017 at 4:59 PM, Stack  wrote:
>> >> The former; an intermediate handler decoding, [modifying,] and
>> >> encoding the record without losing unknown fields.
>> >>
>> >
>> > I did not try this. Did you? Otherwise I can.
>>
>> Yeah, I did. Same format. -C
>>
>>
> Grand.
> St.Ack

Re: Can we update protobuf's version on trunk?

2017-03-30 Thread Stack
On Thu, Mar 30, 2017 at 9:16 AM, Chris Douglas 
wrote:

> On Wed, Mar 29, 2017 at 4:59 PM, Stack  wrote:
> >> The former; an intermediate handler decoding, [modifying,] and
> >> encoding the record without losing unknown fields.
> >>
> >
> > I did not try this. Did you? Otherwise I can.
>
> Yeah, I did. Same format. -C
>
>
Grand.
St.Ack





Re: Can we update protobuf's version on trunk?

2017-03-30 Thread Chris Douglas
On Wed, Mar 29, 2017 at 4:59 PM, Stack  wrote:
>> The former; an intermediate handler decoding, [modifying,] and
>> encoding the record without losing unknown fields.
>>
>
> I did not try this. Did you? Otherwise I can.

Yeah, I did. Same format. -C


Re: Can we update protobuf's version on trunk?

2017-03-29 Thread Stack
On Wed, Mar 29, 2017 at 3:12 PM, Chris Douglas 
wrote:

> On Wed, Mar 29, 2017 at 1:13 PM, Stack  wrote:
> > Is the below evidence enough that pb3 in proto2 syntax mode does not drop
> > 'unknown' fields? (Maybe you want evidence that java tooling behaves the
> > same?)
>
> I reproduced your example with the Java tooling, including changing
> some of the fields in the intermediate representation. As long as the
> syntax is "proto2", it seems to have compatible semantics.
>
>
Thanks.


> > To be clear, when we say proxy above, are we expecting that a pb message
> > deserialized by a process down-the-line that happens to have a crimped
> proto
> > definition that is absent a couple of fields somehow can re-serialize
> and at
> > the end of the line, all fields are present? Or are we talking
> pass-through
> > of the message without rewrite?
>
> The former; an intermediate handler decoding, [modifying,] and
> encoding the record without losing unknown fields.
>
>
I did not try this. Did you? Otherwise I can.

St.Ack



Re: Can we update protobuf's version on trunk?

2017-03-29 Thread Chris Douglas
On Wed, Mar 29, 2017 at 1:13 PM, Stack  wrote:
> Is the below evidence enough that pb3 in proto2 syntax mode does not drop
> 'unknown' fields? (Maybe you want evidence that java tooling behaves the
> same?)

I reproduced your example with the Java tooling, including changing
some of the fields in the intermediate representation. As long as the
syntax is "proto2", it seems to have compatible semantics.

> To be clear, when we say proxy above, are we expecting that a pb message
> deserialized by a process down-the-line that happens to have a crimped proto
> definition that is absent a couple of fields somehow can re-serialize and at
> the end of the line, all fields are present? Or are we talking pass-through
> of the message without rewrite?

The former; an intermediate handler decoding, [modifying,] and
encoding the record without losing unknown fields.

This looks fine. -C
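Chris's case, an intermediate handler that decodes, modifies and re-encodes a record without losing unknown fields, can be sketched in Python. This assumes, hypothetically, that the handler's schema knows only field 1 of the `Test` message from Stack's example and that every field is a short length-delimited string; `proxy_rewrite` and its helpers are illustrative names, not part of any protobuf library.

```python
from typing import List, Tuple


def encode_field(number: int, data: bytes) -> bytes:
    """Encode one length-delimited field with a single-byte key and length."""
    if number >= 16 or len(data) >= 128:
        raise ValueError("sketch handles only single-byte keys and lengths")
    return bytes([(number << 3) | 2, len(data)]) + data


def split_fields(buf: bytes) -> List[Tuple[int, bytes]]:
    """Return (field_number, payload) pairs for short length-delimited fields."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, length = buf[pos], buf[pos + 1]
        if (key & 0x80) or (length & 0x80) or (key & 0x7) != 2:
            raise ValueError("sketch handles only short length-delimited fields")
        fields.append((key >> 3, buf[pos + 2:pos + 2 + length]))
        pos += 2 + length
    return fields


def proxy_rewrite(buf: bytes, new_one: str) -> bytes:
    """Hypothetical intermediate handler that only knows field 1 ('one').

    It replaces field 1 and writes every other field back exactly as it
    was received, i.e. the proto2 unknown-field behaviour.
    """
    out = bytearray()
    for number, payload in split_fields(buf):
        if number == 1:
            out += encode_field(1, new_one.encode("utf-8"))
        else:
            out += encode_field(number, payload)  # unknown here: pass through
    return bytes(out)


PB_BIN = bytes.fromhex("0a 03 6f 6e 65 12 03 74 77 6f")  # one: "one", two: "two"
rewritten = proxy_rewrite(PB_BIN, "uno")
print(split_fields(rewritten))  # [(1, b'uno'), (2, b'two')]
```

The downstream reader that does know field 2 still gets "two", even though the handler in the middle changed field 1.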


Re: Can we update protobuf's version on trunk?

2017-03-29 Thread Stack
Is the evidence below enough to show that pb3 in proto2 syntax mode does not
drop 'unknown' fields? (Maybe you want evidence that the Java tooling behaves
the same?)

To be clear, when we say proxy above, are we expecting that a pb message
deserialized by a process down the line, one with a crimped proto definition
that lacks a couple of fields, can be re-serialized so that all fields are
still present at the end of the line? Or are we talking about pass-through of
the message without a rewrite?

Thanks,
St.Ack


# Using the protoc v3.0.2 tool
$ protoc --version
libprotoc 3.0.2

# I have a simple proto definition with two fields in it
$ more pb.proto
message Test {
  optional string one = 1;
  optional string two = 2;
}

# This is a text-encoded instance of a 'Test' proto message:
$ more pb.txt
one: "one"
two: "two"

# Now I encode the above as a pb binary
$ protoc --encode=Test pb.proto < pb.txt > pb.bin
[libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
specified for the proto file: pb.proto. Please use 'syntax = "proto2";' or
'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2
syntax.)

# Here is a dump of the binary
$ od -xc pb.bin
000  030a 6e6f 1265 7403 6f77
  \n 003   o   n   e 022 003   t   w   o
012
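The dump can be read by hand: each field starts with a varint key equal to `(field_number << 3) | wire_type`, so 0x0a (`\n`) is field 1 and 0x12 (octal 022) is field 2, both wire type 2 (length-delimited), each followed by a length of 3. A minimal Python sketch that walks the same ten bytes is shown here; it is a hand-rolled illustration of the wire format, not the protobuf library's API, and `read_varint`/`walk_fields` are hypothetical helper names.

```python
from typing import List, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint starting at buf[pos]; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return result, pos
        shift += 7


def walk_fields(buf: bytes) -> List[Tuple[int, int, bytes]]:
    """Split a serialized message into (field_number, wire_type, payload).

    Only wire types 0 (varint) and 2 (length-delimited) are handled, which
    covers the string fields in pb.proto.
    """
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:
            start = pos
            _, pos = read_varint(buf, pos)
            payload = buf[start:pos]  # raw varint bytes
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            payload = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        fields.append((number, wire_type, payload))
    return fields


# The ten bytes of pb.bin in file order; od -x prints them as 16-bit
# little-endian words, which is why the dump shows 030a rather than 0a03.
PB_BIN = bytes.fromhex("0a 03 6f 6e 65 12 03 74 77 6f")

for number, wire_type, payload in walk_fields(PB_BIN):
    print(number, wire_type, payload)
# 1 2 b'one'
# 2 2 b'two'
```

Nothing in the bytes names the fields; only the field numbers travel on the wire, which is why a decoder without field 2 in its schema can still print it as `2:`.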

# Here is a proto definition file that has a Test message minus the 'two' field.
$ more pb_drops_two.proto
message Test {
  optional string one = 1;
}

# Use it to decode the bin file:
$ protoc --decode=Test pb_drops_two.proto < pb.bin
[libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
specified for the proto file: pb_drops_two.proto. Please use 'syntax =
"proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted
to proto2 syntax.)
one: "one"
2: "two"

Note how the second field is preserved (absent a field name). It is not
dropped.
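To see how proto2 can print `2: "two"` with a schema that lacks field 2, here is a minimal Python sketch of a proto2-style parser: fields it knows are decoded, and every other field is stored as its raw key/length/payload bytes so it can be written back out unchanged. `KNOWN_FIELDS`, `parse` and `serialize` are hypothetical names for illustration, not protobuf API, and only length-delimited fields are handled.

```python
from typing import Dict, List, Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint at buf[pos]; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_varint(value: int) -> bytes:
    """Encode a non-negative int as a base-128 varint."""
    out = bytearray()
    while True:
        low = value & 0x7F
        value >>= 7
        if value:
            out.append(low | 0x80)
        else:
            out.append(low)
            return bytes(out)


# Hypothetical schema mirroring pb_drops_two.proto: only field 1 is known.
KNOWN_FIELDS = {1: "one"}


def parse(buf: bytes) -> Tuple[Dict[str, str], List[bytes]]:
    """Decode known string fields; keep every unrecognized field verbatim."""
    values: Dict[str, str] = {}
    unknown: List[bytes] = []
    pos = 0
    while pos < len(buf):
        start = pos
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type != 2:
            raise ValueError("this sketch only handles length-delimited fields")
        length, pos = read_varint(buf, pos)
        payload = buf[pos:pos + length]
        pos += length
        if number in KNOWN_FIELDS:
            values[KNOWN_FIELDS[number]] = payload.decode("utf-8")
        else:
            unknown.append(buf[start:pos])  # key + length + payload, untouched
    return values, unknown


def serialize(values: Dict[str, str], unknown: List[bytes]) -> bytes:
    """Write known fields first, then the saved unknown records."""
    out = bytearray()
    for number, name in sorted(KNOWN_FIELDS.items()):
        if name in values:
            data = values[name].encode("utf-8")
            out += encode_varint((number << 3) | 2) + encode_varint(len(data)) + data
    for record in unknown:
        out += record
    return bytes(out)


PB_BIN = bytes.fromhex("0a 03 6f 6e 65 12 03 74 77 6f")
values, unknown = parse(PB_BIN)
print(values)   # {'one': 'one'}
print(unknown)  # [b'\x12\x03two']
assert serialize(values, unknown) == PB_BIN  # field 2 survives the round trip
```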

If I change the syntax of pb_drops_two.proto to be proto3, the field IS
dropped.

# Here is the proto file with proto3 syntax specified (I had to drop the
# 'optional' qualifier, which is not allowed in proto3):
$ more pb_drops_two.proto
syntax = "proto3";
message Test {
  string one = 1;
}

$ protoc --decode=Test pb_drops_two.proto < pb.bin  > pb_drops_two.txt
$ more pb_drops_two.txt
one: "one"
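The difference between the two decodes comes down to what the parser does with a field number it does not recognize. A simplified Python sketch contrasts the two behaviours with a `keep_unknown` switch; `round_trip` and `KNOWN_FIELDS` are hypothetical names, the parser handles only single-byte keys and lengths, and it models what was observed here with protoc 3.0.2 rather than any library's actual code.

```python
# Hypothetical schema: like pb_drops_two.proto, it knows field 1 only.
KNOWN_FIELDS = {1}


def round_trip(buf: bytes, keep_unknown: bool) -> bytes:
    """Parse buf against KNOWN_FIELDS and re-serialize it.

    keep_unknown=True models the proto2 behaviour shown earlier;
    keep_unknown=False models what protoc 3.0.2 did with the proto3
    version of pb_drops_two.proto. Known fields are copied as-is.
    """
    known = bytearray()
    unknown = bytearray()
    pos = 0
    while pos < len(buf):
        key, length = buf[pos], buf[pos + 1]
        if (key & 0x80) or (length & 0x80) or (key & 0x7) != 2:
            raise ValueError("sketch handles only short length-delimited fields")
        record = buf[pos:pos + 2 + length]
        pos += 2 + length
        if (key >> 3) in KNOWN_FIELDS:
            known += record
        elif keep_unknown:
            unknown += record  # proto2: carry the field along untouched
        # otherwise (proto3 in 3.0.x): the unknown field is silently discarded
    return bytes(known + unknown)


PB_BIN = bytes.fromhex("0a 03 6f 6e 65 12 03 74 77 6f")
print(round_trip(PB_BIN, keep_unknown=True) == PB_BIN)  # True
print(round_trip(PB_BIN, keep_unknown=False))           # b'\n\x03one'
```

Once the field is discarded at parse time, no later re-encode can bring it back, which is the corruption Chris described for proxies.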


I cannot re-encode the text output using pb_drops_two.proto. It complains:

$ protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt > pb_drops_two.bin
[libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
specified for the proto file: pb_drops_two.proto. Please use 'syntax =
"proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted
to proto2 syntax.)
input:2:1: Expected identifier, got: 2

Protobuf 2.5.0 does the same:

$ ~/bin/protobuf-2.5.0/src/protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt > pb_drops_two.bin
input:2:1: Expected identifier.
Failed to parse input.

St.Ack








Re: Can we update protobuf's version on trunk?

2017-03-29 Thread Stack
On Tue, Mar 28, 2017 at 4:18 PM, Andrew Wang 
wrote:

> >
> > >> If unknown fields are dropped, then applications proxying tokens and
> > >> other data between servers will effectively corrupt those messages,
> > >> unless we make everything opaque bytes, which- absent the convenient,
> > >> prenominate semantics managing the conversion- obviate the
> > >> compatibility machinery that is the whole point of PB. Google is
> > >> removing the features that justified choosing PB over its
> > >> alternatives. Since we can't require that our applications compile (or
> > >> link) against our updated schema, this creates a problem that PB was
> > >> supposed to solve.
> > >
> > >
> > > This is scary, and it potentially affects services outside of the
> > > Hadoop codebase. This makes it difficult to assess the impact.
> >
> > Stack mentioned a compatibility mode that uses the proto2 semantics.
> > If that carries unknown fields through intermediate handlers, then
> > this objection goes away. -C
>
>
> Did some more googling, found this:
>
> https://groups.google.com/d/msg/protobuf/Z6pNo81FiEQ/fHkdcNtdAwAJ
>
> Feng Xiao appears to be a Google engineer, and suggests workarounds like
> packing the fields into a byte type. No mention of a PB2 compatibility
> mode. Also here:
>
> https://groups.google.com/d/msg/protobuf/bO2L6-_t91Q/-zIaJAR9AAAJ
>
> Participants say that unknown fields were dropped for automatic JSON
> encoding, since you can't losslessly convert to JSON without knowing the
> type.
>
> Unfortunately, it sounds like these are intrinsic differences with PB3.
>
>
As I read it, Andrew, the field dropping happens only when pb3 is running in
proto3 'mode'. Let me try it...

St.Ack



> Best,
> Andrew
>
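The JSON point from the protobuf list can be made concrete: an unknown field arrives only as a field number, a wire type and a payload, and the same two payload bytes map to different JSON depending on the declared type. This Python sketch assumes proto3's JSON mapping for `string` and `bytes` (base64); the message rendering keyed by field number is a hypothetical stand-in, since real JSON output would use field names the decoder does not have.

```python
import base64
import json

# Two bytes that could arrive as the payload of an unknown, length-delimited
# field. The decoder sees the field number and wire type 2, nothing more.
payload = b"\x08\x01"


def render_as_string(data: bytes) -> str:
    """JSON if the (unseen) schema declared the field as `string`."""
    return json.dumps(data.decode("utf-8"))


def render_as_bytes(data: bytes) -> str:
    """JSON if the field were `bytes`; proto3's JSON mapping uses base64."""
    return json.dumps(base64.b64encode(data).decode("ascii"))


def render_as_message(data: bytes) -> str:
    """JSON if the field were an embedded message.

    0x08 0x01 decodes as field 1, wire type 0 (varint), value 1. Real JSON
    output would use the field's name, which we do not have, so this
    hypothetical rendering keys the object by field number instead.
    """
    key, value = data[0], data[1]
    if (key & 0x7) != 0 or (value & 0x80):
        raise ValueError("sketch handles one short varint field only")
    return json.dumps({str(key >> 3): value})


for render in (render_as_string, render_as_bytes, render_as_message):
    print(render.__name__, render(payload))
# render_as_string "\b\u0001"
# render_as_bytes "CAE="
# render_as_message {"1": 1}
```

Binary re-encoding can copy the raw bytes without choosing an interpretation; JSON output cannot, which is the reason given on the list for dropping unknown fields there.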


Re: Can we update protobuf's version on trunk?

2017-03-28 Thread Allen Wittenauer

> On Mar 28, 2017, at 5:09 PM, Chris Douglas  wrote:
> 
> I haven't seen data identifying PB as a bottleneck, but the
> non-x86/non-Linux and dev setup arguments may make this worthwhile. -C

FWIW, we have the same problem with leveldbjni-all (see the ASF
PowerPC build logs). I keep meaning to spend time on the Maven build to actually
download and install it, since a) the project appears never to be headed for a
release and b) it's not an optional component in YARN for some reason.
Potentially this could be combined with moving from leveldbjni-all to just leveldbjni.



-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



Re: Can we update protobuf's version on trunk?

2017-03-28 Thread Tsuyoshi Ozawa
> We can define protobuf's version by using syntax = "proto2" or
> syntax = "proto3" in proto files with the proto3 compiler. I think
> this is what Stack mentioned as compatibility mode.

I mean, "We can define protobuf's 'specification' version at
compilation time by using syntax = "proto2"..."

- Tsuyoshi

On Wed, Mar 29, 2017 at 9:13 AM, Tsuyoshi Ozawa  wrote:
>> Stack mentioned a compatibility mode that uses the proto2 semantics.
>
> We can define protobuf's version by using syntax = "proto2" or
> syntax = "proto3" in proto files with the proto3 compiler. I think
> this is what Stack mentioned as compatibility mode.
>
> https://github.com/golang/protobuf/blob/master/proto/proto3_proto/proto3.proto#L32
>
>> Did some more googling, found this:
>>
>> https://groups.google.com/d/msg/protobuf/Z6pNo81FiEQ/fHkdcNtdAwAJ
>>
>> Feng Xiao appears to be a Google engineer, and suggests workarounds like
>> packing the fields into a byte type. No mention of a PB2 compatibility mode.
>> Also here:
>>
>> https://groups.google.com/d/msg/protobuf/bO2L6-_t91Q/-zIaJAR9AAAJ
>>
>> Participants say that unknown fields were dropped for automatic JSON
>> encoding, since you can't losslessly convert to JSON without knowing the
>> type.
>
> Feng mentions in the thread:
>> The following are the main new features in language version 3:
>
> Does this mean that, with syntax = "proto3", there is no
> compatibility mode, because of the changes in how enums and missing
> fields are handled between the proto3 "specification" and the proto2
> "specification"? On the other hand, if files are compiled with
> syntax = "proto2", I think we could treat the generated files as
> behaving like the protobuf 2.x "runtime".
>
> Best
> - Tsuyoshi




Re: Can we update protobuf's version on trunk?

2017-03-28 Thread Tsuyoshi Ozawa
> Stack mentioned a compatibility mode that uses the proto2 semantics.

We can define protobuf's version by using syntax = "proto2" or
syntax = "proto3" in proto files with the proto3 compiler. I think
this is what Stack mentioned as compatibility mode.

https://github.com/golang/protobuf/blob/master/proto/proto3_proto/proto3.proto#L32
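Since protoc falls back to proto2 semantics when a file has no syntax line (the protoc warning quoted earlier in this thread says "Defaulted to proto2 syntax"), it may help to audit which of our .proto files would compile in which mode. A minimal Python sketch, assuming each declaration sits on its own line outside any block comment; declared_syntax and audit are hypothetical helper names, not part of any protobuf tooling:

```python
import re
from pathlib import Path

# Matches a declaration such as: syntax = "proto3";
# The .proto grammar accepts single or double quotes around the version.
SYNTAX_RE = re.compile(r"""^\s*syntax\s*=\s*(["'])(proto[23])\1\s*;""", re.MULTILINE)


def declared_syntax(proto_source: str) -> str:
    """Return the syntax protoc would apply: the declared one, else proto2."""
    match = SYNTAX_RE.search(proto_source)
    return match.group(2) if match else "proto2"


def audit(root: str) -> dict:
    """Map every .proto file under root to the syntax it would compile with."""
    return {
        str(path): declared_syntax(path.read_text(encoding="utf-8"))
        for path in sorted(Path(root).rglob("*.proto"))
    }
```

For example, audit(".") run from a source checkout would map each .proto path to "proto2" or "proto3"; files reported as proto2 are the ones that should keep proto2 handling of unknown fields when built with a 3.x protoc.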

> Did some more googling, found this:
>
> https://groups.google.com/d/msg/protobuf/Z6pNo81FiEQ/fHkdcNtdAwAJ
>
> Feng Xiao appears to be a Google engineer, and suggests workarounds like
> packing the fields into a byte type. No mention of a PB2 compatibility mode.
> Also here:
>
> https://groups.google.com/d/msg/protobuf/bO2L6-_t91Q/-zIaJAR9AAAJ
>
> Participants say that unknown fields were dropped for automatic JSON
> encoding, since you can't losslessly convert to JSON without knowing the
> type.

Feng mentions in the thread:
> The following are the main new features in language version 3:

Does this mean that, with syntax = "proto3", there is no
compatibility mode, because of the changes in how enums and missing
fields are handled between the proto3 "specification" and the proto2
"specification"? On the other hand, if files are compiled with
syntax = "proto2", I think we could treat the generated files as
behaving like the protobuf 2.x "runtime".

Best
- Tsuyoshi




Re: Can we update protobuf's version on trunk?

2017-03-28 Thread Chris Douglas
On Tue, Mar 28, 2017 at 4:18 PM, Andrew Wang  wrote:
> Unfortunately, it sounds like these are intrinsic differences with PB3.

That's too bad... but possibly not fatal: most of the data we proxy
through client code is, if not opaque, at least immutable
(particularly tokens). If PB3 does support reading valid PB fields as
bytes, then we could proxy the payload through application code as an
opaque blob. That opacity has a drawback: if clients could use that
information (e.g., StorageType), we'd need to include it in a
redundant field.
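The opaque-blob idea can be checked at the wire level without any protobuf library. The minimal Python sketch below hand-encodes length-delimited fields, models a proxy that re-encodes only the fields its schema knows (the proto3 dropping behavior discussed in this thread), and shows that a token carried as opaque bytes survives the proxy while the same token decoded with an outdated schema loses its new field. The envelope and token layouts, field numbers, and helper names are all hypothetical, and only wire type 2 is handled.

```python
from __future__ import annotations


def write_varint(value: int) -> bytes:
    """Encode a non-negative int as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def field(number: int, payload: bytes) -> bytes:
    """Encode a length-delimited (wire type 2) field."""
    return write_varint((number << 3) | 2) + write_varint(len(payload)) + payload


def parse(buf: bytes) -> list[tuple[int, bytes]]:
    """Split a message made only of length-delimited fields into (number, payload)."""
    fields, pos = [], 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        if (key & 0x7) != 2:
            raise ValueError("sketch only handles wire type 2")
        length, pos = read_varint(buf, pos)
        fields.append((key >> 3, buf[pos:pos + length]))
        pos += length
    return fields


def dropping_proxy(buf: bytes, known: set[int]) -> bytes:
    """Decode and re-encode, keeping only fields in `known` (proto3-style dropping)."""
    return b"".join(field(n, p) for n, p in parse(buf) if n in known)


# Hypothetical token: field 3 was added in a newer release the proxy never saw.
token = field(1, b"owner") + field(3, b"added-in-new-release")
# Hypothetical envelope: field 1 carries the token as opaque bytes,
# field 2 is a redundant copy of something clients need (e.g. a storage type).
envelope = field(1, token) + field(2, b"SSD")

# A proxy that only knows the envelope schema forwards the token byte-for-byte...
assert dropping_proxy(envelope, known={1, 2}) == envelope
# ...whereas decoding the token itself with an outdated schema loses field 3.
assert dropping_proxy(token, known={1}) == field(1, b"owner")
```

The trade-off is the one described above: anything a client needs to read has to be duplicated outside the blob, because the proxy never looks inside it.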

Ewan Higgs used a technique in HDFS-11026 [1] to handle a transition
from Writable to protobuf. This probably could be used for most of our
token types. It's not a general solution, but it would be sufficient
for existing applications to continue working, with some accommodation
for proxy versioning and rolling upgrades.

I haven't seen data identifying PB as a bottleneck, but the
non-x86/non-Linux and dev setup arguments may make this worthwhile. -C

[1] https://issues.apache.org/jira/browse/HDFS-11026




Re: Can we update protobuf's version on trunk?

2017-03-28 Thread Andrew Wang
>
> > If unknown fields are dropped, then applications proxying tokens and other
> >> data between servers will effectively corrupt those messages, unless we
> >> make everything opaque bytes, which- absent the convenient, prenominate
> >> semantics managing the conversion- obviate the compatibility machinery that
> >> is the whole point of PB. Google is removing the features that justified
> >> choosing PB over its alternatives. Since we can't require that our
> >> applications compile (or link) against our updated schema, this creates a
> >> problem that PB was supposed to solve.
> >
> >
> > This is scary, and it potentially affects services outside of the Hadoop
> > codebase. This makes it difficult to assess the impact.
>
> Stack mentioned a compatibility mode that uses the proto2 semantics.
> If that carries unknown fields through intermediate handlers, then
> this objection goes away. -C


Did some more googling, found this:

https://groups.google.com/d/msg/protobuf/Z6pNo81FiEQ/fHkdcNtdAwAJ

Feng Xiao appears to be a Google engineer, and suggests workarounds like
packing the fields into a byte type. No mention of a PB2 compatibility
mode. Also here:

https://groups.google.com/d/msg/protobuf/bO2L6-_t91Q/-zIaJAR9AAAJ

Participants say that unknown fields were dropped for automatic JSON
encoding, since you can't losslessly convert to JSON without knowing the
type.

Unfortunately, it sounds like these are intrinsic differences with PB3.

Best,
Andrew


Re: Can we update protobuf's version on trunk?

2017-03-28 Thread Chris Douglas
On Tue, Mar 28, 2017 at 3:04 PM, Andrew Wang  wrote:
> There's no mention of the convenient "Embedded messages are compatible with
>> bytes if the bytes contain an encoded version of the message" semantics in
>> proto3.
>
>
> I checked the proto3 guide, and I think this is supported:
> https://developers.google.com/protocol-buffers/docs/proto3#updating

You're right, it looks like this is supported.

> If unknown fields are dropped, then applications proxying tokens and other
>> data between servers will effectively corrupt those messages, unless we
>> make everything opaque bytes, which- absent the convenient, prenominate
>> semantics managing the conversion- obviate the compatibility machinery that
>> is the whole point of PB. Google is removing the features that justified
>> choosing PB over its alternatives. Since we can't require that our
>> applications compile (or link) against our updated schema, this creates a
>> problem that PB was supposed to solve.
>
>
> This is scary, and it potentially affects services outside of the Hadoop
> codebase. This makes it difficult to assess the impact.

Stack mentioned a compatibility mode that uses the proto2 semantics.
If that carries unknown fields through intermediate handlers, then
this objection goes away. -C

> Paraphrasing, the issues with PB2.5 are:
>
>1. poor support for non-x86, non-Linux platforms
>2. not as available, so harder to set up a dev environment
>3. missing zero-copy support, which helped performance in HBase
>
> #1 and #2 can be addressed if we rehosted PB (with cross-OS compilation
> patches) elsewhere.
> #3 I don't think we benefit from, since we don't pass around big PB byte
> arrays (at least in HDFS).
>
> So the way I see it, upgrading to PB3 has risk from the behavior change wrt
> unknown fields, while there are other ways of addressing the stated issues
> with PB2.5.
>
> Best,
> Andrew




Re: Can we update protobuf's version on trunk?

2017-03-28 Thread Andrew Wang
I've been investigating this a bit. I'm hoping Chris can chime in, since
he's identified wire compatibility issues. Replying inline to Chris'
comment on HDFS-11010:

There's no mention of the convenient "Embedded messages are compatible with
> bytes if the bytes contain an encoded version of the message" semantics in
> proto3.


I checked the proto3 guide, and I think this is supported:
https://developers.google.com/protocol-buffers/docs/proto3#updating
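The rule on that page, that embedded messages are compatible with bytes if the bytes contain an encoded version of the message, follows from both types sharing the length-delimited wire type. A minimal Python sketch that hand-encodes a hypothetical Token message to show this; the message layout, field numbers, and helper names are made up for illustration, and only wire type 2 is handled:

```python
from __future__ import annotations


def write_varint(value: int) -> bytes:
    """Encode a non-negative int as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def length_delimited(number: int, payload: bytes) -> bytes:
    """Wire type 2 is shared by string, bytes and embedded-message fields."""
    return write_varint((number << 3) | 2) + write_varint(len(payload)) + payload


def read_fields(buf: bytes) -> dict[int, bytes]:
    """Decode a message of wire-type-2 fields into {field_number: payload}."""
    fields, pos = {}, 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        if (key & 0x7) != 2:
            raise ValueError("sketch only handles wire type 2")
        length, pos = read_varint(buf, pos)
        fields[key >> 3] = buf[pos:pos + length]  # last value wins, as for singular fields
        pos += length
    return fields


# Hypothetical message Token { string kind = 1; bytes secret = 2; }
token = length_delimited(1, b"demo-kind") + length_delimited(2, b"\x01\x02")

# Writer's schema: "Token token = 4;" (an embedded message field).
wire = length_delimited(4, token)

# Reader's schema: "bytes token = 4;". It receives the encoded Token unchanged...
opaque = read_fields(wire)[4]
assert opaque == token
# ...and can parse it as a Token later, once that schema is available.
assert read_fields(opaque) == {1: b"demo-kind", 2: b"\x01\x02"}
```

A reader that declares the field as bytes can therefore forward it untouched, which is the property the proxying discussion relies on.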

If unknown fields are dropped, then applications proxying tokens and other
> data between servers will effectively corrupt those messages, unless we
> make everything opaque bytes, which- absent the convenient, prenominate
> semantics managing the conversion- obviate the compatibility machinery that
> is the whole point of PB. Google is removing the features that justified
> choosing PB over its alternatives. Since we can't require that our
> applications compile (or link) against our updated schema, this creates a
> problem that PB was supposed to solve.


This is scary, and it potentially affects services outside of the Hadoop
codebase. This makes it difficult to assess the impact.

Paraphrasing, the issues with PB2.5 are:

   1. poor support for non-x86, non-Linux platforms
   2. not as available, so harder to set up a dev environment
   3. missing zero-copy support, which helped performance in HBase

#1 and #2 can be addressed if we rehosted PB (with cross-OS compilation
patches) elsewhere.
#3 I don't think we benefit from, since we don't pass around big PB byte
arrays (at least in HDFS).

So the way I see it, upgrading to PB3 has risk from the behavior change wrt
unknown fields, while there are other ways of addressing the stated issues
with PB2.5.

Best,
Andrew


Re: Can we update protobuf's version on trunk?

2017-03-27 Thread Tsuyoshi Ozawa
Forwarding to common-dev, hdfs-dev, mapreduce-dev too.

Thanks
- Tsuyoshi

On Mon, Mar 27, 2017 at 21:16, Tsuyoshi Ozawa wrote:

> Dear Hadoop developers,
>
> Now that the shaded client introduced by HADOOP-11804 has been merged,
> we can more easily update dependencies on trunk while minimizing the
> impact on backward compatibility. (Thanks Sean and Sanjin for taking
> on the issue!)
>
> So, is it time to update protobuf to the latest version on trunk?
> Could you share your opinion here?
>
> There have been several discussions in parallel so far, so I would
> like to summarize the developers' current opinions here, along with
> my understanding of them.
>
> Stack mentioned on HADOOP-13363:
> * Would this be a problem? Old clients can talk to the new servers
> because of wire compatibility. Is anyone consuming hadoop protos directly
> other than hadoop? Are hadoop proto files considered
> InterfaceAudience.Private or InterfaceAudience.Public? If the former,
> I could work on a patch for 3.0.0 (It'd be big but boring). Does
> Hadoop have Protobuf in its API anywhere (I can take a look but being
> lazy asking here first).
>
> gohadoop[1] uses proto files directly, treating the proto files as a
> stable interface.
> [1] https://github.com/hortonworks/gohadoop/search?utf8=%E2%9C%93&q=*proto&type=
>
> Fortunately, almost no additional work is needed to compile the Hadoop
> code base: the only change I made was to make getOndiskTrunkSize's
> argument take a protobuf v3 object [2]. Please point out anything I'm
> missing.
>
> [2] https://issues.apache.org/jira/secure/attachment/12860647/HADOOP-13363.004.patch
>
> There are some concerns about updating protobuf raised on HDFS-11010:
> * I'm really hesitant to bump PB considering the pain it brought last
> time. (by Andrew)
>
> The pain last time came from a lack of *binary* compatibility, not
> wire compatibility. If I understand correctly, the problem was caused
> by v2.4.0 and v2.5.0 classes being mixed between Hadoop and HBase.
> (I learned this from Steve's comment on HADOOP-13363 [3].)
> As I mentioned at the beginning, protobuf is now shaded on trunk, so
> we don't need to care about binary (source code level) compatibility.
>
> [3] https://issues.apache.org/jira/browse/HADOOP-13363?focusedCommentId=15372724&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15372724
>
> * Have we checked if it's wire compatible with our current version of
> PB? (by Andrew)
>
> As far as I know, protobuf v2 and protobuf v3 are wire compatible;
> the Google team has been testing this. We can also validate it
> ourselves using the following script.
>
> https://chromium.googlesource.com/external/github.com/google/protobuf/+/master/java/compatibility_tests/README.md
>
> * Let me ask the question in a different way: what about PB 3 is
> concerning to you? (by Anu)
>
> * Some of its incompatibilities with 2.x, such as dropping unknown
> fields from records. Any component that proxies records must have an
> updated version of the schema, or it will silently drop data and
> convert unknown values to defaults. Unknown enum value handling has
> changed. There's no mention of the convenient "Embedded messages are
> compatible with bytes if the bytes contain an encoded version of the
> message" semantics in proto3. (by Chris)
>
> This is what we need to discuss.
> Quoting Google's developer guide:
> https://developers.google.com/protocol-buffers/docs/proto3#unknowns
>
> > For most Google protocol buffers implementations, unknown fields are not
> accessible in proto3 via the corresponding proto runtimes, and are dropped
> and forgotten at deserialization time. This is different behaviour to
> proto2, where unknown fields are always preserved and serialized along with
> the message.
>
> Is this incompatibility acceptable for us or not? If we need to check
> some test cases before updating protobuf, it would be good to identify
> those test cases here and run them now.
>
> Best regards,
> - Tsuyoshi
>


Can we update protobuf's version on trunk?

2017-03-27 Thread Tsuyoshi Ozawa
Dear Hadoop developers,

Now that the shaded client introduced by HADOOP-11804 has been merged,
we can more easily update dependencies on trunk while minimizing the
impact on backward compatibility. (Thanks Sean and Sanjin for taking
on the issue!)

So, is it time to update protobuf to the latest version on trunk?
Could you share your opinion here?

There have been several discussions in parallel so far, so I would
like to summarize the developers' current opinions here, along with
my understanding of them.

Stack mentioned on HADOOP-13363:
* Would this be a problem? Old clients can talk to the new servers
because of wire compatibility. Is anyone consuming hadoop protos directly
other than hadoop? Are hadoop proto files considered
InterfaceAudience.Private or InterfaceAudience.Public? If the former,
I could work on a patch for 3.0.0 (It'd be big but boring). Does
Hadoop have Protobuf in its API anywhere (I can take a look but being
lazy asking here first).

gohadoop[1] uses proto files directly, treating the proto files as a
stable interface.
[1] https://github.com/hortonworks/gohadoop/search?utf8=%E2%9C%93&q=*proto&type=

Fortunately, almost no additional work is needed to compile the Hadoop
code base: the only change I made was to make getOndiskTrunkSize's
argument take a protobuf v3 object [2]. Please point out anything I'm
missing.

[2] https://issues.apache.org/jira/secure/attachment/12860647/HADOOP-13363.004.patch

There are some concerns about updating protobuf raised on HDFS-11010:
* I'm really hesitant to bump PB considering the pain it brought last
time. (by Andrew)

The pain last time came from a lack of *binary* compatibility, not
wire compatibility. If I understand correctly, the problem was caused
by v2.4.0 and v2.5.0 classes being mixed between Hadoop and HBase.
(I learned this from Steve's comment on HADOOP-13363 [3].)
As I mentioned at the beginning, protobuf is now shaded on trunk, so
we don't need to care about binary (source code level) compatibility.

[3] https://issues.apache.org/jira/browse/HADOOP-13363?focusedCommentId=15372724&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15372724

* Have we checked if it's wire compatible with our current version of
PB? (by Andrew)

As far as I know, protobuf v2 and protobuf v3 are wire compatible;
the Google team has been testing this. We can also validate it
ourselves using the following script.

https://chromium.googlesource.com/external/github.com/google/protobuf/+/master/java/compatibility_tests/README.md

* Let me ask the question in a different way: what about PB 3 is
concerning to you? (by Anu)

* Some of its incompatibilities with 2.x, such as dropping unknown
fields from records. Any component that proxies records must have an
updated version of the schema, or it will silently drop data and
convert unknown values to defaults. Unknown enum value handling has
changed. There's no mention of the convenient "Embedded messages are
compatible with bytes if the bytes contain an encoded version of the
message" semantics in proto3. (by Chris)

This is what we need to discuss.
Quoting Google's developer guide:
https://developers.google.com/protocol-buffers/docs/proto3#unknowns

> For most Google protocol buffers implementations, unknown fields are not 
> accessible in proto3 via the corresponding proto runtimes, and are dropped 
> and forgotten at deserialization time. This is different behaviour to proto2, 
> where unknown fields are always preserved and serialized along with the 
> message.

Is this incompatibility acceptable for us or not? If we need to check
some test cases before updating protobuf, it would be good to identify
those test cases here and run them now.
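One candidate test case is the proxy scenario Chris describes: a handler compiled against an older schema decodes a record and re-encodes it. A minimal Python sketch of the expected outcomes at the wire level, assuming no protobuf library: it hand-parses the wire format and models proto2 (carry unknown fields through) versus the proto3 behavior quoted above (drop them). It is a model of the two semantics, not a call into either runtime; a real test would round-trip through the generated Java classes. intermediate_handler and its keep_unknown flag are hypothetical names.

```python
from __future__ import annotations


def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def split_fields(buf: bytes) -> list[tuple[int, bytes]]:
    """Split an encoded message into (field_number, raw bytes including the key)."""
    fields, pos = [], 0
    while pos < len(buf):
        start = pos
        key, pos = read_varint(buf, pos)
        wire_type = key & 0x7
        if wire_type == 0:  # varint
            _, pos = read_varint(buf, pos)
        elif wire_type == 1:  # 64-bit
            pos += 8
        elif wire_type == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            pos += length
        elif wire_type == 5:  # 32-bit
            pos += 4
        else:  # groups (3/4) are out of scope for this sketch
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((key >> 3, buf[start:pos]))
    return fields


def intermediate_handler(buf: bytes, known: set[int], keep_unknown: bool) -> bytes:
    """Decode with an old schema and re-encode, as a proxy would.

    keep_unknown=True models proto2 (unknown fields carried through);
    keep_unknown=False models the proto3 dropping behavior.
    Field order is preserved here; real runtimes may write unknown
    fields after the known ones.
    """
    return b"".join(raw for number, raw in split_fields(buf)
                    if number in known or keep_unknown)


# Test { one: "one" two: "two" }: the same ten bytes as the pb.bin dump
# earlier in this thread.
NEW_MESSAGE = bytes.fromhex("0a036f6e65120374776f")
OLD_SCHEMA = {1}  # a reader whose Test message lacks field 'two'

assert intermediate_handler(NEW_MESSAGE, OLD_SCHEMA, keep_unknown=True) == NEW_MESSAGE
assert intermediate_handler(NEW_MESSAGE, OLD_SCHEMA, keep_unknown=False) == bytes.fromhex("0a036f6e65")
```

The real-runtime version of this check would encode with the two-field schema, decode and re-encode with the one-field schema under each syntax, and compare the bytes.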

Best regards,
- Tsuyoshi
