On Wed, Mar 29, 2017 at 3:12 PM, Chris Douglas <chris.doug...@gmail.com>
wrote:

> On Wed, Mar 29, 2017 at 1:13 PM, Stack <st...@duboce.net> wrote:
> > Is the below evidence enough that pb3 in proto2 syntax mode does not drop
> > 'unknown' fields? (Maybe you want evidence that java tooling behaves the
> > same?)
>
> I reproduced your example with the Java tooling, including changing
> some of the fields in the intermediate representation. As long as the
> syntax is "proto2", it seems to have compatible semantics.
>
>
Thanks.


> > To be clear, when we say proxy above, are we expecting that a pb message
> > deserialized by a process down-the-line that happens to have a crimped proto
> > definition that is absent a couple of fields somehow can re-serialize and at
> > the end of the line, all fields are present? Or are we talking pass-through
> > of the message without rewrite?
>
> The former; an intermediate handler decoding, [modifying,] and
> encoding the record without losing unknown fields.
>
>
I did not try this. Did you? Otherwise I can.
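
For reference, the decode / modify / encode round trip in question can be
sketched at the wire level. This is only an illustration under stated
assumptions: it is a hand-rolled parser in Python that does not use the
protobuf library, it handles only varint and length-delimited fields, and the
names parse_fields, serialize_fields and proxy are hypothetical helpers, not
part of any pb API. The input bytes are the ones from the od dump quoted below.

```python
# Wire-level sketch of an intermediate handler; NOT the protobuf library.
# Assumption: only wire types 0 (varint) and 2 (length-delimited) occur.

def read_varint(buf, pos):
    """Read a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def write_varint(value):
    """Encode a non-negative int as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def parse_fields(buf):
    """Split a message into (field_number, wire_type, value) triples."""
    fields = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError("wire type %d not handled in this sketch" % wire_type)
        fields.append((number, wire_type, value))
    return fields

def serialize_fields(fields):
    """Inverse of parse_fields."""
    out = bytearray()
    for number, wire_type, value in fields:
        out += write_varint((number << 3) | wire_type)
        if wire_type == 0:
            out += write_varint(value)
        else:
            out += write_varint(len(value))
            out += value
    return bytes(out)

def proxy(buf, known, keep_unknown):
    """Hypothetical intermediate handler: decode, upper-case field 1, re-encode.

    'known' is the set of field numbers in the handler's (crimped) schema.
    keep_unknown=True mimics carrying unknown fields through;
    keep_unknown=False mimics dropping them.
    """
    out = []
    for number, wire_type, value in parse_fields(buf):
        if number not in known and not keep_unknown:
            continue
        if number == 1 and wire_type == 2:
            value = value.upper()
        out.append((number, wire_type, value))
    return serialize_fields(out)

# Bytes of pb.bin as shown by 'od -xc' (16-bit words un-swapped).
PB_BIN = bytes.fromhex("0a036f6e65120374776f")

if __name__ == "__main__":
    print(parse_fields(proxy(PB_BIN, known={1}, keep_unknown=True)))
    print(parse_fields(proxy(PB_BIN, known={1}, keep_unknown=False)))
```

With keep_unknown=True the handler only knows field 1, yet field 2 survives
the rewrite; with keep_unknown=False it is lost. Whether the real Java or C++
runtime behaves like the first case after a modification still needs to be
checked against the actual tooling.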

St.Ack


> This looks fine. -C
>
> > Thanks,
> > St.Ack
> >
> >
> > # Using the protoc v3.0.2 tool
> > $ protoc --version
> > libprotoc 3.0.2
> >
> > # I have a simple proto definition with two fields in it
> > $ more pb.proto
> > message Test {
> >   optional string one = 1;
> >   optional string two = 2;
> > }
> >
> > # This is a text-encoded instance of a 'Test' proto message:
> > $ more pb.txt
> > one: "one"
> > two: "two"
> >
> > # Now I encode the above as a pb binary
> > $ protoc --encode=Test pb.proto < pb.txt > pb.bin
> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
> > specified for the proto file: pb.proto. Please use 'syntax = "proto2";' or
> > 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2
> > syntax.)
> >
> > # Here is a dump of the binary
> > $ od -xc pb.bin
> > 0000000      030a    6e6f    1265    7403    6f77
> >           \n 003   o   n   e 022 003   t   w   o
> > 0000012
> >
> > # Here is a proto definition file that has a Test Message minus the 'two'
> > field.
> > $ more pb_drops_two.proto
> > message Test {
> >   optional string one = 1;
> > }
> >
> > # Use it to decode the bin file:
> > $ protoc --decode=Test pb_drops_two.proto < pb.bin
> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
> > specified for the proto file: pb_drops_two.proto. Please use 'syntax =
> > "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted
> > to proto2 syntax.)
> > one: "one"
> > 2: "two"
> >
> > Note how the second field is preserved (absent a field name). It is not
> > dropped.
> >
> > If I change the syntax of pb_drops_two.proto to be proto3, the field IS
> > dropped.
> >
> > # Here proto file with proto3 syntax specified (had to drop the 'optional'
> > qualifier -- not allowed in proto3):
> > $ more pb_drops_two.proto
> > syntax = "proto3";
> > message Test {
> >   string one = 1;
> > }
> >
> > $ protoc --decode=Test pb_drops_two.proto < pb.bin  > pb_drops_two.txt
> > $ more pb_drops_two.txt
> > one: "one"
> >
> >
> > I cannot reencode the text output using pb_drops_two.proto. It complains:
> >
> > $ protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt >
> > pb_drops_two.bin
> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
> > specified for the proto file: pb_drops_two.proto. Please use 'syntax =
> > "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted
> > to proto2 syntax.)
> > input:2:1: Expected identifier, got: 2
> >
> > Proto 2.5 does same:
> >
> > $ ~/bin/protobuf-2.5.0/src/protoc --encode=Test pb_drops_two.proto <
> > pb_drops_two.txt > pb_drops_two.bin
> > input:2:1: Expected identifier.
> > Failed to parse input.
> >
> > St.Ack
> >
> >
> >
> >
> >
> >
> > On Wed, Mar 29, 2017 at 10:14 AM, Stack <st...@duboce.net> wrote:
> >>
> >> On Tue, Mar 28, 2017 at 4:18 PM, Andrew Wang <andrew.w...@cloudera.com>
> >> wrote:
> >>>
> >>> >
> >>> > >> If unknown fields are dropped, then applications proxying tokens and
> >>> > >> other data between servers will effectively corrupt those messages,
> >>> > >> unless we make everything opaque bytes, which- absent the convenient,
> >>> > >> prenominate semantics managing the conversion- obviate the
> >>> > >> compatibility machinery that is the whole point of PB. Google is
> >>> > >> removing the features that justified choosing PB over its
> >>> > >> alternatives. Since we can't require that our applications compile
> >>> > >> (or link) against our updated schema, this creates a problem that PB
> >>> > >> was supposed to solve.
> >>> > >
> >>> > >
> >>> > > This is scary, and it potentially affects services outside of the
> >>> > > Hadoop codebase. This makes it difficult to assess the impact.
> >>> >
> >>> > Stack mentioned a compatibility mode that uses the proto2 semantics.
> >>> > If that carries unknown fields through intermediate handlers, then
> >>> > this objection goes away. -C
> >>>
> >>>
> >>> Did some more googling, found this:
> >>>
> >>> https://groups.google.com/d/msg/protobuf/Z6pNo81FiEQ/fHkdcNtdAwAJ
> >>>
> >>> Feng Xiao appears to be a Google engineer, and suggests workarounds like
> >>> packing the fields into a byte type. No mention of a PB2 compatibility
> >>> mode. Also here:
> >>>
> >>> https://groups.google.com/d/msg/protobuf/bO2L6-_t91Q/-zIaJAR9AAAJ
> >>>
> >>> Participants say that unknown fields were dropped for automatic JSON
> >>> encoding, since you can't losslessly convert to JSON without knowing the
> >>> type.
> >>>
> >>> Unfortunately, it sounds like these are intrinsic differences with PB3.
> >>>
> >>
> >> As I read it Andrew, the field-dropping happens when pb3 is running in
> >> proto3 'mode'. Let me try it...
> >>
> >> St.Ack
> >>
> >>
> >>>
> >>> Best,
> >>> Andrew
> >>
> >>
> >
>
