On Thu, Mar 30, 2017 at 9:16 AM, Chris Douglas <chris.doug...@gmail.com> wrote:
> On Wed, Mar 29, 2017 at 4:59 PM, Stack <st...@duboce.net> wrote:
> >> The former; an intermediate handler decoding, [modifying,] and
> >> encoding the record without losing unknown fields.
> >>
> >
> > I did not try this. Did you? Otherwise I can.
>
> Yeah, I did. Same format. -C
>

Grand.
St.Ack

> >> This looks fine. -C
> >>
> >> > Thanks,
> >> > St.Ack
> >> >
> >> >
> >> > # Using the protoc v3.0.2 tool
> >> > $ protoc --version
> >> > libprotoc 3.0.2
> >> >
> >> > # I have a simple proto definition with two fields in it
> >> > $ more pb.proto
> >> > message Test {
> >> >   optional string one = 1;
> >> >   optional string two = 2;
> >> > }
> >> >
> >> > # This is a text-encoded instance of a 'Test' proto message:
> >> > $ more pb.txt
> >> > one: "one"
> >> > two: "two"
> >> >
> >> > # Now I encode the above as a pb binary
> >> > $ protoc --encode=Test pb.proto < pb.txt > pb.bin
> >> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
> >> > specified for the proto file: pb.proto. Please use 'syntax = "proto2";'
> >> > or 'syntax = "proto3";' to specify a syntax version. (Defaulted to
> >> > proto2 syntax.)
> >> >
> >> > # Here is a dump of the binary
> >> > $ od -xc pb.bin
> >> > 0000000    030a    6e6f    1265    7403    6f77
> >> >          \n 003   o   n   e 022 003   t   w   o
> >> > 0000012
> >> >
> >> > # Here is a proto definition file that has a Test Message minus the
> >> > # 'two' field.
> >> > $ more pb_drops_two.proto
> >> > message Test {
> >> >   optional string one = 1;
> >> > }
> >> >
> >> > # Use it to decode the bin file:
> >> > $ protoc --decode=Test pb_drops_two.proto < pb.bin
> >> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
> >> > specified for the proto file: pb_drops_two.proto. Please use 'syntax =
> >> > "proto2";' or 'syntax = "proto3";' to specify a syntax version.
> >> > (Defaulted to proto2 syntax.)
> >> > one: "one"
> >> > 2: "two"
> >> >
> >> > Note how the second field is preserved (absent a field name). It is not
> >> > dropped.
> >> >
> >> > If I change the syntax of pb_drops_two.proto to be proto3, the field IS
> >> > dropped.
> >> >
> >> > # Here proto file with proto3 syntax specified (had to drop the
> >> > # 'optional' qualifier -- not allowed in proto3):
> >> > $ more pb_drops_two.proto
> >> > syntax = "proto3";
> >> > message Test {
> >> >   string one = 1;
> >> > }
> >> >
> >> > $ protoc --decode=Test pb_drops_two.proto < pb.bin > pb_drops_two.txt
> >> > $ more pb_drops_two.txt
> >> > one: "one"
> >> >
> >> >
> >> > I cannot reencode the text output using pb_drops_two.proto. It
> >> > complains:
> >> >
> >> > $ protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt > pb_drops_two.bin
> >> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
> >> > specified for the proto file: pb_drops_two.proto. Please use 'syntax =
> >> > "proto2";' or 'syntax = "proto3";' to specify a syntax version.
> >> > (Defaulted to proto2 syntax.)
> >> > input:2:1: Expected identifier, got: 2
> >> >
> >> > Proto 2.5 does same:
> >> >
> >> > $ ~/bin/protobuf-2.5.0/src/protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt > pb_drops_two.bin
> >> > input:2:1: Expected identifier.
> >> > Failed to parse input.
> >> >
> >> > St.Ack
> >> >
> >> >
> >> > On Wed, Mar 29, 2017 at 10:14 AM, Stack <st...@duboce.net> wrote:
> >> >>
> >> >> On Tue, Mar 28, 2017 at 4:18 PM, Andrew Wang <andrew.w...@cloudera.com>
> >> >> wrote:
> >> >>>
> >> >>> >
> >> >>> > >> If unknown fields are dropped, then applications proxying tokens
> >> >>> > >> and other data between servers will effectively corrupt those
> >> >>> > >> messages, unless we make everything opaque bytes, which- absent
> >> >>> > >> the convenient, prenominate semantics managing the conversion-
> >> >>> > >> obviate the compatibility machinery that is the whole point of
> >> >>> > >> PB. Google is removing the features that justified choosing PB
> >> >>> > >> over its alternatives. Since we can't require that our
> >> >>> > >> applications compile (or link) against our updated schema, this
> >> >>> > >> creates a problem that PB was supposed to solve.
> >> >>> > >
> >> >>> > >
> >> >>> > > This is scary, and it potentially affects services outside of the
> >> >>> > > Hadoop codebase. This makes it difficult to assess the impact.
> >> >>> >
> >> >>> > Stack mentioned a compatibility mode that uses the proto2 semantics.
> >> >>> > If that carries unknown fields through intermediate handlers, then
> >> >>> > this objection goes away. -C
> >> >>>
> >> >>>
> >> >>> Did some more googling, found this:
> >> >>>
> >> >>> https://groups.google.com/d/msg/protobuf/Z6pNo81FiEQ/fHkdcNtdAwAJ
> >> >>>
> >> >>> Feng Xiao appears to be a Google engineer, and suggests workarounds
> >> >>> like packing the fields into a byte type. No mention of a PB2
> >> >>> compatibility mode.
> >> >>> Also here:
> >> >>>
> >> >>> https://groups.google.com/d/msg/protobuf/bO2L6-_t91Q/-zIaJAR9AAAJ
> >> >>>
> >> >>> Participants say that unknown fields were dropped for automatic JSON
> >> >>> encoding, since you can't losslessly convert to JSON without knowing
> >> >>> the type.
> >> >>>
> >> >>> Unfortunately, it sounds like these are intrinsic differences with
> >> >>> PB3.
> >> >>>
> >> >>
> >> >> As I read it Andrew, the field-dropping happens when pb3 is running in
> >> >> proto3 'mode'. Let me try it...
> >> >>
> >> >> St.Ack
> >> >>
> >> >>
> >> >>>
> >> >>> Best,
> >> >>> Andrew
> >> >>
> >> >>
> >> >
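
For anyone who wants to poke at the proxying concern without protoc installed, here is a minimal Python sketch of the two behaviors discussed in this thread. It is not protobuf library code: it hand-parses the wire format of the ten bytes shown in the `od -xc pb.bin` dump, and the names `parse_fields`, `encode_fields`, `relay`, and `KNOWN_FIELDS` are made up for illustration. It assumes only length-delimited fields (wire type 2), which is all the `Test` message uses. `relay(..., keep_unknown=True)` mimics the proto2 result reported above (field 2 survives), and `keep_unknown=False` mimics the proto3 result reported above (field 2 is gone).

```python
# Bytes from the `od -xc pb.bin` dump. od -x prints little-endian 16-bit
# words, so the word 030a is the byte pair 0a 03, and so on.
PB_BIN = bytes.fromhex("0a036f6e65" "120374776f")

# Hypothetical stand-in for pb_drops_two.proto: the handler knows field 1 only.
KNOWN_FIELDS = {1: "one"}


def read_varint(buf, pos):
    """Decode a base-128 varint at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def write_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def parse_fields(buf):
    """Split a message into (field_number, wire_type, payload) tuples."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type != 2:
            # Sketch limitation: only length-delimited fields are handled.
            raise ValueError("unsupported wire type %d" % wire_type)
        length, pos = read_varint(buf, pos)
        fields.append((field_number, wire_type, buf[pos:pos + length]))
        pos += length
    return fields


def encode_fields(fields):
    """Inverse of parse_fields for length-delimited fields."""
    out = bytearray()
    for field_number, wire_type, payload in fields:
        out += write_varint((field_number << 3) | wire_type)
        out += write_varint(len(payload))
        out += payload
    return bytes(out)


def relay(buf, keep_unknown):
    """Decode and re-encode as an intermediate handler would."""
    fields = parse_fields(buf)
    if not keep_unknown:
        fields = [f for f in fields if f[0] in KNOWN_FIELDS]
    return encode_fields(fields)


if __name__ == "__main__":
    print(parse_fields(PB_BIN))
    print(relay(PB_BIN, keep_unknown=True).hex())
    print(relay(PB_BIN, keep_unknown=False).hex())
```

With `keep_unknown=True` the relayed bytes are identical to the input, which is the property a token-proxying service needs. With `keep_unknown=False` the downstream reader never sees field 2.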