Is the below evidence enough that pb3 in proto2 syntax mode does not drop 'unknown' fields? (Maybe you want evidence that Java tooling behaves the same?)
To be clear, when we say proxy above, are we expecting that a pb message deserialized by a process down-the-line that happens to have a crimped proto definition that is absent a couple of fields somehow can re-serialize and at the end of the line, all fields are present? Or are we talking pass-through of the message without rewrite?

Thanks,
St.Ack


# Using the protoc v3.0.2 tool
$ protoc --version
libprotoc 3.0.2

# I have a simple proto definition with two fields in it
$ more pb.proto
message Test {
  optional string one = 1;
  optional string two = 2;
}

# This is a text-encoded instance of a 'Test' proto message:
$ more pb.txt
one: "one"
two: "two"

# Now I encode the above as a pb binary
$ protoc --encode=Test pb.proto < pb.txt > pb.bin
[libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax specified for the proto file: pb.proto. Please use 'syntax = "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2 syntax.)

# Here is a dump of the binary
$ od -xc pb.bin
0000000    030a    6e6f    1265    7403    6f77
         \n 003   o   n   e 022 003   t   w   o
0000012

# Here is a proto definition file that has a Test Message minus the 'two' field.
$ more pb_drops_two.proto
message Test {
  optional string one = 1;
}

# Use it to decode the bin file:
$ protoc --decode=Test pb_drops_two.proto < pb.bin
[libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax specified for the proto file: pb_drops_two.proto. Please use 'syntax = "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2 syntax.)
one: "one"
2: "two"

Note how the second field is preserved (absent a field name). It is not dropped.

If I change the syntax of pb_drops_two.proto to be proto3, the field IS dropped.
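For anyone wondering why a reader can hold on to a field it has no definition for: every field on the wire begins with a key, (field_number << 3) | wire_type. In the od dump, \n (0x0a) is field 1 with wire type 2 (length-delimited) and 022 (0x12) is field 2 with wire type 2, each followed by a length of 003, so a parser can step over, or keep, field 2 without knowing its name. Below is a rough sketch of that in Python using only the standard library, not the real protobuf runtime; parse() and reserialize() are made-up helpers for illustration. keep_unknown=True mimics the proto2 behaviour above, and keep_unknown=False mimics the proto3 dropping seen with protoc 3.0.2.

```python
# Minimal, hand-rolled protobuf wire-format walker (NOT the protobuf library).
# parse() and reserialize() are hypothetical helpers that mimic what a reader
# holding the "crimped" schema does with the bytes of pb.bin from the session.

PB_BIN = bytes.fromhex("0a 03 6f 6e 65 12 03 74 77 6f")  # same bytes od shows


def read_varint(buf, pos):
    """Decode a base-128 varint at buf[pos]; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse(buf, known_fields):
    """Split a message into known and unknown (field, wire_type, raw) records.

    Each record keeps its raw key+value bytes so it can be written back
    verbatim. Handles wire types 0, 1, 2 and 5 only; groups are out of scope.
    """
    known, unknown = [], []
    pos = 0
    while pos < len(buf):
        start = pos
        key, pos = read_varint(buf, pos)
        field, wire_type = key >> 3, key & 0x7
        if wire_type == 0:      # varint
            _, pos = read_varint(buf, pos)
        elif wire_type == 1:    # fixed 64-bit
            pos += 8
        elif wire_type == 2:    # length-delimited (strings, bytes, messages)
            length, pos = read_varint(buf, pos)
            pos += length
        elif wire_type == 5:    # fixed 32-bit
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        record = (field, wire_type, buf[start:pos])
        (known if field in known_fields else unknown).append(record)
    return known, unknown


def reserialize(known, unknown, keep_unknown):
    """Write known fields back out; append unknown ones only if keep_unknown."""
    out = b"".join(raw for _, _, raw in known)
    if keep_unknown:
        out += b"".join(raw for _, _, raw in unknown)
    return out


# The pb_drops_two.proto reader only knows field 1 ("one").
known, unknown = parse(PB_BIN, known_fields={1})
print(unknown)                                                   # [(2, 2, b'\x12\x03two')]
print(reserialize(known, unknown, keep_unknown=True) == PB_BIN)  # True
print(reserialize(known, unknown, keep_unknown=False))           # b'\n\x03one'
```

In the proxy case, a pass-through that never parses the bytes is unaffected either way; the difference only shows up when an intermediate process decodes with a narrower schema and re-serializes. Note that this sketch appends unknown fields after the known ones, so in general field order on the wire can change even when nothing is lost.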
# Here is the proto file with proto3 syntax specified (had to drop the 'optional' qualifier -- not allowed in proto3):
$ more pb_drops_two.proto
syntax = "proto3";
message Test {
  string one = 1;
}

$ protoc --decode=Test pb_drops_two.proto < pb.bin > pb_drops_two.txt
$ more pb_drops_two.txt
one: "one"

I cannot re-encode the text output using pb_drops_two.proto. It complains:

$ protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt > pb_drops_two.bin
[libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax specified for the proto file: pb_drops_two.proto. Please use 'syntax = "proto2";' or 'syntax = "proto3";' to specify a syntax version. (Defaulted to proto2 syntax.)
input:2:1: Expected identifier, got: 2

Proto 2.5 does the same:

$ ~/bin/protobuf-2.5.0/src/protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt > pb_drops_two.bin
input:2:1: Expected identifier.
Failed to parse input.

St.Ack

On Wed, Mar 29, 2017 at 10:14 AM, Stack <st...@duboce.net> wrote:

> On Tue, Mar 28, 2017 at 4:18 PM, Andrew Wang <andrew.w...@cloudera.com>
> wrote:
>
>> > >> If unknown fields are dropped, then applications proxying tokens and
>> > >> other data between servers will effectively corrupt those messages,
>> > >> unless we make everything opaque bytes, which- absent the convenient,
>> > >> prenominate semantics managing the conversion- obviate the
>> > >> compatibility machinery that is the whole point of PB. Google is
>> > >> removing the features that justified choosing PB over its
>> > >> alternatives. Since we can't require that our applications compile
>> > >> (or link) against our updated schema, this creates a problem that PB
>> > >> was supposed to solve.
>> > >
>> > > This is scary, and it potentially affects services outside of the
>> > > Hadoop codebase. This makes it difficult to assess the impact.
>> >
>> > Stack mentioned a compatibility mode that uses the proto2 semantics.
>> > If that carries unknown fields through intermediate handlers, then
>> > this objection goes away. -C
>>
>> Did some more googling, found this:
>>
>> https://groups.google.com/d/msg/protobuf/Z6pNo81FiEQ/fHkdcNtdAwAJ
>>
>> Feng Xiao appears to be a Google engineer, and suggests workarounds like
>> packing the fields into a byte type. No mention of a PB2 compatibility
>> mode. Also here:
>>
>> https://groups.google.com/d/msg/protobuf/bO2L6-_t91Q/-zIaJAR9AAAJ
>>
>> Participants say that unknown fields were dropped for automatic JSON
>> encoding, since you can't losslessly convert to JSON without knowing the
>> type.
>>
>> Unfortunately, it sounds like these are intrinsic differences with PB3.
>>
>
> As I read it Andrew, the field-dropping happens when pb3 is running in
> proto3 'mode'. Let me try it...
>
> St.Ack
>
>> Best,
>> Andrew