On Wed, Mar 29, 2017 at 4:59 PM, Stack <st...@duboce.net> wrote:
>> The former; an intermediate handler decoding, [modifying,] and
>> encoding the record without losing unknown fields.
> I did not try this. Did you? Otherwise I can.

Yeah, I did. Same format. -C
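
For anyone who wants to poke at the decode, modify, encode path without
protoc, here is a rough Python sketch of such an intermediate handler. It is
not generated protobuf code: the names (read_varint, decode, encode, handler)
are made up for illustration, and it only understands varint and
length-delimited fields, which is all the Test message uses. The point is just
that a handler working at the wire level can rewrite field 1 and still forward
field 2 untouched.

```python
# Sketch only: a hand-rolled reader/writer for the protobuf wire format.
# Handles just wire type 0 (varint) and wire type 2 (length-delimited).

def read_varint(buf, pos):
    """Return (value, next_pos) for the base-128 varint starting at pos."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7

def write_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def decode(buf):
    """Split a message into (field_number, wire_type, payload) triples."""
    fields, pos = [], 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 0x7
        if wire_type == 0:
            payload, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            payload, pos = buf[pos:pos + length], pos + length
        else:
            raise ValueError("wire type %d not handled in this sketch" % wire_type)
        fields.append((number, wire_type, payload))
    return fields

def encode(fields):
    """Inverse of decode(): serialize the triples in their original order."""
    out = bytearray()
    for number, wire_type, payload in fields:
        out += write_varint((number << 3) | wire_type)
        if wire_type == 0:
            out += write_varint(payload)
        else:
            out += write_varint(len(payload)) + payload
    return bytes(out)

def handler(buf):
    """Hypothetical proxy: knows only field 1, rewrites it, forwards the rest."""
    rewritten = []
    for number, wire_type, payload in decode(buf):
        if number == 1 and wire_type == 2:
            payload = b"ONE"          # the "[modifying,]" step
        rewritten.append((number, wire_type, payload))
    return encode(rewritten)

original = b"\n\x03one\x12\x03two"    # the pb.bin bytes from the od dump
forwarded = handler(original)
print(forwarded)                      # b'\n\x03ONE\x12\x03two'
```

Field 2 survives because the handler never interprets it; it copies the tag,
length, and payload bytes through as-is.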

>> This looks fine. -C
>> > Thanks,
>> > St.Ack
>> >
>> >
>> > # Using the protoc v3.0.2 tool
>> > $ protoc --version
>> > libprotoc 3.0.2
>> >
>> > # I have a simple proto definition with two fields in it
>> > $ more pb.proto
>> > message Test {
>> >   optional string one = 1;
>> >   optional string two = 2;
>> > }
>> >
>> > # This is a text-encoded instance of a 'Test' proto message:
>> > $ more pb.txt
>> > one: "one"
>> > two: "two"
>> >
>> > # Now I encode the above as a pb binary
>> > $ protoc --encode=Test pb.proto < pb.txt > pb.bin
>> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
>> > specified for the proto file: pb.proto. Please use 'syntax = "proto2";'
>> > or 'syntax = "proto3";' to specify a syntax version. (Defaulted to
>> > proto2 syntax.)
>> >
>> > # Here is a dump of the binary
>> > $ od -xc pb.bin
>> > 0000000      030a    6e6f    1265    7403    6f77
>> >           \n 003   o   n   e 022 003   t   w   o
>> > 0000012
>> >
>> > # Here is a proto definition file that has a Test message minus the
>> > # 'two' field.
>> > $ more pb_drops_two.proto
>> > message Test {
>> >   optional string one = 1;
>> > }
>> >
>> > # Use it to decode the bin file:
>> > $ protoc --decode=Test pb_drops_two.proto < pb.bin
>> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
>> > specified for the proto file: pb_drops_two.proto. Please use 'syntax =
>> > "proto2";' or 'syntax = "proto3";' to specify a syntax version.
>> > (Defaulted to proto2 syntax.)
>> > one: "one"
>> > 2: "two"
>> >
>> > Note how the second field is preserved (absent a field name). It is not
>> > dropped.
>> >
>> > If I change the syntax of pb_drops_two.proto to be proto3, the field IS
>> > dropped.
>> >
>> > # Here is the proto file with proto3 syntax specified (I had to drop the
>> > # 'optional' qualifier -- not allowed in proto3):
>> > $ more pb_drops_two.proto
>> > syntax = "proto3";
>> > message Test {
>> >   string one = 1;
>> > }
>> >
>> > $ protoc --decode=Test pb_drops_two.proto < pb.bin  > pb_drops_two.txt
>> > $ more pb_drops_two.txt
>> > one: "one"
>> >
>> >
>> > I cannot re-encode the text output (the proto2 decode that carries
>> > 2: "two") using pb_drops_two.proto. It complains:
>> >
>> > $ protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt > pb_drops_two.bin
>> > [libprotobuf WARNING google/protobuf/compiler/parser.cc:546] No syntax
>> > specified for the proto file: pb_drops_two.proto. Please use 'syntax =
>> > "proto2";' or 'syntax = "proto3";' to specify a syntax version.
>> > (Defaulted to proto2 syntax.)
>> > input:2:1: Expected identifier, got: 2
>> >
>> > Protobuf 2.5 does the same:
>> >
>> > $ ~/bin/protobuf-2.5.0/src/protoc --encode=Test pb_drops_two.proto < pb_drops_two.txt > pb_drops_two.bin
>> > input:2:1: Expected identifier.
>> > Failed to parse input.
>> >
>> > St.Ack
>> >
>> > On Wed, Mar 29, 2017 at 10:14 AM, Stack <st...@duboce.net> wrote:
>> >>
>> >> On Tue, Mar 28, 2017 at 4:18 PM, Andrew Wang <andrew.w...@cloudera.com>
>> >> wrote:
>> >>>
>> >>> >
>> >>> > >> If unknown fields are dropped, then applications proxying tokens
>> >>> > >> and other data between servers will effectively corrupt those
>> >>> > >> messages, unless we make everything opaque bytes, which (absent
>> >>> > >> the convenient, prenominate semantics managing the conversion)
>> >>> > >> obviates the compatibility machinery that is the whole point of
>> >>> > >> PB. Google is removing the features that justified choosing PB
>> >>> > >> over its alternatives. Since we can't require that our
>> >>> > >> applications compile (or link) against our updated schema, this
>> >>> > >> creates a problem that PB was supposed to solve.
>> >>> > >
>> >>> > > This is scary, and it potentially affects services outside of the
>> >>> > > Hadoop codebase. This makes it difficult to assess the impact.
>> >>> >
>> >>> > Stack mentioned a compatibility mode that uses the proto2 semantics.
>> >>> > If that carries unknown fields through intermediate handlers, then
>> >>> > this objection goes away. -C
>> >>>
>> >>>
>> >>> Did some more googling, found this:
>> >>>
>> >>> https://groups.google.com/d/msg/protobuf/Z6pNo81FiEQ/fHkdcNtdAwAJ
>> >>>
>> >>> Feng Xiao appears to be a Google engineer, and suggests workarounds
>> >>> like packing the fields into a bytes field. No mention of a PB2
>> >>> compatibility mode. Also here:
>> >>>
>> >>> https://groups.google.com/d/msg/protobuf/bO2L6-_t91Q/-zIaJAR9AAAJ
>> >>>
>> >>> Participants say that unknown fields were dropped for automatic JSON
>> >>> encoding, since you can't losslessly convert to JSON without knowing
>> >>> the type.
>> >>>
>> >>> Unfortunately, it sounds like these are intrinsic differences with
>> >>> PB3.
>> >>>
>> >>
>> >> As I read it, Andrew, the field-dropping happens when pb3 is running
>> >> in proto3 'mode'. Let me try it...
>> >>
>> >> St.Ack
>> >>
>> >>
>> >>>
>> >>> Best,
>> >>> Andrew
>> >>
>> >>
>> >
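
For reference, the od dump quoted above can be read by hand. The short Python
sketch below rebuilds the ten bytes of pb.bin from the 16-bit words that
`od -x` printed and splits the two tag bytes into field number and wire type.
It assumes the dump came from a little-endian machine, which the `od -c` line
in the same dump bears out; describe_tag is a made-up helper name, not a
protobuf API.

```python
# Sketch: reconstruct pb.bin from the `od -x` words and inspect the tags.
import struct

# The five words from the od -x line, in the order printed.
words = [0x030a, 0x6e6f, 0x1265, 0x7403, 0x6f77]

# od -x prints two-byte words in host order; on a little-endian host the
# low byte of each word is the byte that comes first in the file.
data = struct.pack("<5H", *words)

def describe_tag(tag_byte):
    """Split a one-byte tag into (field_number, wire_type)."""
    return tag_byte >> 3, tag_byte & 0x7

print(data)                    # b'\n\x03one\x12\x03two'
print(describe_tag(data[0]))   # (1, 2): field 1, length-delimited
print(describe_tag(data[5]))   # (2, 2): field 2, length-delimited
```

So field 2 is on the wire as tag 0x12, length 3, "two", regardless of which
.proto file the reader was compiled against; what differs between the proto2
and proto3 runs is only what the decoder does with a tag it does not know.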
