Status: New
Owner: ----
Labels: Type-Defect Priority-Medium

New issue 378 by josephrd...@gmail.com: UTF8 string validation for repeated strings only checks the first element when deserializing
http://code.google.com/p/protobuf/issues/detail?id=378

What steps will reproduce the problem?
1. Serialize a message with a repeated string field containing two strings. The first string should be valid UTF8, the second should not.
2. Deserialize the previously serialized message.
3. Observe that the invalid UTF8 warning is printed only during the serialization step, not during the deserialization step.
4. If you reverse the order in which the strings are encoded, the deserialization step flags two invalid UTF8 strings instead of one.
5. This was observed in the C++ code.

What is the expected output? What do you see instead?

Explained above.

Additionally, the problem is in the source file cpp_string_field.cc, in the method

void RepeatedStringFieldGenerator::
GenerateMergeFromCodedStream(io::Printer* printer) const;

When passing arguments to the VerifyUTF8String method, the generated code always passes the 0th element, i.e.:

    printer->Print(variables_,
      "::google::protobuf::internal::WireFormat::VerifyUTF8String(\n"
      "  this->$name$(0).data(), this->$name$(0).length(),\n"
      "  ::google::protobuf::internal::WireFormat::PARSE);\n");

Instead of the 0th element, it should reference the last element of the repeated field, since the most recently decoded element resides there. There is no simple accessor for the last element, but $name$_size() - 1 gives its index.
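
To illustrate why the index matters, here is a minimal standalone sketch of the parse loop. It does not use protobuf at all: IsStructurallyValidUtf8 is a hypothetical, simplified stand-in for VerifyUTF8String (it checks only lead and continuation bytes), and CountParseWarnings is a made-up helper that appends each decoded element and then checks either element 0 or the last element. In the generator itself, the equivalent change would presumably be to emit this->$name$(this->$name$_size() - 1) in place of this->$name$(0).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for VerifyUTF8String. Simplified: it checks only
// lead/continuation byte structure, not overlong forms or surrogates.
bool IsStructurallyValidUtf8(const std::string& s) {
  std::size_t i = 0;
  while (i < s.size()) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    std::size_t extra = 0;  // number of continuation bytes expected
    if (c < 0x80) {
      extra = 0;
    } else if ((c & 0xE0) == 0xC0) {
      extra = 1;
    } else if ((c & 0xF0) == 0xE0) {
      extra = 2;
    } else if ((c & 0xF8) == 0xF0) {
      extra = 3;
    } else {
      return false;  // stray continuation byte or invalid lead byte
    }
    if (i + extra >= s.size()) return false;  // sequence is truncated
    for (std::size_t k = 1; k <= extra; ++k) {
      const unsigned char cont = static_cast<unsigned char>(s[i + k]);
      if ((cont & 0xC0) != 0x80) return false;
    }
    i += extra + 1;
  }
  return true;
}

// Simulates parsing a repeated string field: every decoded element is
// appended to the field, then exactly one element is validated.
// check_last == false mimics the reported behavior (always element 0);
// check_last == true mimics the proposed fix (index size() - 1).
int CountParseWarnings(const std::vector<std::string>& wire, bool check_last) {
  std::vector<std::string> field;
  int warnings = 0;
  for (const std::string& element : wire) {
    field.push_back(element);
    assert(!field.empty());
    const std::size_t index = check_last ? field.size() - 1 : 0;
    if (!IsStructurallyValidUtf8(field[index])) ++warnings;
  }
  return warnings;
}
```

With a valid string followed by an invalid one, checking element 0 yields no warnings, which matches step 3. With the order reversed, element 0 is invalid and is checked once per decoded element, so two warnings appear, which matches step 4. Checking the last element yields exactly one warning in both orders.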

