Status: New
Owner: ----
Labels: Type-Defect Priority-Medium
New issue 378 by josephrd...@gmail.com: UTF8 string validation for repeated
strings only checks the first element when deserializing
http://code.google.com/p/protobuf/issues/detail?id=378
What steps will reproduce the problem?
1. Serialize a message with a repeated string field containing two strings.
The first string should be valid UTF8, the second should not.
2. Deserialize the previously serialized message.
3. Observe that the invalid UTF8 warning is printed only by the serialization
step, not by the deserialization step.
4. If you reverse the order in which the strings are encoded, the
deserialization step flags two invalid UTF8 strings instead of one.
This was observed in the C++ implementation.
What is the expected output? What do you see instead?
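Expected: deserialization prints one invalid UTF8 warning for the invalid
string, wherever it sits in the repeated field.
Actual: no warning when the invalid string is second, and two warnings when
it is first (see the steps above).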
The cause is in the file cpp_string_field.cc, in the method

    void RepeatedStringFieldGenerator::
    GenerateMergeFromCodedStream(io::Printer* printer) const;

When passing arguments to the VerifyUTF8String method, it always passes the
0th element, i.e.:

    printer->Print(variables_,
    "::google::protobuf::internal::WireFormat::VerifyUTF8String(\n"
    " this->$name$(0).data(), this->$name$(0).length(),\n"
    " ::google::protobuf::internal::WireFormat::PARSE);\n");
Instead of the 0th element, the generated code should reference the last
element of the repeated field, since the most recently decoded element
resides there. There's no simple accessor for it, but $name$_size() - 1
works for obtaining the index.
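
To illustrate the difference between the two indices, here is a minimal
standalone sketch that does not use protobuf at all. It models the repeated
field as a std::vector and the parse loop as "append, then verify one
element". The names IsStructurallyValidUtf8, CheckIndex and
ParseAndCountWarnings are hypothetical, and the validator is a simplified
stand-in for VerifyUTF8String, not the library's implementation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Simplified structural UTF-8 check (assumption: a stand-in for the real
// validator). It checks lead and continuation byte patterns only and does
// not reject overlong encodings or surrogate code points.
bool IsStructurallyValidUtf8(const std::string& s) {
  std::size_t i = 0;
  while (i < s.size()) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    std::size_t extra = 0;  // number of continuation bytes expected
    if (c < 0x80) {
      extra = 0;
    } else if ((c & 0xE0) == 0xC0) {
      extra = 1;
    } else if ((c & 0xF0) == 0xE0) {
      extra = 2;
    } else if ((c & 0xF8) == 0xF0) {
      extra = 3;
    } else {
      return false;  // invalid lead byte
    }
    if (i + extra >= s.size()) return false;  // truncated sequence
    for (std::size_t k = 1; k <= extra; ++k) {
      if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) return false;
    }
    i += extra + 1;
  }
  return true;
}

enum class CheckIndex { kFirst, kLast };

// Models the parse loop: each decoded string is appended to the field, then
// exactly one element is verified. kFirst mirrors the reported behaviour
// (index 0); kLast mirrors the proposed fix (index size - 1).
// Returns the number of "invalid UTF8" warnings that would be raised.
int ParseAndCountWarnings(const std::vector<std::string>& wire,
                          CheckIndex which) {
  std::vector<std::string> field;
  int warnings = 0;
  for (const std::string& decoded : wire) {
    field.push_back(decoded);
    const std::size_t index =
        (which == CheckIndex::kFirst) ? 0 : field.size() - 1;
    if (!IsStructurallyValidUtf8(field[index])) ++warnings;
  }
  return warnings;
}
```

Under this model, with "hello" as the valid string and "\xff\xfe" as the
invalid one, checking index 0 gives 0 warnings for {valid, invalid} and 2
warnings for {invalid, valid}, matching steps 3 and 4 above. Checking the
last index gives exactly 1 warning in both orders.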