Thanks for the input.  I'm not looking to beat binary serialization 
performance, but I would like to avoid having to hand-write the JSON 
serialization for insertion into Elasticsearch.  I understand the proto 
JSON serialization has to look up field names to generate the JSON, which 
isn't required when building manually, but I wouldn't expect that to 
account for an order of magnitude difference.
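
If it helps the discussion, one variant worth trying (just a sketch on my 
part; I'm assuming the per-call TypeResolver setup inside 
MessageToJsonString contributes to the cost, and I haven't measured this) 
is to build the resolver once up front and call BinaryToJsonString 
directly.  The function name below is just for illustration:

    #include <string>
    #include <google/protobuf/descriptor.h>
    #include <google/protobuf/message.h>
    #include <google/protobuf/util/json_util.h>
    #include <google/protobuf/util/type_resolver.h>
    #include <google/protobuf/util/type_resolver_util.h>

    // Sketch only: create the TypeResolver once and reuse it for every
    // message.  Whether this is faster than MessageToJsonString depends on
    // how much of the cost is resolver setup versus the conversion itself.
    std::string toJsonViaResolver(::google::protobuf::Message const& msg) {
        static ::google::protobuf::util::TypeResolver* resolver =
            ::google::protobuf::util::NewTypeResolverForDescriptorPool(
                "type.googleapis.com",
                ::google::protobuf::DescriptorPool::generated_pool());

        std::string json;
        const auto status = ::google::protobuf::util::BinaryToJsonString(
            resolver,
            "type.googleapis.com/" + msg.GetDescriptor()->full_name(),
            msg.SerializeAsString(),
            std::addressof(json));
        (void)status;  // error handling omitted in this sketch
        return json;
    }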

A repeated double would not give the desired JSON output.  This field holds 
the coordinates section of GeoJSON (which is what Elasticsearch 
understands), so the nested array structure matters; see the sketch below.
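
For example, a polygon's coordinates in GeoJSON have to come out as 
[[[lon, lat], [lon, lat], ...]], i.e. arrays nested three deep, whereas a 
repeated double would serialize to a single flat array.  Roughly how we 
populate it (the helper name and the vector-of-pairs input are just for 
illustration):

    #include <utility>
    #include <vector>
    #include <google/protobuf/struct.pb.h>

    // Illustrative sketch: fill a google.protobuf.ListValue with a single
    // GeoJSON-style linear ring, [[[lon, lat], [lon, lat], ...]].
    void fillCoordinates(std::vector<std::pair<double, double>> const& ring,
                         ::google::protobuf::ListValue* coordinates) {
        // Outer array holding the one linear ring.
        ::google::protobuf::ListValue* outer =
            coordinates->add_values()->mutable_list_value();
        for (auto const& point : ring) {
            // Each point is itself a two-element array [lon, lat].
            ::google::protobuf::ListValue* pair =
                outer->add_values()->mutable_list_value();
            pair->add_values()->set_number_value(point.first);
            pair->add_values()->set_number_value(point.second);
        }
    }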

Thanks,
Ed

On Thursday, March 22, 2018 at 6:45:41 PM UTC-4, Feng Xiao wrote:
>
> On Thu, Mar 22, 2018 at 8:23 AM, Edward Clark <ebcl...@gmail.com> wrote:
>
>> Howdy,
>>
>> I'm working on a project that recently needed to insert data represented 
>> by protobufs into Elasticsearch.  Using the built-in JSON serialization we 
>> were able to quickly get data into Elasticsearch; however, the JSON 
>> serialization seems to be rather slow compared to generating it with a 
>> library like rapidjson.  Is this expected, or is it likely we're doing 
>> something wrong? 
>>
> It's expected for proto-to-JSON conversion to be slower (and likely much 
> slower) than a dedicated JSON library converting objects designed to 
> represent JSON objects to JSON. It's like comparing a library that converts 
> rapidjson::Document to protobuf binary format against protobuf binary 
> serialization. The latter is definitely going to be faster no matter how 
> you optimize the former. Proto objects are just not designed to be 
> efficiently converted to JSON.
>
> There are ways to improve the proto-to-JSON conversion, but at the end of 
> the day it isn't going to beat proto binary serialization, so 
> performance-sensitive services will usually just support the proto binary 
> format instead. 
>  
>
>> Below is info on what we're using, and the relative serialization 
>> performance results.  Surprisingly, rapidjson serialization was faster than 
>> protobuf's binary serialization in some cases, which leads me to believe 
>> I'm doing something wrong.
>>
>> Ubuntu 16.04
>> GCC 7.3, std=c++17, libstdc++11 string api
>> Protobuf 3.5.1.1 compiled with -O3, proto3 syntax
>>
>> I've measured the performance of three cases: serializing the protobuf to 
>> binary, serializing the protobuf to JSON via MessageToJsonString, and 
>> building a rapidjson::Document from the protobuf and then serializing that 
>> to JSON.  All tests use the same message with different portions of the 
>> message populated, 100,000 iterations.  The JSON generated from the 
>> protobuf and rapidjson matches exactly.
>>
>> Test 1, a single string field populated.
>> proto binary: 0.01s
>> proto json:    0.50s
>> rapidjson:     0.02s
>>
>> Test 2, 1 top level string field, 1 nested object with 3 more string 
>> fields.
>> proto binary: 0.02s
>> proto json:    1.06s
>> rapidjson:     0.05s
>>
>> Test 3, 2 string fields, and 1 ::google::protobuf::ListValue containing 
>> doubles of the format, [[[double, double], [double, double], ...]], 36 
>> pairs of doubles total.
>> *proto binary: 1.50s*
>> *proto json:    8.87s*
>> *rapidjson:     0.41s*
>>
> I think this is because of your choice of using 
> google::protobuf::ListValue. That type (along with 
> google::protobuf::Value/Struct) is specifically designed to mimic arbitrary 
> JSON content with proto and is far from efficient compared to protobuf 
> primitive types. I would just use a "repeated double" to represent these 36 
> pairs of doubles.
>  
>
>>
>> Protobuf binary serialization code:
>>     std::string toJSON(Message const& msg) { return 
>> msg.SerializeAsString(); }
>>
>> Protobuf json serialization code:
>>     std::string toJSON(Message const& msg) {
>>         std::string json;
>>         ::google::protobuf::util::MessageToJsonString(msg, 
>> std::addressof(json));
>>         return json;
>>     }
>>
>> Rapidjson serialization code:
>>     // It's a lengthy section of code manually populating the document.  
>> Of note, empty strings and numbers set to 0 are omitted from the JSON, just 
>> as the protobuf serializer does.  The resulting JSON is exactly the same as 
>> the protobuf JSON.
>>
>> Any info on how to improve the protobuf to JSON serialization would be 
>> greatly appreciated! 
>>
>> Thanks,
>> Ed
>>
>
>
