So, 172 MB/s for composition + serialization.  Sounds about right.

On Tue, Jul 14, 2009 at 10:46 AM, Alex Black <a...@alexblack.ca> wrote:

>  Thanks for those tips.  I am using tcmalloc, and I'm re-using the message
> for each batch, e.g. I fill it up with say 500 items, send it out, clear it,
> and re-use it.
>
> Here are my hopefully accurate timings, each done 100 times, averaged:
>
> 1. Baseline (just loops through the data on the server) no protobuf: 191ms
> 2. Compose messages, serialize them, no I/O or deserialization: 213ms
> 3. Same as #2 but with I/O to a dumb Java client: 265ms
> 4. Same as #3 but with Java protobuf deserialization added: 323ms
>
> So from this it looks like:
> - composing and serializing the messages takes 22ms
> - sending the data over sockets takes 52ms
> - deserializing the data in Java with protobuf takes 58ms
>
> The amount of data being sent is: 3,959,368 bytes in 158,045 messages
> (composed in batches of 1000).
>
> - Alex
>
>  ------------------------------
> *From:* Kenton Varda [mailto:ken...@google.com]
> *Sent:* Tuesday, July 14, 2009 3:26 AM
> *To:* Alex Black
> *Cc:* Protocol Buffers
>
> *Subject:* Re: Performance: Sending a message with ~150k items, approx
> 3.3mb, can I do better than 100ms?
>
> OK.  If your message composition (or parsing, on the receiving end) takes a
> lot of time, you might look into how much of that is due to memory
> allocation.  Usually this is a pretty significant fraction.  Two good ways
> to improve that:
> 1) If your app builds many messages over time and most of them have roughly
> the same "shape" (i.e. which fields are set, the size of repeated fields,
> etc. are usually similar), then you should clear and reuse the same message
> object rather than allocate a new one each time.  This way it will reuse the
> same memory, avoiding allocation.
>
> 2) Use tcmalloc:
>   http://google-perftools.googlecode.com
> It is often faster than your system's malloc, particularly for
> multi-threaded C++ apps.  All C++ servers at Google use this.
>
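For point (2), two common ways to pull tcmalloc in; treat the library path as an assumption to adapt for your distro:

```shell
# Link tcmalloc into the binary at build time...
g++ -O3 server.cc -o server -ltcmalloc

# ...or swap it in without rebuilding, via the dynamic loader
# (the .so path varies by distro and perftools version).
LD_PRELOAD=/usr/lib/libtcmalloc.so ./server
```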
> On Mon, Jul 13, 2009 at 11:50 PM, Alex Black <a...@alexblack.ca> wrote:
>
>>
>> Kenton: I made a mistake with these numbers - pls ignore them - I'll
>> revisit tomorrow.
>>
>> Thx.
>>
>> -----Original Message-----
>> From: protobuf@googlegroups.com [mailto:proto...@googlegroups.com] On
>> Behalf Of Alex Black
>> Sent: Tuesday, July 14, 2009 2:05 AM
>> To: Protocol Buffers
>> Subject: Re: Performance: Sending a message with ~150k items, approx
>> 3.3mb, can I do better than 100ms?
>>
>>
>> OK, I took I/O out of the picture by serializing each message into a
>> pre-allocated buffer, and this time I did a more thorough measurement.
>>
>> Benchmark 1: Complete scenario
>> - average time 262ms (100 runs)
>>
>> Benchmark 2: Same as # 1 but no IO
>> - average time 250ms (100 runs)
>>
>> Benchmark 3: Same as 2 but with serialization commented out
>> - average time 251ms (100 runs)
>>
>> Benchmark 4: Same as 3 but with message composition commented out too (no
>> protobuf calls)
>> - average time 185 ms (100 runs)
>>
>> So from this I conclude:
>> - My initial #s were wrong
>> - My timings vary too much for each run to really get accurate averages
>> - IO takes about 10ms
>> - Serialization takes ~0ms
>> - Message composition and setting of fields takes ~66ms
>>
>> My message composition is in a loop, the part in the loop looks like:
>>
>>                        uuid_t relatedVertexId;
>>
>>                        myProto::IdConfidence* neighborIdConfidence =
>>                                pNodeWithNeighbors->add_neighbors();
>>
>>                        // Set the vertex id
>>                        neighborIdConfidence->set_id((const void*)
>> relatedVertexId, 16);
>>                        // set the confidence
>>                        neighborIdConfidence->set_confidence( confidence );
>>
>>                        currentBatchSize++;
>>
>>                        if ( currentBatchSize == BatchSize )
>>                        {
>>                                // Flush out this batch
>>                                //stream << getNeighborsResponse;
>>                                getNeighborsResponse.Clear();
>>                                currentBatchSize = 0;
>>                        }
>>
>> On Jul 14, 1:27 am, Kenton Varda <ken...@google.com> wrote:
>> > Oh, I didn't even know you were including composition in there.  My
>> > benchmarks are only for serialization of already-composed messages.
>> > But this still doesn't tell us how much time is spent on network I/O vs.
>> > protobuf serialization.  My guess is that once you factor that out,
>> > your performance is pretty close to the benchmarks.
>> >
>> > On Mon, Jul 13, 2009 at 10:11 PM, Alex Black <a...@alexblack.ca> wrote:
>> >
>> > > If I comment out the actual serialization and sending of the message
>> > > (so I am just composing messages, and clearing them each batch) then
>> > > the 100ms drops to about 50ms.
>> >
>> > > On Jul 14, 12:36 am, Alex Black <a...@alexblack.ca> wrote:
>> > > > I'm sending a message with ~150k repeated items in it, total
>> > > > size is about 3.3 MB, and it's taking me about 100ms to serialize it
>> > > > and send it out.
>> >
>> > > > Can I expect to do any better than this? What could I look into to
>> > > > improve this?
>> > > > - I have "option optimize_for = SPEED;" set in my proto file
>> > > > - I'm compiling with -O3
>> > > > - I'm sending my message in batches of 1000
>> > > > - I'm using C++, on ubuntu, x64
>> > > > - I'm testing all on one machine (e.g. client and server are on
>> > > > one
>> > > > machine)
>> >
>> > > > My message looks like:
>> >
>> > > > message NodeWithNeighbors
>> > > > {
>> > > >         required Id nodeId = 1;
>> > > >         repeated IdConfidence neighbors = 2;
>> > > > }
>> >
>> > > > message GetNeighborsResponse
>> > > > {
>> > > >         repeated NodeWithNeighbors nodesWithNeighbors = 1;
>> > > > }
>> >
>> > > > message IdConfidence
>> > > > {
>> > > >         required bytes id = 1;
>> > > >         required float confidence = 2;
>> > > > }
>> >
>> > > > Where "bytes id" is used to send 16byte IDs (uuids).
>> >
>> > > > I'm writing each message (batch) out like this:
>> >
>> > > >         CodedOutputStream codedOutputStream(&m_ProtoBufStream);
>> > > >
>> > > >         // Write out the size of the message
>> > > >         codedOutputStream.WriteVarint32(message.ByteSize());
>> > > >         // Ask the message to serialize itself to our stream adapter,
>> > > >         // which ultimately calls Write on us, which then calls Write
>> > > >         // on our composed stream
>> > > >         message.SerializeWithCachedSizes(&codedOutputStream);
>> >
>> > > > In my stream implementation I'm buffering every 16 KB, and calling
>> > > > send on the socket once I have 16 KB.
>> >
>> > > > Thanks!
>> >
>> > > > - Alex
>>
