yeah, I'll have it running twice as fast by the time you are back from your holiday. ;)
________________________________
From: Kenton Varda [mailto:ken...@google.com]
Sent: Wednesday, July 15, 2009 7:48 PM
To: Alex Black
Cc: protobuf@googlegroups.com
Subject: Re: Performance: Sending a message with ~150k items, approx 3.3mb, can I do better than 100ms?

If you can find a way to make it faster, please send a patch! :)

On Wed, Jul 15, 2009 at 4:46 PM, Alex Black <a...@alexblack.ca> wrote:

Thanks, yes performance seems really good, though I wouldn't mind seeing the Java deserialization faster.

________________________________
From: Kenton Varda [mailto:ken...@google.com]
Sent: Tuesday, July 14, 2009 8:06 PM
To: Alex Black
Cc: protobuf@googlegroups.com
Subject: Re: Performance: Sending a message with ~150k items, approx 3.3mb, can I do better than 100ms?

So, 172 MB/s for composition + serialization. Sounds about right.

On Tue, Jul 14, 2009 at 10:46 AM, Alex Black <a...@alexblack.ca> wrote:

Thanks for those tips. I am using tcmalloc, and I'm re-using the message for each batch, e.g. I fill it up with say 500 items, send it out, clear it, re-use it.

Here are my hopefully accurate timings, each done 100 times, averaged:

1. Baseline (just loops through the data on the server) no protobuf: 191ms
2. Compose messages, serialize them, no I/O or deserialization: 213ms
3. Same as #2 but with IO to a dumb Java client: 265ms
4. Same as #3 but add Java protobuf deserialization: 323ms

So from this it looks like:
- composing and serializing the messages takes 22ms
- sending the data over sockets takes 52ms
- deserializing the data in Java with protobuf takes 58ms

The amount of data being sent is: 3,959,368 bytes in 158,045 messages (composed in batches of 1000).

- Alex

________________________________
From: Kenton Varda [mailto:ken...@google.com]
Sent: Tuesday, July 14, 2009 3:26 AM
To: Alex Black
Cc: Protocol Buffers
Subject: Re: Performance: Sending a message with ~150k items, approx 3.3mb, can I do better than 100ms?

OK.
If your message composition (or parsing, on the receiving end) takes a lot of time, you might look into how much of that is due to memory allocation. Usually this is a pretty significant fraction. Two good ways to improve that:

1) If your app builds many messages over time and most of them have roughly the same "shape" (i.e. which fields are set, the size of repeated fields, etc. are usually similar), then you should clear and reuse the same message object rather than allocate a new one each time. This way it will reuse the same memory, avoiding allocation.

2) Use tcmalloc:
http://google-perftools.googlecode.com
It is often faster than your system's malloc, particularly for multi-threaded C++ apps. All C++ servers at Google use this.

On Mon, Jul 13, 2009 at 11:50 PM, Alex Black <a...@alexblack.ca> wrote:

Kenton: I made a mistake with these numbers - pls ignore them - I'll revisit tomorrow. Thx.

-----Original Message-----
From: protobuf@googlegroups.com [mailto:proto...@googlegroups.com] On Behalf Of Alex Black
Sent: Tuesday, July 14, 2009 2:05 AM
To: Protocol Buffers
Subject: Re: Performance: Sending a message with ~150k items, approx 3.3mb, can I do better than 100ms?

ok, I took I/O out of the picture by serializing each message into a pre-allocated buffer, and this time I did a more thorough measurement.
Benchmark 1: Complete scenario - average time 262ms (100 runs)
Benchmark 2: Same as #1 but no IO - average time 250ms (100 runs)
Benchmark 3: Same as 2 but with serialization commented out - average time 251ms (100 runs)
Benchmark 4: Same as 3 but with message composition commented out too (no protobuf calls) - average time 185 ms (100 runs)

So from this I conclude:
- My initial #s were wrong
- My timings vary too much for each run to really get accurate averages
- IO takes about 10ms
- Serialization takes ~0ms
- Message composition and setting of fields takes ~66ms

My message composition is in a loop, the part in the loop looks like:

    uuid_t relatedVertexId;

    myProto::IdConfidence* neighborIdConfidence = pNodeWithNeighbors->add_neighbors();

    // Set the vertex id
    neighborIdConfidence->set_id((const void*) relatedVertexId, 16);
    // set the confidence
    neighborIdConfidence->set_confidence( confidence );

    currentBatchSize++;

    if ( currentBatchSize == BatchSize )
    {
        // Flush out this batch
        //stream << getNeighborsResponse;
        getNeighborsResponse.Clear();
        currentBatchSize = 0;
    }

On Jul 14, 1:27 am, Kenton Varda <ken...@google.com> wrote:
> Oh, I didn't even know you were including composition in there. My
> benchmarks are only for serialization of already-composed messages.
> But this still doesn't tell us how much time is spent on network I/O vs.
> protobuf serialization. My guess is that once you factor that out,
> your performance is pretty close to the benchmarks.
>
> On Mon, Jul 13, 2009 at 10:11 PM, Alex Black <a...@alexblack.ca> wrote:
>
> > If I comment out the actual serialization and sending of the message
> > (so I am just composing messages, and clearing them each batch) then
> > the 100ms drops to about 50ms.
> >
> > On Jul 14, 12:36 am, Alex Black <a...@alexblack.ca> wrote:
> > > I'm sending a message with about ~150k repeated items in it, total
> > > size is about 3.3mb, and it's taking me about 100ms to serialize it
> > > and send it out.
> > >
> > > Can I expect to do any better than this? What could I look into to
> > > improve this?
> > > - I have "option optimize_for = SPEED;" set in my proto file
> > > - I'm compiling with -O3
> > > - I'm sending my message in batches of 1000
> > > - I'm using C++, on ubuntu, x64
> > > - I'm testing all on one machine (e.g. client and server are on one
> > >   machine)
> > >
> > > My message looks like:
> > >
> > > message NodeWithNeighbors
> > > {
> > >         required Id nodeId = 1;
> > >         repeated IdConfidence neighbors = 2;
> > > }
> > >
> > > message GetNeighborsResponse
> > > {
> > >         repeated NodeWithNeighbors nodesWithNeighbors = 1;
> > > }
> > >
> > > message IdConfidence
> > > {
> > >         required bytes id = 1;
> > >         required float confidence = 2;
> > > }
> > >
> > > Where "bytes id" is used to send 16byte IDs (uuids).
> > >
> > > I'm writing each message (batch) out like this:
> > >
> > >         CodedOutputStream codedOutputStream(&m_ProtoBufStream);
> > >
> > >         // Write out the size of the message
> > >         codedOutputStream.WriteVarint32(message.ByteSize());
> > >         // Ask the message to serialize itself to our stream adapter, which
> > >         // ultimately calls Write on us
> > >         // which we then call Write on our composed stream
> > >         message.SerializeWithCachedSizes(&codedOutputStream);
> > >
> > > In my stream implementation I'm buffering every 16kb, and calling
> > > send on the socket once i have 16kb.
> > >
> > > Thanks!
> > >
> > > - Alex
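
For readers following along, the write path described in the original question (a varint32 size prefix, then the serialized bytes, buffered and sent in 16 KB chunks) can be sketched without protobuf at all. This is only an illustration under stated assumptions, not the code from the thread: `AppendVarint32` and `BufferedFrameWriter` are hypothetical names, the payload is an already-serialized byte string, and "sending" is simulated by recording each chunk instead of calling send on a socket.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: appends `value` as a base-128 varint (7 bits per byte,
// high bit set on every byte except the last). This is the wire encoding that
// a varint32 length prefix uses.
inline void AppendVarint32(uint32_t value, std::string* out) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

// Hypothetical stand-in for the socket-backed stream: accumulates bytes and
// records a chunk (where real code would call send) each time the threshold
// is reached. The 16 KB default mirrors the buffering described in the thread.
class BufferedFrameWriter {
 public:
  explicit BufferedFrameWriter(std::size_t flush_threshold = 16 * 1024)
      : flush_threshold_(flush_threshold) {
    assert(flush_threshold_ > 0);
  }

  // Writes one frame: varint32 payload size, then the payload bytes.
  void WriteFrame(const std::string& payload) {
    AppendVarint32(static_cast<uint32_t>(payload.size()), &pending_);
    pending_.append(payload);
    while (pending_.size() >= flush_threshold_) {
      sent_chunks_.push_back(pending_.substr(0, flush_threshold_));
      pending_.erase(0, flush_threshold_);
    }
  }

  // Sends whatever is left over; call once after the last frame.
  void Flush() {
    if (!pending_.empty()) {
      sent_chunks_.push_back(pending_);
      pending_.clear();
    }
  }

  const std::vector<std::string>& sent_chunks() const { return sent_chunks_; }

 private:
  std::size_t flush_threshold_;
  std::string pending_;
  std::vector<std::string> sent_chunks_;
};
```

As a usage example, with a threshold of 8 bytes, writing the payloads "hello" and "world" records one 8-byte chunk, and a final Flush() records the remaining 4 bytes. A receiver would read the varint first to learn how many payload bytes belong to each batch.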