If you can find a way to make it faster, please send a patch! :)

On Wed, Jul 15, 2009 at 4:46 PM, Alex Black <a...@alexblack.ca> wrote:
> Thanks, yes performance seems really good, though I wouldn't mind seeing
> the Java deserialization be faster.
>
> ------------------------------
> *From:* Kenton Varda [mailto:ken...@google.com]
> *Sent:* Tuesday, July 14, 2009 8:06 PM
> *To:* Alex Black
> *Cc:* protobuf@googlegroups.com
> *Subject:* Re: Performance: Sending a message with ~150k items, approx
> 3.3mb, can I do better than 100ms?
>
> So, 172 MB/s for composition + serialization. Sounds about right.
>
> On Tue, Jul 14, 2009 at 10:46 AM, Alex Black <a...@alexblack.ca> wrote:
>
>> Thanks for those tips. I am using tcmalloc, and I'm re-using the message
>> for each batch, e.g. I fill it up with say 500 items, send it out, clear
>> it, and re-use it.
>>
>> Here are my (hopefully accurate) timings, each done 100 times and averaged:
>>
>> 1. Baseline (just loop through the data on the server), no protobuf: 191 ms
>> 2. Compose messages and serialize them, no I/O or deserialization: 213 ms
>> 3. Same as #2, but with I/O to a dumb Java client: 265 ms
>> 4. Same as #3, but add Java protobuf deserialization: 323 ms
>>
>> So from this it looks like:
>> - composing and serializing the messages takes 22 ms
>> - sending the data over sockets takes 52 ms
>> - deserializing the data in Java with protobuf takes 58 ms
>>
>> The amount of data being sent is 3,959,368 bytes in 158,045 messages
>> (composed in batches of 1000).
>>
>> - Alex
>>
>> ------------------------------
>> *From:* Kenton Varda [mailto:ken...@google.com]
>> *Sent:* Tuesday, July 14, 2009 3:26 AM
>> *To:* Alex Black
>> *Cc:* Protocol Buffers
>> *Subject:* Re: Performance: Sending a message with ~150k items, approx
>> 3.3mb, can I do better than 100ms?
>>
>> OK. If your message composition (or parsing, on the receiving end)
>> takes a lot of time, you might look into how much of that is due to
>> memory allocation. Usually this is a pretty significant fraction.
>> Two good ways to improve that:
>>
>> 1) If your app builds many messages over time and most of them have
>> roughly the same "shape" (i.e. which fields are set, the sizes of
>> repeated fields, etc. are usually similar), then you should clear and
>> reuse the same message object rather than allocate a new one each time.
>> This way it will reuse the same memory, avoiding allocation.
>>
>> 2) Use tcmalloc: http://google-perftools.googlecode.com
>> It is often faster than your system's malloc, particularly for
>> multi-threaded C++ apps. All C++ servers at Google use this.
>>
>> On Mon, Jul 13, 2009 at 11:50 PM, Alex Black <a...@alexblack.ca> wrote:
>>
>>> Kenton: I made a mistake with these numbers - please ignore them - I'll
>>> revisit tomorrow.
>>>
>>> Thx.
>>>
>>> -----Original Message-----
>>> From: protobuf@googlegroups.com [mailto:proto...@googlegroups.com] On
>>> Behalf Of Alex Black
>>> Sent: Tuesday, July 14, 2009 2:05 AM
>>> To: Protocol Buffers
>>> Subject: Re: Performance: Sending a message with ~150k items, approx
>>> 3.3mb, can I do better than 100ms?
>>>
>>> OK, I took I/O out of the picture by serializing each message into a
>>> pre-allocated buffer, and this time I did a more thorough measurement.
>>> Benchmark 1: Complete scenario
>>> - average time 262 ms (100 runs)
>>>
>>> Benchmark 2: Same as #1 but no I/O
>>> - average time 250 ms (100 runs)
>>>
>>> Benchmark 3: Same as #2 but with serialization commented out
>>> - average time 251 ms (100 runs)
>>>
>>> Benchmark 4: Same as #3 but with message composition commented out too
>>> (no protobuf calls)
>>> - average time 185 ms (100 runs)
>>>
>>> So from this I conclude:
>>> - My initial #s were wrong
>>> - My timings vary too much for each run to really get accurate averages
>>> - I/O takes about 10 ms
>>> - Serialization takes ~0 ms
>>> - Message composition and setting of fields takes ~66 ms
>>>
>>> My message composition is in a loop; the part in the loop looks like:
>>>
>>>     uuid_t relatedVertexId;
>>>
>>>     myProto::IdConfidence* neighborIdConfidence =
>>>         pNodeWithNeighbors->add_neighbors();
>>>
>>>     // Set the vertex id
>>>     neighborIdConfidence->set_id((const void*) relatedVertexId, 16);
>>>     // Set the confidence
>>>     neighborIdConfidence->set_confidence(confidence);
>>>
>>>     currentBatchSize++;
>>>
>>>     if (currentBatchSize == BatchSize)
>>>     {
>>>         // Flush out this batch
>>>         //stream << getNeighborsResponse;
>>>         getNeighborsResponse.Clear();
>>>         currentBatchSize = 0;
>>>     }
>>>
>>> On Jul 14, 1:27 am, Kenton Varda <ken...@google.com> wrote:
>>> > Oh, I didn't even know you were including composition in there. My
>>> > benchmarks are only for serialization of already-composed messages.
>>> > But this still doesn't tell us how much time is spent on network I/O
>>> > vs. protobuf serialization. My guess is that once you factor that
>>> > out, your performance is pretty close to the benchmarks.
>>> >
>>> > On Mon, Jul 13, 2009 at 10:11 PM, Alex Black <a...@alexblack.ca> wrote:
>>> >
>>> > > If I comment out the actual serialization and sending of the message
>>> > > (so I am just composing messages, and clearing them each batch),
>>> > > then the 100 ms drops to about 50 ms.
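[Editor's note: Kenton's tip about clearing and reusing the same message object is the same allocation-reuse trick `std::vector` gives you: `clear()` empties the container but keeps its capacity, so later batches reuse the already-allocated buffer. The sketch below is an illustrative stand-in for the batching loop above, not the thread's actual code; protobuf's `Clear()` retains allocated memory in an analogous (though not identical) way.]

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Stand-in for the protobuf batch message (GetNeighborsResponse):
// a vector of (id, confidence) entries.
using Batch = std::vector<std::pair<int, float>>;

// Compose `total` items in batches of `batch_size`, "flushing" and
// clearing the batch when full. Returns how many times the batch's
// backing storage grew, i.e. how many allocations composing cost.
std::size_t ComposeInBatches(std::size_t total, std::size_t batch_size) {
    Batch batch;
    std::size_t growths = 0;
    std::size_t last_capacity = batch.capacity();
    for (std::size_t i = 0; i < total; ++i) {
        batch.emplace_back(static_cast<int>(i), 0.5f);
        if (batch.capacity() != last_capacity) {
            ++growths;  // the buffer was reallocated
            last_capacity = batch.capacity();
        }
        if (batch.size() == batch_size) {
            // Serialize + send would happen here. clear() then empties
            // the batch but keeps its capacity, so every later batch
            // reuses the same memory with no further allocation.
            batch.clear();
        }
    }
    return growths;
}
```

Composing 5000 items this way costs exactly as many allocations as composing the first 1000: all growth happens while filling the first batch, and every batch after that is allocation-free.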
>>> > > On Jul 14, 12:36 am, Alex Black <a...@alexblack.ca> wrote:
>>> > > > I'm sending a message with ~150k repeated items in it; the total
>>> > > > size is about 3.3 MB, and it's taking me about 100 ms to
>>> > > > serialize it and send it out.
>>> > > >
>>> > > > Can I expect to do any better than this? What could I look into
>>> > > > to improve this?
>>> > > > - I have "option optimize_for = SPEED;" set in my proto file
>>> > > > - I'm compiling with -O3
>>> > > > - I'm sending my message in batches of 1000
>>> > > > - I'm using C++, on Ubuntu, x64
>>> > > > - I'm testing all on one machine (e.g. client and server are on
>>> > > > one machine)
>>> > > >
>>> > > > My messages look like:
>>> > > >
>>> > > >     message NodeWithNeighbors
>>> > > >     {
>>> > > >         required Id nodeId = 1;
>>> > > >         repeated IdConfidence neighbors = 2;
>>> > > >     }
>>> > > >
>>> > > >     message GetNeighborsResponse
>>> > > >     {
>>> > > >         repeated NodeWithNeighbors nodesWithNeighbors = 1;
>>> > > >     }
>>> > > >
>>> > > >     message IdConfidence
>>> > > >     {
>>> > > >         required bytes id = 1;
>>> > > >         required float confidence = 2;
>>> > > >     }
>>> > > >
>>> > > > where "bytes id" is used to send 16-byte IDs (UUIDs).
>>> > > >
>>> > > > I'm writing each message (batch) out like this:
>>> > > >
>>> > > >     CodedOutputStream codedOutputStream(&m_ProtoBufStream);
>>> > > >
>>> > > >     // Write out the size of the message
>>> > > >     codedOutputStream.WriteVarint32(message.ByteSize());
>>> > > >     // Ask the message to serialize itself to our stream
>>> > > >     // adapter, which ultimately calls Write on us, which we
>>> > > >     // then call Write on our composed stream
>>> > > >     message.SerializeWithCachedSizes(&codedOutputStream);
>>> > > >
>>> > > > In my stream implementation I'm buffering every 16 KB, and
>>> > > > calling send on the socket once I have 16 KB.
>>> > > >
>>> > > > Thanks!
>>> > > > - Alex

--
You received this message because you are subscribed to the Google Groups
"Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to
protobuf+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/protobuf?hl=en
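[Editor's note: the framing in the original question — `WriteVarint32(message.ByteSize())` followed by the message body — is protobuf's standard base-128 varint length prefix. Real code should use `CodedOutputStream::WriteVarint32` and `CodedInputStream::ReadVarint32`; the hand-rolled sketch below is only to show what the prefix looks like on the wire and what a receiver has to decode. Names are illustrative.]

```cpp
#include <cstdint>
#include <vector>

// Append a base-128 varint: 7 payload bits per byte, least-significant
// group first, high bit set on every byte except the last.
void WriteVarint32(uint32_t value, std::vector<uint8_t>& out) {
    while (value >= 0x80) {
        out.push_back(static_cast<uint8_t>(value) | 0x80);
        value >>= 7;
    }
    out.push_back(static_cast<uint8_t>(value));
}

// Decode the varint starting at `pos`, advancing `pos` past it.
uint32_t ReadVarint32(const std::vector<uint8_t>& in, std::size_t& pos) {
    uint32_t result = 0;
    int shift = 0;
    while (in[pos] & 0x80) {
        result |= static_cast<uint32_t>(in[pos++] & 0x7F) << shift;
        shift += 7;
    }
    result |= static_cast<uint32_t>(in[pos++]) << shift;
    return result;
}
```

For the payload sizes in this thread the prefix overhead is negligible: a full 3,959,368-byte response needs only a 4-byte prefix (values up to 2^21 - 1 fit in 3 bytes, up to 2^28 - 1 in 4).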