Interesting results. Looking forward to your code submission. FWIW, I was looking at the same benchmark code from the perspective of Thrift and wrote the following to the thrift-dev list. Thought you might find it at least somewhat relevant.
-------------------------------------------------------------
Subj: Report on thrift-protobuf-compare

I did some digging into the benchmarking code at:

http://code.google.com/p/thrift-protobuf-compare/

If folks could look this over and give me comments, I'll make any suggested edits and pass this information along to the owner of thrift-protobuf-compare.

Here is the short version:

1. The performance figures that thrift-protobuf-compare reports for dynamic serialization systems like JSON are not currently valid, because the tests do not exercise them as fully general serialization/deserialization frameworks.
2. Using TCompactProtocol, Thrift's serialization speed and serialized size are basically equivalent to those of protocol buffers.
3. FWIW, object creation is quite a bit faster in Thrift than in protocol buffers, at least in this test. I am not sure how important this is for real-world workloads.
4. Deserialization comes out slower in Thrift than in protocol buffers. I think this may be because the test harness's API maps directly onto APIs that protocol buffers provides and Thrift does not, which makes Thrift do more work. I am still investigating this.

Here is the long version:

First, the test is not comparing apples to apples when it compares any of the dynamic serialization systems (JSON, YAML, etc.) with the static serialization systems (Thrift, protocol buffers).
For example, the JSON serialization and deserialization code looks like this:

    public byte[] serialize(MediaContent content) throws Exception {
        ByteArrayOutputStream baos = new ByteArrayOutputStream(expectedSize);
        JsonGenerator generator = _factory.createJsonGenerator(baos, JsonEncoding.UTF8);
        generator.writeStartObject();
        writeMedia(generator, content.getMedia());
        writeImage(generator, content.getImage(0));
        writeImage(generator, content.getImage(1));
        generator.writeEndObject();
        generator.close();
        byte[] array = baos.toByteArray();
        expectedSize = array.length;
        return array;
    }

    public MediaContent deserialize(byte[] array) throws Exception {
        JsonParser parser = _factory.createJsonParser(array);
        parser.nextToken(); // start object
        MediaContent mc = new MediaContent(readMedia(parser));
        mc.addImage(readImage(parser));
        mc.addImage(readImage(parser));
        parser.nextToken(); // end object
        parser.close();
        return mc;
    }

Notice that the serializer somehow knows that there are exactly two images to write, without looping over the object being serialized, and that the deserializer somehow knows that there are two images to read, even though nothing in the serialized form carries that information. The upshot is that, as written, none of the dynamic mechanisms are being used in a way that amounts to a valid serialization/deserialization framework, so any performance numbers derived from them are invalid.

Furthermore, the data being serialized is not the same.
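To make the missing-count problem concrete, here is a minimal sketch of what a general serializer has to do instead: loop over however many images the object holds and write the count into the serialized form, so the reader can recover it from the data. It uses only java.io (DataOutputStream/DataInputStream) rather than the Jackson streaming API that the benchmark's JSON code uses, and SimpleImage, SimpleContent and CountedSerializer are hypothetical, stripped-down stand-ins, not classes from thrift-protobuf-compare:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for the benchmark's Image and MediaContent.
record SimpleImage(String uri, String title) {}

record SimpleContent(String mediaUri, List<SimpleImage> images) {}

final class CountedSerializer {
    private CountedSerializer() {}

    static byte[] serialize(SimpleContent content) {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(baos)) {
            out.writeUTF(content.mediaUri());
            // Record how many images follow, then loop over whatever the object holds
            // instead of hard-coding getImage(0) and getImage(1).
            out.writeInt(content.images().size());
            for (SimpleImage image : content.images()) {
                out.writeUTF(image.uri());
                out.writeUTF(image.title());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    static SimpleContent deserialize(byte[] array) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(array))) {
            String mediaUri = in.readUTF();
            // The reader learns the image count from the serialized data itself.
            int count = in.readInt();
            List<SimpleImage> images = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                String uri = in.readUTF();
                String title = in.readUTF();
                images.add(new SimpleImage(uri, title));
            }
            return new SimpleContent(mediaUri, images);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A JSON version would follow the same pattern with a JSON array: write the images in a loop between the start-array and end-array tokens, and on the read side keep reading images until the end-array token appears, rather than assuming there are exactly two.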
Compare this method from StdMediaSerializer.java:

    public final MediaContent create() throws Exception {
        Media media = new Media(null, "video/mpg4", Media.Player.JAVA, "Javaone Keynote",
                "http://javaone.com/keynote.mpg", 1234567, 123, 0, 0, 0);
        media.addToPerson("Bill Gates");
        media.addToPerson("Steve Jobs");
        Image image1 = new Image(0, "Javaone Keynote", "A", 0, Image.Size.LARGE);
        Image image2 = new Image(0, "Javaone Keynote", "B", 0, Image.Size.SMALL);
        MediaContent content = new MediaContent(media);
        content.addImage(image1);
        content.addImage(image2);
        return content;
    }

with the same method in ThriftSerializer.java:

    public MediaContent create() {
        Media media = new Media();
        media.setUri("http://javaone.com/keynote.mpg");
        media.setFormat("video/mpg4");
        media.setTitle("Javaone Keynote");
        media.setDuration(1234567);
        media.setBitrate(123);
        media.addToPerson("Bill Gates");
        media.addToPerson("Steve Jobs");
        media.setPlayer(Player.JAVA);
        Image image1 = new Image();
        image1.setUri("http://javaone.com/keynote_large.jpg");
        image1.setSize(Size.LARGE);
        image1.setTitle("Javaone Keynote");
        Image image2 = new Image("http://javaone.com/keynote_thumbnail.jpg",
                "Javaone Keynote", -1, -1, Size.SMALL);
        MediaContent content = new MediaContent();
        content.setMedia(media);
        content.addToImage(image1);
        content.addToImage(image2);
        return content;
    }

and in ProtobufSerializer.java:

    public MediaContent create() {
        MediaContent content = MediaContent.newBuilder()
            .setMedia(Media.newBuilder().setUri("http://javaone.com/keynote.mpg")
                .setFormat("video/mpg4").setTitle("Javaone Keynote").setDuration(1234567)
                .setBitrate(123).addPerson("Bill Gates").addPerson("Steve Jobs")
                .setPlayer(Player.JAVA).build())
            .addImage(Image.newBuilder().setUri("http://javaone.com/keynote_large.jpg")
                .setSize(Size.LARGE).setTitle("Javaone Keynote").build())
            .addImage(Image.newBuilder().setUri("http://javaone.com/keynote_thumbnail.jpg")
                .setSize(Size.SMALL).setTitle("Javaone Keynote").build())
            .build();
        return content;
    }

Note that the URIs for the two images are different (shorter) in StdMediaSerializer than in the other two. And even between ThriftSerializer and ProtobufSerializer the data is not identical: ThriftSerializer passes two -1 values for image2 that ProtobufSerializer does not.

Fixing the dynamic serializers is too much effort at the moment, so I decided to focus on the code comparing Thrift and protocol buffers. I updated to the trunk of Thrift (rev 773454) and changed the Thrift serializer to use TCompactProtocol instead of TBinaryProtocol. I also corrected ThriftSerializer's create() so that it sends the same data for image2 as ProtobufSerializer does. Finally, I updated the formatting in BenchmarkRunner and commented out all the serializers except Thrift and protocol buffers.

Here are the benchmark results from 3 consecutive runs (Size is the serialized size):

    Run  Serializer  Create       Ser    Deser     Total  Size
    1    thrift      267.37   8314.00  8546.00  17127.36   220
    1    protobuf    412.98  12642.00  5217.50  18272.48   217
    2    thrift      266.87  10905.50  8526.50  19698.86   220
    2    protobuf    415.21  11880.50  4930.00  17225.71   217
    3    thrift      264.95  11059.50  8701.50  20025.95   220
    3    protobuf    417.45  11125.00  5203.50  16745.95   217

Note that TCompactProtocol is almost as compact as protocol buffers (220 vs. 217) and performs slightly better on serialization, although there is clearly some variability in the timings. Object creation is significantly faster in Thrift than in protocol buffers, although it is unclear whether that really matters.

Deserialization is where Thrift performs worse than protocol buffers. Looking a little further, the problem may be related to the fact that protocol buffers provides APIs for serializing directly to, and deserializing directly from, byte arrays, which Thrift does not provide.
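Because the timings vary from run to run, averaging each column over the three runs gives a steadier picture. A small sketch of that arithmetic, with the numbers copied from the three runs above (RunAverages and columnMeans are just illustrative names):

```java
import java.util.Locale;

// Averages each column of the three runs reported above (values copied verbatim).
final class RunAverages {
    // Column order: Create, Ser, Deser, Total.
    static final String[] COLUMNS = {"Create", "Ser", "Deser", "Total"};

    static final double[][] THRIFT = {
        {267.37, 8314.00, 8546.00, 17127.36},
        {266.87, 10905.50, 8526.50, 19698.86},
        {264.95, 11059.50, 8701.50, 20025.95},
    };

    static final double[][] PROTOBUF = {
        {412.98, 12642.00, 5217.50, 18272.48},
        {415.21, 11880.50, 4930.00, 17225.71},
        {417.45, 11125.00, 5203.50, 16745.95},
    };

    private RunAverages() {}

    // Arithmetic mean of each column across all runs.
    static double[] columnMeans(double[][] runs) {
        double[] means = new double[runs[0].length];
        for (double[] run : runs) {
            for (int c = 0; c < run.length; c++) {
                means[c] += run[c];
            }
        }
        for (int c = 0; c < means.length; c++) {
            means[c] /= runs.length;
        }
        return means;
    }

    public static void main(String[] args) {
        double[] thrift = columnMeans(THRIFT);
        double[] protobuf = columnMeans(PROTOBUF);
        for (int c = 0; c < COLUMNS.length; c++) {
            // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM's locale.
            System.out.printf(Locale.ROOT, "%-6s thrift %9.2f  protobuf %9.2f  thrift/protobuf %.2f%n",
                    COLUMNS[c], thrift[c], protobuf[c], thrift[c] / protobuf[c]);
        }
    }
}
```

On these numbers the averaged thrift/protobuf ratios come out to about 0.64 for create, 0.85 for serialization, 1.68 for deserialization and 1.09 for the total, so the deserialization gap is what puts Thrift's average total behind protocol buffers.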
The test harness is set up so that the output of serialize() and the input of deserialize() are byte arrays, which means Thrift has to do extra work to match the harness. I am still investigating this.

Chad

----- Original Message -----
From: Sharad Agarwal
Sent: Tuesday, May 12, 2009 5:47:16 AM
Subject: Re: benchmark site: thrift-protobuf-compare

OK, I got a chance to try my hand at it and ran the benchmark on my dev box. Good to see the relative numbers:

    Serializer                 Create           Ser        Deser         Total  Size
    avro-generic           2453.88000    5335.50000   4526.00000   12315.38000   211
    avro-specific          1024.00000    2912.50000  10415.00000   14351.50000   211
    protobuf               1196.20000    8483.00000   5965.00000   15644.20000   217
    thrift                 1089.70000    7744.00000   8796.50000   17630.20000   314
    hessian                1019.92500  355739.00000  49838.50000  406597.42500   463
    java                   1018.29500   26675.50000  87540.50000  115234.29500   845
    java (externalizable)  1028.99000   10002.50000  23270.50000   34301.99000   315

Note that in the above results I have moved the UTF-8 conversion out of the create() call. With the UTF-8 conversion in create(), the object create times are:

    Serializer                 Create
    avro-generic           4236.55500
    avro-specific          2982.89000

Serialized size is small, as expected. We are doing pretty well on serialization time too. Deserialization time is relatively high; I think that is again due to UTF-8 object creation. I will post a patch to the project soon.

- Sharad
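On the UTF-8 point in the quoted message: a related technique, different from moving the conversion out of create(), is to convert lazily, so that neither object creation nor deserialization pays for a String/UTF-8 conversion unless a caller actually asks for the other representation. A minimal sketch under that assumption; LazyUtf8 is a hypothetical class and is not meant to describe Avro's own UTF-8 type or the patch mentioned above:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical lazily converting string holder (not Avro's own UTF-8 class).
// It keeps whichever representation it was created from and converts to the
// other one only on first request, caching the result. Not thread-safe.
final class LazyUtf8 {
    private String string; // decoded form; null until created or requested
    private byte[] bytes;  // UTF-8 encoded form; null until created or requested

    private LazyUtf8(String string, byte[] bytes) {
        this.string = string;
        this.bytes = bytes;
    }

    // For building objects in create(): no encoding work is done here.
    static LazyUtf8 of(String s) {
        return new LazyUtf8(s, null);
    }

    // For a deserializer: no decoding work is done here.
    // The array is kept as-is (not copied), so callers must not modify it afterwards.
    static LazyUtf8 fromBytes(byte[] b) {
        return new LazyUtf8(null, b);
    }

    // Encodes at most once; later calls return the same cached array.
    byte[] getBytes() {
        if (bytes == null) {
            bytes = string.getBytes(StandardCharsets.UTF_8);
        }
        return bytes;
    }

    // Decodes at most once; later calls return the same cached String.
    @Override
    public String toString() {
        if (string == null) {
            string = new String(bytes, StandardCharsets.UTF_8);
        }
        return string;
    }
}
```

The trade-off is that the conversion cost moves to the first access rather than disappearing, so this mainly helps when some fields are never read back or re-serialized.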
