Interesting results. Looking forward to your code submission.

FWIW, I was looking at the same benchmark code from the perspective of Thrift 
and wrote the following to the thrift-dev list. Thought you might find it at 
least somewhat relevant.

-------------------------------------------------------------
Subj: Report on thrift-protobuf-compare
I did some digging into the benchmarking code at:
http://code.google.com/p/thrift-protobuf-compare/

If folks could look this over and give me comments, I'll make any edits
suggested and pass this information along to the owner of
thrift-protobuf-compare.

Here is the short version:

1. The performance figures that thrift-protobuf-compare provides for dynamic
serialization systems like JSON are not currently valid since the tests do
not really test them as a fully general serialization/deserialization
framework.

2. Using TCompactProtocol, Thrift serialization speed and serialized size
are basically equivalent to those of protocol buffers.

3. FWIW, object creation is quite a bit faster in Thrift than in protocol
buffers, at least in this test. I am not sure how much this matters for
real-world workloads.

4. Deserialization comes out slower in Thrift than in protocol buffers. I
think this may be because the test harness' API matches up directly with
APIs that protocol buffers provides but Thrift does not, forcing Thrift to
do extra work. I am still investigating this.

Here is the long version:

First, the test is not comparing apples-to-apples when it compares any of
the dynamic serialization systems (JSON, yaml, etc.) with the static
serialization systems (Thrift, protocol buffers).

For example, the JSON serialization and deserialization looks like this:

  public byte[] serialize(MediaContent content) throws Exception
  {
    ByteArrayOutputStream baos = new ByteArrayOutputStream(expectedSize);
    JsonGenerator generator = _factory.createJsonGenerator(baos, JsonEncoding.UTF8);
    generator.writeStartObject();
    writeMedia(generator, content.getMedia());
    writeImage(generator, content.getImage(0));
    writeImage(generator, content.getImage(1));
    generator.writeEndObject();
    generator.close();
    byte[] array = baos.toByteArray();
    expectedSize = array.length;
    return array;
  }

  public MediaContent deserialize(byte[] array) throws Exception
  {
    JsonParser parser = _factory.createJsonParser(array);
    parser.nextToken(); // start object
    MediaContent mc = new MediaContent(readMedia(parser));
    mc.addImage(readImage(parser));
    mc.addImage(readImage(parser));
    parser.nextToken(); // end object
    parser.close();
    return mc;
  }

Notice that the serializer somehow knows that there are two images to be
written without looping over the object being serialized and that the
deserializer somehow knows that there are two images to be read without any
metadata to carry that information in the serialized form.

The upshot is that, as written, none of the dynamic mechanisms are really
being used in a manner that results in a valid serialization/deserialization
framework and any performance numbers derived from them are invalid.
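For contrast, here is a minimal sketch of what carrying that information
looks like: a count-prefixed list, written with plain java.io streams
rather than JSON. The class name CountPrefixedImages and the reduction of
an image to its URI string are assumptions for illustration, not code from
the benchmark.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the benchmark's image list: each
// image is reduced to its URI so that only the framing logic is visible.
class CountPrefixedImages {

    // Writes how many images follow, then each image, so the reader does
    // not have to know the count in advance.
    static byte[] serialize(List<String> imageUris) {
        try {
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(baos);
            out.writeInt(imageUris.size());
            for (String uri : imageUris) {
                out.writeUTF(uri);
            }
            out.flush();
            return baos.toByteArray();
        } catch (IOException e) {
            // In-memory streams do not fail in practice; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
    }

    // Reads the count back from the bytes and loops exactly that many times.
    static List<String> deserialize(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int count = in.readInt();
            List<String> uris = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                uris.add(in.readUTF());
            }
            return uris;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A JSON version of the same idea would write the images as a JSON array
(Jackson's JsonGenerator has writeStartArray() and writeEndArray() for
this) and have the reader loop until it sees the end of the array, rather
than assuming exactly two entries.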

Furthermore, the data being serialized is not the same. Compare this method
from StdMediaSerializer.java:

    public final MediaContent create() throws Exception
    {
        Media media = new Media(null, "video/mpg4", Media.Player.JAVA,
            "Javaone Keynote", "http://javaone.com/keynote.mpg", 1234567, 123, 0, 0, 0);
        media.addToPerson("Bill Gates");
        media.addToPerson("Steve Jobs");

        Image image1 = new Image(0, "Javaone Keynote", "A", 0, Image.Size.LARGE);
        Image image2 = new Image(0, "Javaone Keynote", "B", 0, Image.Size.SMALL);

        MediaContent content = new MediaContent(media);
        content.addImage(image1);
        content.addImage(image2);
        return content;
    }

With the same method in ThriftSerializer.java:

  public MediaContent create()
  {
    Media media = new Media();
    media.setUri("http://javaone.com/keynote.mpg");
    media.setFormat("video/mpg4");
    media.setTitle("Javaone Keynote");
    media.setDuration(1234567);
    media.setBitrate(123);
    media.addToPerson("Bill Gates");
    media.addToPerson("Steve Jobs");
    media.setPlayer(Player.JAVA);

    Image image1 = new Image();
    image1.setUri("http://javaone.com/keynote_large.jpg");
    image1.setSize(Size.LARGE);
    image1.setTitle("Javaone Keynote");

    Image image2 = new Image("http://javaone.com/keynote_thumbnail.jpg",
        "Javaone Keynote", -1, -1, Size.SMALL);

    MediaContent content = new MediaContent();
    content.setMedia(media);
    content.addToImage(image1);
    content.addToImage(image2);
    return content;
  }

And in ProtobufSerializer.java:

  public MediaContent create()
  {
    MediaContent content = MediaContent.newBuilder()
      .setMedia(Media.newBuilder()
        .setUri("http://javaone.com/keynote.mpg")
        .setFormat("video/mpg4")
        .setTitle("Javaone Keynote")
        .setDuration(1234567)
        .setBitrate(123)
        .addPerson("Bill Gates")
        .addPerson("Steve Jobs")
        .setPlayer(Player.JAVA)
        .build())
      .addImage(Image.newBuilder()
        .setUri("http://javaone.com/keynote_large.jpg")
        .setSize(Size.LARGE)
        .setTitle("Javaone Keynote")
        .build())
      .addImage(Image.newBuilder()
        .setUri("http://javaone.com/keynote_thumbnail.jpg")
        .setSize(Size.SMALL)
        .setTitle("Javaone Keynote")
        .build())
      .build();
    return content;
  }

Note that the URIs for the two images are different (shorter) in
StdMediaSerializer than in the other two serializers. Even between
ThriftSerializer and ProtobufSerializer the data is not identical:
ThriftSerializer passes two -1 values for image2 that ProtobufSerializer
does not set.

It is too much effort to fix the dynamic serializers at the moment, so I
decided to focus on the code comparing Thrift and protocol buffers.

I updated to the trunk of Thrift (rev 773454) and changed the Thrift
serializer to use TCompactProtocol instead of TBinaryProtocol. I also
corrected ThriftSerializer's create() so that the same data was being sent
for image2 as in ProtobufSerializer. Finally, I updated the formatting in
BenchmarkRunner and commented out all the serializers except Thrift and
protocol buffers. Here are the benchmark results from 3 consecutive runs:

              ,     Create,        Ser,      Deser,      Total,       Size
thrift        ,     267.37,    8314.00,    8546.00,   17127.36,        220
protobuf      ,     412.98,   12642.00,    5217.50,   18272.48,        217

              ,     Create,        Ser,      Deser,      Total,       Size
thrift        ,     266.87,   10905.50,    8526.50,   19698.86,        220
protobuf      ,     415.21,   11880.50,    4930.00,   17225.71,        217

              ,     Create,        Ser,      Deser,      Total,       Size
thrift        ,     264.95,   11059.50,    8701.50,   20025.95,        220
protobuf      ,     417.45,   11125.00,    5203.50,   16745.95,        217

Note that TCompactProtocol output is almost as compact as protocol buffers
(220 vs. 217) and performs slightly better on serialization, although there
is clearly some variability in the timings. Object creation is significantly
faster in Thrift than in protocol buffers (about 265 vs. 415 in every run),
although it is unclear whether that really matters.
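Part of the reason TCompactProtocol output is so much smaller than
TBinaryProtocol's is how it encodes integers. Below is a minimal JDK-only
sketch of the ZigZag plus base-128 varint scheme that TCompactProtocol uses
for integers (protocol buffers uses the same scheme for sint32 fields).
CompactInts is a hypothetical name; this is not Thrift's actual code.

```java
import java.io.ByteArrayOutputStream;

// Sketch of ZigZag + base-128 varint encoding for 32-bit ints.
class CompactInts {

    // ZigZag maps signed ints to unsigned so small negatives stay small:
    // 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
    static int zigzag(int n) {
        return (n << 1) ^ (n >> 31);
    }

    static int unzigzag(int z) {
        return (z >>> 1) ^ -(z & 1);
    }

    // Writes 7 bits per byte, low bits first; the high bit of each byte
    // means "more bytes follow".
    static void writeVarint(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    static byte[] encode(int n) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, zigzag(n));
        return out.toByteArray();
    }

    // Reassembles the 7-bit groups, then undoes the ZigZag mapping.
    static int decode(byte[] bytes) {
        int result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                break;
            }
            shift += 7;
        }
        return unzigzag(result);
    }
}
```

With this scheme 123 fits in two bytes and -1 in one, whereas a
fixed-width 32-bit integer always takes four.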

Looking a little further at deserialization, where Thrift performed worse
than protocol buffers, the problem may be related to the fact that protocol
buffers provides APIs for serializing directly to, and deserializing
directly from, byte arrays, which Thrift does not. Because the test harness
uses a byte array as the output of serialize() and the input of
deserialize(), Thrift has to do extra work to match it. I am still
investigating this.
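To make the extra-work hypothesis concrete, here is a small JDK-only
sketch (ByteArrayReaders is a hypothetical name) of two ways to read the
same big-endian ints out of the byte array the harness hands to
deserialize(): indexing the array directly, as a byte[]-aware API can, or
going through stream wrappers, as a stream-oriented API has to. It does not
reproduce Thrift's transport classes and says nothing about how large the
difference is in practice.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Two ways to read big-endian 32-bit ints from a byte[].
class ByteArrayReaders {

    // Direct: a cursor into the array, no extra objects per call.
    static int[] readDirect(byte[] buf, int count) {
        int[] out = new int[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            out[i] = ((buf[pos] & 0xFF) << 24)
                   | ((buf[pos + 1] & 0xFF) << 16)
                   | ((buf[pos + 2] & 0xFF) << 8)
                   | (buf[pos + 3] & 0xFF);
            pos += 4;
        }
        return out;
    }

    // Stream-wrapped: the same bytes pass through two wrapper objects and
    // a method call per read, the kind of layering an API without a
    // byte[] entry point pushes onto the caller.
    static int[] readViaStream(byte[] buf, int count) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf))) {
            int[] out = new int[count];
            for (int i = 0; i < count; i++) {
                out[i] = in.readInt();
            }
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Both return the same values; the difference is only in how much sits
between the array and the decoded value.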

Chad


----- Original Message ----
From: Sharad Agarwal <[email protected]>
To: [email protected]
Sent: Tuesday, May 12, 2009 5:47:16 AM
Subject: Re: benchmark site: thrift-protobuf-compare

OK, I got a chance to try my hand at it and ran the benchmark on my dev box.
Good to see the relative numbers:

                        ,   Object create,   Serialization, Deserialization,      Total Time, Serialized Size
avro-generic            ,      2453.88000,      5335.50000,      4526.00000,     12315.38000,             211
avro-specific           ,      1024.00000,      2912.50000,     10415.00000,     14351.50000,             211
protobuf                ,      1196.20000,      8483.00000,      5965.00000,     15644.20000,             217
thrift                  ,      1089.70000,      7744.00000,      8796.50000,     17630.20000,             314
hessian                 ,      1019.92500,    355739.00000,     49838.50000,    406597.42500,             463
java                    ,      1018.29500,     26675.50000,     87540.50000,    115234.29500,             845
java (externalizable)   ,      1028.99000,     10002.50000,     23270.50000,     34301.99000,             315


Note that in the results above, I have moved the UTF-8 conversion out of
the create() call. With the UTF-8 conversion in create(), the object create
times are:

                        ,   Object create
avro-generic            ,      4236.55500
avro-specific           ,      2982.89000


Serialized size is small, as expected. We are doing pretty well on
serialization time, too. Deserialization time is relatively high; again, I
think this is due to UTF-8 object creation.
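Where the UTF-8 cost lands depends on when the conversion happens. A
minimal sketch, using a hypothetical LazyUtf8 wrapper (an illustration, not
Avro's actual Utf8 class), that pays the String-to-UTF-8 encoding once up
front and defers decoding back to a String until it is actually requested:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical wrapper: stores a string as UTF-8 bytes and decodes lazily.
final class LazyUtf8 {
    private final byte[] bytes; // encoded form; not copied, for brevity
    private String decoded;     // cached result of the first toString()

    private LazyUtf8(byte[] bytes) {
        this.bytes = bytes;
    }

    // The encoding cost is paid here, e.g. once outside a timed create().
    static LazyUtf8 of(String s) {
        return new LazyUtf8(s.getBytes(StandardCharsets.UTF_8));
    }

    // What a deserializer could build: wraps bytes without decoding them.
    static LazyUtf8 fromBytes(byte[] utf8) {
        return new LazyUtf8(utf8);
    }

    byte[] getBytes() {
        return bytes;
    }

    // Decoding (and the String allocation) happens only on first access.
    @Override
    public String toString() {
        if (decoded == null) {
            decoded = new String(bytes, StandardCharsets.UTF_8);
        }
        return decoded;
    }
}
```

Whether of() is called inside or outside the timed create() changes what
the create column measures, which is the difference between the two sets
of avro numbers above.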

I will post a patch soon to the project.

- Sharad
