codec strategies

2014-05-15 Thread Rafael Schloming
I've put together some mock ups of a few different codec strategies both to
compare from an API/usability perspective and to get a rough idea of some
of the performance implications of the different choices. Please see the
attached code for the full details. I'll summarize the different strategies
below.

The SimpleEncoder is pretty straighforward, the only real point here is to
use basic types to represent values and thereby minimize the amount of
intermediate memory and CPU required in order to use the codec.

The DispatchingDecoder works similarly to a sax style parser. It basically
iterates over the encoded content and dispatches values to a handler.

The StreamingDecoder is similar to the DispatchingDecoder except instead of
an internal "bytecode" loop calling out to a handler, it is externally
driven by calling into the decoder. This appears to be marginally slower
than the DispatchingDecoder in the particular scenario in the mock up,
however it may have some API benefitis, e.g. conversions can be done on
demand and it is possible to skip over uninteresting data rather than
parsing it.

The mock up also includes the same data being encoded/decode using the
existing codec (with Clebert's patch).

Fair warning, the data I chose to encode/decode is completely arbitrary and
not intended to be representative at all. That said, the numbers I'm
getting suggest to me that we can do a whole lot better than the current
codec if we start with something simple and keep it that way. Here is the
output I'm getting for a run with a hundred million iterations:

  simple encode: 4416 millis
  dispatching decode: 3049 millis
  streaming decode: 3243 millis
  existing encode: 9515 millis
  existing decode: 13931 millis

Another factor to consider is the difficulty in quantifying the impact of
generating lots of garbage. In a small benchmark like this there isn't a
lot of memory pressure, so extra garbage doesn't have a lot of impact,
however in a real application that would translate into increased GC cycles
and so might be more of a factor. What I can say from watching memory usage
under the profiler is that at any given point there are typically hundreds
of megs worth of garbage Integer and UUID instances lying around when the
existing codec is running. All of the alternative strategies I've included
don't generate any garbage.

--Rafael
/*
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 *
 */
package org;

import java.util.UUID;
import org.apache.qpid.proton.codec.*;
import java.nio.*;

/**
 * Codec
 *
 */

public class Codec
{

public static final void main(String[] args) {
int loop = 10*1024*1024;
if (args.length > 0) {
loop = Integer.parseInt(args[0]);
}

String test = "all";
if (args.length > 1) {
test = args[1];
}

boolean runDispatching =
test.equals("all") || test.equals("dispatching");
boolean runStreaming = test.equals("all") || test.equals("streaming");
boolean runExisting =
test.equals("all") || test.equals("existing");

byte[] bytes = new byte[1024];
ByteBuffer buf = ByteBuffer.wrap(bytes);

long start, end;

if (runDispatching || runStreaming) {
start = System.currentTimeMillis();
int size = simpleEncode(bytes, loop);
end = System.currentTimeMillis();
time("simple encode", start, end);

if (runDispatching) {
start = System.currentTimeMillis();
dispatchingDecode(bytes, size, loop);
end = System.currentTimeMillis();
time("dispatching decode", start, end);
}

if (runStreaming) {
start = System.currentTimeMillis();
streamingDecode(bytes, size, loop);
end = System.currentTimeMillis();
time("streaming decode", start, end);
}
}

if (runExisting) {
start = System.currentTimeMi

Re: codec strategies

2014-05-15 Thread Alan Conway
On Tue, 2014-05-06 at 22:36 -0400, Rafael Schloming wrote:
> I've put together some mock ups of a few different codec strategies
> both to compare from an API/usability perspective and to get a rough
> idea of some of the performance implications of the different choices.
> Please see the attached code for the full details. I'll summarize the
> different strategies below.
> 
> The SimpleEncoder is pretty straighforward, the only real point here
> is to use basic types to represent values and thereby minimize the
> amount of intermediate memory and CPU required in order to use the
> codec.
> 
> 
> The DispatchingDecoder works similarly to a sax style parser. It
> basically iterates over the encoded content and dispatches values to a
> handler.
> 
> 
> The StreamingDecoder is similar to the DispatchingDecoder except
> instead of an internal "bytecode" loop calling out to a handler, it is
> externally driven by calling into the decoder. This appears to be
> marginally slower than the DispatchingDecoder in the particular
> scenario in the mock up, however it may have some API benefitis, e.g.
> conversions can be done on demand and it is possible to skip over
> uninteresting data rather than parsing it.
> 
> 
> 
> The mock up also includes the same data being encoded/decode using the
> existing codec (with Clebert's patch).
> 
> 
> Fair warning, the data I chose to encode/decode is completely
> arbitrary and not intended to be representative at all. That said, the
> numbers I'm getting suggest to me that we can do a whole lot better
> than the current codec if we start with something simple and keep it
> that way. Here is the output I'm getting for a run with a hundred
> million iterations:
> 
>   simple encode: 4416 millis
>   dispatching decode: 3049 millis
>   streaming decode: 3243 millis
>   existing encode: 9515 millis
>   existing decode: 13931 millis
> 
> 

I vote for DispatchingDecode: it's the simplest, the fastest and is
based on a well established parsing pattern with a good track record for
performance (SAX). Its not so hard to ignore data in a handler.

Writing a handler state machine is a bit more complex than writing a
sequence of calls to a stream API, but I think you could encapsulate
most of a standard state machine that given a sequence of type codes
will fill a sequence of variables. Not sure about the right way to do
that in Java performance-wise.

Hmm. That might be worth another performance test though - if you did
have such tools for making it easy to build handlers, would those tools
introduce a penalty that would make the StreamingDecode look more
attractive...

> Another factor to consider is the difficulty in quantifying the impact
> of generating lots of garbage. In a small benchmark like this there
> isn't a lot of memory pressure, so extra garbage doesn't have a lot of
> impact, however in a real application that would translate into
> increased GC cycles and so might be more of a factor. What I can say
> from watching memory usage under the profiler is that at any given
> point there are typically hundreds of megs worth of garbage Integer
> and UUID instances lying around when the existing codec is running.
> All of the alternative strategies I've included don't generate any
> garbage.
> 
> 
> --Rafael
> 
> 




Re: codec strategies

2014-05-13 Thread Rafael Schloming
On Thu, May 8, 2014 at 9:42 AM, Alan Conway  wrote:

> I vote for DispatchingDecode: it's the simplest, the fastest and is
> based on a well established parsing pattern with a good track record for
> performance (SAX). Its not so hard to ignore data in a handler.
>
> Writing a handler state machine is a bit more complex than writing a
> sequence of calls to a stream API, but I think you could encapsulate
> most of a standard state machine that given a sequence of type codes
> will fill a sequence of variables. Not sure about the right way to do
> that in Java performance-wise.
>
> Hmm. That might be worth another performance test though - if you did
> have such tools for making it easy to build handlers, would those tools
> introduce a penalty that would make the StreamingDecode look more
> attractive...
>

The biggest difference from an API perspective has to do with data
conversions/coersion. Say you're writing a piece of Java code that wants to
operate on Java integer or a Java float and doesn't care what the
underlying wire type is so long as it can be reasonable converted to that
type. In a stream style API you would simple write:

  int i = decoder.getInt();
  float f = decoder.getFloat();

The decoder implementation itself can the be smart enough to convert
whatever underlying wire type there might be into the appropriate java
type. The SAX API on the other hand will have a distinct callback for byte
vs ubyte vs short vs ushort, etc, and it could be quite cumbersome to
convert all the different possibilities into the type you actually want to
operate on. Put another way, the stream style API is capable of
incorporating the desired output type of the user, whereas the SAX style
API is not.

It might be possible to provide some kind of coercing handler that would
help the SAX situation. As you say though it's probably worth checking that
something like that would be usable and not introduce its own penalties.

--Rafael


Re: codec strategies

2014-05-12 Thread Kritikos, Alex
Hi Raf,

I suggest you re-measure codec times using System.nanotime which should give 
you more accurate results for benchmarking. Please note that on windows 
platforms the accuracy is much worse than POSIX platforms.

Thanks,

Alex Kritikos
Software AG

On 7 Μαϊ 2014, at 5:36 π.μ., Rafael Schloming  wrote:

> I've put together some mock ups of a few different codec strategies both to 
> compare from an API/usability perspective and to get a rough idea of some of 
> the performance implications of the different choices. Please see the 
> attached code for the full details. I'll summarize the different strategies 
> below.
>
> The SimpleEncoder is pretty straighforward, the only real point here is to 
> use basic types to represent values and thereby minimize the amount of 
> intermediate memory and CPU required in order to use the codec.
>
> The DispatchingDecoder works similarly to a sax style parser. It basically 
> iterates over the encoded content and dispatches values to a handler.
>
> The StreamingDecoder is similar to the DispatchingDecoder except instead of 
> an internal "bytecode" loop calling out to a handler, it is externally driven 
> by calling into the decoder. This appears to be marginally slower than the 
> DispatchingDecoder in the particular scenario in the mock up, however it may 
> have some API benefitis, e.g. conversions can be done on demand and it is 
> possible to skip over uninteresting data rather than parsing it.
>
> The mock up also includes the same data being encoded/decode using the 
> existing codec (with Clebert's patch).
>
> Fair warning, the data I chose to encode/decode is completely arbitrary and 
> not intended to be representative at all. That said, the numbers I'm getting 
> suggest to me that we can do a whole lot better than the current codec if we 
> start with something simple and keep it that way. Here is the output I'm 
> getting for a run with a hundred million iterations:
>
>   simple encode: 4416 millis
>   dispatching decode: 3049 millis
>   streaming decode: 3243 millis
>   existing encode: 9515 millis
>   existing decode: 13931 millis
>
> Another factor to consider is the difficulty in quantifying the impact of 
> generating lots of garbage. In a small benchmark like this there isn't a lot 
> of memory pressure, so extra garbage doesn't have a lot of impact, however in 
> a real application that would translate into increased GC cycles and so might 
> be more of a factor. What I can say from watching memory usage under the 
> profiler is that at any given point there are typically hundreds of megs 
> worth of garbage Integer and UUID instances lying around when the existing 
> codec is running. All of the alternative strategies I've included don't 
> generate any garbage.
>
> --Rafael
>
> 

This communication contains information which is confidential and may also be 
privileged. It is for the exclusive use of the intended recipient(s). If you 
are not the intended recipient(s), please note that any distribution, copying, 
or use of this communication or the information in it, is strictly prohibited. 
If you have received this communication in error please notify us by e-mail and 
then delete the e-mail and any copies of it.
Software AG (UK) Limited Registered in England & Wales 1310740 - 
http://www.softwareag.com/uk



codec strategies

2014-05-07 Thread Rafael Schloming
I've put together some mock ups of a few different codec strategies both to
compare from an API/usability perspective and to get a rough idea of some
of the performance implications of the different choices. Please see the
attached code for the full details. I'll summarize the different strategies
below.

The SimpleEncoder is pretty straighforward, the only real point here is to
use basic types to represent values and thereby minimize the amount of
intermediate memory and CPU required in order to use the codec.

The DispatchingDecoder works similarly to a sax style parser. It basically
iterates over the encoded content and dispatches values to a handler.

The StreamingDecoder is similar to the DispatchingDecoder except instead of
an internal "bytecode" loop calling out to a handler, it is externally
driven by calling into the decoder. This appears to be marginally slower
than the DispatchingDecoder in the particular scenario in the mock up,
however it may have some API benefitis, e.g. conversions can be done on
demand and it is possible to skip over uninteresting data rather than
parsing it.

The mock up also includes the same data being encoded/decode using the
existing codec (with Clebert's patch).

Fair warning, the data I chose to encode/decode is completely arbitrary and
not intended to be representative at all. That said, the numbers I'm
getting suggest to me that we can do a whole lot better than the current
codec if we start with something simple and keep it that way. Here is the
output I'm getting for a run with a hundred million iterations:

  simple encode: 4416 millis
  dispatching decode: 3049 millis
  streaming decode: 3243 millis
  existing encode: 9515 millis
  existing decode: 13931 millis

Another factor to consider is the difficulty in quantifying the impact of
generating lots of garbage. In a small benchmark like this there isn't a
lot of memory pressure, so extra garbage doesn't have a lot of impact,
however in a real application that would translate into increased GC cycles
and so might be more of a factor. What I can say from watching memory usage
under the profiler is that at any given point there are typically hundreds
of megs worth of garbage Integer and UUID instances lying around when the
existing codec is running. All of the alternative strategies I've included
don't generate any garbage.

--Rafael
/*
 *
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 *
 */
package org;

import java.util.UUID;
import org.apache.qpid.proton.codec.*;
import java.nio.*;

/**
 * Codec
 *
 */

public class Codec
{

public static final void main(String[] args) {
int loop = 10*1024*1024;
if (args.length > 0) {
loop = Integer.parseInt(args[0]);
}

String test = "all";
if (args.length > 1) {
test = args[1];
}

boolean runDispatching =
test.equals("all") || test.equals("dispatching");
boolean runStreaming = test.equals("all") || test.equals("streaming");
boolean runExisting =
test.equals("all") || test.equals("existing");

byte[] bytes = new byte[1024];
ByteBuffer buf = ByteBuffer.wrap(bytes);

long start, end;

if (runDispatching || runStreaming) {
start = System.currentTimeMillis();
int size = simpleEncode(bytes, loop);
end = System.currentTimeMillis();
time("simple encode", start, end);

if (runDispatching) {
start = System.currentTimeMillis();
dispatchingDecode(bytes, size, loop);
end = System.currentTimeMillis();
time("dispatching decode", start, end);
}

if (runStreaming) {
start = System.currentTimeMillis();
streamingDecode(bytes, size, loop);
end = System.currentTimeMillis();
time("streaming decode", start, end);
}
}

if (runExisting) {
start = System.currentTimeMi