codec strategies
I've put together some mock-ups of a few different codec strategies, both to compare them from an API/usability perspective and to get a rough idea of the performance implications of the different choices. Please see the attached code for the full details. I'll summarize the different strategies below.

The SimpleEncoder is pretty straightforward; the only real point here is to use basic types to represent values and thereby minimize the amount of intermediate memory and CPU required in order to use the codec.

The DispatchingDecoder works similarly to a SAX-style parser. It basically iterates over the encoded content and dispatches values to a handler.

The StreamingDecoder is similar to the DispatchingDecoder, except that instead of an internal "bytecode" loop calling out to a handler, it is externally driven by calling into the decoder. This appears to be marginally slower than the DispatchingDecoder in the particular scenario in the mock-up, however it may have some API benefits, e.g. conversions can be done on demand and it is possible to skip over uninteresting data rather than parsing it.

The mock-up also includes the same data being encoded/decoded using the existing codec (with Clebert's patch).

Fair warning: the data I chose to encode/decode is completely arbitrary and not intended to be representative at all. That said, the numbers I'm getting suggest to me that we can do a whole lot better than the current codec if we start with something simple and keep it that way. Here is the output I'm getting for a run with a hundred million iterations:

    simple encode: 4416 millis
    dispatching decode: 3049 millis
    streaming decode: 3243 millis
    existing encode: 9515 millis
    existing decode: 13931 millis

Another factor to consider is the difficulty in quantifying the impact of generating lots of garbage. In a small benchmark like this there isn't a lot of memory pressure, so extra garbage doesn't have much impact; in a real application, however, that would translate into increased GC cycles and so might be more of a factor. What I can say from watching memory usage under the profiler is that at any given point there are typically hundreds of megs worth of garbage Integer and UUID instances lying around while the existing codec is running. None of the alternative strategies I've included generates any garbage.

--Rafael

/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */
package org;

import java.util.UUID;
import org.apache.qpid.proton.codec.*;
import java.nio.*;

/**
 * Codec
 */
public class Codec {

    public static final void main(String[] args) {
        int loop = 10*1024*1024;
        if (args.length > 0) {
            loop = Integer.parseInt(args[0]);
        }
        String test = "all";
        if (args.length > 1) {
            test = args[1];
        }

        boolean runDispatching = test.equals("all") || test.equals("dispatching");
        boolean runStreaming = test.equals("all") || test.equals("streaming");
        boolean runExisting = test.equals("all") || test.equals("existing");

        byte[] bytes = new byte[1024];
        ByteBuffer buf = ByteBuffer.wrap(bytes);

        long start, end;

        if (runDispatching || runStreaming) {
            start = System.currentTimeMillis();
            int size = simpleEncode(bytes, loop);
            end = System.currentTimeMillis();
            time("simple encode", start, end);

            if (runDispatching) {
                start = System.currentTimeMillis();
                dispatchingDecode(bytes, size, loop);
                end = System.currentTimeMillis();
                time("dispatching decode", start, end);
            }

            if (runStreaming) {
                start = System.currentTimeMillis();
                streamingDecode(bytes, size, loop);
                end = System.currentTimeMillis();
                time("streaming decode", start, end);
            }
        }

        if (runExisting) {
            start = System.currentTimeMillis();
            // [remainder of attachment truncated in the archive]
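To make the two decoder styles concrete, here is a minimal sketch of the contrast the message describes, using an invented two-type encoding (the type codes, class names, and `Handler` interface are illustrative only, not Proton's actual wire format or API). The dispatching decoder owns the loop and pushes values to a handler; the streaming decoder is driven by the caller, which can skip values it does not care about.

```java
import java.nio.ByteBuffer;

public class DecoderSketch {
    // Illustrative type codes, not Proton's actual wire format.
    static final byte INT = 0x01, LONG = 0x02;

    // SAX-style: the decoder owns the loop and pushes values out.
    interface Handler {
        void onInt(int value);
        void onLong(long value);
    }

    static void dispatchingDecode(ByteBuffer buf, Handler handler) {
        while (buf.hasRemaining()) {
            byte code = buf.get();
            if (code == INT) handler.onInt(buf.getInt());
            else if (code == LONG) handler.onLong(buf.getLong());
            else throw new IllegalArgumentException("unknown type code " + code);
        }
    }

    // Pull-style: the caller owns the loop and can skip values cheaply.
    static class StreamingDecoder {
        private final ByteBuffer buf;
        StreamingDecoder(ByteBuffer buf) { this.buf = buf; }
        boolean more() { return buf.hasRemaining(); }
        byte peekType() { return buf.get(buf.position()); }
        int getInt() { buf.get(); return buf.getInt(); }
        void skip() {
            byte code = buf.get();
            buf.position(buf.position() + (code == INT ? 4 : 8));
        }
    }

    static ByteBuffer sampleData() {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.put(INT).putInt(42).put(LONG).putLong(7L);
        buf.flip();
        return buf;
    }

    static long sumDispatching() {
        long[] sum = new long[1];
        dispatchingDecode(sampleData(), new Handler() {
            public void onInt(int v) { sum[0] += v; }
            public void onLong(long v) { sum[0] += v; }
        });
        return sum[0];
    }

    static long sumStreamingIntsOnly() {
        StreamingDecoder d = new StreamingDecoder(sampleData());
        long total = 0;
        while (d.more()) {
            if (d.peekType() == INT) total += d.getInt();
            else d.skip(); // uninteresting value: skipped, not parsed
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("dispatching sum: " + sumDispatching());
        System.out.println("streaming sum (ints only): " + sumStreamingIntsOnly());
    }
}
```

Note how the skip in the streaming version only advances the buffer position; that is the "skip over uninteresting data rather than parsing it" benefit mentioned above.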
Re: codec strategies
On Tue, 2014-05-06 at 22:36 -0400, Rafael Schloming wrote:
> [...]
> Here is the output I'm getting for a run with a hundred million
> iterations:
>
> simple encode: 4416 millis
> dispatching decode: 3049 millis
> streaming decode: 3243 millis
> existing encode: 9515 millis
> existing decode: 13931 millis

I vote for DispatchingDecoder: it's the simplest, the fastest, and it is based on a well-established parsing pattern with a good track record for performance (SAX). It's not so hard to ignore data in a handler.

Writing a handler state machine is a bit more complex than writing a sequence of calls to a stream API, but I think you could encapsulate most of a standard state machine that, given a sequence of type codes, will fill a sequence of variables. I'm not sure about the right way to do that in Java performance-wise.

Hmm. That might be worth another performance test though - if you did have such tools for making it easy to build handlers, would those tools introduce a penalty that would make the StreamingDecoder look more attractive...
Re: codec strategies
On Thu, May 8, 2014 at 9:42 AM, Alan Conway wrote:
> I vote for DispatchingDecoder: it's the simplest, the fastest, and it
> is based on a well-established parsing pattern with a good track
> record for performance (SAX). It's not so hard to ignore data in a
> handler.
> [...]
> Hmm. That might be worth another performance test though - if you did
> have such tools for making it easy to build handlers, would those
> tools introduce a penalty that would make the StreamingDecoder look
> more attractive...

The biggest difference from an API perspective has to do with data conversion/coercion. Say you're writing a piece of Java code that wants to operate on a Java int or a Java float and doesn't care what the underlying wire type is, so long as it can be reasonably converted to that type. In a stream-style API you would simply write:

    int i = decoder.getInt();
    float f = decoder.getFloat();

The decoder implementation itself can then be smart enough to convert whatever underlying wire type there might be into the appropriate Java type. The SAX-style API, on the other hand, will have a distinct callback for byte vs. ubyte vs. short vs. ushort, etc., and it could be quite cumbersome to convert all the different possibilities into the type you actually want to operate on. Put another way, the stream-style API is capable of incorporating the desired output type of the user, whereas the SAX-style API is not.

It might be possible to provide some kind of coercing handler that would help the SAX situation. As you say, though, it's probably worth checking that something like that would be usable and not introduce its own penalties.

--Rafael
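The on-demand coercion described for the stream-style API could be sketched roughly as follows (the type codes and class name are invented for illustration, not AMQP's actual format codes): `getInt()` accepts any integral wire type and widens or range-checks it, so the caller never sees which encoding was on the wire.

```java
import java.nio.ByteBuffer;

// Sketch of a pull-style decoder whose getInt() coerces any integral
// wire type to a Java int. Type codes are illustrative only.
public class CoercingDecoder {
    public static final byte BYTE = 0x01, SHORT = 0x02, INT = 0x03, LONG = 0x04;

    private final ByteBuffer buf;
    public CoercingDecoder(ByteBuffer buf) { this.buf = buf; }

    public int getInt() {
        byte code = buf.get();
        switch (code) {
            case BYTE:  return buf.get();       // widen byte -> int
            case SHORT: return buf.getShort();  // widen short -> int
            case INT:   return buf.getInt();
            case LONG:                          // narrow, with range check
                long v = buf.getLong();
                if (v < Integer.MIN_VALUE || v > Integer.MAX_VALUE)
                    throw new ArithmeticException("long out of int range: " + v);
                return (int) v;
            default:
                throw new IllegalArgumentException("not an integral type: " + code);
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        buf.put(BYTE).put((byte) 7);      // same value could arrive as
        buf.put(LONG).putLong(1000L);     // any integral wire type
        buf.flip();
        CoercingDecoder d = new CoercingDecoder(buf);
        System.out.println(d.getInt() + " " + d.getInt());
    }
}
```

A SAX-style equivalent would need this same switch spread across onByte/onShort/onInt/onLong callbacks in every handler that wants an int, which is the cumbersomeness the message is pointing at.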
Re: codec strategies
Hi Raf,

I suggest you re-measure the codec times using System.nanoTime(), which should give you more accurate results for benchmarking. Please note that on Windows platforms the accuracy is much worse than on POSIX platforms.

Thanks,

Alex Kritikos
Software AG

On 7 May 2014, at 5:36 a.m., Rafael Schloming wrote:
> [...]
> Here is the output I'm getting for a run with a hundred million
> iterations:
>
> simple encode: 4416 millis
> dispatching decode: 3049 millis
> streaming decode: 3243 millis
> existing encode: 9515 millis
> existing decode: 13931 millis
> [...]
> --Rafael
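The suggested change to the benchmark harness is small; a minimal sketch, with a stand-in workload rather than the actual codec, might look like this (warming up the measured method first so the JIT has compiled it, which matters for micro-benchmarks like the one attached above):

```java
// Minimal sketch: time the hot loop with System.nanoTime() instead of
// currentTimeMillis(), after a warmup pass. The workload is a stand-in.
public class NanoBench {
    static long workload(int iterations) {
        long acc = 0;
        for (int i = 0; i < iterations; i++) acc += i * 31L;
        return acc;
    }

    public static void main(String[] args) {
        workload(1_000_000); // warmup so the JIT compiles workload()

        long start = System.nanoTime();
        long result = workload(10_000_000);
        long elapsed = System.nanoTime() - start;
        // Keep and print the result so the JIT cannot eliminate the loop.
        System.out.printf("elapsed: %.3f millis (result %d)%n",
                          elapsed / 1e6, result);
    }
}
```

System.nanoTime() is the JDK's monotonic high-resolution timer; unlike currentTimeMillis() it is not tied to wall-clock time, so it is the right choice for measuring elapsed intervals.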