Re: FlatBuffers, ByteBuffers, and escape analysis

2018-08-08 Thread Todd Lipcon
I tried one more approach using an interface with the appropriate
'getBytes', etc, methods, but unfortunately its allocation doesn't seem to
be elided either:
https://gist.github.com/885d36cc5e0097c4628f454d3deb23a6

Benchmark                                                             Mode   Cnt          Score  Units
MyBenchmark.testWithByteBuffer                                        thrpt    2  164225320.729  ops/s
MyBenchmark.testWithByteBuffer:·gc.alloc.rate.norm                    thrpt    2         56.000  B/op
MyBenchmark.testWithBytes                                             thrpt    2  289869913.686  ops/s
MyBenchmark.testWithBytes:·gc.alloc.rate.norm                         thrpt    2        ≈ 10⁻⁷  B/op
MyBenchmark.testWithBytesWrappedInAccessor                            thrpt    2  202213942.822  ops/s
MyBenchmark.testWithBytesWrappedInAccessor:·gc.alloc.rate.norm        thrpt    2         24.000  B/op
MyBenchmark.testWithBytesWrappedInThreadLocalAccessor                 thrpt    2  183097145.922  ops/s
MyBenchmark.testWithBytesWrappedInThreadLocalAccessor:·gc.alloc.rate  thrpt    2        ≈ 10⁻⁴  MB/sec

So while my little byte-array-wrapper is smaller than ByteBuffer (and
faster), it still isn't allocation-free. Using a threadlocal can eliminate
the allocation but gives up a bit of performance.
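
For anyone following along without the gist open, here is a minimal sketch
of what a thread-local accessor could look like. It is not the code from the
gist: the names ArrayAccessor, ACCESSOR and readVal are made up for
illustration, and the layout (a little-endian short offset followed by a
little-endian int) is assumed from the FileDesc example quoted further down.

```java
public class ThreadLocalAccessorSketch {

    // Hypothetical reusable wrapper: one instance per thread, re-pointed at a
    // different byte[] on every call instead of being allocated per read.
    static final class ArrayAccessor {
        private byte[] buf;

        ArrayAccessor reset(byte[] buf) {
            this.buf = buf;
            return this;
        }

        short getShort(int i) {
            return (short) ((buf[i] & 0xff) | ((buf[i + 1] & 0xff) << 8));
        }

        int getInt(int i) {
            return (buf[i] & 0xff)
                | ((buf[i + 1] & 0xff) << 8)
                | ((buf[i + 2] & 0xff) << 16)
                | ((buf[i + 3] & 0xff) << 24);
        }
    }

    private static final ThreadLocal<ArrayAccessor> ACCESSOR =
        ThreadLocal.withInitial(ArrayAccessor::new);

    // Reads the int that the header at headerPosition points to.
    static int readVal(byte[] buf, int headerPosition) {
        ArrayAccessor acc = ACCESSOR.get().reset(buf);
        int pos = acc.getShort(headerPosition) + headerPosition;
        return acc.getInt(pos);
    }

    public static void main(String[] args) {
        byte[] bytes = {0x02, 0x00, (byte) 0xbe, (byte) 0xba, (byte) 0xfe, (byte) 0xca};
        System.out.format("0x%x%n", readVal(bytes, 0));
    }
}
```

The ThreadLocal lookup on every call is the likely source of the throughput
gap relative to passing the byte[] directly, though I have not profiled this
sketch.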

So, does anyone have a clever idea to get the same performance as directly
passing the byte array, but without any allocation, and in such a way that
Java 8 is supported? (Clearly I could just locally hack the generator to
only support byte[] and not ByteBuffer, but I would prefer to contribute a
change back to the flatbuffers project that maintains back-compatibility
as well.) I suppose storing an Object and using 'instanceof' checks is an
option, though it makes me sad.
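
To make the 'instanceof' option concrete, here is a rough sketch of what it
might look like on Java 8. This is only an illustration under assumptions:
the class name ObjectBackedFileDesc and its helpers are hypothetical, it is
not what the FlatBuffers generator emits, and it assumes little-endian data
as in the FileDesc example quoted below. I have not benchmarked it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ObjectBackedFileDesc {

    // Holds either a byte[] or a ByteBuffer, so no per-read wrapper is needed.
    private final Object backing;
    private final int pos;

    public ObjectBackedFileDesc(byte[] buf, int headerPosition) {
        this.backing = buf;
        this.pos = readShortLE(buf, headerPosition) + headerPosition;
    }

    public ObjectBackedFileDesc(ByteBuffer bb) {
        // duplicate() so that setting the byte order does not affect the caller's buffer
        ByteBuffer le = bb.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        this.backing = le;
        this.pos = le.getShort(le.position()) + le.position();
    }

    public int getVal() {
        if (backing instanceof byte[]) {
            return readIntLE((byte[]) backing, pos);
        }
        return ((ByteBuffer) backing).getInt(pos);
    }

    static short readShortLE(byte[] b, int i) {
        return (short) ((b[i] & 0xff) | ((b[i + 1] & 0xff) << 8));
    }

    static int readIntLE(byte[] b, int i) {
        return (b[i] & 0xff)
            | ((b[i + 1] & 0xff) << 8)
            | ((b[i + 2] & 0xff) << 16)
            | ((b[i + 3] & 0xff) << 24);
    }

    public static void main(String[] args) {
        byte[] bytes = {0x02, 0x00, (byte) 0xbe, (byte) 0xba, (byte) 0xfe, (byte) 0xca};
        System.out.format("0x%x%n", new ObjectBackedFileDesc(bytes, 0).getVal());
        System.out.format("0x%x%n", new ObjectBackedFileDesc(ByteBuffer.wrap(bytes)).getVal());
    }
}
```

The byte[] path keeps one object per instance and existing ByteBuffer callers
keep working, at the cost of a type check on every accessor call.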





On Wed, Aug 8, 2018 at 8:35 AM Todd Lipcon wrote:

> Thanks Gil. Unfortunately I'm stuck on Java 8 for now. And it sounds like
> I'll have to modify the flat buffers code generation either way to get rid
> of the byte buffer and replace it at least with some interface that could
> wrap a bytebuffer, unsafe, varhandle, etc.
>
> Todd
>
> On Tue, Aug 7, 2018, 11:45 PM Gil Tene wrote:
>
>> Oh, and there is MethodHandles.byteBufferViewVarHandle if you (for some
>> reason) want to do the same but keep ByteBuffers around.
>>
>> On Tuesday, August 7, 2018 at 9:41:01 PM UTC-7, Gil Tene wrote:
>>>
>>> *IF* you can use post-java-8 stuff, VarHandles may have a more systemic
>>> and intentional/explicit answer for expressing what you are trying to do
>>> here, without resorting to Unsafe. Specifically, using a
>>> MethodHandles.byteArrayViewVarHandle that you would get once
>>> (statically), you should be able to peek into your many different byte[]
>>> instances and extract a field of a different primitive type (int, long,
>>> etc.) at some arbitrary index, without having to wrap it up in the
>>> super-short-lived ByteBuffer in your example, and hope for escape
>>> analysis to take care of it...
>>>
>>> Here is a code example that does the same wrapping you were looking to
>>> do, using VarHandles:
>>>
>>> import java.lang.invoke.MethodHandles;
>>> import java.lang.invoke.VarHandle;
>>> import java.nio.ByteOrder;
>>>
>>> public class VarHandleExample {
>>>
>>>     static final byte[] bytes =
>>>         {0x02, 0x00, (byte) 0xbe, (byte) 0xba, (byte) 0xfe, (byte) 0xca};
>>>
>>>     private static class FileDesc {
>>>         static final VarHandle VH_intArrayView =
>>>             MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);
>>>         static final VarHandle VH_shortArrayView =
>>>             MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);
>>>         private final byte[] buf;
>>>         int bufPos;
>>>
>>>         FileDesc(byte[] buf, int headerPosition) {
>>>             bufPos = ((short) VH_shortArrayView.get(buf, headerPosition)) + headerPosition;
>>>             this.buf = buf;
>>>         }
>>>
>>>         public int getVal() {
>>>             return (int) VH_intArrayView.get(buf, bufPos);
>>>         }
>>>     }
>>>
>>>     public static void main(String[] args) {
>>>         FileDesc fd = new FileDesc(bytes, 0);
>>>         System.out.format("The int we get from fd.get() is: 0x%x\n", fd.getVal());
>>>     }
>>> }
>>>
>>> Running this results in the probably correct output of:
>>>
>>> The int we get from fd.get() is: *0xcafebabe*
>>>
>>> Which means that the byte offset reading in the backing byte[], using
>>> little endian, and even at not-4-byte-offset-aligned locations, seems to
>>> work.
>>>
>>> NOTE: I have NOT examined what it looks like in generated code, beyond
>>> verifying that everything seems to get inlined, but as stated, the code
>>> would not incur an allocation or need an intermediate object per buffer
>>> instance.
>>>
>>> Now, since this only works in Java9+, you coul

Re: FlatBuffers, ByteBuffers, and escape analysis

2018-08-08 Thread Todd Lipcon
Thanks Gil. Unfortunately I'm stuck on Java 8 for now. And it sounds like
I'll have to modify the flat buffers code generation either way to get rid
of the byte buffer and replace it at least with some interface that could
wrap a bytebuffer, unsafe, varhandle, etc.
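
For concreteness, here is a minimal Java 8 sketch of the kind of interface I
have in mind. All names (ByteAccessor, ArrayAccessor, BufferAccessor) are
hypothetical and not part of FlatBuffers; an Unsafe- or VarHandle-backed
implementation would slot in the same way. Note that the follow-up benchmark
in this thread measured 24 B/op for a byte-array wrapper of this kind, so the
wrapper allocation is not guaranteed to be elided.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class AccessorSketch {

    /** Hypothetical read-only view over little-endian bytes. */
    interface ByteAccessor {
        short getShort(int index);
        int getInt(int index);
    }

    /** Reads directly from a byte[] with manual little-endian assembly. */
    static final class ArrayAccessor implements ByteAccessor {
        private final byte[] buf;

        ArrayAccessor(byte[] buf) {
            this.buf = buf;
        }

        public short getShort(int i) {
            return (short) ((buf[i] & 0xff) | ((buf[i + 1] & 0xff) << 8));
        }

        public int getInt(int i) {
            return (buf[i] & 0xff)
                | ((buf[i + 1] & 0xff) << 8)
                | ((buf[i + 2] & 0xff) << 16)
                | ((buf[i + 3] & 0xff) << 24);
        }
    }

    /** Delegates to a ByteBuffer, for callers that still hand one in. */
    static final class BufferAccessor implements ByteAccessor {
        private final ByteBuffer bb;

        BufferAccessor(ByteBuffer bb) {
            this.bb = bb.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        }

        public short getShort(int i) {
            return bb.getShort(i);
        }

        public int getInt(int i) {
            return bb.getInt(i);
        }
    }

    /** Same shape as the generated FileDesc, but written against the interface. */
    static final class FileDesc {
        private final ByteAccessor acc;
        private final int pos;

        FileDesc(ByteAccessor acc, int headerPosition) {
            this.acc = acc;
            this.pos = acc.getShort(headerPosition) + headerPosition;
        }

        int getVal() {
            return acc.getInt(pos);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = {0x02, 0x00, (byte) 0xbe, (byte) 0xba, (byte) 0xfe, (byte) 0xca};
        System.out.format("0x%x%n", new FileDesc(new ArrayAccessor(bytes), 0).getVal());
        System.out.format("0x%x%n",
            new FileDesc(new BufferAccessor(ByteBuffer.wrap(bytes)), 0).getVal());
    }
}
```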

Todd

On Tue, Aug 7, 2018, 11:45 PM Gil Tene wrote:

> Oh, and there is MethodHandles.byteBufferViewVarHandle if you (for some
> reason) want to do the same but keep ByteBuffers around.
>
> On Tuesday, August 7, 2018 at 9:41:01 PM UTC-7, Gil Tene wrote:
>>
>> *IF* you can use post-java-8 stuff, VarHandles may have a more systemic
>> and intentional/explicit answer for expressing what you are trying to do
>> here, without resorting to Unsafe. Specifically, using a
>> MethodHandles.byteArrayViewVarHandle that you would get once
>> (statically), you should be able to peek into your many different byte[]
>> instances and extract a field of a different primitive type (int, long,
>> etc.) at some arbitrary index, without having to wrap it up in the
>> super-short-lived ByteBuffer in your example, and hope for escape
>> analysis to take care of it...
>>
>> Here is a code example that does the same wrapping you were looking to
>> do, using VarHandles:
>>
>> import java.lang.invoke.MethodHandles;
>> import java.lang.invoke.VarHandle;
>> import java.nio.ByteOrder;
>>
>> public class VarHandleExample {
>>
>>     static final byte[] bytes =
>>         {0x02, 0x00, (byte) 0xbe, (byte) 0xba, (byte) 0xfe, (byte) 0xca};
>>
>>     private static class FileDesc {
>>         static final VarHandle VH_intArrayView =
>>             MethodHandles.byteArrayViewVarHandle(int[].class, ByteOrder.LITTLE_ENDIAN);
>>         static final VarHandle VH_shortArrayView =
>>             MethodHandles.byteArrayViewVarHandle(short[].class, ByteOrder.LITTLE_ENDIAN);
>>         private final byte[] buf;
>>         int bufPos;
>>
>>         FileDesc(byte[] buf, int headerPosition) {
>>             bufPos = ((short) VH_shortArrayView.get(buf, headerPosition)) + headerPosition;
>>             this.buf = buf;
>>         }
>>
>>         public int getVal() {
>>             return (int) VH_intArrayView.get(buf, bufPos);
>>         }
>>     }
>>
>>     public static void main(String[] args) {
>>         FileDesc fd = new FileDesc(bytes, 0);
>>         System.out.format("The int we get from fd.get() is: 0x%x\n", fd.getVal());
>>     }
>> }
>>
>> Running this results in the probably correct output of:
>>
>> The int we get from fd.get() is: *0xcafebabe*
>>
>> Which means that the byte offset reading in the backing byte[], using
>> little endian, and even at not-4-byte-offset-aligned locations, seems to
>> work.
>>
>> NOTE: I have NOT examined what it looks like in generated code, beyond
>> verifying that everything seems to get inlined, but as stated, the code
>> would not incur an allocation or need an intermediate object per buffer
>> instance.
>>
>> Now, since this only works in Java 9+, you could code it that way for
>> those versions, and revert to the Unsafe equivalent for Java 8 and
>> earlier. You could even convert the code above to code that dynamically
>> uses VarHandle (when available) without requiring javac to know anything
>> about them (using reflection and MethodHandles), and uses Unsafe only if
>> VarHandle is not supported. An ugly PortableVarHandleExample that does
>> that (and would run on Java 7...10) *might* follow...
>>
>> On Tuesday, August 7, 2018 at 1:55:35 PM UTC-7, Todd Lipcon wrote:
>>>
>>> Hey folks,
>>>
>>> I'm working on reducing heap usage of a big server application that
>>> currently holds on to tens of millions of generated FlatBuffer instances in
>>> the old generation. Each such instance looks more or less like this:
>>>
>>> private static class FileDesc {
>>>   private final ByteBuffer bb;
>>>   int bbPos;
>>>
>>>   FileDesc(ByteBuffer bb) {
>>>     bbPos = bb.getShort(bb.position()) + bb.position();
>>>     this.bb = bb;
>>>   }
>>>
>>>   public int getVal() {
>>>     return bb.getInt(bbPos);
>>>   }
>>> }
>>>
>>> (I've simplified the code, but the important bit is the ByteBuffer
>>> member and the fact that it provides nice accessors which read data from
>>> various parts of the buffer)
>>>
>>> Unfortunately, the heap usage of these buffers adds up quite a bit --
>>> each ByteBuffer takes 56 bytes of heap, and each 'FileDesc' takes 32 bytes
>>> after padding. The underlying buffers themselves are typically on the order
>>> of 100 bytes, so it seems like almost 50% of the heap is being used by
>>> wrapper objects instead of the underlying data itself. Additionally, 2/3 of
>>> the object count are overhead, which I imagine contributes to GC
>>> scanning/mar