To keep things simple, I'd add an alternative feature instead: let the custom externalizers optionally recommend an allocation buffer size.
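As a rough sketch of the idea only: the interface name SizeHintingExternalizer, the method recommendedBufferSize, the default of 512 bytes and the two example externalizers are all hypothetical, not existing Infinispan API, and the code assumes Java 8 or later. It only shows how a marshaller could pre-size its buffer from an optional hint and fall back to a default when no hint is given.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class SizeHintSketch {

    /** Hypothetical interface (not Infinispan API): an externalizer that may suggest a buffer size. */
    public interface SizeHintingExternalizer<T> {
        void writeObject(DataOutputStream out, T object) throws IOException;

        /** Suggested initial buffer size in bytes, or -1 when there is no recommendation. */
        default int recommendedBufferSize(T object) {
            return -1;
        }
    }

    /** Arbitrary fallback for this sketch, used when the externalizer gives no hint. */
    public static final int DEFAULT_BUFFER_SIZE = 512;

    /** A fixed-size key: the output size is known exactly. */
    public static final class LongKeyExternalizer implements SizeHintingExternalizer<Long> {
        @Override
        public void writeObject(DataOutputStream out, Long key) throws IOException {
            out.writeLong(key);
        }

        @Override
        public int recommendedBufferSize(Long key) {
            return Long.BYTES;
        }
    }

    /** A String key: the size is not exact, but a safe upper bound is cheap to compute. */
    public static final class StringKeyExternalizer implements SizeHintingExternalizer<String> {
        @Override
        public void writeObject(DataOutputStream out, String key) throws IOException {
            out.writeUTF(key);
        }

        @Override
        public int recommendedBufferSize(String key) {
            // writeUTF emits a 2-byte length prefix plus at most 3 bytes per char.
            return 2 + 3 * key.length();
        }
    }

    /** Uses the externalizer's hint when it is positive, otherwise the default size. */
    public static <T> int initialBufferSize(SizeHintingExternalizer<T> ext, T object) {
        int hint = ext.recommendedBufferSize(object);
        return hint > 0 ? hint : DEFAULT_BUFFER_SIZE;
    }

    /** Serializes the object into a buffer that is pre-sized from the hint. */
    public static <T> byte[] marshall(SizeHintingExternalizer<T> ext, T object) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream(initialBufferSize(ext, object));
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            ext.writeObject(out, object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // toByteArray() still copies; this sketch only illustrates the sizing hint.
        return buffer.toByteArray();
    }
}
```

With an exact hint the internal buffer never needs to grow, and an externalizer that does not override recommendedBufferSize behaves exactly as it would without the feature.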
In my experience people use a set of well-known types for the key, and maybe for the value as well, for which they actually know the output byte size, so there's no point in Infinispan trying to guess the size and then adapting to it. An exception is the often-used Strings, even in composite keys, but again, as a user of the API I have a pretty good idea of the size I'm going to need for each object I store.

Also, in MarshalledValue I see that an ExposedByteArrayOutputStream is created, and after serialization, if the buffer is found to be bigger than the data written to it, a copy is made to create an exactly matching byte[].
What about revamping the interface there to expose the ExposedByteArrayOutputStream instead of byte[], up to the JGroups level?

In case the value is not stored in binary form, the expected life of the stream is very short anyway; after being pushed directly to the network buffers we don't need it anymore. Couldn't we pass the non-truncated stream directly to JGroups without this final size adjustment?

Of course, when values are stored in binary form it might make sense to save some memory, but again, if that were an option I'd make use of it: in the case of Lucene I can guess the size with a very good estimate (a few bytes off), compared to buffer sizes of potentially many megabytes which I'd prefer to avoid copying. I'm especially not interested in copying to save 2 bytes, even if I were to store values in binary.

Then, if we just keep the ExposedByteArrayOutputStream around in the MarshalledValue, we could save some copying by replacing the "output.write(raw)" in writeObject(ObjectOutput output, MarshalledValue mv) with a call to output.write(byte[], offset, length).

Cheers,
Sanne

2011/5/23 Bela Ban <b...@redhat.com>:
>
>
> On 5/23/11 6:15 PM, Dan Berindei wrote:
>
>> I totally agree, combining adaptive size with buffer reuse would be
>> really cool. I imagine when passing the buffer to JGroups we'd still
>> make an arraycopy, but we'd get rid of a lot of arraycopy calls to
>> resize the buffer when the average object size is > 500 bytes. At the
>> same time, if a small percentage of the objects are much bigger than
>> the rest, we wouldn't reuse those huge buffers so we wouldn't waste
>> too much memory.
>
>
> From my experience, reusing and syncing on a buffer will be slower than
> making a simple arraycopy. I used to reuse buffers in JGroups, but got
> better perf when I simply copied the buffer.
> Plus the reservoir sampling's complexity is another source of bugs...
>
> --
> Bela Ban
> Lead JGroups / Clustering Team
> JBoss
> _______________________________________________
> infinispan-dev mailing list
> infinispan-dev@lists.jboss.org
> https://lists.jboss.org/mailman/listinfo/infinispan-dev