On Mon, May 23, 2011 at 7:44 PM, Sanne Grinovero
<sanne.grinov...@gmail.com> wrote:
> To keep stuff simple, I'd add an alternative feature instead:
> have the custom externalizers optionally recommend an allocation
> buffer size.
>
> In my experience people use a set of well-known types for the key, and
> maybe for the value as well, for which they actually know the output
> byte size, so there's no point in Infinispan trying to guess the size
> and then adapting to it; an exception being the often-used Strings,
> even in composite keys, but again as a user of the API I have a pretty
> good idea of the size I'm going to need for each object I store.
>
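A size-recommending externalizer along these lines might look roughly as follows. This is a hypothetical sketch, not Infinispan API: `SizeHintingExternalizer`, `estimatedSize`, `CacheKey` and `marshall` are all made-up names used for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class SizeHintDemo {
    // Hypothetical interface: an externalizer that can recommend an
    // allocation size for the objects it knows how to serialize.
    interface SizeHintingExternalizer<T> {
        int estimatedSize(T object);
        void writeObject(DataOutputStream out, T object) throws IOException;
    }

    // A composite key with a fixed wire format: the author knows the
    // serialized size exactly (4 + 4 = 8 bytes), so no guessing needed.
    record CacheKey(int segment, int docId) {}

    static final SizeHintingExternalizer<CacheKey> KEY_EXT =
        new SizeHintingExternalizer<>() {
            public int estimatedSize(CacheKey k) { return 8; }
            public void writeObject(DataOutputStream out, CacheKey k)
                    throws IOException {
                out.writeInt(k.segment());
                out.writeInt(k.docId());
            }
        };

    // Allocate exactly the recommended size and serialize into it;
    // a correct hint means the buffer never has to grow.
    static byte[] marshall(CacheKey key) throws IOException {
        ByteArrayOutputStream buf =
            new ByteArrayOutputStream(KEY_EXT.estimatedSize(key));
        KEY_EXT.writeObject(new DataOutputStream(buf), key);
        return buf.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(marshall(new CacheKey(3, 42)).length); // 8
    }
}
```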
Excellent idea: if the custom externalizer can give us the exact size of
the serialized object, we wouldn't need to do any guesswork. I'm a bit
worried about over-zealous externalizers that will spend just as much time
computing the size of a complex object graph as they spend actually
serializing the whole thing, but as long as our internal externalizers are
good examples I think we're ok.

Big plus: we could use the size of the serialized object to estimate the
memory usage of each cache entry, so maybe with this we could finally
constrain the cache to use a fixed amount of memory :)

> Also in MarshalledValue I see that an ExposedByteArrayOutputStream is
> created, and after serialization, if the buffer is found to be bigger
> than the data it holds, a copy is made to create an exactly matching
> byte[].
> What about revamping the interface there to expose the
> ExposedByteArrayOutputStream instead of byte[], up to the JGroups
> level?
>
> In case the value is not stored in binary form, the expected life of
> the stream is very short anyway; after being pushed directly to the
> network buffers we don't need it anymore... couldn't we pass the
> non-truncated stream directly to JGroups without this final size
> adjustment?
>
> Of course when values are stored in binary form it might make sense to
> save some memory, but again if that were an option I'd make use of it;
> in the case of Lucene I can guess the size with a very good estimate
> (a few bytes off), compared to buffer sizes of potentially many
> megabytes which I'd prefer to avoid copying - I'm especially not
> interested in copying just to save 2 bytes, even if I were to store
> values in binary.
>

Yeah, but ExposedByteArrayOutputStream grows by 100%, so if you're off by
1 in your size estimate you'll waste 50% of the memory by keeping that
buffer around.
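To make the point concrete, here's a toy model of a buffer that doubles when full, mimicking the 100% growth policy described above (a simplified stand-in, not the real ExposedByteArrayOutputStream):

```java
public class GrowthDemo {
    // Capacity of a doubling buffer after writing the given number of
    // bytes, starting from an initial (estimated) capacity.
    static int capacityAfterWriting(int initialCapacity, int bytesWritten) {
        int cap = initialCapacity;
        while (cap < bytesWritten) {
            cap *= 2; // grow by 100% on overflow
        }
        return cap;
    }

    public static void main(String[] args) {
        // Perfect estimate: no growth, no waste.
        System.out.println(capacityAfterWriting(512, 512)); // 512
        // Underestimate by a single byte: the buffer doubles to 1024,
        // so keeping it around wastes ~50% of its memory.
        System.out.println(capacityAfterWriting(512, 513)); // 1024
    }
}
```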
Even if your estimate is perfect you're still wasting at least 32 bytes on
a 64-bit machine: 16 bytes for the buffer object header + 8 for the array
reference + 4 (rounded up to 8) for the count, though I guess you could
get that down to 4 bytes by keeping the byte[] and count as members of
MarshalledValue.

Besides, for Lucene couldn't you store the actual data separately as a
byte[] so that Infinispan doesn't wrap it in a MarshalledValue?

> Then if we just keep the ExposedByteArrayOutputStream around in the
> MarshalledValue, we could save some copying by replacing the
> "output.write(raw)" in writeObject(ObjectOutput output,
> MarshalledValue mv) with
> output.write(byte[], offset, length);
>
> Cheers,
> Sanne
>
>
> 2011/5/23 Bela Ban <b...@redhat.com>:
>>
>>
>> On 5/23/11 6:15 PM, Dan Berindei wrote:
>>
>>> I totally agree, combining adaptive size with buffer reuse would be
>>> really cool. I imagine when passing the buffer to JGroups we'd still
>>> make an arraycopy, but we'd get rid of a lot of arraycopy calls to
>>> resize the buffer when the average object size is > 500 bytes. At the
>>> same time, if a small percentage of the objects are much bigger than
>>> the rest, we wouldn't reuse those huge buffers, so we wouldn't waste
>>> too much memory.
>>
>> From my experience, reusing and syncing on a buffer will be slower than
>> making a simple arraycopy. I used to reuse buffers in JGroups, but got
>> better perf when I simply copied the buffer.
>> Plus the reservoir sampling's complexity is another source of bugs...
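Sanne's `output.write(byte[], offset, length)` suggestion boils down to writing straight from the stream's internal, possibly oversized buffer instead of trimming it to an exact-size byte[] first. A minimal sketch, assuming a homegrown stream that exposes its raw buffer (the real ExposedByteArrayOutputStream serves the same purpose; `ExposingStream`, `rawBuffer` and `writeTo` are illustrative names, not Infinispan API):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class NoTrimDemo {
    // ByteArrayOutputStream keeps its buffer private; this subclass
    // exposes the raw (possibly oversized) buffer and the valid length,
    // which is essentially what ExposedByteArrayOutputStream is for.
    static class ExposingStream extends ByteArrayOutputStream {
        byte[] rawBuffer() { return buf; }  // may be longer than count
        int length() { return count; }      // valid bytes in buf
    }

    // Write the marshalled bytes without the final exact-size copy:
    // instead of output.write(stream.toByteArray()) — which allocates
    // and copies a trimmed array — write the raw buffer with a length.
    static void writeTo(DataOutput output, ExposingStream stream)
            throws IOException {
        output.writeInt(stream.length());
        output.write(stream.rawBuffer(), 0, stream.length());
    }

    public static void main(String[] args) throws IOException {
        ExposingStream marshalled = new ExposingStream();
        marshalled.write(new byte[]{1, 2, 3});
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeTo(new DataOutputStream(wire), marshalled);
        // 4-byte length prefix + 3 payload bytes = 7
        System.out.println(wire.size()); // 7
    }
}
```

Of course, as noted above, this only saves memory/copying if the receiver (here, JGroups) accepts a buffer-plus-length rather than an exactly sized byte[].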
>>
>> --
>> Bela Ban
>> Lead JGroups / Clustering Team
>> JBoss
>> _______________________________________________
>> infinispan-dev mailing list
>> infinispan-dev@lists.jboss.org
>> https://lists.jboss.org/mailman/listinfo/infinispan-dev