[
https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Shai Erera updated LUCENE-3867:
-------------------------------
Attachment: LUCENE-3867.patch
Thanks Uwe!
I ran the test, and now with both J9 (IBM) and Oracle, I get this print
(without enabling any flag):
{code}
[junit] NOTE: running test testReferenceSize
[junit] NOTE: This JVM is 64bit: true
[junit] NOTE: Reference size in this JVM: 8
{code}
* I modified the test name to testReferenceSize (was testCompressedOops).
I wrote this small test to print the differences between sizeOf(String) and
estimateRamUsage(String):
{code}
public void testSizeOfString() throws Exception {
  String s = "abcdefgkjdfkdsjdskljfdskfjdsf";
  String sub = s.substring(0, 4);
  System.out.println("original=" + RamUsageEstimator.sizeOf(s));
  System.out.println("sub=" + RamUsageEstimator.sizeOf(sub));
  System.out.println("checkInterned=true(orig): "
      + new RamUsageEstimator().estimateRamUsage(s));
  System.out.println("checkInterned=false(orig): "
      + new RamUsageEstimator(false).estimateRamUsage(s));
  System.out.println("checkInterned=false(sub): "
      + new RamUsageEstimator(false).estimateRamUsage(sub));
}
{code}
It prints:
{code}
original=104
sub=56
checkInterned=true(orig): 0
checkInterned=false(orig): 98
checkInterned=false(sub): 98
{code}
So clearly estimateRamUsage factors in the sub-string's larger char[]. The
difference in the sizes of 'orig' stems from AverageGuessMemoryModel, which
hardcodes the reference size to 4 and the array size to 16. I modified
AverageGuess to use the constants from RUE (they are best guesses themselves).
The test still printed a difference, but I think that's because sizeOf(String)
aligns the size to a multiple of 8, while estimateRamUsage doesn't. I fixed
that in size(Object), and now the prints are the same.
* I also fixed sizeOfArray -- if array.length == 0 it returned 0, but it
should return the size of the array header, aligned to a multiple of 8 as well.
* I modified sizeOf(String[]) to sizeOf(Object[]) and compute its raw size
only. I started to add sizeOf(String), fastSizeOf(String) and
deepSizeOf(String[]), but reverted to avoid the hassle -- the documentation
confuses even me :).
* Changed all sizeOf() to return long, and align() to take and return long.
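
To illustrate the alignment and empty-array points above, here is a minimal
sketch. It is not the code from the patch: the class name, the method names
and the two constants are assumptions chosen only for the example (real values
depend on the JVM).

```java
// Illustrative sketch only; not the RamUsageEstimator implementation.
final class AlignSketch {
  // Assumed values for the example; real ones depend on the JVM.
  static final int OBJECT_ALIGNMENT = 8;
  static final int ARRAY_HEADER = 12; // hypothetical: object header + int length

  /** Rounds size up to the next multiple of OBJECT_ALIGNMENT. */
  static long alignObjectSize(long size) {
    long remainder = size % OBJECT_ALIGNMENT;
    return remainder == 0 ? size : size + (OBJECT_ALIGNMENT - remainder);
  }

  /** Shallow size of a char[]: header plus 2 bytes per char, then aligned. */
  static long sizeOfCharArray(char[] arr) {
    return alignObjectSize(ARRAY_HEADER + 2L * arr.length);
  }

  public static void main(String[] args) {
    // An unaligned estimate of 98 rounds up to 104.
    System.out.println(alignObjectSize(98));
    // An empty array still costs its (aligned) header, not 0.
    System.out.println(sizeOfCharArray(new char[0]));
  }
}
```

Under these assumed constants, rounding 98 up gives 104, which is the same gap
the test printed between estimateRamUsage and sizeOf(String).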
I think this is ready to commit, though I'd appreciate a second look at the
MemoryModel and size(Obj) changes.
Also, how about renaming MemoryModel methods to: arrayHeaderSize(),
classHeaderSize(), objReferenceSize() to make them more clear and accurate? For
instance, getArraySize does not return the size of an array, but its object
header ...
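
A rough sketch of what the renamed methods could look like. Only the three
method names come from the proposal above; the interface name, the
implementing class, its values and the helper method are hypothetical and just
show how the names would read at a call site.

```java
// Hypothetical sketch of the proposed naming; not the actual MemoryModel class.
interface MemoryModelSketch {
  int arrayHeaderSize();  // header of an array object, not the whole array
  int classHeaderSize();  // header of a plain object
  int objReferenceSize(); // size of one object reference
}

// Assumed "best guess" values, for illustration only.
final class GuessedModel implements MemoryModelSketch {
  public int arrayHeaderSize() { return 12; }
  public int classHeaderSize() { return 8; }
  public int objReferenceSize() { return 4; }
}

final class RenameSketch {
  /** Raw (shallow) size of an Object[] of the given length, aligned to 8. */
  static long shallowSizeOfObjectArray(MemoryModelSketch model, int length) {
    long size = model.arrayHeaderSize() + (long) model.objReferenceSize() * length;
    long remainder = size % 8;
    return remainder == 0 ? size : size + (8 - remainder);
  }

  public static void main(String[] args) {
    System.out.println(shallowSizeOfObjectArray(new GuessedModel(), 3));
  }
}
```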
> RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
> -----------------------------------------------------
>
> Key: LUCENE-3867
> URL: https://issues.apache.org/jira/browse/LUCENE-3867
> Project: Lucene - Java
> Issue Type: Bug
> Components: core/index
> Reporter: Shai Erera
> Assignee: Shai Erera
> Priority: Trivial
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3867-compressedOops.patch, LUCENE-3867.patch,
> LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch
>
>
> RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed like this:
> NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The
> NUM_BYTES_OBJECT_REF part should not be included, at least not according to
> this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml
> {quote}
> A single-dimension array is a single object. As expected, the array has the
> usual object header. However, this object head is 12 bytes to accommodate a
> four-byte array length. Then comes the actual array data which, as you might
> expect, consists of the number of elements multiplied by the number of bytes
> required for one element, depending on its type. The memory usage for one
> element is 4 bytes for an object reference ...
> {quote}
> While at it, I wrote a sizeOf(String) impl, and I wonder how people feel
> about including such helper methods in RUE, as static, stateless methods?
> It's not perfect, and I'm sure there's some room for improvement. Here it is:
> {code}
> /**
>  * Computes the approximate size of a String object. Note that if this object
>  * is also referenced by another object, you should add
>  * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
>  * method.
>  */
> public static int sizeOf(String str) {
>   return 2 * str.length() + 6 // chars + additional safeness for arrays alignment
>       + 3 * RamUsageEstimator.NUM_BYTES_INT // String maintains 3 integers
>       + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER // char[] array
>       + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object
> }
> {code}
> If people are not against it, I'd like to also add sizeOf(int[] / byte[] /
> long[] / double[] ... and String[]).