[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-3867:
-------------------------------

    Attachment: LUCENE-3867.patch

Thanks, Uwe!

I ran the test, and now with both the J9 (IBM) and Oracle JVMs I get this output (without enabling any flags):

{code}
    [junit] NOTE: running test testReferenceSize
    [junit] NOTE: This JVM is 64bit: true
    [junit] NOTE: Reference size in this JVM: 8
{code}

* I renamed the test to testReferenceSize (it was testCompressedOops).

I wrote this small test to print the differences between sizeOf(String) and 
estimateRamUsage(String):

{code}
  public void testSizeOfString() throws Exception {
    String s = "abcdefgkjdfkdsjdskljfdskfjdsf";
    String sub = s.substring(0, 4);
    System.out.println("original=" + RamUsageEstimator.sizeOf(s));
    System.out.println("sub=" + RamUsageEstimator.sizeOf(sub));
    System.out.println("checkInterned=true(orig): " + new RamUsageEstimator().estimateRamUsage(s));
    System.out.println("checkInterned=false(orig): " + new RamUsageEstimator(false).estimateRamUsage(s));
    System.out.println("checkInterned=false(sub): " + new RamUsageEstimator(false).estimateRamUsage(sub));
  }
{code}

It prints:
{code}
original=104
sub=56
checkInterned=true(orig): 0
checkInterned=false(orig): 98
checkInterned=false(sub): 98
{code}

So clearly estimateRamUsage factors in the substring's larger char[] (which it shares with the original string). The difference in sizes for 'orig' stems from AverageGuessMemoryModel, which hardcodes the reference size as 4 and the array header size as 16. I modified AverageGuessMemoryModel to use the constants from RUE (they are best guesses themselves). The test still prints a difference, but now I think it's because sizeOf(String) aligns the size to a multiple of 8, while estimateRamUsage doesn't. I fixed that in size(Object), and now the prints are the same.
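A minimal sketch of rounding a size up to a multiple of 8, assuming 8-byte object alignment (common on HotSpot, but an assumption here). `AlignSketch` is a hypothetical class for illustration, not RamUsageEstimator's actual align() implementation.

```java
// Hypothetical helper, not Lucene's code: rounds a byte count up to the
// next multiple of 8, assuming the JVM aligns objects to 8 bytes.
final class AlignSketch {
    private AlignSketch() {}

    /** Rounds size up to the nearest multiple of 8; aligned sizes are unchanged. */
    static long align(long size) {
        // Adding 7 and then clearing the low three bits rounds up to a multiple of 8.
        return (size + 7L) & ~7L;
    }

    public static void main(String[] args) {
        System.out.println(align(98));  // 104
        System.out.println(align(104)); // 104
        System.out.println(align(0));   // 0
    }
}
```

If one estimator applies this rounding and the other does not, their results can differ by up to 7 bytes per object, which is why both need to align the same way.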

* I also fixed sizeOfArray: if array.length == 0 it returned 0, but it should return the array header size, aligned to a multiple of 8 as well.

* I changed sizeOf(String[]) to sizeOf(Object[]), which now computes only the array's raw size. I started to add sizeOf(String), fastSizeOf(String) and deepSizeOf(String[]), but reverted them to avoid the hassle; the documentation confuses even me :).

* Changed all sizeOf() methods to return long, and align() to take and return a long.
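The sizeOfArray fix and the switch to long can be illustrated together with a small sketch. `ArraySizeSketch` and its parameters are hypothetical: the header and element sizes are passed in rather than taken from RamUsageEstimator, and the 12-byte header used in `main` is an assumed 32-bit value.

```java
// Hypothetical sketch, not Lucene's committed code. Header and element sizes
// are parameters because the real values depend on the JVM
// (32/64-bit, compressed oops).
final class ArraySizeSketch {
    private ArraySizeSketch() {}

    /** Rounds size up to the next multiple of 8. */
    static long align(long size) {
        return (size + 7L) & ~7L;
    }

    /**
     * Raw (shallow) size of an array: header + length * elementBytes, aligned to 8.
     * An empty array still occupies its header, so the result is never 0.
     * The multiplication is done in long so large arrays cannot overflow.
     */
    static long sizeOfArray(int arrayHeaderBytes, int length, int elementBytes) {
        return align((long) arrayHeaderBytes + (long) length * elementBytes);
    }

    public static void main(String[] args) {
        // Assumed 12-byte array header (8-byte object header + 4-byte length).
        System.out.println(sizeOfArray(12, 0, 4));           // 16: empty int[], header only
        System.out.println(sizeOfArray(12, 5, 2));           // 24: 12 + 10 = 22, aligned up
        System.out.println(sizeOfArray(12, 300_000_000, 8)); // 2400000016
        // The same payload computed in int arithmetic wraps around:
        System.out.println(300_000_000 * 8);                 // -1894967296
    }
}
```

For an Object[], elementBytes would be the reference size (4 or 8 bytes depending on the JVM), giving a raw size that does not follow the references.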

I think this is ready to commit, though I'd appreciate a second look at the MemoryModel and size(Object) changes.

Also, how about renaming the MemoryModel methods to arrayHeaderSize(), classHeaderSize() and objReferenceSize() to make them clearer and more accurate? For instance, getArraySize() does not return the size of an array, but the size of its header.
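A sketch of how the proposed names might read. `RenamedMemoryModel` and `Assumed32BitModel` are hypothetical; the values are assumed 32-bit figures rather than measurements, and the primitive-size method is omitted.

```java
// Hypothetical sketch of the proposed renaming; not Lucene's committed API.
abstract class RenamedMemoryModel {
    /** Size of an array's header (object header + length field), not of the whole array. */
    abstract int arrayHeaderSize();

    /** Size of a plain object's header. */
    abstract int classHeaderSize();

    /** Size of one object reference. */
    abstract int objReferenceSize();
}

/** Example values for a 32-bit JVM (assumed, not measured). */
final class Assumed32BitModel extends RenamedMemoryModel {
    @Override int arrayHeaderSize() { return 12; }
    @Override int classHeaderSize() { return 8; }
    @Override int objReferenceSize() { return 4; }

    public static void main(String[] args) {
        RenamedMemoryModel m = new Assumed32BitModel();
        // An Object[] of 10 elements: header plus 10 references, before alignment.
        System.out.println(m.arrayHeaderSize() + 10 * m.objReferenceSize()); // 52
    }
}
```

With these names a call site such as `arrayHeaderSize() + n * objReferenceSize()` reads as what it computes, whereas `getArraySize()` suggests the total size of the array.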
                
> RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
> -----------------------------------------------------
>
>                 Key: LUCENE-3867
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3867
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Trivial
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3867-compressedOops.patch, LUCENE-3867.patch, 
> LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch
>
>
> RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed like this:
> NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The
> NUM_BYTES_OBJECT_REF part should not be included, at least not according to
> this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml
> {quote}
> A single-dimension array is a single object. As expected, the array has the 
> usual object header. However, this object head is 12 bytes to accommodate a 
> four-byte array length. Then comes the actual array data which, as you might 
> expect, consists of the number of elements multiplied by the number of bytes 
> required for one element, depending on its type. The memory usage for one 
> element is 4 bytes for an object reference ...
> {quote}
> While I was at it, I wrote a sizeOf(String) implementation, and I wonder how
> people feel about including such helper methods in RUE as static, stateless
> methods. It's not perfect, and I'm sure there's room for improvement; here it is:
> {code}
>       /**
>        * Computes the approximate size of a String object. Note that if this object
>        * is also referenced by another object, you should add
>        * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
>        * method.
>        */
>       public static int sizeOf(String str) {
>               return 2 * str.length() + 6 // chars + additional safety margin for array alignment
>                               + 3 * RamUsageEstimator.NUM_BYTES_INT // String maintains 3 integers
>                               + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER // char[] array
>                               + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object
>       }
> {code}
> If people are not against it, I'd also like to add sizeOf(int[] / byte[] /
> long[] / double[] ... and String[]).
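The difference between the current and the proposed header formula can be checked with plain arithmetic. The constants below are assumptions for a 32-bit JVM, chosen to match the quoted javamex description (an ordinary 8-byte object header plus a 4-byte length gives a 12-byte array header); `ArrayHeaderSketch` is a hypothetical class, not RamUsageEstimator.

```java
// Hypothetical constants for a 32-bit JVM, matching the quoted javamex figures.
// These mirror RamUsageEstimator's constant names only for readability.
final class ArrayHeaderSketch {
    private ArrayHeaderSketch() {}

    static final int NUM_BYTES_OBJECT_HEADER = 8; // assumed 32-bit value
    static final int NUM_BYTES_INT = 4;
    static final int NUM_BYTES_OBJECT_REF = 4;    // assumed 32-bit value

    /** Formula the issue reports as currently used (includes an extra reference). */
    static int oldArrayHeader() {
        return NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF;
    }

    /** Formula the issue proposes: object header plus the int length field. */
    static int proposedArrayHeader() {
        return NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT;
    }

    public static void main(String[] args) {
        System.out.println("old=" + oldArrayHeader());           // old=16
        System.out.println("proposed=" + proposedArrayHeader()); // proposed=12
    }
}
```

Under these assumptions the current formula overestimates every array's header by one reference size.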
