[ 
https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237172#comment-13237172
 ] 

Dawid Weiss edited comment on LUCENE-3867 at 3/23/12 9:47 PM:
--------------------------------------------------------------

I've been thinking about how to assess the estimation quality of the new code. I 
came up with this:
- I allocate an Object[] half the size of the estimated maximum available RAM (just 
to make sure all objects will fit without the need to reallocate).
- I precompute shallow sizes for instances of all "wild classes" (classes with 
random fields, including arrays).
- I then fill the "vault" array above with random instances of wild classes, 
summing up the estimated size UNTIL I HIT OOM.
- Once I hit OOM I know how much we actually allocated vs. how much space we 
thought we allocated.
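
The steps above can be sketched roughly as follows. This is not the code from the spike: it uses random-length byte[] instances as a stand-in for the "wild classes", and the 16-byte array header and 8-byte alignment are assumed values (typical for 64-bit HotSpot with compressed oops), not measured ones. The class and method names are hypothetical.

```java
import java.util.Random;

public class EstimationQualitySketch {
    // Assumed layout constants; illustrative only, not measured on any JVM.
    static final int ARRAY_HEADER = 16;
    static final int ALIGNMENT = 8;

    /** Rounds a size up to the next multiple of ALIGNMENT. */
    static long alignUp(long size) {
        return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    /** Estimated shallow size of a byte[] of the given length. */
    static long estimateByteArray(int length) {
        return alignUp(ARRAY_HEADER + (long) length);
    }

    /**
     * Fills the vault with random byte[] instances (1 KB to 2 KB each) and sums
     * their estimated sizes. Stops on OOM, when the vault is full, or once the
     * estimate reaches capBytes.
     */
    static long fill(Object[] vault, long capBytes, Random rnd) {
        long estimated = 0;
        try {
            for (int i = 0; i < vault.length && estimated < capBytes; i++) {
                int len = 1024 + rnd.nextInt(1024);
                vault[i] = new byte[len];
                estimated += estimateByteArray(len);
            }
        } catch (OutOfMemoryError e) {
            // Expected way out when no cap is set: the heap is full.
        }
        return estimated;
    }

    public static void main(String[] args) {
        // Bounded by default so a casual run stays cheap; pass -DuntilOom=true
        // (ideally with a small -Xmx) to really fill the heap.
        long cap = Boolean.getBoolean("untilOom") ? Long.MAX_VALUE : 16L << 20;
        Runtime rt = Runtime.getRuntime();
        // Every object is at least 1 KB, so this many slots is always enough.
        long slots = Math.min(rt.maxMemory(), cap) / 1024 + 1;
        Object[] vault = new Object[(int) Math.min(slots, Integer.MAX_VALUE - 8)];
        long expectedFree = rt.maxMemory() - (rt.totalMemory() - rt.freeMemory());
        long estimated = fill(vault, cap, new Random(42));
        vault = null; // release the vault so there is room to print the report
        System.out.println("Expected free: " + expectedFree
                + " bytes, allocated estimation: " + estimated + " bytes");
    }
}
```

The two numbers are only comparable in the -DuntilOom=true mode; in the default bounded mode the loop stops long before the heap is full.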

The results are very accurate on HotSpot if one is using serial GC. For example:
{noformat}
[JVM: Java HotSpot(TM) 64-Bit Server VM, 20.4-b02, Sun Microsystems Inc., Sun 
Microsystems Inc., 1.6.0_29]
Max: 483.4 MB, Used: 698.9 KB, Committed: 123.8 MB
Expected free: 240.9 MB, Allocated estimation: 240.8 MB, Difference: -0.05% 
(113.6 KB)
{noformat}

With the parallel GC things do get out of hand, seemingly because the GC is not 
keeping up with allocations (although I'm not sure how to interpret this, 
because we only allocate and it's not possible to free any space; maybe there 
are different GC pools or something):
{noformat}
[JVM: Java HotSpot(TM) 64-Bit Server VM, 20.4-b02, Sun Microsystems Inc., Sun 
Microsystems Inc., 1.6.0_29]
Max: 444.5 MB, Used: 655.4 KB, Committed: 122.7 MB
Expected free: 221.5 MB, Allocated estimation: 174.2 MB, Difference: -21.34% 
(47.3 MB)
{noformat}
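
To follow up on the "different GC pools" guess, one could dump the heap pools and collectors the running JVM reports through the standard java.lang.management API. This is a minimal sketch (the class name is hypothetical) and it does not by itself explain the gap; it only shows how the heap is partitioned under a given collector, which may be one reason why not all of "Max" is usable for long-lived objects.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class HeapLayoutReport {
    static double mb(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    /** One-line heap summary taken from the MemoryMXBean. */
    static String describeHeap() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return String.format(Locale.ROOT, "Max: %.1f MB, Used: %.1f MB, Committed: %.1f MB",
                mb(heap.getMax()), mb(heap.getUsed()), mb(heap.getCommitted()));
    }

    /** Names of all memory pools that are part of the Java heap. */
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<String>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(describeHeap());
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("Collector " + gc.getName() + " manages "
                    + Arrays.toString(gc.getMemoryPoolNames()));
        }
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage(); // may be null for an invalid pool
            if (pool.getType() == MemoryType.HEAP && u != null) {
                // getMax() returns -1 when the maximum is undefined
                System.out.println("Pool " + pool.getName() + ": max=" + u.getMax()
                        + " bytes, used=" + u.getUsed() + " bytes");
            }
        }
    }
}
```

Running it once with -XX:+UseSerialGC and once with -XX:+UseParallelGC makes the pool layouts easy to compare.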

JRockit:
{noformat}
[JVM: Oracle JRockit(R), 
R28.1.4-7-144370-1.6.0_26-20110617-2130-windows-x86_64, Oracle Corporation, 
Oracle Corporation, 1.6.0_26]
Max: 500 MB, Used: 3.5 MB, Committed: 64 MB
Expected free: 247.7 MB, Allocated estimation: 249.5 MB, Difference: 0.74% (1.8 
MB)
{noformat}
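
For reading the "Difference" figures in the three reports above: they look consistent with (allocated estimation - expected free) / expected free. That formula is an assumption inferred from the numbers, not taken from the spike's source. Recomputing from the rounded MB values lands within about 0.02 percentage points of the reported percentages:

```java
import java.util.Locale;

public class DifferenceCheck {
    /**
     * Relative difference in percent, assuming "expected free" is the baseline.
     * This formula is inferred from the reported numbers, not from the spike.
     */
    static double differencePercent(double expectedFree, double allocatedEstimation) {
        return (allocatedEstimation - expectedFree) / expectedFree * 100.0;
    }

    public static void main(String[] args) {
        // {expected free MB, allocated estimation MB, reported difference %}
        double[][] runs = {
            {240.9, 240.8, -0.05},  // HotSpot, serial GC
            {221.5, 174.2, -21.34}, // HotSpot, parallel GC
            {247.7, 249.5, 0.74},   // JRockit
        };
        for (double[] run : runs) {
            System.out.println(String.format(Locale.ROOT,
                    "recomputed: %.2f%%, reported: %.2f%%",
                    differencePercent(run[0], run[1]), run[2]));
        }
    }
}
```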

I think we're good. If somebody wishes to experiment, the spike is here:
https://github.com/dweiss/java-sizeof
{noformat}
mvn test
mvn dependency:copy-dependencies
java -cp target/classes:target/test-classes:target/dependency/junit-4.10.jar \
  com.carrotsearch.sizeof.TestEstimationQuality
{noformat}
                
> RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect
> --------------------------------------------------------------------------
>
>                 Key: LUCENE-3867
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3867
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/index
>            Reporter: Shai Erera
>            Assignee: Uwe Schindler
>            Priority: Trivial
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3867-3.x.patch, LUCENE-3867-compressedOops.patch, 
> LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, 
> LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, 
> LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, 
> LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, 
> LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch
>
>
> RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed like this: 
> NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The 
> NUM_BYTES_OBJECT_REF part should not be included, at least not according to 
> this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml
> {quote}
> A single-dimension array is a single object. As expected, the array has the 
> usual object header. However, this object head is 12 bytes to accommodate a 
> four-byte array length. Then comes the actual array data which, as you might 
> expect, consists of the number of elements multiplied by the number of bytes 
> required for one element, depending on its type. The memory usage for one 
> element is 4 bytes for an object reference ...
> {quote}
> While at it, I wrote a sizeOf(String) impl, and I wonder how people feel 
> about including such helper methods in RUE as static, stateless methods. 
> It's not perfect, and I'm sure there's room for improvement. Here it is:
> {code}
>       /**
>        * Computes the approximate size of a String object. Note that if this
>        * object is also referenced by another object, you should add
>        * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
>        * method.
>        */
>       public static int sizeOf(String str) {
>               // chars + additional safeness for arrays alignment
>               return 2 * str.length() + 6
>                               // String maintains 3 integers
>                               + 3 * RamUsageEstimator.NUM_BYTES_INT
>                               // char[] array
>                               + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER
>                               // String object
>                               + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER;
>       }
> {code}
> If people are not against it, I'd like to also add sizeOf(int[] / byte[] / 
> long[] / double[] ... and String[]).
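
To illustrate the header fix and the proposed array helpers from the issue description, here is a minimal sketch. The constants are local stand-ins that borrow the RamUsageEstimator names; their values (8-byte object header, 4-byte int, 8-byte alignment) are assumptions chosen to match the 12-byte array header in the quoted javamex text, not the values Lucene actually uses. The class name and the alignObjectSize helper are hypothetical.

```java
public class ArraySizeSketch {
    // Assumed values for illustration only; real values depend on the JVM.
    static final int NUM_BYTES_OBJECT_HEADER = 8;
    static final int NUM_BYTES_INT = 4;
    static final int NUM_BYTES_OBJECT_REF = 4;
    static final int OBJECT_ALIGNMENT = 8;

    // Object header plus the int length field; no NUM_BYTES_OBJECT_REF here.
    static final int NUM_BYTES_ARRAY_HEADER = NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT;

    /** Rounds a size up to the next multiple of OBJECT_ALIGNMENT. */
    static long alignObjectSize(long size) {
        return (size + OBJECT_ALIGNMENT - 1) / OBJECT_ALIGNMENT * OBJECT_ALIGNMENT;
    }

    static long sizeOf(byte[] arr) {
        return alignObjectSize(NUM_BYTES_ARRAY_HEADER + (long) arr.length);
    }

    static long sizeOf(int[] arr) {
        return alignObjectSize(NUM_BYTES_ARRAY_HEADER + (long) NUM_BYTES_INT * arr.length);
    }

    static long sizeOf(long[] arr) {
        return alignObjectSize(NUM_BYTES_ARRAY_HEADER + 8L * arr.length);
    }

    /** Shallow size: counts the references only, not the Strings they point to. */
    static long shallowSizeOf(String[] arr) {
        return alignObjectSize(NUM_BYTES_ARRAY_HEADER + (long) NUM_BYTES_OBJECT_REF * arr.length);
    }

    public static void main(String[] args) {
        System.out.println("int[10]:   " + sizeOf(new int[10]) + " bytes");
        System.out.println("byte[5]:   " + sizeOf(new byte[5]) + " bytes");
        System.out.println("String[3]: " + shallowSizeOf(new String[3]) + " bytes (shallow)");
    }
}
```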
