[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13237172#comment-13237172 ]
Dawid Weiss edited comment on LUCENE-3867 at 3/23/12 9:47 PM:
--------------------------------------------------------------
I've been thinking about how one can assess the estimation quality of the new code. I came up with this:
- I allocate an Object[] half the size of the estimated maximum available RAM (just to make sure all objects will fit without the need to reallocate).
- I precompute shallow sizes for instances of all "wild classes" (classes with random fields, including arrays).
- I then fill the "vault" array above with random instances of wild classes, summing up the estimated size UNTIL I HIT OOM.
- Once I hit OOM, I know how much we actually allocated vs. how much space we thought we had allocated.
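A minimal sketch of the procedure above, under hypothetical assumptions: the wild classes WildA/WildB, the layout constants (64-bit HotSpot with compressed oops: 12-byte object headers, 16-byte array headers, 4-byte references, 8-byte alignment, field padding ignored), the vault sizing rule and all method names are illustrative and not taken from java-sizeof. differencePercent also assumes the reported "Difference" is (estimate - expected) / expected.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Locale;
import java.util.Random;

/**
 * Hypothetical sketch of the "fill a vault until OutOfMemoryError" experiment.
 * Assumed layout (not from java-sizeof): 64-bit HotSpot with compressed oops,
 * 12-byte object headers, 16-byte array headers, 4-byte references,
 * 8-byte alignment. Field padding/reordering is ignored.
 */
public class VaultExperimentSketch {
  static final int OBJECT_HEADER = 12, ARRAY_HEADER = 16, REF = 4, ALIGN = 8;

  // Hypothetical "wild" classes with assorted field types.
  static final class WildA { long a; int b; byte c; Object ref; }
  static final class WildB { double d; short s; boolean flag; }

  static long align(long size) {
    return (size + ALIGN - 1) / ALIGN * ALIGN;
  }

  static int fieldSize(Class<?> type) {
    if (type == long.class || type == double.class) return 8;
    if (type == int.class || type == float.class) return 4;
    if (type == short.class || type == char.class) return 2;
    if (type == byte.class || type == boolean.class) return 1;
    return REF; // any reference type
  }

  /** Shallow size of one instance: header plus all instance fields, aligned. */
  static long shallowSize(Class<?> clazz) {
    long size = OBJECT_HEADER;
    for (Class<?> c = clazz; c != null; c = c.getSuperclass()) {
      for (Field f : c.getDeclaredFields()) {
        if (!Modifier.isStatic(f.getModifiers())) size += fieldSize(f.getType());
      }
    }
    return align(size);
  }

  static long byteArraySize(int length) {
    return align(ARRAY_HEADER + (long) length);
  }

  /** Relative error of the estimate in percent; negative means under-counting. */
  static double differencePercent(long expectedFree, long estimated) {
    return 100.0 * (estimated - expectedFree) / expectedFree;
  }

  /** Fills the vault with random objects until it is full or the heap runs out. */
  static long fillUntilOom(Object[] vault, long seed) {
    Random rnd = new Random(seed);
    long sizeA = shallowSize(WildA.class), sizeB = shallowSize(WildB.class);
    long estimated = 0;
    try {
      for (int i = 0; i < vault.length; i++) {
        switch (rnd.nextInt(3)) {
          case 0: vault[i] = new WildA(); estimated += sizeA; break;
          case 1: vault[i] = new WildB(); estimated += sizeB; break;
          default:
            int len = rnd.nextInt(1024);
            vault[i] = new byte[len];
            estimated += byteArraySize(len);
        }
      }
    } catch (OutOfMemoryError expected) {
      // The failed allocation was never counted; everything stored so far stays reachable.
    }
    return estimated;
  }

  /** Runs the experiment; intended for a small heap, e.g. -Xmx64m -XX:+UseSerialGC. */
  static void run() {
    Runtime rt = Runtime.getRuntime();
    // Sized so it never needs to grow: under this model every object takes at least 16 bytes.
    Object[] vault = new Object[(int) Math.min(rt.maxMemory() / 16, Integer.MAX_VALUE - 8)];
    System.gc();
    long expectedFree = rt.maxMemory() - (rt.totalMemory() - rt.freeMemory());
    long estimated = fillUntilOom(vault, 42L);
    System.out.printf(Locale.ROOT, "Expected free: %d, Allocated estimation: %d, Difference: %.2f%%%n",
        expectedFree, estimated, differencePercent(expectedFree, estimated));
  }
}
```

Calling run() from a small main with a deliberately small heap keeps the experiment short; because the field-summing model is crude, its numbers should not be expected to match those of the real estimator.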
The results are very accurate on HotSpot when using the serial GC. For example:
{noformat}
[JVM: Java HotSpot(TM) 64-Bit Server VM, 20.4-b02, Sun Microsystems Inc., Sun Microsystems Inc., 1.6.0_29]
Max: 483.4 MB, Used: 698.9 KB, Committed: 123.8 MB
Expected free: 240.9 MB, Allocated estimation: 240.8 MB, Difference: -0.05% (113.6 KB)
{noformat}
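The header lines of a report like the one above can be reproduced roughly with standard JDK APIs. This is a sketch only: the mapping of the [JVM: ...] fields onto the system properties java.vm.name, java.vm.version, java.vm.vendor, java.vendor and java.version is an assumption, as are the class and method names; Max/Used/Committed are read from the heap MemoryUsage.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.util.Locale;

/** Hypothetical sketch of a report header in the same shape as the output above. */
public class ReportHeaderSketch {
  /** Formats a byte count as B, KB or MB with one decimal, e.g. "698.9 KB". */
  static String human(long bytes) {
    if (bytes < 0) return "undefined"; // MemoryUsage.getMax() may return -1
    if (bytes < 1024) return bytes + " B";
    if (bytes < 1024L * 1024) return String.format(Locale.ROOT, "%.1f KB", bytes / 1024.0);
    return String.format(Locale.ROOT, "%.1f MB", bytes / (1024.0 * 1024.0));
  }

  /** Assumed mapping of the [JVM: ...] line onto standard system properties. */
  static String jvmLine() {
    return "[JVM: " + System.getProperty("java.vm.name") + ", "
        + System.getProperty("java.vm.version") + ", "
        + System.getProperty("java.vm.vendor") + ", "
        + System.getProperty("java.vendor") + ", "
        + System.getProperty("java.version") + "]";
  }

  /** Heap limits and usage as reported by the platform MemoryMXBean. */
  static String heapLine() {
    MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    return "Max: " + human(heap.getMax()) + ", Used: " + human(heap.getUsed())
        + ", Committed: " + human(heap.getCommitted());
  }

  public static void main(String[] args) {
    System.out.println(jvmLine());
    System.out.println(heapLine());
  }
}
```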
If one runs with the parallel GC, things get out of hand because the GC does not keep up with allocations (although I'm not sure how to interpret this: we only allocate, so there is no space that could be freed; maybe there are different GC pools or something):
{noformat}
[JVM: Java HotSpot(TM) 64-Bit Server VM, 20.4-b02, Sun Microsystems Inc., Sun Microsystems Inc., 1.6.0_29]
Max: 444.5 MB, Used: 655.4 KB, Committed: 122.7 MB
Expected free: 221.5 MB, Allocated estimation: 174.2 MB, Difference: -21.34% (47.3 MB)
{noformat}
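Since the serial and parallel collectors give such different results, it may help to record which collectors were active for each report. A sketch using the standard GarbageCollectorMXBean API; "Copy" and "MarkSweepCompact" are the bean names HotSpot uses for the serial collector, and the looksLikeSerialGc heuristic (a hypothetical helper) will not recognize other JVMs such as JRockit.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Sketch: report which collectors are active, since accuracy depends on the GC. */
public class ActiveCollectorsSketch {
  /** Names of all garbage collector MXBeans in this JVM. */
  static List<String> collectorNames() {
    List<String> names = new ArrayList<>();
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      names.add(gc.getName());
    }
    return names;
  }

  /** Heuristic based on HotSpot's bean names; other JVMs use different names. */
  static boolean looksLikeSerialGc(List<String> names) {
    return names.contains("Copy") && names.contains("MarkSweepCompact");
  }

  public static void main(String[] args) {
    List<String> names = collectorNames();
    System.out.println("GC: " + names + (looksLikeSerialGc(names) ? " (serial)" : ""));
  }
}
```

On HotSpot, running with -XX:+UseSerialGC selects the serial collector; the parallel collector shows up as "PS Scavenge" and "PS MarkSweep" instead.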
JRockit:
{noformat}
[JVM: Oracle JRockit(R), R28.1.4-7-144370-1.6.0_26-20110617-2130-windows-x86_64, Oracle Corporation, Oracle Corporation, 1.6.0_26]
Max: 500 MB, Used: 3.5 MB, Committed: 64 MB
Expected free: 247.7 MB, Allocated estimation: 249.5 MB, Difference: 0.74% (1.8 MB)
{noformat}
I think we're good. If somebody wishes to experiment, the spike is here:
https://github.com/dweiss/java-sizeof
{noformat}
mvn test
mvn dependency:copy-dependencies
java -cp target/classes:target/test-classes:target/dependency/junit-4.10.jar \
  com.carrotsearch.sizeof.TestEstimationQuality
{noformat}
> RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect
> --------------------------------------------------------------------------
>
> Key: LUCENE-3867
> URL: https://issues.apache.org/jira/browse/LUCENE-3867
> Project: Lucene - Java
> Issue Type: Bug
> Components: core/index
> Reporter: Shai Erera
> Assignee: Uwe Schindler
> Priority: Trivial
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3867-3.x.patch, LUCENE-3867-compressedOops.patch,
> LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch,
> LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch,
> LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch,
> LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch,
> LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch
>
>
> RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed as
> NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The
> NUM_BYTES_OBJECT_REF part should not be included, at least not according to
> this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml
> {quote}
> A single-dimension array is a single object. As expected, the array has the
> usual object header. However, this object head is 12 bytes to accommodate a
> four-byte array length. Then comes the actual array data which, as you might
> expect, consists of the number of elements multiplied by the number of bytes
> required for one element, depending on its type. The memory usage for one
> element is 4 bytes for an object reference ...
> {quote}
> While at it, I wrote a sizeOf(String) impl, and I wonder how people feel
> about including such helper methods in RUE as static, stateless methods.
> It's not perfect and I'm sure there's room for improvement, but here it is:
> {code}
> /**
>  * Computes the approximate size of a String object. Note that if this object
>  * is also referenced by another object, you should add
>  * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
>  * method.
>  */
> public static int sizeOf(String str) {
>   return 2 * str.length() + 6 // chars + additional safeness for arrays alignment
>       + 3 * RamUsageEstimator.NUM_BYTES_INT // String maintains 3 integers
>       + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER // char[] array
>       + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object
> }
> {code}
> If people are not against it, I'd like to also add sizeOf(int[] / byte[] /
> long[] / double[] ... and String[]).
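For reference, the array layout quoted in the issue (the usual object header plus a 4-byte length, then length times the element size) can be turned into the kind of per-type helpers proposed there. This is a hypothetical sketch, not Lucene's RamUsageEstimator: the class name is made up, the constants follow the quoted description (8-byte object header, 4-byte int and reference), and rounding up to a multiple of 8 bytes is an added assumption.

```java
/**
 * Hypothetical helpers following the quoted array layout: an 8-byte object
 * header plus a 4-byte length field (12 bytes in total), then
 * length * elementSize bytes, rounded up to a multiple of 8 (assumed).
 * Other JVMs and pointer sizes use different constants.
 */
public final class ArraySizeSketch {
  static final int NUM_BYTES_OBJECT_HEADER = 8;
  static final int NUM_BYTES_INT = 4;
  static final int NUM_BYTES_OBJECT_REF = 4;
  // Array header without the extra NUM_BYTES_OBJECT_REF, as the issue proposes.
  static final int NUM_BYTES_ARRAY_HEADER = NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT;
  static final int ALIGNMENT = 8;

  static long alignObjectSize(long size) {
    return (size + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
  }

  static long sizeOf(byte[] arr)   { return alignObjectSize(NUM_BYTES_ARRAY_HEADER + (long) arr.length); }
  static long sizeOf(int[] arr)    { return alignObjectSize(NUM_BYTES_ARRAY_HEADER + 4L * arr.length); }
  static long sizeOf(long[] arr)   { return alignObjectSize(NUM_BYTES_ARRAY_HEADER + 8L * arr.length); }
  static long sizeOf(double[] arr) { return alignObjectSize(NUM_BYTES_ARRAY_HEADER + 8L * arr.length); }

  /** Shallow size only: the objects the elements point to are not counted. */
  static long shallowSizeOf(Object[] arr) {
    return alignObjectSize(NUM_BYTES_ARRAY_HEADER + (long) NUM_BYTES_OBJECT_REF * arr.length);
  }

  private ArraySizeSketch() {}
}
```

Under these constants a new int[3] comes to 12 + 3 * 4 = 24 bytes, and an empty byte[] is padded from 12 to 16 bytes. A String[] passed to shallowSizeOf counts only the reference slots; a deep size would add each String separately.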