[
https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229311#comment-13229311
]
Dawid Weiss commented on LUCENE-3867:
-------------------------------------
{code}
+  /** Returns the size in bytes of the String[] object. */
+  public static int sizeOf(String[] arr) {
+    int size = alignObjectSize(NUM_BYTES_ARRAY_HEADER + NUM_BYTES_OBJECT_REF * arr.length);
+    for (String s : arr) {
+      size += sizeOf(s);
+    }
+    return size;
+  }
+
+  /** Returns the approximate size of a String object. */
+  public static int sizeOf(String str) {
+    // String's char[] size
+    int arraySize = alignObjectSize(NUM_BYTES_ARRAY_HEADER + NUM_BYTES_CHAR * str.length());
+
+    // String's raw object size
+    int objectSize = alignObjectSize(NUM_BYTES_OBJECT_REF /* array reference */
+        + 3 * NUM_BYTES_INT /* String holds 3 integers */
+        + NUM_BYTES_OBJECT_HEADER /* String object header */);
+
+    return objectSize + arraySize;
+  }
{code}
What I mean is that, without looking at the code, I would expect
sizeOf(String[]) on an array of N strings to return the actual memory taken by
that array. If all N strings point to a single char[], it should simply count
the per-object overhead plus the shared array once, not count every character
N times as it does now. To me this isn't sizeOf(), it is
sum(string lengths * 2) + epsilon.

I'd keep RamUsageEstimator exactly what the name says -- an estimation of the
actual memory taken by a given object. A string can point to a char[], and if
so the char[] should be traversed as an object and counted once.
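
One way to read the suggestion above is to track visited objects by identity, so a char[] shared by several strings is counted only once. The following is a minimal sketch, not the RamUsageEstimator implementation: `Slice` is a hypothetical stand-in for a String whose backing array is visible, and the byte constants are assumed values for a 32-bit JVM.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class SharedArraySizeSketch {
    // Assumed layout constants (32-bit JVM); real values depend on the JVM.
    static final int NUM_BYTES_OBJECT_HEADER = 8;
    static final int NUM_BYTES_OBJECT_REF = 4;
    static final int NUM_BYTES_INT = 4;
    static final int NUM_BYTES_CHAR = 2;
    static final int NUM_BYTES_ARRAY_HEADER = NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT;

    /** Hypothetical String-like object: one array reference plus three ints. */
    static final class Slice {
        final char[] value;
        final int offset;
        final int count;
        int hash;

        Slice(char[] value, int offset, int count) {
            this.value = value;
            this.offset = offset;
            this.count = count;
        }
    }

    /** Rounds a size up to the next multiple of 8 bytes (assumed alignment). */
    static long alignObjectSize(long size) {
        return (size + 7) & ~7L;
    }

    /** Estimates the array, each distinct Slice, and each distinct char[] once. */
    static long sizeOf(Slice[] arr) {
        // Identity-based set: two equal-but-distinct arrays are still counted separately.
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        long size = alignObjectSize(NUM_BYTES_ARRAY_HEADER + (long) NUM_BYTES_OBJECT_REF * arr.length);
        for (Slice s : arr) {
            if (s == null || !seen.add(s)) {
                continue; // null slot, or this Slice was already counted
            }
            size += alignObjectSize(NUM_BYTES_OBJECT_HEADER + NUM_BYTES_OBJECT_REF + 3 * NUM_BYTES_INT);
            if (seen.add(s.value)) {
                // First time this backing array is seen: count its full length.
                size += alignObjectSize(NUM_BYTES_ARRAY_HEADER + (long) NUM_BYTES_CHAR * s.value.length);
            }
        }
        return size;
    }
}
```

Under these assumed constants, three slices over one shared 100-char array come to 312 bytes, versus 744 bytes when each slice owns its own 100-char array.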
> RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
> -----------------------------------------------------
>
> Key: LUCENE-3867
> URL: https://issues.apache.org/jira/browse/LUCENE-3867
> Project: Lucene - Java
> Issue Type: Bug
> Components: core/index
> Reporter: Shai Erera
> Assignee: Shai Erera
> Priority: Trivial
> Fix For: 3.6, 4.0
>
> Attachments: LUCENE-3867-compressedOops.patch, LUCENE-3867.patch
>
>
> RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is currently computed as
> NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The
> NUM_BYTES_OBJECT_REF part should not be included, at least not according to
> this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml
> {quote}
> A single-dimension array is a single object. As expected, the array has the
> usual object header. However, this object head is 12 bytes to accommodate a
> four-byte array length. Then comes the actual array data which, as you might
> expect, consists of the number of elements multiplied by the number of bytes
> required for one element, depending on its type. The memory usage for one
> element is 4 bytes for an object reference ...
> {quote}
> While at it, I wrote a sizeOf(String) impl, and I wonder how people feel
> about including such helper methods in RUE as static, stateless methods.
> It's not perfect, and I'm sure there's room for improvement. Here it is:
> {code}
> /**
>  * Computes the approximate size of a String object. Note that if this object
>  * is also referenced by another object, you should add
>  * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
>  * method.
>  */
> public static int sizeOf(String str) {
>   return 2 * str.length() + 6 // chars + additional safeness for arrays alignment
>       + 3 * RamUsageEstimator.NUM_BYTES_INT // String maintains 3 integers
>       + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER // char[] array
>       + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object
> }
> {code}
> If people are not against it, I'd like to also add sizeOf(int[] / byte[] /
> long[] / double[] ... and String[]).
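
The header arithmetic in the quoted issue description can be checked with a small sketch. It assumes the 8-byte object header and 4-byte references implied by the quoted javamex page (a 32-bit JVM) and 8-byte alignment. The class and method names are illustrative, not the ones in the attached patches.

```java
class ArrayHeaderSketch {
    // Assumed values for a 32-bit JVM; real values depend on the JVM.
    static final int NUM_BYTES_OBJECT_HEADER = 8;
    static final int NUM_BYTES_INT = 4;
    static final int NUM_BYTES_OBJECT_REF = 4;

    // Header as described in the issue: object header + length + an extra reference.
    static final int OLD_ARRAY_HEADER =
        NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF;

    // Proposed header: object header + four-byte array length only.
    static final int NEW_ARRAY_HEADER = NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT;

    /** Rounds a size up to the next multiple of 8 bytes (assumed alignment). */
    static long alignObjectSize(long size) {
        return (size + 7) & ~7L;
    }

    /** Estimated size of a byte[]: header plus one byte per element. */
    static long sizeOf(byte[] arr) {
        return alignObjectSize(NEW_ARRAY_HEADER + (long) arr.length);
    }

    /** Estimated size of an int[]: header plus four bytes per element. */
    static long sizeOf(int[] arr) {
        return alignObjectSize(NEW_ARRAY_HEADER + 4L * arr.length);
    }

    /** Estimated size of a long[]: header plus eight bytes per element. */
    static long sizeOf(long[] arr) {
        return alignObjectSize(NEW_ARRAY_HEADER + 8L * arr.length);
    }
}
```

With these assumptions the header drops from 16 to 12 bytes, and `sizeOf(new int[10])` is 12 + 40 = 52 bytes, rounded up to 56.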