[ 
https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13229311#comment-13229311
 ] 

Dawid Weiss commented on LUCENE-3867:
-------------------------------------

{code}
+  /** Returns the size in bytes of the String[] object. */
+  public static int sizeOf(String[] arr) {
+    int size = alignObjectSize(NUM_BYTES_ARRAY_HEADER + NUM_BYTES_OBJECT_REF * 
arr.length);
+    for (String s : arr) {
+      size += sizeOf(s);
+    }
+    return size;
+  }
+
+  /** Returns the approximate size of a String object. */
+  public static int sizeOf(String str) {
+    // String's char[] size
+    int arraySize = alignObjectSize(NUM_BYTES_ARRAY_HEADER + NUM_BYTES_CHAR * 
str.length());
+
+    // String's row object size    
+    int objectSize = alignObjectSize(NUM_BYTES_OBJECT_REF /* array reference */
+        + 3 * NUM_BYTES_INT /* String holds 3 integers */
+        + NUM_BYTES_OBJECT_HEADER /* String object header */);
+    
+    return objectSize + arraySize;
+  }
{code}

What I mean is that without looking at the code I would expect sizeOf(String[] 
N) to return the actual memory taken by an array of strings. If they point to a 
single char[], this should simple count the object overhead, not count every 
character N times as it would do now. This isn't sizeOf(), this is sum(string 
lengths * 2) + epsilon to me.

I'd keep RamUsageEstimator exactly what the name says -- an estimation of the 
actual memory taken by a given object. A string can point to a char[] and if so 
this should be traversed as an object and counted once.

                
> RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is incorrect
> -----------------------------------------------------
>
>                 Key: LUCENE-3867
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3867
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Trivial
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3867-compressedOops.patch, LUCENE-3867.patch
>
>
> RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed like that: 
> NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The 
> NUM_BYTES_OBJECT_REF part should not be included, at least not according to 
> this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml
> {quote}
> A single-dimension array is a single object. As expected, the array has the 
> usual object header. However, this object head is 12 bytes to accommodate a 
> four-byte array length. Then comes the actual array data which, as you might 
> expect, consists of the number of elements multiplied by the number of bytes 
> required for one element, depending on its type. The memory usage for one 
> element is 4 bytes for an object reference ...
> {quote}
> While on it, I wrote a sizeOf(String) impl, and I wonder how do people feel 
> about including such helper methods in RUE, as static, stateless, methods? 
> It's not perfect, there's some room for improvement I'm sure, here it is:
> {code}
>       /**
>        * Computes the approximate size of a String object. Note that if this 
> object
>        * is also referenced by another object, you should add
>        * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
>        * method.
>        */
>       public static int sizeOf(String str) {
>               return 2 * str.length() + 6 // chars + additional safeness for 
> arrays alignment
>                               + 3 * RamUsageEstimator.NUM_BYTES_INT // String 
> maintains 3 integers
>                               + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER // 
> char[] array
>                               + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // 
> String object
>       }
> {code}
> If people are not against it, I'd like to also add sizeOf(int[] / byte[] / 
> long[] / double[] ... and String[]).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to