[ 
https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13231625#comment-13231625
 ] 

Dawid Weiss commented on LUCENE-3867:
-------------------------------------

This is very interesting indeed. 

I attached an agent to a running VM to dump some of its internal 
diagnostics, including OOP sizes, heap word alignments, etc. Here is a sample 
of the results (the sizes reported on the client side are shown on the right):

{noformat}

# 1.7, 64 bit, OOPS compressed            (client)
getOopSize: 8                             ref size = 4         
Address size: 8                           array header = 16    
Bytes per long: 8                         object header = 12   
CPU: amd64
HeapOopSize: 4
HeapWordSize: 8
IntSize: 4
getMinObjAlignmentInBytes: 8
getObjectAlignmentInBytes: 8
isCompressedOopsEnabled: true
isLP64: true


# 1.7, 64 bit, full
getOopSize: 8                             ref size = 8     
Address size: 8                           array header = 24
Bytes per long: 8                         object header = 16
CPU: amd64
HeapOopSize: 8
HeapWordSize: 8
IntSize: 4
getMinObjAlignmentInBytes: 8
getObjectAlignmentInBytes: 8
isCompressedOopsEnabled: false
isLP64: true

# 1.7, 32 bit  
getOopSize: 4                             ref size = 4     
Address size: 4                           array header = 12
Bytes per long: 8                         object header = 8
CPU: x86
HeapOopSize: 4
HeapWordSize: 4
IntSize: 4
getMinObjAlignmentInBytes: 8
getObjectAlignmentInBytes: 8
isCompressedOopsEnabled: false
isLP64: false
{noformat}
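
As an aside, two of the values in the dump can also be read from inside the process without attaching an agent. The following is a minimal sketch, assuming a HotSpot JVM that exposes com.sun.management.HotSpotDiagnosticMXBean; the class name VmOptionProbe and the helper vmOption are hypothetical, and on other VMs (or where an option does not exist) the helper simply returns the fallback value.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class VmOptionProbe {
    // Hypothetical helper: returns the current value of a HotSpot VM option,
    // or the given fallback if the bean or the option is not available.
    static String vmOption(String name, String fallback) {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean.getVMOption(name).getValue();
        } catch (Throwable t) {
            // Non-HotSpot VM, missing bean, or unknown option name.
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println("UseCompressedOops: "
            + vmOption("UseCompressedOops", "unknown"));
        System.out.println("ObjectAlignmentInBytes: "
            + vmOption("ObjectAlignmentInBytes", "unknown"));
    }
}
```

This only covers values that are exposed as VM options; the remaining fields in the dump (HeapWordSize, HeapOopSize and so on) are not reachable this way.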

The question Uwe and I asked ourselves is why an empty array takes 24 bytes 
without OOP compression (object overhead plus an int length should be 
16 + 4 = 20). The answer seems to lie in how base offsets are calculated for 
arrays: they appear to be aligned to a HeapWordSize boundary, and this is 8 
even with compressed OOPs:
{noformat}
  // Returns the offset of the first element.
  static int base_offset_in_bytes(BasicType type) {
    return header_size(type) * HeapWordSize;
  }
{noformat}
I'll spare you the detailed code, but the rounding up to the next multiple of 
HeapWordSize seems evident in all cases. What's even more interesting, this 
"wasted" space is not (and cannot be) used for data, so even a single int 
pushes the array size to the next alignment boundary:
{noformat}
int[0] = 24
int[1] = 32   (*)
int[2] = 32
int[3] = 40
{noformat}

Finally, I cannot resist mentioning that object alignment is adjustable, at 
least to 2^n boundaries. So you can also do this:
{noformat}
> java  -XX:-UseCompressedOops -XX:ObjectAlignmentInBytes=32 ...
Object = 32
int[0] = 32
int[1] = 32
int[2] = 32
int[3] = 64
{noformat}
Nice, huh? :) I don't think the JVM has been tested heavily in this 
configuration, though, because the code hung on me a few times when executed 
in that mode.
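
The rounding described above can be sketched in a few lines of Java. This is only an illustration of the arithmetic, assuming a 16-byte object header, a 4-byte length field, and a HeapWordSize of 8 as in the uncompressed 64-bit dump; the class ArraySizeSketch and its methods are hypothetical and are not part of RamUsageEstimator or the JVM.

```java
public class ArraySizeSketch {
    // Rounds size up to the next multiple of alignment.
    // The alignment is assumed to be a power of two.
    static long alignUp(long size, long alignment) {
        return (size + alignment - 1) & ~(alignment - 1);
    }

    // Hypothetical helper: estimated shallow size of an int[] of the given length.
    static long intArraySize(int length, long objectHeader,
                             long heapWordSize, long objectAlignment) {
        // Object header plus the 4-byte length, rounded up to a heap word.
        long baseOffset = alignUp(objectHeader + 4, heapWordSize);
        // Element data starts at the base offset; the total is rounded up
        // to the object alignment.
        return alignUp(baseOffset + 4L * length, objectAlignment);
    }

    public static void main(String[] args) {
        for (int n = 0; n <= 3; n++) {
            System.out.println("int[" + n + "] = "
                + intArraySize(n, 16, 8, 8) + " (8-byte alignment), "
                + intArraySize(n, 16, 8, 32) + " (32-byte alignment)");
        }
    }
}
```

With these assumptions the sketch yields 24, 32, 32, 40 for 8-byte alignment and 32, 32, 32, 64 for 32-byte alignment, which agrees with both tables above.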
                
> RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect
> --------------------------------------------------------------------------
>
>                 Key: LUCENE-3867
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3867
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/index
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>            Priority: Trivial
>             Fix For: 3.6, 4.0
>
>         Attachments: LUCENE-3867-compressedOops.patch, LUCENE-3867.patch, 
> LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, 
> LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch
>
>
> RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed like this: 
> NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The 
> NUM_BYTES_OBJECT_REF part should not be included, at least not according to 
> this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml
> {quote}
> A single-dimension array is a single object. As expected, the array has the 
> usual object header. However, this object head is 12 bytes to accommodate a 
> four-byte array length. Then comes the actual array data which, as you might 
> expect, consists of the number of elements multiplied by the number of bytes 
> required for one element, depending on its type. The memory usage for one 
> element is 4 bytes for an object reference ...
> {quote}
> While I was at it, I wrote a sizeOf(String) impl, and I wonder how people 
> feel about including such helper methods in RUE as static, stateless 
> methods. It's not perfect and I'm sure there's room for improvement, but 
> here it is:
> {code}
>       /**
>        * Computes the approximate size of a String object. Note that if this
>        * object is also referenced by another object, you should add
>        * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
>        * method.
>        */
>       public static int sizeOf(String str) {
>               // chars + additional safeness for arrays alignment
>               return 2 * str.length() + 6
>                               // String maintains 3 integers
>                               + 3 * RamUsageEstimator.NUM_BYTES_INT
>                               // char[] array
>                               + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER
>                               // String object
>                               + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER;
>       }
> {code}
> If people are not against it, I'd like to also add sizeOf(int[] / byte[] / 
> long[] / double[] ... and String[]).
