[ https://issues.apache.org/jira/browse/TEZ-3732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16014305#comment-16014305 ]
Jonathan Eagles commented on TEZ-3732: -------------------------------------- Before for FetchInput {code} Failed to find matching constructor, falling back to class-only introspection. org.apache.tez.runtime.library.common.shuffle.DiskFetchedInput object internals: OFFSET SIZE TYPE DESCRIPTION VALUE 0 12 (object header) N/A 12 4 int FetchedInput.id N/A 16 8 long FetchedInput.actualSize N/A 24 8 long FetchedInput.compressedSize N/A 32 4 org.apache.tez.runtime.library.common.InputAttemptIdentifier FetchedInput.inputAttemptIdentifier N/A 36 4 org.apache.tez.runtime.library.common.shuffle.FetchedInput.Type FetchedInput.type N/A 40 4 org.apache.tez.runtime.library.common.shuffle.FetchedInputCallback FetchedInput.callback N/A 44 4 org.apache.tez.runtime.library.common.shuffle.FetchedInput.State FetchedInput.state N/A 48 4 org.apache.hadoop.fs.FileSystem DiskFetchedInput.localFS N/A 52 4 org.apache.hadoop.fs.Path DiskFetchedInput.tmpOutputPath N/A 56 4 org.apache.hadoop.fs.Path DiskFetchedInput.outputPath N/A 60 4 (loss due to the next object alignment) Instance size: 64 bytes Space losses: 0 bytes internal + 4 bytes external = 4 bytes total {code} {code} Failed to find matching constructor, falling back to class-only introspection. org.apache.tez.runtime.library.common.shuffle.LocalDiskFetchedInput object internals: OFFSET SIZE TYPE DESCRIPTION VALUE 0 12 (object header) N/A 12 4 int FetchedInput.id N/A 16 8 long FetchedInput.actualSize N/A 24 8 long FetchedInput.compressedSize N/A 32 4 org.apache.tez.runtime.library.common.InputAttemptIdentifier FetchedInput.inputAttemptIdentifier N/A 36 4 org.apache.tez.runtime.library.common.shuffle.FetchedInput.Type FetchedInput.type N/A 40 4 org.apache.tez.runtime.library.common.shuffle.FetchedInputCallback FetchedInput.callback N/A 44 4 org.apache.tez.runtime.library.common.shuffle.FetchedInput.State FetchedInput.state N/A 48 8 long LocalDiskFetchedInput.startOffset N/A 56 4 org.apache.hadoop.fs.Path LocalDiskFetchedInput.inputFile N/A 60 4 org.apache.hadoop.fs.FileSystem LocalDiskFetchedInput.localFS N/A Instance size: 64 bytes Space losses: 0 bytes internal + 0 bytes external = 0 bytes total {code} {code} Instantiated the sample instance via public org.apache.tez.runtime.library.common.shuffle.MemoryFetchedInput(long,long,org.apache.tez.runtime.library.common.InputAttemptIdentifier,org.apache.tez.runtime.library.common.shuffle.FetchedInputCallback) org.apache.tez.runtime.library.common.shuffle.MemoryFetchedInput object internals: OFFSET SIZE TYPE DESCRIPTION VALUE 0 4 (object header) 01 00 00 00 (00000001 00000000 00000000 00000000) (1) 4 4 (object header) 00 00 00 00 (00000000 00000000 00000000 00000000) (0) 8 4 (object header) 18 ba 0d 00 (00011000 10111010 00001101 00000000) (899608) 12 4 int FetchedInput.id 2 16 8 long FetchedInput.actualSize 0 24 8 long FetchedInput.compressedSize 0 32 4 org.apache.tez.runtime.library.common.InputAttemptIdentifier FetchedInput.inputAttemptIdentifier null 36 4 org.apache.tez.runtime.library.common.shuffle.FetchedInput.Type FetchedInput.type (object) 40 4 org.apache.tez.runtime.library.common.shuffle.FetchedInputCallback FetchedInput.callback null 44 4 org.apache.tez.runtime.library.common.shuffle.FetchedInput.State FetchedInput.state (object) 48 4 org.apache.hadoop.io.BoundedByteArrayOutputStream MemoryFetchedInput.byteStream (object) 52 4 (loss due to the next object alignment) Instance size: 56 bytes Space losses: 0 bytes internal + 4 bytes external = 4 bytes total {code} > Reduce Object size of InputAttemptIdentifier and MapOutput for large jobs > ------------------------------------------------------------------------- > > Key: TEZ-3732 > URL: https://issues.apache.org/jira/browse/TEZ-3732 > Project: Apache Tez > Issue Type: Bug > Reporter: Jonathan Eagles > Assignee: Jonathan Eagles > Attachments: TEZ-3732.1.patch > > > Objects in 64bit java are 12bytes + member size aligned to 8 bytes > InputAttemptIdentifier -> 33Bytes gets aligned up to 40 bytes > This class is just one byte over the 32 byte alignment. Reducing object size > by one byte can save 8 bytes per object. > This is ~8MB savings for 1,000,000 inputs and ~80 MB savings for tasks with > 10,000,000 inputs to fetch (Yes this is a real job) > MapOutput -> 45 bytes gets aligned to 48 bytes > This class can be sub-classed to avoid all sub-classes paying the object size > cost for the other sub-classes > Wait InMemory and DiskDirect -> 32 bytes > Disk -> 40 bytes > Total savings is harder to account for but more than the above case. -- This message was sent by Atlassian JIRA (v6.3.15#6346)