[ 
https://issues.apache.org/jira/browse/PIG-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705188#action_12705188
 ] 

Hong Tang commented on PIG-793:
-------------------------------

Two ideas:

# When loading a tuple from serialized data, keep it as a byte array and only 
instantiate datums when get/set calls are made. This would help when we are 
moving tuples from one container to another.
{code}
class LazyTuple implements Tuple {
  ArrayList<Object> fields; // null until the tuple is deserialized
  DataByteArray lazyBytes;  // serialized bytes of the tuple, e.g. in Avro format
}
{code} 
# Improve DataByteArray. It could be changed to an interface (needing get(), 
offset(), and length()), with a DataByteArrayFactory that creates instances in 
two ways: 
## DataByteArrayFactory.createPrivate(byte[], offset, length), if we need to 
keep a private copy of the buffer.
## DataByteArrayFactory.createShared(), if the input buffer can be shared with 
the data byte array object. In this case, the contract would be that the caller 
will no longer access the portion of the byte array from offset to 
offset+length (exclusive).

There could be three different implementations of this:
- The current implementation will be used for createPrivate().
- An implementation for small buffers (offset/length can be represented in 
short/short).
- An implementation for large buffers (offset/length are int/int, and length is 
large enough).
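
A rough Java sketch of the factory idea, to make the private/shared contract 
concrete. The interface and factory method names follow the comment above; the 
three implementation class names, the argument list of createShared(), and the 
choice of Short.MAX_VALUE as the small/large cutoff are assumptions made for 
illustration, not existing Pig code.

```java
import java.util.Arrays;

// Proposed interface: a view over bytes [offset, offset + length) of get().
interface DataByteArray {
    byte[] get();
    int offset();
    int length();
}

// Current-style semantics: owns its buffer, offset is always 0.
final class PrivateBytes implements DataByteArray {
    private final byte[] data;
    PrivateBytes(byte[] data) { this.data = data; }
    public byte[] get() { return data; }
    public int offset() { return 0; }
    public int length() { return data.length; }
}

// Shared view whose offset and length both fit in a short.
final class SmallSharedBytes implements DataByteArray {
    private final byte[] data;
    private final short offset;
    private final short length;
    SmallSharedBytes(byte[] data, short offset, short length) {
        this.data = data; this.offset = offset; this.length = length;
    }
    public byte[] get() { return data; }
    public int offset() { return offset; }
    public int length() { return length; }
}

// Shared view with int offset and length.
final class LargeSharedBytes implements DataByteArray {
    private final byte[] data;
    private final int offset;
    private final int length;
    LargeSharedBytes(byte[] data, int offset, int length) {
        this.data = data; this.offset = offset; this.length = length;
    }
    public byte[] get() { return data; }
    public int offset() { return offset; }
    public int length() { return length; }
}

final class DataByteArrayFactory {
    private DataByteArrayFactory() {}

    // Copies the range, so the caller may keep using its own buffer.
    static DataByteArray createPrivate(byte[] buf, int offset, int length) {
        return new PrivateBytes(Arrays.copyOfRange(buf, offset, offset + length));
    }

    // Wraps the range without copying. Contract: the caller no longer
    // accesses buf[offset .. offset + length) after this call.
    static DataByteArray createShared(byte[] buf, int offset, int length) {
        if (offset < 0 || length < 0 || offset > buf.length - length) {
            throw new IndexOutOfBoundsException("bad range");
        }
        if (offset <= Short.MAX_VALUE && length <= Short.MAX_VALUE) {
            return new SmallSharedBytes(buf, (short) offset, (short) length);
        }
        return new LargeSharedBytes(buf, offset, length);
    }
}
```

With this shape, callers only see the interface, so the factory is free to 
pick whichever representation is cheapest for a given range.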

Note that the change to DataByteArray would break the current semantics where 
the offset is always 0, and length is always the length of the buffer.
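
For the first idea (lazy deserialization), a minimal sketch of how get/set 
could trigger materialization. SimpleTuple is a hypothetical stand-in for the 
small part of Pig's Tuple interface needed here, and ToyCodec is an invented 
wire format (a count followed by int fields) standing in for Avro; neither is 
real Pig code. A plain byte[] is used instead of DataByteArray to keep the 
example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, reduced tuple interface (not Pig's Tuple).
interface SimpleTuple {
    Object get(int i);
    void set(int i, Object value);
    byte[] serialized(); // raw bytes, for container-to-container moves
}

final class LazyTuple implements SimpleTuple {
    private List<Object> fields; // null until the first get/set
    private byte[] lazyBytes;    // null once the fields have been modified

    LazyTuple(byte[] serializedBytes) { this.lazyBytes = serializedBytes; }

    boolean isDeserialized() { return fields != null; }

    @Override public Object get(int i) {
        materialize();
        return fields.get(i);
    }

    @Override public void set(int i, Object value) {
        materialize();
        fields.set(i, value);
        lazyBytes = null; // cached bytes are stale after a write
    }

    // Moving an untouched tuple never deserializes it.
    @Override public byte[] serialized() {
        if (lazyBytes == null) lazyBytes = ToyCodec.encode(fields);
        return lazyBytes;
    }

    private void materialize() {
        if (fields == null) fields = ToyCodec.decode(lazyBytes);
    }
}

// Toy wire format, an assumption for this sketch: int count, then int fields.
final class ToyCodec {
    private ToyCodec() {}

    static byte[] encode(List<Object> fields) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeInt(fields.size());
            for (Object f : fields) out.writeInt((Integer) f);
            out.flush();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static List<Object> decode(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            int n = in.readInt();
            List<Object> fields = new ArrayList<>(n);
            for (int i = 0; i < n; i++) fields.add(in.readInt());
            return fields;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the sketch is that a tuple which is only passed along via 
serialized() never allocates its field objects; the cost is paid on the first 
get or set.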


> Improving memory efficiency of Tuple implementation
> ---------------------------------------------------
>
>                 Key: PIG-793
>                 URL: https://issues.apache.org/jira/browse/PIG-793
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>
> Currently, our tuple is a real pig and uses a lot of extra memory. 
> There are several places where we can improve memory efficiency:
> (1) Laying out memory for the fields rather than using Java objects, since 
> each object for a numeric field takes 16 bytes.
> (2) For the cases where we know the schema, using Java arrays rather than 
> ArrayList.
> There might be more.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
