Gopal V created TEZ-1288:
----------------------------

             Summary: Create FastTezSerialization as an optional feature
                 Key: TEZ-1288
                 URL: https://issues.apache.org/jira/browse/TEZ-1288
             Project: Apache Tez
          Issue Type: Improvement
    Affects Versions: 0.5.0
            Reporter: Gopal V


Tez inherits the Writable framework from MapReduce.

This is very flexible, but not particularly memory-efficient for small data 
types.

When deserializing, each value and key has to be allocated afresh for each 
small chunk of data (new IntWritable instead of .set()).
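
To illustrate the difference, here is a minimal sketch in plain Java. It does
not use Hadoop or Tez classes: IntHolder is a hypothetical stand-in for a
mutable holder such as IntWritable, and ReuseSketch with its methods is
invented for illustration. It contrasts allocating a fresh holder per record
with overwriting a single holder via set().

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for a mutable holder such as IntWritable; not a Hadoop class.
final class IntHolder {
    private int value;
    void set(int v) { value = v; }
    int get() { return value; }
}

final class ReuseSketch {
    // Encode ints as fixed-width 4-byte big-endian values.
    static byte[] encode(int... values) {
        ByteBuffer buf = ByteBuffer.allocate(4 * values.length);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    // Allocating style: one new holder object per record read.
    static long sumAllocating(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data);
        long sum = 0;
        while (in.remaining() >= 4) {
            IntHolder holder = new IntHolder();
            holder.set(in.getInt());
            sum += holder.get();
        }
        return sum;
    }

    // Reusing style: a single holder is overwritten via set() for every record.
    static long sumReusing(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data);
        IntHolder holder = new IntHolder();
        long sum = 0;
        while (in.remaining() >= 4) {
            holder.set(in.getInt());
            sum += holder.get();
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] data = encode(1, 2, 3, 4);
        System.out.println(sumAllocating(data) + " " + sumReusing(data));
    }
}
```

Both methods compute the same result; the second creates one holder in total
rather than one per record, which is the pattern the sort/merge inner loop
would want.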

BytesWritable serialization always has to write a 4-byte length prefix for 
all keys and values, because of requirements around streamed .readFields() 
instead of a custom setter/getter implementation.
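
The cost of that prefix for small values can be sketched in plain Java. This
is an illustration only: LengthPrefixSketch and its methods are hypothetical
names, and the framing is a simplified 4-byte-length-plus-payload layout, not
the actual Hadoop or IFile code.

```java
import java.nio.ByteBuffer;

final class LengthPrefixSketch {
    // Length-prefixed framing: a 4-byte big-endian length, then the payload.
    static byte[] withPrefix(byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    // A streamed reader needs the prefix to know how many bytes to consume.
    static byte[] readPrefixed(ByteBuffer in) {
        int length = in.getInt();
        byte[] payload = new byte[length];
        in.get(payload);
        return payload;
    }

    // Fraction of the framed record that is prefix rather than payload.
    static double prefixFraction(int payloadLength) {
        return 4.0 / (4 + payloadLength);
    }

    public static void main(String[] args) {
        byte[] framed = withPrefix(new byte[] {42});
        System.out.println(framed.length + " bytes on the wire for a 1-byte payload");
        System.out.println("prefix fraction: " + prefixFraction(1));
    }
}
```

For a 1-byte payload the framed record is 5 bytes, so the prefix accounts for
most of the record. A serializer that already knows the length from elsewhere
(for example a fixed-width type) could skip it.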

Implement a faster serialization mechanism for the inner loop of sort, spill, 
and merge that doesn't trigger GC and avoids adding needless overhead to 
the IFile format.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
