Gopal V created TEZ-1288:
----------------------------
Summary: Create FastTezSerialization as an optional feature
Key: TEZ-1288
URL: https://issues.apache.org/jira/browse/TEZ-1288
Project: Apache Tez
Issue Type: Improvement
Affects Versions: 0.5.0
Reporter: Gopal V
Tez inherits the Writable framework from MapReduce.
This is very flexible, but not particularly memory-efficient for small data
types.
When deserializing, each value and key has to be allocated afresh for each
small chunk of data (new IntWritable instead of .set()).
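
To illustrate the allocation difference, here is a minimal sketch. IntBox is a
hypothetical stand-in for IntWritable (so the example needs no Hadoop jars),
and ReuseSketch and its method names are invented for illustration; none of
this is actual Tez code.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for IntWritable: a mutable holder for one int.
final class IntBox {
    private int value;

    void set(int v) { value = v; }

    int get() { return value; }

    // Mirrors the shape of Writable.readFields(): fill this object from a stream.
    void readFields(ByteBuffer in) { value = in.getInt(); }
}

final class ReuseSketch {
    // Serialize the ints back to back, 4 bytes each.
    static byte[] encode(int... values) {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES * values.length);
        for (int v : values) {
            buf.putInt(v);
        }
        return buf.array();
    }

    // One new holder per record: every record adds a short-lived object for the GC.
    static long sumAllocating(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data);
        long sum = 0;
        while (in.remaining() >= Integer.BYTES) {
            IntBox box = new IntBox();
            box.readFields(in);
            sum += box.get();
        }
        return sum;
    }

    // One holder for the whole loop: the same object is overwritten per record.
    static long sumReusing(byte[] data) {
        ByteBuffer in = ByteBuffer.wrap(data);
        IntBox box = new IntBox();
        long sum = 0;
        while (in.remaining() >= Integer.BYTES) {
            box.readFields(in);
            sum += box.get();
        }
        return sum;
    }
}
```

Both loops compute the same result; the second simply keeps the per-record
object allocation out of the inner loop.
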
The BytesWritable serialization always has to write a 4-byte length prefix
for all values and keys, because it relies on streamed .readFields()
instead of a custom setter/getter implementation.
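
As a rough sketch of the prefix overhead: the class below compares a
self-describing encoding (4-byte length, then payload) with a read where the
caller already knows the length. PrefixOverheadSketch and its methods are
hypothetical names, and the assumption that the enclosing file format can
supply the length is mine, not something this issue specifies.

```java
import java.nio.ByteBuffer;

final class PrefixOverheadSketch {
    // Self-describing: 4-byte big-endian length followed by the payload,
    // so a streamed reader can tell where the value ends.
    static byte[] writePrefixed(byte[] payload) {
        return ByteBuffer.allocate(Integer.BYTES + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    // Streamed read: the length comes from the stream itself.
    static byte[] readPrefixed(ByteBuffer in) {
        byte[] payload = new byte[in.getInt()];
        in.get(payload);
        return payload;
    }

    // Externally framed read (assumption): the caller passes the length,
    // so no prefix needs to be stored with the value.
    static byte[] readFramed(ByteBuffer in, int length) {
        byte[] payload = new byte[length];
        in.get(payload);
        return payload;
    }
}
```

For a 1-byte value, writePrefixed produces 5 bytes, so the prefix alone is
four times the size of the data it describes.
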
Implement a faster serialization mechanism for the inner loop of sort, spill,
and merge, one that doesn't trigger the GC and avoids adding needless overhead
to the IFile format.