Ádám Szita created TEZ-4392:
-------------------------------
Summary: Streamed event serialization and distribution
Key: TEZ-4392
URL: https://issues.apache.org/jira/browse/TEZ-4392
Project: Apache Tez
Issue Type: Improvement
Reporter: Ádám Szita
Tez currently compiles the full list of events for a given job, then serializes
every event into a second list before it starts distributing the events to
executor instances.
As a result, all the events are held in memory at once, which in some cases can
take up a lot of space (e.g. 1 MB per split × thousands of splits). It would be
more memory efficient to do this in a streamed way, that is, to serialize each
event right before it is sent out to an executor, and not earlier.
Currently, InputInitializer has the following methods that are relevant to
this:
{code:java}
public abstract List<Event> initialize() throws Exception;

public abstract void handleInputInitializerEvent(List<InputInitializerEvent> var1) throws Exception;
{code}
Could these be changed to return/take an Iterator of
Event/InputInitializerEvent, respectively?
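
To illustrate the idea, here is a minimal, self-contained sketch of lazy
serialization behind an Iterator. It is not Tez code: the class
StreamedSerializationSketch, the method lazilySerialized, and the use of plain
strings as stand-ins for events are all assumptions made for this example. The
point is only that each element is serialized when the consumer calls next(),
so at most one serialized payload needs to be alive at a time.

```java
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch (not part of Tez): serialize events one at a time, on demand. */
public class StreamedSerializationSketch {

    /**
     * Wraps an iterator of events so that the serializer runs once per next() call,
     * rather than up front for the whole collection.
     */
    public static <E> Iterator<byte[]> lazilySerialized(
            Iterator<E> events, Function<E, byte[]> serializer) {
        return new Iterator<byte[]>() {
            @Override
            public boolean hasNext() {
                return events.hasNext();
            }

            @Override
            public byte[] next() {
                // events.next() throws NoSuchElementException when exhausted,
                // which keeps the Iterator contract intact.
                return serializer.apply(events.next());
            }
        };
    }

    public static void main(String[] args) {
        // Strings stand in for real events; UTF-8 bytes stand in for real serialization.
        List<String> events = List.of("split-0", "split-1", "split-2");
        int[] serializedCount = {0};

        Iterator<byte[]> payloads = lazilySerialized(events.iterator(), e -> {
            serializedCount[0]++;
            return e.getBytes(StandardCharsets.UTF_8);
        });

        while (payloads.hasNext()) {
            byte[] payload = payloads.next();
            // A real implementation would hand the payload to an executor here.
            System.out.println("sending " + payload.length
                    + " bytes, serialized so far: " + serializedCount[0]);
        }
    }
}
```

With an API shaped like this, the caller that distributes events would drive the
iteration, and a payload could be garbage collected as soon as it has been sent.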
--
This message was sent by Atlassian Jira
(v8.20.1#820001)