In case of UnorderedKVWriter (non-partitioned), a single file is used - to
which new entries are appended.

For the partitioned case - using a single file is not as straightforward,
since the number of elements and size of each partition is not known up
front. Generating a single file per partition can cause an explosion in the
number of files, as well as the number of streams open in parallel (OOM).
The current partitioned writer writes data into the in-memory buffer and
then spills this into files with individual partitions consolidated
together.
Without pipelined shuffle - a single file needs to be generated for a
single task, which is where the merge step comes in - in case the buffer is
large. With pipelined shuffle - there's almost no extra cost, since there's
no final merge - and each element is written out exactly once.

That said, optimizations are possible depending upon the use case. e.g. For
a small number of partitions - it's reasonable to write out a file per
partition. However, the ShuffleHandle and shuffle code will need to change
to handle this.

Pipelined Shuffle/avoiding a final merge has some limitations in case of
failures and partial chunks being transferred over. It should be possible
to work around these by modifying the receiving side to process each input
only when all data for that source has been received.

Either way, non trivial changes are required to make this more efficient.

In this particular case, how many partitions were generated, and what was
the size of the unordered output buffer ? Increasing the buffer size for
this particular job can help mitigating the problem - maybe not with 8500
spills though.



On Wed, Nov 18, 2015 at 1:32 PM, Rohini Palaniswamy <[email protected]
> wrote:

>   Came across a job which was taking a long time in
> UnorderedPartitionedKVWriter.mergeAll. Saw that it was decompressing and
> reading data from spill files (8500 spills) and then writing the final
> compressed merge file. Why do we need spill files for
> UnorderedPartitionedKVWriter? Why not just buffer and keep directly writing
> to the final file which will save a lot of time.
>
> Regards,
> Rohini
>

Reply via email to