[ 
https://issues.apache.org/jira/browse/TEZ-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14611176#comment-14611176
 ] 

Rajesh Balamohan commented on TEZ-2575:
---------------------------------------


bq. 3.10 sort() spills the span, resets the buffer list, and allocates a new 
span of 256 MB (16 MB kvmeta, 240 MB kvbuffer)
Before the new span is allocated, this would result in an empty spill (as the 
span length is 0). sort() was already avoided by not adding the span to the 
merger. spill() can also be avoided. 
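
A minimal sketch of that guard, not the actual patch: Span, mergeQueue, spillCount and sortAndSpill() are hypothetical stand-ins for the sorter internals. It only illustrates skipping both the spill and the merger hand-off when the span holds no records.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptySpanGuard {
    /** Hypothetical stand-in for a sorter span; it only tracks its record count. */
    public static final class Span {
        private final int records;
        public Span(int records) { this.records = records; }
        public int length() { return records; }
    }

    /** Spans handed to the (hypothetical) merger. */
    public final List<Span> mergeQueue = new ArrayList<>();
    /** Number of spills that were actually performed. */
    public int spillCount = 0;

    /** Spills the span and queues it for merging only if it holds at least one record. */
    public boolean sortAndSpill(Span span) {
        if (span.length() == 0) {
            return false; // empty span: nothing to sort, spill, or merge
        }
        spillCount++;          // stands in for writing the spill
        mergeQueue.add(span);  // stands in for handing the span to the merger
        return true;
    }
}
```

With this shape, an empty span leaves both spillCount and mergeQueue untouched.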

bq. 3.11 now the key will fit into the buffer (so we didn't use the next buffer)
After 3.6, it would have returned the span with a metasize of 16. This would 
cause "(span.kvmeta.remaining() < METASIZE)" to be triggered. "sort()" would 
return 1 because the span length was 0, as per the patch. So it would end up 
spilling a single record irrespective of whether a new block buffer was 
allocated. This might be fixed once TEZ-2574 is fixed.

bq. In the patch 2575.2 there is already a check to not add an empty span to 
the merger thread
Right. As mentioned in the earlier comment, spill() can also be avoided when 
the length is 0.

bq. span.length() check in collect() has no way to know if this is the old span 
or a new span allocated in sort(). So we need some indication/status from 
sort() for that (else for a new span, length() will always be zero)
sort() should sort the span or spill as needed. Ideally, it should not return 
any status about the span. For fresh spans, length would not be 0. However, 
based on the end() calculation it can be 0 when there are items written to the 
span. After BufferOverflowException-->sort(), the span can be either a new span 
or a span carved out of the remaining space. In the next iteration (along with 
the call to sort etc.), "span.length == 0" would indicate that, irrespective of 
the span, the KV cannot fit, and hence singleRecordSpill is chosen.
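
A rough sketch of that control flow, under simplifying assumptions: CollectSketch, Outcome and spanLength are hypothetical names, kvmeta accounting is ignored, and sort() here just starts a fresh block instead of sorting or spilling anything. The point is only that collect() can decide on a single-record spill from "span length is 0 and the KV still overflows", without sort() returning a status.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

public class CollectSketch {
    public enum Outcome { BUFFERED, SINGLE_RECORD_SPILL }

    private final int blockSize;
    private ByteBuffer span;     // data buffer of the current span
    private int spanLength = 0;  // records written to the current span

    public CollectSketch(int blockSize) {
        this.blockSize = blockSize;
        this.span = ByteBuffer.allocate(blockSize);
    }

    /** Hypothetical sort(): retires the current span and starts a fresh one; returns no status. */
    private void sort() {
        span = ByteBuffer.allocate(blockSize);
        spanLength = 0;
    }

    public Outcome collect(byte[] kv) {
        while (true) {
            try {
                span.put(kv); // transfers nothing if kv.length > remaining()
                spanLength++;
                return Outcome.BUFFERED;
            } catch (BufferOverflowException e) {
                if (spanLength == 0) {
                    // Even an empty span cannot hold this KV.
                    return Outcome.SINGLE_RECORD_SPILL;
                }
                sort(); // retry against the next span
            }
        }
    }
}
```

For example, with a 16-byte block, two 10-byte records are both buffered (the second after one sort()), while a 32-byte record ends in SINGLE_RECORD_SPILL.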

- Also, the pipelinedShuffle check after spillSingleRecord() can be merged into 
the spillSingleRecord() method.

> Handle KeyValue pairs size which do not fit in a single block
> -------------------------------------------------------------
>
>                 Key: TEZ-2575
>                 URL: https://issues.apache.org/jira/browse/TEZ-2575
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Saikat
>            Assignee: Saikat
>         Attachments: TEZ-2575.1.patch, TEZ-2575.2.patch, TEZ-2575.patch
>
>
> In the present implementation, the available buffer is divided into blocks 
> (specified in the constructor for pipeline sort), and a linked list of these 
> block byte buffers is maintained. 
> A span is created out of the buffers. 
> The present logic does not handle the scenario where a single key-value pair 
> does not fit into any of the blocks.
> For example, if 1 MB of total memory is divided into 4 blocks (256 KB each) 
> and a single KV pair is greater than the block size (ignoring metadata size), 
> then it fails with buffer exceptions.
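
The arithmetic in that example can be reproduced with a plain java.nio.ByteBuffer. This is only an illustration, assuming a block behaves like a fixed-size ByteBuffer; BlockOverflowDemo and fitsInBlock() are hypothetical names, not Tez code.

```java
import java.nio.BufferOverflowException;
import java.nio.ByteBuffer;

public class BlockOverflowDemo {
    /** Returns true if a record of recordSize bytes fits into an empty block of blockSize bytes. */
    public static boolean fitsInBlock(int blockSize, int recordSize) {
        ByteBuffer block = ByteBuffer.allocate(blockSize);
        try {
            block.put(new byte[recordSize]);
            return true;
        } catch (BufferOverflowException e) {
            return false; // the record is larger than the whole block
        }
    }

    public static void main(String[] args) {
        int totalMemory = 1 << 20;       // 1 MB
        int blockSize = totalMemory / 4; // 4 blocks of 256 KB each
        System.out.println(fitsInBlock(blockSize, 100 * 1024)); // 100 KB record
        System.out.println(fitsInBlock(blockSize, 300 * 1024)); // 300 KB record
    }
}
```

The 100 KB record fits; the 300 KB record overflows even an empty 256 KB block, which is the case the issue is about.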



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
