[ https://issues.apache.org/jira/browse/TEZ-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14608398#comment-14608398 ]
Saikat commented on TEZ-2575:
-----------------------------

[~rajesh.balamohan] Thanks for the review.

I have one question. With your proposed approach, in the worst case there will be bufferList.size() additional spills for each KV pair that doesn't fit into the buffer. For example, say there are 4 blocks in the buffer list and the first allocated span is empty. If the KV doesn't fit into this span, collect() will be called recursively 4 times until the bufferOverflowRecursion > bufferList.size() condition is hit, and only then will the KV be spilled to disk.

By indicating a status in sort(), I am trying to catch the condition early and avoid the extra spills.

> Handle KeyValue pairs size which do not fit in a single block
> -------------------------------------------------------------
>
>                 Key: TEZ-2575
>                 URL: https://issues.apache.org/jira/browse/TEZ-2575
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Saikat
>            Assignee: Saikat
>         Attachments: TEZ-2575.1.patch, TEZ-2575.patch
>
>
> In the present implementation, the available buffer is divided into blocks
> (specified in the constructor for pipeline sort), and a linked list of these
> block byte buffers is maintained.
> A span is created out of the buffers.
> The present logic does not handle the scenario where a single key-value pair
> does not fit into any of the blocks.
> For example, if 1 MB of total memory is divided into 4 blocks (256 KB each)
> and a single KV pair is larger than the block size (ignoring metadata size),
> then it fails with buffer exceptions.
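
To make the trade-off in the comment concrete, here is a rough sketch of the two ideas: an early size check that reports a status, and the worst-case count of extra spills when an oversized KV is only detected through the recursion guard. It is not code from either patch. The class name, the FitStatus enum, and both method names are hypothetical, and the spill count simply models the claim above (one extra spill per block in the buffer list).

```java
// Illustrative sketch only; none of these names come from the Tez code base.
final class OversizedKvSketch {

    /** Hypothetical status values that an early check could report. */
    enum FitStatus { FITS, SPAN_FULL, TOO_LARGE_FOR_BLOCK }

    /**
     * Early check: decide up front whether a KV pair can ever fit in a block.
     * Sizes are in bytes; metadata overhead is ignored, as in the description.
     */
    static FitStatus check(long kvBytes, long remainingInSpan, long blockBytes) {
        if (kvBytes > blockBytes) {
            return FitStatus.TOO_LARGE_FOR_BLOCK; // no span can ever hold it
        }
        if (kvBytes > remainingInSpan) {
            return FitStatus.SPAN_FULL; // a fresh span would hold it
        }
        return FitStatus.FITS;
    }

    /**
     * Models the worst case described in the comment: without an early check,
     * an oversized KV costs one extra spill per block before the recursion
     * guard trips.
     */
    static int extraSpillsWithRecursionGuard(long kvBytes, long blockBytes, int blockCount) {
        return kvBytes > blockBytes ? blockCount : 0;
    }

    public static void main(String[] args) {
        long blockBytes = 256 * 1024; // 1 MB divided into 4 blocks
        int blockCount = 4;
        long kvBytes = 300 * 1024;    // larger than any single block

        // Prints TOO_LARGE_FOR_BLOCK
        System.out.println(check(kvBytes, blockBytes, blockBytes));
        // Prints 4
        System.out.println(extraSpillsWithRecursionGuard(kvBytes, blockBytes, blockCount));
    }
}
```

With the 1 MB / 4 block example from the description, a 300 KB pair is flagged as TOO_LARGE_FOR_BLOCK on the first check, whereas the recursion-guard model charges it 4 extra spills before it is written to disk.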