jerqi commented on issue #137:
URL: https://github.com/apache/incubator-uniffle/issues/137#issuecomment-1294629015

   > > > > It's better to sort by MapId before the data are flushed. It won't add much cost for non-AQE optimized stages.
   > > > 
   > > > 
   > > > Does the data need to be sorted by mapId?
   > > 
   > > 
   > > Yes, we only need local order. If we have local order, we can filter out much of the data effectively.
   > 
   > Emm... I remember you preferred sorting only the index file instead of the data file, which was mentioned in the offline meeting. Did I misunderstand you?
   
   To give an example:
   We have three buffers to flush: a taskId 1 block, a taskId 2 block, and a taskId 3 block. We sort them into the order taskId 1 block, taskId 2 block, taskId 3 block, and then flush them to disk. Then we receive a taskId 2 block, a taskId 6 block, and a taskId 1 block. We sort them and flush them as well, so the data on disk is now:
   taskId 1 block, taskId 2 block, taskId 3 block, taskId 1 block, taskId 2 block, taskId 6 block.
   The data only has local order: each flushed batch is sorted, but the file as a whole is not.
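
The example above can be sketched roughly as follows. This is only an illustration under assumptions, not Uniffle's actual code: `flush`, `read_range`, and the `index` / `data` / `segments` lists are hypothetical names, and using a binary search inside each sorted batch is one possible way local order could help with filtering.

```python
from bisect import bisect_left, bisect_right

def flush(index, data, segments, buffers):
    """Sort one batch of (task_id, block) buffers by task_id and append it.

    index    -- task IDs in on-disk order (hypothetical stand-in for an index file)
    data     -- blocks in on-disk order (hypothetical stand-in for a data file)
    segments -- (start, end) position of every flushed batch
    """
    batch = sorted(buffers, key=lambda b: b[0])  # local order only
    start = len(index)
    for task_id, block in batch:
        index.append(task_id)
        data.append(block)
    segments.append((start, len(index)))

def read_range(index, data, segments, lo_id, hi_id):
    """Return blocks with lo_id <= task_id <= hi_id.

    Each batch is sorted, so a binary search finds the matching slice
    without scanning the whole batch.
    """
    out = []
    for start, end in segments:
        first = bisect_left(index, lo_id, start, end)
        last = bisect_right(index, hi_id, start, end)
        out.extend(data[first:last])
    return out

index, data, segments = [], [], []
flush(index, data, segments, [(1, "a"), (2, "b"), (3, "c")])
flush(index, data, segments, [(2, "d"), (6, "e"), (1, "f")])
print(index)  # task IDs on disk: locally ordered, not globally ordered
print(read_range(index, data, segments, 2, 3))
```

With the two flushes from the example, the task IDs on disk come out as 1, 2, 3, 1, 2, 6, and a read for task IDs 2 to 3 only touches the matching slice of each batch.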


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@uniffle.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
