[GitHub] [incubator-uniffle] jerqi commented on issue #137: [Improvement][AQE] Sort MapId before the data are flushed

GitBox Fri, 28 Oct 2022 00:53:47 -0700


jerqi commented on issue #137:
URL: https://github.com/apache/incubator-uniffle/issues/137#issuecomment-1294629015


   > > > > It's better to sort by MapId before the data are flushed. It won't
   > > > > add much cost for non-AQE-optimized stages.
   > > > 
   > > > 
   > > > Does the data need to be sorted by mapId?
   > > 
   > > 
   > > Yes, we only need local order. With local order, we can effectively
   > > filter out much of the data.
   > 
   > Emm... I remember that you preferred sorting only the index file instead
   > of the data file, as mentioned in the offline meeting. Did I
   > misunderstand you?
   
   Here is an example:
   We have three buffers to flush: a taskId 1 block, a taskId 2 block, and a
   taskId 3 block. We sort them into taskId 1 block, taskId 2 block, taskId 3
   block, and then flush them to disk. Then we receive a taskId 2 block, a
   taskId 6 block, and a taskId 1 block; we sort them and flush them as well,
   so the data on disk is now
   taskId 1 block, taskId 2 block, taskId 3 block, taskId 1 block, taskId 2
   block, taskId 6 block.
   The data only has local order.
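
The example can be sketched in a few lines of Python. This is only an illustration, not Uniffle's actual flush code: the `Block` tuple, the `flush_sorted` helper, and the list standing in for the disk are all hypothetical names made up for this sketch.

```python
from typing import List, Tuple

# Hypothetical model: a block is (task_id, payload).
Block = Tuple[int, bytes]


def flush_sorted(disk: List[Block], buffers: List[Block]) -> None:
    """Sort one batch of buffered blocks by task id, then append it to disk.

    Only the batch is sorted; blocks already on disk are never reordered,
    so the result has local order but no global order.
    """
    disk.extend(sorted(buffers, key=lambda block: block[0]))


# An append-only list stands in for the data file on disk.
disk: List[Block] = []

# First flush: blocks for taskId 1, 2, 3.
flush_sorted(disk, [(1, b"a"), (2, b"b"), (3, b"c")])

# Second flush: blocks arrive as taskId 2, 6, 1 and are sorted before writing.
flush_sorted(disk, [(2, b"d"), (6, b"e"), (1, b"f")])

task_ids = [task_id for task_id, _ in disk]
print(task_ids)  # [1, 2, 3, 1, 2, 6]
```

Each flushed batch is ordered by task id on its own, which matches the "local order" described above.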


