jerqi commented on issue #137: URL: https://github.com/apache/incubator-uniffle/issues/137#issuecomment-1294629015
> > > > It's better to sort by MapId before the data are flushed. It won't add much cost for non-AQE-optimized stages.
>
> > > Does the data need to be sorted by mapId?
>
> > Yes, we only need local order. If we have local order, we can filter out much of the data effectively.
>
> Emm... I remember you preferred to sort only the index file instead of the data file, as mentioned in the offline meeting. Did I misunderstand you?

Here is an example. We have three buffers to flush: a taskId 1 block, a taskId 2 block, and a taskId 3 block. We sort them into taskId 1 block, taskId 2 block, taskId 3 block, and then flush them to disk. Then we receive a taskId 2 block, a taskId 6 block, and a taskId 1 block; we sort them and flush them as well. At that point the data on disk is: taskId 1 block, taskId 2 block, taskId 3 block, taskId 1 block, taskId 2 block, taskId 6 block. The data is ordered only within each flush, so it has only local order.
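
The example above can be sketched roughly as follows. This is only an illustration of "sort each batch before flushing", not Uniffle's actual flush code: the class `LocalOrderFlush`, its methods, and the use of a plain list of task IDs in place of real blocks and a real data file are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch: each flush sorts its own batch by task ID and appends it.
// No global sort is performed, so the output is ordered only within each batch.
public class LocalOrderFlush {
    // Stand-in for the data file: task IDs of blocks in the order they were written.
    private final List<Long> onDisk = new ArrayList<>();

    // Sort one batch of buffered blocks by task ID, then append it to the "file".
    public void flush(List<Long> bufferedTaskIds) {
        List<Long> batch = new ArrayList<>(bufferedTaskIds); // do not mutate the caller's list
        Collections.sort(batch);
        onDisk.addAll(batch);
    }

    // Return a copy of the current on-disk order.
    public List<Long> onDisk() {
        return new ArrayList<>(onDisk);
    }
}
```

Flushing the batch (1, 2, 3) and then the batch (2, 6, 1) leaves the task IDs on disk as 1, 2, 3, 1, 2, 6, which matches the example: sorted inside each flush, not across flushes.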