Hi all,
I've been investigating how sort behaves when sorting big files
(larger than RAM). I'm looking at the system calls and the actual
read- and write-related instructions the application issues.
To handle files that don't fit in memory, my guess is that sort
implements some kind of external merge sort. And indeed, I can see
sort load chunks of the big file, sort them, and write them back to
disk once sorted.
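To make sure we're talking about the same thing, here is roughly the
run-generation phase I have in mind - a minimal C sketch of the
textbook algorithm, NOT sort's actual code; CHUNK_LINES, LINE_MAX_LEN,
and the run-file naming are all made up, and it assumes
newline-terminated records on stdin:

    /* Run generation for an external merge sort: read a chunk of
       lines, sort it in RAM, write it out as a sorted temp file.
       All constants and names here are hypothetical, not coreutils. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define CHUNK_LINES 4096
    #define LINE_MAX_LEN 1024

    static int cmp_lines(const void *a, const void *b)
    {
        return strcmp(*(const char *const *)a, *(const char *const *)b);
    }

    int main(void)
    {
        char *lines[CHUNK_LINES];
        char buf[LINE_MAX_LEN];
        int run = 0;

        for (;;) {
            size_t n = 0;
            while (n < CHUNK_LINES && fgets(buf, sizeof buf, stdin))
                lines[n++] = strdup(buf);
            if (n == 0)
                break;                    /* no more input */

            qsort(lines, n, sizeof *lines, cmp_lines);  /* sort chunk */

            char name[64];
            snprintf(name, sizeof name, "run-%d.tmp", run++);
            FILE *out = fopen(name, "w");
            if (!out)
                return 1;
            for (size_t i = 0; i < n; i++) {
                fputs(lines[i], out);
                free(lines[i]);
            }
            fclose(out);
        }
        return 0;
    }

That mental model is what my two questions below are measured against.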
But looking a bit further, there are two things I don't understand:
1) There are plenty of reads into buffers of varying sizes (49152,
36864, 28672 bytes, for example) - why? I mean, why doesn't it just
read the whole chunk at once and then merge-sort it?
2) After splitting and sorting each chunk, sort needs to merge them
into one sorted output. When I examine the instructions, it seems like
sort makes A LOT of accesses to those buffers in order to merge them,
at almost the same intensity as when it sorts each chunk - why?
Doesn't it just need to look at the head of each run, pick the
smallest value, put it in an output buffer, and then write that to the
final output file (see the sketch right after this list)?
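For reference, this is the naive merge I was picturing - again a
sketch under my own assumptions (NRUNS runs named run-%d.tmp as in the
previous snippet, newline-terminated records, and a plain linear scan
over the run heads), not whatever sort actually does:

    /* Naive k-way merge: keep one buffered line per sorted run,
       repeatedly emit the smallest head and refill it from its file.
       NRUNS and the file names are hypothetical. */
    #include <stdio.h>
    #include <string.h>

    #define NRUNS 4
    #define LINE_MAX_LEN 1024

    int main(void)
    {
        FILE *in[NRUNS];
        char head[NRUNS][LINE_MAX_LEN];
        int live[NRUNS];

        /* Open each sorted run and buffer its first line. */
        for (int i = 0; i < NRUNS; i++) {
            char name[64];
            snprintf(name, sizeof name, "run-%d.tmp", i);
            in[i] = fopen(name, "r");
            live[i] = in[i] && fgets(head[i], LINE_MAX_LEN, in[i]);
        }

        for (;;) {
            /* Linear scan for the smallest head line. */
            int min = -1;
            for (int i = 0; i < NRUNS; i++)
                if (live[i] && (min < 0 || strcmp(head[i], head[min]) < 0))
                    min = i;
            if (min < 0)
                break;                    /* all runs exhausted */

            fputs(head[min], stdout);     /* emit smallest, then refill */
            live[min] = fgets(head[min], LINE_MAX_LEN, in[min]) != NULL;
        }

        for (int i = 0; i < NRUNS; i++)
            if (in[i])
                fclose(in[i]);
        return 0;
    }

With only one buffered line per run, I'd expect the merge phase to
touch memory far less than the chunk-sorting phase, which is exactly
why the access intensity I observed puzzles me.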
I've been digging through the code, but unfortunately haven't found
the answers there yet.
I would really appreciate your help in understanding how sort is
actually implemented.
Thanks!
Gil Shomron.