One approach is to read the file in K passes.  On the first pass you
collect the smallest 1/Kth of the data (by key) in memory, sort it, and
write it to your output file.  On the next pass you do the same for the
next 1/Kth of the data and append it to the output file.  And so on.
The advantage is you aren't writing any temporary files.  The
disadvantage is you're reading your input file K times.

This works best if your data is uniformly distributed (in which case
you can estimate the key boundaries for each pass up front).  If not,
you'll have to do some extra work during each pass (for instance,
sample the keys above the current range) to pick the next boundary.
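Here's a rough sketch of the idea in Python, assuming one integer per
line, 32-bit values (so the key range is known up front), and a roughly
uniform distribution.  The file names are just placeholders:

    def k_pass_sort(input_path, output_path, k=10, lo=-2**31, hi=2**31):
        # Split the key space [lo, hi) into k equal ranges -- this is
        # where the "uniformly distributed" assumption comes in.
        width = (hi - lo) // k
        bounds = [lo + i * width for i in range(k)] + [hi]

        with open(output_path, "w") as out:
            for i in range(k):
                low, high = bounds[i], bounds[i + 1]
                batch = []
                # Re-read the whole input, keeping only this pass's key
                # range, so only ~1/k of the data is in memory at a time.
                with open(input_path) as f:
                    for line in f:
                        x = int(line)
                        if low <= x < high:
                            batch.append(x)
                batch.sort()
                out.writelines("%d\n" % x for x in batch)

    # k_pass_sort("numbers.txt", "sorted.txt")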

Bob H

P.S.  A 100M file doesn't seem that big these days.  Wouldn't the
standard *nix command line sort be able to handle your file?
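GNU sort already does an external merge sort behind the scenes (it
spills to temporary files when the data doesn't fit in memory), so
assuming one number per line, something like

    sort -n numbers.txt -o sorted.txt

should be enough.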

On Dec 21, 8:14 am, dinesh bansal <bansal...@gmail.com> wrote:
> Suppose I have a big file (~100M) containing integer data. I want to sort
> this file. The problem is I don't want to load the complete file data into
> main memory in one shot. I mean I can read the file in batches and sort the
> batch and save it in another file but cannot store the entire file contents
> in main memory. Can somebody help me with algorithm or pseudo code?
