MapReduce is the best option.

For the word-count example, it's explained here:
http://en.wikipedia.org/wiki/MapReduce

The interesting thing is that the Map step can easily be made parallel.
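
To make that concrete, here is a minimal single-process sketch of the
word-count example in Python (my own illustration of the idea, not code
from the wiki page; all names are made up). The map step emits (word, 1)
pairs and the reduce step sums the counts per word:

from collections import defaultdict

def map_phase(document):
    # Map: emit a (word, 1) pair for every word in the document.
    # Each document (or chunk) is mapped independently, which is why
    # this step parallelizes so easily.
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(word, counts):
    # Reduce: sum all the partial counts for one word.
    return (word, sum(counts))

def word_count(documents):
    # Shuffle: group the mapped pairs by word, then reduce each group.
    grouped = defaultdict(list)
    for doc in documents:
        for word, one in map_phase(doc):
            grouped[word].append(one)
    return [reduce_phase(w, c) for w, c in grouped.items()]

print(word_count(["the cat sat", "the cat ran"]))
# -> [('the', 2), ('cat', 2), ('sat', 1), ('ran', 1)]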

Once again, I request the members of this group to go through the
parallel constructs (parallel sorting, parallel collections, etc.).
It's fine to optimize sequential programs, but with GPUs and
ever-increasing core counts, you should start thinking in terms of
parallelizing your code.
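
And since the original question mentioned heapsort: once the counts
exist, you don't need a full sort to get the top 10. A minimal sketch,
assuming the distinct-word counts fit in memory ("words.txt" is a
hypothetical input file):

import heapq
from collections import Counter

def top_k_words(lines, k=10):
    # Count every word, then keep only the k largest counts.
    # heapq.nlargest maintains a size-k min-heap internally, so this is
    # O(n log k) over n distinct words instead of a full O(n log n) sort.
    counts = Counter(word for line in lines for word in line.split())
    return heapq.nlargest(k, counts.items(), key=lambda item: item[1])

with open("words.txt") as f:  # hypothetical input file
    print(top_k_words(f))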

Kishen

On Fri, Oct 22, 2010 at 9:24 AM, ligerdave <david.c...@gmail.com> wrote:

> for a large file, you would probably want to use an external sort, which
> is similar in spirit to the map-reduce concept. it's actually how the
> sort & uniq style of pipeline works in unix/linux when you try to find
> some "TOP X"
>
> again, we are talking about a case where memory might not hold the
> entire file
>
> On Oct 21, 9:35 am, "Vinay..." <vinumars...@gmail.com> wrote:
> > how do you find the 10 most frequently repeated words in a large file
> > of words, in the most efficient way? if it can also be done using
> > heapsort, please post your answers.
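
To make ligerdave's external-memory point concrete, one common scheme
(a rough sketch of the idea, not a tuned implementation; it assumes no
single partition's distinct words overflow memory, and the file name is
hypothetical) is to hash-partition the words into spill files, count
each partition separately, and merge the per-partition candidates with
a heap:

import heapq
import os
import tempfile
from collections import Counter

def top_k_external(path, k=10, num_partitions=64):
    # Pass 1: hash-partition words into spill files. Every occurrence
    # of a given word lands in the same partition, so partitions can be
    # counted independently later.
    tmpdir = tempfile.mkdtemp()
    parts = [open(os.path.join(tmpdir, str(i)), "w")
             for i in range(num_partitions)]
    with open(path) as f:
        for line in f:
            for word in line.split():
                parts[hash(word) % num_partitions].write(word + "\n")
    for p in parts:
        p.close()
    # Pass 2: count one partition at a time (only that partition's
    # distinct words need to fit in memory) and keep the k best
    # candidates from each; the global top k must be among them.
    best = []
    for i in range(num_partitions):
        with open(os.path.join(tmpdir, str(i))) as p:
            counts = Counter(line.strip() for line in p)
        best.extend(counts.most_common(k))
    return heapq.nlargest(k, best, key=lambda item: item[1])

print(top_k_external("words.txt"))  # hypothetical input file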
