MapReduce is the best option. For word count, it's explained here: http://en.wikipedia.org/wiki/MapReduce
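A minimal single-process sketch of the two phases, assuming the classic word-count example from the article above (function names here are illustrative, not from any MapReduce framework):

```python
from collections import defaultdict

def map_phase(text):
    """Map: emit a (word, 1) pair for every word in the input."""
    return [(word, 1) for word in text.lower().split()]

def reduce_phase(pairs):
    """Reduce: sum the counts for each distinct word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

counts = reduce_phase(map_phase("the cat and the hat"))
print(counts["the"])  # -> 2
```

In a real framework the pairs would be partitioned by key across machines between the two phases, which is why the map step parallelizes so easily.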
The interesting thing is that the Map step can easily be made parallel. Once again I request the members of this group to go through all the parallel constructs (parallel sorting, parallel collections, etc.). It's cool to optimize sequential programs, but with GPUs and ever-increasing core counts, you should start thinking in terms of parallelizing your code.

Kishen

On Fri, Oct 22, 2010 at 9:24 AM, ligerdave <david.c...@gmail.com> wrote:
> For a large file, you would probably want to use an external sort,
> kinda like a map-reduce concept. It's actually how the sort & uniq
> kind of tools work in unix/linux when you try to find some "top X".
>
> Again, we are talking about a file the memory might not hold entirely.
>
> On Oct 21, 9:35 am, "Vinay..." <vinumars...@gmail.com> wrote:
> > How do you find the 10 most repeated words in a large file of words
> > in the most efficient way? If it can also be done using heapsort,
> > please post your answers.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Algorithm Geeks" group.
> To post to this group, send email to algoge...@googlegroups.com.
> To unsubscribe from this group, send email to
> algogeeks+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/algogeeks?hl=en.
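The heap-based approach Vinay asks about can be sketched as follows: count word frequencies in one streaming pass, then select the top 10 with a size-10 min-heap (which is what `heapq.nlargest` maintains internally). This sketch assumes the table of distinct words fits in memory even when the file itself is large; if it doesn't, the counting pass would be replaced by ligerdave's external sort.

```python
import heapq
from collections import Counter

def top_k_words(lines, k=10):
    """Stream lines, count words, and return the k most frequent.

    Counting is O(total words); selecting the top k with a size-k
    min-heap is O(distinct words * log k).
    """
    counts = Counter()
    for line in lines:  # one streaming pass; never holds the whole file
        counts.update(line.lower().split())
    # nlargest keeps a min-heap of k (count, word) candidates
    return heapq.nlargest(k, counts.items(), key=lambda kv: kv[1])

sample = ["to be or not to be", "to see or not to see"]
print(top_k_words(sample, k=3))  # "to" (4 occurrences) comes first
```

For a file, passing the open file object as `lines` keeps memory proportional to the number of distinct words rather than the file size.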