[algogeeks] Re: 10 most repeating words

Chi Sat, 23 Oct 2010 15:41:51 -0700

The biggest weakness of the trie is the amount of space it wastes -
chances are there will be runs of characters with a word only at the
end, meaning a bunch of extra levels of the tree that aren't needed.
The PATRICIA trie, or radix trie, attempts to address this by allowing
the 'decision' at each note to be on something other than the first
character, so each node stores an offset and the branch options (e.g.
if the letter in position 5 is A, go down this branch).


On 23 Okt., 21:13, Chi <c...@linuxdna.com> wrote:
> I've written a kart-trie in php. You can easily extend yourself the
> payload to count the word frequency.
>
> >http://sourceforge.net/projects/kart-trie
>
> After you build your dictionary from your large file, you can easily
> find the highest frequency be recursively search the trie. It should
> be faster then a hash-Table, because the kart-trie is like an
> unbalance binary trie. The only difference between a kart-trie and a
> radix-trie, or a crip-bit-trie, is that it uses a hash-key to identify
> the payload. That makes is suitable for other data as well.
>
> On Oct 21, 3:35 pm, "Vinay..." <vinumars...@gmail.com> wrote:
>
> > how do u find 10 most repeating words on a large file containing words
> > in most efficient way...if it can also be done using heapsort plz post
> > ur answers..

-- 
You received this message because you are subscribed to the Google Groups 
"Algorithm Geeks" group.
To post to this group, send email to algoge...@googlegroups.com.
To unsubscribe from this group, send email to 
algogeeks+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/algogeeks?hl=en.

[algogeeks] Re: 10 most repeating words

Reply via email to