We can still improve on this trie idea.

Say we have URLs like:
www.google.com
www.goodbye.com
www.google.com/transliterate
www.goodstrain.com/good

All of these can share a single subtree for the common prefix "www.goo".
More generally, we can store each character as a node in a trie and treat
the whole structure as a "URL dictionary".
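As a rough sketch of that character-level trie (in Python; the class names and the exact-match lookup interface are my assumptions, not anything from the thread):

```python
# Minimal character-level trie ("URL dictionary") sketch: each character is
# a node, so URLs sharing a prefix such as "www.goo" store it only once.
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_url = False  # True if a complete URL ends at this node

class UrlTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, url):
        node = self.root
        for ch in url:
            node = node.children.setdefault(ch, TrieNode())
        node.is_url = True

    def contains(self, url):
        node = self.root
        for ch in url:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_url

    def node_count(self):
        # One node per distinct prefix; excludes the root.
        def count(node):
            return 1 + sum(count(c) for c in node.children.values())
        return count(self.root) - 1

urls = ["www.google.com",
        "www.goodbye.com",
        "www.google.com/transliterate",
        "www.goodstrain.com/good"]
t = UrlTrie()
for u in urls:
    t.insert(u)

t.contains("www.google.com")   # True
t.contains("www.goo")          # False: a shared prefix, not a stored URL
# Because of prefix sharing, node_count() is well below the total
# character count across all four URLs.
```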


On Wed, May 16, 2012 at 5:43 PM, omega9 <tvssarma.ome...@gmail.com> wrote:
>
>
> On May 16, 10:33 am, atul anand <atul.87fri...@gmail.com> wrote:
>> @amit :
>>
>> here is the reason :-
>>
>> take any URL, say http://www.geeksforgeeks.org
>>
>> you will hash the following URLs:
>>
>> http://www.geeksforgeeks.org
>> http://www.geeksforgeeks.org/archives
>> http://www.geeksforgeeks.org/archives/19248
>> http://www.geeksforgeeks.org/archives/1111
>> http://www.geeksforgeeks.org/archives/19221
>> http://www.geeksforgeeks.org/archives/19290
>> http://www.geeksforgeeks.org/archives/1876
>> http://www.geeksforgeeks.org/archives/1763
>>
>> "http://www.geeksforgeeks.org" is the redundant part in each URL; it
>> would take unnecessary memory to store every URL in full.
>>
>> OK, now say the file has 20 million URLs. What would you do then?
>>
>
> I think the trie suggestion was good. Have each domain (with the
> protocol part) as a node and then have the subsequent directory
> locations as a hierarchy under it.
>
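The segment-level variant suggested above could be sketched like this (again Python; `SegmentTrie` and the choice to split on path components are my own illustration of the idea, keeping the protocol+domain as a single node):

```python
# Sketch of a segment-level trie: the protocol+domain is one node, and each
# path component hangs under it, so "http://www.geeksforgeeks.org" is stored
# once no matter how many /archives/... URLs fall beneath it.
from urllib.parse import urlsplit

class SegmentTrie:
    END = ""  # sentinel key marking that a complete URL ends at a node

    def __init__(self):
        self.root = {}  # nested dicts: segment -> child dict

    def _segments(self, url):
        parts = urlsplit(url)
        base = parts.scheme + "://" + parts.netloc
        return [base] + [s for s in parts.path.split("/") if s]

    def insert(self, url):
        node = self.root
        for seg in self._segments(url):
            node = node.setdefault(seg, {})
        node[self.END] = True

    def contains(self, url):
        node = self.root
        for seg in self._segments(url):
            if seg not in node:
                return False
            node = node[seg]
        return self.END in node

st = SegmentTrie()
for u in ["http://www.geeksforgeeks.org",
          "http://www.geeksforgeeks.org/archives",
          "http://www.geeksforgeeks.org/archives/19248"]:
    st.insert(u)

st.contains("http://www.geeksforgeeks.org/archives")  # True
```

Compared with the character-level trie, this trades a little prefix sharing inside segments for far fewer, fatter nodes, which usually wins for URL sets dominated by a few domains.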
>

-- 
You received this message because you are subscribed to the Google Groups 
"Algorithm Geeks" group.
To post to this group, send email to algogeeks@googlegroups.com.
To unsubscribe from this group, send email to 
algogeeks+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/algogeeks?hl=en.
