On May 16, 10:33 am, atul anand <atul.87fri...@gmail.com> wrote:
> @amit :
>
> here is the reason :-
>
> each url, say http://www.geeksforgeeks.org
>
> you will hash the following urls:
> http://www.geeksforgeeks.org
> http://www.geeksforgeeks.org/archives
> http://www.geeksforgeeks.org/archives/19248
> http://www.geeksforgeeks.org/archives/1111
> http://www.geeksforgeeks.org/archives/19221
> http://www.geeksforgeeks.org/archives/19290
> http://www.geeksforgeeks.org/archives/1876
> http://www.geeksforgeeks.org/archives/1763
>
> "http://www.geeksforgeeks.org"; is the redundant part in each url ..... it
> would unnecessary m/m to save all URLs.
>
> ok, now say the file has 20 million urls ..... now what would you do??
>

I think the trie suggestion was good. Have each domain (with the
protocol part) as a node and then have the subsequent directory
locations as a hierarchy under it.
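To make that concrete, here is a minimal Python sketch of the idea, assuming each URL is split into its scheme+host part (stored once as the first trie level) and its path segments; the names UrlTrie, insert and contains are just illustrative, not from the thread.

    # Minimal sketch: a trie keyed on URL components, so the shared
    # "http://www.geeksforgeeks.org" prefix is stored only once.
    from urllib.parse import urlparse

    class UrlTrie:
        def __init__(self):
            self.children = {}   # component string -> child UrlTrie node
            self.is_url = False  # True if a stored URL ends at this node

        @staticmethod
        def _components(url):
            # First component is scheme+host, then one component per path segment.
            parsed = urlparse(url)
            head = parsed.scheme + "://" + parsed.netloc
            parts = [p for p in parsed.path.split("/") if p]
            return [head] + parts

        def insert(self, url):
            node = self
            for comp in self._components(url):
                node = node.children.setdefault(comp, UrlTrie())
            node.is_url = True

        def contains(self, url):
            node = self
            for comp in self._components(url):
                node = node.children.get(comp)
                if node is None:
                    return False
            return node.is_url

    # Usage:
    trie = UrlTrie()
    trie.insert("http://www.geeksforgeeks.org/archives/19248")
    trie.insert("http://www.geeksforgeeks.org/archives/1111")
    print(trie.contains("http://www.geeksforgeeks.org/archives/19248"))  # True
    print(trie.contains("http://www.geeksforgeeks.org/archives/9999"))   # False

Insertion and lookup walk one node per component, and every URL under the same domain reuses the same root child, which avoids repeating the redundant prefix that the hash-every-full-URL approach would store 20 million times.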
