On May 16, 10:33 am, atul anand <atul.87fri...@gmail.com> wrote:
> @amit :
>
> here is the reason: for each site, say http://www.geeksforgeeks.org, you will hash the following URLs:
>
> http://www.geeksforgeeks.org
> http://www.geeksforgeeks.org/archives
> http://www.geeksforgeeks.org/archives/19248
> http://www.geeksforgeeks.org/archives/1111
> http://www.geeksforgeeks.org/archives/19221
> http://www.geeksforgeeks.org/archives/19290
> http://www.geeksforgeeks.org/archives/1876
> http://www.geeksforgeeks.org/archives/1763
>
> "http://www.geeksforgeeks.org" is the redundant part in each URL, so it would take unnecessary memory to store every URL in full.
>
> OK, now say the file has 20 million URLs. What would you do then?
I think the trie suggestion was good. Have each domain (with the protocol part) as a node, and then have the subsequent directory locations as a hierarchy under it.

--
You received this message because you are subscribed to the Google Groups "Algorithm Geeks" group.
To post to this group, send email to algogeeks@googlegroups.com.
To unsubscribe from this group, send email to algogeeks+unsubscr...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en.
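The trie idea above can be sketched roughly like this (a minimal Python sketch; the class and method names are my own, not from the thread). The root keys on "protocol://domain" and each path segment becomes one child node, so the shared prefix of millions of URLs from the same site is stored only once instead of being repeated per URL:

```python
from urllib.parse import urlparse

class UrlTrieNode:
    def __init__(self):
        self.children = {}       # path segment -> child node
        self.is_url_end = False  # True if a stored URL ends at this node

class UrlTrie:
    """Trie keyed on scheme://domain at the root, then one node per
    directory/path segment under it."""

    def __init__(self):
        self.root = UrlTrieNode()

    def _segments(self, url):
        parts = urlparse(url)
        # First key is the protocol + domain, e.g. "http://www.geeksforgeeks.org"
        yield parts.scheme + "://" + parts.netloc
        for seg in parts.path.split("/"):
            if seg:
                yield seg

    def insert(self, url):
        node = self.root
        for seg in self._segments(url):
            node = node.children.setdefault(seg, UrlTrieNode())
        node.is_url_end = True

    def contains(self, url):
        node = self.root
        for seg in self._segments(url):
            node = node.children.get(seg)
            if node is None:
                return False
        return node.is_url_end

trie = UrlTrie()
for u in ["http://www.geeksforgeeks.org",
          "http://www.geeksforgeeks.org/archives/19248",
          "http://www.geeksforgeeks.org/archives/1111"]:
    trie.insert(u)

print(trie.contains("http://www.geeksforgeeks.org/archives/19248"))  # True
print(trie.contains("http://www.geeksforgeeks.org/archives/9999"))   # False
print(trie.contains("http://www.geeksforgeeks.org/archives"))        # False: only a prefix of stored URLs
```

Note the `is_url_end` flag: a node may exist purely as a shared prefix (like "/archives" above) without itself being one of the stored URLs.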