We can still improve this trie idea. Say we have urls like www.google.com, www.goodbye.com, www.google.com/transliterate, www.goodstrain.com/good:
we can subdivide everything under "www.goo". I mean, we can store each character as a node in a trie and call it a "URL dictionary".

On Wed, May 16, 2012 at 5:43 PM, omega9 <tvssarma.ome...@gmail.com> wrote:
>
> On May 16, 10:33 am, atul anand <atul.87fri...@gmail.com> wrote:
>> @amit :
>>
>> here is the reason :-
>>
>> for each url, say http://www.geeksforgeeks.org, you will hash the
>> following urls:
>> http://www.geeksforgeeks.org
>> http://www.geeksforgeeks.org/archives
>> http://www.geeksforgeeks.org/archives/19248
>> http://www.geeksforgeeks.org/archives/1111
>> http://www.geeksforgeeks.org/archives/19221
>> http://www.geeksforgeeks.org/archives/19290
>> http://www.geeksforgeeks.org/archives/1876
>> http://www.geeksforgeeks.org/archives/1763
>>
>> "http://www.geeksforgeeks.org" is the redundant part in each url; it
>> would take unnecessary memory to save all the URLs in full.
>>
>> OK, now say the file has 20 million urls. Now what would you do?
>>
>
> I think the trie suggestion was good. Have each domain (with the
> protocol part) as a node and then have the subsequent directory
> locations as a hierarchy under it.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Algorithm Geeks" group.
> To post to this group, send email to algogeeks@googlegroups.com.
> To unsubscribe from this group, send email to
> algogeeks+unsubscr...@googlegroups.com.
> For more options, visit this group at
> http://groups.google.com/group/algogeeks?hl=en.
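Here is a minimal sketch of the character-level trie ("URL dictionary") described above, using the example urls from this thread. The class and method names are my own illustrations, not anything from the thread; the node count at the end just shows how much the shared "www.goo..." prefix saves compared with storing every URL as a full string.

```python
class TrieNode:
    """One node per character of a stored URL."""
    def __init__(self):
        self.children = {}        # char -> TrieNode
        self.is_url_end = False   # marks the end of a complete URL

class UrlTrie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, url):
        node = self.root
        for ch in url:
            node = node.children.setdefault(ch, TrieNode())
        node.is_url_end = True

    def contains(self, url):
        node = self.root
        for ch in url:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_url_end

    def node_count(self):
        # Total character nodes (excluding the root); shared prefixes
        # are counted once, which is where the memory saving comes from.
        def count(node):
            return 1 + sum(count(c) for c in node.children.values())
        return count(self.root) - 1

urls = [
    "www.google.com",
    "www.goodbye.com",
    "www.google.com/transliterate",
    "www.goodstrain.com/good",
]
trie = UrlTrie()
for u in urls:
    trie.insert(u)

raw_chars = sum(len(u) for u in urls)
print(trie.node_count(), "trie nodes vs", raw_chars, "raw characters")
```

For these four urls the trie needs 51 character nodes instead of 80 raw characters, and the saving grows as more urls share the same domain prefix (the case the quoted reply makes with 20 million geeksforgeeks.org urls).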