For cases where storing the values is the only concern (and not retrieval efficiency), I would suggest something called DFA subset minimization. Google for it. Once you have the final subsets, you can use something called a DAWG (directed acyclic word graph) for the most compact solution.
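To make the idea a bit more concrete, here is a rough sketch in Python (my own illustration, not a tuned implementation). It builds the character trie discussed earlier in this thread and then merges identical subtrees bottom-up, which is one simple way to get a DAWG from a finite set of strings. The names Node, build_trie, minimize, count_nodes and contains are made up for this example, and the recursion in minimize assumes the URLs are short enough for Python's default recursion limit.

```python
class Node:
    def __init__(self):
        self.children = {}   # char -> Node
        self.final = False   # True if a stored URL ends at this node


def build_trie(urls):
    """Plain character trie: shares common prefixes only."""
    root = Node()
    for url in urls:
        node = root
        for ch in url:
            node = node.children.setdefault(ch, Node())
        node.final = True
    return root


def minimize(root):
    """Merge identical subtrees bottom-up so common suffixes are shared too.

    Two nodes are treated as equivalent when they agree on the final flag
    and have the same outgoing characters leading to the same (already
    canonical) children. The trie is modified in place.
    """
    registry = {}  # signature -> canonical node

    def canon(node):
        for ch in list(node.children):
            node.children[ch] = canon(node.children[ch])
        sig = (node.final,
               tuple(sorted((ch, id(c)) for ch, c in node.children.items())))
        return registry.setdefault(sig, node)

    return canon(root)


def count_nodes(root):
    """Count distinct nodes reachable from root."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if id(node) in seen:
            continue
        seen.add(id(node))
        stack.extend(node.children.values())
    return len(seen)


def contains(root, url):
    """Membership test; works the same on the trie and on the DAWG."""
    node = root
    for ch in url:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.final


urls = [
    "www.google.com",
    "www.goodbye.com",
    "www.google.com/transliterate",
    "www.goodstrain.com/good",
]
trie = build_trie(urls)
print("trie nodes:", count_nodes(trie))
dawg = minimize(trie)
print("DAWG nodes:", count_nodes(dawg))
print(contains(dawg, "www.goodbye.com"))
```

The saving on these four URLs is small because they share few suffixes. With millions of URLs ending in the same ".com", "/index.html" and so on, I would expect the merged structure to be a lot smaller than the plain trie, but that needs measuring on real data.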
On Thu, May 17, 2012 at 2:58 PM, Prakash D <cegprak...@gmail.com> wrote:
> We can still improve this trie idea..
>
> say we have urls like
> www.google.com
> www.goodbye.com
> www.google.com/transliterate
> www.goodstrain.com/good
>
> we can subdivide everything under "www.goo"
> I mean we can store each character as a node in a trie and call it
> like a "URL dictionary"
>
> On Wed, May 16, 2012 at 5:43 PM, omega9 <tvssarma.ome...@gmail.com> wrote:
> >
> > On May 16, 10:33 am, atul anand <atul.87fri...@gmail.com> wrote:
> >> @amit :
> >>
> >> here is the reason :-
> >>
> >> each url say http://www.geeksforgeeks.org
> >>
> >> you will hash following urls
> >> http://www.geeksforgeeks.org
> >> http://www.geeksforgeeks.org/archives
> >> http://www.geeksforgeeks.org/archives/19248
> >> http://www.geeksforgeeks.org/archives/1111
> >> http://www.geeksforgeeks.org/archives/19221
> >> http://www.geeksforgeeks.org/archives/19290
> >> http://www.geeksforgeeks.org/archives/1876
> >> http://www.geeksforgeeks.org/archives/1763
> >>
> >> "http://www.geeksforgeeks.org" is the redundant part in each url .....
> >> it would unnecessary m/m to save all URLs.
> >>
> >> ok now say file have 20 million urls ..... .....now what would you do.??
> >>
> >
> > I think the trie suggestion was good. Have each domain (with the
> > protocol part) as a node and then have the subsequent directory
> > locations as a hierarchy under it.
> >
> > --
> > You received this message because you are subscribed to the Google Groups "Algorithm Geeks" group.
> > To post to this group, send email to algogeeks@googlegroups.com.
> > To unsubscribe from this group, send email to algogeeks+unsubscr...@googlegroups.com.
> > For more options, visit this group at http://groups.google.com/group/algogeeks?hl=en.