Re: [algogeeks] Re: storing URL's

2012-05-20 Thread atul anand
The question is more like asking which data structure is suitable for
implementing DNS-server-like functionality.




Re: [algogeeks] Re: storing URL's

2012-05-19 Thread Ramindar Singh
@Rahul

A rope data structure won't be a good idea:

Operation     Rope      String (array)
index         O(log n)  O(1)
split         O(log n)  O(1)
concatenate   O(log n)  O(n)
insert        O(log n)  O(n)
delete        O(log n)  O(n)
report        O(log n)  O(1)
build         O(n)      O(n)
The question says the "data structure used should store and fetch data in
efficient manner", but we would never be retrieving a partial URL (a part
of the URL), so the rope's fast index and split operations buy us nothing
here.




[algogeeks] Re: storing URL's

2012-05-19 Thread Gene
This question has no answer. Every good student of computer science
will know that you choose a data structure based on the _operations_
that must be performed on it: insert, lookup (and what flavors of
lookup), delete, etc. So if an interviewer uses this question, he or
she is probably trying to get you to discuss this. So the right
_response_ (not an answer) is "What will you be _doing_ with these
URLs?"

An example: suppose you take Varun's approach and build a tree. Then
it turns out the operation is "count the URLs for .png files". Well,
the tree is no help here. You have to search the whole thing.
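
A minimal sketch of this point, with hypothetical sample data (none of it
from the thread): a ".png count" has to touch every stored URL anyway, so
a prefix-organized tree gives no advantage over a flat O(n) scan.

    # Hypothetical sample data; the operation forces a full pass.
    urls = [
        "http://example.com/logo.png",
        "http://example.com/index.html",
        "http://example.com/img/icon.png",
    ]

    png_count = sum(1 for u in urls if u.endswith(".png"))
    print(png_count)  # 2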

On May 15, 11:50 am, atul anand atul.87fri...@gmail.com wrote:
 Given a file which contains millions of URLs, which data structure would
 you use for storing these URLs? The data structure used should store and
 fetch data in an efficient manner.




[algogeeks] Re: storing URL's

2012-05-19 Thread Varun
Agreed, Gene. The answer depends on context.




Re: [algogeeks] Re: storing URL's

2012-05-18 Thread Ashish Goel
A Tiger tree hash (a Merkle hash tree built with the Tiger hash function).

Best Regards
Ashish Goel
Think positive and find fuel in failure
+919985813081
+919966006652





Re: [algogeeks] Re: storing URL's

2012-05-18 Thread rahul ranjan
A rope data structure could be good in such cases. Hashing may not be too
efficient, as many URLs would be almost the same, as mentioned by Prakash.
A trie is another option, but I believe the overhead in a trie will be
higher. Correct me if I am wrong. A rough estimate of that overhead is
sketched below.
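
A rough sketch of the overhead in question, with entirely hypothetical
numbers (only the 20-million figure appears in the thread): a naive trie
node with a 256-way child array costs kilobytes per node.

    # Hypothetical sizes: naive 256-way child array per trie node.
    pointer_size = 8                    # bytes per child pointer on 64-bit
    fanout = 256                        # one slot per possible byte value
    per_node = fanout * pointer_size    # 2048 bytes per node

    avg_url_len = 50                    # assumed average URL length
    nodes = 20_000_000 * avg_url_len    # worst case: no prefix sharing
    print(f"{per_node} B/node, ~{nodes * per_node / 2**40:.1f} TiB worst case")

Using dict-based children or a Patricia tree shrinks this considerably,
which is the trade-off being debated here.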




Re: [algogeeks] Re: storing URL's

2012-05-17 Thread Prakash D
We can still improve this trie idea.

Say we have URLs like:
www.google.com
www.goodbye.com
www.google.com/transliterate
www.goodstrain.com/good

We can subdivide everything under "www.goo": store each character as a
node in a trie and use it like a URL dictionary.
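
A minimal character-level trie sketch of this idea (an assumed
implementation, not code from the thread): shared prefixes such as
"www.goo" are stored once, and an end-of-URL flag marks complete entries.

    class TrieNode:
        def __init__(self):
            self.children = {}   # char -> TrieNode
            self.is_url = False  # True if a stored URL ends at this node

    def insert(root, url):
        node = root
        for ch in url:
            node = node.children.setdefault(ch, TrieNode())
        node.is_url = True

    def contains(root, url):
        node = root
        for ch in url:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_url

    root = TrieNode()
    for u in ["www.google.com", "www.goodbye.com",
              "www.google.com/transliterate", "www.goodstrain.com/good"]:
        insert(root, u)
    print(contains(root, "www.google.com"))  # True
    print(contains(root, "www.goo"))         # False (prefix, not a stored URL)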





Re: [algogeeks] Re: storing URL's

2012-05-17 Thread Prem Krishna Chettri
For the cases where storing the values is the only concern (and not
retrieval efficiency), I would suggest something called DFA subset
minimization; Google for it. After the final subset, as said, you can use
something called a DAWG (directed acyclic word graph) for the most optimal
solution.
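
A minimal sketch of the trie-to-DAWG step (an assumed illustration, not
code from the thread): build a trie, then merge equivalent subtrees
bottom-up, which is the acyclic special case of DFA minimization.

    class Node:
        def __init__(self):
            self.children = {}  # char -> Node
            self.final = False  # True if a stored URL ends here

    def insert(root, word):
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.final = True

    def minimize(node, registry):
        # Canonicalize children first, then intern this node by signature:
        # two nodes with the same final flag and the same (char, child)
        # edges are the same state and can be shared.
        for ch in node.children:
            node.children[ch] = minimize(node.children[ch], registry)
        sig = (node.final,
               tuple(sorted((ch, id(c)) for ch, c in node.children.items())))
        return registry.setdefault(sig, node)

    root = Node()
    for u in ["http://a.com/x.png", "http://b.com/y.png"]:
        insert(root, u)
    root = minimize(root, {})  # the common ".png" suffix is now stored once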




[algogeeks] Re: storing URL's

2012-05-16 Thread omega9



I think the trie suggestion was good. Have each domain (with the protocol
part) as a node, and then have the subsequent directory locations as a
hierarchy under it.
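
A minimal sketch of this component-level layout (an assumed
implementation, not code from the thread): the first level is keyed by
scheme plus domain, and each path segment nests one level deeper, so the
redundant prefix is stored exactly once.

    from urllib.parse import urlsplit

    def insert(tree, url):
        parts = urlsplit(url)
        node = tree.setdefault(parts.scheme + "://" + parts.netloc, {})
        for segment in filter(None, parts.path.split("/")):
            node = node.setdefault(segment, {})
        node["$"] = True  # "$" marks a complete stored URL

    tree = {}
    insert(tree, "http://www.geeksforgeeks.org/archives/19248")
    insert(tree, "http://www.geeksforgeeks.org/archives/19221")
    # tree == {'http://www.geeksforgeeks.org':
    #     {'archives': {'19248': {'$': True}, '19221': {'$': True}}}}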




[algogeeks] Re: storing URL's

2012-05-15 Thread Varun
It should be a tree based on the domain in the URL and the directories
mentioned in the URL.

On Tuesday, 15 May 2012 21:20:55 UTC+5:30, atul007 wrote:

 Given a file which contains millions of URLs, which data structure would
 you use for storing these URLs? The data structure used should store and
 fetch data in an efficient manner.






Re: [algogeeks] Re: storing URL's

2012-05-15 Thread atul anand
I was thinking about using a trie or a Patricia tree. Hashing is another
option, but it won't work well if the URLs run into the millions.
Is there any better data structure?
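
A minimal Patricia (radix) tree sketch (an assumed illustration, not code
from the thread): edges carry multi-character labels, so a long shared
run like "www.goo" costs one node instead of one node per character.

    import os  # os.path.commonprefix also works on plain strings

    class RadixNode:
        def __init__(self):
            self.edges = {}      # edge label -> RadixNode
            self.is_url = False

    def insert(node, s):
        if not s:                # consumed the whole URL: mark it stored
            node.is_url = True
            return
        for label in list(node.edges):
            common = os.path.commonprefix([label, s])
            if not common:
                continue
            if common == label:  # whole edge matches: descend with the rest
                insert(node.edges[label], s[len(common):])
            else:                # partial match: split the edge at `common`
                mid = RadixNode()
                mid.edges[label[len(common):]] = node.edges.pop(label)
                node.edges[common] = mid
                insert(mid, s[len(common):])
            return
        leaf = RadixNode()       # no edge shares a prefix: add a new one
        leaf.is_url = True
        node.edges[s] = leaf

    root = RadixNode()
    for u in ["www.google.com", "www.goodbye.com", "www.google.com/mail"]:
        insert(root, u)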




Re: [algogeeks] Re: storing URL's

2012-05-15 Thread Amit Mittal
Why won't hashing work for millions of URLs?
If you hash each URL into a distinct 32-bit integer, you can map 2^32
URLs, which is around 4 billion. It should work.
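
A minimal sketch of this suggestion (an assumed implementation, not code
from the thread), using CRC-32 as an example 32-bit hash. One caveat: a
generic 32-bit hash is not guaranteed to be distinct per URL, so a real
table still needs collision handling or the full URL as a check.

    import zlib

    def hash32(url: str) -> int:
        return zlib.crc32(url.encode("utf-8"))  # unsigned 32-bit value

    seen = set()
    for url in ["http://www.geeksforgeeks.org/archives/19248",
                "http://www.geeksforgeeks.org/archives/19221"]:
        seen.add(hash32(url))
    print(len(seen))  # 2 distinct 32-bit keys for these two URLs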





-- 
Regards
Amit Mittal




Re: [algogeeks] Re: storing URL's

2012-05-15 Thread atul anand
@amit: here is the reason.

Take URLs under, say, http://www.geeksforgeeks.org. You will hash the
following URLs:
http://www.geeksforgeeks.org
http://www.geeksforgeeks.org/archives
http://www.geeksforgeeks.org/archives/19248
http://www.geeksforgeeks.org/archives/
http://www.geeksforgeeks.org/archives/19221
http://www.geeksforgeeks.org/archives/19290
http://www.geeksforgeeks.org/archives/1876
http://www.geeksforgeeks.org/archives/1763

"http://www.geeksforgeeks.org" is the redundant part in each URL; it
would take unnecessary memory to save all the URLs in full.

OK, now say the file has 20 million URLs. Now what would you do?
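
A back-of-the-envelope sketch of that redundancy (hypothetical numbers;
only the 20-million figure comes from the message above):

    n_urls = 20_000_000                             # from the message above
    prefix = len("http://www.geeksforgeeks.org/")   # 29 shared bytes per URL

    wasted = n_urls * prefix                        # prefix stored n times
    print(f"~{wasted / 2**20:.0f} MiB of repeated prefix bytes")  # ~553 MiB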


