Re: [algogeeks] Re: storing URL's

2012-05-20 Thread atul anand
The question is more like asking which data structure is suitable for
implementing DNS-server-like functionality.




Re: [algogeeks] Re: storing URL's

2012-05-19 Thread Ramindar Singh
@Rahul

A rope data structure won't be a good idea:

Operation     Rope      String (array)
index         O(log n)  O(1)
split         O(log n)  O(1)
concatenate   O(log n)  O(n)
insert        O(log n)  O(n)
delete        O(log n)  O(n)
report        O(log n)  O(1)
build         O(n)      O(n)
The question says the "data structure used should store and fetch data in
efficient manner", but we would never be retrieving a partial URL (a part
of the URL), so the rope's fast index and split operations buy us nothing
here.




[algogeeks] Re: storing URL's

2012-05-19 Thread Gene
This question has no answer. Every good student of computer science
will know that you choose a data structure based on the _operations_
that must be performed on it: insert, lookup (and what flavors of
lookup), delete, etc. So if an interviewer uses this question, he or
she is probably trying to get you to discuss this. So the right
_response_ (not an answer) is "What will you be _doing_ with these
URLs?"

An example: suppose you take Varun's approach and build a tree. Then
it turns out the operation is "count the URLs for .png files". Well,
the tree is no help here. You have to search the whole thing.
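
A minimal sketch of this point, with hypothetical sample data (none of it
from the thread): a ".png count" has to touch every stored URL anyway, so
a prefix-organized tree gives no advantage over a flat O(n) scan.

    # Hypothetical sample data; the operation forces a full pass.
    urls = [
        "http://example.com/logo.png",
        "http://example.com/index.html",
        "http://example.com/img/icon.png",
    ]

    png_count = sum(1 for u in urls if u.endswith(".png"))
    print(png_count)  # 2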

On May 15, 11:50 am, atul anand atul.87fri...@gmail.com wrote:
 Given a file which contains millions of URLs, which data structure would
 you use for storing these URLs? The data structure used should store and
 fetch data in an efficient manner.




[algogeeks] Re: storing URL's

2012-05-19 Thread Varun
Agreed, Gene. The answer depends on context.




Re: [algogeeks] Re: storing URL's

2012-05-18 Thread Ashish Goel
A Tiger tree hash (a Merkle hash tree built with the Tiger hash function).

Best Regards
Ashish Goel
Think positive and find fuel in failure
+919985813081
+919966006652





Re: [algogeeks] Re: storing URL's

2012-05-18 Thread rahul ranjan
A rope data structure could be good in such cases. Hashing may not be too
efficient, as many URLs would be almost the same, as mentioned by Prakash.
A trie is another option, but I believe the overhead in a trie will be
higher. Correct me if I am wrong. A rough estimate of that overhead is
sketched below.
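
A rough sketch of the overhead in question, with entirely hypothetical
numbers (only the 20-million figure appears in the thread): a naive trie
node with a 256-way child array costs kilobytes per node.

    # Hypothetical sizes: naive 256-way child array per trie node.
    pointer_size = 8                    # bytes per child pointer on 64-bit
    fanout = 256                        # one slot per possible byte value
    per_node = fanout * pointer_size    # 2048 bytes per node

    avg_url_len = 50                    # assumed average URL length
    nodes = 20_000_000 * avg_url_len    # worst case: no prefix sharing
    print(f"{per_node} B/node, ~{nodes * per_node / 2**40:.1f} TiB worst case")

Using dict-based children or a Patricia tree shrinks this considerably,
which is the trade-off being debated here.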




Re: [algogeeks] Re: storing URL's

2012-05-17 Thread Prakash D
We can still improve this trie idea.

Say we have URLs like:
www.google.com
www.goodbye.com
www.google.com/transliterate
www.goodstrain.com/good

We can subdivide everything under "www.goo": store each character as a
node in a trie and use it like a URL dictionary.
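
A minimal character-level trie sketch of this idea (an assumed
implementation, not code from the thread): shared prefixes such as
"www.goo" are stored once, and an end-of-URL flag marks complete entries.

    class TrieNode:
        def __init__(self):
            self.children = {}   # char -> TrieNode
            self.is_url = False  # True if a stored URL ends at this node

    def insert(root, url):
        node = root
        for ch in url:
            node = node.children.setdefault(ch, TrieNode())
        node.is_url = True

    def contains(root, url):
        node = root
        for ch in url:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_url

    root = TrieNode()
    for u in ["www.google.com", "www.goodbye.com",
              "www.google.com/transliterate", "www.goodstrain.com/good"]:
        insert(root, u)
    print(contains(root, "www.google.com"))  # True
    print(contains(root, "www.goo"))         # False (prefix, not a stored URL)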





Re: [algogeeks] Re: storing URL's

2012-05-17 Thread Prem Krishna Chettri
For the cases where storing the values is the only concern (and not
retrieval efficiency), I would suggest something called DFA subset
minimization; Google for it. After the final subset, as said, you can use
something called a DAWG (directed acyclic word graph) for the most optimal
solution.
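
A minimal sketch of the trie-to-DAWG step (an assumed illustration, not
code from the thread): build a trie, then merge equivalent subtrees
bottom-up, which is the acyclic special case of DFA minimization.

    class Node:
        def __init__(self):
            self.children = {}  # char -> Node
            self.final = False  # True if a stored URL ends here

    def insert(root, word):
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.final = True

    def minimize(node, registry):
        # Canonicalize children first, then intern this node by signature:
        # two nodes with the same final flag and the same (char, child)
        # edges are the same state and can be shared.
        for ch in node.children:
            node.children[ch] = minimize(node.children[ch], registry)
        sig = (node.final,
               tuple(sorted((ch, id(c)) for ch, c in node.children.items())))
        return registry.setdefault(sig, node)

    root = Node()
    for u in ["http://a.com/x.png", "http://b.com/y.png"]:
        insert(root, u)
    root = minimize(root, {})  # the common ".png" suffix is now stored once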




[algogeeks] Re: storing URL's

2012-05-16 Thread omega9



I think the trie suggestion was good. Have each domain (with the protocol
part) as a node, and then have the subsequent directory locations as a
hierarchy under it.
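
A minimal sketch of this component-level layout (an assumed
implementation, not code from the thread): the first level is keyed by
scheme plus domain, and each path segment nests one level deeper, so the
redundant prefix is stored exactly once.

    from urllib.parse import urlsplit

    def insert(tree, url):
        parts = urlsplit(url)
        node = tree.setdefault(parts.scheme + "://" + parts.netloc, {})
        for segment in filter(None, parts.path.split("/")):
            node = node.setdefault(segment, {})
        node["$"] = True  # "$" marks a complete stored URL

    tree = {}
    insert(tree, "http://www.geeksforgeeks.org/archives/19248")
    insert(tree, "http://www.geeksforgeeks.org/archives/19221")
    # tree == {'http://www.geeksforgeeks.org':
    #     {'archives': {'19248': {'$': True}, '19221': {'$': True}}}}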




[algogeeks] Re: storing URL's

2012-05-15 Thread Varun
It should be a tree based on the domain in the URL and the directories
mentioned in the URL.

On Tuesday, 15 May 2012 21:20:55 UTC+5:30, atul007 wrote:

 Given a file which contains millions of URLs, which data structure would
 you use for storing these URLs? The data structure used should store and
 fetch data in an efficient manner.






Re: [algogeeks] Re: storing URL's

2012-05-15 Thread atul anand
I was thinking about using a trie or a Patricia tree. Hashing is another
option, but it won't work well if the URLs run into the millions.
Is there any better data structure?
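
A minimal Patricia (radix) tree sketch (an assumed illustration, not code
from the thread): edges carry multi-character labels, so a long shared
run like "www.goo" costs one node instead of one node per character.

    import os  # os.path.commonprefix also works on plain strings

    class RadixNode:
        def __init__(self):
            self.edges = {}      # edge label -> RadixNode
            self.is_url = False

    def insert(node, s):
        if not s:                # consumed the whole URL: mark it stored
            node.is_url = True
            return
        for label in list(node.edges):
            common = os.path.commonprefix([label, s])
            if not common:
                continue
            if common == label:  # whole edge matches: descend with the rest
                insert(node.edges[label], s[len(common):])
            else:                # partial match: split the edge at `common`
                mid = RadixNode()
                mid.edges[label[len(common):]] = node.edges.pop(label)
                node.edges[common] = mid
                insert(mid, s[len(common):])
            return
        leaf = RadixNode()       # no edge shares a prefix: add a new one
        leaf.is_url = True
        node.edges[s] = leaf

    root = RadixNode()
    for u in ["www.google.com", "www.goodbye.com", "www.google.com/mail"]:
        insert(root, u)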




Re: [algogeeks] Re: storing URL's

2012-05-15 Thread Amit Mittal
Why won't hashing work for millions of URLs?
If you hash each URL into a distinct 32-bit integer, you can map 2^32
URLs, which is around 4 billion. It should work.
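
A minimal sketch of this suggestion (an assumed implementation, not code
from the thread), using CRC-32 as an example 32-bit hash. One caveat: a
generic 32-bit hash is not guaranteed to be distinct per URL, so a real
table still needs collision handling or the full URL as a check.

    import zlib

    def hash32(url: str) -> int:
        return zlib.crc32(url.encode("utf-8"))  # unsigned 32-bit value

    seen = set()
    for url in ["http://www.geeksforgeeks.org/archives/19248",
                "http://www.geeksforgeeks.org/archives/19221"]:
        seen.add(hash32(url))
    print(len(seen))  # 2 distinct 32-bit keys for these two URLs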





-- 
Regards
Amit Mittal




Re: [algogeeks] Re: storing URL's

2012-05-15 Thread atul anand
@amit: here is the reason.

Take URLs under, say, http://www.geeksforgeeks.org. You will hash the
following URLs:
http://www.geeksforgeeks.org
http://www.geeksforgeeks.org/archives
http://www.geeksforgeeks.org/archives/19248
http://www.geeksforgeeks.org/archives/
http://www.geeksforgeeks.org/archives/19221
http://www.geeksforgeeks.org/archives/19290
http://www.geeksforgeeks.org/archives/1876
http://www.geeksforgeeks.org/archives/1763

"http://www.geeksforgeeks.org" is the redundant part in each URL; it
would take unnecessary memory to save all the URLs in full.

OK, now say the file has 20 million URLs. Now what would you do?
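
A back-of-the-envelope sketch of that redundancy (hypothetical numbers;
only the 20-million figure comes from the message above):

    n_urls = 20_000_000                             # from the message above
    prefix = len("http://www.geeksforgeeks.org/")   # 29 shared bytes per URL

    wasted = n_urls * prefix                        # prefix stored n times
    print(f"~{wasted / 2**20:.0f} MiB of repeated prefix bytes")  # ~553 MiB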


