I still have one point of confusion.

If I set topN to 300, suppose that after one round (depth 1) the crawldb
contains 1000 unfetched links pointing to depth-2 pages. In the second
round, the generator will select 300 of those 1000 links. Now suppose
updatedb inserts 500 more URLs, which point to depth-3 pages.
In the third round, the generator will then select 300 URLs from the
700 remaining depth-2 URLs plus the 500 depth-3 URLs. Am I right?
If so, how is it ensured that all 300 URLs selected for the third round
come from the 500 depth-3 URLs?
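The arithmetic in this question can be checked with a toy model. It is a
sketch under stated assumptions, not Nutch's code: the crawldb is a Python
list, and the generator is ASSUMED to rank unfetched URLs by a per-URL score
and keep the top N, without looking at depth (the `depth` field exists only
so the result can be counted). The random scores, the example.com URLs and
the helper names `generate` and `add_urls` are all hypothetical.

```python
import random
from dataclasses import dataclass

TOP_N = 300  # the topN value from the question


@dataclass
class Entry:
    url: str
    depth: int        # link depth, tracked only so we can count results
    score: float      # hypothetical per-URL score used for ranking
    fetched: bool = False


def generate(crawldb, top_n):
    """Toy generator: return the top_n highest-scoring unfetched entries.

    Assumption: ranking uses the score alone; depth plays no part.
    """
    unfetched = [e for e in crawldb if not e.fetched]
    unfetched.sort(key=lambda e: e.score, reverse=True)
    return unfetched[:top_n]


def add_urls(crawldb, count, depth, rng):
    """Toy updatedb: append newly discovered URLs with random scores."""
    start = len(crawldb)
    for i in range(count):
        url = f"http://example.com/d{depth}/{start + i}"
        crawldb.append(Entry(url, depth, rng.random()))


rng = random.Random(42)
crawldb = []

# After round 1: 1000 unfetched links pointing to depth-2 pages.
add_urls(crawldb, 1000, depth=2, rng=rng)

# Round 2: the generator picks 300 of the 1000, and they are fetched.
for entry in generate(crawldb, TOP_N):
    entry.fetched = True

# updatedb then adds 500 links pointing to depth-3 pages.
add_urls(crawldb, 500, depth=3, rng=rng)

# Round 3: the candidates are the 700 leftover depth-2 URLs
# plus the 500 new depth-3 URLs.
candidates = [e for e in crawldb if not e.fetched]
round3 = generate(crawldb, TOP_N)
by_depth = {d: sum(1 for e in round3 if e.depth == d) for d in (2, 3)}
print(len(candidates), len(round3), by_depth)
```

In this toy, nothing in `generate` reads the depth, so how the 300 round-3
picks split between depth 2 and depth 3 is decided entirely by the scores.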

On 6/13/07, Tim Gautier <[EMAIL PROTECTED]> wrote:

The tutorial is correct; it just uses a different definition of depth
than the one you are using. :)

The depth is essentially one plus the number of links that must be
followed from the starting page to reach a certain page.  For instance:

If you start with http://www.blabla.com/home.html, that page has a
depth of 1.  If that page then contains a link to
http://www.blabla.com/a/b/c/d/e/a.html, that means
http://www.blabla.com/a/b/c/d/e/a.html has a depth of 2.

Remember, you're talking about a web here.  Each page is a node in the
web.  The first node has a depth of 1.  Following its links leads you
to nodes at depth 2.  Following the links of those nodes takes
you to nodes at depth 3.
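The two definitions of depth can be contrasted in a small sketch. It
computes link depth (the definition above) with a breadth-first search over
a hypothetical link graph, stored as a dict from each page to its outlinks,
and compares it with the number of path segments in the URL (the reading in
the original question below). The page x.html and the function names
`link_depths` and `path_depth` are made up for illustration.

```python
from collections import deque
from urllib.parse import urlparse


def link_depths(links, start):
    """Breadth-first search from start.

    The start page has depth 1; a page first reached by following one
    link from it has depth 2, and so on (shortest link path + 1).
    """
    depths = {start: 1}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def path_depth(url):
    """Number of segments in the URL path, e.g. /a/b/c/d/e/a.html -> 6."""
    return len(urlparse(url).path.strip("/").split("/"))


HOME = "http://www.blabla.com/home.html"
DEEP = "http://www.blabla.com/a/b/c/d/e/a.html"
OTHER = "http://www.blabla.com/x.html"  # hypothetical page linked from DEEP

links = {HOME: [DEEP], DEEP: [OTHER]}

depths = link_depths(links, HOME)
for url, depth in depths.items():
    print(f"link depth {depth}, path depth {path_depth(url)}: {url}")
# prints:
# link depth 1, path depth 1: http://www.blabla.com/home.html
# link depth 2, path depth 6: http://www.blabla.com/a/b/c/d/e/a.html
# link depth 3, path depth 1: http://www.blabla.com/x.html
```

A page that sits six directories below the root is still only at link
depth 2 if the start page links to it directly.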

On 6/12/07, Manoharam Reddy <[EMAIL PROTECTED]> wrote:
> The tutorial says that the depth value is how deep a page sits below
> the root of a website. So, per the tutorial, if I want to fetch a page
> such as http://www.blabla.com/a/b/c/d/e/a.html, I must set the depth
> value to 6 or more.
>
> But I find in the source code that depth is simply a for loop. It
> runs the fetch cycle as many times as the depth value specifies, so it
> has no connection with how deep a page sits below the root.
>
> Please confirm whether my understanding is right. If so, shouldn't
> the tutorial be corrected to prevent noobs like me from being
> misled?
>
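The loop described in this question can be sketched as a toy crawl in which
`depth` is nothing more than the number of generate/fetch/updatedb rounds.
This is a simplified model under stated assumptions, not Nutch's code: there
is no topN limit, the crawldb is a dict from URL to a "fetched" flag, and the
link graph and the `crawl` function are hypothetical. Because each round
fetches every URL discovered in the previous round, a page at link depth k
(in the sense of the reply above) is fetched in round k, which is why the
loop count and link depth line up in this model.

```python
def crawl(links, seeds, depth):
    """Toy crawl: `depth` is only the number of rounds to run.

    Returns a dict mapping each fetched URL to the round it was fetched in.
    Assumption: no topN cap, so each round fetches every unfetched URL.
    """
    crawldb = {url: False for url in seeds}  # url -> already fetched?
    fetched_in_round = {}
    for round_no in range(1, depth + 1):
        # generate: snapshot of all URLs not yet fetched
        fetchlist = [url for url, done in crawldb.items() if not done]
        if not fetchlist:
            break
        for url in fetchlist:
            # fetch: mark the page as fetched and record the round
            crawldb[url] = True
            fetched_in_round[url] = round_no
            # updatedb: add newly discovered outlinks as unfetched;
            # they are only picked up by the next round's generate step
            for target in links.get(url, []):
                crawldb.setdefault(target, False)
    return fetched_in_round


HOME = "http://www.blabla.com/home.html"
DEEP = "http://www.blabla.com/a/b/c/d/e/a.html"
OTHER = "http://www.blabla.com/x.html"  # hypothetical page linked from DEEP
links = {HOME: [DEEP], DEEP: [OTHER]}

print(crawl(links, [HOME], depth=2))
# prints: {'http://www.blabla.com/home.html': 1, 'http://www.blabla.com/a/b/c/d/e/a.html': 2}
```

With `depth=2` the deeply nested a.html is already fetched, because it is
only one link away from the seed. In this model, adding a topN cap per round
would break the one-to-one match, since a round could then fetch only part
of the next layer of links.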

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
