-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Tuesday, April 18, 2006 8:02 PM
To: [email protected]
Subject: Re: question about crawldb
Importance: High

Anton Potehin wrote:
> 1. We have found these flags in CrawlDatum class:
>
>   public static final byte STATUS_SIGNATURE = 0;
>   public static final byte STATUS_DB_UNFETCHED = 1;
>   public static final byte STATUS_DB_FETCHED = 2;
>   public static final byte STATUS_DB_GONE = 3;
>   public static final byte STATUS_LINKED = 4;
>   public static final byte STATUS_FETCH_SUCCESS = 5;
>   public static final byte STATUS_FETCH_RETRY = 6;
>   public static final byte STATUS_FETCH_GONE = 7;
>
> Though the names of these flags describe their aims, it is not clear
> completely what they mean and what is the difference between
> STATUS_DB_FETCHED and STATUS_FETCH_SUCCESS for example.

The STATUS_DB_* codes are used in entries in the crawldb.
STATUS_FETCH_* codes are used in fetcher output. STATUS_LINKED is used
in parser output for urls that are linked to. A crawldb update combines
all of these (the old version of the db, plus fetcher and parser output)
to generate a new version of the db, containing only STATUS_DB_* entries.
This logic is in CrawlDbReducer.

Does that help?

Yes ;-) tnx...

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
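
To make the crawldb update described in the reply above more concrete, here is a minimal sketch in Java of how the statuses collected for a single URL might be folded into one STATUS_DB_* value. It is not the actual CrawlDbReducer code. The class name CrawlDbMergeSketch, the merge method, the NONE marker, and the precedence rules (fetcher output wins over the old db entry, a retry goes back to unfetched, a linked-only URL becomes unfetched) are all assumptions made for illustration. Only the status constants come from the message.

```java
/**
 * Hypothetical, simplified sketch of a crawldb update for one URL.
 * Not the real CrawlDbReducer; the precedence rules are assumptions.
 */
public class CrawlDbMergeSketch {
    // Status codes as quoted from CrawlDatum in the message.
    public static final byte STATUS_SIGNATURE = 0;
    public static final byte STATUS_DB_UNFETCHED = 1;
    public static final byte STATUS_DB_FETCHED = 2;
    public static final byte STATUS_DB_GONE = 3;
    public static final byte STATUS_LINKED = 4;
    public static final byte STATUS_FETCH_SUCCESS = 5;
    public static final byte STATUS_FETCH_RETRY = 6;
    public static final byte STATUS_FETCH_GONE = 7;

    // Illustrative marker for "no such entry seen" (not part of CrawlDatum).
    private static final byte NONE = -1;

    /**
     * Combines the old db entry, fetcher output and parser output for one
     * URL and returns the STATUS_DB_* code for the new db entry.
     */
    public static byte merge(byte... statuses) {
        byte old = NONE;      // status from the old version of the db
        byte fetch = NONE;    // status from fetcher output
        boolean linked = false; // URL was seen as a link target by the parser

        for (byte s : statuses) {
            switch (s) {
                case STATUS_DB_UNFETCHED:
                case STATUS_DB_FETCHED:
                case STATUS_DB_GONE:
                    old = s;
                    break;
                case STATUS_FETCH_SUCCESS:
                case STATUS_FETCH_RETRY:
                case STATUS_FETCH_GONE:
                    fetch = s;
                    break;
                case STATUS_LINKED:
                    linked = true;
                    break;
                default:
                    break; // STATUS_SIGNATURE and unknown codes are ignored here
            }
        }

        // Assumption: fresh fetcher output overrides whatever the old db said.
        if (fetch == STATUS_FETCH_SUCCESS) return STATUS_DB_FETCHED;
        if (fetch == STATUS_FETCH_GONE) return STATUS_DB_GONE;
        // Assumption: a retry simply stays unfetched (no retry limit modeled).
        if (fetch == STATUS_FETCH_RETRY) return STATUS_DB_UNFETCHED;
        // No fetcher output: keep the existing db entry if there is one.
        if (old != NONE) return old;
        // Newly discovered URL, known only from links.
        if (linked) return STATUS_DB_UNFETCHED;
        throw new IllegalArgumentException("no usable status for this URL");
    }
}
```

Under these assumptions, merge(STATUS_DB_UNFETCHED, STATUS_FETCH_SUCCESS) yields STATUS_DB_FETCHED, and merge(STATUS_LINKED) yields STATUS_DB_UNFETCHED. Either way the result is always a STATUS_DB_* code, which matches the point that the new db contains only STATUS_DB_* entries.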
