Re: very slow generator step
If you were seeing poor performance with urlfilter-regex, switching directly to urlfilter-automaton may or may not help. As Julien pointed out, the slowdown might be caused by a few nasty URLs that take a lot of time to match. To check this, you can run the urlfilter-regex plugin standalone (http://wiki.apache.org/nutch/bin/nutch%20plugin) and pass all the URLs to it. With a minor tweak you can dump the time taken for each URL.

If you are sure that the poor performance is not due to nasty URLs, switching to urlfilter-automaton is the best thing to do. You must design the rules in automaton-urlfilter.txt carefully, as the automaton syntax has limited capabilities.

Crawl space expansion could also be a reason, i.e. Nutch suddenly found a huge number of URLs. This happened to me when Nutch crawled sitemap pages that had an enormous number of outlinks. You can check for it by comparing the fetched count of the earlier rounds with that of the most recent round.

On top of everything, I agree with what Markus suggested, i.e. using the -noFilter option for generate. It gives good performance. The update phase already prevents unwanted URLs from being added, so there is no need to filter again in generate (unless you want to do a custom crawl of some specific hosts or URLs and get their data quickly).

Thanks,
Tejas

On Mon, Nov 12, 2012 at 1:21 PM, Markus Jelsma wrote:
> You may need to change your expressions but it is performant. Not all
> features of traditional regex are supported.
> http://wiki.apache.org/nutch/RegexURLFiltersBenchs
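The per-URL timing check suggested above can be sketched outside Nutch as well. The following is a minimal illustration in Python, not the plugin's own code: the rule list is hypothetical (written in the spirit of a regex URL filter file, where `-` rejects and `+` accepts), and `filter_url` and `time_urls` are names made up for this sketch. Python's `re` is a backtracking engine like `java.util.regex`, so URLs that are slow here are reasonable suspects, but timings will not carry over exactly.

```python
import re
import time

# Hypothetical rules, assumed for illustration only; substitute your own.
# '-' rejects a URL, '+' accepts it; the first rule that matches decides.
RULES = [
    ("-", re.compile(r"\.(gif|jpg|png|css|js|zip|gz|exe)$", re.IGNORECASE)),
    ("-", re.compile(r"[?*!@=]")),
    ("-", re.compile(r".*(/[^/]+)/[^/]+\1/[^/]+\1/")),
    ("+", re.compile(r".")),
]


def filter_url(url):
    """Return True if the URL is accepted by the first matching rule."""
    for sign, pattern in RULES:
        if pattern.search(url):
            return sign == "+"
    return False  # no rule matched: reject


def time_urls(urls):
    """Time the filter on every URL; return (seconds, accepted, url), slowest first."""
    results = []
    for url in urls:
        start = time.perf_counter()
        accepted = filter_url(url)
        elapsed = time.perf_counter() - start
        results.append((elapsed, accepted, url))
    results.sort(reverse=True)
    return results


if __name__ == "__main__":
    import sys

    # Usage: python time_filter.py < urls.txt   (one URL per line)
    urls = [line.strip() for line in sys.stdin if line.strip()]
    for elapsed, accepted, url in time_urls(urls)[:20]:
        print(f"{elapsed * 1000:10.3f} ms  {'+' if accepted else '-'}  {url}")
```

Feeding it a dump of the crawldb URLs and looking at the top of the list should show whether a handful of URLs dominate the total time or whether the cost is spread evenly.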
RE: very slow generator step
You may need to change your expressions, but the automaton filter is performant. Not all features of traditional regex are supported.
http://wiki.apache.org/nutch/RegexURLFiltersBenchs

-Original message-
> From: Mohammad wrk
> Sent: Mon 12-Nov-2012 22:17
> To: user@nutch.apache.org
> Subject: Re: very slow generator step
>
> That's a good thinking. I have never used url-filter automation. Where can I
> find more info?
>
> Thanks,
> Mohammad
Re: very slow generator step
That's good thinking. I have never used the url-filter automaton. Where can I find more info?

Thanks,
Mohammad

From: Julien Nioche
To: user@nutch.apache.org; Mohammad wrk
Sent: Monday, November 12, 2012 12:38:44 PM
Subject: Re: very slow generator step

Could be that a particularly long and tricky URL got into your crawldb and put the regex into a spin. I'd use the url-filter automaton instead as it is much faster. Would be interesting to know what caused the regex to take so much time, in case you fancy a bit of debugging ;-)

Julien
Re: very slow generator step
Could be that a particularly long and tricky URL got into your crawldb and put the regex into a spin. I'd use the url-filter automaton instead, as it is much faster. It would be interesting to know what caused the regex to take so much time, in case you fancy a bit of debugging ;-)

Julien

On 12 November 2012 20:29, Mohammad wrk wrote:
> Thanks for the tip. It went down to 2 minutes :-)
>
> What I don't understand is that how come everything was working fine with
> the default configuration for about 4 days and all of a sudden one crawl
> causes a jump of 100 minutes?
>
> Cheers,
> Mohammad

--
Open Source Solutions for Text Engineering
http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
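To make the "long and tricky URL" hypothesis concrete, here is a small sketch of how a backtracking regex can slow down as a URL gets longer. It is written in Python rather than Java for brevity; both `re` and `java.util.regex` backtrack, but the numbers will differ. The loop-detection rule and the helpers `make_url` and `match_time` are assumptions made for this illustration and are not taken from the configuration discussed in the thread.

```python
import re
import time

# Illustrative rule (an assumption): reject URLs in which a path segment
# repeats three times. The backreference forces the engine to backtrack.
LOOP_RULE = re.compile(r".*(/[^/]+)/[^/]+\1/[^/]+\1/")


def make_url(segments):
    """Build a URL with `segments` distinct path segments, so the rule never matches."""
    path = "".join(f"/s{i}" for i in range(segments))
    return "http://example.com" + path


def match_time(url, repeats=3):
    """Best-of-`repeats` wall-clock time for one failed search over `url`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        LOOP_RULE.search(url)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Print URL length against search time to see how the cost scales.
    for n in (25, 50, 100, 200, 400):
        url = make_url(n)
        print(f"{len(url):5d} chars  {match_time(url) * 1000:8.3f} ms")
```

If the time grows much faster than the URL length, a few very long URLs entering the crawldb would be enough to explain a sudden jump in filtering time, which is one plausible answer to the "why after 4 days?" question.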
Re: very slow generator step
Thanks for the tip. It went down to 2 minutes :-)

What I don't understand is how everything could work fine with the default configuration for about 4 days, and then all of a sudden one crawl causes a jump of 100 minutes.

Cheers,
Mohammad

From: Markus Jelsma
To: "user@nutch.apache.org"
Sent: Monday, November 12, 2012 11:19:11 AM
Subject: RE: very slow generator step

Hi - Please use the -noFilter option. It is usually useless to filter in the generator because they've already been filtered in the parse step and or update step.
RE: very slow generator step
Hi - Please use the -noFilter option. It is usually useless to filter in the generator because the URLs have already been filtered in the parse step and/or the update step.

-Original message-
> From: Mohammad wrk
> Sent: Mon 12-Nov-2012 18:43
> To: user@nutch.apache.org
> Subject: very slow generator step
>
> Hi,
>
> The generator time went from 8 minutes to 106 minutes a few days ago and
> has stayed there since then. AFAIK, I haven't made any configuration changes
> recently (attached you can find some of the configurations that I thought
> might be related).
>
> A quick CPU sampling shows that most of the time is spent in
> java.util.regex.Matcher.find(). Since I'm using the default regex
> configuration and my crawldb has only 3,052,412 urls, I was wondering if
> this is a known issue with nutch-1.5.1?
>
> Here is some more information that might help:
>
> = Generator logs
> 2012-11-09 03:14:50,920 INFO crawl.Generator - Generator: starting at 2012-11-09 03:14:50
> 2012-11-09 03:14:50,920 INFO crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
> 2012-11-09 03:14:50,920 INFO crawl.Generator - Generator: filtering: true
> 2012-11-09 03:14:50,920 INFO crawl.Generator - Generator: normalizing: true
> 2012-11-09 03:14:50,921 INFO crawl.Generator - Generator: topN: 3000
> 2012-11-09 03:14:50,923 INFO crawl.Generator - Generator: jobtracker is 'local', generating exactly one partition.
> 2012-11-09 03:23:39,741 INFO crawl.Generator - Generator: Partitioning selected urls for politeness.
> 2012-11-09 03:23:40,743 INFO crawl.Generator - Generator: segment: segments/20121109032340
> 2012-11-09 03:23:47,860 INFO crawl.Generator - Generator: finished at 2012-11-09 03:23:47, elapsed: 00:08:56
> 2012-11-09 05:35:14,033 INFO crawl.Generator - Generator: starting at 2012-11-09 05:35:14
> 2012-11-09 05:35:14,033 INFO crawl.Generator - Generator: Selecting best-scoring urls due for fetch.
> 2012-11-09 05:35:14,033 INFO crawl.Generator - Generator: filtering: true
> 2012-11-09 05:35:14,033 INFO crawl.Generator - Generator: normalizing: true
> 2012-11-09 05:35:14,033 INFO crawl.Generator - Generator: topN: 3000
> 2012-11-09 05:35:14,037 INFO crawl.Generator - Generator: jobtracker is 'local', generating exactly one partition.
> 2012-11-09 07:21:42,840 INFO crawl.Generator - Generator: Partitioning selected urls for politeness.
> 2012-11-09 07:21:43,841 INFO crawl.Generator - Generator: segment: segments/20121109072143
> 2012-11-09 07:21:51,004 INFO crawl.Generator - Generator: finished at 2012-11-09 07:21:51, elapsed: 01:46:36
>
> = CrawlDb statistics
> CrawlDb statistics start: ./crawldb
> Statistics for CrawlDb: ./crawldb
> TOTAL urls: 3052412
> retry 0: 3047404
> retry 1: 338
> retry 2: 1192
> retry 3: 822
> retry 4: 336
> retry 5: 2320
> min score: 0.0
> avg score: 0.015368268
> max score: 48.608
> status 1 (db_unfetched): 2813249
> status 2 (db_fetched): 196717
> status 3 (db_gone): 14204
> status 4 (db_redir_temp): 10679
> status 5 (db_redir_perm): 17563
> CrawlDb statistics: done
>
> = System info
> Memory: 4 GB
> CPUs: Intel® Core™ i3-2310M CPU @ 2.10GHz × 4
> Available diskspace: 171.7 GB
> OS: Release 12.10 (quantal) 64-bit
>
> Thanks,
> Mohammad
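The Generator log quoted above already hints at where the extra time goes. As a rough sketch, the following Python snippet splits each run into the time before the "Partitioning" line (URL selection, which is where filtering and normalizing happen) and the time after it. The log excerpt is copied from the original post; the function name `phase_durations` and the parsing approach are assumptions made for this illustration.

```python
from datetime import datetime

# Three lines per run, copied from the Generator log in the original post.
LOG = """\
2012-11-09 03:14:50,920 INFO crawl.Generator - Generator: starting at 2012-11-09 03:14:50
2012-11-09 03:23:39,741 INFO crawl.Generator - Generator: Partitioning selected urls for politeness.
2012-11-09 03:23:47,860 INFO crawl.Generator - Generator: finished at 2012-11-09 03:23:47, elapsed: 00:08:56
2012-11-09 05:35:14,033 INFO crawl.Generator - Generator: starting at 2012-11-09 05:35:14
2012-11-09 07:21:42,840 INFO crawl.Generator - Generator: Partitioning selected urls for politeness.
2012-11-09 07:21:51,004 INFO crawl.Generator - Generator: finished at 2012-11-09 07:21:51, elapsed: 01:46:36
"""


def phase_durations(log_text):
    """Return one (selection, partitioning) pair of timedeltas per Generator run."""
    runs = []
    start = partition = None
    for line in log_text.splitlines():
        if "Generator:" not in line:
            continue
        # The first 23 characters hold the timestamp, e.g. "2012-11-09 03:14:50,920".
        stamp = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
        if "Generator: starting at" in line:
            start = stamp
        elif "Partitioning selected urls" in line:
            partition = stamp
        elif "Generator: finished at" in line:
            runs.append((partition - start, stamp - partition))
    return runs


if __name__ == "__main__":
    for selection, partitioning in phase_durations(LOG):
        print(f"selection: {selection}  partitioning: {partitioning}")
```

In both runs the partitioning phase takes about 8 seconds, so the whole increase sits in the selection phase. That is consistent with the CPU sampling pointing at `java.util.regex.Matcher.find()` and with the improvement reported after switching to -noFilter.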