Re: site-specific crawling policies
On Fri, Nov 16, 2012 at 11:03 AM, Markus Jelsma wrote:
> You can override some URL filter paths in nutch-site.xml or with command
> line options (tools) such as bin/nutch fetch -Durlfilter.regex.file=bla.

What if I want to index different metatags for different sites?
RE: site-specific crawling policies
-----Original message-----
> From: Joe Zhang
> Sent: Fri 16-Nov-2012 18:35
> To: user@nutch.apache.org
> Subject: Re: site-specific crawling policies
>
> That's easy to do. But what about the configuration files? The same
> nutch-site.xml and urlfilter files will be read.

You can override some URL filter paths in nutch-site.xml or with command line options (tools) such as bin/nutch fetch -Durlfilter.regex.file=bla. You can also set NUTCH_HOME and keep everything separate if you're running it locally. On Hadoop you'll need separate job files.
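To make that concrete, here is a rough, untested sketch; the site names, seed directories and filter file names below are made up, and it assumes the per-site filter files sit in conf/ so they are found on the classpath:

    # one filter file per site, e.g. conf/regex-urlfilter-siteA.txt and
    # conf/regex-urlfilter-siteB.txt, each holding that site's +/- patterns

    # site A gets its own crawldb/segments and its own filter file
    bin/nutch inject -Durlfilter.regex.file=regex-urlfilter-siteA.txt \
        crawl-siteA/crawldb seeds/siteA
    bin/nutch generate -Durlfilter.regex.file=regex-urlfilter-siteA.txt \
        crawl-siteA/crawldb crawl-siteA/segments

    # same tools, different filter file and directories for site B
    bin/nutch inject -Durlfilter.regex.file=regex-urlfilter-siteB.txt \
        crawl-siteB/crawldb seeds/siteB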
Re: site-specific crawling policies
On Fri, Nov 16, 2012 at 3:28 AM, Sourajit Basak wrote:
> Group related sites together and use separate crawldb, segment directories.

That's easy to do. But what about the configuration files? The same nutch-site.xml and urlfilter files will be read.
Re: site-specific crawling policies
On Fri, Nov 16, 2012 at 9:40 AM, Joe Zhang wrote:
> So how exactly do I set up different nutch instances then?

Group related sites together and use separate crawldb, segment directories.
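In other words, run the normal crawl cycle once per group, with nothing shared between groups. A rough sketch (the directory names are made up):

    # group A has its own seed list, crawldb and segments
    bin/nutch inject crawl/groupA/crawldb seeds/groupA
    bin/nutch generate crawl/groupA/crawldb crawl/groupA/segments -topN 1000
    seg=`ls -d crawl/groupA/segments/2* | tail -1`
    bin/nutch fetch $seg
    bin/nutch parse $seg
    bin/nutch updatedb crawl/groupA/crawldb $seg

    # repeat with crawl/groupB/... and seeds/groupB for the next group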
Re: site-specific crawling policies
On Thu, Nov 15, 2012 at 7:52 PM, Lewis John Mcgibbney wrote:
> ...if you are attempting a depth first, domain specific crawl, then maybe
> separate Nutch instances will be your friend...

So how exactly do I set up different nutch instances then?
Re: site-specific crawling policies
On Thu, Nov 15, 2012 at 11:53 PM, Joe Zhang wrote:
> Well, these are all details. The bigger question is: how to separate the
> crawling policy of site A from that of site B?

Hi Joe,

In all honesty it might sound slightly optimistic, and it may also depend upon the size and calibre of the different sites/domains, but if you are attempting a depth-first, domain-specific crawl, then maybe separate Nutch instances will be your friend...

Lewis
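In practice "separate instances" can be as simple as separate copies of the Nutch runtime, each with its own conf/, and running every crawl from the copy whose configuration it should use. An illustrative layout (all paths made up):

    /opt/nutch-siteA/
        conf/nutch-site.xml          # plugin.includes, metatags, etc. for site A
        conf/regex-urlfilter.txt     # URL patterns for site A
    /opt/nutch-siteB/
        conf/nutch-site.xml
        conf/regex-urlfilter.txt

    # e.g. with the all-in-one crawl command, run from the matching copy
    cd /opt/nutch-siteA && bin/nutch crawl seeds/siteA -dir crawl-siteA -depth 3 -topN 1000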
Re: site-specific crawling policies
On Thu, Nov 15, 2012 at 7:41 AM, Sourajit Basak wrote:
> You probably need to customize the parse-metatags plugin.

Well, these are all details. The bigger question is: how to separate the crawling policy of site A from that of site B?
Re: site-specific crawling policies
On Thu, Nov 15, 2012 at 12:22 AM, Joe Zhang wrote:
> What if I want to parse out different meta tags for different sites?

You probably need to customize the parse-metatags plugin.

I think you should go ahead and include all possible metatags, and take care of missing metatags in Solr.
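On the Solr side, "taking care of missing metatags" mostly means the metatag fields must not be declared required, so a document that lacks a given tag simply has no value for that field. A sketch for schema.xml; the field names assume the usual "metatag." prefix produced by the parse-metatags/index-metadata combination, and text_general is just an example type from the stock Solr schema:

    <field name="metatag.description" type="text_general" indexed="true" stored="true"/>
    <field name="metatag.keywords"    type="text_general" indexed="true" stored="true"/>
    <!-- or catch any tag you did not list explicitly -->
    <dynamicField name="metatag.*" type="text_general" indexed="true" stored="true"/>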
Re: site-specific crawling policies
On Wed, Nov 14, 2012 at 1:33 AM, Sourajit Basak wrote:
> 1) For parsing & indexing customized meta tags enable & configure plugin
> "parse-metatags"

I understand conf/regex-urlfilter.txt; I can put domain names into the URL patterns.

But what about meta tags? What if I want to parse out different meta tags for different sites?
Re: site-specific crawling policies
On Wed, Nov 14, 2012 at 1:33 PM, Tejas Patil wrote:
> While defining url patterns, have the domain name in them so that you get
> site/domain-specific rules. I don't know about configuring meta tags.

1) For parsing & indexing customized meta tags, enable & configure the "parse-metatags" plugin.

2) There are several URL filters, e.g. regex-based ones. For regex, the patterns are specified via conf/regex-urlfilter.txt.
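For (1), the nutch-site.xml bits would look roughly like this. It is a sketch following the usual parse-metatags setup: adjust plugin.includes to whatever plugins you already run, the tag list is only an example, and depending on the Nutch version the metatags.names list is separated with ';' or ',':

    <property>
      <name>plugin.includes</name>
      <value>protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|metadata)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
    </property>
    <property>
      <name>metatags.names</name>
      <value>description;keywords;author</value>
    </property>
    <!-- index-metadata picks the parsed tags up under the "metatag." prefix -->
    <property>
      <name>index.parse.md</name>
      <value>metatag.description,metatag.keywords,metatag.author</value>
    </property>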
Re: site-specific crawling policies
On Tue, Nov 13, 2012 at 11:34 PM, Joe Zhang wrote:
> How to enforce site-specific crawling policies, i.e., different URL
> patterns, meta tags, etc. for different websites to be crawled? I got the
> sense that multiple instances of nutch are needed? Is it correct? If yes,
> how?

While defining url patterns, have the domain name in them so that you get site/domain-specific rules. I don't know about configuring meta tags.

Thanks,
Tejas
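For instance, conf/regex-urlfilter.txt can carry one block of rules per site; rules are tried in order and the first match wins. The domains and paths below are made up:

    # site A: only follow article pages
    +^https?://(www\.)?site-a\.example\.com/articles/
    # site B: anything under the news section
    +^https?://news\.site-b\.example\.org/
    # reject everything else
    -.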