That should work, but it does seem odd. Nutch starts from the seed URL you give it and follows links from there, so the set of crawled pages is effectively a tree whose root node is the seed URL. If you cannot reach those two URLs yourself by following links from the seed URL, Nutch cannot reach them either.
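To illustrate the reachability point, here is a rough sketch in Python. It is not Nutch's actual code, only a simplified model of "follow links from the seeds up to a given depth" that ignores topN and URL filters. The link graph is made up for the example: I am assuming the seed page links to the plain detail.action URL and that no page links to the request_locale variant.

```python
from collections import deque

SEED = "http://www.laopdr.gov.la/"
PLAIN = ("http://app02.laopdr.gov.la/ePortal/news/detail.action"
         "?id=10110&from=ePortal_NewsDetail_FromHome")
LOCALE_EN = ("http://app02.laopdr.gov.la/ePortal/news/detail.action"
             "?request_locale=en_US&id=10110&from=ePortal_NewsDetail_FromHome")

# Hypothetical link graph (an assumption, not the real site):
# page URL -> URLs that appear as links on that page.
links = {SEED: [PLAIN]}


def reachable(links, seeds, max_depth):
    """Return every URL that can be reached from the seeds by following
    at most max_depth links (breadth-first)."""
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen


# Crawling from the original seed only: the locale URL is never discovered.
found = reachable(links, [SEED], max_depth=15)

# Adding the locale URL as a second seed makes it part of the crawl.
found_with_extra_seed = reachable(links, [SEED, LOCALE_EN], max_depth=15)
```

In this model, raising the depth does not help: a URL that no crawled page links to only shows up once it is listed as a seed, which matches what you saw after putting it in the urls directory.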
yanky

2009/3/20 陈琛 <kylin.chc...@gmail.com>

> Thanks.
> The URL is http://www.laopdr.gov.la/...
> depth 15 topN 1200 ...
>
> It seems I must put
>
> http://app02.laopdr.gov.la/ePortal/news/detail.action?request_locale=lo_LA&id=10110&from=ePortal_NewsDetail_FromHome
>
> in the urls directory.
>
> 2009/3/19 yanky young <yanky.yo...@gmail.com>
>
> > Hi,
> >
> > I guess the URLs you mentioned all point to the same JSP or servlet,
> > since they all begin with
> > http://app02.laopdr.gov.la/ePortal/news/detail.action
> > The difference is the request_locale parameter. I have no idea how
> > these two URLs with different request_locale parameters are generated,
> > but I guess Nutch simply does not know about the request_locale
> > parameter, because it may be added by JavaScript or by the backend
> > content management system. Maybe you can put these links in a page
> > that Nutch can crawl. The point is that these links must appear
> > somewhere in the pages of your website. If they do not, Nutch cannot
> > find them.
> >
> > Good luck,
> >
> > yanky
> >
> > 2009/3/19 陈琛 <kylin.chc...@gmail.com>
> >
> > > Please help me, it is urgent and important. Thanks.
> > >
> > > ---------- Forwarded message ----------
> > > From: 陈琛 <kylin.chc...@gmail.com>
> > > Date: 2009/3/19
> > > Subject: index web
> > > To: nutch-user@lucene.apache.org
> > >
> > > Hi all,
> > >
> > > I can index a URL like
> > >
> > > http://app02.laopdr.gov.la/ePortal/news/detail.action?id=10110&from=ePortal_NewsDetail_FromHome
> > >
> > > but cannot index URLs like
> > >
> > > http://app02.laopdr.gov.la/ePortal/news/detail.action?request_locale=en_US&id=10110&from=ePortal_NewsDetail_FromHome
> > >
> > > and
> > >
> > > http://app02.laopdr.gov.la/ePortal/news/detail.action?request_locale=lo_LA&id=10110&from=ePortal_NewsDetail_FromHome
> > >
> > > Why are they not indexed? Is there any difference between the pages?
> > >
> > > Please notice "request_locale=".
> > >
> > > Thanks