That should work, but it seems odd. Starting from the seed URL you give,
Nutch crawls outward by following links, and the set of crawled pages
actually forms a tree whose root is the seed URL. If you cannot reach those
two URLs from the seed URL by following links yourself, Nutch cannot either.
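
A minimal Python sketch of this reachability argument, under stated assumptions: it runs a breadth-first crawl over a small, hand-written, hypothetical link graph, and uses a hop limit as a rough stand-in for Nutch's depth setting. The `crawl` function, the `LINKS` graph, and the `HOME` seed (the real seed URL in the thread is elided) are illustrative only. None of them are Nutch's API. The sketch shows that a page no reachable page links to is never discovered unless it is itself a seed.

```python
from collections import deque

# Hypothetical stand-in seed; the real seed URL in the thread is elided.
HOME = "http://www.laopdr.gov.la/"
BASE = "http://app02.laopdr.gov.la/ePortal/news/detail.action"
PLAIN = BASE + "?id=10110&from=ePortal_NewsDetail_FromHome"
LO_LA = BASE + "?request_locale=lo_LA&id=10110&from=ePortal_NewsDetail_FromHome"

# Hypothetical link graph: page -> pages it links to.
# The request_locale variant is deliberately linked from nowhere,
# mimicking a link generated by JavaScript that a crawler never sees.
LINKS = {
    HOME: [PLAIN],
    PLAIN: [HOME],
    LO_LA: [],
}


def crawl(seeds, links, depth):
    """Breadth-first crawl from `seeds`, following at most `depth` link hops.

    Returns the set of discovered URLs (seeds included).
    """
    seen = set(seeds)
    frontier = deque((url, 0) for url in seeds)
    while frontier:
        url, hops = frontier.popleft()
        if hops == depth:
            continue  # hop limit reached: do not follow this page's links
        for out in links.get(url, []):
            if out not in seen:
                seen.add(out)
                frontier.append((out, hops + 1))
    return seen


if __name__ == "__main__":
    # Seeding only HOME never discovers the unlinked locale variant.
    print(LO_LA in crawl([HOME], LINKS, depth=15))  # False
    # Listing it as a seed makes it "reachable" trivially.
    print(LO_LA in crawl([HOME, LO_LA], LINKS, depth=15))  # True
```

The second call corresponds to the workaround of putting the URL directly in the urls directory: once a page is a seed, no inbound link is needed.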

yanky


2009/3/20 陈琛 <kylin.chc...@gmail.com>

> Thanks.
> The URL is http://www.laopdr.gov.la/... with depth 15 and topN 1200 ...
>
> It seems I must put
>
> http://app02.laopdr.gov.la/ePortal/news/detail.action?request_locale=lo_LA&id=10110&from=ePortal_NewsDetail_FromHome
>
> in the urls directory.
>
>
>
> 2009/3/19 yanky young <yanky.yo...@gmail.com>
>
> > Hi,
> >
> > I guess the URLs you mentioned are all directed to the same JSP or servlet,
> > since they all begin with
> > http://app02.laopdr.gov.la/ePortal/news/detail.action
> > and the difference is the request_locale parameter. I have no idea how
> > these two URLs with different request_locale parameters are generated, but
> > I guess Nutch simply never sees these request_locale URLs, because the
> > parameter may be added by JavaScript or by a backend content management
> > system. Maybe you can write these links into a page that Nutch can crawl.
> > The point is that these links must appear somewhere in your website's
> > pages; if they do not, Nutch cannot find them.
> >
> > Good luck
> >
> > yanky
> >
> >
> >
> > 2009/3/19 陈琛 <kylin.chc...@gmail.com>
> >
> > > Please help me, it is urgent and important. Thanks.
> > >
> > > ---------- Forwarded message ----------
> > > From: 陈琛 <kylin.chc...@gmail.com>
> > > Date: 2009/3/19
> > > Subject: index web
> > > To: nutch-user@lucene.apache.org
> > >
> > >
> > > Hi all,
> > >
> > > I can get URLs like this one indexed:
> > >
> > > http://app02.laopdr.gov.la/ePortal/news/detail.action?id=10110&from=ePortal_NewsDetail_FromHome
> > >
> > > but I cannot get URLs like these indexed:
> > >
> > > http://app02.laopdr.gov.la/ePortal/news/detail.action?request_locale=en_US&id=10110&from=ePortal_NewsDetail_FromHome
> > >
> > > and
> > >
> > > http://app02.laopdr.gov.la/ePortal/news/detail.action?request_locale=lo_LA&id=10110&from=ePortal_NewsDetail_FromHome
> > >
> > > Why are they not indexed? Are these pages different in some way?
> > >
> > > Please note the "request_locale=" parameter.
> > >
> > >
> > > Thanks
> > >
> >
>
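
Both workarounds discussed in this thread start from the same list of request_locale URLs. One is to list the variants directly as seeds. The other is to publish them on a page Nutch can crawl. The Python sketch below builds that list from the plain URL with `urllib.parse` and renders it two ways: as seed-file text with one URL per line, and as a minimal HTML page of links. It assumes en_US and lo_LA, the two locales mentioned in the thread, are the only ones needed. The helper names `with_locale`, `seed_list` and `link_page` are hypothetical and not part of Nutch.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

PLAIN = ("http://app02.laopdr.gov.la/ePortal/news/detail.action"
         "?id=10110&from=ePortal_NewsDetail_FromHome")
LOCALES = ["en_US", "lo_LA"]  # assumed: the two locales mentioned in the thread


def with_locale(url, locale):
    """Return `url` with request_locale=<locale> as its first query parameter."""
    parts = urlsplit(url)
    # Drop any existing request_locale, then put the new one first,
    # matching the parameter order of the URLs quoted in the thread.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "request_locale"]
    query.insert(0, ("request_locale", locale))
    return urlunsplit(parts._replace(query=urlencode(query)))


def seed_list(urls):
    """Seed-file contents: one URL per line."""
    return "".join(u + "\n" for u in urls)


def link_page(urls):
    """A minimal HTML page that links to every URL so a crawler can find them."""
    items = "\n".join(
        f'  <li><a href="{escape(u)}">{escape(u)}</a></li>' for u in urls
    )
    return f"<html><body>\n<ul>\n{items}\n</ul>\n</body></html>\n"


if __name__ == "__main__":
    variants = [with_locale(PLAIN, loc) for loc in LOCALES]
    print(seed_list(variants), end="")
    print(link_page(variants), end="")
```

Saving the seed-list output as a file in the urls directory corresponds to the workaround found in the thread. Serving the HTML output from a page that is itself reachable from the seed corresponds to yanky's suggestion. Note that `escape` turns `&` into `&amp;` inside the `href`, which is the correct HTML encoding.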
