ok thx, looks great.

2009/8/26 Fuad Efendi <[email protected]>

> Probably this is suitable:
>
>
> <property>
>  <name>generate.max.per.host</name>
>  <value>-1</value>
>  <description>The maximum number of urls per host in a single
>  fetchlist.  -1 if unlimited.</description>
> </property>
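>
> The block above shows the default (unlimited). A minimal sketch of an
> override, assuming it is placed in conf/nutch-site.xml and that 100 is
> only an illustrative cap, not a recommended value:
>
> <property>
>  <name>generate.max.per.host</name>
>  <value>100</value>
>  <description>Illustrative cap: at most 100 urls per host in a single
>  fetchlist.</description>
> </property>
>
> With a cap like this, URLs from a host beyond the limit would be left out
> of the current fetchlist and remain eligible for a later generate cycle.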
>
>
> [-topN N] - Number of top URLs to be selected
>
>
>
> -----Original Message-----
> From: MilleBii [mailto:[email protected]]
> Sent: August-26-09 5:39 AM
> To: [email protected]
> Subject: Re: Limiting number of URL from the same site in a fetch cycle
>
> Won't db.max.outlinks.per.page result in missing links? I don't want that.
> I would just like to balance them out over the next fetch cycle.
>
>
>
>
> 2009/8/26 Fuad Efendi <[email protected]>
>
> > You can filter some unnecessary "tail" using UrlFilter; for instance, some
> > sites may have long forums which you don't need, or shopping cart /
> > process to checkout pages which they forgot to restrict via robots.txt...
> >
> > Check regex-urlfilter.txt.template in /conf
> >
> >
> > Another parameter which equalizes 'per-site' URLs is
> > db.max.outlinks.per.page=100 (some sites may have 10 links per page,
> > others - 1000...)
> >
> >
> > -Fuad
> > http://www.linkedin.com/in/liferay
> > http://www.tokenizer.org
> >
> >
> >
> > -----Original Message-----
> > From: MilleBii [mailto:[email protected]]
> > Sent: August-25-09 5:48 PM
> > To: [email protected]
> > Subject: Limiting number of URL from the same site in a fetch cycle
> >
> > I'm wondering if there is a setting by which you can limit the number of
> > URLs per site in a fetch list, rather than for the site in total.
> > That way I could avoid long tails in a fetch list that all come from the
> > same site and so take damn long (5s per URL); I'd like to fetch them on
> > the next cycle.
> >
> > --
> > -MilleBii-
> >
> >
> >
>
>
> --
> -MilleBii-
>
>
>


-- 
-MilleBii-
