On Mon, 2004-07-26 at 12:51, Doug Cutting wrote:
> Michael Rosset wrote:
> > Attached is a patch for search.jsp adding support for grouping by host.
>
> I just tried this on a test index with 160k pages. It gets really slow
> when there are lots of duplicates. I haven't looked too closely, but I
> assume this is because it has to look at lots of hit details.
>
> I think we can accelerate this. We index the hostname in the "site"
> field. When re-querying we could add a clause to the query which
> prohibits sites we don't want to see any more hits from. This could be
> done with something like:
>
> query.addProhibitedTerm("site", host);
>
> The query should be cloned first, which means that Query needs to be
> made cloneable.
>
> Does this sound like a good approach to accelerating this? If so,
> Stefan or Andrzej, do you want to look into implementing this?
>
> Doug
>
>
> -------------------------------------------------------
> This SF.Net email is sponsored by BEA Weblogic Workshop
> FREE Java Enterprise J2EE developer tools!
> Get your free copy of BEA WebLogic Workshop 8.1 today.
> http://ads.osdn.com/?ad_id=4721&alloc_id=10040&op=click
> _______________________________________________
> Nutch-developers mailing list
> [EMAIL PROTECTED]
> https://lists.sourceforge.net/lists/listinfo/nutch-developers
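
For what it's worth, here is a rough sketch of the clone-then-prohibit idea quoted above. It is not the actual Nutch code: SketchQuery is a made-up stand-in for Query, and excludeHosts and the "default" field name are hypothetical names used only for illustration. The only piece taken from the message above is the addProhibitedTerm("site", host) call. The point is that the clone gets its own clause lists, so the user's original query is left untouched when we re-query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: shows cloning a query and prohibiting already-seen hosts.
class HostGroupingSketch {

    // Hypothetical helper: returns a copy of the query that excludes the given hosts.
    static SketchQuery excludeHosts(SketchQuery original, List<String> hosts) {
        SketchQuery requery = original.clone();
        for (String host : hosts) {
            // Same call shape as proposed in the quoted message.
            requery.addProhibitedTerm("site", host);
        }
        return requery;
    }

    public static void main(String[] args) {
        SketchQuery query = new SketchQuery();
        query.addRequiredTerm("default", "nutch");

        SketchQuery requery =
            excludeHosts(query, Arrays.asList("a.example.com", "b.example.com"));

        System.out.println("original: " + query);
        System.out.println("re-query: " + requery);
    }
}

// Made-up stand-in for Query; the real class has a different structure.
final class SketchQuery implements Cloneable {
    private List<String> required = new ArrayList<>();
    private List<String> prohibited = new ArrayList<>();

    void addRequiredTerm(String field, String term) {
        required.add(field + ":" + term);
    }

    void addProhibitedTerm(String field, String term) {
        prohibited.add(field + ":" + term);
    }

    // Copies the clause lists so changes to the clone do not leak back.
    @Override
    public SketchQuery clone() {
        try {
            SketchQuery copy = (SketchQuery) super.clone();
            copy.required = new ArrayList<>(required);
            copy.prohibited = new ArrayList<>(prohibited);
            return copy;
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen, the class is Cloneable
        }
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (String r : required) sb.append('+').append(r).append(' ');
        for (String p : prohibited) sb.append('-').append(p).append(' ');
        return sb.toString().trim();
    }
}
```

In search.jsp the host list would presumably be the hosts that have already filled their quota of hits on the current page, so each re-query only has to fetch details for hosts not yet shown.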