On 8/2/07, Emmanuel <[EMAIL PROTECTED]> wrote:
> I've got a simple question: why do we normalize every single outlink in the
> constructor of the Outlink object? It involves creating many URLNormalizer
> objects.
>
> We could just add the normalizer in ParseOutputFormat, just before the filter,
> which would limit the number of instantiations.
> Don't you think? Or did I miss something?
>


I am not sure, but I think the idea is to make the Outlink class useful
outside of ParseOutputFormat (so that if you use Outlink without
ParseOutputFormat, you would still end up with a normalized URL).

However, this minor advantage is far outweighed by the fact that we
recreate URLNormalizers for every outlink (and, if you have an
ordering on your normalizers, re-order them *every* *single* time),
so overall, moving normalization into ParseOutputFormat seems like a good
idea to me. (While we are at it, perhaps we can also stop creating
a ParseUtil instance for every ParseSegment.map call, even though that has a
smaller overhead.)
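
To make the cost difference concrete, here is a rough, self-contained sketch.
It does not use the Nutch classes: CountingNormalizer is a made-up stand-in for
URLNormalizers that only counts how often it is constructed, and its
normalize() rule (trim, drop the #fragment) is a toy, not what Nutch does.
The two static methods are likewise hypothetical; they only contrast
"one normalizer per outlink" with "one normalizer built by the caller".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class OutlinkNormalizationSketch {

    /** Hypothetical stand-in for URLNormalizers; counts constructions. */
    static final class CountingNormalizer {
        static final AtomicInteger INSTANCES = new AtomicInteger();

        CountingNormalizer() {
            // Assumption: this is where the expensive setup
            // (loading and ordering normalizers) would happen.
            INSTANCES.incrementAndGet();
        }

        /** Toy rule: trim whitespace and drop any "#fragment" suffix. */
        String normalize(String url) {
            String trimmed = url.trim();
            int hash = trimmed.indexOf('#');
            return hash >= 0 ? trimmed.substring(0, hash) : trimmed;
        }
    }

    /** Per-outlink construction: one normalizer built for each URL. */
    static List<String> normalizePerOutlink(List<String> urls) {
        List<String> out = new ArrayList<>();
        for (String url : urls) {
            out.add(new CountingNormalizer().normalize(url));
        }
        return out;
    }

    /** Caller-side normalization: one normalizer reused for all URLs. */
    static List<String> normalizeShared(List<String> urls) {
        CountingNormalizer shared = new CountingNormalizer();
        List<String> out = new ArrayList<>();
        for (String url : urls) {
            out.add(shared.normalize(url));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                " http://a.example/x#top ",
                "http://b.example/",
                "http://c.example/y#z");

        int start = CountingNormalizer.INSTANCES.get();
        normalizePerOutlink(urls);
        int perOutlink = CountingNormalizer.INSTANCES.get() - start;

        start = CountingNormalizer.INSTANCES.get();
        normalizeShared(urls);
        int shared = CountingNormalizer.INSTANCES.get() - start;

        System.out.println("normalizers built per outlink: " + perOutlink);
        System.out.println("normalizers built when shared: " + shared);
    }
}
```

Both methods return the same normalized URLs; only the number of normalizer
instances differs, which is the overhead I am talking about. The same
build-once-and-reuse pattern is what I have in mind for ParseUtil.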

-- 
Doğacan Güney
_______________________________________________
Nutch-general mailing list
Nutch-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-general
