On 8/2/07, Emmanuel <[EMAIL PROTECTED]> wrote:
> I've got a simple question: why do we normalize each single outlink in the
> constructor of the object? It involves the creation of many URLNormalizer
> objects.
>
> We could just add the normalizer in ParseOutputFormat just before the filter,
> and it would limit the number of instantiations.
> Don't you think? Or did I miss something?
>
I am not sure, but I think the idea is to make the Outlink class useful
outside of ParseOutputFormat (so that if you use Outlink without
ParseOutputFormat, you still end up with a normalized URL). However, this
minor advantage is hugely offset by the fact that we are recreating
URLNormalizers for every outlink (and, if you have an ordering on your
normalizers, re-ordering them *every* *single* time). So overall, moving the
normalization into ParseOutputFormat seems like a good idea to me.

While we are doing that, perhaps we can also stop creating a ParseUtil
instance for every ParseSegment.map call (even though it has a smaller
overhead).

--
Doğacan Güney
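
P.S. To make the cost argument concrete, here is a rough sketch of the two patterns. It does not use the actual Nutch classes: CountingNormalizer is a hypothetical stand-in for an expensive-to-build normalizer chain, and the method names are made up for illustration. It only counts how many instances each approach creates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NormalizerReuseSketch {

    /** Hypothetical stand-in for a normalizer chain that is costly to build. */
    static final class CountingNormalizer {
        static int instancesCreated = 0;

        CountingNormalizer() {
            // In the real case this is where plugins would be loaded and ordered.
            instancesCreated++;
        }

        /** Toy normalization (assumption): just trims surrounding whitespace. */
        String normalize(String url) {
            return url.trim();
        }
    }

    /** Pattern discussed above: every outlink builds its own normalizer. */
    static List<String> normalizePerOutlink(List<String> urls) {
        List<String> out = new ArrayList<>();
        for (String url : urls) {
            out.add(new CountingNormalizer().normalize(url));
        }
        return out;
    }

    /** Proposed pattern: the caller builds one normalizer and reuses it. */
    static List<String> normalizeShared(List<String> urls) {
        CountingNormalizer shared = new CountingNormalizer();
        List<String> out = new ArrayList<>();
        for (String url : urls) {
            out.add(shared.normalize(url));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList(
                " http://example.com/a ", "http://example.com/b", "http://example.com/c ");

        CountingNormalizer.instancesCreated = 0;
        normalizePerOutlink(urls);
        System.out.println("per-outlink instances: " + CountingNormalizer.instancesCreated);

        CountingNormalizer.instancesCreated = 0;
        normalizeShared(urls);
        System.out.println("shared instances: " + CountingNormalizer.instancesCreated);
    }
}
```

With three outlinks the first variant builds three normalizers and the second builds one; both return the same normalized URLs. The trade-off is the one mentioned above: code that creates outlinks without going through the shared path has to normalize on its own.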