At what point does the URL normalization happen?
Example: I give the fetcher 100 starting URLs, and it gets the 100 pages, which contain another 1000 URLs. Based on the rules specified in the RegexUrlNormalizer, all of those URLs will be modified.
Does the WebDB hold the normalized URLs or the raw URLs? Or does it normalize each URL just before doing a fetch?
Thanks for taking the time to answer.
CC-
URL normalization occurs when new URLs are injected or found by crawling. So, prior to being stored in WebDB, the URLs are normalized.
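To make the ordering concrete, here is a minimal sketch in plain Java. It is not Nutch's code: the class UrlStoreSketch, its methods, and the two regex rules are all hypothetical, and a simple in-memory set stands in for the WebDB. It only illustrates the idea that every URL, whether injected or discovered in a fetched page, passes through the normalizer before it is stored, so the store never sees the raw form.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical stand-in for "normalize, then store". Not Nutch's actual classes.
class UrlStoreSketch {

    // One rewrite rule: a regex and the text that replaces each match.
    static final class Rule {
        final Pattern pattern;
        final String substitution;

        Rule(String regex, String substitution) {
            this.pattern = Pattern.compile(regex);
            this.substitution = substitution;
        }
    }

    private final List<Rule> rules = new ArrayList<>();
    // In-memory stand-in for the URL database (an assumption for illustration).
    private final Set<String> store = new LinkedHashSet<>();

    void addRule(String regex, String substitution) {
        rules.add(new Rule(regex, substitution));
    }

    // Apply every rule in order to the raw URL.
    String normalize(String rawUrl) {
        String result = rawUrl;
        for (Rule rule : rules) {
            result = rule.pattern.matcher(result).replaceAll(rule.substitution);
        }
        return result;
    }

    // Normalize first, then store. Returns false if the normalized URL is already known.
    boolean add(String rawUrl) {
        return store.add(normalize(rawUrl));
    }

    Set<String> stored() {
        return store;
    }

    public static void main(String[] args) {
        UrlStoreSketch db = new UrlStoreSketch();
        db.addRule(";jsessionid=[0-9A-Za-z]+", ""); // example rule: drop session ids
        db.addRule("#.*$", "");                     // example rule: drop fragments

        // An injected URL and a URL found while crawling take the same path.
        db.add("http://example.com/a#top");
        db.add("http://example.com/a;jsessionid=ABC123");

        // Both raw URLs normalize to the same string, so only one entry is stored.
        System.out.println(db.stored());
    }
}
```

One consequence of normalizing before storage, at least in this sketch, is that two raw URLs that differ only in a session id or fragment collapse into a single entry, so the page would be fetched once rather than twice.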
Luke Baker
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
