Andrzej Bialecki wrote:
> [EMAIL PROTECTED] wrote:
>> What is the best way to accomplish this?
>>
>> One thing I was thinking was to index the staging site, then open up
>> CrawlDb and LinkDb (any others?), loop through them and write out a
>> new version of those files, changing the keys (URLs) along the way,
>> for instance from http://STAGING.example.com/foo/bar.html to
>> http://WWW.example.com/foo/bar.html
>>
>> Has anyone done this? Does this sound realistic/doable?
>> Is there a better/faster/easier way?
>> e.g. changing URLs immediately at fetch/parse/index time?
>> e.g. changing URLs on the fly at search time when displaying results?
>
> There is another option - when fetching configure nutch to use a URL
> rewriting proxy, which will rewrite on the fly your requests of
> www.example.com to staging.example.com, get the response, and return the
> content - the only thing to do then would be to rewrite absolute
> outlinks contained in the content, from staging to www - but this can be
> done in URLNormalizers.
>
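To make the outlink rewriting quoted above concrete, here is a minimal
sketch of the host mapping itself. It is plain Python, not Nutch's
URLNormalizer plugin API; the function name rewrite_host and the HOST_MAP
table are hypothetical, and the hostnames are just the ones from the
example in this thread. A real normalizer would apply the same mapping
to each absolute outlink.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping (an assumption for illustration); the hostnames
# are the ones used as examples earlier in this thread.
HOST_MAP = {"staging.example.com": "www.example.com"}


def rewrite_host(url, host_map=HOST_MAP):
    """Return url with its hostname replaced according to host_map.

    URLs whose host is not in the map are returned unchanged.
    """
    parts = urlsplit(url)
    # hostname is already lowercased by urlsplit; it is None for relative URLs
    host = parts.hostname or ""
    target = host_map.get(host)
    if target is None:
        return url
    # Keep an explicit port if the original URL had one
    netloc = target if parts.port is None else f"{target}:{parts.port}"
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


if __name__ == "__main__":
    print(rewrite_host("http://STAGING.example.com/foo/bar.html"))
```

Relative outlinks have no host and pass through untouched, which matches
the point above that only absolute outlinks need rewriting.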
You could also let your reverse proxy do the rewriting, using something
like mod_proxy_html (http://apache.webthing.com/mod_proxy_html/). I have
been using something like that to rewrite massive amounts of HTML in
real time for AA purposes, to hammer web applications into a different
URL space.

--
 Sami Siren

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
