Andrzej Bialecki wrote:
> [EMAIL PROTECTED] wrote:
>> What is the best way to accomplish this?
>>
>> One thing I was thinking was to index the staging site, then open up
>> CrawlDb and LinkDb (any others?), loop through them and write out a
>> new version of those files, changing the keys (URLs) along the way,
>> for instance from http://STAGING.example.com/foo/bar.html to
>> http://WWW.example.com/foo/bar.html
>>
>> Has anyone done this?  Does this sound realistic/doable?
>> Is there a better/faster/easier way?
>>   e.g. changing URLs immediately at fetch/parse/index time?
>>   e.g. changing URLs on the fly at search time when displaying results?
> 
> There is another option: when fetching, configure Nutch to use a
> URL-rewriting proxy, which rewrites your requests for www.example.com
> to staging.example.com on the fly, gets the response, and returns the
> content. The only thing left to do would then be to rewrite absolute
> outlinks contained in the content from staging to www, but this can be
> done in URLNormalizers.
> 

You could also let your reverse proxy do the rewriting, using something
like http://apache.webthing.com/mod_proxy_html/. I have been using
something like that to rewrite massive amounts of HTML in real time for
AA purposes, to hammer web applications into a different URL space.
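
To make the idea concrete, here is a rough sketch in Python of the kind
of host rewriting involved, applied both to a single URL and to the
href/src attributes in a page. It is not mod_proxy_html or Nutch code:
HOST_MAP, rewrite_url and rewrite_links are names made up for
illustration, the hosts are the example ones from this thread, and the
regex only handles quoted attribute values.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping, using the example hosts from this thread.
HOST_MAP = {"staging.example.com": "www.example.com"}


def rewrite_url(url, host_map=HOST_MAP):
    """Swap the host of an absolute URL according to host_map.

    Relative URLs and URLs on unmapped hosts are returned unchanged.
    Assumption: URLs carry no user:password part (it would be dropped).
    """
    parts = urlsplit(url)
    host = parts.hostname  # lowercased, port stripped; None if relative
    if host is None or host not in host_map:
        return url
    netloc = host_map[host]
    if parts.port is not None:
        netloc = f"{netloc}:{parts.port}"  # keep an explicit port
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


# Matches href="..." or src='...' with a quoted value (simplification).
_LINK_ATTR = re.compile(r'''(\b(?:href|src)\s*=\s*)(["'])(.*?)\2''',
                        re.IGNORECASE)


def rewrite_links(html, host_map=HOST_MAP):
    """Rewrite absolute links in quoted href/src attributes of html."""
    def replace(match):
        prefix, quote, value = match.groups()
        return f"{prefix}{quote}{rewrite_url(value, host_map)}{quote}"
    return _LINK_ATTR.sub(replace, html)
```

A real HTML parser would be more robust than the regex here; the point
is only that relative links pass through untouched and only absolute
links on the staging host get moved to www.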

--
 Sami Siren


_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
