Dear Wiki user, You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.
The "RedirectHandling" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/RedirectHandling?action=diff&rev1=2&rev2=3

  = Redirect handling in Nutch =
  This page is in construction but when completed will provide a comprehensive overview of redirect handling in Apache Nutch.

- To begin with, we really want to define what HTTP URL redirects are, what types of problems they present for crawlers, and finally what Nutch does to address some of the areas. By the end of this tutorial, we should have addressed the complex and rather confusing area of redirects. For a whirlwind tour of this page please see the Table of Contents.
+ To begin with, we want to define what HTTP URL redirects are, what types of problems they present for crawlers, and finally what Nutch does to address some of these problems. By the end of this tutorial, we should have addressed the complex and rather confusing area of redirects. For a whirlwind tour of this page, please see the Table of Contents below.

  <<TableOfContents(3)>>

- 
+ == Introduction ==
+ URL redirects, as they are most commonly known (and as they are referred to from here on in this document), at a high level temporarily or permanently redirect the recipient of an HTTP response to a location other than the request URI. This makes it possible to ''easily'' direct browsers, web crawlers, and subsequently users to your preferred domain (in theory, at least).
+ Some typical reasons for implementing URL redirects (all courtesy of Wikipedia):
+ 
+  * [[http://en.wikipedia.org/wiki/URL_redirection#Similar_domain_names|Similar domain names]]
+  * [[http://en.wikipedia.org/wiki/URL_redirection#Moving_a_site_to_a_new_domain|Moving a site to a new domain]]
+  * [[http://en.wikipedia.org/wiki/URL_redirection#Logging_outgoing_links|Logging outgoing links]]
+  * [[http://en.wikipedia.org/wiki/URL_redirection#Short_aliases_for_long_URLs|Short aliases for long URLs]]
+  * [[http://en.wikipedia.org/wiki/URL_redirection#Meaningful.2C_persistent_aliases_for_long_or_changing_URLs|Meaningful, persistent aliases for long or changing URLs]]
+  * [[http://en.wikipedia.org/wiki/URL_redirection#Manipulating_search_engines|Manipulating search engines]]
+  * [[http://en.wikipedia.org/wiki/URL_redirection#Satire_and_criticism|Satire and criticism]]
+  * [[http://en.wikipedia.org/wiki/URL_redirection#Manipulating_visitors|Manipulating visitors]]
+  * [[http://en.wikipedia.org/wiki/URL_redirection#Removing_referer_information|Removing referer information]]
+ 
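The introduction distinguishes temporary from permanent redirects to a location other than the request URI. As a rough illustration only, the following Java sketch classifies an HTTP status code and resolves a `Location` header value against the request URI. The class `RedirectSketch`, its methods, and the `example.com` URLs are hypothetical names chosen for this example; this is not Nutch's own code or API, and treating 303 as temporary is a simplifying assumption.

```java
import java.net.URI;

// Hypothetical helper for illustration only; not part of the Nutch API.
final class RedirectSketch {

    enum Kind { PERMANENT, TEMPORARY, NOT_A_REDIRECT }

    // Classifies an HTTP status code by the kind of redirect it signals.
    static Kind classify(int status) {
        switch (status) {
            case 301: // Moved Permanently
            case 308: // Permanent Redirect
                return Kind.PERMANENT;
            case 302: // Found
            case 303: // See Other (assumed temporary for this sketch)
            case 307: // Temporary Redirect
                return Kind.TEMPORARY;
            default:
                return Kind.NOT_A_REDIRECT;
        }
    }

    // Resolves a Location header value, which may be relative,
    // against the URI that was originally requested.
    static URI resolveTarget(String requestUri, String location) {
        return URI.create(requestUri).resolve(location);
    }

    public static void main(String[] args) {
        System.out.println(classify(301));
        System.out.println(
            resolveTarget("http://example.com/old/page.html", "/new/page.html"));
    }
}
```

A crawler would typically use the classification to decide whether to update its stored URL (permanent) or keep the original one and only fetch the target (temporary); how Nutch itself makes that decision is not shown here.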