The "RedirectHandling" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/RedirectHandling?action=diff&rev1=2&rev2=3
= Redirect handling in Nutch =
This page is under construction but, when completed, will provide a
comprehensive overview of redirect handling in Apache Nutch.
- To begin with, we really want to define what HTTP URL redirects are, what
types of problems they present for crawlers, and finally what Nutch does to
address some of the areas. By the end of this tutorial, we should have
addressed the complex and rather confusing area of redirects. For a whirlwind
tour of this page please see the Table of Contents.
+ To begin with, we really want to define what HTTP URL redirects are, what
types of problems they present for crawlers, and finally what Nutch does to
address some of these problems. By the end of this tutorial, we should have
addressed the complex and rather confusing area of redirects. For a whirlwind
tour of this page please see the Table of Contents below.
<<TableOfContents>>
-
+ == Introduction ==
+ URL redirects, as they are most commonly known (and hereafter referred to in
this document), temporarily or permanently redirect the recipient of an HTTP
response to a location other than the request URI. This makes it possible to
''easily'' direct browsers, web crawlers, and subsequently users to your
preferred domain (well, this is true in theory anyway).
+ Some typical reasons for implementing URL redirects (all courtesy of
Wikipedia):
+
+ * [[http://en.wikipedia.org/wiki/URL_redirection#Similar_domain_names|Similar domain names]]
+ * [[http://en.wikipedia.org/wiki/URL_redirection#Moving_a_site_to_a_new_domain|Moving a site to a new domain]]
+ * [[http://en.wikipedia.org/wiki/URL_redirection#Logging_outgoing_links|Logging outgoing links]]
+ * [[http://en.wikipedia.org/wiki/URL_redirection#Short_aliases_for_long_URLs|Short aliases for long URLs]]
+ * [[http://en.wikipedia.org/wiki/URL_redirection#Meaningful.2C_persistent_aliases_for_long_or_changing_URLs|Meaningful, persistent aliases for long or changing URLs]]
+ * [[http://en.wikipedia.org/wiki/URL_redirection#Manipulating_search_engines|Manipulating search engines]]
+ * [[http://en.wikipedia.org/wiki/URL_redirection#Satire_and_criticism|Satire and criticism]]
+ * [[http://en.wikipedia.org/wiki/URL_redirection#Manipulating_visitors|Manipulating visitors]]
+ * [[http://en.wikipedia.org/wiki/URL_redirection#Removing_referer_information|Removing referer information]]
+
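The temporary versus permanent distinction described in the Introduction can be made concrete with a small sketch. This is not how Nutch itself handles redirects; it is a minimal, standalone illustration that assumes the standard HTTP redirect status codes (301 and 308 permanent; 302, 303 and 307 temporary) and that the new location arrives in a `Location` header, which may be relative to the request URI. The function names `classify_redirect` and `redirect_target`, and the example.com URLs, are hypothetical.

```python
from urllib.parse import urljoin

# Redirect status codes from the HTTP specification that carry a Location header.
PERMANENT = {301, 308}
TEMPORARY = {302, 303, 307}


def classify_redirect(status):
    """Return 'permanent', 'temporary', or None for a non-redirect status."""
    if status in PERMANENT:
        return "permanent"
    if status in TEMPORARY:
        return "temporary"
    return None


def redirect_target(request_url, status, location):
    """Resolve the Location header against the request URL.

    Returns (kind, absolute_url), or None when the response is not a
    redirect or carries no Location header.
    """
    kind = classify_redirect(status)
    if kind is None or not location:
        return None
    # Location may be relative, so resolve it against the request URI.
    return kind, urljoin(request_url, location)


# Hypothetical example: a permanent redirect with a relative Location.
print(redirect_target("http://example.com/old/page.html", 301, "/new/page.html"))
```

A crawler would typically use the returned kind to decide whether to keep the original URL (temporary) or replace it with the target (permanent), though the exact policy is a design choice.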