Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change 
notification.

The "RedirectHandling" page has been changed by LewisJohnMcgibbney:
http://wiki.apache.org/nutch/RedirectHandling?action=diff&rev1=2&rev2=3

  = Redirect handling in Nutch =
  This page is under construction but when completed will provide a comprehensive 
overview of redirect handling in Apache Nutch.
  
- To begin with, we really want to define what HTTP URL redirects are, what 
types of problems they present for crawlers, and finally what Nutch does to 
address some of the areas. By the end of this tutorial, we should have 
addressed the complex and rather confusing area of redirects. For a whirlwind 
tour of this page please see the Table of Contents.
+ To begin with, we really want to define what HTTP URL redirects are, what 
types of problems they present for crawlers, and finally what Nutch does to 
address some of these problems. By the end of this tutorial, we should have 
addressed the complex and rather confusing area of redirects. For a whirlwind 
tour of this page please see the Table of Contents below.
  
  <<TableOfContents(3)>>
  
-  
+ == Introduction ==
+ URL redirects, as they are most commonly known (and hereafter referred to in 
this document), at a high level play the role of temporarily or permanently 
redirecting an HTTP response recipient to a location other than the request 
URI. By doing this, it is possible to ''easily'' direct browsers, web crawlers, 
and subsequently users to your preferred domain (well, this is true in theory 
anyway).
  
+ Some typical reasons for implementing URL redirects (all courtesy of 
Wikipedia):
+ 
+  * [[http://en.wikipedia.org/wiki/URL_redirection#Similar_domain_names|Similar domain names]]
+  * [[http://en.wikipedia.org/wiki/URL_redirection#Moving_a_site_to_a_new_domain|Moving a site to a new domain]]
+  * [[http://en.wikipedia.org/wiki/URL_redirection#Logging_outgoing_links|Logging outgoing links]]
+  * [[http://en.wikipedia.org/wiki/URL_redirection#Short_aliases_for_long_URLs|Short aliases for long URLs]]
+  * [[http://en.wikipedia.org/wiki/URL_redirection#Meaningful.2C_persistent_aliases_for_long_or_changing_URLs|Meaningful, persistent aliases for long or changing URLs]]
+  * [[http://en.wikipedia.org/wiki/URL_redirection#Manipulating_search_engines|Manipulating search engines]]
+  * [[http://en.wikipedia.org/wiki/URL_redirection#Satire_and_criticism|Satire and criticism]]
+  * [[http://en.wikipedia.org/wiki/URL_redirection#Manipulating_visitors|Manipulating visitors]]
+  * [[http://en.wikipedia.org/wiki/URL_redirection#Removing_referer_information|Removing referer information]]
+ 
