On 28 Feb 2012, at 14:57, Rich Bowen wrote:

>> That's what robots.txt is for! Surely we can use that to stop indexing 2.0
>> as well as 1.3? Maybe even 2.2 once 2.4 is windows-ready and in the distros?
>
> The rel canonical thing is a way to actively update the Google index for a
> particular page and search term, and has been very effective in updating
> certain searches. For example, searching Google for "rewriterule" has long
> given the 1.3 Rewrite Guide, but within 24 hours of adding a rel canonical
> tag, it started pointing to the 2.2 mod_rewrite docs as the top hit.
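For anyone following along, the rel canonical mechanism Rich describes is a tag in the <head> of the old page pointing at the preferred replacement. A sketch of what the tag on a 1.3 page might look like (the target URL here is my assumption about the docs layout, not copied from the live site):

    <link rel="canonical" href="http://httpd.apache.org/docs/2.2/rewrite/" />

The same hint can also be sent as an HTTP response header, which is handy for non-HTML resources:

    Link: <http://httpd.apache.org/docs/2.2/rewrite/>; rel="canonical"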
I agree with Nick. Why not change http://httpd.apache.org/robots.txt so that the 1.3 documents are no longer crawled?

If I wanted to go through each page to make more fine-grained changes, I'd only end up adding:

    <meta name="robots" content="noindex">

…which does almost exactly the same thing, for more effort.

The ASF doesn't really need extra help getting the top Google / Bing / whatever hit for "httpd", "Apache" etc. That's why most people use Link: … rel="canonical": they want to preserve their PageRank. But this discussion has been about the 1.3 docs having *too much* PageRank.

I can spot one downside: excluding a document with robots.txt also blocks access to historical versions via web.archive.org. Is this important?

-- 
Tim Bannister – [email protected]
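P.S. To make the robots.txt suggestion concrete, a minimal sketch of the rules (the path prefixes are my guess at how the versioned docs are laid out on httpd.apache.org; adjust to the real paths):

    User-agent: *
    Disallow: /docs/1.3/
    Disallow: /docs/2.0/

Note that a robots.txt Disallow stops crawling outright, which is exactly why well-behaved archive crawlers such as the Internet Archive's would also stop fetching those pages.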
