I’m actually OK with them being indexed. It could be helpful to search for 
“Solr 8.11 aliases” or something like that.

The priority attribute in sitemap.xml should boost the default, latest manual 
and that shouldn’t require any web server config. I’m glad to craft a static 
sitemap.xml file. One generated from the guide would be better, but that can be 
a later improvement.
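
As a rough sketch of what a hand-built sitemap could look like, here is a small Python script that writes one. The priority values and the second URL (an old 8.11 page) are assumptions for illustration only; only the latest-guide URL comes from this thread. The `<priority>` value is a hint to crawlers, not a guarantee.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML for an iterable of (url, priority) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, priority in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # Priority ranges from 0.0 to 1.0; higher means "preferred".
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    # Hypothetical entries: latest guide high, an old version low.
    pages = [
        ("https://solr.apache.org/guide/solr/latest/index.html", 1.0),
        ("https://solr.apache.org/guide/8_11/aliases.html", 0.1),
    ]
    print(build_sitemap(pages))
```

The output could be saved as sitemap.xml at the site root. Generating the entry list from the guide itself would replace the hard-coded `pages` list.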

To get the old versions completely out of the index, add a robots.txt file to 
the solr-site repo under contents/ with these lines:

User-agent: *
Disallow: /guide/8*
Disallow: /guide/7*
Disallow: /guide/6*

Note that the wildcards on the paths aren't needed, but they help humans 
understand that the disallows are a prefix match.
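
To illustrate the prefix matching, here is a minimal check using Python's standard urllib.robotparser. As far as I know that parser does plain prefix matching and does not interpret `*` in paths, so this sketch writes the same rules without the trailing wildcards. The sample paths (8_11, 6_6) are assumed examples of old guide URLs.

```python
from urllib.robotparser import RobotFileParser

# Same rules as above, minus the trailing wildcards.
RULES = """\
User-agent: *
Disallow: /guide/8
Disallow: /guide/7
Disallow: /guide/6
"""


def is_allowed(path):
    """Return True if a generic crawler may fetch the given site path."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch("*", "https://solr.apache.org" + path)


if __name__ == "__main__":
    for path in (
        "/guide/solr/latest/index.html",  # latest guide, from this thread
        "/guide/8_11/aliases.html",       # assumed old-version path
        "/guide/6_6/index.html",          # assumed old-version path
    ):
        print(path, "allowed" if is_allowed(path) else "disallowed")
```

Each Disallow line blocks every path that starts with the given string, so /guide/8 covers all of the 8.x guides.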

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Sep 21, 2023, at 12:08 PM, Houston Putman <hous...@apache.org> wrote:
> 
> I've been trying to get this working for the last year. Basically our issue
> is that the htaccess files do not add the right X-Robots-Tag header for old
> ref guide pages.
> 
> https://github.com/apache/solr-site/blob/main/themes/solr/templates/htaccess.ref-guide-old#L1
> 
> This works locally, but in the actual Solr site, the headers are not
> returned. I have no idea why. Would love some help though, as I also hate
> seeing the old ref guide in the google results.
> 
> - Houston
> 
> On Thu, Sep 21, 2023 at 11:30 AM Walter Underwood <wun...@wunderwood.org>
> wrote:
> 
>> When I get web search results that include the Solr Reference Guide, I
>> often get older versions (6.6, 7.4) in the results. I would prefer to
>> always get the latest reference (
>> https://solr.apache.org/guide/solr/latest/index.html).
>> 
>> I think we can list the URLs for that in a sitemap.xml file with a higher
>> priority to suggest to the crawlers that these are the preferred pages.
>> 
>> I don’t see a sitemap.xml or sitemap.xml.gz at https://solr.apache.org <
>> https://solr.apache.org/>.
>> 
>> Should we prefer the latest manual? How do we build/deploy a sitemap? See:
>> https://www.sitemaps.org/
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
