Re: Canonicalization of URLs on our website
Hi Gary,

On Sun, 21 Apr 2024 at 00:02, Gary Gregory wrote:
>
> I agree with Piotr. I prefer the simplest solution, pointing to
> `index.html`, no guessing required.

Personally, I prefer the shortest one:

* no www,
* no `index.html`,
* no `.html`.

Piotr
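To make the "shortest form" rule concrete, here is a minimal Python sketch. `canonical_url` is a hypothetical helper, not part of Antora or of the site build. It assumes directory URLs keep their trailing slash, because `/log4j/2.x` is answered with a 301 redirect to `/log4j/2.x/`. It leaves extension-less paths without a trailing slash untouched, since telling a directory from a page would require knowing the site layout.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_url(url: str) -> str:
    """Rewrite a URL into the 'shortest' form: no www, no index.html, no .html.

    Hypothetical helper for illustration only. Directory URLs keep their
    trailing slash; paths without a suffix are returned unchanged.
    """
    parts = urlsplit(url)

    # Rule 1: drop a leading "www." from the host name.
    host = parts.netloc
    if host.startswith("www."):
        host = host[len("www."):]

    path = parts.path
    if path.endswith("/index.html"):
        # Rule 2: ".../index.html" -> ".../" (the trailing slash is kept).
        path = path[: -len("index.html")]
    elif path.endswith(".html"):
        # Rule 3: ".../page.html" -> ".../page" (served via content negotiation).
        path = path[: -len(".html")]

    # Query string and fragment are preserved as-is.
    return urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(canonical_url("https://logging.apache.org/log4j/2.x/index.html"))
    # https://logging.apache.org/log4j/2.x/
    print(canonical_url("https://logging.apache.org/log4j2/log4j-api.html"))
    # https://logging.apache.org/log4j2/log4j-api
```

Whichever form is chosen, the same function shape works: only the three rewrite rules change.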
Re: Canonicalization of URLs on our website
I agree with Piotr. I prefer the simplest solution, pointing to `index.html`, no guessing required.

Gary

On Sat, Apr 20, 2024 at 4:17 PM Piotr P. Karwasz wrote:
>
> Hi,
>
> I scanned our https://logging.apache.org/ website and found out that
> the internal hyperlinks between our pages are not consistent.
> [...]
> So I would propose to adopt only one of the 3 alternatives and stick
> to it as much as possible? Which one should we choose?
>
> The simplest one (`/log4j/2.x/index.html`) does not require a Web
> server and can be viewed locally and can be viewed using the `file:`
> scheme in a browser. However I find it less elegant than
> `/log4j/2.x/`.
> [...]
Canonicalization of URLs on our website
Hi,

I scanned our https://logging.apache.org/ website and found that the internal hyperlinks between our pages are not consistent. For example, links to:

https://logging.apache.org/log4j/2.x/

might appear in hyperlinks with a URI path of:

* `/log4j/2.x` (which causes a 301 HTTP redirect),
* `/log4j/2.x/`,
* `/log4j/2.x/index.html`.

This lack of uniformity can cause several problems:

* search engines might treat those 3 links as equivalent, but they are not guaranteed to.
* if an `index.html` file is moved, we need to provide a redirect for all 3 alternatives: a recent example is `/log4j/2.x/log4j-1.2-api/index.html`, which was moved to `/log4j2/2.x/log4j-1.2-api.html`.
* for the rare people who actually look at the URL of a page, it doesn't look coherent.

So I propose that we adopt only one of the 3 alternatives and stick to it as much as possible. Which one should we choose?

The simplest one (`/log4j/2.x/index.html`) does not require a Web server: the pages can be viewed locally in a browser using the `file:` scheme. However, I find it less elegant than `/log4j/2.x/`. Antora is probably able to generate both versions through some configuration option, so choosing `/log4j/2.x/` does not preclude generating a different version to check the website locally.

Another canonicalization we might apply concerns trailing `.html` extensions in the URL. The current website supports both:

* `/log4j2/log4j-api`,
* `/log4j2/log4j-api.html`,

through `mod_negotiation`. Should we use the version with a trailing `.html` or without it? The `https://apache.org/` website hides the `.html` extension in most of the links.

Piotr
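A scan like the one described above could be approximated with a short Python sketch. It assumes the internal `href` values have already been extracted from the generated HTML by some crawler (not shown). `page_key` and `inconsistent_links` are hypothetical names. The key-folding is a heuristic: it treats `/x`, `/x/`, `/x/index.html` and `/x.html` as the same page. That matches the variants discussed in this thread, but it would also conflate a directory `x/` with a sibling page `x.html`.

```python
from collections import defaultdict
from collections.abc import Iterable
from urllib.parse import urlsplit


def page_key(href: str) -> str:
    """Fold the spellings discussed in this thread into one key (heuristic)."""
    path = urlsplit(href).path
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]  # "/x/index.html" -> "/x/"
    elif path.endswith(".html"):
        path = path[: -len(".html")]       # "/x.html" -> "/x"
    return path.rstrip("/") or "/"         # "/x/" -> "/x"; the root stays "/"


def inconsistent_links(hrefs: Iterable[str]) -> dict[str, list[str]]:
    """Return {page key: sorted spellings} for pages linked in more than one way."""
    spellings: dict[str, set[str]] = defaultdict(set)
    for href in hrefs:
        # Only the path matters here; query strings and fragments are ignored.
        spellings[page_key(href)].add(urlsplit(href).path)
    return {key: sorted(paths) for key, paths in spellings.items() if len(paths) > 1}


if __name__ == "__main__":
    found = [
        "/log4j/2.x",
        "/log4j/2.x/",
        "/log4j/2.x/index.html",
        "/log4j2/log4j-api.html",
        "/log4j2/log4j-api",
        "/log4j/2.x/manual/",
    ]
    for key, variants in inconsistent_links(found).items():
        print(key, variants)
```

Only pages reached through two or more spellings are reported, so `/log4j/2.x/manual/` above does not appear in the output.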