Re: Canonicalization of URLs on our website

2024-04-20 Thread Piotr P. Karwasz
Hi Gary,

On Sun, 21 Apr 2024 at 00:02, Gary Gregory wrote:
>
> I agree with Piotr. I prefer the simplest solution, pointing to
> `index.html`, no guessing required.

Personally I prefer the shortest one:

* no www,
* no `index.html`,
* no `.html`.

Piotr


Re: Canonicalization of URLs on our website

2024-04-20 Thread Gary Gregory
I agree with Piotr. I prefer the simplest solution, pointing to
`index.html`, no guessing required.

Gary

On Sat, Apr 20, 2024 at 4:17 PM Piotr P. Karwasz wrote:
>
> Hi,
>
> I scanned our https://logging.apache.org/ website and found out that
> the internal hyperlinks between our pages are not consistent. For
> example links to:
>
> https://logging.apache.org/log4j/2.x/
>
> might appear in hyperlinks with a URI path of:
>
> * `/log4j/2.x` (which causes a 301 HTTP redirect),
> * `/log4j/2.x/`,
> * `/log4j/2.x/index.html`.
>
> This lack of uniformity can cause several problems:
>
> * search engines might treat those 3 links as equivalent, but this is not guaranteed,
> * if an `index.html` file is moved, we need to provide a redirect for
> all 3 alternatives: a recent example is
> `/log4j/2.x/log4j-1.2-api/index.html` that was moved to
> `/log4j2/2.x/log4j-1.2-api.html`.
> * for the rare people that actually look at the URL of a page, it
> doesn't seem coherent.
>
> So I propose that we adopt only one of the 3 alternatives and stick
> to it as much as possible. Which one should we choose?
>
> The simplest one (`/log4j/2.x/index.html`) does not require a Web
> server: it can be viewed locally using the `file:` scheme in a
> browser. However, I find it less elegant than `/log4j/2.x/`.
> Antora is probably able to generate both versions through some
> configuration option, so choosing `/log4j/2.x/` does not preclude the
> possibility to generate a different version to check the web site
> locally.
>
> Another canonicalization we might apply regards trailing `.html`
> extensions in the URL. The current website supports both:
>
> * `/log4j2/log4j-api`,
> * `/log4j2/log4j-api.html`.
>
> through `mod_negotiation`. Should we use the version with a trailing
> `.html` or without it? The `https://apache.org/` website hides the
> `.html` extension in most of the links.
>
> Piotr


Canonicalization of URLs on our website

2024-04-20 Thread Piotr P. Karwasz
Hi,

I scanned our https://logging.apache.org/ website and found out that
the internal hyperlinks between our pages are not consistent. For
example links to:

https://logging.apache.org/log4j/2.x/

might appear in hyperlinks with a URI path of:

* `/log4j/2.x` (which causes a 301 HTTP redirect),
* `/log4j/2.x/`,
* `/log4j/2.x/index.html`.
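
As an illustration of how such inconsistencies could be detected, here is a minimal Python sketch. It is not the script used for the scan; the names `HrefCollector` and `group_spellings` are hypothetical, and it only looks at trailing slashes and a trailing `index.html`:

```python
from collections import defaultdict
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def group_spellings(html: str) -> dict[str, set[str]]:
    """Group hrefs that differ only by a trailing slash or `index.html`."""
    collector = HrefCollector()
    collector.feed(html)
    groups: dict[str, set[str]] = defaultdict(set)
    for href in collector.hrefs:
        key = href
        if key.endswith("/index.html"):
            key = key[: -len("index.html")]
        key = key.rstrip("/")
        groups[key].add(href)
    # Keep only targets that are linked with more than one spelling.
    return {k: v for k, v in groups.items() if len(v) > 1}
```

Feeding it a page that links to all three spellings above would report them under the single key `/log4j/2.x`.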

This lack of uniformity can cause several problems:

* search engines might treat those 3 links as equivalent, but this is not guaranteed,
* if an `index.html` file is moved, we need to provide a redirect for
all 3 alternatives: a recent example is
`/log4j/2.x/log4j-1.2-api/index.html` that was moved to
`/log4j2/2.x/log4j-1.2-api.html`.
* for the rare people that actually look at the URL of a page, it
doesn't seem coherent.

So I propose that we adopt only one of the 3 alternatives and stick
to it as much as possible. Which one should we choose?

The simplest one (`/log4j/2.x/index.html`) does not require a Web
server: it can be viewed locally using the `file:` scheme in a
browser. However, I find it less elegant than `/log4j/2.x/`.
Antora is probably able to generate both versions through some
configuration option, so choosing `/log4j/2.x/` does not preclude the
possibility to generate a different version to check the web site
locally.
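
To make the choice concrete, here is a rough Python sketch of rewriting a directory link to a single spelling. It says nothing about how Antora does this; the function `canonicalize`, its `directories` argument and the `style` values are assumptions made for the example:

```python
def canonicalize(path: str, directories: set[str], style: str = "slash") -> str:
    """Map the three spellings of a directory link to a single one.

    `directories` is a hypothetical set of site directories written
    without a trailing slash, e.g. {"/log4j/2.x"}.
    `style` is "slash" for `/log4j/2.x/` or "index" for `/log4j/2.x/index.html`.
    """
    if path.endswith("/index.html"):
        base = path[: -len("/index.html")]
    else:
        base = path.rstrip("/")
    if base not in directories:
        # Not a known directory link: leave it untouched.
        return path
    return base + ("/" if style == "slash" else "/index.html")
```

With `style="slash"` the site links use `/log4j/2.x/`, while `style="index"` yields the `file:`-friendly variant for local checks.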

Another canonicalization we might apply regards trailing `.html`
extensions in the URL. The current website supports both:

* `/log4j2/log4j-api`,
* `/log4j2/log4j-api.html`.

through `mod_negotiation`. Should we use the version with a trailing
`.html` or without it? The `https://apache.org/` website hides the
`.html` extension in most of the links.
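
If we go without the extension, the link rewrite itself is small. A hedged Python sketch follows; `strip_html_extension` is a hypothetical helper, and leaving `index.html` alone is an assumption, since that case is a separate decision:

```python
def strip_html_extension(path: str) -> str:
    """Drop a trailing `.html`, except for `index.html` pages."""
    if path.endswith("/index.html") or not path.endswith(".html"):
        return path
    return path[: -len(".html")]
```

For example, `/log4j2/log4j-api.html` becomes `/log4j2/log4j-api`, which the server would still have to resolve (today through `mod_negotiation`).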

Piotr