Hi David,

On Tue, Sep 7, 2021 at 7:32 PM David Jencks <[email protected]> wrote:
>
> I investigated the patch-sitemap.js question a bit.
>
> Using my `issue-16854-jsonpath-options` camel-website branch, I built the 
> site twice, with site:url set to `https://camel.apache.org` and set to `/`.  
> I didn’t look at every page, but diffing the generated sites seems to 
> consistently show two differences between the results:
>
> - with `https://camel.apache.org` a head <link> element is included such as
>
>     <link rel="canonical" 
> href="https://camel.apache.org/components/latest/ironmq-component.html">
>
> It’s omitted as expected with `/`.  My understanding is that this needs to be 
> an absolute URI and that its function is to help search engines.  However, 
> if we don’t want it, it’s trivial to modify the UI to not generate it.

The Hugo-built bits also contain the `<link rel="canonical">`[1]. When
it is used, the recommendation is to use absolute URLs[2]. Now, since we
use 301 redirects and have sitemap(s), I'm not entirely convinced we need
`<link rel="canonical">` at all. I suggest we remove it (from both the
Hugo layout and the Antora UI).

> - with  `https://camel.apache.org` a footer micro data script is plausible 
> rather than meaningless (the `url:`  entry):

I think the URLs in JSON-LD microdata need to be absolute. I can't
find a definitive reference on that, but if I test[3] with a relative URL
I get "http://example-test.site/" for the URL, which might be a placeholder...

> In both cases I’d expect that to be usable the logo should be an absolute URI?

Yeah, I think all URLs need to be absolute in JSON-LD, but I'm not
100% on that...
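
To make the idea concrete, here is a minimal sketch of deriving absolute
`url` and `logo` values from the site URL. It is not how the UI templates
do it today; the function name `organization_jsonld` and the logo path in
the usage note are hypothetical, and only the standard library is used.

```python
import json
from urllib.parse import urljoin


def organization_jsonld(site_url, logo_path):
    """Build Organization JSON-LD whose `url` and `logo` are resolved against site_url."""
    # Normalize to exactly one trailing slash so urljoin keeps the host intact.
    base = site_url.rstrip("/") + "/"
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "url": base,
        # urljoin resolves both root-relative and relative paths against base.
        "logo": urljoin(base, logo_path),
    }
    return json.dumps(data, indent=2)
```

Calling `organization_jsonld("https://camel.apache.org", "/_/img/logo.svg")`
(the logo path is made up) yields absolute values for both fields, while
passing `/` as the site URL leaves both relative, which matches the
"meaningless" output David saw.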

> I note that the next bit of micro data, BreadcrumbList, is always generated 
> with absolute URIs with https://camel.apache.org. Shouldn’t this be generated 
> from the site:url?

Well, I'm guessing that was a pragmatic choice when we did that
initially. I do remember some back and forth on that, but the context
escapes me.
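
For illustration, generating the BreadcrumbList from the site URL could
look roughly like the sketch below. The function name `breadcrumb_list`
and the shape of the `crumbs` argument (a list of name/path pairs) are
assumptions made for the example, not what the current templates use.

```python
from urllib.parse import urljoin


def breadcrumb_list(site_url, crumbs):
    """Build a schema.org BreadcrumbList; crumbs is a list of (name, path) pairs."""
    # Normalize to exactly one trailing slash so urljoin keeps the host intact.
    base = site_url.rstrip("/") + "/"
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {
                "@type": "ListItem",
                "position": position,  # schema.org positions start at 1
                "name": name,
                "item": urljoin(base, path.lstrip("/")),
            }
            for position, (name, path) in enumerate(crumbs, start=1)
        ],
    }
```

With `site_url` set to `https://camel.apache.org` and a crumb path of
`/components/latest/ironmq-component.html`, the `item` comes out as the
full absolute URL rather than a hard-coded host.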

> My conclusions are:
>
> - There is no need for patch-sitemap.js and that the site needs to be 
> generated with the correct site:url.

Let's first check that the sitemaps and JSON-LD microdata are all
generated and contain absolute URLs. Otherwise, what Dan suggested on
#772 could be a good way to go...

> - If inclusion of the <link> element causes a problem it can be removed from 
> the UI.

+1, not sure if we need it at all...

> - The Organization micro data needs its logo URL fixed to be absolute based 
> on site:url

+1

> - The BreadcrumbList micro data needs to be generated based on site:url.

+1

> Have I missed something?

We need to double check the XML sitemaps...
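
One way to do that check, as a rough sketch: parse each generated sitemap
and list every `<loc>` that has no scheme or host. The function name
`non_absolute_locs` is hypothetical; the namespace is the standard
sitemaps.org one, and only the standard library is used.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Namespace defined by the sitemaps.org 0.9 protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def non_absolute_locs(sitemap_xml):
    """Return every <loc> value that lacks a scheme or a host."""
    root = ET.fromstring(sitemap_xml)
    bad = []
    # iter() walks the whole tree, so this covers <urlset> files as well as
    # <sitemapindex> files, which also use <loc>.
    for loc in root.iter(SITEMAP_NS + "loc"):
        value = (loc.text or "").strip()
        parsed = urlparse(value)
        if not (parsed.scheme and parsed.netloc):
            bad.append(value)
    return bad
```

An empty result for every generated sitemap file would mean the site was
built with an absolute site URL and nothing needs patching afterwards.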

zoran

[1] 
https://github.com/apache/camel-website/blob/ad0b8c6efcea9943ae8e690ec020f5589c227a54/layouts/partials/header.html#L28
[2] 
https://developers.google.com/search/docs/advanced/crawling/consolidate-duplicate-urls#rel-canonical-link-method
[3] https://search.google.com/test/rich-results?id=CaNl08DpmzTpYqlM7Cg1kA
-- 
Zoran Regvart
