Hi Zoran!

I’m going to leave checking the sitemap generation to you because, AFAICT, the sitemaps are correct when generated with the site:url.
Would you like to make the UI changes you suggest, or shall I write a PR?

Thanks!
David Jencks

> On Sep 8, 2021, at 2:26 AM, Zoran Regvart <zo...@regvart.com> wrote:
>
> Hi David,
>
> On Tue, Sep 7, 2021 at 7:32 PM David Jencks <david.a.jen...@gmail.com> wrote:
>>
>> I investigated the patch-sitemap.js question a bit.
>>
>> Using my `issue-16854-jsonpath-options` camel-website branch, I built the
>> site twice, with site:url set to `https://camel.apache.org` and set to `/`.
>> I didn’t look at every page, but diffing the generated sites seems to
>> consistently show two differences between the results:
>>
>> - with `https://camel.apache.org` a head <link> element is included such as
>>
>> <link rel="canonical"
>> href="https://camel.apache.org/components/latest/ironmq-component.html">
>>
>> It’s omitted as expected with `/`. My understanding is that this needs to
>> be an absolute URI and that its function is to help search engines.
>> However, if we don’t want it, it’s trivial to modify the UI to not generate
>> it.
>
> The Hugo built bits also contain the `<link rel="canonical">`[1]. When
> used, recommendation is to place absolute URLs[2]. Now since we use
> 301 redirects, have sitemap(s), I'm not entirely convinced we need
> `<link rel="canonical">` at all. I suggest we remove it (from Hugo
> layout and Antora UI).
>
>> - with `https://camel.apache.org` a footer micro data script is plausible
>> rather than meaningless (the `url:` entry):
>
> I think the URLs in JSON-LD microdata need to be absolute, I can't
> find a definite reference on that, but if I test[3] with relative URL
> I get "http://example-test.site/" for URL, might be a placeholder...
>
>> In both cases I’d expect that to be usable the logo should be an absolute
>> URI?
>
> Yeah, I think all URLs need to be absolute in JSON-LD, but I'm not
> 100% on that...
>
>> I note that the next bit of micro data, BreadcrumbList, is always generated
>> with absolute URIs with https://camel.apache.org. Shouldn’t this be
>> generated from the site:url?
>
> Well, I'm guessing that was a pragmatic choice when we did that
> initially, I do remember some back and forth on that but the context
> escapes me
>
>> My conclusions are:
>>
>> - There is no need for patch-sitemap.js and that the site needs to be
>> generated with the correct site:url.
>
> Let's check sitemaps and JSON-LD microdata if they are all generated
> and contain absolute URLs first. Otherwise what Dan suggested on #772
> could be a good way to go...
>
>> - If inclusion of the <link> element causes a problem it can be removed from
>> the UI.
>
> +1, not sure if we need it at all...
>
>> - The Organization micro data needs its logo URL fixed to be absolute based
>> on site:url
>
> +1
>
>> - The BreadcrumbList micro data needs to be generated based on site:url.
>
> +1
>
>> Have I missed something?
>
> We need to double check the XML sitemaps...
>
> zoran
>
> [1]
> https://github.com/apache/camel-website/blob/ad0b8c6efcea9943ae8e690ec020f5589c227a54/layouts/partials/header.html#L28
> [2]
> https://developers.google.com/search/docs/advanced/crawling/consolidate-duplicate-urls#rel-canonical-link-method
> [3] https://search.google.com/test/rich-results?id=CaNl08DpmzTpYqlM7Cg1kA
> --
> Zoran Regvart