I hope this is the right list to bring this up. I'm looking for the people who maintain blogs.gnome.org, by which I mean the blogs hosted on it (blogs.gnome.org/view/username), not the bare b.g.o address, which just redirects to planet.gnome.org.

Like others, I have a blogs.gnome.org blog [1]. That link [1] is the 'main page', which shows the standard five most recent posts. The main page also points to permalinks, such as [5]. Furthermore, the date suffix of that URL is hackable to give e.g. "All posts for 2007", "for May 2007" or "for 1 May 2007" [2,3,4]. Here's a pretty cascade of links:

[1] http://blogs.gnome.org/view/nigeltao
[2] http://blogs.gnome.org/view/nigeltao/2007
[3] http://blogs.gnome.org/view/nigeltao/2007/05
[4] http://blogs.gnome.org/view/nigeltao/2007/05/01
[5] http://blogs.gnome.org/view/nigeltao/2007/05/01/0

Unfortunately, the permalinked page [5] has this HTML snippet in its <head>:

    <meta name="robots" content="noindex,follow" />

which means that search engines should, uh, not index it. This particularly affects the "Bloggers of Planet GNOME" custom search engine [6], since it runs off planet.gnome.org's OPML file, which (for b.g.o hosted blogs) points, via the ATOM feed [7], to the permalinked pages (ones that look like [5]). Hence sizable chunks (40ish member blogs) of p.g.o are currently not searchable, even from vanilla Google (let alone the custom search engine).

[6] http://mail.gnome.org/archives/gnome-announce-list/2006-November/msg00030.html
[7] http://blogs.gnome.org/syndicate/nigeltao

For example, Elijah's entry in that OPML file points to

    http://blogs.gnome.org/syndicate/newren

which contains

    <guid isPermaLink="true">http://blogs.gnome.org/view/newren/2007/04/18/0</guid>

and that page contains

    <meta name="robots" content="noindex,follow" />

What is odd is that some of the date-filtered pages are marked as indexable and some aren't. For example, [5], [4] and [2] are noindex, but [3] and [1] have

    <meta name="robots" content="index,follow" />

Note that this says "index", instead of "noindex". This means that the monthly summaries actually do show up on Google.

For example,

    http://www.google.com/search?q=foxybuntu+site%3Ablogs.gnome.org

finds my October 2006 summary page

    http://blogs.gnome.org/view/nigeltao/2006/10

but not the actual post

    http://blogs.gnome.org/view/nigeltao/2006/10/02/0

Basically, the noindex-ability of the blog pages seems IMHO (1) arbitrary and (2) wrong. Normally, in good open source style, I'd write a patch, but I don't know what software is running blogs.gnome.org, so instead I'm making noise on this list.

My suggestion would be to scrap the <meta name="robots" ...> tag entirely - I don't see why we would want to hide from searchers (GNOME users!) the same content we proudly share on p.g.o. If indeed there are good reasons to noindex (e.g. avoiding duplicate search results, although search engines are good enough these days to skip dupes), I say make [1] (the blog's homepage) and [5] (the permalink) index, and [2], [3] and [4] (yearly / monthly / daily summaries) noindex.

Cheers,
Nigel (wearing his GNOME hat).

_______________________________________________
gnome-web-list mailing list
http://mail.gnome.org/mailman/listinfo/gnome-web-list
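
P.S. For anyone who wants to double-check which pages are affected, here is a minimal sketch in Python (standard library only) that reads the robots meta tag out of a page's HTML and reports whether a search engine would be allowed to index it. To be clear, this is not the code behind blogs.gnome.org (I still don't know what that is); the names RobotsMetaParser, robots_directives and is_indexable are made up for illustration, and the two sample strings are just the snippets quoted in this mail. It assumes that a page with no robots meta tag is indexable by default, and that "none" counts as "noindex".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots" ...> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html):
    """Return the set of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def is_indexable(html):
    """True unless the page asks robots not to index it."""
    return not ({"noindex", "none"} & robots_directives(html))


# The two snippets quoted in this mail.
PERMALINK_HEAD = '<head><meta name="robots" content="noindex,follow" /></head>'
MONTHLY_HEAD = '<head><meta name="robots" content="index,follow" /></head>'

print(is_indexable(PERMALINK_HEAD))  # False
print(is_indexable(MONTHLY_HEAD))    # True
```

To run it against live pages, fetch each of [1] to [5] (for example with urllib.request.urlopen) and pass the decoded HTML to is_indexable.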
