Bug#664782: meta: please add support to express content language
Martin Michlmayr wrote:
> I've read this page: http://www.w3.org/TR/i18n-html-tech-lang/ in the meantime, which explains the best practice. In short:
> - HTML 5: use <html lang="...">
> - XHTML 1.0: use <html lang="..." xml:lang="..." xmlns="http://www.w3.org/1999/xhtml">
> - XHTML 1.1: use <html xml:lang="..." xmlns="http://www.w3.org/1999/xhtml">
>
> The patch below implements this:
> - it uses the lang_code template variable (already defined by po) to put the language into page.tmpl

Well, po's LANG_CODE template variable is used on translated pages, to show the language code of the translation. Although po provides it, AFAICS nothing uses that variable in the current template.

A meta lang should surely only affect the master page, not change the LANG_CODE for translation pages too. I have not verified to my satisfaction that this gets along right with po.

Beyond that one template variable, there's the question of how this should interoperate with po generally. Since po assumes that all pages are in one master language, using it in addition to meta lang would result in some confusion, such as a page whose html header claims the language set by meta lang, but which contains po-plugin-inserted content indicating that it's in, e.g., English.

-- 
see shy jo
Bug#664782: meta: please add support to express content language
* Joey Hess <jo...@debian.org> [2012-04-05 14:36]:
> A meta lang should surely only affect the master page, not change the LANG_CODE for translation pages too. I have not verified to my satisfaction that this gets along right with po.
>
> Beyond that one template variable, there's the question of how this should interoperate with po generally. Since po assumes that all pages are in one master language, using it in addition to meta lang would result in some confusion, such as a page whose html header claims the language set by meta lang, but which contains po-plugin-inserted content indicating that it's in, e.g., English.

Thanks for your comments. These are issues I haven't really considered. I'll take a look at the po module and think about the issues you raised. This will take a few weeks since I'm travelling at the moment.

-- 
Martin Michlmayr
http://www.cyrius.com/

-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Bug#664782: meta: please add support to express content language
I've read this page: http://www.w3.org/TR/i18n-html-tech-lang/ in the meantime, which explains the best practice. In short:
- HTML 5: use <html lang="...">
- XHTML 1.0: use <html lang="..." xml:lang="..." xmlns="http://www.w3.org/1999/xhtml">
- XHTML 1.1: use <html xml:lang="..." xmlns="http://www.w3.org/1999/xhtml">

The patch below implements this:
- it uses the lang_code template variable (already defined by po) to put the language into page.tmpl
- it accepts a lang in meta to set that variable
- it validates the language tag
- if html5 is not set, it also generates a meta content-language header

Comments:
- I don't really like the name lang_code used by po, since the RFCs talk about "language tag" or "language". I didn't want to change the template variable, but I used lang for meta instead. If you prefer consistency, you can rename it to lang_code. I'm not sure whether lang_code could be renamed without breaking too much.
- The regex checking the language tag could be moved to a global function, and po's islanguagecode() could then be replaced with it. But this can wait for a future patch.

diff --git a/IkiWiki/Plugin/meta.pm b/IkiWiki/Plugin/meta.pm
index 220fff9..1dfbf91 100644
--- a/IkiWiki/Plugin/meta.pm
+++ b/IkiWiki/Plugin/meta.pm
@@ -153,6 +153,16 @@ sub preprocess (@) {
 			$pagestate{$page}{meta}{updated}=$time if defined $time;
 		}
 	}
+	elsif ($key eq 'lang') {
+		# Check if a valid language tag is specified according to
+		# BCP 47 at http://tools.ietf.org/html/bcp47
+		# We don't implement all of BCP 47 but we check for the most
+		# common variants of: language, extlang, script and region
+		if ($value !~ /^[[:alpha:]]{2,3}(-[[:alpha:]]{3})?(-[[:alpha:]]{4})?(-[[:alpha:]]{2}|-\d{3})?$/) {
+			return "";
+		}
+		$pagestate{$page}{meta}{lang_code}=$value;
+	}
 
 	if (! defined wantarray) {
 		# avoid collecting duplicate data during scan pass
@@ -280,6 +290,11 @@ sub preprocess (@) {
 			encode_entities($key).
 			'" content="'.encode_entities($value).'" />';
 	}
+	elsif ($key eq 'lang') {
+		push @{$metaheaders{$page}}, '<meta http-equiv="'.
+			encode_entities('content-language').
+			'" content="'.encode_entities($value).'" />' if !$config{html5};
+	}
 	elsif ($key eq 'name') {
 		push @{$metaheaders{$page}}, scrub('<meta '.$key.'="'.
 			encode_entities($value).
@@ -317,6 +332,11 @@ sub pagetemplate (@) {
 			if exists $pagestate{$page}{meta}{$field} && $template->query(name => $field);
 	}
 
+	foreach my $field (qw{lang_code}) {
+		$template->param($field => $pagestate{$page}{meta}{$field})
+			if exists $pagestate{$page}{meta}{$field} && $template->query(name => $field);
+	}
+
 	foreach my $field (qw{permalink}) {
 		if (exists $pagestate{$page}{meta}{$field} && $template->query(name => $field)) {
 			eval q{use HTML::Entities};
diff --git a/doc/ikiwiki/directive/meta.mdwn b/doc/ikiwiki/directive/meta.mdwn
index f8494db..3e8d86f 100644
--- a/doc/ikiwiki/directive/meta.mdwn
+++ b/doc/ikiwiki/directive/meta.mdwn
@@ -59,6 +59,13 @@ Supported fields:
   Specifies a short description for the page. This will be put in
   the html header, and can also be displayed by eg, the [[map]] directive.
 
+* lang
+
+  Specifies a language tag (such as en, en-US, zh-Hant, zh-cmn-Hans-CN,
+  or es-419) indicating the language used on this page. This information
+  will be put in the html header. Page templates can access this
+  information via the `lang_code` variable.
+
 * permalink
 
   Specifies a permanent link to the page, if different than the page
diff --git a/templates/page.tmpl b/templates/page.tmpl
index 770ac23..742fd21 100644
--- a/templates/page.tmpl
+++ b/templates/page.tmpl
@@ -1,9 +1,17 @@
 <TMPL_IF HTML5><!DOCTYPE html>
+<TMPL_IF LANG_CODE>
+<html lang="<TMPL_VAR LANG_CODE>">
+<TMPL_ELSE>
 <html>
+</TMPL_IF>
 <TMPL_ELSE><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<TMPL_IF LANG_CODE>
+<html lang="<TMPL_VAR LANG_CODE>" xml:lang="<TMPL_VAR LANG_CODE>" xmlns="http://www.w3.org/1999/xhtml">
+<TMPL_ELSE>
 <html xmlns="http://www.w3.org/1999/xhtml">
 </TMPL_IF>
+</TMPL_IF>
 <head>
 <TMPL_IF DYNAMIC>
 <TMPL_IF FORCEBASEURL><base href="<TMPL_VAR FORCEBASEURL>" /><TMPL_ELSE>
-- 
Martin Michlmayr
http://www.cyrius.com/
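For the negated check in the `lang` hunk, the placement of `!` matters. Unary `!` binds more tightly than `=~` in Perl, so `!$value =~ /.../` is parsed as `(!$value) =~ /.../`: the regex is tested against "" (or "1"), never against the tag, and so nothing is ever rejected. That would matter here because page.tmpl inserts LANG_CODE into the `<html>` element with a plain `<TMPL_VAR>`, and HTML::Template inserts variables verbatim unless escaping is requested (e.g. `ESCAPE=HTML`). A small stand-alone demonstration; the sub names `rejects_buggy` and `rejects_fixed` are made up for illustration and are not ikiwiki code:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Language-tag regex from the patch, unchanged.
my $tag_re = qr/^[[:alpha:]]{2,3}(-[[:alpha:]]{3})?(-[[:alpha:]]{4})?(-[[:alpha:]]{2}|-\d{3})?$/;

# Buggy form: parsed as (!$value) =~ $tag_re.
# !$value is "" for a non-empty tag (or "1" for a false one); neither can
# match the regex, so this never reports a rejection.
sub rejects_buggy {
    my ($value) = @_;
    return (!$value =~ $tag_re) ? 1 : 0;
}

# Intended form: reject when $value itself does not match.
sub rejects_fixed {
    my ($value) = @_;
    return ($value !~ $tag_re) ? 1 : 0;
}

for my $v ('en-GB', 'not a tag', 'en" onmouseover="alert(1)') {
    printf "%-28s buggy=%d fixed=%d\n", $v, rejects_buggy($v), rejects_fixed($v);
}
```

With the `$value !~ ...` form, an invalid tag makes preprocess return before anything is stored in pagestate, so it never reaches the template or the meta header.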
Bug#664782: meta: please add support to express content language
* Joey Hess <jo...@debian.org> [2012-03-20 16:26]:
> > According to http://www.w3.org/TR/2011/WD-html-markup-20110113/meta.http-equiv.content-language.html you can specify the language of your content using a header like this:
> > <meta http-equiv="content-language" content="en-GB" />
>
> This could be added to the whitelisted values, if it can be proven that it's safe to let users set these values, and if a suitable regex were developed to block invalid values (that might attempt javascript or other content insertion attacks).

It's fairly easy to check for the most common variants of valid language tags (e.g. de, en-GB, zh-Hant, zh-cmn-Hans-CN, es-419). I've tried to do this in the patch below.

BTW, in preprocess() some values are written to $pagestate{$page}{meta}. Is this for tags that are relevant for the page template?

Anyway, I noticed the following warning at http://www.w3.org/TR/2011/WD-html-markup-20110113/meta.http-equiv.content-language.html
| Using the meta element to specify the document-wide default language
| is obsolete. Consider specifying the language on the root element
| instead.

Apparently the preferred way in HTML 5 is to specify something like:
| <html lang="en">

But given that most people are not using HTML 5 yet, it might still be worthwhile to add content-language support to meta. I'll leave it up to you.

In any case, I'd appreciate comments on the patch (i.e. whether I added the code in the right location), since I'm new to ikiwiki.

--- a/IkiWiki/Plugin/meta.pm	2012-03-26 00:43:05.257460466 +0100
+++ b/IkiWiki/Plugin/meta.pm	2012-03-26 01:00:27.262627492 +0100
@@ -280,6 +280,17 @@
 			encode_entities($key).
 			'" content="'.encode_entities($value).'" />';
 	}
+	elsif ($key eq 'content-language') {
+		# Check if a valid language tag is specified according to
+		# BCP 47 at http://tools.ietf.org/html/bcp47
+		# We don't implement all of BCP 47 but we check for the most
+		# common variants of: language, extlang, script and region
+		if ($value =~ /^[[:alpha:]]{2,3}(-[[:alpha:]]{3})?(-[[:alpha:]]{4})?(-[[:alpha:]]{2}|-\d{3})?$/) {
+			push @{$metaheaders{$page}}, '<meta http-equiv="'.
+				encode_entities($key).
+				'" content="'.encode_entities($value).'" />';
+		}
+	}
 	elsif ($key eq 'name') {
 		push @{$metaheaders{$page}}, scrub('<meta '.$key.'="'.
 			encode_entities($value).
-- 
Martin Michlmayr
http://www.cyrius.com/
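Joey's condition for whitelisting was a regex that blocks values which could carry markup or script into the page. The following is a minimal sketch, not ikiwiki code: a hypothetical `is_language_tag` helper, in the spirit of the global function proposed to replace po's islanguagecode(). It keeps the patch's subtag shapes (language, optional extlang, script and region) but, as a deliberate variation, uses ASCII-only character classes, since BCP 47 subtags consist of ASCII letters and digits, and anchors with `\z`, because Perl's `$` also matches just before a trailing newline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of ikiwiki). Accepts the common BCP 47
# shapes the patch targets: language (2-3 letters), optional extlang
# (3 letters), optional script (4 letters), optional region (2 letters
# or 3 digits). Variants, extensions and private-use subtags are not handled.
sub is_language_tag {
    my ($value) = @_;
    return 0 unless defined $value;
    return $value =~ /\A[A-Za-z]{2,3}(?:-[A-Za-z]{3})?(?:-[A-Za-z]{4})?(?:-(?:[A-Za-z]{2}|[0-9]{3}))?\z/ ? 1 : 0;
}

my @accepted = ('de', 'en-GB', 'zh-Hant', 'zh-cmn-Hans-CN', 'es-419');
my @rejected = (
    'en"><script>alert(1)</script>',   # markup injection attempt
    'en" onmouseover="alert(1)',       # attribute injection attempt
    "en-GB\n",                         # trailing newline ($ would allow it)
    'english',                         # too long for a language subtag
    '',
);

for my $v (@accepted, @rejected) {
    (my $shown = $v) =~ s/\n/\\n/g;    # make the newline visible when printed
    printf "%-34s %s\n", "'$shown'", is_language_tag($v) ? 'ok' : 'rejected';
}
```

Values that pass this check contain only ASCII letters, digits and hyphens, so they cannot close an attribute or open a tag even where a template inserts them unescaped; escaping with encode_entities on output, as the patch does for the meta header, remains a reasonable second layer.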
Bug#664782: meta: please add support to express content language
Package: ikiwiki
Version: 3.20120202
Severity: wishlist

According to http://www.w3.org/TR/2011/WD-html-markup-20110113/meta.http-equiv.content-language.html you can specify the language of your content using a header like this:

<meta http-equiv="content-language" content="en-GB" />

It would be great if the meta plugin supported this.

-- 
Martin Michlmayr
http://www.cyrius.com/
Bug#664782: meta: please add support to express content language
Martin Michlmayr wrote:
> According to http://www.w3.org/TR/2011/WD-html-markup-20110113/meta.http-equiv.content-language.html you can specify the language of your content using a header like this:
> <meta http-equiv="content-language" content="en-GB" />
> It would be great if the meta plugin supported this.

Well, it does, if htmlscrubber is disabled. Then meta can be used to make arbitrary meta tags:

[[!meta http-equiv=content-language content=en-GB]]

This could be added to the whitelisted values, if it can be proven that it's safe to let users set these values, and if a suitable regex were developed to block invalid values (that might attempt javascript or other content insertion attacks).

-- 
see shy jo
Bug#664782: meta: please add support to express content language
* Joey Hess <jo...@debian.org> [2012-03-20 16:26]:
> Well, it does, if htmlscrubber is disabled. Then meta can be used to make arbitrary meta tags:

Thanks, I wasn't aware of that. I can see now that this is documented at http://ikiwiki.info/ikiwiki/directive/meta/ but I missed it because it's at the end of the page. Do you think you could move it up, before the list of supported tags?

> [[!meta http-equiv=content-language content=en-GB]]
>
> This could be added to the whitelisted values, if it can be proven that it's safe to let users set these values, and if a suitable regex were developed to block invalid values (that might attempt javascript or other content insertion attacks).

I guess this might be possible for this particular tag. I'll have a look to see what is allowed.

-- 
Martin Michlmayr
http://www.cyrius.com/