Bug#664782: meta: please add support to express content language

2012-04-05 Thread Joey Hess
Martin Michlmayr wrote:
> I've read this page: http://www.w3.org/TR/i18n-html-tech-lang/
> in the meantime which explains the best practice.
>
> In short:
>
>  - HTML 5: use <html lang="...">
>
>  - XHTML 1.0: use <html lang="..." xml:lang="..."
>    xmlns="http://www.w3.org/1999/xhtml">
>
>  - XHTML 1.1: use <html xml:lang="..." xmlns="http://www.w3.org/1999/xhtml">
>
> The patch below implements this:
>
>  - it uses the lang_code template variable (already defined by po) to
>    put the language into page.tmpl

Well, po's LANG_CODE template variable is set on translated pages,
to show the language code of the translation. It is only provided,
though; AFAICS nothing uses that variable in the current template.

A meta lang should surely only affect the master page, not change
the LANG_CODE for translation pages too. I have not verified to my
satisfaction that this gets along right with po.

Beyond that one template variable, there's the question of how this
should interoperate with po generally. Since po assumes that all pages
are in one master language, using it in addition to meta lang
would result in some confusion, like a page that claims in the html
header to be in the language set by meta lang, but has po-plugin
inserted content that indicates it's in, e.g., English.

-- 
see shy jo




Bug#664782: meta: please add support to express content language

2012-04-05 Thread Martin Michlmayr
* Joey Hess jo...@debian.org [2012-04-05 14:36]:
> A meta lang should surely only affect the master page, not change
> the LANG_CODE for translation pages too. I have not verified to my
> satisfaction that this gets along right with po.
>
> Beyond that one template variable, there's the question of how this
> should interoperate with po generally. Since po assumes that all pages
> are in one master language, using it in addition to meta lang
> would result in some confusion, like a page that claims in the html
> header to be in the language set by meta lang, but has po-plugin
> inserted content that indicates it's in, e.g., English.

Thanks for your comments.  These are issues I haven't really
considered.

I'll take a look at the po module and think about the issues you
raised.

This will take a few weeks since I'm travelling at the moment.

-- 
Martin Michlmayr
http://www.cyrius.com/



-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#664782: meta: please add support to express content language

2012-03-26 Thread Martin Michlmayr
I've read this page: http://www.w3.org/TR/i18n-html-tech-lang/
in the meantime which explains the best practice.

In short:

 - HTML 5: use <html lang="...">

 - XHTML 1.0: use <html lang="..." xml:lang="..."
   xmlns="http://www.w3.org/1999/xhtml">

 - XHTML 1.1: use <html xml:lang="..." xmlns="http://www.w3.org/1999/xhtml">
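
As a rough illustration of the three cases above, the following Perl sketch
builds the opening root element for each document type. The function name
`root_element` and the document type labels are hypothetical, chosen for this
example only; in ikiwiki the markup lives in page.tmpl rather than in code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ikiwiki): returns the opening <html>
# tag recommended for a given document type. $lang is assumed to be an
# already validated language tag.
sub root_element {
	my ($doctype, $lang) = @_;
	my $xmlns = 'xmlns="http://www.w3.org/1999/xhtml"';

	if ($doctype eq 'html5') {
		return qq{<html lang="$lang">};
	}
	elsif ($doctype eq 'xhtml1.0') {
		# XHTML 1.0: both attributes, carrying the same value.
		return qq{<html lang="$lang" xml:lang="$lang" $xmlns>};
	}
	elsif ($doctype eq 'xhtml1.1') {
		# XHTML 1.1: xml:lang only.
		return qq{<html xml:lang="$lang" $xmlns>};
	}
	die "unknown document type: $doctype\n";
}

print root_element($_, 'en-GB'), "\n" for qw(html5 xhtml1.0 xhtml1.1);
```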

The patch below implements this:

 - it uses the lang_code template variable (already defined by po) to
   put the language into page.tmpl

 - it accepts a lang in meta to set that variable.

 - it validates the language tag

 - If html5 is not set, it will also generate a meta content-language
   header.

Comments:

 - I don't really like the name lang_code used by po since the RFCs
   talk about language tag or language.  I didn't want to change the
   template variable but I used lang for meta instead.  If you
   prefer consistency, you can rename it to lang_code.  I'm not sure
   if lang_code could be renamed without breaking too much.

 - The regex checking for the language tag could be moved to a global
   function and then po's islanguagecode() replaced with it.  But this
   can wait for a future patch.
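
A minimal sketch of what such a shared function might look like follows. The
name `is_language_tag` is hypothetical and nothing like it exists in ikiwiki
today; the pattern is the one from the patch, spelled out with /x so each
subtag is labelled, and anchored with \A and \z (an assumption of this sketch)
so that a trailing newline is rejected as well.

```perl
use strict;
use warnings;

# Hypothetical shared helper; not part of ikiwiki. It accepts only the
# common subset of BCP 47 that the patch checks for: a language subtag
# followed by optional extlang, script and region subtags.
sub is_language_tag {
	my ($tag) = @_;
	return 0 unless defined $tag;
	return $tag =~ /
		\A
		[[:alpha:]]{2,3}                        # language: "en", "zh"
		(?: - [[:alpha:]]{3} )?                 # extlang: "cmn"
		(?: - [[:alpha:]]{4} )?                 # script: "Hant"
		(?: - (?: [[:alpha:]]{2} | \d{3} ) )?   # region: "GB", "419"
		\z
	/x ? 1 : 0;
}

# Try the tags mentioned in the documentation plus one injection attempt.
for my $tag (qw(de en-GB zh-Hant zh-cmn-Hans-CN es-419), 'en"><script>') {
	printf "%-16s %s\n", $tag, is_language_tag($tag) ? 'accepted' : 'rejected';
}
```

A caller would then reject bad input with something like
`return "" unless is_language_tag($value);`. Note that in Perl a negated match
is written `$value !~ /.../` or `!($value =~ /.../)`; a bare
`!$value =~ /.../` negates `$value` first and then matches the result.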

diff --git a/IkiWiki/Plugin/meta.pm b/IkiWiki/Plugin/meta.pm
index 220fff9..1dfbf91 100644
--- a/IkiWiki/Plugin/meta.pm
+++ b/IkiWiki/Plugin/meta.pm
@@ -153,6 +153,16 @@ sub preprocess (@) {
 			$pagestate{$page}{meta}{updated}=$time if defined $time;
 		}
 	}
+	elsif ($key eq 'lang') {
+		# Check if a valid language tag is specified according to
+		# BCP 47 at http://tools.ietf.org/html/bcp47
+		# We don't implement all of BCP 47 but we check for the most
+		# common variants of: language, extlang, script and region
+		if ($value !~ /^[[:alpha:]]{2,3}(-[[:alpha:]]{3})?(-[[:alpha:]]{4})?(-[[:alpha:]]{2}|-\d{3})?$/) {
+			return "";
+		}
+		$pagestate{$page}{meta}{lang_code}=$value;
+	}
 
 	if (! defined wantarray) {
 		# avoid collecting duplicate data during scan pass
@@ -280,6 +290,11 @@ sub preprocess (@) {
 			encode_entities($key).
 			'" content="'.encode_entities($value).'" />';
 	}
+	elsif ($key eq 'lang') {
+		push @{$metaheaders{$page}}, '<meta http-equiv="'.
+			encode_entities('content-language').
+			'" content="'.encode_entities($value).'" />' if !$config{html5};
+	}
 	elsif ($key eq 'name') {
 		push @{$metaheaders{$page}}, scrub('<meta '.$key.'="'.
 			encode_entities($value).
@@ -317,6 +332,11 @@ sub pagetemplate (@) {
 			if exists $pagestate{$page}{meta}{$field} && $template->query(name => $field);
 	}
 
+	foreach my $field (qw{lang_code}) {
+		$template->param($field => $pagestate{$page}{meta}{$field})
+			if exists $pagestate{$page}{meta}{$field} && $template->query(name => $field);
+	}
+
 	foreach my $field (qw{permalink}) {
 		if (exists $pagestate{$page}{meta}{$field} && $template->query(name => $field)) {
 			eval q{use HTML::Entities};
diff --git a/doc/ikiwiki/directive/meta.mdwn b/doc/ikiwiki/directive/meta.mdwn
index f8494db..3e8d86f 100644
--- a/doc/ikiwiki/directive/meta.mdwn
+++ b/doc/ikiwiki/directive/meta.mdwn
@@ -59,6 +59,13 @@ Supported fields:
   Specifies a short description for the page. This will be put in
   the html header, and can also be displayed by eg, the [[map]] directive.
 
+* lang
+
+  Specifies a language tag (such as en, en-US, zh-Hant, zh-cmn-Hans-CN,
+  or es-419) indicating the language used on this page. This information
+  will be put in the html header. Page templates can access this
+  information via the `lang_code` variable.
+
 * permalink
 
   Specifies a permanent link to the page, if different than the page
diff --git a/templates/page.tmpl b/templates/page.tmpl
index 770ac23..742fd21 100644
--- a/templates/page.tmpl
+++ b/templates/page.tmpl
@@ -1,9 +1,17 @@
 <TMPL_IF HTML5><!DOCTYPE html>
+<TMPL_IF LANG_CODE>
+<html lang="<TMPL_VAR LANG_CODE>">
+<TMPL_ELSE>
 <html>
+</TMPL_IF>
 <TMPL_ELSE><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<TMPL_IF LANG_CODE>
+<html lang="<TMPL_VAR LANG_CODE>" xml:lang="<TMPL_VAR LANG_CODE>" xmlns="http://www.w3.org/1999/xhtml">
+<TMPL_ELSE>
 <html xmlns="http://www.w3.org/1999/xhtml">
 </TMPL_IF>
+</TMPL_IF>
 <head>
 <TMPL_IF DYNAMIC>
 <TMPL_IF FORCEBASEURL><base href="<TMPL_VAR FORCEBASEURL>" /><TMPL_ELSE>

-- 
Martin Michlmayr
http://www.cyrius.com/






Bug#664782: meta: please add support to express content language

2012-03-25 Thread Martin Michlmayr
* Joey Hess jo...@debian.org [2012-03-20 16:26]:
> > According to
> > http://www.w3.org/TR/2011/WD-html-markup-20110113/meta.http-equiv.content-language.html
> > you can specify the language of your content using a header like this:
> > <meta http-equiv="content-language" content="en-GB" />
>
> This could be added to the whitelisted values, if it can be proven that
> it's safe to let users set these values, and if a suitable regex were
> developed to block invalid values (that might attempt javascript or
> other content insertion attacks).

It's fairly easy to check for the most common variants of valid
language tags (e.g. de, en-GB, zh-Hant, zh-cmn-Hans-CN, es-419).  I've
tried to do this in the patch below.

BTW, in preprocess() some values are written to $pagestate{$page}{meta}
Is this for tags that are relevant for the page template?

Anyway, I noticed the following warning at
http://www.w3.org/TR/2011/WD-html-markup-20110113/meta.http-equiv.content-language.html
| Using the meta element to specify the document-wide default language
| is obsolete. Consider specifying the language on the root element
| instead.

Apparently the preferred way in HTML 5 is to specify something like:
| <html lang="en">

But given that most people are not using HTML 5 yet, it might
still be worthwhile to add content-language support to meta.  I'll
leave it up to you.  In any case, I appreciate comments on the patch
(i.e. if I added the code in the right location) since I'm new to
ikiwiki.

--- a/IkiWiki/Plugin/meta.pm	2012-03-26 00:43:05.257460466 +0100
+++ b/IkiWiki/Plugin/meta.pm	2012-03-26 01:00:27.262627492 +0100
@@ -280,6 +280,17 @@
 			encode_entities($key).
 			'" content="'.encode_entities($value).'" />';
 	}
+	elsif ($key eq 'content-language') {
+		# Check if a valid language tag is specified according to
+		# BCP 47 at http://tools.ietf.org/html/bcp47
+		# We don't implement all of BCP 47 but we check for the most
+		# common variants of: language, extlang, script and region
+		if ($value =~ /^[[:alpha:]]{2,3}(-[[:alpha:]]{3})?(-[[:alpha:]]{4})?(-[[:alpha:]]{2}|-\d{3})?$/) {
+			push @{$metaheaders{$page}}, '<meta http-equiv="'.
+				encode_entities($key).
+				'" content="'.encode_entities($value).'" />';
+		}
+	}
 	elsif ($key eq 'name') {
 		push @{$metaheaders{$page}}, scrub('<meta '.$key.'="'.
 			encode_entities($value).

-- 
Martin Michlmayr
http://www.cyrius.com/






Bug#664782: meta: please add support to express content language

2012-03-20 Thread Martin Michlmayr
Package: ikiwiki
Version: 3.20120202
Severity: wishlist

According to
http://www.w3.org/TR/2011/WD-html-markup-20110113/meta.http-equiv.content-language.html
you can specify the language of your content using a header like this:
<meta http-equiv="content-language" content="en-GB" />

It would be great if the meta plugin would support this.

-- 
Martin Michlmayr
http://www.cyrius.com/






Bug#664782: meta: please add support to express content language

2012-03-20 Thread Joey Hess
Martin Michlmayr wrote:
> According to
> http://www.w3.org/TR/2011/WD-html-markup-20110113/meta.http-equiv.content-language.html
> you can specify the language of your content using a header like this:
> <meta http-equiv="content-language" content="en-GB" />
>
> It would be great if the meta plugin would support this.

Well, it does, if htmlscrubber is disabled. Then meta can be
used to make arbitrary meta tags:

[[!meta http-equiv=content-language content=en-GB]]

This could be added to the whitelisted values, if it can be proven that
it's safe to let users set these values, and if a suitable regex were
developed to block invalid values (that might attempt javascript or
other content insertion attacks).

-- 
see shy jo




Bug#664782: meta: please add support to express content language

2012-03-20 Thread Martin Michlmayr
* Joey Hess jo...@debian.org [2012-03-20 16:26]:
> Well, it does, if htmlscrubber is disabled. Then meta can be
> used to make arbitrary meta tags:

Thanks, I wasn't aware of that.  I can see now that this is documented
at http://ikiwiki.info/ikiwiki/directive/meta/
However, I didn't see it because it's at the end of the page.  Do you
think you could move it up before the list of supported tags?

> [[!meta http-equiv=content-language content=en-GB]]
>
> This could be added to the whitelisted values, if it can be proven that
> it's safe to let users set these values, and if a suitable regex were
> developed to block invalid values (that might attempt javascript or
> other content insertion attacks).

I guess this might be possible for this particular tag.  I'll have a
look to see what is allowed.

-- 
Martin Michlmayr
http://www.cyrius.com/


