[Freeipa-devel] not ascii, not utf-8, what's a parser supposed to do?

2010-01-26 Thread John Dennis

I've run into a small problem with xgettext. By default xgettext expects
all strings in an input file to be encoded in ascii. It will also allow
you to override that by specifying the strings in the input file are utf-8.

In ipapython/ipautil.py line 296 is the following string:

SAFE_STRING_PATTERN = '(^(\000|\n|\r| |:|<)|[\000\n\r\200-\377]+|[ ]+$)'

In its default ascii mode xgettext throws an error claiming the string
is not ascii. In fact xgettext is correct, the string is not ascii. (You
may be wondering why xgettext even cares since it's not marked as
translatable, but xgettext fully parses the input before deciding what
is marked as translatable. Bottom line: all strings get parsed and decoded.)

If I override the default ascii input by telling xgettext the input
strings are encoded in utf-8, xgettext stops complaining and the string
is properly skipped.

But ... the string isn't really utf-8 either and I'm not sure how
comfortable I feel about telling xgettext every string in IPA is encoded
in utf-8 (when it isn't) just to get around this failure, especially
since the offending string isn't even utf-8. (However, maybe we should
allow utf-8 as an input format since ascii is a subset of utf-8, we
might want to use utf-8 in the future and we can just hold our noses
with respect to the above regular expression).
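
To make the "not ascii, not utf-8" point concrete, here is a minimal sketch (Python 3, not part of the original message) that tries to decode the bytes of the pattern's character class under each encoding. The helper name decodes_as is made up for illustration, and the sketch says nothing about how xgettext itself decodes its input.

```python
# The character class from the pattern, as the bytes its escapes stand for.
# \200 and \377 are the octal spellings of bytes 0x80 and 0xFF.
fragment = b'[\000\n\r\200-\377]+'

def decodes_as(data, encoding):
    """Return True if data is valid in the given encoding (hypothetical helper)."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

is_ascii = decodes_as(fragment, 'ascii')      # 0x80 and 0xFF are outside 7-bit ASCII
is_utf8 = decodes_as(fragment, 'utf-8')       # a lone 0x80 or 0xFF is not valid UTF-8
is_latin1 = decodes_as(fragment, 'latin-1')   # every byte value is valid Latin-1

print(is_ascii, is_utf8, is_latin1)
```

Only a single-byte encoding such as Latin-1 accepts these bytes, which is why declaring the input as utf-8 only hides the problem.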

Do we have a stake in the ground as to what our input strings are
encoded in?

Can you think of another way to express the offending string such that
it doesn't trigger the non-ascii error? The only thing I could think of
and get to work was this:

SAFE_STRING_PATTERN = '%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c' % \
    (40,94,40,0,124,10,124,13,124,32,124,58,124,60,41,124,91,0,10,13,128,45,255,93,43,124,91,32,93,43,36,41)

Which is pretty unreadable, but with sufficient comments could be
acceptable.
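
As a sanity check on this workaround, the sketch below (Python 3; the original code targeted Python 2 byte strings, so treat this as an assumption about how to reproduce it) expands the character codes and compares the result with the pattern written using octal escapes. Code 60 is '<', so the literal used here has '<' as the last alternative of the first group. The names codes, built, pattern_re and the sample strings are illustrative only.

```python
import re

# Character codes copied from the workaround above.
codes = (40, 94, 40, 0, 124, 10, 124, 13, 124, 32, 124, 58, 124, 60, 41,
         124, 91, 0, 10, 13, 128, 45, 255, 93, 43, 124, 91, 32, 93, 43, 36, 41)

# One %c per code, so the format string never has to be counted by hand.
built = ('%c' * len(codes)) % codes

# The same pattern spelled with octal escapes.
literal = '(^(\000|\n|\r| |:|<)|[\000\n\r\200-\377]+|[ ]+$)'

same = (built == literal)

# Which sample strings does the pattern find a match in?
pattern_re = re.compile(built)
samples = ['plain', ' leading', 'trailing ', ':colon', 'caf\xe9']
flagged = [s for s in samples if pattern_re.search(s)]

print(same, flagged)
```

Both spellings yield the same 32-character string, so the choice between them only affects what a source-level tool such as xgettext sees.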

--
John Dennis jden...@redhat.com


___
Freeipa-devel mailing list
Freeipa-devel@redhat.com
https://www.redhat.com/mailman/listinfo/freeipa-devel

Re: [Freeipa-devel] not ascii, not utf-8, what's a parser supposed to do?

2010-01-26 Thread Jason Gerard DeRose

On Tue, 2010-01-26 at 17:28 -0500, John Dennis wrote:
> I've run into a small problem with xgettext. By default xgettext expects
> all strings in an input file to be encoded in ascii. It will also allow
> you to override that by specifying the strings in the input file are utf-8.
>
> In ipapython/ipautil.py line 296 is the following string:
>
> SAFE_STRING_PATTERN = '(^(\000|\n|\r| |:|<)|[\000\n\r\200-\377]+|[ ]+$)'

ipapython still has a lot of legacy code, so first thing we should do is
check if we even use SAFE_STRING_PATTERN. Rob, do you know off hand?

> If I override the default ascii input by telling xgettext the input
> strings are encoded in utf-8 xgettext stops complaining, the string is
> properly skipped.
>
> Do we have a stake in the ground as to what our input strings are
> encoded in?
>
> Can you think of another way to express the offending string such that
> it doesn't trigger the non-ascii error? The only thing I could think of
> and get to work was this:
>
> SAFE_STRING_PATTERN = '%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c' % \
>     (40,94,40,0,124,10,124,13,124,32,124,58,124,60,41,124,91,0,10,13,128,45,255,93,43,124,91,32,93,43,36,41)
>
> Which is pretty unreadable, but with sufficient comments could be
> acceptable.

Re: [Freeipa-devel] not ascii, not utf-8, what's a parser supposed to do?

2010-01-26 Thread Howard Chu

John Dennis wrote:
> I've run into a small problem with xgettext. By default xgettext expects
> all strings in an input file to be encoded in ascii. It will also allow
> you to override that by specifying the strings in the input file are utf-8.

Do you ever expect to run this stuff on IBM mainframes (i.e., systems using
EBCDIC or some other non-ASCII-related character set)?

> Can you think of another way to express the offending string such that
> it doesn't trigger the non-ascii error? The only thing I could think of
> and get to work was this:
>
> SAFE_STRING_PATTERN = '%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c%c' % \
>     (40,94,40,0,124,10,124,13,124,32,124,58,124,60,41,124,91,0,10,13,128,45,255,93,43,124,91,32,93,43,36,41)
>
> Which is pretty unreadable, but with sufficient comments could be
> acceptable.

I had to use similar hacks when porting OpenSSL to z/OS. It kinda sucks, but
it has the virtue of being completely independent of the machine's language
settings. And frankly, it doesn't take too much explanation in the comments to
be understandable.
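
One possible way to lay out such comments, as a sketch only (Python 3 syntax; the name SAFE_STRING_CODES and the grouping into rows are assumptions, not code from this thread):

```python
# Character codes for the pattern, grouped so each row can be read
# against the regular expression fragment in its comment.
SAFE_STRING_CODES = (
    40, 94, 40,                    # (^(
    0, 124, 10, 124, 13, 124,      # NUL | LF | CR |
    32, 124, 58, 124, 60, 41,      # SP | : | < )
    124, 91, 0, 10, 13,            # | [ NUL LF CR
    128, 45, 255, 93, 43,          # 0x80 - 0xFF ] +
    124, 91, 32, 93, 43, 36, 41,   # | [ SP ] + $ )
)

# Join the codes into the final pattern string.
SAFE_STRING_PATTERN = ''.join(chr(code) for code in SAFE_STRING_CODES)
```

The source file then contains only plain ASCII digits and punctuation, whatever character set the pattern itself has to match.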

-- 
  -- Howard Chu
  CTO, Symas Corp.   http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
