Re: [I18n][call for comments] XTerm patch to invoke luit

2002-06-11 Thread Juliusz Chroboczek

TK> I have to study about your suggestion and how to use
TK> XtAppAddConverter.

Don't bother, then.  Just encapsulate the parsing into a separate
function (the code is already spaghetti-like enough).

>> Why do you copy the argument into locale_string, rather than directly
>> doing a strcasecmp on the argument?

TK> I thought strcasecmp was not portable...

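#include <ctype.h>

/* Returns 1 if a and b are equal ignoring case, 0 otherwise
   (the opposite of strcasecmp's convention, where 0 means equal). */
int
my_strcasecmp(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return (!*a && !*b);
}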

No need to do a copy (which I find confusing).
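Following this suggestion, the argument parsing can live in its own
function that compares the resource value in place, with no intermediate
copy.  A minimal sketch: `LocaleMode`, `same_ignoring_case` and
`parse_locale_resource` are hypothetical names, the accepted values are
the ones listed in the original proposal ("true", "medium", "false", or
an encoding name), and treating an unset (NULL) value as "false" is an
assumption.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical modes for the "locale" resource; not xterm's real names. */
typedef enum {
    LOCALE_FALSE,    /* "false": UTF-8 for UTF-8 locales, else 8-bit */
    LOCALE_MEDIUM,   /* "medium": luit only for selected locales */
    LOCALE_TRUE,     /* "true": luit for every non-UTF-8 locale */
    LOCALE_ENCODING  /* anything else: an encoding name passed to luit */
} LocaleMode;

/* Case-insensitive equality test: 1 if equal, 0 otherwise. */
static int
same_ignoring_case(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Map the resource string to a mode, reading it directly (no copy). */
static LocaleMode
parse_locale_resource(const char *value)
{
    if (value == NULL || same_ignoring_case(value, "false"))
        return LOCALE_FALSE;   /* assumption: unset behaves like "false" */
    if (same_ignoring_case(value, "medium"))
        return LOCALE_MEDIUM;
    if (same_ignoring_case(value, "true"))
        return LOCALE_TRUE;
    return LOCALE_ENCODING;    /* e.g. "ISO-8859-1", handed to luit */
}
```

Because the string is only read, the caller can pass the resource value
straight from the widget record without duplicating it.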

Juliusz

  
___
I18n mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/i18n



Re: [I18n][call for comments] XTerm patch to invoke luit

2002-06-07 Thread Tomohiro KUBOTA

Hi,

At 06 Jun 2002 18:05:45 +0100,
Juliusz Chroboczek wrote:

> I would suggest modularising the parsing of the argument.  The
> officially sanctioned way is to define a converter, say
> CvtStringToTristate, and register it with Xt.  See lib/Xt/Converters.c
> and man XtAppAddConverter(3x).  Or modularise it by hand (just putting
> it in a separate function).

You mean the parsing of request->misc.locale_str in VTInitialize()
in charproc.c (lines 4638-4717)?  I will have to study your
suggestion and how to use XtAppAddConverter.


> Why do you copy the argument into locale_string, rather than directly
> doing a strcasecmp on the argument?

I thought strcasecmp was not portable...

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/



Re: [I18n][call for comments] XTerm patch to invoke luit

2002-06-07 Thread Tomohiro KUBOTA

Hi,

At Fri, 7 Jun 2002 15:06:09 +0200 (CEST),
Bruno Haible wrote:

> CP1255  (Hebrew)
> CP1258, TCVN  (Thai)
> 
> Either you hardwire them, or you document that xterm should not be
> used with 8-bit fonts in these encodings. (Are there 8-bit fonts for
> CP1255, CP1258, TCVN at all??)

For TIS-620 (ISO-8859-11) Thai, I don't like the documentation approach,
because luit already supports TIS-620 and Thai users clearly
benefit from it.

For CP1258 and TCVN Vietnamese, I think luit could easily support
them, though it doesn't support them yet.

For Hebrew, I don't think we have to care about it for now, because
XTerm doesn't support bidi and we have not yet agreed whether to
support bidi or not.

I could add ISCII to the list of complex 8-bit encodings.  However,
since XTerm doesn't support complex Indic scripts, I think it can be
neglected for now.


IMO, the documentation approach should be avoided as far as possible.
If we need to write documentation for each language, speakers of
that language will probably need to read dozens of documents to use
dozens of programs.  That is exactly the situation Japanese users are
in, and I imagine people from other countries such as Thailand and
Vietnam are too.


Thus, I think hard-coding "th" and "vi" is a good approach for now.
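The hard-coding could be as small as a check on the language part of
the locale name.  A minimal sketch: the function name is hypothetical,
and it assumes the name has the usual
language[_territory][.codeset][@modifier] form, e.g. "th_TH.TIS-620".

```c
#include <string.h>

/* Hypothetical helper: nonzero if the locale's language is Thai or
   Vietnamese, whose 8-bit encodings need combining support, so that
   "UTF-8 with luit" should be preferred over conventional 8-bit mode. */
static int
locale_needs_luit_for_8bit(const char *locale_name)
{
    static const char *const langs[] = { "th", "vi" };
    size_t i;

    if (locale_name == NULL)
        return 0;
    for (i = 0; i < sizeof langs / sizeof langs[0]; i++) {
        size_t n = strlen(langs[i]);
        /* Match "th", "th_TH", "th_TH.TIS-620", ... but not e.g. "thx". */
        if (strncmp(locale_name, langs[i], n) == 0
            && (locale_name[n] == '\0' || locale_name[n] == '_'
                || locale_name[n] == '.' || locale_name[n] == '@'))
            return 1;
    }
    return 0;
}
```

This returns 1 for "th_TH.UTF-8" as well, so a caller would test for a
UTF-8 locale first and consult this list only for 8-bit locales.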

Also, I heard that systems without locale support (built with
X_LOCALE) do not have MB_CUR_MAX.  If that is true, we also need
a fallback for this case.

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/



Re: [I18n][call for comments] XTerm patch to invoke luit

2002-06-07 Thread Bruno Haible

Markus Kuhn writes:
> > TCVN  (Thai)
> 
> s/TCVN/TIS-620/g

Oops, I meant:

  CP1258, TCVN  (Vietnamese)
  TIS-620   (Thai)

Bruno



Re: [I18n][call for comments] XTerm patch to invoke luit

2002-06-07 Thread Markus Kuhn

Bruno Haible wrote on 2002-06-07 13:06 UTC:
> TCVN  (Thai)

s/TCVN/TIS-620/g

As a check list:

The encodings currently supported by glibc 2.2 locales are:

$ echo $(for i in `locale -a` ; do LC_ALL=$i locale charmap ; done | sort -u)
ANSI_X3.4-1968 BIG5 BIG5-HKSCS CP1251 CP1255 EUC-JP EUC-KR EUC-TW GB18030
GB2312 GBK GEORGIAN-PS ISO-8859-1 ISO-8859-13 ISO-8859-14 ISO-8859-15
ISO-8859-2 ISO-8859-3 ISO-8859-5 ISO-8859-6 ISO-8859-7 ISO-8859-8 ISO-8859-9
KOI8-R KOI8-T KOI8-U TIS-620 UTF-8
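From C, the charmap name that `locale charmap` prints is available by
calling setlocale(LC_ALL, "") and then nl_langinfo(CODESET).  A minimal
sketch; the helper name is hypothetical:

```c
#include <langinfo.h>
#include <locale.h>

/* Hypothetical helper: select the locale named by the environment
   (LC_ALL, LC_CTYPE, LANG) and return its codeset, e.g. "UTF-8" or
   "EUC-JP".  If the environment names an uninstalled locale, setlocale
   fails and the "C" locale's codeset is returned instead. */
static const char *
current_codeset(void)
{
    setlocale(LC_ALL, "");
    return nl_langinfo(CODESET);
}
```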

I'd be surprised to hear anything outside this list that is actually
used under Unix (disclaimer: by people who know what they want to use).

Markus

-- 
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org,  WWW: 




Re: [I18n][call for comments] XTerm patch to invoke luit

2002-06-07 Thread Bruno Haible

Tomohiro KUBOTA writes:
> However, this algorithm does not work well for
> Thai, for which I'd like to use "3. UTF-8 with luit" behavior.
> 
> Do you have any idea to include 8bit encodings which need
> special processings such as combining?

You are right. I forgot about these encodings. The encodings that
cannot be dealt with nicely in mode 1 are

CP1255  (Hebrew)
CP1258, TCVN  (Thai)

Either you hardwire them, or you document that xterm should not be
used with 8-bit fonts in these encodings. (Are there 8-bit fonts for
CP1255, CP1258, TCVN at all??)

Bruno



Re: [I18n][call for comments] XTerm patch to invoke luit

2002-06-06 Thread Tomohiro KUBOTA

Hi,

At Thu, 6 Jun 2002 18:53:34 +0200 (CEST),
Bruno Haible wrote:

> The default should follow the locale settings. In detail:
> 
>   - If MB_CUR_MAX == 1:
> 
> Look at the specified main font. If it is an 8-bit font,
> use mode 1. Otherwise use mode 3.
> 
>   - If MB_CUR_MAX > 1:
> 
> If nl_langinfo(CODESET) is "UTF-8", use mode 2.
> Otherwise use mode 3.

If I understand correctly, you propose using this algorithm for the
"medium" mode and making that mode the default.

This algorithm is better because it does not hard-code any
locale names.  However, this algorithm does not work well for
Thai, for which I'd like to use "3. UTF-8 with luit" behavior.

Do you have any idea how to handle 8-bit encodings that need
special processing such as combining characters?

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/



Re: [I18n][call for comments] XTerm patch to invoke luit

2002-06-06 Thread Juliusz Chroboczek

Nice work.  Just a few minor comments.

I would suggest modularising the parsing of the argument.  The
officially sanctioned way is to define a converter, say
CvtStringToTristate, and register it with Xt.  See lib/Xt/Converters.c
and man XtAppAddConverter(3x).  Or modularise it by hand (just putting
it in a separate function).

Why do you copy the argument into locale_string, rather than directly
doing a strcasecmp on the argument?

It looks like falling back to non-luit operation when luit fails is
implemented -- good.

Other than the above, I don't have any objections right now.

Juliusz




Re: [I18n][call for comments] XTerm patch to invoke luit

2002-06-06 Thread Bruno Haible

Tomohiro KUBOTA writes:

>   1. conventional 8bit
>  (8bit encodings are supported by changing fonts)
>   2. UTF-8
>  (UTF-8 is supported)
>   3. UTF-8 with luit
>  (various encodings are supported)
> 
> To select these modes, I added three command options (-en, -lc,
> and -lcc) and two resources (locale and localefilter).
> ...
> However, I feel we need discussion on which mode should be
> the default behavior.

The default should follow the locale settings. In detail:

  - If MB_CUR_MAX == 1:

Look at the specified main font. If it is an 8-bit font,
use mode 1. Otherwise use mode 3.

  - If MB_CUR_MAX > 1:

If nl_langinfo(CODESET) is "UTF-8", use mode 2.
Otherwise use mode 3.
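This rule translates fairly directly into C.  A minimal sketch under two
assumptions: setlocale(LC_ALL, "") has already been called, and whether
the main font is an 8-bit font is determined elsewhere and passed in as
a flag (how xterm would inspect the font is not covered here).  The
mode numbers follow the list in the original proposal; the names are
hypothetical.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Encoding modes: 1 = conventional 8bit, 2 = UTF-8, 3 = UTF-8 with luit. */
enum { MODE_8BIT = 1, MODE_UTF8 = 2, MODE_LUIT = 3 };

/* Default mode following the current locale, as proposed above. */
static int
default_encoding_mode(int main_font_is_8bit)
{
    if (MB_CUR_MAX == 1)   /* single-byte locale */
        return main_font_is_8bit ? MODE_8BIT : MODE_LUIT;
    if (strcmp(nl_langinfo(CODESET), "UTF-8") == 0)
        return MODE_UTF8;
    return MODE_LUIT;      /* multibyte, but not UTF-8 (EUC-JP, BIG5, ...) */
}
```

On a system where MB_CUR_MAX is unavailable (the X_LOCALE case raised
elsewhere in this thread), the first test would need a fallback.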

Bruno



Re: [I18n][call for comments] XTerm patch to invoke luit

2002-06-06 Thread Juliusz Chroboczek

Cool.  Some more reading ;-)

TK>   1. conventional 8bit
TK>  (8bit encodings are supported by changing fonts)
TK>   2. UTF-8
TK>  (UTF-8 is supported)
TK>   3. UTF-8 with luit
TK>  (various encodings are supported)

XTerm, as you know, can be configured to use either Render fonts or
core fonts.  For the foreseeable future, core fonts will still dominate.

The core fonts system has significant performance problems with
Unicode fonts.  Thus, when using core fonts, XTerm in 8-bit mode is
vastly faster and uses fewer resources than in UTF-8 mode.

I would therefore strongly recommend that ``medium'' mode should be
the default when using core fonts.  Feel free to make ``true'' the
default when using Render fonts.

Juliusz

P.S. And please remember to bully Thomas into changing the XTerm
terminfo entry to use VT 220-style AC.



[I18n][call for comments] XTerm patch to invoke luit

2002-06-06 Thread Tomohiro KUBOTA

Hi,

I see that the i18n-related patches, including the luit patches,
have now been adopted into XFree86 CVS.

So I'd like to start a discussion on improving XTerm to invoke
luit, in order to support various major encodings.

I wrote a patch for XTerm to invoke luit.  There are three encoding
modes, two of which are already available.

  1. conventional 8bit
 (8bit encodings are supported by changing fonts)
  2. UTF-8
 (UTF-8 is supported)
  3. UTF-8 with luit
 (various encodings are supported)

To select these modes, I added three command-line options (-en, -lc,
and -lcc) and two resources (locale and localeFilter).

Resource:

locale (class Locale)
  "true"
UTF-8 mode for UTF-8 locales and luit mode for all others.
As a result, always obeys the current locale.
  "medium"
UTF-8 mode for UTF-8 locales, luit mode for east Asian and
Thai locales, and 8bit mode for all others.
As a result, obeys the current locale in UTF-8, east Asian,
and Thai locales.
  "false"
UTF-8 mode for UTF-8 locales and 8bit mode for all others.
Same as the current version of XTerm.
  other strings
luit mode -- regarded as an encoding name and passed to luit.

localeFilter (class LocaleFilter)
  specifies the pathname of "luit".
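The semantics of the locale resource above can be summarised as a small
decision function.  A sketch, assuming the resource value has already
been parsed into one of four cases and that the caller knows whether the
current locale is a UTF-8 locale and whether it is an east Asian or Thai
locale (how to classify the locale is left open here); all names are
hypothetical.

```c
/* Encoding modes: 1 = conventional 8bit, 2 = UTF-8, 3 = UTF-8 with luit. */
enum { MODE_8BIT = 1, MODE_UTF8 = 2, MODE_LUIT = 3 };

/* Hypothetical parsed values of the "locale" resource. */
typedef enum { RES_FALSE, RES_MEDIUM, RES_TRUE, RES_ENCODING } LocaleRes;

static int
resolve_mode(LocaleRes res, int locale_is_utf8, int locale_is_asian_or_thai)
{
    if (res == RES_ENCODING)
        return MODE_LUIT;          /* explicit encoding name: always luit */
    if (locale_is_utf8)
        return MODE_UTF8;          /* true/medium/false: UTF-8 for UTF-8 locales */
    switch (res) {
    case RES_TRUE:
        return MODE_LUIT;          /* luit for all other locales */
    case RES_MEDIUM:
        return locale_is_asian_or_thai ? MODE_LUIT : MODE_8BIT;
    default:
        return MODE_8BIT;          /* "false": same as current XTerm */
    }
}
```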

Command Line Options:

-en encoding
  luit mode with the specified encoding (overrides the locale).
-lc
  UTF-8 mode for UTF-8 locales and luit mode for others.
+lc
  UTF-8 mode for UTF-8 locales and 8bit mode for others.
-lcc
  same as the localeFilter resource.

I believe that people on this mailing list don't oppose the basic
idea of XTerm invoking luit in order to support various encodings.
(luit was developed for this purpose.)

However, I feel we need to discuss which mode should be the
default behavior.

If we don't need to think about backward compatibility,
I think "locale: true" mode should be the default.  However,
if we think about compatibility, "locale: medium" or
"locale: false" modes can be candidates.

Please discuss this point.  Of course, other comments are also
welcome.  (I am thinking of sending the patch to Thomas Dickey or
to patch@xfree86, which is why I want comments.)


Here is a patch for XTerm (XFree86 CVS 2002-06-04).

[Attachment xterm-20020604-luit.diff.gz (uuencoded, gzip-compressed patch) omitted; not readable as text.]