Encode-1.50 and PerlIO::encoding 0.02 released

2002-04-19 Thread Dan Kogai

I am daydreaming that I am a caravan member, driving a herd of 
disobedient camels on the never-ending desert to an oasis called 5.8.0 
when I released new Encode and PerlIO::encoding.  You can get one as 
follows.

Whole:
  Encode
    http://www.dan.co.jp/~dankogai/Encode-1.50.tar.gz
    and CPAN
  PerlIO::encoding
    http://www.dan.co.jp/~dankogai/PerlIO-encoding-0.02.tar.gz
Diff:
  Encode
    http://www.dan.co.jp/~dankogai/current-1.50.diff.gz
  PerlIO::encoding
    [ none ]

The diff is pretty big (3000 lines), so you should get the whole thing 
instead.

The biggest and foremost change is the fallback API, which is greatly 
enhanced.  NI-XS request of

On Friday, April 19, 2002, at 05:01 , Nick Ing-Simmons wrote:
>    check == 11 - silent fail with $string updated (What Tk wants)

is implemented as FB_QUIET.  See below.


Handling Malformed Data
The CHECK argument is used as follows.  When you omit it,
it is identical to CHECK = 0.

CHECK = Encode::FB_DEFAULT ( == 0)
If CHECK is 0, (en|de)code will put a substitution character
in place of the malformed character.  For UCM-based
encodings, subchar will be used.  For Unicode,
\x{FFFD} is used.  If the data is supposed to be UTF-8,
an optional lexical warning (category utf8) is given.
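
As a minimal sketch of the default mode, assuming a current Encode and using the US-ASCII encoding purely for illustration: an octet that cannot be decoded becomes U+FFFD, and a character that cannot be encoded becomes the encoding's subchar.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "\xE9" is not valid US-ASCII, so decode substitutes U+FFFD for it.
my $decoded = decode('ascii', "caf\xE9");
printf "decoded code points: %vX\n", $decoded;

# U+263A has no US-ASCII mapping, so encode substitutes the subchar.
my $encoded = encode('ascii', "smile \x{263A}");
print "encoded: $encoded\n";
```

Because CHECK is omitted, neither call dies or returns early; the damage is only visible in the result.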

CHECK = Encode::FB_CROAK ( == 1)
If CHECK is 1, methods will die immediately with an
error message.  So when CHECK is set to 1, you should trap
the fatal error with eval{} unless you really want to
let it die on error.
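
A short sketch of trapping that fatal error with eval{}, assuming a current Encode; the sample octets and the choice of US-ASCII are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $octets = "caf\xE9";    # \xE9 is not valid US-ASCII
my $error  = '';

# decode dies on the first malformed octet, so wrap it in eval{}.
my $text = eval { decode('ascii', $octets, Encode::FB_CROAK) };
if (!defined $text) {
    $error = $@;
    print "decode failed: $error";
}
```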

CHECK = Encode::FB_QUIET
If CHECK is set to Encode::FB_QUIET, (en|de)code will
immediately return the processed part on error, with the data
passed via argument overwritten with the unprocessed
part.  This is handy when you have to call it repeatedly
because the source data is chopped in the middle for
some reason, such as a fixed-width buffer.  Here is
sample code that does just this.

  my ($buffer, $data, $utf8) = ('', '', '');
  # read returns 0 at EOF and undef on error; both end the loop
  while (read $fh, $buffer, 256) {
      # buffer may end in a partial character so we append
      $data .= $buffer;
      $utf8 .= decode($encoding, $data, Encode::FB_QUIET);
      # $data now contains the unprocessed partial character
  }
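
The same idea as a self-contained sketch that can be run without a real file. It assumes a current Encode; the in-memory filehandle, the UTF-8 sample text, and the deliberately tiny 4-byte chunk size are illustrative choices that force every chunk to end in the middle of a character.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Three 3-byte characters, so a 4-byte read always splits one of them.
my $octets = encode('UTF-8', "\x{65E5}\x{672C}\x{8A9E}");
open my $fh, '<', \$octets or die "cannot open in-memory handle: $!";
binmode $fh;

my ($buffer, $text) = ('', '');
# Append each new chunk to whatever was left over from the last pass.
while (read($fh, $buffer, 4, length($buffer))) {
    $text .= decode('UTF-8', $buffer, Encode::FB_QUIET);
    # $buffer now holds only the unprocessed partial character, if any
}
close $fh;

printf "decoded %d characters, %d octets left over\n",
    length($text), length($buffer);
```

Reading at an offset of length($buffer) does the appending in place, so no second variable is needed for the leftover octets.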

CHECK = Encode::FB_WARN
This is the same as above, except it warns on error.
Handy when you are debugging the mode above.

perlqq mode (CHECK = Encode::FB_PERLQQ)
For encodings that are implemented by Encode::XS,
CHECK == Encode::FB_PERLQQ turns (en|de)code into
perlqq fallback mode.

When you decode, '\xXX' will be placed where XX is the
hex representation of the octet that could not be
decoded to utf8.  And when you encode, '\x{XXXX}' will
be placed where XXXX is the Unicode ID of the character
that cannot be found in the character repertoire
of the encoding.
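
A small sketch of perlqq mode in both directions, assuming a current Encode and using US-ASCII as the target encoding for illustration. The exact letter case of the hex digits in the escapes is not something this sketch relies on.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Encoding: characters outside US-ASCII come out as \x{....} escapes.
my $unicode = "caf\x{E9} \x{263A}";
my $escaped = encode('ascii', $unicode, Encode::FB_PERLQQ);
print "encoded: $escaped\n";

# Decoding: octets that are not valid US-ASCII come out as \xXX escapes.
my $octets   = "caf\xE9";
my $readable = decode('ascii', $octets, Encode::FB_PERLQQ);
print "decoded: $readable\n";
```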

The bitmask
These modes are actually set via bitmask.  Here is how
the FB_XX constants are laid out.  You can import the FB_XX
constants via use Encode qw(:fallbacks); to get the generic
bitmask constants as well, you can import via
 use Encode qw(:fallback_all).

                      FB_DEFAULT FB_CROAK FB_QUIET FB_WARN FB_PERLQQ
 DIE_ON_ERR    0x0001               X
 WARN_ON_ERR   0x0002                                 X
 RETURN_ON_ERR 0x0004                        X        X
 LEAVE_SRC     0x0008
 PERLQQ        0x0100                                          X
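
To see how the FB_XX modes decompose into the generic bits on an installed Encode, a sketch like the following can help. It assumes the :fallback_all export tag mentioned above; later Encode releases may set additional bits in some modes, so it prints what it finds rather than asserting the table.

```perl
use strict;
use warnings;
use Encode qw(:fallback_all);

# Generic bits, in the order they appear in the table.
my @bits = (
    [ DIE_ON_ERR    => DIE_ON_ERR ],
    [ WARN_ON_ERR   => WARN_ON_ERR ],
    [ RETURN_ON_ERR => RETURN_ON_ERR ],
    [ LEAVE_SRC     => LEAVE_SRC ],
    [ PERLQQ        => PERLQQ ],
);

my @modes = (
    [ FB_DEFAULT => FB_DEFAULT ],
    [ FB_CROAK   => FB_CROAK ],
    [ FB_QUIET   => FB_QUIET ],
    [ FB_WARN    => FB_WARN ],
    [ FB_PERLQQ  => FB_PERLQQ ],
);

for my $mode (@modes) {
    my ($name, $value) = @$mode;
    my @set = map { $_->[0] } grep { $value & $_->[1] } @bits;
    printf "%-10s 0x%04x  %s\n", $name, $value, join('|', @set) || '(none)';
}
```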

Unimplemented fallback schemes

In the future you will be able to use a code reference to a
callback function for the value of CHECK, but its API is
still undecided.


Since PerlIO::encoding was incapable of using this new feature, I have 
updated PerlIO::encoding as well.  Instead of pushing PL_sv_yes to the 
stack, struct PerlIOEncode now has one more member, chk, that is 
initialized with Encode::FB_QUIET.

typedef struct {
    PerlIOBuf base;   /* PerlIOBuf stuff */
    SV *bufsv;        /* buffer seen by layers above */
    SV *dataSV;       /* data we have read from layer below */
    SV *enc;          /* the encoding object */
    SV *chk;          /* CHECK in Encode methods */
} PerlIOEncode;

Encode now checks the version of PerlIO::encoding and refuses to use an 
obsolete version.  See t/perlio.t for details.

That way PerlIO::encoding has no trouble should Encode change the value 
of FB_QUIET.
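
From the Perl side none of this plumbing is visible; the layer is simply named in open. A minimal sketch, assuming a perl with PerlIO and Encode::JP available, and using an in-memory handle with EUC-JP sample data for illustration:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Prepare some EUC-JP octets to read back through the layer.
my $octets = encode('euc-jp', "\x{65E5}\x{672C}\x{8A9E}\n");

# The :encoding() layer decodes as data is read from the handle.
open my $in, '<:encoding(euc-jp)', \$octets
    or die "cannot open with encoding layer: $!";
my $line = <$in>;
close $in;

printf "read %d characters from %d octets\n", length($line), length($octets);
```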
As for the partial character problem, I have found it is nearly 
impossible for escape-based encodings to 

[PATCH] Big5-related changes.

2002-04-19 Thread Autrijus Tang

I've been immersed in Big5-related issues in the past few days, and
came back with these last-minute (err, week?) changes before 5.8-RC1.

The Diff contains fixes to TW.pm, Alias.pm, and README.(tw|cn).

(For jhi) README fixes are trivial -- they mention the new HanExtra
encodings, fix some China word usage, and add my latin-1 name.

(For dan) big5-hkscs should be upgraded to the 2001 edition, as per
Hong Kong government's decree. It's available separately at:

http://egb.elixus.org/~autrijus/big5-hkscs.ucm.gz

Also, please delete big5.ucm and replace it with big5-eten, at:

http://egb.elixus.org/~autrijus/big5-eten.ucm.gz

I've fixed Alias.pm so big5 aliases to big5-eten. The reason is that
the 'Big5' as originally defined isn't used anywhere on earth; non-
Microsoft systems use 'big5' to mean 'big5-eten', and Microsoft
uses 'big5' to mean 'cp950'.

It is therefore unwise to have a canonical 'big5' encoding, much like
there should not be a 'gb2312' encoding. Since gb2312 is now aliased
to euc-cn and not cp936, I think big5 should alias to big5-eten and
not cp950.
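
For readers who want to check how a name resolves on their own installation, here is a small sketch. It assumes an Encode release that already includes these alias changes; the list of names tried is only an example.

```perl
use strict;
use warnings;
use Encode qw(find_encoding);

# Print the canonical encoding name each alias resolves to.
for my $name ('big5', 'Big-5', 'big5-eten', 'big5-hkscs', 'gb2312') {
    my $enc = find_encoding($name);
    printf "%-12s -> %s\n", $name, $enc ? $enc->name : '(unknown)';
}

my $big5   = find_encoding('big5');
my $gb2312 = find_encoding('gb2312');
```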

This agrees with T. H. Hsieh's similar decision on glibc-2.2
(http://www.linux.org.tw/mail-archie/cle-devel/cle-devel.29/msg00100.html);
it also agrees with my FreeBSD charmap (and the dominant ETen charmap
in Taiwan). The Unicode mappings now also agree with libiconv-1.7's,
although the latter does not contain the ETen-specific parts.

Oh, I just noticed that Dan retained the 'gb2312.ucm' name, although
the encoding is called 'gb2312-raw'. I admit that I don't fully
understand the reason, but if that's to stand, then big5-eten could also
be named 'big5.ucm', and still say 'code_set_name big5-eten', for
consistency's sake.

Thanks,
/Autrijus/

--- /home/autrijus/perl/ext/Encode/TW/TW.pm Fri Apr 19 22:02:58 2002
+++ TW.pm   Sat Apr 20 03:13:07 2002
@@ -30,10 +30,10 @@
 
   Canonical   Alias                Description
   
-  big5        /\bbig-?5$/i         The original Big5 encoding
-  big5-hkscs  /\bbig5-hk(scs)?$/i
-                                   Big5 plus Cantonese characters in 
-                                   Hong Kong
+  big5-eten   /\bbig-?5$/i         Big5 encoding (with ETen extensions)
+              /\bbig5-?et(en)?$/i
+  big5-hkscs  /\bbig5-?hk(scs)?$/i
+                                   Big5 + Cantonese characters in Hong Kong
   MacChineseSimp                   Big5 + Apple Vendor Mappings
   cp950       Code Page 950 
               = Big5 + Microsoft vendor mappings
@@ -44,11 +44,18 @@
 =head1 NOTES
 
 Due to size concerns, C<EUC-TW> (Extended Unix Character), C<CCCII>
-(Chinese Character Code for Information Interchange) and C<BIG5PLUS>
-(CMEX's Big5+) are distributed separately on CPAN, under the name
-L<Encode::HanExtra>. That module also contains extra China-based encodings.
+(Chinese Character Code for Information Interchange), C<BIG5PLUS>
+(CMEX's Big5+) and C<BIG5EXT> (CMEX's Big5e) are distributed separately
+on CPAN, under the name L<Encode::HanExtra>. That module also contains
+extra China-based encodings.
 
 =head1 BUGS
+
+Since the original C<big5> encoding (1984) is not supported anywhere
+(glibc and DOS-based systems use C<big5> to mean C<big5-eten>; Microsoft
+uses C<big5> to mean C<cp950>), a conscious decision was made to alias
+C<big5> to C<big5-eten>, which is the de facto superset of the original
+big5.
 
 The C<CNS11643> encoding files are not complete. For common C<CNS11643>
 manipulation, please use C<EUC-TW> in L<Encode::HanExtra>, which contains
--- /home/autrijus/perl/ext/Encode/lib/Encode/Alias.pm  Wed Apr 10 05:13:28 2002
+++ Alias.pmSat Apr 20 03:11:11 2002
@@ -217,8 +217,9 @@
 define_alias( qr/(?:x-)?windows-949$/i => 'cp949' );
 define_alias( qr/\bks_c_5601-1987$/i   => 'cp949' );
 # for Encode::TW
-   define_alias( qr/\bbig-?5$/i          => 'big5' );
-   define_alias( qr/\bbig5-hk(?:scs)?$/i => 'big5-hkscs' );
+   define_alias( qr/\bbig-?5$/i           => 'big5-eten' );
+   define_alias( qr/\bbig5-?et(?:en)$/i   => 'big5-eten' );
+   define_alias( qr/\bbig5-?hk(?:scs)?$/i => 'big5-hkscs' );
 }
 # utf8 is blessed :)
 define_alias( qr/^UTF-8$/i => 'utf8',);
--- /home/autrijus/perl/README.tw   Thu Apr 18 06:01:01 2002
+++ README.tw   Sat Apr 20 03:15:51 2002
@@ -29,8 +29,8 @@
 
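 The Encode extension module supports the following Traditional Chinese encodings:
 
-big5        The original Big5 encoding (with ETen Japanese glyphs)
-big5-hkscs  Big5 + Hong Kong Supplementary Character Set
+big5        Big5 encoding (with ETen extension glyphs)
+big5-hkscs  Big5 + Hong Kong Supplementary Character Set, 2001 edition
 cp950       Code Page 950 (Big5 + Microsoft-added characters)
 
 For example, to convert a Big5-encoded file to Unicode, you only need to type the following command:
@@ -61,8 +61,10 @@
 If you need more Chinese encodings, you can download the Encode::HanExtra
 module from CPAN (L<http://www.cpan.org/>). It currently provides the following encodings:
 
+cccii       Chinese Character Code for Information Interchange (Council for Cultural Affairs, 1980)
 euc-tw      Unix extended character set, including CNS11643 planes 1-7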
 

Re: [PATCH] Big5-related changes.

2002-04-19 Thread Autrijus Tang

On Sat, Apr 20, 2002 at 03:53:46AM +0800, Autrijus Tang wrote:
> The Diff contains fixes to TW.pm, Alias.pm, and README.(tw|cn).
> (For jhi) README fixes are trivial -- mentions new HanExtra encodings,
> fix some China word usage, and add my latin-1 name.

Err, forget the patch chunks, please use the attachments verbatim. Sorry.

/Autrijus/


If you read this file _as_is_, just ignore the funny characters you
see. It is written in the POD format (see perlpod manpage) which is
specially designed to be readable as is.

The following documentation is written in Big5 encoding.

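If you are viewing this document with an ordinary text editor, please
ignore the odd markup characters in the text. This document is written in
POD (Plain Old Documentation), a format specially designed to be read
directly as is. For more information about this format, see the perlpod
online documentation.

=head1 NAME

perltw - Traditional Chinese Perl Guide

=head1 DESCRIPTION

Welcome to the world of Perl!

Starting with version 5.8.0, Perl has comprehensive Unicode support,
and with it support for many encodings beyond those of the Latin
languages; CJK (Chinese, Japanese, Korean) is one of them.
Unicode is an international standard that aims to cover all the
characters of the world: the Western world, the Eastern world, and
everything in between (Greek, Syriac, Arabic, Hebrew, Indic,
Native American, and so on). It also accommodates multiple operating
systems and platforms (such as the PC and the Macintosh).

Perl itself operates in Unicode. This means that string data inside Perl
can be represented in Unicode; Perl's functions and operators (such as
regular expression matching) can also operate on Unicode.
For input and output, to handle data stored in pre-Unicode encodings, Perl
provides the Encode module, which lets you easily read and write data in
legacy encodings.

The Encode extension module supports the following Traditional Chinese
encodings ('big5' means 'big5-eten'):

    big5-eten   Big5 encoding (with ETen extension glyphs)
    big5-hkscs  Big5 + Hong Kong Supplementary Character Set, 2001 edition
    cp950       Code Page 950 (Big5 + Microsoft-added characters)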

For example, to convert a Big5-encoded file to Unicode, you only need to
type the following command:

    perl -Mencoding=big5,STDOUT,utf8 -pe1 < file.big5 > file.utf8

Perl also ships with "piconv", a character conversion utility written
entirely in Perl. It is used as follows:

    piconv -f big5 -t utf8 < file.big5 > file.utf8
    piconv -f utf8 -t big5 < file.utf8 > file.big5

In addition, with the encoding module, you can easily write code that
works in units of characters, as shown below:

    #!/usr/bin/env perl
    # enable big5 string parsing; standard input, output and error are all set to big5
    use encoding 'big5', STDIN => 'big5', STDOUT => 'big5';
    print length("駱駝");            #  2 (double quotes denote characters)
    print length('駱駝');            #  4 (single quotes denote bytes)
    print index("諄諄教誨", "彖帢"); # -1 (does not contain this substring)
    print index('諄諄教誨', '彖帢'); #  1 (starts at the second byte)

In the last example, the second byte of "諄" combines with the first byte
of the following "諄" to form the Big5 code for "彖"; the second byte of
that "諄" in turn combines with the first byte of "教" to form "帢".
This solves a problem commonly seen in Big5 string matching in the past.
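
The same byte-versus-character effect can be sketched without the encoding pragma by decoding explicitly. This is an illustration only: it assumes Encode::TW is installed, and the octets are written as hex escapes (the Big5 bytes of the four-character haystack and the two-character needle from the example above) so that the script does not depend on the encoding of its own source file.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Big5 octets: a four-character haystack, and a two-character needle
# whose octets happen to straddle the haystack's character boundaries.
my $haystack_bytes = "\xBD\xCE\xBD\xCE\xB1\xD0\xBB\xA3";
my $needle_bytes   = "\xCE\xBD\xCE\xB1";

# Matching octets finds a false hit in the middle of a character.
my $byte_pos = index($haystack_bytes, $needle_bytes);

# Matching decoded characters does not.
my $haystack = decode('big5-eten', $haystack_bytes);
my $needle   = decode('big5-eten', $needle_bytes);
my $char_pos = index($haystack, $needle);

print "byte match at $byte_pos, character match at $char_pos\n";
```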

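=head2 Extra Chinese encodings

If you need more Chinese encodings, you can download the Encode::HanExtra
module from CPAN (L<http://www.cpan.org/>). It currently provides the
following encodings:

    cccii       Chinese Character Code for Information Interchange (Council for Cultural Affairs, 1980)
    euc-tw      Unix extended character set, including CNS11643 planes 1-7
    big5plus    Big5+ by the Chinese Foundation for Digitization Technology
    big5ext     Big5e by the Chinese Foundation for Digitization Technology

In addition, the Encode::HanConvert module provides two encodings for
converting between Simplified and Traditional Chinese:

    big5-simp   Big5 Traditional Chinese <=> Unicode Simplified Chinese
    gbk-trad    GBK Simplified Chinese <=> Unicode Traditional Chinese

To convert between GBK and Big5, see the b2g.pl and g2b.pl programs
bundled with that module, or use the following in your program:

    use Encode::HanConvert;
    $euc_cn = big5_to_gb($big5); # convert from Big5 to GBK
    $big5 = gb_to_big5($euc_cn); # convert from GBK to Big5

=head2 Further information

Please refer to the extensive documentation bundled with Perl
(unfortunately all written in English) to learn more about Perl and about
how to use Unicode. That said, external resources are plentiful: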

=head2 Web sites offering Perl resources

=over 4

=item L<http://www.perl.com/>

The Perl home page (maintained by O'Reilly)

=item L<http://www.cpan.org/>

Comprehensive Perl Archive Network

=item L<http://lists.perl.org/>

A list of Perl mailing lists

=back

=head2 Web sites for learning Perl

=over 4

=item L<http://www.oreilly.com.tw/chinese/perl/index.html>

O'Reilly Perl books in Traditional Chinese

=item L<http://groups.google.com/groups?q=tw.bbs.comp.lang.perl>

Taiwan's networked Perl discussion group (that is, the shared Perl board of the major BBSes)

=back

=head2 Perl user groups

=over 4

=item L<http://www.pm.org/groups/asia.shtml#Taiwan>

A list of Perl Mongers groups in Taiwan

=item L<http://irc.elixus.org/>

Elixus online chat room

=back

=head2 Unicode-related web sites

=over 4

=item L<http://www.unicode.org/>

The Unicode Consortium (the maker of the Unicode standard)

=item L<http://www.cl.cam.ac.uk/%7Emgk25/unicode.html>

UTF-8 and Unicode FAQ for Unix/Linux

=head2 Chinese localization information

=item Why is it called "Orthodox Chinese" (zhengti) rather than "Traditional Chinese" (fanti)?

L<http://www.csie.ntu.edu.tw/~b7506051/mozilla/faq.html#faqglossary>

=item Chinese software localization alliance (CPatch)

L<http://www.cpatch.org/>

=item Linux software Chinese localization project

L<http://www.linux.org.tw/CLDP/>

=back

=head1 SEE ALSO

L<Encode>, L<Encode::TW>, L<encoding>, L<perluniintro>, L<perlunicode>

=head1 AUTHORS

Jarkko Hietaniemi E<lt>[EMAIL PROTECTED]E<gt>

Autrijus Tang (唐宗漢) E<lt>[EMAIL PROTECTED]E<gt>

=cut


If you read this file _as_is_, just ignore the funny characters you
see. It is written in the POD format (see perlpod manpage) which is
specially designed to be readable as is.

The following documentation is written in EUC-CN encoding.

Re: [PATCH] Big5-related changes.

2002-04-19 Thread Dan Kogai

On Saturday, April 20, 2002, at 04:53 , Autrijus Tang wrote:
> I've been immersed in Big5-related issues in the past few days, and
> came back with these last-minute (err, week?) changes before 5.8-RC1.
>
> The Diff contains fixes to TW.pm, Alias.pm, and README.(tw|cn).

Excellent!

> (For dan) big5-hkscs should be upgraded to the 2001 edition, as per
> Hong Kong government's decree. It's available separately at:
>
> http://egb.elixus.org/~autrijus/big5-hkscs.ucm.gz
>
> Also, please delete big5.ucm and replace it with big5-eten, at:
>
> http://egb.elixus.org/~autrijus/big5-eten.ucm.gz

Thus updated.  I needed to update TW/Makefile.PL and 
lib/Encode/Config.pm (so it loads on 'big5-eten' instead of just 
'big5'), but that's not at all a big deal.

> I've fixed Alias.pm so big5 aliases to big5-eten. The reason is that
> the 'Big5' as originally defined isn't used anywhere on earth; non-
> Microsoft systems uses 'big5' to mean 'big5-eten', and Microsoft
> uses 'big5' to mean 'cp950'.
>
> It is therefore unwise to have a canonical 'big5' encoding, much like
> there should not be a 'gb2312' encoding. Since gb2312 is now aliased
> to euc-cn and not cp936, I think big5 should alias to big5-eten and
> not cp950.

I agree.  AFAIK, Big5 is the only major CJK encoding not endorsed by the 
government.  What's so funny is that there seems to be less confusion 
between encodings there in Taiwan than in Japan or Korea.  Japan is the 
worst for using Shift_JIS, EUC-JP, ISO-2022-JP(-[12])? and now Unicode 
(IMHO, however, the Japanese people should be proud of making multibyte 
character encoding a reality.  But I can't help wondering whether this 
mess is way too high a price to pay :)

> Oh, I just noticed that Dan retained the 'gb2312.ucm' name, although
> the encoding is called 'gb2312-raw'. I admit that I don't fully
> understand the reason, but if that's to stand, then big5-eten could also
> be named 'big5.ucm', and still say 'code_set_name big5-eten', for
> consistency's sake.

I renamed big5.ucm to big5-eten.ucm.  The -raw suffix is missing from 
the *.ucm filenames just because such names look too funny on 8.3 
filesystems, nothing more :)

> Thanks,
> /Autrijus/

Xin Ku  Le  !
\x{8f9b}\x{82e6}\x{4e86}

XiaoSi   Dan
\x{5c0f}\x{98fc} \x{5f3e}\n




LAST Call for Papers - 22nd Unicode Conference - Sep 2002 - San Jose,CA

2002-04-19 Thread Misha . Wolf

        Twenty-second International Unicode Conference (IUC22)
            Unicode and the Web: Evolution or Revolution?
                   http://www.unicode.org/iuc/iuc22
                         September 9-13, 2002
                         San Jose, California
***
Call for Papers - Just 3 weeks to go - Send in your submission now!
***
          Submissions due:      May 10, 2002
          Notification date:    May 31, 2002
          Completed papers due: June 21, 2002
          (in electronic form and camera-ready paper form)

The software industry continues its rapid growth and change.  In this
year alone, Unicode 3.2 was released and several new proposals for the
Internet and the World Wide Web were promoted to standards.  Web
Services is the latest buzz.  Are the vendors of software that support
these technologies keeping up?  How can you be sure that you are
deploying software components that work well together today and in the
future?  This Conference is where you go to find out.  Experts will
describe the latest changes to the Unicode standard and the other
standards used for e-business today.  You will also learn about the best
practices for utilizing, integrating and deploying these technologies
based on real-world examples and experience.  Demonstrations are often
provided.

We invite you to submit papers which either define the software of
tomorrow, demonstrate best practice with today's software, or articulate
problems that must be solved before further advances can occur.  Papers
should discuss subjects in the context of Unicode, internationalization
or localization.  You can view the programs of previous Conferences at:
http://www.unicode.org/unicode/conference/about-conf.html

Conference attendees are generally involved in either the development,
deployment or use of Unicode software or content, or the globalization
of software and the Internet.  They include managers, software engineers,
systems analysts, font designers, graphic designers, content developers,
technical writers, and product marketing personnel.

THEME & TOPICS

Computing with Unicode is the overall theme of the Conference.
Presentations should be geared towards a technical audience.  Topics of
interest include, but are not limited to, the following (within the
context of Unicode, internationalization or localization):

- Web Services
- XML and related specifications
- The World Wide Web (WWW)
- Portable devices
- UTFs: Not enough or too many?
- Security concerns e.g. Avoiding the spoofing of UTF-8 data
- Impact of new encoding standards
- Implementing Unicode: Practical and political hurdles
- Implementing new features of recent versions of Unicode
- Algorithms (e.g. normalization, collation, bidirectional)
- Programming languages and libraries (Java, Perl, et al)
- Search engines
- Library and archival concerns
- Operating systems
- Databases
- Large scale networks
- Government applications
- Evaluations (case studies, usability studies)
- Natural language processing
- Migrating legacy applications
- Cross platform issues
- Printing and imaging
- Optimizing performance of systems and applications
- Testing applications
- Business models for software development (e.g. Open source)

SESSIONS

The Conference Program will provide a wide range of sessions including:
- Keynote presentations
- Workshops/Tutorials
- Technical presentations
- Panel sessions

All sessions except the Workshops/Tutorials will be of 40 minute
duration.  In some cases, two consecutive 40 minute program slots may be
devoted to a single session.

The Workshops/Tutorials will each last approximately three hours.  They
should be designed to stimulate discussion and participation, using
slides and demonstrations.

PUBLICITY

If your paper is accepted, your details will be included in the
Conference brochure and Web pages and the paper itself will appear on a
Conference CD, with an optional printed book of Conference Proceedings.

CONFERENCE LANGUAGE

The Conference language is English.  All submissions, papers and
presentations should be provided in English.

SUBMISSIONS

Submissions MUST contain:

1. An abstract of 150-250 words, consisting of statement of purpose,
   paper description, and your conclusions or final summary.

2. A brief biography.

3. The details listed below:

   SESSION TITLE: _

  _

   TITLE (eg Dr/Mr/Mrs/Ms):   _

   NAME:  _

   JOB TITLE: _

   ORGANIZATION/AFFILIATION:  _

   ORGANIZATION'S WWW URL:_

   OWN WWW URL: 

Tk804 + Encode-1.50 :-) again

2002-04-19 Thread Nick Ing-Simmons

Dan Kogai [EMAIL PROTECTED] writes:
>I am daydreaming that I am a caravan member, driving a herd of 
>disobedient camels on the never-ending desert to an oasis called 5.8.0 
>when I released new Encode and PerlIO::encoding.  You can get one as 
>follows.

p4 integrated to //depot/perlio for testing.

Without any changes to Tk804 things improved a bit - only the JP.t and KR.t
tests were failing, and those not failing as badly.

Adding ENCODE_FB_QUIET to Tk's encode glue makes those pass as well.

Suggest one small tweak as in attached patch.

The patch turns off utf8_to_uvuni's warning and checks, as the only 
thing we are using the UV for is an error message (which in my case
isn't going to be printed as I am in FB_QUIET). Otherwise I get noise
when Tk is groping about in the U+FFXX page. 

The indent looks better - but has cuddled else - no big deal.

I was a little surprised that Encode/encode.h gets installed in lib
rather than archlib/CORE but can live with that (it makes a kind of sense,
as it is architecture neutral - but perl.h et al. go elsewhere).
The snag here is that Makefile.PL has added -I to find perl.h, so I 
have to 
#include ../../Encode/encode.h 
which is a portability issue, as there is no certainty that lib / archlib 
relative paths work like that. Will tweak Tk's Makefile.PL configure
to hunt down encode.h. 
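
One way such a hunt could look, as a rough sketch only: this is not Tk's actual Makefile.PL code, and the helper name find_encode_h is made up for illustration. It simply walks @INC plus the configured library directories and returns the first Encode/encode.h it finds.

```perl
use strict;
use warnings;
use Config;
use File::Spec;

# Hypothetical helper: return the path of the first Encode/encode.h
# found under @INC or the configured lib directories, or nothing.
sub find_encode_h {
    for my $dir (@INC, $Config{archlibexp}, $Config{privlibexp}) {
        next if !defined $dir || ref $dir;    # skip @INC hooks
        my $candidate = File::Spec->catfile($dir, 'Encode', 'encode.h');
        return $candidate if -f $candidate;
    }
    return;
}

my $header = find_encode_h();
print defined $header ? "found $header\n" : "encode.h not found\n";
```

A Makefile.PL could then add the directory that contains the Encode directory to its include path instead of relying on a relative path.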

Will do a spelling patch on the pod(s) when I get a chance.

-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/



--- Encode.xs.ship  Fri Apr 19 19:25:26 2002
+++ Encode.xs   Fri Apr 19 19:27:59 2002
@@ -122,7 +122,7 @@
             if (dir == enc->f_utf8) {
                 STRLEN clen;
                 UV ch =
-                    utf8n_to_uvuni(s+slen, (SvCUR(src)-slen), &clen, 0);
+                    utf8n_to_uvuni(s+slen, (SvCUR(src)-slen), &clen, UTF8_ALLOW_ANY|UTF8_CHECK_ONLY);
                 if (check & ENCODE_DIE_ON_ERR) {
                     Perl_croak(
                         aTHX_ "\"\\N{U+%" UVxf "}\" does not map to %s, %d",



Re: Tk804 + Encode-1.50 :-) again

2002-04-19 Thread Dan Kogai

On Saturday, April 20, 2002, at 03:45 , Nick Ing-Simmons wrote:
> Dan Kogai [EMAIL PROTECTED] writes:
>> I am daydreaming that I am a caravan member, driving a herd of
>> disobedient camels on the never-ending desert to an oasis called 5.8.0
>> when I released new Encode and PerlIO::encoding.  You can get one as
>> follows.
>
> p4 integrated to //depot/perlio for testing.
>
> Without any changes to Tk804 things improved a bit - only the JP.t and 
> KR.t
> tests were failing, and those not failing as badly.

I thought I relocated the perlio-related tests in them to t/perlio.t.  
Are there any left?

> Adding ENCODE_FB_QUIET to Tk's encode glue makes those pass as well.

That was my biggest concern.  So glad to hear that.

> Suggest one small tweak as in attached patch.
>
> The patch turns off utf8_to_uvuni's warning and checks as only
> thing we are using the UV for is an error message (which in my case
> isn't going to be printed as I am in FB_QUIET). Otherwise I get noise
> when Tk is groping about in U+FFXX page.

Applied, thanks.

> The indent looks better - but has cuddled else - no big deal.
>
> I was a little surprised that Encode/encode.h gets installed in lib
> rather than archlib/CORE but can live with that (makes a kind of sense
> it is architecture neutral - but perl.h et. al. go elsewhere).
> The snag here is that Makefile.PL has added -I to find perl.h, so I
> have to
> #include ../../Encode/encode.h
> which is portability issue as there is no certainty that lib / archlib
> relative paths work like that. Will tweak Tk's Makefile.PL configure
> to hunt down encode.h.

I wonder if there is a more sensible way to install non-PM files to 
PERL5LIB.  For the time being it is at the mercy of MM.  Though it is 
not a show stopper, I would like Encode to be as clean and 
standard-compliant as possible.  MM is so vast I don't even know how 
many more features are hidden...

> Will do a spelling patch on the pod(s) when I get a chance.

Yes, please.  Emacs doesn't do spellcheck-as-you-type like recent 
mailers in MacOS and Windows :)  (I know you can spellcheck in Emacs but 
I am not sure if it is a good idea to do so in .pm).

Dan the Encode Maintainer




Re: Tk804 + Encode-1.50 :-) again

2002-04-19 Thread Nicholas Clark

On Sat, Apr 20, 2002 at 04:27:15AM +0900, Dan Kogai wrote:
> Yes, please.  Emacs doesn't do spellcheck-as-you-type like recent 
> mailers in MacOS and Windows :)  (I know you can spellcheck in Emacs but 
> I am not sure if it is a good idea to to do so in .pm).

You underestimate the power of the dark side.

M-x flyspell-mode

Definitely part of the dark side because here it defaults to American.
And then refuses to start because I don't have American dictionaries
installed. ispell has no problem just running and finding the correct
dictionaries.

Nicholas Clark
-- 
Even better than the real thing:http://nms-cgi.sourceforge.net/



[Encode] Dark Side of the Emacs Modes [Was: Re: Tk804 ...]

2002-04-19 Thread Dan Kogai

On Saturday, April 20, 2002, at 05:38 , Nicholas Clark wrote:
> On Sat, Apr 20, 2002 at 04:27:15AM +0900, Dan Kogai wrote:
>> Yes, please.  Emacs doesn't do spellcheck-as-you-type like recent
>> mailers in MacOS and Windows :)  (I know you can spellcheck in Emacs 
>> but
>> I am not sure if it is a good idea to to do so in .pm).
>
> You underestimate the power of the dark side.
>
> M-x flyspell-mode

I knew something like this existed but never checked the mode name :)
Hmm  Requires ispell...  Piece of cake with portupgrade (could be 
the most widely used ruby program in (Free)BSD world) Oh man! you're 
right!  It even supports mouse (but I usually use emacs only via tty).  
But how about perl jargons?  automagicalNi!  
barewordsNi!  Hmm.  This mode needs some more education :)  
Thanks.  More than 10 years w/ Emacs and still lost in modes

> Definitely part of the dark side because here it defaults to American.

Does it correct pronunciation of the Britons so CAN'T do that sounds 
less obscene :?

> And then refuses to start because I don't have American dictionaries
> installed. ispell has no problem just running and finding the correct
> dictionaries.

Dan the Emacs User, not Elisp Hacker
 ^pretty funny.  MacOS X Mail underlines this but not
 Emacs.  Is it smart enough to scan $PATH and make them
 correct?