Encode-1.50 and PerlIO::encoding 0.02 released
I am daydreaming that I am a caravan member, driving a herd of disobedient camels across the never-ending desert to an oasis called 5.8.0, when I released the new Encode and PerlIO::encoding. You can get them as follows.

Whole:
  Encode            http://www.dan.co.jp/~dankogai/Encode-1.50.tar.gz and CPAN
  PerlIO::encoding  http://www.dan.co.jp/~dankogai/PerlIO-encoding-0.02.tar.gz
Diff:
  Encode            http://www.dan.co.jp/~dankogai/current-1.50.diff.gz
  PerlIO::encoding  [ none ]

The diff is pretty big (3000 lines), so you should get the whole thing instead.

The biggest and foremost change is the fallback API, which is greatly enhanced. NI-S's request of

On Friday, April 19, 2002, at 05:01 , Nick Ing-Simmons wrote:
> check == 11 - silent fail with $string updated (What Tk wants)

is implemented as FB_QUIET. See below.

Handling Malformed Data

    The CHECK argument is used as follows. When you omit it, it is
    identical to CHECK = 0.

    CHECK = Encode::FB_DEFAULT ( == 0)
        If CHECK is 0, (en|de)code will put a substitution character in
        place of the malformed character. For UCM-based encodings,
        <subchar> will be used. For Unicode, \xFFFD is used. If the data
        is supposed to be UTF-8, an optional lexical warning (category
        utf8) is given.

    CHECK = Encode::FB_CROAK ( == 1)
        If CHECK is 1, methods will die immediately with an error
        message. So when CHECK is set, you should trap the fatal error
        with eval{} unless you really want to let it die on error.

    CHECK = Encode::FB_QUIET
        If CHECK is set to Encode::FB_QUIET, (en|de)code will immediately
        return the processed part on error, with the data passed via the
        argument overwritten with the unprocessed part. This is handy
        when you have to call it repeatedly because the source data is
        chopped in the middle for some reason, such as a fixed-width
        buffer. Here is sample code that does just this.
          my $data = '';
          # read() returns 0 at end of file, so test its truth rather
          # than whether it is defined
          while (read $fh, $buffer, 256) {
              # buffer may end in a partial character, so we append
              $data .= $buffer;
              $utf8 .= decode($encoding, $data, Encode::FB_QUIET);
              # $data now contains the unprocessed partial character
          }

    CHECK = Encode::FB_WARN
        This is the same as above, except it warns on error. Handy when
        you are debugging the mode above.

    perlqq mode (CHECK = Encode::FB_PERLQQ)
        For encodings that are implemented by Encode::XS, CHECK ==
        Encode::FB_PERLQQ turns (en|de)code into perlqq fallback mode.

        When you decode, '\xXX' will be placed where XX is the hex
        representation of the octet that could not be decoded to utf8.
        And when you encode, '\x{XXXX}' will be placed where XXXX is the
        Unicode ID of the character that cannot be found in the character
        repertoire of the encoding.

    The bitmask
        These modes are actually set via a bitmask. Here is how the FB_XX
        constants are laid out. You can import the FB_XX constants via
        use Encode qw(:fallbacks); you can import the generic bitmask
        constants via use Encode qw(:fallback_all).

                             FB_DEFAULT FB_CROAK FB_QUIET FB_WARN FB_PERLQQ
        DIE_ON_ERR    0x0001               X
        WARN_ON_ERR   0x0002                                 X
        RETURN_ON_ERR 0x0004                        X        X
        LEAVE_SRC     0x0008
        PERLQQ        0x0100                                          X

    Unimplemented fallback schemes
        In future you will be able to use a code reference to a callback
        function for the value of CHECK, but its API is still undecided.

Since PerlIO::encoding was incapable of using this new feature, I have updated PerlIO::encoding as well. Instead of pushing PL_sv_yes to the stack, struct PerlIOEncode now has one more member, chk, that is initialized with Encode::FB_QUIET.

typedef struct {
    PerlIOBuf base;    /* PerlIOBuf stuff */
    SV *bufsv;         /* buffer seen by layers above */
    SV *dataSV;        /* data we have read from layer below */
    SV *enc;           /* the encoding object */
    SV *chk;           /* CHECK in Encode methods */
} PerlIOEncode;

Encode now checks the version of PerlIO::encoding and refuses to use an obsolete version. See t/perlio.t for details. That way PerlIO::encoding has no trouble should Encode change the value of FB_QUIET.
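The fallback modes described above can be tried out with a short script. This is only a sketch: the helper names (decode_in_chunks, decodes_cleanly, decode_perlqq) are made up for illustration, the sample bytes are the UTF-8 encoding of a three-character string, and it assumes an Encode that exports the FB_* constants through the :fallbacks tag as described in this announcement.

```perl
use strict;
use warnings;
use Encode qw(decode :fallbacks);

# Decode $bytes in fixed-size chunks. With FB_QUIET, decode() returns the
# part it could convert and leaves the unprocessed tail (for example half
# of a multibyte character) in $pending, ready for the next chunk.
sub decode_in_chunks {
    my ($encoding, $bytes, $chunk_size) = @_;
    my ($pending, $text) = ('', '');
    open my $fh, '<:raw', \$bytes or die "cannot open in-memory handle: $!";
    my $chunk;
    while (read $fh, $chunk, $chunk_size) {
        $pending .= $chunk;
        $text    .= decode($encoding, $pending, FB_QUIET);
    }
    close $fh;
    return ($text, $pending);
}

# FB_CROAK dies on malformed input, so the call is trapped with eval {}.
# $bytes is a private copy, so the caller's data is never overwritten.
sub decodes_cleanly {
    my ($encoding, $bytes) = @_;
    return eval { decode($encoding, $bytes, FB_CROAK); 1 } ? 1 : 0;
}

# FB_PERLQQ keeps going and marks each undecodable octet as \xXX.
sub decode_perlqq {
    my ($encoding, $bytes) = @_;
    return decode($encoding, $bytes, FB_PERLQQ);
}

# Nine bytes, three characters; a chunk size of 4 splits characters in half.
my ($text, $rest) =
    decode_in_chunks('UTF-8', "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E", 4);
printf "%d characters decoded, %d bytes left over\n",
    length($text), length($rest);
```

Note that truly malformed data in the middle of the stream would make $pending grow without ever being consumed; a real reader would want to detect that case and switch to another CHECK value.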
As for the partial character problem, I have found it is nearly impossible for escape-based encodings to
[PATCH] Big5-related changes.
I've been immersed in Big5-related issues in the past few days, and came back with these last-minute (err, week?) changes before 5.8-RC1.

The Diff contains fixes to TW.pm, Alias.pm, and README.(tw|cn).

(For jhi) README fixes are trivial -- mention the new HanExtra encodings, fix some China word usage, and add my latin-1 name.

(For dan) big5-hkscs should be upgraded to the 2001 edition, as per the Hong Kong government's decree. It's available separately at:

    http://egb.elixus.org/~autrijus/big5-hkscs.ucm.gz

Also, please delete big5.ucm and replace it with big5-eten, at:

    http://egb.elixus.org/~autrijus/big5-eten.ucm.gz

I've fixed Alias.pm so big5 aliases to big5-eten. The reason is that the 'Big5' as originally defined isn't used anywhere on earth; non-Microsoft systems use 'big5' to mean 'big5-eten', and Microsoft uses 'big5' to mean 'cp950'.

It is therefore unwise to have a canonical 'big5' encoding, much like there should not be a 'gb2312' encoding. Since gb2312 is now aliased to euc-cn and not cp936, I think big5 should alias to big5-eten and not cp950.

<!-- This agrees with T. H. Hsieh's similar decision on glibc-2.2: http://www.linux.org.tw/mail-archie/cle-devel/cle-devel.29/msg00100.html; it also agrees with my FreeBSD charmap (and the dominant ETen charmap in Taiwan). The Unicode mappings now also agree with libiconv-1.7's, although the latter does not contain the ETen-specific parts. -->

Oh, I just noticed that Dan retained the 'gb2312.ucm' name, although the encoding is called 'gb2312-raw'. I admit that I don't fully understand the reason, but if that's to stand, then big5-eten could also be named 'big5.ucm', and still say 'code_set_name big5-eten', for consistency's sake.
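To make the proposed aliasing concrete, here is a small stand-alone sketch of pattern-based alias resolution. It does not call Encode::Alias at all: the @ALIASES table and the resolve_big5_alias helper are hypothetical stand-ins, and the patterns follow the ones documented in TW.pm in this patch (with the trailing "en" optional).

```perl
use strict;
use warnings;

# Hypothetical alias table: each entry pairs a name pattern with the
# canonical encoding it should resolve to. First match wins.
my @ALIASES = (
    [ qr/\bbig-?5$/i           => 'big5-eten'  ],
    [ qr/\bbig5-?et(?:en)?$/i  => 'big5-eten'  ],
    [ qr/\bbig5-?hk(?:scs)?$/i => 'big5-hkscs' ],
);

# Return the canonical name for $name, or undef if no pattern matches.
sub resolve_big5_alias {
    my ($name) = @_;
    for my $entry (@ALIASES) {
        my ($pattern, $canonical) = @$entry;
        return $canonical if $name =~ $pattern;
    }
    return undef;
}

for my $name (qw(big5 Big-5 big5-et big5hk cp950)) {
    my $canonical = resolve_big5_alias($name);
    printf "%-8s => %s\n", $name, defined $canonical ? $canonical : '(no alias)';
}
```

With a table like this there is no canonical 'big5' entry at all: the bare name only ever reaches big5-eten, which is the point of the change.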
Thanks,
/Autrijus/

--- /home/autrijus/perl/ext/Encode/TW/TW.pm	Fri Apr 19 22:02:58 2002
+++ TW.pm	Sat Apr 20 03:13:07 2002
@@ -30,10 +30,10 @@
   Canonical   Alias                 Description
-  big5        /\bbig-?5$/i          The original Big5 encoding
-  big5-hkscs  /\bbig5-hk(scs)?$/i
-                                    Big5 plus Cantonese characters in
-                                    Hong Kong
+  big5-eten   /\bbig-?5$/i          Big5 encoding (with ETen extensions)
+              /\bbig5-?et(en)?$/i
+  big5-hkscs  /\bbig5-?hk(scs)?$/i
+                                    Big5 + Cantonese characters in Hong Kong
   MacChineseSimp                    Big5 + Apple Vendor Mappings
   cp950                             Code Page 950
                                     = Big5 + Microsoft vendor mappings
@@ -44,11 +44,18 @@
 =head1 NOTES
 
 Due to size concerns, C<EUC-TW> (Extended Unix Character), C<CCCII>
-(Chinese Character Code for Information Interchange) and C<BIG5PLUS>
-(CMEX's Big5+) are distributed separately on CPAN, under the name
-L<Encode::HanExtra>. That module also contains extra China-based encodings.
+(Chinese Character Code for Information Interchange), C<BIG5PLUS>
+(CMEX's Big5+) and C<BIG5EXT> (CMEX's Big5e) are distributed separately
+on CPAN, under the name L<Encode::HanExtra>. That module also contains
+extra China-based encodings.
 
 =head1 BUGS
+
+Since the original C<big5> encoding (1984) is not supported anywhere
+(glibc and DOS-based systems uses C<big5> to mean C<big5-eten>; Microsoft
+uses C<big5> to mean C<cp950>), a conscious decision was made to alias
+C<big5> to C<big5-eten>, which is the de facto superset of the original
+big5.
 
 The C<CNS11643> encoding files are not complete. For common C<CNS11643>
 manipulation, please use C<EUC-TW> in L<Encode::HanExtra>, which contains
--- /home/autrijus/perl/ext/Encode/lib/Encode/Alias.pm	Wed Apr 10 05:13:28 2002
+++ Alias.pm	Sat Apr 20 03:11:11 2002
@@ -217,8 +217,9 @@
 	define_alias( qr/(?:x-)?windows-949$/i => 'cp949' );
 	define_alias( qr/\bks_c_5601-1987$/i   => 'cp949' );
 	# for Encode::TW
-	define_alias( qr/\bbig-?5$/i           => 'big5' );
-	define_alias( qr/\bbig5-hk(?:scs)?$/i  => 'big5-hkscs' );
+	define_alias( qr/\bbig-?5$/i           => 'big5-eten' );
+	define_alias( qr/\bbig5-?et(?:en)$/i   => 'big5-eten' );
+	define_alias( qr/\bbig5-?hk(?:scs)?$/i => 'big5-hkscs' );
     }
     # utf8 is blessed :)
     define_alias( qr/^UTF-8$/i => 'utf8',);
--- /home/autrijus/perl/README.tw	Thu Apr 18 06:01:01 2002
+++ README.tw	Sat Apr 20 03:15:51 2002
@@ -29,8 +29,8 @@
 The Encode extension supports the following Traditional Chinese encodings:
 
-big5        The original Big5 encoding (with ETen Japanese glyphs)
-big5-hkscs  Big5 + Hong Kong Supplementary Character Set
+big5        Big5 encoding (with ETen extension glyphs)
+big5-hkscs  Big5 + Hong Kong Supplementary Character Set, 2001 edition
 cp950       Code page 950 (Big5 + characters added by Microsoft)
 
 For example, to convert a Big5-encoded file to Unicode, just type the following command:
@@ -61,8 +61,10 @@
 If you need more Chinese encodings, you can download the Encode::HanExtra
 module from CPAN (L<http://www.cpan.org/>). It currently provides the
 following encodings:
 
+cccii       The 1980 Chinese Character Code for Information Interchange of the Council for Cultural Affairs
 euc-tw      Unix extended character set, containing CNS11643 planes 1-7
Re: [PATCH] Big5-related changes.
On Sat, Apr 20, 2002 at 03:53:46AM +0800, Autrijus Tang wrote:
> The Diff contains fixes to TW.pm, Alias.pm, and README.(tw|cn).
>
> (For jhi) README fixes are trivial -- mentions new HanExtra encodings,
> fix some China word usage, and add my latin-1 name.

Err, forget the patch chunks, please use the attachments verbatim. Sorry.

/Autrijus/

If you read this file _as_is_, just ignore the funny characters you see. It is written in the POD format (see perlpod manpage) which is specially designed to be readable as is. The following documentation is written in Big5 encoding.

If you are viewing this document in an ordinary text editor, please ignore the odd markup characters in it. This document is written in POD (a concise documentation format), which is specially designed to be directly readable. For further information about this format, please see the perlpod online documentation.

=head1 NAME

perltw - Perl guide in Traditional Chinese

=head1 DESCRIPTION

Welcome to the world of Perl!

Starting with version 5.8.0, Perl has comprehensive Unicode support, and along with it supports many encodings outside the Latin family; CJK (Chinese, Japanese, Korean) is one part of that. Unicode is an international standard that attempts to cover all the characters of the world: the Western world, the Eastern world, and everything in between (Greek, Syriac, Arabic, Hebrew, Indic, Native American scripts, and so on). It also accommodates multiple operating systems and platforms (such as PC and Macintosh).

Perl itself operates on Unicode. This means that string data inside Perl can be represented in Unicode; Perl's functions and operators (for example regular expression matching) can also operate on Unicode. For input and output, in order to handle data stored in encodings that predate Unicode, Perl provides the Encode module, which lets you easily read and write data in legacy encodings.

The Encode extension supports the following Traditional Chinese encodings ('big5' means 'big5-eten'):

    big5-eten   Big5 encoding (with ETen extension glyphs)
    big5-hkscs  Big5 + Hong Kong Supplementary Character Set, 2001 edition
    cp950       Code page 950 (Big5 + characters added by Microsoft)

For example, to convert a Big5-encoded file to Unicode, just type the following command:

    perl -Mencoding=big5,STDOUT,utf8 -pe1 < file.big5 > file.utf8

Perl also ships with piconv, a character conversion utility written entirely in Perl. It is used as follows:

    piconv -f big5 -t utf8 < file.big5 > file.utf8
    piconv -f utf8 -t big5 < file.utf8 > file.big5

In addition, with the encoding module you can easily write code that works in units of characters, as shown below:

    #!/usr/bin/env perl
    # enable big5 string parsing; standard input/output and standard error
    # are all set to big5 encoding
    use encoding 'big5', STDIN => 'big5', STDOUT => 'big5';
    print length("駱駝");            #  2 (double quotes mean characters)
    print length('駱駝');            #  4 (single quotes mean bytes)
    print index("諄諄教誨", "彖帢"); # -1 (does not contain this substring)
    print index('諄諄教誨', '彖帢'); #  1 (starts at the second byte)

In the last example, the second byte of "諄" combines with the first byte of the next "諄" to form the Big5 code for "彖"; the second byte of that "諄" then combines with the first byte of "教" to form "帢". This solves a problem that used to be common when matching Big5 text.

=head2 Additional Chinese encodings

If you need more Chinese encodings, you can download the Encode::HanExtra module from CPAN (L<http://www.cpan.org/>). It currently provides the following encodings:

    cccii       The 1980 Chinese Character Code for Information Interchange of the Council for Cultural Affairs
    euc-tw      Unix extended character set, containing CNS11643 planes 1-7
    big5plus    Big5+ of the Chinese Foundation for Digitization Technology
    big5ext     Big5e of the Chinese Foundation for Digitization Technology

In addition, the Encode::HanConvert module provides two encodings for converting between Simplified and Traditional Chinese:

    big5-simp   Converts between Big5 Traditional Chinese and Unicode Simplified Chinese
    gbk-trad    Converts between GBK Simplified Chinese and Unicode Traditional Chinese

If you want to convert between GBK and Big5, see the b2g.pl and g2b.pl programs bundled with that module, or use the following in your program:

    use Encode::HanConvert;
    $euc_cn = big5_to_gb($big5); # convert from Big5 to GBK
    $big5 = gb_to_big5($euc_cn); # convert from GBK to Big5

=head2 Further information

Please refer to the extensive documentation bundled with Perl (unfortunately all written in English) to learn more about Perl and about how to use Unicode. However, external resources are quite plentiful:

=head2 Web sites providing Perl resources

=over 4

=item L<http://www.perl.com/>

The Perl home page (maintained by O'Reilly)

=item L<http://www.cpan.org/>

Comprehensive Perl Archive Network

=item L<http://lists.perl.org/>

A list of Perl mailing lists

=back

=head2 Web sites for learning Perl

=over 4

=item L<http://www.oreilly.com.tw/chinese/perl/index.html>

O'Reilly's Perl books in Traditional Chinese

=item L<http://groups.google.com/groups?q=tw.bbs.comp.lang.perl>

The Taiwan Perl discussion group (that is, the linked Perl boards of the major BBSes)

=back

=head2 Perl user groups

=over 4

=item L<http://www.pm.org/groups/asia.shtml#Taiwan>

A list of Perl user groups in Taiwan

=item L<http://irc.elixus.org/>

The Elixus online chat room

=back

=head2 Unicode-related web sites

=over 4

=item L<http://www.unicode.org/>

The Unicode Consortium (the makers of the Unicode standard)

=item L<http://www.cl.cam.ac.uk/%7Emgk25/unicode.html>

UTF-8 and Unicode FAQ for Unix/Linux

=head2 Chinese localization information

=item Why is it called 正體中文 and not 繁體中文?

L<http://www.csie.ntu.edu.tw/~b7506051/mozilla/faq.html#faqglossary>

=item Chinese software localization alliance

L<http://www.cpatch.org/>

=item Linux software Chinese localization project

L<http://www.linux.org.tw/CLDP/>

=back

=head1 SEE ALSO

L<Encode>, L<Encode::TW>, L<encoding>, L<perluniintro>, L<perlunicode>

=head1 AUTHORS

Jarkko Hietaniemi E<lt>[EMAIL PROTECTED]E<gt>

Autrijus Tang (唐宗漢) E<lt>[EMAIL PROTECTED]E<gt>

=cut

If you read this file _as_is_, just ignore the funny characters you see. It is written in the POD format (see perlpod manpage) which is specially designed to be readable as is. The following documentation is written in EUC-CN encoding.
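The byte-versus-character matching problem that the Big5 README illustrates can also be reproduced without the encoding pragma, by decoding explicitly. This is a sketch rather than part of the README: the byte values below are assumed to be the Big5 codes of the strings in that example, written as escapes so the script itself stays plain ASCII, and the helper name search_positions is made up.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Four two-byte Big5 characters, and a two-character needle whose bytes
# happen to straddle the character boundaries of the haystack.
my $haystack_bytes = "\xBD\xCE\xBD\xCE\xB1\xD0\xBB\xA3";
my $needle_bytes   = "\xCE\xBD\xCE\xB1";

# Return the match position when searching raw bytes and when searching
# decoded characters.
sub search_positions {
    my ($encoding, $haystack, $needle) = @_;
    my $byte_pos = index($haystack, $needle);
    my $char_pos = index(decode($encoding, $haystack),
                         decode($encoding, $needle));
    return ($byte_pos, $char_pos);
}

my ($byte_pos, $char_pos) =
    search_positions('big5-eten', $haystack_bytes, $needle_bytes);
print "byte search: $byte_pos, character search: $char_pos\n";
```

The byte search reports a false match starting inside the first character, while the search on decoded strings does not, which is why decoding before matching is the safer habit for multibyte encodings.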
Re: [PATCH] Big5-related changes.
On Saturday, April 20, 2002, at 04:53 , Autrijus Tang wrote:
> I've been immersed in Big5-related issues in the past few days, and
> came back with these last-minute (err, week?) changes before 5.8-RC1.
>
> The Diff contains fixes to TW.pm, Alias.pm, and README.(tw|cn).

Excellent!

> (For dan) big5-hkscs should be upgraded to the 2001 edition, as per
> Hong Kong government's decree. It's available separately at:
> http://egb.elixus.org/~autrijus/big5-hkscs.ucm.gz
>
> Also, please delete big5.ucm and replace it with big5-eten, at:
> http://egb.elixus.org/~autrijus/big5-eten.ucm.gz

Thus updated. I needed to update TW/Makefile.PL and lib/Encode/Config.pm (so it loads on 'big5-eten' instead of just 'big5'), but that's not at all a big deal.

> I've fixed Alias.pm so big5 aliases to big5-eten. The reason is that
> the 'Big5' as originally defined isn't used anywhere on earth; non-
> Microsoft systems uses 'big5' to mean 'big5-eten', and Microsoft uses
> 'big5' to mean 'cp950'.
>
> It is therefore unwise to have a canonical 'big5' encoding, much like
> there should not be a 'gb2312' encoding. Since gb2312 is now aliased
> to euc-cn and not cp936, I think big5 should alias to big5-eten and
> not cp950.

I agree. AFAIK, Big5 is the only major CJK encoding not endorsed by the government. What's so funny is that there seems less confusions between encodings there in Taiwan than in Japan or Korea. Japan is the worst for using Shift_JIS, EUC-JP, ISO-2022-JP(-[12])? and now Unicode (IMHO, however, the Japanese people should be proud for making multibyte character encoding a reality. But I can't help wondering this mess is way too much a price to pay :)

> Oh, I just noticed that Dan retained the 'gb2312.ucm' name, although
> the encoding is called 'gb2312-raw'. I admit that I don't fully
> understand the reason, but if that's to stand, then big5-eten could
> also be named 'big5.ucm', and still say 'code_set_name big5-eten',
> for consistency's sake.

I renamed big5.ucm to big5-eten.ucm.
-raw that are missing from *.ucm filenames is just that they look too funny on 8.3 filesystems, nothing more :)

> Thanks,
> /Autrijus/

Xin Ku Le ! \x{8f9b}\x{82e6}\x{4e86}

XiaoSi Dan \x{5c0f}\x{98fc} \x{5f3e}\n
LAST Call for Papers - 22nd Unicode Conference - Sep 2002 - San Jose,CA
Twenty-second International Unicode Conference (IUC22)
Unicode and the Web: Evolution or Revolution?
http://www.unicode.org/iuc/iuc22
September 9-13, 2002
San Jose, California

*** Call for Papers ***
Just 3 weeks to go
Send in your submission now!

Submissions due:      May 10, 2002
Notification date:    May 31, 2002
Completed papers due: June 21, 2002
                      (in electronic form and camera-ready paper form)

The software industry continues its rapid growth and change. In this year alone, Unicode 3.2 was released and several new proposals for the Internet and the World Wide Web were promoted to standards. Web Services is the latest buzz. Are the vendors of software that support these technologies keeping up? How can you be sure that you are deploying software components that work well together today and in the future?

This Conference is where you go to find out. Experts will describe the latest changes to the Unicode standard and the other standards used for e-business today. You will also learn about the best practices for utilizing, integrating and deploying these technologies based on real-world examples and experience. Demonstrations are often provided.

We invite you to submit papers which either define the software of tomorrow, demonstrate best practice with today's software, or articulate problems that must be solved before further advances can occur. Papers should discuss subjects in the context of Unicode, internationalization or localization.

You can view the programs of previous Conferences at:
http://www.unicode.org/unicode/conference/about-conf.html

Conference attendees are generally involved in either the development, deployment or use of Unicode software or content, or the globalization of software and the Internet. They include managers, software engineers, systems analysts, font designers, graphic designers, content developers, technical writers, and product marketing personnel.

THEME TOPICS

Computing with Unicode is the overall theme of the Conference. Presentations should be geared towards a technical audience. Topics of interest include, but are not limited to, the following (within the context of Unicode, internationalization or localization):

- Web Services
- XML and related specifications
- The World Wide Web (WWW)
- Portable devices
- UTFs: Not enough or too many?
- Security concerns e.g. Avoiding the spoofing of UTF-8 data
- Impact of new encoding standards
- Implementing Unicode: Practical and political hurdles
- Implementing new features of recent versions of Unicode
- Algorithms (e.g. normalization, collation, bidirectional)
- Programming languages and libraries (Java, Perl, et al)
- Search engines
- Library and archival concerns
- Operating systems
- Databases
- Large scale networks
- Government applications
- Evaluations (case studies, usability studies)
- Natural language processing
- Migrating legacy applications
- Cross platform issues
- Printing and imaging
- Optimizing performance of systems and applications
- Testing applications
- Business models for software development (e.g. Open source)

SESSIONS

The Conference Program will provide a wide range of sessions including:

- Keynote presentations
- Workshops/Tutorials
- Technical presentations
- Panel sessions

All sessions except the Workshops/Tutorials will be of 40 minute duration. In some cases, two consecutive 40 minute program slots may be devoted to a single session.

The Workshops/Tutorials will each last approximately three hours. They should be designed to stimulate discussion and participation, using slides and demonstrations.

PUBLICITY

If your paper is accepted, your details will be included in the Conference brochure and Web pages and the paper itself will appear on a Conference CD, with an optional printed book of Conference Proceedings.

CONFERENCE LANGUAGE

The Conference language is English. All submissions, papers and presentations should be provided in English.

SUBMISSIONS

Submissions MUST contain:

1. An abstract of 150-250 words, consisting of statement of purpose, paper description, and your conclusions or final summary.

2. A brief biography.

3. The details listed below:

   SESSION TITLE: _ _
   TITLE (eg Dr/Mr/Mrs/Ms): _
   NAME: _
   JOB TITLE: _
   ORGANIZATION/AFFILIATION: _
   ORGANIZATION'S WWW URL: _
   OWN WWW URL:
Tk804 + Encode-1.50 :-) again
Dan Kogai [EMAIL PROTECTED] writes:
> I am daydreaming that I am a caravan member, driving a herd of
> disobedient camels on the never-ending desert to an oasis called 5.8.0
> when I released new Encode and PerlIO::encoding. You can get one as
> follows.

p4 integrated to //depot/perlio for testing.

Without any changes to Tk804 things improved a bit - only the JP.t and KR.t tests were failing, and those not failing as badly.

Adding ENCODE_FB_QUIET to Tk's encode glue makes those pass as well.

Suggest one small tweak as in attached patch. The patch turns off utf8_to_uvuni's warning and checks as only thing we are using the UV for is an error message (which in my case isn't going to be printed as I am in FB_QUIET). Otherwise I get noise when Tk is groping about in U+FFXX page.

The indent looks better - but has cuddled else - no big deal.

I was a little surprised that Encode/encode.h gets installed in lib rather than archlib/CORE but can live with that (makes a kind of sense it is architecture neutral - but perl.h et. al. go elsewhere). The snag here is that Makefile.PL has added -I to find perl.h, so I have to

#include "../../Encode/encode.h"

which is portability issue as there is no certainty that lib / archlib relative paths work like that. Will tweak Tk's Makefile.PL configure to hunt down encode.h.

Will do a spelling patch on the pod(s) when I get a chance.

--
Nick Ing-Simmons
http://www.ni-s.u-net.com/

--- Encode.xs.ship	Fri Apr 19 19:25:26 2002
+++ Encode.xs	Fri Apr 19 19:27:59 2002
@@ -122,7 +122,7 @@
 	    if (dir == enc->f_utf8) {
 		STRLEN clen;
 		UV ch =
-		    utf8n_to_uvuni(s+slen, (SvCUR(src)-slen), &clen, 0);
+		    utf8n_to_uvuni(s+slen, (SvCUR(src)-slen), &clen,
+				   UTF8_ALLOW_ANY|UTF8_CHECK_ONLY);
 		if (check & ENCODE_DIE_ON_ERR) {
 		    Perl_croak(
 			aTHX_ "\"\\N{U+%" UVxf "}\" does not map to %s, %d",
Re: Tk804 + Encode-1.50 :-) again
On Saturday, April 20, 2002, at 03:45 , Nick Ing-Simmons wrote:
> Dan Kogai [EMAIL PROTECTED] writes:
>> I am daydreaming that I am a caravan member, driving a herd of
>> disobedient camels on the never-ending desert to an oasis called 5.8.0
>> when I released new Encode and PerlIO::encoding. You can get one as
>> follows.
>
> p4 integrated to //depot/perlio for testing.
>
> Without any changes to Tk804 things improved a bit - only the JP.t and
> KR.t tests were failing, and those not failing as badly.

I thought I relocated the perlio-related tests in them to t/perlio.t. Are there any left?

> Adding ENCODE_FB_QUIET to Tk's encode glue makes those pass as well.

That was my biggest concern. So glad to hear that.

> Suggest one small tweak as in attached patch. The patch turns off
> utf8_to_uvuni's warning and checks as only thing we are using the UV
> for is an error message (which in my case isn't going to be printed as
> I am in FB_QUIET). Otherwise I get noise when Tk is groping about in
> U+FFXX page.

Applied, thanks.

> The indent looks better - but has cuddled else - no big deal.
>
> I was a little surprised that Encode/encode.h gets installed in lib
> rather than archlib/CORE but can live with that (makes a kind of sense
> it is architecture neutral - but perl.h et. al. go elsewhere).
> The snag here is that Makefile.PL has added -I to find perl.h, so I
> have to
>
> #include "../../Encode/encode.h"
>
> which is portability issue as there is no certainty that lib / archlib
> relative paths work like that. Will tweak Tk's Makefile.PL configure
> to hunt down encode.h.

I wonder if there is a more sensible way to install NON-PM files to PERL5LIB. For the time being it is at the mercy of MM. Though it is not a show stopper, I would like Encode to be as clean and standard-compliant as possible. MM is so vast I don't even know how many more features are hidden...

> Will do a spelling patch on the pod(s) when I get a chance.

Yes, please. Emacs doesn't do spellcheck-as-you-type like recent mailers in MacOS and Windows :) (I know you can spellcheck in Emacs but I am not sure if it is a good idea to do so in .pm).

Dan the Encode Maintainer
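One way a Makefile.PL could "hunt down" encode.h, as discussed above, is to probe a list of candidate directories instead of relying on a relative path. The following is only a sketch under that assumption: find_header is a hypothetical helper, and searching @INC plus the archlibexp and privlibexp directories from %Config is a guess at reasonable places to look, not what Tk or Encode actually does.

```perl
use strict;
use warnings;
use Config;
use File::Spec;

# Return the full path of the first directory in @dirs that contains
# $relative (given with forward slashes), or undef if none does.
sub find_header {
    my ($relative, @dirs) = @_;
    for my $dir (@dirs) {
        next if !defined $dir || ref $dir;   # @INC may hold hooks, skip them
        my $candidate = File::Spec->catfile($dir, split m{/}, $relative);
        return $candidate if -f $candidate;
    }
    return undef;
}

# Candidate directories: the module search path plus the configured
# architecture-dependent and architecture-neutral library directories.
my @search = (@INC, $Config{archlibexp}, $Config{privlibexp});
my $header = find_header('Encode/encode.h', @search);

print defined $header
    ? "found encode.h at $header\n"
    : "encode.h not found in the searched directories\n";
```

A build script could then pass the directory that holds the header to the compiler with -I, which avoids hard-coding how lib and archlib are laid out relative to each other.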
Re: Tk804 + Encode-1.50 :-) again
On Sat, Apr 20, 2002 at 04:27:15AM +0900, Dan Kogai wrote:
> Yes, please. Emacs doesn't do spellcheck-as-you-type like recent
> mailers in MacOS and Windows :) (I know you can spellcheck in Emacs
> but I am not sure if it is a good idea to to do so in .pm).

You underestimate the power of the dark side.

M-x flyspell-mode

Definitely part of the dark side because here it defaults to American. And then refuses to start because I don't have American dictionaries installed. ispell has no problem just running and finding the correct dictionaries.

Nicholas Clark
--
Even better than the real thing: http://nms-cgi.sourceforge.net/
[Encode] Dark Side of the Emacs Modes [Was: Re: Tk804 ...]
On Saturday, April 20, 2002, at 05:38 , Nicholas Clark wrote:
> On Sat, Apr 20, 2002 at 04:27:15AM +0900, Dan Kogai wrote:
>> Yes, please. Emacs doesn't do spellcheck-as-you-type like recent
>> mailers in MacOS and Windows :) (I know you can spellcheck in Emacs
>> but I am not sure if it is a good idea to to do so in .pm).
>
> You underestimate the power of the dark side.
>
> M-x flyspell-mode

I knew something like this existed but never checked the mode name :)

Hmm. Requires ispell... Piece of cake with portupgrade (could be the most widely used ruby program in the (Free)BSD world).

Oh man! You're right! It even supports the mouse (but I usually use emacs only via tty). But how about perl jargon?

automagical Ni! barewords Ni!

Hmm. This mode needs some more education :)

Thanks. More than 10 years w/ Emacs and still lost in modes.

> Definitely part of the dark side because here it defaults to American.

Does it correct the pronunciation of the Britons so that "CAN'T do that" sounds less obscene :?

> And then refuses to start because I don't have American dictionaries
> installed. ispell has no problem just running and finding the correct
> dictionaries.

Dan the Emacs User, not Elisp Hacker
                        ^pretty funny. MacOS X Mail underlines this but not Emacs. Is it smart enough to scan $PATH and make them correct?