Re: IO-Tty-1.02 on z/os

2006-02-16 Thread Nick Ing-Simmons
Mohammad Yaseen <[EMAIL PROTECTED]> writes: >I'm trying to build IO-Tty-1.02 on z/os using perl-5.8.7, i'm getting this >error messages > > Now let's see what we can find out about your system >(logfiles of failing tests are available in the conf/ dir)... >FSUM7332 syntax error: got (, expecti

Re: PLAN9

2006-02-02 Thread Nick Ing-Simmons
Mohammad Yaseen <[EMAIL PROTECTED]> writes: > Hi, > > I'm using perl-5.8.7. > What is PLAN9? An operating system. > Ans what is plan9 directory is meant for in the source directory. Building perl for/on a plan9 system > > Thanks and Regards > Yaseen > > > >

Re: Encode the subject line in MIME header using Perl 5.6

2005-12-30 Thread Nick Ing-Simmons
John Delacour <[EMAIL PROTECTED]> writes: >use MIME::QuotedPrint; >$qp = encode_qp ($_, ''); >print "=?UTF-8?Q?$qp?=" . $/; That isn't quite right. MIME::QuotedPrint does NOT encode space or tab. RFC2047 says: " The "Q" encoding is similar to the "Quoted-Printable" content- transfer-encodin
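The point about spaces can be illustrated with a minimal sketch. `q_encode_word` is a hypothetical helper written for this note, not code from the thread; it assumes the whole subject fits in one encoded-word and ignores the 75-character limit that RFC 2047 places on each encoded-word.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not from the thread): RFC 2047 "Q"-encode one
# short piece of text as a single UTF-8 encoded-word.
sub q_encode_word {
    my ($text) = @_;
    my $octets = encode('UTF-8', $text);
    # Escape every octet outside a conservative safe set. This also
    # covers "=", "?", "_" and tab, which must not appear literally.
    $octets =~ s/([^A-Za-z0-9!*+\-\/ ])/sprintf('=%02X', ord($1))/ge;
    # In the "Q" encoding a space is written as "_".
    $octets =~ tr/ /_/;
    return "=?UTF-8?Q?$octets?=";
}

print 'Subject: ', q_encode_word("caf\x{e9} au lait"), "\n";
# prints: Subject: =?UTF-8?Q?caf=C3=A9_au_lait?=
```

Longer subjects would need to be split into several encoded-words, which this sketch does not attempt.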

Re: Local installation

2005-12-29 Thread Nick Ing-Simmons
David Olsson <[EMAIL PROTECTED]> writes: >What is the easiest way to install Encode for a single >user? Same as any other CPAN module. perl Makefile.PL PREFIX=/home/cedric/perl_modules make make install then: #!/usr/bin/perl use lib '/home/cedric/perl_modules'; # or if script is relative to in

Re: Encode the subject line in MIME header using Perl 5.6 my €0.02

2005-12-29 Thread Nick Ing-Simmons
Wing <[EMAIL PROTECTED]> writes: >"John Delacour" <[EMAIL PROTECTED]> wrote in message >news:[EMAIL PROTECTED] >> At 12:42 am +0800 28/12/05, wing wrote: >> >>>I need to encode the subject line in a MIME header in UTF8 (something like >>>Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=).

Re: iso-2022-jp encoding on EBCDIC

2005-12-15 Thread Nick Ing-Simmons
Rajarshi Das <[EMAIL PROTECTED]> writes: > Hi, > > The following two line script gives an error on z/OS : "Unknown encoding > 'iso-2022- > jp' at line ..". > - > use Encode; use encoding 'iso-2022-jp'; > On an EBCDIC platform like z/OS that is going to be one st

Re: data written on ebcdic

2005-07-08 Thread Nick Ing-Simmons
Rajarshi Das <[EMAIL PROTECTED]> writes: >I run the following on an ebcdic platform >(perl-5.8.6), > >$BOM = chr(0xFEFF); >open(UTF_PL, ">:raw:encoding(utf16le)", "utf.pl") >or die "utf.pl($enc,$tag): $!"; >print UTF_PL $BOM; >print UTF_PL "1"; > > > >should the data that is written using
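For comparison, a minimal sketch of the same write on an ASCII platform, using an in-memory file instead of `utf.pl` so the resulting octets can be inspected directly. The variable names are chosen for this note; what an EBCDIC build writes is the open question of the thread and is not shown here.

```perl
use strict;
use warnings;

my $BOM    = chr(0xFEFF);
my $buffer = '';

# Same layers as in the question, but writing to an in-memory scalar.
open(my $out, '>:raw:encoding(UTF-16LE)', \$buffer) or die "open: $!";
print {$out} $BOM, "1";
close($out) or die "close: $!";

# Show the octets that were written, as hex.
my $hex = join ' ', map { sprintf '%02X', ord } split //, $buffer;
print "$hex\n";
# On an ASCII platform this prints: FF FE 31 00
```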

Re: chr function on z/OS.

2005-06-06 Thread Nick Ing-Simmons
Rajarshi Das <[EMAIL PROTECTED]> writes: >Hi, > >I have a basic doubt regarding unicode and z/OS >(ebcdic : ibm-1047). > >$a = chr(0x00A1); > >$b = chr(0xA1); > >Should $a and $b be equal or yield different results >on ebcdic ? As far as I know they should be the same. chr() takes a number and t
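A small check of the "same number" argument: `0x00A1` and `0xA1` are two spellings of the integer 161, so `chr()` receives identical arguments. This sketch only compares the two results with each other; it says nothing about which character that code point maps to on a given platform.

```perl
use strict;
use warnings;

my $first  = chr(0x00A1);
my $second = chr(0xA1);

# Leading zeros do not change the numeric value of a hex literal.
printf "%d %d\n", 0x00A1, 0xA1;                       # prints: 161 161
print $first eq $second ? "same\n" : "different\n";   # prints: same
```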

Re: UTF-8 and matching [^\s]

2005-02-02 Thread Nick Ing-Simmons
Stuart Hughes <[EMAIL PROTECTED]> writes: >Hi everyone, > >I've run into problems matching the regex [^\s] on RedHat 8/9 and the >version of perl shipped with it (5.8.0). It isn't 5.8.0, it is 5.8.0-with-RedHatBugs :-( To be fair to them it is some development track thing - there was an experimenta

Re: losing utf8 flag on strings?

2005-01-15 Thread Nick Ing-Simmons
Paul Bijnens <[EMAIL PROTECTED]> writes: >Can anyone explain what I'm doing wrong? As I recall HTML::Entities has a build-time option as to whether it handles Unicode - do you know if yours has that turned on? What locale are you in (i.e. is it something that has â as a native 8-bit coding (Window

Re: "Undocumented feature" of Encode::{en,de}code()

2004-12-23 Thread Nick Ing-Simmons
Radoslaw Zielinski <[EMAIL PROTECTED]> writes: >Hello, > >What's the point of lines 151 and 167 in Encode.pm? Respectively: > ># sub encode >$_[1] = $string if $check; > ># sub decode >$_[1] = $octets if $check; > >I really can't see a point in overwriting the input value... Why

Re: Make Encode.pm support the real UTF-8

2004-12-06 Thread Nick Ing-Simmons
Bjoern Hoehrmann <[EMAIL PROTECTED]> writes: > >>> Now that we have this problem, introducing more places where one needs >>> to carefully check the documentation what is considered UTF-8 does not >>> seem like the best option, having decode_utf8() and decode(utf8=>...) >>> mean some- thing differe

Re: :encoding() layer modifies read-only scalars

2004-11-29 Thread Nick Ing-Simmons
Bjoern Hoehrmann <[EMAIL PROTECTED]> writes: >* Bjoern Hoehrmann wrote: >> Encode 2.08, PerlIO::scalar 0.02, ActivePerl 5.8.2, >> >> #!perl -w >> use strict; >> use warnings; >> use Encode; >> >> my $string = encode(UTF16 => ""); >> >> for (qw/UTF-8 UTF-16LE UTF-16BE UTF-32LE UTF-32BE/)

Re: Website encoding

2004-11-19 Thread Nick Ing-Simmons
Rick Measham <[EMAIL PROTECTED]> writes: >That being the case, I grab the charset and use Encode's decode function >to turn it into 'perl's internal format' .. which in 5.8.5 is utf8 >right? As it happens the answer is "maybe", but it is the _internal_ form it is none of your business ;-) - so

Re: clearing the utf8 flag

2004-11-10 Thread Nick Ing-Simmons
Paul Bijnens <[EMAIL PROTECTED]> writes: >I have a program that reads and writes (among others) strings that >should be utf8 encoded. I say "should", because somewhere deep >inside the dark corners of that program, sometimes, the utf8 flag on >a string is lost. (I'm still investigating where, tips

Re: problem installing Encode module

2004-11-09 Thread Nick Ing-Simmons
Piyush Shourie <[EMAIL PROTECTED]> writes: >Hi, > > > >I am not able to compile Encode module, as one of the pre-requisites of >Encode module, Text::Iconv does not compile on Windows platform. When did that happen? >I am >currently using ActiveState Perl 5.6.1, and cannot upgrade to the newer >

Re: Question about converting utf8 to ascii and char refs

2004-10-27 Thread Nick Ing-Simmons
Aaron Siladi <[EMAIL PROTECTED]> writes: >I have a UTF-8 string which I want to output as ascii and have the UTF8 >characters converted to numeric character references. > > > >I tried using Encode with the FB_HTMLCREFS fail back option enabled, but for >the 2 byte UTF8 characters, 2 incorrect char
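The reply is cut off here, so the following is only one plausible reading of the symptom, not necessarily the answer given in the thread: if the UTF-8 octets are passed to `encode` without being decoded first, each octet is treated as its own character and gets its own reference. Note that the constant in current Encode releases is spelled `FB_HTMLCREF`.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $octets = "caf\xC3\xA9";    # "cafe" with e-acute, as UTF-8 octets

# Decode to characters first, then encode with the HTML fallback.
my $chars = decode('UTF-8', $octets);
my $good  = encode('ascii', $chars, Encode::FB_HTMLCREF());

# Skipping the decode step: the two octets are seen as two characters.
my $copy = $octets;
my $bad  = encode('ascii', $copy, Encode::FB_HTMLCREF());

print "$good\n";    # prints: caf&#233;
print "$bad\n";     # prints: caf&#195;&#169;
```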

RE: Resolving charset names with Encode

2004-10-25 Thread Nick Ing-Simmons
Martin 'Kingpin' Thurn <[EMAIL PROTECTED]> writes: > It seems to me that the main problem is that Encode does not use IANA >registered names. It is supposed to have IANA names as aliases. >And ebcdic-cp-us didn't work because of a bug in >I18N::Charset (sorry about that). > The proper solutio

Re: [cpan #8089] Encode::utf8::decode_xs does not check partial chars

2004-10-25 Thread Nick Ing-Simmons
Dan Kogai <[EMAIL PROTECTED]> writes: >On Oct 25, 2004, at 03:01, Nick Ing-Simmons wrote: >> But as Dan said at the start \xF6 on its own (say as 1023 octet >> in a 0..1023 1024-octet buffer is not a fail. >> Changing that will make :encoding() layer have problems as buf

Re: Encode-2.07 vs. PerlIO::encoding

2004-10-24 Thread Nick Ing-Simmons
Dan Kogai <[EMAIL PROTECTED]> writes: >On Oct 24, 2004, at 18:34, Rafael Garcia-Suarez wrote: >> Welcome to backward compatibility hell :) > >Hell it was but seems like I came up with a way out (yay). > >>> I just want Encode::utf8->decode() to make sure Encode:RETURN_ON_ERR >>> is >>> on when the

Re: Encode-2.07 vs. PerlIO::encoding

2004-10-24 Thread Nick Ing-Simmons
Rafael Garcia-Suarez <[EMAIL PROTECTED]> writes: >Dan Kogai wrote: >> This makes perl-5.8.6 happy but the problem is that I have made >> Encode::utf8 so that it accepts fallback values like Encode::XS (upon >> the request by Bjoern Hoehrmann via RT). Encode::utf8 used to return >> immediately a

Re: Encode-2.07 vs. PerlIO::encoding

2004-10-24 Thread Nick Ing-Simmons
Dan Kogai <[EMAIL PROTECTED]> writes: >On Oct 24, 2004, at 06:41, Rafael Garcia-Suarez wrote: >> Dan Kogai wrote: >>> Within less than 24hrs I resorted to release version 2.07. What the >>> heck. 5.8.6 is soon >> >> I applied 2.07 to bleadperl, and looks like something is broken in >> PerlIO:

Re: [cpan #8089] Encode::utf8::decode_xs does not check partial chars

2004-10-24 Thread Nick Ing-Simmons
Dan Kogai <[EMAIL PROTECTED]> writes: >On Oct 23, 2004, at 01:04, Bjoern Hoehrmann wrote: >> C12a in Unicode 4.0.1 notes >> >> [...] >> For example, in UTF-8 every code unit of the form 110 must be >> followed by a code unit of the form 10xx. A sequence such as >> 110x 0xxx is

Re: Resolving charset names with Encode

2004-10-24 Thread Nick Ing-Simmons
Bjoern Hoehrmann <[EMAIL PROTECTED]> writes: >Hi, > > What is currently the best way to resolve charset names to use them >with Encode.pm? I would have expected that e.g. > > Encode::decode('ebcdic-cp-us', '') > >would just work but it does not appear to know that alias. Then I've >tried to use I

Re: Does LWP know anything (or need to know anything) about Unicode?

2004-10-11 Thread Nick Ing-Simmons
Rick Measham <[EMAIL PROTECTED]> writes: >G'day Unicode Gurus and other assorted members of the perl Unicode >community. > >I have a script that attempts to collect translations from Babelfish. >I've posted it below. > >It uses LWP::Useragent to turn an English phrase into Japanese (or any >other l

Re: Problem with 'Mailformed UTF-8 caracter' warning messages when I use Switch.pm package!

2004-09-04 Thread Nick Ing-Simmons
Rafael Garcia-Suarez <[EMAIL PROTECTED]> writes: >> I have a problem to avoid "Mailformed UTF-8 caracter" messages when I use the >> Switch.pm module on SuSE 9.1 Profesional with english or german language >> settings. > >Could we see a snippet of code that demonstrates the problem ? >The version

Re: Weird behavior of encoding & open pragmas

2004-08-17 Thread Nick Ing-Simmons
Marcin 'Qrczak' Kowalczyk <[EMAIL PROTECTED]> writes: >$ perl -e 'use encoding "ISO-8859-2"; use open ":encoding(ISO-8859-2)"; print >ord($ARGV[0]), chr(260), $ARGV[0], "\n"' Ą "\x{00a1}" does not map to iso-8859- >2 at -e line 1. 260Ą\x{00a1} > >I don't understand it: ord($ARGV[0]) is 260, chr(260

Re: Interpretation of non-UTF8 strings

2004-08-16 Thread Nick Ing-Simmons
Jarkko Hietaniemi <[EMAIL PROTECTED]> writes: >Nick Ing-Simmons wrote: >> Once we had >> >> use encoding qw(locale); >> >> But it did not work well as not all locale implementations >> give the API to return the encoding. >> (And even en_GB

Re: Interpretation of non-UTF8 strings

2004-08-16 Thread Nick Ing-Simmons
Marcin 'Qrczak' Kowalczyk <[EMAIL PROTECTED]> writes: >W liście z pon, 16-08-2004, godz. 16:54 +0300, Jarkko Hietaniemi >napisał: > >> > The encoding pragma partially works. It doesn't influence assumed >> > encoding of files opened without specifying the encoding, nor handling >> > of filenames, a

Re: Interpretation of non-UTF8 strings

2004-08-16 Thread Nick Ing-Simmons
Marcin 'Qrczak' Kowalczyk <[EMAIL PROTECTED]> writes: >> But there is a simple workaround for that, as perluniintro would tell >> you: the encoding pragma. > >The encoding pragma partially works. It doesn't influence assumed >encoding of files opened without specifying the encoding, nor handling >o

Re: Interpretation of non-UTF8 strings

2004-08-16 Thread Nick Ing-Simmons
Marcin 'Qrczak' Kowalczyk <[EMAIL PROTECTED]> writes: >W liście z pon, 16-08-2004, godz. 11:16 +0100, Nick Ing-Simmons napisał: > >> >Perl treats them inconsistently. On one hand they are read from files >> >and used as filenames without any recoding, which

Re: Interpretation of non-UTF8 strings

2004-08-16 Thread Nick Ing-Simmons
Dominic Mitchell <[EMAIL PROTECTED]> writes: >Marcin 'Qrczak' Kowalczyk wrote: >> This leaves chr() ambiguous, so there should be some other function for >> making Unicode code points, as chr should probably be kept for >> compatibility to mean the default encoding. > >In the past when I've needed

Re: Interpretation of non-UTF8 strings

2004-08-16 Thread Nick Ing-Simmons
Marcin 'Qrczak' Kowalczyk <[EMAIL PROTECTED]> writes: >Should strings without the UTF8 flag be interpreted in the default >encoding of the current locale or in ISO-8859-1? This is a tricky question - and status quo is likely to remain for compatibility reasons. > >Perl treats them inconsistently

Re: Always setting UTF-8 flag - am I bad?

2004-08-05 Thread Nick Ing-Simmons
Erland Sommarskog <[EMAIL PROTECTED]> writes: >Jean-Michel Hiver ([EMAIL PROTECTED]) writes: >> Erland Sommarskog wrote: >>>I working with an XS module that passes queries to MS SQL Server and >>>returns data back using SQLOLEDB. MS SQL Server stores Unicode data >>>as UTF-16. Also, all metadata is

Re: Extending the scope of a PERLIO Layer across packages

2004-07-20 Thread Nick Ing-Simmons
Frank Krout <[EMAIL PROTECTED]> writes: >I'm trying to support a legacy multilingual website that has been upgraded >to perl58 and now using PERLIO to properly encode html output. (STDOUT is >mapped via binmode) I have had this marked as needing a detailed/reasoned reply now for over a year, so I

Re: Unicode filenames on Windows with Perl >= 5.8.2

2004-06-25 Thread Nick Ing-Simmons
Nicholas Clark <[EMAIL PROTECTED]> writes: >On Mon, Jun 21, 2004 at 08:46:07AM -0700, Jan Dubois wrote: > >> I think it is possible, but it requires someone to both do the work and >> to argue for it on P5P. Without this "champion", I don't see it >> happening at all. > >Nor do I. But P5P isn't big

Re: BOM and principle of least surprise

2004-05-18 Thread Nick Ing-Simmons
Erland Sommarskog <[EMAIL PROTECTED]> writes: >Jarkko Hietaniemi ([EMAIL PROTECTED]) writes: >> Nick Ing-Simmons wrote: >>> This thread started as complaint that perl5 can't read a >>> script saved as UCS-2/UTF-16 or whatever Windows uses. >> >>

Re: utf8, japanese, web-pages: beginning to see the light...

2004-05-18 Thread Nick Ing-Simmons
Marco Baroni <[EMAIL PROTECTED]> writes: >A few days ago, I queried this list about my problems with a script >that finds the charset of Japanese web pages and translates their text >into utf-8. > >The following solution, proposed by Nick Ing-Simmons, worked for my &

Re: utf8, japanese, web-pages, the horror, the horror...

2004-05-11 Thread Nick Ing-Simmons
Marco Baroni <[EMAIL PROTECTED]> writes: >Thanks for your advice... the output does look different, this time, >but it still doesn't look like utf8... (I get the same error with >recode). > >If somebody could suggest a way to convert to another encoding, or a >better way to identify the encodin

Re: BOM and principle of least surprise

2004-05-11 Thread Nick Ing-Simmons
Jarkko Hietaniemi <[EMAIL PROTECTED]> writes: >Nick Ing-Simmons wrote: > >> Larry Wall <[EMAIL PROTECTED]> writes: >> >>>Right now, the meaning of "text" is subject to severe distortions >>>due to legacy issues. But in the long run, &qu

Re: BOM and principle of least surprise

2004-05-10 Thread Nick Ing-Simmons
Larry Wall <[EMAIL PROTECTED]> writes: > >Right now, the meaning of "text" is subject to severe distortions >due to legacy issues. But in the long run, "text" is going to mean >Unicode, and that probably means a UTF-8 file encoding at least in >the western world, Microsoft seem to be somewhat fo

Re: Printing Unicode from XS

2004-05-10 Thread Nick Ing-Simmons
Erland Sommarskog <[EMAIL PROTECTED]> writes: >I have to admit that I have not completely researched what the documentation >has to say, but this is not only a question on how, but also on which way >to take. > >I'm working on an XS module that will interact with the SQL Server OLE DB >Provider, th

Re: BOM and principle of least surprise

2004-04-26 Thread Nick Ing-Simmons
Erland Sommarskog <[EMAIL PROTECTED]> writes: >Nick Ing-Simmons ([EMAIL PROTECTED]) writes: >> Erland Sommarskog <[EMAIL PROTECTED]> writes: >>>I would really expect someone to have done this already, but I see no >>>reference to such a module. Or layer-di

Re: Decoding more languages

2004-04-13 Thread Nick Ing-Simmons
ote that you cannot (in general) "print" the combined string as either 8859-1 or 8859-2 > >Thank you. > > >- Original Message - >From: "Nick Ing-Simmons" <[EMAIL PROTECTED]> >To: <[EMAIL PROTECTED]> >Sent: Tuesday, April 13, 2004 11:13 AM >Subje

Re: Creating a UTF-8 web page

2004-04-08 Thread Nick Ing-Simmons
Octavian Rasnita <[EMAIL PROTECTED]> writes: >I have tried the following script: > >#!/perl/bin/perl -wC > >use Encode; > >my $text = Encode::decode('latin2', 'mÃta'); > >binmode(STDOUT, ":utf8"); > >print "Content-Type: text/html; Charset=UTF-8\n\n"; >print Encode::encode('utf8', $text); > You ha
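The reply is truncated, but the quoted script both pushes a `:utf8` layer onto STDOUT and calls `Encode::encode('utf8', ...)` on the text it prints. A minimal sketch of what that combination does, using in-memory files so the octets can be inspected; `bytes_written` is a hypothetical helper and the sample character is chosen for this note, not taken from the thread.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# One Latin-2 octet decoded to a single character, U+0104.
my $text = decode('iso-8859-2', "\xA1");

# Hypothetical helper: write $data through $layer into an in-memory
# file and return the resulting octets as a hex string.
sub bytes_written {
    my ($layer, $data) = @_;
    my $buffer = '';
    open(my $fh, ">$layer", \$buffer) or die "open: $!";
    print {$fh} $data;
    close($fh) or die "close: $!";
    return join ' ', map { sprintf '%02X', ord } split //, $buffer;
}

print bytes_written(':utf8', $text), "\n";                   # layer only:  C4 84
print bytes_written(':raw',  encode('UTF-8', $text)), "\n";  # encode only: C4 84
print bytes_written(':utf8', encode('UTF-8', $text)), "\n";  # both:        C3 84 C2 84
```

Either the layer or the explicit `encode` call yields correct UTF-8 on its own; applying both encodes the text twice.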

Re: question about PerlIO

2004-04-02 Thread Nick Ing-Simmons
Cremers LMG <[EMAIL PROTECTED]> writes: >I've used your clear description in 'Encode.html' to convert from cp1252 to >utf8, >using the lines: > >use Encode; >open (INPUT,"<:encoding(cp1252)","$in")|| die "FileOpen fail: $in $!\n"; >open (OUT,">:utf8","$out") || die "FileOpen 1 failed: $out : $!\n";

Re: BOM and principle of least surprise

2004-03-31 Thread Nick Ing-Simmons
Erland Sommarskog <[EMAIL PROTECTED]> writes: > >It seems that the only way out, is to first open the file in plain mode, binmode I suspect. >look at the first three bytes, and if it is BOM, close the file, open >again with the appropriate options and discard the BOM. You don't have to close it
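A minimal sketch of the approach without closing and reopening the file: read the first octets in raw mode, seek to just past any BOM, then use `binmode` to add the decoding layer to the handle that is already open. `open_with_bom` and its fallback parameter are hypothetical names for this note; UTF-32 BOMs are not handled.

```perl
use strict;
use warnings;

# Hypothetical helper: open $path, choose a decoding layer from its
# BOM, and leave the handle positioned just after the BOM.
sub open_with_bom {
    my ($path, $fallback_layer) = @_;
    $fallback_layer = ':raw' unless defined $fallback_layer;

    open(my $fh, '<:raw', $path) or die "open $path: $!";
    defined(read($fh, my $head, 3)) or die "read $path: $!";

    my ($layer, $bom_length) = ($fallback_layer, 0);
    if    ($head =~ /\A\xEF\xBB\xBF/) { ($layer, $bom_length) = (':encoding(UTF-8)',    3) }
    elsif ($head =~ /\A\xFF\xFE/)     { ($layer, $bom_length) = (':encoding(UTF-16LE)', 2) }
    elsif ($head =~ /\A\xFE\xFF/)     { ($layer, $bom_length) = (':encoding(UTF-16BE)', 2) }

    # Reposition while the handle is still raw, then add the layer.
    seek($fh, $bom_length, 0) or die "seek $path: $!";
    binmode($fh, $layer)      or die "binmode $path: $!";
    return $fh;
}
```

A caller would use it as `my $fh = open_with_bom($path);` and then read lines from `$fh` as usual; files without a BOM are left in the fallback layer.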

Re: BOM and principle of least surprise

2004-03-19 Thread Nick Ing-Simmons
Erland Sommarskog <[EMAIL PROTECTED]> writes: > > open (F, '<:encoding(ucs-2le)', 'räkmacka-ucs2.txt'); > >And one things seems just plain wrong to me: The "\n" is written as >0A 0D to the file, not 000A, 000D. But may there is some more manual >reading I need to do find out how to do it. 0

Re: Converting string to UTF-16LE

2004-03-01 Thread Nick Ing-Simmons
Larry Wall <[EMAIL PROTECTED]> writes: >On Wed, Feb 25, 2004 at 06:19:02PM +0100, Sebastian Lehmann wrote: >: For this example the search value will be "Ibaïez". Because of the search >: isn't case-sensitive, all letters should be uppercased, using the uc method. > >I don't think this is your probl

Re: Converting string to UTF-16LE

2004-02-26 Thread Nick Ing-Simmons
Sebastian Lehmann <[EMAIL PROTECTED]> writes: >Hello, > >i use a perl script to search different files. The search values are given >from a HTML page, the results are displayed on this page, too. The files are >saved in the UTF16LE format, therefore i will open them with the following >open command

Re: Question regarding Unicode handling in perl: auto-sensing

2004-02-22 Thread Nick Ing-Simmons
Andreas Jaekel <[EMAIL PROTECTED]> writes: >Dear Perl Dieties! > >I've been trying to figure this out for myself for a couple >of hours now, but I got to the point were I gave up and decided >that I'll have to bother you. Hope you don't mind. > >My task is the following, and I'm running out of ide

Re: How to convert base64 string to utf-8

2004-02-06 Thread Nick Ing-Simmons
ALexander N. Treyner <[EMAIL PROTECTED]> writes: >Hi John, >Your code works perfect. >But I found one strange thing. >For example I have next string: > > hello hello world > >that converted by the mail client to > > hello =?windows-1255?Q?=F9=EC=E5=ED_hello_world?= >

Re: How to convert base64 string to utf-8

2004-02-06 Thread Nick Ing-Simmons
Guido Flohr <[EMAIL PROTECTED]> writes: >ALexander N. Treyner wrote: >> Hello All, >> I'm using utf-8 Postgres database, where I save strings in many languages. >> I have to match the database with strings encoded in mime base64 or >> quoted-printable format. Like next: >> =?utf-8?B?15TXoNeUINee16

Re: Patch for tests on un*x

2004-01-29 Thread Nick Ing-Simmons
Brad Guillory <[EMAIL PROTECTED]> writes: >Last spring someone committed a patch to fix the tests on windows >platforms (see Change 18966 by [EMAIL PROTECTED] on 2003/03/14 04:20:51). >This broke the tests on my Redhat box. Here is a compromise patch: > >--- t/enc_module.t.orig 2004-01-28 11:34

Re: \W and [\W]

2004-01-02 Thread Nick Ing-Simmons
Eric Cholet <[EMAIL PROTECTED]> writes: >Le 1 janv. 04, à 17:50, Rafael Garcia-Suarez a écrit : > >> +(However, and as a limitation of the current implementation, using >> +C<\w> or C<\W> I a C<[...]> character class will still match >> +with byte semantics.) > >I don't think it applies to \w, only

Re: perlunicode comment - when Unicode does not happen

2003-12-28 Thread Nick Ing-Simmons
Jungshik Shin <[EMAIL PROTECTED]> writes: >> That will work if there's en_GB.UTF-8 available for him in his >> particular Unixes and assuming using UTF-8 locales won't break other >> things. Just so we get this clear. A year or so back I - as a Unicode advocate - tried to switch to en_GB.utf8. Wi

Re: perlunicode comment - when Unicode does not happen

2003-12-28 Thread Nick Ing-Simmons
Jarkko Hietaniemi <[EMAIL PROTECTED]> writes: >>> What I wish is that the whole current locale system would curl up and >>> die. >> >> As you'd agree, it's only 'encoding' part that has to die. > >Oh no, there are plenty of parts in it that I wish would die :-) >(though the coupling of encoding i

Re: perlunicode comment - when Unicode does not happen

2003-12-28 Thread Nick Ing-Simmons
Jungshik Shin <[EMAIL PROTECTED]> writes: > > Then, he should switch to en_GB.UTF-8. I probably will. >Besides, he implied that >he still uses ISO-8859-1 for files whose names can be covered by >ISO-8859-1, which is why I wrote about mixing up two encodings >in a single file system _under_ his

Re: perlunicode comment - when Unicode does not happen

2003-12-28 Thread Nick Ing-Simmons
Jarkko Hietaniemi <[EMAIL PROTECTED]> writes: >>Let's not 'fix' it (not carve it on a stone), but offer a few >> well-thought-out options. For instance, Perl may offer (not that these >> are particularly well-thought-out) 'just treat this as a sequence of >> octets', 'locale', and 'unicode'. 'l

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Nick Ing-Simmons
Ed Batutis <[EMAIL PROTECTED]> writes: > >The point I'm trying to make (agreeing with most perl 5 porters I suspect) >is that supporting Shift-JIS in Perl5 is hopeless. I seem to recall my Japanese colleagues at TI using it years ago... just treating it as octets and with a 'jperl' which did a lit

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Nick Ing-Simmons
Jungshik Shin <[EMAIL PROTECTED]> writes: >On Mon, 22 Dec 2003, Jarkko Hietaniemi wrote: > >> (AFAIK) W2K and later _are able_ to use UTF-16LE encoded Unicode for >> filenames, >> but because of backward compatibility reasons using 8-bit codepages is >> much >> more likely. > > No. _Both_ NTFS (on

Re: perlunicode comment - when Unicode does not happen

2003-12-23 Thread Nick Ing-Simmons
Ed Batutis <[EMAIL PROTECTED]> writes: >"Jarkko Hietaniemi" <[EMAIL PROTECTED]> wrote in message >news:[EMAIL PROTECTED] > >> You do know that ... >Yes. > >If wctomb or mbtowc are to be used, then Perl's Unicode must be converted >either to the locale's wide char or to its multibyte. Locale is pe

Re: using Encode module

2003-12-11 Thread Nick Ing-Simmons
Dana Sharvit - M <[EMAIL PROTECTED]> writes: >Hi , >I am using the Encode module (perl 5.8)to convert a string from utf8 to big >5. >There is something that I do not understand that I thought you may help >with: >The input to the program is a file that contains a utf8 string, >The encoding works pr

RE: unicode on windows

2003-11-21 Thread Nick Ing-Simmons
Edward Batutis <[EMAIL PROTECTED]> writes: >> Also each character when I view it via character >> listing of IME pad, it has three hex numbers. > >Seeing three hex numbers per character is a sure sign you've got utf8. You >need to convert the characters to the platform encoding before using 'open'

Re: possible patch for Perl 5.8.2's Alias.pm

2003-10-30 Thread Nick Ing-Simmons
Jarkko Hietaniemi <[EMAIL PROTECTED]> writes: >> a year ago, there was a discussion on this list about Encode not >> recognizing "TIS-620" as alias for "iso-8859-11": >> >> http://nntp.x.perl.org/group/perl.unicode/1656 >> >> In the latest release of Encode::Alias (1.38 from Encode 1.9801, >> inc

Re: roundtrip conversion for Mac OS CJK encodings

2003-09-28 Thread Nick Ing-Simmons
SADAHIRO Tomoyuki <[EMAIL PROTECTED]> writes: >Hello. > >For round-trip fidelity, Mac OS CJK encodings include many characters >with mapping a single character in a Mac OS encoding >to a sequence of standard Unicode characters. >(cf. ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/README.TXT )

Re: UCM file and combining character sequences

2003-09-23 Thread Nick Ing-Simmons
Sadahiro Tomoyuki <[EMAIL PROTECTED]> writes: >> Are the Unicode character sequences in [1] normalized? >> Can you explain what the diacritics mean I assume '`^ etc. are tone marks? >> What do the macron and dot and dots-below signify? > >Apparently POJ system uses ten vowels >(a, e, i, m, ng, o, o

Re: UCM file and combining character sequences

2003-09-22 Thread Nick Ing-Simmons
Hank Tt <[EMAIL PROTECTED]> writes: >Hi, > >I'm trying to make a UCM file to feed to enc2xs. The legacy encoding for >Taiwanese romanization *must* have its code points mapped to Unicode >character sequences, for the simple reason that the UCS lacks the >corresponding precomposed characters (and i

Re: Invalid Uicode characters

2003-09-17 Thread Nick Ing-Simmons
John Delacour <[EMAIL PROTECTED]> writes: >At 11:31 am +0100 16/9/03, [EMAIL PROTECTED] wrote: >>Dear PERLists, >> >>I am running Perl 5.8. and trying to filter out some invalid Unicode >>characters from Unicoded texts of some South Asian languages. There >>are 28 such characters in my data (all

Re: Inverse of /\p{script}/

2003-08-31 Thread Nick Ing-Simmons
Owen Taylor <[EMAIL PROTECTED]> writes: >On Fri, 2003-08-29 at 11:14, Nick Ing-Simmons wrote: >> > >> >We're dropping support for this code and for core X fonts >> >in the next release of Pango, >> >> In favour of what? (FreeType on client

Re: Inverse of /\p{script}/

2003-08-29 Thread Nick Ing-Simmons
Jungshik Shin <[EMAIL PROTECTED]> writes: > > If you want, you can take a look at nsFontMetricsGTK.cpp file >of mozilla. Can you pass on my admiration to the Mozilla team - its handling of these issues in version 1.4 is so much better than ye-olde Netscape. >You can view that huge file (over 6

Re: Inverse of /\p{script}/

2003-08-29 Thread Nick Ing-Simmons
Owen Taylor <[EMAIL PROTECTED]> writes: >On Fri, 2003-08-29 at 11:14, Nick Ing-Simmons wrote: >> > >> >We're dropping support for this code and for core X fonts >> >in the next release of Pango, >> >> In favour of what? (FreeType on client

Re: Inverse of /\p{script}/

2003-08-29 Thread Nick Ing-Simmons
Owen Taylor <[EMAIL PROTECTED]> writes: >You might want to look at what we did for Pango - see >pango/modules/basic/tables-big.i in >ftp://ftp.gtk.org/pub/gtk/v2.2/pango-1.2.5.tar.gz. [There may come a time when I just give up Tcl/Tk and implement perl/Tk OO interface on top of gtk instead. But

Re: Inverse of /\p{script}/

2003-08-29 Thread Nick Ing-Simmons
Dan Kogai <[EMAIL PROTECTED]> writes: > >But that is not good enough for cases below because... > (Hiragana | Katakana | Han) => 'jisx0208.1990-0' > >This is very wrong because jisx0208.1990-0 only contains \p{Han} that >appears in Japanese (JIS X 0208, to be exact). On the other hand, >ji

Re: Inverse of /\p{script}/

2003-08-29 Thread Nick Ing-Simmons
Jarkko Hietaniemi <[EMAIL PROTECTED]> writes: >On Thu, Aug 28, 2003 at 03:16:20PM +0100, [EMAIL PROTECTED] wrote: >> >> Does the existing perl5.8.* Unicode support have a way to efficently >> determine which script(s) or block (in unicode sense) a code point belongs >> to? > > use Unicode::

Re: bytes::substr() ?

2003-08-27 Thread Nick Ing-Simmons
<[EMAIL PROTECTED]> writes: >On Wed, Aug 27, 2003 at 06:04:48PM +0200, Guido Flohr wrote: >> Hi, >> >> [EMAIL PROTECTED] wrote: >> >I'm working with a byte oriented protocol, and need to extract byte n1 >> >through >> >byte n2 from a string. No problem (honest;-)) (At least in perl5.8 ...) A b

Re: Encode from XS

2003-08-14 Thread Nick Ing-Simmons
Dan Kogai <[EMAIL PROTECTED]> writes: >On Saturday, Aug 9, 2003, at 00:08 Asia/Tokyo, Simon Cozens wrote: >> This is sad and I ought to know the answer, but... >> >> Can someone give me a few quick examples of creating Encode::XS objects >> to do simple transcoding, from XS? > >You should check the

Re: Encode from XS

2003-08-11 Thread Nick Ing-Simmons
Simon Cozens <[EMAIL PROTECTED]> writes: >[EMAIL PROTECTED] (Simon Cozens) writes: >> Can someone give me a few quick examples of creating Encode::XS objects >> to do simple transcoding, from XS? > >I think I expressed myself badly. Perhaps I don't mean "creating" Encode::XS >objects, but instantia

Re: IO::Socket::INET and utf-8

2003-07-02 Thread Nick Ing-Simmons
merged my patched version into mainline (5.9.*) and I would expect it to be in 5.8.1 as well. -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: IO::Socket::INET and utf-8

2003-07-01 Thread Nick Ing-Simmons
Nick Ing-Simmons <[EMAIL PROTECTED]> writes: >Martin J. Evans <[EMAIL PROTECTED]> writes: >>> >>> A socket is a file handle so : >>> >>> binmode($sock,":utf8"); >>> >>> should work. >>I'm obviously mi

Re: IO::Socket::INET and utf-8

2003-07-01 Thread Nick Ing-Simmons
Martin J. Evans <[EMAIL PROTECTED]> writes: >Dan Kogai wrote: >> On Tuesday, July 1, 2003, at 05:49 PM, Martin J. Evans wrote: >> >>> Nick Ing-Simmons wrote: >>> >>>> Martin J. Evans <[EMAIL PROTECTED]> writes: >>>> A socke

Re: IO::Socket::INET and utf-8

2003-07-01 Thread Nick Ing-Simmons
;, 'anything') or >binmode (FH, ":utf8"))? A socket is a file handle so : binmode($sock,":utf8"); should work. > >I'm using Perl 5.8.0. > >Thanks. > >Martin -- Nick Ing-Simmons http://www.ni-s.u-net.com/
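A minimal sketch of the suggestion, using a local `socketpair` in place of a network connection so it runs on its own; that substitution and the variable names are assumptions made for this note, and `socketpair` with `AF_UNIX` is not available on every platform.

```perl
use strict;
use warnings;
use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC);
use IO::Handle;

socketpair(my $writer, my $reader, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# A socket is a file handle, so layers are set the same way.
binmode($_, ':utf8') for $writer, $reader;
$writer->autoflush(1);

print {$writer} "snowman: \x{2603}\n";

my $line = <$reader>;
chomp $line;
printf "received %d characters\n", length $line;    # prints: received 10 characters
```

With the layer on both ends, the reader gets one character for U+2603 rather than three octets.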

Re: utf8_heavy noise

2003-06-23 Thread Nick Ing-Simmons
e only thing it means these days is "my script is in UTF-8". And even that is a potential dead-end - scripts in other encodings don't have a unique pragma so why does UTF-8 ? >For "all the other" things, I think there can't ever be a consensus >for "all those

Re: Unicode and XS

2003-03-23 Thread Nick Ing-Simmons
he UTF-EBCDIC stuff. The snag being that perl's 'utf8' encoding uses core's SvUTF8 scheme - which is just fine if it _IS_ UTF-8 What we need for Encode::* to have its _own_ UTF-8 and UTF-EBCDIC encode/decode independent of what core is using... > >Thanks >Brian -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: Need some help in understanding Unicode in Perl...

2003-02-21 Thread Nick Ing-Simmons
e success displaying email using Encode::'s euc-cn and Unicode fonts, but as I can't read many Chinese characters this was mainly just as an exercise. -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: [PATCH] %open::modes to hold ${^OPEN} values for run-time access

2003-01-29 Thread Nick Ing-Simmons
modes> >+hash. Its keys are the caller's packages (or the second-level calling >+package if the caller is C); the values are hash references >+with two keys: C holds the input mode, and C for the output >+mode. > > If you have a legacy encoding, you can use the C<:encoding(...)> tag. -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: Encode utf-16 problem

2003-01-06 Thread Nick Ing-Simmons
kes NI-XS to fix the prob The partial char stuff needs the encoding to use same rules as Encode::XS will take a look if it isn't fixed yet. > >Dan the Encode Maintainer -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: Fallback problems with Encode

2002-12-28 Thread Nick Ing-Simmons
s undefined. > >Maybe I am misunderstanding Encode's conversion operations, so >maybe it is a problem with the documentation not being clear about >this behavior. But IMHO, what I am getting appears to be incorrect. And IMHO you are getting what I "designed" it to produce ;-) I strongly recommend doing conversions in two steps explicitly - that way you can get whatever you want. I am also willing to concede that documentation could be improved :-) > >--ewh -- Nick Ing-Simmons http://www.ni-s.u-net.com/
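The two-step recommendation can be sketched as follows. The encodings and the choice of `FB_CROAK` are assumptions made for illustration (the original question is truncated); the point is that each step takes its own CHECK argument, so the behaviour on bad input and on unmappable characters can be chosen separately.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $latin2_octets = "\xA1";    # one ISO-8859-2 octet

# Croak on errors, but leave the source string untouched.
my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();

# Step 1: octets in the source encoding -> Perl characters.
my $characters = decode('iso-8859-2', $latin2_octets, $check);

# Step 2: Perl characters -> octets in the target encoding.
my $utf8_octets = encode('UTF-8', $characters, $check);

printf "U+%04X\n", ord $characters;                                   # prints: U+0104
print join(' ', map { sprintf '%02X', ord } split //, $utf8_octets), "\n";   # prints: C4 84
```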

Re: Fallback problems with Encode

2002-12-23 Thread Nick Ing-Simmons
is (almost) by design - i.e. it happened that way and I decided it made a kind of sense. Using ASCII is considered as asking for 7-bit-ness. If you want one of the 8-bit supersets, use the one you want (iso8859-1 aka latin1 most likely, but perhaps one of the Windows ones with smart quotes, m-dash etc.) There is a good case for a "latin-guess" or latin-superset or ... which tries to do the right thing. -- Nick Ing-Simmons http://www.ni-s.u-net.com/
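
A small sketch of the distinction drawn above, assuming a sample string containing U+00E9 (the string is an illustration, not from the thread). Encoding to 'ascii' treats the request as strictly 7-bit and falls back to a substitution character, while 'iso-8859-1' maps the same character to a single byte:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumed sample: 'café' as a Perl character string.
my $text = "caf\x{E9}";

# 'ascii' is 7-bit only: U+00E9 has no mapping, so the default
# fallback (CHECK omitted) substitutes a replacement character.
my $as_ascii = encode('ascii', $text);

# 'iso-8859-1' is an 8-bit superset of ASCII: U+00E9 becomes byte 0xE9.
my $as_latin1 = encode('iso-8859-1', $text);

# Show both results as hex bytes.
for my $result ($as_ascii, $as_latin1) {
    print join(' ', map { sprintf '%02X', ord } split //, $result), "\n";
}
```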

Re: [not-yet-a-PATCH] compress Encode better

2002-11-04 Thread Nick Ing-Simmons
Dan Kogai <[EMAIL PROTECTED]> writes: >On Monday, Nov 4, 2002, at 19:17 Asia/Tokyo, Nick Ing-Simmons wrote: >> Someone could/should write a generic test that pushes all codepoints >> supported by a .ucm file both ways through the generated encoder >> and checks for c

Re: [not-yet-a-PATCH] compress Encode better

2002-11-04 Thread Nick Ing-Simmons
and checks for correctness. This would be a pointless thing to do as part of perl's "make test" as once the "compiler" works it works, but would be useful for folk working on the compile process. -- Nick Ing-Simmons http://www.ni-s.u-net.com/
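
A rough sketch of the kind of round-trip check suggested above. As a simplifying assumption it does not parse a .ucm file; it pushes every byte value of a single-byte encoding through decode and back through encode. The helper name `roundtrip_failures` is hypothetical:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: returns the byte values (0..255) that do not survive
# bytes -> characters -> bytes in the named single-byte encoding.
sub roundtrip_failures {
    my ($encoding) = @_;

    # Return quietly on unmappable data and never modify the source.
    my $check = Encode::FB_QUIET() | Encode::LEAVE_SRC();

    my @failures;
    for my $value (0 .. 255) {
        my $byte  = chr $value;
        my $chars = decode($encoding, $byte, $check);
        my $back  = encode($encoding, $chars, $check);
        push @failures, $value unless $back eq $byte;
    }
    return @failures;
}

my @bad = roundtrip_failures('iso-8859-1');
printf "%d byte(s) failed to round-trip\n", scalar @bad;
```

A fuller version along the lines described in the message would take its list of codepoints from the .ucm file itself and would also need to handle multi-byte encodings.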

Re: [not-yet-a-PATCH] compress Encode better

2002-11-04 Thread Nick Ing-Simmons
think it would be useful to have something which will print them out from the internal form. -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: Fixed Encode::utf8

2002-10-20 Thread Nick Ing-Simmons
Dan Kogai <[EMAIL PROTECTED]> writes: >On Sunday, Oct 20, 2002, at 22:49 Asia/Tokyo, Nick Ing-Simmons wrote: >> Attached is patch that implements ->decode and ->encode of >> Encode::utf8 as XS code that obeys all the rules that Encode::XS does. >> This allows :

Re: [Encode] HEADS-UP: ucm/cp932.ucm will be updated

2002-10-20 Thread Nick Ing-Simmons
ke the new tables. Tcl/Tk can fight its own battles (though once I have a solid Tk804 I will be offering them patches). I don't think cp932 is going to affect any fonts (Windows fonts being Unicode indexed and X11 fonts needing a fixed width encoding). -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Fixed Encode::utf8

2002-10-20 Thread Nick Ing-Simmons
Attached is a patch that implements ->decode and ->encode of Encode::utf8 as XS code that obeys all the rules that Encode::XS does. This allows :encoding(UTF-8) to handle partial chars at end of buffers correctly. Submitted as //depot/perlio/...@18032 -- Nick Ing-Simmons http://www.
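
To illustrate what "partial chars at end of buffers" means, here is a sketch at the Encode API level rather than inside the :encoding layer. It assumes input arriving in two chunks that split a two-byte UTF-8 sequence (the chunk contents are made up for the example) and uses the buffer-carrying pattern with `Encode::FB_QUIET` described in the Encode documentation:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumed input: UTF-8 for 'café!' split in the middle of the 'é' sequence.
my @chunks = ("caf\xC3", "\xA9!");

my ($buffer, $string) = ('', '');
for my $chunk (@chunks) {
    $buffer .= $chunk;

    # FB_QUIET returns what could be decoded so far and leaves the
    # unprocessed tail (here, the incomplete sequence) in $buffer.
    $string .= decode('UTF-8', $buffer, Encode::FB_QUIET());
}

printf "decoded %d characters, %d byte(s) left over\n",
    length $string, length $buffer;
```

With a real filehandle the same loop is usually written around `read($fh, $buffer, $size, length($buffer))`, so that new bytes are appended after whatever was left over from the previous pass.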

Re: Is perl unicode or not?

2002-10-13 Thread Nick Ing-Simmons
Nadim <[EMAIL PROTECTED]> writes: >On Sunday 13 October 2002 14:45, Nick Ing-Simmons wrote: >> I am using 5.6.3 on windows from activestate. I do the >> >following. >> >> I don't think you are. As far as I am aware there is only perl5.6.1 >> there i

Re: Is perl unicode or not?

2002-10-13 Thread Nick Ing-Simmons
so horrible I can't recall it). For perl5.8 this is easy - it was a major goal of perl5.8. >3/ compare both strings and act upon the comparison Once you have two Unicode strings this is easy. > >if the string I get from ole _is_ unicode (and it seems so) What leads you to that conclusion? >how can I >flatten it to binary? I tried with unpack without success. > >Nadim. -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: Can from_to($s, SRC, TGT) leave chars missing in TGT unchanged?

2002-10-11 Thread Nick Ing-Simmons
ENCODE_LEAVE_SRC) >is just what I wanted, because it LEAVEs those chars in SRC that >ENCODE_NOREP... but unfortunately no, it leaves all source string >untouched unconditionally. > >Thanks in advance for any clues. > >If my English and/or my question is far from clear, please tell me and >I'll do my best to rewrite it in other words. -- Nick Ing-Simmons http://www.ni-s.u-net.com/

Re: Re[4]: ISO 8859-11 (Thai) cross-mapping table

2002-10-09 Thread Nick Ing-Simmons
default to their >preferred MIME names, all in lowercase. Maybe the unique ID number >("MIBenum") could also be taken into account. I have no objection to that - and I doubt Dan will either. Would you care to at least enumerate the cases we fail - or ideally provide patch(es)? -- Nick Ing-Simmons http://www.ni-s.u-net.com/
