Hi group,

 

I finally got Perl to understand my file (the context is the mail quoted 
below). The problem was with how the file was saved. This time I saved it 
as UTF-8 in Notepad (earlier I had saved it as a "Unicode" text file), and the 
same code now works, including the regular expression features. But I think Perl 
doesn't allow I/O with "Unicode" as the encoding on the handle (instead of 
utf8).
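
For anyone who hits the same thing: as far as I understand, what Notepad calls "Unicode" is UTF-16 (little-endian, with a byte order mark), so the handle would need that encoding name rather than "Unicode" or utf8. The following is only a minimal sketch under that assumption, not something I have tested against my original files. The file name sample_utf16.txt and the sample text are made up for the illustration; the script writes a small UTF-16 file itself so that it can be run on its own.

```perl
use strict;
use warnings;
use utf8;    # the script source itself contains UTF-8 literals

# Hypothetical file name, used only for this illustration.
my $file = 'sample_utf16.txt';
my $text = "दूसरे राज्य\n";

# Write a UTF-16LE file with a leading BOM, which is (by assumption)
# roughly what Notepad's "Unicode" save option produces.
# :raw comes first so that no CRLF translation touches the 16-bit units.
open(my $out, '>:raw:encoding(UTF-16LE)', $file)
    or die "Cannot write $file: $!";
print {$out} "\x{FEFF}", $text;
close($out) or die "Cannot close $file: $!";

# Read it back. The plain UTF-16 layer (no LE/BE suffix) uses the BOM
# to pick the byte order and removes it from the decoded text.
open(my $in, '<:raw:encoding(UTF-16)', $file)
    or die "Cannot read $file: $!";
my $line = <$in>;
close($in);

# Regular expressions now operate on decoded characters.
my $matched = ($line =~ /राज्य/) ? 1 : 0;

# Encode output as UTF-8 so the terminal gets well-formed bytes.
binmode(STDOUT, ':encoding(UTF-8)');
print "Matched: $matched\n";
```

If the file has no byte order mark, the plain UTF-16 layer cannot guess the byte order, and UTF-16LE or UTF-16BE would have to be named explicitly.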

 

Thanks to those who answered earlier.

Baskaran

 

 

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Phoenix
Sent: 20 February 2006 11:56
To: Baskaran Sankaran
Cc: beginners@perl.org
Subject: Re: FW: Reading a Unicode text file

 

On 2/17/06, Baskaran Sankaran <[EMAIL PROTECTED]> wrote:

 

> File: Sample_Hin.txt

> 

> दूसरे राज्य पुनर्गठन आयोग के गठन का यही सही वक्त है।

 

> The sample files were created in Windows in Unicode (both English & Hindi)
> and I am able to open them in notepad and wordpad. But, the output as you
> see is garbage and somehow it misses the utf8. This apart, a blank space is
> added for every character in both English and Hindi.

 

I've done a little experimenting, and I think you're right and Perl is
wrong here. At least, Perl seemingly disagrees with some common tools
about what a utf8 file is. I confess that I don't know enough about
utf8 to be certain.

 

If you don't get any better responses soon, you could use perlbug to
file a bug report. It is best if you can include a (small) utf8 file,
such as the first few lines of your Sample_Hin.txt file. But it's
important that the exact file contents be part of the bug report, not
just the text. One way would be if you can include a URL where the
files could be downloaded. But if the files are small enough, you can
convert them to a textual form (such as a hex dump) and include them
with your bug report.

 

Good luck with it!

 

--Tom Phoenix

Stonehenge Perl Training
