Hi group,
I finally got Perl to read my file (the context is in the mail quoted below). The problem was how the file was saved. This time I saved it as utf8 in Notepad (earlier I had saved it as a Unicode text file), and the same code now works, including the regular expression features. However, I think Perl doesn't allow I/O with "Unicode" as the encoding on the file handle (instead of utf8).

Thanks to those who answered earlier.

Baskaran

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Phoenix
Sent: 20 February 2006 11:56
To: Baskaran Sankaran
Cc: beginners@perl.org
Subject: Re: FW: Reading a Unicode text file

On 2/17/06, Baskaran Sankaran <[EMAIL PROTECTED]> wrote:

> File: Sample_Hin.txt
>
> दूसरे राज्य पुनर्गठन आयोग के गठन का यही सही वक्त है।
> The sample files were created in Windows in Unicode (both English & Hindi)
> and I am able to open them in notepad and wordpad. But, the output as you
> see is garbage and somehow it misses the utf8. This apart, a blank space is
> added for every character in both English and Hindi.

I've done a little experimenting, and I think you're right and Perl is wrong here. At least, Perl seemingly disagrees with some common tools about what a utf8 file is. I confess that I don't know enough about utf8 to be certain.

If you don't get any better responses soon, you could use perlbug to file a bug report. It is best if you can include a (small) utf8 file, such as the first few lines of your Sample_Hin.txt file. But it's important that the exact file contents be part of the bug report, not just the text. One way would be if you can include a URL where the files could be downloaded. But if the files are small enough, you can convert them to a textual form (such as a hex dump) and include them with your bug report.

Good luck with it!

--Tom Phoenix
Stonehenge Perl Training
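
On the question of reading a file saved as "Unicode" rather than utf8: Notepad's "Unicode" option most likely writes UTF-16 (little-endian) with a byte order mark, which would also explain the blank that appeared after every character when the file was read as utf8. The sketch below assumes that is the case. It writes a small UTF-16 file and reads it back through the `:encoding(UTF-16)` layer. The file name `sample_utf16.txt`, the helper `read_first_line`, and the test text are made up for illustration and are not from the original code.

```perl
use strict;
use warnings;

# Hypothetical scratch file standing in for a Notepad "Unicode" save.
my $file = 'sample_utf16.txt';

# Devanagari word "vakt" from the sample sentence, written as code points
# so this script does not depend on its own source encoding.
my $word = "\x{0935}\x{0915}\x{094D}\x{0924}";

# Create the test file: a BOM (U+FEFF) followed by UTF-16LE text.
open(my $out, '>:raw:encoding(UTF-16LE)', $file)
    or die "Cannot write $file: $!";
print {$out} "\x{FEFF}sahi $word hai\n";
close($out) or die "Cannot close $file: $!";

# Read the first line back. The UTF-16 layer uses the BOM to pick the
# byte order, so the same call is expected to handle big-endian files too.
sub read_first_line {
    my ($path) = @_;
    open(my $in, '<:raw:encoding(UTF-16)', $path)
        or die "Cannot read $path: $!";
    my $line = <$in>;
    close($in);
    $line =~ s/\r?\n\z//;    # drop LF or CRLF line ending
    return $line;
}

my $line = read_first_line($file);

# Encode output explicitly so wide characters print without warnings.
binmode(STDOUT, ':encoding(UTF-8)');
print "characters: ", length($line), "\n";
print "regex match: ", ($line =~ /\Q$word\E/ ? "yes" : "no"), "\n";
```

If this holds for the real Sample_Hin.txt, the only change needed in the original script would be the layer in the `open` call. The `:raw` prefix is there to keep Windows line-ending translation from being applied to the undecoded bytes; that part is a precaution I have not verified against the original setup.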