Re: UTF-8 support?

Sherm Pendley Sat, 30 Apr 2005 16:08:50 -0700

On Apr 30, 2005, at 6:48 PM, John Blumel wrote:

> OK, here's the code without the bug I introduced in the example. It sits inside a foreach loop that iterates over a long list of keywords:
>
>     ...
>     while ($articleWorkText =~ m/\b$kWord\b/igs) {
>         $position = pos($articleWorkText) - length($kWord);
>         $matchedText = substr($articleWorkText, $position, length($kWord));
>         $matchedText =~ s/ /_/g;
>         substr($patternSpace, $position, length($matchedText)) = $matchedText;
>     }
>     ...
>
> This works fine in most cases, but if there is a wide character in $articleWorkText before the matched text, then $position, as used by substr(), ends up in front of the position calculated from pos().

How are $articleWorkText and $kWord being read into your app? Perl handles a variety of text encodings, but it does need to be told which encoding to use.
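To illustrate why the encoding matters for offsets, here is a minimal sketch that assumes (only a guess at this point) one string holds undecoded UTF-8 bytes while the other holds decoded characters. The sample string and variable names are hypothetical; Encode's decode() turns the bytes into characters.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical sample: "café keyword" as raw UTF-8 bytes, which is what
# you get from a file opened without an encoding layer.
my $bytes = "caf\xC3\xA9 keyword";

# The same text decoded into Perl characters.
my $chars = decode('UTF-8', $bytes);

# length() counts bytes in one string and characters in the other.
printf "length: %d bytes vs %d characters\n", length($bytes), length($chars);

# Offsets into the byte string are larger by one for each extra byte
# taken up by a multi-byte character that comes before the match.
$bytes =~ /keyword/ or die "no match in byte string";
my $byte_pos = $-[0];
$chars =~ /keyword/ or die "no match in character string";
my $char_pos = $-[0];
printf "offset of 'keyword': %d (bytes) vs %d (characters)\n", $byte_pos, $char_pos;
```

This prints 13 bytes vs 12 characters, and an offset of 6 vs 5: an offset computed against one string lands one position off in the other, which matches the symptom described above.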

If you're reading them from a file, you need to tell Perl that the file is UTF-8 (or whatever) encoded. You can use Perl's three-argument open() for that:

    open(my $fh, '<:encoding(UTF-8)', '/path/to/file') or die "Can't open /path/to/file: $!";
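Putting that together with the keyword loop from the question, a minimal sketch: it decodes the text through the :encoding(UTF-8) layer while reading, then takes the match start from $-[0] and the match end from $+[0] rather than computing pos() minus length($kWord). The sample text, the single keyword, the use of \Q...\E, and starting $patternSpace as a copy of $articleWorkText are all assumptions made so the sketch runs on its own; an in-memory filehandle stands in for the real file.

```perl
use strict;
use warnings;

# Hypothetical file contents as raw UTF-8 bytes ("naïve café likes café au lait").
my $file_bytes = "na\xC3\xAFve caf\xC3\xA9 likes caf\xC3\xA9 au lait\n";

# Decode while reading; the in-memory handle stands in for '/path/to/file'.
open(my $fh, '<:encoding(UTF-8)', \$file_bytes) or die "Can't open: $!";
my $articleWorkText = do { local $/; <$fh> };    # slurp the whole "file"
close($fh);

# Assumption: $patternSpace starts as a copy of the text to be marked up.
my $patternSpace = $articleWorkText;

foreach my $kWord ("caf\x{E9} au lait") {        # hypothetical keyword list
    # \Q...\E keeps any regex metacharacters in a keyword literal.
    while ($articleWorkText =~ m/\b\Q$kWord\E\b/igs) {
        my $position    = $-[0];                 # match start, in characters
        my $matchedText = substr($articleWorkText, $position, $+[0] - $-[0]);
        $matchedText =~ s/ /_/g;
        substr($patternSpace, $position, length($matchedText)) = $matchedText;
    }
}

# Encode on the way out too, so the decoded text prints as UTF-8.
binmode(STDOUT, ':encoding(UTF-8)');
print $patternSpace;    # the matched phrase now reads café_au_lait
```

Because both strings are decoded character strings, the offset that the match reports and the offset that substr() uses refer to the same position, even with "ï" and "é" earlier in the text.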

Have a look at perluniintro and perlunicode if you haven't already.

See also the -C switch in perlrun: you can use it to specify that STDIN, STDOUT and/or STDERR should be treated as UTF-8, or to make UTF-8 the default encoding for all I/O streams.
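If you'd rather set this up inside the script than on the command line, the open pragma can do something similar. A minimal sketch, assuming Perl 5.8 or later; the sample string is hypothetical. Note that -C uses the lax :utf8 layer, whereas :encoding(UTF-8) here also rejects malformed input, so the two are close but not identical.

```perl
use strict;
use warnings;

# Roughly comparable to running the script as `perl -CSD script.pl`:
#   :encoding(UTF-8) becomes the default layer for handles opened in this
#   lexical scope, and :std applies it to STDIN, STDOUT and STDERR as well.
use open qw(:std :encoding(UTF-8));

my $text = "smile \x{263A}";    # hypothetical sample; U+263A is a wide character

# Without a UTF-8 layer on STDOUT this print would warn "Wide character in print".
print "$text (", length($text), " characters)\n";
```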

sherm--

Cocoa programming in Perl: http://camelbones.sourceforge.net
Hire me! My resume: http://www.dot-app.org
