Peter Volkov wrote 2008-07-11 10:10 (+0400):
> The problem is that in Linux (Gentoo and Debian I've tried) /\w/ does
> not match Russian letter while I use locale and LC_COLLATE is set to
> ru_RU.UTF-8.
\w should match Cyrillic letters even without "use locale".

You might be running into an annoying bug which makes \w lose its Unicode
support depending on the *internal* state of a value. To work around this
bug, read Unicode::Semantics on CPAN and use it or utf8::upgrade.

> Linux $ perl -e 'use locale; open(IN, "< test-file"); while(<IN>) { print if
> /\w/; }'
> string with spaces (not only with [:alnum:])
> English;
> hello_привет

Despite the above, there's a slightly more important issue here. You're
opening a text file but you don't specify the character encoding. Likewise,
you need to specify the encoding for output. Assuming utf8 for both:

    perl -le'
        binmode STDOUT, ":encoding(utf8)";
        open my $in, "< :encoding(utf8)", "test-file";
        while (<$in>) {
            print "match: [$1]" if /(\w+)/;
        }
    '

Which on my system prints:

    match: [слово]
    match: [строка]
    match: [string]
    match: [English]
    match: [hello_привет]

I'm not sufficiently familiar with "use encoding" to say anything about it,
but you shouldn't need it just for this.

> Do I understand correctly that we should always supply encoding of
> streams?

Yes.

> If yes, why in FreeBSD this works without supplying any encoding and is
> it possible (good idea) to do the same in Linux?

I have no idea.
-- 
Met vriendelijke groet,  Kind regards,  Korajn salutojn,

  Juerd Waalboer:  Perl hacker  <http://juerd.nl/sig>
  Convolution:     ICT solutions and consultancy

1;