Dear Ende,
On Apr 19, 2006, at 5:22 PM, ende wrote:
Thanks Nobumi,
Your solution is not only shorter but also more precise and correct
than my first attempt. But, anyway, although it works better it
doesn't find words that differ in accents or capitalization. That
is, if you look for "Ángeles" it finds neither "Angeles" nor
"angeles" nor "ángeles"...
Well, on my machine, if I call that script with:
perl Ende_test.pl Ángeles
it does find both "Ángeles" and "ángeles" (because the regex has the
"i" modifier, which makes the match case-insensitive).
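To make that point concrete, here is a minimal sketch of what the "i" modifier does once the strings are decoded character strings. The helper name matches_ignoring_case is made up for this illustration, and the search word is treated as a literal string (via \Q...\E) rather than as a regex:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # the string literals in this file are UTF-8 text

# Hypothetical helper (not part of Ende_test.pl): returns 1 if $text
# contains $pattern, ignoring case, and 0 otherwise.
sub matches_ignoring_case {
    my ($pattern, $text) = @_;
    return $text =~ /\Q$pattern\E/i ? 1 : 0;
}

binmode(STDOUT, ":utf8");
for my $word ("Ángeles", "ángeles", "Angeles") {
    my $result = matches_ignoring_case("Ángeles", $word) ? "match" : "no match";
    print "$word: $result\n";
}
```

Case is ignored, so "Ángeles" and "ángeles" both match, but the accent is not, so "Angeles" does not.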
But you seem to want a kind of "accent-insensitive search"?
That is not so simple in general.
One possible -- and rather simple -- solution would be to use
"Unicode::Normalize". I just tried this script:
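#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Encode;
use Unicode::Normalize;

binmode(STDOUT, ":utf8");

# Join all command-line words into one alternation and decode it from UTF-8.
my $re = join("|", @ARGV);
$re = decode("utf8", $re);

# Accent-free variant of the pattern: decompose it (NFD), then strip the
# combining marks (U+0300..U+036F; U+0081 is kept from the original class).
my $temp = NFD($re);
$temp =~ s/[\x{0300}-\x{036F}\x{0081}]+//g;

my $listin = "/Users/me/Documents/documentos/Familia/Casa/Telistin.txt";
open my $f, "<:encoding(MacRoman)", $listin
    or die "Cannot open $listin: $!";
while (<$f>) {
    chomp;
    if (/$re/i) {
        # The line matches the pattern as typed.
        print $_, "\n";
    }
    else {
        # Otherwise try the pattern with its accents removed.
        print $_, "\n" if /$temp/i;
    }
}
close $f;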
I can call this script from Terminal like this:
perl Ende_test.pl Ángeles
or
perl Ende_test.pl ángeles
and get the reply:
Ángeles
Angeles
ángeles
angeles
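But you have to type the accented form if you want to match the
unaccented forms as well, because the script removes accents only
from the search pattern, not from the lines of the file. That is,
you will find only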
Angeles
angeles
if you invoke the script with:
perl Ende_test.pl Angeles
or
perl Ende_test.pl angeles
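If you want the search to work in both directions, one possible approach is to remove the accents from the line as well as from the pattern before comparing them. The following is only a sketch under some assumptions, not a tested replacement for the script above: the helper names strip_accents and accent_insensitive_match are invented for this example, the search word is treated as a literal string rather than as a regex, and the sample lines are made-up data standing in for the file. It uses \p{Mn} (nonspacing marks) instead of the explicit code point range.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # the string literals in this file are UTF-8 text
use Unicode::Normalize qw(NFD);

# Hypothetical helper: decompose the text (NFD) so that each accented
# letter becomes base letter + combining mark, then delete the marks.
sub strip_accents {
    my ($text) = @_;
    my $decomposed = NFD($text);
    $decomposed =~ s/\p{Mn}+//g;
    return $decomposed;
}

# Hypothetical helper: compare pattern and line after stripping the
# accents from BOTH of them, ignoring case. Returns 1 or 0.
sub accent_insensitive_match {
    my ($pattern, $line) = @_;
    my $plain_pattern = strip_accents($pattern);
    my $plain_line    = strip_accents($line);
    return $plain_line =~ /\Q$plain_pattern\E/i ? 1 : 0;
}

binmode(STDOUT, ":utf8");

# Made-up sample lines standing in for the contents of the file.
my @lines = ("Ángeles", "Angeles", "ángeles", "angeles", "Madrid");
for my $line (@lines) {
    print $line, "\n" if accent_insensitive_match("Angeles", $line);
}
```

With the unaccented word "Angeles" this prints all four forms (Ángeles, Angeles, ángeles, angeles) and skips "Madrid". The price is that you can no longer ask for the accented form only.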
Best regards,
Nobumi Iyanaga
Tokyo,
Japan