Re: Enconding, locate, etc.

2006-04-19 Thread ende


Thanks, Nobumi,

Your solution is not only shorter but also more precise and correct  
than my first attempt.  But although it works better, it  
doesn't find words that differ in accents or capitalization.  That is,  
if you look for Ángeles it finds neither Angeles, angeles  
nor ángeles...


Any idea how to address this issue?

Could I convert each line to ASCII, search the ASCII version,  
but print the line in its original encoding?




Thanks!


On 19/04/2006, at 7:46, Nobumi Iyanaga wrote:


Dear Ende,

I think you have already received replies to your query.

I don't know if my 2 cents are helpful for you, but I will try.  I  
don't understand exactly what you are trying to do, but...:


On Apr 19, 2006, at 3:14 AM, ende wrote:


In Perl

#!/usr/bin/env perl
#
# telefonos.pl
#
# me 2006-04-07
#
#

binmode STDOUT, ':utf8';
use encoding 'utf8';

my $listin = '/Users/me/Documents/documentos/Familia/Casa/Telistin.txt';

my $alphax = '/Applications/Alpha/AlphaX.app';

# no arguments: just open the phone list in Alpha
if (!@ARGV) {
    exec "open -a $alphax $listin";
}

if (! -e $listin) {
    print "strange! file not found: $listin\n";
    exit 1;
}
open my $f, '<:encoding(MacRoman)', $listin or die "$listin no abre: $!";

my @todo = <$f>;
close $f;


# PROBLEM
my @args = map {utf8::decode($_)} @ARGV;
my $re = join('|', @ARGV);



print grep(/$re/i, @todo), "\n";

# also look in the Apple AddressBook
foreach my $a (@ARGV) {
    system("abtool $a");
}



I would do something like the following:

#!/usr/bin/perl

use utf8;
use Encode;

binmode (STDOUT, ":utf8");

my $re = join("|", @ARGV);
$re = decode ("utf8", $re);
my $listin = "/Users/me/Documents/documentos/Familia/Casa/Telistin.txt";


open my $f, "<:encoding(MacRoman)", $listin or die "$listin no abre: $!";

while (<$f>) {
    chomp;
    print $_, "\n" if /$re/i;
}
close $f;


You can save this script as Ende_test.pl, and call it in the  
following way:


perl Ende_test.pl Ángeles angeles

If your Telistin.txt has lines containing:
Ángeles
Angeles
ángeles
angeles

the above command line should print all these lines.  The trick  
was decoding the UTF-8 bytes coming from @ARGV into Perl characters, using


use Encode;
$re = decode ("utf8", $re);

And you should have:

use utf8;

at the beginning if the script itself contains UTF-8 text, for example  
accented characters written directly inside a regex.
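A minimal sketch of what the decoding step changes, assuming the Terminal passes arguments as UTF-8 bytes (the Mac OS X default) and a Perl of 5.12 or later for `use feature 'unicode_strings'`. The word "Ángeles" is built with `\x{...}` escapes so the sketch needs no `use utf8`. It also shows that `utf8::decode` converts its argument in place and returns a true/false flag, which is why `map {utf8::decode($_)} @ARGV` fills `@args` with flags rather than strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'unicode_strings';   # Unicode case rules for chars 128-255
use Encode qw(encode decode);

# Simulate one command-line argument: "Ángeles" as raw UTF-8 bytes,
# which is what @ARGV holds before any decoding (assumed UTF-8 Terminal).
my $raw = encode('UTF-8', "\x{C1}ngeles");

# Encode::decode returns a new character string.
my $chars = decode('UTF-8', $raw);

# utf8::decode converts its argument in place and returns only a flag.
my $copy = $raw;
my $ok   = utf8::decode($copy);

printf "bytes: %d, characters: %d\n", length $raw, length $chars;
print "utf8::decode returned: ", ($ok ? "true" : "false"), "\n";
print "in-place result equals decode(): ", ($copy eq $chars ? "yes" : "no"), "\n";

# A line read through :encoding(MacRoman) is already characters.
my $line = "\x{E1}ngeles";   # "ángeles"
print "raw bytes match: ",     ($line =~ /$raw/i   ? "yes" : "no"), "\n";
print "decoded chars match: ", ($line =~ /$chars/i ? "yes" : "no"), "\n";
```

The undecoded pattern has 8 "characters" (the two bytes of "Á" count separately), so `/i` cannot relate it to "á"; the decoded 7-character pattern can.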

I hope this is of some help for you.

All the best,

Nobumi

---

Nobumi Iyanaga
Tokyo,
Japan




 ende





Re: Terminal (spurious command line) problem

2006-04-19 Thread Alan Fry

Many thanks to everybody who responded.

On 18 Apr 2006, at 18:59, Bill Stephenson wrote:


Can anyone suggest where to look for such a file?


Look in the Terminal preferences to see if you have a script  
entered to run when opening a window.


The 'Open a saved .term file…' checkbox was unchecked and no '.term'  
files could be found anywhere either, so that didn't seem to be the  
cause.


Similarly Boysenberry Payne wrote:


Have you tried looking in ~/.profile?  Sorry if that's too obvious...


Nope -- there was nothing unusual in '.bash_profile', just some PATH  
statements.


Brian McKee wrote:


Start Terminal.app and check under preferences (apple-,)
If you don't see it there, quit Terminal,  backup and delete
~/Library/Preferences/com.apple.Terminal.plist
Betcha that gets it.


Yippee! That hit the nail right on the head.

In fact it is possible to drag 'com.apple.Terminal.plist' out of  
the folder onto the desktop temporarily, just to see what happens.  
Mac OS then immediately writes a new '.plist'; Terminal opens  
normally and seems to function normally. However, it raises some  
interesting questions.


Sherm Pendley wrote:

It's not a great idea to manipulate preference files directly.  
Their location, filename, format, etc. are considered an  
implementation detail that's subject to change without notice.  
Apple has already made at least two changes, from old-style plists  
to XML-based plists, and then from that to a binary file format.


Opened in BBEdit, the '.plist' file is indeed binary.  
However, Property List Editor shows its content to be an XML  
dictionary. Clicking the little triangle next to 'Root' reveals the  
'Property list'. In this particular case there was an 'Execution  
String' entry whose 'value' was exactly the spurious command line in  
question. Property List Editor allowed that entry to be selected  
and deleted. Having done that, and having put the file back into  
'~/Library/Preferences', everything seemed to return to 'normal', or at  
least to what it had been in the past.


Interestingly, the XML content of the newly written '.plist' and the  
original '.plist' had not one single word in common -- they were  
totally and radically different. But Terminal seems to behave pretty  
much the same whichever '.plist' is installed in  
~/Library/Preferences. I guess the original '.plist' dates back to OS 10.2.?  
and now it is OS 10.4.6 and, er, things have changed a bit…It  
certainly bears out what Sherm says above.


Chris Devers added:

[snip]

 -- preferences and caches are generally easy and safe to rename or  
remove when trying to diagnose software problems.


So, for the moment I have stayed with the original version of the  
'.plist', edited with Property List Editor and all seems to be well.


Again thank you all very much for your prompt help and advice which  
will no doubt prove very useful to many folk in a similar pickle.


Alan Fry










Re: Enconding, locate, etc.

2006-04-19 Thread Nobumi Iyanaga

Dear Ende,

On Apr 19, 2006, at 5:22 PM, ende wrote:



Thanks, Nobumi,

Your solution is not only shorter but also more precise and correct  
than my first attempt.  But although it works better, it  
doesn't find words that differ in accents or capitalization.  That  
is, if you look for Ángeles it finds neither Angeles, angeles  
nor ángeles...




Well, on my machine, if I call that script with:

perl Ende_test.pl Ángeles

it does find Ángeles AND ángeles (because it has the /i option  
in the regex).


But you seem to want a kind of accent-insensitive search...?  
That is not so simple.


One possible -- and rather simple -- solution would be to use  
Unicode::Normalize.  I just tried this script:


#!/usr/bin/perl

use utf8;
use Encode;
use Unicode::Normalize;

binmode (STDOUT, ":utf8");

my $re = join("|", @ARGV);
$re = decode ("utf8", $re);
my $listin = "/Users/me/Documents/documentos/Familia/Casa/Telistin.txt";

open my $f, "<:encoding(MacRoman)", $listin or die "$listin no abre: $!";

while (<$f>) {
    chomp;
    if (/$re/i) {
        print $_, "\n";
    }
    else {
        # strip the accents from the query (NFD splits "Á" into "A" plus
        # a combining accent, which the s/// then deletes) and retry
        my $temp = NFD($re);
        $temp =~ s/[\x{0300}-\x{036F}\x{0081}]+//g;
        print $_, "\n" if /$temp/i;
    }
}
close $f;

I can call this script from Terminal like this:

perl Ende_test.pl Ángeles

or

perl Ende_test.pl ángeles

and get the reply:

Ángeles
Angeles
ángeles
angeles

-- But you have to use the accented character to match non-accented  
characters -- that is, you will find only 


Angeles
angeles

if you invoke the script with:

perl Ende_test.pl Angeles

or

perl Ende_test.pl angeles
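A small sketch of why the unaccented query misses the accented lines: the script above strips accents from the query only, never from the lines, so "angeles" has nothing to strip and still cannot match "Ángeles". `strip_marks` is a hypothetical helper name, the four sample lines are written with `\x{...}` escapes, and `unicode_strings` assumes Perl 5.12 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'unicode_strings';   # Unicode case rules (Perl 5.12+)
use Unicode::Normalize qw(NFD);

# Hypothetical helper: NFD splits "Á" into "A" + U+0301 (combining
# acute); deleting U+0300..U+036F then leaves the bare letter "A".
sub strip_marks {
    (my $d = NFD($_[0])) =~ s/[\x{0300}-\x{036F}]+//g;
    return $d;
}

# The sample lines: Ángeles, Angeles, ángeles, angeles.
my @lines = ("\x{C1}ngeles", "Angeles", "\x{E1}ngeles", "angeles");

# As in the script: try the query as typed, then the query with its
# accents stripped. The lines themselves are never stripped.
my %found;
for my $query ("\x{C1}ngeles", "angeles") {
    my $folded = strip_marks($query);
    $found{$query} = [ grep { /$query/i || /$folded/i } @lines ];
    printf "%d of %d lines found\n", scalar @{ $found{$query} }, scalar @lines;
}
```

Running it prints "4 of 4 lines found" for the accented query and "2 of 4 lines found" for the unaccented one, the behaviour described above.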

Best regards,

Nobumi Iyanaga
Tokyo,
Japan



Re: Enconding, locate, etc.

2006-04-19 Thread ende


Wow!  It is near the full solution!  It is a pity it fails when you  
do not use accented chars!!


Thank you... I'll try it again... Thanks!!!





On 19/04/2006, at 11:31, Nobumi Iyanaga wrote:


I can call this script from Terminal like this:

perl Ende_test.pl Ángeles

or

perl Ende_test.pl ángeles

and get the reply:

Ángeles
Angeles
ángeles
angeles

-- But you have to use the accented character to match non-accented  
characters -- that is, you will find only

Angeles
angeles

if you invoke the script with:

perl Ende_test.pl Angeles

or

perl Ende_test.pl angeles



 ende





Re: Enconding, locate, etc.

2006-04-19 Thread ende


Awesome!!  You did it!  Although I don't fully understand the regexps...

The code only needs very small fixes:

1) Exit when there are no args
2) Try not to treat ñ as n  (but this is a quirk of how Unicode  
decomposes ñ, :)
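One possible way to keep ñ apart from n, sketched under assumptions: NFD turns "ñ" into "n" plus U+0303 (combining tilde), so the tilde can be kept whenever it directly follows n or N while every other combining mark is removed, and NFC then recomposes the pair into "ñ". `fold_keep_enye` is a hypothetical name, and a tilde separated from its n by another combining mark would still be dropped.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD NFC);

# Hypothetical helper: remove accents but keep ñ/Ñ distinct from n/N.
sub fold_keep_enye {
    my ($s) = @_;
    my $d = NFD($s);
    # drop a tilde that is NOT right after n/N, and every other
    # combining mark in U+0300..U+036F
    $d =~ s/(?<![nN])\x{0303}|[\x{0300}-\x{0302}\x{0304}-\x{036F}]//g;
    # recompose "n" + U+0303 back into "ñ"
    return NFC($d);
}

binmode STDOUT, ':encoding(UTF-8)';
# Ángeles, Muñoz, Ñandú
for my $word ("\x{C1}ngeles", "Mu\x{F1}oz", "\x{D1}and\x{FA}") {
    print fold_keep_enye($word), "\n";
}
```

Applying such a helper to both the line and the query would give accent-insensitive matching in which "Muñoz" no longer matches "munoz".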



Really impressed and very grateful to you!


This will be part of my AddressBook terminal script and is now a model  
to learn from.  Thanks.





On 19/04/2006, at 15:57, Nobumi Iyanaga wrote:


#!/usr/bin/perl

use utf8;
use Encode;
use Unicode::Normalize;

binmode (STDOUT, ":utf8");

my $re = join("|", @ARGV);
$re = decode ("utf8", $re);
my $listin = "/Users/me/Documents/documentos/Familia/Casa/Telistin.txt";

open my $f, "<:encoding(MacRoman)", $listin or die "$listin no abre: $!";

while (<$f>) {
    chomp;
    if (/$re/i) {
        # the line matches the query as typed
        print $_, "\n";
    }
    else {
        # strip the accents from the line (NFD splits "Á" into "A" plus
        # a combining accent, which the s/// then deletes) and retry
        my $temp = NFD($_);
        $temp =~ s/[\x{0300}-\x{036F}\x{0081}]+//g;
        print $_, "\n" if $temp =~ /$re/i;
    }
    if ($re !~ /^[\x{0000}-\x{007F}]+$/) {
        # the query has non-ASCII characters: strip its accents and retry
        my $temp = NFD($re);
        $temp =~ s/[\x{0300}-\x{036F}\x{0081}]+//g;
        print $_, "\n" if /$temp/i;
    }
}
close $f;



 ende





Re: Enconding, locate, etc.

2006-04-19 Thread Nobumi Iyanaga

Hello Ende,

On Apr 19, 2006, at 8:58 PM, ende wrote:



Wow!  It is near the full solution!  It is a pity it fails when you  
do not use accented chars!!


I tried again, and with the following script, it *seems* that you can  
use either non-accented or accented characters:


#!/usr/bin/perl

use utf8;
use Encode;
use Unicode::Normalize;

binmode (STDOUT, ":utf8");

my $re = join("|", @ARGV);
$re = decode ("utf8", $re);
my $listin = "/Users/me/Documents/documentos/Familia/Casa/Telistin.txt";

open my $f, "<:encoding(MacRoman)", $listin or die "$listin no abre: $!";

while (<$f>) {
    chomp;
    if (/$re/i) {
        # the line matches the query as typed
        print $_, "\n";
    }
    else {
        # strip the accents from the line (NFD splits "Á" into "A" plus
        # a combining accent, which the s/// then deletes) and retry
        my $temp = NFD($_);
        $temp =~ s/[\x{0300}-\x{036F}\x{0081}]+//g;
        print $_, "\n" if $temp =~ /$re/i;
    }
    if ($re !~ /^[\x{0000}-\x{007F}]+$/) {
        # the query has non-ASCII characters: strip its accents and retry
        my $temp = NFD($re);
        $temp =~ s/[\x{0300}-\x{036F}\x{0081}]+//g;
        print $_, "\n" if /$temp/i;
    }
}
close $f;

You would call this script either:

perl Ende_test.pl angeles

or

perl Ende_test.pl ángeles

to get:

Ángeles
Angeles
ángeles
angeles

Is this what you would want...?

Note that I am not sure at all if this will work for all cases.
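A possible simplification, offered only as a sketch: fold the line and the query the same way and test once. This handles accented and unaccented queries symmetrically, and each line is printed at most once (in the loop above a line can be printed twice when both `if` blocks match). `strip_marks` and `accent_insensitive_grep` are hypothetical names, the sample list stands in for Telistin.txt, and `unicode_strings` assumes Perl 5.12 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'unicode_strings';
use Unicode::Normalize qw(NFD);

# Hypothetical helper: NFD-decompose, then drop combining marks.
sub strip_marks {
    (my $d = NFD($_[0])) =~ s/[\x{0300}-\x{036F}]+//g;
    return $d;
}

# Fold the pattern and every line identically and test each line once,
# so "angeles" and "Ángeles" behave the same and no line repeats.
sub accent_insensitive_grep {
    my ($pattern, @lines) = @_;
    my $re = strip_marks($pattern);
    return grep { strip_marks($_) =~ /$re/i } @lines;
}

binmode STDOUT, ':encoding(UTF-8)';
# Ángeles, Angeles, ángeles, angeles, Muñoz
my @sample = ("\x{C1}ngeles", "Angeles", "\x{E1}ngeles", "angeles", "Mu\x{F1}oz");
print "$_\n" for accent_insensitive_grep('angeles', @sample);
```

In the real script the pattern would be `decode("utf8", join("|", @ARGV))`, preceded by an early `exit` when `@ARGV` is empty, and the lines would come from the MacRoman file handle. Like the scripts above, this version still folds ñ into n.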

Best regards,

Nobumi Iyanaga
Tokyo,
Japan