Re: Enconding, locate, etc.
Thanks Nobumi,

Your solution is not only shorter but also more precise and correct than my first attempt. However, although it works better, it doesn't find words that differ in accents or capitalization. That is, if you search for Ángeles it finds neither Angeles nor angeles nor ángeles...

Any idea how to address this? Could I convert each line to ASCII and search the ASCII version, while still printing the line in its original encoding?

Thanks!

On 19/04/2006, at 7:46, Nobumi Iyanaga wrote:

> Dear Ende,
>
> I think you have already received replies to your query. I don't know if my 2 cents are helpful for you, but I will try. I don't understand exactly what you are trying to do, but...:
>
> On Apr 19, 2006, at 3:14 AM, ende wrote:
>
>> In Perl
>>
>>     #!/usr/bin/env perl
>>     #
>>     # telefonos.pl
>>     #
>>     # me 2006-04-07
>>     #
>>     #
>>     binmode STDOUT, ":utf8";
>>     use encoding 'utf8';
>>     my $listin = "/Users/me/Documents/documentos/Familia/Casa/Telistin.txt";
>>     my $alphax = "/Applications/Alpha/AlphaX.app";
>>     if ([EMAIL PROTECTED]) { exec "open -a $alphax $listin"; }
>>     if (! -e $listin) {
>>         print "strange! file not found: $listin \n";
>>         exit 1;
>>     }
>>     open my $f, "<:encoding(MacRoman)", $listin or die "$listin no abre: $!";
>>     my @todo = <$f>;
>>     close $f;
>>     # PROBLEM
>>     my @args = map {utf8::decode($_)} @ARGV;
>>     my $re = join("|", @ARGV);
>>     print grep(/$re/i, @todo), "\n";
>>     # also look in the Apple AddressBook
>>     foreach my $a (@ARGV) { system("abtool $a"); }
>
> I would do something like the following:
>
>     #!/usr/bin/perl
>     use utf8;
>     use Encode;
>     binmode (STDOUT, ":utf8");
>     my $re = join("|", @ARGV);
>     $re = decode ("utf8", $re);
>     my $listin = "/Users/me/Documents/documentos/Familia/Casa/Telistin.txt";
>     open my $f, "<:encoding(MacRoman)", $listin or die "$listin no abre: $!";
>     while (<$f>) {
>         chomp;
>         print $_, "\n" if /$re/i;
>     }
>     close $f;
>
> You can save this script as Ende_test.pl, and call it in the following way:
>
>     perl Ende_test.pl Ángeles angeles
>
> If your Telistin.txt has lines containing:
>
>     Ángeles
>     Angeles
>     ángeles
>     angeles
>
> the above command line should print all these lines.
>
> The trick is the conversion of @ARGV to utf8, using
>
>     use Encode;
>     $re = decode ("utf8", $re);
>
> And you must have:
>
>     use utf8;
>
> at the beginning to do a regex search with a utf8 string.
>
> I hope this is of some help for you.
>
> All the best,
>
> Nobumi
>
> ---
> Nobumi Iyanaga
> Tokyo, Japan

ende
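To illustrate why decoding the arguments matters, here is a minimal sketch. It assumes the terminal passes arguments as UTF-8 bytes; `decode_arg` is a hypothetical helper name, and the byte string stands in for what @ARGV would contain. It uses the strict 'UTF-8' encoding name rather than the lax "utf8" used in the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn the raw UTF-8 bytes of one argument into characters.
sub decode_arg {
    my ($bytes) = @_;
    return decode('UTF-8', $bytes);
}

# "\xC3\x81ngeles" is the UTF-8 byte sequence for "Ángeles",
# standing in for an element of @ARGV.
my $raw   = "\xC3\x81ngeles";
my $text  = decode_arg($raw);
my $lower = decode_arg("\xC3\xA1ngeles");    # "ángeles"

binmode(STDOUT, ':encoding(UTF-8)');
printf "%d bytes before decoding, %d characters after\n",
    length($raw), length($text);

# Only after decoding does /i treat Á and á as the same letter.
print "decoded strings match case-insensitively\n" if $lower =~ /$text/i;
```

Without the decode step the regex engine sees two separate bytes where the accented letter should be, so case folding cannot pair Á with á.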
Re: Terminal (spurious command line) problem
Many thanks to everybody who responded.

On 18 Apr 2006, at 18:59, Bill Stephenson wrote:

>> Can anyone suggest where to look for such a file?
>
> Look in the Terminal preferences to see if you have a script entered to run when opening a window.

The 'Open a saved .term file…' checkbox was unselected and no '.term' files could be found anywhere either, so that didn't seem to be the cause.

Similarly Boysenberry Payne wrote:

> Have you tried looking in ~/.profile? Sorry if that's too obvious...

Nope -- there was nothing unusual in '.bash_profile', just some PATH statements.

Brian McKee wrote:

> Start Terminal.app and check under preferences (apple-,)
> If you don't see it there, quit Terminal, backup and delete ~/Library/Preferences/com.apple.Terminal.plist
> Betcha that gets it.

Yippee! That hit the nail right on the head. In fact it is possible to mouse the 'com.apple.Terminal.plist' out of the folder onto the desktop temporarily just to see what happens. MacOS then immediately writes a new '.plist'; Terminal opens normally and seems to function normally.

However it raises some interesting questions. Sherm Pendley wrote:

> It's not a great idea to manipulate preference files directly. Their location, filename, format, etc. are considered an implementation detail that's subject to change without notice. Apple has already made at least two changes, from old-style plists to XML-based plists, and then from that to a binary file format.

Looking in the '.plist' file with BBEdit, it is indeed a binary file. However Property List Editor shows its content to be an XML dictionary. Clicking the little triangle next to 'Root' reveals the 'Property list'. In this particular case there was an 'Execution String' with a 'value' which was exactly the spurious command line in question. Property List Editor allowed that entry to be selected and deleted.

Having done that, and having put the file back into '~/Library/Preferences', everything seemed to return to 'normal', or at least, what it had been in the past.

Interestingly, the XML content of the newly written '.plist' and the original '.plist' had not one single word in common -- they were totally and radically different. But Terminal seems to behave pretty much the same whichever '.plist' is installed in ~/Library/Preferences. I guess the original '.plist' dates back to OS 10.2.? and now it is OS 10.4.6 and, er, things have changed a bit… It certainly bears out what Sherm says above.

Chris Devers added:

> [snip] -- preferences and caches are generally easy and safe to rename or remove when trying to diagnose software problems.

So, for the moment I have stayed with the original version of the '.plist', edited with Property List Editor, and all seems to be well.

Again thank you all very much for your prompt help and advice, which will no doubt prove very useful to many folk in a similar pickle.

Alan Fry
Re: Enconding, locate, etc.
Dear Ende,

On Apr 19, 2006, at 5:22 PM, ende wrote:

> Thanks Nobumi,
>
> Your solution is not only shorter but also more precise and correct than my first attempt. However, although it works better, it doesn't find words that differ in accents or capitalization. That is, if you search for Ángeles it finds neither Angeles nor angeles nor ángeles...

Well, on my machine, if I call that script with:

    perl Ende_test.pl Ángeles

it does find

    Ángeles

AND

    ángeles

(because it has the "i" option in the regex). But you seem to want to do a kind of accent-insensitive search...? That is not so simple. One possible -- and rather simple -- solution would be to use Unicode::Normalize. I just tried this script:

    #!/usr/bin/perl
    use utf8;
    use Encode;
    use Unicode::Normalize;
    binmode (STDOUT, ":utf8");
    my $re = join("|", @ARGV);
    $re = decode ("utf8", $re);
    my $listin = "/Users/me/Documents/documentos/Familia/Casa/Telistin.txt";
    open my $f, "<:encoding(MacRoman)", $listin or die "$listin no abre: $!";
    while (<$f>) {
        chomp;
        if (/$re/i) {
            print $_, "\n";
        }
        else {
            # decompose the pattern and drop the combining accents
            my $temp = NFD($re);
            $temp =~ s/[\x{0300}-\x{036F}\x{0081}]+//g;
            print $_, "\n" if /$temp/i;
        }
    }
    close $f;

I can call this script from Terminal like this:

    perl Ende_test.pl Ángeles

or

    perl Ende_test.pl ángeles

and get the reply:

    Ángeles
    Angeles
    ángeles
    angeles

-- But you have to use the accented character to match non-accented characters -- that is, you will find only

    Angeles
    angeles

if you invoke the script with:

    perl Ende_test.pl Angeles

or

    perl Ende_test.pl angeles

Best regards,

Nobumi Iyanaga
Tokyo, Japan
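For readers unfamiliar with Unicode::Normalize, the following small sketch shows what the NFD step does on its own. `strip_accents` is a hypothetical helper name, and the sample words are simply the ones used in this thread. It strips only the Combining Diacritical Marks block (U+0300 to U+036F).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: decompose each accented letter into base letter plus
# combining mark, then delete the combining marks (U+0300..U+036F).
sub strip_accents {
    my ($text) = @_;
    my $decomposed = NFD($text);
    $decomposed =~ s/[\x{0300}-\x{036F}]+//g;
    return $decomposed;
}

binmode(STDOUT, ':encoding(UTF-8)');
for my $word ('Ángeles', 'ángeles', 'Angeles') {
    printf "%s -> %s (%d code points after NFD)\n",
        $word, strip_accents($word), length NFD($word);
}
```

NFD turns the single code point Á into A followed by a combining acute accent, which is why removing that range leaves the plain base letter behind.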
Re: Enconding, locate, etc.
Wow! It is nearly the full solution! It is a pity it fails when you do not use accented chars!!

Thank you... I'll try it again...

Thanks!!!

On 19/04/2006, at 11:31, Nobumi Iyanaga wrote:

> I can call this script from Terminal like this:
>
>     perl Ende_test.pl Ángeles
>
> or
>
>     perl Ende_test.pl ángeles
>
> and get the reply:
>
>     Ángeles
>     Angeles
>     ángeles
>     angeles
>
> -- But you have to use the accented character to match non-accented characters -- that is, you will find only
>
>     Angeles
>     angeles
>
> if you invoke the script with:
>
>     perl Ende_test.pl Angeles
>
> or
>
>     perl Ende_test.pl angeles

ende
Re: Enconding, locate, etc.
Awesome!! You did it! Although I don't understand the regexps well...

The code only needs very small fixes:

1) To exit when there are no args
2) To try not to identify ñ with n (but this is a bug in the collating in utf..., :)

Really impressed and very grateful to you! This will be part of my AddressBook terminal script and now a model to learn from.

Thanks.

On 19/04/2006, at 15:57, Nobumi Iyanaga wrote:

>     #!/usr/bin/perl
>     use utf8;
>     use Encode;
>     use Unicode::Normalize;
>     binmode (STDOUT, ":utf8");
>     my $re = join("|", @ARGV);
>     $re = decode ("utf8", $re);
>     my $listin = "/Users/me/Documents/documentos/Familia/Casa/Telistin.txt";
>     open my $f, "<:encoding(MacRoman)", $listin or die "$listin no abre: $!";
>     while (<$f>) {
>         chomp;
>         if (/$re/i) {
>             print $_, "\n";
>         }
>         else {
>             my $temp = NFD($_);
>             $temp =~ s/[\x{0300}-\x{036F}\x{0081}]+//g;
>             print $_, "\n" if $temp =~ /$re/i;
>         }
>         if ($re !~ /^[\x{0000}-\x{007F}]+$/) {
>             my $temp = NFD($re);
>             $temp =~ s/[\x{0300}-\x{036F}\x{0081}]+//g;
>             print $_, "\n" if /$temp/i;
>         }
>     }
>     close $f;

ende
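The two small fixes mentioned above could be sketched roughly as follows. This is only one possible approach, not something posted in the thread: `fold_keep_enye` and `build_pattern` are hypothetical names, the sample lines are made up, and unlike the quoted script the search terms are treated as literal text rather than as regular expressions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFD NFC);

# Hypothetical helper: strip accents but keep ñ/Ñ distinct from n/N.
sub fold_keep_enye {
    my ($text) = @_;
    my $d = NFD($text);
    # Delete every combining mark, except a combining tilde (U+0303)
    # that directly follows n or N.
    $d =~ s/(?<![nN])\x{0303}|[\x{0300}-\x{0302}\x{0304}-\x{036F}]//g;
    return NFC($d);    # recompose n + tilde back into ñ
}

# Hypothetical helper: returns undef when no search terms were given,
# so the caller can print a usage message and exit.
sub build_pattern {
    my @terms = @_;
    return unless @terms;
    return join '|', map { quotemeta fold_keep_enye($_) } @terms;
}

binmode(STDOUT, ':encoding(UTF-8)');

# In a real script the terms would be the decoded @ARGV, for example:
#   my $pattern = build_pattern(@decoded_args)
#       or die "usage: $0 name [name ...]\n";
my $pattern = build_pattern('Angeles', 'Muñoz');

# Made-up sample lines, for illustration only.
for my $line ('Ángeles', 'Muñoz', 'Munoz') {
    print "$line\n" if fold_keep_enye($line) =~ /$pattern/i;
}
```

Here searching for Muñoz should not report Munoz, while Angeles still finds Ángeles.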
Re: Enconding, locate, etc.
Hello Ende,

On Apr 19, 2006, at 8:58 PM, ende wrote:

> Wow! It is nearly the full solution! It is a pity it fails when you do not use accented chars!!

I tried again, and with the following script, it *seems* that you can use either non-accented or accented characters:

    #!/usr/bin/perl
    use utf8;
    use Encode;
    use Unicode::Normalize;
    binmode (STDOUT, ":utf8");
    my $re = join("|", @ARGV);
    $re = decode ("utf8", $re);
    my $listin = "/Users/me/Documents/documentos/Familia/Casa/Telistin.txt";
    open my $f, "<:encoding(MacRoman)", $listin or die "$listin no abre: $!";
    while (<$f>) {
        chomp;
        if (/$re/i) {
            # the line matches the pattern as typed
            print $_, "\n";
        }
        else {
            # strip the accents from the line, then try the pattern as typed
            my $temp = NFD($_);
            $temp =~ s/[\x{0300}-\x{036F}\x{0081}]+//g;
            print $_, "\n" if $temp =~ /$re/i;
        }
        if ($re !~ /^[\x{0000}-\x{007F}]+$/) {
            # the pattern is not pure ASCII: strip its accents and try again
            my $temp = NFD($re);
            $temp =~ s/[\x{0300}-\x{036F}\x{0081}]+//g;
            print $_, "\n" if /$temp/i;
        }
    }
    close $f;

You would call this script either:

    perl Ende_test.pl angeles

or

    perl Ende_test.pl ángeles

to get:

    Ángeles
    Angeles
    ángeles
    angeles

Is this what you would want...? Note that I am not sure at all if this will work for all cases.

Best regards,

Nobumi Iyanaga
Tokyo, Japan
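One case the three-branch loop may not handle well is a line such as "Ángeles Angeles" searched with an accented pattern: it matches both the pattern as typed and its stripped form, so it would be printed twice. A possible variation, sketched here under assumptions and not taken from the thread, is to fold both the search terms and each line the same way and test once per line. `fold` and `matching_lines` are hypothetical names, and the terms are treated as literal text rather than as regular expressions.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: accent-free, lower-case form used only for comparing.
sub fold {
    my ($text) = @_;
    my $d = NFD($text);
    $d =~ s/[\x{0300}-\x{036F}]+//g;
    return lc $d;
}

# Hypothetical helper: return every line that contains any of the terms,
# ignoring accents and case. Each line is returned at most once, unmodified.
sub matching_lines {
    my ($terms, $lines) = @_;
    my $pattern = join '|', map { quotemeta fold($_) } @$terms;
    return () unless length $pattern;
    my $regex = qr/$pattern/;
    return grep { fold($_) =~ $regex } @$lines;
}

binmode(STDOUT, ':encoding(UTF-8)');
my @lines = ('Ángeles', 'Angeles', 'ángeles', 'angeles', 'Ángeles Angeles');
print "$_\n" for matching_lines(['angeles'], \@lines);
```

Because only the folded copy is compared, the lines are still printed with their original accents, which is close to the "search in ASCII, print in the original encoding" idea raised earlier in the thread.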