Hello Alejandro,

Does something like this work for you?

use encoding 'latin1';
$text =~ s/[^\w\s\n\.\¿\?\¡\!\:\/\\\<\>]//ig;
$text =~ s/[\ä\ë\ï\ö\ÿ\ý\ð]//ig;
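
A whitelist can also be spelled out by character code, which avoids depending on how \w treats accented letters. This is only a sketch under assumptions: the sample string, the $keep variable, and the particular set of Latin-1 codes kept (inverted marks, accented vowels, n-tilde, u-umlaut) are illustrative and not taken from your data.

```perl
use strict;
use warnings;

# Hypothetical Latin-1 sample: \277 is the inverted question mark,
# \351 is e-acute, \344 is a-umlaut, \005 is a stray ^E control byte.
my $text = "\277Qu\351 tal?\005 n\344chste\n";

# Assumed keep-list: printable ASCII, tab, newline, plus selected
# Latin-1 codes (inverted ! and ?, accented vowels, n-tilde, u-umlaut).
my $keep = '\x20-\x7e\t\n\xa1\xbf\xc1\xc9\xcd\xd1\xd3\xda\xdc'
         . '\xe1\xe9\xed\xf1\xf3\xfa\xfc';

# Delete every character that is NOT in the keep-list.
$text =~ s/[^$keep]//g;

print $text;
```

Anything not listed in $keep is dropped, so the list has to be extended if other characters should survive.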

Please let me know.

HTH

Paco Zarabozo


-----------------------------------------------------------------
From: Alejandro Santillan Iturres
Sent: Saturday, October 04, 2008 7:22 PM
To: Brian Raven ; [EMAIL PROTECTED]
Cc: [email protected]
Subject: Re: regexp to "clean" a text file


I've tried this wonderful command:
hexdump -c file.txt
and I found that I have to include more and more chars to erase as
follows:

$text=~s/\177//g;
$text=~s/\377//g;
$text=~s/\335//g;
$text=~s/\360//g;
$text=~s/\204//g;
$text=~s/\222/\n/g;
$text=~s/\214//g;
$text=~s/\216//g;
$text=~s/\224//g;
$text=~s/\240//g;
$text=~s/\237//g;
$text=~s/\234//g;
$text=~s/\325//g;
$text=~s/\351//g;
$text=~s/\352//g;
$text=~s/\355//g;
$text=~s/\361//g;
$text=~s/\362//g;
$text=~s/\366//g;

Is there a way to erase all the chars that are higher than, say, 300?
Does this make sense?
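
For illustration, a whole range of codes can be deleted in one pass instead of one substitution per character. The sketch below is hedged: it assumes the text is handled as raw bytes, it starts the range at octal 177 rather than 300 because several codes in the list above (\177, \204, \214, \222 and so on) are below 300, and the sample string and $clean variable are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical sample of raw bytes, not real data from the memo file.
my $text = "caf\351 \222quoted\224\177 ok\n";

# Convert \222 to a newline first, as in the list above, because the
# range deletion below would otherwise remove it.
$text =~ s/\222/\n/g;

# Delete DEL (octal 177) and every byte from octal 200 through 377.
# tr with the /d flag removes each character found in the range.
(my $clean = $text) =~ tr/\177-\377//d;

print $clean;
```

The same range also works in a substitution, written as s/[\177-\377]//g.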

Thank you again!

On Oct 3, 2008, at 7:33 AM, Brian Raven wrote:

> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of
> Alejandro Santillan Iturres
> Sent: 02 October 2008 19:45
> To: [email protected]
> Cc: [EMAIL PROTECTED]
> Subject: Re: regexp to "clean" a text file
>
>> Thank you William, Bill and Tim. Finally s/[\x00-\x1f]//g did the
>> trick, almost perfect.
>> The original file is the Palm database of memo pads. The text is
>> there, plain. Several mixed control characters were present.
>> The system I am working on is a Fedora Linux box. I have no hex utility
>> installed to make the dump, so I don't know if the ^E is really a ^E.
>
> I find that a little hard to believe. Try 'hexdump', or if that isn't
> present you should at least have 'od'. If neither of them is installed,
> your Linux installation sounds a bit broken. Unless you can identify
> which characters are to be kept or discarded, you will find it difficult
> to 'clean' your data effectively.
>
> HTH
>
> -- 
> Brian Raven
>
> _______________________________________________
> ActivePerl mailing list
> [email protected]
> To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
