Re: dealing with funky characters

John W. Krahn Fri, 03 Jan 2003 16:05:35 -0800

Willy wrote:
> 
> What I would like to do is the following:
> 
> open a file of undetermined format,
> take all non-alphanumeric characters (other than spaces, tabs, \n, etc.)
> and parse the output around them...


What about punctuation characters? They are not alphanumeric either, so
should they be treated as separators too?
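The answer changes the character class you use. A minimal sketch with made-up sample text, assuming "alphanumeric" means the POSIX [:alnum:] class: the first pattern counts punctuation as a separator, the second leaves it alone.

```perl
use strict;
use warnings;

my $text = "foo, bar\x01baz!";    # made-up sample with one control character

# Everything that is neither alphanumeric nor whitespace, punctuation included.
my @with_punct = $text =~ /([^[:alnum:]\s])/g;

# Also leave punctuation alone, so only the remaining "funky" characters match.
my @without_punct = $text =~ /([^[:alnum:][:punct:]\s])/g;

print "including punctuation: ", scalar(@with_punct), "\n";
print "excluding punctuation: ", scalar(@without_punct), "\n";
```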


> open (IN,"file.unknown");

You should _always_ verify that the file was opened.
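A minimal sketch of the usual idiom, a three-argument open with a lexical filehandle that dies with the reason in $! on failure. The helper name open_for_reading is hypothetical, just for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: open a file for reading or die with the reason ($!).
sub open_for_reading {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open '$path': $!\n";
    return $fh;
}
```

Usage would then be something like my $in = open_for_reading('file.unknown'); followed by while (<$in>) { ... }.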


> while (<IN>)
>         {
>            s/insert regular expression here/\n\n/;
>            push(@array,$_); # or just shunt it out to another file :P
>         }
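Rather than substituting newlines and pushing whole lines, split() can do the work directly. A minimal sketch, assuming a separator is any run of characters that is neither alphanumeric nor whitespace; the in-memory sample data is hypothetical and stands in for file.unknown.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for file.unknown; \x01 and \x02
# play the part of the "funky" characters.
my $data = "first part\x01second part\nthird\x02fourth, fifth\n";

# Opening a reference to a scalar gives an in-memory filehandle (Perl 5.8+).
open(my $in, '<', \$data) or die "Cannot open in-memory data: $!";

my @array;
while (my $line = <$in>) {
    chomp $line;
    # Split on runs of characters that are neither alphanumeric nor
    # whitespace, trimming whitespace around each run. This class also
    # matches punctuation, so the comma acts as a separator here.
    push @array, grep { length } split /\s*[^[:alnum:]\s]+\s*/, $line;
}
close $in;

print "$_\n" for @array;
```

To shunt the pieces out to another file instead, print them to an output filehandle opened with '>' rather than pushing them onto @array.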
> 
> I'd love suggestions on this :)  Also, if memory serves, each time
> the regular expression is matched, $1..$n get the match values (or
> am I thinking of something else?)

Only if the parts you want are enclosed in capturing parentheses: $1 gets
the text matched by the first pair, $2 the second, and so on.
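A small sketch, using made-up sample text, that captures each separator so it can be inspected or replaced later:

```perl
use strict;
use warnings;

my $text = "alpha\x01beta\x02gamma";    # made-up sample text

# The parentheses make this a capturing group, so $1 is set on each match.
while ($text =~ /([^[:alnum:]\s])/g) {
    printf "separator 0x%02X at offset %d\n", ord($1), pos($text) - 1;
}

# In list context, //g returns the contents of the group for every match.
my @separators = $text =~ /([^[:alnum:]\s])/g;
```

Those captured characters are what you would later look up or map to something else, such as HTML tags.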


> I would like to be able to manipulate them at some other point
> in the program, maybe find the use of various unknown tags, or
> substitute them with HTML tags that would make the document more
> legible (not very worried about that right now, though).

There are various modules available on CPAN that allow you to manipulate
HTML documents.

http://search.cpan.org/



John
-- 
use Perl;
program
fulfillment
