Willy wrote:
>
> what i would like to do is the following:::
>
> open a file of undetermined format,
> take all non alphanumeric characters (other than spaces, tabs, \n etc)
> and parse the output around them...
What about punctuation characters?
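If punctuation should count as a delimiter too, one possible starting point is a negated character class. This is only a sketch: the helper name split_on_symbols and the sample line in the usage note are made up for illustration, and the real pattern depends on what your file actually contains.

```perl
use strict;
use warnings;

# split_on_symbols is a hypothetical helper, not something from the
# original post. It splits a line on runs of characters that are
# neither alphanumeric nor whitespace, so punctuation such as
# '=', ';' and ':' acts as a delimiter.
sub split_on_symbols {
    my ($line) = @_;
    return split /[^[:alnum:]\s]+/, $line;
}
```

For example, split_on_symbols("name=Willy;\tlang:perl\n") yields four fields. Whitespace is left attached to the fields, so you may want to trim them afterwards.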
> open (IN,"file.unknown");
You should _always_ verify that the file was opened.
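A minimal sketch of what that check might look like, using the three-argument form of open and a lexical filehandle. The helper name read_lines is an assumption for the example; the point is the "or die" with $! in the message.

```perl
use strict;
use warnings;

# read_lines is a hypothetical helper name used only for this example.
sub read_lines {
    my ($path) = @_;

    # Three-argument open with a lexical handle. If the open fails,
    # $! holds the system error message explaining why.
    open my $in, '<', $path
        or die "Cannot open '$path': $!";

    my @lines = <$in>;
    close $in;
    return @lines;
}
```

Calling read_lines("file.unknown") then either returns the lines or stops the program with a message that says what went wrong.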
> while (<IN>)
> {
> s/insert regular expression here/\n\n;
> push(@array,$_); # or just shunt it out to another file :P
> }
>
> i'd love suggestions on this :) also, if memory serves, each time
> the regular expression is matched, then $1..$n gets the match value (or
> am i thinking of something else?
Only if that part of the pattern is enclosed in capturing parentheses.
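A short illustration of how the parentheses relate to $1 and $2. The helper capture_pair and the "key=value" input are invented for the example and are not taken from your file.

```perl
use strict;
use warnings;

# capture_pair is a hypothetical helper for illustration only.
sub capture_pair {
    my ($text) = @_;

    # Two pairs of parentheses, so a successful match sets $1 and $2.
    # Only use them after checking that the match succeeded.
    if ($text =~ /(\w+)=(\w+)/) {
        return ($1, $2);
    }
    return;    # no match: empty list
}
```

capture_pair("key=value") returns the two captured words, while a string with no "=" in it returns an empty list.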
> i would like to be able to manipulate them at some other point
> in the program- maybe find the use of various unknown tags, or
> substitute them with html tags that would make the document more
> legible..... (not very worried about that right now though)
There are various modules available on CPAN that allow you to manipulate
HTML documents.
http://search.cpan.org/
John
--
use Perl;
program
fulfillment