Re: Regex help - trim doc contents

2002-02-04 Thread Jean-Baptiste Nivoit

* Stephen Patterson ([EMAIL PROTECTED]) wrote:
> I have a scalar, $doc, which contains a plain ascii file, and which
> I'll be storing on a database (for searching).
> 
(...)
> 
> Can anyone help?

There's a module called Whitespace on CPAN; would that work for you?
Additionally, there's String::Strip.

  jb.
  
___
Perl-Win32-Users mailing list
[EMAIL PROTECTED]
To unsubscribe: http://listserv.ActiveState.com/mailman/listinfo/perl-win32-users



Re: Regex help - trim doc contents

2002-01-30 Thread $Bill Luebkert

Joseph P. Discenza wrote:

> Stephen Patterson wrote, on
> : To minimise the space needed to store this file, I'd like to remove
> : all non-word characters and punctuation (except whitespace), and
> : replace multiple whitespaces with single whitespaces.
> : 
> : For efficiency, I'd like to do it on 1 line if possible.
> 
> How about two lines? Use tr///.
> 
> $_ =~ tr/\s/ /s;  # replace all whitespace with a space, then squash duplicate spaces
> $_ =~ tr/\w //cd; # delete everything that's NOT a word or a space
> 
> That should do ya, but I haven't tested it.


Perlfunc man page under tr:

 Note that "tr" does not do regular expression character classes such
 as "\d" or "[:lower:]". The "tr" operator is not equivalent to the
 tr(1) utility. If you want to map strings between lower/upper cases,
 see the lc entry in the perlfunc manpage and the uc entry in the
 perlfunc manpage, and in general consider using the "s" operator if
 you need regular expressions.
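
Following that advice, here is a minimal sketch of the "s" route, assuming plain ASCII input as in the original question. The sub names trim_doc and trim_doc_tr and the sample string are made up for illustration. The second sub shows that "tr" can still do the job if the characters are spelled out literally instead of using \s and \w. Both delete the punctuation first and squash whitespace second, so that a removed character sitting between two spaces does not leave a double space behind.

```perl
use strict;
use warnings;

# Hypothetical helper: regex version using the "s" operator.
sub trim_doc {
    my ($doc) = @_;
    $doc =~ s/[^\w\s]+//g;   # drop anything that is neither a word character nor whitespace
    $doc =~ s/\s+/ /g;       # collapse each run of whitespace into a single space
    return $doc;
}

# Hypothetical helper: "tr" version with literal character lists (ASCII only).
sub trim_doc_tr {
    my ($doc) = @_;
    $doc =~ tr/a-zA-Z0-9_ \t\r\n\f//cd;  # delete everything NOT in the listed set
    $doc =~ tr/ \t\r\n\f/ /s;            # map whitespace to a space and squeeze repeats
    return $doc;
}

# Sample data, invented for this sketch.
my $doc = "Hello,   world!\n\nThis is\ta  test.";
print trim_doc($doc), "\n";
print trim_doc_tr($doc), "\n";
```

The two subs should agree on ASCII text; on non-ASCII data \w and \s can match more than the literal lists do, so the "s" version is the safer default.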


-- 
   ,-/-  __  _  _ $Bill Luebkert   ICQ=14439852
  (_/   /  )// //   DBE Collectibles   Mailto:[EMAIL PROTECTED]
   / ) /--<  o // //  http://dbecoll.tripod.com/ (Free site for Perl)
-/-' /___/_<_http://www.todbe.com/
