Pattern matching with Unicode (5.6.1)

David Gray Wed, 14 Aug 2002 12:32:03 -0700

Hello most excellent Unicode list,

I'm having a bit of a problem getting Unicode pattern matching to do
what I would like it to. My code somewhat resembles:


 sub parse_doc {
   my $file = shift;
   # a lexical filehandle autovivifies in open(); no glob trick needed
   open my $fh, '<', $file or die "couldn't read [$file]: $!\n";
 
   my $contents = '';
   { local $/ = undef;
     $contents = <$fh>; }
   close $fh;

   # this is where I'm getting stuck
   my @contents = split "\n\n",$contents;
   # number of blank-line-separated chunks found
   print '['.scalar(@contents)."]\n";
 }

I've (sort of) made it work by doing:

 # strip BOM and trailing nulls and carriage returns
 s/^..// if $. == 1 and s/\0//g;
 s/[\0\r]//g;
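
Spelled out, the workaround above amounts to treating the file as UTF-16 and discarding the high byte of every character. A minimal sketch of the same idea done as an explicit decode follows. It assumes the input is UTF-16 with a leading BOM (a guess based on the two leading bytes and the embedded nulls) and that the text stays within the Basic Multilingual Plane. The helper name decode_utf16 is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turn raw UTF-16 bytes (with a BOM) into a Perl
# character string. Assumption: BMP-only text, so surrogate pairs are
# not handled here.
sub decode_utf16 {
    my $raw = shift;
    my $bom = substr($raw, 0, 2);

    # 'v' = 16-bit little-endian units, 'n' = 16-bit big-endian units
    my $template = $bom eq "\xFF\xFE" ? 'v*'
                 : $bom eq "\xFE\xFF" ? 'n*'
                 : die "no UTF-16 BOM found\n";

    # Split the bytes after the BOM into 16-bit code units, then build
    # a character string from those code points.
    my @units = unpack($template, substr($raw, 2));
    my $text  = pack('U*', @units);

    # Normalize CRLF line endings so "\n\n" marks a blank line.
    $text =~ s/\r\n/\n/g;
    return $text;
}

# Sample input built by hand: UTF-16LE bytes for two short paragraphs.
my $raw = "\xFF\xFE"
        . join('', map { pack('v', ord($_)) } split(//, "one\r\n\r\ntwo\r\n"));

my @paragraphs = split /\n\n/, decode_utf16($raw);
print '[' . scalar(@paragraphs) . "]\n";
```

When reading from a real file, the handle would probably need binmode so that no line-ending translation touches the raw bytes before decoding. On Perl 5.8 or later, the Encode module's decode('UTF-16', $raw) should do the same job and also copes with surrogate pairs.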

But I'm sure there must be a more elegant way to do this. Honestly, I'm
not even sure where to start. Any ideas?

Thanks a bunch,

 -dave
