Hello most excellent Unicode list,

I'm having a bit of a problem getting Unicode pattern matching to do what I would like it to. My code somewhat resembles:

    sub parse_doc {
        my $file = shift;
        my $fh = do { no warnings; local *FH };
        open $fh,'<',$file or die "couldn't read [$file]: $!\n";
        my $contents = '';
        {
            local $/ = undef;
            $contents = <$fh>;
        }
        close $fh;

        # this is where I'm getting stuck
        my @contents = split "\n\n",$contents;
        print '['.int(@contents)."]\n";
    }

I've (sort of) made it work by doing:

    # strip BOM and trailing nulls and carriage returns
    s/^..// if $. == 1 and s/\0//g;
    s/[\0\r]//g;

But I'm sure there must be a more elegant way to do this. Honestly, I'm not even sure where to start. Any ideas?

Thanks a bunch,
-dave
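
One possible direction, offered only as a sketch: the BOM, null bytes, and carriage returns being stripped above suggest the file may be BOM-prefixed UTF-16 with CRLF line endings. If that assumption holds, a PerlIO :encoding layer could decode the file on read, so the split would see real characters instead of raw bytes. The optional second argument and the 'UTF-16' default below are hypothetical additions, not part of the original code.

```perl
use strict;
use warnings;

# Sketch only. Assumes the input is UTF-16 with a BOM; Encode's "UTF-16"
# decoder uses the BOM to pick the byte order and dies if no BOM is present.
sub parse_doc {
    my ($file, $encoding) = @_;
    $encoding ||= 'UTF-16';    # hypothetical default, adjust to the real encoding

    # :raw keeps any platform CRLF layer from touching the UTF-16 bytes;
    # :encoding(...) then decodes them into Perl characters.
    open my $fh, "<:raw:encoding($encoding)", $file
        or die "couldn't read [$file]: $!\n";
    my $contents = do { local $/; <$fh> };    # slurp the whole file
    close $fh;

    # Normalise CRLF (or a lone CR) to LF so blank lines are "\n\n".
    $contents =~ s/\r\n?/\n/g;

    # Split on runs of two or more newlines, i.e. blank-line separators.
    my @paragraphs = split /\n{2,}/, $contents;
    return @paragraphs;
}

# Usage: perl parse_doc.pl some-file.txt
if (@ARGV) {
    my @paragraphs = parse_doc($ARGV[0]);
    print '[' . scalar(@paragraphs) . "]\n";
}
```

If the file turns out to have no BOM, passing an explicit 'UTF-16LE' or 'UTF-16BE' as the second argument would be the thing to try instead.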