We've been told previously that Perl cannot safely perform byte operations, such as
substitution or splitting, on lines containing multibyte Japanese characters. Yet from
reading a bit of Ken Lunde's papers, and about Perl 5.8 I/O layers and Encode::JP, it
looks like it might be possible. I'm struggling to turn the "documentation" into a
usable solution, however, and wonder if someone could advise on a couple of specific
tasks.
I have two problems I'd like to address.
1) Split a line containing multibyte characters.
I believe they are Shift-JIS; I'd confirm the exact encoding before plowing into it.
For example, given an input file like this:
12|3|50 <some double-byte chars here>
12|4|50 <some double-byte chars here>
12|6|50 <some double-byte chars here>
12|9|50 <some double-byte chars here>
...
Is there a way to safely split these input lines, for example on a tab character?
If a different delimiter would be better suited for splitting (other than tab), that
is probably an option. The first field (numbers separated by pipes) itself contains
pipes, so the pipe can't serve as the delimiter.
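A minimal sketch, assuming Perl 5.8 with the bundled Encode module (Perl 5.6 has no Encode): read the file through an :encoding(shiftjis) layer so each line arrives as a string of characters, then split on tab as usual. The sample records, the temporary file, and the variable names are hypothetical stand-ins for the real input.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Hypothetical sample data: pipe-separated numbers, a TAB, then Japanese
# text ("\x{65E5}\x{672C}" is "Nihon", "\x{30DD}\x{30FC}\x{30C8}" is katakana "port").
my @jtext = ("\x{65E5}\x{672C}", "\x{30DD}\x{30FC}\x{30C8}");

# Write the sample out as Shift-JIS bytes so the read below is realistic.
my ($tmp, $infile) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} encode('shiftjis', "12|3|50\t$jtext[0]\n");
print {$tmp} encode('shiftjis', "12|4|50\t$jtext[1]\n");
close $tmp or die "close $infile: $!";

# Read through the encoding layer: each line is now Perl characters, so
# split() and s/// work on whole characters, never on half of one.
open my $in, '<:encoding(shiftjis)', $infile or die "open $infile: $!";
my @rows;
while (my $line = <$in>) {
    chomp $line;
    my ($nums, $text) = split /\t/, $line, 2;   # numbers, then Japanese text
    my @ids = split /\|/, $nums;                # '12|3|50' -> 12, 3, 50
    push @rows, [ @ids, $text ];
}
close $in;

# To print the results, give STDOUT a matching layer (Shift-JIS here).
binmode STDOUT, ':encoding(shiftjis)';
print join(' / ', @$_), "\n" for @rows;
```

On the choice of delimiter: in Shift-JIS the second byte of a double-byte character is always in 0x40-0x7E or 0x80-0xFC, so a tab (0x09) or newline (0x0A) can never be part of one, and splitting on tab is safe even on undecoded bytes. The pipe (0x7C) falls inside that trail-byte range, which is why it is only safe to split on after decoding.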
-----------
2) Substitution
Some of the other data I need to read from a database using DBI contains hard line
breaks (\n) that users entered when creating the data in certain LOB fields. I would
also like to remove those, which I would ordinarily do using:
$x =~ s/\n//g;
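A sketch of two options, assuming the DBI driver hands back the LOB as raw Shift-JIS bytes (some drivers decode for you; check yours). The DBI fetch is replaced here by a hypothetical literal string so the example runs without a database.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Stand-in for a LOB value fetched with DBI (hypothetical data): Shift-JIS
# bytes with hard line breaks. U+30BD (katakana "so") encodes as 0x83 0x5C,
# i.e. its second byte is an ASCII backslash.
my $raw = encode('shiftjis', "\x{65E5}\x{672C}\r\n\x{30BD}\x{30D5}\x{30C8}\n");

# Option 1: substitute on the raw bytes (the substitution itself needs no
# module, so it also works on Perl 5.6). Safe for CR and LF only, because
# 0x0D and 0x0A are never part of a double-byte Shift-JIS character, unlike
# '\' (0x5C) or '|' (0x7C).
(my $bytes_clean = $raw) =~ s/\r?\n//g;

# Option 2 (Perl 5.8+): decode to characters first, then substitute as usual.
my $chars = decode('shiftjis', $raw);
$chars =~ s/\r?\n//g;

print length($raw), " bytes before, ", length($bytes_clean), " bytes after\n";
```

The pattern is widened to \r?\n on the assumption that some of the data was entered on Windows clients; plain s/\n//g behaves the same way for LF-only data.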
-----------
For example, would using Encode::JP and specifying:
open my $in, "<:encoding(shiftjis)", $infile or die "Can't open $infile: $!"; ## from the Encode perldoc
... then allow me to use the split() function?
Or could I use something like the CPAN module ShiftJIS::String?
If anyone can offer example solutions, I'd appreciate it. BTW, I am using Perl 5.6 on
this box, but might be able to move the process to a 5.8 installation.
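For staying on Perl 5.6 (no Encode), a common byte-level workaround is to walk the string one whole Shift-JIS character at a time with a regex and only treat the delimiter as a delimiter when it is a complete character. This is a sketch under that assumption, not ShiftJIS::String's API; sjis_split is a hypothetical helper name, and malformed bytes are simply dropped.

```perl
use strict;
use warnings;

# One Shift-JIS character: a lead byte (0x81-0x9F, 0xE0-0xFC) plus a trail
# byte (0x40-0x7E, 0x80-0xFC), or a single ASCII / half-width katakana byte.
my $sjis_char = qr/[\x81-\x9F\xE0-\xFC][\x40-\x7E\x80-\xFC]|[\x00-\x7F\xA1-\xDF]/;

# Hypothetical helper: split a Shift-JIS byte string on a one-byte
# delimiter, but only where the delimiter is a whole character.
# Bytes that match neither branch (malformed input) are silently skipped.
sub sjis_split {
    my ($delim, $bytes) = @_;
    my @fields = ('');
    for my $ch ($bytes =~ /$sjis_char/g) {
        if ($ch eq $delim) { push @fields, '' }
        else               { $fields[-1] .= $ch }
    }
    return @fields;
}

# "\x83\x7C" stands for a double-byte character whose second byte is '|'.
my $line  = "12|\x83\x7C|50";
my @naive = split /\|/, $line;        # cuts the double-byte character in two
my @safe  = sjis_split('|', $line);   # keeps it intact
printf "naive: %d fields, safe: %d fields\n", scalar @naive, scalar @safe;
```

Here the naive split yields four fields ('12', "\x83", '', '50') while sjis_split yields three, with the double-byte character intact in the middle field.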
Thanks