split, substitute with multibyte kanji file ?

John Mooney Thu, 15 Jul 2004 07:21:10 -0700

We've been told previously that Perl cannot safely perform byte operations, such as 
substitution or splitting, on lines containing multibyte Japanese characters. Yet 
from reading a bit about Ken Lunde's papers, and about Perl 5.8 I/O layers and 
Encode::JP, it looks like it might be possible. I'm struggling to turn the 
"documentation" into a usable solution, however, and wonder if someone could advise on 
a couple of specific tasks.
 
I have two problems I'd like to address.
1) Split a line containing multiple-byte chars. 
I believe they are Shift-JIS - I'd ascertain the specific encoding before plowing into 
it. For example, given an input file like this:
 
12|3|50    <some double-byte chars here>
12|4|50    <some double-byte chars here>
12|6|50    <some double-byte chars here>
12|9|50    <some double-byte chars here>
 ...
 
Is there a way to safely split these input lines, looking for a tab char for example? 
If a different delimiter character (other than tab) would be better suited for 
splitting on, that is probably an option. Note that the first field (numbers and pipes) 
already contains pipes.
 
 -----------


2) Substitution
Some of the other data I need to read, from a DB using DBI, contains hard line returns 
(\n) that users entered when creating the data in certain LOB fields. I would also 
like to be able to remove those, which I'd ordinarily do using 
  $x =~ s/\n//g;
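
For illustration, here is a minimal sketch of how that substitution might look if the
value is decoded to characters first. It assumes Perl 5.8 with the core Encode module,
and that the DBI driver hands back the LOB as raw Shift-JIS bytes (that depends on the
driver and is not confirmed here). The sample value is made up.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical LOB value: two kanji as Shift-JIS bytes (0x93FA, 0x967B),
# with a hard return after each one.
my $lob = "\x93\xFA\n\x96\x7B\n";

# Turn the bytes into Perl characters so the regex works per character.
my $chars = decode('shiftjis', $lob);

# The usual substitution, now applied to characters rather than bytes.
$chars =~ s/\n//g;

# Turn the result back into Shift-JIS bytes for storage or output.
my $clean = encode('shiftjis', $chars);

printf "%d characters, %d bytes\n", length($chars), length($clean);
```

As far as I understand Shift-JIS, the byte value of \n (0x0A) never occurs inside a
double-byte character, so the decode step matters more for other patterns than for
this particular one.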
 
 -----------
 
For example, would using Encode::JP & specifying:
    open my $in,  "<:encoding(shiftjis)", $infile  or die;   ## from Encode perldoc
... then allow me to use the split() function?
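
To make the question concrete, here is a rough sketch of the split I have in mind. It
assumes Perl 5.8 with the core Encode module and that the data really is Shift-JIS.
The sample record is hypothetical, and it calls decode() on a string instead of
opening a file so that it runs on its own.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical record: "12|3|50", a tab, then katakana "po" as Shift-JIS
# bytes 0x83 0x7C. The second byte has the same value as an ASCII pipe.
my $raw = "12|3|50\t\x83\x7C\n";

# Byte-level view: pull out the text field and split it on a pipe.
# The pipe-valued trail byte is treated as a delimiter, cutting the
# character in half.
my ($raw_text) = $raw =~ /\t(.*)/;
my @byte_fields = split /\|/, $raw_text, -1;

# Character-level view: decode first. An :encoding(shiftjis) layer on
# open() performs the same decoding as lines are read.
my $line = decode('shiftjis', $raw);
chomp $line;

# Split into the numeric key and the text, on the first tab only.
my ($key, $text) = split /\t/, $line, 2;

# The same pipe split on decoded text leaves the character whole.
my @char_fields = split /\|/, $text, -1;

printf "fields from characters: %d, fields from bytes: %d\n",
    scalar @char_fields, scalar @byte_fields;
```

As far as I can tell, a tab byte (0x09) never appears inside a Shift-JIS double-byte
character, so the risk seems to come from delimiters such as the pipe or backslash,
whose byte values can also be the second byte of a character.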
 
Or, would I be able to use something like the CPAN module:  ShiftJIS::String ?
 
 
If anyone can offer example solutions to this, I'd appreciate it. BTW, I am using 
Perl 5.6 on this box, but might be able to move the process to a 5.8 installation.

Thanks





