On 7 Sep 2006 01:27:55 -0700, "GM" <[EMAIL PROTECTED]> wrote:

>Could you all give me some guidance on how to convert my Big5 string to
>Unicode using Python? I already know that I might use cjkcodecs or
>Python 2.4, but I still have no idea what exactly I should do.
>Please give me some sample code if you can. Thanks a lot.

Gary, I used this Java program quite a few years ago to convert
various Big5 files to UTF-16. (Sorry it's Java not Python, but I'm a
very recent convert to the latter.) My newsgroup reader has messed the
formatting up somewhat. If this causes a problem, email me and I'll
send you the source directly.

-Richard Schulman

/*      This program converts an input file of one encoding format to
 *      an output file of another format. It will be mainly used to
 *      convert Big5 text files to Unicode text files.
 */

import java.io.*;

public class ConvertEncoding
{       public static void main(String[] args)
        {       if (args.length < 2)
                {       System.err.println("Usage: java ConvertEncoding infile outfile");
                        System.exit(1);
                }
                //      Arguments to convert() are: input file, output file,
                //      input encoding, output encoding. Possible encodings
                //      include "GB2312", "BIG5", "UTF8" and "UTF-16LE"; the
                //      last is the format Windows NT uses (originally UCS-2).
                //      For example, to go from GB2312 to UTF-8 instead:
                //              convert(args[0], args[1], "GB2312", "UTF8");
                try
                {       convert(args[0], args[1], "BIG5", "UTF-16LE");
                }
                catch (Exception e)
                {       System.err.println(e.getMessage());
                        System.exit(1);
                }
        }

        public static void convert(String infile, String outfile,
                        String from, String to)
                throws IOException, UnsupportedEncodingException
        {       // Set up byte streams
                InputStream in;
                if (infile != null)
                        in = new FileInputStream(infile);
                else
                        in = System.in;

                OutputStream out;
                if (outfile != null)
                        out = new FileOutputStream(outfile);
                else
                        out = System.out;

                // Set up character streams
                Reader r = new BufferedReader(
                        new InputStreamReader(in, from));
                Writer w = new BufferedWriter(
                        new OutputStreamWriter(out, to));

                // This character signals Unicode in the NT environment
                w.write("\ufeff");
                char[] buffer = new char[4096];
                int len;
                while ((len = r.read(buffer)) != -1)
                        w.write(buffer, 0, len);
                r.close();
                w.flush();
                w.close();
        }
}
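
Since the question was about Python, here is a rough sketch of the same
idea in that language. It assumes Python 2.4 or later, where the "big5"
codec is part of the standard library. The names big5_to_unicode and
convert_file are only illustrative, and unlike the Java program above
this version reads the whole file into memory, so treat it as a starting
point rather than a finished tool.

```python
def big5_to_unicode(raw):
    # Decode a Big5-encoded byte string into a Unicode string.
    return raw.decode("big5")


def convert_file(infile, outfile, from_enc="big5", to_enc="utf-16-le"):
    # Read the raw bytes of the input file.
    f = open(infile, "rb")
    try:
        raw = f.read()
    finally:
        f.close()

    # Decode with the source encoding, then prepend U+FEFF (as the Java
    # program does) and re-encode with the target encoding.
    text = raw.decode(from_enc)
    data = (u"\ufeff" + text).encode(to_enc)

    out = open(outfile, "wb")
    try:
        out.write(data)
    finally:
        out.close()
```

For a single string, big5_to_unicode(my_big5_string) is all that is
needed; for a file, something like convert_file("in.txt", "out.txt")
mirrors the Java call with "BIG5" and "UTF-16LE".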
