> Hello,
>   I recently downloaded postgresql-7.3b5 and found that the conversion
> between GB18030 and UTF-8 is wrong, because the map file between them is
> clearly incorrect. The first byte of a two-byte UTF-8 sequence should be
> between 0xc0 and 0xfd, but the map file does not satisfy this condition.
> Please check it. I hope that postgresql-7.3 can support GB18030 and
> can run in China.
>                                       best regards
>                                                         jenny wang

Thanks for testing GB18030 support. Yes, the map file is completely
broken. I have attached a fixed version of the Perl script that
generates the maps. Please test it (I do not understand Chinese).

(1) save the perl script in the postgresql-7.3b5 source tree as:
    src/backend/utils/mb/Unicode/UCS_to_GB18030.pl.

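(2) run it. The script reads ISO10646-GB18030.TXT and requires
    ucs2utf.pl, so both need to be available in that directory.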

    cd src/backend/utils/mb/Unicode
    ./UCS_to_GB18030.pl

(3) the script will generate
    src/backend/utils/mb/Unicode/utf8_to_gb18030.map
    and
    src/backend/utils/mb/Unicode/gb18030_to_utf8.map

(4) If they look good, rebuild PostgreSQL and test it.
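
For step (4), one way to judge whether the maps "look good" is to check
that every UTF-8 value in them is structurally valid. The following is
only a minimal sketch, not part of the attached script. It assumes the
map files store each UTF-8 sequence as one packed integer (for example
0xe4b8ad for the bytes e4 b8 ad), which is what the printf calls in the
script suggest. The function names packed_bytes, utf8_packed_ok and
bad_utf8_entries are made up for this example. Note that a valid
two-byte sequence starts with 0xc2..0xdf (0xc0 and 0xc1 would be
overlong), and a three-byte sequence starts with 0xe0..0xef.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a packed integer such as 0xe4b8ad into its bytes (0xe4, 0xb8, 0xad).
sub packed_bytes {
    my ($v) = @_;
    my @bytes;
    do {
        unshift @bytes, $v & 0xff;
        $v >>= 8;
    } while ($v > 0);
    return @bytes;
}

# Return 1 if the packed value is a structurally valid 1- to 3-byte
# UTF-8 sequence, 0 otherwise. (Assumption: UCS-2 input only, so
# 4-byte sequences are not expected.)
sub utf8_packed_ok {
    my ($v) = @_;
    my @b    = packed_bytes($v);
    my $lead = shift @b;
    my $expect =
        $lead <= 0x7f                      ? 0
      : ($lead >= 0xc2 && $lead <= 0xdf)   ? 1
      : ($lead >= 0xe0 && $lead <= 0xef)   ? 2
      :                                      -1;
    return 0 if $expect != scalar @b;
    for my $c (@b) {
        # every continuation byte must be 10xxxxxx
        return 0 if $c < 0x80 || $c > 0xbf;
    }
    return 1;
}

# Scan a generated map file and return the UTF-8 values that fail the check.
# $utf_column is 0 if UTF-8 is the first field of each entry, 1 if the second.
sub bad_utf8_entries {
    my ($file, $utf_column) = @_;
    open(my $fh, '<', $file) or die "cannot open $file: $!";
    my @bad;
    while (my $line = <$fh>) {
        next unless $line =~ /\{\s*0x([0-9a-f]+)\s*,\s*0x([0-9a-f]+)\s*\}/i;
        my @cols = (hex($1), hex($2));
        push @bad, $cols[$utf_column] unless utf8_packed_ok($cols[$utf_column]);
    }
    close($fh);
    return @bad;
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    my @bad = bad_utf8_entries($ARGV[0], $ARGV[1] // 0);
    printf "0x%x\n", $_ for @bad;
    print scalar(@bad), " suspicious entries\n";
}
```

Saved under a name of your choice, it would be run with the map file as
the first argument and the UTF-8 column as the second: column 0 for
utf8_to_gb18030.map, column 1 for gb18030_to_utf8.map.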
--
Tatsuo Ishii
#! /usr/bin/perl
#
# Copyright 2002 by Bill Huang
#
# $Id: UCS_to_GB18030.pl,v 1.1 2002/06/13 08:28:55 ishii Exp $
#
# Generate UTF-8 <--> GB18030 code conversion tables from
# map files provided by Unicode organization.
# Unfortunately it is prohibited by the organization
# to distribute the map files. So if you try to use this script,
# you have to obtain ISO10646-GB18030.TXT from 
# the organization's ftp site.
#
# ISO10646-GB18030.TXT format:
#                GB18030 code in hex
#                UCS-2 code in hex
#                # and Unicode name (not used in this script)

require "ucs2utf.pl";

# first read the mapping for the UTF-8 --> GB18030 table

$in_file = "ISO10646-GB18030.TXT";

open( FILE, $in_file ) || die( "cannot open $in_file: $!" );

while( <FILE> ){
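        chomp;
        if( /^#/ ){
                next;
        }
        # Column order follows the format note above (GB18030 code first,
        # then UCS-2 code). Both passes over the file must use the same
        # order; adjust both if your copy lists the UCS code first.
        ( $c, $u, $rest ) = split;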
        $ucs = hex($u);
        $code = hex($c);
        if( $code >= 0x80 && $ucs >= 0x0080 ){
                $utf = &ucs2utf($ucs);
                if( $array{ $utf } ne "" ){
                        printf STDERR "Warning: duplicate unicode: %04x\n",$ucs;
                        next;
                }
                $count++;

                $array{ $utf } = $code;
        }
}
close( FILE );

#
# first, generate UTF8 --> GB18030 table
#

$file = "utf8_to_gb18030.map";
open( FILE, "> $file" ) || die( "cannot open $file" );
print FILE "static pg_utf_to_local ULmapGB18030[ $count ] = {\n";

for $index ( sort {$a <=> $b} keys( %array ) ){
        $code = $array{ $index };
        $count--;
        if( $count == 0 ){
                printf FILE "  {0x%04x, 0x%04x}\n", $index, $code;
        } else {
                printf FILE "  {0x%04x, 0x%04x},\n", $index, $code;
        }
}

print FILE "};\n";
close(FILE);

#
# then generate GB18030 --> UTF8 table
#
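# Clear the table and the counter explicitly. (reset 'array' would
# clear every variable whose name starts with a, r or y.)
%array = ();
$count = 0;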

open( FILE, $in_file ) || die( "cannot open $in_file" );

while( <FILE> ){
        chomp;
        if( /^#/ ){
                next;
        }
        ( $c, $u, $rest ) = split;
        $ucs = hex($u);
        $code = hex($c);
        if( $code >= 0x80 && $ucs >= 0x0080 ){
                $utf = &ucs2utf($ucs);
                if( $array{ $code } ne "" ){
                        printf STDERR "Warning: duplicate code: %04x\n",$code;
                        next;
                }
                $count++;

                $array{ $code } = $utf;
        }
}
close( FILE );

$file = "gb18030_to_utf8.map";
open( FILE, "> $file" ) || die( "cannot open $file" );
print FILE "static pg_local_to_utf LUmapGB18030[ $count ] = {\n";
for $index ( sort {$a <=> $b} keys( %array ) ){
        $utf = $array{ $index };
        $count--;
        if( $count == 0 ){
                printf FILE "  {0x%04x, 0x%04x}\n", $index, $utf;
        } else {
                printf FILE "  {0x%04x, 0x%04x},\n", $index, $utf;
        }
}

print FILE "};\n";
close(FILE);
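
The script depends on ucs2utf() from ucs2utf.pl, which is not included
in this mail. For readers who want to see what such a helper plausibly
does, here is a rough sketch. It is an assumption about the behaviour,
not the actual file: it takes a UCS-2 code point (up to 0xffff) and
returns the UTF-8 bytes packed into one integer, most significant byte
first. The name ucs2utf_sketch is made up to keep it apart from the real
function.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for ucs2utf(): encode a UCS-2 code point as
# UTF-8 and pack the resulting bytes into a single integer.
sub ucs2utf_sketch {
    my ($ucs) = @_;
    if ($ucs <= 0x7f) {
        # 1 byte: 0xxxxxxx
        return $ucs;
    }
    elsif ($ucs <= 0x7ff) {
        # 2 bytes: 110xxxxx 10xxxxxx
        return ((0xc0 | ($ucs >> 6)) << 8)
             |  (0x80 | ($ucs & 0x3f));
    }
    else {
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return ((0xe0 | ($ucs >> 12)) << 16)
             | ((0x80 | (($ucs >> 6) & 0x3f)) << 8)
             |  (0x80 | ($ucs & 0x3f));
    }
}

# Show the packed value for two sample code points.
printf "U+%04X -> 0x%x\n", $_, ucs2utf_sketch($_) for (0x00e9, 0x4e2d);
```

With this packing, the keys of utf8_to_gb18030.map sort numerically in
the same order as the underlying code points, which is why the script
can sort the table with a plain numeric comparison.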