On 10/07/2016 11:36 AM, Kyotaro HORIGUCHI wrote:
The radix conversion function and map conversion script have become
more generic than before, so I could easily add radix conversion
of EUC_JP in addition to Shift JIS.

nm -S said that the size of the radix tree data for sjis->utf8
conversion is 34kB and that for utf8->sjis is 46kB (eucjp->utf8
57kB, utf8->eucjp 93kB). LUmapSJIS and ULmapSJIS were 62kB and
59kB, and LUmapEUC_JP and ULmapEUC_JP were 106kB and 105kB. If I'm
not missing something, the radix tree is faster and requires less
memory.

Cool!

Currently the tree structure is divided into several elements:
one for 2-byte codes, others for 3-byte and 4-byte codes, and the
output table. All but the last could logically and technically be
merged into a single table, but that would make the generator script
far more complex than it is now. I no longer want to play
hide-and-seek with complex Perl objects.

I think that's OK. There isn't really anything to gain by merging them.
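
To make the per-length split concrete, here is a minimal sketch of what a lookup for 2-byte codes could look like. It is not the layout used in the patch: the names radix2_tree, radix2_insert and radix2_lookup, the MAX_BLOCKS limit, and the convention that 0 means "no mapping" are all assumptions made for illustration. A 3-byte or 4-byte table would just add one more index level each, which is why keeping them as separate elements costs little.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout for illustration only, not the one in the patch.
 * Level 1 is indexed by the first byte and gives a level-2 block number;
 * level 2 is indexed by the second byte and gives the mapped code.
 * Block 0 is reserved and stays all-zero, so 0 means "no mapping". */
#define MAX_BLOCKS 8

typedef struct
{
    uint8_t  level1[256];               /* first byte -> block number */
    uint32_t level2[MAX_BLOCKS][256];   /* [block][second byte] -> result */
    int      nblocks;                   /* number of blocks in use */
} radix2_tree;

static void
radix2_init(radix2_tree *t)
{
    memset(t, 0, sizeof(*t));
    t->nblocks = 1;                     /* block 0 is the empty block */
}

static void
radix2_insert(radix2_tree *t, uint16_t code, uint32_t mapped)
{
    uint8_t b1 = (uint8_t) (code >> 8);
    uint8_t b2 = (uint8_t) (code & 0xFF);

    /* Allocate a level-2 block the first time this lead byte is seen. */
    if (t->level1[b1] == 0)
    {
        assert(t->nblocks < MAX_BLOCKS);
        t->level1[b1] = (uint8_t) t->nblocks++;
    }
    t->level2[t->level1[b1]][b2] = mapped;
}

static uint32_t
radix2_lookup(const radix2_tree *t, uint16_t code)
{
    /* Two array indexings, no comparisons and no loop. */
    return t->level2[t->level1[code >> 8]][code & 0xFF];
}
```

Only lead bytes that actually occur get a level-2 block, which is where the space saving over a flat 64k-entry array would come from in a layout like this.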

It might be better to integrate this as a native feature of the
core. Currently the helper function is in core, but it is passed
as conv_func when calling LocalToUtf.

Yeah, I think we want to completely replace the current binary-search based code with this. I would rather maintain just one mechanism.
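
For comparison, the binary-search approach amounts to something like the sketch below. The map_pair struct and the function names are hypothetical stand-ins, not the actual structs or functions in the tree, and the sample values in a caller would be illustrative only. Every lookup costs O(log n) comparisons over a sorted array of pairs, and every entry stores its full source code next to the result.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pair layout for illustration: a sorted array of
 * (source code, mapped code) entries, searched with bsearch(). */
typedef struct
{
    uint32_t from;      /* code in the source encoding */
    uint32_t to;        /* code in the target encoding */
} map_pair;

/* bsearch() comparator: the key is a bare uint32_t, the element a map_pair. */
static int
cmp_pair(const void *key, const void *elem)
{
    uint32_t k = *(const uint32_t *) key;
    uint32_t e = ((const map_pair *) elem)->from;

    return (k > e) - (k < e);
}

/* Returns the mapped code, or 0 if the code is not in the map.
 * The map must be sorted by "from" in ascending order. */
static uint32_t
bsearch_lookup(const map_pair *map, size_t n, uint32_t code)
{
    const map_pair *hit = bsearch(&code, map, n, sizeof(map_pair), cmp_pair);

    return hit ? hit->to : 0;
}
```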

The current implementation uses the *.map files of pg_utf_to_local
as input. That does not seem good, but the radix tree files are
completely uneditable. Providing custom-made loading functions for
every source, instead of load_chartable(), would be the way to go.

# However, utf8_to_sjis.map, for example, does not seem to have
# been generated from the source mentioned in UCS_to_SJIS.pl

Ouch. We should find and document an authoritative source for all the mappings we have...

I think the next steps here are:

1. Find an authoritative source for all the existing mappings.
2. Generate the radix tree files directly from the authoritative sources, instead of the existing *.map files.
3. Completely replace the existing binary-search code with this.

- Heikki


