Re: [HACKERS] Patch for bug #12845 (GB18030 encoding)

2015-05-19 Thread Arjen Nienhuis
 That's fine when not every code point is used, but it's different for
 GB18030 where almost all code points are used. Using a plain array
 saves space and saves a binary search.

 Well, it doesn't save any space: if we get rid of the additional linear
 ranges in the lookup table, what remains is 30733 entries requiring about
 256K, same as (or a bit less than) what you suggest.

We could do both. What about something like this:

static unsigned int utf32_to_gb18030_from_0x0001[1105] = {
/* 0x0 */ 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7, 0x8,
...
static unsigned int utf32_to_gb18030_from_0x2010[1587] = {
/* 0x0 */ 0xa95c, 0x8136a532, 0x8136a533, 0xa843, 0xa1aa, 0xa844,
0xa1ac, 0x8136a534,
...
static unsigned int utf32_to_gb18030_from_0x2E81[28965] = {
/* 0x0 */ 0xfe50, 0x8138fd39, 0x8138fe30, 0xfe54, 0x8138fe31,
0x8138fe32, 0x8138fe33, 0xfe57,
...
static unsigned int utf32_to_gb18030_from_0xE000[2149] = {
/* 0x0 */ 0xaaa1, 0xaaa2, 0xaaa3, 0xaaa4, 0xaaa5, 0xaaa6, 0xaaa7, 0xaaa8,
...
static unsigned int utf32_to_gb18030_from_0xF92C[254] = {
/* 0x0 */ 0xfd9c, 0x84308535, 0x84308536, 0x84308537, 0x84308538,
0x84308539, 0x84308630, 0x84308631,
...
static unsigned int utf32_to_gb18030_from_0xFE30[464] = {
/* 0x0 */ 0xa955, 0xa6f2, 0x84318538, 0xa6f4, 0xa6f5, 0xa6e0, 0xa6e1, 0xa6f0,
...

static uint32
conv_utf8_to_18030(uint32 code)
{
uint32  ucs = utf8word_to_unicode(code);

#define conv_lin(minunicode, maxunicode, mincode) \
if (ucs >= minunicode && ucs <= maxunicode) \
return gb_unlinear(ucs - minunicode + gb_linear(mincode))

/* maxunicode is exclusive here, matching the array sizes above */
#define conv_array(minunicode, maxunicode) \
if (ucs >= minunicode && ucs < maxunicode) \
return utf32_to_gb18030_from_##minunicode[ucs - minunicode];

conv_array(0x0001, 0x0452);
conv_lin(0x0452, 0x200F, 0x8130D330);
conv_array(0x2010, 0x2643);
conv_lin(0x2643, 0x2E80, 0x8137A839);
conv_array(0x2E81, 0x9FA6);
conv_lin(0x9FA6, 0xD7FF, 0x82358F33);
conv_array(0xE000, 0xE865);
conv_lin(0xE865, 0xF92B, 0x8336D030);
conv_array(0xF92C, 0xFA2A);
conv_lin(0xFA2A, 0xFE2F, 0x84309C38);
conv_array(0xFE30, 0x10000);
conv_lin(0x10000, 0x10FFFF, 0x90308130);
/* No mapping exists */
return 0;
}
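
(For reference, a sketch of the two helpers the macros above assume.
The byte arithmetic below is a reconstruction from the standard GB18030
four-byte scheme, where the bytes run 0x81-0xFE, 0x30-0x39, 0x81-0xFE,
0x30-0x39; it is not code from the patch itself.)

static uint32
gb_linear(uint32 gb)
{
	uint32 b0 = (gb >> 24) & 0xff;
	uint32 b1 = (gb >> 16) & 0xff;
	uint32 b2 = (gb >> 8) & 0xff;
	uint32 b3 = gb & 0xff;

	/* collapse the four restricted byte ranges into one linear index */
	return ((b0 - 0x81) * 10 * 126 * 10) +
		((b1 - 0x30) * 126 * 10) +
		((b2 - 0x81) * 10) +
		(b3 - 0x30);
}

static uint32
gb_unlinear(uint32 lin)
{
	uint32 b3 = 0x30 + lin % 10;
	uint32 b2 = 0x81 + (lin / 10) % 126;
	uint32 b1 = 0x30 + (lin / (10 * 126)) % 10;
	uint32 b0 = 0x81 + lin / (10 * 126 * 10);

	/* reassemble a four-byte code such as 0x90308130 */
	return (b0 << 24) | (b1 << 16) | (b2 << 8) | b3;
}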


 The point about possibly being able to do this with a simple lookup table
 instead of binary search is valid, but I still say it's a mistake to
 suppose that we should consider that only for GB18030.  With the reduced
 table size, the GB18030 conversion tables are not all that far out of line
 with the other Far Eastern conversions:

 $ size utf8*.so | sort -n
     text    data     bss     dec     hex filename
     1880     512      16    2408     968 utf8_and_ascii.so
     2394     528      16    2938     b7a utf8_and_iso8859_1.so
     6674     512      16    7202    1c22 utf8_and_cyrillic.so
    24318     904      16   25238    6296 utf8_and_win.so
    28750     968      16   29734    7426 utf8_and_iso8859.so
   121110     512      16  121638   1db26 utf8_and_euc_cn.so
   123458     512      16  123986   1e452 utf8_and_sjis.so
   133606     512      16  134134   20bf6 utf8_and_euc_kr.so
   185014     512      16  185542   2d4c6 utf8_and_sjis2004.so
   185522     512      16  186050   2d6c2 utf8_and_euc2004.so
   212950     512      16  213478   341e6 utf8_and_euc_jp.so
   221394     512      16  221922   362e2 utf8_and_big5.so
   274772     512      16  275300   43364 utf8_and_johab.so
   277776     512      16  278304   43f20 utf8_and_uhc.so
   332262     512      16  332790   513f6 utf8_and_euc_tw.so
   350640     512      16  351168   55bc0 utf8_and_gbk.so
   496680     512      16  497208   79638 utf8_and_gb18030.so

 If we were to get excited about reducing the conversion time for GB18030,
 it would clearly make sense to use similar infrastructure for GBK, and
 perhaps the EUC encodings too.

I'll check them as well. If they have linear ranges it should work.


 However, I'm not that excited about changing it.  We have not heard field
 complaints about these converters being too slow.  What's more, there
 doesn't seem to be any practical way to apply the same idea to the other
 conversion direction, which means if you do feel there's a speed problem
 this would only halfway fix it.

It does work if you linearize it first. That's why we need to convert
to UTF32 first as well. That's a form of linearization.
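
A sketch of that direction, assuming the gb_linear helper sketched
above and hypothetical gb18030_to_utf32_from_* dense arrays (the
supplementary planes are shown because they are a single linear run;
0x90308130..0xE3329A35 maps onto U+10000..U+10FFFF):

static uint32
conv_gb18030_to_utf32(uint32 code)
{
	uint32 lin = gb_linear(code);

	/* one linear run covers all of U+10000..U+10FFFF */
	if (lin >= gb_linear(0x90308130) && lin <= gb_linear(0xE3329A35))
		return 0x10000 + (lin - gb_linear(0x90308130));

	/* the ranges below U+10000 would index dense
	   gb18030_to_utf32_from_* arrays by lin, mirroring the
	   utf32_to_gb18030_from_* arrays above */
	return 0;	/* no mapping */
}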




Re: [HACKERS] Patch for bug #12845 (GB18030 encoding)

2015-05-15 Thread Arjen Nienhuis
On Thu, May 14, 2015 at 11:04 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, May 6, 2015 at 11:13 AM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
 Maybe not, but at the very least we should consider getting it fixed in
 9.5 rather than waiting a full development cycle.  Same as in
 https://www.postgresql.org/message-id/20150428131549.ga25...@momjian.us
 I'm not saying we MUST include it in 9.5, but we should at least
 consider it.  If we simply stash it in the open CF we guarantee that it
 will linger there for a year.

 Sure, if somebody has the time to put into it now, I'm fine with that.
 I'm afraid it won't be me, though: even if I had the time, I don't
 know enough about encodings.

 I concur that we should at least consider this patch for 9.5.  I've
 added it to
 https://wiki.postgresql.org/wiki/PostgreSQL_9.5_Open_Items

 I looked at this patch a bit, and read up on GB18030 (thank you
 wikipedia).  I concur we have a problem to fix.  I do not like the way
 this patch went about it though, ie copying-and-pasting LocalToUtf and
 UtfToLocal and their supporting routines into utf8_and_gb18030.c.
 Aside from being duplicative, this means the improved mapping capability
 isn't available to use with anything except GB18030.  (I do not know
 whether there are any linear mapping ranges in other encodings, but
 seeing that the Unicode crowd went to the trouble of defining a notation
 for it in http://www.unicode.org/reports/tr22/, I'm betting there are.)

 What I think would be a better solution, if slightly more invasive,
 is to extend LocalToUtf and UtfToLocal to add a callback function
 argument for a function of signature uint32 translate(uint32).
 This function, if provided, would be called after failing to find a
 mapping in the mapping table(s), and it could implement any translation
 that would be better handled by code than as a boatload of mapping-table
 entries.  If it returns zero then it doesn't know a translation either,
 so throw error as before.

 An alternative definition that could be proposed would be to call the
 function before consulting the mapping tables, not after, on the grounds
 that the function can probably exit cheaply if the input's not in a range
 that it cares about.  However, consulting the mapping table first wins
 if you have ranges that mostly work but contain a few exceptions: put
 the exceptions in the mapping table and then the function need not worry
 about handling them.

 Another alternative approach would be to try to define linear mapping
 ranges in a tabular fashion, for more consistency with what's there now.
 But that probably wouldn't work terribly well because the bytewise
 character representations used in this logic have to be converted into
 code points before you can do any sort of linear mapping.  We could
 hard-wire that conversion for UTF8, but the conversion in the other code
 space would be encoding-specific.  So we might as well just treat the
 whole linear mapping behavior as a black box function for each encoding.

 I'm also discounting the possibility that someone would want an
 algorithmic mapping for cases involving combined codes (ie pairs of
 UTF8 characters).  Of the encodings we support, only EUC_JIS_2004 and
 SHIFT_JIS_2004 need such cases at all, and those have only a handful of
 cases; so it doesn't seem popular enough to justify the extra complexity.

 I also notice that pg_gb18030_verifier isn't even close to strict enough;
 it basically relies on pg_gb18030_mblen which contains no checks
 whatsoever on the third and fourth bytes.  So that needs to be fixed.

 The verification tightening would definitely not be something to
 back-patch, and I'm inclined to think that the additional mapping
 capability shouldn't be either, in view of the facts that (a) we've
 had few if any field complaints yet, and (b) changing the signatures
 of LocalToUtf/UtfToLocal might possibly break third-party code.
 So I'm seeing this as a HEAD-only patch, but I do want to try to
 squeeze it into 9.5 rather than wait another year.

 Barring objections, I'll go make this happen.

GB18030 is a special case, because it's a full mapping of all unicode
characters, and most of it is algorithmically defined. This makes
UtfToLocal a bad choice to implement it. UtfToLocal assumes a sparse
array with only the defined characters. It uses binary search to find
a character. The 2 tables it uses now are huge (the .so file is 1MB).
Adding the rest of the valid characters to this scheme is possible,
but would make the problem worse.

I think fixing UtfToLocal only for the new characters is not optimal.

I think the best solution is to get rid of UtfToLocal for GB18030. Use
a specialized algorithm:
- For characters > U+FFFF use the algorithm from my patch
- For characters <= U+FFFF use special mapping tables to map from/to
UTF32. Those tables would be smaller, and the code would be faster (I
assume).

For example (256 KB):

Re: [HACKERS] Patch for bug #12845 (GB18030 encoding)

2015-05-15 Thread Arjen Nienhuis
On Fri, May 15, 2015 at 4:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Arjen Nienhuis a.g.nienh...@gmail.com writes:
 GB18030 is a special case, because it's a full mapping of all unicode
 characters, and most of it is algorithmically defined.

 True.

 This makes UtfToLocal a bad choice to implement it.

 I disagree with that conclusion.  There are still 30000+ characters
 that need to be translated via lookup table, so we still need either
 UtfToLocal or a clone of it; and as I said previously, I'm not on board
 with cloning it.

 I think the best solution is to get rid of UtfToLocal for GB18030. Use
 a specialized algorithm:
 - For characters > U+FFFF use the algorithm from my patch
 - For characters <= U+FFFF use special mapping tables to map from/to
 UTF32. Those tables would be smaller, and the code would be faster (I
 assume).

 I looked at what wikipedia claims is the authoritative conversion table:

 http://source.icu-project.org/repos/icu/data/trunk/charset/data/xml/gb-18030-2000.xml

 According to that, about half of the characters below U+FFFF can be
 processed via linear conversions, so I think we ought to save table
 space by doing that.  However, the remaining stuff that has to be
 processed by lookup still contains a pretty substantial number of
 characters that map to 4-byte GB18030 characters, so I don't think
 we can get any table size savings by adopting a bespoke table format.
 We might as well use UtfToLocal.  (Worth noting in this connection
 is that we haven't seen fit to sweat about UtfToLocal's use of 4-byte
 table entries for other encodings, even though most of the others
 are not concerned with characters outside the BMP.)


It's not about 4 vs 2 bytes, it's about using 8 bytes vs 4. UtfToLocal
uses a sparse array:

map = {{0, x}, {1, y}, {2, z}, ...}

v.s.

map = {x, y, z, ...}

That's fine when not every code point is used, but it's different for
GB18030 where almost all code points are used. Using a plain array
saves space and saves a binary search.
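
In C terms the difference per entry looks like this (illustrative
declarations only; pg_utf_to_local is the existing struct, the dense
array follows my earlier sketch):

/* sparse: 8 bytes per mapping, found by binary search */
typedef struct
{
	uint32 utf;		/* key: Unicode code point */
	uint32 code;	/* value: GB18030 code */
} pg_utf_to_local;

/* dense: 4 bytes per mapping, found by direct indexing */
static const uint32 utf32_to_gb18030_from_0xE000[2149] = { /* ... */ };

/* lookup is utf32_to_gb18030_from_0xE000[ucs - 0xE000]: no search */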

Gr. Arjen




Re: [HACKERS] Patch for bug #12845 (GB18030 encoding)

2015-05-15 Thread Tom Lane
Arjen Nienhuis a.g.nienh...@gmail.com writes:
 GB18030 is a special case, because it's a full mapping of all unicode
 characters, and most of it is algorithmically defined.

True.

 This makes UtfToLocal a bad choice to implement it.

I disagree with that conclusion.  There are still 30000+ characters
that need to be translated via lookup table, so we still need either
UtfToLocal or a clone of it; and as I said previously, I'm not on board
with cloning it.

 I think the best solution is to get rid of UtfToLocal for GB18030. Use
 a specialized algorithm:
 - For characters > U+FFFF use the algorithm from my patch
 - For characters <= U+FFFF use special mapping tables to map from/to
 UTF32. Those tables would be smaller, and the code would be faster (I
 assume).

I looked at what wikipedia claims is the authoritative conversion table:

http://source.icu-project.org/repos/icu/data/trunk/charset/data/xml/gb-18030-2000.xml

According to that, about half of the characters below U+FFFF can be
processed via linear conversions, so I think we ought to save table
space by doing that.  However, the remaining stuff that has to be
processed by lookup still contains a pretty substantial number of
characters that map to 4-byte GB18030 characters, so I don't think
we can get any table size savings by adopting a bespoke table format.
We might as well use UtfToLocal.  (Worth noting in this connection
is that we haven't seen fit to sweat about UtfToLocal's use of 4-byte
table entries for other encodings, even though most of the others
are not concerned with characters outside the BMP.)

regards, tom lane




Re: [HACKERS] Patch for bug #12845 (GB18030 encoding)

2015-05-15 Thread Robert Haas
On Fri, May 15, 2015 at 3:18 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 However, I'm not that excited about changing it.  We have not heard field
 complaints about these converters being too slow.  What's more, there
 doesn't seem to be any practical way to apply the same idea to the other
 conversion direction, which means if you do feel there's a speed problem
 this would only halfway fix it.

Half a loaf is better than none.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch for bug #12845 (GB18030 encoding)

2015-05-15 Thread Tom Lane
Arjen Nienhuis a.g.nienh...@gmail.com writes:
 On Fri, May 15, 2015 at 4:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 According to that, about half of the characters below U+FFFF can be
 processed via linear conversions, so I think we ought to save table
 space by doing that.  However, the remaining stuff that has to be
 processed by lookup still contains a pretty substantial number of
 characters that map to 4-byte GB18030 characters, so I don't think
 we can get any table size savings by adopting a bespoke table format.
 We might as well use UtfToLocal.  (Worth noting in this connection
 is that we haven't seen fit to sweat about UtfToLocal's use of 4-byte
 table entries for other encodings, even though most of the others
 are not concerned with characters outside the BMP.)

 It's not about 4 vs 2 bytes, it's about using 8 bytes vs 4. UtfToLocal
 uses a sparse array:

 map = {{0, x}, {1, y}, {2, z}, ...}

 v.s.

 map = {x, y, z, ...}

 That's fine when not every code point is used, but it's different for
 GB18030 where almost all code points are used. Using a plain array
 saves space and saves a binary search.

Well, it doesn't save any space: if we get rid of the additional linear
ranges in the lookup table, what remains is 30733 entries requiring about
256K, same as (or a bit less than) what you suggest.

The point about possibly being able to do this with a simple lookup table
instead of binary search is valid, but I still say it's a mistake to
suppose that we should consider that only for GB18030.  With the reduced
table size, the GB18030 conversion tables are not all that far out of line
with the other Far Eastern conversions:

$ size utf8*.so | sort -n
    text    data     bss     dec     hex filename
    1880     512      16    2408     968 utf8_and_ascii.so
    2394     528      16    2938     b7a utf8_and_iso8859_1.so
    6674     512      16    7202    1c22 utf8_and_cyrillic.so
   24318     904      16   25238    6296 utf8_and_win.so
   28750     968      16   29734    7426 utf8_and_iso8859.so
  121110     512      16  121638   1db26 utf8_and_euc_cn.so
  123458     512      16  123986   1e452 utf8_and_sjis.so
  133606     512      16  134134   20bf6 utf8_and_euc_kr.so
  185014     512      16  185542   2d4c6 utf8_and_sjis2004.so
  185522     512      16  186050   2d6c2 utf8_and_euc2004.so
  212950     512      16  213478   341e6 utf8_and_euc_jp.so
  221394     512      16  221922   362e2 utf8_and_big5.so
  274772     512      16  275300   43364 utf8_and_johab.so
  277776     512      16  278304   43f20 utf8_and_uhc.so
  332262     512      16  332790   513f6 utf8_and_euc_tw.so
  350640     512      16  351168   55bc0 utf8_and_gbk.so
  496680     512      16  497208   79638 utf8_and_gb18030.so

If we were to get excited about reducing the conversion time for GB18030,
it would clearly make sense to use similar infrastructure for GBK, and
perhaps the EUC encodings too.

However, I'm not that excited about changing it.  We have not heard field
complaints about these converters being too slow.  What's more, there
doesn't seem to be any practical way to apply the same idea to the other
conversion direction, which means if you do feel there's a speed problem
this would only halfway fix it.

So my feeling is that the most practical and maintainable answer is to
keep GB18030 using code that is mostly shared with the other encodings.
I've committed a fix that does it that way for 9.5.  If you want to
pursue the idea of a faster conversion using direct lookup tables,
I think that would be 9.6 material at this point.

regards, tom lane




Re: [HACKERS] Patch for bug #12845 (GB18030 encoding)

2015-05-14 Thread Tom Lane
I wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, May 6, 2015 at 11:13 AM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
 Maybe not, but at the very least we should consider getting it fixed in
 9.5 rather than waiting a full development cycle.  Same as in
 https://www.postgresql.org/message-id/20150428131549.ga25...@momjian.us
 I'm not saying we MUST include it in 9.5, but we should at least
 consider it.  If we simply stash it in the open CF we guarantee that it
 will linger there for a year.

 Sure, if somebody has the time to put into it now, I'm fine with that.
 I'm afraid it won't be me, though: even if I had the time, I don't
 know enough about encodings.

 I concur that we should at least consider this patch for 9.5.  I've
 added it to
 https://wiki.postgresql.org/wiki/PostgreSQL_9.5_Open_Items

I looked at this patch a bit, and read up on GB18030 (thank you
wikipedia).  I concur we have a problem to fix.  I do not like the way
this patch went about it though, ie copying-and-pasting LocalToUtf and
UtfToLocal and their supporting routines into utf8_and_gb18030.c.
Aside from being duplicative, this means the improved mapping capability
isn't available to use with anything except GB18030.  (I do not know
whether there are any linear mapping ranges in other encodings, but
seeing that the Unicode crowd went to the trouble of defining a notation
for it in http://www.unicode.org/reports/tr22/, I'm betting there are.)

What I think would be a better solution, if slightly more invasive,
is to extend LocalToUtf and UtfToLocal to add a callback function
argument for a function of signature uint32 translate(uint32).
This function, if provided, would be called after failing to find a
mapping in the mapping table(s), and it could implement any translation
that would be better handled by code than as a boatload of mapping-table
entries.  If it returns zero then it doesn't know a translation either,
so throw error as before.
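
As a sketch of that control flow (minimal stand-ins for the real
pg_wchar.h types; an illustration of the idea, not the committed code):

typedef unsigned int uint32;

typedef struct
{
	uint32 utf;		/* Unicode code point */
	uint32 code;	/* local (e.g. GB18030) code */
} pg_utf_to_local;

typedef uint32 (*utf_local_conversion_func) (uint32);

static uint32
utf_to_local_lookup(uint32 utf, const pg_utf_to_local *map, int size,
					utf_local_conversion_func conv_func)
{
	int lo = 0;
	int hi = size - 1;

	/* binary search of the mapping table, as UtfToLocal does today */
	while (lo <= hi)
	{
		int mid = (lo + hi) / 2;

		if (map[mid].utf == utf)
			return map[mid].code;
		if (map[mid].utf < utf)
			lo = mid + 1;
		else
			hi = mid - 1;
	}

	/* no table entry: fall back to the algorithmic translation */
	if (conv_func)
		return conv_func(utf);	/* zero still means "no translation" */
	return 0;
}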

An alternative definition that could be proposed would be to call the
function before consulting the mapping tables, not after, on the grounds
that the function can probably exit cheaply if the input's not in a range
that it cares about.  However, consulting the mapping table first wins
if you have ranges that mostly work but contain a few exceptions: put
the exceptions in the mapping table and then the function need not worry
about handling them.

Another alternative approach would be to try to define linear mapping
ranges in a tabular fashion, for more consistency with what's there now.
But that probably wouldn't work terribly well because the bytewise
character representations used in this logic have to be converted into
code points before you can do any sort of linear mapping.  We could
hard-wire that conversion for UTF8, but the conversion in the other code
space would be encoding-specific.  So we might as well just treat the
whole linear mapping behavior as a black box function for each encoding.

I'm also discounting the possibility that someone would want an
algorithmic mapping for cases involving combined codes (ie pairs of
UTF8 characters).  Of the encodings we support, only EUC_JIS_2004 and
SHIFT_JIS_2004 need such cases at all, and those have only a handful of
cases; so it doesn't seem popular enough to justify the extra complexity.

I also notice that pg_gb18030_verifier isn't even close to strict enough;
it basically relies on pg_gb18030_mblen which contains no checks
whatsoever on the third and fourth bytes.  So that needs to be fixed.
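
For the four-byte case, a strict check would look something like this
(a sketch of the byte ranges from the GB18030 definition, not the
committed fix):

/* 4-byte GB18030: 0x81-0xFE, 0x30-0x39, 0x81-0xFE, 0x30-0x39 */
static int
gb18030_4byte_ok(const unsigned char *s)
{
	return s[0] >= 0x81 && s[0] <= 0xfe &&
		s[1] >= 0x30 && s[1] <= 0x39 &&
		s[2] >= 0x81 && s[2] <= 0xfe &&
		s[3] >= 0x30 && s[3] <= 0x39;
}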

The verification tightening would definitely not be something to
back-patch, and I'm inclined to think that the additional mapping
capability shouldn't be either, in view of the facts that (a) we've
had few if any field complaints yet, and (b) changing the signatures
of LocalToUtf/UtfToLocal might possibly break third-party code.
So I'm seeing this as a HEAD-only patch, but I do want to try to
squeeze it into 9.5 rather than wait another year.

Barring objections, I'll go make this happen.

regards, tom lane




Re: [HACKERS] Patch for bug #12845 (GB18030 encoding)

2015-05-06 Thread Robert Haas
On Tue, May 5, 2015 at 9:04 AM, Arjen Nienhuis a.g.nienh...@gmail.com wrote:
 Can someone look at this patch. It should fix bug #12845.

 The current tests for conversions are very minimal. I expanded them a
 bit for this bug.

 I think the binary search in the .map files should be removed but I
 leave that for another patch.

Please add this patch to
https://commitfest.postgresql.org/action/commitfest_view/open so we
don't forget about it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch for bug #12845 (GB18030 encoding)

2015-05-06 Thread Alvaro Herrera
Robert Haas wrote:
 On Tue, May 5, 2015 at 9:04 AM, Arjen Nienhuis a.g.nienh...@gmail.com wrote:
  Can someone look at this patch. It should fix bug #12845.
 
  The current tests for conversions are very minimal. I expanded them a
  bit for this bug.
 
  I think the binary search in the .map files should be removed but I
  leave that for another patch.
 
 Please add this patch to
 https://commitfest.postgresql.org/action/commitfest_view/open so we
 don't forget about it.

If we think this is a bug fix, we should add it to the open items list,
https://wiki.postgresql.org/wiki/Open_Items

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Patch for bug #12845 (GB18030 encoding)

2015-05-06 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, May 6, 2015 at 11:13 AM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
 Maybe not, but at the very least we should consider getting it fixed in
 9.5 rather than waiting a full development cycle.  Same as in
 https://www.postgresql.org/message-id/20150428131549.ga25...@momjian.us
 I'm not saying we MUST include it in 9.5, but we should at least
 consider it.  If we simply stash it in the open CF we guarantee that it
 will linger there for a year.

 Sure, if somebody has the time to put into it now, I'm fine with that.
 I'm afraid it won't be me, though: even if I had the time, I don't
 know enough about encodings.

I concur that we should at least consider this patch for 9.5.  I've
added it to
https://wiki.postgresql.org/wiki/PostgreSQL_9.5_Open_Items

I'm willing to look at it myself, whenever my non-copious spare time
permits; but that won't be in the immediate future.

regards, tom lane




Re: [HACKERS] Patch for bug #12845 (GB18030 encoding)

2015-05-06 Thread Robert Haas
On Wed, May 6, 2015 at 11:13 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 It's a behavior change, so I don't think we would consider a back-patch.

 Maybe not, but at the very least we should consider getting it fixed in
 9.5 rather than waiting a full development cycle.  Same as in
 https://www.postgresql.org/message-id/20150428131549.ga25...@momjian.us
 I'm not saying we MUST include it in 9.5, but we should at least
 consider it.  If we simply stash it in the open CF we guarantee that it
 will linger there for a year.

Sure, if somebody has the time to put into it now, I'm fine with that.
I'm afraid it won't be me, though: even if I had the time, I don't
know enough about encodings.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch for bug #12845 (GB18030 encoding)

2015-05-06 Thread Alvaro Herrera
Robert Haas wrote:
 On Wed, May 6, 2015 at 10:55 AM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
  Robert Haas wrote:
  On Tue, May 5, 2015 at 9:04 AM, Arjen Nienhuis a.g.nienh...@gmail.com 
  wrote:
   Can someone look at this patch. It should fix bug #12845.
  
   The current tests for conversions are very minimal. I expanded them a
   bit for this bug.
  
   I think the binary search in the .map files should be removed but I
   leave that for another patch.
 
  Please add this patch to
  https://commitfest.postgresql.org/action/commitfest_view/open so we
  don't forget about it.
 
  If we think this is a bug fix, we should add it to the open items list,
  https://wiki.postgresql.org/wiki/Open_Items
 
 It's a behavior change, so I don't think we would consider a back-patch.

Maybe not, but at the very least we should consider getting it fixed in
9.5 rather than waiting a full development cycle.  Same as in
https://www.postgresql.org/message-id/20150428131549.ga25...@momjian.us
I'm not saying we MUST include it in 9.5, but we should at least
consider it.  If we simply stash it in the open CF we guarantee that it
will linger there for a year.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Patch for bug #12845 (GB18030 encoding)

2015-05-06 Thread Robert Haas
On Wed, May 6, 2015 at 10:55 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Robert Haas wrote:
 On Tue, May 5, 2015 at 9:04 AM, Arjen Nienhuis a.g.nienh...@gmail.com 
 wrote:
  Can someone look at this patch. It should fix bug #12845.
 
  The current tests for conversions are very minimal. I expanded them a
  bit for this bug.
 
  I think the binary search in the .map files should be removed but I
  leave that for another patch.

 Please add this patch to
 https://commitfest.postgresql.org/action/commitfest_view/open so we
 don't forget about it.

 If we think this is a bug fix, we should add it to the open items list,
 https://wiki.postgresql.org/wiki/Open_Items

It's a behavior change, so I don't think we would consider a back-patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

