On Thu, Feb 13, 2014 at 09:47:01PM -0500, Bruce Momjian wrote:
> On Wed, Oct 16, 2013 at 02:17:11PM -0400, Bruce Momjian wrote:
> > > > You can see the UTF8 case is fine because \n is considered greater
> > > > than space, but in the C locale, where \n is less than space, the
> > > > false return value shows the problem with
> > > > internal_bpchar_pattern_compare() trimming the string and first
> > > > comparing on lengths.  This is exactly the problem you outline, where
> > > > space trimming assumes everything is less than a space.
> > >
> > > For collations other than C some of those issues that have to do with
> > > string comparisons might simply be hidden, depending on how strcoll()
> > > handles inputs of different lengths: If strcoll() applies implicit
> > > space padding to the shorter value, there won't be any visible
> > > difference in ordering between bpchar and varchar values. If strcoll()
> > > does not apply such space padding, the right-trimming of bpchar values
> > > causes very similar issues even in an en_US collation.
>
> I have added the attached C comment to explain the problem, and added a
> TODO item to fix it if we ever break binary upgrading.
>
> Does anyone think this warrants a doc mention?
I have done some more thinking on this and found a way to document the
behavior, which reduces our need to actually fix it some day.  I am afraid
the behavioral change needed to fix this might break so many applications
that the fix will never be done, though I will keep the TODO item until I
get more feedback on that.

Patch attached.

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
new file mode 100644
index 30fd9bb..9635004
*** a/doc/src/sgml/datatype.sgml
--- b/doc/src/sgml/datatype.sgml
*************** SELECT '52093.89'::money::numeric::float
*** 1072,1081 ****
     <para>
      Values of type <type>character</type> are physically padded
      with spaces to the specified width <replaceable>n</>, and are
!     stored and displayed that way.  However, the padding spaces are
!     treated as semantically insignificant.  Trailing spaces are
!     disregarded when comparing two values of type <type>character</type>,
!     and they will be removed when converting a <type>character</type> value
      to one of the other string types.  Note that trailing spaces
      <emphasis>are</> semantically significant in
      <type>character varying</type> and <type>text</type> values, and
--- 1072,1084 ----
     <para>
      Values of type <type>character</type> are physically padded
      with spaces to the specified width <replaceable>n</>, and are
!     stored and displayed that way.  However, trailing spaces are treated as
!     semantically insignificant and disregarded when comparing two values
!     of type <type>character</type>.  In collations where whitespace
!     is significant, this behavior can produce unexpected results,
!     e.g. <command>SELECT 'a '::CHAR(2) collate "C" &lt; E'a\n'::CHAR(2)</command>
!     returns true, even though the C locale sorts a space after a newline.
!     Trailing spaces are removed when converting a <type>character</type> value
      to one of the other string types.  Note that trailing spaces
      <emphasis>are</> semantically significant in
      <type>character varying</type> and <type>text</type> values, and
diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
new file mode 100644
index 284b5d1..502ca44
*** a/src/backend/utils/adt/varchar.c
--- b/src/backend/utils/adt/varchar.c
*************** bpcharcmp(PG_FUNCTION_ARGS)
*** 846,863 ****
  				len2;
  	int			cmp;
  
- 	/*
- 	 * Trimming trailing spaces off of both strings can cause a string
- 	 * with a character less than a space to compare greater than a
- 	 * space-extended string, e.g. this returns false:
- 	 *		SELECT E'ab\n'::CHAR(10) < E'ab '::CHAR(10);
- 	 * even though '\n' is less than the space if CHAR(10) was
- 	 * space-extended.  The correct solution would be to trim only
- 	 * the longer string to be the same length of the shorter, if
- 	 * possible, then do the comparison.  However, changing this
- 	 * might break existing indexes, breaking binary upgrades.
- 	 * For details, see http://www.postgresql.org/message-id/CAK+WP1xdmyswEehMuetNztM4H199Z1w9KWRHVMKzyyFM+hV=z...@mail.gmail.com
- 	 */
  	len1 = bcTruelen(arg1);
  	len2 = bcTruelen(arg2);
  
--- 846,851 ----
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers