Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
On Mon, Nov 22, 2010 at 12:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: No, especially if it results in queries that used to work breaking, which it well could. But I'm not sure where to go with it from there, beyond throwing up my hands. Well, that's why there's been no movement on this since 2004 :-(. The amount of work needed for a better solution seems far out of proportion to the benefits. We could extend the existing logic to handle multi-bytes characters though, couldn't we? It's not going to fix all the problems but at least it'll do something sane. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
Greg Stark gsst...@mit.edu writes: On Mon, Nov 22, 2010 at 12:38 AM, Tom Lane t...@sss.pgh.pa.us wrote: Well, that's why there's been no movement on this since 2004 :-(. The amount of work needed for a better solution seems far out of proportion to the benefits. We could extend the existing logic to handle multi-bytes characters though, couldn't we? It's not going to fix all the problems but at least it'll do something sane. Not easily, cheaply, or portably. The closest you could get in that line would be to use towlower(), which doesn't exist everywhere (though I grant probably most platforms have it by now). The much much bigger problem though is that we don't know what character representation towlower() deals in. We recently kluged the regex code to assume that the wchar_t representation for UTF8 locales is the standardized Unicode code point. I haven't heard of that breaking, but 9.0 hasn't been out that long. In other multibyte encodings we have no idea how to use that function, short of invoking mbstowcs/wcstombs or local equivalent, which is expensive and doesn't readily allow a short-circuit for ASCII. And, after you've hacked your way through all that, you still end up with case-folding behavior that depends on the prevailing locale. Which is dangerous for the previously cited reasons, and arguably not spec-compliant. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
On 11/23/2010 11:14 AM, Greg Stark wrote: On Mon, Nov 22, 2010 at 12:38 AM, Tom Lanet...@sss.pgh.pa.us wrote: No, especially if it results in queries that used to work breaking, which it well could. But I'm not sure where to go with it from there, beyond throwing up my hands. Well, that's why there's been no movement on this since 2004 :-(. The amount of work needed for a better solution seems far out of proportion to the benefits. We could extend the existing logic to handle multi-bytes characters though, couldn't we? It's not going to fix all the problems but at least it'll do something sane. What casing rules will you apply? How will you know what is an upper case character and what its lower case character is? The sad, short answer is that there are no simple rules beyond ASCII. See the URL I posted upthread. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
On Tue, Nov 23, 2010 at 5:12 PM, Tom Lane t...@sss.pgh.pa.us wrote: And, after you've hacked your way through all that, you still end up with case-folding behavior that depends on the prevailing locale. Which is dangerous for the previously cited reasons, and arguably not spec-compliant. So I thought the problem with the Turkish locale definition was that it redefined how a capital ascii character which was present in standard SQL identifiers was lowercased. Resulting in standard SQL syntax not working. I'm not sure I understand the danger if a user creates an object in a database with a particular encoding and locale using that locale for downcasing in the future. We don't currently support changing the locale of a database or using different locales in the future. Even with Peter's patch I think we can reasonably require the user to specify a single locale which controls how the SQL identifiers are interpreted regardless of the collations used in the operations. The points about the C API being limited and nonportable are a different issue.I guess I would need to do research to see if we're missing something which would help here. Otherwise I might be beginning to see the value in that /other/ library which I've argued against in the past. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
Greg Stark gsst...@mit.edu writes: I'm not sure I understand the danger if a user creates an object in a database with a particular encoding and locale using that locale for downcasing in the future. The case I was worried about is dumping from one database and reloading into another one with a different locale. Although I suppose there are enough *other* reasons why that might fail that adding changes of downcasing behavior might not be a big deal. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
On Tue, Nov 23, 2010 at 5:51 PM, Tom Lane t...@sss.pgh.pa.us wrote: The case I was worried about is dumping from one database and reloading into another one with a different locale. Although I suppose there are enough *other* reasons why that might fail that adding changes of downcasing behavior might not be a big deal. If you dump the whole database then pg_dump would create the new database with the correct encoding and locale. If you change it then that can already cause it to fail if the data can't be converted to the new encoding. And as you point out there are all kinds of ways you can cause that to fail by making the context incompatible with the definitions you're loading. The lesson we learned in the past is that we have to ignore the locale for all the characters present in the standard identifiers. Beyond that I think this is just an implementation problem which may be a show stopper in itself but if we can do anything with mulitbyte characters it's probably an improvement over what we do now. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
On sön, 2010-11-21 at 18:48 -0500, Tom Lane wrote: Yeah. I'm actually not sure that the SQL committee has thought very hard about this, because the spec is worded as though they think that Unicode case normalization is all they have to say to uniquely define what to do. The Unicode guys recognize that case mapping is locale-specific, which puts us right back at square one. But leaving spec compliance aside, we know from bitter experience that we cannot use a definition that lets the Turkish locale fool with the mapping of i/I. I suspect that locale-dependent mappings of any other characters are just as bad, we simply haven't had enough users burnt by such cases to have an institutional memory of it. The number of locale-dependent case mappings in the entire universe of Unicode is actually limited to 7 cases for Lithuanian and 8 cases for Turkish. (ftp://ftp.unicode.org/Public/UNIDATA/SpecialCasing.txt) So it would be fair to say that there is a default case mapping, and that is what the SQL standard presumably refers to. One thing that we could do is let the user declare that he thinks his current locale is consistent with the Unicode case normalization, and apply the full Unicode conversion if so. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
On Wed, Jul 7, 2010 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote: Rajanikant Chirmade rajanikant.chirm...@enterprisedb.com writes: Every identifier is downcase truncated by function downcase_truncate_identifier() before using it. But since the function downcase_truncate_identifier() is not multibyte-charecter aware, it is not able to downcase some of special charecters in identifier like my_SchemÄ. IIRC this is intentional. Please consult the archives for previous discussions. Why would this be intentional? One concern I have about this approach is that I am guessing that the current implementation of str_tolower() is a lot slower than the current implementation of downcase_truncate_identifier(). It would be nice to have an implementation that is capable of handling wide characters but doesn't actually incur the speed penalty unless a wide character is actually present. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
Robert Haas robertmh...@gmail.com writes: On Wed, Jul 7, 2010 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote: IIRC this is intentional. Please consult the archives for previous discussions. Why would this be intentional? Well, it's intentional for lack of any infrastructure that would allow a more spec-compliant approach. As you say, calling str_tolower here is probably a non-starter for performance reasons. Another big problem is that str_tolower produces a locale-specific downcasing conversion. This (a) is going to create portability headaches of the first magnitude, and (b) is not really an advance in terms of spec compliance. The SQL spec says that identifier case folding should be done according to the Unicode standard, but it's not safe to assume that any random platform-specific locale is going to act that way. A specific example of a locale that is known to NOT behave acceptably is Turkish: they have weird ideas about i versus I, which in fact broke things back when we used to use tolower for this purpose. See the archives from early 2004, and in particular commit 59f9a0b9df0d224bb62ff8ec5b65e0b187655742, which removed the exact same logic (though not wide-character-aware) that this patch proposes to put back. I think the given patch can be rejected out of hand. If the OP has any ideas about doing non-locale-dependent case folding at an acceptable speed, I'm happy to listen. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
On Sun, Nov 21, 2010 at 4:41 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Wed, Jul 7, 2010 at 10:07 AM, Tom Lane t...@sss.pgh.pa.us wrote: IIRC this is intentional. Please consult the archives for previous discussions. Why would this be intentional? Well, it's intentional for lack of any infrastructure that would allow a more spec-compliant approach. As you say, calling str_tolower here is probably a non-starter for performance reasons. Another big problem is that str_tolower produces a locale-specific downcasing conversion. This (a) is going to create portability headaches of the first magnitude, and (b) is not really an advance in terms of spec compliance. The SQL spec says that identifier case folding should be done according to the Unicode standard, but it's not safe to assume that any random platform-specific locale is going to act that way. A specific example of a locale that is known to NOT behave acceptably is Turkish: they have weird ideas about i versus I, which in fact broke things back when we used to use tolower for this purpose. See the archives from early 2004, and in particular commit 59f9a0b9df0d224bb62ff8ec5b65e0b187655742, which removed the exact same logic (though not wide-character-aware) that this patch proposes to put back. I think the given patch can be rejected out of hand. If the OP has any ideas about doing non-locale-dependent case folding at an acceptable speed, I'm happy to listen. I think that's fair. It actually doesn't seem like it should be that hard if we knew that the server encoding were UTF8 - it's just a big translation table somewhere, no? We use heuristics to copy as many characters as possible without detailed examination and consult the lookup table for the rest. However, that's not very practical in the face of more than one encoding that must be handled. What sort of infrastructure would actually be useful for dealing with this problem? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
On 11/21/2010 06:09 PM, Robert Haas wrote: I think that's fair. It actually doesn't seem like it should be that hard if we knew that the server encoding were UTF8 - it's just a big translation table somewhere, no? No, it's far more complex. See for example http://unicode.org/reports/tr21/tr21-3.html, which says: There are a number of complications to case mappings that occur once the repertoire of characters is expanded beyond ASCII. * Because of the inclusion of certain composite characters for compatibility, such as 01F1 DZ /capital dz/, there is a third case, called /titlecase/, which is used where the first letter of a word is to be capitalized (e.g. Titlecase, vs. UPPERCASE, or lowercase). o For example, the title case of the example character is 01F2 Dz /capital d with small z/. * Case mappings may produce strings of different length than the original. o For example, the German character 00DF ß /small letter sharp s/ expands when uppercased to the sequence of two characters SS. This also occurs where there is no precomposed character corresponding to a case mapping, such as with 0149 'n /latin small letter n preceded by apostrophe./ * Characters may also have different case mappings, depending on the context. o For example, 03A3 ? /capital sigma/ lowercases to 03C3 ? /small sigma/ if it is followed by another letter, but lowercases to 03C2 ? /small final sigma/ if it is not. * Characters may have case mappings that depend on the locale. o For example, in Turkish the letter 0049 I /capital letter i/ lowercases to 0131 ? /small dotless i/. * Case mappings are not, in general, reversible. o For example, once the string McGowan has been uppercased, lowercased or titlecased, the original cannot be recovered by applying another uppercase, lowercase, or titlecase operation. cheers andrew
Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
On Sun, Nov 21, 2010 at 6:22 PM, Andrew Dunstan and...@dunslane.net wrote: On 11/21/2010 06:09 PM, Robert Haas wrote: I think that's fair. It actually doesn't seem like it should be that hard if we knew that the server encoding were UTF8 - it's just a big translation table somewhere, no? No, it's far more complex. See for example http://unicode.org/reports/tr21/tr21-3.html, which says: There are a number of complications to case mappings that occur once the repertoire of characters is expanded beyond ASCII. Because of the inclusion of certain composite characters for compatibility, such as 01F1 DZ capital dz, there is a third case, called titlecase, which is used where the first letter of a word is to be capitalized (e.g. Titlecase, vs. UPPERCASE, or lowercase). For example, the title case of the example character is 01F2 Dz capital d with small z. Case mappings may produce strings of different length than the original. For example, the German character 00DF ß small letter sharp s expands when uppercased to the sequence of two characters SS. This also occurs where there is no precomposed character corresponding to a case mapping, such as with 0149 ʼn latin small letter n preceded by apostrophe. Characters may also have different case mappings, depending on the context. For example, 03A3 Σ capital sigma lowercases to 03C3 σ small sigma if it is followed by another letter, but lowercases to 03C2 ς small final sigma if it is not. Characters may have case mappings that depend on the locale. For example, in Turkish the letter 0049 I capital letter i lowercases to 0131 ı small dotless i. Case mappings are not, in general, reversible. For example, once the string McGowan has been uppercased, lowercased or titlecased, the original cannot be recovered by applying another uppercase, lowercase, or titlecase operation. Yikes. So what do people do about this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
Andrew Dunstan and...@dunslane.net writes: On 11/21/2010 06:09 PM, Robert Haas wrote: I think that's fair. It actually doesn't seem like it should be that hard if we knew that the server encoding were UTF8 - it's just a big translation table somewhere, no? No, it's far more complex. See for example http://unicode.org/reports/tr21/tr21-3.html, which says: Yeah. I'm actually not sure that the SQL committee has thought very hard about this, because the spec is worded as though they think that Unicode case normalization is all they have to say to uniquely define what to do. The Unicode guys recognize that case mapping is locale-specific, which puts us right back at square one. But leaving spec compliance aside, we know from bitter experience that we cannot use a definition that lets the Turkish locale fool with the mapping of i/I. I suspect that locale-dependent mappings of any other characters are just as bad, we simply haven't had enough users burnt by such cases to have an institutional memory of it. But for example do you really think it's a good idea if pg_dump and reload into a DB with a different locale results in changing the normalized form of SQL identifiers? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
On Sun, Nov 21, 2010 at 6:48 PM, Tom Lane t...@sss.pgh.pa.us wrote: Andrew Dunstan and...@dunslane.net writes: On 11/21/2010 06:09 PM, Robert Haas wrote: I think that's fair. It actually doesn't seem like it should be that hard if we knew that the server encoding were UTF8 - it's just a big translation table somewhere, no? No, it's far more complex. See for example http://unicode.org/reports/tr21/tr21-3.html, which says: Yeah. I'm actually not sure that the SQL committee has thought very hard about this, because the spec is worded as though they think that Unicode case normalization is all they have to say to uniquely define what to do. The Unicode guys recognize that case mapping is locale-specific, which puts us right back at square one. But leaving spec compliance aside, we know from bitter experience that we cannot use a definition that lets the Turkish locale fool with the mapping of i/I. I suspect that locale-dependent mappings of any other characters are just as bad, we simply haven't had enough users burnt by such cases to have an institutional memory of it. But for example do you really think it's a good idea if pg_dump and reload into a DB with a different locale results in changing the normalized form of SQL identifiers? No, especially if it results in queries that used to work breaking, which it well could. But I'm not sure where to go with it from there, beyond throwing up my hands. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
Robert Haas robertmh...@gmail.com writes: On Sun, Nov 21, 2010 at 6:48 PM, Tom Lane t...@sss.pgh.pa.us wrote: spec compliance aside, we know from bitter experience that we cannot use a definition that lets the Turkish locale fool with the mapping of i/I. I suspect that locale-dependent mappings of any other characters are just as bad, we simply haven't had enough users burnt by such cases to have an institutional memory of it. But for example do you really think it's a good idea if pg_dump and reload into a DB with a different locale results in changing the normalized form of SQL identifiers? No, especially if it results in queries that used to work breaking, which it well could. But I'm not sure where to go with it from there, beyond throwing up my hands. Well, that's why there's been no movement on this since 2004 :-(. The amount of work needed for a better solution seems far out of proportion to the benefits. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
On Mon, Jul 26, 2010 at 12:40 AM, Rajanikant Chirmade rajanikant.chirm...@enterprisedb.com wrote: Since discussion stopped in discussion thread http://archives.postgresql.org/pgsql-bugs/2006-09/msg00128.php Are there any implications of this change in handling identifiers ? Thanks Regards, Rajanikant Chirmade An even more relevant message appears to be this one: http://archives.postgresql.org/pgsql-bugs/2006-09/msg00133.php Both this and the comment in downcase_truncate_identifier() suggests that the current method is attributable to lack of support for Unicode-aware case normalization and is known not to work correctly in all locales. Locale and encoding stuff isn't really my area of expertise, but if now have support for Unicode-aware case normalization, shouldn't we be using it here, too? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
Since discussion stopped in discussion thread http://archives.postgresql.org/pgsql-bugs/2006-09/msg00128.php Are there any implications of this change in handling identifiers ? Thanks Regards, Rajanikant Chirmade On Tue, Jul 13, 2010 at 12:10 AM, Rajanikant Chirmade rajanikant.chirm...@enterprisedb.com wrote: On Wed, Jul 7, 2010 at 7:37 PM, Tom Lane t...@sss.pgh.pa.us wrote: Rajanikant Chirmade rajanikant.chirm...@enterprisedb.com writes: Every identifier is downcase truncated by function downcase_truncate_identifier() before using it. But since the function downcase_truncate_identifier() is not multibyte-charecter aware, it is not able to downcase some of special charecters in identifier like my_SchemÄ. IIRC this is intentional. Please consult the archives for previous discussions. regards, tom lane I got one discussion thread on same issue. But it stopped without any conclusion. http://archives.postgresql.org/pgsql-bugs/2006-09/msg00128.php Thanks Regards, Rajanikant Chirmade.
Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
On Wed, Jul 7, 2010 at 7:37 PM, Tom Lane t...@sss.pgh.pa.us wrote: Rajanikant Chirmade rajanikant.chirm...@enterprisedb.com writes: Every identifier is downcase truncated by function downcase_truncate_identifier() before using it. But since the function downcase_truncate_identifier() is not multibyte-charecter aware, it is not able to downcase some of special charecters in identifier like my_SchemÄ. IIRC this is intentional. Please consult the archives for previous discussions. regards, tom lane I got one discussion thread on same issue. But it stopped without any conclusion. http://archives.postgresql.org/pgsql-bugs/2006-09/msg00128.php Thanks Regards, Rajanikant Chirmade.
Re: [HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
Rajanikant Chirmade rajanikant.chirm...@enterprisedb.com writes: Every identifier is downcase truncated by function downcase_truncate_identifier() before using it. But since the function downcase_truncate_identifier() is not multibyte-charecter aware, it is not able to downcase some of special charecters in identifier like my_SchemÄ. IIRC this is intentional. Please consult the archives for previous discussions. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] multibyte-character aware support for function downcase_truncate_identifier()
Hi All, Every identifier is downcase truncated by function downcase_truncate_identifier() before using it. But since the function downcase_truncate_identifier() is not multibyte-charecter aware, it is not able to downcase some of special charecters in identifier like my_SchemÄ. If schema is created of name my_SchemÄ, pg_namespace shows entries as my_schemÄ . Example is as below : postgres=# create schema my_SchemÄ; CREATE SCHEMA postgres=# select nspname from pg_namespace; nspname pg_toast pg_temp_1 pg_toast_temp_1 pg_catalog public information_schema my_schemÄ (7 rows) postgres=# Achually it should downcase as my_schemä as per multibyte-character aware as lower() works : postgres=# select lower('my_SchemÄ'); lower --- my_schemä (1 row) There is function str_tolower() which work as multibyte-character aware. Need to use same function where ever downcase required. So, it will create uniform down-casing at all places. two places identified where need to add wide-character aware downcase : 1. downcase_truncate_identifier(); - Attaching patch for changes and small test case. Following functions should also synchronise with downcase_truncate_identifier() : 2. pg_strcasecmp(); 3. pg_strncasecmp(); - to add fix at these functions (2,3) need to move str_tolower() from formatting.c from backend to some common location (may be in src/port) from where these can be used with client as well as server. Thanks Regards, Rajanikant Chirmade. diff --git a/orig/postgresql-9.0beta2/src/backend/parser/scansup.c b/postgresql-9.0beta2/src/backend/parser/scansup.c index 94082f7..179b37e 100644 --- a/orig/postgresql-9.0beta2/src/backend/parser/scansup.c +++ b/postgresql-9.0beta2/src/backend/parser/scansup.c @@ -129,33 +129,11 @@ char * downcase_truncate_identifier(const char *ident, int len, bool warn) { char *result; - int i; - - result = palloc(len + 1); - - /* - * SQL99 specifies Unicode-aware case normalization, which we don't yet - * have the infrastructure for. Instead we use tolower() to provide a - * locale-aware translation. However, there are some locales where this - * is not right either (eg, Turkish may do strange things with 'i' and - * 'I'). Our current compromise is to use tolower() for characters with - * the high bit set, and use an ASCII-only downcasing for 7-bit - * characters. - */ - for (i = 0; i len; i++) - { - unsigned char ch = (unsigned char) ident[i]; - if (ch = 'A' ch = 'Z') - ch += 'a' - 'A'; - else if (IS_HIGHBIT_SET(ch) isupper(ch)) - ch = tolower(ch); - result[i] = (char) ch; - } - result[i] = '\0'; + result = str_tolower(ident, len); - if (i = NAMEDATALEN) - truncate_identifier(result, i, warn); + if (len = NAMEDATALEN) + truncate_identifier(result, len, warn); return result; } --- This tests if identifier with special charecters using wide-charecter aware downcase. create schema my_SchemÄ; --- Since we smash identifiers to lower we try to find schema name --- by downcasing nspname. select count(nspname) from pg_namespace where nspname=LOWER('my_SchemÄ'); drop schema my_SchemÄ; wide-charecter_aware_downcase.out Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers