Re: [BUGS] check_locale() and the empty string
On Sat, 2012-03-24 at 19:07 -0400, Tom Lane wrote: Jeff Davis pg...@j-davis.com writes: Surely we don't want it to be set from the environment, right? Why not? I agree that we shouldn't change the documented behavior of those GUCs. But a SQL command like CREATE DATABASE being environment sensitive does seem like surprising behavior to me, and it did indirectly lead to a bug. I do agree that it's probably unwise to store an empty string as the value of pg_database.datcollate or datctype, because that would mean that if the server is restarted with different LC_XXX environment values then it will think the database has different locale settings, leading to havoc. Yes, that's the worst of the problem. I should have mentioned that more explicitly in the original report. However, ISTM the right fix is to replace an empty-string value with the implied locale name at createdb time. Proposed patch attached. +1. Note 2: there is code in initdb that is supposed to be kept parallel to this, but it's not currently making any attempt to canonicalize non-empty locale names. Should we make it do that too? I assume you are talking about the code that results in writing the settings to postgresql.conf? It doesn't look quite as dangerous in that area because (a) it ignores zero-length strings; and (b) setting the GUC to the wrong value will either be prevented or will not cause any major problem. However, it does seem like a good idea to canonicalize the settings unless there is some reason not to. Regards, Jeff Davis -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] check_locale() and the empty string
Jeff Davis pg...@j-davis.com writes: On Sat, 2012-03-24 at 19:07 -0400, Tom Lane wrote: Jeff Davis pg...@j-davis.com writes: Surely we don't want it to be set from the environment, right? Why not? I agree that we shouldn't change the documented behavior of those GUCs. But a SQL command like CREATE DATABASE being environment sensitive does seem like surprising behavior to me, and it did indirectly lead to a bug. Well, our locale documentation makes it pretty clear that a lot of this behavior is inherited from the server's environment by default: http://www.postgresql.org/docs/devel/static/locale.html So I don't see anything wrong with that in principle. But we'd better make sure that a database's lc_collate/lc_ctype are locked down after creation. Note 2: there is code in initdb that is supposed to be kept parallel to this, but it's not currently making any attempt to canonicalize non-empty locale names. Should we make it do that too? I assume you are talking about the code that results in writing the settings to postgresql.conf? It doesn't look quite as dangerous in that area because (a) it ignores zero-length strings; and (b) setting the GUC to the wrong value will either be prevented or will not cause any major problem. However, it does seem like a good idea to canonicalize the settings unless there is some reason not to. When I wrote the proposed patch, I was imagining that setlocale() would in fact do some canonicalization of the supplied string. Experimentation shows that's not so, though, at least not on current Linux and OS X: you seem to get back the supplied string in all cases except where it's . The reason I was hoping for canonicalization is that right now we consider locale names like en_US.utf8 and en_US.UTF-8 to be different, even though libc very probably treats them the same; this results in CREATE DATABASE rejecting lc_collate/ctype settings that are only cosmetically different from the template database's values. Canonicalization by setlocale() would fix that without requiring us to make any unsupported assumptions about which names mean the same. Oh well. However, it might still be that somewhere there is a libc that does take the opportunity to canonicalize the locale name, and if that did happen then it would be important for initdb to do it the same way as CREATE DATABASE does. Otherwise, we might end up rejecting a CREATE DATABASE lc_collate/ctype setting that's identical to what the user told initdb to use, because one got canonicalized and the other not. So this roundabout series of assumptions leads me to think that initdb needs to be tweaked too. regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
[BUGS] possible bug in COPY ... FROM ... NULL '\0'
The documentation says: The specified null string is sent by COPY TO without adding any backslashes; conversely, COPY FROM matches the input against the null string before removing backslashes. -- http://www.postgresql.org/docs/devel/static/sql-copy.html Those seem inconsistent with the following behavior: postgres=# copy foo to '/tmp/a.copy' null '\0'; COPY 2 postgres=# copy foo from '/tmp/a.copy' null '\0'; ERROR: invalid byte sequence for encoding UTF8: 0x00 CONTEXT: COPY foo, line 2: \0 $ cat /tmp/a.copy 1 \0 COPY TO seems to follow the documentation, inserting the null string without modification into the output file. COPY FROM seems to de-escape the input before trying to match it against the null string, leading to the invalid byte sequence. standard_conforming_strings is on. Regards, Jeff Davis -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
Re: [BUGS] possible bug in COPY ... FROM ... NULL '\0'
Jeff Davis pg...@j-davis.com writes: The documentation says: The specified null string is sent by COPY TO without adding any backslashes; conversely, COPY FROM matches the input against the null string before removing backslashes. -- http://www.postgresql.org/docs/devel/static/sql-copy.html Those seem inconsistent with the following behavior: postgres=# copy foo to '/tmp/a.copy' null '\0'; COPY 2 postgres=# copy foo from '/tmp/a.copy' null '\0'; ERROR: invalid byte sequence for encoding UTF8: 0x00 CONTEXT: COPY foo, line 2: \0 $ cat /tmp/a.copy 1 \0 COPY TO seems to follow the documentation, inserting the null string without modification into the output file. COPY FROM seems to de-escape the input before trying to match it against the null string, leading to the invalid byte sequence. This seems to have worked as per the documentation before 8.4. I guess it got broken in one of the efficiency-driven rewrites of the COPY logic. Specifically, the issue seems to be that CopyReadAttributesText tries to combine de-escaping with locating the end-of-field, so it does all the de-escaping --- and the ensuing encoding validity check --- before checking to see if the input matches the defined null string. I'm pretty sure that the original code first broke the line into fields, then checked the null marker, then did de-escaping. Fortunately, it seems pretty easy to fix: we just have to reverse the order of the null-marker check and the pg_verifymbstr call. However, this is dependent on the parsing loop never having a reason to throw error, which at the very least is something that needs a comment. regards, tom lane -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
[BUGS] BUG #6556: earthdistance module has lacking documentation
The following bug has been logged on the website: Bug reference: 6556 Logged by: Kasper Sandberg Email address: kont...@sandberg-consult.dk PostgreSQL version: 9.1.3 Operating system: - Description: looking at http://www.postgresql.org/docs/9.1/static/earthdistance.html it looks very scarce and missing, and conflicting, some things operate based on miles, others meters. it also doesnt specify the units used for earth_distance() which seems to be meters. -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs