Re: [BUGS] check_locale() and the empty string

2012-03-25 Thread Jeff Davis
On Sat, 2012-03-24 at 19:07 -0400, Tom Lane wrote:
 Jeff Davis pg...@j-davis.com writes:
  Surely we don't want it to be set from the environment, right?
 
 Why not?

I agree that we shouldn't change the documented behavior of those GUCs.

But a SQL command like CREATE DATABASE being environment sensitive does
seem like surprising behavior to me, and it did indirectly
lead to a bug.

 I do agree that it's probably unwise to store an empty string as the
 value of pg_database.datcollate or datctype, because that would mean
 that if the server is restarted with different LC_XXX environment values
 then it will think the database has different locale settings, leading
 to havoc.

Yes, that's the worst of the problem. I should have mentioned that more
explicitly in the original report.

 However, ISTM the right fix is to replace an empty-string
 value with the implied locale name at createdb time.  Proposed patch
 attached.

+1.

 Note 2: there is code in initdb that is supposed to be kept parallel
 to this, but it's not currently making any attempt to canonicalize
 non-empty locale names.  Should we make it do that too?

I assume you are talking about the code that results in writing the
settings to postgresql.conf?

It doesn't look quite as dangerous in that area because (a) it ignores
zero-length strings; and (b) setting the GUC to the wrong value will
either be prevented or will not cause any major problem. However, it
does seem like a good idea to canonicalize the settings unless there is
some reason not to.

Regards,
Jeff Davis


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] check_locale() and the empty string

2012-03-25 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 On Sat, 2012-03-24 at 19:07 -0400, Tom Lane wrote:
 Jeff Davis pg...@j-davis.com writes:
 Surely we don't want it to be set from the environment, right?

 Why not?

 I agree that we shouldn't change the documented behavior of those GUCs.
 But a SQL command like CREATE DATABASE being environment sensitive does
 seem like surprising behavior to me, and it did indirectly
 lead to a bug.

Well, our locale documentation makes it pretty clear that a lot of this
behavior is inherited from the server's environment by default:
http://www.postgresql.org/docs/devel/static/locale.html
So I don't see anything wrong with that in principle.  But we'd better
make sure that a database's lc_collate/lc_ctype are locked down after
creation.

 Note 2: there is code in initdb that is supposed to be kept parallel
 to this, but it's not currently making any attempt to canonicalize
 non-empty locale names.  Should we make it do that too?

 I assume you are talking about the code that results in writing the
 settings to postgresql.conf?

 It doesn't look quite as dangerous in that area because (a) it ignores
 zero-length strings; and (b) setting the GUC to the wrong value will
 either be prevented or will not cause any major problem. However, it
 does seem like a good idea to canonicalize the settings unless there is
 some reason not to.

When I wrote the proposed patch, I was imagining that setlocale()
would in fact do some canonicalization of the supplied string.
Experimentation shows that's not so, though, at least not on current
Linux and OS X: you seem to get back the supplied string in all cases
except where it's .

The reason I was hoping for canonicalization is that right now we
consider locale names like en_US.utf8 and en_US.UTF-8 to be
different, even though libc very probably treats them the same;
this results in CREATE DATABASE rejecting lc_collate/ctype settings
that are only cosmetically different from the template database's
values.  Canonicalization by setlocale() would fix that without
requiring us to make any unsupported assumptions about which names
mean the same.  Oh well.

However, it might still be that somewhere there is a libc that does
take the opportunity to canonicalize the locale name, and if that
did happen then it would be important for initdb to do it the same
way as CREATE DATABASE does.  Otherwise, we might end up rejecting
a CREATE DATABASE lc_collate/ctype setting that's identical to what
the user told initdb to use, because one got canonicalized and the
other not.  So this roundabout series of assumptions leads me to
think that initdb needs to be tweaked too.

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] possible bug in COPY ... FROM ... NULL '\0'

2012-03-25 Thread Jeff Davis
The documentation says:

The specified null string is sent by COPY TO without adding any
backslashes; conversely, COPY FROM matches the input against the null
string before removing backslashes.
  -- http://www.postgresql.org/docs/devel/static/sql-copy.html

Those seem inconsistent with the following behavior:

  postgres=# copy foo to '/tmp/a.copy' null '\0';
  COPY 2
  postgres=# copy foo from '/tmp/a.copy' null '\0';
  ERROR:  invalid byte sequence for encoding UTF8: 0x00
  CONTEXT:  COPY foo, line 2: \0

  $ cat /tmp/a.copy
  1
  \0

COPY TO seems to follow the documentation, inserting the null string
without modification into the output file. COPY FROM seems to de-escape
the input before trying to match it against the null string, leading to
the invalid byte sequence.

standard_conforming_strings is on.

Regards,
Jeff Davis


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


Re: [BUGS] possible bug in COPY ... FROM ... NULL '\0'

2012-03-25 Thread Tom Lane
Jeff Davis pg...@j-davis.com writes:
 The documentation says:
 The specified null string is sent by COPY TO without adding any
 backslashes; conversely, COPY FROM matches the input against the null
 string before removing backslashes.
   -- http://www.postgresql.org/docs/devel/static/sql-copy.html

 Those seem inconsistent with the following behavior:

   postgres=# copy foo to '/tmp/a.copy' null '\0';
   COPY 2
   postgres=# copy foo from '/tmp/a.copy' null '\0';
   ERROR:  invalid byte sequence for encoding UTF8: 0x00
   CONTEXT:  COPY foo, line 2: \0

   $ cat /tmp/a.copy
   1
   \0

 COPY TO seems to follow the documentation, inserting the null string
 without modification into the output file. COPY FROM seems to de-escape
 the input before trying to match it against the null string, leading to
 the invalid byte sequence.

This seems to have worked as per the documentation before 8.4.  I guess
it got broken in one of the efficiency-driven rewrites of the COPY
logic.  Specifically, the issue seems to be that CopyReadAttributesText
tries to combine de-escaping with locating the end-of-field, so it does
all the de-escaping --- and the ensuing encoding validity check ---
before checking to see if the input matches the defined null string.
I'm pretty sure that the original code first broke the line into fields,
then checked the null marker, then did de-escaping.

Fortunately, it seems pretty easy to fix: we just have to reverse the
order of the null-marker check and the pg_verifymbstr call.  However,
this is dependent on the parsing loop never having a reason to throw
error, which at the very least is something that needs a comment.

regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs


[BUGS] BUG #6556: earthdistance module has lacking documentation

2012-03-25 Thread kontakt
The following bug has been logged on the website:

Bug reference:  6556
Logged by:  Kasper Sandberg
Email address:  kont...@sandberg-consult.dk
PostgreSQL version: 9.1.3
Operating system:   -
Description:

looking at http://www.postgresql.org/docs/9.1/static/earthdistance.html

it looks very scarce and missing, and conflicting, some things operate based
on miles, others meters. it also doesnt specify the units used for
earth_distance() which seems to be meters.


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs