Re: [HACKERS] Enhancements to passwordcheck

2017-09-29 Thread Albe Laurenz
Michael Paquier wrote:
> On Thu, Sep 28, 2017 at 12:06 AM, Alvaro Herrera wrote:
>> I think the passwordcheck module as a whole is a dead end, security-
>> wise.  Myself, I've never seen the point in it.  It runs at the wrong
>> time, and there's no way to fix that.
> 
> Client commands may be run on a trusted network as well, let's not
> forget that. But I definitely agree that it is in general bad
> practice not to hash passwords beforehand. Another thing that
> passwordcheck is good at is being an example of hook use. I would
> think that many people refer to it when implementing their own module
> for whatever they want.

Right.

I originally only wanted the hook, but was lobbied into writing the
contrib module as well, to
a) have a nice checkbox item for ill-conceived security checklists
b) have an example of how the hook could be used.
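
For illustration, here is a minimal sketch of such a hook module,
modeled on contrib/passwordcheck with the PostgreSQL 10 hook signature;
the module name, GUC name and default are made up:

/*
 * mycheck.c - minimal check_password_hook sketch (names illustrative).
 */
#include "postgres.h"

#include "commands/user.h"
#include "fmgr.h"
#include "libpq/crypt.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

void _PG_init(void);

/* hypothetical GUC: minimum password length */
static int min_password_length = 8;

static check_password_hook_type prev_check_password_hook = NULL;

static void
mycheck_password(const char *username, const char *shadow_pass,
                 PasswordType password_type, Datum validuntil_time,
                 bool validuntil_null)
{
    /* chain to any previously installed hook first */
    if (prev_check_password_hook)
        prev_check_password_hook(username, shadow_pass, password_type,
                                 validuntil_time, validuntil_null);

    /* the length can only be checked if the password arrives in plain text */
    if (password_type == PASSWORD_TYPE_PLAINTEXT &&
        strlen(shadow_pass) < min_password_length)
        ereport(ERROR,
                (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
                 errmsg("password is too short")));
}

void
_PG_init(void)
{
    /* define the GUC and install the hook when the library is loaded */
    DefineCustomIntVariable("mycheck.min_password_length",
                            "Minimum password length.",
                            NULL,
                            &min_password_length,
                            8, 0, INT_MAX,
                            PGC_SUSET, 0,
                            NULL, NULL, NULL);

    prev_check_password_hook = check_password_hook;
    check_password_hook = mycheck_password;
}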

I still think that there is nothing wrong with adding some GUCs
to the module, as long as there is nothing in it that can compromise
overall security.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enhancements to passwordcheck

2017-09-27 Thread Albe Laurenz
Nathan Bossart wrote:
>> As was pointed out in the original discussion
>> d960cb61b694cf459dcfb4b0128514c203937...@exadv11.host.magwien.gv.at
>> the weak point of "passwordcheck" is that it does not work very well
>> for encrypted passwords.
>> The only saving grace is that you can at least check against
>> username equals password.
> 
> Thanks for linking the original thread.  There are a lot of
> interesting points.  I wonder if enhanced password checking in core
> or contrib might be received differently with the introduction of
> SCRAM authentication, since the weaknesses of MD5 were often cited.

I had the impression that the reasons why database passwords are
not the best option for high security were:
1) The password hash is stored in the database and can be stolen and
   cracked (don't know if dictionary attacks are harder with SCRAM).
2) The password or the password hash are transmitted to the server
   when you change the password and may be captured.
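
To illustrate point 1, this is roughly what sits in pg_authid waiting
to be stolen (output shape from memory, values abbreviated):

SELECT rolname, rolpassword FROM pg_authid WHERE rolname = 'someuser';

 rolname  |                    rolpassword
----------+-----------------------------------------------------
 someuser | SCRAM-SHA-256$4096:<salt>$<stored key>:<server key>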

>> So I think it is fine to extend "passwordcheck", but we shouldn't
>> take it so seriously that we reduce security elsewhere in order
>> to improve the module.
> 
> I understand the points made here, but not allowing configurability
> here really hinders the module's ability to enforce much of
> anything.

I agree that it is a good thing to make "passwordcheck" configurable.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Enhancements to passwordcheck

2017-09-26 Thread Albe Laurenz
Nathan Bossart wrote:
>>> passwordcheck.force_new_password
>>>
>> Does it mean a password different from the old one? +1. It could be
>> different from the last 3 passwords but we don't store a password
>> history.
> 
> Yes.  As Michael pointed out, this might be better to do as a separate
> effort since we'll almost certainly need to introduce a way to store
> password history.

That increases the number of passwords stored on the server and
consequently the damage when that list is stolen.
Of course the old passwords are invalid, but if someone cracks them
they could still try them on other systems the person uses.

I think we should accept such a risk only if the benefits are clear, and
my opinion has always been that if you forbid password reuse, people
tend to come up with password generation schemes that are no better
than the original passwords.

> One interesting design challenge will be how to handle pre-hashed
> passwords, since the number of checks we can do on those is pretty
> limited.  I'm currently thinking of a parameter that can be used to
> block, allow, or force pre-hashed passwords.  If we take that route,
> perhaps we will also need to ensure that PostgreSQL fails to start when
> invalid combinations are specified (e.g. pre-hashed passwords are forced
> and min_password_length is nonzero).  Thoughts?

As was pointed out in the original discussion
d960cb61b694cf459dcfb4b0128514c203937...@exadv11.host.magwien.gv.at
the weak point of "passwordcheck" is that it does not work very well
for encrypted passwords.
The only saving grace is that you can at least check against
username equals password.

Disabling pre-hashed passwords in order to allow better password
checks is a problem rather than a solution, because it exposes you
to password theft of the clear-text password.  I think we shouldn't
go there.

The overall opinion in the above thread was that if you *really* care
about security, you don't use database passwords, but external
authentication with a centralized identity management system.

So I think it is fine to extend "passwordcheck", but we shouldn't
take it so seriously that we reduce security elsewhere in order
to improve the module.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] On Complex Source Code Reading Strategy

2017-07-26 Thread Albe Laurenz
Zeray Kalayu wrote:
> I want to be a PG hacker, but it seems a complex beast that is hard to
> find my way around in.
> So, can anyone suggest, from his experience/style, general
> approaches/techniques/strategies for how to read complex source code
> effectively, in general and for PG in particular.
> 
> Can you remember being a novice to PostgreSQL hacking, and can you
> describe your experience/style to me and probably to other novices as well?
> 
> There are some guidelines in the PG wiki, but they hardly convey how
> deep the rabbit hole goes.
> 
> I believe that there is a plethora of knowledge and skill in PG
> hackers' heads that is not to be found in any kind of written material,
> so it would be great if any PG hacker could write up his
> experience/style of PG hacking so that it becomes a useful asset for
> PG hacking beginners.

I'm not the most competent hacker around, but maybe that qualifies
me to answer your question :^)

I assume that you have read the documentation and know your
way around PostgreSQL.

Find something that you need and want to work on - maybe some
shortcoming or missing feature of PostgreSQL.  The TODO list
(https://wiki.postgresql.org/wiki/Todo) may give some inspiration,
but be cautioned that it contains mostly things that nobody
could reach consensus on or implement easily.

Best is to work on something that serves your own need.
You don't have to work on core PostgreSQL, it could also be a
server extension.

If you have some project, you have an angle from which to tackle
the large body of code that PostgreSQL is.
As always, start with a design. Ask the list before you start coding.

Another good and commendable path into the source is to review
patches (https://wiki.postgresql.org/wiki/Reviewing_a_Patch).
This is highly appreciated, because there is always a shortage of
reviewers, and you get to see how other people go about changing the code.
There is a lot to learn this way!
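
If it helps, a typical review session might look like this (paths and
options are only illustrative):

git checkout -b review master
patch -p1 < /path/to/proposed-feature.patch   # or: git apply
./configure --enable-debug --enable-cassert
make -j4 && make check-world
# then read the diff, test the feature, and report on the thread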

You will find that the PostgreSQL source is mostly well written,
well documented and well structured.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Typo in comment in postgres_fdw.c

2017-06-28 Thread Albe Laurenz
Attached is a fix for a small typo I found.

Yours,
Laurenz Albe


comment.patch
Description: comment.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication: stuck spinlock at ReplicationSlotRelease

2017-06-23 Thread Albe Laurenz
Peter Eisentraut wrote:
> On 6/21/17 09:02, Albe Laurenz wrote:
>> 2017-06-21 14:55:12.033 CEST [23124] LOG:  could not send data to client: 
>> Broken pipe
>> 2017-06-21 14:55:12.033 CEST [23124] FATAL:  connection to client lost
>> 2017-06-21 14:55:17.032 CEST [23133] LOG:  logical replication apply worker 
>> for subscription "reprec" has started
>> DEBUG:  received replication command: IDENTIFY_SYSTEM
>> DEBUG:  received replication command: START_REPLICATION SLOT "reprec" 
>> LOGICAL 0/0 (proto_version '1', publication_names '"repsend"')
>> 2017-06-21 14:57:24.552 CEST [23124] PANIC:  stuck spinlock detected at 
>> ReplicationSlotRelease, slot.c:394
>> 2017-06-21 14:57:24.885 CEST [23070] LOG:  server process (PID 23124) was 
>> terminated by signal 6: Aborted
>> 2017-06-21 14:57:24.885 CEST [23070] LOG:  terminating any other active 
>> server processes
>> 2017-06-21 14:57:24.887 CEST [23134] LOG:  could not send data to client: 
>> Broken pipe
>> 2017-06-21 14:57:24.890 CEST [23070] LOG:  all server processes terminated; 
>> reinitializing
> 
> I can't reproduce that.  I let it loop around for about 10 minutes and
> it was fine.
> 
> I notice that you have some debug settings on.  Could you share your
> exact setup steps from initdb, as well as configure options, just in
> case one of these settings is causing a problem?

System is Linux, RHEL 6.9, Kernel 2.6.32-696.1.1.el6.x86_64.

Configure options:

CFLAGS='-Wall -O0 -g' ./configure --prefix=/home/laurenz/pg10 \
--sysconfdir=/magwien/etc \
--docdir=/home/laurenz/pg10/doc \
--mandir=/home/laurenz/pg10/man \
--with-perl \
--with-tcl \
--with-tclconfig=/usr/lib64 \
--with-python \
--with-openssl \
--with-pam \
--with-gssapi \
--enable-nls=de \
--enable-thread-safety \
--with-libxml \
--with-libxslt \
--enable-debug \
--enable-cassert \
--with-ldap

$ initdb -A trust -E UTF8 --locale=de_DE.UTF-8 --lc-messages=C -T german -U 
postgres -W -k dbhome

$ grep -v '^[[:space:]]*#' postgresql.conf | grep -v '^$' | sed -e 's/#.*$//'

port = 5432
max_connections = 100
password_encryption = scram-sha-256
shared_buffers = 128MB
dynamic_shared_memory_type = posix
wal_level = logical
log_destination = 'stderr'
logging_collector = on
log_filename = 'postgresql-%Y-%m-%d.log'
log_rotation_size = 0
client_min_messages = debug2
log_min_messages = error
log_connections = off
log_disconnections = off
log_timezone = 'Europe/Vienna'
datestyle = 'iso, dmy'
timezone = 'Europe/Vienna'
lc_messages = 'C'
lc_monetary = 'de_DE.UTF-8'
lc_numeric = 'de_DE.UTF-8'
lc_time = 'de_DE.UTF-8'
default_text_search_config = 'pg_catalog.german'

The crash happens reliably here.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication in the same cluster

2017-06-21 Thread Albe Laurenz
Tom Lane wrote:
> Petr Jelinek writes:
>> On 26/04/17 18:59, Bruce Momjian wrote:
>>> ... it just hangs.  My server logs say:
> 
>> Yes, that's a result of how logical replication slots work; the transaction
>> that needs to finish is your transaction. It can be worked around by
>> creating the slot manually via the SQL interface, for example, and creating
>> the subscription using WITH (NOCREATE SLOT, SLOT NAME = 'your slot').
> 
> If that's a predictable deadlock, I think a minimum expectation is that
> the system should notice it and throw an error, not just hang.  (Then
> the error could give a hint about how to work around it.)  But the case
> Bruce has in mind doesn't seem like a crazy use-case to me.  Can't we
> make it "just work"?

+1

I think that many people who start experimenting with logical replication
will run into this (like I did).

If nothing else, it's bad PR.
People will get the first impression that logical replication doesn't
work well.
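
Until it does just work, here is Petr's workaround spelled out
(identifiers illustrative; my stuck-spinlock report from today uses the
same pattern):

SELECT pg_create_logical_replication_slot('mysub', 'pgoutput');

CREATE SUBSCRIPTION mysub
   CONNECTION 'port=5432 dbname=source user=postgres'
   PUBLICATION mypub
   WITH (create_slot = 'false', slot_name = 'mysub');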

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Logical replication: stuck spinlock at ReplicationSlotRelease

2017-06-21 Thread Albe Laurenz
While playing with HEAD as of d14c85ed,
I ran into the following:

CREATE DATABASE source;
CREATE DATABASE recipient;

\c source
CREATE TABLE repli(id integer PRIMARY KEY, val text NOT NULL);
INSERT INTO repli VALUES (1, 'one');
CREATE PUBLICATION repsend FOR TABLE repli;
SELECT pg_create_logical_replication_slot('reprec', 'pgoutput');

\c recipient
CREATE TABLE repli(id integer PRIMARY KEY, val text NOT NULL);
CREATE SUBSCRIPTION reprec CONNECTION 'port=5432 dbname=source user=postgres'
   PUBLICATION repsend WITH (create_slot='false', slot_name='reprec');
SELECT * FROM repli;
┌────┬─────┐
│ id │ val │
├────┼─────┤
│  1 │ one │
└────┴─────┘
(1 row)

Looks good.
Now I try to produce a replication conflict:

INSERT INTO repli VALUES (2, 'block');

\c source
INSERT INTO repli VALUES (2, 'two');

After a while the server crashes, and the log looks like this:

2017-06-21 14:55:07.002 CEST [23096] ERROR:  duplicate key value violates 
unique constraint "repli_pkey"
2017-06-21 14:55:07.002 CEST [23096] DETAIL:  Key (id)=(2) already exists.
2017-06-21 14:55:07.003 CEST [23070] LOG:  worker process: logical replication 
worker for subscription 28681 (PID 23096) exited with exit code 1
2017-06-21 14:55:07.006 CEST [23117] LOG:  logical replication apply worker for 
subscription "reprec" has started
DEBUG:  received replication command: IDENTIFY_SYSTEM
DEBUG:  received replication command: START_REPLICATION SLOT "reprec" LOGICAL 
0/0 (proto_version '1', publication_names '"repsend"')
2017-06-21 14:55:07.008 CEST [23118] LOG:  starting logical decoding for slot 
"reprec"
2017-06-21 14:55:07.008 CEST [23118] DETAIL:  streaming transactions committing 
after 0/51A67B8, reading WAL from 0/51A66C8
LOG:  starting logical decoding for slot "reprec"
DETAIL:  streaming transactions committing after 0/51A67B8, reading WAL from 
0/51A66C8
2017-06-21 14:55:07.008 CEST [23118] LOG:  logical decoding found consistent 
point at 0/51A66C8
2017-06-21 14:55:07.008 CEST [23118] DETAIL:  There are no running transactions.
DEBUG:  sending replication keepalive
LOG:  logical decoding found consistent point at 0/51A66C8
DETAIL:  There are no running transactions.
2017-06-21 14:55:07.009 CEST [23117] ERROR:  duplicate key value violates 
unique constraint "repli_pkey"
2017-06-21 14:55:07.009 CEST [23117] DETAIL:  Key (id)=(2) already exists.
2017-06-21 14:55:07.010 CEST [23070] LOG:  worker process: logical replication 
worker for subscription 28681 (PID 23117) exited with exit code 1
2017-06-21 14:55:12.019 CEST [23122] LOG:  logical replication apply worker for 
subscription "reprec" has started
DEBUG:  received replication command: IDENTIFY_SYSTEM
2017-06-21 14:55:12.021 CEST [23124] LOG:  starting logical decoding for slot 
"reprec"
2017-06-21 14:55:12.021 CEST [23124] DETAIL:  streaming transactions committing 
after 0/51A67B8, reading WAL from 0/51A66C8
DEBUG:  received replication command: START_REPLICATION SLOT "reprec" LOGICAL 
0/0 (proto_version '1', publication_names '"repsend"')
2017-06-21 14:55:12.022 CEST [23124] LOG:  logical decoding found consistent 
point at 0/51A66C8
2017-06-21 14:55:12.022 CEST [23124] DETAIL:  There are no running transactions.
LOG:  starting logical decoding for slot "reprec"
DETAIL:  streaming transactions committing after 0/51A67B8, reading WAL from 
0/51A66C8
DEBUG:  sending replication keepalive
LOG:  logical decoding found consistent point at 0/51A66C8
DETAIL:  There are no running transactions.
2017-06-21 14:55:12.023 CEST [23122] ERROR:  duplicate key value violates 
unique constraint "repli_pkey"
2017-06-21 14:55:12.023 CEST [23122] DETAIL:  Key (id)=(2) already exists.
2017-06-21 14:55:12.024 CEST [23070] LOG:  worker process: logical replication 
worker for subscription 28681 (PID 23122) exited with exit code 1
2017-06-21 14:55:12.033 CEST [23124] LOG:  could not send data to client: 
Broken pipe
2017-06-21 14:55:12.033 CEST [23124] FATAL:  connection to client lost
2017-06-21 14:55:17.032 CEST [23133] LOG:  logical replication apply worker for 
subscription "reprec" has started
DEBUG:  received replication command: IDENTIFY_SYSTEM
DEBUG:  received replication command: START_REPLICATION SLOT "reprec" LOGICAL 
0/0 (proto_version '1', publication_names '"repsend"')
2017-06-21 14:57:24.552 CEST [23124] PANIC:  stuck spinlock detected at 
ReplicationSlotRelease, slot.c:394
2017-06-21 14:57:24.885 CEST [23070] LOG:  server process (PID 23124) was 
terminated by signal 6: Aborted
2017-06-21 14:57:24.885 CEST [23070] LOG:  terminating any other active server 
processes
2017-06-21 14:57:24.887 CEST [23134] LOG:  could not send data to client: 
Broken pipe
2017-06-21 14:57:24.890 CEST [23070] LOG:  all server processes terminated; 
reinitializing

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index created in BEFORE trigger not updated during INSERT

2017-06-06 Thread Albe Laurenz
Tom Lane wrote:
>> Andres Freund writes:
>>> Hm, strategically sprinkled CheckTableNotInUse() might do the trick?
> 
> Attached is a proposed patch that closes off this problem.  I've tested
> it to the extent that it blocks Albe's example and passes check-world.

I tested it, and it works fine.  Thanks!

> I'm unsure whether to back-patch or not; the main argument for not doing
> so is that if any extensions are calling DefineIndex() directly, this
> would be an API break for them.  Given what a weird case this is, I'm not
> sure it's worth that.
> 
> A possible end-run around the API objection would be to not add an extra
> argument to DefineIndex() in the back branches, but to use !is_alter_table
> as the control condition.  That would work for the core callers, although
> we might need a special case for bootstrap mode.  On the other hand,
> thinking again about hypothetical third-party callers, it's possible that
> that's not the right answer for them, in which case they'd be really in
> trouble.  So I'm not that much in love with that answer.

It causes a slight bellyache to leave an unfixed data corruption bug
in the back branches (if only index corruption), but I agree that it is
such a weird case to create an index in a BEFORE trigger that we probably
can live without a back-patch.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Index created in BEFORE trigger not updated during INSERT

2017-05-22 Thread Albe Laurenz
Not that it is a useful use case, but I believe that this is
a bug that causes index corruption:

CREATE TABLE mytable(
   id integer PRIMARY KEY,
   id2 integer NOT NULL
);

CREATE FUNCTION makeindex() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   CREATE INDEX ON mytable(id2);
   RETURN NEW;
END;$$;

CREATE TRIGGER makeindex BEFORE INSERT ON mytable FOR EACH ROW
   EXECUTE PROCEDURE makeindex();

INSERT INTO mytable VALUES (1, 42);

SELECT * FROM mytable;
┌────┬─────┐
│ id │ id2 │
├────┼─────┤
│  1 │  42 │
└────┴─────┘
(1 row)

But 42 is not present in the index:

SET enable_seqscan = off;

SELECT * FROM mytable WHERE id2 = 42;
┌────┬─────┐
│ id │ id2 │
├────┼─────┤
└────┴─────┘
(0 rows)
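
For what it's worth, rebuilding the index from the heap should make the
row visible again, since REINDEX reads the heap directly (a sketch, not
verified):

REINDEX TABLE mytable;

SELECT * FROM mytable WHERE id2 = 42;  -- should now return the row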

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] password_encryption, default and 'plain' support

2017-05-05 Thread Albe Laurenz
Tom Lane wrote:
> Robert Haas writes:
>> On Wed, May 3, 2017 at 7:31 AM, Heikki Linnakangas wrote:
>>> So, I propose that we remove support for password_encryption='plain' in
>>> PostgreSQL 10. If you try to do that, you'll get an error.

>> I have no idea how widely used that option is.

> Is it possible that there are still client libraries that don't support
> password encryption at all?  If so, are we willing to break them?
> I'd say "yes" but it's worth thinking about.

We have one application that has been reduced to "password" authentication
ever since "crypt" authentication was removed, because they implemented the
line protocol rather than using libpq and never bothered to move to "md5".

But then, it might be a good idea to break this application, because that
would force the vendor to implement something that is not a
blatant security problem.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CTE inlining

2017-05-05 Thread Albe Laurenz
Thomas Kellerer wrote:
>> 1) we switch unmarked CTEs as inlineable by default in pg11.
>
> +1 from me for option 1

+1 from me as well, FWIW.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CONNECTION LIMIT and Parallel Query don't play well together

2017-02-01 Thread Albe Laurenz
Andrew Dunstan wrote:
> On 01/29/2017 04:07 PM, David Rowley wrote:
>> Looks like there's a few other usages of CountDBBackends() which
>> require background workers to be counted too, so I ended up creating
>> CountDBConnections() as I didn't really think adding a bool flag to
>> CountDBBackends was so nice.
>>
>> I thought about renaming CountUserBackends() to become
>> CountUserConnections(), but I've not. Although, perhaps its better to
>> break any third party stuff that uses that so that authors can review
>> which behaviour they need rather than have their extension silently
>> break?
> 
> I'm inclined to keep this as is - I don't think we should change the
> names at least in the stable releases. I'm not sure how far back it
> should be patched. The real effect is going to be felt from 9.6, I
> think, but arguably for consistency we should change it back to 9.3 or
> 9.4. Thoughts?
> 
> Other things being equal I intend to commit this later today.

+1

Maybe it is better not to backpatch farther than 9.6 - I think it is
good to be conservative about backpatching, and, as you say, the effect
won't be noticeable much before 9.6.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ISO/IEC 9075-2:2016 for postgres community

2017-01-19 Thread Albe Laurenz
Wolfgang Wilhelm wrote:
> are you looking for something like this:
> jtc1sc32.org/doc/N2301-2350/32N2311T-text_for_ballot-CD_9075-2.pdf ?

That looks like it is from 2013...

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CONNECTION LIMIT and Parallel Query don't play well together

2017-01-11 Thread Albe Laurenz
Amit Kapila wrote:
> On Wed, Jan 11, 2017 at 2:44 AM, David Rowley wrote:
>> It has come to my attention that when a user has a CONNECTION LIMIT
>> set, and they make use of parallel query, that their queries can fail
>> due to the connection limit being exceeded.
>>
>> Simple test case:
>>
>> postgres=# CREATE USER user1 LOGIN CONNECTION LIMIT 2;
>> [...]
>> postgres=> SELECT COUNT(*) FROM t1;
>> ERROR:  too many connections for role "user1"
>> CONTEXT:  parallel worker
>>
>> Now, as I understand it, during the design of parallel query, it was
>> designed in such a way that nodeGather could perform all of the work
>> in the main process in the event that no workers were available, and
>> that the only user visible evidence of this would be the query would
>> be slower than it would otherwise be.
>>
>> After a little bit of looking around I see that CountUserBackends()
>> does not ignore the parallel workers, and counts these as
>> "CONNECTIONS". It's probably debatable to weather these are
>> connections or not,
> 
> I think this is not only for parallel workers, rather any background
> worker that uses database connection
> (BGWORKER_BACKEND_DATABASE_CONNECTION) will be counted in a similar
> way.  I am not sure if it is worth inventing something to consider
> such background worker connections different from backend connections.
> However, I think we should document it either in parallel query or in
> background worker or in Create User .. Connection section.

I think that this should be fixed rather than documented.
Users will not take it well if their queries error out
in this fashion.

Background processes should not be counted as active connections.
Their limit should be determined by max_worker_processes,
and neither max_connections nor the connection limit per user
or database should take them into account.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel execution and prepared statements

2016-12-05 Thread Albe Laurenz
Tobias Bussmann wrote:
>> I think if we don't see any impact then we should backpatch this
>> patch. However, if we decide to go some other way, then I can provide
>> a separate patch for back branches.  BTW, what is your opinion?
> 
> I could not find anything on backporting guidelines in the wiki but my 
> opinion would be to backpatch
> the patch in total. With a different behaviour between the simple and 
> extended query protocol it would
> be hard to debug query performance issue in user applications that uses 
> PQprepare. If the user tries
> to replicate a query with a PREPARE in psql and tries to EXPLAIN EXECUTE it, 
> the results would be
> different then what happens within the application. That behaviour could be 
> confusing, like
> differences between EXPLAIN SELECT and EXPLAIN EXECUTE can be to less 
> experienced users.

+1

Yours,
Laurenz Albe


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel execution and prepared statements

2016-11-29 Thread Albe Laurenz
Amit Kapila wrote:
>> But with Tobias' complete patch "make installcheck-world" succeeds.
> 
> Okay.  I have looked into patch proposed and found that it will
> resolve the regression test problem (Create Table .. AS Execute).  I
> think it is slightly prone to breakage if tomorrow we implement usage
> of EXECUTE with statements other Create Table (like Copy).  Have you
> registered this patch in CF to avoid losing track?
> 
> We have two patches (one proposed in this thread and one proposed by
> me earlier [1]) and both solves the regression test failure.  Unless
> there is some better approach, I think we can go with one of these.

Tobias did that here: https://commitfest.postgresql.org/12/890/

He added your patch as well.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UNDO and in-place update

2016-11-23 Thread Albe Laurenz
Robert Haas wrote:
> To implement this in PostgreSQL, I believe we would need support for
> UNDO.

> - Reading a page that has been recently modified gets significantly
> more expensive; it is necessary to read the associated UNDO entries
> and do a bunch of calculation that is significantly more complex than
> what is required today.

As others have said, that is the main drawback.

Also, and related to this, ROLLBACK would become an expensive operation.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel execution and prepared statements

2016-11-21 Thread Albe Laurenz
Robert Haas wrote:
>>/* creates a parallel-enabled plan */
>>PQprepare(conn, "pstmt", "SELECT id FROM l WHERE val = $1", 1, NULL);
>>/* blows up with "cannot insert tuples during a parallel operation" */
>>PQexec(conn, "CREATE TABLE bad(id) AS EXECUTE pstmt('abcd')");
> 
> Ouch.  I missed that.
> 
>> With Tobias' patch, this does not fail.
>>
>> I think we should either apply something like that patch or disable parallel
>> execution for prepared statements altogether and document that.
> 
> I agree.  I think probably the first option is better, but I might
> have to commit with one hand so I can hold my nose with the other.

Is there a better, more consistent approach for that?

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel execution and prepared statements

2016-11-18 Thread Albe Laurenz
Amit Kapila wrote:
> Can you once test by just passing CURSOR_OPT_PARALLEL_OK in
> PrepareQuery() and run the tests by having force_parallel_mode=regress?
> It seems to me that the code in the planner is changed for setting a
> parallelModeNeeded flag, so it might work as it is.

No, that doesn't work.

But with Tobias' complete patch "make installcheck-world" succeeds.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel execution and prepared statements

2016-11-18 Thread Albe Laurenz
Robert Haas wrote:
> On Tue, Nov 15, 2016 at 10:41 AM, Albe Laurenz <laurenz.a...@wien.gv.at> 
> wrote:
>> Tobias Bussmann has discovered an oddity with prepared statements.
>>
>> Parallel scan is used with prepared statements, but only if they have
>> been created with protocol V3 "Parse".
>> If a prepared statement has been prepared with the SQL statement PREPARE,
>> it will never use a parallel scan.
>>
>> I guess that is an oversight in commit 57a6a72b, right?
>> PrepareQuery in commands/prepare.c should call CompleteCachedPlan
>> with cursor options CURSOR_OPT_PARALLEL_OK, just like
>> exec_prepare_message in tcop/postgres.c does.
>>
>> The attached patch fixes the problem for me.
> 
> Actually, commit 57a6a72b made this change, and then 7bea19d0 backed
> it out again because it turned out to break things.

I didn't notice the CREATE TABLE ... AS EXECUTE problem.

But then the change to use CURSOR_OPT_PARALLEL_OK in tcop/postgres.c should
be reverted as well, because it breaks things just as bad:

   /* creates a parallel-enabled plan */
   PQprepare(conn, "pstmt", "SELECT id FROM l WHERE val = $1", 1, NULL);
   /* blows up with "cannot insert tuples during a parallel operation" */
   PQexec(conn, "CREATE TABLE bad(id) AS EXECUTE pstmt('abcd')");

With Tobias' patch, this does not fail.

I think we should either apply something like that patch or disable parallel
execution for prepared statements altogether and document that.

Yours,
Laurenz Albe


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Parallel execution and prepared statements

2016-11-15 Thread Albe Laurenz
Tobias Bussmann has discovered an oddity with prepared statements.

Parallel scan is used with prepared statements, but only if they have
been created with protocol V3 "Parse".
If a prepared statement has been prepared with the SQL statement PREPARE,
it will never use a parallel scan.

I guess that is an oversight in commit 57a6a72b, right?
PrepareQuery in commands/prepare.c should call CompleteCachedPlan
with cursor options CURSOR_OPT_PARALLEL_OK, just like
exec_prepare_message in tcop/postgres.c does.

The attached patch fixes the problem for me.
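
The gist of the patch, sketched from memory of PrepareQuery() in
commands/prepare.c (the surrounding arguments may differ slightly):

   CompleteCachedPlan(plansource, query_list, NULL,
                      argtypes, nargs,
                      NULL, NULL,
                      CURSOR_OPT_PARALLEL_OK,  /* was 0, the default */
                      true);                   /* fixed result */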

Yours,
Laurenz Albe


0001-Consider-parallel-plans-for-statements-prepared-with.patch
Description: 0001-Consider-parallel-plans-for-statements-prepared-with.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump, pg_dumpall and data durability

2016-11-14 Thread Albe Laurenz
Michael Paquier wrote:
> Meh. I forgot docs and --help output updates.

Looks good, except that you left the "N" option in the getopt_long
call for pg_dumpall.  I fixed that in the attached patch.

I'll mark the patch "ready for committer".

Yours,
Laurenz Albe


pgdump-sync-v5.patch
Description: pgdump-sync-v5.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump, pg_dumpall and data durability

2016-11-11 Thread Albe Laurenz
Michael Paquier wrote:
> A typo s/pre_fsync_fname/pre_sync_fname, and a mistake from me because
> I did not compile with -DPG_FLUSH_DATA_WORKS to check this code.
> 
> v2 is attached, fixing those issues.

The patch applies and compiles fine.

I have tested it on Linux and MinGW and could see the fsync(2) and
FlushFileBuffers calls I expected.
This adds crash safety for a reasonable price, and I think we should have that.

The documentation additions are sufficient.

Looking through the patch, I had two questions that are more about
style and consistency than anything else:

- In pg_dumpall.c, the result of fsync_fname() is cast to "void" to show that
  the return code is ignored, but not anywhere else.  Is that by design?

- For pg_dumpall, a short option "-N" is added for "--no-sync", but not for
  pg_dump (because -N is already taken there).
  I'd opt for either using the same short option for both or (IMO better)
  only offering a long option for both.
  This would avoid confusion, and we expect that few people will want to use
  this option anyway, right?

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump, pg_dumpall and data durability

2016-11-07 Thread Albe Laurenz
Michael Paquier wrote:
>> In my quest of making the backup tools more compliant to data
>> durability, here is a thread for pg_dump and pg_dumpall.
> 
> Okay, here is a patch doing the above. I have added a new --nosync
> option to pg_dump and pg_dumpall to switch to the pre-10 behavior. I
> have arrived at the conclusion that it is better not to touch at
> _EndData and _EndBlob, and just issue the fsync in CloseArchive when
> all the write operations are done. In the case of the directory
> format, the fsync is done on all the entries recursively. This makes
> as well the patch more simple. The regression tests calling pg_dump
> don't use --nosync yet in this patch, that's a move that could be done
> afterwards.

The patch does not apply, I had to change the hunk for
src/include/common/file_utils.h.

Also, compilation fails because function "pre_fsync_fname" cannot be
resolved during linking. Is that a typo for "pre_sync_fname" or is
the patch incomplete?

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1

2016-11-07 Thread Albe Laurenz
Tom Lane wrote:
>> Albe Laurenz <laurenz.a...@wien.gv.at> writes:
>>> Anyway, I have prepared a patch along the lines you suggest.
>> 
>> Pushed, we'll see if the buildfarm likes this iteration any better.
> 
> And the answer is "not very much".  The Windows builds aren't actually
> failing, but they are producing lots of warnings:
> 
> lquery_op.obj : warning LNK4197: export '_ltq_regex' specified multiple 
> times; using first specification
[...]
> 
> This is evidently from the places where there are two "extern"
> declarations for a function, one in a header and one in
> PG_FUNCTION_INFO_V1.  The externs are identical now, but nonetheless
> MSVC insists on whining about it.

Funny -- it seems to mind a duplicate declaration only if there
is PGDLLEXPORT in at least one of them.

> I'm inclined to give this up as a bad job and go back to the
> previous state.  We have a solution that works and doesn't
> produce warnings; third-party authors who don't want to use it
> are on their own.

I think you are right.

Thanks for the work and thought you expended on this!
Particularly since Windows is not your primary interest.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sequential scan result order vs performance

2016-10-31 Thread Albe Laurenz
Jim Nasby wrote:
> On 10/30/16 9:12 AM, Tom Lane wrote:
>> I think there will be a lot of howls.  People expect that creating
>> a table by inserting a bunch of rows, and then reading back those
>> rows, will not change the order.  We already futzed with that guarantee
>> a bit with syncscans, but that only affects quite large tables --- and
>> even there, we were forced to provide a way to turn it off.
> 
> Leaving a 30% performance improvement on the floor because some people
> don't grok how sets work seems insane to me.

+1

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1

2016-10-25 Thread Albe Laurenz
I wrote:
> Anyway, I have prepared a patch along the lines you suggest.

It occurred to me that the documentation still suggests that you should
add a declaration to a C function; I have fixed that too.

I'll add the patch to the next commitfest.

Yours,
Laurenz Albe


0001-Add-PGDLLEXPORT-to-PG_FUNCTION_INFO_V1-function-decl.patch
Description: 0001-Add-PGDLLEXPORT-to-PG_FUNCTION_INFO_V1-function-decl.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1

2016-10-24 Thread Albe Laurenz
Tom Lane wrote:
> I poked at this a little bit.  AFAICT, the only actual cross-file
> references are in contrib/ltree/, which has quite a few.  Maybe we
> could hold our noses and attach PGDLLEXPORT to the declarations in
> ltree.h.
> 
> hstore's HSTORE_POLLUTE macro would also need PGDLLEXPORT-ification,
> but that's just within the macro so it wouldn't be too ugly.
> 
> The other problem is xml2's xml_is_well_formed(), which duplicates
> the name of a core backend function.  If we PGDLLEXPORT-ify that,
> we get conflicts against the core's declaration in utils/xml.h.
> I do not see any nice way to dodge that.  Maybe we could decide that
> six releases' worth of backwards compatibility is enough and
> just drop it from xml2 --- AFAICS, that would only break applications
> that had never updated to the extension-ified version of xml2 at all.

I guess it would also be possible, but undesirable, to decorate the
declaration in utils/xml.h.

Anyway, I have prepared a patch along the lines you suggest.

Yours,
Laurenz Albe


0001-Add-PGDLLEXPORT-to-PG_FUNCTION_INFO_V1-function-decl.patch
Description: 0001-Add-PGDLLEXPORT-to-PG_FUNCTION_INFO_V1-function-decl.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1

2016-10-18 Thread Albe Laurenz
Tom Lane wrote:
> No, it's cross *file* references within a single contrib module that
> would be likely to need extern declarations in a header file.  That's
> not especially weird IMO.  I'm not sure how many cases there actually
> are though.

I went through the contrib modules and tried to look that up,
and now I am even more confused than before.

The buildfarm log at
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=thrips&dt=2016-10-12%2018%3A37%3A26
shows the build failing (among others) for contrib/tablefunc
(close to the bottom end of the log).

Now when I look at the source, there is only one C file, and the
failing functions are *not* declared anywhere, neither in the C file
nor in the (almost empty) header file.

So it must be something other than double declaration that makes
the build fail.
Could it be that the .DEF files have something to do with that?

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1

2016-10-17 Thread Albe Laurenz
I wrote:
> But I'd understand if you think that this is too much code churn for too 
> little
> benefit, even if it could be considered a clean-up.
> 
> In that case, I'd argue that in the sample in doc/src/sgml/xfunc.sgml
> the function definitions should be changed to read
> 
>   PGDLLEXPORT Datum foo(PG_FUNCTION_ARGS)
> 
> instead of
> 
>   Datum foo(PG_FUNCTION_ARGS)
> 
> because without that the sample fails if you try to build it with MSVC
> like the stackoverflow question did.

Since I didn't hear from you, I assume that you are not in favour of
removing the SQL function declarations from contrib.

So I went ahead and wrote a patch to add PGDLLEXPORT to the C function sample.

While at it, I noticed that there are no instructions for building and
linking C functions with MSVC, so I have added a patch for that as well.

Yours,
Laurenz Albe


0001-Add-PGDLLEXPORT-to-sample-C-function.patch
Description: 0001-Add-PGDLLEXPORT-to-sample-C-function.patch


0002-Add-instructions-for-building-C-functions-with-MSVC.patch
Description: 0002-Add-instructions-for-building-C-functions-with-MSVC.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1

2016-10-14 Thread Albe Laurenz
Tom Lane wrote:
>> Well, the buildfarm doesn't like that even a little bit.  It seems that
>> the MSVC compiler does not like seeing both "extern Datum foo(...)" and
>> "extern PGDLLEXPORT Datum foo(...)", so anything that had an extern in
>> a .h file is failing.  There is also quite a bit of phase-of-the-moon
>> behavior in here, because in some cases some functions are raising errors
>> and others that look entirely the same are not.
> 
> I take back the latter comment --- I was comparing HEAD source code
> against errors reported on back branches, and at least some of the
> discrepancies are explained by additions/removals of separate "extern"
> declarations.  But still, if we want to do this we're going to need to
> put PGDLLEXPORT in every V1 function extern declaration.  We might be
> able to remove some of the externs as unnecessary instead, but any way
> you slice it it's going to be messy.
> 
> For the moment I've reverted the change.

Sorry for having caused the inconvenience.

Actually I would say that the correct solution is to remove the function
declarations from all the header files in contrib, since from commit e7128e8d
on the functions are declared by PG_FUNCTION_INFO_V1 anyway, right?
Of course one would have to check first that the SQL functions don't try to
call each other, which would require extra forward declarations...

If you are willing to consider that, I'd be happy to prepare a patch.

But I'd understand if you think that this is too much code churn for too little
benefit, even if it could be considered a clean-up.

In that case, I'd argue that in the sample in doc/src/sgml/xfunc.sgml
the function definitions should be changed to read

  PGDLLEXPORT Datum foo(PG_FUNCTION_ARGS)

instead of

  Datum foo(PG_FUNCTION_ARGS)

because without that the sample fails if you try to build it with MSVC
like the stackoverflow question did.

I'd be happy to prepare a patch for that as well.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1

2016-10-12 Thread Albe Laurenz
Tom Lane wrote:
> I'm okay with adding PGDLLEXPORT to the extern, but we should update
> that comment to note that it's not necessary with any of our standard
> Windows build processes.  (For that matter, the comment fails to explain
> why this macro is providing an extern for the base function at all...)

Here is a patch for that, including an attempt to improve the comment.

Yours,
Laurenz Albe


0001-Add-PGDLLEXPORT-to-PG_FUNCTION_INFO_V1.patch
Description: 0001-Add-PGDLLEXPORT-to-PG_FUNCTION_INFO_V1.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1

2016-10-12 Thread Albe Laurenz
Tom Lane wrote:
>> PostgreSQL itself seems to use export files that explicitly declare the
>> exported symbols, so it gets away without these decorations.
> 
> Except that we don't.  There aren't PGDLLEXPORT markings for any
> functions exported from contrib modules, and we don't use dlltool
> on them either.  By your argument, none of contrib would work on
> Windows builds at all, but we have a ton of buildfarm evidence and
> successful field use to the contrary.  How is that all working?

I thought it was the job of src/tools/msvc/gendef.pl to generate
.DEF files?  In the buildfarm logs I can find lines like:

  Generating CITEXT.DEF from directory Release\citext, platform x64
  Generating POSTGRES_FDW.DEF from directory Release\postgres_fdw, platform x64

which are emitted by gendef.pl.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1

2016-10-12 Thread Albe Laurenz
Tom Lane wrote:
> Albe Laurenz <laurenz.a...@wien.gv.at> writes:
>> Currently, PG_FUNCTION_INFO_V1 is defined as
[...]
> 
>> Is there a good reason why the "funcname" declaration is not decorated
>> with PGDLLEXPORT?
> 
> The lack of complaints about this suggest that it's not actually necessary
> to do so.  So what this makes me wonder is whether we can't drop the
> DLLEXPORT on the finfo function too.  I'd rather reduce the number of
> Microsoft-isms in the code, not increase it.

I understand the sentiment.

But it seems to actually cause a problem on Windows, at least there was a
complaint here: http://stackoverflow.com/q/39964233

Adding PGDLLEXPORT solved the problem there.

I guess that there are not more complaints about that because
few people build C functions on Windows themselves (lack of PGXS)
and those who do are probably knowledgeable enough that they can
fix it themselves by sticking an extra declaration with PGDLLEXPORT
into their source file.
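
That extra declaration looks like this, shown with the classic add_one()
example from the documentation (function name illustrative):

#include "postgres.h"
#include "fmgr.h"

PG_MODULE_MAGIC;

/* without this PGDLLEXPORT declaration, MSVC will not export the symbol */
PGDLLEXPORT Datum add_one(PG_FUNCTION_ARGS);

PG_FUNCTION_INFO_V1(add_one);

Datum
add_one(PG_FUNCTION_ARGS)
{
    int32 arg = PG_GETARG_INT32(0);

    PG_RETURN_INT32(arg + 1);
}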

PostgreSQL itself seems to use export files that explicitly declare the
exported symbols, so it gets away without these decorations.

>> BTW, I admit I don't understand the comment.
> 
> It seems like an obvious typo.  Or, possibly, sloppy thinking that
> contributed to failure to recognize that the keyword isn't needed.

Looking through the history, it seems to be as follows:
- In 5fde8613, the comment was added (mistakenly) together with a DLLIMPORT 
decoration.
- In 77e50a61, the decoration was changed to PGDLLEXPORT, but the comment 
forgotten.
- In e7128e8d, the function prototype was added to the macro, but
  the PGDLLEXPORT decoration was forgotten.

The dlltool mentioned is a MinGW tool, so it doesn't apply to people
building with MSVC.

Maybe the comment should be like this:

/*
 *  Macro to build an info function associated with the given function name.
 *  Win32 loadable functions usually link with 'dlltool --export-all' or use
 *  a .DEF file, but it doesn't hurt to add PGDLLEXPORT in case they don't.
 */

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Add PGDLLEXPORT to PG_FUNCTION_INFO_V1

2016-10-12 Thread Albe Laurenz
Currently, PG_FUNCTION_INFO_V1 is defined as

  /*
   *  Macro to build an info function associated with the given function name.
   *  Win32 loadable functions usually link with 'dlltool --export-all', but it
   *  doesn't hurt to add PGDLLIMPORT in case they don't.
   */
  #define PG_FUNCTION_INFO_V1(funcname) \
  Datum funcname(PG_FUNCTION_ARGS); \
  extern PGDLLEXPORT const Pg_finfo_record * CppConcat(pg_finfo_,funcname)(void); \
  const Pg_finfo_record * \
  CppConcat(pg_finfo_,funcname) (void) \
  { \
  static const Pg_finfo_record my_finfo = { 1 }; \
  return &my_finfo; \
  } \
  extern int no_such_variable

Is there a good reason why the "funcname" declaration is not decorated
with PGDLLEXPORT?

It would do the right thing on Windows and save people the trouble of
either using an export definition file or re-declaring the function with
the PGDLLEXPORT decoration.
An SQL function that is not exported does not make much sense, right?

BTW, I admit I don't understand the comment.  What does PGDLLIMPORT do
for a dynamically loaded function?

I propose the attached patch.
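
For reference, the proposed change amounts to decorating the function
declaration in the macro, roughly like this (only the declaration of
"funcname" differs from the version quoted above):

  #define PG_FUNCTION_INFO_V1(funcname) \
  extern PGDLLEXPORT Datum funcname(PG_FUNCTION_ARGS); \
  extern PGDLLEXPORT const Pg_finfo_record * CppConcat(pg_finfo_,funcname)(void); \
  const Pg_finfo_record * \
  CppConcat(pg_finfo_,funcname) (void) \
  { \
  static const Pg_finfo_record my_finfo = { 1 }; \
  return &my_finfo; \
  } \
  extern int no_such_variable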

Yours,
Laurenz Albe


0001-Add-PGDLLEXPORT-to-function-declaration-in-PG_FUNCTI.patch
Description: 0001-Add-PGDLLEXPORT-to-function-declaration-in-PG_FUNCTI.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Nested loop join condition does not get pushed down to foreign scan

2016-09-13 Thread Albe Laurenz
I just noticed something surprising:

-- create a larger local table
CREATE TABLE llarge (id integer NOT NULL, val integer NOT NULL);
INSERT INTO llarge SELECT i, i%100 FROM generate_series(1, 100000) i;
ALTER TABLE llarge ADD PRIMARY KEY (id);

-- create a small local table
CREATE TABLE small (id integer PRIMARY KEY, val text NOT NULL);
INSERT INTO small VALUES (1, 'one');

-- create a foreign table based on llarge
CREATE EXTENSION postgres_fdw;
CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw OPTIONS (host 
'localhost', port '5432', dbname 'test');
CREATE USER MAPPING FOR myself SERVER loopback OPTIONS (user 'myself', password 
'mypassword');
CREATE FOREIGN TABLE rlarge (id integer NOT NULL, val integer NOT NULL) SERVER 
loopback OPTIONS (table_name 'llarge');

SET enable_hashjoin = off;
-- plan for a nested loop join with a local table
EXPLAIN (COSTS off) SELECT * FROM small JOIN llarge USING (id);
  QUERY PLAN
--
 Nested Loop
   ->  Seq Scan on small
   ->  Index Scan using llarge_pkey on llarge
 Index Cond: (id = small.id)
(4 rows)

-- plan for a nested loop join with a foreign table
EXPLAIN (COSTS off) SELECT * FROM small JOIN rlarge USING (id);
  QUERY PLAN
---
 Nested Loop
   Join Filter: (small.id = rlarge.id)
   ->  Seq Scan on small
   ->  Foreign Scan on rlarge
(4 rows)


Is there a fundamental reason why the join condition does not get pushed down 
into
the foreign scan or is that an omission that can easily be fixed?

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Documentation fix for CREATE FUNCTION

2016-07-15 Thread Albe Laurenz
Tom Lane wrote:
> Albe Laurenz <laurenz.a...@wien.gv.at> writes:
>> I just noticed that the documentation for CREATE FUNCTION still mentions
>> that the temporary namespace is searched for functions even though that
>> has been removed with commit aa27977.
> 
> The example you propose to correct was introduced by that same commit,
> which should make you think twice about whether it really was invalidated
> by that commit.

Yes, I wondered about that.

> I believe the reason for forcing pg_temp to the back of the path is to
> prevent unqualified table names from being captured by pg_temp entries.
> This risk exists despite the rule against searching pg_temp for functions
> or operators.  A maliciously named temp table could at least prevent
> a security definer function from doing what it was supposed to, and
> could probably hijack control entirely via triggers or rules.
> 
> Possibly the documentation should be more explicit about why this is
> being done, but the example code is good as-is.

Maybe something like the attached would keep people like me from
misunderstanding this.
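
Concretely, the pattern the documentation example teaches is the
following (abbreviated from the CREATE FUNCTION reference page, quoted
from memory):

CREATE FUNCTION check_password(uname TEXT, pass TEXT) RETURNS BOOLEAN AS $$
DECLARE passed BOOLEAN;
BEGIN
    SELECT (pwd = $2) INTO passed
    FROM   pwds
    WHERE  username = $1;
    RETURN passed;
END;
$$ LANGUAGE plpgsql
   SECURITY DEFINER
   -- Set a secure search_path: trusted schema(s) first, pg_temp last.
   SET search_path = admin, pg_temp;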

Yours,
Laurenz Albe


0001-Improve-example-in-CREATE-FUNCTION-documentation.patch
Description: 0001-Improve-example-in-CREATE-FUNCTION-documentation.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Documentation fix for CREATE FUNCTION

2016-07-13 Thread Albe Laurenz
I just noticed that the documentation for CREATE FUNCTION still mentions
that the temporary namespace is searched for functions even though that
has been removed with commit aa27977.

Attached is a patch to fix that.

Yours,
Laurenz Albe


0001-Fix-mention-of-pg_temp-in-CREATE-FUNCTION-documentat.patch
Description: 0001-Fix-mention-of-pg_temp-in-CREATE-FUNCTION-documentat.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] to_date_valid()

2016-07-05 Thread Albe Laurenz
Andreas Karlsson wrote:
> On 07/04/2016 10:55 PM, Pavel Stehule wrote:
>> 2016-07-04 22:15 GMT+02:00 Andreas Karlsson wrote:
>>> I do not see a clear conclusion in the linked threads. For example
>>> Bruce calls it a bug in one of the emails
>>> (https://www.postgresql.org/message-id/201107200103.p6K13ix10517%40momjian.us).
>>>
>>> I think we should fix to_date() to throw an error. Personally I
>>> would be happy if my code broke due to this kind of change since the
>>> exception would reveal an old bug which has been there a long time
>>> eating my data. I cannot see a case where I would have wanted the
>>> current behavior.
>>
>> If I remember correctly, this implementation is based on Oracle's behavior.
> 
> In the thread I linked above they claim that Oracle (at least 10g) does
> not work like this.
[...]
> I do not have access to an Oracle installation so I cannot confirm this
> myself.

Oracle 12.1:

SQL> SELECT to_date('2016-12-40','YYYY-MM-DD') FROM dual;
SELECT to_date('2016-12-40','YYYY-MM-DD') FROM dual
   *
ERROR at line 1:
ORA-01847: day of month must be between 1 and last day of month

SQL> SELECT to_date('2017-02-29','YYYY-MM-DD') FROM dual;
SELECT to_date('2017-02-29','YYYY-MM-DD') FROM dual
   *
ERROR at line 1:
ORA-01839: date not valid for month specified

So no, compatibility with Oracle is certainly not the reason
to leave it as it is.

But notwithstanding your feeling that you would like your application
to break if it makes use of this behaviour, it is a change that might
make some people pretty unhappy - nobody can tell how many.
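
For comparison, this is what PostgreSQL currently does with the first
example, if I remember correctly: the out-of-range day is silently
rolled over rather than rejected.

SELECT to_date('2016-12-40', 'YYYY-MM-DD');

  to_date
------------
 2017-01-09
(1 row)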

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in to_timestamp().

2016-06-20 Thread Albe Laurenz
Tom Lane wrote:
> I don't necessarily have an opinion yet.  I would like to see more than
> just an unsupported assertion about what Oracle's behavior is.  Also,
> how should FM mode affect this?

I can supply what Oracle 12.1 does:

SQL> SELECT to_timestamp('2016-06-13 15:43:36', ' YYYY/MM/DD HH24:MI:SS') AS ts 
FROM dual;

TS

2016-06-13 15:43:36.0 AD

SQL> SELECT to_timestamp('2016-06-13 15:43:36', 'YYYY/MM/DD  HH24:MI:SS') AS ts 
FROM dual;

TS

2016-06-13 15:43:36.0 AD

SQL> SELECT to_timestamp('2016-06-13    15:43:36', 'YYYY/MM/DD  HH24:MI:SS') AS 
ts FROM dual;

TS

2016-06-13 15:43:36.0 AD

(to_timestamp_tz behaves the same way.)

So Oracle does not seem to distinguish between one space and several.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using FDW AddForeignUpdateTargets for a hidden pseudo-column

2016-06-14 Thread Albe Laurenz
Aleksey Demakov wrote:
> I have a data store where tuples have unique identities that normally are not 
> visible.
> I also have a FDW to work with this data store. As per the docs to implement 
> updates
> for this data store I have AddForeignUpdateTargets() function that adds an 
> artificial
> column to the target list.
> 
> It seems to me that I cannot re-use a system attribute number for this 
> artificial resjunk
> column (as, for instance, postgres_fdw uses SelfItemPointerAttributeNumber). 
> These
> attributes have specific meaning not compatible with my tuple identity.
> 
> On other hand using a regular AttrNumber might confuse the query planner. In 
> contrast
> e..g with Oracle FDW that can use a unique key to identify the row, in my 
> data store
> the tuple identity is normally not visible. So the data planner might break 
> if it sees a
> Var node with an unexpected varattno number.
>
> What is the best approach to handle such a case?
> 
> 1. Give up on this entirely and require a unique key for any table used thru 
> FDW.
> 
> 2. Force the FDW to expose the identity column either explicitly by the user 
> who
> creates a foreign table or automatically adding it in the corresponding 
> trigger
> (preferably still making it hidden for normal scans).
> 
> 3. Modify the postgresql core to nicely handle the case of an unknown target
> column added by a FDW.
> 
> 4. Something else?

When implementing this for oracle_fdw, I went with your solution #1.
The downside is that anything that does not have a unique key cannot be
modified.

I first thought of using the internal ROWID column that's probably similar to
your case, but that wouldn't fit into a tid's 6 bytes, and I found that I could
only add resjunk columns for existing columns of the table.
Making the internal ROWID an explicit column in the foreign table seemed
just too ugly.

I don't know if #3 would be difficult, but it sure would make things easier
for FDW developers.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Prepared statements and generic plans

2016-06-14 Thread Albe Laurenz
Bruce Momjian wrote:
> However, for the wire protocol prepare/execute, how do you do EXPLAIN?
> The only way I can see doing it is to put the EXPLAIN in the prepare
> query, but I wasn't sure that works.  So, I just wrote and tested the
> attached C program and it properly output the explain information, e.g.
> 
>   res = PQprepare(conn, "prep1", "EXPLAIN SELECT * FROM pg_language", 0, 
> NULL);
>   ---
> generated:
> 
>   QUERY PLAN
> 
>   Seq Scan on pg_language  (cost=0.00..1.04 rows=4 width=114)
> 
> so that works --- good.

Hm, yes.

Were you just curious or is it relevant for the documentation update?

>>> Looking at how the code behaves, it seems custom plans that are _more_
>>> expensive (plus planning cost) than the generic plan switch to the
>>> generic plan after five executions, as now documented.  Custom plans
>>> that are significantly _cheaper_ than the generic plan _never_ use the
>>> generic plan.
>>
>> Yes, that's what the suggested documentation improvement says as well,
>> right?
> 
> Yes.  What is odd is that it isn't the plan of the actual supplied
> parameters that is cheaper, just the generic plan that assumes each
> distinct value in the query is equally likely to be used.  So, when we
> say the generic plan is cheaper, it is just comparing the custom plan
> with the supplied parameters vs. the generic plan --- it is not saying
> that running the supplied constants with the generic plan will execute
> faster, because in fact we might be using a sub-optimial generic plan.

Right, that's why it is important to document that it is estimates that are
compared, not actual costs.

This has caused confusion in the past, see
https://www.postgresql.org/message-id/flat/561E749D.4090301%40socialserve.com#561e749d.4090...@socialserve.com

> Right.  Updated patch attached.

I am happy with the patch as it is.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Prepared statements and generic plans

2016-06-13 Thread Albe Laurenz
Bruce Momjian wrote:
> Also, is it possible to do an EXPLAIN prepare() with the libpq/wire
> protocol?  I can't do PREPARE EXPLAIN, but I can do EXPLAIN EXECUTE.
> However, I don't see any way to inject EXPLAIN into the libpq/wire
> prepare case.  Can you specify prepare(EXPLAIN SELECT)?  (PREPARE
> EXPLAIN SELECT throws a syntax error.)

I am not sure what you mean:
EXPLAIN PREPARE to get EXPLAIN for PREPARE, or PREPARE ... FOR EXPLAIN
to get an EXPLAIN statement with parameters.
What should EXPLAIN PREPARE show that EXPLAIN SELECT wouldn't?
Why the need for EXPLAIN statements with parameters?
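
For reference, the combination that already works on the SQL level is
EXPLAIN EXECUTE; a minimal sketch with a made-up statement name:

PREPARE p(oid) AS SELECT * FROM pg_language WHERE oid > $1;
EXPLAIN EXECUTE p(0);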

> Looking at how the code behaves, it seems custom plans that are _more_
> expensive (plus planning cost) than the generic plan switch to the
> generic plan after five executions, as now documented.  Custom plans
> that are significantly _cheaper_ than the generic plan _never_ use the
> generic plan.

Yes, that's what the suggested documentation improvement says as well,
right?

> Updated patch attached.

Upon re-read, one tiny question:

!Prepared statements can optionally use generic plans rather than
!re-planning with each set of supplied EXECUTE values.

Maybe the "optionally" should be omitted, since the user has no choice.

It is true that there is a cursor option CURSOR_OPT_CUSTOM_PLAN, but that
cannot be used on the SQL level.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Prepared statements and generic plans

2016-06-07 Thread Albe Laurenz
Bruce Momjian wrote:
>> !distinct column values, a generic plan assumes a column equality
>> !comparison will match 33% of processed rows.  Column statistics
>>
>> ... assumes *that* a column equality comparison will match 33% of *the* 
>> processed rows.
> 
> Uh, that seems overly wordy.  I think the rule is that if the sentence
> makes sense without the words, you should not use them, but it is
> clearly a judgement call in this case.  Do you agree?

My gut feeling is that at least the "the" should be retained, but mine
are the guts of a German speaker.
It is clearly a judgement call, so follow your instincts.

> Updated patch attached.
> 
> One more thing --- there was talk of moving some of this into chapter
> 66, but as someone already mentioned, there are no subsections there
> because it is a dedicated topic:
> 
>   66. How the Planner Uses Statistics.
> 
> I am not inclined to add a prepare-only section to that chapter.  On the
> other hand, the issues described apply to PREPARE and to protocol-level
> prepare, so having it in PREPARE also seems illogical.  However, I am
> inclined to leave it in PREPARE until we are ready to move all of this
> to chapter 66.

I think it would be ok to leave it where it is in your patch; while the
paragraph goes into technical detail, it is still alright in the general
documentation (but only just).

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Prepared statements and generic plans

2016-06-06 Thread Albe Laurenz
Bruce Momjian wrote:
> OK, updated version attached.  I added "potential" to the first
> paragraph, and added "estimated cost" to the later part, fixed the
> "cheaper than", and clarified that we add the plan time cost to the
> non-generic plan, which is how it can be cheaper than the generic plan.
> I also moved the "Once a generic plan is chosen" line.
> 
> Yeah, that's a lot of changes, but they all improved the text.  Thanks.

Thanks for working on this.

!Prepared statements can optionally use generic plans rather than
!re-planning with each set of supplied EXECUTE values.
!This occurs immediately for prepared statements with no parameters;
!otherwise it occurs only after five or more executions produce estimated
!plan costs, with planning overhead added, that are, on average, more
!expensive than the generic plan cost.

The following might be easier to understand:
... after five or more executions produce plans whose estimated cost average
(including planning overhead) is more expensive than the generic plan cost 
estimate.

!A generic plan assumes each value supplied to EXECUTE

... assumes *that* each value ...

!is one of the column's distinct values and that column values are
!uniformly distributed.  For example, if statistics records three

Shouldn't it be "record"?
The documentation treats "statistics" as a plural word throughout.

!distinct column values, a generic plan assumes a column equality
!comparison will match 33% of processed rows.  Column statistics

... assumes *that* a column equality comparison will match 33% of *the* 
processed rows.

!also allows generic plans to accurately compute the selectivity of

Column statistics also *allow* ...

!unique columns.  Comparisons on non-uniformly-distributed columns and
!specification of non-existent values affects the average plan cost,
!and hence if and when a generic plan is chosen.

(Disclaimer: I am not a native speaker.)
Other than that, a hearty +1.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Prepared statements and generic plans

2016-06-03 Thread Albe Laurenz
David G. Johnston wrote:
> On Thu, Jun 2, 2016 at 9:56 PM, Bruce Momjian  wrote:
>> In Postgres 9.2 we improved the logic of when generic plans are used by
>> EXECUTE.  We weren't sure how well it would work, and the docs included
>> a vague description on when generic plans are chosen.
>> 
>> I have gotten a few questions lately about how prepared statements are
>> handled with multiple executions so I have updated the PREPARE manual
>> page with the attached patch to more clearly explain generic plans and
>> when they are chosen.
>> 
>> I would like to apply this to the 9.6 docs.

I think that this is a very good idea.

I had to dig through the sources before to answer such questions.

> While putting the proposed patch in context I came across this.
>
> """
>
> Prepared statements have the largest performance advantage when a single 
> session is being used to
> execute a large number of similar statements. The performance difference will 
> be particularly
> significant if the statements are complex to plan or rewrite, for example, if 
> the query involves a
> join of many tables or requires the application of several rules. If the 
> statement is relatively
> simple to plan and rewrite but relatively expensive to execute, the 
> performance advantage of prepared
> statements will be less noticeable.
> 
> """
> 
> Until and unless the generic plan is used the  "...if statements are complex 
> to plan..." doesn't make
> sense; and no where in the description itself do we introduce the generic 
> plan concept.  This is
> inconsistent but I'm not addressing it below though its worth considering 
> before a patch to this area
> is committed.

I think that the paragraph is still true; it talks about "a large number of 
similar statements".
As long as "large" is sufficiently greater than five, that is.

> As to the patch...
> 
> Isn't this backwards?  [change]
> 
> """
> otherwise it  occurs only after five or more 
> executions produce
> [execution /strike plans] plus planning costs that are, on average, [roughly 
> equal to /strike cheaper]
> than the generic plan.
> """

I agree.
I also found this sentence hard to read.

> I'd maybe go with something like this:
> 
> All executions of a prepared statement having zero parameters will use the 
> same plan so the planning
> time taken during the first execution will be spread across all subsequent 
> executions.  For statements
> having parameters the first five executions will result in value-specific 
> plans as previously
> described.  However, on the sixth execution a generic plan will also be 
> computed and if the average
> planning + execution cost of all previous value-specific plans is about equal 
> to the execution cost of
> the generic plan the generic plan will be chosen for that and all subsequent 
> executions.

I think that is much better, but I suggest this wording:

"All executions of a prepared statement having zero parameters use the same 
plan, so they
 will use the generic plan immediately.  For statements having parameters the 
first five executions
 will result in value-specific plans as previously described.
 However, on the sixth execution a generic plan will also be computed, and if 
the average cost estimate
 of all previous value-specific plans is about equal to the cost estimate of 
the generic plan,
 the generic plan will be chosen for that and all subsequent executions."

This emphasizes that it is only estimates we are dealing with, otherwise it 
would be hard
to understand why estimation errors can lead to generic plans being chosen that 
are much worse.

> 
> 
> If we are getting generic plans significantly cheaper than the value-specific 
> plans I suspect there is
> a problem...so any comparison that indicates "less-than" is prone to cause 
> confusion.  The original is
> worded well on this point: "...generic plan appears to be not much more 
> expensive..." but lacks detail
> elsewhere.

I don't quite get that.  Do you mean the same thing that I wrote above?

> This part:
> 
> !A generic plan assumes each value supplied to EXECUTE
> !is one of the column's distinct values and that column values are
> !uniformly distributed.  For example, if statistics records three
> !distinct column values, a generic plan assumes a column equality
> !comparison will match 33% of processed rows.  Column statistics
> !also allows generic plans to accurately compute the selectivity of
> !unique columns.  Comparisons on non-uniformly-distributed columns and
> !specification of non-existent values affects the average plan cost,
> !and hence if and when a generic plan is chosen.  [elided the last 
> sentence, placed in the first
> paragraph]
> 
> I'm not sure of the specific goal here but this level detail seems a bit 
> out-of-place in the SQL
> Command documentation.  So, do we want this user-facing and if so do we want 
> it here?

[...]

> This leaves Bruce's second alteration: 

Re: [HACKERS] Rename max_parallel_degree?

2016-06-03 Thread Albe Laurenz
Robert Haas wrote:
> On Wed, Jun 1, 2016 at 5:29 PM, Tom Lane  wrote:
> > Robert Haas  writes:
> >> I've largely given up hope of coming up with an alternative that can
> >> attract more than one vote and that is also at least mildly accurate,
> >> but one idea is max_parallel_workers_per_gather_node.  That will be
> >> totally clear.
> >
> > Given the reference to Gather nodes, couldn't you drop the word
> > "parallel"?  "node" might not be necessary either.
> 
> Well, I think we could drop node, if you like.  I think parallel
> wouldn't be good to drop, though, because it sounds like we want a
> global limit on parallel workers also, and that can't be just
> max_workers.  So I think we should keep parallel in there for all of
> them, and have max_parallel_workers and
> max_parallel_workers_per_gather(_node).  The reloption and the Path
> struct field can be parallel_workers rather than parallel_degree.

I believe that it will be impossible to find a name that makes
the meaning clear to everybody.  Those who do not read the documentation
will always find a way to misunderstand it.

These suggestions have the added disadvantage that it is hard
to remember them.  I see myself going, "I have to change the maximum
for parallel workers, what was the name again?", and having to resort
to the manual (or SHOW ALL) each time.

I suggest to follow the precedent of "work_mem", stick with
something simple like "max_parallel_workers" and accept the risk
of not being totally self-explanatory.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fwd: [JDBC] Re: 9.4-1207 behaves differently with server side prepared statements compared to 9.2-1102

2016-01-13 Thread Albe Laurenz
Pavel Stehule wrote:
> I like a strategy based on risks. Probably there are situation, when the 
> generic plan is great every
> time - INSERTs, UPDATEs via PK, simple SELECTs via PK. generic plan can be 
> well if almost all data has
> similar probability. Elsewhere on bigger data, the probability of pretty slow 
> plan is higher, and then
> we should to prefer custom plan.
> 
> so the strategy - if cost of generic plan is less than some MAGIC CONSTANT 
> (can be specified by GUC),
> then use generic plan. Elsewhere use a custom plan everytime.
> 
> It allow to controll the plan reusing. When MAGIC CONSTANT = 0 .. use custom 
> plan everytime, When
> MAGIC CONSTANT = M, then use generic plan always.

I have a different idea:

What about a GUC "custom_plan_threshold" that controls after how many
executions a generic plan will be considered, with a default value of 5.

A value of -1 could disable generic plans.
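
To make that concrete, usage could look like this - purely hypothetical,
since neither the GUC nor its semantics exist today:

SET custom_plan_threshold = 5;   -- today's behavior: generic plan
                                 -- considered after five executions
SET custom_plan_threshold = -1;  -- always use custom plans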

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fwd: [JDBC] Re: 9.4-1207 behaves differently with server side prepared statements compared to 9.2-1102

2016-01-12 Thread Albe Laurenz
Vladimir Sitnikov wrote:
> Here's the simplified testcase:
> https://gist.github.com/vlsi/df08cbef370b2e86a5c1
> 
> It reproduces the problem in both 9.4.4 and 9.5rc1.
> It is reproducible via both psql and pgjdbc.
> 
> I use a single table, however my production case includes a join of
> two tables and the query is like
> select ... from foo, bar where foo.skewed=? and bar.non_skewed=? and
> foo.bar_id=bar.id
> 
> Note: my application _always_ sends *the same* *bad* value for skewed
> column (it effectively is used as a filtering column in the particular
> query).
> Unfortunately, on 6th execution backend switches to the plan that uses
> skewed index access.
> 
> Is it something that can be fixed/improved?
> 
> Good plan (the first 5 executions):
> Index Scan using non_skewed__flipper on plan_flipper
> (cost=0.43..42.77 rows=10 width=113) (actual time=0.030..0.072 rows=10
> loops=1)
>   Index Cond: (non_skewed = 42)
>   Filter: (skewed = 0)
>   Rows Removed by Filter: 10
>   Buffers: shared hit=20 read=3
> Execution time: 0.094 ms
> 
> Bad plan (all the subsequent executions):
> Index Scan using skewed__flipper on plan_flipper  (cost=0.43..6.77
> rows=1 width=113) (actual time=0.067..355.867 rows=10 loops=1)
>   Index Cond: (skewed = $1)
>   Filter: (non_skewed = $2)
>   Rows Removed by Filter: 90
>   Buffers: shared hit=18182 read=2735
> Execution time: 355.901 ms

The problem is that the index "skewed__flipper" is more selective than
"non_skewed__flipper" except when "skewed = 0", so the generic plan prefers it.

I don't know if there is a good solution except disabling server prepared 
statements.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Experimental evaluation of PostgreSQL's query optimizer

2015-12-21 Thread Albe Laurenz
Viktor Leis wrote:
> We have recently performed an experimental evaluation of PostgreSQL's
> query optimizer. For example, we measured the contributions of
> cardinality estimation and the cost model on the overall query
> performance. You can download the resulting paper here:
> http://www.vldb.org/pvldb/vol9/p204-leis.pdf
> 
> Some findings:
> 1. Perhaps unsurprisingly, we found that cardinality
> estimation is the biggest problem in query optimization.
> 2. The quality of Postgres' cardinality estimates is not generally worse
> than that of the major commercial systems.
> 3. It seems to me that one obvious way to avoid many bad situations
> would be to disable nested loop joins when the inner relation is NOT
> an index scan.
> 
> I hope this will be of interest to some of you.

I have read the paper with great interest, and I have some comments.

- The paper mentions that the "Join Order Benchmark" has high cross-table
  correlation, and this correlation is responsible for bad cardinality
  estimates that cause bad plans with all RDBMS.
  Wouldn't it be interesting to do the same experiment with different
  real-world data sets to see if that is indeed typical and not an
  idiosyncrasy of that specific benchmark?

- The paper suggests that sampling the base tables is preferable to
  using statistics because it gives better estimates, but I think that that
  is only a win with long running, complicated, data warehouse style queries.
  For the typical OLTP query it would incur intolerable planning times.
  Any ideas on that?

- From my experience in tuning SQL queries I can confirm your one finding,
  namely that bad cardinality estimates are the prime source for bad
  plan choices.
  Perhaps it would be valuable to start thinking about statistics for
  inter-table correlation. What about something as "simple" as a factor
  per (joinable) attribute pair that approximates the total row count
  of a join on these attributes, divided by the planner's estimate?

- I also can corroborate your finding that nested loop joins are often
  harmful, particularly when the inner loop is a sequential scan.
  One of the first things I do when investigating bad performance of a query
  whose plan has a nested loop join is to set enable_nestloop to "off"
  and see if that makes a difference, and it often does.
  Maybe it would be a win to bias the planner against nested loop joins.
  This is dreaming, but it might be nice to have some number as to how
  reliable a certain estimate is, which is high if the estimate is, say,
  derived from a single filter on a base table and sinks as more conditions
  are involved or numbers pulled out of thin air.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Costing foreign joins in postgres_fdw

2015-12-19 Thread Albe Laurenz
Robert Haas wrote:
>> Maybe, to come up with something remotely realistic, a formula like
>>
>> sum of locally estimated costs of sequential scan for the base table
>> plus count of estimated result rows (times a factor)
>
> Was this meant to say "the base tables", plural?

Yes.

> I think whatever we do here should try to extend the logic in
> postgres_fdw's estimate_path_cost_size() to foreign tables in some
> reasonably natural way, but I'm not sure exactly what that should look
> like.  Maybe do what that function currently does for single-table
> scans, and then add all the values up, or something like that.  I'm a
> little worried, though, that the planner might then view a query that
> will be executed remotely as a nested loop with inner index-scan as
> not worth pushing down, because in that case the join actually will
> not touch every row from both tables, as a hash or merge join would.

That's exactly what I meant, minus a contribution for the estimated
result set size.

There are cases where a nested loop is faster than a hash join,
but I think it is rare that this is by orders of magnitude.

So I'd say it is a decent rough estimate, and that's the best we can
hope for here, if we cannot ask the remote server.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Costing foreign joins in postgres_fdw

2015-12-18 Thread Albe Laurenz
Ashutosh Bapat wrote:
> Costs for foreign queries are either obtained from the foreign server using 
> EXPLAIN (if
> use_remote_estimate is ON) otherwise they are cooked up locally based on the 
> statistics available. For
> joins as well, we have to do the same. If use_remote_estimates [1] is ON, we 
> can get the costs from
> the foreign server. Rest of the mail discusses approaches for estimating the 
> costs when
> use_remote_estimates is OFF.
> 
> 
> 1. Unlike base relations where the table data "has to be" fetched from the 
> foreign server, a join
> doesn't "have to be" fetched from the foreign server. So, even if 
> use_remote_estimate is OFF for a
> base relation, we do try to estimate something locally. But for a join that's 
> not compulsory, so we
> can choose not to estimate anything and not push down the join if 
> use_remote_estimate is OFF. Whether
> we do that depends upon how well we can estimate the join cost when 
> use_remote_estimate is OFF.
> 
> 2. Locally estimating the cost of join that will be performed on the foreign 
> server is difficult
> because we do not know which join strategy the foreign server is going to 
> use, which in turn depends
> upon the availability of indexes, work memory, statistics about joining 
> expressions etc. One way to do
> this is to use the cost of cheapest local join path built upon foreign outer 
> and inner paths, to
> estimate the cost of executing the join at the foreign server The startup and 
> run time costs for
> sending, parsing and planning query at the foreign server as well as the cost 
> to fetch the tuples need
> to be adjusted, so that it doesn't get counted twice. We may assume that the 
> cost for the foreign join
> will be some factor of the adjusted cost, like we have done for estimating 
> cost of sort pushdown. The
> reason we choose cheapest path with foreign inner and outer paths is because 
> that's likely to be a
> closer to the real estimate than the path which does not have foreign inner 
> and outer paths. In the
> absence of such path, we should probably not push the join down since no 
> local path has found pushing
> inner and outer to be cheaper and it's likely (certainly not a rule) that 
> pushing the join in question
> down is not going to be cheaper than the local paths.
> 
> 
> 1st option is easy but it sounds too restrictive. 2nd option liberal but is 
> complex.

My gut feeling is that for a join where all join predicates can be pushed down, 
it
will usually be a win to push the join to the foreign server.

So in your first scenario, I'd opt for always pushing down the join
if possible if use_remote_estimate is OFF.

Your second scenario is essentially to estimate that a pushed down join will
always be executed as a nested loop join, which will in most cases produce
an unfairly negative estimate.

What about using local statistics to come up with an estimated row count for
the join and use that as the basis for an estimate?  My idea here is that it
is always be a win to push down a join unless the result set is so large that
transferring it becomes the bottleneck.
Maybe, to come up with something remotely realistic, a formula like

sum of locally estimated costs of sequential scan for the base table
plus count of estimated result rows (times a factor)

Yours,
Laurenz Albe



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fdw cleanup

2015-12-14 Thread Albe Laurenz
Feng Tian wrote:
> I need some help to understand foreign table error handling.
> 
> For a query on foreign table, ExecInitForeignScan is called, which in turn 
> calls the BeginForeignScan
> hook.   Inside this hook, I allocated some resource,
> 
> 
> node->fdw_state = allocate_resource(...);
> 
> If everything goes well, ExecEndForeignScan will call call my EndForeignScan 
> hook, inside the hook, I
> free the resource.
> 
> free_resource(node->fdw_state);
> 
> However, if during the execution an error happened, seems to me that 
> EndForeignScan will not be called
> (traced using gdb).   So my question is, is Begin/End the right place for 
> allocate/free resources?
> If it is not, what is the right way to do this?

If the resource is memory, use "palloc" to allocate it, and PostgreSQL
will take care of it automatically.

Other than that, you could call RegisterXactCallback() and register a callback
that will be called upon transaction end.  In the callback function you can
free the resource if it was not already freed by EndForeignScan.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Disabling an index temporarily

2015-12-12 Thread Albe Laurenz
Tatsuo Ishii wrote:
>> Wouldn't something like:
>>
>> ALTER INDEX foo SET DISABLED;
>>
>> See more in line with our grammar?
>
> But this will affect other sessions, no?

Not if it is used in a transaction that ends with a ROLLBACK,
but then you might as well use DROP INDEX, except
that DROP INDEX takes an access exclusive lock.
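
A minimal sketch of that workaround (index and query made up):

BEGIN;
DROP INDEX foo;        -- takes an ACCESS EXCLUSIVE lock on the table
EXPLAIN SELECT * FROM tab WHERE col = 42;  -- planner cannot use the index
ROLLBACK;              -- the index is back, no rebuild needed

The lock blocks all concurrent access to the table until the transaction
ends, which is what makes this approach unattractive.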

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Errors in our encoding conversion tables

2015-11-27 Thread Albe Laurenz
Tom Lane wrote:
> There's a discussion over at
> http://www.postgresql.org/message-id/flat/2sa.dhu5.1hk1yrptnfy.1ml...@seznam.cz
> of an apparent error in our WIN1250 -> LATIN2 conversion.  I looked into this
> and found that indeed, the code will happily translate certain characters
> for which there seems to be no justification.  I made up a quick script
> that would recompute the conversion tables in latin2_and_win1250.c from
> the Unicode mapping files in src/backend/utils/mb/Unicode, and what it
> computes is shown in the attached diff.  (Zeroes in the tables indicate
> codes with no translation, for which an error should be thrown.)
> 
> Having done that, I thought it would be a good idea to see if we had any
> other conversion tables that weren't directly based on the Unicode data.
> The only ones I could find were in cyrillic_and_mic.c, and those seem to
> be absolutely filled with errors, to the point where I wonder if they were
> made from the claimed encodings or some other ones.  The attached patch
> recomputes those from the Unicode data, too.
> 
> None of this data seems to have been touched since Tatsuo-san's original
> commit 969e0246, so it looks like we simply didn't vet that submission
> closely enough.
> 
> I have not attempted to reverify the files in utils/mb/Unicode against the
> original Unicode Consortium data, but maybe we ought to do that before
> taking any further steps here.
> 
> Anyway, what are we going to do about this?  I'm concerned that simply
> shoving in corrections may cause problems for users.  Almost certainly,
> we should not back-patch this kind of change.

Thanks for picking this up.

I agree with your proposed fix, the only thing that makes me feel uncomfortable
is that you get error messages like:
  ERROR:  character with byte sequence 0x96 in encoding "WIN1250" has no 
equivalent in encoding "MULE_INTERNAL"
which is a bit misleading.
But the main thing is that no corrupt data can be entered.

I can understand the reluctance to back-patch; nobody likes his
application to suddenly fail after a minor database upgrade.

However, the people who would fail if this were back-patched are
people who will certainly run into trouble if they
a) upgrade to a release where this is fixed or
b) try to convert their database to, say, UTF8.

The least thing we should do is stick a fat warning into the release notes
of the first version where this is fixed, along with some guidelines what
to do (though I am afraid that there is not much more helpful to say than
"If your database encoding is X and data have been entered with client_encoding 
Y,
fix your data in the old system").

But I think that this fix should be applied to 9.6.
PostgreSQL has a strong reputation for being strict about correct encoding
(not saying that everybody appreciates that), and I think we shouldn't mar
that reputation.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Git cartoon

2015-11-09 Thread Albe Laurenz
Fabrízio de Royes Mello wrote:
> On Sunday, November 8, 2015, Bruce Momjian wrote:
>> This git cartoon was too funny not to share:
>>
>> http://xkcd.com/1597/
>> 
>> Maybe we need it on our git wiki page.  ;-)
> 
>  I think we need our own cartoon with a funny DB story.

What, like this?

http://xkcd.com/327/

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] RFC: Add length parameterised dmetaphone functions

2015-11-06 Thread Albe Laurenz
Christian Marie wrote:
> A developer I work with was trying to use dmetaphone to group people names 
> into
> equivalence classes. He found that many long names would be grouped together
> when they shouldn't be, this turned out to be because dmetaphone has an
> undocumented upper bound on its output length, of four. This is obviously
> impractical for many use cases.
> 
> This patch addresses this by adding and documenting an optional argument to
> dmetaphone and dmetaphone_alt that specifies the maximum output length. This
> makes it possible to use dmetaphone on much longer inputs.
> 
> Backwards compatibility is catered for by making the new argument optional,
> defaulting to the old, hard-coded value of four. We now have:
> 
>   dmetaphone(text source) returns text
>   dmetaphone(text source, int max_output_length) returns text
>   dmetaphone_alt(text source) returns text
>   dmetaphone_alt(text source, int max_output_length) returns text

I like the idea.

How about:
dmetaphone(text source, int max_output_length DEFAULT 4) returns text
dmetaphone_alt(text source, int max_output_length DEFAULT 4) returns text

Saves two functions and is self-documenting.
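
A sketch of what I mean, modeled on how such SQL-level defaults are usually
declared (not the actual patch):

CREATE OR REPLACE FUNCTION dmetaphone(source text, max_output_length int DEFAULT 4)
RETURNS text
AS 'MODULE_PATHNAME', 'dmetaphone'
LANGUAGE C IMMUTABLE STRICT;

With that, dmetaphone('unicorns') behaves exactly like dmetaphone('unicorns', 4).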

> +postgres=# select dmetaphone('unicorns');
> + dmetaphone
> +------------
> + ANKR
> +(1 row)
> +
> +postgres=# select dmetaphone('unicorns', 8);
> + dmetaphone
>  ------------
> - KMP
> + ANKRNS
>  (1 row)

Yeah, "ponies" would have been too short...

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Documentation fix for psql

2015-11-03 Thread Albe Laurenz
The psql documentation calls the \pset options unicode_*_style
when in reality they are called unicode_*_linestyle.

This should be backpatched to 9.5.

Yours,
Laurenz Albe


0001-Fix-documentation-for-pset-unicode_-_linestyle.patch
Description: 0001-Fix-documentation-for-pset-unicode_-_linestyle.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Documentation for min_wal_size and max_wal_size

2015-10-20 Thread Albe Laurenz
Wouldn't it be better to have these two parameters documented next to each 
other,
as in the attached patch?

Yours,
Laurenz Albe


0001-Move-documentation-for-min_wal_size-before-max_wal_s.patch
Description: 0001-Move-documentation-for-min_wal_size-before-max_wal_s.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow ssl_renegotiation_limit in PG 9.5

2015-10-16 Thread Albe Laurenz
Shay Rojansky wrote:
> Just to give some added reasoning...
> 
> As Andres suggested, Npgsql sends ssl_renegotiation_limit=0 because we've 
> seen renegotiation bugs with
> the standard .NET SSL implementation (which Npgsql uses). Seems like everyone 
> has a difficult time
> with renegotiation.

As far as I remember, that was introduced because of renegotiation bugs with 
Mono:
http://lists.pgfoundry.org/pipermail/npgsql-devel/2010-February/001074.html
http://fxjr.blogspot.co.at/2010/03/ssl-renegotiation-patch.html

Of course, with renegotiation disabled, nobody knows whether there would also 
have been renegotiation
problems since Npgsql switched from Mono SSL to .NET SSL ...

> As Tom suggested, it gets sent in the startup packet so that DISCARD/RESET 
> ALL resets back to 0 and
> not to the default 512MB (in older versions). Npgsql itself issues DISCARD 
> ALL in its connection pool
> implementation to reset the connection to its original opened state, and of 
> course users may want to
> do it themselves. Having SSL renegotiation accidentally turned on because a 
> user sent RESET ALL, when
> the SSL implementation is known to have issues, is something to be avoided...

Maybe Npgsql could switch to sending the statement after the connection has been
established and resending it after each RESET ALL?
You could add documentation that people should not use RESET ALL with Npgsql 
unless they
are ready to disable renegotiation afterwards.

If not, the only solution I can see is for PostgreSQL to not protest if it sees 
the
parameter in the startup packet.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Use gender-neutral language in documentation

2015-09-22 Thread Albe Laurenz
Peter Geoghegan wrote:
> On Mon, Sep 21, 2015 at 9:32 PM, Erik Rijkers  wrote:
>> I think this compulsive 'he'-avoiding is making the text worse.
>>
>>
>> -  environment variable); any user can make such a change for his 
>> session.
>> +  environment variable); any user can make such a change for their 
>> session.
> 
> -1. It seems fine to me.

(Disclaimer: I am not a native speaker.)

Using the pronoun of the third person plural as a replacement for "his or her"
has become widely used, at least in the U.S., and the OED condones that use:
http://www.oxforddictionaries.com/definition/english/they

Do we want to have that everywhere?

Yours,
Laurenz Albe 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Horizontal scalability/sharding

2015-09-02 Thread Albe Laurenz
Etsuro Fujita wrote:
> On 2015/09/02 16:40, Amit Langote wrote:
>> On 2015-09-02 PM 04:07, Albe Laurenz wrote:
>>> Amit Langote wrote:
>>>> On 2015-09-02 PM 03:25, Amit Kapila wrote:
>>>>> Will it handle deadlocks across different table partitions. Consider
>>>>> a case as below:
>>>>>
>>>>> T1
>>>>> 1. Updates row R1 of T1 on shard S1
>>>>> 2. Updates row R2 of T2 on shard S2
>>>>>
>>>>> T2
>>>>> 1. Updates row R2 of T2 on shard S2
>>>>> 2. Updates row R1 of T1 on shard S1
>>>
>>>> As long as shards are processed in the same order in different
>>>> transactions, ISTM, this issue should not arise? I can imagine it becoming
>>>> a concern if parallel shard processing enters the scene. Am I missing
>>>> something?
>>>
>>> That would only hold for a single query, right?
>>>
>>> If 1. and 2. in the above example come from different queries within one
>>> transaction, you cannot guarantee that shards are processed in the same 
>>> order.
>>>
>>> So T1 and T2 could deadlock.
> 
>> Sorry, I failed to see why that would be the case. Could you elaborate?
> 
> I think Laurenz would assume that the updates 1. and 2. in the above
> transactions are performed *in a non-inherited manner*.  If that's
> right, T1 and T2 could deadlock, but I think we assume here to run
> transactions over shards *in an inherited manner*.

Yes, but does every update affect all shards?

If I say "UPDATE t1 SET col = 1 WHERE id = 42" and the row with id 42
happens to be on shard S1, the update would only affect that shard, right?

Now if "UPDATE t2 SET col = 1 WHERE id = 42" would only take place on
shard S2, and two transactions issue both updates in different order,
one transaction would be waiting for a lock on shard S1, while the other
would be waiting for a lock on shard S2, right?
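
Spelled out as a timeline (same statements as above, shard placement
assumed as described):

-- session 1:
BEGIN;
UPDATE t1 SET col = 1 WHERE id = 42;   -- locks the row on shard S1
-- session 2:
BEGIN;
UPDATE t2 SET col = 1 WHERE id = 42;   -- locks the row on shard S2
-- session 1:
UPDATE t2 SET col = 1 WHERE id = 42;   -- blocks, waiting for session 2
-- session 2:
UPDATE t1 SET col = 1 WHERE id = 42;   -- blocks on session 1: deadlock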

But maybe I'm missing something fundamental.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Horizontal scalability/sharding

2015-09-02 Thread Albe Laurenz
Amit Langote wrote:
> On 2015-09-02 PM 03:25, Amit Kapila wrote:
>> Will it handle deadlocks across different table partitions. Consider
>> a case as below:
>>
>> T1
>> 1. Updates row R1 of T1 on shard S1
>> 2. Updates row R2 of T2 on shard S2
>>
>> T2
>> 1. Updates row R2 of T2 on shard S2
>> 2. Updates row R1 of T1 on shard S1

> As long as shards are processed in the same order in different
> transactions, ISTM, this issue should not arise? I can imagine it becoming
> a concern if parallel shard processing enters the scene. Am I missing
> something?

That would only hold for a single query, right?

If 1. and 2. in the above example come from different queries within one
transaction, you cannot guarantee that shards are processed in the same order.

So T1 and T2 could deadlock.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Implement failover on libpq connect level.

2015-08-19 Thread Albe Laurenz
Victor Wagner wrote:
>> I wonder how useful this is at the present time.
>>
>> If the primary goes down and the client gets connected to the standby,
>> it would have read-only access there.  Most applications wouldn't cope
>> well with that.
>
> It is supposed that somebody (either system administrator or some
> cluster management software) have noticed failure of master and promoted
> one of the standbys to master.
>
> So, clients have only to find out which cluster node serves as master
> just now.
>
> Idea is that we don't need any extra administration actions such as IP
> migration to do it. Clients have list of alternate servers and discover
> which one to work with by trial and error.

Yes, but that will only work reliably if the (read-only) standby does not
allow connections before it is promoted.

> I consider in my proposal following situations:
>
> 1. Warm standby - doesn't accept client connection at all unless
> promoted to master.
>
> 2. Hot standby - we have two types of clients - one for which readonly
> access is sufficient, and other that need to connect only to master.
> In this case intention to write is explicitely stated in the connect
> string (readonly=false) and connect procedure would check if node it
> tries to connect allowed write.

I think that these are both valid use cases.

And as Robert said, there are people out using BDR or other proprietary
multi-master solutions, so there might well be an audience for this feature.

>>> host=main-server host=standby1 host=standby2 port=5432 dbname=database
>>
>> It seems a bit arbitrary to require that all servers use the same port.
>> Maybe parameters like host2, port2, host3, port3 etc. might be better.
>
> I've thought about this approach. But PostgreSQL administration guide
> insists that all servers in the cluster should have as identical
> configuration as possible to simplify administration.
>
> Moreover I've seldom have seen configurations where postgresql is
> accepting connection on non-default port.

We do it all the time.

> Using host1, host2 etc would have unintended connotations, such is this
> is first, main server. I think that client should treat all given
> servers as equal and let cluster administration to choose which one
> would accept connection.

I don't think that my idea of host, host3 is very appealing myself,
but I still don't like your original proposal of having multiple host
parameters.

One problem with that is that this syntax is already allowed, but
your proposal would silently change the semantics.
Today, if you have multiple host parameters, the last one wins.
So with your modification in place, some connect strings that work today
would start behaving in unexpected ways.

Maybe a better idea would be:
  host=db1.myorg.com,db2.myorg.com port=5432,2345

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Implement failover on libpq connect level.

2015-08-19 Thread Albe Laurenz
Victor Wagner wrote:
>>> Idea is that we don't need any extra administration actions such as IP
>>> migration to do it. Clients have list of alternate servers and discover
>>> which one to work with by trial and error.
>>
>> Yes, but that will only work reliably if the (read-only) standby does not
>> allow connections before it is promoted.
>
> It would just take a bit more time for client and a bit more load for
> server - to make sure that this connection is read-write by
> issuing
>
> show transaction_read_only
>
> statement before considering connection useful.

That's not very convenient, and a lot of middleware software won't easily
learn the trick.

But even without that use case I think that the feature is probably
worth the effort.
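
For completeness, the check itself is simple on the SQL level; a sketch
(pg_is_in_recovery() has been available since 9.0):

SHOW transaction_read_only;   -- 'on' on a hot standby
SELECT pg_is_in_recovery();   -- true while the server is in recovery

The hard part is getting generic drivers and poolers to issue it.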

[about having multiple host parameters in the connection string]

>> One problem with that is that this syntax is already allowed, but
>> your proposal would silently change the semantics.
>> Today, if you have multiple host parameters, the last one wins.
>> So with your modification in place, some connect strings that work today
>> would start behaving in unexpected ways.
>
> This is serious argument. But what the use case of these connect strings
> now?
>
> It seems to me that in most cases last host in the connect string would
> be only host which accepts connections, so it wins anyway

I'm not saying that it is particularly wide-spread and useful; it could
happen through careless editing of connection strings or by using a
connection service file entry
(http://www.postgresql.org/docs/current/static/libpq-pgservice.html)
and overriding the host parameter on the command line.

>> Maybe a better idea would be:
>>   host=db1.myorg.com,db2.myorg.com port=5432,2345
>
> I've tried not to introduce new delimiters. But this syntax definitely
> have some advantages. At least it allows to specify host-port pairs as
> two parallel lists.
>
> Other idea - allow to specify host-port pair as argument of host
> parameter.
>
>   host=db1.myorg.com:5432
>
> It is consistent with URL syntax and system administrators are used to
> it. And with long list of hosts there is less chances to made an error
> as host and corresponding port come together.

I don't think that is very attractive as it confuses the distinction between
host and port.  What would you do with

   host=db1.myorg.com:2345 port=1234

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Implement failover on libpq connect level.

2015-08-18 Thread Albe Laurenz
Hans-Jürgen Schönig wrote:
> in addition to that you have the “problem” of transactions. if you failover
> in the middle of a transaction, strange things might happen from the
> application point of view.
>
> the good thing, however, is that stupid middleware is sometimes not able
> to handle failed connections. however, overall i think it is more of a
> danger than a benefit.

Maybe I misunderstood the original proposal, but my impression was that the 
alternative
servers would be tried only at the time the connection is established, and 
there would be no
such problems as you describe.

Those could only happen if libpq automatically tried to reconnect upon failure 
without
the client noticing.

So the stupid middleware would get an error message, but the reconnect would 
actually work.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal: Implement failover on libpq connect level.

2015-08-18 Thread Albe Laurenz
Victor Wagner wrote:
> Rationale
> =========
>
> Since introduction of the WAL-based replication into the PostgreSQL, it is
> possible to create high-availability and load-balancing clusters.
>
> However, there is no support for failover in the client libraries. So, only
> way to provide transparent for client application failover is IP address
> migration. This approach has some limitation, i.e. it requires that
> master and backup servers reside in the same subnet or may not be
> feasible for other reasons.
>
> Commercial RDBMS, such as Oracle, employ more flexible approach. They
> allow to specify multiple servers in the connect string, so if primary
> server is not available, client library tries to connect to other ones.
>
> This approach allows to use geographically distributed failover clusters
> and also is a cheap way to implement load-balancing (which is not
> possible with IP address migration).

I wonder how useful this is at the present time.

If the primary goes down and the client gets connected to the standby,
it would have read-only access there.  Most applications wouldn't cope
well with that.

Once we have multi-master replication that can be used for fail-over,
the picture will change.  Then a feature like that would be very useful indeed.

> host=main-server host=standby1 host=standby2 port=5432 dbname=database

It seems a bit arbitrary to require that all servers use the same port.

Maybe parameters like host2, port2, host3, port3 etc. might be better.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Autonomous Transaction is back

2015-08-17 Thread Albe Laurenz
Noah Misch wrote:
>> If the autonomous transaction can interact with uncommitted
>> work in a way that other backends could not, crazy things happen when the
>> autonomous transaction commits and the suspended transaction aborts:
>>
>> CREATE TABLE t (c) AS SELECT 1;
>> BEGIN;
>> UPDATE t SET c = 2 WHERE c = 1;
>> BEGIN_AUTONOMOUS;
>> UPDATE t SET c = 3 WHERE c = 1;
>> UPDATE t SET c = 4 WHERE c = 2;
>> COMMIT_AUTONOMOUS;
>> ROLLBACK;
>>
>> If you replace the autonomous transaction with a savepoint, the c=3 update
>> finds no rows, and the c=4 update changes one row.  When the outer
>> transaction aborts, only the original c=1 row remains live.  If you replace
>> the autonomous transaction with a dblink/pg_background call, the c=3 update
>> waits indefinitely for c=2 to commit or abort, an undetected deadlock.
>
> My starting expectation is that the semantics of an autonomous transaction
> will be exactly those of dblink/pg_background.  (I said that during the
> unconference session.)  The application would need to read data from tables
> before switching to the autonomous section.  Autonomous transactions are then
> a performance and syntactic help, not a source of new semantics.  Does any
> database have autonomous transactions that do otherwise?

Oracle behaves like that, i.e. it deadlocks with your example:

SQL> SELECT * FROM t;

         C
----------
         1

SQL> CREATE PROCEDURE proc2 IS
  2  PRAGMA AUTONOMOUS_TRANSACTION;
  3  BEGIN
  4  UPDATE t SET c = 3 WHERE c = 1;
  5  UPDATE t SET c = 4 WHERE c = 2;
  6  COMMIT;
  7  END;
  8  /

Procedure created.

SQL> CREATE PROCEDURE proc1 IS
  2  BEGIN
  3  UPDATE t SET c = 2 WHERE c = 1;
  4  proc2;
  5  ROLLBACK;
  6  END;
  7  /

Procedure created.

SQL> CALL proc1();
CALL proc1()
     *
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
ORA-06512: at "LAURENZ.PROC2", line 4
ORA-06512: at "LAURENZ.PROC1", line 4

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [NOVICE] psql readline Tab insert tab

2015-06-02 Thread Albe Laurenz
Peter Eisentraut wrote:
> On 6/1/15 7:00 AM, Albe Laurenz wrote:
>> Hans Ginzel wrote:
>>>>> how to make psql (readline) to insert tab when Tab is pressed? E.g. for
>>>>> pasting. I know, there is -n option. But then the history is not
>>>>> accessible.
>>>>
>>>> It could be done by adding the following lines to your ~/.inputrc file:
>>>>
>>>> $if Psql
>>>> TAB: tab-insert
>>>> $endif
>>>
>>> Great! Thank you very much. Could this be added as note to the -n option
>>> of the page http://www.postgresql.org/docs/current/static/app-psql.html,
>>> please?
>>
>> Would that be worth an addition to the documentation?
>
> There is already a section about this on the psql man page.  (Look for
> disable-completion.)

Right, I didn't notice that.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [NOVICE] psql readline Tab insert tab

2015-06-01 Thread Albe Laurenz
Hans Ginzel wrote:
>>> how to make psql (readline) to insert tab when Tab is pressed? E.g. for
>>> pasting. I know, there is -n option. But then the history is not
>>> accessible.
>>
>> It could be done by adding the following lines to your ~/.inputrc file:
>>
>> $if Psql
>> TAB: tab-insert
>> $endif
>
> Great! Thank you very much. Could this be added as note to the -n option
> of the page http://www.postgresql.org/docs/current/static/app-psql.html,
> please?

Would that be worth an addition to the documentation?

Yours,
Laurenz Albe


0001-Document-how-to-disable-tab-completion-with-readline.patch
Description: 0001-Document-how-to-disable-tab-completion-with-readline.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.5 release notes may need ON CONFLICT DO NOTHING compatibility notice for FDW authors

2015-05-26 Thread Albe Laurenz
Peter Geoghegan wrote:
> In any case, third party foreign data wrappers that target other
> database systems will totally ignore ON CONFLICT DO NOTHING when built
> against the master branch (unless they consider these questions). They
> should perhaps make a point of rejecting DO NOTHING outright where it
> makes sense that support could exist, but it just doesn't. Or they
> could just add support (I imagine that this would be very easy for
> mysql_fdw, for example -- MySQL has INSERT IGNORE). I feel a
> compatibility item in the release notes is in order so the question is
> considered, but there seems to be no place to do that on the Wiki, and
> the original commit message does not have a note like this.

+1

I wouldn't have become aware of that if I hadn't read your message.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rounding to even for numeric data type

2015-03-30 Thread Albe Laurenz
Michael Paquier wrote:
> Well, I am not sure about that... But reading this thread changing the
> default rounding sounds unwelcome. So it may be better to just put in
> words the rounding method used now in the docs, with perhaps a mention
> that this is not completely in-line with the SQL spec if that's not
> the case.

The SQL standard does not care, it says that numbers and other data types
should, whenever necessary, be rounded or truncated in an implementation-
defined fashion.

I cannot find any mention of a round() function.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] MD5 authentication needs help

2015-03-06 Thread Albe Laurenz
Stephen Frost wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Bruce Momjian <br...@momjian.us> writes:
>>> Let me update my list of possible improvements:
>>>
>>> 1)  MD5 makes users feel uneasy (though our usage is mostly safe)
>>>
>>> 2)  The per-session salt sent to the client is only 32-bits, meaning
>>> that it is possible to replay an observed MD5 hash in ~16k connection
>>> attempts.
>>>
>>> 3)  Using the user name for the MD5 storage salt allows the MD5 stored
>>> hash to be used on a different cluster if the user used the same
>>> password.
>>>
>>> 4)  Using the user name for the MD5 storage salt allows the MD5 stored
>>> hash to be used on the _same_ cluster.
>>>
>>> 5)  Using the user name for the MD5 storage salt causes the renaming of
>>> a user to break the stored password.
>>
>> What happened to "possession of the contents of pg_authid is sufficient
>> to log in"?  I thought fixing that was one of the objectives here.
>
> Yes, it certainly was.  I think Bruce was thinking that we could simply
> hash what goes on to disk with an additional salt that's stored, but
> that wouldn't actually work without requiring a change to the wireline
> protocol, which is the basis of this entire line of discussion, in my
> view.

This article
https://hashcat.net/misc/postgres-pth/postgres-pth.pdf
has some ideas about how to improve the situation.
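
To illustrate points 3 and 4: the stored hash is computed only from values
that are the same on any cluster where the user keeps the same password:

    SELECT 'md5' || md5('s3cret' || 'alice');
    -- this is exactly what pg_authid.rolpassword contains for user
    -- alice with password s3cret, on every cluster, so the hash can
    -- be replayed elsewhere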

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2015-03-03 Thread Albe Laurenz
Etsuro Fujita wrote:
> While updating the patch, I noticed that in the previous patch, there is
> a bug in pushing down parameterized UPDATE/DELETE queries; generic plans
> for such queries fail with a can't-happen error.  I fixed the bug and
> tried to add the regression tests that execute the generic plans, but I
> couldn't because I can't figure out how to force generic plans.  Is
> there any way to do that?

I don't know about a way to force it, but if you run the statement six
times, it will probably switch to a generic plan.
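
Something like this should do the trick at the SQL level (table name made up):

    PREPARE u(integer) AS UPDATE ft SET val = NULL WHERE id = $1;

    EXECUTE u(1);  -- the first five executions use custom plans
    EXECUTE u(2);
    EXECUTE u(3);
    EXECUTE u(4);
    EXECUTE u(5);
    EXECUTE u(6);  -- from about here on, the generic plan should be used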

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSL renegotiation

2015-02-23 Thread Albe Laurenz
Florian Weimer wrote:
> On 02/22/2015 02:05 PM, Andres Freund wrote:
>> On 2015-02-22 01:27:54 +0100, Emil Lenngren wrote:
>>> I honestly wonder why postgres uses renegotiation at all. The motivation
>>> that cryptanalysis is easier as more data is sent seems quite
>>> far-fetched.

>> I don't think so. There's a fair number of algorithms that can/could be
>> attacked much more easily with lots of data available. Especially if you
>> can guess/know/control some of the data.  Additionally, renegotiating
>> regularly helps to constrain a possible key leakage to a certain amount
>> of time. With backend connections often being alive for weeks at a time,
>> that's not a bad thing.

> Renegotiation will be removed from future TLS versions because it is
> considered unnecessary with modern ciphers:

>   https://github.com/tlswg/tls13-spec/issues/38

> If ciphers require rekeying, that mechanism will be provided at the TLS
> layer in the future.

> I think you could remove renegotiation from PostgreSQL as long as you
> offer something better than RC4 in the TLS handshake.

I'd say it is best to wait and see if and how OpenSSL changes its API when
it implements TLS 1.3.

I'd vote against removing renegotiation.  At the very least, if the feature
should prove unnecessary and cumbersome with future versions of OpenSSL,
we should retain ssl_renegotiation_limit and change the default to 0.
It might still be of value with older versions.
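
Keeping the parameter would leave users both options, for example:

    SET ssl_renegotiation_limit = 0;        -- the proposed new default: off
    SET ssl_renegotiation_limit = '512MB';  -- the historical default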

If changing the encryption key is so useless, why did the TLS working group
decide to introduce rekeying as a substitute for renegotiation?

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POLA violation with \c service=

2014-12-17 Thread Albe Laurenz
David Fetter wrote:
> I've noticed that psql's \c function handles service= requests in a
> way that I can only characterize as broken.

> This came up in the context of connecting to a cloud hosting service
> named after warriors or a river or something, whose default hostnames
> are long, confusing, and easy to typo, so I suspect that service= may
> come up more often going forward than it has until now.

> For example, when I try to use

> \c service=foo

> It will correctly figure out which database I'm trying to connect to,
> but fail to notice that it's on a different host, port, etc., and
> hence fail to connect with a somewhat unhelpful error message.

> I can think of a few approaches for fixing this:

> 0.  Leave it broken.
> 1.  Disable service= requests entirely in \c context, and error out if
>     attempted.
> 2.  Ensure that \c actually uses all of the available information.

> Is there another one I missed?

> If not, which of the approaches seems reasonable?

#2 is the correct solution, #1 a band-aid.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Using RTLD_DEEPBIND to handle symbol conflicts in loaded libraries

2014-11-26 Thread Albe Laurenz
Ants Aasma wrote:
> I had to make oracle_fdw work with PostgreSQL compiled using
> --with-ldap. The issue there is that Oracle's client library has the
> delightful property of linking against an ldap library they bundle that
> has symbol conflicts with OpenLDAP. At PostgreSQL startup libldap is
> loaded, so when libclntsh.so (the Oracle client) is loaded it gets
> bound to OpenLDAP symbols, and unsurprisingly crashes with a segfault
> when those functions get used.

> glibc 2.3.4+ has a flag called RTLD_DEEPBIND for dlopen that prefers
> symbols loaded by the library to those provided by the caller. Using
> this flag fixes my issue: PostgreSQL gets the ldap functions from
> libldap, the Oracle client gets them from whatever it links to. Both
> work fine.

I am aware of the problem, but this solution is new to me.
My workaround so far has been to load OpenLDAP with the LD_PRELOAD
environment variable on PostgreSQL start.
But then you get a crash when Oracle uses LDAP functionality (directory naming).

> Attached is a patch that enables this flag on Linux when available.
> This specific case could also be fixed by rewriting oracle_fdw to use
> dlopen for libclntsh.so and pass this flag, but I think it would be
> better to enable it for all PostgreSQL loaded extension modules.

I'll consider changing oracle_fdw in that fashion, even if that will
only remedy the problem on Linux.
I think that this patch is a good idea though.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Functions used in index definitions shouldn't be changed

2014-11-21 Thread Albe Laurenz
Robert Haas wrote:
> On Thu, Nov 20, 2014 at 1:56 PM, Albe Laurenz laurenz.a...@wien.gv.at wrote:
>> I don't think that there is a universally compelling right or wrong to
>> questions like this, it is more a matter of taste.  Is it more important
>> to protect the casual DBA from hurting himself or herself, or is it more
>> important to provide a well honed scalpel for the experienced surgeon?

> +1.

> I think if we had an already-existing prohibition here and you
> proposed relaxing it, the howls would be equally loud.  We're not
> entirely consistent about how picky we are.

There is also the possibility to add syntax like this:

CREATE OR REPLACE [FORCE] FUNCTION ...

What do you think about that?  It would protect the casual user but allow
the expert to do it anyway.

Another thing I thought about is changing function volatility:
If you change the volatility of a function used in an index to anything
other than IMMUTABLE, your database will continue to work as expected, but
a dump will fail to restore with
ERROR:  functions in index expression must be marked IMMUTABLE

Is that something worth checking for?
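
A minimal example of the problem:

    CREATE FUNCTION f(text) RETURNS text
        LANGUAGE sql IMMUTABLE AS 'SELECT lower($1)';
    CREATE TABLE t (x text);
    CREATE INDEX t_f_idx ON t (f(x));

    ALTER FUNCTION f(text) STABLE;  -- accepted without complaint

    -- pg_dump succeeds, but restoring the dump stops with
    -- ERROR:  functions in index expression must be marked IMMUTABLE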

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Functions used in index definitions shouldn't be changed

2014-11-20 Thread Albe Laurenz
Tom Lane wrote:
> Antonin Houska a...@cybertec.at writes:
>> Albe Laurenz laurenz.a...@wien.gv.at wrote:
>>> Currently it is possible to change the behaviour of a function with
>>> CREATE OR REPLACE FUNCTION even if the function is part of an index
>>> definition.

>>> I think that should be forbidden, because it is very likely to corrupt
>>> the index.  I expect the objection that this would break valid use cases
>>> where people know exactly what they are doing, but I believe that this
>>> is a footgun for inexperienced users that should be disarmed.

>>> I'd also opt for forbidding behaviour changing modifications with ALTER
>>> FUNCTION for functions used in index definitions, specifically altering
>>> strictness.

>>> Attached is a patch implementing a fix.

> A comparable issue is that alteration of text search configurations may
> partially invalidate precomputed tsvector columns and indexes based on
> tsvector data.  We discussed whether we should prohibit that somehow
> and explicitly decided that it would be a bad idea.  In the text search
> case, changes like adding stop words are common and don't meaningfully
> invalidate indexes.  In some cases you may decide that you need to
> recompute the tsvector data, but that decision should be left to the
> user.

> I think that pretty much the same thing applies to functions used in
> indexes.  There are too many use-cases where alterations don't
> meaningfully impact the stored data for us to get away with a blanket
> prohibition.  (I'm pretty sure that this has been discussed in the
> past, too.  If Albe really wants to push the point he should look
> up the previous discussions and explain why the decision was wrong.)

I don't think that there is a universally compelling right or wrong to
questions like this, it is more a matter of taste.  Is it more important
to protect the casual DBA from hurting himself or herself, or is it more
important to provide a well honed scalpel for the experienced surgeon?

It seems that more people lean in the latter direction, so I'll cease
and desist.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Unlogged tables can vanish after a crash

2014-11-19 Thread Albe Laurenz
I observed an interesting (and I think buggy) behaviour today after one of
our clusters crashed due to an out of space condition in the data directory.

Five databases in that cluster each have one unlogged table.

The log reads as follows:

PANIC:  could not write to file "pg_xlog/xlogtemp.1820": No space left on device
...
LOG:  terminating any other active server processes
...
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted; last known up at 2014-11-18 18:04:28 CET
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at C9/50403B20
LOG:  redo done at C9/5A98
LOG:  checkpoint starting: end-of-recovery immediate
LOG:  checkpoint complete: ...
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections
...
PANIC:  could not write to file "pg_xlog/xlogtemp.4417": No space left on device
...
LOG:  terminating any other active server processes
...
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted; last known up at 2014-11-18 18:04:38 CET
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  redo starts at C9/5B70
LOG:  redo done at C9/5FFFE4E0
LOG:  checkpoint starting: end-of-recovery immediate
LOG:  checkpoint complete: ...
FATAL:  could not write to file "pg_xlog/xlogtemp.4442": No space left on device
LOG:  startup process (PID 4442) exited with exit code 1
LOG:  aborting startup due to startup process failure

After the problem was removed, the cluster was restarted.
The log reads as follows:

LOG:  ending log output to stderr
HINT:  Future log output will go to log destination "csvlog".
LOG:  database system was shut down at 2014-11-18 18:05:03 CET
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections


So no crash recovery was performed, probably because the startup process
failed *after* it had completed the end-of-recovery checkpoint.

Now the main fork files for all five unlogged tables are gone; the init
fork files are still there.

Obviously the main forks got nuked during recovery, but the startup process
died before it could recreate them:

/*
 * Preallocate additional log files, if wanted.
 */
PreallocXlogFiles(EndOfLog);

/*
 * Reset initial contents of unlogged relations.  This has to be done
 * AFTER recovery is complete so that any unlogged relations created
 * during recovery also get picked up.
 */
if (InRecovery)
ResetUnloggedRelations(UNLOGGED_RELATION_INIT);

It seems to me that the right fix would be to recreate the unlogged
relations *before* the checkpoint.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Unlogged tables can vanish after a crash

2014-11-19 Thread Albe Laurenz
Andres Freund wrote:
> On 2014-11-19 11:26:56 +, Albe Laurenz wrote:
>> I observed an interesting (and I think buggy) behaviour today after one of
>> our clusters crashed due to an out of space condition in the data
>> directory.

> Hah, just a couple of days ago I pushed a fix for that ;)

> http://archives.postgresql.org/message-id/20140912112246.GA4984%40alap3.anarazel.de
> and
> http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d3586fc8aa5d9365a5c50cb5e555971eb633a4ec

Thanks, I didn't see that.
PostgreSQL, the database system where your bugs get fixed before you
report them!

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Functions used in index definitions shouldn't be changed

2014-11-19 Thread Albe Laurenz
Currently it is possible to change the behaviour of a function with
CREATE OR REPLACE FUNCTION even if the function is part of an index definition.

I think that should be forbidden, because it is very likely to corrupt
the index.  I expect the objection that this would break valid use cases
where people know exactly what they are doing, but I believe that this
is a footgun for inexperienced users that should be disarmed.
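
A minimal demonstration of the footgun:

    CREATE FUNCTION f(integer) RETURNS integer
        LANGUAGE plpgsql IMMUTABLE AS 'BEGIN RETURN $1 % 10; END;';
    CREATE TABLE t (i integer);
    INSERT INTO t SELECT * FROM generate_series(1, 10000);
    CREATE INDEX t_f_idx ON t (f(i));

    CREATE OR REPLACE FUNCTION f(integer) RETURNS integer
        LANGUAGE plpgsql IMMUTABLE AS 'BEGIN RETURN $1 % 7; END;';
    -- silently accepted; index scans of "WHERE f(i) = ..." now search
    -- entries built with the old definition, so they can return wrong
    -- results until REINDEX is run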

I'd also opt for forbidding behaviour changing modifications with ALTER FUNCTION
for functions used in index definitions, specifically altering strictness.

Attached is a patch implementing a fix.

Yours,
Laurenz Albe


alter_index_function.patch
Description: alter_index_function.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] plpgsql plan changes causing failure after repeated invocation

2014-11-11 Thread Albe Laurenz
Merlin Moncure wrote:
> I chased down a problem today where users were reporting sporadic
> failures in the application.  Turns out, the function would work
> exactly 5 times and then fail; this is on 9.2.  I think I understand
> why this is happening and I'm skeptical it's a bug in postgres, but I
> thought I'd socialize it.

> What's happening here is a query structured like this, somewhat deep
> into a pl/pgsql function:

[...]
>   (_plpgsql_var = 'yyy' and q.data::int = foo.foo_id)
[...]

> What is happening, along with some triggers I don't completely
> understand (this problem started hitting when I made an unrelated
> change in the function), is that the cast (q.data::int) started to
> fail.  In cases where _plpgsql_var is not 'yyy', the cast was getting
> applied where previously it did not.

> The workaround was simple: insert a case statement so that q.data::int
> becomes CASE WHEN _plpgsql_var = 'yyy' THEN q.data::int ELSE NULL END.
> That being said, it does bring up some interesting points.

> *) Relying on A being checked first in 'A OR B' is obviously not
> trustworthy, and it shouldn't be.  Generally I assume the planner will
> do the cheaper of the two first (along with some extra encouragement
> to put it on the left side), but this can't be relied upon.

> *) It's possible to write queries so that they will fail depending on
> plan choice.  This is not good, and should be avoided when possible
> (the query isn't great, I'll admit), but the interaction with execution
> count is a little unpleasant.

This must be the custom plan feature added in 9.2 switching over to
a generic plan after 5 executions.

But you are right, it is not nice.
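
For the archives, here is a self-contained version of the failure mode
and the workaround (names made up):

    CREATE TABLE q (tag text, data text);
    INSERT INTO q VALUES ('yyy', '42'), ('zzz', 'not a number');

    -- unsafe: the executor is free to apply the cast to every row
    SELECT * FROM q WHERE tag = 'yyy' AND data::int = 42;

    -- safe: CASE guarantees that the cast only happens where wanted
    SELECT * FROM q
    WHERE CASE WHEN tag = 'yyy' THEN data::int END = 42;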

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Let's drop two obsolete features which are bear-traps for novices

2014-11-04 Thread Albe Laurenz
David Fetter wrote:
> On Tue, Nov 04, 2014 at 07:51:06AM +0900, Tatsuo Ishii wrote:
>> Just out of curiosity, why is Oracle's NUMBER (I assume you are
>> talking about this) so fast?

> I suspect that what happens is that NUMBER is stored as a native type
> (int2, int4, int8, int16) that depends on its size and then cast to
> the next upward thing as needed, taking any performance hits at that
> point.  The documentation hints (38 decimal places) at a 128-bit
> internal representation as the maximum.  I don't know what happens
> when you get past what 128 bits can represent.

No, Oracle stores NUMBERs as a variable-length field (up to 22 bytes),
where the first byte encodes the sign and the position of the decimal
point and the remaining bytes encode the digits, each byte representing
two digits in base-100 notation (see Oracle Metalink note 1007641.6).

So it's not so different from PostgreSQL.
No idea why their arithmetic should be faster.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL Service Name Enhancement - Wildcard support for LDAP/DNS lookup

2014-10-29 Thread Albe Laurenz
I have suggested a similar feature before and met with little enthusiasm:
http://www.postgresql.org/message-id/d960cb61b694cf459dcfb4b0128514c2f34...@exadv11.host.magwien.gv.at

I still think it would be a nice feature and would make pg_service.conf
more useful than it is now.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: multivariate statistics / proof of concept

2014-10-13 Thread Albe Laurenz
Tomas Vondra wrote:
> attached is a WIP patch implementing multivariate statistics.

I think that is pretty useful.
Oracle has an identical feature called extended statistics.
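
A simple example of what the planner currently gets wrong (a functional
dependency between two columns of one table):

    CREATE TABLE addr (city text, zip text);
    INSERT INTO addr
    SELECT 'city ' || i % 100, 'zip ' || i % 100
    FROM generate_series(1, 100000) AS i;
    ANALYZE addr;

    -- each condition alone matches 1 % of the rows, so the planner
    -- estimates 0.01 * 0.01 * 100000 = 10 rows, while the true number
    -- is 1000
    EXPLAIN SELECT * FROM addr WHERE city = 'city 7' AND zip = 'zip 7';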

That's probably an entirely different thing, but it would be very
nice to have statistics to estimate the correlation between columns
of different tables, to improve the estimate for the number of rows
in a join.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2014-09-12 Thread Albe Laurenz
Tom Lane wrote:
> Stephen Frost sfr...@snowman.net writes:
>> I have to admit that, while I applaud the effort made to have this
>> change only be to postgres_fdw, I'm not sure that having the
>> update/delete happening during the Scan phase and then essentially
>> no-op'ing the ExecForeignUpdate/ExecForeignDelete is entirely in-line
>> with the defined API.

> Yeah, I've started looking at this patch and that seemed like not
> necessarily such a wise choice.  I think it'd be better if the generated
> plan tree had a different structure in this case.  The existing approach
> is an impressive hack but it's hard to call it anything but a hack.

I guess that the idea is inspired by this comment in postgres_fdw.c:

 * Note: currently, the plan tree generated for UPDATE/DELETE will always
 * include a ForeignScan that retrieves ctids (using SELECT FOR UPDATE)
 * and then the ModifyTable node will have to execute individual remote
 * UPDATE/DELETE commands.  If there are no local conditions or joins
 * needed, it'd be better to let the scan node do UPDATE/DELETE RETURNING
 * and then do nothing at ModifyTable.  Room for future optimization ...

> I'm not sure offhand what the new plan tree ought to look like.  We could
> just generate a ForeignScan node, but that seems like rather a misnomer.
> Is it worth inventing a new ForeignUpdate node type?  Or maybe it'd still
> be ForeignScan but with a flag field saying "hey, this is really an update
> (or a delete)".  The main benefit I can think of right now is that the
> EXPLAIN output would be less strange-looking --- but EXPLAIN is hardly
> the only thing that ever looks at plan trees, so having an outright
> misleading plan structure is likely to burn us down the line.

I can understand these qualms.
I wonder if ForeignUpdate is such a good name though, since it would
surprise the uninitiated that in the regular (no push-down) case the
actual modification is *not* performed by ForeignUpdate.
So it should rather be a ForeignModifyingScan, but I personally would
prefer a has_side_effects flag on ForeignScan.

> One advantage of getting the core code involved in the decision about
> whether an update/delete can be pushed down is that FDW-independent checks
> like whether there are relevant triggers could be implemented in the core
> code, rather than having to duplicate them (and later maintain them) in
> every FDW that wants to do this.  OTOH, maybe the trigger issue is really
> the only thing that could be shared, not sure.  Stuff like "is there a
> LIMIT" probably has to be in the FDW since some FDWs could support pushing
> down LIMIT and others not.

You are right, the gain would probably be limited.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2014-09-09 Thread Albe Laurenz
Etsuro Fujita wrote:
> I agree with you on that point.  So, I've updated the patch to have the
> explicit flag, as you proposed.  Attached is the updated version of the
> patch.  In this version, I've also revised code and its comments a bit.

Thank you, I have set the patch to "Ready for Committer".

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2014-09-08 Thread Albe Laurenz
I wrote:
> I gave it a spin and could not find any undesirable behaviour, and the
> output of EXPLAIN ANALYZE looks like I'd expect.

> I noticed that you use the list length of fdw_private to check if
> the UPDATE or DELETE is pushed down to the remote server or not.

> While this works fine, I wonder if it wouldn't be better to have some
> explicit flag in fdw_private for that purpose.  Future modifications that
> change the list length might easily overlook that it is used for this
> purpose, thereby breaking the code.

> Other than that it looks alright to me.

Maybe I should have mentioned that I have set the patch to "Waiting for Author"
because I'd like to hear your opinion on that, but I'm prepared to set it
to "Ready for Committer" soon.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2014-09-02 Thread Albe Laurenz
Etsuro Fujita wrote:
> Please find attached the updated version of the patch.

I gave it a spin and could not find any undesirable behaviour, and the
output of EXPLAIN ANALYZE looks like I'd expect.

I noticed that you use the list length of fdw_private to check if
the UPDATE or DELETE is pushed down to the remote server or not.

While this works fine, I wonder if it wouldn't be better to have some
explicit flag in fdw_private for that purpose.  Future modifications that
change the list length might easily overlook that it is used for this
purpose, thereby breaking the code.

Other than that it looks alright to me.

Yours,
Laurenz Albe 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2014-08-25 Thread Albe Laurenz
Etsuro Fujita wrote:
 Done.  (I've left deparseDirectUpdateSql/deparseDirectDeleteSql as-is,
 though.)
 
 Other changes:
 
 * Address the comments from Eitoku-san.
 * Add regression tests.
 * Fix a bug, which fails to show the actual row counts in EXPLAIN
 ANALYZE for UPDATE/DELETE without a RETURNING clause.
 * Rebase to HEAD.
 
 Please find attached an updated version of the patch.

Here is my review:

The patch Applies fine, Builds without warning and passes make Check,
so the ABC of patch reviewing is fine.

I played with it, and apart from Hanada's comments I have found the following:

test=> EXPLAIN (ANALYZE, VERBOSE) UPDATE rtest SET val=NULL WHERE id < 3;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Update on laurenz.rtest  (cost=100.00..14134.40 rows=299970 width=10) (actual time=0.005..0.005 rows=0 loops=1)
   ->  Foreign Scan on laurenz.rtest  (cost=100.00..14134.40 rows=299970 width=10) (actual time=0.002..0.002 rows=27 loops=1)
         Output: id, val, ctid
         Remote SQL: UPDATE laurenz.test SET val = NULL::text WHERE ((id < 3))
 Planning time: 0.179 ms
 Execution time: 3706.919 ms
(6 rows)

Time: 3708.272 ms

The actual time readings are surprising.
Shouldn't these be similar to the actual execution time, since most of the
time is spent in the foreign scan node?

Reading the code, I noticed that the pushed-down UPDATE or DELETE statement
is executed during postgresBeginForeignScan rather than during
postgresIterateForeignScan.
It probably does not matter, but is there a reason to do it differently
from the normal scan?

It is not expected that postgresReScanForeignScan is called when the
UPDATE/DELETE is pushed down, right?  Maybe it would make sense to add an
assertion for that.

I ran a simple performance test and found that performance is improved as
expected; updating 10 rows took 1000 rather than 8000 ms, and deleting the
same amount took 200 instead of 6500 ms.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to manage shared library lifetime through C functions

2014-08-04 Thread Albe Laurenz
Seref Arikan wrote:
> I hope this is the right group to ask this question; apologies if this
> should go to the general or some other list.

> I have multiple shared libraries that can be called from C that I'd like
> to use from a C based PostgreSQL function.

> These libraries perform some expensive initialization and they require
> the C code to properly release resources when the library is no longer
> needed.

> This means that I need a mechanism to keep a session level pointer to a
> library, initialize it when it is called first from a C based function
> and dispose of the library properly when the session ends (and when it is
> terminated due to a problem).  I would like to keep the libraries
> available as long as the session is alive, so multiple calls are supposed
> to avoid initialization/disposal costs every time.

> I could probably use a temp table as a container for the initialization
> and even pointer values (sounds dirty), but I have no idea how to hook
> into session end to clean up when the session ends.

> What would be a good strategy here?

You could register a callback at process exit with on_proc_exit() from 
storage/ipc.h.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] How to manage shared library lifetime through C functions

2014-08-04 Thread Albe Laurenz
Craig Ringer wrote:
 On 08/04/2014 06:31 PM, Seref Arikan wrote:
 Thanks a lot Heikki and Albe. Exactly what I was asking for.
 Heikki: the libraries are written in languages that have their own
 runtime and their documentation insists that both init and dispose calls
 are performed when used from C. PG_init() and proc_exit sounds spot on.
 
 That's usually with reference to code that might then load them again,
 such as another library. Without specifics it's hard to say.
 
 If these tools depend on the C code terminating cleanly to operate
 correctly then they'll also break if the OS crashes, the system loses
 power, etc.
 
 Exiting a program without calling cleanup is in many ways just a
 particularly friendly kind of crash, and it's often actually much more
 efficient than doing cleanup. Why neatly tidy the floor when you're
 about to demolish the house? Total waste of time.

There are valid use cases (else the function probably wouldn't exist).
I use it in oracle_fdw to gracefully close any open Oracle connections when
the process exits.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [GENERAL] pg_dump behaves differently for different archive formats

2014-07-28 Thread Albe Laurenz
Tom Lane wrote on Dec 16, 2013:
> Albe Laurenz laurenz.a...@wien.gv.at writes:
>> Restoring a plain format dump and a custom format dump of
>> the same database can lead to different results:

>> pg_dump organizes the SQL statements it creates in TOC entries.
>> If a custom format dump is restored with pg_restore, all
>> SQL statements in a TOC entry will be executed as a single command
>> and thus in a single transaction.

> Yeah, this is a bug I think.  pg_dump was designed around the idea
> that the output would be executed as a simple script, and in a
> number of places there's an expectation that one SQL statement
> can fail without affecting following ones.  So if pg_restore can't
> provide that behavior it's not good.

> On the other hand, I'm not sure how much enthusiasm there'd be for
> complex or fragile changes to fix this.  A lot of people invariably
> run restores in single-transaction mode and don't really care about
> fault-tolerant restores.  Also, it's easy enough to dodge the problem
> if you must: just pipe the output into psql rather than
> direct-to-database.

> So to me the question is can we fix this without doing something like
> duplicating psql's lexer?  If we have to parse out the statements
> contained in each text blob, it's probably going to be too painful.
> Some cautionary history about this sort of thing can be read at
> http://www.postgresql.org/message-id/flat/18006.1325700...@sss.pgh.pa.us

I thought that changing the dump format for this would be too
much trouble, so I came up with the attached.

It assumes that custom- or tar-format archives are written by pg_dump
and cannot contain arbitrary SQL statements, which allows me to get away
with very simple parsing.

If this is not shot down immediately on account of fragility, I'd
add it to the next commitfest page.

The problem has been a pain point for my co-workers in the past;
using single-transaction mode doesn't work for us, since we have custom objects
in our template database that cause expected errors when a dump is restored.

Yours,
Laurenz Albe


pg_restore.patch
Description: pg_restore.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2014-07-25 Thread Albe Laurenz
Shigeru Hanada wrote:
 * Naming of new behavior
 You named this optimization Direct Update, but I'm not sure that
 this is intuitive enough to express this behavior.  I would like to
 hear opinions of native speakers.

How about "batch foreign update" or "batch foreign modification"?
(Disclaimer: I'm not a native speaker either.)

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] IMPORT FOREIGN SCHEMA statement

2014-07-01 Thread Albe Laurenz
Michael Paquier wrote:
> After sleeping on it, I have put my hands on the postgres_fdw portion and
> came up with a largely simplified flow, resulting in the patch attached.

[...]

> Ronan, what do you think of those patches? I have nothing more to add, and
> I think that they should be looked at by a committer. Particularly the FDW
> API is perhaps not the best fit, but let's see some extra opinions about
> that.

I looked at the API and its documentation, and while I saw no problem with
the API, I think that the documentation still needs some attention:

It mentions a local_schema, which doesn't exist (any more?).
It should be mentioned that the CreateForeignTableStmt's
base.relation->schemaname should be set to NULL.
Also, it would be nice to find a few words for the options,
maybe explaining a potential application.

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] IMPORT FOREIGN SCHEMA statement

2014-05-26 Thread Albe Laurenz
Ronan Dunklau wrote:
 Since my last proposal didn't get any strong rebuttal, please find attached a
 more complete version of the IMPORT FOREIGN SCHEMA statement.
 
 I tried to follow the SQL-MED specification as closely as possible.
 
 This adds discoverability to foreign servers. The structure of the statement
 as I understand it is simple enough:
 
 IMPORT FOREIGN SCHEMA remote_schema FROM SERVER some_server [ (LIMIT TO |
 EXCEPT) table_list ] INTO local_schema.
 
 The import_foreign_schema patch adds the infrastructure, and a new FDW
 routine:
 
 typedef List *(*ImportForeignSchema_function) (ForeignServer *server,
 ImportForeignSchemaStmt * parsetree);
 
 This routine must return a list of CreateForeignTableStmt mirroring whatever
 tables were found on the remote side, which will then be executed.

In addition to data type mapping questions (which David already raised)
I have one problem when I think of the Oracle FDW:

Oracle follows the SQL standard in folding table names to upper case.
So this would normally result in a lot of PostgreSQL foreign tables
with upper case names, which would be odd and unpleasant.

I cannot see a way out of that, but I thought I should mention it.
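
Just to illustrate what an import from Oracle would naturally produce
(server name and options are made up):

    CREATE FOREIGN TABLE "PERSONS" ("ID" numeric, "NAME" text)
        SERVER oradb OPTIONS (table 'PERSONS');

    SELECT "NAME" FROM "PERSONS" WHERE "ID" = 42;

Users would have to double-quote the names in every query.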

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing empty target list in SELECT (1b4f7f93b4693858cb983af3cd557f6097dab67b)

2014-05-02 Thread Albe Laurenz
Amit Langote wrote:
 Is the following behavior perceived fix-worthy?


 -- note the '1's in the outputs

 postgres=# CREATE TABLE test AS SELECT;
 SELECT 1

 postgres=# insert into test select;
 INSERT 0 1

 Or maybe, it just means 1 'null' row/record and not no row at all?

Right, I'd say you end up with a table with two 0-tuples.

Maybe odd, but it shouldn't be a problem.
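
You can see that easily:

    CREATE TABLE test AS SELECT;
    INSERT INTO test SELECT;
    SELECT count(*) FROM test;  -- returns 2
    SELECT * FROM test;         -- two rows with no columns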

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 9.4 Proposal: Initdb creates a single table

2014-04-23 Thread Albe Laurenz
Simon Riggs wrote:
> I propose we add a single table called Postgres when we Initdb

>   CREATE TABLE Postgres (Id Integer, Data Jsonb);
>   COMMENT ON TABLE Postgres IS 'Single table for quick start usage -
> design your database';

> The purpose of this is to make the database immediately usable. By
> including this table in the default initdb it will mean that programs
> can rely on the existence of this table and begin working quicker.

> By now, some of you will be doubled over laughing as if this is an
> April fool joke.  I don't mean it to be at all.

> The idea is to have a stupidly obvious and easy table that will
> potentially be usable by just about everyone, in any language.

I am a PostgreSQL newbie.
How is this table useful for me?
I want to develop a database application.
I want to store personal data like name and birth date!

Actually, I feel confused.  What should I do with this table?
Is it part of the database system?  Will the database be broken if
I drop it?  Do I have to ship it with my application?

> If you don't like it, don't use it. If you really dislike it, drop it.

No, I'm not the kind of person who reads a manual to figure out what to
do with this table.  I want to start coding *right now*.

> But for new people coming to Postgres, they will have a data object to
> access and begin using the database immediately. Their code will work,
> their examples will work. OK, so they need to go back and think about
> the design, but at least they got it to work and will be encouraged to
> do more.

I have found a sample application for personal data on the internet.
How can I make it work with this table?

> Remember when we didn't have a database called Postgres? Remember how
> much simpler life is now? Remember that now.

Good that you mention that!  I have wondered what to do with it.
When I first connected to PostgreSQL, I created a sample table, but the
senior developer from the other office told me that this is the postgres
database and that I shouldn't create any objects there.

What is it good for?  Can I delete it?

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

