date:20140403

On 2014-04-03 00:48:12 -0400, Greg Stark wrote:
 On Wed, Apr 2, 2014 at 4:16 PM, Andres Freund and...@2ndquadrant.comwrote:
  PS: Could you please start to properly quote again? You seem to have
  stopped doing that entirely in the last few months.

 I've been responding a lot from the phone. Unfortunately the Gmail client
 on the phone makes it nearly impossible to format messages well. I'm
 beginning to think it would be better to just not quote at all any more.
 I'm normally not doing a point-by-point response anyways.

I really don't care where you're answering from TBH. It's unreadable,
misses context and that's it. If $device doesn't work for you, don't use
it.
I don't mind an occasional quick answer that's badly formatted, but for
other things it's really annoying.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC proposal - make an unlogged table logged


On 04/01/2014 08:58 PM, Andres Freund wrote:

On 2014-04-01 12:56:04 -0500, Jim Nasby wrote:

On 3/4/14, 8:50 AM, Andres Freund wrote:

Can't that be solved by just creating the permanent relation in a new
relfilenode? That's equivalent to a rewrite, yes, but we need to do that
for anything but wal_level=minimal anyway.


Maybe I'm missing something, but doesn't this actually involve writing the data 
twice? Once into WAL and again into the relation itself?


Yes. But as I said, that's unavoidable for anything but
wal_level=minimal.


Ideally, you would *only* write the data to WAL, when you do ALTER TABLE 
... SET LOGGED. There's no fundamental reason you need to rewrite the 
heap, too. I understand that it might be difficult to do, because of the 
way the system catalogs work, but it's worthy goal.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC proposal - make an unlogged table logged

On 2014-04-03 13:38:29 +0300, Heikki Linnakangas wrote:
 On 04/01/2014 08:58 PM, Andres Freund wrote:
 On 2014-04-01 12:56:04 -0500, Jim Nasby wrote:
 On 3/4/14, 8:50 AM, Andres Freund wrote:
 Can't that be solved by just creating the permanent relation in a new
 relfilenode? That's equivalent to a rewrite, yes, but we need to do that
 for anything but wal_level=minimal anyway.
 
 Maybe I'm missing something, but doesn't this actually involve writing the 
 data twice? Once into WAL and again into the relation itself?
 
 Yes. But as I said, that's unavoidable for anything but
 wal_level=minimal.
 
 Ideally, you would *only* write the data to WAL, when you do ALTER TABLE ...
 SET LOGGED. There's no fundamental reason you need to rewrite the heap, too.
 I understand that it might be difficult to do, because of the way the system
 catalogs work, but it's worthy goal.

I don't think that's realistic to achieve due to the issues described in
http://archives.postgresql.org/message-id/CA%2BTgmob44LNwwU73N1aJsGQyzQ61SdhKJRC_89wCm0%2BaLg%3Dx2Q%40mail.gmail.com

I don't think it's worthwile to make the feature much more complex, just
to address this. perfect is the enemy of good and all that.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] quiet inline configure check misses a step for clang

Hi,

The current quiet inline test doesn't work for clang. As e.g. evidenced in
http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=gulldt=2014-04-03%2007%3A49%3A26stg=configure
configure thinks it's not quiet.

Which means that postgres compiled with a recent clang will be noticably
slower than it needs to be.

The reason for that is that clang is smart and warns about static inline
if they are declared locally in the .c file, but not if they are
declared in a #included file.  That seems to be a reasonable
behaviour...

I think that needs to be fixed. We either can make the configure test
considerably more complex or simply drop the requirement for quiet
inline.

Comments?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC proposal - make an unlogged table logged

On 04/03/2014 01:44 PM, Andres Freund wrote:

On 2014-04-03 13:38:29 +0300, Heikki Linnakangas wrote:

On 04/01/2014 08:58 PM, Andres Freund wrote:

On 2014-04-01 12:56:04 -0500, Jim Nasby wrote:

On 3/4/14, 8:50 AM, Andres Freund wrote:

Can't that be solved by just creating the permanent relation in a new
relfilenode? That's equivalent to a rewrite, yes, but we need to do that
for anything but wal_level=minimal anyway.

Maybe I'm missing something, but doesn't this actually involve writing the data
twice? Once into WAL and again into the relation itself?

Yes. But as I said, that's unavoidable for anything but
wal_level=minimal.

Ideally, you would *only* write the data to WAL, when you do ALTER TABLE ...
SET LOGGED. There's no fundamental reason you need to rewrite the heap, too.
I understand that it might be difficult to do, because of the way the system
catalogs work, but it's worthy goal.

I don't think that's realistic to achieve due to the issues described in
http://archives.postgresql.org/message-id/CA%2BTgmob44LNwwU73N1aJsGQyzQ61SdhKJRC_89wCm0%2BaLg%3Dx2Q%40mail.gmail.com

To which I replied here:
http://www.postgresql.org/message-id/533af9d7.7010...@vmware.com. Please
reply to that sub-thread with any problems you see. I might be missing
something, but I really don't see any insurmountable problem here.

I don't think it's worthwile to make the feature much more complex, just
to address this. perfect is the enemy of good and all that.

We should do the trivial implementation first, sure. But that ought to
be trivial. Now is the time to discuss how to do the more optimal thing.
If we can come up with a feasible design on that, Fabrizio will have
time to do that as part of the GSoC.

- Heikki

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC proposal - make an unlogged table logged

On 2014-04-01 20:39:35 +0300, Heikki Linnakangas wrote:
 On 03/07/2014 05:36 AM, Tom Lane wrote:
 =?ISO-8859-1?Q?Fabr=EDzio_de_Royes_Mello?= fabriziome...@gmail.com writes:
 Do you think is difficult to implement ALTER TABLE ... SET UNLOGGED too?
 Thinking in a scope of one GSoC, of course.
 
 I think it's basically the same thing.  You might hope to optimize it;
 but you have to create (rather than remove) an init fork, and there's
 no way to do that in exact sync with the commit.
 
 You just have to include that information with the commit WAL record, no?

Sure, it's possible to do that. But that seems like complicating generic
paths more than I'd like for a minor feature. Especially as the
unlinking of the files would need to happen somewhere in
RecordTransactionCommit(). After the XLogFlush(), but before unsetting
MyPgXact-delayChkpt. That's a crit section, right?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC proposal - make an unlogged table logged


On 04/01/2014 08:39 PM, Heikki Linnakangas wrote:

On 03/07/2014 05:36 AM, Tom Lane wrote:

=?ISO-8859-1?Q?Fabr=EDzio_de_Royes_Mello?= fabriziome...@gmail.com writes:

Do you think is difficult to implement ALTER TABLE ... SET UNLOGGED too?
Thinking in a scope of one GSoC, of course.


I think it's basically the same thing.  You might hope to optimize it;
but you have to create (rather than remove) an init fork, and there's
no way to do that in exact sync with the commit.


You just have to include that information with the commit WAL record, no?


No-one's replied yet, but perhaps the worry is that after you've written 
the commit record, you have to go ahead with removing/creating the init 
fork, and that is seen as too risky. If a creat() or unlink() call 
fails, that will have to be a PANIC, and crash recovery will likewise 
have to PANIC if the forks still cannot be removed/created.


My first thought is that that seems ok. It's unlikely that an unlink() 
of a small file in the data directory would fail. Creation could be done 
with a temporary name first and renamed into place, to avoid running out 
of disk space in the critical section.


If that's not acceptable, one idea off the top of my head is to somehow 
stamp the init forks when making an unlogged table logged, with the XID 
of the transcation. Crash recovery could then check the clog to see if 
the transaction committed, and ignore any init fork files belonging to 
committed transactions. (Same in reverse when making a logged table 
unlogged).


Currently, we reset unlogged relations before replaying the WAL. That 
would have to be delayed until end of WAL replay, because otherwise we 
don't know if the transaction committed or not. Although if we go with 
the stamping approach, we could still reset unstamped files at the 
beginning of recovery.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC proposal - make an unlogged table logged

On 2014-04-03 14:26:50 +0300, Heikki Linnakangas wrote:
 On 04/01/2014 08:39 PM, Heikki Linnakangas wrote:
 On 03/07/2014 05:36 AM, Tom Lane wrote:
 =?ISO-8859-1?Q?Fabr=EDzio_de_Royes_Mello?= fabriziome...@gmail.com writes:
 Do you think is difficult to implement ALTER TABLE ... SET UNLOGGED too?
 Thinking in a scope of one GSoC, of course.
 
 I think it's basically the same thing.  You might hope to optimize it;
 but you have to create (rather than remove) an init fork, and there's
 no way to do that in exact sync with the commit.
 
 You just have to include that information with the commit WAL record, no?
 
 No-one's replied yet

That might be because it was a month after the initial discussion, and
at least I'd temporarily lost track of the thread ;)

 , but perhaps the worry is that after you've written the
 commit record, you have to go ahead with removing/creating the init fork,
 and that is seen as too risky. If a creat() or unlink() call fails, that
 will have to be a PANIC, and crash recovery will likewise have to PANIC if
 the forks still cannot be removed/created.

That's part of the worry, yes. It's also creeping code dealing with
unlogged relations into a fairly critical place
(RecordTransactionCommit()) where it really doesn't seem to belong.

 My first thought is that that seems ok. It's unlikely that an unlink() of a
 small file in the data directory would fail. Creation could be done with a
 temporary name first and renamed into place, to avoid running out of disk
 space in the critical section.

I continue to feel that that's far too much impact for a minor
feature. Even if it could be made work reliably, it'll be a fair amount
of seldomly used infrastructure.

 If that's not acceptable, one idea off the top of my head is to somehow
 stamp the init forks when making an unlogged table logged, with the XID of
 the transcation. Crash recovery could then check the clog to see if the
 transaction committed, and ignore any init fork files belonging to committed
 transactions. (Same in reverse when making a logged table unlogged).

I've thought about that - after all, the logical decoding stuff uses
that trick in some places - but it has the grave disadvantage that it
requires a full directory scan to fully remove a relation. That seems to
be a heavy price.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC proposal - make an unlogged table logged

On 2014-04-03 13:38:29 +0300, Heikki Linnakangas wrote:
 On 04/01/2014 08:58 PM, Andres Freund wrote:
 On 2014-04-01 12:56:04 -0500, Jim Nasby wrote:
 On 3/4/14, 8:50 AM, Andres Freund wrote:
 Can't that be solved by just creating the permanent relation in a new
 relfilenode? That's equivalent to a rewrite, yes, but we need to do that
 for anything but wal_level=minimal anyway.
 
 Maybe I'm missing something, but doesn't this actually involve writing the 
 data twice? Once into WAL and again into the relation itself?
 
 Yes. But as I said, that's unavoidable for anything but
 wal_level=minimal.
 
 Ideally, you would *only* write the data to WAL, when you do ALTER TABLE ...
 SET LOGGED. There's no fundamental reason you need to rewrite the
 heap, too.

As another point: What's the advantage of that? The amount of writes
will be the same, no? It doesn't seem to be all that interesting that
a second filenode exists temporarily?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] json(b) equality rules

2014-04-03 Thread Andrew Dunstan



On 04/03/2014 05:02 AM, Oleg Bartunov wrote:

Well, we don't supported Infinity and NaN in json(b), as well as Json
standard :)
Now we need a script, which generated nice html table.


(Oleg, please try not to top-post.)

I don't think we should follow these rules at all. Json is not 
Javascript, and these are Javascript rules, not Json rules. I'm entirely 
opposed to treating 0, 0, false, and [0] as equal. The equality rule 
we actually have for jsonb is the correct one, I believe.


cheers

andrew





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC proposal - make an unlogged table logged


On 04/03/2014 02:41 PM, Andres Freund wrote:

On 2014-04-03 13:38:29 +0300, Heikki Linnakangas wrote:

On 04/01/2014 08:58 PM, Andres Freund wrote:

On 2014-04-01 12:56:04 -0500, Jim Nasby wrote:

On 3/4/14, 8:50 AM, Andres Freund wrote:

Can't that be solved by just creating the permanent relation in a new
relfilenode? That's equivalent to a rewrite, yes, but we need to do that
for anything but wal_level=minimal anyway.


Maybe I'm missing something, but doesn't this actually involve writing the data 
twice? Once into WAL and again into the relation itself?


Yes. But as I said, that's unavoidable for anything but
wal_level=minimal.


Ideally, you would *only* write the data to WAL, when you do ALTER TABLE ...
SET LOGGED. There's no fundamental reason you need to rewrite the
heap, too.


As another point: What's the advantage of that? The amount of writes
will be the same, no? It doesn't seem to be all that interesting that
a second filenode exists temporarily?


Surely it's cheaper to read the whole relation and copy it to just WAL, 
than to read the whole relation and write it both the WAL and another file.


(Maybe it's not worth the trouble to avoid it - but that depends on 
whether we come up with a good design..)


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] json(b) equality rules

2014-04-03 Thread Hannu Krosing

On 04/03/2014 04:32 AM, Oleg Bartunov wrote:
 Hi there,

 I'm wondering if we should follow all js equility rules as
 nicely visualized in
 http://strilanc.com/visualization/2014/03/27/Better-JS-Equality-Table.html
Probably not as JSON is general interchange format.

If somebody wants JavaScript rules, they can use pl/v8

Any equality operations specific for JSON should be related
to array and object/dictionary equality and not data
store inside JSON


Cheers
Hannu


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC proposal - make an unlogged table logged

On 2014-04-03 15:02:27 +0300, Heikki Linnakangas wrote:
 On 04/03/2014 02:41 PM, Andres Freund wrote:
 On 2014-04-03 13:38:29 +0300, Heikki Linnakangas wrote:
 On 04/01/2014 08:58 PM, Andres Freund wrote:
 On 2014-04-01 12:56:04 -0500, Jim Nasby wrote:
 On 3/4/14, 8:50 AM, Andres Freund wrote:
 Can't that be solved by just creating the permanent relation in a new
 relfilenode? That's equivalent to a rewrite, yes, but we need to do that
 for anything but wal_level=minimal anyway.
 
 Maybe I'm missing something, but doesn't this actually involve writing 
 the data twice? Once into WAL and again into the relation itself?
 
 Yes. But as I said, that's unavoidable for anything but
 wal_level=minimal.
 
 Ideally, you would *only* write the data to WAL, when you do ALTER TABLE ...
 SET LOGGED. There's no fundamental reason you need to rewrite the
 heap, too.
 
 As another point: What's the advantage of that? The amount of writes
 will be the same, no? It doesn't seem to be all that interesting that
 a second filenode exists temporarily?
 
 Surely it's cheaper to read the whole relation and copy it to just WAL, than
 to read the whole relation and write it both the WAL and another file.

I have to admit I was thinking of the WAL replay case ;). But we'll
actually have to write all dirty s_b, change the persistency tags and
such anyway because there's no LSN interlock with checkpoints. That
seems pretty ugly as well, and once again, avoidable by a rewrite.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC 2014 proposal

2014-04-03 Thread Alexander Korotkov

On Wed, Apr 2, 2014 at 2:22 PM, Alexander Korotkov aekorot...@gmail.comwrote:

 On Tue, Apr 1, 2014 at 2:23 PM, Heikki Linnakangas 
 hlinnakan...@vmware.com wrote:

 The BIRCH algorithm as described in the paper describes building a tree
 in memory. If I understood correctly, you're suggesting to use a pre-built
 GiST index instead. Interesting idea!

 There are a couple of signifcant differences between the CF tree
 described in the paper and GiST:

 1. In GiST, a leaf item always represents one heap tuple. In the CF tree,
 a leaf item represents a cluster, which consists of one or more tuples. So
 the CF tree doesn't store an entry for every input tuple, which makes it
 possible to keep it in memory.

 2. In the CF tree, all entries in a leaf node must satisfy a threshold
 requirement, with respect to a threshold value T: the diameter (or radius)
 has to be less than T. GiST imposes no such restrictions. An item can
 legally be placed anywhere in the tree; placing it badly will just lead to
 degraded search performance, but it's still a legal GiST tree.

 3. A GiST index, like any other index in PostgreSQL, holds entries also
 for deleted tuples, until the index is vacuumed. So you cannot just use
 information from a non-leaf node and use it in the result, as the
 information summarized at a non-leaf level includes noise from the dead
 tuples.

 Can you elaborate how you are planning to use a GiST index to implement
 BIRCH? You might also want to take a look at SP-GiST; SP-GiST is more
 strict in where in the tree an item can be stored, and lets the operator
 class to specify exactly when a node is split etc.


 Hmmm, it's likely I've imagined something quite outside of this paper, and
 even already suggested it to Ivan... :)
 I need a little time to rethink it.


Using GiST we can implement BIRCH-like clustering like so:
1) Build a CF tree as GiST index without restriction of T threshold value.
2) Scan CF tree with threshold T with some auxiliary operator. If
consistent method see CF entry which diameter is greater than T then it
returns true. Otherwise it returns false and put this CF entry into output
area (could be either in-memory or temporary table).
3) Process other steps of algorithm as usual.

This modification would have following advantages:
1) User can build GiST index once and then try clustering with different
parameters. Initial GiST index build would be slowest operation while other
steps is expected to be fast.
2) Use GiST infrastructure and automatically get buffering build.

The drawback is that building GiST index is more expensive than building
in-memory CF tree with given threshold T (assuming T is well chosen).

Does it make any sense?

--
With best regards,
Alexander Korotkov.

Re: [HACKERS] quiet inline configure check misses a step for clang

Andres Freund and...@2ndquadrant.com writes:
 The current quiet inline test doesn't work for clang. As e.g. evidenced in
 http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=gulldt=2014-04-03%2007%3A49%3A26stg=configure
 configure thinks it's not quiet.

 Which means that postgres compiled with a recent clang will be noticably
 slower than it needs to be.

 The reason for that is that clang is smart and warns about static inline
 if they are declared locally in the .c file, but not if they are
 declared in a #included file.  That seems to be a reasonable
 behaviour...

 I think that needs to be fixed. We either can make the configure test
 considerably more complex or simply drop the requirement for quiet
 inline.

I object to the latter; you're proposing to greatly increase the warning
noise seen with any compiler that issues a warning for this without caring
about .h vs .c.  For somebody who finds gcc -pedantic unusable, I would
think you'd have a bit more sympathy for people using other compilers.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] quiet inline configure check misses a step for clang

On 2014-04-03 09:43:20 -0400, Tom Lane wrote:
Andres Freund and...@2ndquadrant.com writes:
The current quiet inline test doesn't work for clang. As e.g. evidenced in
http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=gulldt=2014-04-03%2007%3A49%3A26stg=configure
configure thinks it's not quiet.

Which means that postgres compiled with a recent clang will be noticably
slower than it needs to be.

The reason for that is that clang is smart and warns about static inline
if they are declared locally in the .c file, but not if they are
declared in a #included file. That seems to be a reasonable
behaviour...

I think that needs to be fixed. We either can make the configure test
considerably more complex or simply drop the requirement for quiet
inline.

I object to the latter; you're proposing to greatly increase the warning
noise seen with any compiler that issues a warning for this without caring
about .h vs .c. For somebody who finds gcc -pedantic unusable, I would
think you'd have a bit more sympathy for people using other compilers.

Yea, but which compilers are that? The only one in the buildfarm I could
find a couple weeks back was acc, and there's a flag we could add to the
relevant template that silences it. I also don't think that very old
platforms won't usually be used for active development, so a louder
build there doesn't really have the same impact as noisy builds for
actively developed on platforms.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC proposal - make an unlogged table logged

Heikki Linnakangas hlinnakan...@vmware.com writes:
 No-one's replied yet, but perhaps the worry is that after you've written 
 the commit record, you have to go ahead with removing/creating the init 
 fork, and that is seen as too risky. If a creat() or unlink() call 
 fails, that will have to be a PANIC, and crash recovery will likewise 
 have to PANIC if the forks still cannot be removed/created.

 My first thought is that that seems ok.

No, it isn't.  No filesystem operation should *ever* be thought to be
guaranteed to succeed.

I also concur with Andres' complaint that this feature is not worth
adding complication to the core transaction commit path for.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] WAL format and API changes (9.5)

I'd like to do some changes to the WAL format in 9.5. I want to annotate 
each WAL record with the blocks that they modify. Every WAL record 
already includes that information, but it's done in an ad hoc way, 
differently in every rmgr. The RelFileNode and block number are 
currently part of the WAL payload, and it's the REDO routine's 
responsibility to extract it. I want to include that information in a 
common format for every WAL record type.


That makes life a lot easier for tools that are interested in knowing 
which blocks a WAL record modifies. One such tool is pg_rewind; it 
currently has to understand every WAL record the backend writes. There's 
also a tool out there called pg_readahead, which does prefetching of 
blocks accessed by WAL records, to speed up PITR. I don't think that 
tool has been actively maintained, but at least part of the reason for 
that is probably that it's a pain to maintain when it has to understand 
the details of every WAL record type.


It'd also be nice for contrib/pg_xlogdump and backend code itself. The 
boilerplate code in all WAL redo routines, and writing WAL records, 
could be simplified.


So, here's my proposal:

Insertion
-

The big change in creating WAL records is that the buffers involved in 
the WAL-logged operation are explicitly registered, by calling a new 
XLogRegisterBuffer function. Currently, buffers that need full-page 
images are registered by including them in the XLogRecData chain, but 
with the new system, you call the XLogRegisterBuffer() function instead. 
And you call that function for every buffer involved, even if no 
full-page image needs to be taken, e.g because the page is going to be 
recreated from scratch at replay.


It is no longer necessary to include the RelFileNode and BlockNumber of 
the modified pages in the WAL payload. That information is automatically 
included in the WAL record, when XLogRegisterBuffer is called.


Currently, the backup blocks are implicitly numbered, in the order the 
buffers appear in XLogRecData entries. With the new API, the blocks are 
numbered explicitly. This is more convenient when a WAL record sometimes 
modifies a buffer and sometimes not. For example, a B-tree split needs 
to modify four pages: the original page, the new page, the right sibling 
(unless it's the rightmost page) and if it's an internal page, the page 
at the lower level whose split the insertion completes. So there are two 
pages that are sometimes missing from the record. With the new API, you 
can nevertheless always register e.g. original page as buffer 0, new 
page as 1, right sibling as 2, even if some of them are actually 
missing. SP-GiST contains even more complicated examples of that.


The new XLogRegisterBuffer would look like this:

void XLogRegisterBuffer(int blockref_id, Buffer buffer, bool buffer_std)

blockref_id: An arbitrary ID given to this block reference. It is used 
in the redo routine to open/restore the same block.

buffer: the buffer involved
buffer_std: is the page in standard page layout?

That's for the normal cases. We'll need a couple of variants for also 
registering buffers that don't need full-page images, and perhaps also a 
function for registering a page that *always* needs a full-page image, 
regardless of the LSN. A few existing WAL record types just WAL-log the 
whole page, so those ad-hoc full-page images could be replaced with this.


With these changes, a typical WAL insertion would look like this:

/* register the buffer with the WAL record, with ID 0 */
XLogRegisterBuffer(0, buf, true);

rdata[0].data = (char *) xlrec;
rdata[0].len = sizeof(BlahRecord);
rdata[0].buffer_id = -1; /* -1 means the data is always included */
rdata[0].next = (rdata[1]);

rdata[1].data = (char *) mydata;
rdata[1].len = mydatalen;
rdata[1].buffer_id = 0; /* 0 here refers to the buffer registered above 
*/
rdata[1].next = NULL

...
recptr = XLogInsert(RM_BLAH_ID, xlinfo, rdata);

PageSetLSN(buf, recptr);


(While we're at it, perhaps we should let XLogInsert set the LSN of all 
the registered buffers, to reduce the amount of boilerplate code).


(Instead of using a new XLogRegisterBuffer() function to register the 
buffers, perhaps they should be passed to XLogInsert as a separate list 
or array. I'm not wedded on the details...)


Redo


There are four different states a block referenced by a typical WAL 
record can be in:


1. The old page does not exist at all (because the relation was 
truncated later)
2. The old page exists, but has an LSN higher than current WAL record, 
so it doesn't need replaying.

3. The LSN is  current WAL record, so it needs to be replayed.
4. The WAL record contains a full-page image, which needs to be restored.

With the current API, that leads to a long boilerplate:

/* If we have a full-page image, restore it and we're done */
if

Re: [HACKERS] quiet inline configure check misses a step for clang

Andres Freund and...@2ndquadrant.com writes:
 On 2014-04-03 09:43:20 -0400, Tom Lane wrote:
 I object to the latter; you're proposing to greatly increase the warning
 noise seen with any compiler that issues a warning for this without caring
 about .h vs .c.  For somebody who finds gcc -pedantic unusable, I would
 think you'd have a bit more sympathy for people using other compilers.

 Yea, but which compilers are that? The only one in the buildfarm I could
 find a couple weeks back was acc, and there's a flag we could add to the
 relevant template that silences it. I also don't think that very old
 platforms won't usually be used for active development, so a louder
 build there doesn't really have the same impact as noisy builds for
 actively developed on platforms.

Didn't we already have this discussion last year?  The main points
are all mentioned in

http://www.postgresql.org/message-id/ca+tgmoyjnc4b+8y01grnal52gtpbzc3zsc4sdnw4lgxhqt3...@mail.gmail.com

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] quiet inline configure check misses a step for clang

On 2014-04-03 09:43:20 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  The current quiet inline test doesn't work for clang. As e.g. evidenced in
  http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=gulldt=2014-04-03%2007%3A49%3A26stg=configure
  configure thinks it's not quiet.
 
  Which means that postgres compiled with a recent clang will be noticably
  slower than it needs to be.
 
  The reason for that is that clang is smart and warns about static inline
  if they are declared locally in the .c file, but not if they are
  declared in a #included file.  That seems to be a reasonable
  behaviour...
 
  I think that needs to be fixed. We either can make the configure test
  considerably more complex or simply drop the requirement for quiet
  inline.
 
 I object to the latter;

Do you have an idea how to make the former work? The whole autoconf
getup doesn't really seem to support generating two files for a
test. I've looked a bit around and it seems like it'd be a fair amount
of effort to do it solely in autoconf.
The easiest seems to be to just have a header for the test in the
sourcetree, but that seems fairly ugly...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] quiet inline configure check misses a step for clang

On 2014-04-03 10:15:33 -0400, Tom Lane wrote:
Andres Freund and...@2ndquadrant.com writes:
On 2014-04-03 09:43:20 -0400, Tom Lane wrote:
I object to the latter; you're proposing to greatly increase the warning
noise seen with any compiler that issues a warning for this without caring
about .h vs .c. For somebody who finds gcc -pedantic unusable, I would
think you'd have a bit more sympathy for people using other compilers.

Didn't we already have this discussion last year? The main points
are all mentioned in

http://www.postgresql.org/message-id/ca+tgmoyjnc4b+8y01grnal52gtpbzc3zsc4sdnw4lgxhqt3...@mail.gmail.com

To which I replied:
http://www.postgresql.org/message-id/20131224141631.gf26...@alap2.anarazel.de

It really seems like an odd behaviour if a compiler behaved that
way. But even if some decade+ old compiler gets this wrong: I am not
going to shed many tears. We're talking about HP-UX's ac++ here. If
binaries get a bit more bloated there...

I really am not trying to win the inline fight here through the
backdoor, I only want to make clang use inlines again. As just written
in the other message, I just don't see any easy and nice way to write
the autoconf test more robustly.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] json(b) equality rules

On Thu, Apr 3, 2014 at 7:43 AM, Andrew Dunstan and...@dunslane.net wrote:
 I don't think we should follow these rules at all. Json is not Javascript,
 and these are Javascript rules, not Json rules. I'm entirely opposed to
 treating 0, 0, false, and [0] as equal. The equality rule we actually have
 for jsonb is the correct one, I believe.

+1


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] jsonb is also breaking the rule against nameless unions

On 2014-04-02 23:50:19 +0200, Andres Freund wrote:
   I just tried it on clang. It builds clean with -Wc11-extensions except
   warning about _Static_assert(). That's possibly fixable with some
   autoconf trickery.
  
  Ah.  That sounds promising.  What clang version is that?
 
 It's debian's clang-3.5, which is built from trunk IIUC: 1:3.5~svn201651-1
 
 I have some patches that fix the configure tests to properly use
 -Werror, but I am too tired to check their validity now.

So, three patches attached:
1) fix a couple of warnings clang outputs in -pedantic. All of them
   somewhat valid and not too ugly to fix.
2) Use -Werror in a couple more configure checks. That allows to compile
   pg using both -Wc11-extensions and -Wc99-extensions in a halfway sane
   fashion. I think this is sane, but I'd welcome comments.
3) Fix C89 compliance issue in pgbench, plus minor cleanups. I don't
   like the fix here, but I don't really have a better idea.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services
From 3219b14205f33e4e6c3682cf9ca795159d59e611 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Thu, 3 Apr 2014 12:51:40 +0200
Subject: [PATCH 1/3] Fix a bunch of somewhat pedantic compiler warnings.

In C89 it's not allowed to have a trailing comma after the last
enumerator value, compound literals have to be compile time constant
and strictly speaking a void * pointer isn't a function pointer...
---
 src/backend/access/transam/xlog.c  |  2 +-
 src/bin/pg_dump/parallel.c |  5 -
 src/bin/psql/tab-complete.c| 10 +-
 src/include/catalog/objectaccess.h |  2 +-
 src/include/pgstat.h   |  2 +-
 src/include/utils/jsonapi.h|  2 +-
 6 files changed, 13 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f9d6609..5096999 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -651,7 +651,7 @@ typedef enum
 	XLOG_FROM_ANY = 0,			/* request to read WAL from any source */
 	XLOG_FROM_ARCHIVE,			/* restored using restore_command */
 	XLOG_FROM_PG_XLOG,			/* existing file in pg_xlog */
-	XLOG_FROM_STREAM,			/* streamed from master */
+	XLOG_FROM_STREAM			/* streamed from master */
 } XLogSource;
 
 /* human-readable names for XLogSources, for debugging output */
diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index 6f2634b..7d6e146 100644
--- a/src/bin/pg_dump/parallel.c
+++ b/src/bin/pg_dump/parallel.c
@@ -558,7 +558,10 @@ ParallelBackupStart(ArchiveHandle *AH, RestoreOptions *ropt)
 		{
 			/* we are the worker */
 			int			j;
-			int			pipefd[2] = {pipeMW[PIPE_READ], pipeWM[PIPE_WRITE]};
+			int			pipefd[2];
+
+			pipefd[0] = pipeMW[PIPE_READ];
+			pipefd[1] = pipeWM[PIPE_WRITE];
 
 			/*
 			 * Store the fds for the reverse communication in pstate. Actually
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 1d69b95..202ffce 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -781,7 +781,7 @@ static const pgsql_thing_t words_after_create[] = {
 
 
 /* Forward declaration of functions */
-static char **psql_completion(char *text, int start, int end);
+static char **psql_completion(const char *text, int start, int end);
 static char *create_command_generator(const char *text, int state);
 static char *drop_command_generator(const char *text, int state);
 static char *complete_from_query(const char *text, int state);
@@ -790,7 +790,7 @@ static char *_complete_from_query(int is_schema_query,
 	 const char *text, int state);
 static char *complete_from_list(const char *text, int state);
 static char *complete_from_const(const char *text, int state);
-static char **complete_from_variables(char *text,
+static char **complete_from_variables(const char *text,
 		const char *prefix, const char *suffix);
 static char *complete_from_files(const char *text, int state);
 
@@ -812,7 +812,7 @@ void
 initialize_readline(void)
 {
 	rl_readline_name = (char *) pset.progname;
-	rl_attempted_completion_function = (void *) psql_completion;
+	rl_attempted_completion_function = psql_completion;
 
 	rl_basic_word_break_characters = WORD_BREAKS;
 
@@ -834,7 +834,7 @@ initialize_readline(void)
  * completion_matches() function, so we don't have to worry about it.
  */
 static char **
-psql_completion(char *text, int start, int end)
+psql_completion(const char *text, int start, int end)
 {
 	/* This is the variable we'll return. */
 	char	  **matches = NULL;
@@ -3847,7 +3847,7 @@ complete_from_const(const char *text, int state)
  * to support quoting usages.
  */
 static char **
-complete_from_variables(char *text, const char *prefix, const char *suffix)
+complete_from_variables(const char *text, const char *prefix, const char *suffix)
 {
 	char	  **matches;
 	char	  **varnames;
diff --git

Re: [HACKERS] WAL format and API changes (9.5)

Heikki Linnakangas hlinnakan...@vmware.com writes:
 The big change in creating WAL records is that the buffers involved in 
 the WAL-logged operation are explicitly registered, by calling a new 
 XLogRegisterBuffer function.

Seems reasonable, especially if we can make the buffer numbering business
less error-prone.

 void XLogRegisterBuffer(int blockref_id, Buffer buffer, bool buffer_std)

 blockref_id: An arbitrary ID given to this block reference. It is used 
 in the redo routine to open/restore the same block.
 buffer: the buffer involved
 buffer_std: is the page in standard page layout?

 That's for the normal cases. We'll need a couple of variants for also 
 registering buffers that don't need full-page images, and perhaps also a 
 function for registering a page that *always* needs a full-page image, 
 regardless of the LSN. A few existing WAL record types just WAL-log the 
 whole page, so those ad-hoc full-page images could be replaced with this.

Why not just one function with an additional flags argument?

Also, IIRC there are places that WAL-log full pages that aren't in a
shared buffer at all (btree build does this I think).  How will that fit
into this model?

 (While we're at it, perhaps we should let XLogInsert set the LSN of all 
 the registered buffers, to reduce the amount of boilerplate code).

Yeah, maybe so.  I'm not sure why that was separate to begin with.

 Let's simplify that, and have one new function, XLogOpenBuffer, which 
 returns a return code that indicates which of the four cases we're 
 dealing with. A typical redo function looks like this:

   if (XLogOpenBuffer(0, buffer) == BLK_REPLAY)
   {
   /* Modify the page */
   ...

   PageSetLSN(page, lsn);
   MarkBufferDirty(buffer);
   }
   if (BufferIsValid(buffer))
   UnlockReleaseBuffer(buffer);

 The '0' in the XLogOpenBuffer call is the ID of the block reference 
 specified in the XLogRegisterBuffer call, when the WAL record was created.

+1, but one important step here is finding the data to be replayed.
That is, a large part of the complexity of replay routines has to do
with figuring out which parts of the WAL record were elided due to
full-page-images, and locating the remaining parts.  What can we do
to make that simpler?

Ideally, if XLogOpenBuffer (bad name BTW) returns BLK_REPLAY, it would
also calculate and hand back the address/size of the logged data that
had been pointed to by the associated XLogRecData chain item.  The
trouble here is that there might've been multiple XLogRecData items
pointing to the same buffer.  Perhaps the magic ID number you give to
XLogOpenBuffer should be thought of as identifying an XLogRecData chain
item, not so much a buffer?  It's fairly easy to see what to do when
there's just one chain item per buffer, but I'm not sure what to do
if there's more than one.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] It seems no Windows buildfarm members are running find_typedefs

2014-04-03 Thread Andrew Dunstan



On 04/03/2014 12:01 AM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

For OSX we'd construct the list via File::Find to recurse through the
directories.
So, something like this:

Thanks for the tips.  The attached patch against buildfarm 4.11 seems
to work, cf
http://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=dromedarydt=2014-04-03%2003%3A33%3A16stg=typedefs
I tried it on current OSX (10.9.2) as well as dromedary's 10.6.8.
It does not work on prairiedog (10.4.11) --- the debug info format
seems to be different that far back.  Can't say that's worth worrying
about.





Thanks, applied.


BTW, after looking a bit more closely at what this added to the typedefs
list, I realize that this mechanism also collects typedefs that appear in
files that are compiled but never installed, for example src/timezone/zic
and the ecpg regression tests.  This seems like a Good Thing, since
certainly pgindent is going to hit the source files for those programs
too.  I wonder if we ought to switch over to scanning the .o files on
all platforms?


Sure, if someone wants to work out what's involved. Do the MSVC tools 
have some gadget for extracting symbols from .o files in a similar way? 
It would be nice to get them into the mix too.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL format and API changes (9.5)


On 04/03/2014 06:37 PM, Tom Lane wrote:

Also, IIRC there are places that WAL-log full pages that aren't in a
shared buffer at all (btree build does this I think).  How will that fit
into this model?


Hmm. We could provide a function for registering a block with given 
content, without a Buffer. Something like:


XLogRegisterPage(int id, RelFileNode, BlockNumber, Page)


Let's simplify that, and have one new function, XLogOpenBuffer, which
returns a return code that indicates which of the four cases we're
dealing with. A typical redo function looks like this:



if (XLogOpenBuffer(0, buffer) == BLK_REPLAY)
{
/* Modify the page */
...



PageSetLSN(page, lsn);
MarkBufferDirty(buffer);
}
if (BufferIsValid(buffer))
UnlockReleaseBuffer(buffer);



The '0' in the XLogOpenBuffer call is the ID of the block reference
specified in the XLogRegisterBuffer call, when the WAL record was created.


+1, but one important step here is finding the data to be replayed.
That is, a large part of the complexity of replay routines has to do
with figuring out which parts of the WAL record were elided due to
full-page-images, and locating the remaining parts.  What can we do
to make that simpler?


We can certainly add more structure to the WAL records, but any extra 
information you add will make the records larger. It might be worth it, 
and would be lost in the noise for more complex records like page 
splits, but we should keep frequently-used records like heap insertions 
as lean as possible.



Ideally, if XLogOpenBuffer (bad name BTW) returns BLK_REPLAY, it would
also calculate and hand back the address/size of the logged data that
had been pointed to by the associated XLogRecData chain item.  The
trouble here is that there might've been multiple XLogRecData items
pointing to the same buffer.  Perhaps the magic ID number you give to
XLogOpenBuffer should be thought of as identifying an XLogRecData chain
item, not so much a buffer?  It's fairly easy to see what to do when
there's just one chain item per buffer, but I'm not sure what to do
if there's more than one.


Hmm. You could register a separate XLogRecData chain for each buffer. 
Along the lines of:


rdata[0].data = data for buffer
rdata[0].len = ...
rdata[0].next = rdata[1];
rdata[1].data = more data for same buffer
rdata[1].len = ...
rdata[2].next = NULL;

XLogRegisterBuffer(0, buffer, data[0]);

At replay:

if (XLogOpenBuffer(0, buffer, xldata, len) == BLK_REPLAY)
{
/* xldata points to the data registered for this buffer */
}

Plus one more chain for the data not associated with a buffer.

- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL format and API changes (9.5)

Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 04/03/2014 06:37 PM, Tom Lane wrote:
 +1, but one important step here is finding the data to be replayed.
 That is, a large part of the complexity of replay routines has to do
 with figuring out which parts of the WAL record were elided due to
 full-page-images, and locating the remaining parts.  What can we do
 to make that simpler?

 We can certainly add more structure to the WAL records, but any extra 
 information you add will make the records larger. It might be worth it, 
 and would be lost in the noise for more complex records like page 
 splits, but we should keep frequently-used records like heap insertions 
 as lean as possible.

Sure, but in simple WAL records there would just be a single data chunk
that would be present in the normal case and not present in the FPI case.
Seems like we ought to be able to handle that degenerate case with no
significant wastage (probably just a flag bit someplace).

More generally, I'm pretty sure that your proposal is already going to
involve some small growth of WAL records compared to today, but I think
that's probably all right; the benefits seem significant.

 Hmm. You could register a separate XLogRecData chain for each buffer. 
 Plus one more chain for the data not associated with a buffer.

That would probably work.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL format and API changes (9.5)


On 04/03/2014 07:11 PM, Tom Lane wrote:


More generally, I'm pretty sure that your proposal is already going to
involve some small growth of WAL records compared to today,


Quite possible.


but I think
that's probably all right; the benefits seem significant.


Yep.

OTOH, once we store the relfilenode+block in a common format, we can 
then try to optimize that format more heavily. Just as an example, omit 
the tablespace oid in the RelFileNode, when it's the default tablespace 
(with a flag bit indicating we did that). Or use a variable-length 
endoding for the block number, on the assumption that smaller numbers 
are more common. Probably not be worth the extra complexity, but we can 
easily experiment with that kind of stuff once we have the 
infrastructure in place.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] PostgreSQL Columnar Store for Analytic Workloads

2014-04-03 Thread Hadi Moshayedi

Dear Hackers,

We at Citus Data have been developing a columnar store extension for
PostgreSQL. Today we are excited to open source it under the Apache v2.0
license.

This columnar store extension uses the Optimized Row Columnar (ORC) format
for its data layout, which improves upon the RCFile format developed at
Facebook, and brings the following benefits:

* Compression: Reduces in-memory and on-disk data size by 2-4x. Can be
extended to support different codecs. We used the functions in
pg_lzcompress.h for compression and decompression.
* Column projections: Only reads column data relevant to the query.
Improves performance for I/O bound queries.
* Skip indexes: Stores min/max statistics for row groups, and uses them to
skip over unrelated rows.

We used the PostgreSQL FDW APIs to make this work. The extension doesn't
implement the writable FDW API, but it uses the process utility hook to
enable COPY command for the columnar tables.

This extension uses PostgreSQL's internal data type representation to store
data in the table, so this columnar store should support all data types
that PostgreSQL supports.

We tried the extension on TPC-H benchmark with 4GB scale factor on a
m1.xlarge Amazon EC2 instance, and the query performance improved by 2x-3x
compared to regular PostgreSQL table. Note that we flushed the page cache
before each test to see the impact on disk I/O.

When data is cached in memory, the performance of cstore_fdw tables were
close to the performance of regular PostgreSQL tables.

For more information, please visit:
 * our blog post:
http://citusdata.com/blog/76-postgresql-columnar-store-for-analytics
 * our github page: https://github.com/citusdata/cstore_fdw

Feedback from you is really appreciated.

Thanks,
  -- Hadi

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

On Mon, Mar 31, 2014 at 7:35 PM, Peter Geoghegan p...@heroku.com wrote:
 Okay. Attached revision only trusts strxfrm() blobs (as far as that
 goes) when the buffer passed to strxfrm() was sufficiently large that
 the blob could fully fit.

Attached revision has been further polished. I've added two additional
optimizations:

* Squeeze the last byte out of each Datum, so that on a 64-bit system,
the full 8 bytes are available to store strxfrm() blobs.

* Figure out when the strcoll() bttextfastcmp_locale() comparator is
called, if it was called because a poor man's comparison required it
(and *not* because it's the non-leading key in the traditional sense,
which implies there are no poorman's normalized keys in respect of
this attribute at all). This allows us to try and get away with a
straight memcmp if and when the lengths of the original text strings
match, on the assumption that when the initial poorman's comparison
didn't work out, and when the string lengths match, there is a very
good chance that both are equal, and on average it's a win to avoid
doing a strcoll() (along with the attendant copying around of buffers
for NULL-termination) entirely. Given that memcmp() is so much cheaper
than strcoll() anyway, this seems like a good trade-off.

-- 
Peter Geoghegan
*** a/src/backend/executor/nodeMergeAppend.c
--- b/src/backend/executor/nodeMergeAppend.c
*** ExecInitMergeAppend(MergeAppend *node, E
*** 136,141 
--- 136,142 
  		sortKey-ssup_collation = node-collations[i];
  		sortKey-ssup_nulls_first = node-nullsFirst[i];
  		sortKey-ssup_attno = node-sortColIdx[i];
+ 		sortKey-leading = (i == 0);
  
  		PrepareSortSupportFromOrderingOp(node-sortOperators[i], sortKey);
  	}
*** a/src/backend/utils/adt/varlena.c
--- b/src/backend/utils/adt/varlena.c
***
*** 17,22 
--- 17,23 
  #include ctype.h
  #include limits.h
  
+ #include access/nbtree.h
  #include access/tuptoaster.h
  #include catalog/pg_collation.h
  #include catalog/pg_type.h
***
*** 29,34 
--- 30,36 
  #include utils/bytea.h
  #include utils/lsyscache.h
  #include utils/pg_locale.h
+ #include utils/sortsupport.h
  
  
  /* GUC variable */
*** typedef struct
*** 50,61 
--- 52,85 
  	int			skiptable[256]; /* skip distance for given mismatched char */
  } TextPositionState;
  
+ typedef struct
+ {
+ 	char	   *buf1;			/* 1st string, or poorman original string buf */
+ 	char	   *buf2;			/* 2nd string, or leading key/poor man blob */
+ 	int			buflen1;
+ 	int			buflen2;
+ #ifdef HAVE_LOCALE_T
+ 	pg_locale_t locale;
+ #endif
+ } TextSortSupport;
+ 
+ /*
+  * This should be large enough that most strings will be fit, but small enough
+  * that we feel comfortable putting it on the stack
+  */
+ #define TEXTBUFLEN		1024
+ 
  #define DatumGetUnknownP(X)			((unknown *) PG_DETOAST_DATUM(X))
  #define DatumGetUnknownPCopy(X)		((unknown *) PG_DETOAST_DATUM_COPY(X))
  #define PG_GETARG_UNKNOWN_P(n)		DatumGetUnknownP(PG_GETARG_DATUM(n))
  #define PG_GETARG_UNKNOWN_P_COPY(n) DatumGetUnknownPCopy(PG_GETARG_DATUM(n))
  #define PG_RETURN_UNKNOWN_P(x)		PG_RETURN_POINTER(x)
  
+ static void btpoorman_worker(SortSupport ssup, Oid collid);
+ static int bttextfastcmp_c(Datum x, Datum y, SortSupport ssup);
+ static int bttextfastcmp_locale(Datum x, Datum y, SortSupport ssup);
+ static int bttextfastcmp_locale_poorman(Datum x, Datum y, SortSupport ssup);
+ static Datum bttext_convert(Datum d, SortSupport ssup);
  static int32 text_length(Datum str);
  static text *text_catenate(text *t1, text *t2);
  static text *text_substring(Datum str,
*** varstr_cmp(char *arg1, int len1, char *a
*** 1356,1365 
  	}
  	else
  	{
! #define STACKBUFLEN		1024
! 
! 		char		a1buf[STACKBUFLEN];
! 		char		a2buf[STACKBUFLEN];
  		char	   *a1p,
     *a2p;
  
--- 1380,1387 
  	}
  	else
  	{
! 		char		a1buf[TEXTBUFLEN];
! 		char		a2buf[TEXTBUFLEN];
  		char	   *a1p,
     *a2p;
  
*** varstr_cmp(char *arg1, int len1, char *a
*** 1393,1416 
  			int			a2len;
  			int			r;
  
! 			if (len1 = STACKBUFLEN / 2)
  			{
  a1len = len1 * 2 + 2;
  a1p = palloc(a1len);
  			}
  			else
  			{
! a1len = STACKBUFLEN;
  a1p = a1buf;
  			}
! 			if (len2 = STACKBUFLEN / 2)
  			{
  a2len = len2 * 2 + 2;
  a2p = palloc(a2len);
  			}
  			else
  			{
! a2len = STACKBUFLEN;
  a2p = a2buf;
  			}
  
--- 1415,1438 
  			int			a2len;
  			int			r;
  
! 			if (len1 = TEXTBUFLEN / 2)
  			{
  a1len = len1 * 2 + 2;
  a1p = palloc(a1len);
  			}
  			else
  			{
! a1len = TEXTBUFLEN;
  a1p = a1buf;
  			}
! 			if (len2 = TEXTBUFLEN / 2)
  			{
  a2len = len2 * 2 + 2;
  a2p = palloc(a2len);
  			}
  			else
  			{
! a2len = TEXTBUFLEN;
  a2p = a2buf;
  			}
  
*** varstr_cmp(char *arg1, int len1, char *a
*** 1475,1485 
  		}
  #endif   /* WIN32 */
  
! 		if (len1 = STACKBUFLEN)
  			a1p = (char *) palloc(len1 + 1);

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-04-03 Thread Thom Brown

On 3 April 2014 17:52, Peter Geoghegan p...@heroku.com wrote:
 On Mon, Mar 31, 2014 at 7:35 PM, Peter Geoghegan p...@heroku.com wrote:
 Okay. Attached revision only trusts strxfrm() blobs (as far as that
 goes) when the buffer passed to strxfrm() was sufficiently large that
 the blob could fully fit.

 Attached revision has been further polished. I've added two additional
 optimizations:

 * Squeeze the last byte out of each Datum, so that on a 64-bit system,
 the full 8 bytes are available to store strxfrm() blobs.

 * Figure out when the strcoll() bttextfastcmp_locale() comparator is
 called, if it was called because a poor man's comparison required it
 (and *not* because it's the non-leading key in the traditional sense,
 which implies there are no poorman's normalized keys in respect of
 this attribute at all). This allows us to try and get away with a
 straight memcmp if and when the lengths of the original text strings
 match, on the assumption that when the initial poorman's comparison
 didn't work out, and when the string lengths match, there is a very
 good chance that both are equal, and on average it's a win to avoid
 doing a strcoll() (along with the attendant copying around of buffers
 for NULL-termination) entirely. Given that memcmp() is so much cheaper
 than strcoll() anyway, this seems like a good trade-off.

I'm getting an error when building this:

In file included from printtup.c:23:0:
../../../../src/include/utils/memdebug.h:21:31: fatal error:
valgrind/memcheck.h: No such file or directory
compilation terminated.
gcc -O2 -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Wendif-labels
-Wmissing-format-attribute -Wformat-security -fno-strict-aliasing
-fwrapv -fexcess-precision=standard -I../../../src/include
-D_GNU_SOURCE   -c -o analyze.o analyze.c -MMD -MP -MF
.deps/analyze.Po
make[4]: *** [printtup.o] Error 1
make[4]: Leaving directory
`/home/thom/Development/postgresql/src/backend/access/common'
make[3]: *** [common-recursive] Error 2
make[3]: Leaving directory
`/home/thom/Development/postgresql/src/backend/access'
make[2]: *** [access-recursive] Error 2
make[2]: *** Waiting for unfinished jobs

-- 
Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

On Thu, Apr 3, 2014 at 1:23 PM, Thom Brown t...@linux.com wrote:
 I'm getting an error when building this:

Sorry. I ran this through Valgrind, and forgot to reset where the
relevant macro is define'd before submission. Attached revision should
build without issue.


-- 
Peter Geoghegan
*** a/src/backend/executor/nodeMergeAppend.c
--- b/src/backend/executor/nodeMergeAppend.c
*** ExecInitMergeAppend(MergeAppend *node, E
*** 136,141 
--- 136,142 
  		sortKey-ssup_collation = node-collations[i];
  		sortKey-ssup_nulls_first = node-nullsFirst[i];
  		sortKey-ssup_attno = node-sortColIdx[i];
+ 		sortKey-leading = (i == 0);
  
  		PrepareSortSupportFromOrderingOp(node-sortOperators[i], sortKey);
  	}
*** a/src/backend/utils/adt/varlena.c
--- b/src/backend/utils/adt/varlena.c
***
*** 17,22 
--- 17,23 
  #include ctype.h
  #include limits.h
  
+ #include access/nbtree.h
  #include access/tuptoaster.h
  #include catalog/pg_collation.h
  #include catalog/pg_type.h
***
*** 29,34 
--- 30,36 
  #include utils/bytea.h
  #include utils/lsyscache.h
  #include utils/pg_locale.h
+ #include utils/sortsupport.h
  
  
  /* GUC variable */
*** typedef struct
*** 50,61 
--- 52,85 
  	int			skiptable[256]; /* skip distance for given mismatched char */
  } TextPositionState;
  
+ typedef struct
+ {
+ 	char	   *buf1;			/* 1st string, or poorman original string buf */
+ 	char	   *buf2;			/* 2nd string, or leading key/poor man blob */
+ 	int			buflen1;
+ 	int			buflen2;
+ #ifdef HAVE_LOCALE_T
+ 	pg_locale_t locale;
+ #endif
+ } TextSortSupport;
+ 
+ /*
+  * This should be large enough that most strings will be fit, but small enough
+  * that we feel comfortable putting it on the stack
+  */
+ #define TEXTBUFLEN		1024
+ 
  #define DatumGetUnknownP(X)			((unknown *) PG_DETOAST_DATUM(X))
  #define DatumGetUnknownPCopy(X)		((unknown *) PG_DETOAST_DATUM_COPY(X))
  #define PG_GETARG_UNKNOWN_P(n)		DatumGetUnknownP(PG_GETARG_DATUM(n))
  #define PG_GETARG_UNKNOWN_P_COPY(n) DatumGetUnknownPCopy(PG_GETARG_DATUM(n))
  #define PG_RETURN_UNKNOWN_P(x)		PG_RETURN_POINTER(x)
  
+ static void btpoorman_worker(SortSupport ssup, Oid collid);
+ static int bttextfastcmp_c(Datum x, Datum y, SortSupport ssup);
+ static int bttextfastcmp_locale(Datum x, Datum y, SortSupport ssup);
+ static int bttextfastcmp_locale_poorman(Datum x, Datum y, SortSupport ssup);
+ static Datum bttext_convert(Datum d, SortSupport ssup);
  static int32 text_length(Datum str);
  static text *text_catenate(text *t1, text *t2);
  static text *text_substring(Datum str,
*** varstr_cmp(char *arg1, int len1, char *a
*** 1356,1365 
  	}
  	else
  	{
! #define STACKBUFLEN		1024
! 
! 		char		a1buf[STACKBUFLEN];
! 		char		a2buf[STACKBUFLEN];
  		char	   *a1p,
     *a2p;
  
--- 1380,1387 
  	}
  	else
  	{
! 		char		a1buf[TEXTBUFLEN];
! 		char		a2buf[TEXTBUFLEN];
  		char	   *a1p,
     *a2p;
  
*** varstr_cmp(char *arg1, int len1, char *a
*** 1393,1416 
  			int			a2len;
  			int			r;
  
! 			if (len1 = STACKBUFLEN / 2)
  			{
  a1len = len1 * 2 + 2;
  a1p = palloc(a1len);
  			}
  			else
  			{
! a1len = STACKBUFLEN;
  a1p = a1buf;
  			}
! 			if (len2 = STACKBUFLEN / 2)
  			{
  a2len = len2 * 2 + 2;
  a2p = palloc(a2len);
  			}
  			else
  			{
! a2len = STACKBUFLEN;
  a2p = a2buf;
  			}
  
--- 1415,1438 
  			int			a2len;
  			int			r;
  
! 			if (len1 = TEXTBUFLEN / 2)
  			{
  a1len = len1 * 2 + 2;
  a1p = palloc(a1len);
  			}
  			else
  			{
! a1len = TEXTBUFLEN;
  a1p = a1buf;
  			}
! 			if (len2 = TEXTBUFLEN / 2)
  			{
  a2len = len2 * 2 + 2;
  a2p = palloc(a2len);
  			}
  			else
  			{
! a2len = TEXTBUFLEN;
  a2p = a2buf;
  			}
  
*** varstr_cmp(char *arg1, int len1, char *a
*** 1475,1485 
  		}
  #endif   /* WIN32 */
  
! 		if (len1 = STACKBUFLEN)
  			a1p = (char *) palloc(len1 + 1);
  		else
  			a1p = a1buf;
! 		if (len2 = STACKBUFLEN)
  			a2p = (char *) palloc(len2 + 1);
  		else
  			a2p = a2buf;
--- 1497,1507 
  		}
  #endif   /* WIN32 */
  
! 		if (len1 = TEXTBUFLEN)
  			a1p = (char *) palloc(len1 + 1);
  		else
  			a1p = a1buf;
! 		if (len2 = TEXTBUFLEN)
  			a2p = (char *) palloc(len2 + 1);
  		else
  			a2p = a2buf;
*** bttextcmp(PG_FUNCTION_ARGS)
*** 1683,1688 
--- 1705,2039 
  	PG_RETURN_INT32(result);
  }
  
+ Datum
+ bttextsortsupport(PG_FUNCTION_ARGS)
+ {
+ 	SortSupport		ssup = (SortSupport) PG_GETARG_POINTER(0);
+ 	Oidcollid = ssup-ssup_collation;
+ 
+ 	btpoorman_worker(ssup, collid);
+ 
+ 	PG_RETURN_VOID();
+ }
+ 
+ static void
+ btpoorman_worker(SortSupport ssup, Oid collid)
+ {
+ 	TextSortSupport	   *tss;
+ 
+ 	/*
+ 	 * If LC_COLLATE = C, we can make things quite a bit faster by using
+ 	 * memcmp() rather than strcoll().  To minimize the per-comparison
+ 	 * overhead, we make this decision just once for the whole

Re: [HACKERS] Useless Replica Identity: NOTHING noise from psql \d

2014-04-03 Thread Andrew Dunstan



On 03/27/2014 01:09 PM, Tom Lane wrote:

Alvaro Herrera alvhe...@2ndquadrant.com writes:

Note that make check in contrib is substantially slower than make
installcheck --- it creates a temporary installation for each contrib
module.  If we make buildfarm run installcheck for all modules, that
might be a problem for some animals.  Would it be possible to have a
target that runs make installcheck for most modules and make check
only for those that need the separate installation?  Maybe we can have a
file listing modules that don't support installcheck, for instance, and
then use $(filter) and $(filter-out) to produce appropriate $(MAKE) -C
loops.

This seems like a good idea to me; the slower animals will be putting lots
of cycles into pretty-much-useless make check builds if we don't.

Rather than a separate file, though, I think a distinct target in
contrib/Makefile would be the best mechanism for keeping the list of
modules that lack installcheck support.  Perhaps make special-check
or some name along that line?


Also, there's the vcregress.pl business.  The way it essentially
duplicates pg_upgrade/test.sh is rather messy; and now that
test_decoding also needs similar treatment, it's not looking so good.
Should we consider redoing that stuff in a way that allows both MSVC and
make-based systems to run those tests?

Agreed, but I'm not volunteering to fix that one ;-)






I've been kind of hoping that someone would step up on both these items, 
but the trail seems to have gone cold.


I'm going to put out the new buildfarm release with the new module to 
run test_decoding check. But of course It won't have MSVC support.


These can go on my long TOO list unless someone else gets there first.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [COMMITTERS] pgsql: Add ALTER TABLESPACE ... MOVE command

2014-04-03 Thread Alvaro Herrera

Stephen Frost wrote:
 Add ALTER TABLESPACE ... MOVE command
 
 This adds a 'MOVE' sub-command to ALTER TABLESPACE which allows moving sets of
 objects from one tablespace to another.  This can be extremely handy and 
 avoids
 a lot of error-prone scripting.  ALTER TABLESPACE ... MOVE will only move
 objects the user owns, will notify the user if no objects were found, and can
 be used to move ALL objects or specific types of objects (TABLES, INDEXES, or
 MATERIALIZED VIEWS).

I just noticed that this commit added the new commands under the
ALTER THING name RENAME TO production (which were originally for
RenameStmt); since commit d86d51a95 had already added some
AlterTableSpaceOptionsStmt nodes to the possible results, maybe this
wasn't so bad in itself; but still it seems quite unlike the way we
organize our parse productions.

If we don't want to add new productions for these new node types, I
think at least this comment update is warranted:


diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 2867fa2..359bb8c 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -7009,6 +7009,8 @@ opt_force:FORCE   
{  $$ = TRUE; }
 /*
  *
  * ALTER THING name RENAME TO newname
+ * ALTER TABLESPACE name MOVE blah
+ * ALTER TABLESPACE name SET/RESET blah
  *
  */


Other thoughts?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

2014-04-03 Thread Thom Brown

On 3 April 2014 19:05, Peter Geoghegan p...@heroku.com wrote:
 On Thu, Apr 3, 2014 at 1:23 PM, Thom Brown t...@linux.com wrote:
 I'm getting an error when building this:

 Sorry. I ran this through Valgrind, and forgot to reset where the
 relevant macro is define'd before submission. Attached revision should
 build without issue.

Looking good:

-T 100 -n -f sort.sql

Master: 21.670467 / 21.718653 (avg: 21.69456)
Patch: 66.888756 / 66.888756 (avg: 66.888756)

308% increase in speed


-c 80 -j 80 -T 100 -n -f sort.sql

Master: 38.450082 / 37.440701 (avg: 37.9453915)
Patch: 153.321946 / 145.004726 (avg: 149.163336)

393% increase in speed

-- 
Thom


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC 2014 proposal


On 04/03/2014 04:15 PM, Alexander Korotkov wrote:

On Wed, Apr 2, 2014 at 2:22 PM, Alexander Korotkov aekorot...@gmail.comwrote:


On Tue, Apr 1, 2014 at 2:23 PM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:


The BIRCH algorithm as described in the paper describes building a tree
in memory. If I understood correctly, you're suggesting to use a pre-built
GiST index instead. Interesting idea!

There are a couple of signifcant differences between the CF tree
described in the paper and GiST:

1. In GiST, a leaf item always represents one heap tuple. In the CF tree,
a leaf item represents a cluster, which consists of one or more tuples. So
the CF tree doesn't store an entry for every input tuple, which makes it
possible to keep it in memory.

2. In the CF tree, all entries in a leaf node must satisfy a threshold
requirement, with respect to a threshold value T: the diameter (or radius)
has to be less than T. GiST imposes no such restrictions. An item can
legally be placed anywhere in the tree; placing it badly will just lead to
degraded search performance, but it's still a legal GiST tree.

3. A GiST index, like any other index in PostgreSQL, holds entries also
for deleted tuples, until the index is vacuumed. So you cannot just use
information from a non-leaf node and use it in the result, as the
information summarized at a non-leaf level includes noise from the dead
tuples.

Can you elaborate how you are planning to use a GiST index to implement
BIRCH? You might also want to take a look at SP-GiST; SP-GiST is more
strict in where in the tree an item can be stored, and lets the operator
class to specify exactly when a node is split etc.



Hmmm, it's likely I've imagined something quite outside of this paper, and
even already suggested it to Ivan... :)
I need a little time to rethink it.



Using GiST we can implement BIRCH-like clustering like so:
1) Build a CF tree as GiST index without restriction of T threshold value.
2) Scan CF tree with threshold T with some auxiliary operator. If
consistent method see CF entry which diameter is greater than T then it
returns true. Otherwise it returns false and put this CF entry into output
area (could be either in-memory or temporary table).
3) Process other steps of algorithm as usual.


I still don't understand how that would work. You can't trust the 
non-leaf entries, because their CF value can contain deleted entries. So 
you have to scan every tuple anyway. Once you do that, what's the point 
of the index?


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] GSoC 2014 proposal

2014-04-03 Thread Alexander Korotkov

On Thu, Apr 3, 2014 at 11:21 PM, Heikki Linnakangas hlinnakan...@vmware.com
 wrote:

 On 04/03/2014 04:15 PM, Alexander Korotkov wrote:

 On Wed, Apr 2, 2014 at 2:22 PM, Alexander Korotkov aekorot...@gmail.com
 wrote:

  On Tue, Apr 1, 2014 at 2:23 PM, Heikki Linnakangas 
 hlinnakan...@vmware.com wrote:

  The BIRCH algorithm as described in the paper describes building a tree
 in memory. If I understood correctly, you're suggesting to use a
 pre-built
 GiST index instead. Interesting idea!

 There are a couple of signifcant differences between the CF tree
 described in the paper and GiST:

 1. In GiST, a leaf item always represents one heap tuple. In the CF
 tree,
 a leaf item represents a cluster, which consists of one or more tuples.
 So
 the CF tree doesn't store an entry for every input tuple, which makes it
 possible to keep it in memory.

 2. In the CF tree, all entries in a leaf node must satisfy a threshold
 requirement, with respect to a threshold value T: the diameter (or
 radius)
 has to be less than T. GiST imposes no such restrictions. An item can
 legally be placed anywhere in the tree; placing it badly will just lead
 to
 degraded search performance, but it's still a legal GiST tree.

 3. A GiST index, like any other index in PostgreSQL, holds entries also
 for deleted tuples, until the index is vacuumed. So you cannot just use
 information from a non-leaf node and use it in the result, as the
 information summarized at a non-leaf level includes noise from the dead
 tuples.

 Can you elaborate how you are planning to use a GiST index to implement
 BIRCH? You might also want to take a look at SP-GiST; SP-GiST is more
 strict in where in the tree an item can be stored, and lets the operator
 class to specify exactly when a node is split etc.


 Hmmm, it's likely I've imagined something quite outside of this paper,
 and
 even already suggested it to Ivan... :)
 I need a little time to rethink it.


 Using GiST we can implement BIRCH-like clustering like so:
 1) Build a CF tree as GiST index without restriction of T threshold value.
 2) Scan CF tree with threshold T with some auxiliary operator. If
 consistent method see CF entry which diameter is greater than T then it
 returns true. Otherwise it returns false and put this CF entry into output
 area (could be either in-memory or temporary table).
 3) Process other steps of algorithm as usual.


 I still don't understand how that would work. You can't trust the non-leaf
 entries, because their CF value can contain deleted entries. So you have to
 scan every tuple anyway. Once you do that, what's the point of the index?


Yeah, it is limitation of this idea. It's not going to be auto-updatable
CF. User can build index on freshly vacuumed table and play with clustering
some time. Updates on table during that time would be either allowed
clustering error or prohibited. Another potential solution is to let this
index to hold some snapshot of data. But it seems not possible to do in
extension now.

--
With best regards,
Alexander Korotkov.

Re: [HACKERS] [COMMITTERS] pgsql: Add ALTER TABLESPACE ... MOVE command

Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Stephen Frost wrote:
 Add ALTER TABLESPACE ... MOVE command

 I just noticed that this commit added the new commands under the
 ALTER THING name RENAME TO production (which were originally for
 RenameStmt); since commit d86d51a95 had already added some
 AlterTableSpaceOptionsStmt nodes to the possible results, maybe this
 wasn't so bad in itself; but still it seems quite unlike the way we
 organize our parse productions.

Ick.  That's just plain sloppy.  Please create a separate production,
*and* a separate comment header.

Commit d86d51a95 was pretty damn awful in this regard as well, but
let's clean them both up, not make it worse.

Existing precedent would suggest inventing two new productions named the
same as the parse node types they produce, viz AlterTableSpaceMoveStmt
and AlterTableSpaceOptionsStmt.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Firing trigger if only

2014-04-03 Thread Gabriel

Good afternoon all.I have some problem with triggers on PostgreSQL 8.4.I have
a trigger on specific table(articles) that fires on update statement:

CREATE OR REPLACE FUNCTION trigger_articles_update()
  RETURNS trigger AS
$BODY$BEGIN 
  INSERT INTO
article_change(article_id,change_type)VALUES(OLD.article_id,2);
   RETURN NULL; 
END$BODY$
  LANGUAGE 'plpgsql' VOLATILE
  COST 100;
ALTER FUNCTION trigger_articles_update() OWNER TO postgres;

I have 2 different applications that performs update on table
articles(written in Delphi and using ZeosDB). My problem is that I want
trigger to fire only when performing update with first application, but not
with second.I know that triggers supposed to fire on every change on table,
but this is a specific problem that I have.Any hint appreciated. 8)



--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Firing-trigger-if-only-tp5798484.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] PostgreSQL Columnar Store for Analytic Workloads

2014-04-03 Thread Hadi Moshayedi

Dear Hackers,

We at Citus Data have been developing a columnar store extension for
PostgreSQL. Today we are excited to open source it under the Apache v2.0
license.

This columnar store extension uses the Optimized Row Columnar (ORC) format
for its data layout, which improves upon the RCFile format developed at
Facebook, and brings the following benefits:

* Compression: Reduces in-memory and on-disk data size by 2-4x. Can be
extended to support different codecs. We used the functions in
pg_lzcompress.h for compression and decompression.
* Column projections: Only reads column data relevant to the query.
Improves performance for I/O bound queries.
* Skip indexes: Stores min/max statistics for row groups, and uses them to
skip over unrelated rows.

We used the PostgreSQL FDW APIs to make this work. The extension doesn't
implement the writable FDW API, but it uses the process utility hook to
enable COPY command for the columnar tables.

This extension uses PostgreSQL's internal data type representation to store
data in the table, so this columnar store should support all data types
that PostgreSQL supports.

We tried the extension on TPC-H benchmark with 4GB scale factor on a
m1.xlarge Amazon EC2 instance, and the query performance improved by 2x-3x
compared to regular PostgreSQL table. Note that we flushed the page cache
before each test to see the impact on disk I/O.

When data is cached in memory, the performance of cstore_fdw tables were
close to the performance of regular PostgreSQL tables.

For more information, please visit:
 * our blog post:
http://citusdata.com/blog/76-postgresql-columnar-store-for-analytics
 * our github page: https://github.com/citusdata/cstore_fdw

Feedback from you is really appreciated.

Thanks,
  -- Hadi

Re: [HACKERS] Fwd: SSL auth question

2014-04-03 Thread Воронин Дмитрий

Thank you for answer!
I know it. So, my second questions is:
How can I add support of this extension in PostgreSQL. So, I want to do thing, 
that PostgreSQL accept connection with cert auth method and certificate has my 
extension with critical flag?

03.04.2014, 04:33, Wim Lewis w...@omnigroup.com:
 On 1 Apr 2014, at 11:38 PM, carriingfat...@ya.ru wrote:

  I set certificate auth on postgresql 9.3. I generate SSL certificate with 
 my custom extension. So, OpenSSL read it, PostgreSQL accept it if this 
 extension is not critical, but if I set this extension critical, PostgreSQL 
 deny connection.

 I think that is the correct behavior. The critical bit tells PostgreSQL (or 
 other software) what to do if it does not understand the extension: if 
 there's an unknown extension with the critical bit set, then the certificate 
 can't be validated. If the critical bit is not set, then the unknown 
 extension is ignored, and the certificate is processed as if the extension 
 weren't there.

 See this section of RFC 5280:
   http://tools.ietf.org/html/rfc5280#section-4.2

 The idea is that you can set the critical bit for extensions that are 
 supposed *restrict* the usability of the certificate, so that the certificate 
 won't be used in undesired ways by software that doesn't understand the 
 extension.


Best regards, Dmitry Voronin



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL format and API changes (9.5)

2014-04-03 Thread Robert Haas

On Thu, Apr 3, 2014 at 10:14 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 (Instead of using a new XLogRegisterBuffer() function to register the
 buffers, perhaps they should be passed to XLogInsert as a separate list or
 array. I'm not wedded on the details...)

That would have the advantage of avoiding the function-call overhead,
which seems appealing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Fwd: SSL auth question

=?koi8-r?B?98/Sz87JziDkzcnU0snK?= carriingfat...@yandex.ru writes:
 I know it. So, my second questions is:
 How can I add support of this extension in PostgreSQL. So, I want to do 
 thing, that PostgreSQL accept connection with cert auth method and 
 certificate has my extension with critical flag?

Seems like this is a question you should direct to OpenSSL people, not us.
Postgres itself knows nothing to speak of about SSL certificates; it just
delegates all that processing to openssl.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL format and API changes (9.5)

Robert Haas robertmh...@gmail.com writes:
 On Thu, Apr 3, 2014 at 10:14 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 (Instead of using a new XLogRegisterBuffer() function to register the
 buffers, perhaps they should be passed to XLogInsert as a separate list or
 array. I'm not wedded on the details...)

 That would have the advantage of avoiding the function-call overhead,
 which seems appealing.

You're kidding, right?  One function call per buffer touched by an update
is going to be lost in the noise.

A somewhat more relevant concern is where are we going to keep the state
data involved in all this.  Since this code is, by definition, going to be
called in critical sections, any solution involving internal pallocs is
right out.  The existing arrangement keeps all its data in local variables
of the function doing the update, which is probably a nice property to
preserve if we can manage it.  That might force us to do it as above.

I'd still like something that *looks* more like a list of function calls,
though, even if they're macros under the hood.  The existing approach to
setting up the XLogRecData chains is awfully prone to bugs of omission.
There are a few places that went so far as to set up custom macros to help
with that.  I'm not a fan of doing the same thing in randomly different
ways in different places; but if we did it that way uniformly, it might be
better/safer.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] get_fn_expr_variadic considered harmful

I wrote:
 Robert Haas robertmh...@gmail.com writes:
 Well, that argues for the choice of trying to make them equivalent
 again, I suppose, but it sounds like there are some nasty edge cases
 that won't easily be filed down.  I think your idea of redefining
 funcvariadic to be true only for VARIADIC ANY is probably a promising
 approach to that solution, but as you say it leaves some problems
 unsolved.

 I think what we'll have to do is tell complainants to recreate any
 affected indexes or rules after installing 9.3.5.  Given the relatively
 small number of complaints, I don't think it's worth working harder,
 nor taking risks like inserting catalog lookups into readfuncs.c.

After some thought, it seems to me that the best solution is to redefine
funcvariadic as meaning the last actual argument is a variadic array,
which means it will always be true for a VARIADIC non-ANY function.
In HEAD, that leads to a nice patch that actually simplifies the logic
in ruleutils.c, as attached.  In 9.3, we will not be able to assume that 
funcvariadic has that meaning, so we'll have to use the existing
decompilation logic (which basically ignores funcvariadic unless it's a
VARIADIC ANY function).  I've not made that version of the patch yet but
it should be pretty straightforward.

regards, tom lane

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 32c0135..e54d46d 100644
*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
*** deparseFuncExpr(FuncExpr *node, deparse_
*** 1548,1562 
  	procform = (Form_pg_proc) GETSTRUCT(proctup);
  
  	/* Check if need to print VARIADIC (cf. ruleutils.c) */
! 	if (OidIsValid(procform-provariadic))
! 	{
! 		if (procform-provariadic != ANYOID)
! 			use_variadic = true;
! 		else
! 			use_variadic = node-funcvariadic;
! 	}
! 	else
! 		use_variadic = false;
  
  	/* Print schema name only if it's not pg_catalog */
  	if (procform-pronamespace != PG_CATALOG_NAMESPACE)
--- 1548,1554 
  	procform = (Form_pg_proc) GETSTRUCT(proctup);
  
  	/* Check if need to print VARIADIC (cf. ruleutils.c) */
! 	use_variadic = node-funcvariadic;
  
  	/* Print schema name only if it's not pg_catalog */
  	if (procform-pronamespace != PG_CATALOG_NAMESPACE)
diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml
index 2b4ade0..941b101 100644
*** a/doc/src/sgml/xfunc.sgml
--- b/doc/src/sgml/xfunc.sgml
*** CREATE OR REPLACE FUNCTION retcomposite(
*** 3152,3160 
   is zero based.  functionget_call_result_type/ can also be used
   as an alternative to functionget_fn_expr_rettype/.
   There is also functionget_fn_expr_variadic/, which can be used to
!  find out whether the call contained an explicit literalVARIADIC/
!  keyword.  This is primarily useful for literalVARIADIC any/
!  functions, as described below.
  /para
  
  para
--- 3152,3161 
   is zero based.  functionget_call_result_type/ can also be used
   as an alternative to functionget_fn_expr_rettype/.
   There is also functionget_fn_expr_variadic/, which can be used to
!  find out whether variadic arguments have been merged into an array.
!  This is primarily useful for literalVARIADIC any/ functions,
!  since such merging will always have occurred for variadic functions
!  taking ordinary array types.
  /para
  
  para
diff --git a/src/backend/parser/parse_func.c b/src/backend/parser/parse_func.c
index 5934ab0..cc46084 100644
*** a/src/backend/parser/parse_func.c
--- b/src/backend/parser/parse_func.c
*** ParseFuncOrColumn(ParseState *pstate, Li
*** 556,561 
--- 556,572 
  	make_fn_arguments(pstate, fargs, actual_arg_types, declared_arg_types);
  
  	/*
+ 	 * If the function isn't actually variadic, forget any VARIADIC decoration
+ 	 * on the call.  (Perhaps we should throw an error instead, but
+ 	 * historically we've allowed people to write that.)
+ 	 */
+ 	if (!OidIsValid(vatype))
+ 	{
+ 		Assert(nvargs == 0);
+ 		func_variadic = false;
+ 	}
+ 
+ 	/*
  	 * If it's a variadic function call, transform the last nvargs arguments
  	 * into an array --- unless it's an any variadic.
  	 */
*** ParseFuncOrColumn(ParseState *pstate, Li
*** 584,589 
--- 595,605 
  		newa-location = exprLocation((Node *) vargs);
  
  		fargs = lappend(fargs, newa);
+ 
+ 		/* We could not have had VARIADIC marking before ... */
+ 		Assert(!func_variadic);
+ 		/* ... but now, it's a VARIADIC call */
+ 		func_variadic = true;
  	}
  
  	/*
diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c
index 566b4c9..b1bac86 100644
*** a/src/backend/utils/adt/ruleutils.c
--- b/src/backend/utils/adt/ruleutils.c
*** static char *get_relation_name(Oid relid
*** 401,407 
  static char *generate_relation_name(Oid relid, List *namespaces);
  static char *generate_function_name(Oid funcid, int nargs,
  	   List

Re: [HACKERS] WAL format and API changes (9.5)

On 2014-04-03 19:08:27 -0400, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  On Thu, Apr 3, 2014 at 10:14 AM, Heikki Linnakangas
  hlinnakan...@vmware.com wrote:
  (Instead of using a new XLogRegisterBuffer() function to register the
  buffers, perhaps they should be passed to XLogInsert as a separate list or
  array. I'm not wedded on the details...)
 
  That would have the advantage of avoiding the function-call overhead,
  which seems appealing.
 
 You're kidding, right?  One function call per buffer touched by an update
 is going to be lost in the noise.
 
 A somewhat more relevant concern is where are we going to keep the state
 data involved in all this.  Since this code is, by definition, going to be
 called in critical sections, any solution involving internal pallocs is
 right out.

We actually already allocate memory in XLogInsert() :(, although only in
the first XLogInsert() a backend does... I didn't remember it, but I
complained about it before:
http://archives.postgresql.org/message-id/4FEB223A.3060707%40enterprisedb.com

I didn't realize the implications for it back then and thus didn't press
hard for a change, but I think it should be fixed.

  The existing arrangement keeps all its data in local variables
 of the function doing the update, which is probably a nice property to
 preserve if we can manage it.  That might force us to do it as above.

We could simply allocate enough scratch space for the state at process
startup. I don't think there'll ever be a need for XLogInsert() to be
reentrant, so that should be fine.

 I'd still like something that *looks* more like a list of function calls,
 though, even if they're macros under the hood.  The existing approach to
 setting up the XLogRecData chains is awfully prone to bugs of
 omission.

Agreed. There's some pretty ugly code around this.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL format and API changes (9.5)

Andres Freund and...@2ndquadrant.com writes:
 On 2014-04-03 19:08:27 -0400, Tom Lane wrote:
 A somewhat more relevant concern is where are we going to keep the state
 data involved in all this.  Since this code is, by definition, going to be
 called in critical sections, any solution involving internal pallocs is
 right out.

 We actually already allocate memory in XLogInsert() :(, although only in
 the first XLogInsert() a backend does...

Ouch.  I wonder if we should put an Assert(not-in-critical-section)
into MemoryContextAlloc.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL format and API changes (9.5)

On 2014-04-03 19:33:12 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-04-03 19:08:27 -0400, Tom Lane wrote:
  A somewhat more relevant concern is where are we going to keep the state
  data involved in all this.  Since this code is, by definition, going to be
  called in critical sections, any solution involving internal pallocs is
  right out.
 
  We actually already allocate memory in XLogInsert() :(, although only in
  the first XLogInsert() a backend does...
 
 Ouch.  I wonder if we should put an Assert(not-in-critical-section)
 into MemoryContextAlloc.

XLogInsert() is using malloc() directly, so that wouldn't detect this
case...

It's not a bad idea tho. I wonder how far the regression tests
get...

Not even through initdb. GetVirtualXIDsDelayingChkpt() is to blame.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Securing make check (CVE-2014-0067)

2014-04-03 Thread YAMAMOTO Takashi

 Thanks.  To avoid socket path length limitations, I lean toward placing the
 socket temporary directory under /tmp rather than placing under the CWD:
 
 http://www.postgresql.org/message-id/flat/20121129223632.ga15...@tornado.leadboat.com

openvswitch has some tricks to overcome the socket path length
limitation using symlink.  (or procfs where available)
iirc these were introduced for debian builds which use deep CWD.

http://git.openvswitch.org/cgi-bin/gitweb.cgi?p=openvswitch;a=blob;f=lib/socket-
util.c;h=aa0c7196da9926de38b7388b8e28ead12e12913e;hb=HEAD

look at shorten_name_via_symlink and shorten_name_via_proc.

YAMAMOTO Takashi


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

On Thu, Apr 3, 2014 at 3:19 PM, Thom Brown t...@linux.com wrote:
 Master: 38.450082 / 37.440701 (avg: 37.9453915)
 Patch: 153.321946 / 145.004726 (avg: 149.163336)

I think that those are objectively very large reductions in a cost
that figures prominently in most workloads. Based solely on those
facts, but also on the fairly low complexity of the patch, it may be
worth considering committing this before 9.4 goes into feature freeze,
purely as something that just adds a SortSupport function for the
default text opclass, with more or less the same cases accelerated as
with the existing SortSupport-supplying-opclasses. There has been only
very modest expansions to the SortSupport and tuplesort code. I
haven't generalized this to work with other areas where a normalized
key could be put to good use, but I see no reason to block on that.

Obviously I've missed the final commitfest deadline by a wide berth. I
don't suggest this lightly, and it in no small part has a lot to do
with the patch being simple and easy to reason about. The patch could
almost (but not quite) be written as part of a third party text
operator class's sort support routine. I think that if an individual
committer were willing to commit this at their own discretion before
feature freeze, outside of the formal commitfest process, that would
not be an unreasonable thing in these particular, somewhat unusual
circumstances.

I defer entirely to the judgement of others here - this is not an
issue that I feel justified in expressing a strong opinion on, and in
fact I don't have such an opinion anyway. However, by actually looking
at the risks and the benefits here, I think everyone will at least
understand why I'd feel justified in broaching the topic. This is a
very simple idea, and a rather old one at that.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)

Peter Geoghegan p...@heroku.com writes:
 I think that those are objectively very large reductions in a cost
 that figures prominently in most workloads. Based solely on those
 facts, but also on the fairly low complexity of the patch, it may be
 worth considering committing this before 9.4 goes into feature freeze,

Personally, I have paid no attention to this thread and have no intention
of doing so before feature freeze.  There are three dozen patches at
https://commitfest.postgresql.org/action/commitfest_view?id=21
that have moral priority for consideration for 9.4.  Not all of them are
going to get in, certainly, and I'm already feeling a lot of guilt about
the small amount of time I've been able to devote to reviewing/committing
patches this cycle.  Spending time now on patches that didn't even exist
at the submission deadline feels quite unfair to me.

Perhaps I shouldn't lay my own guilt trip on other committers --- but
I think it would be a bad precedent to not deal with the existing patch
queue first.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Securing make check (CVE-2014-0067)

2014-04-03 Thread Noah Misch

On Fri, Apr 04, 2014 at 02:36:05AM +, YAMAMOTO Takashi wrote:
  Thanks.  To avoid socket path length limitations, I lean toward placing the
  socket temporary directory under /tmp rather than placing under the CWD:
  
  http://www.postgresql.org/message-id/flat/20121129223632.ga15...@tornado.leadboat.com
 
 openvswitch has some tricks to overcome the socket path length
 limitation using symlink.  (or procfs where available)
 iirc these were introduced for debian builds which use deep CWD.

That's another reasonable approach.  Does it have a notable advantage over
placing the socket in a subdirectory of /tmp?  Offhand, the security and
compatibility consequences look similar.

-- 
Noah Misch
EnterpriseDB http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] A question about code in DefineRelation()

2014-04-03 Thread Etsuro Fujita

If I understand correctly, foreign tables cannot have an OID column, but
the following code in DefineRelation() assumes that foreign tables *can*
have that coulum:

560 /*
561  * Create a tuple descriptor from the relation schema.  Note
that this
562  * deals with column names, types, and NOT NULL constraints, but not
563  * default values or CHECK constraints; we handle those below.
564  */
565 descriptor = BuildDescForRelation(schema);
566
567 localHasOids = interpretOidsOption(stmt-options,
568(relkind == RELKIND_RELATION ||
569 relkind ==
RELKIND_FOREIGN_TABLE));

Is this intended to support an OID column on foreign tables in future?

Thanks,

Best regards,
Etsuro Fujita


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] four minor proposals for 9.5

2014-04-03 Thread Amit Kapila

On Tue, Apr 1, 2014 at 11:42 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 2014-03-27 17:56 GMT+01:00 Pavel Stehule pavel.steh...@gmail.com:
 So I'll prepare a some prototypes in next month for

 1. log a execution time for cancelled queries,
 2. track a query lock time


 When I though about this proposal, I found so our implementation is
 plus/minus hack, that can works well in GoodData, but can be inconsistent
 with native postgresql. So in this proposal I'll plan some different than we
 use, but I hope so it is more consistent with upstream.

 So I miss a execution time for cancelled queries. Same time can be
 interesting for some queries that was from any reasons - temp file limits
 can stop queries after 5 minutes, some out of memory etc ..

 It is not hard to implement printing duration for cancelled queries, but is
 impossible do it for any kind of exception. But there is way. We can use a
 log line prefix space. Now, there are not a possibility to print duration
 - so we can introduce a new symbol %D as duration.

This means for any exception/error it will print duration if %D is used, not
only for cancelled queries or you planing just for some specific logs like
when query is cancelled.
One more thing I think currently also we can find that by crude way
(pg_stat_activity has query_start time and log_line_prefix has end time),
but I think the having in one place 'log' will be more useful.


 Same technique I would to
 use for printing lock time - it can be printing instead symbol %L.

Above you have mentioned that you are planing to have three different
lock times (Table, Tuple and others), so will this one symbol (%L) enable
all three lock times?
Are you planing to add new logs for logging this or planing to use existing
infrastructure?

One general point is that won't it be bit difficult to parse the log line having
so many times, should we consider to log with some special marker for
each time (for example: tup_lock_wait_time:1000ms).


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] B-Tree support function number 3 (strxfrm() optimization)