[HACKERS] Concurrently option for reindexdb

2014-08-25 Thread Sawada Masahiko
Hi all,

Attached WIP patch adds a -C (--concurrently) option to the reindexdb
command, for reindexing concurrently.
If we specify the -C option with any table, then reindexdb does the
reindexing concurrently, with the minimum lock necessary.
Note that we cannot use the '-s' option (for system catalogs) and the
'-C' option at the same time.
This patch uses a simple method, as follows.

1. CREATE INDEX CONCURRENTLY a new index with the same definition
as the target index
2. Acquire an ACCESS EXCLUSIVE lock on the target table (a transaction starts)
3. Swap the old and new indexes
4. Drop the old index
5. COMMIT

This process is based on what pg_repack (or pg_reorg) does, done via SQL.
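
In plain SQL the swap amounts to something like the following (a minimal
sketch of one way to do it; the table, column, and index names are just
examples, pg_repack itself swaps catalog entries rather than renaming,
and the rename approach assumes nothing depends on the old index's OID):

CREATE INDEX CONCURRENTLY t_idx_new ON t (col);  -- same definition as t_idx
BEGIN;
LOCK TABLE t IN ACCESS EXCLUSIVE MODE;
ALTER INDEX t_idx RENAME TO t_idx_old;
ALTER INDEX t_idx_new RENAME TO t_idx;
DROP INDEX t_idx_old;
COMMIT;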

ToDo
- Multi-language support for the log messages.

Regards,

---
Sawada Masahiko


reindexdb-concurrently_v1.patch
Description: Binary data



Re: [HACKERS] Concurrently option for reindexdb

2014-08-25 Thread Michael Paquier
On Mon, Aug 25, 2014 at 3:36 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 Attached WIP patch adds a -C (--concurrently) option to the reindexdb
 command, for reindexing concurrently.
 If we specify the -C option with any table, then reindexdb does the
 reindexing concurrently, with the minimum lock necessary.
 Note that we cannot use the '-s' option (for system catalogs) and the
 '-C' option at the same time.
 This patch uses a simple method, as follows.

 1. CREATE INDEX CONCURRENTLY a new index with the same definition
 as the target index
 2. Acquire an ACCESS EXCLUSIVE lock on the target table (a transaction starts)
 3. Swap the old and new indexes
 4. Drop the old index
 5. COMMIT

 This process is based on what pg_repack (or pg_reorg) does, done via SQL.

This would be useful for users, but I am not sure that you can call
that --concurrently, as the rename/swap phase requires an exclusive
lock, and you would actually block a real implementation of REINDEX
CONCURRENTLY (hm...).

 ToDo
 - Multi-language support for the log messages.
Why? I am not sure that's something you should deal with.
-- 
Michael




Re: [HACKERS] LIMIT for UPDATE and DELETE

2014-08-25 Thread Etsuro Fujita

Hi Rukh,

(2014/08/15 6:18), Rukh Meski wrote:

Based on the feedback on my previous patch, I've separated only the
LIMIT part into its own feature.  This version plays nicely with
inheritance.  The intended use is splitting up big UPDATEs and DELETEs
into batches more easily and efficiently.


Before looking into the patch, I'd like to know the use cases in more 
detail.


Thanks,

Best regards,
Etsuro Fujita




[HACKERS] minor config doc update

2014-08-25 Thread Fabien COELHO


Find a small documentation patch attached:

 - show the valid range for checkpoint_timeout
 - remove one spurious empty line (compared to other descriptions)

--
Fabien.

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index f23e5dc..49547ee 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2274,8 +2274,9 @@ include_dir 'conf.d'
       </term>
       <listitem>
        <para>
-        Maximum time between automatic WAL checkpoints, in
-        seconds. The default is five minutes (<literal>5min</literal>).
+        Maximum time between automatic WAL checkpoints, in seconds.
+        The valid range is between 30 seconds and one hour.
+        The default is five minutes (<literal>5min</literal>).
         Increasing this parameter can increase the amount of time needed
         for crash recovery.
         This parameter can only be set in the <filename>postgresql.conf</filename>
@@ -2294,7 +2295,6 @@ include_dir 'conf.d'
        <para>
         Specifies the target of checkpoint completion, as a fraction of
         total time between checkpoints. The default is 0.5.
-
         This parameter can only be set in the <filename>postgresql.conf</filename>
         file or on the server command line.
        </para>



Re: [HACKERS] Concurrently option for reindexdb

2014-08-25 Thread Sawada Masahiko
On Mon, Aug 25, 2014 at 3:48 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Mon, Aug 25, 2014 at 3:36 PM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 Attached WIP patch adds a -C (--concurrently) option to the reindexdb
 command, for reindexing concurrently.
 If we specify the -C option with any table, then reindexdb does the
 reindexing concurrently, with the minimum lock necessary.
 Note that we cannot use the '-s' option (for system catalogs) and the
 '-C' option at the same time.
 This patch uses a simple method, as follows.

 1. CREATE INDEX CONCURRENTLY a new index with the same definition
 as the target index
 2. Acquire an ACCESS EXCLUSIVE lock on the target table (a transaction starts)
 3. Swap the old and new indexes
 4. Drop the old index
 5. COMMIT

 This process is based on what pg_repack (or pg_reorg) does, done via SQL.

 This would be useful for users, but I am not sure that you can call
 that --concurrently, as the rename/swap phase requires an exclusive
 lock, and you would actually block a real implementation of REINDEX
 CONCURRENTLY (hm...).


Right, it might be difficult to call this --concurrently.
The option might need to be renamed.

 ToDo
 - Multi-language support for the log messages.
 Why? I am not sure that's something you should deal with.

The log messages that already exist are translated via the .po files,
but the newly added messages have no corresponding entries in the .po
files, I thought.

Regards,

---
Sawada Masahiko




Re: [HACKERS] Hardening pg_upgrade

2014-08-25 Thread Bernd Helmle



--On 21. August 2014 22:08:58 +0200 Magnus Hagander mag...@hagander.net 
wrote:



I vote for discarding 8.3 support in pg_upgrade.  There are already
enough limitations on pg_upgrade from pre-8.4 to make it of questionable
value; if it's going to create problems like this, it's time to cut the
rope.


+1. 8.3 has been unsupported for a fairly long time now, and you can
still do a two-step upgrade if you're on that old a version.


Also +1 from my side. I've seen some old 8.3 installations at customers, 
still, but they aren't large and can easily be upgraded with a two-step 
upgrade.


--
Thanks

Bernd




Re: [HACKERS] Extended Prefetching using Asynchronous IO - proposal and patch

2014-08-25 Thread Heikki Linnakangas

On 08/25/2014 12:49 AM, johnlumby wrote:

On 08/19/14 18:27, Heikki Linnakangas wrote:

Please write the patch without atomic CAS operation. Just use a spinlock.


Umm,   this is a new criticism I think.


Yeah. Be prepared that new issues will crop up as the patch gets slimmer 
and easier to review :-). Right now there's still so much chaff that 
it's difficult to see the wheat.



  I use CAS for things other
than locking,
such as add/remove from shared queue.   I suppose maybe a spinlock on
the entire queue
can be used equivalently,  but with more code (extra confusion) and
worse performance
(coarser serialization).  What is your objection to using gcc's
atomic ops?   Portability?


Yeah, portability.

Atomic ops might make sense, but at this stage it's important to slim 
down the patch to the bare minimum, so that it's easier to review. Once 
the initial patch is in, you can improve it with additional patches.



There's a patch in the commitfest to add support for that,


sorry, support for what? There are already spinlocks in postgresql;
you mean some new kind? Please point me at it with a hackers msgid or
something.


Atomic ops: https://commitfest.postgresql.org/action/patch_view?id=1314

Once that's committed, you can use the new atomic ops in your patch. But 
until then, stick to spinlocks.



On 08/19/14 19:10, Claudio Freire wrote:

On Tue, Aug 19, 2014 at 7:27 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

Also, please split prefetching of regular index scans into a separate patch. ...

That patch already happened on the list, and it wasn't a win in many
cases. I'm not sure it should be proposed independently of this one.
Maybe a separate patch, but it should be considered dependent on this.

I don't have an archive link at hand atm, but I could produce one later.

Several people have asked to split this patch into several smaller ones
and I have thought about it. It would introduce some awkward dependencies.
E.g. splitting the scanner code (index, relation-heap) into a separate
patch from the aio code would mean that the scanner patch becomes
dependent on the aio patch. They are not quite orthogonal.


Right now, please focus on the main AIO patch. That should show a 
benefit for bitmap heap scans too, so to demonstrate the benefits of 
AIO, you don't need to prefetch regular index scans. The main AIO patch 
can be written, performance tested, and reviewed without caring about 
regular index scans at all.



The reason is that the scanners call a new function, DiscardBuffer(blockid)
when aio is in use. We can get around it by providing a stub for
that function
in the scanner patch,   but then there is some risk of someone getting the
wrong version of that function in their build. It just adds yet more
complexity
and breakage opportunities.


Regardless of the regular index scans, we'll need to discuss the new API 
of PrefetchBuffer and DiscardBuffer.


It would be simpler for the callers if you didn't require the 
DiscardBuffer calls. I think it would be totally feasible to write the 
patch that way. Just drop the buffer pin after the asynchronous read 
finishes. When the caller actually needs the page, it will call 
ReadBuffer which will pin it again. You'll get a little bit more bufmgr 
traffic that way, but I think it'll be fine in practice.



One further comment concerning these BufferAiocb and aiocb control blocks
being in shared memory:

I explained above that the BufferAiocb must be in shared memory for
wait/post.
The aiocb does not necessarily have to be in shared memory,
but since there is a one-to-one relationship between BufferAiocb and aiocb,
it makes the code much simpler, and the operation much more efficient,
if the aiocb is embedded in the BufferAiocb as I have it now.
E.g. if the aiocb is in private-process memory, then an additional
allocation scheme is needed (fixed number? palloc()'ing extra ones as
needed? ...) which adds complexity,


Yep, use palloc or a fixed pool. There's nothing wrong with that.


 and probably some inefficiency, since a shared pool is usually
more efficient (allows a higher maximum per process etc.), and there
would have to be some pointer de-referencing from BufferAiocb to aiocb,
adding some (small) overhead.


I think you're falling into the domain of premature optimization. A few 
pointer dereferences are totally insignificant, and the amount of memory 
you're saving pales in comparison to other similar non-shared pools and 
caches we have (catalog caches, for example). And on the other side of 
the coin, with a shared pool you'll waste memory when async I/O is not 
used (e.g because everything fits in RAM), and when it is used, you'll 
have more contention on locks and cache lines when multiple processes 
use the same objects.


The general guideline in PostgreSQL is that everything is 
backend-private, except structures used to communicate between backends.


- Heikki




Re: [HACKERS] implement subject alternative names support for SSL connections

2014-08-25 Thread Heikki Linnakangas

On 08/24/2014 03:11 PM, Alexey Klyukin wrote:

On Wed, Aug 20, 2014 at 11:53 AM, Heikki Linnakangas 
hlinnakan...@vmware.com wrote:


On 07/25/2014 07:10 PM, Alexey Klyukin wrote:


Greetings,

I'd like to propose a patch for checking subject alternative names entry
in
the SSL certificate for DNS names during SSL authentication.



Thanks! I just ran into this missing feature last week, while working on
my SSL test suite. So +1 for having the feature.

This patch needs to be rebased over current master branch, thanks to my
refactoring that moved all OpenSSL-specific stuff to be-secure-openssl.c.



The patch is rebased against fe-secure-openssl.c (that's where
verify_peer_name_matches_certificate appeared in the master branch);
I've changed the condition in the for loop to be less confusing (thanks
to comments from Magnus and Tom), making an explicit break once a match
is detected.


The patch doesn't seem to support wildcards in alternative names. Is 
that on purpose?


It would be good to add a little helper function that does the 
NULL-check, straight comparison, and wildcard check, for a single name. 
And then use that for the Common Name and all the Alternatives. That'll 
ensure that all the same rules apply whether the name is the Common Name 
or an Alternative (assuming that the rules are supposed to be the same; 
I don't know if that's true).


But actually, I wonder if we should delegate the whole hostname matching 
to OpenSSL? There's a function called X509_check_host for that, although 
it's new in OpenSSL 1.1.0 so we'd need to add a configure test for that 
and keep the current code to handle older versions.


- Heikki





Re: [HACKERS] implement subject alternative names support for SSL connections

2014-08-25 Thread Andres Freund
On 2014-08-25 13:02:50 +0300, Heikki Linnakangas wrote:
 But actually, I wonder if we should delegate the whole hostname matching to
 OpenSSL? There's a function called X509_check_host for that, although it's
 new in OpenSSL 1.1.0 so we'd need to add a configure test for that and keep
 the current code to handle older versions.

Given that we're about to add support for other SSL implementations I'm
not sure that that's a good idea. IIRC there exist quite a few
different interpretations of what denotes a valid cert between the
libraries. Doesn't sound fun to me.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] implement subject alternative names support for SSL connections

2014-08-25 Thread Heikki Linnakangas

On 08/25/2014 01:07 PM, Andres Freund wrote:

On 2014-08-25 13:02:50 +0300, Heikki Linnakangas wrote:

But actually, I wonder if we should delegate the whole hostname matching to
OpenSSL? There's a function called X509_check_host for that, although it's
new in OpenSSL 1.1.0 so we'd need to add a configure test for that and keep
the current code to handle older versions.


Given that we're about to add support for other SSL implementations I'm
not sure that that's a good idea. IIRC there exist quite a few
different interpretations of what denotes a valid cert between the
libraries.


Really? That sounds scary. I can imagine that some libraries support 
more complicated stuff like Internationalized Domain Names, while others 
don't, but as long as they all behave the same with the basic stuff, I 
think that's acceptable.



Doesn't sound fun to me.


As far as just this patch is concerned, I agree it's easier to just 
implement it ourselves, but if we want to start implementing more 
complicated rules, then I'd rather not get into that business at all, 
and let the SSL library vendor deal with the bugs and CVEs.


I guess we'll go ahead with this patch for now, but keep this in mind if 
someone wants to complicate the rules further in the future.


- Heikki





[HACKERS] Switch pg_basebackup to use -X stream instead of -X fetch by default?

2014-08-25 Thread Andres Freund
Hi,

currently pg_basebackup uses fetch mode when only -x is specified -
which imo isn't a very good thing to use due to the increased risk of
not fetching everything.
How about switching to stream mode for 9.5+?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Escaping from blocked send() reprised.

2014-08-25 Thread Heikki Linnakangas

On 07/01/2014 06:26 AM, Kyotaro HORIGUCHI wrote:

At Mon, 30 Jun 2014 11:27:47 -0400, Robert Haas robertmh...@gmail.com wrote in 
CA+TgmoZfcGzAEmtbyoCe6VdHnq085x+ox752zuJ2AKN=wc8...@mail.gmail.com

1. I think it's the case that there are platforms around where a
signal won't cause send() to return EINTR and I'd be entirely
unsurprised if SSL_write() doesn't necessarily return EINTR in that
case.  I'm not sure what, if anything, we can do about that.


We use a custom write routine with SSL_write, where we call send() 
ourselves, so that's not a problem as long as we put the check in the 
right place (in secure_raw_write(), after my recent SSL refactoring - 
the patch needs to be rebased).



man 2 send on FreeBSD has no description of EINTR. And even
on Linux, send() won't return EINTR in most cases; at least I
haven't seen it. So send() = -1 with EINTR seems to me to be only an
equivalent of send() = 0. I have no idea what difference the
implementer had in mind.


As the patch stands, there's a race condition: if the SIGTERM arrives 
*before* the send() call, the send() won't return EINTR anyway. So 
there's a chance that you still block. Calling pq_terminate_backend() 
again will dislodge it (assuming send() returns with EINTR on signal), 
but I don't think we want to define the behavior as "usually, 
pq_terminate_backend() will kill a backend that's blocked on sending to 
the client, but sometimes you have to call it twice (or more!) to really 
kill it."


A more robust way is to set ImmediateInterruptOK before calling send(). 
That wouldn't let you send data that can be sent without blocking 
though. For that, you could put the socket to non-blocking mode, and 
sleep with select(), also waiting for the process' latch at the same 
time (die() sets the latch, so that will wake up the select() if a 
termination request arrives).


Is it actually safe to process the die-interrupt where send() is called? 
ProcessInterrupts() does ereport(FATAL, ...), which will attempt to 
send a message to the client. If that happens in the middle of 
constructing some other message, that will violate the protocol.



2. I think it would be reasonable to try to kill off the connection
without notifying the client if we're unable to send the data to the
client in a reasonable period of time.  But I'm unsure what a
reasonable period of time means.  This patch would basically do it
after no delay at all, which seems like it might be too aggressive.
However, I'm not sure.


I think there's no such reasonable time.


I agree it's pretty hard to define any reasonable timeout here. I think 
it would be fine to just cut the connection; even if you don't block 
while sending, you'll probably reach a CHECK_FOR_INTERRUPT() somewhere 
higher in the stack and kill the connection almost as abruptly anyway. 
(you can't violate the protocol, however)


- Heikki





Re: [HACKERS] LIMIT for UPDATE and DELETE

2014-08-25 Thread Amit Kapila
On Mon, Aug 25, 2014 at 12:18 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp
wrote:
 (2014/08/15 6:18), Rukh Meski wrote:

 Based on the feedback on my previous patch, I've separated only the
 LIMIT part into its own feature.  This version plays nicely with
 inheritance.  The intended use is splitting up big UPDATEs and DELETEs
 into batches more easily and efficiently.


 Before looking into the patch, I'd like to know the use cases in more
 detail.

You can check the previous commitfest thread [1] for this feature;
in it you can find some use cases and the difficulties in implementing
this feature, which might aid you in reviewing it.

[1]
https://commitfest.postgresql.org/action/patch_view?id=1412


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2014-08-25 Thread Albe Laurenz
Etsuro Fujita wrote:
 Done.  (I've left deparseDirectUpdateSql/deparseDirectDeleteSql as-is,
 though.)
 
 Other changes:
 
 * Address the comments from Eitoku-san.
 * Add regression tests.
 * Fix a bug, which fails to show the actual row counts in EXPLAIN
 ANALYZE for UPDATE/DELETE without a RETURNING clause.
 * Rebase to HEAD.
 
 Please find attached an updated version of the patch.

Here is my review:

The patch Applies fine, Builds without warning and passes make Check,
so the ABC of patch reviewing is fine.

I played with it, and apart from Hanada's comments I have found the following:

test=> EXPLAIN (ANALYZE, VERBOSE) UPDATE rtest SET val=NULL WHERE id < 3;
                                QUERY PLAN
--------------------------------------------------------------------------
 Update on laurenz.rtest  (cost=100.00..14134.40 rows=299970 width=10) (actual 
time=0.005..0.005 rows=0 loops=1)
   ->  Foreign Scan on laurenz.rtest  (cost=100.00..14134.40 rows=299970 
width=10) (actual time=0.002..0.002 rows=27 loops=1)
         Output: id, val, ctid
         Remote SQL: UPDATE laurenz.test SET val = NULL::text WHERE ((id < 3))
 Planning time: 0.179 ms
 Execution time: 3706.919 ms
(6 rows)

Time: 3708.272 ms

The actual time readings are surprising.
Shouldn't these be similar to the actual execution time, since most of
the time is spent in the foreign scan node?

Reading the code, I noticed that the pushed-down UPDATE or DELETE
statement is executed during postgresBeginForeignScan rather than during
postgresIterateForeignScan.
It probably does not matter, but is there a reason to do it differently
from a normal scan?

It is not expected that postgresReScanForeignScan is called when the 
UPDATE/DELETE
is pushed down, right?  Maybe it would make sense to add an assertion for that.

I ran a simple performance test and found that performance is improved
as expected; updating 10 rows took 1000 rather than 8000 ms, and
DELETING the same amount took 200 instead of 6500 ms.

Yours,
Laurenz Albe



Re: [HACKERS] Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING

2014-08-25 Thread Heikki Linnakangas

On 07/12/2014 05:16 AM, Jeff Davis wrote:

I was able to see about a 2% increase in runtime when using the
similar_escape function directly. I made a 10M tuple table and did:

 explain analyze
   select
similar_escape('ΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣ','#')
 from t;

which was the worst reasonable case I could think of. (It appears that
selecting from a table is faster than from generate_series. I'm curious
what you use when testing the performance of an individual function at
the SQL level.)


A large table like that is what I usually do. A large generate_series() 
spends a lot of time building the tuplestore, especially if it doesn't 
fit in work_mem and spills to disk. Sometimes I use this to avoid it:


explain analyze
  select
similar_escape('ΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣ','#')
from generate_series(1, 1) a, generate_series(1,1000);

although in my experience it still has somewhat more overhead than a 
straight seqscan.


- Heikki





Re: [HACKERS] LIMIT for UPDATE and DELETE

2014-08-25 Thread Kevin Grittner
Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote:
 (2014/08/15 6:18), Rukh Meski wrote:
 Based on the feedback on my previous patch, I've separated only the
 LIMIT part into its own feature.  This version plays nicely with
 inheritance.  The intended use is splitting up big UPDATEs and DELETEs
 into batches more easily and efficiently.

 Before looking into the patch, I'd like to know the use cases in more
 details.

There have been a few times I wanted something like this, so that
it was easier to do an update that affects a very high percentage
of rows in a table, while making the old version of the row no
longer match the selection criteria for the UPDATE.  There are
workarounds using cursors or subselects returning ctid, but they
are kludgy and error prone.  Basically I wanted to alternate UPDATE 
of a subset of the rows with table VACUUM so that subsequent 
iterations can re-use space and avoid bloating the table.
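
The ctid-subselect variant I mean looks roughly like this (a sketch
only; the table, column, and batch size are made up):

UPDATE big_table
SET val = 0
WHERE ctid = ANY (ARRAY(SELECT ctid
                        FROM big_table
                        WHERE val <> 0
                        LIMIT 10000));
VACUUM big_table;  -- repeat both until the UPDATE reports 0 rows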

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Hardening pg_upgrade

2014-08-25 Thread Kevin Grittner
Bernd Helmle maili...@oopsware.de wrote:
 Magnus Hagander mag...@hagander.net wrote:

 I vote for discarding 8.3 support in pg_upgrade.  There are already
 enough limitations on pg_upgrade from pre-8.4 to make it of questionable
 value; if it's going to create problems like this, it's time to cut the
 rope.

 +1. 8.3 has been unsupported for a fairly long time now, and you can
 still do a two-step upgrade if you're on that old a version.

 Also +1 from my side. I've seen some old 8.3 installations at customers,
 still, but they aren't large and can easily be upgraded with a two-step 
 upgrade.

+1

If we could leave it without it being any extra work, fine; but
once a release is over a year out of support, if it's a matter of
putting extra work on the pg hackers or on the users who have
chosen to wait more than a year after support ends to do the
upgrade, I'm OK with asking those users to do a two-phase upgrade
or fall back to pg_dump.  It's not like we're leaving them without 
any options.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING

2014-08-25 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 07/12/2014 05:16 AM, Jeff Davis wrote:
 I was able to see about a 2% increase in runtime when using the
 similar_escape function directly. I made a 10M tuple table and did:
 
 explain analyze
 select
 similar_escape('ΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣ','#')
  from t;
 
 which was the worst reasonable case I could think of. (It appears that
 selecting from a table is faster than from generate_series. I'm curious
 what you use when testing the performance of an individual function at
 the SQL level.)

 A large table like that is what I usually do. A large generate_series() 
 spends a lot of time building the tuplestore, especially if it doesn't 
 fit in work_mem and spills to disk. Sometimes I use this to avoid it:

 explain analyze
select
 similar_escape('ΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣ','#')
  
 from generate_series(1, 1) a, generate_series(1,1000);

 although in my experience it still has somewhat more overhead than a 
 straight seqscan.

[ scratches head... ]  Surely similar_escape is marked immutable, and
will therefore be executed exactly once in either of these formulations,
because the planner will fold the expression to a constant.

regards, tom lane




Re: [HACKERS] Built-in binning functions

2014-08-25 Thread Petr Jelinek

Hi,

I finally had some time to get back to this.

I attached version 3 of the patch, which fixes Tom's complaint about 
the int8 version by removing it, as it does not seem necessary 
(the float8 version can handle integers just fine).


This patch now basically has just one optimized function for float8 and 
one for numeric datatypes, just like width_bucket.



On 08/07/14 02:14, Tom Lane wrote:
Also, I'm not convinced by this business of throwing an error for a
NULL array element.  Per spec, null arguments to width_bucket()
produce a null result, not an error --- shouldn't this flavor act
similarly?  In any case I think the test needs to use
array_contains_nulls() not just ARR_HASNULL.


I really think there should be a difference between a NULL array and a 
NULL inside the array, and that a NULL inside the array is wrong input. 
I changed the check to array_contains_nulls() though.



On 08/07/14 02:14, Tom Lane wrote:
Lastly, the spec defines behaviors for width_bucket that allow either
ascending or descending buckets.  We could possibly do something
similar


I am not sure it's worth it here, as we require the input to be sorted 
anyway. It might be worthwhile if we decided to do this as an aggregate 
(since there the input would not be required to be presorted) instead of 
a function, but I am not sure that's desirable either, as it would 
limit usability to only the single main use-case (group by and count()).
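
For illustration, that main use-case would look something like this
with the patch (a sketch; the table and column names are made up):

SELECT varwidth_bucket(price, ARRAY[10, 100, 1000]::numeric[]) AS bucket,
       count(*)
FROM products
GROUP BY 1
ORDER BY 1;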



On 20/07/14 11:01, Simon Riggs wrote:

On 16 July 2014 20:35, Pavel Stehule pavel.steh...@gmail.com wrote:



On 08/07/14 02:14, Tom Lane wrote:


I didn't see any discussion of the naming question in this thread.
I'd like to propose that it should be just width_bucket(); we can
easily determine which function is meant, considering that the
SQL-spec variants don't take arrays and don't even have the same
number of actual arguments.


I did mention in the submission that the names are up for discussion; I am all
for naming it just width_bucket.


I had this idea too - but I am not sure if it is a good idea. The
distance between the ANSI SQL width_bucket and our enhancement is larger
than in our implementation of median, for example.

I can live with both names, but I prefer the current name.



So I suggest that we use the more generic function name bin(), with a
second choice of discretize()  (though that seems annoyingly easy to
spell incorrectly)



I really don't think bin() is a good choice here; bucket has the same 
meaning in this context, and bin is often used as shorthand for binary, 
which would be confusing here.


Anyway, I have left the name as it is for now. I would not be against 
using width_bucket() or even just bucket(); I'm not sure about the 
discretize() option.



--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index c715ca2..e28ff68 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -920,6 +920,31 @@
    <entry><literal>width_bucket(5.35, 0.024, 10.06, 5)</literal></entry>
    <entry><literal>3</literal></entry>
   </row>
+
+  <row>
+   <entry>
+    <indexterm>
+     <primary>varwidth_bucket</primary>
+    </indexterm>
+    <literal><function>varwidth_bucket(<parameter>op</parameter> <type>numeric</type>, <parameter>thresholds</parameter> <type>numeric[]</type>)</function></literal></entry>
+   <entry><type>int</type></entry>
+   <entry>return the bucket to which <parameter>operand</parameter> would
+   be assigned based on right-bound bucket <parameter>thresholds</parameter>,
+   the <parameter>thresholds</parameter> must be ordered (smallest first)</entry>
+   <entry><literal>varwidth_bucket(5.35::numeric, ARRAY[1, 3, 4, 6]::numeric[])</literal></entry>
+   <entry><literal>3</literal></entry>
+  </row>
+
+  <row>
+   <entry><literal><function>varwidth_bucket(<parameter>op</parameter> <type>dp</type>, <parameter>thresholds</parameter> <type>dp[]</type>)</function></literal></entry>
+   <entry><type>int</type></entry>
+   <entry>return the bucket to which <parameter>operand</parameter> would
+   be assigned based on right-bound bucket <parameter>thresholds</parameter>,
+   the <parameter>thresholds</parameter> must be ordered (smallest first)</entry>
+   <entry><literal>varwidth_bucket(5.35::float8, ARRAY[1, 3, 4, 6]::float8[])</literal></entry>
+   <entry><literal>3</literal></entry>
+  </row>
+
  </tbody>
 </tgroup>
</table>
diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index 41b3eaa..fcbb623 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -2800,6 +2800,61 @@ width_bucket_float8(PG_FUNCTION_ARGS)
 	PG_RETURN_INT32(result);
 }
 
+/*
+ * Implements the float8 version of the varwidth_bucket() function.
+ * See also varwidth_bucket_general() and varwidth_bucket_int8().
+ *
+ * 'thresholds' is an array (must be sorted from smallest to biggest value)
+ * containing right-bounds for each bucket, varwidth_bucket() returns
+ * integer indicating the bucket number that 'operand' belongs to. An operand
+ * greater than the upper bound is 

Re: [HACKERS] Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING

2014-08-25 Thread Heikki Linnakangas

On 07/12/2014 05:16 AM, Jeff Davis wrote:

On Fri, 2014-07-11 at 11:51 -0400, Tom Lane wrote:

Jeff Davis pg...@j-davis.com writes:

Attached is a small patch to $SUBJECT.
In master, only single-byte characters are allowed as an escape. Of
course, with the patch it must still be a single character, but it may
be multi-byte.


I'm concerned about the performance cost of this patch.  Have you done
any measurements about what kind of overhead you are putting on the
inner loop of similar_escape?


I didn't consider this very performance critical, because this is
looping through the pattern, which I wouldn't expect to be a long
string. On my machine using en_US.UTF-8, the difference is imperceptible
for a SIMILAR TO ... ESCAPE query.

I was able to see about a 2% increase in runtime when using the
similar_escape function directly. I made a 10M tuple table and did:

 explain analyze
   select
similar_escape('ΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣ','#')
 from t;

which was the worst reasonable case I could think of.


Actually, that gets optimized to a constant in the planner:

postgres=# explain verbose select
similar_escape('ΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣ','#')
from t;
                           QUERY PLAN
----------------------------------------------------------------
 Seq Scan on public.t  (cost=0.00..144247.85 rows=985 width=0)
   Output: '^(?:ΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣ)$'::text
 Planning time: 0.033 ms
(3 rows)

With a working test case:

create table t (pattern text);
insert into t select 
'ΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣ' from 
generate_series(1, 100);

vacuum t;

explain (analyze) select similar_escape(pattern,'#') from t;

your patch seems to be about 2x-3x as slow as unpatched master. So this 
needs some optimization. A couple of ideas:


1. If the escape string is in fact a single-byte character, you can 
proceed with the loop just as it is today, without the pg_mblen calls.


2. Since pg_mblen() will always return an integer between 1-6, it would 
probably be faster to replace the memcpy() and memcmp() calls with 
simple for-loops iterating byte-by-byte.


In very brief testing, with change 1 above, the performance with 
this patch is back to what it is without the patch. See attached.


- Heikki

diff --git a/src/backend/utils/adt/regexp.c b/src/backend/utils/adt/regexp.c
index caf45ef..dd46325 100644
--- a/src/backend/utils/adt/regexp.c
+++ b/src/backend/utils/adt/regexp.c
@@ -688,11 +688,16 @@ similar_escape(PG_FUNCTION_ARGS)
 		elen = VARSIZE_ANY_EXHDR(esc_text);
 		if (elen == 0)
 			e = NULL;			/* no escape character */
-		else if (elen != 1)
-			ereport(ERROR,
-					(errcode(ERRCODE_INVALID_ESCAPE_SEQUENCE),
-					 errmsg("invalid escape string"),
-				  errhint("Escape string must be empty or one character.")));
+		else
+		{
+			int			escape_mblen = pg_mbstrlen_with_len(e, elen);
+
+			if (escape_mblen > 1)
+				ereport(ERROR,
+						(errcode(ERRCODE_INVALID_ESCAPE_SEQUENCE),
+						 errmsg("invalid escape string"),
+						 errhint("Escape string must be empty or one character.")));
+		}
 	}
 
 	/*--
@@ -723,59 +728,94 @@ similar_escape(PG_FUNCTION_ARGS)
 	while (plen > 0)
 	{
 		char		pchar = *p;
+		int			mblen;
 
-		if (afterescape)
+		/*
+		 * If the escape string is a single-byte character, we can process
+		 * the pattern one byte at a time, ignoring multi-byte characters.
+		 * (This works because all server encodings have the property that
+		 * a non-first byte of a multi-byte character always has the
+		 * high bit set, and hence we cannot be fooled by a byte in the
+		 * middle of a multi-byte character.)
+		 */
+		if (elen == 1 || (mblen = pg_mblen(p)))
 		{
-			if (pchar == '"' && !incharclass)	/* for SUBSTRING patterns */
-				*r++ = ((nquotes++ % 2) == 0) ? '(' : ')';
-			else
+			if (afterescape)
+			{
+				if (pchar == '"' && !incharclass)	/* for SUBSTRING patterns */
+					*r++ = ((nquotes++ % 2) == 0) ? '(' : ')';
+				else
+				{
+					*r++ = '\\';
+					*r++ = pchar;
+				}
+				afterescape = false;
+			}
+			else if (e && pchar == *e)
+			{
+				/* SQL99 escape character; do not send to output */
+				afterescape = true;
+			}
+			else if (incharclass)
+			{
+				if (pchar == '\\')
+					*r++ = '\\';
+				*r++ = pchar;
+				if (pchar == ']')
+					incharclass = false;
+			}
+			else if (pchar == '[')
+			{
+				*r++ = pchar;
+				incharclass = true;
+			}
+			else if (pchar == '%')
+			{
+				*r++ = '.';
+				*r++ = '*';
+			}
+			else if (pchar == '_')
+				*r++ = '.';
+			else if (pchar == '(')
+			{
+				/* convert to non-capturing parenthesis */
+				*r++ = '(';
+				*r++ = '?';
+				*r++ = ':';
+			}
+			else if (pchar == '\\' || pchar == '.' ||
+					 pchar == '^' || pchar == '$')
 			{
 				*r++ = '\\';
 				*r++ = pchar;
 			}
-			afterescape = false;
-		}
-		else if

Re: [HACKERS] Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING

2014-08-25 Thread Heikki Linnakangas

On 08/25/2014 04:48 PM, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com writes:

On 07/12/2014 05:16 AM, Jeff Davis wrote:

I was able to see about a 2% increase in runtime when using the
similar_escape function directly. I made a 10M tuple table and did:

explain analyze
select
similar_escape('ΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣ','#')
 from t;

which was the worst reasonable case I could think of. (It appears that
selecting from a table is faster than from generate_series. I'm curious
what you use when testing the performance of an individual function at
the SQL level.)



A large table like that is what I usually do. A large generate_series()
spends a lot of time building the tuplestore, especially if it doesn't
fit in work_mem and spills to disk. Sometimes I use this to avoid it:



explain analyze
select
similar_escape('ΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣΣ','#')
from generate_series(1, 1) a, generate_series(1,1000);



although in my experience it still has somewhat more overhead than a
straight seqscan because.


[ scratches head... ]  Surely similar_escape is marked immutable, and
will therefore be executed exactly once in either of these formulations,
because the planner will fold the expression to a constant.


Yeah, just noticed that myself..

- Heikki





Re: [HACKERS] Unwanted LOG during recovery of DROP TABLESPACE REDO

2014-08-25 Thread Heikki Linnakangas

On 08/15/2014 12:31 PM, Marko Tiikkaja wrote:

On 7/16/14 4:33 PM, Tom Lane wrote:

Rajeev rastogi rajeev.rast...@huawei.com writes:

I found and fixed a bug that causes recovery (crash recovery, PITR) to 
throw an unwanted LOG message if the tablespace symlink is not found 
during the processing of DROP TABLESPACE redo:
  LOG:  could not remove symbolic link 
"pg_tblspc/16384": No such file or directory


I don't think that's a bug: it's the designed behavior.  Why should we
complicate the code to not print a log message in a situation where
it's unclear if the case is expected or not?


I agree with Tom here; this doesn't seem like an improvement.


Well, for comparison, we also silently ignore non-existent files when 
replaying a DROP TABLE. I could go either way myself, but this is 
clearly a very minor thing, and we have two -1's, so I'm marking this as 
Rejected in the commitfest.


Thanks anyway!

- Heikki





Re: [HACKERS] Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING

2014-08-25 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 On 08/25/2014 04:48 PM, Tom Lane wrote:
 [ scratches head... ]  Surely similar_escape is marked immutable, and
 will therefore be executed exactly once in either of these formulations,
 because the planner will fold the expression to a constant.

 Yeah, just noticed that myself..

... although, given that, it is also fair to wonder how much the speed of
similar_escape really matters.  Surely in most use-cases the pattern and
escape char will be constants.  And, when they are not, wouldn't the
subsequent parsing work for the regexes dominate the cost of
similar_escape anyway?

IOW, I'm not sure we should be advising Jeff to duplicate code in
order to have a fast path.  Keeping it short might be the best goal.

regards, tom lane




Re: [HACKERS] SKIP LOCKED DATA (work in progress)

2014-08-25 Thread Alvaro Herrera
Thomas Munro wrote:
 On 22 August 2014 23:02, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 Forgive me if I have misunderstood but it looks like your incremental
 patch included a couple of unrelated changes, namely
 s/0/InvalidCommandId/ and a reversion of ConditionalMultiXactIdWait.

Yeah, sorry about those, will push separately.

 Undoing those gave me skip-locked-v12-b.patch, attached for reference,
 and I've included those changes in a new full patch
 skip-locked-v13.patch (also rebased).

 +1 for the change from if-then-else to switch statements.

 I was less sure about the 'goto failed' change, but I couldn't measure
 any change in concurrent throughput in my simple benchmark, so I guess
 that extra buffer lock/unlock doesn't matter and I can see why you
 prefer that control flow.

I was also thinking of reducing the lock level acquired to shared rather
than exclusive in all the paths that goto failed.  Since the lock is
only used to read a couple of fields from the tuple, shared is enough
and should give slightly better concurrency.  Per the buffer access rules
in src/backend/storage/buffer/README:

: 1. To scan a page for tuples, one must hold a pin and either shared or
: exclusive content lock.  To examine the commit status (XIDs and status bits)
: of a tuple in a shared buffer, one must likewise hold a pin and either shared
: or exclusive lock.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Allow multi-byte characters as escape in SIMILAR TO and SUBSTRING

2014-08-25 Thread Heikki Linnakangas

On 08/25/2014 06:14 PM, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com writes:

On 08/25/2014 04:48 PM, Tom Lane wrote:

[ scratches head... ]  Surely similar_escape is marked immutable, and
will therefore be executed exactly once in either of these formulations,
because the planner will fold the expression to a constant.



Yeah, just noticed that myself..


... although, given that, it is also fair to wonder how much the speed of
similar_escape really matters.  Surely in most use-cases the pattern and
escape char will be constants.  And, when they are not, wouldn't the
subsequent parsing work for the regexes dominate the cost of
similar_escape anyway?

IOW, I'm not sure we should be advising Jeff to duplicate code in
order to have a fast path.  Keeping it short might be the best goal.


It's certainly not worth bending over backwards for a small performance 
gain here, but I think special-casing a single-byte escape sequence is 
still quite reasonable. It doesn't make the code any longer.


- Heikki





Re: [HACKERS] pg_stat_statements cluttered with DEALLOCATE dbdpg_p*

2014-08-25 Thread Heikki Linnakangas

On 07/20/2014 11:51 PM, Peter Geoghegan wrote:

On Sun, Jul 20, 2014 at 10:56 AM, Tom Lane t...@sss.pgh.pa.us wrote:

However, this is certainly a behavioral change.  Perhaps squeeze it
into 9.4, but not the back branches?


+1


Ok, done. (We're a month closer to releasing 9.4 than we were when this 
consensus was reached, but I think it's still OK...)


- Heikki




Re: [HACKERS] psql \watch versus \timing

2014-08-25 Thread Heikki Linnakangas

On 08/18/2014 10:51 AM, Michael Paquier wrote:

On Mon, Aug 18, 2014 at 4:12 PM, Fujii Masao masao.fu...@gmail.com wrote:

On Mon, Aug 18, 2014 at 3:19 PM, Michael Paquier
michael.paqu...@gmail.com wrote:

On Thu, Aug 14, 2014 at 11:10 PM, Fujii Masao masao.fu...@gmail.com wrote:

Attached patch changes \watch so that it displays how long the query takes
if \timing is enabled.

I didn't refactor PSQLexec and SendQuery into one routine because
the contents of those functions are not that similar. I'm not sure how
much that refactoring would be worth doing. Anyway, this feature is
quite useful even without it, I think.


The patch applies correctly and does what it is made for:
=# \timing
Timing is on.
=# select 1;
  ?column?
--
 1
(1 row)
Time: 0.407 ms
=# \watch 1
Watch every 1s    Mon Aug 18 15:17:41 2014
  ?column?
--
 1
(1 row)
Time: 0.397 ms
Watch every 1s    Mon Aug 18 15:17:42 2014
  ?column?
--
 1
(1 row)
Time: 0.615 ms

Refactoring it would be worth it, thinking long-term... And printing
the timing in the PSQLexec code path is already done in SendQuery, so
that's doing the same thing twice, IMHO.

Now, looking at the patch, introducing the new function
PSQLexecInternal with an additional parameter to control the timing is
correct, given the non-refactoring way of doing it. But I don't think
that printing the time outside PSQLexecInternal is consistent with
SendQuery. Why not simply control the timing with a boolean flag and
print the timing directly in PSQLexecInternal?


Because the timing needs to be printed after the query result.

Thanks for pointing that out. Yes, this makes the refactoring a bit more difficult.


Michael reviewed this, so I'm marking this as Ready for Committer. Since 
you're a committer yourself, I expect you'll take it over from here.


I agree that refactoring this would be nice in the long-term, and I also 
agree that it's probably OK as it is in the short-term. I don't like the 
name PSQLexecInternal, though. PSQLexec is used for internal commands 
anyway. In fact it's backwards, because PSQLexecInternal is used for 
non-internal queries, given by \watch, while PSQLexec is used for 
internal commands.


- Heikki




Re: [HACKERS] [TODO] Track number of files ready to be archived in pg_stat_archiver

2014-08-25 Thread Gilles Darold
On 21/08/2014 10:17, Julien Rouhaud wrote:
 Hello,

 Attached patch implements the following TODO item :

 Track number of WAL files ready to be archived in pg_stat_archiver

 However, it will track the total number of any file ready to be
 archived, not only WAL files.

 Please let me know what you think about it.

 Regards.

Hi,

Maybe looking at the archive-ready count will be more efficient if it is
done in the view definition through a function. This will avoid any
issue with incrementing/decrementing archiverStats.ready_count, and the
patch will be simpler. Also, I don't think we need another memory
allocation for that: the counter information is always available as the
number of .ready files in the archive_status directory, and the call to
pg_stat_archiver doesn't need high-speed performance.

For example, have a new function called
pg_stat_get_archive_ready_count() that does the same as what you added
into pgstat_read_statsfiles(), and pg_stat_archiver defined as follows:

CREATE VIEW pg_stat_archiver AS
    SELECT
        s.archived_count,
        s.last_archived_wal,
        s.last_archived_time,
        s.failed_count,
        s.last_failed_wal,
        s.last_failed_time,
        pg_stat_get_archive_ready_count() AS ready_count,
        s.stats_reset
    FROM pg_stat_get_archiver() s;

The function pg_stat_get_archive_ready_count() will also be available
for any other querying.
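
As a rough illustration of what such a function has to compute, the same
count can already be obtained from SQL today (superuser only; this
assumes the 9.4 layout where the status files live under
pg_xlog/archive_status):

SELECT count(*) AS ready_count
FROM pg_ls_dir('pg_xlog/archive_status') AS f
WHERE f LIKE '%.ready';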

-- 
Gilles Darold
http://dalibo.com - http://dalibo.org





Re: [HACKERS] SKIP LOCKED DATA (work in progress)

2014-08-25 Thread Heikki Linnakangas

On 07/31/2014 12:29 AM, Thomas Munro wrote:

On 29 July 2014 02:35, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

David Rowley wrote:


I've also been looking at the isolation tests and I see that you've added a
series of tests for NOWAIT. I was wondering why you did that, as that's
really existing code; if you thought the tests were a bit thin
around NOWAIT then maybe that should be a separate patch?


The isolation tester is new so we don't have nearly enough tests for it.
Adding more meaningful tests is good even if they're unrelated to the
patch at hand.


Here are my isolation tests for NOWAIT as a separate patch,
independent of SKIP LOCKED.  They cover the tuple lock, regular row
lock and multixact row lock cases.


Thanks, committed.


I guess this might be called "white
box" testing, since it uses knowledge of how to construct schedules
that hit the three interesting code paths that trigger the error, even
though you can't see from the output why the error was raised in each
case without extra instrumentation (though it did cross my mind that
it could be interesting, at the very least, for testing whether the error
message were different in each case).


Yeah, seems reasonable. This kind of test might become obsolete in the 
future if the internals change a lot, so that e.g. we don't use 
multixids anymore. But even if that happens, there's little harm in 
keeping the tests, and meanwhile it's good to have coverage of these cases.


- Heikki





Re: [HACKERS] improving speed of make check-world

2014-08-25 Thread Heikki Linnakangas

On 08/15/2014 08:45 AM, Peter Eisentraut wrote:

make check-world creates a temporary installation in every subdirectory
it runs a test in, which is stupid: it's very slow and uses a lot of
disk space.  It's enough to do this once per run.  That is the essence
of what I have implemented.  It cuts the time for make check-world in
half or less, and it saves gigabytes of disk space.


Nice!


The idea is that we only maintain one temporary installation under the
top-level directory.  By looking at the variable MAKELEVEL, we can tell
whether we are the top-level make invocation.  If so, make a temp
installation.  If not, we know that the upper layers have already done
it and we can just point to the existing temp installation.

I do this by ripping out the temp installation logic from pg_regress and
implementing it directly in the makefiles.  This is much simpler and has
additional potential benefits:

The pg_regress temp install mode is actually a combination of two
functionalities: temp install plus temp instance.  Until now, you could
only get the two together, but the temp instance functionality is
actually quite useful by itself.  It opens up the possibility of
implementing make check for external pgxs modules, for example.

Also, you could now run the temp installation step using parallel make,
making it even faster.  This was previously disabled because the make
flags would have to pass through pg_regress.  It still won't quite work
to run make check-world -j8, say, because we can't actually run the
tests in parallel (yet), but something like make -C src/test/regress
check -j8 should work.

To that end, I have renamed the pg_regress --temp-install option to
--temp-instance.  Since --temp-install is only used inside the source
tree, this shouldn't cause any compatibility problems.


Yeah, that all makes a lot of sense.

The new EXTRA_INSTALL makefile variable ought to be documented in 
extend.sgml, where we list REGRESS_OPTS and others.


- Heikki





Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED

2014-08-25 Thread Alvaro Herrera
Fabrízio de Royes Mello wrote:
 On Sat, Aug 23, 2014 at 8:53 AM, Michael Paquier michael.paqu...@gmail.com
 wrote:
 
  On Sat, Aug 23, 2014 at 3:32 AM, Alvaro Herrera
  alvhe...@2ndquadrant.com wrote:
   Great.  Pushed.  Thanks for the patch.
  There is a typo in what has been pushed. Patch attached.
 
 
 Thanks... I fixed that in my last patch to change 'loggedness' to
 'persistence'. Attached.

Thanks, pushed.  I also added a comment explaining why it's okay to do
what we're doing.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Specifying the unit in storage parameter

2014-08-25 Thread Fujii Masao
On Thu, Aug 21, 2014 at 4:20 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Fri, Aug 8, 2014 at 12:32 PM, Fujii Masao masao.fu...@gmail.com wrote:
 This is not user-friendly. I'd like to propose the attached patch which
 introduces the infrastructure which allows us to specify the unit when
 setting an INTEGER storage parameter like autovacuum_vacuum_cost_delay.
 Comment? Review?
 This patch makes autovacuum_vacuum_cost_delay more consistent with
 what is at server level. So +1.

Thanks for reviewing the patch!

 Looking at the patch, the parameter fillfactor in the category
 RELOPT_KIND_HEAP (the first element in intRelOpts of reloptions.c) is
 not updated with the new field. It is only a one-line change.
 @@ -97,7 +97,7 @@ static relopt_int intRelOpts[] =
 			"Packs table pages only to this percentage",
 RELOPT_KIND_HEAP
 },
 -   HEAP_DEFAULT_FILLFACTOR, HEAP_MIN_FILLFACTOR, 100
 +   HEAP_DEFAULT_FILLFACTOR, HEAP_MIN_FILLFACTOR, 100, 0
 },

Oh, good catch. I wonder why I made such a mistake...
Attached is the updated version of the patch.

 Except for that, I also tested the patch and it works as expected. The
 documentation, as well as the regression tests, remains untouched, but I
 guess that this is fine (I don't see similar tests in the regressions, and
 the documentation does not specify the unit for a given parameter).

I think that it's worth adding the regression test for this feature.
Attached patch updates the regression test.
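
With the patch, the unit can then be given explicitly when setting the
storage parameter, something like this (a sketch; the table name is
made up):

ALTER TABLE t SET (autovacuum_vacuum_cost_delay = '20ms');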

Regards,

-- 
Fujii Masao
*** a/src/backend/access/common/reloptions.c
--- b/src/backend/access/common/reloptions.c
***
*** 97,103  static relopt_int intRelOpts[] =
  			"Packs table pages only to this percentage",
  			RELOPT_KIND_HEAP
  		},
! 		HEAP_DEFAULT_FILLFACTOR, HEAP_MIN_FILLFACTOR, 100
  	},
  	{
  		{
--- 97,103 
  			"Packs table pages only to this percentage",
  			RELOPT_KIND_HEAP
  		},
! 		HEAP_DEFAULT_FILLFACTOR, HEAP_MIN_FILLFACTOR, 100, 0
  	},
  	{
  		{
***
*** 105,111  static relopt_int intRelOpts[] =
  			"Packs btree index pages only to this percentage",
  			RELOPT_KIND_BTREE
  		},
! 		BTREE_DEFAULT_FILLFACTOR, BTREE_MIN_FILLFACTOR, 100
  	},
  	{
  		{
--- 105,111 
  			"Packs btree index pages only to this percentage",
  			RELOPT_KIND_BTREE
  		},
! 		BTREE_DEFAULT_FILLFACTOR, BTREE_MIN_FILLFACTOR, 100, 0
  	},
  	{
  		{
***
*** 113,119  static relopt_int intRelOpts[] =
  			"Packs hash index pages only to this percentage",
  			RELOPT_KIND_HASH
  		},
! 		HASH_DEFAULT_FILLFACTOR, HASH_MIN_FILLFACTOR, 100
  	},
  	{
  		{
--- 113,119 
  			"Packs hash index pages only to this percentage",
  			RELOPT_KIND_HASH
  		},
! 		HASH_DEFAULT_FILLFACTOR, HASH_MIN_FILLFACTOR, 100, 0
  	},
  	{
  		{
***
*** 121,127  static relopt_int intRelOpts[] =
  			"Packs gist index pages only to this percentage",
  			RELOPT_KIND_GIST
  		},
! 		GIST_DEFAULT_FILLFACTOR, GIST_MIN_FILLFACTOR, 100
  	},
  	{
  		{
--- 121,127 
  			"Packs gist index pages only to this percentage",
  			RELOPT_KIND_GIST
  		},
! 		GIST_DEFAULT_FILLFACTOR, GIST_MIN_FILLFACTOR, 100, 0
  	},
  	{
  		{
***
*** 129,135  static relopt_int intRelOpts[] =
  			"Packs spgist index pages only to this percentage",
  			RELOPT_KIND_SPGIST
  		},
! 		SPGIST_DEFAULT_FILLFACTOR, SPGIST_MIN_FILLFACTOR, 100
  	},
  	{
  		{
--- 129,135 
  			"Packs spgist index pages only to this percentage",
  			RELOPT_KIND_SPGIST
  		},
! 		SPGIST_DEFAULT_FILLFACTOR, SPGIST_MIN_FILLFACTOR, 100, 0
  	},
  	{
  		{
***
*** 137,143  static relopt_int intRelOpts[] =
  			"Minimum number of tuple updates or deletes prior to vacuum",
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 0, INT_MAX
  	},
  	{
  		{
--- 137,143 
  			"Minimum number of tuple updates or deletes prior to vacuum",
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 0, INT_MAX, 0
  	},
  	{
  		{
***
*** 145,151  static relopt_int intRelOpts[] =
  			"Minimum number of tuple inserts, updates or deletes prior to analyze",
  			RELOPT_KIND_HEAP
  		},
! 		-1, 0, INT_MAX
  	},
  	{
  		{
--- 145,151 
  			"Minimum number of tuple inserts, updates or deletes prior to analyze",
  			RELOPT_KIND_HEAP
  		},
! 		-1, 0, INT_MAX, 0
  	},
  	{
  		{
***
*** 153,159  static relopt_int intRelOpts[] =
  			"Vacuum cost delay in milliseconds, for autovacuum",
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 0, 100
  	},
  	{
  		{
--- 153,159 
  			"Vacuum cost delay in milliseconds, for autovacuum",
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 0, 100, GUC_UNIT_MS
  	},
  	{
  		{
***
*** 161,167  static relopt_int intRelOpts[] =
  			"Vacuum cost amount available before napping, for autovacuum",
  			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST
  		},
! 		-1, 1, 1
  	},
  	{
  		{
--- 161,167 
  			

Re: [HACKERS] Hardening pg_upgrade

2014-08-25 Thread Bruce Momjian
On Mon, Aug 25, 2014 at 06:34:12AM -0700, Kevin Grittner wrote:
 Bernd Helmle maili...@oopsware.de wrote:
  Magnus Hagander mag...@hagander.net wrote:
 
  I vote for discarding 8.3 support in pg_upgrade.  There are already
  enough limitations on pg_upgrade from pre-8.4 to make it of questionable
  value; if it's going to create problems like this, it's time to cut the
  rope.
 
  +1. 8.3 has been unsupported for a fairly long time now, and you can
  still do a two-step upgrade if you're on that old a version.
 
  Also +1 from my side. I've seen some old 8.3 installations at customers,
  still, but they aren't large and can easily be upgraded with a two step
  upgrade.
 
 +1
 
 If we could leave it without it being any extra work, fine; but
 once a release is over a year out of support, if it's a matter of
 putting extra work on the pg hackers or on the users who have
 chosen to wait more than a year after support ends to do the
 upgrade, I'm OK with asking those users to do a two-phase upgrade
 or fall back to pg_dump.  It's not like we're leaving them without 
 any options.

OK, I will move in the direction of removing 8.3 support and use a
single query to pull schema information.  I was hesitant to remove 8.3
support as I know we have kept pg_dump support all the way back to 7.0,
but it seems pg_upgrade need not have the same version requirements.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] psql \watch versus \timing

2014-08-25 Thread Fujii Masao
On Tue, Aug 26, 2014 at 1:34 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 On 08/18/2014 10:51 AM, Michael Paquier wrote:

 On Mon, Aug 18, 2014 at 4:12 PM, Fujii Masao masao.fu...@gmail.com
 wrote:

 On Mon, Aug 18, 2014 at 3:19 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:

 On Thu, Aug 14, 2014 at 11:10 PM, Fujii Masao masao.fu...@gmail.com
 wrote:

 Attached patch changes \watch so that it displays how long the query
 takes
 if \timing is enabled.

 I didn't refactor PSQLexec and SendQuery into one routine because
 the contents of those functions are not so same. I'm not sure how much
 it's worth doing that refactoring. Anyway this feature is quite useful
 even without that refactoring, I think.


 The patch applies correctly and it does correctly what it is made for:
 =# \timing
 Timing is on.
 =# select 1;
   ?column?
 --
  1
 (1 row)
 Time: 0.407 ms
 =# \watch 1
Watch every 1s    Mon Aug 18 15:17:41 2014
   ?column?
 --
  1
 (1 row)
 Time: 0.397 ms
Watch every 1s    Mon Aug 18 15:17:42 2014
   ?column?
 --
  1
 (1 row)
 Time: 0.615 ms

 Refactoring it would be worth it thinking long-term... And printing
 the timing in PSQLexec code path is already done in SendQuery, so
 that's doing two times the same thing IMHO.

 Now, looking at the patch, introducing the new function
 PSQLexecInternal with an additional parameter to control the timing is
 correct choosing the non-refactoring way of doing. But I don't think
 that printing the time outside PSQLexecInternal is consistent with
 SendQuery. Why not simply control the timing with a boolean flag and
 print the timing directly in PSQLexecInternal?


 Because the timing needs to be printed after the query result.

 Thanks for pointing that. Yes this makes the refactoring a bit more
 difficult.


 Michael reviewed this, so I'm marking this as Ready for Committer. Since
 you're a committer yourself, I expect you'll take it over from here.

Yep!

 I agree that refactoring this would be nice in the long-term, and I also
 agree that it's probably OK as it is in the short-term. I don't like the
 name PSQLexecInternal, though. PSQLexec is used for internal commands
 anyway. In fact it's backwards, because PSQLexecInternal is used for
 non-internal queries, given by \watch, while PSQLexec is used for internal
 commands.

Agreed. So what about PSQLexecCommon (inspired by
the relation between LWLockAcquireCommon and LWLockAcquire)?
Or any better name?

Regards,

-- 
Fujii Masao




Re: [HACKERS] LIMIT for UPDATE and DELETE

2014-08-25 Thread Jeff Janes
On Sun, Aug 24, 2014 at 11:48 PM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp
 wrote:

 Hi Rukh,


 (2014/08/15 6:18), Rukh Meski wrote:

 Based on the feedback on my previous patch, I've separated only the
 LIMIT part into its own feature.  This version plays nicely with
 inheritance.  The intended use is splitting up big UPDATEs and DELETEs
 into batches more easily and efficiently.


 Before looking into the patch, I'd like to know the use cases in more
 details.



There are two common use cases I can think of:

1)

I've just added a column to an existing table, and it is all NULL.  I've
changed the code to populate that column appropriately for new or updated
rows, but I need to back fill the existing rows.  I have a (slow) method to
compute the new value.  (I've not yet changed the code to depend on that
column being populated)

The obvious solution is:

update the_table set new_col=populate_new_col(whatever) where new_col is
null.

But this will bloat the table because vacuum cannot intervene, and will
take a very long time.  The first row to be updated will remain locked until
the last row gets updated, which is not acceptable.  And if something goes
wrong before the commit, you've lost all the work.

With the limit clause, you can just do this:

update the_table set new_col=populate_new_col(whatever) where new_col is
null limit 5;

In a loop with appropriate vacuuming and throttling.
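
A minimal sketch of such a loop, assuming the proposed LIMIT clause is
available (populate_new_col(whatever) stands in for the real computation; the
batch size, VACUUM and sleep are illustrative and would be driven from a
script, since VACUUM cannot run inside a transaction block):

    -- repeat until it reports UPDATE 0
    UPDATE the_table SET new_col = populate_new_col(whatever)
      WHERE new_col IS NULL LIMIT 50000;
    VACUUM the_table;
    SELECT pg_sleep(1);  -- throttle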

2)

I've introduced or re-designed partitioning, and need to migrate rows to
the appropriate partitions without long lived row locks.

create table pgbench_accounts2 () inherits (pgbench_accounts);

and then in a loop:

with t as (delete from only pgbench_accounts where aid  50 limit 5000
returning *)
  insert into pgbench_accounts2 select * from t;

Cheers,

Jeff


Re: [HACKERS] Specifying the unit in storage parameter

2014-08-25 Thread Alvaro Herrera
Fujii Masao wrote:
 On Thu, Aug 21, 2014 at 4:20 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:

  Looking at the patch, the parameter fillfactor in the category
  RELOPT_KIND_HEAP (the first element in intRelOpts of reloptions.c) is
  not updated with the new field. It is only a one-line change.
  @@ -97,7 +97,7 @@ static relopt_int intRelOpts[] =
  Packs table pages only to this percentage,
  RELOPT_KIND_HEAP
  },
  -   HEAP_DEFAULT_FILLFACTOR, HEAP_MIN_FILLFACTOR, 100
  +   HEAP_DEFAULT_FILLFACTOR, HEAP_MIN_FILLFACTOR, 100, 0
  },
 
 Oh, good catch. I wonder why I made such a mistake...

Uninitialized elements at end of struct are filled with zeroes.  We do
have other examples of this -- for instance, config_generic in the guc.c
tables are almost always only 5 members long even though the struct is
quite a bit longer than that.  Most entries do not even have flags set.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED

2014-08-25 Thread Fabrízio de Royes Mello
On Mon, Aug 25, 2014 at 2:55 PM, Alvaro Herrera alvhe...@2ndquadrant.com
wrote:

 Fabrízio de Royes Mello wrote:
  On Sat, Aug 23, 2014 at 8:53 AM, Michael Paquier 
michael.paqu...@gmail.com
  wrote:
  
   On Sat, Aug 23, 2014 at 3:32 AM, Alvaro Herrera
   alvhe...@2ndquadrant.com wrote:
Great.  Pushed.  Thanks for the patch.
   There is a typo in what has been pushed. Patch attached.
  
 
  Thanks... I fixed that in my last patch to change 'loggedness' to
  'persistence'. Attached.

 Thanks, pushed.  I also added a comment explaining why it's okay to do
 what we're doing.


Thanks...

I'm working on a refactoring to pass down the relpersistence flag to
finish_heap_swap... is this valid for now or I should leave it to another
patch?

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello


Re: [HACKERS] Verbose output of pg_dump not show schema name

2014-08-25 Thread Heikki Linnakangas

On 08/20/2014 11:11 PM, Fabrízio de Royes Mello wrote:

On Wed, Aug 20, 2014 at 2:43 AM, Michael Paquier michael.paqu...@gmail.com
wrote:


I had a look at this patch, and here are a couple of comments:
1) Depending on how ArchiveEntry is called to register an object to
dump, namespace may be NULL, but it is not the case for
namespace->dobj.name, so you could get the namespace name at the top
of the functions that have their verbose output improved with something
like that:
const char *namespace = tbinfo->dobj.namespace ?
tbinfo->dobj.namespace->dobj.name : NULL;
And then simplify the message output as follows:
if (namespace)
write_msg("blah \"%s\".\"%s\" blah", namespace, classname);
else
write_msg("blah \"%s\" blah", classname);
You can as well safely remove the checks on namespace->dobj.name.


Ok


AFAICS, the namespace can never be NULL in any of these. There is a 
selectSourceSchema(fout, tbinfo->dobj.namespace->dobj.name) call 
before or after printing the message, so if tbinfo->dobj.namespace is 
NULL, you'll crash anyway. Please double-check, and remove the dead code 
if you agree.


- Heikki





Re: [HACKERS] Concurrently option for reindexdb

2014-08-25 Thread Fujii Masao
On Mon, Aug 25, 2014 at 4:33 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
 On Mon, Aug 25, 2014 at 3:48 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 On Mon, Aug 25, 2014 at 3:36 PM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 Attached WIP patch adds -C (--concurrently) option for reindexdb
 command for concurrently reindexing.
 If we specify -C option with any table then reindexdb do reindexing
 concurrently with minimum lock necessary.
 Note that we cannot use '-s' option (for system catalog) and '-C'
 option at the same time.
 This patch use simple method as follows.

 1. Do CREATE INDEX CONCURRENTLY new index which has same definition
 as target index
 2. Acquire ACCESS EXCLUSIVE LOCK on target table (and transaction starts)
 3. Swap old and new index
 4. Drop old index
 5. COMMIT

 These process are based on pg_repack(or pg_reorg) does, done via SQL.

+1. I have some shell scripts which do that reindex technique,
and I'd be happy if I can replace them with this feature.

Can this technique reindex the primary key index and indexes
which other objects depend on (e.g., foreign keys)?
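
For concreteness, the quoted steps map to SQL along these lines (a minimal
sketch with hypothetical names t, col and t_idx; as written it cannot handle
indexes that back constraints, which is exactly the question above):

    CREATE INDEX CONCURRENTLY t_idx_new ON t (col);  -- same definition as t_idx

    BEGIN;
    LOCK TABLE t IN ACCESS EXCLUSIVE MODE;  -- the short exclusive window
    ALTER INDEX t_idx RENAME TO t_idx_old;
    ALTER INDEX t_idx_new RENAME TO t_idx;
    DROP INDEX t_idx_old;
    COMMIT;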

 This would be a useful for users, but I am not sure that you can call
 that --concurrently as the rename/swap phase requires an exclusive
 lock, and you would actually block a real implementation of REINDEX
 CONCURRENTLY (hm...).


 it might be difficult to call this --concurrently.
 The name might need to be changed.

I'm OK to say that as --concurrently if the document clearly
explains that restriction. Or --almost-concurrently? ;P

 ToDo
 - Multi language support for log message.
 Why? I am not sure that's something you should deal with.

 The log messages which already exist have multi-language support via the
 .po files,
 but the newly added messages have no corresponding entries in the .po files, I thought.

I don't think that you need to add the update of .po file into
the feature patch.

Regards,

-- 
Fujii Masao




Re: [HACKERS] Hardening pg_upgrade

2014-08-25 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 OK, I will move in the direction of removing 8.3 support and use a
 single query to pull schema information.   I was hesistant to remove 8.3
 support as I know we have kept pg_dump support all the way back to 7.0,
 but it seems pg_upgrade need not have the same version requirements.

Not really related, but ... I've been thinking that it's time to rip out
pg_dump's support for server versions before 7.3 or 7.4.  That would let
us get rid of a lot of klugy code associated with the lack of schemas
and dependency info in the older versions.  It's possible that we should
move the cutoff even further --- I've not looked closely at how much could
be removed by dropping versions later than 7.3.

Aside from the question of how much old code could be removed, there's the
salient point of how do we test pg_dump against such old branches?  The
further back you go the harder it is to even build PG on modern platforms,
and the less likely it will work (I note for example that pre-8.0
configure doesn't try to use -fwrapv, let alone some of the other switches
we've found necessary on recent gcc).  I've usually tested pg_dump patches
against old servers by running them against builds I have in captivity on
my old HPPA box ... but once that dies, I'm *not* looking forward to
trying to rebuild 7.x on my current machines.

regards, tom lane




Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-08-25 Thread Heikki Linnakangas

On 06/18/2014 09:17 PM, Josh Berkus wrote:

On 06/18/2014 11:03 AM, Andres Freund wrote:

Well, all those actually do write to the xlog (to write a new
checkpoint, containing the updated control file). Since pg_resetxlog has
done all this pretty much since forever renaming it now seems to be a
big hassle for users for pretty much no benefit? This isn't a tool the
average user should ever touch.


If we're using it to create a unique system ID which can be used to
orchestrate replication and clustering systems, a lot more people are
going to be touching it than ever did before -- and not just for BDR.


I think pg_resetxlog is still appropriate: changing the system ID will 
reset the WAL. In particular, any archived WAL will become useless.


But yeah, this does change the target audience of pg_resetxlog 
significantly. Now that we'll have admins running pg_resetxlog as part 
of normal operations, we have to document it carefully. We also have to 
design the user interface carefully, to make it more clear that while 
resetting the system ID won't eat your data, some of the other settings 
will.


The proposed pg_resetxlog --help output looks like this:


pg_resetxlog resets the PostgreSQL transaction log.

Usage:
  pg_resetxlog [OPTION]... DATADIR

Options:
  -e XIDEPOCH  set next transaction ID epoch
  -f   force update to be done
  -l XLOGFILE  force minimum WAL starting location for new transaction log
  -m MXID,MXID set next and oldest multitransaction ID
  -n   no update, just show what would be done (for testing)
  -o OID   set next OID
  -O OFFSETset next multitransaction offset
  -s [SYSID]   set system identifier (or generate one)
  -V, --version    output version information, then exit
  -x XID   set next transaction ID
  -?, --help   show this help, then exit

Report bugs to pgsql-b...@postgresql.org.


I don't think that's good enough. The -s option should be displayed more 
prominently, and the dangerous options like -l and -x should be more 
clearly labeled as such.


I would de-emphasize setting the system ID to a particular value. It 
might be useful for disaster recovery, like -x, but in general you 
should always reset it to a new unique value. If you make it too easy to 
set it to a particular value, someone will try initialize a streaming 
standby server using initdb+pg_dump, and changing the system ID to match 
the master's.


The user manual for pg_resetxlog says:


pg_resetxlog clears the write-ahead log (WAL) and optionally resets
some other control information stored in the pg_control file. This
function is sometimes needed if these files have become corrupted. It
should be used only as a last resort, when the server will not start
due to such corruption.


That's clearly not true for the -s option.

In summary, I think we want this feature in some form, but we'll somehow 
need to make the distinction from the dangerous pg_resetxlog usage. It 
might be best, after all, to make this a separate utility, 
pg_resetsystemid. It would not need to have the capability to set the 
system ID to a particular value, only a randomly assigned one (setting 
it to a particular value could be added to pg_resetxlog, where other 
dangerous options are).


- Heikki




Re: [HACKERS] Hardening pg_upgrade

2014-08-25 Thread Bruce Momjian
On Mon, Aug 25, 2014 at 03:04:52PM -0400, Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  OK, I will move in the direction of removing 8.3 support and use a
  single query to pull schema information.  I was hesitant to remove 8.3
  support as I know we have kept pg_dump support all the way back to 7.0,
  but it seems pg_upgrade need not have the same version requirements.
 
 Not really related, but ... I've been thinking that it's time to rip out
 pg_dump's support for server versions before 7.3 or 7.4.  That would let
 us get rid of a lot of klugy code associated with the lack of schemas
 and dependency info in the older versions.  It's possible that we should
 move the cutoff even further --- I've not looked closely at how much could
 be removed by dropping versions later than 7.3.

Yeah, it kind of is related, as that was the logic I followed originally
for pg_upgrade, i.e. never remove supported versions --- that has been
overridden.

 Aside from the question of how much old code could be removed, there's the
 salient point of how do we test pg_dump against such old branches?  The
 further back you go the harder it is to even build PG on modern platforms,
 and the less likely it will work (I note for example that pre-8.0
 configure doesn't try to use -fwrapv, let alone some of the other switches
 we've found necessary on recent gcc).  I've usually tested pg_dump patches
 against old servers by running them against builds I have in captivity on
 my old HPPA box ... but once that dies, I'm *not* looking forward to
 trying to rebuild 7.x on my current machines.

Yes.  You could argue that a double-upgrade from 7.0 to 8.0 to 9.4 would
be less buggy than one from 7.0 to 9.4.  I agree there is almost zero
testing of very old versions.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] pg_dumpall reccomendation in release notes

2014-08-25 Thread Bruce Momjian
On Thu, Aug 21, 2014 at 12:18:46PM -0400, Bruce Momjian wrote:
 I have developed the attached patch to address the issues raised above:
 
 o  non-text output of pg_dump is mentioned
 o  mentions of using OID for keys is removed
 o  the necessity of pg_dumpall --globals-only is mentioned
 o  using pg_dump parallel mode rather than pg_dumpall for upgrades is 
 mentioned
 o  pg_upgrade is mentioned more prominently for upgrades
 o  replication upgrades are in their own section
 
 I don't think we want to mention pg_upgrade as the _primary_
 major-version upgrade method.  While the pg_dump upgrade section is
 long, it is mostly about starting/stoping the server, moving
 directories, etc, the same things you have to do for pg_upgrade, so I
 just mentioned that int the pg_upgrade section.  Other ideas?
 
 I plan to apply this to head and 9.4.

Updated patch attached and applied.

Any other suggestions?  Please let me know.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
new file mode 100644
index 06f064e..07ca0dc
*** a/doc/src/sgml/backup.sgml
--- b/doc/src/sgml/backup.sgml
***
*** 28,34 
titleacronymSQL/ Dump/title
  
para
!The idea behind this dump method is to generate a text file with SQL
 commands that, when fed back to the server, will recreate the
 database in the same state as it was at the time of the dump.
 productnamePostgreSQL/ provides the utility program
--- 28,34 
titleacronymSQL/ Dump/title
  
para
!The idea behind this dump method is to generate a file with SQL
 commands that, when fed back to the server, will recreate the
 database in the same state as it was at the time of the dump.
 productnamePostgreSQL/ provides the utility program
*** pg_dump replaceable class=parameterd
*** 39,44 
--- 39,47 
  /synopsis
 As you see, applicationpg_dump/ writes its result to the
 standard output. We will see below how this can be useful.
+While the above command creates a text file, applicationpg_dump/
+can create files in other formats that allow for parallelism and more
+fine-grained control of object restoration.
/para
  
para
*** pg_dump replaceable class=parameterd
*** 98,117 
 exclusive lock, such as most forms of commandALTER TABLE/command.)
/para
  
-   important
-para
- If your database schema relies on OIDs (for instance, as foreign
- keys) you must instruct applicationpg_dump/ to dump the OIDs
- as well. To do this, use the option-o/option command-line
- option.
-/para
-   /important
- 
sect2 id=backup-dump-restore
 titleRestoring the Dump/title
  
 para
! The text files created by applicationpg_dump/ are intended to
  be read in by the applicationpsql/application program. The
  general command form to restore a dump is
  synopsis
--- 101,111 
 exclusive lock, such as most forms of commandALTER TABLE/command.)
/para
  
sect2 id=backup-dump-restore
 titleRestoring the Dump/title
  
 para
! Text files created by applicationpg_dump/ are intended to
  be read in by the applicationpsql/application program. The
  general command form to restore a dump is
  synopsis
*** psql replaceable class=parameterdbna
*** 127,132 
--- 121,128 
  supports options similar to applicationpg_dump/ for specifying
  the database server to connect to and the user name to use. See
  the xref linkend=app-psql reference page for more information.
+ Non-text file dumps are restored using the xref
+ linkend=app-pgrestore utility.
 /para
  
 para
*** psql -f replaceable class=parameteri
*** 225,231 
  roles, tablespaces, and empty databases, then invoking
  applicationpg_dump/ for each database.  This means that while
  each database will be internally consistent, the snapshots of
! different databases might not be exactly in-sync.
 /para
/sect2
  
--- 221,234 
  roles, tablespaces, and empty databases, then invoking
  applicationpg_dump/ for each database.  This means that while
  each database will be internally consistent, the snapshots of
! different databases are not synchronized.
!/para
! 
!para
! Cluster-wide data can be dumped alone using the
! applicationpg_dumpall/ option--globals-only/ option.
! This is necessary to fully back up the cluster if running the
! applicationpg_dump/ command on individual databases.
 /para
/sect2
  
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
new file mode 100644
index 1d91d92..f337485
*** a/doc/src/sgml/runtime.sgml
--- b/doc/src/sgml/runtime.sgml
*** $ userinputkill -INT `head -1 /usr/loc
*** 1517,1524 
 For 

Re: [HACKERS] psql \watch versus \timing

2014-08-25 Thread Heikki Linnakangas

On 08/25/2014 09:22 PM, Fujii Masao wrote:

On Tue, Aug 26, 2014 at 1:34 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

I agree that refactoring this would be nice in the long-term, and I also
agree that it's probably OK as it is in the short-term. I don't like the
name PSQLexecInternal, though. PSQLexec is used for internal commands
anyway. In fact it's backwards, because PSQLexecInternal is used for
non-internal queries, given by \watch, while PSQLexec is used for internal
commands.


Agreed. So what about PSQLexecCommon (inspired by
the relation between LWLockAcquireCommon and LWLockAcquire)?
Or any better name?


Actually, perhaps it would be better to just copy-paste PSQLexec, and 
modify the copy to suit \watch's needs. (PSQLexecWatch? 
SendWatchQuery?). PSQLexec doesn't do much, and there isn't very much 
overlap between what \watch wants and what other PSQLexec callers want. 
\watch wants timing output, others don't. \watch doesn't want 
transaction handling. Do we want --echo-hidden to print the \watch'd 
query? Not sure..


- Heikki





Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-08-25 Thread Tom Lane
Heikki Linnakangas hlinnakan...@vmware.com writes:
 In summary, I think we want this feature in some form, but we'll somehow 
 need to make the distinction from the dangerous pg_resetxlog usage. It 
 might be best, after all, to make this a separate utility, 
 pg_resetsystemid.

That sounds fairly reasonable given your point about not wanting people to
confuse this with the can-eat-your-data aspects of pg_resetxlog.  (OTOH,
won't this result in a lot of code duplication?  We'd still need to erase
and refill the WAL area.)

 It would not need to have the capability to set the 
 system ID to a particular value, only a randomly assigned one (setting 
 it to a particular value could be added to pg_resetxlog, where other 
 dangerous options are).

I'm less convinced about that.  While you can shoot yourself in the foot
by assigning the same system ID to two installations that share WAL
archive or something like that, this feels a bit different than the ways
you can shoot yourself in the foot with pg_resetxlog.  If we do what you
say here then I think we'll be right back to the discussion of how to
separate the assign-a-sysID option from pg_resetxlog's other, more
dangerous options.

regards, tom lane




Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-08-25 Thread Andres Freund
On August 25, 2014 9:45:50 PM CEST, Tom Lane t...@sss.pgh.pa.us wrote:
Heikki Linnakangas hlinnakan...@vmware.com writes:
 In summary, I think we want this feature in some form, but we'll somehow
 need to make the distinction from the dangerous pg_resetxlog usage. It
 might be best, after all, to make this a separate utility,
 pg_resetsystemid.

That sounds fairly reasonable given your point about not wanting people to
confuse this with the can-eat-your-data aspects of pg_resetxlog.  (OTOH,
won't this result in a lot of code duplication?  We'd still need to erase
and refill the WAL area.)

Let's move pg_controldata, resetxlog, resetsysid into a common src/bin 
directory. Then we can unify the relevant code between all three.

-- 
Please excuse brevity and formatting - I am writing this on my mobile phone.

Andres Freund  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] psql \watch versus \timing

2014-08-25 Thread Heikki Linnakangas

On 08/25/2014 10:48 PM, Heikki Linnakangas wrote:

On 08/25/2014 09:22 PM, Fujii Masao wrote:

On Tue, Aug 26, 2014 at 1:34 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

I agree that refactoring this would be nice in the long-term, and I also
agree that it's probably OK as it is in the short-term. I don't like the
name PSQLexecInternal, though. PSQLexec is used for internal commands
anyway. In fact it's backwards, because PSQLexecInternal is used for
non-internal queries, given by \watch, while PSQLexec is used for internal
commands.


Agreed. So what about PSQLexecCommon (inspired by
the relation between LWLockAcquireCommon and LWLockAcquire)?
Or any better name?


Actually, perhaps it would be better to just copy-paste PSQLexec, and
modify the copy to suite \watch's needs. (PSQLexecWatch?
SendWatchQuery?). PSQLexec doesn't do much, and there isn't very much
overlap between what \watch wants and what other PSQLexec callers want.
\watch wants timing output, others don't. \watch doesn't want
transaction handling. Do we want --echo-hidden to print the \watch'd
query? Not sure..


BTW, I just noticed that none of the callers of PSQLexec pass 
start_xact=true. So that part of the function is dead code. We might 
want to remove it, and replace with a comment noting that PSQLexec never 
starts a new transaction block, even in autocommit-off mode. (I know 
you're hacking on this, so I didn't want to joggle your elbow by doing 
it right now)


- Heikki




Re: [HACKERS] Set new system identifier using pg_resetxlog

2014-08-25 Thread Heikki Linnakangas

On 08/25/2014 10:45 PM, Tom Lane wrote:

Heikki Linnakangas hlinnakan...@vmware.com writes:

It would not need to have the capability to set the
system ID to a particular value, only a randomly assigned one (setting
it to a particular value could be added to pg_resetxlog, where other
dangerous options are).


I'm less convinced about that.  While you can shoot yourself in the foot
by assigning the same system ID to two installations that share WAL
archive or something like that, this feels a bit different than the ways
you can shoot yourself in the foot with pg_resetxlog.  If we do what you
say here then I think we'll be right back to the discussion of how to
separate the assign-a-sysID option from pg_resetxlog's other, more
dangerous options.


I don't see the use case for setting system id to a particular value. 
Andres listed four use cases upthread:



a) Mark a database as not being the same. Currently if you promote two
   databases, e.g. to shard out, they'll continue to have same system
   identifier. Which really sucks, especially because timeline ids will
   often increase synchronously.


Yes, this is the legitimate use case a DBA would use this feature for. 
Resetting the system ID to a random value suffices.



b) For data recovery it's sometimes useful to create a new database
   (with the same catalog state) and replay all WAL. For that you need to
   reset the system identifier. I've done so hacking up resetxlog
   before.


This falls squarely in the dangerous category, and you'll have to 
reset other things than the system ID to make it work. Having the option 
in pg_resetxlog is fine for this.



c) We already allow to set pretty much all aspects of the control file
   via resetxlog - there seems little point of not having the ability to
   change the system identifier.


Ok, but it's not something a regular admin would ever use.


d) In a logical replication scenario one way to identify individual
   nodes is via the system identifier. If you want to convert a
   basebackup into logical standby one sensible way to do so is to
   create a logical replication slots *before* promoting a physical
   backup to guarantee that slot is able to stream out all changes. If
   the slot names contain the consumer's system identifier you need to
   know the new system identifier beforehand.


I didn't understand this one. But it seems like the obvious solution is 
to not use the consumer's system identifier as the slot name. Or rename 
it afterwards.


- Heikki




Re: [HACKERS] Concurrently option for reindexdb

2014-08-25 Thread Michael Paquier
On Tue, Aug 26, 2014 at 3:48 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Aug 25, 2014 at 4:33 PM, Sawada Masahiko sawada.m...@gmail.com 
 wrote:
 it might be difficult to call this --concurrently.
 The name might need to be changed.

 I'm OK to say that as --concurrently if the document clearly
 explains that restriction. Or --almost-concurrently? ;P
By reading that I am thinking as well about a wording with lock,
like --minimum-locks.
-- 
Michael




Re: [HACKERS] Add CREATE support to event triggers

2014-08-25 Thread Michael Paquier
On Sat, Jun 14, 2014 at 5:31 AM, Alvaro Herrera alvhe...@2ndquadrant.com
wrote:

 Here's a refreshed version of this patch.  I have split it up in a
 largish number of pieces, which hopefully makes it easier to understand
 what is going on.

Alvaro,

Could you confirm that the patches you just committed are 1, 3 and 6?

Regards,
-- 
Michael


[HACKERS] postgresql latency bgwriter not doing its job

2014-08-25 Thread Fabien COELHO


Hello pgdevs,

I've been playing with pg for some time now to try to reduce the maximum 
latency of simple requests, to have a responsive server under small to 
medium load.


On an old computer with a software RAID5 HDD attached, a pgbench 
simple update script run for some time (scale 100, fillfactor 95)


pgbench -M prepared -N -c 2 -T 500 -P 1 ...

gives 300 tps. However this performance is really +1000 tps for a few 
seconds followed by 16 seconds at about 0 tps for the checkpoint induced 
IO storm. The server is totally unresponsive 75% of the time. That's 
bandwidth optimization for you. Hmmm... why not.


Now, given this setup, if pgbench is throttled at 50 tps (1/6 the above 
max):


pgbench -M prepared -N -c 2 -R 50.0 -T 500 -P 1 ...

The same thing more or less happens in a delayed fashion... You get 50 tps 
for some time, followed by sections of 15 seconds at 0 tps for the 
checkpoint when the segments are full... the server is unresponsive about 
10% of the time (one in ten transactions is late by more than 200 ms).


It is not satisfying, pg should be able to handle that load easily.

The culprit I found is bgwriter, which is basically doing nothing to 
prevent the coming checkpoint IO storm, even though there would be ample 
time to write the accumulating dirty pages so that checkpoint would find a 
clean field and pass in a blink. Indeed, at the end of the 500 seconds 
throttled test, pg_stat_bgwriter says:


  buffers_checkpoint = 19046
  buffers_clean = 2995

Which suggests that bgwriter took it upon itself to write only 6 pages per second, 
whereas at least 50 would have been needed for the load, and could have been 
handled by the hardware without any problem.


I have not found any means to force bgwriter to send writes when it can. 
(Well, I have: create a process which sends CHECKPOINT every 0.2 
seconds... it works more or less, but this is not my point:-)


Bgwriter control parameters allow controlling the maximum number of pages 
(bgwriter_lru_maxpages) written per round (bgwriter_delay), and a 
multiplier (bgwriter_lru_multiplier) which controls some heuristics to 
estimate how many pages will be needed so as to make them available. 
This may be nice in some settings, but is not adapted to the write-oriented 
OLTP load tested with pgbench.
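
For reference, these are the knobs in question, set here to write much more
aggressively; the values are purely illustrative, not recommendations
(ALTER SYSTEM requires 9.4 or later):

    ALTER SYSTEM SET bgwriter_delay = '100ms';        -- wake up more often
    ALTER SYSTEM SET bgwriter_lru_maxpages = 1000;    -- more writes per round
    ALTER SYSTEM SET bgwriter_lru_multiplier = 10.0;  -- inflate the estimate
    SELECT pg_reload_conf();

Even so, these only drive the LRU-cleaning heuristics described above; none
of them makes bgwriter write in anticipation of the next checkpoint.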


The problem is that with the update load on a database that fits in memory 
there is not that much need for new pages, even if pages are being 
dirtied (about 50 per second), so it seems that the heuristics decide 
not to write much. The net result of all this cleverness is that when the 
checkpoint arrives, several thousand pages have to be written and the 
server is offline for some time.


ISTM that bgwriter lacks at least some minimum-pages setting: it could be 
induced to free this many pages if it can. That would be a start.


A better feature would be that it adapts itself to take advantage of the 
available IOPS, depending on the load induced by other tasks (vacuum, 
queries...), in a preventive manner, so as to avoid delaying what can be 
done right now under a small load, and thus avoid later IO storms. This 
would suggest that some setting would provide the expected IOPS capability 
of the underlying hardware, as some already suggest the expected available 
memory.


Note that this preventive approach could also improve the bandwidth 
measure: currently when pgbench is running at maximum speed before the 
checkpoint storm, nothing is written to disk but WAL, although it could 
probably also write some dirty pages. When the checkpoint arrives, fewer 
pages would need to be written, so the storm would be shorter.


Any thoughts on this latency issue? Am I missing something?

--
Fabien.




Re: [HACKERS] pg_upgrade: allow multiple -o/-O options

2014-08-25 Thread Bruce Momjian
On Fri, Aug 22, 2014 at 10:02:11AM -0400, Bruce Momjian wrote:
 On Fri, Aug 22, 2014 at 10:52:12AM +0200, Pavel Raiskup wrote:
  On Thursday 21 of August 2014 18:26:37 Bruce Momjian wrote:
   On Tue, Mar  4, 2014 at 04:52:56PM +0100, Pavel Raiskup wrote:
RFE:  Consider that you want to run pg_upgrade via some script with some
default '-o' option.  But then you also want to give the script's user a
chance to specify the old-server's options according user's needs.
Then something like the following is not possible:

  $ cat script
  ...
  pg_upgrade ... -o 'sth' $PG_UPGRADE_OPT ...
  ...

I know that this problem is still script-able, but the fix should be
innocent and it would simplify things.  Thanks for considering,
   
   Attached is a patch that makes multiple -o options append their
   arguments for pg_upgrade and pg_ctl, and documents this and the append
   behavior of postmaster/postgres.  This covers all the -o behaviors.
  
  Thanks!  Seems OK to me; one nit - why did you not go the
  append_optiton way (there could probably be a better name, like arg_cat)?
  Because this is just about a few lines, it is probably OK from the PostgreSQL
  policy POV, so review?  ~ review+, thanks again!
 
 Well, I found append_optiton() to be an extra function that wasn't
 necessary --- the psprintf() use was short enough not to need a separate
 function.

Patch applied;  this will appear in PG 9.5.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] Add CREATE support to event triggers

2014-08-25 Thread Alvaro Herrera
Michael Paquier wrote:
 On Sat, Jun 14, 2014 at 5:31 AM, Alvaro Herrera alvhe...@2ndquadrant.com
 wrote:
 
  Here's a refreshed version of this patch.  I have split it up in a
  largish number of pieces, which hopefully makes it easier to understand
  what is going on.
 
 Alvaro,
 
 Could you confirm that the patches you just committed are 1, 3 and 6?

And 4.  Yes, they are.  I wanted to get trivial stuff out of the way
while I had some other trivial patch at hand.  I'm dealing with another
patch from the commitfest now, so I'm not posting a rebased version
right away, apologies.

How do people like this patch series?  It would be much easier for me to
submit a single patch, but I feel handing it in little pieces makes it
easier for reviewers.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Concurrently option for reindexdb

2014-08-25 Thread Alvaro Herrera
Michael Paquier wrote:
 On Tue, Aug 26, 2014 at 3:48 AM, Fujii Masao masao.fu...@gmail.com wrote:
  On Mon, Aug 25, 2014 at 4:33 PM, Sawada Masahiko sawada.m...@gmail.com 
  wrote:
  it might be difficult to call this --concurrently.
  The name might need to be changed.
 
  I'm OK to say that as --concurrently if the document clearly
  explains that restriction. Or --almost-concurrently? ;P
 By reading that I am thinking as well about a wording with lock,
 like --minimum-locks.

Why not just finish up the REINDEX CONCURRENTLY patch.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Concurrently option for reindexdb

2014-08-25 Thread Andres Freund
On August 25, 2014 10:35:20 PM CEST, Alvaro Herrera alvhe...@2ndquadrant.com 
wrote:
Michael Paquier wrote:
 On Tue, Aug 26, 2014 at 3:48 AM, Fujii Masao masao.fu...@gmail.com wrote:
  On Mon, Aug 25, 2014 at 4:33 PM, Sawada Masahiko sawada.m...@gmail.com wrote:
  it might be difficult to call this --concurrently.
  The name might need to be changed.
 
  I'm OK to say that as --concurrently if the document clearly
  explains that restriction. Or --almost-concurrently? ;P
 By reading that I am thinking as well about a wording with lock,
 like --minimum-locks.

Why not just finish up the REINDEX CONCURRENTLY patch.

+many. Although I'm not sure if we managed to find a safe relation swap.

If not: How about adding ALTER INDEX ... SWAP which requires an exclusive lock 
but is fast and O(1)? Then all indexes can be created concurrently, swapped in 
a very short xact, and then dropped concurrently. 95% of all users would be 
happy with that, and the remaining 5% would still be in a better position than 
today, where the catalog needs to be hacked for that (fkeys, pkeys et al).
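
To spell the idea out (hypothetical syntax, since no ALTER INDEX ... SWAP
exists today; names are illustrative):

    CREATE INDEX CONCURRENTLY t_idx_new ON t (col);
    BEGIN;
    ALTER INDEX t_idx SWAP WITH t_idx_new;  -- proposed O(1) catalog swap
    COMMIT;
    DROP INDEX CONCURRENTLY t_idx_new;      -- now names the old index data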

--- 
Please excuse brevity and formatting - I am writing this on my mobile phone.




Re: [HACKERS] Final Patch for GROUPING SETS

2014-08-25 Thread Pavel Stehule
Hi

I checked this patch, and it is working very well

I found only two issues - I am not sure if they are real issues

with data from https://wiki.postgresql.org/wiki/Grouping_Sets

postgres=# select name, place, sum(count), grouping(name), grouping(place)
from cars group by rollup(name, place);
 name  |   place|  sum  | grouping | grouping
---++---+--+--
 bmw   | czech rep. |   100 |0 |0
 bmw   | germany|  1000 |0 |0
 bmw   ||  1100 |0 |1
 opel  | czech rep. |  7000 |0 |0
 opel  | germany|  7000 |0 |0
 opel  || 14000 |0 |1
 skoda | czech rep. | 1 |0 |0
 skoda | germany|  5000 |0 |0
 skoda || 15000 |0 |1
   || 30100 |1 |1
(10 rows)

* redundant sets should be ignored

postgres=# select name, place, sum(count), grouping(name), grouping(place)
from cars group by rollup(name, place), name;
 name  |   place|  sum  | grouping | grouping
---++---+--+--
 bmw   | czech rep. |   100 |0 |0
 bmw   | germany|  1000 |0 |0
 bmw   ||  1100 |0 |1
 bmw   ||  1100 |0 |1
 opel  | czech rep. |  7000 |0 |0
 opel  | germany|  7000 |0 |0
 opel  || 14000 |0 |1
 opel  || 14000 |0 |1
 skoda | czech rep. | 1 |0 |0
 skoda | germany|  5000 |0 |0
 skoda || 15000 |0 |1
 skoda || 15000 |0 |1
(12 rows)

It duplicates rows

postgres=# explain select name, place, sum(count), grouping(name),
grouping(place) from cars group by rollup(name, place), name;
   QUERY PLAN

 GroupAggregate  (cost=101.14..101.38 rows=18 width=68)
   Grouping Sets: (name, place), (name), (name)
   -  Sort  (cost=101.14..101.15 rows=6 width=68)
 Sort Key: name, place
 -  Seq Scan on cars  (cost=0.00..1.06 rows=6 width=68)
 Planning time: 0.235 ms
(6 rows)

postgres=# select name, place, sum(count), grouping(name), grouping(place)
from cars group by grouping sets((name, place), (name), (name),(place), ());
 name  |   place|  sum  | grouping | grouping
---++---+--+--
 bmw   | czech rep. |   100 |0 |0
 bmw   | germany|  1000 |0 |0
 bmw   ||  1100 |0 |1
 bmw   ||  1100 |0 |1
 opel  | czech rep. |  7000 |0 |0
 opel  | germany|  7000 |0 |0
 opel  || 14000 |0 |1
 opel  || 14000 |0 |1
 skoda | czech rep. | 1 |0 |0
 skoda | germany|  5000 |0 |0
 skoda || 15000 |0 |1
 skoda || 15000 |0 |1
   || 30100 |1 |1
   | czech rep. | 17100 |1 |0
   | germany| 13000 |1 |0
(15 rows)
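
If I read the EXPLAIN above correctly, the duplicates come from the expanded
list GROUPING SETS ((name, place), (name), (name)); writing the deduplicated
sets by hand gives one row per group, while automatic deduplication would be
the GROUP BY DISTINCT feature (T434) that the patch notes as not provided:

    SELECT name, place, sum(count), grouping(name), grouping(place)
    FROM cars
    GROUP BY GROUPING SETS ((name, place), (name));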

Fantastic work

Regards

Pavel




2014-08-25 7:21 GMT+02:00 Andrew Gierth and...@tao11.riddles.org.uk:

 Here is the new version of our grouping sets patch. This version
 supersedes the previous post.

 We believe the functionality of this version to be substantially
 complete, providing all the standard grouping set features except T434
 (GROUP BY DISTINCT).  (Additional tweaks, such as extra variants on
 GROUPING(), could be added for compatibility with other databases.)

 Since the debate regarding reserved keywords has not produced any
 useful answer, the main patch here makes CUBE and ROLLUP into
 col_name_reserved keywords, but a separate small patch is attached to
 make them unreserved_keywords instead.

 So there are now 5 files:

 gsp1.patch - phase 1 code patch (full syntax, limited
 functionality)
 gsp2.patch - phase 2 code patch (adds full functionality using the
  new chained aggregate mechanism)
 gsp-doc.patch  - docs
 gsp-contrib.patch  - quote cube in contrib/cube and
 contrib/earthdistance,
  intended primarily for testing pending a decision on
  renaming contrib/cube or unreserving keywords
 gsp-u.patch- proposed method to unreserve CUBE and ROLLUP

 --
 Andrew (irc:RhodiumToad)







Re: [HACKERS] Is this a bug?

2014-08-25 Thread Bruce Momjian
On Fri, Aug 22, 2014 at 10:04:50PM -0400, Bruce Momjian wrote:
 On Fri, Aug 22, 2014 at 03:12:47PM -0400, Robert Haas wrote:
  On Fri, Aug 22, 2014 at 2:33 PM, Bruce Momjian br...@momjian.us wrote:
   Yes, you remember well.  I will have to find a different way for
   pg_upgrade to call a no-op ALTER TABLE, which is fine.
  
   Looking at the ALTER TABLE options, I am going to put this check in a
   !IsBinaryUpgrade block so pg_upgrade can still use its trick.
  
  -1, that's really ugly.
  
  Maybe the right solution is to add a form of ALTER TABLE that is
  specifically defined to do only this check.  This is an ongoing need,
  so that might not be out of line.
 
 Ah, seems ALTER TABLE ... DROP CONSTRAINT IF EXISTS also works --- I
 will use that.

OK, attached patch applied, with pg_upgrade adjustments.  I didn't
think the original regression tests for this were necessary.
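
With the patch, a mistyped RESET now fails cleanly instead of being silently
accepted, along these lines (table and option names hypothetical):

    ALTER TABLE some_table RESET (no_such_option);
    ERROR:  unrecognized parameter "no_such_option"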

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/contrib/pg_upgrade/dump.c b/contrib/pg_upgrade/dump.c
new file mode 100644
index e623a22..29a68c0
*** a/contrib/pg_upgrade/dump.c
--- b/contrib/pg_upgrade/dump.c
*** optionally_create_toast_tables(void)
*** 115,120 
--- 115,124 
  c.relkind IN ('r', 'm') AND 
  c.reltoastrelid = 0);
  
+ 		/* Suppress NOTICE output from non-existent constraints */
+ 		PQclear(executeQueryOrDie(conn, "SET client_min_messages = warning;"));
+ 		PQclear(executeQueryOrDie(conn, "SET log_min_messages = warning;"));
+ 
  		ntups = PQntuples(res);
  		i_nspname = PQfnumber(res, nspname);
  		i_relname = PQfnumber(res, relname);
*** optionally_create_toast_tables(void)
*** 125,137 
  	OPTIONALLY_CREATE_TOAST_OID));
  
  			/* dummy command that also triggers check for required TOAST table */
! 			PQclear(executeQueryOrDie(conn, "ALTER TABLE %s.%s RESET (binary_upgrade_dummy_option);",
  	quote_identifier(PQgetvalue(res, rowno, i_nspname)),
  	quote_identifier(PQgetvalue(res, rowno, i_relname;
  		}
  
  		PQclear(res);
  
  		PQfinish(conn);
  	}
  
--- 129,144 
  	OPTIONALLY_CREATE_TOAST_OID));
  
  			/* dummy command that also triggers check for required TOAST table */
! 			PQclear(executeQueryOrDie(conn, "ALTER TABLE %s.%s DROP CONSTRAINT IF EXISTS binary_upgrade_dummy_constraint;",
  	quote_identifier(PQgetvalue(res, rowno, i_nspname)),
  	quote_identifier(PQgetvalue(res, rowno, i_relname;
  		}
  
  		PQclear(res);
  
+ 		PQclear(executeQueryOrDie(conn, "RESET client_min_messages;"));
+ 		PQclear(executeQueryOrDie(conn, "RESET log_min_messages;"));
+ 
  		PQfinish(conn);
  	}
  
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
new file mode 100644
index e0b81b9..97a4e22
*** a/src/backend/access/common/reloptions.c
--- b/src/backend/access/common/reloptions.c
*** static void initialize_reloptions(void);
*** 307,312 
--- 307,314 
  static void parse_one_reloption(relopt_value *option, char *text_str,
  	int text_len, bool validate);
  
+ static bool is_valid_reloption(char *name);
+ 
  /*
   * initialize_reloptions
   *		initialization routine, must be called before parsing
*** initialize_reloptions(void)
*** 382,387 
--- 384,408 
  }
  
  /*
+  * is_valid_reloption
+  *		check if a reloption exists
+  *
+  */
+ static bool
+ is_valid_reloption(char *name)
+ {
+ 	int i;
+ 
+ 	for (i = 0; relOpts[i]; i++)
+ 	{
+ 		if (pg_strcasecmp(relOpts[i]->name, name) == 0)
+ 			return true;
+ 	}
+ 
+ 	return false;
+ }
+ 
+ /*
   * add_reloption_kind
   *		Create a new relopt_kind value, to be used in custom reloptions by
   *		user-defined AMs.
*** transformRelOptions(Datum oldOptions, Li
*** 672,677 
--- 693,703 
  
  		if (isReset)
  		{
+ 			if (!is_valid_reloption(def->defname))
+ 				ereport(ERROR,
+ 						(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ 						 errmsg("unrecognized parameter \"%s\"", def->defname)));
+ 
  			if (def->arg != NULL)
  ereport(ERROR,
  		(errcode(ERRCODE_SYNTAX_ERROR),



Re: [HACKERS] SKIP LOCKED DATA (work in progress)

2014-08-25 Thread Thomas Munro
On 25 August 2014 02:57, Alvaro Herrera alvhe...@2ndquadrant.com wrote:
 Thomas Munro wrote:
 The difficulty of course will be testing all these racy cases reproducibly...

 Does this help?
 http://www.postgresql.org/message-id/51fb4305.3070...@2ndquadrant.com
 The useful trick there is forcing a query to get its snapshot and then
 go to sleep before actually doing anything, by way of an advisory lock.

Yes it does, thanks Alvaro and Craig.  I think the attached spec
reproduces the problem using that trick, ie shows NOWAIT blocking,
presumably in EvalPlanQualFetch (though I haven't stepped through it
with a debugger yet).  I'm afraid I'm out of Postgres hacking cycles
for a few days, but next weekend I should have a new patch that fixes
this by teaching EvalPlanQualFetch about wait policies, with isolation
tests for NOWAIT and SKIP LOCKED.

Best regards,
Thomas Munro
# Test NOWAIT with an updated tuple chain.

setup
{
  CREATE TABLE foo (
	id int PRIMARY KEY,
	data text NOT NULL
  );
  INSERT INTO foo VALUES (1, 'x');
}

teardown
{
  DROP TABLE foo;
}

session s1
setup		{ BEGIN; }
step s1a	{ SELECT * FROM foo WHERE pg_advisory_lock(0) IS NOT NULL FOR UPDATE NOWAIT; }
step s1b	{ COMMIT; }

session s2
step s2a	{ SELECT pg_advisory_lock(0); }
step s2b	{ UPDATE foo SET data = data; }
step s2c	{ BEGIN; }
step s2d	{ UPDATE foo SET data = data; }
step s2e	{ SELECT pg_advisory_unlock(0); }
step s2f	{ COMMIT; }

permutation s2a s1a s2b s2c s2d s2e s1b s2f


Re: [HACKERS] Hardening pg_upgrade

2014-08-25 Thread Bruce Momjian
On Mon, Aug 25, 2014 at 02:15:18PM -0400, Bruce Momjian wrote:
 On Mon, Aug 25, 2014 at 06:34:12AM -0700, Kevin Grittner wrote:
  Bernd Helmle maili...@oopsware.de wrote:
   Magnus Hagander mag...@hagander.net wrote:
  
   I vote for discarding 8.3 support in pg_upgrade.  There are already
   enough limitations on pg_upgrade from pre-8.4 to make it of questionable
   value; if it's going to create problems like this, it's time to cut the
   rope.
  
   +1. 8.3 has been unsupported for a fairly long time now, and you can
   still do a two-step upgrade if you're on that old a version.
  
   Also +1 from my side. I've seen some old 8.3 installations at customers,
   still, but they aren't large and can easily be upgraded with a two step
   upgrade.
  
  +1
  
  If we could leave it without it being any extra work, fine; but
  once a release is over a year out of support, if it's a matter of
  putting extra work on the pg hackers or on the users who have
  chosen to wait more than a year after support ends to do the
  upgrade, I'm OK with asking those users to do a two-phase upgrade
  or fall back to pg_dump.  It's not like we're leaving them without 
  any options.
 
 OK, I will move in the direction of removing 8.3 support and use a
 single query to pull schema information.  I was hesitant to remove 8.3
 support as I know we have kept pg_dump support all the way back to 7.0,
 but it seems pg_upgrade need not have the same version requirements.

I will modify pg_upgrade in three steps:

o  remove 8.3 support
o  use a single CTE rather than use a temp table
o  harden the backend to prevent automatic oid assignment

This is all for 9.5.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-25 Thread Josh Berkus
On 08/25/2014 01:23 PM, Fabien COELHO wrote:
 
 Hello pgdevs,
 
 I've been playing with pg for some time now to try to reduce the maximum
 latency of simple requests, to have a responsive server under small to
 medium load.
 
 On an old computer with a software RAID5 HDD attached, pgbench simple
 update script run for a some time (scale 100, fillfactor 95)
 
 pgbench -M prepared -N -c 2 -T 500 -P 1 ...
 
 gives 300 tps. However this performance is really +1000 tps for a few
 seconds followed by 16 seconds at about 0 tps for the checkpoint induced
 IO storm. The server is totally unresponsive 75% of the time. That's
 bandwidth optimization for you. Hmmm... why not.

So I think that you're confusing the roles of bgwriter vs. spread
checkpoint.  What you're experiencing above is pretty common for
nonspread checkpoints on slow storage (and RAID5 is slow for DB updates,
no matter how fast the disks are), or for attempts to do spread
checkpoint on filesystems which don't support it (e.g. Ext3, HFS+).  In
either case, what's happening is that the *OS* is freezing all logical
and physical IO while it works to write out all of RAM, which makes me
suspect you're using Ext3 or HFS+.

Making the bgwriter more aggressive adds a significant risk of writing
the same pages multiple times between checkpoints, so it's not a simple fix.
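
For concreteness, the spread-checkpoint knob in question can be
inspected and adjusted from SQL. A minimal sketch, assuming 9.4's ALTER
SYSTEM is available (on older servers, edit postgresql.conf); the value
is an illustration, not a tuning recommendation:

    SHOW checkpoint_completion_target;                  -- default is 0.5
    ALTER SYSTEM SET checkpoint_completion_target = 0.9;
    SELECT pg_reload_conf();                            -- no restart needed

None of this helps on filesystems that defeat spreading, per the above.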

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-25 Thread Andres Freund
Hi,

On 2014-08-25 22:23:40 +0200, Fabien COELHO wrote:
 seconds followed by 16 seconds at about 0 tps for the checkpoint induced IO
 storm. The server is totally unresponsive 75% of the time. That's bandwidth
 optimization for you. Hmmm... why not.
 
 Now, given this setup, if pgbench is throttled at 50 tps (1/6 the above
 max):
 
   pgbench -M prepared -N -c 2 -R 50.0 -T 500 -P 1 ...
 
 The same thing more or less happens in a delayed fashion... You get 50 tps
 for some time, followed by sections of 15 seconds at 0 tps for the
 checkpoint when the segments are full... the server is unresponsive about
 10% of the time (one in ten transaction is late by more than 200 ms).

That's ext4 I guess? Did you check whether xfs yields a, err, more
predictable performance?

 It is not satisfying, pg should be able to handle that load easily.
 
 The culprit I found is bgwriter, which is basically doing nothing to
 prevent the coming checkpoint IO storm, even though there would be ample
 time to write the accumulating dirty pages so that checkpoint would find a
 clean field and pass in a blink.

While I agree that the current bgwriter implementation is far from good,
note that this isn't the bgwriter's job. Its goal is to avoid backends
from having to write out buffers themselves. I.e. that there are clean
victim buffers when shared_buffers < working set.

Note that it would *not* be a good idea to make the bgwriter write out
everything, as much as possible - that'd turn sequential write io into
random write io.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add CREATE support to event triggers

2014-08-25 Thread Michael Paquier
On Tue, Aug 26, 2014 at 5:33 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 Michael Paquier wrote:
 And 4.  Yes, they are.  I wanted to get trivial stuff out of the way
 while I had some other trivial patch at hand.  I'm dealing with another
 patch from the commitfest now, so I'm not posting a rebased version
 right away, apologies.
No problems. I imagine that most of the patches still apply.

 How do people like this patch series?  It would be much easier for me to
 submit a single patch, but I feel handing it in little pieces makes it
 easier for reviewers.
Well, I like the patch series for what it's worth, as long as you can
submit it as such. That's far easier to test and certainly helps in
spotting issues when kicking different code paths.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Add .NOTPARALLEL to contrib/Makefile

2014-08-25 Thread Andres Freund
Hi,

Currently running make -j16 all check in contrib/ results in a mess because
all pg_regress invocations fight over the same port. Adding a simple
.NOTPARALLEL: check-%-recurse
into contrib/Makefile fixes that. Do we want that?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add .NOTPARALLEL to contrib/Makefile

2014-08-25 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 Currently running make -j16 all check in contrib/ results in a mess because
 all pg_regress invocations fight over the same port. Adding a simple
 .NOTPARALLEL: check-%-recurse
 into contrib/Makefile fixes that. Do we want that?

Dunno, but if we do, it should be applied to installcheck as well.
(In that case the fight is over the contrib_regression database.)

A larger point is that you shouldn't really fix this only for contrib.
What about tests run in src/pl, or contrib vs the main regression tests
vs src/pl vs test/isolationtester, etc etc.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Final Patch for GROUPING SETS

2014-08-25 Thread Andrew Gierth
 Pavel == Pavel Stehule pavel.steh...@gmail.com writes:

 Pavel Hi
 Pavel I checked this patch, and it working very well

 Pavel I found only two issue - I am not sure if it is issue

 Pavel It duplicate rows

 Pavel postgres=# explain select name, place, sum(count), grouping(name),
 Pavel grouping(place) from cars group by rollup(name, place), name;
 Pavel    QUERY PLAN
 Pavel ----------------------------------------------------------
 Pavel  GroupAggregate  (cost=101.14..101.38 rows=18 width=68)
 Pavel    Grouping Sets: (name, place), (name), (name)

I think I can safely claim from the spec that our version is correct.
Following the syntactic transformations given in 7.9 <group by clause>
of sql2008, we have:

GROUP BY rollup(name,place), name;

parses as  GROUP BY rollup list, ordinary grouping set

Syntax rule 13 replaces the rollup list giving:

GROUP BY GROUPING SETS ((name,place), (name), ()), name;

Syntax rule 16b gives:

GROUP BY GROUPING SETS ((name,place), (name), ()), GROUPING SETS (name);

Syntax rule 16c takes the cartesian product of the two sets:

GROUP BY GROUPING SETS ((name,place,name), (name,name), (name));

Syntax rule 17 gives:

SELECT ... GROUP BY name,place,name
UNION ALL
SELECT ... GROUP BY name,name
UNION ALL
SELECT ... GROUP BY name

Obviously at this point the extra name columns become redundant so
we eliminate them (this doesn't correspond to a spec rule, but doesn't
change the semantics). So we're left with:

SELECT ... GROUP BY name,place
UNION ALL
SELECT ... GROUP BY name
UNION ALL
SELECT ... GROUP BY name

Running a quick test on sqlfiddle with Oracle 11 suggests that Oracle's
behavior agrees with my interpretation.

Nothing in the spec that I can find licenses the elimination of
duplicate grouping sets except indirectly via feature T434 (GROUP BY
DISTINCT ...), which we did not attempt to implement.
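
To make the duplication concrete, the expansion can be written out as
runnable SQL against Pavel's cars table (column types assumed here,
since the original definition wasn't posted):

    CREATE TABLE cars (name text, place text, count int);

    -- The query under discussion...
    SELECT name, place, sum(count)
    FROM cars
    GROUP BY ROLLUP (name, place), name;

    -- ...is, per the derivation above, equivalent to:
    SELECT name, place, sum(count) FROM cars GROUP BY name, place
    UNION ALL
    SELECT name, NULL, sum(count) FROM cars GROUP BY name
    UNION ALL
    SELECT name, NULL, sum(count) FROM cars GROUP BY name;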

-- 
Andrew (irc:RhodiumToad)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] What in the world is happening with castoroides and protosciurus?

2014-08-25 Thread Tom Lane
For the last month or so, these two buildfarm animals (which I believe are
the same physical machine) have been erratically failing with errors that
reflect low-order differences in floating-point calculations.

A recent example is at

http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=protosciurus&dt=2014-08-25%2010%3A39%3A52

where the only regression diff is

*** 
/export/home/dpage/pgbuildfarm/protosciurus/HEAD/pgsql.22860/src/test/regress/expected/hash_index.out
   Mon Aug 25 11:41:00 2014
--- 
/export/home/dpage/pgbuildfarm/protosciurus/HEAD/pgsql.22860/src/test/regress/results/hash_index.out
Mon Aug 25 11:57:26 2014
***
*** 171,179 
  SELECT h.seqno AS i8096, h.random AS f1234_1234
 FROM hash_f8_heap h
 WHERE h.random = '-1234.1234'::float8;
!  i8096 | f1234_1234 
! ---+
!   8906 | -1234.1234
  (1 row)
  
  UPDATE hash_f8_heap
--- 171,179 
  SELECT h.seqno AS i8096, h.random AS f1234_1234
 FROM hash_f8_heap h
 WHERE h.random = '-1234.1234'::float8;
!  i8096 |f1234_1234 
! ---+---
!   8906 | -1234.12356777216
  (1 row)
  
  UPDATE hash_f8_heap

... a result that certainly makes no sense.  The results are not
repeatable, failing in equally odd ways in different tests on different
runs.  This is happening in all the back branches too, not just HEAD.

Has there been a system software update on this machine a month or so ago?
If not, it's hard to think anything except that the floating point
hardware on this box has developed problems.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add .NOTPARALLEL to contrib/Makefile

2014-08-25 Thread Andres Freund
On 2014-08-25 20:16:50 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  Currently running make -j16 all check in contrib/ results in a mess 
  because
  all pg_regress invocations fight over the same port. Adding a simple
  .NOTPARALLEL: check-%-recurse
  into contrib/Makefile fixes that. Do we want that?
 
 Dunno, but if we do, it should be applied to installcheck as well.
 (In that case the fight is over the contrib_regression database.)

Right. Although you can mostly fight it there using USE_MODULE_DB.

The attached patch replaces all hardcoded occurrences of
'contrib_regression' with current_database(). Allowing a make -j32 -s
installcheck in contrib to succeed in less than 4 seconds...

That's not particularly pretty, especially in the CREATE SERVER calls
(via DO ... EXECUTE), but imo worth it given the time savings.

What's your thought on that one?

 A larger point is that you shouldn't really fix this only for contrib.
 What about tests run in src/pl, or contrib vs the main regression tests
 vs src/pl vs test/isolationtester, etc etc.

Unfortunately I don't think you can make .NOTPARALLEL work across more
than one directory when using recursive make :(. At least I don't know
how.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From e81c42ed9eb640dab874bf5778e5e71b485633ba Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Tue, 26 Aug 2014 02:54:53 +0200
Subject: [PATCH] Don't hardcode contrib_regression dbname in postgres_fdw and
 dblink tests.

That allows parallel make installcheck to succeed inside contrib/. The
output is not particularly pretty, but it's fast.
---
 contrib/dblink/Makefile|  3 --
 contrib/dblink/expected/dblink.out | 40 ++---
 contrib/dblink/sql/dblink.sql  | 41 +++---
 contrib/postgres_fdw/Makefile  |  3 --
 contrib/postgres_fdw/expected/postgres_fdw.out |  8 +++--
 contrib/postgres_fdw/sql/postgres_fdw.sql  |  8 +++--
 6 files changed, 57 insertions(+), 46 deletions(-)

diff --git a/contrib/dblink/Makefile b/contrib/dblink/Makefile
index e5d0cd6..f5ab164 100644
--- a/contrib/dblink/Makefile
+++ b/contrib/dblink/Makefile
@@ -14,9 +14,6 @@ REGRESS = paths dblink
 REGRESS_OPTS = --dlpath=$(top_builddir)/src/test/regress
 EXTRA_CLEAN = sql/paths.sql expected/paths.out
 
-# the db name is hard-coded in the tests
-override USE_MODULE_DB =
-
 ifdef USE_PGXS
 PG_CONFIG = pg_config
 PGXS := $(shell $(PG_CONFIG) --pgxs)
diff --git a/contrib/dblink/expected/dblink.out b/contrib/dblink/expected/dblink.out
index eee31a2..0dc4838 100644
--- a/contrib/dblink/expected/dblink.out
+++ b/contrib/dblink/expected/dblink.out
@@ -90,7 +90,7 @@ SELECT dblink_build_sql_delete('MySchema.Foo','1 2',2,'{0, a}');
 
 -- regular old dblink
 SELECT *
-FROM dblink('dbname=contrib_regression','SELECT * FROM foo') AS t(a int, b text, c text[])
+FROM dblink('dbname='||current_database(),'SELECT * FROM foo') AS t(a int, b text, c text[])
WHERE t.a > 7;
  a | b | c  
 ---+---+
@@ -116,9 +116,9 @@ DECLARE
 	detail text;
 BEGIN
 	PERFORM wait_pid(crash_pid)
-	FROM dblink('dbname=contrib_regression', $$
+	FROM dblink('dbname='||current_database(), $$
 		SELECT pg_backend_pid() FROM dblink(
-			'service=test_ldap dbname=contrib_regression',
+			'service=test_ldap dbname='||current_database(),
 			-- This string concatenation is a hack to shoehorn a
 			-- set_pgservicefile call into the SQL statement.
 			'SELECT 1' || set_pgservicefile('pg_service.conf')
@@ -131,7 +131,7 @@ EXCEPTION WHEN OTHERS THEN
 END
 $pl$;
 -- create a persistent connection
-SELECT dblink_connect('dbname=contrib_regression');
+SELECT dblink_connect('dbname='||current_database());
  dblink_connect 
 
  OK
@@ -267,14 +267,14 @@ WHERE t.a > 7;
 ERROR:  connection not available
 -- put more data into our slave table, first using arbitrary connection syntax
 -- but truncate the actual return value so we can use diff to check for success
-SELECT substr(dblink_exec('dbname=contrib_regression','INSERT INTO foo VALUES(10,''k'',''{a10,b10,c10}'')'),1,6);
+SELECT substr(dblink_exec('dbname='||current_database(),'INSERT INTO foo VALUES(10,''k'',''{a10,b10,c10}'')'),1,6);
  substr 
 
  INSERT
 (1 row)
 
 -- create a persistent connection
-SELECT dblink_connect('dbname=contrib_regression');
+SELECT dblink_connect('dbname='||current_database());
  dblink_connect 
 
  OK
@@ -374,7 +374,7 @@ ERROR:  could not establish connection
DETAIL:  missing "=" after "myconn" in connection info string
 
 -- create a named persistent connection
-SELECT dblink_connect('myconn','dbname=contrib_regression');
+SELECT dblink_connect('myconn','dbname='||current_database());
  dblink_connect 
 
  OK
@@ -403,10 +403,10 @@ CONTEXT:  Error occurred on 

Re: [HACKERS] improving speed of make check-world

2014-08-25 Thread Peter Eisentraut
On 8/25/14 1:32 PM, Heikki Linnakangas wrote:
 The new EXTRA_INSTALL makefile variable ought to be documented in
 extend.sgml, where we list REGRESS_OPTS and others.

But EXTRA_INSTALL is only of use inside the main source tree, not by
extensions.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-25 Thread Jeff Janes
On Monday, August 25, 2014, Fabien COELHO coe...@cri.ensmp.fr wrote:



 The culprit I found is bgwriter, which is basically doing nothing to
 prevent the coming checkpoint IO storm, even though there would be ample
 time to write the accumulating dirty pages so that checkpoint would find a
 clean field and pass in a blink. Indeed, at the end of the 500 seconds
 throttled test, pg_stat_bgwriter says:


Are you doing pg_stat_reset_shared('bgwriter') after running pgbench -i?
 You don't want your steady state stats polluted by the bulk load.



   buffers_checkpoint = 19046
   buffers_clean = 2995


Out of curiosity, what does buffers_backend show?

In any event, this almost certainly is a red herring.  Whichever of the
three ways is being used to write out the buffers, it is the checkpointer
that is responsible for fsyncing them, and that is where your drama is
almost certainly occurring. Writing out via one path rather than another
isn't going to change things, unless you change the fsync.

Also, are you familiar with checkpoint_completion_target, and what is it
set to?
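
For reference, all of those counters sit in one view, so a before/after
snapshot is a single query (a sketch; these column names are as of 9.4):

    SELECT buffers_checkpoint,     -- pages written by the checkpointer
           buffers_clean,          -- pages written by the bgwriter
           buffers_backend,        -- pages backends had to write themselves
           buffers_backend_fsync,  -- fsyncs backends could not hand off
           maxwritten_clean        -- scans halted at bgwriter_lru_maxpages
    FROM pg_stat_bgwriter;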

Cheers,

Jeff


[HACKERS] Question about coding of free space map

2014-08-25 Thread Tatsuo Ishii
While looking into backend/storage/freespace/freespace.c, I noticed
that struct FSMAddress is passed to functions by value, rather than
by reference. I thought our coding practice is to define a pointer type
for a struct and use the pointer for parameter passing, etc.

typedef struct RelationData *Relation;

IMO it would be better for freespace.c to follow this practice.

Maybe this has been allowed because:

typedef struct
{
int level;  /* level */
int logpageno;  /* page number within 
the level */
} FSMAddress;

the struct size is 4+4=8 bytes, which is the same as a 64-bit pointer.
Still, I think it's better to use a pointer to the struct because
someday we may want to add a new member to the struct.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hardening pg_upgrade

2014-08-25 Thread Bruce Momjian
On Mon, Aug 25, 2014 at 06:34:38PM -0400, Bruce Momjian wrote:
  OK, I will move in the direction of removing 8.3 support and use a
  single query to pull schema information.  I was hesitant to remove 8.3
  support as I know we have kept pg_dump support all the way back to 7.0,
  but it seems pg_upgrade need not have the same version requirements.
 
 I will modify pg_upgrade in three steps:
 
 o  remove 8.3 support
 o  use a single CTE rather than use a temp table
 o  harden the backend to prevent automatic oid assignment
 
 This is all for 9.5.

Done.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Misaligned BufferDescriptors causing major performance problems on AMD

2014-08-25 Thread Bruce Momjian
On Thu, Apr 17, 2014 at 11:23:24AM +0200, Andres Freund wrote:
 On 2014-04-16 19:18:02 -0400, Bruce Momjian wrote:
  On Thu, Feb  6, 2014 at 09:40:32AM +0100, Andres Freund wrote:
   On 2014-02-05 12:36:42 -0500, Robert Haas wrote:
 It may well be that your proposal is spot on.  But I'd like to see 
 some
 data-structure-by-data-structure measurements, rather than assuming 
 that
 alignment must be a good thing.

 I am fine with just aligning BufferDescriptors properly. That has
 clearly shown massive improvements.

I thought your previous idea of increasing BUFFERALIGN to 64 bytes had
a lot to recommend it.
   
   Good.
   
   I wonder if we shouldn't move that bit of logic:
  if (size >= BUFSIZ)
 newStart = BUFFERALIGN(newStart);
   out of ShmemAlloc() and instead have a ShmemAllocAligned() and
    ShmemInitStructAligned() that does it. So we can sensibly control it
   per struct.
   
But that doesn't mean it doesn't need testing.
   
   I feel the need here, to say that I never said it doesn't need testing
   and never thought it didn't...
  
  Where are we on this?
 
 It needs somebody with time to evaluate possible performance regressions
 - I personally won't have time to look into this in detail before pgcon.

Again, has anyone made any headway on this?  Is it a TODO item?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2014-08-25 Thread Etsuro Fujita

(2014/08/25 21:58), Albe Laurenz wrote:

Here is my review:


Thank you for the review!


I played with it, and apart from Hanada's comments I have found the following:

test=> EXPLAIN (ANALYZE, VERBOSE) UPDATE rtest SET val=NULL WHERE id < 3;
                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
 Update on laurenz.rtest  (cost=100.00..14134.40 rows=299970 width=10) (actual time=0.005..0.005 rows=0 loops=1)
   ->  Foreign Scan on laurenz.rtest  (cost=100.00..14134.40 rows=299970 width=10) (actual time=0.002..0.002 rows=27 loops=1)
         Output: id, val, ctid
         Remote SQL: UPDATE laurenz.test SET val = NULL::text WHERE ((id < 3))
 Planning time: 0.179 ms
 Execution time: 3706.919 ms
(6 rows)

Time: 3708.272 ms

The actual time readings are surprising.
Shouldn't these be similar to the actual execution time, since most of
the time is spent in the foreign scan node?


I was also thinking that this is confusing to the users.  I think this is 
because the patch executes the UPDATE/DELETE statement during 
postgresBeginForeignScan, not postgresIterateForeignScan, as you 
mentioned below:



Reading the code, I noticed that the pushed-down UPDATE or DELETE
statement is executed during postgresBeginForeignScan rather than during
postgresIterateForeignScan. It probably does not matter, but is there a
reason to do it differently from a normal scan?


I'll modify the patch so as to execute the statement during 
postgresIterateForeignScan.



It is not expected that postgresReScanForeignScan is called when the 
UPDATE/DELETE
is pushed down, right?  Maybe it would make sense to add an assertion for that.


IIUC, that is right.  As ModifyTable doesn't support rescan currently, 
postgresReScanForeignScan needn't be called in the update pushdown 
case.  The assertion is a good idea.  I'll add it.


Thanks,

Best regards,
Etsuro Fujita


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Concurrently option for reindexdb

2014-08-25 Thread Fujii Masao
On Tue, Aug 26, 2014 at 5:46 AM, Andres Freund and...@anarazel.de wrote:
 On August 25, 2014 10:35:20 PM CEST, Alvaro Herrera 
 alvhe...@2ndquadrant.com wrote:
Michael Paquier wrote:
 On Tue, Aug 26, 2014 at 3:48 AM, Fujii Masao masao.fu...@gmail.com
wrote:
  On Mon, Aug 25, 2014 at 4:33 PM, Sawada Masahiko
sawada.m...@gmail.com wrote:
  it might be difficult to call this --concurrently.
  The name might need to be changed.
 
  I'm OK with calling it --concurrently if the documentation clearly
  explains that restriction. Or --almost-concurrently? ;P
 Reading that, I am also thinking about a wording involving lock,
 like --minimum-locks.

Why not just finish up the REINDEX CONCURRENTLY patch.

+1

 +many. Although I'm not sure if we managed to find a safe relation swap.

That safe relation swap is possible if an AccessExclusive lock is taken. Right?
That means that REINDEX CONCURRENTLY is not completely concurrent, but
I think that many users would be satisfied even with this feature.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Concurrently option for reindexdb

2014-08-25 Thread Michael Paquier
On Tue, Aug 26, 2014 at 12:28 PM, Fujii Masao masao.fu...@gmail.com wrote:
 +many. Although I'm not sure if we managed to find a safe relation swap.

Well, we didn't AFAIK. With the latest patch provided I could not
really find any hole in the logic, and Andres felt that something might
be wrong miles away. If I revisited the patch now with a rebased
version, maybe I'd find something...

 That safe relation swap is possible if an AccessExclusive lock is taken. 
 Right?
 That means that REINDEX CONCURRENTLY is not completely-concurrently, but
 I think that many users are satisfied with even this feature.

This would also block isolation tests on this feature, something
not that welcome for a feature calling itself concurrently, but it
would greatly simplify the patch and reduce deadlock occurrences if
done right with the exclusive locks (no need to check for past
snapshots, as is necessary when using ShareUpdateExclusiveLock?).

I left notes on the wiki the status of this patch:
https://wiki.postgresql.org/wiki/Reindex_concurrently

Reading this thread, the consensus would be to use an exclusive lock
for swap and be done. Well if there are enough votes for this approach
I wouldn't mind resending an updated patch for the next CF.
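
(For anyone weighing that, the exclusive-lock variant boils down to
roughly this sequence in plain SQL -- table and index names hypothetical:

    CREATE INDEX CONCURRENTLY foo_idx_new ON foo (bar);

    BEGIN;
    LOCK TABLE foo IN ACCESS EXCLUSIVE MODE;   -- the contested window
    ALTER INDEX foo_idx RENAME TO foo_idx_old;
    ALTER INDEX foo_idx_new RENAME TO foo_idx;
    COMMIT;

    DROP INDEX CONCURRENTLY foo_idx_old;

Everything but the rename window runs without blocking writers.)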

Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SSL renegotiation

2014-08-25 Thread Alvaro Herrera
Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2013-11-15 10:43:23 -0500, Tom Lane wrote:
  Another reason I'm not in a hurry is that the problem we're trying
  to solve doesn't seem to be causing real-world trouble.  So by
  awhile, I'm thinking let's let it get through 9.4 beta testing.
 
  Well, there have been a bunch of customer complaints about it, afair
  that's what made Alvaro look into it in the first place. So it's not a
  victimless bug.
 
 OK, then maybe end-of-beta is too long.  But how much testing will it get
 during development?  I know I never use SSL on development installs.
 How many hackers do?

Just a reminder that I intend to backpatch this (and subsequent fixes).
We've gone over two 9.4 betas now.  Maybe it'd be a good thing if the
beta3 announcement carried a note about enabling SSL with a low
ssl_renegotiation_limit setting.
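
Something along these lines in each SSL-enabled test session, assuming
the GUC is still settable per session (the value is illustrative; at the
default of 512MB, renegotiation effectively never fires):

    SET ssl_renegotiation_limit = '64kB';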

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql latency bgwriter not doing its job

2014-08-25 Thread Amit Kapila
On Tue, Aug 26, 2014 at 1:53 AM, Fabien COELHO coe...@cri.ensmp.fr wrote:


 Hello pgdevs,

 I've been playing with pg for some time now to try to reduce the maximum
latency of simple requests, to have a responsive server under small to
medium load.

 On an old computer with a software RAID5 HDD attached, pgbench simple
update script run for a some time (scale 100, fillfactor 95)

 pgbench -M prepared -N -c 2 -T 500 -P 1 ...

 gives 300 tps. However this performance is really +1000 tps for a few
seconds followed by 16 seconds at about 0 tps for the checkpoint induced IO
storm. The server is totally unresponsive 75% of the time. That's bandwidth
optimization for you. Hmmm... why not.

 Now, given this setup, if pgbench is throttled at 50 tps (1/6 the above
max):

 pgbench -M prepared -N -c 2 -R 50.0 -T 500 -P 1 ...

 The same thing more or less happens in a delayed fashion... You get 50
tps for some time, followed by sections of 15 seconds at 0 tps for the
checkpoint when the segments are full... the server is unresponsive about
10% of the time (one in ten transaction is late by more than 200 ms).

I think another thing to know here is why exactly the checkpoint
storm is causing tps to drop so steeply.  One reason could be
that backends might need to write more WAL due to full_page_writes,
another could be contention around the buffer content_lock.

To dig further into the reason, the same tests can be tried
with full_page_writes = off and/or
synchronous_commit = off to see if WAL writes are causing
tps to go down.

Similarly for checkpoints, use checkpoint_completion_target to
spread the checkpoint writes, as suggested by Jeff as well, to see
if that can mitigate the problem.
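
A sketch of those toggles, for a diagnostic run only since both weaken
crash safety (again via 9.4's ALTER SYSTEM; edit postgresql.conf on
older servers):

    ALTER SYSTEM SET full_page_writes = off;
    ALTER SYSTEM SET synchronous_commit = off;
    SELECT pg_reload_conf();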

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Final Patch for GROUPING SETS

2014-08-25 Thread Pavel Stehule
2014-08-26 2:45 GMT+02:00 Andrew Gierth and...@tao11.riddles.org.uk:

  Pavel == Pavel Stehule pavel.steh...@gmail.com writes:

  Pavel Hi
  Pavel I checked this patch, and it working very well

  Pavel I found only two issue - I am not sure if it is issue

  Pavel It duplicate rows

  Pavel postgres=# explain select name, place, sum(count), grouping(name),
  Pavel grouping(place) from cars group by rollup(name, place), name;
  Pavel    QUERY PLAN
  Pavel --------------------------------------------------------
  Pavel  GroupAggregate  (cost=101.14..101.38 rows=18 width=68)
  Pavel    Grouping Sets: (name, place), (name), (name)

 I think I can safely claim from the spec that our version is correct.
  Following the syntactic transformations given in 7.9 <group by clause>
  of sql2008, we have:

 GROUP BY rollup(name,place), name;

 parses as  GROUP BY rollup list, ordinary grouping set

 Syntax rule 13 replaces the rollup list giving:

 GROUP BY GROUPING SETS ((name,place), (name), ()), name;

 Syntax rule 16b gives:

 GROUP BY GROUPING SETS ((name,place), (name), ()), GROUPING SETS (name);

 Syntax rule 16c takes the cartesian product of the two sets:

 GROUP BY GROUPING SETS ((name,place,name), (name,name), (name));

 Syntax rule 17 gives:

 SELECT ... GROUP BY name,place,name
 UNION ALL
 SELECT ... GROUP BY name,name
 UNION ALL
 SELECT ... GROUP BY name

 Obviously at this point the extra name columns become redundant so
 we eliminate them (this doesn't correspond to a spec rule, but doesn't
 change the semantics). So we're left with:

 SELECT ... GROUP BY name,place
 UNION ALL
 SELECT ... GROUP BY name
 UNION ALL
 SELECT ... GROUP BY name

 Running a quick test on sqlfiddle with Oracle 11 suggests that Oracle's
 behavior agrees with my interpretation.

 Nothing in the spec that I can find licenses the elimination of
 duplicate grouping sets except indirectly via feature T434 (GROUP BY
 DISTINCT ...), which we did not attempt to implement.


OK, I'll try to search my memory for some indication of whether the
redundant columns should be reduced.

Regards

Pavel



 --
 Andrew (irc:RhodiumToad)



Re: [HACKERS] SSL renegotiation

2014-08-25 Thread Noah Misch
On Mon, Aug 25, 2014 at 11:46:13PM -0400, Alvaro Herrera wrote:
 Tom Lane wrote:
  OK, then maybe end-of-beta is too long.  But how much testing will it get
  during development?  I know I never use SSL on development installs.
  How many hackers do?
 
 Just a reminder that I intend to backpatch this (and subsequent fixes).
 We've gone over two 9.4 betas now.  Maybe it'd be a good thing if the
 beta3 announcement carried a note about enabling SSL with a low
 ssl_renegotiation_limit setting.

To elaborate on my private comments of 2013-10-11, I share Robert's
wariness[1] concerning the magic number of 1024 bytes of renegotiation
headroom.  Use of that number predates your work, but your work turned
exhaustion of that headroom into a FATAL error.  Situations where the figure
is too small will become disruptive, whereas the problem is nearly invisible
today.  Network congestion is a factor, so the lack of complaints during beta
is relatively uninformative.  Disabling renegotiation is a quick workaround,
fortunately, but needing to use that workaround will damage users' fragile
faith in the safety of our minor releases.

My recommendation is to either keep this 9.4-only or do focused load testing
to determine the actual worst-case headroom requirement.

[1] 
http://www.postgresql.org/message-id/ca+tgmozvgmyzlx7e4arq_5nu4uden7wrvg1xjxg_o9c61ch...@mail.gmail.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [GSoC2014] Patch ALTER TABLE ... SET LOGGED

2014-08-25 Thread Fabrízio de Royes Mello
On Fri, Aug 22, 2014 at 5:23 PM, Alvaro Herrera alvhe...@2ndquadrant.com
wrote:

 Fabrízio de Royes Mello wrote:
  On Fri, Aug 22, 2014 at 4:45 PM, Alvaro Herrera 
alvhe...@2ndquadrant.com
  wrote:

   I pointed out, in the email just before pushing the patch, that
perhaps
   we should pass down the new relpersistence flag into finish_heap_swap,
   and from there we could pass it down to reindex_index() which is where
   it would be needed.  I'm not sure it's worth the trouble, but I think
we
   can still ask Fabrizio to rework that part.

  I can rework this part if it's a real concern.

 I guess we can make a better assessment by seeing what it would take.
 I'm afraid it will turn out to be really ugly.

 Now, there's some desire to have unlogged indexes on logged tables; I
 guess if we have that, then eventually there will also be a desire to
 swap individual indexes from logged to unlogged.  Perhaps whatever fix
 we come up with here would pave the road for that future feature.


Álvaro,

I did a refactoring to pass down the relpersistence to finish_heap_swap
and then change the catalog inside reindex_index instead of in the
rewrite table phase.

That way we can get rid of the function ATChangeIndexesPersistence in the
src/backend/commands/tablecmds.c.

Thoughts?

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index ee10594..173f412 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1973,7 +1973,7 @@ index_build(Relation heapRelation,
 	 * created it, or truncated twice in a subsequent transaction, the
 	 * relfilenode won't change, and nothing needs to be done here.
 	 */
-	if (heapRelation->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
+	if (indexRelation->rd_rel->relpersistence == RELPERSISTENCE_UNLOGGED &&
		!smgrexists(indexRelation->rd_smgr, INIT_FORKNUM))
	{
		RegProcedure ambuildempty = indexRelation->rd_am->ambuildempty;
@@ -3099,7 +3099,7 @@ IndexGetRelation(Oid indexId, bool missing_ok)
  * reindex_index - This routine is used to recreate a single index
  */
 void
-reindex_index(Oid indexId, bool skip_constraint_checks)
+reindex_index(Oid indexId, bool skip_constraint_checks, char relpersistence)
 {
 	Relation	iRel,
 heapRelation;
@@ -3160,6 +3160,12 @@ reindex_index(Oid indexId, bool skip_constraint_checks)
			indexInfo->ii_ExclusionStrats = NULL;
 		}
 
+		/*
+		 * Check if we need to set the new relpersistence
+		 */
+		if (iRel->rd_rel->relpersistence != relpersistence)
+			iRel->rd_rel->relpersistence = relpersistence;
+
 		/* We'll build a new physical relation for the index */
 		RelationSetNewRelfilenode(iRel, InvalidTransactionId,
   InvalidMultiXactId);
@@ -3358,11 +3364,19 @@ reindex_relation(Oid relid, int flags)
 		foreach(indexId, indexIds)
 		{
 			Oid			indexOid = lfirst_oid(indexId);
+			char		relpersistence = rel->rd_rel->relpersistence;
 
 			if (is_pg_class)
				RelationSetIndexList(rel, doneIndexes, InvalidOid);
 
-			reindex_index(indexOid, !(flags & REINDEX_REL_CHECK_CONSTRAINTS));
+			/* Check for flags to force UNLOGGED or PERMANENT persistence */
+			if (flags & REINDEX_REL_FORCE_UNLOGGED)
+				relpersistence = RELPERSISTENCE_UNLOGGED;
+			else if (flags & REINDEX_REL_FORCE_PERMANENT)
+				relpersistence = RELPERSISTENCE_PERMANENT;
+
+			reindex_index(indexOid, !(flags & REINDEX_REL_CHECK_CONSTRAINTS),
+						  relpersistence);
 
 			CommandCounterIncrement();
 
diff --git a/src/backend/commands/cluster.c b/src/backend/commands/cluster.c
index ff80b09..f285026 100644
--- a/src/backend/commands/cluster.c
+++ b/src/backend/commands/cluster.c
@@ -588,7 +588,8 @@ rebuild_relation(Relation OldHeap, Oid indexOid, bool verbose)
 	 */
 	finish_heap_swap(tableOid, OIDNewHeap, is_system_catalog,
 	 swap_toast_by_content, false, true,
-	 frozenXid, cutoffMulti);
+	 frozenXid, cutoffMulti,
+	 OldHeap->rd_rel->relpersistence);
 }
 
 
@@ -1474,7 +1475,8 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
  bool check_constraints,
  bool is_internal,
  TransactionId frozenXid,
- MultiXactId cutoffMulti)
+ MultiXactId cutoffMulti,
+ char newrelpersistence)
 {
 	ObjectAddress object;
 	Oid			mapped_tables[4];
@@ -1518,6 +1520,12 @@ finish_heap_swap(Oid OIDOldHeap, Oid OIDNewHeap,
 	reindex_flags = REINDEX_REL_SUPPRESS_INDEX_USE;
 	if (check_constraints)
 		reindex_flags |= REINDEX_REL_CHECK_CONSTRAINTS;
+
+	if (newrelpersistence == RELPERSISTENCE_UNLOGGED)
+		reindex_flags |= REINDEX_REL_FORCE_UNLOGGED;
+	else if (newrelpersistence == RELPERSISTENCE_PERMANENT)
+		reindex_flags |= REINDEX_REL_FORCE_PERMANENT;
+
 	reindex_relation(OIDOldHeap, reindex_flags);
 
 	/*
diff --git a/src/backend/commands/indexcmds.c