Re: [HACKERS] Proposal: Log inability to lock pages during vacuum

2014-11-11 Thread Andres Freund
On 2014-11-10 19:36:18 -0600, Jim Nasby wrote:
 On 11/10/14, 12:56 PM, Andres Freund wrote:
 On 2014-11-10 12:37:29 -0600, Jim Nasby wrote:
 On 11/10/14, 12:15 PM, Andres Freund wrote:
 If what we want is to quantify the extent of the issue, would it be more
 convenient to save counters to pgstat?  Vacuum already sends pgstat
 messages, so there's no additional traffic there.
 I'm pretty strongly against that one in isolation. They'd need to be
 stored somewhere and they'd need to be queryable somewhere with enough
 context to make sense.  To actually make sense of the numbers we'd also
 need to report all the other datapoints of vacuum in some form. That's
 quite a worthwhile project imo - but *much* *much* more work than this.
 
 We already report statistics on vacuums
 (lazy_vacuum_rel()/pgstat_report_vacuum), so this would just be adding
 1 or 2 counters to that. Should we add the other counters from vacuum?
 That would be significantly more data.
 
 At the very least it'd require:
 * The number of buffers skipped due to the vm
 * The number of buffers actually scanned
 * The number of full-table vacuums in contrast to partial vacuums
 
 If we're going to track full scan vacuums separately, I think we'd
 need two separate scan counters.

Well, we already have the entire number of vacuums, so we'd have that.

 I think (for pgstats) it'd make more sense to just count initial
 failure to acquire the lock in a full scan in the 'skipped page'
 counter. In terms of answering the question how common is it not to
 get the lock, it's really the same event.

It's absolutely not. You need to correlate the number of skipped pages
to the number of vacuumed pages. If you have 100k skipped in 10 billion
total scanned pages it's something entirely different than 100k in 200k
pages.

 Honestly, my desire at this point is just to see if there's actually a
 problem. Many people are asserting that this should be a very rare
 occurrence, but there's no way to know.

Ok.

 Towards that simple end, I'm a bit torn. My preference would be to
 simply log, and throw a warning if it's over some threshold. I believe
 that would give the best odds of getting feedback from users if this
 isn't as uncommon as we think.

I'm strongly against a warning. We have absolutely no sane way of tuning
that. We'll just create a pointless warning that people will get
confused about and that they'll have to live with till the next release.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-11-11 Thread Michael Paquier
On Mon, Nov 10, 2014 at 5:26 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 I'll go through the patch once again a bit later, but feel free to comment.
Reading again the patch with a fresher mind, I am not sure if the
current approach taken is really the best one. What the patch does now
is looking at the header of the first backup block, and then
compresses the rest, aka the other blocks, up to 4, and their headers,
up to 3. I think that we should instead define an extra bool flag in
XLogRecord to determine if the record is compressed, and then use this
information. Attaching the compression status to XLogRecord is more
in-line with the fact that all the blocks are compressed, and not each
one individually, so we basically now duplicate an identical flag
value in all the backup block headers, which is a waste IMO.
Thoughts?
-- 
Michael




Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-11-11 Thread Greg Stark
On Tue, Nov 11, 2014 at 2:14 AM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 I'm not sure why you say opaque is not associated with the specific
 scan.  Are you thinking we could reuse opaque for a future scan?  I
 think we could consider that opaque *is* the place to cache things such
 as the hashed value of the qual constants or whatever.

Oh. I guess this goes back to my original suggestion that the API docs
need to explain some sense of when OpcInfo is called. I didn't realize
it was tied to a specific scan. This does raise the question of why
the scan information isn't available in OpcInfo though. That would let
me build the hash value in a natural place instead of having to do it
lazily which I find significantly more awkward.

Is it possible for scan keys to change between calls for nested loop
joins or quirky SQL with volatile functions in the scan or anything? I
guess that would prevent the index scan from being used at all. But I
can be reassured the Opcinfo call will be called again when a cached
plan is reexecuted? Stable functions might have new values in a
subsequent execution even if the plan hasn't changed at all for
example.


 That's to test the Union procedure; if you look at the code, it's just
 used in assert-enabled builds.  Now that I think about it, perhaps this
 can turn out to be problematic for your bloom filter opclass.  I
 considered the idea of allowing the opclass to disable this testing
 procedure, but it isn't done (yet.)

No, it isn't a problem for my opclass other than performance, it was
quite helpful in turning up bugs early in fact. It was just a bit
confusing because I was trying to test things one by one and it turned
out the assertion checks meant a simple insert turned up bugs in Union
which I hadn't expected. But it seems perfectly sensible in an
assertion check.

-- 
greg




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-11-11 Thread Amit Langote
On Tue, Nov 11, 2014 at 5:10 PM, Michael Paquier
michael.paqu...@gmail.com wrote:
 On Mon, Nov 10, 2014 at 5:26 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
 I'll go through the patch once again a bit later, but feel free to comment.
 Reading again the patch with a fresher mind, I am not sure if the
 current approach taken is really the best one. What the patch does now
 is looking at the header of the first backup block, and then
 compresses the rest, aka the other blocks, up to 4, and their headers,
 up to 3. I think that we should instead define an extra bool flag in
 XLogRecord to determine if the record is compressed, and then use this
 information. Attaching the compression status to XLogRecord is more
 in-line with the fact that all the blocks are compressed, and not each
 one individually, so we basically now duplicate an identical flag
 value in all the backup block headers, which is a waste IMO.
 Thoughts?

I think this was changed based on following, if I am not wrong.

http://www.postgresql.org/message-id/54297a45.8080...@vmware.com

Regards,
Amit




Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan

2014-11-11 Thread Simon Riggs
On 10 November 2014 15:57, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Oct 15, 2014 at 2:55 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Something usable, with severe restrictions, is actually better than we
 have now. I understand the journey this work represents, so don't be
 embarrassed by submitting things with heuristics and good-enoughs in
 it. Our mentor, Mr.Lane, achieved much by spreading work over many
 releases, leaving others to join in the task.

 It occurs to me that, now that the custom-scan stuff is committed, it
 wouldn't be that hard to use that, plus the other infrastructure we
 already have, to write a prototype of parallel sequential scan.

+1

 Given
 where we are with the infrastructure, there would be a number of
 unhandled problems, such as deadlock detection (needs group locking or
 similar), assessment of quals as to parallel-safety (needs
 proisparallel or similar), general waterproofing to make sure that
 pushing down a qual we shouldn't doesn't do anything really dastardly
 like crash the server (another written but yet-to-be-published patch
 adds a bunch of relevant guards), and snapshot sharing (likewise).
 But if you don't do anything weird, it should basically work.

If we build something fairly restricted but production-ready, we can
still do something useful for users.

* only functions marked as CONTAINS NO SQL
* parallel_workers = 2 or fixed
* only one Plan type, designed to maximise benefit and minimize optimizer change

Why?

* The above is all that is needed to test any infrastructure changes
we accept. We need both infrastructure and tests. We could do this as
a custom scan plugin, but why not make it work in core - that doesn't
prevent a plugin version also if you desire it.

* only functions marked as CONTAINS NO SQL
We don't really know what proisparallel is, but we do know what
CONTAINS NO SQL means and can easily check for it.
Plus I already have a patch for this, slightly bitrotted.

* parallel_workers = 2 (or at least not make it user settable)
By fixing the number of workers at 2 we avoid any problems caused by
having N variable, such as how to vary N fairly amongst users and
other such considerations. We get the main benefit of parallelism,
without causing other issues across the server.

* Fixed Plan: aggregate-scan
To make everything simpler, allow only plans of a single type.
 SELECT something, list of aggregates
 FROM foo
 WHERE filters
 GROUP BY something
because we know that passing large amounts of data from worker to
master process will be slow, so focusing only on seq scan is not
sensible; we should focus on plans that significantly reduce the
number of rows passed upwards. We could just do this for very
selective WHERE clauses, but that is not an important class of query.
As soon as we include aggregates, we reduce data passing significantly
AND we hit a very important subset of queries:

This plan type is widely used in reporting queries, so will hit the
mainline of BI applications and many Mat View creations.
This will allow SELECT count(*) FROM foo to go faster also.

The execution plan for that query type looks like this...
Hash Aggregate
   Gather From Workers
      {Worker Nodes, workers = 2
         HashAggregate
            PartialScan}

This is simple enough that we need only include a mechanism for the
Gather operation, plus it is simple enough to not need extensive
optimizer changes - just changes to set_plain_rel_pathlist() to
consider parallel plans.

Plan costs are easy to calculate for the above...
 cpu_worker_startup_cost = 100 -- Startup cost is easy to calculate by
observation, but a reasonably large default will be OK
 cpu_ipc_tuple_cost = 0.1 -- assume x10 normal cost of cpu_tuple_cost
Partial scan costs are the same as SeqScan, just with fewer blocks.
All other costs are the same.

We can submit the main patch by Dec 15, fix all the problems by Feb 15.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] WAL format and API changes (9.5)

2014-11-11 Thread Heikki Linnakangas

On 11/11/2014 09:39 AM, Michael Paquier wrote:

On Tue, Nov 11, 2014 at 4:29 AM, Heikki Linnakangas hlinnakan...@vmware.com
wrote:



Attached is a new version. It's rebased on current git master, including
BRIN. I've also fixed the laundry list of small things you reported, as
well as a bunch of bugs I uncovered during my own testing.


This patch needs a small rebase; it has been broken by a590f266, which fixed
WAL replay for brin indexes:
patching file src/backend/access/brin/brin_xlog.c
Hunk #2 FAILED at 42.
Hunk #3 FAILED at 91.
This will facilitate testing as well.


Here's a rebased patch. No other changes.

- Heikki



wal-format-and-api-changes-8.patch.gz
Description: application/gzip



Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-11-11 Thread Rahila Syed
I think this was changed based on following, if I am not wrong. 

http://www.postgresql.org/message-id/54297A45.8080904@...
Yes this change is the result of the above complaint.

Attaching the compression status to XLogRecord is more 
in-line with the fact that all the blocks are compressed, and not each 
one individually, so we basically now duplicate an identical flag 
value in all the backup block headers, which is a waste IMO. 
Thoughts? 

If I understand your point correctly, as all blocks are compressed, adding
a compression attribute to XLogRecord surely makes more sense if the record
contains backup blocks. But in case of XLOG records without backup blocks,
the compression attribute in the record header might not make much sense.

Attaching the status of compression to XLogRecord will mean that the status
is duplicated across all records. It will mean that it is an attribute of
all the records when it is only an attribute of records with backup blocks
or the attribute of backup blocks. 
The current approach is adopted with this thought.


Regards,
Rahila Syed







Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-11-11 Thread Andres Freund
On 2014-11-11 17:10:01 +0900, Michael Paquier wrote:
 On Mon, Nov 10, 2014 at 5:26 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:
  I'll go through the patch once again a bit later, but feel free to comment.

 Reading again the patch with a fresher mind, I am not sure if the
 current approach taken is really the best one. What the patch does now
 is looking at the header of the first backup block, and then
 compresses the rest, aka the other blocks, up to 4, and their headers,
 up to 3. I think that we should instead define an extra bool flag in
 XLogRecord to determine if the record is compressed, and then use this
 information. Attaching the compression status to XLogRecord is more
 in-line with the fact that all the blocks are compressed, and not each
 one individually, so we basically now duplicate an identical flag
 value in all the backup block headers, which is a waste IMO.

I don't buy the 'waste' argument. If there's a backup block those few
bytes won't make a noticeable difference. But for the majority of records
where there's no backup blocks it will.


The more important thing here is that I see little chance of this
getting in before Heikki's larger rework of the wal format gets
in. Since that'll change everything around anyway I'm unsure how much
point there is to iterate till that's done. I know that sucks, but I
don't see much of an alternative.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Scaling shared buffer eviction

2014-11-11 Thread Thom Brown
On 26 September 2014 12:40, Amit Kapila amit.kapil...@gmail.com wrote:

  On Tue, Sep 23, 2014 at 10:31 AM, Robert Haas robertmh...@gmail.com
 wrote:
 
  But this gets at another point: the way we're benchmarking this right
  now, we're really conflating the effects of three different things:
 
  1. Changing the locking regimen around the freelist and clocksweep.
  2. Adding a bgreclaimer process.
  3. Raising the number of buffer locking partitions.

  First of all thanks for committing part-1 of these changes and it
  seems you are planning to commit part-3 based on results of tests
  which Andres is planning to do and for remaining part (part-2), today


Were parts 2 and 3 committed in the end?

-- 
Thom


Re: [HACKERS] Scaling shared buffer eviction

2014-11-11 Thread Andres Freund
On 2014-11-11 09:29:22 +, Thom Brown wrote:
 On 26 September 2014 12:40, Amit Kapila amit.kapil...@gmail.com wrote:
 
   On Tue, Sep 23, 2014 at 10:31 AM, Robert Haas robertmh...@gmail.com
  wrote:
  
   But this gets at another point: the way we're benchmarking this right
   now, we're really conflating the effects of three different things:
  
   1. Changing the locking regimen around the freelist and clocksweep.
   2. Adding a bgreclaimer process.
   3. Raising the number of buffer locking partitions.
 
   First of all thanks for committing part-1 of these changes and it
   seems you are planning to commit part-3 based on results of tests
   which Andres is planning to do and for remaining part (part-2), today
 
 
 Were parts 2 and 3 committed in the end?

3 was committed. 2 wasn't because it's not yet clear how beneficial
it is in general.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] postgres_fdw behaves oddly

2014-11-11 Thread Etsuro Fujita

Hi Ashutosh,

Thanks for the review!

(2014/11/10 20:05), Ashutosh Bapat wrote:

Since two separate issues, 1. using reltargetlist instead of attr_needed
and 2. system columns usage in FDW, are being tackled here, we should
separate the patch into two, one for each of the issues.


Agreed.  Will split the patch into two.


While I agree that the system columns shouldn't be sent to the remote
node, it doesn't look clear to me what they or their values would
mean in the context of foreign data. Some columns like tableoid would
have a value which is the OID of the foreign table, other system columns
like xmin/xmax/ctid will have different meanings (or no meaning)
depending upon the foreign data source. In case of the latter columns, each
FDW would have its own way of defining that meaning (I guess). But in
any case, I agree that we shouldn't send the system columns to the
remote side.

Is there a way we can enforce this rule across all the FDWs? Or do we
want to tackle it separately per FDW?


ISTM it'd be better to tackle it separately because I feel that such an 
enforcement by the PG core would go too far.  I'm also concerned about a 
possibility of impedance mismatching between such an enforcement and 
postgres_fdw, which sends the ctid column to the remote side internally 
when executing UPDATE/DELETE on foreign tables.  See 
postgresPlanForeignModify and, e.g., deparseUpdateSql.


Best regards,
Etsuro Fujita




Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index

2014-11-11 Thread Etsuro Fujita

(2014/11/11 2:31), Fujii Masao wrote:

On Mon, Nov 10, 2014 at 4:15 PM, Etsuro Fujita

The patch looks good to me except for the following point:



*** a/src/backend/access/gin/ginfast.c
--- b/src/backend/access/gin/ginfast.c
***
*** 25,30 
--- 25,32 
   #include "utils/memutils.h"
   #include "utils/rel.h"

+ /* GUC parameter */
+ int   pending_list_cleanup_size = 0;

I think we need to initialize the GUC to boot_val, 4096 in this case.


No, IIUC basically the variable for GUC doesn't need to be initialized
to its default value. OTOH, it's also harmless to initialize it to the default.
I like the current code a bit because we don't need to change the initial
value again when we decide to change the default value of GUC.
I have no strong opinion about this, though.


OK, so if there are no objections from others, I'll mark this as Ready
for Committer.


Thanks,

Best regards,
Etsuro Fujita




[HACKERS] pg_prewarm really needs some CHECK_FOR_INTERRUPTS

2014-11-11 Thread Andres Freund
Hi,

pg_prewarm() currently cannot be interrupted - which seems odd
given that it's intended to read large amounts of data from disk. A
rather slow process.

Unless somebody protests I'm going to add a check to the top of each of
the three loops.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] What exactly is our CRC algorithm?

2014-11-11 Thread Abhijit Menon-Sen
At 2014-11-04 14:40:36 +0100, and...@2ndquadrant.com wrote:

 On 2014-11-04 08:21:13 -0500, Robert Haas wrote:
  
  Are you going to get the slice-by-N stuff working next, to speed
  this up?
 
 I don't plan to do anything serious with it, but I've hacked up the
 crc code to use the hardware instruction.

I'm working on this (first speeding up the default calculation using
slice-by-N, then adding support for the SSE4.2 CRC instruction on top).

-- Abhijit




Re: [HACKERS] pg_prewarm really needs some CHECK_FOR_INTERRUPTS

2014-11-11 Thread Michael Paquier
On Tue, Nov 11, 2014 at 8:17 PM, Andres Freund and...@2ndquadrant.com wrote:
 pg_prewarm() currently cannot be interrupted - which seems odd
 given that it's intended to read large amounts of data from disk. A
 rather slow process.

 Unless somebody protests I'm going to add a check to the top of each of
 the three loops.
Good idea, +1.
-- 
Michael




Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index

2014-11-11 Thread Fujii Masao
On Tue, Nov 11, 2014 at 7:01 PM, Etsuro Fujita
fujita.ets...@lab.ntt.co.jp wrote:
 (2014/11/11 2:31), Fujii Masao wrote:

 On Mon, Nov 10, 2014 at 4:15 PM, Etsuro Fujita

 The patch looks good to me except for the following point:


 *** a/src/backend/access/gin/ginfast.c
 --- b/src/backend/access/gin/ginfast.c
 ***
 *** 25,30 
 --- 25,32 
   #include "utils/memutils.h"
   #include "utils/rel.h"

 + /* GUC parameter */
 + int   pending_list_cleanup_size = 0;

 I think we need to initialize the GUC to boot_val, 4096 in this case.


 No, IIUC basically the variable for GUC doesn't need to be initialized
 to its default value. OTOH, it's also harmless to initialize it to the
 default.
 I like the current code a bit because we don't need to change the initial
 value again when we decide to change the default value of GUC.
 I have no strong opinion about this, though.


 OK, so if there are no objections from others, I'll mark this as Ready for
 Committer.

I just pushed this. Thanks!

Regards,

-- 
Fujii Masao




Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-11-11 Thread Alvaro Herrera
Greg Stark wrote:
 On Tue, Nov 11, 2014 at 2:14 AM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
  I'm not sure why you say opaque is not associated with the specific
  scan.  Are you thinking we could reuse opaque for a future scan?  I
  think we could consider that opaque *is* the place to cache things such
  as the hashed value of the qual constants or whatever.
 
 Oh. I guess this goes back to my original suggestion that the API docs
 need to explain some sense of when OpcInfo is called. I didn't realize
 it was tied to a specific scan. This does raise the question of why
 the scan information isn't available in OpcInfo though. That would let
 me build the hash value in a natural place instead of having to do it
 lazily which I find significantly more awkward.

Hmm.  OpcInfo is also called in contexts other than scans, though, so
passing down scan keys into it seems wrong.  Maybe we do need another
amproc that initializes the scan for the opclass, which would get
whatever got returned from opcinfo as well as scankeys.  There you would
have the opportunity to run the hash and store it into the opaque.

 Is it possible for scan keys to change between calls for nested loop
 joins or quirky SQL with volatile functions in the scan or anything? I
 guess that would prevent the index scan from being used at all. But I
 can be reassured the Opcinfo call will be called again when a cached
 plan is reexecuted? Stable functions might have new values in a
 subsequent execution even if the plan hasn't changed at all for
 example.

As far as I understand, the scan keys don't change within any given
scan; if they do, the rescan AM method is called, at which point we
should reset whatever is cached about the previous scan.

  That's to test the Union procedure; if you look at the code, it's just
  used in assert-enabled builds.  Now that I think about it, perhaps this
  can turn out to be problematic for your bloom filter opclass.  I
  considered the idea of allowing the opclass to disable this testing
  procedure, but it isn't done (yet.)
 
 No, it isn't a problem for my opclass other than performance, it was
 quite helpful in turning up bugs early in fact. It was just a bit
 confusing because I was trying to test things one by one and it turned
 out the assertion checks meant a simple insert turned up bugs in Union
 which I hadn't expected. But it seems perfectly sensible in an
 assertion check.

Great, thanks.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-11-11 Thread Greg Stark
On Tue, Nov 11, 2014 at 12:12 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 As far as I understand, the scan keys don't change within any given
 scan; if they do, the rescan AM method is called, at which point we
 should reset whatever is cached about the previous scan.

But am I guaranteed that rescan will throw away the opcinfo struct and
its opaque element? I guess that's the heart of the uncertainty I had.

-- 
greg




Re: [HACKERS] Add generate_series(numeric, numeric)

2014-11-11 Thread Fujii Masao
On Fri, Nov 7, 2014 at 3:19 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Oct 14, 2014 at 3:22 PM, Michael Paquier
 michael.paqu...@gmail.com wrote:


 On Tue, Oct 7, 2014 at 8:38 AM, Ali Akbar the.ap...@gmail.com wrote:

 2014-10-06 22:51 GMT+07:00 Marti Raudsepp ma...@juffo.org:



  the one that tests values just before numeric overflow

 Actually I don't know if that's too useful. I think you should add a
 test case that causes an error to be thrown.


 Actually I added the test case because in the code, when adding step into
 current for the last value, I expected it to overflow:

 /* increment current in preparation for next iteration */
  add_var(&fctx->current, &fctx->step, &fctx->current);

 where in the last calculation, current is 9 * 10^131071. Plus 10^131071,
 it will be 10^131072, which I expected to overflow the numeric type (in the
 doc, numeric's range is up to 131072 digits before the decimal point).

 In the attached patch, I narrowed the test case to produce a smaller result.


 Well, as things stand now, the logic relies on cmp_var and the sign of the
 stop and current values. It is right that it would be better to check for
 overflow before going through the next iteration, and the cleanest and
 cheapest way to do so would be to move the call to make_result after add_var
 and save the result variable in the function context, or something similar,
 but honestly I am not sure it is worth the complication as it would mean
 some refactoring on what make_result actually already does.

 I looked at this patch again a bit and finished with the attached, adding an
 example in the docs, refining the error messages and improving a bit the
 regression tests. I have nothing more to comment, so I am marking this patch
 as ready for committer.

 The patch looks good to me. Barring any objection I will commit the patch.
 Memo for me: CATALOG_VERSION_NO must be changed at the commit.

Pushed.

Regards,

-- 
Fujii Masao




Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-11-11 Thread Alvaro Herrera
Greg Stark wrote:
 On Tue, Nov 11, 2014 at 12:12 PM, Alvaro Herrera
 alvhe...@2ndquadrant.com wrote:
  As far as I understand, the scan keys don't change within any given
  scan; if they do, the rescan AM method is called, at which point we
  should reset whatever is cached about the previous scan.
 
 But am I guaranteed that rescan will throw away the opcinfo struct and
 its opaque element? I guess that's the heart of the uncertainty I had.

Well, it should, and if not that's a bug, which should be fixed by the
attached (untested) patch.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/src/backend/access/brin/brin.c b/src/backend/access/brin/brin.c
index bd35cf6..441127f 100644
--- a/src/backend/access/brin/brin.c
+++ b/src/backend/access/brin/brin.c
@@ -519,6 +519,7 @@ brinrescan(PG_FUNCTION_ARGS)
 {
 	IndexScanDesc scan = (IndexScanDesc) PG_GETARG_POINTER(0);
 	ScanKey		scankey = (ScanKey) PG_GETARG_POINTER(1);
+	BrinOpaque *opaque;
 
 	/* other arguments ignored */
 
@@ -526,6 +527,14 @@ brinrescan(PG_FUNCTION_ARGS)
		memmove(scan->keyData, scankey,
				scan->numberOfKeys * sizeof(ScanKeyData));
 
+	/*
+	 * Reinitialize the opaque struct; some opclasses might choose to
+	 * cache per-scan info in there.
+	 */
+	opaque = (BrinOpaque *) scan->opaque;
+	brin_free_desc(opaque->bo_bdesc);
+	opaque->bo_bdesc = brin_build_desc(scan->indexRelation);
+
 	PG_RETURN_VOID();
 }
 



Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-11-11 Thread Greg Stark
It might be clearer to have an opclassinfo and a scaninfo which can
store information in separate opc_opaque and scan_opaque fields with
distinct lifetimes.

In the bloom filter case the long-lived info is the (initial?) size of
the bloom filter and the number of hash functions. But I still haven't
determined how much it will cost to recalculate them. Right now
they're just hard coded so it doesn't hurt to do it on every rescan
but if it involves peeking at the index reloptions or stats that might
be impractical.




Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-11-11 Thread Alvaro Herrera
Greg Stark wrote:
 It might be clearer to have an opclassinfo and a scaninfo which can
 store information in separate opc_opaque and scan_opaque fields with
 distinct lifetimes.
 
 In the bloom filter case the longlived info is the (initial?) size of
 the bloom filter and the number of hash functions. But I still haven't
 determined how much it will cost to recalculate them. Right now
 they're just hard coded so it doesn't hurt to do it on every rescan
 but if it involves peeking at the index reloptions or stats that might
 be impractical.

Patches welcome :-)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pg_prewarm really needs some CHECK_FOR_INTERRUPTS

2014-11-11 Thread Robert Haas
On Tue, Nov 11, 2014 at 6:17 AM, Andres Freund and...@2ndquadrant.com wrote:
 pg_prewarm() currently cannot be interrupted - which seems odd
 given that it's intended to read large amounts of data from disk. A
 rather slow process.

Oops.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index

2014-11-11 Thread Robert Haas
On Tue, Nov 11, 2014 at 7:11 AM, Fujii Masao masao.fu...@gmail.com wrote:
 OK, so if there are no objections of others, I'll mark this as Ready for
 Committer.

 I just pushed this. Thanks!

Not to kibitz too much after-the-fact, but wouldn't it be better to
give this a name that has GIN in it somewhere?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] BRIN indexes - TRAP: BadArgument

2014-11-11 Thread Greg Stark
On Tue, Nov 11, 2014 at 1:04 PM, Alvaro Herrera
alvhe...@2ndquadrant.com wrote:
 It might be clearer ...

 Patches welcome :-)

Or perhaps there could still be a single opaque field but have two
optional opclass methods scaninit and rescan which allow the op
class to set or reset whichever fields inside opaque that need to be
reset.


-- 
greg




Re: [HACKERS] [v9.5] Custom Plan API

2014-11-11 Thread Robert Haas
On Tue, Nov 11, 2014 at 12:33 AM, Amit Kapila amit.kapil...@gmail.com wrote:
 On Mon, Nov 10, 2014 at 6:33 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Nov 10, 2014 at 6:55 AM, Amit Kapila amit.kapil...@gmail.com
 wrote:
  I thought that in general if user has the API to register the custom
  path
  methods, it should have some way to unregister them and also user might
  need to register some different custom path methods after unregistering
  the previous one's.  I think we should see what Robert or others have to
  say about this point before trying to provide such an API.

 I wouldn't bother.  As KaiGai says, if you want to shut the
 functionality off, the provider itself can provide a GUC.  Also, we
 really have made no effort to ensure that loadable modules can be
 safely unloaded, or hooked functions safely-unhooked.
 ExecutorRun_hook is a good example.  Typical of hook installation is
 this:

 prev_ExecutorRun = ExecutorRun_hook;
 ExecutorRun_hook = pgss_ExecutorRun;


 In this case, Extension takes care of register and unregister for
 hook.  In _PG_init(), it register the hook and _PG_fini() it
 unregisters the same.

The point is that there's nothing that you can do in _PG_fini() that will
work correctly. If it does ExecutorRun_hook = prev_ExecutorRun, that's
fine if it's the most-recently-installed hook.  But if it isn't, then
doing so corrupts the list.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] plpgsql plan changes causing failure after repeated invocation

2014-11-11 Thread Merlin Moncure
I chased down a problem today where users were reporting sporadic
failures in the application.   Turns out, the function would work
exactly 5 times and then fail; this is on 9.2.  I think I understand
why this is happening and I'm skeptical it's a bug in postgres, but I
thought I'd socialize it.

What's happening here is a query structured like this, somewhat deep
into a pl/pgsql function:

SELECT row_to_json(q) FROM
(
  SELECT *
  FROM
  (
complex_inner_query
  ) q
  LEFT JOIN foo f ON
_plpgsql_var != 'xxx'
AND (
  (_plpgsql_var = 'yyy' and q.data::int = foo.foo_id)
  OR (_plpgsql_var = 'zzz' and q.data = _other_var)
)
) q;

What is happening, along with some triggers I don't completely
understand (this problem started hitting when I made an unrelated
change in the function) is that the cast (q.data::int) started to
fail.  In cases where _plpgsql_var is not 'yyy', the cast was getting
applied where previously it did not.

The workaround was simple, insert a case statement so that q.data::int
becomes CASE WHEN _plpgsql_var = 'yyy' THEN q.data::int ELSE NULL END.
That being said, it does bring up some interesting points.

*) relying on A being checked first in 'A OR B' is obviously not
trustworthy, and it shouldn't be.  Generally I assume the planner will
do the cheaper of the two first (along with some extra encouragement
to put it on the left side), but this can't be relied upon.

*) It's possible to write queries so that they will fail depending on
plan choice. This is not good, and should be avoided when possible
(the query isn't great I'll admit), but the interaction with execution
count is a little unpleasant.
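For illustration, here is a hedged, self-contained sketch (hypothetical table
and values, not the original schema) of a query whose success depends on plan
choice, together with the CASE guard described above; CASE is one of the few
SQL constructs with a documented evaluation-order guarantee:

```sql
-- Hypothetical minimal reproduction: whether this fails depends on the plan.
CREATE TABLE t (mode text, data text);
INSERT INTO t VALUES ('num', '42'), ('str', 'abc');

-- If the planner happens to evaluate data::int before (or regardless of)
-- the mode test, the cast of 'abc' raises an error:
--   SELECT * FROM t WHERE mode = 'num' AND data::int = 42;   -- may fail

-- The CASE guard makes the cast reachable only in the 'num' branch:
SELECT * FROM t
WHERE CASE WHEN mode = 'num' THEN data::int ELSE NULL END = 42;
```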

merlin




Re: [HACKERS] plpgsql plan changes causing failure after repeated invocation

2014-11-11 Thread Albe Laurenz
Merlin Moncure wrote:
 I chased down a problem today where users were reporting sporadic
 failures in the application.   Turns out, the function would work
 exactly 5 times and then fail; this is on 9.2.  I think I understand
 why this is happening and I'm skeptical it's a bug in postgres, but I
 thought I'd socialize it.
 
 What's happening here is a query structured like this, somewhat deep
 into a pl/pgsql function:

[...]
   (_plpgsql_var = 'yyy' and q.data::int = foo.foo_id)
[...]

 What is happening, along with some triggers I don't completely
 understand (this problem started hitting when I made an unrelated
 change in the function) is that the cast (q.data::int) started to
 fail.  In cases where _plpgsql_var is not 'yyy', the cast was getting
 applied where previously it did not.
 
 The workaround was simple, insert a case statement so that q.data::int
 becomes CASE WHEN _plpgsql_var = 'yyy' THEN q.data::int ELSE NULL END.
 That being said, it does bring up some interesting points.
 
 *) relying on A being checked first in 'A OR B' is obviously not
 trustworthy, and it shouldn't be.  Generally I assume the planner will
 do the cheaper of the two first (along with some extra encouragement
 to put it on the left side), but this can't be relied upon.
 
 *) It's possible to write queries so that they will fail depending on
 plan choice. This is not good, and should be avoided when possible
 (the query isn't great I'll admit), but the interaction with execution
 count is a little unpleasant.

This must be the custom plan feature added in 9.2 switching over to
a generic plan after 5 executions.
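The switchover is easy to observe directly with a prepared statement
(hypothetical statement; any parameterized query will do):

```sql
-- For the first five executions a custom plan is built using the actual
-- parameter value; from the sixth execution on, the server may switch to
-- a cached generic plan, which is when plan-dependent failures surface.
PREPARE q(text) AS
    SELECT count(*) FROM pg_class WHERE relname = $1;

EXECUTE q('pg_class');  -- executions 1..5: custom plans
EXECUTE q('pg_class');
EXECUTE q('pg_class');
EXECUTE q('pg_class');
EXECUTE q('pg_class');
EXECUTE q('pg_class');  -- execution 6: a generic plan may be used instead
```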

But you are right, it is not nice.

Yours,
Laurenz Albe



Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index

2014-11-11 Thread Fujii Masao
On Tue, Nov 11, 2014 at 10:14 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Nov 11, 2014 at 7:11 AM, Fujii Masao masao.fu...@gmail.com wrote:
 OK, so if there are no objections of others, I'll mark this as Ready for
 Committer.

 I just pushed this. Thanks!

 Not to kibitz too much after-the-fact, but wouldn't it be better to
 give this a name that has GIN in it somewhere?

Maybe. gin_pending_list_cleanup_size? gin_pending_list_limit? Better name?

Regards,

-- 
Fujii Masao




Re: [HACKERS] WAL format and API changes (9.5)

2014-11-11 Thread Amit Kapila
On Tue, Nov 11, 2014 at 2:15 PM, Heikki Linnakangas hlinnakan...@vmware.com
wrote:

 On 11/11/2014 09:39 AM, Michael Paquier wrote:

 On Tue, Nov 11, 2014 at 4:29 AM, Heikki Linnakangas 
hlinnakan...@vmware.com

 wrote:


 Attached is a new version. It's rebased on current git master, including
 BRIN. I've also fixed the laundry list of small things you reported, as
 well as a bunch of bugs I uncovered during my own testing.


 This patch needs a small rebase, it has been broken by a590f266 that
fixed
 WAL replay for brin indexes:
 patching file src/backend/access/brin/brin_xlog.c
 Hunk #2 FAILED at 42.
 Hunk #3 FAILED at 91.
 This will facilitate testing as well.


 Here's a rebased patch. No other changes.


I have done some performance testing of this patch using attached
script and data is as below:


Performance Data

IBM POWER-8 24 cores, 192 hardware threads
RAM = 492GB
checkpoint_segments=300
checkpoint_timeout=30min
full_page_writes = off

HEAD (commit - ae667f7) -

testname | wal_generated | duration
-+---+--
 two short fields, no change | 398510104 |  11.993057012558
 two short fields, no change | 396984168 | 10.3508098125458
 two short fields, no change | 396983624 | 10.4196150302887
 two short fields, one changed   | 437106016 |  10.650365114212
 two short fields, one changed   | 437102456 |  10.756795167923
 two short fields, one changed   | 437103000 | 10.6824090480804
 two short fields, both changed  | 438131520 | 11.1027519702911
 two short fields, both changed  | 437112248 | 11.0681071281433
 two short fields, both changed  | 437107168 | 11.0370399951935
 one short and one long field, no change |  76044176 | 2.90529608726501
 one short and one long field, no change |  76047432 | 2.86844515800476
 one short and one long field, no change |  76042664 | 2.84641098976135
 ten tiny fields, all changed| 478210904 | 13.1221458911896
 ten tiny fields, all changed| 47728 | 12.8017349243164
 ten tiny fields, all changed| 477217832 | 12.9907081127167
 hundred tiny fields, all changed| 180353400 | 6.03354096412659
 hundred tiny fields, all changed| 180349536 | 5.99893379211426
 hundred tiny fields, all changed| 180350064 | 6.00634717941284
 hundred tiny fields, half changed   | 180348480 | 6.03162407875061
 hundred tiny fields, half changed   | 180348440 | 6.08004903793335
 hundred tiny fields, half changed   | 180349056 | 6.03126788139343
 hundred tiny fields, half nulled| 101369936 | 5.43424606323242
 hundred tiny fields, half nulled| 101660160 | 5.06207084655762
 hundred tiny fields, half nulled| 100119184 | 5.51814889907837
 9 short and 1 long, short changed   | 108142168 | 3.26260495185852
 9 short and 1 long, short changed   | 108138144 | 3.20250797271729
 9 short and 1 long, short changed   | 109761240 | 3.21728992462158
(27 rows)


Patch-

testname | wal_generated | duration
-+---+--
 two short fields, no change | 398692848 | 11.2538840770721
 two short fields, no change | 396988648 | 11.4972231388092
 two short fields, no change | 396988416 | 11.0026741027832
 two short fields, one changed   | 437104840 | 11.1911199092865
 two short fields, one changed   | 437103832 | 11.1138079166412
 two short fields, one changed   | 437101872 | 11.3141660690308
 two short fields, both changed  | 437133648 | 11.5836050510406
 two short fields, both changed  | 437106640 | 11.5675358772278
 two short fields, both changed  | 437104944 | 11.6147150993347
 one short and one long field, no change |  77226464 | 3.01720094680786
 one short and one long field, no change |  77645248 |  2.9660210609436
 one short and one long field, no change |  76045256 | 3.03326010704041
 ten tiny fields, all changed| 477396616 | 13.3963651657104
 ten tiny fields, all changed| 477219840 | 13.3838939666748
 ten tiny fields, all changed| 477223960 | 13.5736708641052
 hundred tiny fields, all changed| 180742136 | 5.87386584281921
 hundred tiny fields, all changed| 180398320 | 6.12072706222534
 hundred tiny fields, all changed| 180354080 | 6.16246104240417
 hundred tiny fields, half changed   | 180351456 | 6.17317008972168
 hundred tiny fields, half changed   | 180348096 | 6.13790798187256
 hundred tiny fields, half changed   | 183369688 | 6.19142913818359
 hundred 

[HACKERS] 9.4RC1 next week

2014-11-11 Thread Tom Lane
We need to get moving if we want to have RC1 out before the holiday season
starts.  Accordingly, the core committee has agreed that we should wrap it
next week (usual timing: wrap Monday 17th for announcement Thursday 20th).

According to
https://wiki.postgresql.org/wiki/PostgreSQL_9.4_Open_Items
the only open release-blocking issue is what are we going to do about
the incorrect CRC calculation in logical-replication files.  My own
vote is to Just Fix It, but in any case we need to make a decision
this week.

regards, tom lane




Re: [HACKERS] Defining dedicated macro to grab a relation's persistence

2014-11-11 Thread Fabrízio de Royes Mello
On Fri, Nov 7, 2014 at 11:12 AM, Andres Freund and...@2ndquadrant.com
wrote:

 Hi,

 On 2014-11-07 22:08:33 +0900, Michael Paquier wrote:
  After looking at a patch of this commit fest using
  rd_rel->relpersistence, I got a look at how many times this expression
  was being used directly in the backend code and wondered if it would
  not be useful to add a dedicated macro in rel.h to get the persistence
  of a relation like in the patch attached. (Note: it is actually used
  39 times).

 I personally find the direct access actually more readable, so I'm not a
 fan of further extending the scheme. Consistency with some other common
 accessors is an argument though.


What you meant is relation->rd_rel->relpersistence is more readable than
RelationGetPersistence(relation) ??

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
 Timbira: http://www.timbira.com.br
 Blog: http://fabriziomello.github.io
 Linkedin: http://br.linkedin.com/in/fabriziomello
 Twitter: http://twitter.com/fabriziomello
 Github: http://github.com/fabriziomello


Re: [HACKERS] Defining dedicated macro to grab a relation's persistence

2014-11-11 Thread Alvaro Herrera
Fabrízio de Royes Mello wrote:
 On Fri, Nov 7, 2014 at 11:12 AM, Andres Freund and...@2ndquadrant.com
 wrote:

  I personally find the direct access actually more readable, so I'm not a
  fan of further extending the scheme. Consistency with some other common
  accessors is an argument though.
 
 What you meant is relation->rd_rel->relpersistence is more readable than
 RelationGetPersistence(relation) ??

I too have a hard time getting excited about this change.  I'd just
leave it alone.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] 9.4RC1 next week

2014-11-11 Thread Andres Freund
Hi,

On 2014-11-11 10:18:55 -0500, Tom Lane wrote:
 We need to get moving if we want to have RC1 out before the holiday season
 starts.  Accordingly, the core committee has agreed that we should wrap it
 next week (usual timing: wrap Monday 17th for announcement Thursday 20th).

Ah cool. So there won't be a corresponding set of backbranch releases
which will instead be done together with 9.4.0?

 According to
 https://wiki.postgresql.org/wiki/PostgreSQL_9.4_Open_Items
 the only open release-blocking issue is what are we going to do about
 the incorrect CRC calculation in logical-replication files.  My own
 vote is to Just Fix It, but in any case we need to make a decision
 this week.

I've a patch and will push it tomorrow.

Greetings,

Andres Freund




Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: [HACKERS] HEAD seems to generate larger WAL regarding GIN index

2014-11-11 Thread Tom Lane
Fujii Masao masao.fu...@gmail.com writes:
 On Tue, Nov 11, 2014 at 10:14 PM, Robert Haas robertmh...@gmail.com wrote:
 Not to kibitz too much after-the-fact, but wouldn't it be better to
 give this a name that has GIN in it somewhere?

 Maybe. gin_pending_list_cleanup_size? gin_pending_list_limit? Better name?

gin_pending_list_limit sounds good to me.

regards, tom lane




Re: [HACKERS] 9.4RC1 next week

2014-11-11 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 On 2014-11-11 10:18:55 -0500, Tom Lane wrote:
 We need to get moving if we want to have RC1 out before the holiday season
 starts.  Accordingly, the core committee has agreed that we should wrap it
 next week (usual timing: wrap Monday 17th for announcement Thursday 20th).

 Ah cool. So there won't be a corresponding set of backbranch releases
 which will instead be done together with 9.4.0?

No.  I don't think we have our ducks in a row for back-branch updates just
yet.  In any case, it would be good to get some real-world testing of
b2cbced9e before we unleash that on stable-branch users ;-)

BTW, we try to avoid doing back-branch updates at the exact same time as
a major .0 release anyway.  In the first place, that confuses the PR
messaging: we'd rather it be all about the new release and not about bugs
fixed in old branches.  In the second place, as an ex-packager I know that
there are usually some gotchas in packaging a new major release, so it's
better that packagers not have back-branch updates on their plates at the
same time.

regards, tom lane




Re: [HACKERS] 9.4RC1 next week

2014-11-11 Thread Andres Freund
On 2014-11-11 10:52:30 -0500, Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  On 2014-11-11 10:18:55 -0500, Tom Lane wrote:
  We need to get moving if we want to have RC1 out before the holiday season
  starts.  Accordingly, the core committee has agreed that we should wrap it
  next week (usual timing: wrap Monday 17th for announcement Thursday 20th).
 
  Ah cool. So there won't be a corresponding set of backbranch releases
  which will instead be done together with 9.4.0?
 
 No.  I don't think we have our ducks in a row for back-branch updates just
 yet.

Right. I'm mainly asking because there's some unlogged table crash
recovery issues I want to get fixed before the next set of backbranch
releases.

It's also been a while since the last back branch release. Nothing
egregiously bad, but a fair number of moderately annoying things.

Apropos back branches: I think 52eed3d426 et al wasn't reverted and we
didn't really agree on a solution?

 In any case, it would be good to get some real-world testing of
 b2cbced9e before we unleash that on stable-branch users ;-)

Heh.

 BTW, we try to avoid doing back-branch updates at the exact same time as
 a major .0 release anyway.  In the first place, that confuses the PR
 messaging: we'd rather it be all about the new release and not about bugs
 fixed in old branches.  In the second place, as an ex-packager I know that
 there are usually some gotchas in packaging a new major release, so it's
 better that packagers not have back-branch updates on their plates at the
 same time.

Makes sense.

Greetings,

Andres Freund




Re: [HACKERS] 9.4RC1 next week

2014-11-11 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 Apropos back branches: I think 52eed3d426 et al wasn't reverted and we
 didn't really agree on a solution?

I think we agreed what we wanted to do instead, but actually doing it
is on my queue and hasn't reached the front yet.  In any case, 52eed3d426
is better than no fix.

regards, tom lane




Re: [HACKERS] tracking commit timestamps

2014-11-11 Thread Simon Riggs
On 4 November 2014 08:23, Andres Freund and...@2ndquadrant.com wrote:

 6) Shouldn't any value update of track_commit_timestamp be tracked in
 XLogReportParameters? That's thinking about making the commit timestamp
 available on standbys as well..

 Yes, it should.

Agree committs should be able to run on standby, but it seems possible
to do that without it running on the master. The two should be
unconnected.

Not sure why we'd want to have parameter changes on master reported?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] tracking commit timestamps

2014-11-11 Thread Andres Freund
On 2014-11-11 16:10:47 +, Simon Riggs wrote:
 On 4 November 2014 08:23, Andres Freund and...@2ndquadrant.com wrote:
 
  6) Shouldn't any value update of track_commit_timestamp be tracked in
  XLogReportParameters? That's thinking about making the commit timestamp
  available on standbys as well..
 
  Yes, it should.
 
 Agree committs should be able to run on standby, but it seems possible
 to do that without it running on the master.

I don't think that's realistic. It requires WAL to be written in some
cases, so that's not going to work. I also don't think it's a
particularly interesting ability?

 The two should be unconnected.

Why?

 Not sure why we'd want to have parameter changes on master reported?

So it works correctly. We're currently truncating the SLRU on startup
when the GUC is disabled, which would cause problems with WAL records
coming in from the primary. I think the code also needs some TLC to correctly
work after a failover.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] tracking commit timestamps

2014-11-11 Thread Simon Riggs
On 9 November 2014 16:57, Steve Singer st...@ssinger.info wrote:
 On 11/07/2014 07:07 PM, Petr Jelinek wrote:


 The list of what is useful might be long, but we can't have everything
 there as there are space constraints, and LSN is another 8 bytes and I still
 want to have some bytes for storing the origin or whatever you want to
 call it there, as that's the one I personally have biggest use-case for.
 So this would be ~24 bytes per txid already, hmm I wonder if we can pull
 some tricks to lower that a bit.


 The reason why Jim and myself are asking for the LSN and not just the
 timestamp is that I want to be able to order the transactions. Jim pointed
 out earlier in the thread that just ordering on timestamp allows for
 multiple transactions with the same timestamp.

 Maybe we don't need the entire LSN to solve that.  If you already have the
 commit timestamp maybe you only need another byte or two of granularity to
 order transactions that are within the same microsecond.

It looks like there are quite a few potential uses for this.

If we include everything it will be too fat to use for any of the
potential uses, since each will be pulled down by the others.

Sounds like it needs to be configurable in some way.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Missing FIN_CRC32 calls in logical replication code

2014-11-11 Thread Andres Freund
On 2014-11-03 21:58:51 +0200, Heikki Linnakangas wrote:
 PS. I find the name ReplicationSlotOnDiskDynamicSize confusing, as it is
 in fact a fixed size struct. I gather it's expected that the size of that
 part might change across versions, but by that definition nothing is
 constant.

Well, the idea is that the 'constant' part is version independent. The
part following afterwards (dynamic) can differ based on the 'version'
struct member. The reason is that that allows files from older releases
to be read after a pg_upgrade.

If you have suggestions for better names.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] tracking commit timestamps

2014-11-11 Thread Simon Riggs
On 11 November 2014 16:19, Andres Freund and...@2ndquadrant.com wrote:
 On 2014-11-11 16:10:47 +, Simon Riggs wrote:
 On 4 November 2014 08:23, Andres Freund and...@2ndquadrant.com wrote:

  6) Shouldn't any value update of track_commit_timestamp be tracked in
  XLogReportParameters? That's thinking about making the commit timestamp
  available on standbys as well..
 
  Yes, it should.

 Agree committs should be able to run on standby, but it seems possible
 to do that without it running on the master.

 I don't think that's realistic. It requires WAL to be written in some
 cases, so that's not going to work. I also don't think it's a
 particularly interesting ability?

OK, so we are saying commit timestamp will NOT be available on Standbys.

I'm fine with that, since data changes aren't generated there.


 So it works correctly. We're currently truncating the slru on startup
 when the guc is disabled which would cause problems WAL records coming
 in from the primary. I think the code also needs some TLC to correctly
 work after a failover.

OK

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] tracking commit timestamps

2014-11-11 Thread Andres Freund
On 2014-11-11 17:09:54 +, Simon Riggs wrote:
 On 11 November 2014 16:19, Andres Freund and...@2ndquadrant.com wrote:
  On 2014-11-11 16:10:47 +, Simon Riggs wrote:
  On 4 November 2014 08:23, Andres Freund and...@2ndquadrant.com wrote:
 
   6) Shouldn't any value update of track_commit_timestamp be tracked in
   XLogReportParameters? That's thinking about making the commit timestamp
   available on standbys as well..
  
   Yes, it should.
 
  Agree committs should be able to run on standby, but it seems possible
  to do that without it running on the master.
 
  I don't think that's realistic. It requires WAL to be written in some
  cases, so that's not going to work. I also don't think it's a
  particularly interesting ability?
 
 OK, so we are saying commit timestamp will NOT be available on Standbys.

Hm? They should be available - xact.c WAL replay will redo the setting
of the timestamps and explicitly overwritten timestamps will generate
their own WAL records. What I mean is just that you can't use commit
timestamps without also using it on the primary.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Column/type dependency recording inconsistencies

2014-11-11 Thread Tom Lane
Petr Jelinek p...@2ndquadrant.com writes:
 On 09/11/14 23:36, Petr Jelinek wrote:
 On 09/11/14 23:04, Tom Lane wrote:
 I suspect this is actually a bug in the dependency search logic in DROP.
 Don't have the energy to chase it right now though.

 But this logic depends on the fact that the columns that belong to that
 table are processed after the table was already processed, however as
 the new type was added after the table was added, it's processed before
 the table and because of the dependency between type and columns, the
 columns are also processed before the table and so they are added to the
 object list for further dependency checking.

Yeah.  We had code to handle the table-visited-before-column case, but
for some reason not the column-visited-before-table case.  Rather odd
that this hasn't been reported before.

 So here is what I came up with to fix this. It's quite simple and works 
 well enough, we don't really have any regression test that would hit 
 this though.

This is incomplete.  stack_address_present_add_flags needs a similar fix,
and both of them need to be taught to deal with the possibility that more
than one such column is present (I don't think your continue quite works
for that).  Also, I find physically removing an entry fairly ugly and
possibly unsafe, since it's not clear what outer recursion levels might be
assuming about the contents of the array -- and that certainly won't work
for the stack case.  What seems better is to invent another flag bit
to indicate that we no longer need to physically delete this subobject
because we've discovered the whole object will be deleted.  Then we
can just skip it during the execution phase.

Will work on this.  Thanks for the report and test case!

regards, tom lane




Re: [HACKERS] tracking commit timestamps

2014-11-11 Thread Jim Nasby

On 11/10/14, 7:40 AM, Alvaro Herrera wrote:

Robert Haas wrote:

On Sun, Nov 9, 2014 at 8:41 PM, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

Robert Haas wrote:

I think the key question here is the time for which the data needs to
be retained.  2^32 of anything is a lot, but why keep around that
number of records rather than more (after all, we have epochs to
distinguish one use of a given txid from another) or fewer?


The problem is not how much data we retain; it's about how much data we
can address.


I thought I was responding to a concern about disk space utilization.


Ah, right.  So AFAIK we don't need to keep anything older than
RecentXmin or something like that -- which is not too old.  If I recall
correctly Josh Berkus was saying in a thread about pg_multixact that it
used about 128kB or so in <= 9.2 for his customers; that one was also
limited to RecentXmin AFAIR.  I think a similar volume of commit_ts data
would be pretty acceptable.  Moreso considering that it's turned off by
default.
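
To put rough numbers on the addressability point, here is a back-of-envelope sketch; the 12-byte record size below is an assumption for illustration only, not the actual commit_ts on-disk layout:

```python
# Hypothetical fixed-size commit-timestamp record: 8-byte timestamp
# plus 4 bytes of origin/padding (an assumption, not the real layout).
RECORD_BYTES = 12

def commit_ts_bytes(num_xids):
    """Space needed to remember timestamps for num_xids transactions."""
    return num_xids * RECORD_BYTES

# Retaining the full 2^32 XID space would be enormous...
full_space_gb = commit_ts_bytes(2**32) / 1024**3

# ...while a window of, say, 10 million recent transactions is modest.
window_mb = commit_ts_bytes(10_000_000) / 1024**2

print(f"full 2^32 XIDs: {full_space_gb:.0f} GB")
print(f"10M-XID window: {window_mb:.0f} MB")
```

which is why truncating up to something like RecentXmin keeps the volume acceptable.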


FWIW, AFAICS MultiXacts are only truncated after a (auto)vacuum process is able 
to advance datminmxid, which will (now) only happen when an entire relation has 
been scanned (which should be infrequent).

I believe the low normal space usage is just an indication that most databases 
don't use many MultiXacts.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] tracking commit timestamps

2014-11-11 Thread Alvaro Herrera
Jim Nasby wrote:
 On 11/10/14, 7:40 AM, Alvaro Herrera wrote:

 Ah, right.  So AFAIK we don't need to keep anything older than
 RecentXmin or something like that -- which is not too old.  If I recall
 correctly Josh Berkus was saying in a thread about pg_multixact that it
 used about 128kB or so in <= 9.2 for his customers; that one was also
 limited to RecentXmin AFAIR.  I think a similar volume of commit_ts data
 would be pretty acceptable.  Moreso considering that it's turned off by
 default.
 
 FWIW, AFAICS MultiXacts are only truncated after a (auto)vacuum process is 
 able to advance datminmxid, which will (now) only happen when an entire 
 relation has been scanned (which should be infrequent).
 
 I believe the low normal space usage is just an indication that most 
 databases don't use many MultiXacts.

That's in 9.3.  Prior to that, they were truncated much more often.
Maybe you've not heard enough about this commit:

commit 0ac5ad5134f2769ccbaefec73844f8504c4d6182
Author: Alvaro Herrera alvhe...@alvh.no-ip.org
Date:   Wed Jan 23 12:04:59 2013 -0300

Improve concurrency of foreign key locking

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Minor problem with comment added by 5ea86e

2014-11-11 Thread Peter Geoghegan
Attached patch removes incorrect comment added by recent Sortsupport
for B-Tree/CLUSTER commit (commit 5ea86e).

-- 
Peter Geoghegan
diff --git a/src/backend/utils/sort/sortsupport.c b/src/backend/utils/sort/sortsupport.c
index 16342f3..4813f57 100644
--- a/src/backend/utils/sort/sortsupport.c
+++ b/src/backend/utils/sort/sortsupport.c
@@ -167,7 +167,6 @@ PrepareSortSupportFromIndexRel(Relation indexRel, int16 strategy,
 
 	Assert(ssup->comparator == NULL);
 
-	/* Find the operator in pg_amop */
 	if (indexRel->rd_rel->relam != BTREE_AM_OID)
 		elog(ERROR, "unexpected non-btree AM: %u", indexRel->rd_rel->relam);
 	if (strategy != BTGreaterStrategyNumber &&



Re: [HACKERS] Column/type dependency recording inconsistencies

2014-11-11 Thread Tom Lane
I wrote:
 ... Also, I find physically removing an entry fairly ugly and
 possibly unsafe, since it's not clear what outer recursion levels might be
 assuming about the contents of the array -- and that certainly won't work
 for the stack case.  What seems better is to invent another flag bit
 to indicate that we no longer need to physically delete this subobject
 because we've discovered the whole object will be deleted.  Then we
 can just skip it during the execution phase.

On further reflection, neither of those approaches is really a good idea.
With either implementation, skipping the DROP COLUMN step would mean that
we're letting the DROP TYPE happen before we drop a table containing a
live column of that type.  While that failed to fail in this particular
example, it sounds like a really bad idea --- any number of things, such
as event triggers, could well blow up when abused like that.  The lack of
prior reports means that this is a very rare case, so rather than trying
to optimize it, I think we should just let the drops happen in the
scheduled order.  All that we need is to make sure that the column gets
marked with flags indicating that dropping is allowed.  Accordingly, the
attached patch should do it.

regards, tom lane

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 256486c..f338acf 100644
*** a/src/backend/catalog/dependency.c
--- b/src/backend/catalog/dependency.c
*** object_address_present_add_flags(const O
*** 2116,2121 
--- 2116,2122 
   int flags,
   ObjectAddresses *addrs)
  {
+ 	bool		result = false;
  	int			i;
  
  	for (i = addrs->numrefs - 1; i >= 0; i--)
*** object_address_present_add_flags(const O
*** 2130,2151 
  ObjectAddressExtra *thisextra = addrs->extras + i;
  
  thisextra->flags |= flags;
! return true;
  			}
! 			if (thisobj->objectSubId == 0)
  			{
  /*
   * We get here if we find a need to delete a column after
   * having already decided to drop its whole table.  Obviously
!  * we no longer need to drop the column.  But don't plaster
!  * its flags on the table.
   */
! return true;
  			}
  		}
  	}
  
! 	return false;
  }
  
  /*
--- 2131,2178 
  ObjectAddressExtra *thisextra = addrs->extras + i;
  
  thisextra->flags |= flags;
! result = true;
  			}
! 			else if (thisobj->objectSubId == 0)
  			{
  /*
   * We get here if we find a need to delete a column after
   * having already decided to drop its whole table.  Obviously
!  * we no longer need to drop the subobject, so report that we
!  * found the subobject in the array.  But don't plaster its
!  * flags on the whole object.
   */
! result = true;
! 			}
! 			else if (object->objectSubId == 0)
! 			{
! /*
!  * We get here if we find a need to delete a whole table after
!  * having already decided to drop one of its columns.  We
!  * can't report that the whole object is in the array, but we
!  * should mark the subobject with the whole object's flags.
!  *
!  * It might seem attractive to physically delete the column's
!  * array entry, or at least mark it as no longer needing
!  * separate deletion.  But that could lead to, e.g., dropping
!  * the column's datatype before we drop the table, which does
!  * not seem like a good idea.  This is a very rare situation
!  * in practice, so we just take the hit of doing a separate
!  * DROP COLUMN action even though we know we're gonna delete
!  * the table later.
!  *
!  * Because there could be other subobjects of this object in
!  * the array, this case means we always have to loop through
!  * the whole array; we cannot exit early on a match.
!  */
! ObjectAddressExtra *thisextra = addrs->extras + i;
! 
! thisextra->flags |= flags;
  			}
  		}
  	}
  
! 	return result;
  }
  
  /*
*** stack_address_present_add_flags(const Ob
*** 2156,2161 
--- 2183,2189 
  int flags,
  ObjectAddressStack *stack)
  {
+ 	bool		result = false;
  	ObjectAddressStack *stackptr;
  
  	for (stackptr = stack; stackptr; stackptr = stackptr->next)
*** stack_address_present_add_flags(const Ob
*** 2168,2188 
  			if (object->objectSubId == thisobj->objectSubId)
  			{
  stackptr->flags |= flags;
! return true;
  			}
- 
- 			/*
- 			 * Could visit column with whole table already on stack; this is
- 			 * the same case noted in object_address_present_add_flags(), and
- 			 * as in that case, we don't propagate flags for the component to
- 			 * the whole object.
- 			 */
- 			if (thisobj->objectSubId == 0)
- return true;
  		}
  	}
  
! 	return false;
  }
  
  /*
--- 2196,2226 
  			if (object->objectSubId == thisobj->objectSubId)
  			{
  stackptr->flags |= flags;
! result = true;
! 			}
! 			else if (thisobj->objectSubId == 0)
! 			{
! /*
!  

Re: [HACKERS] tracking commit timestamps

2014-11-11 Thread Jim Nasby

On 11/11/14, 2:03 PM, Alvaro Herrera wrote:

Jim Nasby wrote:

On 11/10/14, 7:40 AM, Alvaro Herrera wrote:



Ah, right.  So AFAIK we don't need to keep anything older than
RecentXmin or something like that -- which is not too old.  If I recall
correctly Josh Berkus was saying in a thread about pg_multixact that it
used about 128kB or so in <= 9.2 for his customers; that one was also
limited to RecentXmin AFAIR.  I think a similar volume of commit_ts data
would be pretty acceptable.  Moreso considering that it's turned off by
default.


FWIW, AFAICS MultiXacts are only truncated after a (auto)vacuum process is able 
to advance datminmxid, which will (now) only happen when an entire relation has 
been scanned (which should be infrequent).

I believe the low normal space usage is just an indication that most databases 
don't use many MultiXacts.


That's in 9.3.  Prior to that, they were truncated much more often.


Well, we're talking about a new feature, so I wasn't looking in back branches. 
;P


Maybe you've not heard enough about this commit:

commit 0ac5ad5134f2769ccbaefec73844f8504c4d6182


Interestingly, git.postgresql.org hasn't either: 
http://git.postgresql.org/gitweb/?p=postgresql.git;a=search;h=HEAD;st=commit;s=0ac5ad5134f2769ccbaefec73844f8504c4d6182

The commit is certainly there though...
decibel@decina:[15:12]~/pgsql/HEAD/src/backend (master=)$git log 
0ac5ad5134f2769ccbaefec73844f8504c4d6182|head -n1
commit 0ac5ad5134f2769ccbaefec73844f8504c4d6182
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Failback to old master

2014-11-11 Thread Maeldron T.

Hi,

2014-10-29 17:46 GMT+01:00 Robert Haas robertmh...@gmail.com:



   Yes, but after the restart, the slave will also rewind to the most
   recent restart-point to begin replay, and some of the sanity checks
   that recovery.conf enforces will be lost during that replay.  A safe
   way to do this might be to shut down the master, make a note of the
   ending WAL position on the master, and then promote the slave (without
   shutting it down) once it's reached that point in replay.


As far as I remember (I can’t test it right now but I am 99% sure) 
promoting the slave makes it impossible to connect the old master to the 
new one without making a base_backup. The reason is the timeline change. 
It complains.


The only way to do this is:
1. Stop the master
2. Restart the slave without recovery conf
3. Restart the old master with a recovery conf.

I have done this a couple of times back and forth and it worked. I
mean it didn't complain.



 I also thought that if there was a crash on the original master
   and it
 applied WAL entries on itself that are not presented on the slave
   then it
 will throw an error when I try to connect it to the new master
   (to the old
 slave).

   I don't think you're going to be that lucky.

 It would be nice to know as creating a base_backup takes much time.

   rsync can speed things up by copying only changed data, but yes,
   it's a problem.


Actually I am more afraid of rsyncing database data files between the 
nodes than trusting the postgresql error log. There is no technical 
reason for that, it's more like psychological.


Is it possible that the new master has unreplicated changes and won't 
notice that when connecting to the old slave? I thought that wal records 
might have unique identifiers but I don't know the details.






Re: [HACKERS] tracking commit timestamps

2014-11-11 Thread Alvaro Herrera
Jim Nasby wrote:
 On 11/11/14, 2:03 PM, Alvaro Herrera wrote:
 Jim Nasby wrote:
 On 11/10/14, 7:40 AM, Alvaro Herrera wrote:
 
 Ah, right.  So AFAIK we don't need to keep anything older than
 RecentXmin or something like that -- which is not too old.  If I recall
 correctly Josh Berkus was saying in a thread about pg_multixact that it
 used about 128kB or so in <= 9.2 for his customers; that one was also
 limited to RecentXmin AFAIR.  I think a similar volume of commit_ts data
 would be pretty acceptable.  Moreso considering that it's turned off by
 default.
 
 FWIW, AFAICS MultiXacts are only truncated after a (auto)vacuum process is 
 able to advance datminmxid, which will (now) only happen when an entire 
 relation has been scanned (which should be infrequent).
 
 I believe the low normal space usage is just an indication that most 
 databases don't use many MultiXacts.
 
 That's in 9.3.  Prior to that, they were truncated much more often.
 
 Well, we're talking about a new feature, so I wasn't looking in back 
 branches. ;P

Well, I did mention <= 9.2 in the text above ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] pgcrypto: PGP signatures

2014-11-11 Thread Jeff Janes
On Sat, Nov 1, 2014 at 7:52 AM, Marko Tiikkaja ma...@joh.to wrote:

 Hi,

 I discovered a problem with the lack of MDC handling in the signature info
 extraction code, so I've fixed that and added a test message.  v9 here.




Hi Marko,

I get a segfault when the length of the message is exactly 16308 bytes, see
attached perl script.

I can't get a backtrace, for some reason it acts as if there were no debug
symbols despite that I built with them.  I've not seen that before.

I get this whether or not the bug 11905 patch is applied, so the problem
seems to be related but different.

Cheers,

Jeff


pgcrypto3.pl
Description: Binary data



[HACKERS] Deciding which index to use

2014-11-11 Thread Christophe Pettus
Where in the optimizer code does PostgreSQL decide which of several 
possibly-matching partial indexes to use?
--
-- Christophe Pettus
   x...@thebuild.com





Re: [HACKERS] Deciding which index to use

2014-11-11 Thread Tom Lane
Christophe Pettus x...@thebuild.com writes:
 Where in the optimizer code does PostgreSQL decide which of several 
 possibly-matching partial indexes to use?

It costs them all out and uses the cheapest.

If the cost estimates come out exactly the same, you get an arbitrary
choice (I think probably always the index with smallest OID, but that's
an implementation artifact rather than something that's done explicitly
anywhere).
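
That observable behavior can be mimicked in a toy sketch; this mirrors only the tie-breaking effect described above, not the planner's actual path-comparison code, which keeps paths in a pathlist and compares costs with a fuzz factor:

```python
# Toy sketch: "cost them all, keep the cheapest", with exact cost ties
# falling to the index with the smallest OID.
def pick_index(candidates):
    """candidates: list of (oid, estimated_cost) for matching indexes."""
    # min() keeps the first minimum under the sort key, so keying on
    # (cost, oid) makes the smallest-OID index win exact cost ties.
    return min(candidates, key=lambda c: (c[1], c[0]))

indexes = [(16402, 12.5), (16390, 12.5), (16410, 40.0)]
best = pick_index(indexes)
print(best)  # smallest OID among the two equally cheap indexes
```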

regards, tom lane




Re: [HACKERS] Column/type dependency recording inconsistencies

2014-11-11 Thread Petr Jelinek

On 11/11/14 21:47, Tom Lane wrote:

I wrote:

... Also, I find physically removing an entry fairly ugly and
possibly unsafe, since it's not clear what outer recursion levels might be
assuming about the contents of the array -- and that certainly won't work
for the stack case.  What seems better is to invent another flag bit
to indicate that we no longer need to physically delete this subobject
because we've discovered the whole object will be deleted.  Then we
can just skip it during the execution phase.


On further reflection, neither of those approaches is really a good idea.
With either implementation, skipping the DROP COLUMN step would mean that
we're letting the DROP TYPE happen before we drop a table containing a
live column of that type.  While that failed to fail in this particular
example, it sounds like a really bad idea --- any number of things, such
as event triggers, could well blow up when abused like that.  The lack of
prior reports means that this is a very rare case, so rather than trying
to optimize it, I think we should just let the drops happen in the
scheduled order.  All that we need is to make sure that the column gets
marked with flags indicating that dropping is allowed.  Accordingly, the
attached patch should do it.



Yup, this patch seems to be much better and safer fix.

I can confirm that it works fine with both the test-case and my original 
upgrade script. Thanks!


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Minor problem with comment added by 5ea86e

2014-11-11 Thread Robert Haas
On Tue, Nov 11, 2014 at 3:11 PM, Peter Geoghegan p...@heroku.com wrote:
 Attached patch removes incorrect comment added by recent Sortsupport
 for B-Tree/CLUSTER commit (commit 5ea86e).

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] What exactly is our CRC algorithm?

2014-11-11 Thread Robert Haas
On Tue, Nov 11, 2014 at 6:26 AM, Abhijit Menon-Sen a...@2ndquadrant.com wrote:
 At 2014-11-04 14:40:36 +0100, and...@2ndquadrant.com wrote:
 On 2014-11-04 08:21:13 -0500, Robert Haas wrote:
  Are you going to get the slice-by-N stuff working next, to speed
  this up?

 I don't plan to do anything serious with it, but I've hacked up the
 crc code to use the hardware instruction.

 I'm working on this (first speeding up the default calculation using
 slice-by-N, then adding support for the SSE4.2 CRC instruction on top).

Great!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan

2014-11-11 Thread Kouhei Kaigai
  Given
  where we are with the infrastructure, there would be a number of
  unhandled problems, such as deadlock detection (needs group locking or
  similar), assessment of quals as to parallel-safety (needs
  proisparallel or similar), general waterproofing to make sure that
  pushing down a qual we shouldn't doesn't do anything really dastardly
  like crash the server (another written but yet-to-be-published patch
  adds a bunch of relevant guards), and snapshot sharing (likewise).
  But if you don't do anything weird, it should basically work.
 
 If we build something fairly restricted, but production ready we can still
 do something useful to users.
 
 * only functions marked as CONTAINS NO SQL
 * parallel_workers = 2 or fixed
 * only one Plan type, designed to maximise benefit and minimize optimizer
 change
 
 Why?
 
 * The above is all that is needed to test any infrastructure changes we
 accept. We need both infrastructure and tests. We could do this as a custom
 scan plugin, but why not make it work in core - that doesn't prevent a plugin
 version also if you desire it.
 
 * only functions marked as CONTAINS NO SQL
 We don't really know what proisparallel is, but we do know what CONTAINS
 NO SQL means and can easily check for it.
 Plus I already have a patch for this, slightly bitrotted.
 
Isn't provolatile = PROVOLATILE_IMMUTABLE sufficient?

 * parallel_workers = 2 (or at least not make it user settable) By fixing
 the number of workers at 2 we avoid any problems caused by having N variable,
 such as how to vary N fairly amongst users and other such considerations.
 We get the main benefit of parallelism, without causing other issues across
 the server.
 
 * Fixed Plan: aggregate-scan
 To make everything simpler, allow only plans of a single type.
  SELECT something, list of aggregates
  FROM foo
  WHERE filters
  GROUP BY something
 because we know that passing large amounts of data from worker to master
 process will be slow, so focusing only on seq scan is not sensible; we should
 focus on plans that significantly reduce the number of rows passed upwards.
 We could just do this for very selective WHERE clauses, but that is not
 an important class of query.
  As soon as we include aggregates, we reduce data passing significantly AND
 we hit a very important subset of queries:
 
 This plan type is widely used in reporting queries, so will hit the mainline
 of BI applications and many Mat View creations.
 This will allow SELECT count(*) FROM foo to go faster also.
 
I agree with you. The key of performance is how many rows can be reduced
by parallel processor, however, one thing I doubt is that the condition
to kick parallel execution may be too restrictive.
I'm not sure how many workloads run aggregate functions on flat tables
rather than on multiple joined relations.

Sorry, it might be a topic to discuss in a separated thread.
Can't we have a functionality to push-down aggregate functions across table
joins? If we could do this in some cases, but not any cases, it makes sense
to run typical BI workload in parallel.

Let me assume the following query:

  SELECT SUM(t1.X), SUM(t2.Y) FROM t1 JOIN t2 ON t1.pk_id = t2.fk_id GROUP BY 
t1.A, t2.B;

It is a usual query that joins two relations with PK and FK.

It can be broken down, like:

  SELECT SUM(t1.X), SUM(sq2.Y)
  FROM t1 JOIN (SELECT t2.fk_id, t2.B, SUM(t2.Y) Y FROM t2
                GROUP BY t2.fk_id, t2.B) sq2 ON t1.pk_id = sq2.fk_id
  GROUP BY t1.A, sq2.B;

If the FK side has 100 tuples per PK on average, an aggregate that runs prior
to the join will massively reduce the number of tuples to be joined, and we can
leverage parallel processors to do the preliminary aggregation on a flat table.

I'm now reviewing previous academic work to identify the conditions under
which a join allows aggregate functions to be pushed down.
  http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.54.7751
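
The row-reduction argument can be illustrated with a tiny in-memory sketch (hypothetical data). Note that it only pushes down the SUM over the FK side; aggregates over t1's own columns need extra care, since pre-grouping changes how many times each t1 row appears in the join, which is exactly the kind of condition such work has to identify:

```python
from collections import defaultdict

t1 = [  # (pk_id, A)
    (1, 'x'), (2, 'y'),
]
t2 = [  # (fk_id, Y)
    (1, 10), (1, 20), (2, 5), (2, 7), (2, 8),
]

def direct(t1, t2):
    """SELECT t1.A, SUM(t2.Y) FROM t1 JOIN t2 ON pk_id = fk_id GROUP BY t1.A"""
    out = defaultdict(int)
    for pk, a in t1:
        for fk, y in t2:
            if fk == pk:
                out[a] += y
    return dict(out)

def pushed_down(t1, t2):
    """Pre-aggregate t2 per fk_id, then join the (much smaller) result."""
    partial = defaultdict(int)
    for fk, y in t2:
        partial[fk] += y      # SUM(Y) GROUP BY fk_id, done "below" the join
    out = defaultdict(int)
    for pk, a in t1:
        if pk in partial:
            out[a] += partial[pk]
    return dict(out)

# Both orderings produce the same aggregate result on this data.
assert direct(t1, t2) == pushed_down(t1, t2) == {'x': 30, 'y': 20}
```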

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei kai...@ak.jp.nec.com




Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan

2014-11-11 Thread Robert Haas
On Tue, Nov 11, 2014 at 3:29 AM, Simon Riggs si...@2ndquadrant.com wrote:
 * only functions marked as CONTAINS NO SQL
 We don't really know what proisparallel is, but we do know what
 CONTAINS NO SQL means and can easily check for it.
 Plus I already have a patch for this, slightly bitrotted.

Interestingly, I have a fairly solid idea of what proisparallel is,
but I have no clear idea what CONTAINS NO SQL is or why it's relevant.
I would imagine that srandom() contains no SQL under any reasonable
definition of what that means, but it ain't parallel-safe.

 * parallel_workers = 2 (or at least not make it user settable)
 By fixing the number of workers at 2 we avoid any problems caused by
 having N variable, such as how to vary N fairly amongst users and
 other such considerations. We get the main benefit of parallelism,
 without causing other issues across the server.

I think this is a fairly pointless restriction.  The code
simplification we'll get out of it appears to me to be quite minor,
and we'll just end up putting the stuff back in anyway.

 * Fixed Plan: aggregate-scan
 To make everything simpler, allow only plans of a single type.
  SELECT something, list of aggregates
  FROM foo
  WHERE filters
  GROUP BY something
 because we know that passing large amounts of data from worker to
 master process will be slow, so focusing only on seq scan is not
 sensible; we should focus on plans that significantly reduce the
 number of rows passed upwards. We could just do this for very
 selective WHERE clauses, but that is not an important class of query.
 As soon as we include aggregates, we reduce data passing significantly
 AND we hit a very important subset of queries:

This is moving the goalposts in a way that I'm not at all comfortable
with.  Parallel sequential-scan is pretty simple and may well be a win
if there's a restrictive filter condition involved.  Parallel
aggregation requires introducing new infrastructure into the aggregate
machinery to allow intermediate state values to be combined, and that
would be a great project for someone to do at some time, but it seems
like a distraction for me to do that right now.

 This plan type is widely used in reporting queries, so will hit the
 mainline of BI applications and many Mat View creations.
 This will allow SELECT count(*) FROM foo to go faster also.

 The execution plan for that query type looks like this...
 Hash Aggregate
Gather From Workers
   {Worker Nodes workers = 2
 HashAggregate
 PartialScan}

I'm going to aim for the simpler:

Hash Aggregate
-> Parallel Seq Scan
Workers: 4

Yeah, I know that won't perform as well as what you're proposing, but
I'm fairly sure it's simpler.
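
Under that simpler shape, workers only share out the scan (and can apply the qual), while every surviving row flows up to one final Hash Aggregate in the leader. A toy sketch, with made-up data and worker split, nothing like the real executor:

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

rows = [('red', i) for i in range(1000)] + [('blue', i) for i in range(500)]

def partial_scan(chunk):
    # each worker scans its block range, applying the WHERE filter
    return [r for r in chunk if r[1] % 2 == 0]

def parallel_seq_scan(rows, workers=4):
    size = (len(rows) + workers - 1) // workers
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scanned = list(pool.map(partial_scan, chunks))
    # leader: one HashAggregate over everything the workers returned
    counts = Counter()
    for chunk in scanned:
        for color, _ in chunk:
            counts[color] += 1
    return dict(counts)

print(parallel_seq_scan(rows))
```

The cost is that unaggregated rows cross the worker/leader boundary, which is what Simon's per-worker HashAggregate variant avoids.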

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [REVIEW] Re: Compression of full-page-writes

2014-11-11 Thread Michael Paquier
On Tue, Nov 11, 2014 at 6:27 PM, Andres Freund and...@2ndquadrant.com wrote:
 The more important thing here is that I see little chance of this
 getting in before Heikki's larger rework of the wal format gets
 in. Since that'll change everything around anyway I'm unsure how much
 point there is to iterate till that's done. I know that sucks, but I
 don't see much of an alternative.
True enough. Hopefully the next patch changing WAL format will put in
all the infrastructure around backup blocks, so we won't have any need
to worry about major conflicts for this release cycle after it.
-- 
Michael




Re: [HACKERS] Final Patch for GROUPING SETS

2014-11-11 Thread David Fetter
On Sat, Sep 27, 2014 at 06:37:38AM +0100, Andrew Gierth wrote:
  Andrew == Andrew Gierth and...@tao11.riddles.org.uk writes:
 
  Andrew I was holding off on posting a recut patch with the latest
  Andrew EXPLAIN formatting changes (which are basically cosmetic)
  Andrew until it became clear whether RLS was likely to be reverted
  Andrew or kept (we have a tiny but irritating conflict with it, in
  Andrew the regression test schedule file where we both add to the
  Andrew same list of tests).
 
 And here is that recut patch set.
 
 Changes since last posting (other than conflict removal):
 
   - gsp1.patch: clearer EXPLAIN output as per discussion
 
 Recut patches:
 
 gsp1.patch - phase 1 code patch (full syntax, limited functionality)
 gsp2.patch - phase 2 code patch (adds full functionality using the
  new chained aggregate mechanism)
 gsp-doc.patch  - docs
 gsp-contrib.patch  - quote cube in contrib/cube and contrib/earthdistance,
  intended primarily for testing pending a decision on
  renaming contrib/cube or unreserving keywords
 gsp-u.patch- proposed method to unreserve CUBE and ROLLUP
 
 (the contrib patch is not necessary if the -u patch is used; the
 contrib/pg_stat_statements fixes are in the phase1 patch)
 
 -- 
 Andrew (irc:RhodiumToad)
 

Tom, any word on this?

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] postgres_fdw behaves oddly

2014-11-11 Thread Etsuro Fujita

(2014/11/11 18:45), Etsuro Fujita wrote:

(2014/11/10 20:05), Ashutosh Bapat wrote:

Since two separate issues 1. using reltargetlist instead of attr_needed
and 2. system columns usage in FDW are being tackled here, we should
separate the patch into two one for each of the issues.


Agreed.  Will split the patch into two.


Here are splitted patches:

fscan-reltargetlist.patch  - patch for #1
postgres_fdw-syscol.patch  - patch for #2

Thanks,

Best regards,
Etsuro Fujita
*** a/src/backend/optimizer/plan/createplan.c
--- b/src/backend/optimizer/plan/createplan.c
***
*** 20,25 
--- 20,26 
  #include math.h
  
  #include access/skey.h
+ #include access/sysattr.h
  #include catalog/pg_class.h
  #include foreign/fdwapi.h
  #include miscadmin.h
***
*** 1945,1950  create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
--- 1946,1953 
  	RelOptInfo *rel = best_path->path.parent;
  	Index		scan_relid = rel->relid;
  	RangeTblEntry *rte;
+ 	Bitmapset  *attrs_used = NULL;
+ 	ListCell   *lc;
  	int			i;
  
  	/* it should be a base rel... */
***
*** 1993,2008  create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
  	 * bit of a kluge and might go away someday, so we intentionally leave it
  	 * out of the API presented to FDWs.
  	 */
  	scan_plan->fsSystemCol = false;
  	for (i = rel->min_attr; i < 0; i++)
  	{
! 		if (!bms_is_empty(rel->attr_needed[i - rel->min_attr]))
  		{
  			scan_plan->fsSystemCol = true;
  			break;
  		}
  	}
  
  	return scan_plan;
  }
  
--- 1996,2030 
  	 * bit of a kluge and might go away someday, so we intentionally leave it
  	 * out of the API presented to FDWs.
  	 */
+ 
+ 	/*
+ 	 * Add all the attributes needed for joins or final output.  Note: we must
+ 	 * look at reltargetlist, not the attr_needed data, because attr_needed
+ 	 * isn't computed for inheritance child rels.
+ 	 */
+ 	pull_varattnos((Node *) rel->reltargetlist, rel->relid, attrs_used);
+ 
+ 	/* Add all the attributes used by restriction clauses. */
+ 	foreach(lc, rel->baserestrictinfo)
+ 	{
+ 		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);
+ 
+ 		pull_varattnos((Node *) rinfo->clause, rel->relid, attrs_used);
+ 	}
+ 
+ 	/* Are any system columns requested from rel? */
  	scan_plan->fsSystemCol = false;
  	for (i = rel->min_attr; i < 0; i++)
  	{
! 		if (bms_is_member(i - FirstLowInvalidHeapAttributeNumber, attrs_used))
  		{
  			scan_plan->fsSystemCol = true;
  			break;
  		}
  	}
  
+ 	bms_free(attrs_used);
+ 
  	return scan_plan;
  }
  
*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***
*** 252,257  foreign_expr_walker(Node *node,
--- 252,265 
  if (var->varno == glob_cxt->foreignrel->relid &&
  	var->varlevelsup == 0)
  {
+ 	/*
+ 	 * System columns can't be sent to remote.
+ 	 *
+ 	 * XXX: we should probably send ctid to remote.
+ 	 */
+ 	if (var->varattno < 0)
+ 		return false;
+ 
  	/* Var belongs to foreign table */
  	collation = var-varcollid;
  	state = OidIsValid(collation) ? FDW_COLLATE_SAFE : FDW_COLLATE_NONE;
***
*** 1261,1266  deparseVar(Var *node, deparse_expr_cxt *context)
--- 1269,1279 
  	if (node->varno == context->foreignrel->relid &&
  		node->varlevelsup == 0)
  	{
+ 		/*
+ 		 * System columns shouldn't be examined here.
+ 		 */
+ 		Assert(node->varattno >= 0);
+ 
  		/* Var belongs to foreign table */
  		deparseColumnRef(buf, node->varno, node->varattno, context->root);
  	}



[HACKERS] New storage parameter pages_per_range not mentioned in CREATE INDEX doc

2014-11-11 Thread Michael Paquier
Hi all,

The new storage parameter pages_per_range is missing in the
documentation of CREATE INDEX. The patch attached corrects that.
Regards,
-- 
Michael


20141112_brin_doc_fix.patch
Description: binary/octet-stream



Re: [HACKERS] using custom scan nodes to prototype parallel sequential scan

2014-11-11 Thread David Rowley
On Tue, Nov 11, 2014 at 9:29 PM, Simon Riggs si...@2ndquadrant.com wrote:


 This plan type is widely used in reporting queries, so will hit the
 mainline of BI applications and many Mat View creations.
 This will allow SELECT count(*) FROM foo to go faster also.


We'd also need to add some infrastructure to merge aggregate states
together for this to work properly. That would mean it could also work for
avg() and stddev() etc. For max() and min() the merge functions would likely
just be the same as the transition functions.
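A minimal sketch of what such a merge step could look like for avg(), assuming a (sum, count) partial state. The struct and function names here are invented for illustration; this is not PostgreSQL's actual aggregate machinery:

```c
#include <assert.h>

/* Hypothetical partial-aggregate state for avg(). */
typedef struct AvgState
{
	double	sum;
	long	count;
} AvgState;

/* Transition: fold one input value into a worker's local state. */
static void
avg_trans(AvgState *state, double value)
{
	state->sum += value;
	state->count += 1;
}

/* Merge: combine two workers' partial states; this is the extra
 * infrastructure the mail refers to. */
static void
avg_merge(AvgState *dst, const AvgState *src)
{
	dst->sum += src->sum;
	dst->count += src->count;
}

/* Final: compute the result from the fully merged state. */
static double
avg_final(const AvgState *state)
{
	return state->count > 0 ? state->sum / state->count : 0.0;
}
```

For max() and min() the merge would indeed collapse to the transition logic: combining two partial maxima is just taking the larger of the two.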

Regards

David Rowley


Re: [HACKERS] group locking: incomplete patch, just for discussion

2014-11-11 Thread Jeff Davis
On Wed, 2014-10-15 at 00:03 -0400, Robert Haas wrote:
 For parallelism, I think we need a concept of group locking.  That is,
 suppose we have a user backend and N worker backends collaborating to
 execute some query.  For the sake of argument, let's say it's a
 parallel CLUSTER, requiring a full table lock.  We need all of the
 backends to be able to lock the table even though at least one of them
 holds AccessExclusiveLock.  This suggests that the backends should all
 be members of a locking group, and that locks within the same locking
 group should be regarded as mutually non-conflicting.
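The rule in the quoted paragraph, that locks within one group never conflict with each other, can be sketched as a conflict test that checks group membership before the ordinary mode table. All types, fields, and the mode numbering below are invented for illustration and are not PostgreSQL's lock manager structures:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock holder: group members share a leader pid. */
typedef struct LockHolder
{
	int	pid;
	int	groupLeaderPid;	/* same value for all members of a group */
	int	lockMode;		/* toy mode numbering */
} LockHolder;

/* Toy conflict rule standing in for the real lock mode table:
 * any "exclusive" mode (>= 7 here) conflicts with everything. */
static bool
modes_conflict(int modeA, int modeB)
{
	return modeA >= 7 || modeB >= 7;
}

/* Group locking: same group means never conflicting, regardless of mode. */
static bool
locks_conflict(const LockHolder *a, const LockHolder *b)
{
	if (a->groupLeaderPid == b->groupLeaderPid)
		return false;
	return modes_conflict(a->lockMode, b->lockMode);
}
```

Under this rule a leader holding an exclusive lock and its workers requesting the same lock all proceed, while an unrelated backend still blocks.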

Trying to catch up on this thread, please excuse me if these questions
are already covered.

You mention the possibility of undetected deadlocks, which is surely
unacceptable. But why not improve the deadlock detector? How bad are
_detected_ deadlocks? A lot of the concern was around catalog accesses,
but those lock requests would rarely wait anyway.

I also wonder if group locking is generally the wrong approach to
parallelism. Parallel scan/sort should work by assigning workers to
chunks of data, and then merging the results. In principle the workers
don't need to touch the same data at all, so why are we trying so hard
to get them to all take the same locks?

The reason, I assume, is that a full table is the lockable object, but
we are imagining the segment files as the chunks. But maybe there's a
way to address this more directly with an extra field in the lock tag,
and perhaps some catalog knowledge?
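One way to picture the extra lock tag field suggested here: a hypothetical chunk number alongside the relation identifier, where chunk 0 stands for the whole relation. This is purely illustrative and not PostgreSQL's actual LOCKTAG layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lock tag extended with a chunk number. */
typedef struct ChunkLockTag
{
	uint32_t	dbOid;
	uint32_t	relOid;
	uint32_t	chunkNo;	/* invented field; 0 = whole relation */
} ChunkLockTag;

#define WHOLE_RELATION 0

/* Two tags conflict only if they name the same relation and their
 * ranges overlap: equal chunks, or either side locks the whole relation. */
static bool
chunk_tags_conflict(const ChunkLockTag *a, const ChunkLockTag *b)
{
	if (a->dbOid != b->dbOid || a->relOid != b->relOid)
		return false;
	if (a->chunkNo == WHOLE_RELATION || b->chunkNo == WHOLE_RELATION)
		return true;
	return a->chunkNo == b->chunkNo;
}
```

Workers assigned disjoint chunks would then never contend with each other, while a whole-relation lock still excludes everyone.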

Regards,
Jeff Davis



