Re: [HACKERS] pg_dump data structures for triggers

2016-02-03 Thread Vik Fearing
On 02/04/2016 01:44 AM, Tom Lane wrote:
> I'm looking into fixing the problem reported here:
> http://www.postgresql.org/message-id/1445a624-d09f-4b51-9c41-46ba1e2d6...@neveragain.de
> namely that if we split a view into a table + rule (because of circular
> dependencies), parallel pg_restore fails to ensure that it creates any
> triggers for the view only after creating the rule.  If it tries to
> create the triggers first, the backend may barf because they're the wrong
> type of triggers for a plain table.
> 
> While it's not that hard to make pg_dump add some more dependencies while
> breaking a circular dependency, there's a small problem of finding the
> view's triggers so we can attach those dependencies to them.  AFAICS, with
> the current pg_dump data structures, there is actually no better way to
> do that than to iterate through every DumpableObject known to pg_dump to
> see which of them are TriggerInfos pointing to the view relation :-(.
> 
> What I have in mind is to add linked-list fields to TableInfo and
> TriggerInfo to allow all the triggers of a table to be found by chasing
> a list.  The increment in the size of those data structures isn't much,
> and we may have other uses for such an ability later.
> 
> Anybody have an objection or better idea?

No objections to this, but my "better idea" is simply to allow INSTEAD
OF triggers on tables, as discussed last year.
http://www.postgresql.org/message-id/14c6fe168a9-1012-10...@webprd-a87.mail.aol.com
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] Batch update of indexes

2016-02-03 Thread konstantin knizhnik

On Feb 4, 2016, at 2:00 AM, Jim Nasby wrote:

> 
> My suspicion is that it would be useful to pre-order the new data before 
> trying to apply it to the indexes.

Sorry, but ALTER INDEX is expected to work for all index types, not only B-Tree,
and for some of them sorting may not be possible...
But for B-Tree, presorting the inserted data should certainly improve performance.
I will think about it.
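
For the B-Tree case, a rough sketch of the presorting step might look like the
following (an illustration only, assuming the pending batch is collected as
simplified (key, heap TID) pairs; batch_insert_one() is a placeholder for
whatever routine actually inserts one entry into the index):

  #include <stdint.h>
  #include <stdlib.h>

  typedef struct PendingEntry
  {
      int64_t     key;        /* simplified single-column integer key */
      uint64_t    heap_tid;   /* simplified heap tuple identifier */
  } PendingEntry;

  /* placeholder for the routine that inserts a single entry into the index */
  extern void batch_insert_one(int64_t key, uint64_t heap_tid);

  static int
  cmp_pending(const void *a, const void *b)
  {
      const PendingEntry *pa = a;
      const PendingEntry *pb = b;

      if (pa->key < pb->key)
          return -1;
      if (pa->key > pb->key)
          return 1;
      return 0;
  }

  static void
  flush_pending_to_btree(PendingEntry *batch, size_t n)
  {
      /* Sorting first means consecutive insertions tend to land on the same
       * or adjacent leaf pages, so descents are cheaper and pages stay hot. */
      qsort(batch, n, sizeof(PendingEntry), cmp_pending);

      for (size_t i = 0; i < n; i++)
          batch_insert_one(batch[i].key, batch[i].heap_tid);
  }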



> -- 
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 
> 





Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-02-03 Thread Kouhei Kaigai
> > KaiGai-san,
> >
> > On 2016/02/01 10:38, Kouhei Kaigai wrote:
> > > As an aside, the background of my motivation is the slide below:
> > > http://www.slideshare.net/kaigai/sqlgpussd-english
> > > (LT slides from the JPUG conference last Dec)
> > >
> > > I'm investigating an SSD-to-GPU direct feature on top of the
> > > custom-scan interface. It intends to load a bunch of data blocks
> > > from NVMe-SSD to GPU RAM using P2P DMA, prior to loading the data
> > > onto CPU/RAM, in order to preprocess the data to be filtered out.
> > > It only makes sense if the target blocks are not yet loaded into
> > > CPU/RAM, because the SSD device is essentially slower than RAM.
> > > So, I would like a reliable way to check the latest status of the
> > > shared buffers, to know whether a particular block is already
> > > loaded or not.
> >
> > Quite interesting stuff, thanks for sharing!
> >
> > I'm in no way an expert on this, but could this generally be attacked from
> > the smgr API perspective? Currently, we have only one implementation - md.c
> > (the hard-coded RelationData.smgr_which = 0). If we extended that and
> > provided end-to-end support so that there would be alternatives to md.c for
> > storage operations, I guess that would open up opportunities for
> > extensions to specify smgr_which as an argument to ReadBufferExtended(),
> > provided there is already support in place to install md.c alternatives
> > (perhaps in a .so). Of course, these are just musings and perhaps do not
> > really concern the requirements of the custom scan methods you have been
> > developing.
> >
> Thanks for your idea. Indeed, smgr hooks are a good candidate for
> implementing the feature; however, what I need is a thin intermediation
> layer rather than an alternative storage engine.
> 
> It has become clear that we need two features here.
> 1. A feature to check whether a particular block is already in the shared
>    buffer pool.
>    It is available: BufTableLookup() under the BufMappingPartitionLock
>    gives us the information we want.
> 
> 2. A feature to suspend i/o write-out towards particular blocks that are
>    registered by another concurrent backend, until they are unregistered
>    (usually, at the end of P2P DMA).
>    ==> to be discussed.
> 
> When we call smgrwrite(), like FlushBuffer(), it fetches function pointer
> from the 'smgrsw' array, then calls smgr_write.
> 
>   void
>   smgrwrite(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
> char *buffer, bool skipFsync)
>   {
>   (*(smgrsw[reln->smgr_which].smgr_write)) (reln, forknum, blocknum,
> buffer, skipFsync);
>   }
> 
> If an extension overwrites the smgrsw[] array and then calls the original
> function under its own control, it can suspend the call to the original
> smgr_write until the P2P DMA completes.
> 
> It may be a minimally invasive way to implement this, and portable to any
> future storage layers.
> 
> What do you think? It is a bit different from your original proposition,
> though.
>
I tried to draft an enhancement to realize the above i/o write-out
suspend/resume in as minimally invasive a way as possible.

  ASSUMPTION: I intend to implement this feature as part of an extension,
  because these i/o suspend/resume checks are pure overhead for the core
  features unless an extension utilizes them.

Three functions shall be added:

extern int    GetStorageMgrNumbers(void);
extern f_smgr GetStorageMgrHandlers(int smgr_which);
extern void   SetStorageMgrHandlers(int smgr_which, f_smgr smgr_handlers);

As the names suggest, GetStorageMgrNumbers() returns the number of storage
managers currently installed. It always returns 1 right now.
GetStorageMgrHandlers() returns the currently configured f_smgr table for the
supplied smgr_which. It allows extensions to know the current configuration
of the storage manager, even if another extension has already modified it.
SetStorageMgrHandlers() assigns the supplied 'smgr_handlers' in place of the
current one.
If an extension wants to intermediate 'smgr_write', it replaces 'smgr_write'
with its own function and then calls the original function, likely mdwrite,
from the alternative function.

In this case, call chain shall be:

  FlushBuffer, and others...
   +-- smgrwrite(...)
        +-- (extension's own function)
             +-- mdwrite
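
As a rough sketch of how an extension might use the three functions proposed
above (assuming f_smgr is exported as the proposal implies; wait_for_p2p_dma()
and the block-tracking machinery are hypothetical pieces that the extension
itself would provide):

  #include "postgres.h"
  #include "fmgr.h"
  #include "storage/smgr.h"

  static f_smgr orig_smgr;          /* saved copy of the md.c handlers */

  static void
  my_smgr_write(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
                char *buffer, bool skipFsync)
  {
      /* hypothetical extension call: block while a P2P DMA is in flight
       * on this particular block */
      wait_for_p2p_dma(reln->smgr_rnode.node, forknum, blocknum);

      /* then hand off to the original handler; in practice, mdwrite */
      (*orig_smgr.smgr_write) (reln, forknum, blocknum, buffer, skipFsync);
  }

  void
  _PG_init(void)
  {
      f_smgr  handlers;

      orig_smgr = GetStorageMgrHandlers(0);     /* md.c is smgr_which == 0 */
      handlers = orig_smgr;
      handlers.smgr_write = my_smgr_write;
      SetStorageMgrHandlers(0, handlers);
  }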

Once the extension's own function blocks write i/o until the P2P DMA is
completed by the concurrent process, we don't need to worry about partial
updates of the OS cache or the storage device.
It is not difficult for extensions to implement a feature to track/untrack
a (relFileNode, forkNum, blockNum) triple, with automatic untracking
according to the resource owner, and a mechanism to block the caller until
P2P DMA completion.

On the other hand, its flexibility seems a bit larger than necessary (what
I want to implement is just a blocker of buffer write i/o). And it may give
people the wrong impression about the pluggable storage feature.
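
For reference, a minimal sketch of the check in (1) above, using the existing
buffer-mapping routines (of course, the answer can become stale as soon as the
partition lock is released):

  #include "postgres.h"
  #include "storage/buf_internals.h"
  #include "storage/lwlock.h"

  static bool
  block_is_in_shared_buffers(RelFileNode rnode, ForkNumber forknum,
                             BlockNumber blocknum)
  {
      BufferTag   tag;
      uint32      hash;
      LWLock     *partitionLock;
      int         buf_id;

      INIT_BUFFERTAG(tag, rnode, forknum, blocknum);
      hash = BufTableHashCode(&tag);
      partitionLock = BufMappingPartitionLock(hash);

      LWLockAcquire(partitionLock, LW_SHARED);
      buf_id = BufTableLookup(&tag, hash);
      LWLockRelease(partitionLock);

      return (buf_id >= 0);     /* >= 0 means the block is currently cached */
  }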

[HACKERS] pgcommitfest reply having corrupted subject line

2016-02-03 Thread Noah Misch
The following message, which bears "User-Agent: pgcommitfest",

http://www.postgresql.org/message-id/flat/20160202164101.1291.30526.p...@coridan.postgresql.org

added spaces after commas in its subject line, compared to the subject line of
its In-Reply-To message.

     new subject line: Re: Add generate_series(date, date) and generate_series(date, date, integer)
previous subject line: Re: Add generate_series(date,date) and generate_series(date,date,integer)

I see no way to manually alter the subject line from the
commitfest.postgresql.org review UI.  Is commitfest.postgresql.org initiating
this change internally?

Thanks,
nm




Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support on readfuncs.c)

2016-02-03 Thread Kouhei Kaigai
> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Robert Haas
> Sent: Thursday, February 04, 2016 11:39 AM
> To: Kaigai Kouhei(海外 浩平)
> Cc: Andres Freund; Amit Kapila; pgsql-hackers
> Subject: Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan 
> support
> on readfuncs.c)
> 
> On Wed, Feb 3, 2016 at 8:00 PM, Kouhei Kaigai  wrote:
> >> Well, looking at this a bit more, it seems like the documentation
> >> you've written here is really misplaced.  The patch is introducing a
> >> new facility that applies to both CustomScan and ForeignScan, but the
> >> documentation is only to do with CustomScan.  I think we need a whole
> >> new chapter on extensible nodes, or something.  I'm actually not
> >> really keen on the fact that we keep adding SGML documentation for
> >> this stuff; it seems like it belongs in a README in the source tree.
> >> We don't explain nodes in general, but now we're going to have to try
> >> to explain extensible nodes.  How's that going to work?
> >>
> > The details of these callbacks are not for end-users or administrators,
> > only for core/extension developers. So I love the idea of not having
> > such a detailed description in SGML.
> > How about having more detailed source code comments close to the
> > definition of ExtensibleNodeMethods?
> > I had not looked at src/backend/nodes/README before; it has only 6 updates
> > in its history since the Postgres95 era. I guess people may forget to
> > update a README file if the description is located separately from the
> > implementation.
> 
> Hmm, that might work, although that file is so old that it may be
> difficult to add to.  Another idea is: maybe we could have a header
> file for the extensible node stuff and just give it a really long
> header comment.
>
For now, I have tried writing up the description in nodes/nodes.h.
It amounts to about 100 lines, which in my view is borderline for whether
we should split this chunk off into another header file.

On the other hand, I noticed that we cannot omit the checks for individual
callbacks on the Custom node type: ExtensibleNodeMethods is embedded in the
CustomMethods structure, so we may have a Custom node with no extensible
features.
This arrangement is beneficial because the extension does not need to
register a library and symbol name for serialization. So the
CustomScan-related code still checks for the existence of the individual
callbacks.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
> 
> 


Attachment: pgsql-v9.6-custom-private.v4.patch



Re: [HACKERS] Copy-pasto in the ExecForeignDelete documentation

2016-02-03 Thread Etsuro Fujita

On 2016/02/04 0:13, Robert Haas wrote:

On Mon, Feb 1, 2016 at 5:26 AM, Etsuro Fujita
 wrote:

I don't think the data is referenced by the AFTER ROW DELETE triggers.



Why do you think that?  And why would DELETE triggers be different
from UPDATE triggers, which do something similar?


As for the UPDATE case, I think local AFTER ROW UPDATE triggers have to 
reference the data, since a BEFORE trigger on the remote server might 
change the to-be-updated version of the row originally assigned.  But as 
for the DELETE case, I did not think the same applies.



I looked up the history of this code and it was introduced in
7cbe57c3, which added support for triggers on foreign tables.  Noah
did that commit and he's rarely wrong about stuff like this, so I
suspect you may be missing something.  One thing to consider is
whether the version of the row that finally gets deleted is
necessarily the same as the version originally selected from the
remote side; e.g. suppose the remote side has triggers, too.


Maybe I'm missing something, but I was thinking that version should be 
the same as the version originally selected from the remote server; 
otherwise the delete would be discarded, since the updated version would 
not satisfy the delete's condition (something like "ctid = $1" in the 
postgres_fdw case) during an EvalPlanQual-like recheck on the remote server.


Best regards,
Etsuro Fujita






Re: [HACKERS] 2016-01 Commitfest

2016-02-03 Thread Etsuro Fujita

On 2016/02/04 12:04, Robert Haas wrote:

On Wed, Feb 3, 2016 at 10:00 PM, Noah Misch  wrote:

Thank you.



+1.


Thank you!

Best regards,
Etsuro Fujita






Re: [HACKERS] Idle In Transaction Session Timeout, revived

2016-02-03 Thread David Steele
On 2/3/16 8:04 PM, Robert Haas wrote:
> On Wed, Feb 3, 2016 at 6:16 PM, Tom Lane  wrote:
>> "Joshua D. Drake"  writes:
>>> On 02/03/2016 02:52 PM, Robert Haas wrote:
 Well, my view is that if somebody wants an alternative behavior
 besides dropping the connection, they can write a patch to provide
 that as an additional option.  That, too, has been discussed before.
 But the fact that somebody might want that doesn't make this a bad or
 useless behavior.  Indeed, I'd venture that more people would want
 this than would want that.
>>
>>> Something feels wrong about just dropping the connection.
>>
>> I have the same uneasy feeling about it as JD.  However, you could
>> certainly argue that if the client application has lost its marbles
>> to the extent of allowing a transaction to time out, there's no good
>> reason to suppose that it will wake up any time soon, ...
> 
> <...> But what I think really happens is
> some badly-written Java application loses track of a connection
> someplace and just never finds it again. <...>

That's what I've seen over and over again.  And then sometimes it's not
a badly-written Java application, but me, and in that case I definitely
want the connection killed.  Without logging, if you please.

-- 
-David
da...@pgmasters.net





Re: [HACKERS] WIP: Failover Slots

2016-02-03 Thread Robert Haas
On Fri, Jan 22, 2016 at 11:51 AM, Andres Freund  wrote:
> I think it's technically quite possible to maintain the required
> resources on multiple nodes. The question is how would you configure on
> which nodes the resources need to be maintained? I can't come up with a
> satisfying scheme...

For this to work, I feel like the nodes need names, and a directory
that tells them how to reach each other.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] 2016-01 Commitfest

2016-02-03 Thread Robert Haas
On Wed, Feb 3, 2016 at 10:00 PM, Noah Misch  wrote:
> Thank you.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] 2016-01 Commitfest

2016-02-03 Thread Noah Misch
On Wed, Feb 03, 2016 at 06:18:02PM +0100, Alvaro Herrera wrote:
> I just closed a large number of patches in the 2016-01 commitfest as
> "returned with feedback".  The vast majority of those were in "waiting
> on author"; I verified that something had been posted to the author in
> those threads and the author had not yet replied.  Since we're past the
> closing point for the commitfest, that's fair.  Patch authors are welcome
> to submit a new version of their patch, which can be considered during the
> next commitfest.  In this group I also included a number of "Needs
> review" patches which had gotten some feedback and were waiting for a
> new version.
> 
> Also, some patches that had actually gotten feedback but where a new
> version was posted afterwards, I moved to the next commitfest.  We're not
> waiting on the author to post a new version, but we've already given
> some feedback, so we're not obliged to keep them in the current
> commitfest.

Thank you.




Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support on readfuncs.c)

2016-02-03 Thread Robert Haas
On Wed, Feb 3, 2016 at 8:00 PM, Kouhei Kaigai  wrote:
>> Well, looking at this a bit more, it seems like the documentation
>> you've written here is really misplaced.  The patch is introducing a
>> new facility that applies to both CustomScan and ForeignScan, but the
>> documentation is only to do with CustomScan.  I think we need a whole
>> new chapter on extensible nodes, or something.  I'm actually not
>> really keen on the fact that we keep adding SGML documentation for
>> this stuff; it seems like it belongs in a README in the source tree.
>> We don't explain nodes in general, but now we're going to have to try
>> to explain extensible nodes.  How's that going to work?
>>
> The details of these callbacks are not for end-users or administrators,
> only for core/extension developers. So I love the idea of not having
> such a detailed description in SGML.
> How about having more detailed source code comments close to the
> definition of ExtensibleNodeMethods?
> I had not looked at src/backend/nodes/README before; it has only 6 updates
> in its history since the Postgres95 era. I guess people may forget to
> update a README file if the description is located separately from the
> implementation.

Hmm, that might work, although that file is so old that it may be
difficult to add to.  Another idea is: maybe we could have a header
file for the extensible node stuff and just give it a really long
header comment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Idle In Transaction Session Timeout, revived

2016-02-03 Thread Robert Haas
On Wed, Feb 3, 2016 at 6:16 PM, Tom Lane  wrote:
> "Joshua D. Drake"  writes:
>> On 02/03/2016 02:52 PM, Robert Haas wrote:
>>> Well, my view is that if somebody wants an alternative behavior
>>> besides dropping the connection, they can write a patch to provide
>>> that as an additional option.  That, too, has been discussed before.
>>> But the fact that somebody might want that doesn't make this a bad or
>>> useless behavior.  Indeed, I'd venture that more people would want
>>> this than would want that.
>
>> Something feels wrong about just dropping the connection.
>
> I have the same uneasy feeling about it as JD.  However, you could
> certainly argue that if the client application has lost its marbles
> to the extent of allowing a transaction to time out, there's no good
> reason to suppose that it will wake up any time soon, ...

That's exactly what I think.  If you imagine a user who starts a
transaction and then leaves for lunch, aborting the transaction seems
nicer than killing the connection.  But what I think really happens is
some badly-written Java application loses track of a connection
someplace and just never finds it again.  Again, I'm not averse to
having both behaviors someday, but my gut feeling is that killing the
connection will be the more useful one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support on readfuncs.c)

2016-02-03 Thread Kouhei Kaigai
> On Wed, Jan 27, 2016 at 9:36 PM, Robert Haas  wrote:
> > On Mon, Jan 25, 2016 at 8:06 PM, Kouhei Kaigai  wrote:
> >> Sorry for my late response. I've been unavailable to have enough
> >> time to touch code for the last 1.5 month.
> >>
> >> The attached patch is a revised one to handle private data of
> >> foreign/custom scan node more gracefully.
> >>
> >> The overall consensus upthread were:
> >> - A new ExtensibleNodeMethods structure defines a unique name
> >>   and a set of callbacks to handle node copy, serialization,
> >>   deserialization and equality checks.
> >> - (Foreign|Custom)(Path|Scan|ScanState) are first host of the
> >>   ExtensibleNodeMethods, to allow extension to define larger
> >>   structure to store its private fields.
> >> - ExtensibleNodeMethods does not support variable length
> >>   structure (like a structure with an array on the tail, use
> >>   separately allocated array).
> >> - ExtensibleNodeMethods shall be registered on _PG_init() of
> >>   extensions.
> >>
> >> The 'pgsql-v9.6-custom-private.v3.patch' is the main part of
> >> this feature. As I pointed out before, it uses dynhash instead
> >> of the self invented hash table.
> >
> > On a first read-through, I see nothing in this patch to which I would
> > want to object except for the fact that the comments and documentation
> > need some work from a native speaker of English.  It looks like what
> > we discussed, and I think it's an improvement over what we have now.
> 
> Well, looking at this a bit more, it seems like the documentation
> you've written here is really misplaced.  The patch is introducing a
> new facility that applies to both CustomScan and ForeignScan, but the
> documentation is only to do with CustomScan.  I think we need a whole
> new chapter on extensible nodes, or something.  I'm actually not
> really keen on the fact that we keep adding SGML documentation for
> this stuff; it seems like it belongs in a README in the source tree.
> We don't explain nodes in general, but now we're going to have to try
> to explain extensible nodes.  How's that going to work?
>
The details of these callbacks are not for end-users or administrators,
only for core/extension developers. So I love the idea of not having
such a detailed description in SGML.
How about having more detailed source code comments close to the
definition of ExtensibleNodeMethods?
I had not looked at src/backend/nodes/README before; it has only 6 updates
in its history since the Postgres95 era. I guess people may forget to
update a README file if the description is located separately from the
implementation.

> I think you should avoid the call to GetExtensibleNodeMethods() in the
> case where extnodename is NULL.  On the other hand, I think that if
> extnodename is non-NULL, all four methods should be required, so that
> you don't have to check if (methods && methods->nodeRead) but just if
> (extnodename) { methods = GetExtensibleNodeMethods(extnodename);
> methods->nodeRead( ... ); }.  That seems like it would be a bit
> tidier.
>
OK, I'll fix up. No need to have 'missing_ok' argument here.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 




[HACKERS] Sanity checking for ./configure options?

2016-02-03 Thread Jim Nasby
I just discovered that ./configure will happily accept '--with-pgport=' 
(I was actually doing =$PGPORT, and didn't realize $PGPORT was empty). 
What you end up with is a compile error in guc.c, with no idea why it's 
broken. Any reason not to have configure or at least make puke if pgport 
isn't valid?

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] CustomScan under the Gather node?

2016-02-03 Thread Kouhei Kaigai
> -Original Message-
> From: Robert Haas [mailto:robertmh...@gmail.com]
> Sent: Thursday, February 04, 2016 2:54 AM
> To: Kaigai Kouhei(海外 浩平)
> Cc: pgsql-hackers@postgresql.org
> Subject: ##freemail## Re: [HACKERS] CustomScan under the Gather node?
> 
> On Thu, Jan 28, 2016 at 8:14 PM, Kouhei Kaigai  wrote:
> >>              total           ForeignScan       diff
> >> 0 workers: 17584.319 ms    17555.904 ms      28.415 ms
> >> 1 workers: 18464.476 ms    18110.968 ms     353.508 ms
> >> 2 workers: 19042.755 ms    14580.335 ms    4462.420 ms
> >> 3 workers: 19318.254 ms    12668.912 ms    6649.342 ms
> >> 4 workers: 21732.910 ms    13596.788 ms    8136.122 ms
> >> 5 workers: 23486.846 ms    14533.409 ms    8953.437 ms
> >>
> >> This workstation has 4 CPU cores, so it is natural that nworkers=3 records
> >> the peak performance on the ForeignScan portion. On the other hand,
> >> nworkers>1 also recorded non-negligible time consumption (probably in the
> >> Gather node?)
> >   :
> >> Further investigation will be needed.
> >>
> > It was a bug in my file_fdw patch. The ForeignScan node in the master
> > process was also kicked by the Gather node; however, it didn't have the
> > coordination information due to an oversight in the initialization in the
> > InitializeDSMForeignScan callback.
> > As a result, the local ForeignScan node was still executed after the
> > coordinated background worker processes completed, and returned twice the
> > number of rows.
> >
> > In the revised patch, the results seem reasonable to me.
> >             total           ForeignScan       diff
> > 0 workers: 17592.498 ms    17564.457 ms      28.041 ms
> > 1 workers: 12152.998 ms    11983.485 ms     169.513 ms
> > 2 workers: 10647.858 ms    10502.100 ms     145.758 ms
> > 3 workers:  9635.445 ms     9509.899 ms     125.546 ms
> > 4 workers: 11175.456 ms    10863.293 ms     312.163 ms
> > 5 workers: 12586.457 ms    12279.323 ms     307.134 ms
> 
> Hmm.  Is the file_fdw part of this just a demo, or do you want to try
> to get that committed?  If so, maybe start a new thread with a more
> appropriate subject line to just talk about that.  I haven't
> scrutinized that part of the patch in any detail, but the general
> infrastructure for FDWs and custom scans to use parallelism seems to
> be in good shape, so I rewrote the documentation and committed that
> part.
>
Thanks. The file_fdw part is intended just as a demonstration; unlike
GpuScan in PG-Strom, it does not require any special hardware to reproduce
this parallel execution.

> Do you have any idea why this isn't scaling beyond, uh, 1 worker?
> That seems like a good thing to try to figure out.
>
The hardware I ran the above query on has 4 CPU cores, so it is not
surprising that 3 workers (+ 1 master) recorded the peak performance.

In addition, the enhancement to the file_fdw part is a corner-cutting piece
of work.

It picks up the next line number to be fetched from the shared memory
segment using pg_atomic_add_fetch_u32(), then reads the input file until
the worker meets the target line; unrelated lines are ignored.
Each worker parses only the lines it is responsible for, so parallel
execution makes sense in this part. On the other hand, the total amount of
CPU cycles for the file scan will increase, because every worker still has
to scan all the lines.

If we simply split the time consumption in the 0-worker case into two
factors:

  (time to scan the file; TSF) + (time to parse lines; TPL)

then the total amount of work when we distribute file_fdw across N workers is:

  N * (TSF) + (TPL)

and each individual worker has to process:

  (TSF) + (TPL)/N

This is the typical form of Amdahl's law when the sequential part is not
small. The above results suggest the TSF part is about 7.4s and the TPL
part about 10.1s.
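
A quick back-of-the-envelope check of that model (a standalone sketch; here N
counts every process parsing the file, i.e. workers plus the leader, and the
7.4s/10.1s values are the rough fit mentioned above):

  #include <stdio.h>

  int
  main(void)
  {
      const double tsf = 7.4;   /* time to scan the file, paid by every process */
      const double tpl = 10.1;  /* time to parse lines, divided across processes */

      for (int n = 1; n <= 5; n++)
          printf("%d parsing process(es): predicted %.2f s\n", n, tsf + tpl / n);
      return 0;
  }

That predicts roughly 17.5, 12.5, 10.8, 9.9 and 9.4 seconds, which tracks the
measured ForeignScan times for 0 through 3 workers reasonably well; beyond the
physical core count the measured times rise again.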

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 




[HACKERS] pg_dump data structures for triggers

2016-02-03 Thread Tom Lane
I'm looking into fixing the problem reported here:
http://www.postgresql.org/message-id/1445a624-d09f-4b51-9c41-46ba1e2d6...@neveragain.de
namely that if we split a view into a table + rule (because of circular
dependencies), parallel pg_restore fails to ensure that it creates any
triggers for the view only after creating the rule.  If it tries to
create the triggers first, the backend may barf because they're the wrong
type of triggers for a plain table.

While it's not that hard to make pg_dump add some more dependencies while
breaking a circular dependency, there's a small problem of finding the
view's triggers so we can attach those dependencies to them.  AFAICS, with
the current pg_dump data structures, there is actually no better way to
do that than to iterate through every DumpableObject known to pg_dump to
see which of them are TriggerInfos pointing to the view relation :-(.

What I have in mind is to add linked-list fields to TableInfo and
TriggerInfo to allow all the triggers of a table to be found by chasing
a list.  The increment in the size of those data structures isn't much,
and we may have other uses for such an ability later.
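
In other words, something like this in pg_dump.h (a sketch only; the new field
names here are mine, and the real patch may differ):

  typedef struct _tableInfo
  {
      DumpableObject dobj;
      /* ... existing fields ... */
      struct _triggerInfo *firstTrigger;  /* head of this table's trigger list */
  } TableInfo;

  typedef struct _triggerInfo
  {
      DumpableObject dobj;
      TableInfo  *tgtable;                /* link to table (already exists) */
      /* ... existing fields ... */
      struct _triggerInfo *nextTrigger;   /* next trigger of the same table */
  } TriggerInfo;

getTriggers() would thread each trigger onto its table's list, after which the
dependency-breaking code could walk it, roughly like the following (viewinfo
and ruleinfo standing in for the view's TableInfo and the split-out rule):

  TriggerInfo *trig;

  for (trig = viewinfo->firstTrigger; trig != NULL; trig = trig->nextTrigger)
      addObjectDependency(&trig->dobj, ruleinfo->dobj.dumpId);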

Anybody have an objection or better idea?

regards, tom lane




Re: [HACKERS] Idle In Transaction Session Timeout, revived

2016-02-03 Thread Tom Lane
"Joshua D. Drake"  writes:
> On 02/03/2016 02:52 PM, Robert Haas wrote:
>> Well, my view is that if somebody wants an alternative behavior
>> besides dropping the connection, they can write a patch to provide
>> that as an additional option.  That, too, has been discussed before.
>> But the fact that somebody might want that doesn't make this a bad or
>> useless behavior.  Indeed, I'd venture that more people would want
>> this than would want that.

> Something feels wrong about just dropping the connection.

I have the same uneasy feeling about it as JD.  However, you could
certainly argue that if the client application has lost its marbles
to the extent of allowing a transaction to time out, there's no good
reason to suppose that it will wake up any time soon, and then we are
definitely wasting resources by letting it monopolize a backend.  Not as
many resources as if the xact were still open, but waste none the less.

My design sketch wherein we do nothing to notify the client certainly
doesn't do anything to help the client wake up, either.  So maybe it's
fine and we should just go forward with the kill-the-connection approach.

regards, tom lane




Re: [HACKERS] WIP: Detecting SSI conflicts before reporting constraint violations

2016-02-03 Thread Thomas Munro
On Thu, Feb 4, 2016 at 7:48 AM, Robert Haas  wrote:
> On Sun, Jan 31, 2016 at 5:19 PM, Thomas Munro
>  wrote:
>> As described in a recent Reddit discussion[1] and bug report 9301[2],
>> there are scenarios where overlapping concurrent read-write sequences
>> produce serialization failures without constraints, but produce
>> constraint violations when there is a unique constraint.  A simple
>> example is deciding on a new value for a primary key by first checking
>> the existing contents of a table.
>>
>> This makes life difficult if you're trying to build systems that
>> automatically retry SERIALIZABLE transactions where appropriate,
>> because you have to decide how and when to handle unique constraint
>> violations too.  For example, people have experimented with automatic
>> retry-on-40001 at the level of HTTP requests for Django applications
>> when using the middleware that maps HTTP requests to database
>> transactions, and the same opportunities presumably exist in Java
>> application servers and other web service technologies, but unique
>> constraint violations get in the way of that.
>>
>> Here is an experimental patch to report such SSI conflicts.  I had to
>> make a change to aminsert_function so that the snapshot could be
>> available to btree insert code (which may be unpopular).  The
>> questions on my mind now are:  Are there still conflicting schedules
>> that would be missed, or significant new spurious conflict reports, or
>> other theoretical soundness problems?  Is that PredicateLockPage call
>> sound and cheap enough?  It is the only new code that isn't in a path
>> already doomed to ereport, other than the extra snapshot propagation,
>> and without it read-write-unique-3.spec (taken from bug report 9301)
>> doesn't detect the conflict.
>>
>> Thoughts?
>
> I don't feel qualified to have an opinion on whether this is an
> improvement.  I'm a little skeptical of changes like this on general
> principle because sometimes one clientele wants error A to be reported
> rather than error B and some other clientele wants the reverse.
> Deciding who is right is above my pay grade.

I don't see it as a difficult choice between two reasonable
alternatives.  It quacks suspiciously like a bug.

The theoretical problem with the current behaviour is that by
reporting a unique constraint violation in this case, we reach an
outcome that is neither a serialization failure nor a result that
could occur in any serial ordering of transactions.  The overlapping
transactions both observed that the key they planned to insert was not
present before inserting, and therefore they can't be untangled: there
is no serial order of the transactions where the second transaction to
run wouldn't see the key already inserted by the first transaction and
(presumably) take a different course of action.  (If it *does* see the
value already present in its snapshot, or doesn't even look first
before inserting and it turns out to be present, then it really
*should* get a unique constraint violation when trying to insert.)

The practical problem with the current behaviour is that the user has
to work out whether a unique constraint violation indicates:

1.  A programming error -- something is wrong that retrying probably won't fix

2.  An unfavourable outcome in the case that you speculatively
inserted without checking whether the value was present so you were
expecting a violation to be possible, in which case you know what
you're doing and you can figure out what to do next, probably retry or
give up

3.  A serialization failure that has been masked because our coding
happens to check for unique constraint violations without considering
SSI conflicts first -- retrying will eventually succeed.

It's complicated for a human to work out how to distinguish the third
category errors in each place where they might occur (and even to know
that they are possible, since AFAIK the manual doesn't point it out),
and impossible for an automatic retry-on-40001 framework to handle in
general.  SERIALIZABLE is supposed to be super easy to use (and
devilishly hard to implement...).

Here's an example.  Suppose you live in a country that requires
invoices to be numbered without gaps starting at one for each calendar
year.  You write:

  BEGIN ISOLATION LEVEL SERIALIZABLE;
  SELECT COALESCE(MAX(invoice_number) + 1, 1) FROM invoice WHERE year = 2016;
  INSERT INTO invoice VALUES (2016, $1, ...); -- using value computed above
  COMMIT;

I think it's pretty reasonable to expect that to either succeed or
ereport SQLSTATE 40001, and I think it's pretty reasonable to want to
be able to give the code that runs that transaction to a mechanism
that will automatically retry the whole thing if 40001 is reported.
Otherwise the promise of SERIALIZABLE is thwarted: you should be able
to forget about concurrency and write simple queries that assume they
are the only transaction in the universe.  The activities of other
overlapping transactions shouldn't change that.
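
For what it's worth, a minimal sketch of such a retry-on-40001 wrapper in C
with libpq might look like this (the transaction body is collapsed into a
single INSERT ... SELECT purely for brevity; table, column, and function names
are illustrative):

  #include <stdio.h>
  #include <string.h>
  #include <libpq-fe.h>

  /* Run the invoice-numbering transaction, retrying on serialization failure. */
  static int
  insert_next_invoice(PGconn *conn)
  {
      for (;;)
      {
          PGresult   *res;
          const char *sqlstate;
          int         retryable;

          PQclear(PQexec(conn, "BEGIN ISOLATION LEVEL SERIALIZABLE"));

          res = PQexec(conn,
                       "INSERT INTO invoice (year, invoice_number) "
                       "SELECT 2016, COALESCE(MAX(invoice_number) + 1, 1) "
                       "  FROM invoice WHERE year = 2016");
          if (PQresultStatus(res) == PGRES_COMMAND_OK)
          {
              PQclear(res);
              res = PQexec(conn, "COMMIT");
              if (PQresultStatus(res) == PGRES_COMMAND_OK)
              {
                  PQclear(res);
                  return 0;                 /* success */
              }
              /* COMMIT itself can fail with 40001 under SERIALIZABLE */
          }

          sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);
          retryable = (sqlstate != NULL && strcmp(sqlstate, "40001") == 0);
          PQclear(res);

          /* harmless if the failed COMMIT already ended the transaction */
          PQclear(PQexec(conn, "ROLLBACK"));

          if (!retryable)
              return -1;    /* any other error goes back to the caller */
          /* else loop and run the whole transaction again */
      }
  }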

Re: [HACKERS] Batch update of indexes

2016-02-03 Thread Jim Nasby

On 1/21/16 11:47 AM, Konstantin Knizhnik wrote:

BTW, could you explain, what is the reason to copy data into the
pending list and then copy it again while flushing pending list into
the index? Why not read this data directly from the table? I feel that
I've missed something important here.


No, I do not think that inserted data should be placed in the pending list
and then copied to the main table.
It should be stored directly in the main table, and the "pending list" is
just some fast, transient index.


That sounds similar to what we would need to support referencing OLD and 
NEW in per-statement triggers: a good way to find everything that was 
changed in a statement.


Or if you will, s/statement/transaction/.

Having that is probably a prerequisite for doing incremental refresh of 
materialized views.


My suspicion is that it would be useful to pre-order the new data before 
trying to apply it to the indexes.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Idle In Transaction Session Timeout, revived

2016-02-03 Thread Joshua D. Drake

On 02/03/2016 02:52 PM, Robert Haas wrote:

On Wed, Feb 3, 2016 at 5:36 PM, Jim Nasby  wrote:

I think killing the session is a perfectly sensible thing to do in this
case.  Everything meaningful that was done in the session will be rolled
back - no need to waste resources keeping the connection open.



Except you end up losing stuff like every GUC you've set, existing temp
tables, etc. For an application that presumably doesn't matter, but for a
user connection it would be a PITA.

I wouldn't put a bunch of effort into it though. Dropping the connection is
certainly better than nothing.


Well, my view is that if somebody wants an alternative behavior
besides dropping the connection, they can write a patch to provide
that as an additional option.  That, too, has been discussed before.
But the fact that somebody might want that doesn't make this a bad or
useless behavior.  Indeed, I'd venture that more people would want
this than would want that.


Something feels wrong about just dropping the connection. I can see 
doing what connection poolers do (DISCARD ALL) plus a rollback, but the idea 
that we are going to destroy a connection to the database due to an idle 
transaction seems like a potential foot gun. Unfortunately, beyond a 
feeling I cannot provide a good example.


Sincerely,

JD



--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-03 Thread Robert Haas
On Wed, Feb 3, 2016 at 5:56 PM, Robert Haas  wrote:
> On Wed, Feb 3, 2016 at 12:08 PM, Ashutosh Bapat
>  wrote:
>> PFA patches with naming conventions similar to previous ones.
>> pg_fdw_core_v7.patch: core changes
>> pg_fdw_join_v7.patch: postgres_fdw changes for join pushdown
>> pg_join_pd_v7.patch: combined patch for ease of testing.
>
> Hmm, I think that GetPathForEPQRecheck() is a pretty terrible name.
> How about GetExistingJoinPath()?

Oops.  Hit Send too soon.  Also, how about writing if
(path->param_info != NULL) continue; instead of burying the core of
the function in another level of indentation?  I think you should skip
paths that aren't parallel_safe, too, and the documentation should be
clear that this will find an unparameterized, parallel-safe joinpath
if one exists.
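
Something along these lines, perhaps (just a sketch of the suggested shape,
assuming the function scans joinrel->pathlist; the function name is mine,
standing in for the GetPathForEPQRecheck/GetExistingJoinPath under discussion):

  #include "postgres.h"
  #include "nodes/relation.h"

  static Path *
  find_unparameterized_parallel_safe_join_path(RelOptInfo *joinrel)
  {
      ListCell   *lc;

      foreach(lc, joinrel->pathlist)
      {
          Path   *path = (Path *) lfirst(lc);

          /* skip parameterized paths */
          if (path->param_info != NULL)
              continue;

          /* skip paths that are not parallel-safe */
          if (!path->parallel_safe)
              continue;

          /* found an unparameterized, parallel-safe join path */
          return path;
      }
      return NULL;
  }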

+   ForeignPath *foreign_path;
+   foreign_path = (ForeignPath
*)joinpath->outerjoinpath;

Maybe insert a blank line between here, and in the other, similar case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-03 Thread Robert Haas
On Wed, Feb 3, 2016 at 12:08 PM, Ashutosh Bapat
 wrote:
> PFA patches with naming conventions similar to previous ones.
> pg_fdw_core_v7.patch: core changes
> pg_fdw_join_v7.patch: postgres_fdw changes for join pushdown
> pg_join_pd_v7.patch: combined patch for ease of testing.

Hmm, I think that GetPathForEPQRecheck() is a pretty terrible name.
How about GetExistingJoinPath()?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Idle In Transaction Session Timeout, revived

2016-02-03 Thread Vik Fearing
On 02/03/2016 11:36 PM, Jim Nasby wrote:
> On 2/3/16 4:05 PM, David Steele wrote:
>> On 2/3/16 4:25 PM, Tom Lane wrote:
>>> Robert Haas  writes:
 On Wed, Feb 3, 2016 at 3:41 PM, Jim Nasby 
 wrote:
> Wouldn't it be more sensible to just roll the transaction back and not
> disconnect?
>>>
>>> I'm not sure how messy this would be in practice.  But if we think that
>>> killing the whole session is not desirable but something we're doing for
>>> expediency, then it would be worth looking into that approach.
>>
>> I think killing the session is a perfectly sensible thing to do in this
>> case.  Everything meaningful that was done in the session will be rolled
>> back - no need to waste resources keeping the connection open.

That was the consensus last time I presented this bikeshed for painting.

> Except you end up losing stuff like every GUC you've set, existing temp
> tables, etc. For an application that presumably doesn't matter, but for
> a user connection it would be a PITA.
> 
> I wouldn't put a bunch of effort into it though. Dropping the connection
> is certainly better than nothing.

You could always put  SET idle_in_transaction_session_timeout = 0;  in
your .psqlrc file to exempt your manual sessions from it.  Or change it
just for your user or something.
-- 
Vik Fearing  +33 6 46 75 15 36
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support




Re: [HACKERS] Idle In Transaction Session Timeout, revived

2016-02-03 Thread Robert Haas
On Wed, Feb 3, 2016 at 5:36 PM, Jim Nasby  wrote:
>> I think killing the session is a perfectly sensible thing to do in this
>> case.  Everything meaningful that was done in the session will be rolled
>> back - no need to waste resources keeping the connection open.
>
>
> Except you end up losing stuff like every GUC you've set, existing temp
> tables, etc. For an application that presumably doesn't matter, but for a
> user connection it would be a PITA.
>
> I wouldn't put a bunch of effort into it though. Dropping the connection is
> certainly better than nothing.

Well, my view is that if somebody wants an alternative behavior
besides dropping the connection, they can write a patch to provide
that as an additional option.  That, too, has been discussed before.
But the fact that somebody might want that doesn't make this a bad or
useless behavior.  Indeed, I'd venture that more people would want
this than would want that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Idle In Transaction Session Timeout, revived

2016-02-03 Thread Jim Nasby

On 2/3/16 4:05 PM, David Steele wrote:

On 2/3/16 4:25 PM, Tom Lane wrote:

Robert Haas  writes:

On Wed, Feb 3, 2016 at 3:41 PM, Jim Nasby  wrote:

Wouldn't it be more sensible to just roll the transaction back and not
disconnect?


I'm not sure how messy this would be in practice.  But if we think that
killing the whole session is not desirable but something we're doing for
expediency, then it would be worth looking into that approach.


I think killing the session is a perfectly sensible thing to do in this
case.  Everything meaningful that was done in the session will be rolled
back - no need to waste resources keeping the connection open.


Except you end up losing stuff like every GUC you've set, existing temp 
tables, etc. For an application that presumably doesn't matter, but for 
a user connection it would be a PITA.


I wouldn't put a bunch of effort into it though. Dropping the connection 
is certainly better than nothing.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Idle In Transaction Session Timeout, revived

2016-02-03 Thread David Steele
On 2/3/16 4:25 PM, Tom Lane wrote:
> Robert Haas  writes:
>> On Wed, Feb 3, 2016 at 3:41 PM, Jim Nasby  wrote:
>>> Wouldn't it be more sensible to just roll the transaction back and not
>>> disconnect?
> 
> I'm not sure how messy this would be in practice.  But if we think that
> killing the whole session is not desirable but something we're doing for
> expediency, then it would be worth looking into that approach.

I think killing the session is a perfectly sensible thing to do in this
case.  Everything meaningful that was done in the session will be rolled
back - no need to waste resources keeping the connection open.

-- 
-David
da...@pgmasters.net





Re: [HACKERS] Idle In Transaction Session Timeout, revived

2016-02-03 Thread Tom Lane
Robert Haas  writes:
> On Wed, Feb 3, 2016 at 3:41 PM, Jim Nasby  wrote:
>> Wouldn't it be more sensible to just roll the transaction back and not
>> disconnect?

> It would be nice to be able to do that, but the client-server protocol
> can't handle it without losing sync.  Basically, if you send an error
> to an idle client, you have to kill the session.  This has come up
> many times before.

Well, you can't just spit out an unprompted error message and go back to
waiting for the next command; as Robert says, that would leave the wire
protocol state out of sync.  But in principle we could kill the
transaction and not say anything to the client right then.  Instead set
some state that causes the next command from the client to get an error.
(This would not be much different from what happens when you send a
command in an already-reported-failed transaction; though we'd want to
issue a different error message than for that case.)

I'm not sure how messy this would be in practice.  But if we think that
killing the whole session is not desirable but something we're doing for
expediency, then it would be worth looking into that approach.

regards, tom lane




Re: [HACKERS] PostgreSQL Audit Extension

2016-02-03 Thread Tom Lane
Jim Nasby  writes:
> As for PGXN being an untrusted source, that's something that it's in the 
> project's best interest to try and address somehow, perhaps by having 
> formally audited extensions. Amazon already has to do this to some 
> degree before an extension can be allowed in RDS, and so does Heroku, so 
> maybe that would be a starting point.

> I think a big reason Postgres got to where it is today is because of 
> it's superior extensibility, and I think continuing to encourage that 
> with formal support for things like PGXN is important.

Yeah.  Auditing strikes me as a fine example of something for which there
is no *technical* reason to need to put it in core.  It might need some
more hooks than we have now, but that's no big deal.  In the long run,
we'll be a lot better off if we can address the non-technical factors
that make people want to push such things into the core distribution.

Exactly how we get there, I don't pretend to know.

regards, tom lane




Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2016-02-03 Thread Robert Haas
On Wed, Feb 3, 2016 at 12:08 PM, Ashutosh Bapat
 wrote:
> The patch implements your algorithm to deparse a query as described in
> previous mail. The logic is largely coded in deparseFromExprForRel() and
> foreign_join_ok(). The later one pulls up the clauses from joining relations
> and first one deparses the FROM clause recursively.

Cool!

+   /* Add outer relation. */
+   appendStringInfo(buf, "(%s", join_sql_o.data);
+
+   /* Add join type */
+   appendStringInfo(buf, " %s JOIN ",
get_jointype_name(fpinfo->jointype));
+
+   /* Add inner relation */
+   appendStringInfo(buf, "%s", join_sql_i.data);
+
+   /* Append ON clause; ON (TRUE) in case empty join clause list */
+   appendStringInfoString(buf, " ON ");

Uh, couldn't that all be done as a single appendStringInfo?

It seems a little tortured the way you're passing "relations" all the
way down the callstack from deparseSelectStmtForRel, and at each level
it might be NULL.  If you made a rule that the caller MUST pass a
StringInfo, then you could get rid of some conditional logic in
deparseFromExprForRel.   By the way, deparseSelectSql()'s header
comment could use an update to mention this additional argument.
Generally, it's helpful to say in each relevant function header
comment something like "May be NULL" or "Must not be NULL" in cases
like this to clarify the API contract.

Similarly, I would be inclined to continue to require that
deparseTargetList() have retrieved_attrs != NULL.  If the caller
doesn't want the value, it can pass a dummy variable and ignore the
return value.  This is of course slightly more cycles, but I think
it's unlikely to matter, and making the code simpler would be good.

+ * Function is the entry point to deparse routines while constructing
+ * ForeignScan plan or estimating cost and size for ForeignPath. It is called
+ * recursively to build SELECT statements for joining relations of a
pushed down
+ * foreign join.

"This function is the entrypoint to the routines, either when
constructing ForeignScan plan or when estimating" etc.

+ * tuple descriptor for the corresponding foreign scan. For a base relation,
+ * which is not part of a pushed down join, fpinfo->attrs_used can be used to
+ * construct SELECT clause, thus the function doesn't need tlist. Hence when
+ * tlist passed, the function assumes that it's constructing the SELECT
+ * statement to be part of a pushed down foreign join.

I thought you got rid of that assumption.  I think it should be gotten
rid of, and then the comment can go too.  If we're keeping the comment
for some reason, should be "doesn't need the tlist" and when "when the
tlist is passed".

+ * 1, since those are the attribute numbers are in the corresponding scan.

Extra "are".  Should be: "Those are the attribute numbers in the
corresponding scan."

Would it be nuts to set fdw_scan_tlist in all cases?  Would the code
come out simpler than what we have now?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Idle In Transaction Session Timeout, revived

2016-02-03 Thread Robert Haas
On Wed, Feb 3, 2016 at 3:41 PM, Jim Nasby  wrote:
> On 2/3/16 2:30 PM, Robert Haas wrote:
>>
>> On Sun, Jan 31, 2016 at 8:33 AM, Vik Fearing  wrote:
>>>
>>> Attached is a rebased and revised version of my
>>> idle_in_transaction_session_timeout patch from last year.
>>>
>>> This version does not suffer the problems the old one did where it would
>>> jump out of SSL code thanks to Andres' patch in commit
>>> 4f85fde8eb860f263384fffdca660e16e77c7f76.
>>>
>>> The basic idea is if a session remains idle in a transaction for longer
>>> than the configured time, that connection will be dropped thus releasing
>>> the connection slot and any locks that may have been held by the broken
>>> client.
>>>
>>> Added to the March commitfest.
>>
>>
>> +1 for doing something like this.  Great idea!
>
>
> Wouldn't it be more sensible to just roll the transaction back and not
> disconnect?

It would be nice to be able to do that, but the client-server protocol
can't handle it without losing sync.  Basically, if you send an error
to an idle client, you have to kill the session.  This has come up
many times before.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Raising the checkpoint_timeout limit

2016-02-03 Thread Jim Nasby

On 2/2/16 10:10 PM, Robert Haas wrote:

Now, you could also set such configuration settings in
a situation where it will not work out well.  But that is true of most
configuration settings.


Yeah, if we're going to start playing parent then I think the first 
thing to do is remove the fsync GUC. The AWS team has done testing that 
shows it to be worthless from a performance standpoint now that we have 
synchronous commit, and it's an extremely large foot-bazooka to have 
laying around.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] PostgreSQL Audit Extension

2016-02-03 Thread Jim Nasby

On 2/3/16 10:36 AM, Robert Haas wrote:

People who are interested in audit are also understandably leery of
>downloading code from an untrusted source.  Both PGXN and GitHub are The
>Wild West as far as conservative auditors are concerned.

I hate to be rude here, but that's not my problem.  You can put it on
your corporate web site and let people download it from there.  I'm
sure that auditors are familiar with the idea of downloading software
from for-profit companies.  Do they really not use any software from
Microsoft or Apple, for example?  If the problem is that they will
trust the PostgreSQL open source project but not YOUR company, then I
respectfully suggest that you need to establish the necessary
credibility, not try to piggyback on someone else's.


Luckily pgaudit is its own group on GitHub 
(https://github.com/pgaudit), so it doesn't even have to be controlled 
by a single company. If others care about auditing, I would hope that 
they'd contribute code there and eventually become formal members of 
the pgaudit project.


As for PGXN being an untrusted source, that's something that it's in the 
project's best interest to try and address somehow, perhaps by having 
formally audited extensions. Amazon already has to do this to some 
degree before an extension can be allowed in RDS, and so does Heroku, so 
maybe that would be a starting point.


I think a big reason Postgres got to where it is today is because of 
its superior extensibility, and I think continuing to encourage that 
with formal support for things like PGXN is important.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idle In Transaction Session Timeout, revived

2016-02-03 Thread Jim Nasby

On 2/3/16 2:30 PM, Robert Haas wrote:

On Sun, Jan 31, 2016 at 8:33 AM, Vik Fearing  wrote:

Attached is a rebased and revised version of my
idle_in_transaction_session_timeout patch from last year.

This version does not suffer the problems the old one did where it would
jump out of SSL code thanks to Andres' patch in commit
4f85fde8eb860f263384fffdca660e16e77c7f76.

The basic idea is if a session remains idle in a transaction for longer
than the configured time, that connection will be dropped thus releasing
the connection slot and any locks that may have been held by the broken
client.

Added to the March commitfest.


+1 for doing something like this.  Great idea!


Wouldn't it be more sensible to just roll the transaction back and not 
disconnect?

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Idle In Transaction Session Timeout, revived

2016-02-03 Thread Robert Haas
On Sun, Jan 31, 2016 at 8:33 AM, Vik Fearing  wrote:
> Attached is a rebased and revised version of my
> idle_in_transaction_session_timeout patch from last year.
>
> This version does not suffer the problems the old one did where it would
> jump out of SSL code thanks to Andres' patch in commit
> 4f85fde8eb860f263384fffdca660e16e77c7f76.
>
> The basic idea is if a session remains idle in a transaction for longer
> than the configured time, that connection will be dropped thus releasing
> the connection slot and any locks that may have been held by the broken
> client.
>
> Added to the March commitfest.

+1 for doing something like this.  Great idea!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 2016-01 Commitfest

2016-02-03 Thread Pavel Stehule
On 3 Feb 2016 at 20:51, "Daniel Verite" <
dan...@manitou-mail.org> wrote:
>
> Alvaro Herrera wrote:
>
> >  https://commitfest.postgresql.org/8/372/
> > \crosstabview (previously: \rotate) in psql for crosstab-style
display
>
> About this one, the code is no longer moving, the latest addition was
> regression tests a couple days ago.
>
> I think it should be moved to the next CF, to be hopefully promoted to
> Ready for Committer soon.

I hope so too. I am sorry, my trip to Moscow was a little bit longer than I
expected. I don't have any open questions or notes; I just want to recheck the
final version.

Regards

Pavel
>
> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org
> Twitter: @DanielVerite
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL Audit Extension

2016-02-03 Thread David Steele
On 2/3/16 11:36 AM, Robert Haas wrote:

> That's good to hear, but again, it's not enough for a core submission.
> Code that goes into our main git repository needs to be "the
> prettiest".  I mean it's not all perfect of course, but it should be
> pretty darn good.

I still think it's pretty darn good given what 2ndQuadrant and I had to
work with but I also think it could be a lot better with some core changes.

> Also, understand this: when you get a core submission accepted, the
> core project is then responsible for maintaining that code even if you
> disappear.  It's entirely reasonable for the project to demand that
> this isn't going to be too much work.  It's entirely reasonable for
> the community to want the design to be very good and the code quality
> to be high.  It's entirely reasonable for the community NOT to want to
> privilege one implementation over another.  If you don't agree that
> those things are reasonable then we disagree pretty fundamentally on
> the role of the community.  The community is a group of people to whom
> I (or you) can give our time and my (or your) code, not a group of
> people who owe me (or you) anything.

I think our differences are in the details and not in the general idea.
 In no way do I want to circumvent the process or get preferential
treatment for me or for my company.  I believe in the community but I
also believe that we won't get anywhere as a community unless
individuals give voice to their ideas.

I appreciate you taking the time to voice your opinion.  From my
perspective there's little to be gained in continuing to beat this
horse.  If the errhidefromclient() patch is accepted then that will be a
good step for pgaudit and I'll be on the lookout for other ways I can
both contribute useful code to core and move the pgaudit project forward.

-- 
-David
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Generalizing SortSupport for text to work with char(n), bytea, and alternative opclasses

2016-02-03 Thread Robert Haas
On Sun, Jan 31, 2016 at 10:59 PM, Andreas Karlsson  wrote:
> I have reviewed this now and I think this is a useful addition even though
> these indexes are less common. Consistent behavior is worth a lot in my mind
> and this patch is reasonably small.
>
> The patch no longer applies due to 1) oid collisions and 2) a trivial
> conflict. When I fixed those two the test suite passed.
>
> I tested this patch by building indexes with all the types and got nice
> measurable speedups.
>
> Logically I think the patch makes sense and the code seems to be correct,
> but I have some comments on it.
>
> - You use two names a lot "string" vs "varstr". What is the difference
> between those? Is there any reason for not using varstr consistently?
>
> - You have a lot of renaming as has been mentioned previously in this
> thread. I do not care strongly for it either way, but it did make it harder
> to spot what the patch changed. If it was my own project I would have
> considered splitting the patch into two parts, one renaming everything and
> another adding the new feature, but the PostgreSQL seem to often prefer
> having one commit per feature. Do as you wish here.
>
> - I think the comment about NUL bytes in varstr_abbrev_convert makes it seem
> like the handling is much more complicated than it actually is. I did not
> understand it after a couple of readings and had to read the code to understand
> what it was talking about.
>
> Nice work, I like your sorting patches.

Thanks for the review.  I fixed the OID conflict, tweaked a few
comments, and committed this.
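
For reference, the kind of index builds the review describes testing, sketched
with a made-up table; with this commit, the sorts feeding such builds can use
abbreviated keys for char(n) and bytea as they already could for text:

    -- Hypothetical table using the newly covered types.
    CREATE TABLE abbrev_demo (c char(16), b bytea);

    -- Building these indexes sorts the column data; that sort is what the
    -- SortSupport / abbreviated-key machinery speeds up.
    CREATE INDEX abbrev_demo_c_idx ON abbrev_demo (c);
    CREATE INDEX abbrev_demo_b_idx ON abbrev_demo (b);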

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Detecting SSI conflicts before reporting constraint violations

2016-02-03 Thread Robert Haas
On Sun, Jan 31, 2016 at 5:19 PM, Thomas Munro
 wrote:
> As described in a recent Reddit discussion[1] and bug report 9301[2],
> there are scenarios where overlapping concurrent read-write sequences
> produce serialization failures without constraints, but produce
> constraint violations when there is a unique constraint.  A simple
> example is deciding on a new value for a primary key by first checking
> the existing contents of a table.
>
> This makes life difficult if you're trying to build systems that
> automatically retry SERIALIZABLE transactions where appropriate,
> because you have to decide how and when to handle unique constraint
> violations too.  For example, people have experimented with automatic
> retry-on-40001 at the level of HTTP requests for Django applications
> when using the middleware that maps HTTP requests to database
> transactions, and the same opportunities presumably exist in Java
> application servers and other web service technologies, but unique
> constraint violations get in the way of that.
>
> Here is an experimental patch to report such SSI conflicts.  I had to
> make a change to aminsert_function so that the snapshot could be
> available to btree insert code (which may be unpopular).  The
> questions on my mind now are:  Are there still conflicting schedules
> that would be missed, or significant new spurious conflict reports, or
> other theoretical soundness problems?  Is that PredicateLockPage call
> sound and cheap enough?  It is the only new code that isn't in a path
> already doomed to ereport, other than the extra snapshot propagation,
> and without it read-write-unique-3.spec (taken from bug report 9301)
> doesn't detect the conflict.
>
> Thoughts?

I don't feel qualified to have an opinion on whether this is an
improvement.  I'm a little skeptical of changes like this on general
principle because sometimes one clientele wants error A to be reported
rather than error B and some other clientele wants the reverse.
Deciding who is right is above my pay grade.
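
For concreteness, a minimal sketch of the schedule described in the quoted
mail (the table name and values are made up): two SERIALIZABLE sessions each
pick a new primary key by inspecting the table, and today the loser gets a
unique violation (SQLSTATE 23505) instead of the serialization failure (40001)
that automatic retry loops watch for:

    CREATE TABLE t (id int PRIMARY KEY);

    -- Session 1:
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT coalesce(max(id), 0) + 1 FROM t;   -- sees no rows, picks 1

    -- Session 2, concurrently:
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT coalesce(max(id), 0) + 1 FROM t;   -- also sees no rows, also picks 1

    -- Session 1:
    INSERT INTO t VALUES (1);
    COMMIT;

    -- Session 2:
    INSERT INTO t VALUES (1);   -- today: unique_violation (23505);
                                -- with the patch: serialization_failure (40001)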

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Additional role attributes && superuser review

2016-02-03 Thread Robert Haas
On Thu, Jan 28, 2016 at 4:37 PM, Stephen Frost  wrote:
> pg_monitor
>
>   Allows roles granted more information from pg_stat_activity.  Can't be
>   just a regular non-default-role right as we don't, currently, have a
>   way to say "filter out the values of certain columns on certain rows,
>   but not on others."
>
>   (There's a question here though- for the privileges which will be
>   directly GRANT'able, should we GRANT those to pg_monitor, or have
>   pg_monitor only provide unfiltered access to pg_stat_activity, or..?
>   Further, if it's only for pg_stat_activity, should we name it
>   something else?)

I endorse this proposed role.  I'd have it just grant access to
pg_stat_activity but keep the name pg_monitor so that it could apply
to other similar things in the future, but there might be other good
alternatives too.

> pg_signal_backend
>
>   Allows roles to signal other backend processes beyond those backends
>   which are owned by a user they are a role member of.  Can't be a
>   regular non-default-role right as we don't, currently, have any way to
>   GRANT rights to send signals only to backends you own or are a member
>   of.

I also endorse this.

> pg_replication
>
>   Allows roles to use the various replication functions.  Can't be a
>   regular non-default-role right as the REPLICATION role attribute
>   allows access to those functions and the GRANT system has no way of
>   saying "allow access to these functions if they have role attribute
>   X."
>
> Make sense, as these are cases where we can't simply write GRANT
> statements and get the same result, but we don't need (or at least,
> shouldn't have without supporting GRANT on catalog objects and agreement
> on what they're intended for):

This seems like it could be reshuffled so that it can be done with
GRANT.  Therefore, I don't endorse this.

> pg_backup
>
>   pg_start_backup(), pg_stop_backup(), pg_switch_xlog(), and
>   pg_create_restore_point() will all be managed by the normal GRANT
>   system and therefore we don't need a default role for those use-cases.

Agreed.

> pg_file_settings
>
>   pg_file_settings() function and pg_file_settings view will be managed
>   by the normal GRANT system and therefore we don't need a default role
>   for those use-cases.

Agreed.

> pg_replay
>
>   pg_xlog_replay_pause(), and pg_xlog_replay_resume() will be managed
>   by the normal GRANT system and therefore we don't need a default role
>   for those use-cases.

Agreed.

> pg_rotate_logfile
>
>   pg_rotate_logfile() will be managed by the normal GRANT system and
>   therefore we don't need a default role for that use-cases.

Agreed.
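
For the items above that are expected to be handled by the ordinary GRANT
system (once the hard-coded superuser checks are relaxed), a rough sketch of
what that could look like; the pg_monitor name comes from the proposal, while
the login role and the exact function signatures are assumptions:

    -- Hypothetical monitoring login role.
    CREATE ROLE monitor_user LOGIN;

    -- Proposed default role giving unfiltered access to pg_stat_activity.
    GRANT pg_monitor TO monitor_user;

    -- Functions proposed to be covered by plain GRANTs rather than new
    -- default roles (the pg_backup / pg_replay / pg_rotate_logfile items).
    GRANT EXECUTE ON FUNCTION pg_start_backup(text, boolean) TO monitor_user;
    GRANT EXECUTE ON FUNCTION pg_stop_backup() TO monitor_user;
    GRANT EXECUTE ON FUNCTION pg_switch_xlog() TO monitor_user;
    GRANT EXECUTE ON FUNCTION pg_create_restore_point(text) TO monitor_user;
    GRANT EXECUTE ON FUNCTION pg_xlog_replay_pause() TO monitor_user;
    GRANT EXECUTE ON FUNCTION pg_xlog_replay_resume() TO monitor_user;
    GRANT EXECUTE ON FUNCTION pg_rotate_logfile() TO monitor_user;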

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL Audit Extension

2016-02-03 Thread Joshua D. Drake

On 02/03/2016 10:36 AM, Robert Haas wrote:


I'll be the first to admit that the design is not the prettiest.  Trying

It's entirely reasonable for the community NOT to want to
privilege one implementation over another.


This, not so much.


No, this is ABSOLUTELY critical.  Suppose EnterpriseDB writes an
auditing solution, 2ndQuadrant writes an auditing solution, and Crunchy
Data writes an auditing solution, and the community then picks one of
those to put in core.  Do you not think that the other two companies
will feel like they got the fuzzy end of the lollipop?  The only time
this sort of thing doesn't provoke hard feelings is when everybody
agrees that the solution that was finally adopted was way better than
the competing things.  If you don't think this is a problem, I
respectfully suggest that you haven't seen enough of these situations
play out.



I am on the fence with this one because we should not care about what a 
company feels, period. We are not a business, we are not an employer. We 
are a community of people, not companies.


On the other hand, I do very much understand what you are saying here 
and it is a difficult line to walk.


Then on the third hand (for those of us that were cloned and have 
issues), those companies chose PostgreSQL as their base; that is *their* 
problem, not ours. We also have to be considerate of the fact that those 
companies do contribute a lot to the community.


In short, may the best solution for the community win. Period.

Sincerely,

JD

--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL Audit Extension

2016-02-03 Thread Robert Haas
On Wed, Feb 3, 2016 at 12:38 PM, David G. Johnston
 wrote:
>> Right now that may be true, although it wouldn't surprise me very much
>> to find out that other people have written such extensions and they
>> just didn't get as much press.  Also, consider the future.  It is
>> *possible* that your version of pgaudit will turn out to be the be-all
>> and the end-all, but it's equally possible that somebody will fork
>> your version in turn and evolve it some more.  I don't see how you can
>> look at the pgaudit facilities you've got here and say that this is
>> the last word on auditing and all PostgreSQL users should be content
>> with exactly that facility.  I find that ridiculous.  Look me in the
>> eye and tell me that nobody's going to fork your version and evolve it
>> a bunch more.
>
> This rings hollow to me.  JSON got included with an admittedly weak feature
> set and then got significantly improved upon in subsequent releases.

True.  But most of that was adding, not changing.  Look, fundamentally
this is an opinion question, and everybody's entitled to an opinion on
whether pgaudit should be in core.  I have given mine; other people
can think about it differently.

> Those
> who would be inclined to fork pgaudit, seeing it already being in core, would
> more likely and to the benefit of the community put that work into improving
> the existing work.  My off-the-cuff understanding is that some current big
> features (namely the parallel-related stuff) are also taking this "lets
> commit smaller but still useful pieces" into core to build up to this
> super-feature we want working two years from now.  I don't see any
> fundamental reason auditing couldn't be given the same opportunity to
> improve inside core.

Well, it means that every change has to be dealt with by a PostgreSQL
committer, for one thing.  We don't have a ton of people who have the
skillset for that, the time to work on it, and the community's trust.

> The other major downside of having it in core is that now the feature
> release cycle is tied to core.  Telling PostGIS they can only release new
> features when new versions of PostgreSQL come out would be an unacceptable
> situation.

Yep.

> The best of both worlds would be for core to have its own implementation
> written as an extension and to readily allow for other implementations to be
> plugged in as well.  As your alluded to above there are likely a number of
> things core really needs to enable such functionality without providing the
> entire UI - leaving that for extensions.

I really think this is not for the best.  People who write non-core
extensions are often quite unhappy when core gets a feature that is
even somewhat similar, because they feel that this takes attention
away from their work in favor of the not-necessarily-better work that
went into core.

> A bit short-sighted maybe.  Endorsing and including such a feature could
> open PostgreSQL up to a new market being supported by people who right now
> are not readily able to join the PostgreSQL community because they cannot
> invest the necessary resources to get the horses put before the cart.  Those
> people, if they could get their clients to more easily use PostgreSQL, may
> just find it worth their while to then contribute back to this new frontier
> that has been opened up to them.  This would ideally increase the number of
> contributors and reviewers within the community which is the main thing that
> is presently needed.

This is based, though, on the idea that they must not only have the
feature but they must have it in the core distribution.  And I'm
simply not willing to endorse that as a reason to put things in core.
Maybe it would be good for PostgreSQL adoption, but if everything that
somebody won't use unless it's in core goes in core, core will become
a bloated, stinking mess.

>> > I'll be the first to admit that the design is not the prettiest.  Trying
>> It's entirely reasonable for the community NOT to want to
>> privilege one implementation over another.
>
> This, not so much.

No, this is ABSOLUTELY critical.  Suppose EnterpriseDB writes an
auditing solution, 2ndQuadrant writes an auditing solution, and Crunchy
Data writes an auditing solution, and the community then picks one of
those to put in core.  Do you not think that the other two companies
will feel like they got the fuzzy end of the lollipop?  The only time
this sort of thing doesn't provoke hard feelings is when everybody
agrees that the solution that was finally adopted was way better than
the competing things.  If you don't think this is a problem, I
respectfully suggest that you haven't seen enough of these situations
play out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CustomScan under the Gather node?

2016-02-03 Thread Robert Haas
On Thu, Jan 28, 2016 at 8:14 PM, Kouhei Kaigai  wrote:
>>              total          ForeignScan        diff
>> 0 workers: 17584.319 ms   17555.904 ms     28.415 ms
>> 1 workers: 18464.476 ms   18110.968 ms    353.508 ms
>> 2 workers: 19042.755 ms   14580.335 ms   4462.420 ms
>> 3 workers: 19318.254 ms   12668.912 ms   6649.342 ms
>> 4 workers: 21732.910 ms   13596.788 ms   8136.122 ms
>> 5 workers: 23486.846 ms   14533.409 ms   8953.437 ms
>>
>> This workstation has 4 CPU cores, so it is natural that nworkers=3 records the
>> peak performance for the ForeignScan portion. On the other hand, nworkers>1 also
>> recorded non-negligible time consumption (probably in the Gather node?)
>   :
>> Further investigation will be needed
>>
> It was a bug in my file_fdw patch. The ForeignScan node in the master process was
> also kicked by the Gather node, but it didn't have coordination information
> due to an oversight in the initialization at the InitializeDSMForeignScan callback.
> As a result, the local ForeignScan node was still executed after the completion
> of the coordinated background worker processes, and returned twice the number of rows.
>
> In the revised patch, results seems to me reasonable.
>              total          ForeignScan        diff
> 0 workers: 17592.498 ms   17564.457 ms     28.041 ms
> 1 workers: 12152.998 ms   11983.485 ms    169.513 ms
> 2 workers: 10647.858 ms   10502.100 ms    145.758 ms
> 3 workers:  9635.445 ms    9509.899 ms    125.546 ms
> 4 workers: 11175.456 ms   10863.293 ms    312.163 ms
> 5 workers: 12586.457 ms   12279.323 ms    307.134 ms

Hmm.  Is the file_fdw part of this just a demo, or do you want to try
to get that committed?  If so, maybe start a new thread with a more
appropriate subject line to just talk about that.  I haven't
scrutinized that part of the patch in any detail, but the general
infrastructure for FDWs and custom scans to use parallelism seems to
be in good shape, so I rewrote the documentation and committed that
part.

Do you have any idea why this isn't scaling beyond, uh, 1 worker?
That seems like a good thing to try to figure out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 2016-01 Commitfest

2016-02-03 Thread Daniel Verite
Alvaro Herrera wrote:

>  https://commitfest.postgresql.org/8/372/
> \crosstabview (previously: \rotate) in psql for crosstab-style display

About this one, the code is no longer moving, the latest addition was
regression tests a couple days ago.

I think it should be moved to the next CF, to be hopefully promoted to
Ready for Committer soon.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL Audit Extension

2016-02-03 Thread David G. Johnston
On Wed, Feb 3, 2016 at 9:36 AM, Robert Haas  wrote:

> On Wed, Feb 3, 2016 at 10:37 AM, David Steele  wrote:
> > On 2/1/16 11:23 PM, Robert Haas wrote:
>
> >> In
> >> saying that it's arbitrary, I'm not saying it isn't *useful*.  I'm
> >> saying there could be five extensions like this that make equally
> >> arbitrary decisions about what to do and how to do it, and they could
> >> all be useful to different people.
> >
> > There *could* be five extensions but there are not.  To my knowledge
> > there are two and one is just a more evolved version of the other.
>
> Right now that may be true, although it wouldn't surprise me very much
> to find out that other people have written such extensions and they
> just didn't get as much press.  Also, consider the future.  It is
> *possible* that your version of pgaudit will turn out to be the be-all
> and the end-all, but it's equally possible that somebody will fork
> your version in turn and evolve it some more.  I don't see how you can
> look at the pgaudit facilities you've got here and say that this is
> the last word on auditing and all PostgreSQL users should be content
> with exactly that facility.  I find that ridiculous.  Look me in the
> eye and tell me that nobody's going to fork your version and evolve it
> a bunch more.
>

This rings hollow to me.  JSON got included with an admittedly weak
feature set and then got significantly improved upon in subsequent
releases.  Those who would be inclined to fork pgaudit, seeing it already
being in core, would more likely, and to the benefit of the community, put
that work into improving the existing work.  My off-the-cuff understanding
is that some current big features (namely the parallel-related stuff) are
also taking this "lets commit smaller but still useful pieces" into core to
build up to this super-feature we want working two years from now.  I don't
see any fundamental reason auditing couldn't be given the same opportunity
to improve inside core.

The other major downside of having it in core is that now the feature
release cycle is tied to core.  Telling PostGIS they can only release new
features when new versions of PostgreSQL come out would be an unacceptable
situation.

The best of both worlds would be for core to have its own implementation
written as an extension and to readily allow for other implementations to
be plugged in as well.  As your alluded to above there are likely a number
of things core really needs to enable such functionality without providing
the entire UI - leaving that for extensions.


> > People who are interested in audit are also understandably leery of
> > downloading code from an untrusted source.  Both PGXN and GitHub are The
> > Wild West as far as conservative auditors are concerned.
>
> I hate to be rude here, but that's not my problem.  You can put it on
> your corporate web site and let people download it from there.  I'm
> sure that auditors are familiar with the idea of downloading software
> from for-profit companies.  Do they really not use any software from
> Microsoft or Apple, for example?  If the problem is that they will
> trust the PostgreSQL open source project but not YOUR company, then I
> respectfully suggest that you need to establish the necessary
> credibility, not try to piggyback on someone else's.
>

A bit short-sighted maybe.  Endorsing and including such a feature could
open PostgreSQL up to a new market being supported by people who right now
are not readily able to join the PostgreSQL community because they cannot
invest the necessary resources to get the horses put before the cart.
Those people, if they could get their clients to more easily use
PostgreSQL, may just find it worth their while to then contribute back to
this new frontier that has been opened up to them.  This would ideally
increase the number of contributors and reviewers within the community
which is the main thing that is presently needed.


> > I'll be the first to admit that the design is not the prettiest.  Trying
> > to figure out what Postgres is doing internally through a couple of
> > hooks is like trying to replicate the script of a play when all you have
> is the program.  However, so far it has performed well and been
> > reliable in field tests.
>
> That's good to hear, but again, it's not enough for a core submission.
> Code that goes into our main git repository needs to be "the
> prettiest".  I mean it's not all perfect of course, but it should be
> pretty darn good.
>
> Also, understand this: when you get a core submission accepted, the
> core project is then responsible for maintaining that code even if you
> disappear.  It's entirely reasonable for the project to demand that
> this isn't going to be too much work.  It's entirely reasonable for
> the community to want the design to be very good and the code quality
> to be high.


So far so good...

> It's entirely reasonable for the community NOT to want to
> privilege one implementation over another.

Re: [HACKERS] 2016-01 Commitfest

2016-02-03 Thread Tom Lane
Alvaro Herrera  writes:
> A number of patches still remain in the current commitfest.  11 of them
> are marked as "ready for committer" so supposedly some committer should
> grab them and push them.

The "SET ROLE hook" patch should be moved to RWF state; it's not going
to be committed in anything like its current form, per the last two
messages in that thread.

The reason the "remove wal_level archive" patch hasn't been committed
is that there is just about zero consensus on whether it's a good idea.
I don't see that getting resolved soon, and would not expect that it
will get committed in this CF.  Don't know what action is appropriate
on the CF entry, but RFC status seems misleading.

Heikki and/or Andres have their names on three of the remaining RFC
patches; it's unlikely any other committer will touch those patches
unless they take their names off.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2016-02-03 Thread Masahiko Sawada
On Tue, Feb 2, 2016 at 7:22 PM, Alvaro Herrera  wrote:
> Masahiko Sawada wrote:
>
>> I misunderstood. Sorry for noise.
>> I agree with adding conversion method as a pageConverter routine.
>
> \o/
>
>> This patch doesn't change page layout actually, but pageConverter
>> routine checks only the page layout.
>> And we have to plugin named convertLayout_X_to_Y.
>>
>> I think we have two options.
>>
>> 1. Change page layout(PG_PAGE_LAYOUT_VERSION) to 5. pg_upgrade detects
>> it and then converts only VM files.
>> 2. Change pg_upgrade plugin mechanism so that it can handle other name
>> conversion plugins (e.g., convertLayout_vm_to_vfm)
>>
>> I think #2 is better. Thought?
>
> My vote is for #2 as well.  Maybe we just didn't have forks when this
> functionality was invented; maybe the author just didn't think hard
> enough about what would be the right interface to do it.

I've almost written up a very rough patch. (It can pass the regression tests.)
Windows support is not there yet, and the Makefile is not correct.

I've divided the main patch into two patches: an add-frozen-bit patch and
a pg_upgrade support patch.
The 000 patch is almost the same as the previous code. (It includes a small fix.)
The 001 patch provides rewriting of the visibility map as a pageConverter routine.
The 002 patch enhances the debug messages in visibilitymap.c.

In order to support pageConvert plugins, I made the following changes.
* Main changes
- Remove PAGE_CONVERSION.
- pg_upgrade plugins are located in the 'src/bin/pg_upgrade/plugins' directory.
- Move the directory holding plugins from '$(bin)/plugins' to '$(lib)/plugins'.
- Add a new page-converter plugin function for the visibility map.
- The current code doesn't allow us to use link mode (-k) in the case
  where a page converter is required, but I changed it so that if a page
  converter for a fork file is specified, we actually convert it even in
  link mode.

* Interface design
convertFile() and convertPage() are plugin functions for the main relation
file, and these functions are dynamically loaded by
loadConvertPlugin().
I added a new pageConvert plugin function converVMFile() for the
visibility map (fork file).
If the layout of CLOG, FSM, etc. is changed in the future, we could add
new pageConvert plugin functions like convertCLOGFile() or
convertFSMFile(), and these functions would be dynamically loaded by
loadAdditionalConvertPlugin().
This means that conversion of the main file and of other fork files is
executed independently, and conversion of fork files is executed even if
link mode is specified.
Each conversion plugin is loaded and used only when it's required.

I still agree with this plugin approach, but I feel it's still a bit
complicated, and I'm concerned that the patch size has increased.
Please give me feedback.
If there are no objections to this, I'm going to spend time improving it.

Regards,

--
Masahiko Sawada
diff --git a/contrib/pgstattuple/pgstatapprox.c b/contrib/pgstattuple/pgstatapprox.c
index 001988b..5d08c73 100644
--- a/contrib/pgstattuple/pgstatapprox.c
+++ b/contrib/pgstattuple/pgstatapprox.c
@@ -87,7 +87,7 @@ statapprox_heap(Relation rel, output_type *stat)
 		 * If the page has only visible tuples, then we can find out the free
 		 * space from the FSM and move on.
 		 */
-		if (visibilitymap_test(rel, blkno, &vmbuffer))
+		if (VM_ALL_VISIBLE(rel, blkno, &vmbuffer))
 		{
 			freespace = GetRecordedFreeSpace(rel, blkno);
 			stat->tuple_len += BLCKSZ - freespace;
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 392eb70..c43443a 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5916,7 +5916,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
   

-VACUUM performs a whole-table scan if the table's
+VACUUM performs an eager freezing if the table's
 pg_class.relfrozenxid field has reached
 the age specified by this setting.  The default is 150 million
 transactions.  Although users can set this value anywhere from zero to
@@ -5960,7 +5960,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
   

-VACUUM performs a whole-table scan if the table's
+VACUUM performs an eager freezing if the table's
 pg_class.relminmxid field has reached
 the age specified by this setting.  The default is 150 million multixacts.
 Although users can set this value anywhere from zero to two billions,
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 5204b34..7cc975d 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -352,9 +352,9 @@
 Vacuum maintains a visibility map for each
 table to keep track of which pages contain only tuples that are known to be
 visible to all active transactions (and all future transactions, until the
-page is again modified).  This has two purposes.  First, vacuum
-itself can skip such pages on the next run, since there is nothing to
-clean up.
+pa

Re: [HACKERS] 2016-01 Commitfest

2016-02-03 Thread Alvaro Herrera
Hi,

Status summary:
 Needs review: 7.
 Ready for Committer: 11.
 Committed: 28.
 Moved to next CF: 23.
 Rejected: 2.
 Returned with Feedback: 28.
Total: 99. 

I just closed a large number of patches in the 2016-01 commitfest as
"returned with feedback".  The vast majority of those were in "waiting
on author"; I verified that the threads had posted something to the
author and the author had not yet replied.  Since we're past the closing
point for the commitfest, that's fair.  Patch authors are welcome to
submit a new version of their patch, which can be considered during the
next commitfest.  In this group I also included a number of "Needs
review" patches which had gotten some feedback and were waiting for a
new version.

Also, some patches that had actually gotten some feedback but where a new
version was posted afterwards I moved to the next commitfest.  We're not
waiting on the author to post a new version, but we've already given
some feedback so we're not obliged to keep them in the current
commitfest.

A number of patches still remain in the current commitfest.  11 of them
are marked as "ready for committer" so supposedly some committer should
grab them and push them.

7 are in "needs review" state; I just checked and most of those already
received some feedback and it seems fair to close them as RwF; new
versions can be submitted to next commitfest.  Two of them didn't,
AFAICS:

  https://commitfest.postgresql.org/8/372/
 \crosstabview (previously: \rotate) in psql for crosstab-style display
  https://commitfest.postgresql.org/8/323/
 Statistics for array types

and for closure it would be good to either send them feedback if they
need rework or mark them as ready-for-committer.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] FETCH limited by bytes.

2016-02-03 Thread Robert Haas
On Wed, Feb 3, 2016 at 11:17 AM, Ashutosh Bapat
 wrote:
> There seems to be a duplicate successive assignment to fdw_private in a recent
> commit. Here's a patch to remove the first one.

Committed along with a fix for another problem I noted along the way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Batch update of indexes

2016-02-03 Thread Konstantin Knizhnik

Attached please find a patch for the "ALTER INDEX ... WHERE ..." clause.
It is now able to handle all three possible situations:
1. Making an index partial (adding a WHERE condition to an ordinary index)
2. Extending the partial index range (a less restrictive index predicate)
3. An arbitrary change of the partial index predicate

In case 2) new records are added to the index.
In the other two cases the index is completely reconstructed.
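
A minimal sketch of the proposed syntax for the three cases (the table, index,
and predicates are made up; the syntax follows the subject of this patch):

    -- Hypothetical table with an ordinary (non-partial) index.
    CREATE TABLE events (id serial, ts timestamp);
    CREATE INDEX events_ts_idx ON events (ts);

    -- 1. Make the index partial by attaching a WHERE condition.
    ALTER INDEX events_ts_idx WHERE ts < '2016-01-20';

    -- 2. Extend the partial index range (less restrictive predicate):
    --    only the newly matching rows have to be added to the index.
    ALTER INDEX events_ts_idx WHERE ts < '2016-01-21';

    -- 3. Arbitrary change of the predicate: the index is rebuilt completely.
    ALTER INDEX events_ts_idx WHERE (id % 2) = 0;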

This patch includes the src/bin/insbench utility for testing insert 
performance. It can easily be excluded from the patch to reduce its size.
Also, it is better to apply this patch together with the "index-only scans 
with partial indexes" patch:


http://www.postgresql.org/message-id/560c7213.3010...@2ndquadrant.com

only in this case will the regression test produce the expected output.


On 27.01.2016 23:15, Robert Haas wrote:

On Wed, Jan 20, 2016 at 4:28 AM, Konstantin Knizhnik
 wrote:

Please notice that such an ALTER statement, changing the condition for a
partial index, is not supported now.
But I do not see any fundamental problems with supporting such a construction.
We should just include in the index all records which match the new condition
and do not match the old condition:

ts < '21/01/2016' and not (ts < '20/01/2016')

You'd also need to remove any rows from the index that match the old
condition but not the new one.  In your example, that's impossible,
but in general, it's definitely possible.



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index b450bcf..b6ffb19 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -32,6 +32,7 @@
 #include "commands/tablespace.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
+#include "funcapi.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/planner.h"
@@ -50,6 +51,9 @@
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
 #include "utils/tqual.h"
+#include "utils/ruleutils.h"
+#include "executor/executor.h"
+#include "executor/spi.h"
 
 
 /* non-export function prototypes */
@@ -275,6 +279,160 @@ CheckIndexCompatible(Oid oldId,
 	return ret;
 }
 
+static void 
+UpdateIndex(Oid indexRelationId, Node* whereClause)
+{
+	Datum		values[Natts_pg_index];
+	bool		isnull[Natts_pg_index];
+	HeapTuple   oldTuple;
+	HeapTuple   newTuple;
+	Relation	pg_index;
+
+	pg_index = heap_open(IndexRelationId, RowExclusiveLock);
+	oldTuple = SearchSysCacheCopy1(INDEXRELID, ObjectIdGetDatum(indexRelationId));
+	if (!HeapTupleIsValid(oldTuple))
+		elog(ERROR, "cache lookup failed for index %u", indexRelationId);
+
+	heap_deform_tuple(oldTuple, RelationGetDescr(pg_index), values, isnull);
+	values[Anum_pg_index_indpred - 1] = CStringGetTextDatum(nodeToString(whereClause));
+	isnull[Anum_pg_index_indpred - 1] = false;
+	newTuple = heap_form_tuple(RelationGetDescr(pg_index), values, isnull);
+	simple_heap_update(pg_index, &oldTuple->t_self, newTuple);
+	CatalogUpdateIndexes(pg_index, newTuple);
+	heap_freetuple(newTuple);
+	heap_freetuple(oldTuple);
+	heap_close(pg_index, NoLock);
+}
+
+void
+AlterIndex(Oid indexRelationId, IndexStmt *stmt)
+{
+	char* select;
+	Oid heapRelationId;
+	IndexUniqueCheck checkUnique;
+	Datum		values[INDEX_MAX_KEYS];
+	bool		isnull[INDEX_MAX_KEYS];
+	Relation heapRelation;
+	Relation indexRelation;
+	SPIPlanPtr plan;
+	Portal portal;
+	HeapTuple tuple;
+	TupleTableSlot *slot;
+	ItemPointer tupleid;
+	IndexInfo  *indexInfo;
+	EState *estate;
+	Oid	namespaceId;
+	List*   deparseCtx;
+	char*   oldIndexPredicate;
+	char*   newIndexPredicate;
+	char*   relationName;
+
+	Assert(stmt->whereClause);
+	CheckPredicate((Expr *) stmt->whereClause);
+
+	/* Open and lock the parent heap relation */
+	heapRelationId = IndexGetRelation(indexRelationId, false);
+	heapRelation = heap_open(heapRelationId, AccessShareLock);
+
+	/* Open the target index relation */
+	/*	indexRelation = index_open(indexRelationId, RowExclusiveLock); */
+	indexRelation = index_open(indexRelationId, ShareUpdateExclusiveLock);
+	/* indexRelation = index_open(indexRelationId, AccessShareLock); */
+	namespaceId = RelationGetNamespace(indexRelation);
+
+	indexInfo = BuildIndexInfo(indexRelation);
+	Assert(!indexInfo->ii_ExclusionOps);
+ 
+	/*
+	 * Generate the constraint and default execution states
+	 */
+	estate = CreateExecutorState();
+
+	checkUnique = indexRelation->rd_index->indisunique ? UNIQUE_CHECK_YES : UNIQUE_CHECK_NO;
+
+	slot = MakeSingleTupleTableSlot(RelationGetDescr(heapRelation));
+	
+	deparseCtx = deparse_context_for(RelationGetRelationName(heapRelation), heapRelationId);
+	relationName = quote_qualified_identifier(get_namespace_name(namespaceId),
+			  get_rel_name(heapRelationId)),
+	newIndexPredicate = deparse_expression(stmt->whereClause, deparseCtx, false, false);
+	oldIndexPredicate = indexInfo->ii_Predicate 
+		? deparse_expression((Node*)make_ands_explicit(i

Re: [HACKERS] PostgreSQL Audit Extension

2016-02-03 Thread Robert Haas
On Wed, Feb 3, 2016 at 10:37 AM, David Steele  wrote:
> On 2/1/16 11:23 PM, Robert Haas wrote:
>> OK, I'll bite: I'm worried that this patch will be a maintenance
>> burden.  It's easy to imagine that changes to core will result in the
>> necessity or at least desirability of changes to pgaudit, but I'm
>> definitely not prepared to insist that future authors try to insist
>> that future patch submitters have to understand this code and update
>> it as things change.
>
> I agree this is a concern.  It's similar to deparse or event triggers in
> this regard with the notable exception that pgaudit is not in core.

I don't see event triggers as having the same issues, actually.  DDL
deparse - which I assume is what you mean by deparse - does, and I
complained about that too, quite a lot, and some work was done to
address it - I would have liked more.  One of the things I argued for
forcefully with regard to DDL deparse is that it needed to be able to
deparse every DDL command we have, not just the ones the authors
thought were most important.  I would expect no less from an auditing
facility.

> However, if it becomes popular enough out of core as everyone insists is
> preferable then people will still need to maintain it.  Just as PostGIS
> has a close relationship with core, the pgaudit team would need to have
> the same sort of relationship.  Patches would be submitted for review
> and (hopefully) committed and core developer time would still be spent
> on pgaudit, albeit indirectly.  Core developers would still have to be
> careful not to break pgaudit if it became popular enough.

This just isn't how it works.  I have no idea what the PostGIS folks
are doing, and they generally don't need to know what we're doing.
Occasionally we interact with each other, but mostly those two
different pieces of software can be developed by different people, and
that's a good thing.  Migrating PostGIS into PostgreSQL's core would
not be good for either project IMHO.  It is neither necessary nor
desirable to have multiple software projects all merged together in a
single git repo.

>> The set of things that the patch can audit is pretty arbitrary and not
>> well tied into the core code.
>
> Since the set of what it can audit is every command that can be run by a
> user in Postgres I don't see how that's arbitrary.

That's not what I'm talking about.  You audit relation access and
function calls but not, say, creation of event triggers.  Yes, you can
log every statement that comes in, but that's not the secret sauce:
log_statement=all will do that much.  The secret sauce is figuring out
the set of events that a statement might perform which might cause
that statement to generate audit records.  And it does not seem to me
that what you've got there right now is particularly general - you've
got relation access and function calls and a couple of other things,
but it's far from comprehensive.

>> There is a list of string constants in
>> the code that covers each type of relations plus functions, but not
>> any other kind of SQL object.  If somebody adds a new relkind, this
>> would probably need to updated - it would not just work.  If somebody
>> adds a new type of SQL object, it won't be covered unless the user
>> takes some explicit action, but there's no obvious guiding principle
>> to say whether that would be appropriate in any particular case.
>
> I think a lot of this could be mitigated by some changes in utility.c.
> I'm planning patches that will allow mapping command strings back to
> event tags and a general classifier function that could incidentally be
> used to improve the granularity of log_statement.

So, *this* starts to smell like a reason for core changes.  "I can't
really do what I want in my extension, but with these changes I could"
is an excellent reason to change core.

>> In
>> saying that it's arbitrary, I'm not saying it isn't *useful*.  I'm
>> saying there could be five extensions like this that make equally
>> arbitrary decisions about what to do and how to do it, and they could
>> all be useful to different people.
>
> There *could* be five extensions but there are not.  To my knowledge
> there are two and one is just a more evolved version of the other.

Right now that may be true, although it wouldn't surprise me very much
to find out that other people have written such extensions and they
just didn't get as much press.  Also, consider the future.  It is
*possible* that your version of pgaudit will turn out to be the be-all
and the end-all, but it's equally possible that somebody will fork
your version in turn and evolve it some more.  I don't see how you can
look at the pgaudit facilities you've got here and say that this is
the last word on auditing and all PostgreSQL users should be content
with exactly that facility.  I find that ridiculous.  Look me in the
eye and tell me that nobody's going to fork your version and evolve it
a bunch more.

> People who are interested in audit are also u

Re: [HACKERS] [POC] FETCH limited by bytes.

2016-02-03 Thread Ashutosh Bapat
There seems to be double successive assignment to fdw_private in recent
commit. Here's patch to remove the first one.

On Wed, Feb 3, 2016 at 7:40 PM, Robert Haas  wrote:

> On Tue, Feb 2, 2016 at 10:42 PM, Corey Huinker 
> wrote:
> >> I don't see how.  There really is no declaration in there for a
> >> variable called server.
> >
> > Absolutely correct. My only guess is that it was from failing to make
> clean
> > after a checkout/re-checkout. A good reason to have even boring
> regression
> > tests.
> >
> > Regression tests added.
>
> Committed.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index d5c0383..c95ac05 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1009,22 +1009,20 @@ postgresGetForeignPlan(PlannerInfo *root,
 	 * expressions to be sent as parameters.
 	 */
 	initStringInfo(&sql);
 	deparseSelectStmtForRel(&sql, root, baserel, remote_conds,
 							best_path->path.pathkeys, &retrieved_attrs,
 							&params_list);
 	/*
 	 * Build the fdw_private list that will be available to the executor.
 	 * Items in the list must match enum FdwScanPrivateIndex, above.
 	 */
-	fdw_private = list_make2(makeString(sql.data),
-							 retrieved_attrs);
 	fdw_private = list_make3(makeString(sql.data),
 							 retrieved_attrs,
 							 makeInteger(fpinfo->fetch_size));
 
 	/*
 	 * Create the ForeignScan node from target list, filtering expressions,
 	 * remote parameter expressions, and FDW private information.
 	 *
 	 * Note that the remote parameter expressions are stored in the fdw_exprs
 	 * field of the finished plan node; we can't keep them in private state

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support on readfuncs.c)

2016-02-03 Thread Robert Haas
On Wed, Jan 27, 2016 at 9:36 PM, Robert Haas  wrote:
> On Mon, Jan 25, 2016 at 8:06 PM, Kouhei Kaigai  wrote:
>> Sorry for my late response. I've been unavailable to have enough
>> time to touch code for the last 1.5 month.
>>
>> The attached patch is a revised one to handle private data of
>> foregn/custom scan node more gracefully.
>>
>> The overall consensus upthread were:
>> - A new ExtensibleNodeMethods structure defines a unique name
>>   and a set of callbacks to handle node copy, serialization,
>>   deserialization and equality checks.
>> - (Foreign|Custom)(Path|Scan|ScanState) are first host of the
>>   ExtensibleNodeMethods, to allow extension to define larger
>>   structure to store its private fields.
>> - ExtensibleNodeMethods does not support variable length
>>   structure (like a structure with an array on the tail, use
>>   separately allocated array).
>> - ExtensibleNodeMethods shall be registered on _PG_init() of
>>   extensions.
>>
>> The 'pgsql-v9.6-custom-private.v3.patch' is the main part of
>> this feature. As I pointed out before, it uses dynhash instead
>> of the self invented hash table.
>
> On a first read-through, I see nothing in this patch to which I would
> want to object except for the fact that the comments and documentation
> need some work from a native speaker of English.  It looks like what
> we discussed, and I think it's an improvement over what we have now.

Well, looking at this a bit more, it seems like the documentation
you've written here is really misplaced.  The patch is introducing a
new facility that applies to both CustomScan and ForeignScan, but the
documentation is only to do with CustomScan.  I think we need a whole
new chapter on extensible nodes, or something.  I'm actually not
really keen on the fact that we keep adding SGML documentation for
this stuff; it seems like it belongs in a README in the source tree.
We don't explain nodes in general, but now we're going to have to try
to explain extensible nodes.  How's that going to work?

I think you should avoid the call to GetExtensibleNodeMethods() in the
case where extnodename is NULL.  On the other hand, I think that if
extnodename is non-NULL, all four methods should be required, so that
you don't have to check if (methods && methods->nodeRead) but just if
(extnodename) { methods = GetExtensibleNodeMethods(extnodename);
methods->nodeRead( ... ); }.  That seems like it would be a bit
tidier.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL Audit Extension

2016-02-03 Thread David Steele
Hi Robert,

Thank you for replying.

On 2/1/16 11:23 PM, Robert Haas wrote:
> OK, I'll bite: I'm worried that this patch will be a maintenance
> burden.  It's easy to imagine that changes to core will result in the
> necessity or at least desirability of changes to pgaudit, but I'm
> definitely not prepared to insist that future authors try to insist
> that future patch submitters have to understand this code and update
> it as things change.

I agree this is a concern.  It's similar to deparse or event triggers in
this regard with the notable exception that pgaudit is not in core.

However, if it becomes popular enough out of core as everyone insists is
preferable then people will still need to maintain it.  Just as PostGIS
has a close relationship with core, the pgaudit team would need to have
the same sort of relationship.  Patches would be submitted for review
and (hopefully) committed and core developer time would still be spent
on pgaudit, albeit indirectly.  Core developers would still have to be
careful not to break pgaudit if it became popular enough.

> The set of things that the patch can audit is pretty arbitrary and not
> well tied into the core code.  

Since the set of what it can audit is every command that can be run by a
user in Postgres I don't see how that's arbitrary.  The amount of
*additional* detail provided for each audit record is also not arbitrary
but a function of what information is available through various hooks
and event triggers.

I was able to improve on the amount of information provided over the
original 2Q code (mostly by abandoning support for Postgres < 9.5) but
the methodology remains the same.

> There is a list of string constants in
> the code that covers each type of relations plus functions, but not
> any other kind of SQL object.  If somebody adds a new relkind, this
> would probably need to updated - it would not just work.  If somebody
> adds a new type of SQL object, it won't be covered unless the user
> takes some explicit action, but there's no obvious guiding principle
> to say whether that would be appropriate in any particular case.

I think a lot of this could be mitigated by some changes in utility.c.
I'm planning patches that will allow mapping command strings back to
event tags and a general classifier function that could incidentally be
used to improve the granularity of log_statement.

> In
> saying that it's arbitrary, I'm not saying it isn't *useful*.  I'm
> saying there could be five extensions like this that make equally
> arbitrary decisions about what to do and how to do it, and they could
> all be useful to different people.

There *could* be five extensions but there are not.  To my knowledge
there are two and one is just a more evolved version of the other.

> I don't really want to bless any
> given one of those current or hypothetical future solutions.  We have
> hooks precisely so that people can write stuff like this and put it up
> on PGXN or github or wherever - and this code can be published there,
> and people who want to can use it.

People who are interested in audit are also understandably leery of
downloading code from an untrusted source.  Both PGXN and GitHub are The
Wild West as far as conservative auditors are concerned.

Your use of the phrase "or wherever" only reinforces the point in my
mind.  The implication is that it doesn't matter where the pgaudit
extension comes from but it does matter, a lot.

> It also appears to me that if we did want to do that, it would need
> quite a lot of additional cleanup.  I haven't dug in enough to have a
> list of specific issues, but it does look to me like there would be
> quite a bit.  Maybe that'd be worth doing if there were other
> advantages of having this be in core, but I don't see them.

I'll be the first to admit that the design is not the prettiest.  Trying
to figure out what Postgres is doing internally through a couple of
hooks is like trying to replicate the script of a play when all you have
is the program.  However, so far it has been performed well and been
reliable in field tests.

-- 
-David
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: [HACKERS] Copy-pasto in the ExecForeignDelete documentation

2016-02-03 Thread Robert Haas
On Mon, Feb 1, 2016 at 5:26 AM, Etsuro Fujita
 wrote:
> I don't think the data is referenced by the AFTER ROW DELETE triggers.

Why do you think that?  And why would DELETE triggers be different
from UPDATE triggers, which do something similar?

I looked up the history of this code and it was introduced in
7cbe57c3, which added support for triggers on foreign tables.  Noah
did that commit and he's rarely wrong about stuff like this, so I
suspect you may be missing something.  One thing to consider is
whether the version of the row that finally gets deleted is
necessarily the same as the version originally selected from the
remote side; e.g. suppose the remote side has triggers, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] checkpoints after database start/immediate checkpoints

2016-02-03 Thread Robert Haas
On Mon, Feb 1, 2016 at 7:43 PM, Andres Freund  wrote:
> Right now it takes checkpoint_timeout till we start a checkpoint, and
> checkpoint_timeout + checkpoint_timeout * checkpoint_completion_target
> till we complete the first checkpoint after shutdown/forced checkpoints.
>
> That means a) that such checkpoint will often be bigger/more heavyweight
> than the following ones, not what you want after a restart/create
> database/basebackup... b) reliable benchmarks measuring steady state
> have to run for even longer.
>
> There's some logic to that behaviour though: With a target of 0 it'd be
> weird to start a checkpoint directly after startup, and even otherwise
> there'll be less to write at the beginning.
>
> As an example:
> 2016-02-02 01:32:58.053 CET 21361 LOG:  checkpoint complete: wrote 186041 
> buffers (71.0%); 1 transaction log file(s) added, 0 removed, 0 recycled; 
> sort=0.071 s, write=9.504 s, sync=0.000 s, total=9.532 s; sync files=0, 
> longest=0.000 s, average=0.000 s; distance=1460260 kB, estimate=1460260 kB
> 2016-02-02 01:32:58.053 CET 21361 LOG:  checkpoint starting: time
> 2016-02-02 01:33:08.061 CET 21361 LOG:  checkpoint complete: wrote 127249 
> buffers (48.5%); 0 transaction log file(s) added, 0 removed, 89 recycled; 
> sort=0.045 s, write=9.987 s, sync=0.000 s, total=10.007 s; sync files=0, 
> longest=0.000 s, average=0.000 s; distance=1187558 kB, estimate=1432989 kB
> 2016-02-02 01:33:58.216 CET 21361 LOG:  checkpoint complete: wrote 124649 
> buffers (47.5%); 0 transaction log file(s) added, 0 removed, 69 recycled; 
> sort=0.048 s, write=10.160 s, sync=0.000 s, total=10.176 s; sync files=0, 
> longest=0.000 s, average=0.000 s; distance=1135086 kB, estimate=1337776 kB
> 2016-02-02 01:33:58.216 CET 21361 LOG:  checkpoint starting: time
> 2016-02-02 01:34:08.060 CET 21361 LOG:  checkpoint complete: wrote 123298 
> buffers (47.0%); 0 transaction log file(s) added, 0 removed, 69 recycled; 
> sort=0.049 s, write=9.838 s, sync=0.000 s, total=9.843 s; sync files=0, 
> longest=0.000 s, average=0.000 s; distance=1184077 kB, estimate=1322406 kB
> 2016-02-02 01:34:08.060 CET 21361 LOG:  checkpoint starting: time
> 2016-02-02 01:34:18.052 CET 21361 LOG:  checkpoint complete: wrote 118165 
> buffers (45.1%); 0 transaction log file(s) added, 0 removed, 72 recycled; 
> sort=0.046 s, write=9.987 s, sync=0.000 s, total=9.992 s; sync files=0, 
> longest=0.000 s, average=0.000 s; distance=1198018 kB, estimate=1309967 kB
> 2016-02-02 01:34:18.053 CET 21361 LOG:  checkpoint starting: time
> 2016-02-02 01:34:28.089 CET 21361 LOG:  checkpoint complete: wrote 120814 
> buffers (46.1%); 0 transaction log file(s) added, 0 removed, 73 recycled; 
> sort=0.051 s, write=10.020 s, sync=0.000 s, total=10.036 s; sync files=0, 
> longest=0.000 s, average=0.000 s; distance=1203691 kB, estimate=1299339 kB
> 2016-02-02 01:34:28.090 CET 21361 LOG:  checkpoint starting: time
> 2016-02-02 01:34:39.182 CET 21361 LOG:  checkpoint complete: wrote 110411 
> buffers (42.1%); 0 transaction log file(s) added, 0 removed, 74 recycled; 
> sort=0.047 s, write=9.908 s, sync=0.000 s, total=11.092 s; sync files=0, 
> longest=0.000 s, average=0.000 s; distance=1073612 kB, estimate=1276767 kB
>
> We wrote roughly 1/3 more wal/buffers during the first checkpoint than
> following ones (And yes, the above is with an impossibly low timeout,
> but that doesn't change anything). You can make that much more extreme
> with larget shared buffers and larger timeouts.
>
>
> I wonder if this essentially point at checkpoint_timeout being wrongly
> defined: Currently it means we'll try to finish a checkpoint
> (1-checkpoint_completion_target) * timeout before the next one - but
> perhaps it should instead be that we start checkpoint_timeout * _target
> before the next timeout? Afaics that'd work more graceful in the face of
> restarts and forced checkpoints.
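
To make the difference concrete (illustrative numbers only, say
checkpoint_timeout = 300s and checkpoint_completion_target = 0.5), the
first checkpoint after startup would look roughly like this:

    current rule:   start at t = 300s, aim to finish by 300 + 0.5 * 300 = t = 450s
    proposed rule:  aim to finish by t = 300s, so start at 300 - 0.5 * 300 = t = 150s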

There's a certain appeal to that, but at the same time it seems pretty
wonky.  Right now, you can say that a checkpoint is triggered when the
amount of WAL reaches X or the amount of time reaches Y, but with the
alternative definition it's a bit harder to explain what's going on
there.  Maybe we should do it anyway, but...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] extend pgbench expressions with functions

2016-02-03 Thread Michael Paquier
On Wed, Feb 3, 2016 at 11:28 PM, Robert Haas  wrote:
> On Mon, Feb 1, 2016 at 9:46 PM, Michael Paquier
>  wrote:
>> OK, here are patches for 9.1~9.4. The main differences are that in
>> 9.3/9.4 int64 is used for the division operations, and in 9.2/9.1
>> that's int32. In the latter case pgbench blows up the same way with
>> that:
>> \set i -2147483648
>> \set i :i / -1
>> select :i;
>> In those patches INT32_MIN/INT64_MIN need to be explicitly set as well
>> at the top of pgbench.c. I thing that's fine.
>
> I thing so too.  Committed.

Thanks for thinging so.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Raising the checkpoint_timeout limit

2016-02-03 Thread David Steele
On 2/2/16 11:10 PM, Robert Haas wrote:
> On Tue, Feb 2, 2016 at 10:58 PM, Tom Lane  wrote:
>> I've gotta go with the "paternalism" side of the argument here.  Suppose
>> you configure your system to checkpoint once a year --- what is going to
>> happen when the year is up?  Or when you try to shut it down?  You *will*
>> regret such a setting.
>>
>> I don't think we should allow the checkpoint distances to be so large that
>> checkpoints don't happen in the normal course of events.  I'd be okay with
>> the max being a day, perhaps.
> 
> If smart people[1] want to set checkpoint_timeout to a value higher
> than 1 day[2], then I think we should let them.

Agreed - I have a specific instance where I would prefer the daily
backups or checkpoint segments to be the primary source of checkpoints
with checkpoint_timeout set to 36 hours and used only as a fallback.

A limit of 1 day would be lower than I'd like, though still better than
what I have now, which is 1 hour.

For this case I would probably configure:

checkpoint_segments = 256
checkpoint_timeout = 36h

So the daily backups would generally trigger the checkpoint unless there
was unusually high volume, in which case it would be checkpoint segments.
Finally, if the backups weren't working and volume was normal, then
checkpoint_timeout would come along to save the day.
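(For scale: with the default 16MB segment size, checkpoint_segments = 256
works out to roughly 256 * 16MB = 4GB of WAL between xlog-triggered
checkpoints; rough figures only.)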

In this case it's all about reducing full-page writes on a medium-sized
system with heavy churn, in order to make a long backup/archive retention
more cost-effective.  Recovery time is not as much of an issue for this
application.

-- 
-David
da...@pgmasters.net





Re: [HACKERS] extend pgbench expressions with functions

2016-02-03 Thread Robert Haas
On Mon, Feb 1, 2016 at 9:46 PM, Michael Paquier
 wrote:
> OK, here are patches for 9.1~9.4. The main differences are that in
> 9.3/9.4 int64 is used for the division operations, and in 9.2/9.1
> that's int32. In the latter case pgbench blows up the same way with
> that:
> \set i -2147483648
> \set i :i / -1
> select :i;
> In those patches INT32_MIN/INT64_MIN need to be explicitly set as well
> at the top of pgbench.c. I thing that's fine.

I thing so too.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] "using previous checkpoint record at" maybe not the greatest idea?

2016-02-03 Thread Robert Haas
On Mon, Feb 1, 2016 at 6:58 PM, Andres Freund  wrote:
> currently if, when not in standby mode, we can't read a checkpoint
> record, we automatically fall back to the previous checkpoint, and start
> replay from there.
>
> Doing so without user intervention doesn't actually seem like a good
> idea. While not super likely, it's entirely possible that doing so can
> wreck a cluster, that'd otherwise easily recoverable. Imagine e.g. a
> tablespace being dropped - going back to the previous checkpoint very
> well could lead to replay not finishing, as the directory to create
> files in doesn't even exist.
>
> As there's, afaics, really no "legitimate" reasons for needing to go
> back to the previous checkpoint I don't think we should do so in an
> automated fashion.
>
> All the cases where I could find logs containing "using previous
> checkpoint record at" were when something else had already gone pretty
> badly wrong. Now that obviously doesn't have a very large significance,
> because in the situations where it "just worked" are unlikely to be
> reported...
>
> Am I missing a reason for doing this by default?

I agree: this seems like a terrible idea.  Would we still have some
way of forcing the older checkpoint record to be used if somebody
wants to try to do that?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-03 Thread Michael Paquier
On Tue, Feb 2, 2016 at 1:42 PM, Amit Kapila  wrote:
> On Sat, Jan 30, 2016 at 7:38 PM, Michael Paquier 
> wrote:
>>
>> On Fri, Jan 29, 2016 at 9:13 PM, Michael Paquier wrote:
>> > Well, to put it short, I am just trying to find a way to make the
>> > backend skip unnecessary checkpoints on an idle system, which results
>> > in the following WAL pattern if system is completely idle:
>> > CHECKPOINT_ONLINE
>> > RUNNING_XACTS
>> > RUNNING_XACTS
>> > [etc..]
>> >
>> > The thing is that I am lost with the meaning of this condition to
>> > decide if a checkpoint should be skipped or not:
>> > if (prevPtr == ControlFile->checkPointCopy.redo &&
>> > prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE)
>> > {
>> > WALInsertLockRelease();
>> > LWLockRelease(CheckpointLock);
>> > return;
>> > }
>> > As at least one standby snapshot is logged before the checkpoint
>> > record, the redo position is never going to match the previous insert
>> > LSN, so checkpoints will never be skipped if wal_level >= hot_standby.
>> > Skipping such unnecessary checkpoints is what you would like to
>> > address, no? Because that's what I would like to do here first. And
>> > once we got that right, we could think about addressing the case where
>> > WAL segments are forcibly archived for idle systems.
>>
>> I have put a bit more of brain power into that, and finished with the
>> patch attached. A new field called chkpProgressLSN is attached to
>> XLogCtl, which is to the current insert position of the checkpoint
>> when wal_level <= archive, or the LSN position of the standby snapshot
>> taken after a checkpoint. The bgwriter code is modified as well so as
>> it uses this progress LSN and compares it with the current insert LSN
>> to determine if a standby snapshot should be logged or not. On an
>> instance of Postgres completely idle, no useless checkpoints, and no
>> useless standby snapshots are generated anymore.
>>
>
>
> @@ -8239,7 +8262,7 @@ CreateCheckPoint(int flags)
>   if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY |
>                 CHECKPOINT_FORCE)) == 0)
>   {
> - if (prevPtr == ControlFile->checkPointCopy.redo &&
> + if (GetProgressRecPtr() == prevPtr &&
>       prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE)
>   {
>
> I think such a check won't consider all the WAL-activity happened
> during checkpoint (after checkpoint start, till it finishes) which was
> not the intention of this code without your patch.

Yes, that's true; in v2 this progress LSN can be updated in two ways:
- At the position where the checkpoint has begun
- At the position where the standby snapshot was taken after a checkpoint has run.
Honestly, even if we log the snapshot before CheckPointGuts(), for
example, that's not going to cut it... The main issue here is that there
will be a snapshot after this checkpoint, so the existing condition is a
rather broken concept already when wal_level >= hot_standby, because the
redo pointer is never going to match the previous LSN pointer. Another
idea would be to ensure that the snapshot is logged just after the redo
pointer; this would be reliable.

> I think both this and previous patch (hs-checkpoints-v1) approach
> won't fix the issue in all kind of scenario's.
>
> Let me try to explain what I think should fix this issue based on
> discussion above, suggestions by Andres and some of my own
> thoughts:
>
> Have a new variable in WALInsertLock that would indicate the last
> insertion position (last_insert_pos) we write to after holding that lock.
> Ensure that we don't update last_insert_pos while logging standby
> snapshot (simple way is to pass a flag in XLogInsert or distinguish
> it based on type of record (XLOG_RUNNING_XACTS) or if you can
> think of a more smarter way).

The simplest way is in XLogInsertRecord(); the record data is available
there, and there is already some logic related to segment switches, for
example.

> Now both during checkpoint and
> in bgwriter, to find out whether there is any activity since last
> time, we need to check all the WALInsertLocks for latest insert position
> (by referring last_insert_pos) and compare it with the latest position
> we have noted during last such check and decide whether to proceed
> or not based on comparison result.  If you think it is not adequate to
> store last_insert_pos in WALInsertLock, then we can think of having
> it in PGPROC.

So the progress check is used when deciding if a checkpoint should be
skipped or not, and when deciding if a standby snapshot should be
taken by the bgwriter? When the bgwriter logs a snapshot, it will also
update the last LSN found (which would be a single variable in, for
example, XLogCtlData, updated from the data taken from the WAL insert
slots). But there is a problem here: if there has been no activity since
the last bgwriter snapshot, the next checkpoint would be skipped as well.
It seems to me that this is not acceptable; a checkpoint generation
would be decided

Re: [HACKERS] Re: BUG #13685: Archiving while idle every archive_timeout with wal_level hot_standby

2016-02-03 Thread Andres Freund
On 2016-02-02 10:12:25 +0530, Amit Kapila wrote:
> @@ -8239,7 +8262,7 @@ CreateCheckPoint(int flags)
>   if ((flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY |
>                 CHECKPOINT_FORCE)) == 0)
>   {
> - if (prevPtr == ControlFile->checkPointCopy.redo &&
> + if (GetProgressRecPtr() == prevPtr &&
>       prevPtr / XLOG_SEG_SIZE == curInsert / XLOG_SEG_SIZE)
>   {
> 
> I think such a check won't consider all the WAL-activity happened
> during checkpoint (after checkpoint start, till it finishes) which was
> not the intention of this code without your patch.

Precisely.

> I think both this and previous patch (hs-checkpoints-v1 ) approach
> won't fix the issue in all kind of scenario's.

Agreed.

> Let me try to explain what I think should fix this issue based on
> discussion above, suggestions by Andres and some of my own
> thoughts:

> Have a new variable in WALInsertLock that would indicate the last
> insertion position (last_insert_pos) we write to after holding that lock.
> Ensure that we don't update last_insert_pos while logging standby
> snapshot (simple way is to pass a flag in XLogInsert or distinguish
> it based on type of record (XLOG_RUNNING_XACTS) or if you can
> think of a more smarter way).  Now both during checkpoint and
> in bgwriter, to find out whether there is any activity since last
> time, we need to check all the WALInsertLocks for latest insert position
> (by referring last_insert_pos) and compare it with the latest position
> we have noted during last such check and decide whether to proceed
> or not based on comparison result.  If you think it is not adequate to
> store last_insert_pos in WALInsertLock, then we can think of having
> it in PGPROC.

Yes, that's pretty much what I was thinking of.
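
Roughly this shape, I imagine (only a sketch, the names are placeholders,
and I'm glossing over padding and how lastImportantAt would be read and
updated safely):

typedef struct WALInsertLock
{
    LWLock      lock;
    XLogRecPtr  insertingAt;
    XLogRecPtr  lastImportantAt;    /* end of the last "important" record
                                     * inserted under this lock; not advanced
                                     * for standby snapshot records */
} WALInsertLock;

/* checkpointer / bgwriter side: has anything happened since we last looked? */
static XLogRecPtr
GetLastImportantRecPtr(void)
{
    XLogRecPtr  res = InvalidXLogRecPtr;
    int         i;

    for (i = 0; i < NUM_XLOGINSERT_LOCKS; i++)
    {
        XLogRecPtr  last = WALInsertLocks[i].lastImportantAt;

        if (last > res)
            res = last;
    }
    return res;
}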

> Yet another idea that occurs to me this morning is that why not
> have a variable in shared memory in XLogCtlInsert or XLogCtl
> similar to CurrBytePos/PrevBytePos which will be updated on
> each XLOGInsert apart from the XLOGInsert for standby snapshots
> and use that in a patch somewhat close to what you have in
> hs-checkpoints-v1.

That'll require holding locks longer...

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] FETCH limited by bytes.

2016-02-03 Thread Robert Haas
On Tue, Feb 2, 2016 at 10:42 PM, Corey Huinker  wrote:
>> I don't see how.  There really is no declaration in there for a
>> variable called server.
>
> Absolutely correct. My only guess is that it was from failing to make clean
> after a checkout/re-checkout. A good reason to have even boring regression
> tests.
>
> Regression tests added.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL Re-Writes

2016-02-03 Thread Robert Haas
On Wed, Feb 3, 2016 at 7:28 AM, Amit Kapila  wrote:
> On further testing, it has been observed that misaligned writes could
> cause reads even when blocks related to file are not in-memory, so
> I think what Jan is describing is right.  The case where there is
> absolutely zero chance of reads is when we write in OS-page boundary
> which is generally 4K.  However I still think it is okay to provide an
> option for  WAL writing in smaller chunks (512 bytes , 1024 bytes, etc)
> for the cases when these are beneficial like when wal_level is
> greater than equal to Archive and keep default as OS-page size if
> the same is smaller than 8K.

Hmm, a little research seems to suggest that 4kB pages are standard on
almost every system we might care about: x86_64, x86, Power, Itanium,
ARMv7.  Sparc uses 8kB, though, and a search through the Linux kernel
sources (grep for PAGE_SHIFT) suggests that there are other obscure
architectures that can at least optionally use larger pages, plus a
few that can use smaller ones.

I'd like this to be something that users don't have to configure, and
it seems like that should be possible.  We can detect the page size on
non-Windows systems using sysconf(_SC_PAGESIZE), and on Windows by
using GetSystemInfo.  And I think it's safe to make this decision at
configure time, because the page size is a function of the hardware
architecture (it seems there are obscure systems that support multiple
page sizes, but I don't care about them particularly).  So what I
think we should do is set an XLOG_WRITESZ along with XLOG_BLCKSZ and
set it to the smaller of XLOG_BLCKSZ and the system page size.  If we
can't determine the system page size, assume 4kB.
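
Something like this minimal sketch for the detection side, assuming
sysconf() and GetSystemInfo() behave as described above (untested, and
the configure plumbing is left out):

#ifndef WIN32
#include <unistd.h>

static long
get_os_page_size(void)
{
    long        sz = sysconf(_SC_PAGESIZE);

    return (sz > 0) ? sz : 4096;    /* fall back to 4kB if unknown */
}
#else
#include <windows.h>

static long
get_os_page_size(void)
{
    SYSTEM_INFO si;

    GetSystemInfo(&si);
    return (long) si.dwPageSize;
}
#endif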

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL Re-Writes

2016-02-03 Thread Amit Kapila
On Wed, Feb 3, 2016 at 11:12 AM, Amit Kapila 
wrote:
>
> On Mon, Feb 1, 2016 at 8:05 PM, Jim Nasby 
wrote:
>>
>> On 1/31/16 3:26 PM, Jan Wieck wrote:
>>>
>>> On 01/27/2016 08:30 AM, Amit Kapila wrote:

 operation.  Now why OS couldn't find the corresponding block in
 memory is that, while closing the WAL file, we use
 POSIX_FADV_DONTNEED if wal_level is less than 'archive' which
 lead to this problem.  So with this experiment, the conclusion is that
 though we can avoid re-write of WAL data by doing exact writes, but
 it could lead to significant reduction in TPS.
>>>
>>>
>>> POSIX_FADV_DONTNEED isn't the only way how those blocks would vanish
>>> from OS buffers. If I am not mistaken we recycle WAL segments in a round
>>> robin fashion. In a properly configured system, where the reason for a
>>> checkpoint is usually "time" rather than "xlog", a recycled WAL file
>>> written to had been closed and not touched for about a complete
>>> checkpoint_timeout or longer. You must have a really big amount of spare
>>> RAM in the machine to still find those blocks in memory. Basically we
>>> are talking about the active portion of your database, shared buffers,
>>> the sum of all process local memory and the complete pg_xlog directory
>>> content fitting into RAM.
>
>
>
> I think that could only be problem if reads were happening at write or
> fsync call, but that is not the case here.  Further investigation on this
> point reveals that the reads are not for fsync operation, rather they
> happen when we call posix_fadvise(,,POSIX_FADV_DONTNEED).
> Although this behaviour (writing in non-OS-page-cache-size chunks could
> lead to reads if followed by a call to posix_fadvise
> (,,POSIX_FADV_DONTNEED)) is not very clearly documented, but the
> reason for the same is that fadvise() call maps the specified data range
> (which in our case is whole file) into the list of pages and then
invalidate
> them which will further lead to removing them from OS cache, now any
> misaligned (w.r.t OS page-size) writes done during writing/fsyncing to
file
> could cause additional reads as everything written by us will not be on
> OS-page-boundary.
>

On further testing, it has been observed that misaligned writes could
cause reads even when blocks related to the file are not in memory, so
I think what Jan is describing is right.  The case where there is
absolutely zero chance of reads is when we write on OS-page boundaries,
which is generally 4K.  However, I still think it is okay to provide an
option for WAL writing in smaller chunks (512 bytes, 1024 bytes, etc.)
for the cases when these are beneficial, like when wal_level is
greater than or equal to archive, and keep the default as the OS page
size if the same is smaller than 8K.
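
To illustrate with round numbers: with 4K OS pages, a 512-byte write
into a page that is not already in the page cache forces the kernel to
read the whole 4K page first (read-modify-write), whereas a write that
covers the entire 4K page needs no read at all.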


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2016-02-03 Thread Thomas Munro
On Wed, Feb 3, 2016 at 10:59 PM, Amit Langote
 wrote:
> There seems to be a copy-pasto there - shouldn't that be:
>
> + if (walsndctl->lsn[SYNC_REP_WAIT_FLUSH] < MyWalSnd->flush)

Indeed, thanks!  New patch attached.

-- 
Thomas Munro
http://www.enterprisedb.com


causal-reads-v6.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-02-03 Thread Tomas Vondra

On 02/03/2016 06:46 AM, Noah Misch wrote:

On Mon, Feb 01, 2016 at 07:03:45PM +0100, Tomas Vondra wrote:

On 12/22/2015 03:49 PM, Noah Misch wrote:

...

If the collector entered pgstat_write_statsfile() with more
inquiries waiting in its socket receive buffer, it would ignore
them as being too old once it finished the write and resumed
message processing. Commit 187492b converted last_statrequest to
a "last_statrequests" list that we wipe after each write.


So essentially we remove the list of requests, and thus on the next
round we don't know the timestamp of the last request and write the
file again unnecessarily. Do I get that right?


Essentially right. Specifically, for each database, we must remember
the globalStats.stats_timestamp of the most recent write. It could be
okay to forget the last request timestamp. (I now doubt I picked the
best lines to quote, above.)


What if we instead kept the list but marked the requests as
'invalid' so that we know the timestamp? In that case we'd be able
to do pretty much exactly what the original code did (but at per-db
granularity).


The most natural translation of the old code would be to add a
write_time field to struct DBWriteRequest. One can infer "invalid"
from write_time and request_time. There are other reasonable designs,
though.


OK, makes sense. I'll look into that.
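Something along these lines, I suppose (just a sketch; write_time is the
new bit):

typedef struct DBWriteRequest
{
    Oid         databaseid;     /* OID of the database to write */
    TimestampTz request_time;   /* timestamp of the last write request */
    TimestampTz write_time;     /* new: timestamp of the last write actually
                                 * performed for this database */
    slist_node  next;           /* linked-list link */
} DBWriteRequest;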




We'd have to cleanup the list once in a while not to grow
excessively large, but something like removing entries older than
PGSTAT_STAT_INTERVAL should be enough.


Specifically, if you assume the socket delivers messages in the order
sent, you may as well discard entries having write_time at least
PGSTAT_STAT_INTERVAL older than the most recent cutoff_time seen in a
PgStat_MsgInquiry. That delivery order assumption does not hold in
general, but I expect it's close enough for this purpose.


Agreed. If I get that right, it might result in some false negatives (in 
the sense that we'll remove a record too early, forcing us to write the 
database file again). But I expect that to be a rare case.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-02-03 Thread Etsuro Fujita

On 2016/01/28 15:20, Rushabh Lathia wrote:

On Thu, Jan 28, 2016 at 11:33 AM, Etsuro Fujita
<fujita.ets...@lab.ntt.co.jp> wrote:

On 2016/01/27 21:23, Rushabh Lathia wrote:

If I understood correctly, above documentation means, that if FDW have
DMLPushdown APIs that is enough. But in reality thats not the case, we
need ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete in case
DML is not pushable.

And here fact is DMLPushdown APIs are optional for FDW, so that if FDW
don't have DMLPushdown APIs they can still very well perform the DML
operations using ExecForeignInsert, ExecForeignUpdate, or
ExecForeignDelete.

So documentation should be like:

If the IsForeignRelUpdatable pointer is set to NULL, foreign tables are
assumed to be insertable, updatable, or deletable if the FDW provides
ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete respectively,

If FDW provides DMLPushdown APIs and the DML are pushable to the foreign
server, then FDW still needs ExecForeignInsert, ExecForeignUpdate, or
ExecForeignDelete for the non-pushable DML operation.

What's your opinion ?



I agree that we should add this to the documentation, too.


I added docs to the IsForeignRelUpdatable documentation.  Also, a brief 
introductory remark has been added at the beginning of the DML pushdown 
APIs' documentation.



BTW, if I understand correctly, I think we should also modify
relation_is_updatable() accordingly.  Am I right?



Yep, we need to modify relation_is_updatable().


I thought I'd modify that function in the same way as 
CheckValidResultRel(), but I noticed that we cannot do that, because we 
don't have any information on whether each update is pushed down to the 
remote server by PlanDMLPushdown, during relation_is_updatabale().  So, 
I left that function as-is.  relation_is_updatabale() is just used for 
display in the information_schema views, so ISTM that that function is 
fine as-is.  (As for CheckValidResultRel(), I revised it so as to check 
the presence of DML pushdown APIs after checking the existing APIs if 
the given command will be pushed down.  The reason is because we assume 
the presence of the existing APIs, anyway.)


I revised other docs and some comments, mostly for consistency.

Attached is an updated version of the patch, which has been created on 
top of the updated version of the bugfix patch posted by Robert in [1] 
(attached).


Best regards,
Etsuro Fujita

[1] 
http://www.postgresql.org/message-id/ca+tgmoz40j2uc5ac1nxu03oj4crvolks15xx+ptfp-1u-8z...@mail.gmail.com
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index df3d1ee..d778e61 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -112,6 +112,7 @@ static void deparseTargetList(StringInfo buf,
   PlannerInfo *root,
   Index rtindex,
   Relation rel,
+  bool is_returning,
   Bitmapset *attrs_used,
   List **retrieved_attrs);
 static void deparseReturningList(StringInfo buf, PlannerInfo *root,
@@ -776,7 +777,7 @@ deparseSelectSql(Bitmapset *attrs_used, List **retrieved_attrs,
 	 * Construct SELECT list
 	 */
 	appendStringInfoString(buf, "SELECT ");
-	deparseTargetList(buf, root, foreignrel->relid, rel, attrs_used,
+	deparseTargetList(buf, root, foreignrel->relid, rel, false, attrs_used,
 	  retrieved_attrs);
 
 	/*
@@ -790,7 +791,8 @@ deparseSelectSql(Bitmapset *attrs_used, List **retrieved_attrs,
 
 /*
  * Emit a target list that retrieves the columns specified in attrs_used.
- * This is used for both SELECT and RETURNING targetlists.
+ * This is used for both SELECT and RETURNING targetlists; the is_returning
+ * parameter is true only for a RETURNING targetlist.
  *
  * The tlist text is appended to buf, and we also create an integer List
  * of the columns being retrieved, which is returned to *retrieved_attrs.
@@ -800,6 +802,7 @@ deparseTargetList(StringInfo buf,
   PlannerInfo *root,
   Index rtindex,
   Relation rel,
+  bool is_returning,
   Bitmapset *attrs_used,
   List **retrieved_attrs)
 {
@@ -829,6 +832,8 @@ deparseTargetList(StringInfo buf,
 		{
 			if (!first)
 appendStringInfoString(buf, ", ");
+			else if (is_returning)
+appendStringInfoString(buf, " RETURNING ");
 			first = false;
 
 			deparseColumnRef(buf, rtindex, i, root);
@@ -846,6 +851,8 @@ deparseTargetList(StringInfo buf,
 	{
 		if (!first)
 			appendStringInfoString(buf, ", ");
+		else if (is_returning)
+			appendStringInfoString(buf, " RETURNING ");
 		first = false;
 
 		appendStringInfoString(buf, "ctid");
@@ -855,7 +862,7 @@ deparseTargetList(StringInfo buf,
 	}
 
 	/* Don't generate bad syntax if no undropped columns */
-	if (first)
+	if (first && !is_returning)
 		appendStringInfoString(buf, "NULL");

Re: [HACKERS] Proposal: "Causal reads" mode for load balancing reads without stale data

2016-02-03 Thread Amit Langote

Hi Thomas,

On 2016/01/20 13:12, Thomas Munro wrote:
> That one conflicts with b1a9bad9e744857291c7d5516080527da8219854, so
> here is a new version.

-if (walsndctl->lsn[SYNC_REP_WAIT_WRITE] < MyWalSnd->write)
+if (is_highest_priority_sync_standby)

[ ... ]

-if (walsndctl->lsn[SYNC_REP_WAIT_FLUSH] < MyWalSnd->flush)
-{
-walsndctl->lsn[SYNC_REP_WAIT_FLUSH] = MyWalSnd->flush;
-numflush = SyncRepWakeQueue(false, SYNC_REP_WAIT_FLUSH);

[ ... ]

+if (walsndctl->lsn[SYNC_REP_WAIT_FLUSH] < MyWalSnd->write)
+{
+walsndctl->lsn[SYNC_REP_WAIT_FLUSH] = MyWalSnd->flush;
+numflush = SyncRepWakeQueue(false, SYNC_REP_WAIT_FLUSH,
+MyWalSnd->flush);

There seems to be a copy-pasto there - shouldn't that be:

+ if (walsndctl->lsn[SYNC_REP_WAIT_FLUSH] < MyWalSnd->flush)

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Development with Eclipse - Wrong error messages in IDE

2016-02-03 Thread Peter Moser

Good morning hackers,
I am getting some strange error messages inside Eclipse saying that some 
symbols cannot be found. I am currently working with version 9.6. For instance,


Symbol 'RM_HEAP_ID' could not be resolved
src/backend/access/heap/heapam.c

It affects all occurrences of symbols that are defined in
src/include/access/rmgrlist.h. Eclipse just says "Syntax error" here.

However, the source code compiles and runs without any compile-time 
errors or warnings. I think it is just an IDE problem, but it distracts me 
from finding real bugs.
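
For reference, the affected symbols come from an X-macro list, so the 
enum values only exist after macro expansion, roughly like this (abridged):

/* src/include/access/rmgrlist.h */
PG_RMGR(RM_XLOG_ID, "XLOG", xlog_redo, xlog_desc, xlog_identify, NULL, NULL)
PG_RMGR(RM_HEAP_ID, "Heap", heap_redo, heap_desc, heap_identify, NULL, NULL)
/* ... */

/* src/include/access/rmgr.h turns those entries into the RmgrIds enum */
#define PG_RMGR(symname, name, redo, desc, identify, startup, cleanup) symname,

typedef enum RmgrIds
{
#include "access/rmgrlist.h"
    RM_NEXT_ID
} RmgrIds;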


Has anyone had similar problems? Do I have to configure Eclipse to 
understand the PG_RMGR macro, or is there another way to teach Eclipse 
about these macros?


Thanks for any help or comments.

Best regards,
Peter


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Minor typo in syncrep.c

2016-02-03 Thread Amit Langote
On 2016/02/03 17:50, Amit Langote wrote:
> Attached patch removes an extraneous word in the comment above

I kept reading and found one place in a comment within the function where
a word is most probably missing.  The attached patch fixes it.

 /*
  * If this WALSender is serving a standby that is not on the list of
- * potential standbys then we have nothing to do. If we are still
starting
- * up, still running base backup or the current flush position is still
- * invalid, then leave quickly also.
+ * potential sync standbys then we have nothing to do. If we are still
+ * starting up, still running base backup or the current flush position
+ * is still invalid, then leave quickly also.

Thanks,
Amit
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 7f85b88..c44161b 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -419,9 +419,9 @@ SyncRepReleaseWaiters(void)
 
 	/*
 	 * If this WALSender is serving a standby that is not on the list of
-	 * potential standbys then we have nothing to do. If we are still starting
-	 * up, still running base backup or the current flush position is still
-	 * invalid, then leave quickly also.
+	 * potential sync standbys then we have nothing to do. If we are still
+	 * starting up, still running base backup or the current flush position
+	 * is still invalid, then leave quickly also.
 	 */
 	if (MyWalSnd->sync_standby_priority == 0 ||
 		MyWalSnd->state < WALSNDSTATE_STREAMING ||

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Minor typo in syncrep.c

2016-02-03 Thread Amit Langote
Attached patch removes an extraneous word in the comment above
SyncRepReleaseWaiters() in syncrep.c

- * Other policies are possible, which would change what we do here and what
+ * Other policies are possible, which would change what we do here and
  * perhaps also which information we store as well.

Thanks,
Amit
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 7f85b88..d945c71 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -406,7 +406,7 @@ SyncRepGetSynchronousStandby(void)
  * Update the LSNs on each queue based upon our latest state. This
  * implements a simple policy of first-valid-standby-releases-waiter.
  *
- * Other policies are possible, which would change what we do here and what
+ * Other policies are possible, which would change what we do here and
  * perhaps also which information we store as well.
  */
 void

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PostgreSQL Auditing

2016-02-03 Thread Joshua D. Drake

On 02/02/2016 07:26 PM, Robert Haas wrote:

On Tue, Feb 2, 2016 at 8:25 PM, Curtis Ruck
 wrote:

Additionally Robert, given your professional status, you are by no means an
unbiased contributor in this discussion.  Your stance on this matter shows
that you don't necessarily want the open source solution to succeed in the
commercial/compliance required space.  Instead of arguing blankly against
inclusion can you at least provide actionable based feedback that if met
would allow patches of this magnitude in?


I am *the* first person to criticise EDB. There isn't a person in EDB 
that would find that surprising.


That said, Robert, Dave, and all the other contributors that work for 
EDB are stand up people. They deserve our respect. Period.


I appreciate the thought process here, but if you want any kind of 
legitimacy in this community, I suggest you do your homework.


JD

--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers