date:20150510

Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

2015-05-10 Thread Amit Kapila

On Sun, May 10, 2015 at 6:01 AM, Andrew Dunstan  wrote:
>
>
>
> This generally looks good, but I have a couple of questions before I
commit it.
>
> First, why is the new option for the  BASE_BACKUP command of the
Streaming Replication protcol "TAR"? It seems rather misleading. Shouldn't
it be something like "TABLESPACEMAP"?
>

The reason to keep new option's name as TAR was that tablespace_map
was generated for that format type, but I agree with you that something
like "TABLESPACEMAP" suits better, so I have changed it to
"TABLESPACE_MAP".  Putting '_' in name makes it somewhat consistent
with other names and filename it generates with this new option.

> Second, these lines in xlog.c seem wrong:
>
> else if ((ch == '\n' || ch == '\r') && prev_ch == '\\')
> str[i-1] = '\n';
>
> It looks to me like we should be putting ch in the string, not
arbitrarily transforming \r into \n.
>

You are right, I have changed it as per your suggestion.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

extend_basebackup_to_include_symlink_v7.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] multixacts woes

2015-05-10 Thread Noah Misch

On Sun, May 10, 2015 at 09:17:58PM -0400, Robert Haas wrote:
> On Sun, May 10, 2015 at 1:40 PM, Noah Misch  wrote:
> > I don't know whether this deserves prompt remediation, but if it does, I 
> > would
> > look no further than the hard-coded 25% figure.  We permit users to operate
> > close to XID wraparound design limits.  GUC maximums force an 
> > anti-wraparound
> > vacuum at no later than 93.1% of design capacity.  XID assignment warns at
> > 99.5%, then stops at 99.95%.  PostgreSQL mandates a larger cushion for
> > pg_multixact/offsets, with anti-wraparound VACUUM by 46.6% and a stop at
> > 50.0%.  Commit 53bb309d2d5a9432d2602c93ed18e58bd2924e15 introduced the
> > bulkiest mandatory cushion yet, an anti-wraparound vacuum when
> > pg_multixact/members is just 25% full.
> 
> That's certainly one possible approach.  I had discounted it because
> you can't really get more than a small multiple out of it, but getting
> 2-3x more room might indeed be enough to help some people quite a bit.
> Just raising the threshold from 25% to say 40% would buy back a
> healthy amount.

Right.  It's fair to assume that the new VACUUM burden would be discountable
at a 90+% threshold, because the installations that could possibly find it
expensive are precisely those experiencing corruption today.  These reports
took eighteen months to appear, whereas some corruption originating in commit
0ac5ad5 saw reports within three months.  Therefore, sites burning
pg_multixact/members proportionally faster than both pg_multixact/offsets and
XIDs must be unusual.  Bottom line: if we do need to reduce VACUUM burden
caused by the commits you cited upthread, we almost certainly don't need more
than a 4x improvement.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Kouhei Kaigai

> On 2015-05-10 21:26:26 -0400, Robert Haas wrote:
> > On Sun, May 10, 2015 at 8:41 PM, Tom Lane  wrote:
> > > > This commit reverts create_plan_recurse() as static function.
> > > Yes.  I am not convinced that external callers should be calling that,
> > > and would prefer not to enlarge createplan.c's API footprint without a
> > > demonstration that this is right and useful.  (This is one of many
> > > ways in which this patch is suffering from having gotten committed
> > > without submitted use-cases.)
> 
> Wasn't there a submitted use case? IIRC Kaigai had referenced some
> pg-strom (?) code using it?
> 
> I'm failing to see how create_plan_recurse() being exposed externally is
> related to "having gotten committed without submitted use-cases".  Even
> if submitted, presumably as simple as possible code, doesn't use it,
> that's not a proof that less simple code does not need it.
>
Yes, PG-Strom code uses create_plan_recurse() to construct child plan
node of the GPU accelerated custom-join logic, once it got chosen.
Here is nothing special. It calls create_plan_recurse() as built-in
join path doing on the underlying inner/outer paths. 
It is not difficult to submit as a working example, however, its total
code size (excludes GPU code) is 25KL at this moment.

I'm not certain whether it is a simple example.

> > Your unwillingness to make functions global or to stick PGDLLIMPORT
> > markings on variables that people want access to is hugely
> > handicapping extension authors.  Many people have complained about
> > that on multiple occasions.  Frankly, I find it obstructionist and
> > petty.
> 
> While I don't find the tone of the characterization super helpful, I do
> tend to agree that we're *far* too conservative on that end.  I've now
> seen a significant number of extension that copied large swathes of code
> just to cope with individual functions not being available.  And even
> cases where that lead to minor forks with such details changed.
>
I may have to join the members?

> I know that I'm "fearful" of asking for functions being made
> public. Because it'll invariably get into a discussion of merits that's
> completely out of proportion with the size of the change.  And if I, who
> has been on the list for a while now, am "afraid" in that way, you can
> be sure that others won't even dare to ask, lest argue their way
> through.
> 
> I think the problem is that during development the default often is to
> create function as static if they're used only in one file. Which is
> fine. But it really doesn't work if it's a larger battle to change
> single incidences.  Besides the pain of having to wait for the next
> major release...
> 
Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] feature freeze and beta schedule

2015-05-10 Thread Andres Freund

On 2015-05-01 18:37:23 +0200, Andres Freund wrote:
> * Multivariate statistics
>   This is not intended to be committed this CF.
>   => I'd like to mark this as returned with (little) feedback.
>
> * Avoiding plan disasters with LIMIT
>   I'm not enthused by the approach, it's disabled by default though. So
>   it might not be too bad to just do it. Would probably have been a good
>   idea to discuss the patch in a separate thread.
>   => ?
>
> * Turning off HOT for larger SQL queries
>   Seems to have degenerated into a discussion of not really related
>   things. I personally would vote for committing something close to what
>   Simon proposed last *directly at the beginning* of the next cycle.
>   => Move?

> * Unique Joins
>   This seems to require more work and came in pretty late
>   => Returned with feedback.
>
> * INNER JOIN removals
>   Seem far to controversial to consider comitting in 9.5.
>   => Returned (or even rejected :()

> * Async execution of postgres_fdw.
>   Later iterations of the patch haven't gotten much review yet. The
>   original version of the patch is just from 2014-12-15.
>   => Should imo be moved to the next CF

> 
> * Allow "snapshot too old" error, to prevent bloat
>   
> http://archives.postgresql.org/message-id/1361166406.1897609.1424371443904.JavaMail.yahoo%40mail.yahoo.com
>   talked about a new version that afaics never materialized
>   => Returned with feedback

> * Parallel Seq scan
>   In my opinion the topic has progressed greatly. But at the same time
>   it doesn't seem like it's in a state we should consider for 9.5.
>   => Return?

> * logical column ordering (WIP)
>   This pretty clearly isn't 9.5 material.
>   => Return

> * Support ORDER BY in CREATE FUNCTION for Set Returning Functions
>   Uhm. I think the outcome of the discussion so far wasn't really
>   favorable to the idea s proposed.
>   => Rejected

Marked as such.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Andres Freund

On 2015-05-10 22:51:33 -0400, Robert Haas wrote:
> > And there's definitely some things
> > around that pretty much only still exist because changing them would
> > break too much stuff.
> 
> Such as what?

Without even thinking about it:
* linitial vs lfirst vs lnext. That thing still induces an impedance
  mismatch when reading code for me, and I believe a good number of
  other people.
* Two 'string buffer' APIs with essentially only minor differences.
* A whole bunch of libpq APIs. Admittedly that's a bit more exposed than
  lots of backend only things.
* The whole V0 calling convention that makes it so much easier to get
  odd crashes.

Admittedly that's all I could come up without having to think. But I do
vaguely remember a lot of things we did not do because of bwcompat
concerns.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Robert Haas

On Sun, May 10, 2015 at 10:37 PM, Andres Freund  wrote:
> On 2015-05-10 21:53:45 -0400, Robert Haas wrote:
>> Please name EVEN ONE instance in which core development has been
>> prevented for fear of changing a function API.
>
> Even *moving* function declarations to a different file has been laudly
> and repeatedly complained about...

Moving declarations is a lot more likely to break compiles than adding
declarations.  But even the 9.3 header file reorganizations, which
broke enough compiles to be annoying, were only annoying, not a
serious problem for anyone.  I doubted whether that stuff was worth
changing, but that's just because I don't really get excited about
partial recompiles.

> And there's definitely some things
> around that pretty much only still exist because changing them would
> break too much stuff.

Such as what?

> But.
>
> I don't think that's a reason to not expose more functions
> externally. Because the usual consequence of not exposing them is that
> either ugly workarounds will be found, or code will just copy pasted
> around. That's not in any way better, and much likely to be worse.

Yes.

> I'm not saying that we shouldn't use judgement, but I do think that the
> current approach ridicules our vaunted extensibility in many cases.

Double yes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Async execution of postgres_fdw.

2015-05-10 Thread Kyotaro HORIGUCHI

Hello.

> I've gone ahead and marked this as Rejected.  The concept of async
> execution of postgres_fdw is certainly still open and a worthwhile goal,
> but this implementation isn't the way to achieve that.

It sounds fair. I'm satisfied that we have agreed that the goal
is worthwhile. I'll show other implementations sooner.

Thank you.

> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > Stephen Frost  writes:
> > > I'm all for improving performance of postgres_fdw and would like to see
> > > us support sending queries off to be worked asyncronously, but starting
> > > execution on the remote server during ExecInitNode is against the
> > > documentated FDW API spec.  I discussed exactly this issue over a year
> > > ago here:
> > 
> > > http://www.postgresql.org/message-id/20131104032604.gb2...@tamriel.snowman.net
> > 
> > > Sadly, there weren't any direct responses to that email, but I do recall
> > > having a discussion on another thread (or in person?) with Tom where we
> > > ended up agreeing that we can't simply remove that requirement from the
> > > docs or the API.
> > 
> > Yeah.  There are at least a couple of reasons why not:
> 
> Thanks for the reminders of those.
> 
> > Also, so far as a quick review of the actual patch goes, I would really
> > like to see this lose the "PFC" wrapper layer, which accounts for 95% of
> > the code churn in the patch and doesn't seem to add any actual value.
> > What it does add is unchecked malloc failure conditions.
> 
> Agreed, the wrapper isn't doing anything particularly useful; I had
> started out thinking that would be my first comment until it became
> clear where all the performance improvement was coming from.
> 
> I've gone ahead and marked this as Rejected.  The concept of async
> execution of postgres_fdw is certainly still open and a worthwhile goal,
> but this implementation isn't the way to achieve that.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Andres Freund

On 2015-05-10 21:53:45 -0400, Robert Haas wrote:
> Please name EVEN ONE instance in which core development has been
> prevented for fear of changing a function API.

Even *moving* function declarations to a different file has been laudly
and repeatedly complained about... And there's definitely some things
around that pretty much only still exist because changing them would
break too much stuff.

But.

I don't think that's a reason to not expose more functions
externally. Because the usual consequence of not exposing them is that
either ugly workarounds will be found, or code will just copy pasted
around. That's not in any way better, and much likely to be worse.

I'm not saying that we shouldn't use judgement, but I do think that the
current approach ridicules our vaunted extensibility in many cases.

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Andres Freund

On 2015-05-10 21:26:26 -0400, Robert Haas wrote:
> On Sun, May 10, 2015 at 8:41 PM, Tom Lane  wrote:
> > > This commit reverts create_plan_recurse() as static function.
> > Yes.  I am not convinced that external callers should be calling that,
> > and would prefer not to enlarge createplan.c's API footprint without a
> > demonstration that this is right and useful.  (This is one of many
> > ways in which this patch is suffering from having gotten committed
> > without submitted use-cases.)

Wasn't there a submitted use case? IIRC Kaigai had referenced some
pg-strom (?) code using it?

I'm failing to see how create_plan_recurse() being exposed externally is
related to "having gotten committed without submitted use-cases".  Even
if submitted, presumably as simple as possible code, doesn't use it,
that's not a proof that less simple code does not need it.

> Your unwillingness to make functions global or to stick PGDLLIMPORT
> markings on variables that people want access to is hugely
> handicapping extension authors.  Many people have complained about
> that on multiple occasions.  Frankly, I find it obstructionist and
> petty.

While I don't find the tone of the characterization super helpful, I do
tend to agree that we're *far* too conservative on that end.  I've now
seen a significant number of extension that copied large swathes of code
just to cope with individual functions not being available.  And even
cases where that lead to minor forks with such details changed.

I know that I'm "fearful" of asking for functions being made
public. Because it'll invariably get into a discussion of merits that's
completely out of proportion with the size of the change.  And if I, who
has been on the list for a while now, am "afraid" in that way, you can
be sure that others won't even dare to ask, lest argue their way
through.

I think the problem is that during development the default often is to
create function as static if they're used only in one file. Which is
fine. But it really doesn't work if it's a larger battle to change
single incidences.  Besides the pain of having to wait for the next
major release...

Greetings,

Andres Freund

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Robert Haas

On Sun, May 10, 2015 at 9:34 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> Your unwillingness to make functions global or to stick PGDLLIMPORT
>> markings on variables that people want access to is hugely
>> handicapping extension authors.  Many people have complained about
>> that on multiple occasions.  Frankly, I find it obstructionist and
>> petty.
>
> Sure, we could export every last static function in the core code,
> and extension authors would rejoice ... while development on the core
> code basically stops for fear of breaking extensions.  It's important
> not to export things that we don't have to, especially when doing so
> is really just a quick-n-dirty substitute for doing things properly.

Please name EVEN ONE instance in which core development has been
prevented for fear of changing a function API.  Sure, we take those
things into consideration, like trying to ensure that there will be
type conflicts until people update their code, but I cannot recall a
single instance in six and a half years of working on this project
where that's been a real problem.  I think this concern is entirely
hypothetical.  Besides, no one has ever proposed making every static
function public.  It's been proposed a handful of times for limited
classes of functions - in this case ONE - and you've fought it every
time despite clear consensus on the other side.  I find that highly
regrettable and I'm very sure I'm not the only one.

I notice that you carefully didn't answer the other part of my
question: what gives you the right to revert my commits without
discussion or consensus, and do I have an equal right to change it
back?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] "unaddressable bytes" in BRIN

2015-05-10 Thread Alvaro Herrera

Andres Freund just forwarded me a valgrind error report that Peter
Geoghegan noticed:

==29892== Unaddressable byte(s) found during client check request
==29892==at 0x7D1317: PageAddItem (bufpage.c:314)
==29892==by 0x468106: brin_doinsert (brin_pageops.c:315)
==29892==by 0x4671A5: form_and_insert_tuple (brin.c:1178)
==29892==by 0x466006: brinbuildCallback (brin.c:596)
==29892==by 0x53F6E4: IndexBuildHeapRangeScan (index.c:2548)
==29892==by 0x53EC19: IndexBuildHeapScan (index.c:2161)
==29892==by 0x466443: brinbuild (brin.c:694)
==29892==by 0x92F09F: OidFunctionCall3Coll (fmgr.c:1649)
==29892==by 0x53E924: index_build (index.c:2024)
==29892==by 0x53D5FC: index_create (index.c:1099)
==29892==by 0x60B3B7: DefineIndex (indexcmds.c:605)
==29892==by 0x7E2142: ProcessUtilitySlow (utility.c:1203)
==29892==  Address 0xccffd86 is 5,270 bytes inside a block of size 8,192 alloc'd
==29892==at 0x4C2AB80: malloc (in 
/usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==29892==by 0x950425: AllocSetAlloc (aset.c:847)
==29892==by 0x9530A5: palloc (mcxt.c:821)
==29892==by 0x4C6A32: heap_tuple_untoast_attr (tuptoaster.c:215)
==29892==by 0x9302F2: pg_detoast_datum (fmgr.c:2238)
==29892==by 0x8794C8: numeric_lt (numeric.c:2060)
==29892==by 0x92E211: FunctionCall2Coll (fmgr.c:1323)
==29892==by 0x46C441: brin_minmax_add_value (brin_minmax.c:113)
==29892==by 0x92E49C: FunctionCall4Coll (fmgr.c:1375)
==29892==by 0x466108: brinbuildCallback (brin.c:618)
==29892==by 0x53F6E4: IndexBuildHeapRangeScan (index.c:2548)
==29892==by 0x53EC19: IndexBuildHeapScan (index.c:2161)

What I think this means is that during an index build
brin_minmax_add_value() called numeric_lt() which detoasted one of its
input values; later, brin_doinsert() inserts a tuple containing the
value, but it tries to use more bytes than were allocated.  I haven't
had time to actually study what is going on here, but wanted to archive
this publicly.  (Value detoasting evidently plays a role here, but I
don't know how.)

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

2015-05-10 Thread Andres Freund

On 2015-05-10 21:09:14 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > I'm not sure what exactly to use as a performance benchmark
> > here. For now I chose
> > SELECT * FROM (SELECT ARRAY(SELECT generate_series(1, 1))) d, 
> > generate_series(1, 1000) repeat(i);
> > that'll hit array_out, which uses iterators.
> 
> Hmm, probably those results are swamped by I/O functions though.

I did check with a quick profile, and the iteration itself is a
significant part of the total execution time.

> I'd suggest trying something that exercises array_map(), which
> it looks like means doing an array coercion.  Perhaps like so:

> do $$
> declare a int4[];
> x int;
> begin
>   a := array(select generate_series(1,1000));
>   for i in 1..10 loop
> x := array_length(a::int8[], 1);
>   end loop;
> end$$;

with the loop count set to 1 instead, I get:
before:
after:
tps = 20.940092 (including connections establishing)
after:
tps = 20.568730 (including connections establishing)

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Tom Lane

Robert Haas  writes:
> Your unwillingness to make functions global or to stick PGDLLIMPORT
> markings on variables that people want access to is hugely
> handicapping extension authors.  Many people have complained about
> that on multiple occasions.  Frankly, I find it obstructionist and
> petty.

Sure, we could export every last static function in the core code,
and extension authors would rejoice ... while development on the core
code basically stops for fear of breaking extensions.  It's important
not to export things that we don't have to, especially when doing so
is really just a quick-n-dirty substitute for doing things properly.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Robert Haas

On Sun, May 10, 2015 at 8:41 PM, Tom Lane  wrote:
> Kohei KaiGai  writes:
>> I briefly checked your updates.
>> Even though it is not described in the commit-log, I noticed a
>> problematic change.
>
>> This commit reverts create_plan_recurse() as static function.
>
> Yes.  I am not convinced that external callers should be calling that,
> and would prefer not to enlarge createplan.c's API footprint without a
> demonstration that this is right and useful.  (This is one of many
> ways in which this patch is suffering from having gotten committed
> without submitted use-cases.)

I really think that reverting somebody else's committed change without
discussion is inappropriate.  If I don't like the fact that you
reverted this change, can I go revert it back?

Your unwillingness to make functions global or to stick PGDLLIMPORT
markings on variables that people want access to is hugely
handicapping extension authors.  Many people have complained about
that on multiple occasions.  Frankly, I find it obstructionist and
petty.

If you want to improve the design of this so that it does the same
things more elegantly, fine: I'll get out of the way.  If you just
want to make things impossible that the patch previously made
possible, I strongly object to that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] multixacts woes

2015-05-10 Thread Robert Haas

On Sun, May 10, 2015 at 1:40 PM, Noah Misch  wrote:
> I don't know whether this deserves prompt remediation, but if it does, I would
> look no further than the hard-coded 25% figure.  We permit users to operate
> close to XID wraparound design limits.  GUC maximums force an anti-wraparound
> vacuum at no later than 93.1% of design capacity.  XID assignment warns at
> 99.5%, then stops at 99.95%.  PostgreSQL mandates a larger cushion for
> pg_multixact/offsets, with anti-wraparound VACUUM by 46.6% and a stop at
> 50.0%.  Commit 53bb309d2d5a9432d2602c93ed18e58bd2924e15 introduced the
> bulkiest mandatory cushion yet, an anti-wraparound vacuum when
> pg_multixact/members is just 25% full.

That's certainly one possible approach.  I had discounted it because
you can't really get more than a small multiple out of it, but getting
2-3x more room might indeed be enough to help some people quite a bit.
Just raising the threshold from 25% to say 40% would buy back a
healthy amount.

Or, as you suggest, we could just add a GUC.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

2015-05-10 Thread Tom Lane

Andres Freund  writes:
> I'm not sure what exactly to use as a performance benchmark
> here. For now I chose
> SELECT * FROM (SELECT ARRAY(SELECT generate_series(1, 1))) d, 
> generate_series(1, 1000) repeat(i);
> that'll hit array_out, which uses iterators.

Hmm, probably those results are swamped by I/O functions though.
I'd suggest trying something that exercises array_map(), which
it looks like means doing an array coercion.  Perhaps like so:

do $$
declare a int4[];
x int;
begin
  a := array(select generate_series(1,1000));
  for i in 1..10 loop
x := array_length(a::int8[], 1);
  end loop;
end$$;

Anyway, thanks for poking at it!

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Kouhei Kaigai

> Kohei KaiGai  writes:
> > I briefly checked your updates.
> > Even though it is not described in the commit-log, I noticed a
> > problematic change.
> 
> > This commit reverts create_plan_recurse() as static function.
> 
> Yes.  I am not convinced that external callers should be calling that,
> and would prefer not to enlarge createplan.c's API footprint without a
> demonstration that this is right and useful.  (This is one of many
> ways in which this patch is suffering from having gotten committed
> without submitted use-cases.)
>
Hmm. I got it is intentional change.

> > It means extension
> > cannot have child node, even if it wants to add a custom-join logic.
> > Please assume a simple case below:
> >   SELECT * FROM t0, t1 WHERE t0.a = t1.x;
> 
> > An extension adds a custom join path, then its PlanCustomPath method will be
> > called back to create a plan node once it gets chosen by planner.
> > The role of PlanCustomPath is to construct a plan-node of itself, and 
> > plan-nodes
> > of the source relations also.
> > If create_plan_recurse() is static, we have no way to initialize
> > plan-node for t0
> > and t1 scan even if join-logic itself is powerful than built-in ones.
> 
> I find this argument quite unconvincing, because even granting that this
> is an appropriate way to create child nodes of a CustomScan, there is a
> lot of core code besides createplan.c that would not know about those
> child nodes either.
>
> As a counterexample, suppose that your cool-new-join-method is capable of
> joining three inputs at once.  You could stick two of the children into
> lefttree and righttree perhaps, but where are you gonna put the other?
>
> This comes back to the fact that trying to wedge join behavior into scan
> node types was a pretty bad idea (as evidenced by the entirely bogus
> decision that now scanrelid can be zero, which I rather doubt you've found
> all the places that that broke).  Really there should have been a new
> CustomJoin node or something like that.  If we'd done that, it would be
> possible to design that node type more like Append, with any number of
> child nodes.  And we could have set things up so that createplan.c knows
> about those child nodes and takes care of the recursion for you; it would
> still not be a good idea to expose create_plan_recurse and hope that
> callers of that would know how to use it correctly.
>
> I am still more than half tempted to say we should revert this entire
> patch series and hope for a better design to be submitted for 9.6.
> In the meantime, though, poking random holes in the modularity of core
> code is a poor substitute for having designed a well-thought-out API.
> 
> A possible compromise that we could perhaps still wedge into 9.5 is to
> extend CustomPath with a List of child Paths, and CustomScan with a List
> of child Plans, which createplan.c would know to build from the Paths,
> and other modules would then also be aware of these children.  I find that
> uglier than a separate join node type, but it would be tolerable I guess.
>
At this moment, my custom-join logic add a dummy node to have two child
nodes when it tries to join more than 3 relations.
Yep, if CustomPath node (ForeignPath also?) can have a list of child-path
nodes then core backend handles its initialization job, it will be more
comfortable for extensions.
I prefer this idea, rather than agree.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

2015-05-10 Thread Andres Freund

On 2015-05-10 12:09:41 -0400, Tom Lane wrote:
> > * I find the ARRAY_ITER_VARS/ARRAY_ITER_NEXT macros rather ugly. I don't
> >   buy the argument that turning them into functions will be slower. I'd
> >   bet the contrary on common platforms.

> Perhaps; do you want to do some testing and see?

I've added new iterator functions using a on-stack state variable and
array_iter_setup/next functions pretty analogous to the macros. And then
converted arrayfuncs.c to use them.

Codesize before introducing inline functions:

assert:
   textdata bss dec hex filename
8142400   50562   295952 8488914  8187d2 src/backend/postgres
optimize:
   textdata bss dec hex filename
6892928   50022   295920 7238870  6e74d6 src/backend/postgres

After:

assert:
   textdata bss dec hex filename
8133040   50562   295952 8479554  816342 src/backend/postgres
optimize:
   textdata bss dec hex filename
6890256   50022   295920 7236198  6e6a66 src/backend/postgres

That's a small decrease.

I'm not sure what exactly to use as a performance benchmark
here. For now I chose
SELECT * FROM (SELECT ARRAY(SELECT generate_series(1, 1))) d, 
generate_series(1, 1000) repeat(i);
that'll hit array_out, which uses iterators.

pgbench -P 10 -h /tmp -p 5440 postgres -n -f /tmp/bench.sql -c 4 -T 60
(I chose parallel because it'll show icache efficiency differences)

before, best of four:
tps = 4.921260 (including connections establishing)

after, best of four:
tps = 5.046437 (including connections establishing)

That's a relatively small difference. I'm not surprised, I'd not have
expected anything major.

Personally I think something roughly along those lines is both more
robust and easier to maintain. Even if possibly need to protect against
inlines not being available.


Similarly using inline funcs for AARR_NDIMS/HASNULL does not appear to
hamper performance and gets rid of the multiple evaluation risk.

These patches are obviously WIP. Especially with the iter stuff it's
possible that the concept could be extended a bit further.

Greetings,

Andres Freund
>From 1fe208cda8df5341794ef6b76e11578ecd46cdd8 Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 11 May 2015 02:43:06 +0200
Subject: [PATCH 1/2] WIP: Use inline functions for iteration.

---
 src/backend/utils/adt/arrayfuncs.c | 68 +
 src/include/c.h|  2 +
 src/include/utils/array.h  | 78 +-
 3 files changed, 115 insertions(+), 33 deletions(-)

diff --git a/src/backend/utils/adt/arrayfuncs.c b/src/backend/utils/adt/arrayfuncs.c
index 26fa648..287aca9 100644
--- a/src/backend/utils/adt/arrayfuncs.c
+++ b/src/backend/utils/adt/arrayfuncs.c
@@ -1037,7 +1037,7 @@ array_out(PG_FUNCTION_ARGS)
 	int			ndim,
 			   *dims,
 			   *lb;
-	ARRAY_ITER	ARRAY_ITER_VARS(iter);
+	array_iter	iter;
 	ArrayMetaState *my_extra;
 
 	/*
@@ -1105,7 +1105,7 @@ array_out(PG_FUNCTION_ARGS)
 	needquotes = (bool *) palloc(nitems * sizeof(bool));
 	overall_length = 1;			/* don't forget to count \0 at end. */
 
-	ARRAY_ITER_SETUP(iter, v);
+	array_iter_setup(&iter, v);
 
 	for (i = 0; i < nitems; i++)
 	{
@@ -1114,7 +1114,8 @@ array_out(PG_FUNCTION_ARGS)
 		bool		needquote;
 
 		/* Get source element, checking for NULL */
-		ARRAY_ITER_NEXT(iter, i, itemvalue, isnull, typlen, typbyval, typalign);
+		itemvalue = array_iter_next(&iter, i, &isnull,
+	typlen, typbyval, typalign);
 
 		if (isnull)
 		{
@@ -1542,7 +1543,7 @@ array_send(PG_FUNCTION_ARGS)
 			   *dim,
 			   *lb;
 	StringInfoData buf;
-	ARRAY_ITER	ARRAY_ITER_VARS(iter);
+	array_iter	iter;
 	ArrayMetaState *my_extra;
 
 	/*
@@ -1597,7 +1598,7 @@ array_send(PG_FUNCTION_ARGS)
 	}
 
 	/* Send the array elements using the element's own sendproc */
-	ARRAY_ITER_SETUP(iter, v);
+	array_iter_setup(&iter, v);
 
 	for (i = 0; i < nitems; i++)
 	{
@@ -1605,7 +1606,8 @@ array_send(PG_FUNCTION_ARGS)
 		bool		isnull;
 
 		/* Get source element, checking for NULL */
-		ARRAY_ITER_NEXT(iter, i, itemvalue, isnull, typlen, typbyval, typalign);
+		itemvalue = array_iter_next(&iter, i, &isnull,
+	typlen, typbyval, typalign);
 
 		if (isnull)
 		{
@@ -3105,7 +3107,7 @@ array_map(FunctionCallInfo fcinfo, Oid retType, ArrayMapState *amstate)
 	int			typlen;
 	bool		typbyval;
 	char		typalign;
-	ARRAY_ITER	ARRAY_ITER_VARS(iter);
+	array_iter	iter;
 	ArrayMetaState *inp_extra;
 	ArrayMetaState *ret_extra;
 
@@ -3165,7 +3167,7 @@ array_map(FunctionCallInfo fcinfo, Oid retType, ArrayMapState *amstate)
 	nulls = (bool *) palloc(nitems * sizeof(bool));
 
 	/* Loop over source data */
-	ARRAY_ITER_SETUP(iter, v);
+	array_iter_setup(&iter, v);
 	hasnulls = false;
 
 	for (i = 0; i < nitems; i++)
@@ -3173,8 +3175,8 @@ array_map(FunctionCallInfo fcinfo, Oid retType, ArrayMapState *amstate)
 		bool		callit = true;
 
 		/* Get source element, checking for NULL */
-		ARRAY_ITER_NEXT(iter, i, fcinfo->arg[

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Tom Lane

Kohei KaiGai  writes:
> I briefly checked your updates.
> Even though it is not described in the commit-log, I noticed a
> problematic change.

> This commit reverts create_plan_recurse() as static function.

Yes.  I am not convinced that external callers should be calling that,
and would prefer not to enlarge createplan.c's API footprint without a
demonstration that this is right and useful.  (This is one of many
ways in which this patch is suffering from having gotten committed
without submitted use-cases.)

> It means extension
> cannot have child node, even if it wants to add a custom-join logic.
> Please assume a simple case below:
>   SELECT * FROM t0, t1 WHERE t0.a = t1.x;

> An extension adds a custom join path, then its PlanCustomPath method will be
> called back to create a plan node once it gets chosen by planner.
> The role of PlanCustomPath is to construct a plan-node of itself, and 
> plan-nodes
> of the source relations also.
> If create_plan_recurse() is static, we have no way to initialize
> plan-node for t0
> and t1 scan even if join-logic itself is powerful than built-in ones.

I find this argument quite unconvincing, because even granting that this
is an appropriate way to create child nodes of a CustomScan, there is a
lot of core code besides createplan.c that would not know about those
child nodes either.

As a counterexample, suppose that your cool-new-join-method is capable of
joining three inputs at once.  You could stick two of the children into
lefttree and righttree perhaps, but where are you gonna put the other?

This comes back to the fact that trying to wedge join behavior into scan
node types was a pretty bad idea (as evidenced by the entirely bogus
decision that now scanrelid can be zero, which I rather doubt you've found
all the places that that broke).  Really there should have been a new
CustomJoin node or something like that.  If we'd done that, it would be
possible to design that node type more like Append, with any number of
child nodes.  And we could have set things up so that createplan.c knows
about those child nodes and takes care of the recursion for you; it would
still not be a good idea to expose create_plan_recurse and hope that
callers of that would know how to use it correctly.

I am still more than half tempted to say we should revert this entire
patch series and hope for a better design to be submitted for 9.6.
In the meantime, though, poking random holes in the modularity of core
code is a poor substitute for having designed a well-thought-out API.

A possible compromise that we could perhaps still wedge into 9.5 is to
extend CustomPath with a List of child Paths, and CustomScan with a List
of child Plans, which createplan.c would know to build from the Paths,
and other modules would then also be aware of these children.  I find that
uglier than a separate join node type, but it would be tolerable I guess.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

2015-05-10 Thread Tom Lane

Etsuro Fujita  writes:
> [ EvalPlanQual-v6.patch ]

I've started to study this in a little more detail, and I'm not terribly
happy with some of the API decisions in it.

In particular, I find the addition of "void *fdw_state" to ExecRowMark
to be pretty questionable.  That does not seem like a good place to keep
random state.  (I realize that WHERE CURRENT OF keeps some state in
ExecRowMark, but that's a crock not something to emulate.)  ISTM that in
most scenarios, the state that LockForeignRow/FetchForeignRow would like
to get at is probably the FDW state associated with the ForeignScanState
that the tuple came from.  Which this API doesn't help with particularly.
I wonder if we should instead add a "ScanState*" field and expect the
core code to set that up (ExecOpenScanRelation could do it with minor
API changes...).

I'm also a bit tempted to pass the TIDs to LockForeignRow and
FetchForeignRow as Datums not ItemPointers.  We have the Datum format
available already at the call sites, so this is free as far as the core
code is concerned, and would only cost another line or so for the FDWs.
This is by no means sufficient to allow FDWs to use some other type than
"tid" for row identifiers; but it would be a down payment on that problem,
and at least would avoid nailing the rowids-are-tids assumption into yet
another global API.

Thoughts?

Also, as I mentioned, I'd be a whole lot happier if we had a way to test
this...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] multixacts woes

2015-05-10 Thread Jim Nasby


On 5/8/15 1:15 PM, Robert Haas wrote:

I somehow did not realize until very recently that we
actually use two SLRUs to keep track of multixacts: one for the
multixacts themselves (pg_multixacts/offsets) and one for the members
(pg_multixacts/members). Confusingly, members are sometimes called
offsets, and offsets are sometimes called IDs, or multixacts.


FWIW, since I had to re-read this bit...
 * We use two SLRU areas, one for storing the offsets at which the data
 * starts for each MultiXactId in the other one.  This trick allows us to
 * store variable length arrays of TransactionIds.


Another way this could be 'fixed' would be to bump MultiXactOffset (but 
NOT MultiXactId) to uint64. That would increase the number of total 
members we could keep by a factor of 2^32. At that point wraparound 
wouldn't even be possible, because you can't have more than 2^31 members 
in an MXID (and there can only be 2^31 MXIDs). It may not be a trivial 
change through, because SLRUs are currently capped at 2^32 pages.


This probably isn't a good long-term solution, but it would eliminate 
the risk of really frequent freeze vacuums. It sounds like Josh at least 
knows some people that could cause big problems for.

--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

2015-05-10 Thread Andres Freund

On 2015-05-10 12:09:41 -0400, Tom Lane wrote:
> Andres Freund  writes:
> > Looking at this. First reading the patch to understand the details.
> 
> > * The VARTAG_IS_EXPANDED(tag) trick in VARTAG_SIZE is unlikely to
> >   beneficial, before the compiler could implement the whole thing as a
> >   computed goto or lookup table, afterwards not.
> 
> Well, if you're worried about the speed of VARTAG_SIZE() then the right
> thing to do would be to revert your change that made enum vartag_external
> distinct from the size of the struct, so that we could go back to just
> using the second byte of a varattrib_1b_e datum as its size.  As I said
> at the time, inserting pad bytes to force each different type of toast
> pointer to be a different size would probably be a better tradeoff than
> what commit 3682025015 did.

I doubt that'd be a net positive. Anyway, all I'm saying is that I can't
see the VARTAG_IS_EXPANDED trick being beneficial in comparison to
checking both explicit values.

> > * You were rather bothered by the potential of multiple evaluations for
> >   the ilist stuff. And now the AARR macros are full of them...
> 
> Yeah, there is doubtless some added cost there.  But I think it's a better
> answer than duplicating each function in toto; the code space that that
> would take isn't free either.

Yea, duplicating would be horrid. I'm more thinking of declaring some
iterator state outside the macro, or just using an inline function.

> > * I find the ARRAY_ITER_VARS/ARRAY_ITER_NEXT macros rather ugly. I don't
> >   buy the argument that turning them into functions will be slower. I'd
> >   bet the contrary on common platforms.
> 
> Perhaps; do you want to do some testing and see?

Not exactly with great joy, but I will.

> > * The list of hardwired safe ops in exec_check_rw_parameter is somewhat
> >   sad. Don't have a better idea though.
> 
> It's very sad, and it will be high on my list to improve that in 9.6.


> But I do not think it's a fatal problem to ship it that way in 9.5,
> because *as things stand today* those are the only two functions that
> could benefit anyway.  It won't really matter until we have extensions
> that want to use this mechanism.

Agreed that it's not fatal.

> > ISTM that the worst case for the new situation is large arrays that
> > exist as plpgsql variables but are only ever passed on.
> 
> Well, more to the point, large arrays that are forced into expanded format
> (costing a conversion step) but then we never do anything with them that
> would benefit from that.  Just saying they're "passed on" doesn't prove
> much since the called function might or might not get any benefit.
> array_length doesn't, but some other things would.

Right. But I'm not sure it's that uncommon.

> Your example with array_agg() is interesting, since one of the things on
> my to-do list is to see whether we could change array_agg to return an
> expanded array.

Well, I chose array_agg only because it was a trivial way to generate a
large array. The values could actually come from disk or something.

> It would not be hard to make it build that representation
> directly, instead of its present ad-hoc internal state.  The trick would
> be, when can you return the internal state without an additional copy
> step?  But maybe it could return a R/O pointer ...

R/O or R/W?

> > ... Expanding only in
> > cases where it'd be beneficial is going to be hard.
> 
> Yeah, improving that heuristic looks like a research project.  Still, even
> with all the limitations and to-do items in the patch now, I'm pretty sure
> this will be a net win for practically all applications.

I wonder if we could somehow 'mark' other toast pointers as 'expand if
useful'. I.e. have something pretty much like ExpandedObjectHeader,
except that it initially works like the indirect toast stuff.  So
eoh_context is set, but the data is still in the original datum. When
accessed via 'plain' accessors that don't know about the expanded format
the pointed to datum is returned. But when accessed by something
"desiring" the expanded version it's expanded.  It seemed that'd be
doable expanding the new infrastructure a bit more.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Kohei KaiGai

Tom,

I briefly checked your updates.
Even though it is not described in the commit-log, I noticed a
problematic change.

This commit reverts create_plan_recurse() as static function. It means extension
cannot have child node, even if it wants to add a custom-join logic.
Please assume a simple case below:
  SELECT * FROM t0, t1 WHERE t0.a = t1.x;

An extension adds a custom join path, then its PlanCustomPath method will be
called back to create a plan node once it gets chosen by planner.
The role of PlanCustomPath is to construct a plan-node of itself, and plan-nodes
of the source relations also.
If create_plan_recurse() is static, we have no way to initialize
plan-node for t0
and t1 scan even if join-logic itself is powerful than built-in ones.

It was already discussed in the upthread, and people's consensus.
Please revert create_plan_recurse() as like initial commit.

Also, regarding of the *_scan_tlist,

> I don't have time to pursue this idea right now, but I think it would be
> a good change to squeeze into 9.5, just so that we could have some test
> coverage on those parts of this patch.
>
Do you want just a test purpose module and regression test cases?

Thanks,
-- 
KaiGai Kohei 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] default value dosen't get applyed in this situation

2015-05-10 Thread Tom Lane

Aliouii Ali  writes:
> this a test case :

This is not supposed to insert a default.  Attach a default expression
to the view column, if you want inserts on the view to have defaults.
ALTER VIEW ... ALTER COLUMN ... SET DEFAULT is the way.

(Note that in recent versions of PG, that view would be auto-insertable
so you would not need to bother with the trigger.  But the situation
with defaults hasn't changed, and won't because it would be a backwards
compatibility break with no real value-add.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Parallel Seq Scan

2015-05-10 Thread Robert Haas

On Thu, May 7, 2015 at 3:23 AM, Amit Kapila  wrote:
>> > I observed one issue while working on this review comment.  When we
>> > try to destroy the parallel setup via ExecEndNode (as due to Limit
>> > Node, it could not destroy after consuming all tuples), it waits for
>> > parallel
>> > workers to finish (WaitForParallelWorkersToFinish()) and parallel
>> > workers
>> > are waiting for master backend to signal them as their queue is full.
>> > I think in such a case master backend needs to inform workers either
>> > when
>> > the scan is discontinued due to limit node or while waiting for parallel
>> > workers to finish.
>>
>> Isn't this why TupleQueueFunnelShutdown() calls shm_mq_detach()?
>> That's supposed to unstick the workers; any impending or future writes
>> will just return SHM_MQ_DETACHED without waiting.
>
> Okay, that can work if we call it in ExecEndNode() before
> WaitForParallelWorkersToFinish(), however what if we want to do something
> like TupleQueueFunnelShutdown() when Limit node decides to stop processing
> the outer node. We can traverse the whole plan tree and find the nodes
> where
> parallel workers needs to be stopped, but I don't think thats good way to
> handle
> it.  If we don't want to stop workers from processing until
> ExecutorEnd()--->ExecEndNode(), then it will lead to workers continuing till
> that time and it won't be easy to get instrumentation/buffer usage
> information
> from workers (workers fill such information for master backend after
> execution
> is complete) as that is done before ExecutorEnd().  For Explain Analyze ..,
> we
> can ensure that workers are stopped before fetching that information from
> Funnel node, but the same is not easy for buffer usage stats required by
> plugins as that operates at ExecutorRun() and ExecutorFinish() level where
> we don't have direct access to node level information.  You can refer
> pgss_ExecutorEnd() where it completes the storage of stats information
> before calling ExecutorEnd().  Offhand, I could not think of a good way to
> do this, but one crude way could be introduce a new API
> (ParallelExecutorEnd())
> for such plugins which needs to be called before completing the stats
> accumulation.
> This API will call ExecEndPlan() if parallelmodeNeeded flag is set and allow
> accumulation of stats (InstrStartNode()/InstrStopNode())

OK, so if I understand you here, the problem is what to do about an
"orphaned" worker.  The Limit node just stops fetching from the lower
nodes, and those nodes don't get any clue that this has happened, so
their workers just sit there until the end of the query.  Of course,
that happens already, but it doesn't usually hurt very much, because
the Limit node usually appears at or near the top of the plan.

It could matter, though.  Suppose the Limit is for a subquery that has
a Sort somewhere (not immediately) beneath it.  My guess is the Sort's
tuplestore will stick around until after the subquery finishes
executing for as long as the top-level query is executing, which in
theory could be a huge waste of resources.  In practice, I guess
people don't really write queries that way.  If they did, I think we'd
have already developed some general method for fixing this sort of
problem.

I think it might be better to try to solve this problem in a more
localized way.  Can we arrange for planstate->instrumentation to point
directory into the DSM, instead of copying the data over later?  That
seems like it might help, or perhaps there's another approach.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] default value dosen't get applyed in this situation

2015-05-10 Thread Aliouii Ali


this a test case :


CREATE TABLE tab
(
  _id bigserial NOT NULL,
  _name text,
  CONSTRAINT tab_pkey PRIMARY KEY (_id)
);
CREATE TABLE tab_s1
(
CONSTRAINT tab_s1_check CHECK (1 = 1)
)
INHERITS (tab);
CREATE OR REPLACE VIEW v_tab AS 
 SELECT tab._id,
tab._name
   FROM tab;
CREATE OR REPLACE FUNCTION tab_insert()
  
RETURNS trigger AS
 $BODY$
BEGIN
INSERT INTO tab_s1 VALUES ((NEW).*);
RETURN NEW;
END $BODY$
  
LANGUAGE plpgsql;
CREATE TRIGGER tab_trigger
INSTEAD OF INSERT ON v_tab
FOR EACH ROW EXECUTE PROCEDURE tab_insert();

-- the query fail because _id is null
insert into v_tab(_name) values ('');

Re: [HACKERS] [BUGS] BUG #13148: Unexpected deferred EXCLUDE constraint violation on derived table

2015-05-10 Thread Tom Lane

postgres...@realityexists.net writes:
> I have a deferred EXCLUDE constraint on a derived table. Inside a
> transaction I insert a new row that conflicts with an existing one (so the
> constraint would fail if it was immediate), delete the old row and run an
> unrelated UPDATE on the new row, then try to commit. I would expect the
> commit to succeed, since there is now no conflict, but it fails with
>   ERROR: conflicting key value violates exclusion constraint
> "uq_derived_timeslice_dup_time_ex"

Hm.  The given test case is overcomplicated; in point of fact it will fail
on any deferred exclusion constraint, eg


DROP TABLE IF EXISTS derived_timeslice CASCADE;

CREATE TABLE derived_timeslice
(
  timeslice_id integer NOT NULL,
  feature_id integer NOT NULL,
  name text,
  CONSTRAINT pk_derived_timeslice PRIMARY KEY (timeslice_id),
  CONSTRAINT uq_derived_timeslice_dup_time_ex EXCLUDE 
USING btree (feature_id WITH =)
DEFERRABLE INITIALLY DEFERRED
);

INSERT INTO derived_timeslice (timeslice_id, feature_id) VALUES (51, 1);

BEGIN;

-- Insert row that violates deferred constraint
INSERT INTO derived_timeslice (timeslice_id, feature_id) VALUES (52, 1);

-- Delete the old row - now there should be no more conflict
DELETE FROM derived_timeslice WHERE timeslice_id = 51;

-- Problem doesn't occur without an UPDATE statement
UPDATE derived_timeslice SET name = 'Updated' WHERE timeslice_id = 52;

-- This confirms there is only 1 row - no conflict
SELECT * FROM derived_timeslice;

COMMIT; -- Enforce constraint - error occurs here


The cause of the problem seems to be that the UPDATE performs a HOT update
of the new tuple, leaving in this case a dead tuple at (0,2) that is HOT
updated by (0,3).  When unique_key_recheck() is invoked for (0,2), it
believes, correctly, that it has to perform the recheck anyway ... but it
tells check_exclusion_constraint that the check is being performed for
(0,2).  So the index search inside check_exclusion_constraint finds the
live tuple at (0,3) and thinks that is a conflict.

This is reproducible clear back to 9.0 where exclusion constraints were
added.

The easiest fix seems to be to pass the HOT child's TID instead of the
TID we were called for.  (Note that the other path, for a regular unique
constraint, is correct as-is because the original TID is what the index
will know about.)

The attached patch seems to fix the problem without breaking any existing
regression tests, but I wonder if anyone can see a hole in it.

regards, tom lane

diff --git a/src/backend/commands/constraint.c b/src/backend/commands/constraint.c
index e49affb..28fccaf 100644
*** a/src/backend/commands/constraint.c
--- b/src/backend/commands/constraint.c
*** unique_key_recheck(PG_FUNCTION_ARGS)
*** 89,97 
  	 * because this trigger gets queued only in response to index insertions;
  	 * which means it does not get queued for HOT updates.  The row we are
  	 * called for might now be dead, but have a live HOT child, in which case
! 	 * we still need to make the check.  Therefore we have to use
! 	 * heap_hot_search, not just HeapTupleSatisfiesVisibility as is done in
! 	 * the comparable test in RI_FKey_check.
  	 *
  	 * This might look like just an optimization, because the index AM will
  	 * make this identical test before throwing an error.  But it's actually
--- 89,98 
  	 * because this trigger gets queued only in response to index insertions;
  	 * which means it does not get queued for HOT updates.  The row we are
  	 * called for might now be dead, but have a live HOT child, in which case
! 	 * we still need to make the check --- effectively, we're applying the
! 	 * check against the live child row, although we can use the values from
! 	 * this row since by definition all columns of interest to us are the
! 	 * same.
  	 *
  	 * This might look like just an optimization, because the index AM will
  	 * make this identical test before throwing an error.  But it's actually
*** unique_key_recheck(PG_FUNCTION_ARGS)
*** 159,165 
  	{
  		/*
  		 * Note: this is not a real insert; it is a check that the index entry
! 		 * that has already been inserted is unique.
  		 */
  		index_insert(indexRel, values, isnull, &(new_row->t_self),
  	 trigdata->tg_relation, UNIQUE_CHECK_EXISTING);
--- 160,168 
  	{
  		/*
  		 * Note: this is not a real insert; it is a check that the index entry
! 		 * that has already been inserted is unique.  Passing t_self is
! 		 * correct even if t_self is now dead, because that is the TID the
! 		 * index will know about.
  		 */
  		index_insert(indexRel, values, isnull, &(new_row->t_self),
  	 trigdata->tg_relation, UNIQUE_CHECK_EXISTING);
*** unique_key_recheck(PG_FUNCTION_ARGS)
*** 168,177 
  	{
  		/*
  		 * For exclusion constraints we just do the normal check, but now it's
! 		 * okay to throw error.
  		 */
  		check_exclusion_constraint(trigdata->tg_relation, index

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Tom Lane

I've committed a cleanup patch along the lines discussed.

One thought is that we could test the nondefault-scan-tuple logic without
a lot of work by modifying the way postgres_fdw deals with columns it
decides don't need to be fetched.  Right now it injects NULL columns so
that the remote query results still match the foreign table's rowtype,
but that's not especially desirable or efficient.  What we could do
instead is generate an fdw_scan_tlist that just lists the columns we
intend to fetch.

I don't have time to pursue this idea right now, but I think it would be
a good change to squeeze into 9.5, just so that we could have some test
coverage on those parts of this patch.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] multixacts woes

2015-05-10 Thread Andrew Dunstan



On 05/10/2015 10:30 AM, Robert Haas wrote:



2. We have some logic that causes autovacuum to run in spite of
autovacuum=off when wraparound threatens.  My commit
53bb309d2d5a9432d2602c93ed18e58bd2924e15 provided most of the
anti-wraparound protections for multixact members that exist for
multixact IDs and for regular XIDs, but this remains an outstanding
issue.  I believe I know how to fix this, and will work up an
appropriate patch based on some of Thomas's earlier work.

I believe autovacuum=off is fortunately uncommon, but certainly getting
this issue fixed is a good idea.

Right.





I suspect it's quite a bit more common than many people imagine.

cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] multixacts woes

2015-05-10 Thread Noah Misch

On Fri, May 08, 2015 at 02:15:44PM -0400, Robert Haas wrote:
> My colleague Thomas Munro and I have been working with Alvaro, and
> also with Kevin and Amit, to fix bug #12990, a multixact-related data
> corruption bug.

Thanks Alvaro, Amit, Kevin, Robert and Thomas for mobilizing to get this fixed.

> 1. I believe that there is still a narrow race condition that cause
> the multixact code to go crazy and delete all of its data when
> operating very near the threshold for member space exhaustion. See
> http://www.postgresql.org/message-id/ca+tgmozihwybetx8nzzptosjprg2kcr-nawgajkzclcbvj1...@mail.gmail.com
> for the scenario and proposed fix.

For anyone else following along, Thomas's subsequent test verified this threat
beyond reasonable doubt:

http://www.postgresql.org/message-id/CAEepm=3c32vpjloo45y0c3-3kwxnv2xm4japtsvjcrd2vg0...@mail.gmail.com

> 2. We have some logic that causes autovacuum to run in spite of
> autovacuum=off when wraparound threatens.  My commit
> 53bb309d2d5a9432d2602c93ed18e58bd2924e15 provided most of the
> anti-wraparound protections for multixact members that exist for
> multixact IDs and for regular XIDs, but this remains an outstanding
> issue.  I believe I know how to fix this, and will work up an
> appropriate patch based on some of Thomas's earlier work.

That would be good to have, and its implementation should be self-contained.

> 3. It seems to me that there is a danger that some users could see
> extremely frequent anti-mxid-member-wraparound vacuums as a result of
> this work.  Granted, that beats data corruption or errors, but it
> could still be pretty bad.  The default value of
> autovacuum_multixact_freeze_max_age is 4.
> Anti-mxid-member-wraparound vacuums kick in when you exceed 25% of the
> addressable space, or 1073741824 total members.  So, if your typical
> multixact has more than 1073741824/4 = ~2.68 members, you're
> going to see more autovacuum activity as a result of this change.
> We're effectively capping autovacuum_multixact_freeze_max_age at
> 1073741824/(average size of your multixacts).  If your multixacts are
> just a couple of members (like 3 or 4) this is probably not such a big
> deal.  If your multixacts typically run to 50 or so members, your
> effective freeze age is going to drop from 400m to ~21.4m.  At that
> point, I think it's possible that relminmxid advancement might start
> to force full-table scans more often than would be required for
> relfrozenxid advancement.  If so, that may be a problem for some
> users.

I don't know whether this deserves prompt remediation, but if it does, I would
look no further than the hard-coded 25% figure.  We permit users to operate
close to XID wraparound design limits.  GUC maximums force an anti-wraparound
vacuum at no later than 93.1% of design capacity.  XID assignment warns at
99.5%, then stops at 99.95%.  PostgreSQL mandates a larger cushion for
pg_multixact/offsets, with anti-wraparound VACUUM by 46.6% and a stop at
50.0%.  Commit 53bb309d2d5a9432d2602c93ed18e58bd2924e15 introduced the
bulkiest mandatory cushion yet, an anti-wraparound vacuum when
pg_multixact/members is just 25% full.  The pgsql-bugs thread driving that
patch did reject making it GUC-controlled, essentially on the expectation that
25% should be adequate for everyone:

http://www.postgresql.org/message-id/CA+Tgmoap6-o_5ESu5X2mBRVht_F+KNoY+oO12OvV_WekSA=e...@mail.gmail.com
http://www.postgresql.org/message-id/20150506143418.gt2...@alvh.no-ip.org
http://www.postgresql.org/message-id/1570859840.1241196.1430928954257.javamail.ya...@mail.yahoo.com

> What can we do about this?  Alvaro proposed back-porting his fix for
> bug #8470, which avoids locking a row if a parent subtransaction
> already has the same lock.

Like Andres and yourself, I would not back-patch it.

> Another thought that occurs to me is that if we had a freeze map, it
> would radically decrease the severity of this problem, because
> freezing would become vastly cheaper.  I wonder if we ought to try to
> get that into 9.5, even if it means holding up 9.5.

Declaring that a release will wait for a particular feature has consistently
ended badly for PostgreSQL, and this feature is just in the planning stages.
If folks are ready to hit the ground running on the project, I suggest they do
so; a non-WIP submission to the first 9.6 CF would be a big accomplishment.
The time to contemplate slipping it into 9.5 comes after the patch is done.

If these aggressive ideas earn more than passing consideration, the 25%
threshold should become user-controllable after all.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] multixacts woes

2015-05-10 Thread José Luis Tallón


On 05/08/2015 09:57 PM, Josh Berkus wrote:

[snip]

It's certainly possible to have workloads triggering that, but I think
it's relatively uncommon.  I in most cases I've checked the multixact
consumption rate is much lower than the xid consumption. There are some
exceptions, but often that's pretty bad code.

I have a couple workloads in my pool which do consume mxids faster than
xids, due to (I think) execeptional numbers of FK conflicts.  It's
definitely unusual, though, and I'm sure they'd rather have corruption
protection and endure some more vacuums.


Seen corruption happen recently with OpenBravo on PostgreSQL 9.3.6 
(Debian; binaries upgraded from 9.3.2) in a cluster pg_upgraded from 9.2.4

(albeit with quite insufficient autovacuum / poorly configured Postgres)

I fear that this might be more widespread than we thought, depending on 
the exact workload/activity pattern.
If it would help, I can try to get hold of a copy of the cluster in 
question (if the customer keeps any copy at all)



If we do this, though, it
might be worthwhile to backport the multixact age function, so that
affected users can check and schedule mxact wraparound vacuums
themselves, something you currently can't do on 9.3.


Thanks,

J.L.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

2015-05-10 Thread Tom Lane

Andres Freund  writes:
> Looking at this. First reading the patch to understand the details.

> * The VARTAG_IS_EXPANDED(tag) trick in VARTAG_SIZE is unlikely to
>   beneficial, before the compiler could implement the whole thing as a
>   computed goto or lookup table, afterwards not.

Well, if you're worried about the speed of VARTAG_SIZE() then the right
thing to do would be to revert your change that made enum vartag_external
distinct from the size of the struct, so that we could go back to just
using the second byte of a varattrib_1b_e datum as its size.  As I said
at the time, inserting pad bytes to force each different type of toast
pointer to be a different size would probably be a better tradeoff than
what commit 3682025015 did.

> * It'd be nice if the get_flat_size comment in expandeddatm.h could
>   specify whether the header size is included. That varies enough around
>   toast that it seems worthwhile.

OK.

> * You were rather bothered by the potential of multiple evaluations for
>   the ilist stuff. And now the AARR macros are full of them...

Yeah, there is doubtless some added cost there.  But I think it's a better
answer than duplicating each function in toto; the code space that that
would take isn't free either.

> * I find the ARRAY_ITER_VARS/ARRAY_ITER_NEXT macros rather ugly. I don't
>   buy the argument that turning them into functions will be slower. I'd
>   bet the contrary on common platforms.

Perhaps; do you want to do some testing and see?

> * Not a fan of the EH_ prefix in array_expanded.c and EOH_
>   elsewhere. Just looks ugly to me. Whatever.

I'm not wedded to that naming if you have a better suggestion.

> * The list of hardwired safe ops in exec_check_rw_parameter is somewhat
>   sad. Don't have a better idea though.

It's very sad, and it will be high on my list to improve that in 9.6.
But I do not think it's a fatal problem to ship it that way in 9.5,
because *as things stand today* those are the only two functions that
could benefit anyway.  It won't really matter until we have extensions
that want to use this mechanism.

> * "Also, a C function that is modifying a read-write expanded value
>   in-place should take care to leave the value in a sane state if it
>   fails partway through." - that's a pretty hefty requirement imo.

It is, which is one reason that I'm nervous about relaxing the test in
exec_check_rw_parameter.  Still, it was possible to code array_set_element
to meet that restriction without too much pain.  I'm inclined to leave the
stronger requirement in the docs for now, until we get more push-back.

> * The forced RW->RO conversion in subquery scans is a bit sad, but I
>   seems like something left for later.

Yes, there are definitely some things that look like future opportunities
here.

> Somewhere in the thread you comment on the fact that it's a bit sad that
> plpgsql is the sole benefactor of this (unless some function forces
> expansion internally).  I'd be ok to leave it at that for now. It'd be
> quite cool to get some feedback from postgis folks about the suitability
> of this for their cases.

Indeed, that's the main reason I'm eager to ship something in 9.5, even if
it's not perfect.  I hope those guys will look at it and use it, and maybe
give us feedback on ways to improve it.

> ISTM that the worst case for the new situation is large arrays that
> exist as plpgsql variables but are only ever passed on.

Well, more to the point, large arrays that are forced into expanded format
(costing a conversion step) but then we never do anything with them that
would benefit from that.  Just saying they're "passed on" doesn't prove
much since the called function might or might not get any benefit.
array_length doesn't, but some other things would.

Your example with array_agg() is interesting, since one of the things on
my to-do list is to see whether we could change array_agg to return an
expanded array.  It would not be hard to make it build that representation
directly, instead of its present ad-hoc internal state.  The trick would
be, when can you return the internal state without an additional copy
step?  But maybe it could return a R/O pointer ...

> ... Expanding only in
> cases where it'd be beneficial is going to be hard.

Yeah, improving that heuristic looks like a research project.  Still, even
with all the limitations and to-do items in the patch now, I'm pretty sure
this will be a net win for practically all applications.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Tom Lane

Robert Haas  writes:
> On Sat, May 9, 2015 at 1:05 PM, Tom Lane  wrote:
>> For these reasons, I think that if an FDW tried to be laxer than "tables
>> must be on the same pg_foreign_server entry to be joined", the odds
>> approach unity that it would be broken, and probably dangerously broken.
>> So we should just make that check for the FDWs.  Anybody who thinks
>> they're smarter than the average bear can use set_join_pathlist_hook,
>> but they are probably wrong.

> Drilling down into postgres_fdw's connection properties seems pretty
> silly; the user isn't likely to create two SERVER objects that are
> identical and then choose which one to use at random, and if they do,
> they deserve what they get.  The universe of FDWs, however, is
> potentially bigger than that.  What does a server even mean for
> file_fdw, for example?

Nothing, which is why you'd only ever create one per database, and so the
issue doesn't arise anyway.  It would only be sane to create multiple
servers per FDW if there were a meaningful difference between them.

In any case, since the existing code doesn't support "remote" joins
involving a local table unless you use the join-path hook, this argument
seems pretty academic.  If we tighten the test to be same-server, we will
benefit all but very weirdly designed FDWs.  Anybody who's not happy with
that can still use the hook (and I continue to maintain that they will
probably have subtle bugs, but whatever).

Another point worth making is that the coding I have in mind doesn't
really do anything with RelOptInfo.serverid except compare it for
equality.  So an FDW that wants to consider some servers interchangeable
for joining purposes could override the value at GetForeignPaths time
(ie replace "serverid" with the OID of a preferred server), and then it
would get GetForeignJoinPaths calls as desired.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] BRIN range operator class

2015-05-10 Thread Emre Hasegeli

> I pushed patches 04 and 07, as well as adopting some of the changes to
> the regression test in 06.  I'm afraid I caused a bit of merge pain for
> you -- sorry about that.

No problem.  I rebased the remaining ones.


brin-inclusion-v09-02-strategy-numbers.patch
Description: Binary data


brin-inclusion-v09-03-remove-assert-checking.patch
Description: Binary data


brin-inclusion-v09-05-box-vs-point-operators.patch
Description: Binary data


brin-inclusion-v09-06-inclusion-opclasses.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

2015-05-10 Thread Robert Haas

On Sat, May 9, 2015 at 1:05 PM, Tom Lane  wrote:
>> I originally wanted to go quite the other way with this and check for
>> join pushdown via handler X any time at least one of the two relations
>> involved used handler X, even if the other one used some other handler
>> or was a plain table.  In particular, it seems to me quite plausible
>> to want to teach an FDW that a certain local table is replicated on a
>> remote node, allowing a join between a foreign table and a plain table
>> to be pushed down.
>
> If we did do something like that, I think a saner implementation would
> involve substituting a foreign table for the local one earlier, via view
> expansion.  So by the time we are doing join planning, there would be no
> need to consider cross-server joins anyway.

Huh?  You can't do this at rewrite time; it's very fundamentally a
planning problem.  Suppose the user wants to join A, B, and C, where A
is a plain table, B is a plain table that is replicated on a foreign
server, and C is a foreign table on that same foreign server.  If we
decide to join B to C first, we probably want to push down the join,
although maybe not, if we estimate that B JOIN C will have more rows
than C.  If we decide to join A to B first, we want to use the local
copy of B.

>> This infrastructure can't be used that way anyhow,
>> so maybe there's no harm in tightening it up, but I'm wary of
>> circumscribing what FDW authors can do.
>
> Somebody who's really intent on shooting themselves in the foot can always
> use the set_join_pathlist_hook to inject paths for arbitrary joins.
> The FDW mechanism should support reasonable use cases without undue
> complication, and I doubt that what we've got now is adding anything
> except complication and risk of bugs.
>
> For the archives' sake, let me lay out a couple of reasons why an FDW
> that tried to allow cross-server joins would almost certainly be broken,
> and broken in security-relevant ways.  Suppose for instance that
> postgres_fdw tried to be smart and drill down into foreign tables' server
> IDs to allow joining of any two tables that have the same effective host
> name, port, database name, user name, and anything else you think would be
> relevant to its choice of connections.  The trouble with that is that the
> user mapping is context dependent, in particular one local userID might
> map to the same remote user name for two different server OIDs, while
> another might map to different user names.  So if we plan a query under
> the first userID we might decide it's okay to do the join remotely.
> Then, if we re-use that plan while executing as another userID (which is
> entirely possible) what will probably happen is that the remote join
> query will get sent off under one or the other of the remote usernames
> associated with the second local userID.  This could lead to either a
> permission failure, or a remote table access that should not be allowed
> to the current local userID.  Admittedly, such cases might be rare in
> practice, but it's still a security hole.  Also, even if the FDW is
> defensive enough to recheck full matching of the tables' connection
> properties at execution time, there's not much it could do about the
> situation except fail; it couldn't cause a re-plan to occur.
>
> For another case, we do not have any mechanism whereby operations like
> ALTER SERVER OPTIONS could invalidate existing plans.  Thus, even if
> the two tables' connection properties matched at plan time, there's no
> guarantee that they still match at execution.  This is probably not a
> security hole (at least not if you assume foreign-server owners are
> trusted users), but it's still a bug that exists only if someone tries
> to allow cross-server joins.
>
> For these reasons, I think that if an FDW tried to be laxer than "tables
> must be on the same pg_foreign_server entry to be joined", the odds
> approach unity that it would be broken, and probably dangerously broken.
> So we should just make that check for the FDWs.  Anybody who thinks
> they're smarter than the average bear can use set_join_pathlist_hook,
> but they are probably wrong.

Drilling down into postgres_fdw's connection properties seems pretty
silly; the user isn't likely to create two SERVER objects that are
identical and then choose which one to use at random, and if they do,
they deserve what they get.  The universe of FDWs, however, is
potentially bigger than that.  What does a server even mean for
file_fdw, for example?  I can't think of any reason why somebody would
want to implement joins inside file_fdw, but if they did, all the
things being joined would be local files, so the server ID doesn't
really matter.  Now you may say that's a silly use case, but it's less
obviously silly if the files contain structured data, as with
cstore_fdw, yet the server ID could still be not especially relevant.
Maybe you've got servers representing filesystem directories; that
shouldn't preclude cross "server" joins.

Re: [HACKERS] multixacts woes

2015-05-10 Thread Robert Haas

On Fri, May 8, 2015 at 5:39 PM, Alvaro Herrera  wrote:
>> 1. I believe that there is still a narrow race condition that cause
>> the multixact code to go crazy and delete all of its data when
>> operating very near the threshold for member space exhaustion. See
>> http://www.postgresql.org/message-id/ca+tgmozihwybetx8nzzptosjprg2kcr-nawgajkzclcbvj1...@mail.gmail.com
>> for the scenario and proposed fix.
>
> I agree that there is a problem here.

OK, I'm glad we now agree on that, since it seemed like you were
initially unconvinced.

>> 2. We have some logic that causes autovacuum to run in spite of
>> autovacuum=off when wraparound threatens.  My commit
>> 53bb309d2d5a9432d2602c93ed18e58bd2924e15 provided most of the
>> anti-wraparound protections for multixact members that exist for
>> multixact IDs and for regular XIDs, but this remains an outstanding
>> issue.  I believe I know how to fix this, and will work up an
>> appropriate patch based on some of Thomas's earlier work.
>
> I believe autovacuum=off is fortunately uncommon, but certainly getting
> this issue fixed is a good idea.

Right.

>> 3. It seems to me that there is a danger that some users could see
>> extremely frequent anti-mxid-member-wraparound vacuums as a result of
>> this work.
>
> I agree with the idea that the long term solution to this issue is to
> make the freeze process cheaper.  I don't have any good ideas on how to
> make this less severe in the interim.  You say the fix for #8470 is not
> tested thoroughly enough to back-patch it just yet, and I can behind
> that; so let's wait until 9.5 has been tested a bit more.

Sounds good.

> Another avenue not mentioned and possibly worth exploring is making some
> more use of the multixact cache, and reuse multixacts that were
> previously issued and have the same effects as the one you're interested
> in: for instance, if you want a multixact with locking members
> (10,20,30) and you have one for (5,10,20,30) but transaction 5 has
> finished, then essentially both have the same semantics (because locks
> don't have any effect the transaction has finished) so we can use it
> instead of creating a new one.  I have no idea how to implement this;
> obviously, having to run TransactionIdIsCurrentTransactionId for each
> member on each multixact in the cache each time you want to create a new
> multixact is not very reasonable.

This sounds to me like it's probably too clever.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_basebackup vs. Windows and tablespaces

Re: [HACKERS] multixacts woes

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

Re: [HACKERS] feature freeze and beta schedule

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

Re: [HACKERS] Async execution of postgres_fdw.

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

[HACKERS] "unaddressable bytes" in BRIN

Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

Re: [HACKERS] multixacts woes

Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

Re: [HACKERS] multixacts woes

Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

Re: [HACKERS] default value dosen't get applyed in this situation

Re: [HACKERS] Parallel Seq Scan

[HACKERS] default value dosen't get applyed in this situation

Re: [HACKERS] [BUGS] BUG #13148: Unexpected deferred EXCLUDE constraint violation on derived table

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

Re: [HACKERS] multixacts woes

Re: [HACKERS] multixacts woes

Re: [HACKERS] multixacts woes

Re: [HACKERS] Manipulating complex types as non-contiguous structures in-memory

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

Re: [HACKERS] BRIN range operator class

Re: Custom/Foreign-Join-APIs (Re: [HACKERS] [v9.5] Custom Plan API)

Re: [HACKERS] multixacts woes

36 matches

Site Navigation

Mail list logo

Footer information