Re: [HACKERS] Parallel Hash take II

2017-11-13 Thread Thomas Munro
Hi Andres and Peter,

Please see below for inline responses to your feedback.  New patch attached.

On Wed, Nov 8, 2017 at 10:01 AM, Andres Freund  wrote:
> +set min_parallel_table_scan_size = 0;
> +set parallel_setup_cost = 0;
> +-- Make a simple relation with well distributed keys and correctly
> +-- estimated size.
> +create table simple as
> +  select generate_series(1, 2) AS id, 
> 'aa';
> +alter table simple set (parallel_workers = 2);
> +analyze simple;
> +-- Make a relation whose size we will under-estimate.  We want stats
> +-- to say 1000 rows, but actually there are 20,000 rows.
> +create table bigger_than_it_looks as
> +  select generate_series(1, 2) as id, 
> 'aa';
> +alter table bigger_than_it_looks set (autovacuum_enabled = 'false');
> +alter table bigger_than_it_looks set (parallel_workers = 2);
> +delete from bigger_than_it_looks where id <= 19000;
> +vacuum bigger_than_it_looks;
> +analyze bigger_than_it_looks;
> +insert into bigger_than_it_looks
> +  select generate_series(1, 19000) as id, 
> 'aa';
>
> It seems kinda easier to just manipulate ndistinct and reltuples...

Done.

> +set max_parallel_workers_per_gather = 0;
> +set work_mem = '4MB';
>
> I hope there's a fair amount of slop here - with different archs you're
> going to see quite some size differences.

Yeah, this is a problem I wrestled with.  See next.

> +-- The "good" case: batches required, but we plan the right number; we
> +-- plan for 16 batches, and we stick to that number, and peak memory
> +-- usage says within our work_mem budget
> +-- non-parallel
> +set max_parallel_workers_per_gather = 0;
> +set work_mem = '128kB';
>
> So how do we know that's actually the case we're testing rather than
> something arbitrarily different? There's IIRC tests somewhere that just
> filter the json explain output to the right parts...

Yeah, good idea.  My earlier attempts to dump out the hash join
dimensions ran into problems with architecture sensitivity and then
some run-to-run non-determinism in the parallel case (due to varying
fragmentation depending on how many workers get involved in time).
The attached version tells you about batch growth without reporting
the exact numbers, except in the "ugly" case where we know that there
is only one possible outcome because the extreme skew detector is
guaranteed to go off after the first nbatch increase (I got rid of all
other tuples except ones with the same key to make this true).

This exercise did reveal a bug in
0008-Show-hash-join-per-worker-information-in-EXPLAIN-ANA.patch
though: it is capturing shared instrumentation too soon in the
non-Parallel Hash case so the nbatch reported by EXPLAIN ANALYZE might
be too low if we grew while probing.  Oops.  Will post a fix for that.

> +/*
> + * Build the name for a given segment of a given BufFile.
> + */
> +static void
> +MakeSharedSegmentName(char *name, const char *buffile_name, int segment)
> +{
> +   snprintf(name, MAXPGPATH, "%s.%d", buffile_name, segment);
> +}
>
> Not a fan of this name - you're not "making" a filename here (as in
> allocating or such). I think I'd just remove the Make prefix.

Done.  I also changed some similar code where I'd used GetXXX when
building paths.

> +/*
> + * Open a file that was previously created in another backend with
> + * BufFileCreateShared in the same SharedFileSet using the same name.  The
> + * backend that created the file must have called BufFileClose() or
> + * BufFileExport() to make sure that it is ready to be opened by other
> + * backends and render it read-only.
> + */
>
> Is it actually guaranteed that it's another backend / do we rely on
> that?

No, it could be any backend that is attached to the SharedFileSet,
including the current one.  Wording improved.

> +BufFile *
> +BufFileOpenShared(SharedFileSet *fileset, const char *name)
> +{
>
> +   /*
> +* If we didn't find any files at all, then no BufFile exists with 
> this
> +* tag.
> +*/
> +   if (nfiles == 0)
> +   return NULL;
>
> s/taag/name/?

Fixed.

> +/*
> + * Delete a BufFile that was created by BufFileCreateShared in the given
> + * SharedFileSet using the given name.
> + *
> + * It is not necessary to delete files explicitly with this function.  It is
> + * provided only as a way to delete files proactively, rather than waiting 
> for
> + * the SharedFileSet to be cleaned up.
> + *
> + * Only one backend should attempt to delete a given name, and should know
> + * that it exists and has been exported or closed.
> + */
> +void
> +BufFileDeleteShared(SharedFileSet *fileset, const char *name)
> +{
> +   charsegment_name[MAXPGPATH];
> +   int segment = 0;
> +   boolfound = false;
> +
> +   /*
> +* We don't know how many segments the file has.  We'll keep deleting
> +* until we run out.  If we don't m

Re: [HACKERS] LDAPS

2017-11-12 Thread Thomas Munro
On Sat, Nov 4, 2017 at 2:05 AM, Thomas Munro
 wrote:
> I've only tested the attached lightly on FreeBSD + OpenLDAP and
> don't know if it'll work elsewhere.

While rebasing this on top of a nearby changes, I looked into how
portable it is.  The previous version unconditionally used
ldap_initialize() instead of ldap_init() in order to be able to pass
in ldap or ldaps.  According to the man pages on my system:

   At this time, ldap_open() and ldap_init() are deprecated in favor of
   ldap_initialize(), essentially because the latter allows to specify a
   schema in the URI and it explicitly returns an error code.

But:

1.  It looks like ldap_initialize() arrived in OpenLDAP 2.4 (2007),
which means that it won't work with RHEL5's OpenLDAP 2.3.  That's a
vintage still found in the build farm.  This new version of the patch
has a configure test so it can fall back to ldap_init(), dropping
ldaps support.  That is possibly also necessary for other
implementations.

2.  Windows doesn't have ldap_initialize(), but it has
ldap_sslinit()[1] which adds an SSL boolean argument.  I've included
(but not tested) code for that.  I would need a Windows + LDAP savvy
person to help test that.  I'm not sure if it should also do an
LDAP_OPT_SSL check to see if the server forced the connection back to
plaintext as shown in the Microsoft docs[2], or if that should be
considered OK, or it should be an option.

BTW, Stephen Layland posted a patch for ldaps years ago[3].  It must
have worked some other way though, because he mentions RHEL 4 and
OpenLDAP 2.2/2.3.  Unfortunately the patch wasn't attached and the
referenced webserver has disappeared from the intertubes.

I've added this to the January Commitfest.

[1] https://msdn.microsoft.com/en-us/library/aa366996(v=vs.85).aspx
[2] https://msdn.microsoft.com/en-us/library/aa366105(v=vs.85).aspx
[3] https://www.postgresql.org/message-id/20080426010240.gs5...@68k.org

-- 
Thomas Munro
http://www.enterprisedb.com


ldaps-v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A GUC to prevent leader processes from running subplans?

2017-11-12 Thread Thomas Munro
On Sun, Nov 12, 2017 at 8:51 PM, Amit Kapila  wrote:
> On Sun, Nov 12, 2017 at 9:18 AM, Thomas Munro
>  wrote:
>> How about parallel_leader_participation = on|off?  The attached
>> version has it that way, and adds regression tests to exercise on, off
>> and off-but-couldn't-start-any-workers for both kinds of gather node.
>>
>> I'm not sure why node->need_to_rescan is initialised by both
>> ExecGatherInit() and ExecGather().  Only the latter's value matters,
>> right?
>>
>
> I don't see anything like need_to_rescan in the GatherState node.  Do
> you intend to say need_to_scan_locally?  If yes, then I think whatever
> you said is right.

Right, that's what I meant to write.  Thanks.

>> I've added this to the January Commitfest.
>>
>
> +1 to this idea.  Do you think such an option at table level can be
> meaningful?  We have a parallel_workers as a storage option for
> tables, so users might want leader to participate in parallelism only
> for some of the tables.

I'm not sure.  I think the reason for turning it off (other than
developer testing) would be that the leader is getting tied up doing
work that takes a long time (sorting, hashing, aggregating) and that's
causing the workers to be blocked because their output queue is full.
I think that type of behaviour comes from certain plan types, and it
probably wouldn't make sense to associate this behaviour with the
tables you're scanning.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A GUC to prevent leader processes from running subplans?

2017-11-11 Thread Thomas Munro
On Sat, Oct 21, 2017 at 8:09 AM, Robert Haas  wrote:
> On Tue, Oct 17, 2017 at 7:27 AM, Thomas Munro
>  wrote:
>> While testing parallelism work I've wanted to be able to prevent
>> gather nodes from running the plan in the leader process, and I've
>> heard others say the same.  One way would be to add a GUC
>> "multiplex_gather", like in the attached patch.  If you set it to off,
>> Gather and Gather Merge won't run the subplan unless they have to
>> because no workers could be launched.  I thought about adding a new
>> value for force_parallel_mode instead, but someone mentioned they
>> might want to do this on a production system too and
>> force_parallel_mode is not really for end users.  Better ideas?
>
> I don't think overloading force_parallel_mode is a good idea, but
> having some other GUC for this seems OK to me.  Not sure I like
> multiplex_gather, though.

How about parallel_leader_participation = on|off?  The attached
version has it that way, and adds regression tests to exercise on, off
and off-but-couldn't-start-any-workers for both kinds of gather node.

I'm not sure why node->need_to_rescan is initialised by both
ExecGatherInit() and ExecGather().  Only the latter's value matters,
right?

I've added this to the January Commitfest.

-- 
Thomas Munro
http://www.enterprisedb.com


parallel-leader-participation-v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] LDAP URI decoding bugs

2017-11-10 Thread Thomas Munro
On Sat, Nov 11, 2017 at 8:37 AM, Peter Eisentraut
 wrote:
> On 11/6/17 23:30, Michael Paquier wrote:
>> On Fri, Nov 3, 2017 at 12:57 PM, Thomas Munro
>>  wrote:
>>> 1.  If you set up a pg_hba.conf with a URL that lacks a base DN or
>>> hostname, hba.c will segfault on startup when it tries to pstrdup a
>>> null pointer.  Examples: ldapurl="ldap://localhost"; and
>>> ldapurl="ldap://";.
>>>
>>> 2.  If we fail to bind but have no binddn configured, we'll pass NULL
>>> to ereport (snprint?) for %s, which segfaults on some libc
>>> implementations.  That crash requires more effort to reproduce but you
>>> can see pretty clearly a few lines above in auth.c that it can be
>>> NULL.  (I'm surprised Coverity didn't complain about that.  Maybe it
>>> can't see this code due to macros.)
>
> committed and backpatched

Thanks!

I suppose someone might eventually want to go further and teach it to
understand such bare URLs or missing options (ie leaving out any bits
you want and falling back to the ldap library's defaults, which come
from places like env variables, .ldaprc and /etc/ldap.conf, the way
that "ldapsearch" and other tools manage to work with reasonable
defaults, or at least only need to be set up in one place for all your
LDAP-client software).  I'm not planning to work on that.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Planning counters in pg_stat_statements

2017-11-10 Thread Thomas Munro
On Tue, Nov 7, 2017 at 6:39 PM, Tsunakawa, Takayuki
 wrote:
> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Thomas Munro
>> I have often wanted $SUBJECT and was happy to find that Fujii-san had posted
>> a patch five years ago[1].  The reception then seemed positive.
>> So here is a refurbished and (hopefully) improved version of his patch with
>> a new column for the replan count.  Thoughts?
>
> That's a timely proposal.  I sometimes faced performance problems where the 
> time pg_stat_statements shows is much shorter than the application perceives. 
>  The latest experience was that the execution time of a transaction, which 
> consists of dozens of DMLs and COMMIT, was about 200ms from the application's 
> perspective, while pg_stat_statements showed only about 10ms in total.  The 
> network should not be the cause because the application ran on the same host 
> as the database server.  I wanted to know how long the parsing and planning 
> time was.

Note that this patch doesn't include the parse or parse analysis
times.  I guess they would be less interesting?  But perhaps someone
would want to have the complete query production line measured.

BTW the reason I was looking into this was because an Oracle user
asked me how to see "hard parse" times on Postgres, and I've talked to
others who seem strangely concerned with "parsing" time.  On Oracle I
believe that term covers (among other things) actually planning, and I
guess planning is the most interesting component.  Planning is the
thing I've wanted to measure myself, to diagnose problems relating to
partition/inheritance planning and join explosions and to figure out
which things should be changed to PREPARE/EXECUTE.  Perhaps a separate
parse/analysis counter might become more interesting for us if we ever
add automatic plan cache so you could assess how often you're getting
an implicit prepared statement (something like Oracle's "soft parse")?

> BTW, the current pg_stat_statement shows unexpected time for COMMIT.  I 
> expect it to include the whole COMMIT processing, including the long WAL 
> flush and sync rep wait.  However, it only shows the time for the transaction 
> state change in memory.

That's an interesting point.  You could install a transaction hook to
measure that easily enough, but I'm not sure how useful it'd be: you'd
be grouping together COMMIT timing data from transactions that are
doing very different things (including nothing).  Would that tell you
anything actionable?  If you include commit time for COMMIT statements
then you'd also have to decide whether to include it for DML
statements that run in an implicit transaction.  The trouble with that
is that the same statement inside an explicit transaction wouldn't
have any commit time, so you'd be mixing oranges and apples.  I guess
you could fix that by putting adding "commits" and  "commit_time"
columns (= counters for this statement run as implicit transaction),
but I wonder if commit time monitoring really belongs somewhere else.

For sync rep waits, that's what the pg_stat_replication.XXX_lag
columns tell you.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pg V10: Patch for bug in bonjour support

2017-11-08 Thread Thomas Munro
On Thu, Nov 9, 2017 at 6:27 PM, Tom Lane  wrote:
> Thomas Munro  writes:
>> On Thu, Nov 9, 2017 at 5:03 PM, Tom Lane  wrote:
>>> Is the AC_SEARCH_LIBS configure call needed to make PG build with the
>>> FreeBSD package?
>
>> Yes.  My take is that the commit was correct: the library is needed
>> for --with-bonjour to work on non-macOS systems, and apparently it can
>> work (though I didn't personally try to assess that beyond seeing that
>> it could start up and connect to mdnsd).  Perhaps Avahi doesn't
>> qualify as a suitable Bonjour implementation any more though, and
>> someone out there might like to consider writing a --with-avahi option
>> that uses the native API it's shouting about.
>
> I'm not sure what to do at this point.  I concur that the AC_SEARCH_LIBS
> call is helpful if you're using mDNSResponder on FreeBSD (or wherever
> else that may be available) ... but I'm worried that it will enable
> people to create broken builds on Linux without trying very hard.
> We might be wise to deem that putting that call in is just creating
> an attractive nuisance.
>
> This would certainly be easier if we had a certifiably-working interface
> to the avahi library.  But we don't, and I don't plan to write one,
> and I doubt anyone else will come out of the woodwork to do it either.
>
> Is there really much interest in Bonjour support on non-macOS platforms?
> I hadn't heard that anybody but Apple was invested in it.

Not from me.  My only interest here was to pipe up because I knew that
what was originally proposed would have broken stuff on macOS, and
after that piqued curiosity.  I won't mind at all if you revert the
commit to prevent confusion.  If the intersection of FreeBSD,
PostgreSQL and Bonjour users is a non-empty set, [s]he might at least
find this archived discussion useful...

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pg V10: Patch for bug in bonjour support

2017-11-08 Thread Thomas Munro
On Thu, Nov 9, 2017 at 5:03 PM, Tom Lane  wrote:
> Is the AC_SEARCH_LIBS configure call needed to make PG build with the
> FreeBSD package?

Yes.  My take is that the commit was correct: the library is needed
for --with-bonjour to work on non-macOS systems, and apparently it can
work (though I didn't personally try to assess that beyond seeing that
it could start up and connect to mdnsd).  Perhaps Avahi doesn't
qualify as a suitable Bonjour implementation any more though, and
someone out there might like to consider writing a --with-avahi option
that uses the native API it's shouting about.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPDATE of partition key

2017-11-08 Thread Thomas Munro
On Wed, Nov 8, 2017 at 5:57 PM, Amit Khandekar  wrote:
> On 8 November 2017 at 07:55, Thomas Munro  
> wrote:
>> On Tue, Nov 7, 2017 at 8:03 AM, Robert Haas  wrote:
>>> The changes to trigger.c still make me super-nervous.  Hey THOMAS
>>> MUNRO, any chance you could review that part?

At first, it seemed quite strange to me that row triggers and
statement triggers fire different events for the same modification.
Row triggers see DELETE +  INSERT (necessarily because different
tables are involved), but this fact is hidden from the target table's
statement triggers.

The alternative would be for all triggers to see consistent events and
transitions.  Instead of having your special case code in ExecInsert
and ExecDelete that creates the two halves of a 'synthetic' UPDATE for
the transition tables, you'd just let the existing ExecInsert and
ExecDelete code do its thing, and you'd need a flag to record that you
should also fire INSERT/DELETE after statement triggers if any rows
moved.

After sleeping on this question, I am coming around to the view that
the way you have it is right.  The distinction isn't really between
row triggers and statement triggers, it's between triggers at
different levels in the hierarchy.  It just so happens that we
currently only fire target table statement triggers and leaf table row
triggers.  Future development ideas that seem consistent with your
choice:

1.  If we ever allow row triggers with transition tables on child
tables, then I think *their* transition tables should certainly see
the deletes and inserts, otherwise OLD TABLE and NEW TABLE would be
inconsistent with the OLD and NEW variables in a single trigger
invocation.  (These were prohibited mainly due to lack of time and
(AFAIK) limited usefulness; I think they would need probably need
their own separate tuplestores, or possibly some kind of filtering.)

2.  If we ever allow row triggers on partitioned tables (ie that fire
when its children are modified), then I think their UPDATE trigger
should probably fire when a row moves between any two (grand-)*child
tables, just as you have it for target table statement triggers.  It
doesn't matter that the view from parent tables' triggers is
inconsistent with the view from leaf table triggers: it's a feature
that we 'hide' partitioning from the user to the extent we can so that
you can treat the partitioned table just like a table.

Any other views?

As for the code, I haven't figured out how to break it yet, and I'm
wondering if there is some way to refactor so that ExecInsert and
ExecDelete don't have to record pseudo-UPDATE trigger events.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pg V10: Patch for bug in bonjour support

2017-11-08 Thread Thomas Munro
On Thu, Nov 9, 2017 at 1:39 PM, Tom Lane  wrote:
> Luke Lonergan  writes:
>> On 11/8/17, 3:00 PM, "Tom Lane"  wrote:
>>> BTW, when I try this on Fedora 25, it builds cleanly but the feature
>>> doesn't seem to work --- I get this at postmaster start:
>>> ...
>>> I wonder which libdns_sd you are using.
>
>> libavahi-compat-libdnssd1:amd64: /usr/lib/x86_64-linux-gnu/libdns_sd.so.1.0.0
>
> Hm, the library on F25 is also avahi's.  Digging in the archives, I find
> this old thread reporting the same behavior:
>
> https://www.postgresql.org/message-id/flat/17824.1252293423%40sss.pgh.pa.us
>
> So now I'm wondering if you know something the rest of us don't about
> how to configure the platform for bonjour to work.

FWIW it builds and starts up fine on FreeBSD with
mDNSResponder-878.1.1 installed (Apache-licensed Apple Bonjour code)
and the mdnsd daemon started.  If I don't start mdnsd it shows an
error at startup.  When built against the (conflicting)
avahi-libdns-0.6.31_2 package it shows the WARNING you reported and
"DNSServiceRegister() failed: error code -65537", which might just
mean it wants to talk to some daemon I'm not running.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pg V10: Patch for bug in bonjour support

2017-11-08 Thread Thomas Munro
On Thu, Nov 9, 2017 at 10:05 AM, Luke Lonergan  wrote:
>   if test "$with_bonjour" = yes ; then
>
> AC_CHECK_HEADER(dns_sd.h, [], [AC_MSG_ERROR([header file  is
> required for Bonjour])])
>
> +   AC_CHECK_LIB(dns_sd, DNSServiceRefSockFD, [], [AC_MSG_ERROR([library
> 'dns_sd' is required for Bonjour])])
>
>   fi

Hi Luke,

It lives in libSystem.dylib (implicitly linked) on macOS, so that
would break the build there.  We'd need something a bit more
conditional, but I don't know what.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPDATE of partition key

2017-11-07 Thread Thomas Munro
On Wed, Nov 8, 2017 at 5:57 PM, Amit Khandekar  wrote:
> Thomas, can you please try the attached incremental patch
> regress_locale_changes.patch and check if the test passes ? The patch
> is to be applied on the main v22 patch. If the test passes, I will
> include these changes (also for list_parted) in the upcoming v23
> patch.

That looks good.  Thanks.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPDATE of partition key

2017-11-07 Thread Thomas Munro
On Tue, Nov 7, 2017 at 8:03 AM, Robert Haas  wrote:
> The changes to trigger.c still make me super-nervous.  Hey THOMAS
> MUNRO, any chance you could review that part?

Looking, but here's one silly thing that jumped out at me while
getting started with this patch.  I cannot seem to convince my macOS
system to agree with the expected sort order from :show_data, where
underscores precede numbers:

  part_a_10_a_20 | a | 10 | 200 |  1 |
  part_a_1_a_10  | a |  1 |   1 |  1 |
- part_d_1_15| b | 15 | 146 |  1 |
- part_d_1_15| b | 16 | 147 |  2 |
  part_d_15_20   | b | 17 | 155 | 16 |
  part_d_15_20   | b | 19 | 155 | 19 |
+ part_d_1_15| b | 15 | 146 |  1 |
+ part_d_1_15| b | 16 | 147 |  2 |

It seems that macOS (like older BSDs) just doesn't know how to sort
Unicode and falls back to sorting the bits.  I expect that means that
the test will also fail on any other OS with "make check
LC_COLLATE=C".  I believe our regression tests are supposed to pass
with a wide range of collations including C, so I wonder if this means
we should stick a leading zero on those single digit numbers, or
something, to stabilise the output.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] OpenTemporaryFile() vs resowner.c

2017-11-07 Thread Thomas Munro
Hi hackers,

Andres, Robert and Peter G rightly complained[1] that my shared
temporary file patch opens a file, then calls
ResourceOwnerEnlargeFiles() which can fail due to lack of memory, and
then registers the file handle to make sure we don't leak it.  Doh.
The whole point of the separate ResourceOwnerEnlargeXXX() interface is
to be able to put it before resource acquisition.

The existing OpenTemporaryFile() coding has the same mistake.  Please
see attached.

[1] 
https://www.postgresql.org/message-id/20171107210155.kuksdd324kgz5oev%40alap3.anarazel.de

-- 
Thomas Munro
http://www.enterprisedb.com


fix-file-leak.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Hash take II

2017-11-07 Thread Thomas Munro
Hi Peter,

See responses to a couple of points below.  I'll respond to the other
points separately (ie with code/comment changes).

On Wed, Nov 8, 2017 at 10:32 AM, Peter Geoghegan  wrote:
> On Tue, Nov 7, 2017 at 1:01 PM, Andres Freund  wrote:
>> +/*
>> + * Delete a BufFile that was created by BufFileCreateShared in the given
>> + * SharedFileSet using the given name.
>> + *
>> + * It is not necessary to delete files explicitly with this function.  It is
>> + * provided only as a way to delete files proactively, rather than waiting 
>> for
>> + * the SharedFileSet to be cleaned up.
>> + *
>> + * Only one backend should attempt to delete a given name, and should know
>> + * that it exists and has been exported or closed.
>> + */
>
> This part is new to me. We now want one backend to delete a given
> filename. What changed? Please provide a Message-Id reference if that
> will help me to understand.
>
> For now, I'm going to guess that this development had something to do
> with the need to deal with virtual FDs that do a close() on an FD to
> keep under backend limits. Do I have that right?

No -- this is simply an option available to client code that wants to
clean up individual temporary files earlier.  Such client code is
responsible for meeting the synchronisation requirements described in
the comment, but it's entirely optional.  For example:
SharedTuplestore is backed by multiple files and it could take the
opportunity to delete individual files after they've been scanned, if
you told it you were going to scan only once
(SHARED_TUPLESTORE_SINGLE_PASS) and if it could prove that no other
backend could still need to read from it, as mentioned in a comment in
sts_end_parallel_scan().

However, since I changed to the page/chunk based model (spinlock while
advancing block counter, mimicking Parallel Seq Scan) instead of the
earlier Fisher Price version (LWLock while reading each tuple with a
shared read head maintained with tell/seek), I don't actually do that.
Hitting the end of the file no longer means that no one else is
reading from the file (someone else might still be reading from an
earlier chunk even though you've finished reading the final chunk in
the file, and the vfd systems means that they must be free to close
and reopen the file at any time).  In the current patch version the
files are cleaned up wholesale at two times: SharedFileSet cleanup
triggered by DSM destruction, and SharedFileSet reset triggered by
rescan.  Practically, it's always the former case.  It's vanishingly
rare that you'd actually want to be rescanning a Parallel Hash that
spills to disk but in that case we delete the old files and recreate,
and that case is tested in the regression tests.

If it bothers you that I have an API there that I'm not actually using
yet, I will remove it.

>> +   if (vfdP->fdstate & FD_TEMP_FILE_LIMIT)
>> +   {
>> +   /* Subtract its size from current usage (do first in case of 
>> error) */
>> +   temporary_files_size -= vfdP->fileSize;
>> +   vfdP->fileSize = 0;
>> +   }
>>
>> So, is it right to do so unconditionally and without regard for errors?
>> If the file isn't deleted, it shouldn't be subtracted from fileSize. I
>> guess you're managing that through the flag, but that's not entirely
>> obvious.
>
> I think that the problem here is that the accounting is expected to
> always work. It's not like there is a resowner style error path in
> which temporary_files_size gets reset to 0.

But there is a resowner error path in which File handles get
automatically closed and temporary_files_size gets adjusted.  The
counter goes up when you create, and goes down when you close or
resowner closes for you.  Eventually you either close and the
bookkeeping is consistent or you crash and it doesn't matter.  And
some kind of freak multiple close attempt is guarded against by
setting the files size to 0 so we can't double-subtract.  Do you see a
bug?

None of this has any impact on whether files are leaked: either
SharedFileSet removes the files, or you crash (or take a filesystem
snapshot, etc) and RemovePgTempFiles() mops them up at the next clean
startup.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Hash take II

2017-11-07 Thread Thomas Munro
On Wed, Nov 8, 2017 at 10:32 AM, Robert Haas  wrote:
> On Tue, Nov 7, 2017 at 4:01 PM, Andres Freund  wrote:
>> diff --git a/src/backend/utils/resowner/resowner.c 
>> b/src/backend/utils/resowner/resowner.c
>> index 4c35ccf65eb..8b91d5a6ebe 100644
>> --- a/src/backend/utils/resowner/resowner.c
>> +++ b/src/backend/utils/resowner/resowner.c
>> @@ -528,16 +528,6 @@ ResourceOwnerReleaseInternal(ResourceOwner owner,
>> PrintRelCacheLeakWarning(res);
>> RelationClose(res);
>> }
>> -
>> -   /* Ditto for dynamic shared memory segments */
>> -   while (ResourceArrayGetAny(&(owner->dsmarr), &foundres))
>> -   {
>> -   dsm_segment *res = (dsm_segment *) 
>> DatumGetPointer(foundres);
>> -
>> -   if (isCommit)
>> -   PrintDSMLeakWarning(res);
>> -   dsm_detach(res);
>> -   }
>> }
>> else if (phase == RESOURCE_RELEASE_LOCKS)
>> {
>> @@ -654,6 +644,16 @@ ResourceOwnerReleaseInternal(ResourceOwner owner,
>> PrintFileLeakWarning(res);
>> FileClose(res);
>> }
>> +
>> +   /* Ditto for dynamic shared memory segments */
>> +   while (ResourceArrayGetAny(&(owner->dsmarr), &foundres))
>> +   {
>> +   dsm_segment *res = (dsm_segment *) 
>> DatumGetPointer(foundres);
>> +
>> +   if (isCommit)
>> +   PrintDSMLeakWarning(res);
>> +   dsm_detach(res);
>> +   }
>> }
>>
>> Is that entirely unproblematic? Are there any DSM callbacks that rely on
>> locks still being held? Please split this part into a separate commit
>> with such analysis.
>
> FWIW, I think this change is a really good idea (I recommended it to
> Thomas at some stage, I think).

Yeah, it was Robert's suggestion; I thought I needed *something* like
this but was hesitant for the niggling reason that Andres mentions:
what if someone somewhere (including code outside our source tree)
depends on this ordering because of unlocking etc?

At that time I thought that my clean-up logic wasn't going to work on
Windows without this reordering, because we were potentially closing
file handles after unlinking the files, and I was under the impression
that Windows wouldn't like that.  Since then I've learned that Windows
does actually allow it, but only if all file handles were opened with
the FILE_SHARE_DELETE flag.  We always do that (see src/port/open.c),
so in fact this change is probably not needed for my patch set (theory
not tested).  I will put it in a separate patch as requested by
Andres, because it's generally a good idea anyway for the reasons that
Robert explained (ie you probably always want to clean up memory last,
since it might contain the meta-data/locks/control objects/whatever
you'll need to clean up anything else).

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Planning counters in pg_stat_statements

2017-11-06 Thread Thomas Munro
Hi hackers,

I have often wanted $SUBJECT and was happy to find that Fujii-san had
posted a patch five years ago[1].  The reception then seemed positive.
So here is a refurbished and (hopefully) improved version of his patch
with a new column for the replan count.  Thoughts?

Example output:

 query  | plans | plan_time | calls | total_time
+---+---+---+
 prepare x as select $1 | 1 | 0.026 |12 |   0.06
 select substr(query, $1, $2),  |11 | 1.427 |11 |  3.565
 prepare y as select * from foo | 2 | 7.336 | 5 |  0.331

I agree with the sentiment on the old thread that
{total,min,max,mean,stddev}_time now seem badly named, but adding
"execution" makes them so long...  Thoughts?

[1] 
https://www.postgresql.org/message-id/CAHGQGwFx_%3DDO-Gu-MfPW3VQ4qC7TfVdH2zHmvZfrGv6fQ3D-Tw%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com


pg-stat-statements-planning-v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Statement-level rollback

2017-11-06 Thread Thomas Munro
On Wed, Nov 1, 2017 at 6:47 AM, MauMau  wrote:
> From: Simon Riggs
> On 14 August 2017 at 23:58, Peter Eisentraut
>  wrote:
>> On 2/28/17 02:39, Tsunakawa, Takayuki wrote:
>>> The code for stored functions is not written yet, but I'd like your
> feedback for the specification and design based on the current patch.
> I'll add this patch to CommitFest 2017-3.
>>
>> This patch needs to be rebased for the upcoming commit fest.
>
> I'm willing to review this if the patch is going to be actively worked
> on.
>
>
> I'm very sorry I couldn't reply to your kind offer.  I rebased the
> patch and will add it to CF 2017/11.  I hope I will complete the patch
> in this CF.

Hi Tsunakawa-san,

With your v2 patch "make docs" fails.  Here is a small patch to apply
on top of yours to fix that and some small copy/paste errors, if I
understood correctly.

-- 
Thomas Munro
http://www.enterprisedb.com


docs-suggestion.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Flexible configuration for full-text search

2017-11-05 Thread Thomas Munro
On Sat, Oct 21, 2017 at 1:39 AM, Aleksandr Parfenov
 wrote:
> In attachment updated patch with fixes of empty XML tags in
> documentation.

Hi Aleksandr,

I'm not sure if this is expected at this stage, but just in case you
aren't aware, with this version of the patch the binary upgrade test
in
src/bin/pg_dump/t/002_pg_dump.pl fails for me:

#   Failed test 'binary_upgrade: dumps ALTER TEXT SEARCH CONFIGURATION
dump_test.alt_ts_conf1 ...'
#   at t/002_pg_dump.pl line 6715.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Add ALWAYS DEFERRED option for constraints

2017-11-05 Thread Thomas Munro
On Fri, Oct 20, 2017 at 9:05 AM, Nico Williams  wrote:
> Rebased (there were conflicts in the SGML files).

Hi Nico

FYI that version has some stray absolute paths in constraints.source:

-COPY COPY_TBL FROM '@abs_srcdir@/data/constro.data';
+COPY COPY_TBL FROM '/home/nico/ws/postgres/src/test/regress/data/constro.data';

-COPY COPY_TBL FROM '@abs_srcdir@/data/constrf.data';
+COPY COPY_TBL FROM '/home/nico/ws/postgres/src/test/regress/data/constrf.data';

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Overestimated filter cost and its mitigation

2017-11-05 Thread Thomas Munro
On Mon, Sep 11, 2017 at 7:43 PM, Yuto Hayamizu  wrote:
> Suppose that there are three qual clauses in a scan node, current
> postgres estimates per-tuple cost of the filter to be:
>cost(A) + cost(B) + cost(C)
>
> And basic idea of the attached patch is:
>cost(A) + clauselist_selectivity({A}) * cost(B) +
> clauselist_selectivity({A, B}) * cost(C)

I am no planner expert and I haven't tested or studied the patch in
detail, but here is some feedback for what it's worth.

This idea seems to makes intuitive sense.  I see that you use
order_qual_clauses() to know what order they'll run in, so I'm
wondering if there is any reason we shouldn't do it up front and keep
it during path building, instead of running it again at plan creation
time.  Is there some way it could ever produce a different result at
the two times?  Why not also apply this logic to qpquals of joins,
foreign scans, subplans?  That is, instead of replacing cost_qual_eval
with this code for baserels, I wonder if we should teach
cost_qual_eval how to do this so those other users could also benefit
(having perhaps already ordered the qual clauses earlier).

This is one of those patches that risks having an impact on many query
plans.  Yikes.  Of all the regression test queries, only
updatable_views complained though, and that involves leakproof
functions.  I see that those get some kind of special treatment in
order_qual_clauses().

+ ordered_clauses = order_qual_clauses(root, rel->baserestrictinfo);
+ clause_list = ordered_clauses;

Is clause_list necessary?  Can't you just use ordered_clauses for the
outer and inner loops?

+ List *clause_list_for_sel = NULL;

The convention is to use NIL for empty lists (a nod to the Lisp
machine they prototyped this project on).

+ /* Make a temporary clause list for selectivity calcuation */

s/calcuation/calculation/

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Secondary index access optimizations

2017-11-05 Thread Thomas Munro
On Fri, Sep 8, 2017 at 3:58 AM, Konstantin Knizhnik
 wrote:
> Updated version of the patch is attached to this mail.
> Also I added support of date type to operator_predicate_proof to be able to
> imply (logdate <= '2017-03-31') from (logdate < '2017-04-01') .

Hi Konstantin,

Is there any reason why you don't want to split this into two separate
proposals?  One for remove_restrictions_implied_by_constraints() and
one for the operator_predicate_proof() changes.

Your v3 patch breaks the new partition_join test (the recently
committed partition-wise join stuff), as far as I can tell in a good
way.  Can you please double check those changes and post an updated
patch?

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Restricting maximum keep segments by repslots

2017-11-05 Thread Thomas Munro
On Tue, Oct 31, 2017 at 10:43 PM, Kyotaro HORIGUCHI
 wrote:
> Hello, this is a rebased version.

Hello Horiguchi-san,

I think the "ddl" test under contrib/test_decoding also needs to be
updated because it looks at pg_replication_slots and doesn't expect
your new columns.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers



Re: [HACKERS] LDAPS

2017-11-03 Thread Thomas Munro
On Sat, Nov 4, 2017 at 2:05 AM, Thomas Munro
 wrote:
> That
> said, I've only tested the attached lightly on FreeBSD + OpenLDAP and
> don't know if it'll work elsewhere.

Oops, that version's TAP test was a little too dependent on my
system's ldap.conf file.  Here's a version that sets the LDAPCONF env
var to fix that.

-- 
Thomas Munro
http://www.enterprisedb.com


ldaps-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] LDAPS

2017-11-03 Thread Thomas Munro
Hi hackers,

I've run into a few requests for $SUBJECT in the field.  I understand
that this is a bit controversial: LDAP + StartTLS (what we already
support) is better than LDAPS because it's a proper standard, and LDAP
auth in general is not as good as some other authentication methods
that we should be encouraging.  I don't actually have a strong view on
whether we should support it myself, but I took a swing at it to see
if there would be any technical obstacles.  I did not find any.  That
said, I've only tested the attached lightly on FreeBSD + OpenLDAP and
don't know if it'll work elsewhere.  Thoughts?

-- 
Thomas Munro
http://www.enterprisedb.com


ldaps-v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] LDAP URI decoding bugs

2017-11-03 Thread Thomas Munro
Hi hackers,

1.  If you set up a pg_hba.conf with a URL that lacks a base DN or
hostname, hba.c will segfault on startup when it tries to pstrdup a
null pointer.  Examples: ldapurl="ldap://localhost"; and
ldapurl="ldap://";.

2.  If we fail to bind but have no binddn configured, we'll pass NULL
to ereport (snprint?) for %s, which segfaults on some libc
implementations.  That crash requires more effort to reproduce but you
can see pretty clearly a few lines above in auth.c that it can be
NULL.  (I'm surprised Coverity didn't complain about that.  Maybe it
can't see this code due to macros.)

Please see attached.

-- 
Thomas Munro
http://www.enterprisedb.com


ldap-fixes.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-11-02 Thread Thomas Munro
On Fri, Nov 3, 2017 at 2:24 PM, Peter Geoghegan  wrote:
> Thomas Munro  wrote:
>> That way you don't have to opt in to BufFile's
>> double buffering and segmentation schemes just to get shared file
>> clean-up, if for some reason you want direct file handles.
>
> Is that something that you really think is possible?

It's pretty far fetched, but maybe shared temporary relation files
accessed via smgr.c/md.c?  Or maybe future things that don't want to
read/write through a buffer but instead want to mmap it.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-11-02 Thread Thomas Munro
On Wed, Nov 1, 2017 at 2:11 PM, Peter Geoghegan  wrote:
> On Tue, Oct 31, 2017 at 5:07 PM, Thomas Munro
>  wrote:
>> Another complaint is that perhaps fd.c
>> knows too much about buffile.c's business.  For example,
>> RemovePgTempFilesInDir() knows about the ".set" directories created by
>> buffile.c, which might be called a layering violation.  Perhaps the
>> set/directory logic should move entirely into fd.c, so you'd call
>> FileSetInit(FileSet *), not BufFileSetInit(BufFileSet *), and then
>> BufFileOpenShared() would take a FileSet *, not a BufFileSet *.
>> Thoughts?
>
> I'm going to make an item on my personal TODO list for that. No useful
> insights on that right now, though.

I decided to try that, but it didn't really work: fd.h gets included
by front-end code, so I can't very well define a struct and declare
functions that deal in dsm_segment and slock_t.  On the other hand it
does seem a bit better to for these shared file sets to work in terms
of File, not BufFile.  That way you don't have to opt in to BufFile's
double buffering and segmentation schemes just to get shared file
clean-up, if for some reason you want direct file handles.  So I in
the v24 parallel hash patch set I just posted over in the other thread
I have moved it into its own translation unit sharedfileset.c and made
it work with File objects.  buffile.c knows how to use it as a source
of segment files.  I think that's better.

> If the new standard is that you have temp file names that suggest the
> purpose of each temp file, then that may be something that parallel
> CREATE INDEX should buy into.

Yeah, I guess that could be useful.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Hash take II

2017-11-02 Thread Thomas Munro
On Mon, Oct 30, 2017 at 1:49 PM, Thomas Munro
 wrote:
> A couple of stupid things outstanding:
>
> 1.  EXPLAIN ANALYZE for Parallel Hash "actual" shows the complete row
> count, which is interesting to know (or not?  maybe I should show it
> somewhere else?), but the costing shows the partial row estimate used
> for costing purposes.

Fixed.

> 2.  The BufFileSet's temporary directory gets created even if you
> don't need it for batches.  Duh.

Fixed.

I also refactored shared temporary files a bit while looking into
this.  The shared file ownership mechanism is now promoted to its own
translation unit sharedfileset.c and it now works with fd.c files.
buffile.c can still make use of it.  That seems like a better division
of labour.

> 3.  I don't have a good query rescan regression query yet.  I wish I
> could write my own query plans to test the executor.

I found a query that rescans a parallel-aware hash join and added a
couple of variants to the regression tests.

I also decluttered the EXPLAIN ANALYZE output for enable_parallel_hash
= off a bit.

-- 
Thomas Munro
http://www.enterprisedb.com


parallel-hash-v24.patchset.tgz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] ucs_wcwidth vintage

2017-11-01 Thread Thomas Munro
Hi hackers,

src/backend/utils/mb/wchar.c contains a ~16 year old wcwidth
implementation that originally arrived in commit df4cba68, but the
upstream code[1] apparently continued evolving and there have been
more Unicode revisions since.  It probably doesn't matter much: the
observation made by Zr40 in the #postgresql IRC channel that lead me
to guess that this code might be responsible is that emojis screw up
psql's formatting, since current terminal emulators recognise them as
double-width but PostgreSQL doesn't.  Still, it's interesting that we
have artefacts deriving from various different frozen versions of the
Unicode standard in the source tree, and that might affect some proper
languages.

🤔

[1] http://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel tuplesort (for parallel B-Tree index creation)

2017-10-31 Thread Thomas Munro
' (inner
batch 3 of 8).  There is no need for there to be in any sort of
central registry for that though, because it rides on top of the
guarantees from 2 above: buffile.c will put those files into a
uniquely named directory, and that works as long as no one else is
allowed to create files or directories in the temp directory that
collide with its reserved pattern /^pgsql_tmp.+\.set$/.  For the same
reason, parallel CREATE INDEX is free to use worker numbers as BufFile
names, since it has its own BufFileSet to work within.

> * What's this all about?:
>
> + /* Accessor for the SharedBufFileSet that is at the end of Sharedsort. */
> + #define GetSharedBufFileSet(shared)\
> +   ((BufFileSet *) (&(shared)->tapes[(shared)->nTapes]))

In an earlier version, BufFileSet was one of those annoying data
structures with a FLEXIBLE_ARRAY_MEMBER that you'd use as an
incomplete type (declared but not defined in the includable header),
and here it was being used "inside" (or rather after) SharedSort,
which *itself* had a FLEXIBLE_ARRAY_MEMBER.  The reason for the
variable sized object was that I needed all backends to agree on the
set of temporary tablespace OIDs, of which there could be any number,
but I also needed a 'flat' (pointer-free) object I could stick in
relocatable shared memory.  In the newest version I changed that
flexible array to tablespaces[8], because 8 should be enough
tablespaces for anyone (TM).  I don't really believe anyone uses
temp_tablespaces for IO load balancing anymore and I hate code like
the above.  So I think Rushabh should now remove the above-quoted code
and just use a BufFileSet directly as a member of SharedSort.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Hash take II

2017-10-30 Thread Thomas Munro
On Tue, Aug 1, 2017 at 9:28 AM, Andres Freund  wrote:
> On 2017-07-26 20:12:56 +1200, Thomas Munro wrote:
>> I'll report on performance separately.
>
> Looking forward to that ;)

Here are some experimental results from a Xeon E5-2695 v3 with a ton
of RAM and spinning disks (EDB lab server "scylla").  I used TPC-H
dbgen scale 10 with the additional indexes suggested by Tomas
Vondra[1].  10GB of source data (= 23GB pgdata dir) is obviously quite
small as these things go, and I hope we'll run some of these with a
much larger scale soon (it's a big job), but it's enough to runs
queries for tens of seconds to minutes so it's definitely in parallel
query territory and shows some pretty interesting effects IMHO.

First, here's a stupid self-join as a warm-up.  The query is SELECT
COUNT(*) FROM lineitem r JOIN lineitem s USING (l_orderkey,
l_linenumber), where lineitem is ~60 million rows.

(1) With work_mem set sky-high so no batching is required, how much
speed-up does each worker contribute with this feature off (= same as
unpatched master) and on?  In this table, each cell shows the speed-up
compared to w=0 (no workers):

 parallel_hash |  w=0  |  w=1  |  w=2  |  w=3  |  w=4  |  w=5  |  w=6
---+---+---+---+---+---+---+---
 off   | 1.00x | 1.42x | 1.66x | 1.76x | 1.66x | 1.68x | 1.69x
 on| 1.00x | 1.76x | 2.54x | 3.21x | 3.80x | 4.30x | 4.79x

(2) With work_mem set to 128MB this query needs 32 batches.  Again,
how much speed-up does each worker contribute with the feature off and
on?

 parallel_hash |  w=0  |  w=1  |  w=2  |  w=3  |  w=4  |  w=5  |  w=6
---+---+---+---+---+---+---+---
 off   | 1.00x | 1.37x | 1.49x | 1.32x | 1.67x | 1.63x | 1.64x
 on| 1.00x | 1.94x | 2.72x | 2.82x | 3.02x | 4.64x | 5.98x

I haven't tried to grok the shape of that curve yet.  Interestingly
(not shown here) the 32 batch parallel hash actually starts to beat
the single-batch parallel hash somewhere around 5-6 workers, and at 15
workers it achieves 9.53x speed-up compared to w=0 and is still
gaining as you add more workers, whereas the single-batch version tops
out around 8 workers.  This may be in part due to the trade-offs
discussed in "Design and Evaluation of Main Memory Hash Join
Algorithms for Multi-core CPUs" (short version: partitioning up front
can pay off by reducing cache misses at various levels and some
research databases would consider that), but I would think we're
probably pretty far away from that frontier and there other probably
other more basic problems.  Investigation/profiling required.

Next, here are some numbers from the TPC-H queries.  I included only
queries where a Parallel Hash was selected by the planner.  I stopped
at w=6 because that's the highest number of workers the planner would
pick by default at that scale.  (If I'm getting the maths right, TPC-H
scale 300's lineitem table should inspire about 10 workers to get out
of bed; you get an extra worker each time a table triples in size.)

(3) What is the speed-up with enable_parallel_hash = on vs
enable_parallel_hash = off?  Here is the result table for various
numbers of workers, with work_mem set to 1GB.

 query |  w=0  |  w=1  |  w=2  |  w=3  |  w=4  |  w=5  |  w=6
---+---+---+---+---+---+---+---
 3 | 1.02x | 1.16x | 1.37x | 1.79x | 1.95x | 2.29x | 2.44x
 5 | 1.03x | 1.15x | 1.20x | 1.44x | 1.95x | 2.05x | 1.34x
 7 | 1.02x | 1.26x | 1.54x | 2.18x | 2.57x | 1.25x | 1.32x
 8 | 1.00x | 1.56x | 1.49x | 1.47x | 1.40x | 0.55x | 0.55x
 9 | 1.02x | 1.24x | 1.35x | 1.50x | 1.59x | 1.82x | 1.82x
10 | 1.02x | 1.16x | 1.19x | 1.44x | 1.51x | 1.75x | 1.83x
12 | 1.01x | 1.22x | 1.53x | 0.72x | 0.74x | 0.73x | 0.99x
14 | 1.00x | 1.08x | 1.18x | 1.33x | 1.41x | 1.54x | 1.52x
16 | 1.01x | 1.22x | 1.10x | 1.10x | 1.10x | 1.11x | 1.10x
18 | 0.99x | 1.07x | 1.05x | 0.99x | 0.99x | 0.99x | 1.03x
21 | 1.00x | 1.28x | 1.24x | 1.34x | 0.18x | 0.19x | 0.23x

Some commentary on the cases where the performance was apparently hurt
by the feature: for Q21 with w=3 workers and above with
enable_parallel_hash = off the planner switched from a hash join to a
nested loop and that turned out to be better, but with
enable_parallel_hash = on it didn't give up on hash join until there
were 6 workers.  Something similar happened with Q8 around 5 workers.
Q21 has some major cardinality estimation problems as discussed
elsewhere, and on this run I didn't think to apply the patch that
fixes (?) that.  In other words, as far as I can tell, all of those
are cases where there is possibly room for general planner improvement
outside this project: the point at which we flip from one plan type to
another moves around, not necessarily indicating a problem with
Parallel Hash as an executor node.  Tha

Re: [HACKERS] Parallel Hash take II

2017-10-29 Thread Thomas Munro
On Fri, Oct 27, 2017 at 12:24 AM, Rushabh Lathia
 wrote:
> While re-basing the parallel-B-tree-index-build patch on top v22 patch
> sets, found cosmetic review:

Thanks!

> 1) BufFileSetEstimate is removed but it's still into buffile.h
>
> +extern size_t BufFileSetEstimate(int stripes);

Fixed.

> 2) BufFileSetCreate is renamed with BufFileSetInit, but used at below
> place in comments:
>
> * Attach to a set of named BufFiles that was created with BufFileSetCreate.

Fixed.

Other minor tweaks: I fixed a harmless warning from Visual C++ and
added a CHECK_FOR_INTERRUPTS() to
ExecParallelHashJoinPartitionOuter()'s loop.

Two other changes:

1.  Improved concurrency for sharedtuplestore.c.

Last week I investigated some test failures on AppVeyor CI and
discovered a small problem with sharedtuplestore.c on Windows: it
could occasionally get ERROR_ACCESS_DENIED errors when attempting to
open files that were concurrently being unlinked (unlinking is not
atomic on Windows, see pgsql-bugs #14243 for another manifestation).
That code was a bit sloppy (though not incorrect), and was easy to fix
by doing some checks in a different order, but...

While hacking on that I realised that sharedtuplestore.c's parallel
scan efficiency was pretty terrible anyway, so I made an improvement
that I'd earlier threatened to make in a comment.  Instead of holding
a per-file lock while reading individual tuples, it now works in
page-multiple-sized chunks.  Readers only acquire a spinlock when they
need a new chunk, don't hold any locks while doing IO, and never read
overlapping pages.  From a user perspective, this means that EXPLAIN
(BUFFERS) temporary buffer read counts are approximately the same as
for the equivalent non-parallel hash join, because each worker reads a
disjoint set of temporary file pages instead of reading interleaved
tuples from the same pages, and there is no more LWLock "shared
tuplestore" that can show up in wait_event when backends pile up on
the same batch.  It writes very slightly more than it reads because of
unread pages at the end of the final chunk (because it reads back in
tuple-at-a-time and thus page-at-a-time, not whole chunk at a time --
I considered reading whole chunks and then returning pointer to
MinimalTuples in the chunk, but that required MAXALIGNing data in the
files on disk which made the files noticeably bigger).

So, to summarise, there are now three layers of contention avoidance
strategy being used by Parallel Hash Join for scanning batches in
parallel:

i)  Participants in a Parallel Hash Join try to work on different
batches so they avoid scanning the same SharedTuplestore completely.
That's visible with EXPLAIN ANALYZE as "Batches Probed" (that shows
the number of outer batches scanned; it doesn't seem worth the pixels
to show "Batches Loaded" for the number of inner batches scanned which
may be lower).

ii)  When that's not possible, they start out reading from different
backing files by starting with the one they wrote themselves and then
go around the full set.  That means they don't contend on the per-file
read-head lock until a reader drains a whole file and then choses
another one that's still being read by someone else.

iii)  [New in this version] Since they might still finish up reading
from the same file (and often do towards the end of a join), the
tuples are chopped into multi-page chunks and participants are
allocated chunks using a spinlock-protected counter.  This is quite a
lot like Parallel Sequential Scan, with some extra complications due
to variable sized chunks.

2.  Improved oversized tuple handling.

I added a new regression test case to exercise the oversized tuple
support in ExecParallelHashLoadTuple() and sts_puttuple(), as
requested by Andres a while back.  (Thanks to Andrew Gierth for a
pointer on how to get detoasted tuples into a hash join table without
disabling parallelism.)  While testing that I realised that my
defences against some degenerate behaviour with very small work_mem
weren't quite good enough.  Previously, I adjusted space_allowed to
make sure every backend could allocate at least one memory chunk
without triggering an increase in the number of batches.  Now, I leave
space_allowed alone but explicitly allow every backend to allocate at
least one chunk ignoring the memory budget (whether regular chunk size
or oversized tuple size), to avoid futile repeated batch increases
when a single monster tuple is never going to fit in work_mem.

A couple of stupid things outstanding:

1.  EXPLAIN ANALYZE for Parallel Hash "actual" shows the complete row
count, which is interesting to know (or not?  maybe I should show it
somewhere else?), but the costing shows the partial row estimate used
for costing purposes.
2.  The BufFileSet's temporary directory gets created even if you
don't need it for batches.  Duh.
3.  I d

Re: [HACKERS] Causal reads take II

2017-10-27 Thread Thomas Munro
On Sun, Oct 1, 2017 at 10:03 AM, Thomas Munro
 wrote:
>> I tried few more times, and I've got it two times from four attempts on a
>> fresh
>> installation (when all instances were on the same machine). But anyway I'll
>> try
>> to investigate, maybe it has something to do with my environment.
>>
>> ...
>>
>> 2017-09-30 15:47:55.154 CEST [6030] LOG:  invalid magic number  in log
>> segment 00010020, offset 10092544
>
> Hi Dmitry,
>
> Thanks for testing.  Yeah, it looks like the patch may be corrupting
> the WAL stream in some case that I didn't hit in my own testing
> procedure.  I will try to reproduce these failures.

Hi Dmitry,

I managed to reproduce something like this on one of my home lab
machines running a different OS.  Not sure why yet and it doesn't
happen on my primary development box which is how I hadn't noticed it.
I will investigate and aim to get a fix posted in time for the
Commitfest.  I'm also hoping to corner Simon at PGDay Australia in a
couple of weeks to discuss this proposal...

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Continuous integration on Windows?

2017-10-25 Thread Thomas Munro
On Sat, Oct 14, 2017 at 3:08 AM, Andrew Dunstan
 wrote:
> I'll add some info to the wiki. Unfortunately, the tests fail on the
> tablespace test because they are running as a privileged user. We need
> to find a way around that, still looking into it. (See
> <https://blog.2ndquadrant.com/setting-build-machine-visual-studio-2017/>
> which describes how I get around that when running by hand).

Thanks for that pointer.  "runas" seems to be no good in batch files
since it asks for a password interactively, but I managed to get the
tablespace test to pass by running it like this:

test_script:
  - net user testuser password1234! /add
  - psexec.exe -u testuser -p password1234! -w
c:\projects\postgres\src\tools\msvc perl.exe vcregress.pl check

(It probably goes without saying but I'll say it anyway for anyone who
might find this: this plaintext-password-in-script-files stuff is
intended for use on self-destructing isolated build bot images only
and should never be done on a computer you care about.)

Hooray!  Now I can go and figure out why my Parallel Hash regression
test is failing with file permissions problems on Windows...

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Hash take II

2017-10-24 Thread Thomas Munro
On Tue, Oct 24, 2017 at 10:10 PM, Thomas Munro
 wrote:
> Here is an updated patch set that does that ^.

It's a bit hard to understand what's going on with the v21 patch set I
posted yesterday because EXPLAIN ANALYZE doesn't tell you anything
interesting.  Also, if you apply the multiplex_gather patch[1] I
posted recently and set multiplex_gather to off then it doesn't tell
you anything at all, because the leader has no hash table (I suppose
that could happen with unpatched master given sufficiently bad
timing).  Here's a new version with an extra patch that adds some
basic information about load balancing to EXPLAIN ANALYZE, inspired by
what commit bf11e7ee did for Sort.

Example output:

enable_parallel_hash = on, multiplex_gather = on:

 ->  Parallel Hash (actual rows=100 loops=3)
   Buckets: 131072  Batches: 16
   Leader:Shared Memory Usage: 3552kB  Hashed: 396120  Batches Probed: 7
   Worker 0:  Shared Memory Usage: 3552kB  Hashed: 276640  Batches Probed: 6
   Worker 1:  Shared Memory Usage: 3552kB  Hashed: 327240  Batches Probed: 6
   ->  Parallel Seq Scan on simple s (actual rows=33 loops=3)

 ->  Parallel Hash (actual rows=1000 loops=8)
   Buckets: 131072  Batches: 256
   Leader:Shared Memory Usage: 2688kB  Hashed: 1347720
Batches Probed: 36
   Worker 0:  Shared Memory Usage: 2688kB  Hashed: 1131360
Batches Probed: 33
   Worker 1:  Shared Memory Usage: 2688kB  Hashed: 1123560
Batches Probed: 38
   Worker 2:  Shared Memory Usage: 2688kB  Hashed: 1231920
Batches Probed: 38
   Worker 3:  Shared Memory Usage: 2688kB  Hashed: 1272720
Batches Probed: 34
   Worker 4:  Shared Memory Usage: 2688kB  Hashed: 1234800
Batches Probed: 33
   Worker 5:  Shared Memory Usage: 2688kB  Hashed: 1294680
Batches Probed: 37
   Worker 6:  Shared Memory Usage: 2688kB  Hashed: 1363240
Batches Probed: 35
   ->  Parallel Seq Scan on big s2 (actual rows=125 loops=8)

enable_parallel_hash = on, multiplex_gather = off (ie no leader participation):

 ->  Parallel Hash (actual rows=100 loops=2)
   Buckets: 131072  Batches: 16
   Worker 0:  Shared Memory Usage: 3520kB  Hashed: 475920  Batches Probed: 9
   Worker 1:  Shared Memory Usage: 3520kB  Hashed: 524080  Batches Probed: 8
   ->  Parallel Seq Scan on simple s (actual rows=50 loops=2)

enable_parallel_hash = off, multiplex_gather = on:

 ->  Hash (actual rows=100 loops=3)
   Buckets: 131072  Batches: 16
   Leader:Memory Usage: 3227kB
   Worker 0:  Memory Usage: 3227kB
   Worker 1:  Memory Usage: 3227kB
   ->  Seq Scan on simple s (actual rows=100 loops=3)

enable_parallel_hash = off, multiplex_gather = off:

 ->  Hash (actual rows=100 loops=2)
   Buckets: 131072  Batches: 16
   Worker 0:  Memory Usage: 3227kB
   Worker 1:  Memory Usage: 3227kB
   ->  Seq Scan on simple s (actual rows=100 loops=2)

parallelism disabled (traditional single-line output, unchanged):

 ->  Hash (actual rows=100 loops=1)
   Buckets: 131072  Batches: 16  Memory Usage: 3227kB
   ->  Seq Scan on simple s (actual rows=100 loops=1)

(It actually says "Tuples Hashed", not "Hashed" but I edited the above
to fit on a standard punchcard.)  Thoughts?

[1] 
https://www.postgresql.org/message-id/CAEepm%3D2U%2B%2BLp3bNTv2Bv_kkr5NE2pOyHhxU%3DG0YTa4ZhSYhHiw%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com


parallel-hash-v22.patchset.tgz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Hash take II

2017-10-24 Thread Thomas Munro
On Tue, Sep 19, 2017 at 8:06 AM, Robert Haas  wrote:
> On Thu, Sep 14, 2017 at 10:01 AM, Thomas Munro
>  wrote:
>> 3.  Gather Merge and Parallel Hash Join may have a deadlock problem.
>
> [...]
>
> Thomas and I spent about an hour and a half brainstorming about this
> just now.
>
> [...]
>
> First, as long as nbatches == 1, we can use a hash
> table of up to size (participants * work_mem); if we have to switch to
> multiple batches, then just increase the number of batches enough that
> the current memory usage drops below work_mem.  Second, following an
> idea originally by Ashutosh Bapat whose relevance to this issue Thomas
> Munro realized during our discussion, we can make all the batches
> small enough to fit in work_mem (rather than participants * work_mem
> as the current patch does) and spread them across the workers (in the
> style of Parallel Append, including potentially deploying multiple
> workers against the same batch if there are fewer batches than
> workers).

Here is an updated patch set that does that ^.

Assorted details:

1.  To avoid deadlocks, we can only wait at barriers when we know that
all other attached backends have either arrived already or are
actively executing the code preceding the barrier wait, so that
progress is guaranteed.  The rules is that executor nodes can remain
attached to a barrier after they've emitted a tuple, which is useful
for resource management (ie avoids inventing a separate reference
counting scheme), but must never again wait for it.  With that
programming rule there can be no deadlock between executor nodes.

2.  Multiple batches are processed at the same time in parallel,
rather than being processed serially.  Participants try to spread
themselves out over different batches to reduce contention.

3.  I no longer try to handle outer joins.  I have an idea for how to
do that while adhering to the above deadlock-avoidance programming
rule, but I would like to consider that for a later patch.

4.  I moved most of the parallel-aware code into ExecParallelHash*()
functions that exist alongside the private hash table versions.  This
avoids uglifying the existing hash join code and introducing
conditional branches all over the place, as requested by Andres.  This
made some of the earlier refactoring patches unnecessary.

5.  Inner batch repartitioning, if required, is now completed up front
for Parallel Hash.  Rather than waiting until we try to load hash
tables back into memory to discover that they don't fit, this version
tracks the size of hash table contents while writing the batches out.
That change has various pros and cons, but its main purpose is to
remove dependencies between batches.

6.  Outer batch repartitioning is now done up front too, if it's
necessary.  This removes the dependencies that exist between batch 0
and later batches, but means that outer batch 0 is now written to disk
if for multi-batch joins.  I don't see any way to avoid that while
adhering to the deadlock avoidance rule stated above.  If we've
already started emitting tuples for batch 0 (by returning control to
higher nodes) then we have no deadlock-free way to wait for the scan
of the outer relation to finish, so we can't safely process later
batches.  Therefore we must buffer batch 0's tuples.  This renders the
skew optimisation useless.

7.  There is now some book-keeping state for each batch.  For typical
cases the total space used is negligible but obviously you can
contrive degenerate cases where it eats a lot of memory (say by
creating a million batches, which is unlikely to work well for other
reasons).  I have some ideas on reducing their size, but on the other
hand I also have some ideas on increasing it profitably... (this is
the perfect place to put the Bloom filters discussed elsewhere that
would  make up for the loss of the skew optimisation, for selective
joins; a subject for another day).

8.  Barrier API extended slightly.  (1) BarrierWait() is renamed to
BarrierArriveAndWait().  (2) BarrierArriveAndDetach() is new.  The new
function is the mechanism required to get from PHJ_BATCH_PROBING to
PHJ_BATCH_DONE without waiting, and corresponds to the operation known
as Phaser.arriveAndDeregister() in Java (and maybe also
barrier::arrive_and_drop() in the C++ concurrency TS and Clock.drop()
in X10, not sure).

9.  I got rid of PARALLEL_KEY_EXECUTOR_NODE_NTH().  Previously I
wanted more than one reserved smh_toc key per executor node.  Robert
didn't like that.

10.  I got rid of "LeaderGate".  That earlier attempt at deadlock
avoidance clearly missed the mark.  (I think it probably defended
against the Gather + full TupleQueue deadlock but not the
GatherMerge-induced deadlock so it was a useless non-general
solution.)

11.  The previous incarnation of SharedTuplestore had a built-in
concept of partitions, which allowed the number of partitions to

Re: [HACKERS] Block level parallel vacuum WIP

2017-10-21 Thread Thomas Munro
On Sun, Oct 22, 2017 at 5:09 AM, Robert Haas  wrote:
> On Tue, Sep 19, 2017 at 3:31 AM, Masahiko Sawada  
> wrote:
>>> Down at the bottom of the build log in the regression diffs file you can 
>>> see:
>>>
>>> ! ERROR: cache lookup failed for relation 32893
>>>
>>> https://travis-ci.org/postgresql-cfbot/postgresql/builds/277165907
>>
>> Thank you for letting me know.
>>
>> Hmm, it's an interesting failure. I'll investigate it and post the new patch.
>
> Did you ever find out what the cause of this problem was?

I wonder if it might have been the same issue that commit
19de0ab23ccba12567c18640f00b49f01471018d fixed a week or so later.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] A GUC to prevent leader processes from running subplans?

2017-10-17 Thread Thomas Munro
Hi hackers,

While testing parallelism work I've wanted to be able to prevent
gather nodes from running the plan in the leader process, and I've
heard others say the same.  One way would be to add a GUC
"multiplex_gather", like in the attached patch.  If you set it to off,
Gather and Gather Merge won't run the subplan unless they have to
because no workers could be launched.  I thought about adding a new
value for force_parallel_mode instead, but someone mentioned they
might want to do this on a production system too and
force_parallel_mode is not really for end users.  Better ideas?

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Add-a-GUC-to-control-whether-Gather-runs-subplans.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] EXPLAIN (ANALYZE, BUFFERS) reports bogus temporary buffer reads

2017-10-16 Thread Thomas Munro
Hi hackers,

Vik Fearing asked off-list why hash joins appear to read slightly more
temporary data than they write.  The reason is that we notch up a
phantom block read when we hit the end of each file.  Harmless but it
looks a bit weird and it's easily fixed.

Unpatched, a 16 batch hash join reports that we read 30 more blocks
than we wrote (2 per batch after the first, as expected):

   Buffers: shared hit=434 read=16234, temp read=5532 written=5502

With the attached patch:

   Buffers: shared hit=547 read=16121, temp read=5502 written=5502

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Don-t-count-EOF-as-a-temporary-buffer-read-in-EXPLAI.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] oversight in EphemeralNamedRelation support

2017-10-15 Thread Thomas Munro
On Fri, Oct 13, 2017 at 10:01 AM, Tom Lane  wrote:
> But I see very
> little case for allowing CTEs to capture such references, because surely
> we are never going to allow that to do anything useful, and we have
> several years of precedent now that they don't capture.

For what it's worth, SQL Server allows DML in CTEs like us but went
the other way on this.  Not only are its CTEs in scope as DML targets,
it actually lets you update them in cases where a view would be
updatable, rewriting as base table updates.  I'm not suggesting that
we should do that too (unless of course it shows up in a future
standard), just pointing it out as a curiosity.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Continuous integration on Windows?

2017-10-13 Thread Thomas Munro
On Sat, Oct 14, 2017 at 11:27 AM, Thomas Munro
 wrote:
> On Sat, Oct 14, 2017 at 7:47 AM, legrand legrand
>  wrote:
>> Is it stored somewhere to permit to users like me
>> that want to test pg 10 on windows
>> without having to build it ?

Erm, wait, you said pg 10.  That's already released.  This thread is
about building and automatically testing development code on Windows.
If it's pre-built releases of PostgreSQL you're looking for, they're over
here:  https://www.postgresql.org/download/windows/

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Continuous integration on Windows?

2017-10-13 Thread Thomas Munro
On Sat, Oct 14, 2017 at 7:47 AM, legrand legrand
 wrote:
> This may seems obvious for you
>
> but where is the build result ?

Each CI platform has a web page corresponding to your
GitHub/BitBucket/... user account that lists builds results.  You can
also get notifications by various means including email if there is a
failure.  Here's a randomly selected example build log:

https://travis-ci.org/postgresql-cfbot/postgresql/builds/287749152

That particular build happened  because a cronjob of mine pushed a
Commitfest patch into a branch on github, and I maintain plenty more
branches here:

https://github.com/postgresql-cfbot/postgresql/branches

Here are the resulting builds:

https://travis-ci.org/postgresql-cfbot/postgresql/branches

Here's a build from a personal development branch of my own, this time
on AppVeyor:

https://ci.appveyor.com/project/macdice/postgres/build/1.0.3

As Andrew mentioned, AppVeyor builds using his appveyor.yml with the
addition of test_script that I showed above current fail in "test
tablespace" for some reason that we'll need to sort out, and as I
mentioned you can't yet see the regressions.diff output etc... but
it's a start.  This is actually already useful for me, because I
changed a bunch of temporarily file cleanup code and I wanted to
confirm that it would work on Window.  That's being exercised in one
of the tests.  So far so good.

> Is it stored somewhere to permit to users like me
> that want to test pg 10 on windows
> without having to build it ?

That's the idea.  It's quite similar to the build farm, except that it
tests your stuff *before* it gets committed to master and everyone
shouts at you, and can also be used to test strange experiments and
theories without disturbing anyone else.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Continuous integration on Windows?

2017-10-13 Thread Thomas Munro
On Fri, Oct 13, 2017 at 1:42 PM, Andrew Dunstan
 wrote:
> Well, as you can see here the appveyor.yml file can live outside the
> tree that's being built.

Here's a Wiki page where I hope we can collect how-to information on
this general topic:

https://wiki.postgresql.org/wiki/Continuous_Integration

I tried your appveyor.yml, and added:

test_script:
  - cd src\tools\msvc && vcregress check

That much I could figure out just by reading our manual and I could
see that it worked first time, but to make this really useful I guess
we'd have to teach it to dump out resulting regression.diffs files and
backtraces from core files (as the Travis CI version accessible from
that page does).

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Log LDAP "diagnostic messages"?

2017-10-12 Thread Thomas Munro
On Fri, Oct 13, 2017 at 3:59 PM, Peter Eisentraut
 wrote:
> On 9/24/17 07:00, Thomas Munro wrote:
>> Fair point.  In that case there are a few others we should consider
>> moving down too for consistency, like in the attached.
>
>> Thanks, that is much tidier.  Done that way in the attached.
>>
>> Here also is a small addition to your TAP test which exercises the
>> non-NULL code path because slapd rejects TLS by default with a
>> diagnostic message.  I'm not sure if this is worth adding, since it
>> doesn't actually verify that the code path is reached (though you can
>> see that it is from the logs).
>
> Committed.

Thanks, and thanks also for follow-up commit 7d1b8e75.  Looks like the
new macro arrived in OpenLDAP 2.4 but RHEL5 shipped with 2.3.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] oversight in EphemeralNamedRelation support

2017-10-12 Thread Thomas Munro
On Fri, Oct 13, 2017 at 12:46 PM, Tom Lane  wrote:
> Thomas Munro  writes:
>> On Fri, Oct 13, 2017 at 10:01 AM, Tom Lane  wrote:
>>> The CTE was simply not part of the available namespace for the INSERT's
>>> target, so it found the regular table instead.  v10 has thus broken
>>> cases that used to work.  I think that's a bug.
>
>> Hmm.  Yeah.  I have to say it's a bit surprising that the following
>> refers to two different objects:
>>   with mytable as (select 1 as x) insert into mytable select * from mytable
>
> Yeah, I agree --- personally I'd never write a query like that.  But
> the fact that somebody ran into it when v10 has been out for barely
> a week suggests that people are doing it.

Not exactly -- Julien's bug report was about a *qualified* reference
being incorrectly rejected.

>>> I think we need to either remove that call from setTargetTable entirely,
>>> or maybe adjust it so it can only find ENRs not CTEs.
>
>> I think it'd be better to find and reject ENRs only.  The alternative
>> would be to make ENRs invisible to DML, which seems arbitrary and
>> weird (even though it might be more consistent with our traditional
>> treatment of CTEs).  One handwavy reason I'd like them to remain
>> visible to DML (and be rejected) is that I think they're a bit like
>> table variables (see SQL Server), and someone might eventually want to
>> teach them, or something like them, how to be writable.
>
> Yeah, that would be the argument for making them visible.  I'm not
> sure how likely it is that we'll ever actually have writable ENRs,
> but I won't stand in the way if you want to do it like that.

I hope so :-)  I might be a nice way to get cheap locally scoped
temporary tables, among other things.

Okay, here's Julien's patch tweaked to reject just the ENR case.  This
takes us back to the 9.6 behaviour where CTEs don't hide tables in
this context.  I also removed the schema qualification in his
regression test so we don't break that again.  This way, his query
from the first message in the thread works with the schema
qualification (the bug he reported) and without it (the bug or at
least incompatible change from 9.6 you discovered).

I considered testing for a NULL return from parserOpenTable() instead
of the way the patch has it, since parserOpenTable() already had an
explicit test for ENRs, but its coding would give preference to an
unqualified table of the same name.  I considered moving the test for
an ENR match higher up in parserOpenTable(), and that might have been
OK, but then I realised no code in the tree actually tests for its
undocumented NULL return value anyway.  I think that NULL-returning
branch is dead code, and all tests pass without it.  Shouldn't we just
remove it, as in the attached?

I renamed the ENR used in plpgsql.sql's
transition_table_level2_bad_usage_func() and with.sql's "sane
response" test, because they both confused matters by using an ENR
with the name "d" which is also the name of an existing table.  For
example, if you start with unpatched master, rename
transition_table_level2_bad_usage_func()'s ENR to "dx" and simply
remove the check for ENRs from setTargetTable() as you originally
suggested, you'll get a segfault because the NULL return from
parserOpenTable() wasn't checked.  If you leave
transition_table_level2_bad_usage_func()'s ENR name as "d" it'll
quietly access the wrong table instead, which is misleading.

-- 
Thomas Munro
http://www.enterprisedb.com
diff --git a/src/backend/parser/parse_clause.c 
b/src/backend/parser/parse_clause.c
index 9ff80b8b40..2d56a07774 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -184,9 +184,12 @@ setTargetTable(ParseState *pstate, RangeVar *relation,
RangeTblEntry *rte;
int rtindex;
 
-   /* So far special relations are immutable; so they cannot be targets. */
-   rte = getRTEForSpecialRelationTypes(pstate, relation);
-   if (rte != NULL)
+   /*
+* ENRs hide tables of the same name, so we need to check for them 
first.
+* In contrast, CTEs don't hide tables.
+*/
+   if (relation->schemaname == NULL &&
+   scanNameSpaceForENR(pstate, relation->relname))
ereport(ERROR,
(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 errmsg("relation \"%s\" cannot be the target 
of a modifying statement",
@@ -1072,6 +1075,12 @@ transformRangeTableSample(ParseState *pstate, 
RangeTableSample *rts)
 }
 
 
+/*
+ * getRTEForSpecialRelationTypes
+ *
+ * If given RangeVar if a CTE reference or

Re: [HACKERS] oversight in EphemeralNamedRelation support

2017-10-12 Thread Thomas Munro
On Fri, Oct 13, 2017 at 10:01 AM, Tom Lane  wrote:
> Thomas Munro  writes:
>> Before that, CTE used as modify targets produced a different error message:
>
>> postgres=# WITH d AS (SELECT 42) INSERT INTO d VALUES (1);
>> ERROR:  relation "d" does not exist
>> LINE 1: WITH d AS (SELECT 42) INSERT INTO d VALUES (1);
>>   ^
>
> Well, I think that is a poorly chosen example.  Consider this instead:
> pre-v10, you could do this:
>
> regression=# create table mytable (f1 int);
> CREATE TABLE
> regression=# with mytable as (select 1 as x) insert into mytable values(1);
> INSERT 0 1
> regression=# select * from mytable;
>  f1
> 
>   1
> (1 row)
>
> The CTE was simply not part of the available namespace for the INSERT's
> target, so it found the regular table instead.  v10 has thus broken
> cases that used to work.  I think that's a bug.

Hmm.  Yeah.  I have to say it's a bit surprising that the following
refers to two different objects:

  with mytable as (select 1 as x) insert into mytable select * from mytable

Obviously the spec is useless here since this is non-standard (at a
guess they'd probably require a qualifier there to avoid parsing as a
 if they allowed DML after ).  As you said
it's worked like that for several releases, so whatever I might think
about someone who deliberately writes such queries, the precedent
probably trumps naive notions about WITH creating a single consistent
lexical scope.

> There may or may not be a case for allowing ENRs to capture names that
> would otherwise refer to ordinary tables; I'm not sure.  But I see very
> little case for allowing CTEs to capture such references, because surely
> we are never going to allow that to do anything useful, and we have
> several years of precedent now that they don't capture.
>
> I think we need to either remove that call from setTargetTable entirely,
> or maybe adjust it so it can only find ENRs not CTEs.

I think it'd be better to find and reject ENRs only.  The alternative
would be to make ENRs invisible to DML, which seems arbitrary and
weird (even though it might be more consistent with our traditional
treatment of CTEs).  One handwavy reason I'd like them to remain
visible to DML (and be rejected) is that I think they're a bit like
table variables (see SQL Server), and someone might eventually want to
teach them, or something like them, how to be writable.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Continuous integration on Windows?

2017-10-12 Thread Thomas Munro
On Fri, Oct 13, 2017 at 10:57 AM, Andrew Dunstan
 wrote:
> Actually, that didn't take too long.
>
> No testing yet, but this runs a build successfully:
> <https://gist.github.com/adunstan/7f18e5db33bb2d73f69ff8c9337a4e6c>
>
> See results at <https://ci.appveyor.com/project/AndrewDunstan/pgdevel>

Excellent!  Thanks for looking at this, Andrew.  It's going to be
really useful for removing surprises.

It would be nice to finish up with a little library of control files
like this for different CI vendors, possibly with some different
options (different compilers, different word size, add valgrind, ...).
I don't know if it would ever make sense to have standardised CI
control files in the tree -- many projects do -- but it's very easy to
carry a commit that adds them in development branches but drop it as
part of the format-patch-and-post process.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] oversight in EphemeralNamedRelation support

2017-10-12 Thread Thomas Munro
On Fri, Oct 13, 2017 at 8:50 AM, Tom Lane  wrote:
> Julien Rouhaud  writes:
>> On Mon, Oct 9, 2017 at 10:43 PM, Thomas Munro
>>  wrote:
>>> I suppose we could consider moving the schemaname check into
>>> getRTEForSpecialRelationType(), since otherwise both callers need to
>>> do that (and as you discovered, one forgot).
>
>> Thanks for the feedback.  That was my first idea, but I assumed there
>> could be future use for this function on qualified RangeVar if it
>> wasn't done this way.
>
>> I agree it'd be much safer, so v2 attached, check moved in
>> getRTEForSpecialRelationType().
>
> Hm.  I actually think the bug here is that 18ce3a4ab introduced
> anything into setTargetTable at all.  There was never previously
> any assumption that the target could be anything but a regular
> table, so we just ignored CTEs there, and I do not think the
> new behavior is an improvement.
>
> So my proposal is to rip out the getRTEForSpecialRelationTypes
> check there.  I tend to agree that getRTEForSpecialRelationTypes
> should probably contain an explicit check for unqualified name
> rather than relying on its caller ... but that's a matter of
> future-proofing not a bug fix.

That check arrived in v11 revision of the patch:

https://www.postgresql.org/message-id/CACjxUsPfUUa813oDvJRx2wuiqHXO3VsCLQzcuy0r%3DUEyS-xOjQ%40mail.gmail.com

Before that, CTE used as modify targets produced a different error message:

postgres=# WITH d AS (SELECT 42) INSERT INTO d VALUES (1);
ERROR:  relation "d" does not exist
LINE 1: WITH d AS (SELECT 42) INSERT INTO d VALUES (1);
  ^

... but ENRs used like that caused a crash.  The change to
setTargetTable() went in to prevent that (and improved the CTE case's
error message semi-incidentally).  To take out we'll need a new check
somewhere else to prevent that.  Where?

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Continuous integration on Windows?

2017-10-11 Thread Thomas Munro
Hi hackers,

I don't use Windows myself, but I'd rather avoid submitting patches
that fail to build, build with horrible warnings or blow up on that
fine operating system.  I think it would be neat to be able to have
experimental branches of PostgreSQL built and tested on Windows
automatically just by pushing them to a watched public git repo.  Just
like Travis CI and several others do for the major GNU/Linux distros,
it seems there is at least one Windows-based CI company that
generously offers free testing to open source projects:
https://www.appveyor.com (hat tip to Ilmari for the pointer).  I
wonder... has anyone here with Microsoft know-how ever tried to
produce an appveyor.yml file that would do a MSVC build and
check-world?

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] oversight in EphemeralNamedRelation support

2017-10-09 Thread Thomas Munro
On Tue, Oct 10, 2017 at 2:35 AM, Julien Rouhaud  wrote:
> Hugo Mercier (in Cc) reported me an error in a query, failing since pg10.
>
> Simple test case to reproduce:
>
> CREATE TABLE public.test (id integer);
> WITH test AS (select 42) INSERT INTO public.test SELECT * FROM test;
>
> which will fail with "relation "test" cannot be the target of a
> modifying statement".
>
> IIUC, that's an oversight in 18ce3a4ab22, where setTargetTable()
> doesn't exclude qualified relation when searching for special
> relation.

I agree.

> PFA a simple patch to fix this issue, with updated regression test.

Thanks!

I suppose we could consider moving the schemaname check into
getRTEForSpecialRelationType(), since otherwise both callers need to
do that (and as you discovered, one forgot).

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Hash take II

2017-10-05 Thread Thomas Munro
On Thu, Oct 5, 2017 at 7:07 PM, Rushabh Lathia  wrote:
> v20 patch set (I was trying 0008, 0009 patch)  not getting cleanly apply on
> latest commit also getting compilation error due to refactor in below
> commit.
>
> commit 0c5803b450e0cc29b3527df3f352e6f18a038cc6

Hi Rushabh

I am about to post a new patch set that fixes the deadlock problem,
but in the meantime here is a rebase of those two patches (numbers
changed to 0006 + 0007).  In the next version I think I should
probably remove that 'stripe' concept.  The idea was to spread
temporary files over the available temporary tablespaces, but it's a
terrible API, since you have to promise to use the same stripe number
when opening the same name later... Maybe I should use a hash of the
name for that instead.  Thoughts?

-- 
Thomas Munro
http://www.enterprisedb.com


0006-Remove-BufFile-s-isTemp-flag.patch
Description: Binary data


0007-Add-BufFileSet-for-sharing-temporary-files-between-b.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM

2017-10-03 Thread Thomas Munro
On Wed, Oct 4, 2017 at 7:32 AM, Petr Jelinek
 wrote:
> On 03/10/17 19:44, Tom Lane wrote:
>> I wrote:
>>>> So that's trouble waiting to happen, for sure.  At the very least we
>>>> need to do a single fetch of WalRcv->latch, not two.  I wonder whether
>>>> even that is sufficient, though: this coding requires an atomic fetch of
>>>> a pointer, which is something we generally don't assume to be safe.
>>
>> BTW, I had supposed that this bug was of long standing, but actually it's
>> new in v10, dating to 597a87ccc9a6fa8af7f3cf280b1e24e41807d555.  Before
>> that walreceiver start/stop just changed the owner of a long-lived shared
>> latch, and there was no question of stale pointers.
>>
>> I considered reverting that decision, but the reason for it seems to have
>> been to let libpqwalreceiver.c manipulate MyProc->procLatch rather than
>> having to know about a custom latch.  That's probably a sufficient reason
>> to justify some squishiness in the wakeup logic.  Still, we might want to
>> revisit it if we find any other problems here.
> That's correct, and because the other users of that library don't have
> special latch it seemed feasible to standardize on the procLatch. If we
> indeed wanted to change the decision about this I think we can simply
> give latch as a parameter to libpqrcv_connect and store it in the
> WalReceiverConn struct. The connection does not need to live past the
> latch (although it currently does, but that's just a matter of
> reordering the code in WalRcvDie() a little AFAICS).

I wonder if the new ConditionVariable mechanism would be worth
considering in future cases like this, where the signalling process
doesn't know the identity of the process to be woken up (or even how
many waiting processes there are), but instead any interested waiters
block on a particular ConditionVariable that models a specific
interesting condition.  In the end it's just latches anyway, but it
may be a better abstraction.  On the other hand I'm not sure how waits
on a ConditionVariable would be multiplexed with IO (a distinct wait
event, or leaky abstraction where the caller relies on the fact that
it's built on MyProc->latch, something else?).

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [sqlsmith] stuck spinlock in pg_stat_get_wal_receiver after OOM

2017-10-02 Thread Thomas Munro
On Tue, Oct 3, 2017 at 9:56 AM, Andreas Seltenreich  wrote:
> low-memory testing with REL_10_STABLE at 1f19550a87 produced the
> following PANIC:
>
> stuck spinlock detected at pg_stat_get_wal_receiver, walreceiver.c:1397
>
> I was about to wrap the pstrdup()s with a PG_TRY block, but I can't find
> a spinlock being released in a PG_CATCH block anywhere, so maybe that's
> a bad idea?

No comment on what might be holding the spinlock there, but perhaps
the spinlock-protected code should strncpy into stack-local buffers
instead of calling pstrdup()?  The buffers could be statically sized
with NAMEDATALEN and MAXCONNINFO.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Causal reads take II

2017-09-30 Thread Thomas Munro
On Sun, Oct 1, 2017 at 9:05 AM, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>>> LOG:  could not remove directory "pg_tblspc/47733/PG_10_201707211/47732":
>>> Directory not empty
>>> ...
>>
>> Hmm.  The first error ("could not remove directory") could perhaps be
>> explained by temporary files from concurrent backends.
>> ...
>> Perhaps in your testing you accidentally copied a pgdata directory over
>> the
>> top of it while it was running?  In any case I'm struggling to see how
>> anything in this patch would affect anything at the REDO level.
>
> Hmm...no, I don't think so. Basically what I was doing is just running
> `installcheck` against a primary instance (I assume there is nothing wrong
> with
> this approach, am I right?). This particular error was caused by
> `tablespace`
> test which was failed in this case:
>
> ```
> INSERT INTO testschema.foo VALUES(1);
> ERROR:  could not open file "pg_tblspc/16388/PG_11_201709191/16386/16390":
> No such file or directory
> ```
>
> I tried few more times, and I've got it two times from four attempts on a
> fresh
> installation (when all instances were on the same machine). But anyway I'll
> try
> to investigate, maybe it has something to do with my environment.
>
> ...
>
> 2017-09-30 15:47:55.154 CEST [6030] LOG:  invalid magic number  in log
> segment 00010020, offset 10092544

Hi Dmitry,

Thanks for testing.  Yeah, it looks like the patch may be corrupting
the WAL stream in some case that I didn't hit in my own testing
procedure.  I will try to reproduce these failures.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] A design for amcheck heapam verification

2017-09-28 Thread Thomas Munro
On Fri, Sep 29, 2017 at 4:17 PM, Michael Paquier
 wrote:
>> As for DSM, I think that that can come later, and can be written by
>> somebody closer to that problem. There can be more than one
>> initialization function.
>
> I don't completely disagree with that, there could be multiple
> initialization functions. Still, an advantage about designing things
> right from the beginning with a set of correct APIs is that we don't
> need extra things later and this will never bother module maintainers.
> I would think that this utility interface should be minimal and
> portable to maintain a long-term stance.

FWIW I think if I were attacking that problem the first thing I'd
probably try would be getting rid of that internal pointer
filter->bitset in favour of a FLEXIBLE_ARRAY_MEMBER and then making
the interface look something like this:

extern size_t bloom_estimate(int64 total elems, int work_mem);
extern void bloom_init(bloom_filter *filter, int64 total_elems, int work_mem);

Something that allocates new memory as the patch's bloom_init()
function does I'd tend to call 'make' or 'create' or 'new' or
something, rather than 'init'.  'init' has connotations of being the
second phase in an allocate-and-init pattern for me.  Then
bloom_filt_make() would be trivially implemented on top of
bloom_estimate() and bloom_init(), and bloom_init() could be used
directly in DSM, DSA, traditional shmem without having to add any
special DSM support.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().

2017-09-25 Thread Thomas Munro
On Tue, Sep 26, 2017 at 10:12 AM, Tom Lane  wrote:
> Thomas Munro  writes:
>> I think the problem here is that posix_fallocate() doesn't set errno.
>
> Huh.  So the fact that it worked for me is likely because glibc's
> emulation *does* allow errno to get set.
>
>> Will write a patch.
>
> Thanks, I'm out of time for today.

See attached, which also removes the ENOSYS stuff which I believe to
be now useless.  Does this make sense?  Survives make check-world and
my simple test procedure on a
3.10.0-327.36.1.el7.x86_64 system.

postgres=# select test_dsm(1);
ERROR:  could not resize shared memory segment
"/PostgreSQL.2043796572" to 1 bytes: No space left on
device
postgres=# select test_dsm(1000);
 test_dsm
--

(1 row)

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Fix-error-handling-for-posix_fallocate.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().

2017-09-25 Thread Thomas Munro
On Tue, Sep 26, 2017 at 9:57 AM, Thomas Munro
 wrote:
>> On Tue, Sep 26, 2017 at 9:13 AM, Tom Lane  wrote:
>>> Pushed with that change; we'll soon see what the buildfarm thinks.
>
> Hmm.  One failure in the test modules:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2017-09-25%2020%3A45%3A02
>
> 2017-09-25 13:51:00.753 PDT [9107:6] LOG:  worker process: test_shm_mq
> (PID 9362) exited with exit code 1
> 2017-09-25 13:51:00.753 PDT [9360:7] ERROR:  could not resize shared
> memory segment "/PostgreSQL.1896677221" to 65824 bytes: Success
>
> hostname = centos7x.joeconway.com
> uname -r = 3.10.0-229.11.1.el7.x86_64

I think the problem here is that posix_fallocate() doesn't set errno.
It returns an errno-like value.  This is a difference from
fallocate().  Will write a patch.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().

2017-09-25 Thread Thomas Munro
> On Tue, Sep 26, 2017 at 9:13 AM, Tom Lane  wrote:
>> Pushed with that change; we'll soon see what the buildfarm thinks.

Hmm.  One failure in the test modules:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=rhinoceros&dt=2017-09-25%2020%3A45%3A02

2017-09-25 13:51:00.753 PDT [9107:6] LOG:  worker process: test_shm_mq
(PID 9362) exited with exit code 1
2017-09-25 13:51:00.753 PDT [9360:7] ERROR:  could not resize shared
memory segment "/PostgreSQL.1896677221" to 65824 bytes: Success

hostname = centos7x.joeconway.com
uname -r = 3.10.0-229.11.1.el7.x86_64

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().

2017-09-25 Thread Thomas Munro
On Tue, Sep 26, 2017 at 9:13 AM, Tom Lane  wrote:
> I wrote:
>> Rather than dig into the guts of glibc to find that out, though, I think
>> we should just s/fallocate/posix_fallocate/g on this patch.  The argument
>> for using the former seemed pretty thin to begin with.
>
> Pushed with that change; we'll soon see what the buildfarm thinks.

Thanks.

> I suspect that the provisions for EINTR and ENOSYS errors may now be
> dead code, since the Linux man page for posix_fallocate mentions
> neither.  They're not much code though, and POSIX itself *does*
> list EINTR, so I'm hesitant to muck with that.

Ah, it all makes sense now that I see the fallback strategy section of
the posix_fallocate() man page.  I was unaware that there were kernel
releases that had the syscall but lacked support in tmpfs.  Thanks for
testing and fixing that.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().

2017-09-25 Thread Thomas Munro
On Tue, Sep 26, 2017 at 7:56 AM, Tom Lane  wrote:
> I wrote:
>> Hmm, so I tested this patch on my RHEL6 box (kernel 2.6.32) and it
>> immediately fell over with
>> 2017-09-25 14:23:48.410 EDT [325] FATAL:  could not resize shared memory 
>> segment "/PostgreSQL.1682054886" to 6928 bytes: Operation not supported
>> during startup.  I wonder whether we need to round off the request.
>
> Nope, rounding off doesn't help.  What does help is using posix_fallocate
> instead.  I surmise that glibc knows something we don't about how to call
> fallocate(2) successfully on this kernel version.
>
> Rather than dig into the guts of glibc to find that out, though, I think
> we should just s/fallocate/posix_fallocate/g on this patch.  The argument
> for using the former seemed pretty thin to begin with.

I didn't dig into the glibc source but my first guess is that
posix_fallocate() sees ENOTSUPP (from tmpfs on that vintage kernel)
and falls back to a write loop.  I thought I was doing enough by
testing for ENOSYS (based on the man page for fallocate which said
that should be expected on kernels < 2.6.23), but I see now that
ENOTSUPP is possible per-filesystem implementation and tmpfs support
was added some time after the 2.6.32 era:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e2d12e22c59ce714008aa5266d769f8568d74eac

I'm not sure why glibc would provide a fallback for posix_fallocate()
but let ENOTSUPP escape from fallocate().

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SERIALIZABLE with parallel query

2017-09-25 Thread Thomas Munro
On Mon, Sep 25, 2017 at 8:37 PM, Haribabu Kommi
 wrote:
> After I tune the GUC to go with sequence scan, still I am not getting the
> error
> in the session-2 for update operation like it used to generate an error for
> parallel
> sequential scan, and also it even takes some many commands until unless the
> S1
> commits.

Hmm.  Then this requires more explanation because I don't expect a
difference.  I did some digging and realised that the error detail
message "Reason code: Canceled on identification as a pivot, during
write." was reached in a code path that requires
SxactIsPrepared(writer) and also MySerializableXact == writer, which
means that the process believes it is committing.  Clearly something
is wrong.  After some more digging I realised that
ParallelWorkerMain() calls EndParallelWorkerTransaction() which calls
CommitTransaction() which calls
PreCommit_CheckForSerializationFailure().  Since the worker is
connected to the leader's SERIALIZABLEXACT, that finishes up being
marked as preparing to commit (not true!), and then the leader get
confused during that write, causing a serialization failure to be
raised sooner (though I can't explain why it should be raised then
anyway, but that's another topic).  Oops.  I think the fix here is
just not to do that in a worker (the worker's CommitTransaction()
doesn't really mean what it says).

Here's a version with a change that makes that conditional.  This way
your test case behaves the same as non-parallel mode.

> I will continue my review on the latest patch and share any updates.

Thanks!

-- 
Thomas Munro
http://www.enterprisedb.com


ssi-parallel-v8.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Hash take II

2017-09-24 Thread Thomas Munro
On Thu, Sep 21, 2017 at 5:49 AM, Peter Geoghegan  wrote:
> Graefe's "Query Evaluation Techniques for Large Databases" has several
> pages on deadlock avoidance strategies. It was written almost 25 years
> ago, but still has some good insights IMV (you'll recall that Graefe
> is the author of the Volcano paper; this reference paper seems like
> his follow-up). Apparently, deadlock avoidance strategy becomes
> important for parallel sort with partitioning. You may be able to get
> some ideas from there. And even if you don't, his handling of the
> topic is very deliberate and high level, which suggests that ours
> should be, too.

Very interesting and certainly relevant (the parts I've read so far),
though we don't have multiple consumers.  Multiplexing one thread so
that it is both a consumer and a producer is an extra twist though.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Server crash due to SIGBUS(Bus Error) when trying to access the memory created using dsm_create().

2017-09-24 Thread Thomas Munro
On Thu, Aug 17, 2017 at 11:39 AM, Thomas Munro
 wrote:
> On Thu, Jun 29, 2017 at 12:24 PM, Thomas Munro
>  wrote:
>> fallocate-v5.patch
>
> Added to commitfest so we don't lose track of this.

Rebased due to collision with recent configure.in adjustments.  I also
wrote a commit message and retested with create-dsm-test.patch (from
upthread).

So, do we want this patch?  It's annoying to expend cycles doing this,
but it only really hurts if you allocate a lot of DSM space that you
never actually use.  If that ever becomes a serious problem, perhaps
that'll be a sign that we should be reusing the space between queries
anyway?

-- 
Thomas Munro
http://www.enterprisedb.com


fallocate-v6.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Log LDAP "diagnostic messages"?

2017-09-24 Thread Thomas Munro
On Wed, Sep 20, 2017 at 7:57 AM, Peter Eisentraut
 wrote:
> In the 0001 patch, I would move the ldap_unbind() calls after the
> ereport(LOG) calls.  We do all the other resource cleanup (pfree() etc.)
> after the ereport() calls, so it would be weird to do this one
> differently.  Also, in the second patch you move one of the
> ldap_unbind() calls down anyway.

Fair point.  In that case there are a few others we should consider
moving down too for consistency, like in the attached.

> In the 0002 patch, I think this is a bit repetitive and could be
> refactored even more.  The end result could look like
>
> ereport(LOG,
> (errmsg("blah"),
>  errdetail_for_ldap(ldap)));
> ldap_unbind(ldap);

Thanks, that is much tidier.  Done that way in the attached.

Here also is a small addition to your TAP test which exercises the
non-NULL code path because slapd rejects TLS by default with a
diagnostic message.  I'm not sure if this is worth adding, since it
doesn't actually verify that the code path is reached (though you can
see that it is from the logs).

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Improve-LDAP-cleanup-code-in-error-paths.patch
Description: Binary data


0002-Log-diagnostic-messages-if-errors-occur-during-LDAP-.patch
Description: Binary data


0003-Add-a-regression-test-to-log-an-LDAP-diagnostic-mess.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Make new crash restart test a bit more robust.

2017-09-21 Thread Thomas Munro
 On Wed, Sep 20, 2017 at 4:42 PM, Andres Freund  wrote:
> On 2017-09-19 19:00:38 -0700, Andres Freund wrote:
>> Given this fact pattern, I'll allow the case without a received error
>> message in the recovery test. Objections?
>
> Hearing none. Pushed.
>
> While debugging this, I've also introduced a pump wrapper so that we now
> get:
> ok 4 - exactly one process killed with SIGQUIT
> # aborting wait: program died
> # stream contents: >>psql::9: WARNING:  terminating connection because 
> of crash of another server process
> # DETAIL:  The postmaster has commanded this server process to roll back the 
> current transaction and exit, because another server process exited 
> abnormally and possibly corrupted shared memory.
> # HINT:  In a moment you should be able to reconnect to the database and 
> repeat your command.
> # psql::9: server closed the connection unexpectedly
> #   This probably means the server terminated abnormally
> #   before or while processing the request.
> # psql::9: connection to server was lost
> # <<
> # pattern searched for: (?^m:MAYDAY:  terminating connection because of crash 
> of another server process)
> not ok 5 - psql query died successfully after SIGQUIT

Seeing these failures in 013_crash_restart.pl from time to time on
Travis CI.  Examples:

https://travis-ci.org/postgresql-cfbot/postgresql/builds/278419122
https://travis-ci.org/postgresql-cfbot/postgresql/builds/278247756

There are a couple of other weird problems in the TAP test that
probably belong on another thread (see build IDs 278302509 and
278247756 which are for different CF patches but exhibit the same
symptom: some test never returns control but we can't see its output,
maybe due to -Otarget, before the whole job is nuked by Travis for not
making progress).

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SERIALIZABLE with parallel query

2017-09-20 Thread Thomas Munro
On Tue, Sep 19, 2017 at 1:47 PM, Haribabu Kommi
 wrote:
> During testing of this patch, I found some behavior difference
> with the support of parallel query, while experimenting with the provided
> test case in the patch.
>
> But I tested the V6 patch, and I don't think that this version contains
> any fixes other than rebase.
>
> Test steps:
>
> CREATE TABLE bank_account (id TEXT PRIMARY KEY, balance DECIMAL NOT NULL);
> INSERT INTO bank_account (id, balance) VALUES ('X', 0), ('Y', 0);
>
> Session -1:
>
> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> SELECT balance FROM bank_account WHERE id = 'Y';
>
> Session -2:
>
> BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
> SET max_parallel_workers_per_gather = 2;
> SET force_parallel_mode = on;
> set parallel_setup_cost = 0;
> set parallel_tuple_cost = 0;
> set min_parallel_table_scan_size = 0;
> set enable_indexscan = off;
> set enable_bitmapscan = off;
>
> SELECT balance FROM bank_account WHERE id = 'X';
>
> Session -1:
>
> update bank_account set balance = 10 where id = 'X';
>
> Session -2:
>
> update bank_account set balance = 10 where id = 'Y';
> ERROR:  could not serialize access due to read/write dependencies among
> transactions
> DETAIL:  Reason code: Canceled on identification as a pivot, during write.
> HINT:  The transaction might succeed if retried.
>
> Without the parallel query of select statement in session-2,
> the update statement in session-2 is passed.

Hi Haribabu,

Thanks for looking at this!

Yeah.  The difference seems to be that session 2 chooses a Parallel
Seq Scan instead of an Index Scan when you flip all those GUCs into
parallelism-is-free mode.  Seq Scan takes a table-level predicate lock
(see heap_beginscan_internal()).  But if you continue your example in
non-parallel mode (patched or unpatched), you'll find that only one of
those transactions can commit successfully.

Using the fancy notation in the papers about this stuff where w1[x=42]
means "write by transaction 1 on object x with value 42", let's see if
there is an apparent sequential order of these transactions that makes
sense:

Actual order: r1[Y=0] r2[X=0] w1[X=10] w2[Y=10] ... some commit order ...
Apparent order A: r2[X=0] w2[Y=10] c2 r1[Y=0*] w1[X=10] c1 (*nonsense)
Apparent order B: r1[Y=0] w1[X=10] c1 r2[X=0*] w2[Y=10] c2 (*nonsense)

Both potential commit orders are nonsensical.  I think what happened
in your example was that a Seq Scan allowed the SSI algorithm to
reject a transaction sooner.  Instead of r2[X=0], the executor sees
r2[X=0,Y=0] (we scanned the whole table, as if we read all objects, in
this case X and Y, even though we only asked to read X).  Then the SSI
algorithm is able to detect a "dangerous structure" at w2[Y=10],
instead of later at commit time.

So I don't think this indicates a problem with the patch.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Error: dsa_area could not attach to a segment that has been freed

2017-09-20 Thread Thomas Munro
On Thu, Sep 21, 2017 at 12:59 AM, Robert Haas  wrote:
> On Wed, Sep 20, 2017 at 5:54 AM, Craig Ringer  wrote:
>> By the way, dsa.c really needs a cross-reference to shm_toc.c and vice
>> versa. With a hint as to when each is appropriate.
>
> /me blinks.
>
> Aren't those almost-entirely-unrelated facilities?

I think I see what Craig means.

1.  A DSM segment works if you know how much space you'll need up
front so that you can size it. shm_toc provides a way to exchange
pointers into it with other backends in the form of shm_toc keys
(perhaps implicitly, in the form of well known keys or a convention
like executor node ID -> shm_toc key).  Examples: Fixed sized state
for parallel-aware executor nodes, and fixed size parallel executor
infrastructure.

2.  A DSA area is good if you don't know how much space you'll need
yet.  dsa_pointer provides a way to exchange pointers into it with
other backends.  Examples: A shared cache, an in-memory database
object like Gaddam Sai Ram's graph index extension, variable sized
state for parallel-aware executor nodes, the shared record typmod
registry stuff.

Perhaps confusingly we also support DSA areas inside DSM segments,
there are DSM segments inside DSA areas.  We also use DSM segments as
a kind of shared resource cleanup mechanism, and don't yet provide an
equivalent for DSA.  I haven't proposed anything like that because I
feel like there may be a better abstraction of reliable scoped cleanup
waiting to be discovered (as I think Craig was also getting at).

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Error: dsa_area could not attach to a segment that has been freed

2017-09-20 Thread Thomas Munro
On Wed, Sep 20, 2017 at 6:14 PM, Gaddam Sai Ram
 wrote:
> Thank you very much! That fixed my issue! :)
> I was in an assumption that pinning the area will increase its lifetime but
> yeah after taking memory context into consideration its working fine!

So far the success rate in confusing people who first try to make
long-lived DSA areas and DSM segments is 100%.  Basically, this is all
designed to ensure automatic cleanup of resources in short-lived
scopes.

Good luck for your graph project.  I think you're going to have to
expend a lot of energy trying to avoid memory leaks if your DSA lives
as long as the database cluster, since error paths won't automatically
free any memory you allocated in it.  Right now I don't have any
particularly good ideas for mechanisms to deal with that.  PostgreSQL
C has exception-like error handling, but doesn't (and probably can't)
have a language feature like scoped destructors from C++.  IMHO
exceptions need either destructors or garbage collection to keep you
sane.  There is a kind of garbage collection for palloc'd memory and
also for other resources like file handles, but if you're using a big
long lived DSA area you have nothing like that.  You can use
PG_TRY/PG_CATCH very carefully to clean up, or (probably better) you
can try to make sure that all your interaction with shared memory is
no-throw (note that that means using dsa_allocate_extended(x,
DSA_ALLOC_NO_OOM), because dsa_allocate itself can raise errors). The
first thing I'd try would probably be to keep all shmem-allocating
code in as few routines as possible, and use only no-throw operations
in the 'critical' regions of them, and maybe look into some kind of
undo log of things to free or undo in case of error to manage
multi-allocation operations if that turned out to be necessary.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Error: dsa_area could not attach to a segment that has been freed

2017-09-19 Thread Thomas Munro
On Fri, Sep 15, 2017 at 7:51 PM, Gaddam Sai Ram
 wrote:
> As i'm pinning the dsa mapping after attach, it has to stay through out the
> backend session. But not sure why its freed/corrupted.
>
> Kindly help me in fixing this issue. Attached the copy of my extension,
> which will reproduce the same issue.

Your DSA area is pinned and the mapping is pinned, but there is one
more thing that goes away automatically unless you nail it to the
table: the backend-local dsa_area object which dsa_create() and
dsa_attach() return.  That's allocated in the "current memory
context", so if you do it from your procedure simple_udf_func without
making special arrangements it gets automatically freed at end of
transaction.  If you're going to cache it for the whole life of the
backend, you'd better make sure it's allocated in memory context that
lives long enough.  Where you have dsa_create() and dsa_attach()
calls, try coding like this:

  MemoryContext old_context;

  old_context = MemoryContextSwitchTo(TopMemoryContext);
  area = dsa_create(...);
  MemoryContextSwitchTo(old_context);

  old_context = MemoryContextSwitchTo(TopMemoryContext);
  area = dsa_attach(...);
  MemoryContextSwitchTo(old_context);

You'll need to #include "utils/memutils.h".

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] src/test/subscription/t/002_types.pl hanging on particular environment

2017-09-19 Thread Thomas Munro
On Mon, Sep 18, 2017 at 10:18 PM, Andres Freund  wrote:
> On 2017-09-18 21:57:04 +1200, Thomas Munro wrote:
>> WARNING:  terminating connection because of crash of another server 
>> process
>> DETAIL:  The postmaster has commanded this server process to roll
>> back the current transaction and exit, because another server process
>> exited abnormally and possibly corrupted shared memory.
>> HINT:  In a moment you should be able to reconnect to the database
>> and repeat your command.
>>
>> As far as I know these are misleading, it's really just an immediate
>> shutdown.  There is no core file, so I don't believe anything actually
>> crashed.
>
> I was about to complain about these, for entirely unrelated reasons. I
> think it's a bad idea - and there's a couple complains on the lists too,
> to emit these warnings.  It's not entirely trivial to fix though :(

This type of violent shutdown seems to be associated with occasional
corruption of .gcda files (the files output by GCC coverage builds).
The symptoms are that if you use --enable-coverage and make
check-world you'll very occasionally get a spurious TAP test failure
like this:

#   Failed test 'pg_ctl start: no stderr'
#   at 
/home/travis/build/postgresql-cfbot/postgresql/src/bin/pg_ctl/../../../src/test/perl/TestLib.pm
line 301.
#  got:
'profiling:/home/travis/build/postgresql-cfbot/postgresql/src/backend/nodes/copyfuncs.gcda:Merge
mismatch for function 94
# '
# expected: ''

I'm not sure of the exact mechanism though.  GCC supplies a function
__gcov_flush() that normally runs at exit or execve, so if you're
killed without reaching those you don't get any .gcda data.  Perhaps
we are in exit (or fork/exec) partway through writing out coverage
data in __gcov_flush(), and at that moment we are killed.  Then a
subsequent run of instrumented code will find the half-written file
and print the "Merge mismatch" message.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] why not parallel seq scan for slow functions

2017-09-19 Thread Thomas Munro
On Thu, Sep 14, 2017 at 3:19 PM, Amit Kapila  wrote:
> The attached patch fixes both the review comments as discussed above.

This cost stuff looks unstable:

test select_parallel  ... FAILED

!  Gather  (cost=0.00..623882.94 rows=9976 width=8)
 Workers Planned: 4
!->  Parallel Seq Scan on tenk1  (cost=0.00..623882.94 rows=2494 width=8)
  (3 rows)

  drop function costly_func(var1 integer);
--- 112,120 
  explain select ten, costly_func(ten) from tenk1;
   QUERY PLAN
  
!  Gather  (cost=0.00..625383.00 rows=1 width=8)
 Workers Planned: 4
!->  Parallel Seq Scan on tenk1  (cost=0.00..625383.00 rows=2500 width=8)
  (3 rows)

  drop function costly_func(var1 integer);

>From https://travis-ci.org/postgresql-cfbot/postgresql/builds/277376953 .

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Block level parallel vacuum WIP

2017-09-18 Thread Thomas Munro
On Fri, Sep 8, 2017 at 10:37 PM, Masahiko Sawada  wrote:
> Since v4 patch conflicts with current HEAD I attached the latest version 
> patch.

Hi Sawada-san,

Here is an interesting failure with this patch:

test rowsecurity  ... FAILED
test rules... FAILED

Down at the bottom of the build log in the regression diffs file you can see:

! ERROR: cache lookup failed for relation 32893

https://travis-ci.org/postgresql-cfbot/postgresql/builds/277165907

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] SERIALIZABLE with parallel query

2017-09-18 Thread Thomas Munro
On Fri, Sep 1, 2017 at 5:11 PM, Thomas Munro
 wrote:
> On Wed, Jun 28, 2017 at 11:21 AM, Thomas Munro
>  wrote:
>> [ssi-parallel-v5.patch]
>
> Rebased.

Rebased again.

-- 
Thomas Munro
http://www.enterprisedb.com


ssi-parallel-v7.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] src/test/subscription/t/002_types.pl hanging on particular environment

2017-09-18 Thread Thomas Munro
Hi,

The subscription tests 002_types.pl sometimes hangs for a while and
then times out when run on a Travis CI build VM running Ubuntu Trusty
if --enable-coverage is used.  I guess it might be a timing/race
problem because I can't think of any mechanism specific to coverage
instrumentation except that it slows stuff down, but I don't know.
The only other thing I can think of is that perhaps the instrumented
code is writing something to stdout or stderr and that's finding its
way into some protocol stream and confusing things, but I can't
reproduce this on any of my usual development machines.  I haven't
tried that exact operating system.  Maybe it's a bug in the toolchain,
but that's an Ubuntu LTS release so I assume other people use it
(though I don't see it in the buildfarm).

Example:

t/001_rep_changes.pl .. ok
t/002_types.pl  #
# Looks like your test exited with 29 before it could output anything.
t/002_types.pl  Dubious, test returned 29 (wstat 7424, 0x1d00)
Failed 3/3 subtests
t/003_constraints.pl .. ok
t/004_sync.pl . ok
t/005_encoding.pl . ok
t/007_ddl.pl .. ok
Test Summary Report
---
t/002_types.pl  (Wstat: 7424 Tests: 0 Failed: 0)
  Non-zero exit status: 29
  Parse errors: Bad plan.  You planned 3 tests but ran 0.

Before I figured out that --coverage was relevant, I wondered if the
recent commit 8edacab209957520423770851351ab4013cb0167 which landed
around the time I spotted this might have something to do with it, but
I tried reverting the code change in there and it didn't help.  Here's
a build log:

https://travis-ci.org/macdice/postgres/jobs/276752803

As you can see the script used was:

./configure --enable-debug --enable-cassert --enable-tap-tests
--enable-coverage && make -j4 all contrib && make -C
src/test/subscription check

In this build you can see the output of the following at the end,
which might provide clues to the initiated.  You might need to click a
small triangle to unfold the commands' output.

cat ./src/test/subscription/tmp_check/log/002_types_publisher.log
cat ./src/test/subscription/tmp_check/log/002_types_subscriber.log
cat ./src/test/subscription/tmp_check/log/regress_log_002_types

There are messages like this:

WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll
back the current transaction and exit, because another server process
exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database
and repeat your command.

As far as I know these are misleading, it's really just an immediate
shutdown.  There is no core file, so I don't believe anything actually
crashed.

Here's a version that's the same except it doesn't have
--enable-coverage.  It passes:

https://travis-ci.org/macdice/postgres/jobs/276771654

Any ideas?

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] valgrind vs. shared typmod registry

2017-09-17 Thread Thomas Munro
On Mon, Sep 18, 2017 at 5:39 PM, Thomas Munro
 wrote:
> Here is a patch to fix that.

Here's a better one (same code, corrected commit message).

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Fix-uninitialized-variable-in-dshash.c.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] valgrind vs. shared typmod registry

2017-09-17 Thread Thomas Munro
On Sun, Sep 17, 2017 at 8:49 AM, Thomas Munro
 wrote:
> On Sun, Sep 17, 2017 at 7:42 AM, Thomas Munro
>  wrote:
>> On Sun, Sep 17, 2017 at 12:30 AM, Tomas Vondra
>>  wrote:
>>> I've been running some regression tests under valgrind, and it seems
>>> select_parallel triggers some uses of uninitialized values in dshash. If
>>> I'm reading the reports right, it complains about hashtable->size_log2
>>> being not being initialized in ensure_valid_bucket_pointers.
>>
>> Thanks.  Will investigate.
>
> Yeah, it's a bug, I simply failed to initialise it.
> ensure_valid_bucket_pointers() immediately fixes the problem (unless
> the uninitialised memory had an unlikely value), explaining why it
> works anyway.  I'm a bit tied up today but will test and post a patch
> tomorrow.

Here is a patch to fix that.

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Fix-uninitialized-variable-in-dshash.c.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Automatic testing of patches in commit fest

2017-09-17 Thread Thomas Munro
On Mon, Sep 18, 2017 at 2:39 PM, Andres Freund  wrote:
> E.g. very little of the new stuff in 
> https://codecov.io/gh/postgresql-cfbot/postgresql/commit/ceaa3dbece3c9b98abcaa28009320fde45a83f88
> is exercised.

Hoist by my own petard.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Automatic testing of patches in commit fest

2017-09-17 Thread Thomas Munro
Hi hackers,

 A couple of new experimental features on commitfest.cputube.org:

1.  I didn't have --enable-cassert enabled before.  Oops.
2.  It'll now dump a gdb backtrace for any core files found after a
check-world failure (if you can find your way to the build log...).
Thanks to Andres for the GDB scripting for this!
3.  It'll now push gcov results to codecov.io -- see link at top of
page.  Thanks again to Andres for this idea.
4.  It now builds a little bit faster due to -j4 (Travis CI VMs have 2
virtual cores) and .proverc -j3.  (So far one entry now fails in TAP
tests with that setting, will wait longer before drawing any
conclusions about that.)

The code coverage reports at codecov.io are supposed to allow
comparison between a branch and master so you can see exactly what
changed in terms of coverage, but I ran into a technical problem and
ran out of time for now to make that happen.  But you can still click
your way through to the commit and see a coverage report for the
commit diff.  In other words you can see which modified lines are run
by make check-world, which seems quite useful.

There are plenty more things we could stick into this pipeline, like
LLVM sanitizer stuff, valgrind, Coverity, more compilers, ... but I'm not sure what things really make sense.  I may be
placing undue importance on things that I personally screwed up
recently :-D

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improving DISTINCT with LooseScan node

2017-09-17 Thread Thomas Munro
On Mon, Sep 18, 2017 at 5:43 AM, Dmitriy Sarafannikov
 wrote:
> Hi hackers,
>
> Everybody knows, that we have unefficient execution of query like "SELECT
> DISTINCT id from mytable"
> if table has many-many rows and only several unique id values. Query plan
> looks like Unique + IndexScan.
>
> I have tried to implement this feature in new type of node called Loose
> Scan.
> This node must appears in plan together with IndexScan or IndexOnlyScan just
> like Unique node in this case.
> But instead of filtering rows with equal values, LooseScan must retreive
> first row from IndexScan,
> then remember and return this. With all subsequent calls LooseScan must
> initiate calling index_rescan via ExecReScan
> with search value that > or < (depending on scan direction) of previous
> value.
> Total cost of this path must be something like total_cost =
> n_distinct_values * subpath->startup_cost
> What do you think about this idea?
>
> I was able to create new LooseScan node, but i ran into difficulties with
> changing scan keys.
> I looked (for example) on the ExecReScanIndexOnlyScan function and I find it
> difficult to understand
> how i can reach the ioss_ScanKeys through the state of executor. Can you
> help me with this?
>
> I also looked on the Nested Loop node, which as i think must change scan
> keys, but i didn't become clear for me.
> The only thought that came to my head, that maybe i incorrectly create this
> subplan.
> I create it just like create_upper_unique_plan, and create subplan for
> IndexScan via create_plan_recurse.
> Can you tell me where to look or maybe somewhere there are examples?

Not an answer to your question, but generally +1 for working on this
area.  I did some early proto-hacking a while ago, which I haven't had
time to get back to yet:

https://www.postgresql.org/message-id/flat/CADLWmXWALK8NPZqdnRQiPnrzAnic7NxYKynrkzO_vxYr8enWww%40mail.gmail.com

That was based on the idea that a DISTINCT scan using a btree index to
skip ahead is always going to be using the leading N columns and
always going to be covered by the index, so I might as well teach
Index Only Scan how to do it directly rather than making a new node to
sit on top.  As you can see in that thread I did start thinking about
using a new node to sit on top and behave a bit like a nested loop for
the more advanced feature of an Index Skip Scan (trying every value of
(a) where you had an index on (a, b, c) but had a WHERE clause qual on
(b, c)), but the only feedback I had so far was from Robert Haas who
thought that too should probably be pushed into the index scan.

FWIW I'd vote for 'skip' rather than 'loose' as a term to use in this
family of related features (DISTINCT being the easiest to build).  It
seems more descriptive than the MySQL term.  (DB2 added this a little
while ago and went with 'index jump scan', suggesting that we should
consider 'hop'... (weak humour, 'a hop, skip and a jump' being an
English idiom meaning a short distance)).

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] valgrind vs. shared typmod registry

2017-09-16 Thread Thomas Munro
On Sun, Sep 17, 2017 at 7:42 AM, Thomas Munro
 wrote:
> On Sun, Sep 17, 2017 at 12:30 AM, Tomas Vondra
>  wrote:
>> I've been running some regression tests under valgrind, and it seems
>> select_parallel triggers some uses of uninitialized values in dshash. If
>> I'm reading the reports right, it complains about hashtable->size_log2
>> being not being initialized in ensure_valid_bucket_pointers.
>
> Thanks.  Will investigate.

Yeah, it's a bug, I simply failed to initialise it.
ensure_valid_bucket_pointers() immediately fixes the problem (unless
the uninitialised memory had an unlikely value), explaining why it
works anyway.  I'm a bit tied up today but will test and post a patch
tomorrow.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] valgrind vs. shared typmod registry

2017-09-16 Thread Thomas Munro
On Sun, Sep 17, 2017 at 12:30 AM, Tomas Vondra
 wrote:
> I've been running some regression tests under valgrind, and it seems
> select_parallel triggers some uses of uninitialized values in dshash. If
> I'm reading the reports right, it complains about hashtable->size_log2
> being not being initialized in ensure_valid_bucket_pointers.

Thanks.  Will investigate.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Partition-wise join for join between (declaratively) partitioned tables

2017-09-15 Thread Thomas Munro
On Sat, Sep 16, 2017 at 9:23 AM, Robert Haas  wrote:
> On the overall patch set:
>
> - I am curious to know how this has been tested.  How much of the new
> code is covered by the tests in 0007-Partition-wise-join-tests.patch?
> How much does coverage improve with
> 0008-Extra-testcases-for-partition-wise-join-NOT-FOR-COMM.patch?  What
> code, if any, is not covered by either of those test suites?  Could we
> do meaningful testing of this with something like Andreas
> Seltenreich's sqlsmith?

FWIW I'm working on an answer to both of those question, but keep
getting distracted by other things catching on fire...

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Log LDAP "diagnostic messages"?

2017-09-15 Thread Thomas Munro
On Fri, Sep 15, 2017 at 2:12 AM, Alvaro Herrera  wrote:
> I think the ldap_unbind() changes should be in a separate preliminary
> patch to be committed separately and backpatched.

OK, here it is split into two patches.

> The other bits looks fine, with nitpicks
>
> 1. please move the new support function to the bottom of the section
> dedicated to LDAP, and include a prototype

OK.

> 2. please wrap lines longer than 80 chars, other than error message
> strings.  (I don't know why this file plays fast & loose with
> project-wide line length rules, but I also see no reason to continue
> doing it.)

Done for all lines I touched in the patch.

Thanks for the review!

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Improve-LDAP-cleanup-code-in-error-paths.patch
Description: Binary data


0002-Log-diagnostic-messages-if-errors-occur-during-LDAP-.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] POC: Sharing record typmods between backends

2017-09-14 Thread Thomas Munro
On Fri, Sep 15, 2017 at 3:03 PM, Andres Freund  wrote:
> On 2017-09-04 18:14:39 +1200, Thomas Munro wrote:
>> Thanks for the review and commits so far.  Here's a rebased, debugged
>> and pgindented version of the remaining patches.
>
> I've pushed this with minor modifications:

Thank you!

> - added typedefs to typedefs.list

Should I do this manually with future patches?

> - re-pgindented, there were some missing reindents in headers
> - added a very brief intro into session.c, moved some content repeated
>   in various places to the header - some of them were bound to become
>   out-of-date due to future uses of the facility.
> - moved NULL setting in detach hook directly after the respective
>   resource deallocation, for the not really probable case of it being
>   reinvoked due to an error in a later dealloc function
>
> Two remarks:
> - I'm not sure I like the order in which things are added to the typemod
>   hashes, I wonder if some more careful organization could get rid of
>   the races. Doesn't seem critical, but would be a bit nicer.

I will have a think about whether I can improve that.  In an earlier
version I did things in a different order and had different problems.
The main hazard to worry about here is that you can't let any typmod
number escape into shmem where it might be read by others (for example
a concurrent session that wants a typmod for a TupleDesc that happens
to match) until the typmod number is resolvable back to a TupleDesc
(meaning you can look it up in shared_typmod_table).  Not
wasting/leaking memory in various failure cases is a secondary (but
obviously important) concern.

> - I'm not yet quite happy with the Session facility. I think it'd be
>   nicer if we'd a cleaner split between the shared memory notion of a
>   session and the local memory version of it. The shared memory version
>   would live in a ~max_connections sized array, referenced from
>   PGPROC. In a lot of cases it'd completely obsolete the need for a
>   shm_toc, because you could just store handles etc in there.  The local
>   memory version then would just store local pointers etc into that.
>
>   But I think we can get there incrementally.

+1 to all of the above.  I fully expect this to get changed around quite a lot.

I'll keep an eye out for problem reports.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patches that don't apply or don't compile: 2017-09-12

2017-09-14 Thread Thomas Munro
On Fri, Sep 15, 2017 at 8:13 AM, Robert Haas  wrote:
> On Wed, Sep 13, 2017 at 5:18 AM, Alvaro Herrera  
> wrote:
>> Thinking further, I think changing patch status automatically may never
>> be a good idea; there's always going to be some amount of common sense
>> applied beforehand (such as if a conflict is just an Oid catalog
>> collision, or a typo fix in a recent commit, etc).  Such conflicts will
>> always raise errors with a tool, but would never cause a (decent) human
>> being from changing the patch status to WoA.
>
> Well it would perhaps be fine if sending an updated patch bounced it
> right back to Needs Review.  But if things are only being auto-flipped
> in one direction that's going to get tedious.
>
> Or one could imagine having the CF app show the CI status alongside
> the existing status column.

FWIW that last thing is the idea that I've been discussing with
Magnus.  That web page I made wouldn't exist, and the "apply" and
"build" badges would appear directly on the CF app and you'd be able
to sort and filter by them etc.  The main submission status would
still be managed by humans and the new apply and build statuses would
be managed by faithful robot servants.

How exactly that would work, I don't know.  One idea is that a future
cfbot, rewritten in better Python and adopted into the pginfra
universe, pokes CF via an HTTPS endpoint so that the CF app finishes
up with the information in its database.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Hash take II

2017-09-14 Thread Thomas Munro
On Thu, Sep 14, 2017 at 11:57 AM, Thomas Munro
 wrote:
> On Thu, Sep 14, 2017 at 12:51 AM, Prabhat Sahu
>  wrote:
>> Setting with lower "shared_buffers" and "work_mem" as below,  query getting 
>> crash but able to see explain plan.
>
> Thanks Prabhat.  A small thinko in the batch reset code means that it
> sometimes thinks the shared skew hash table is present and tries to
> probe it after batch 1.  I have a fix for that and I will post a new
> patch set just as soon as I have a good regression test figured out.

Fixed in the attached version, by adding a missing
"hashtable->shared->num_skew_buckets = 0;" to ExecHashFreeSkewTable().
I did some incidental tidying of the regression tests, but didn't
manage to find a version of your example small enough to put in a
regression tests.  I also discovered some other things:

1.  Multi-batch Parallel Hash Join could occasionally produce a
resowner warning about a leaked temporary File associated with
SharedTupleStore objects.  Fixed by making sure we call routines that
close all files handles in ExecHashTableDetach().

2.  Since last time I tested, a lot fewer TPCH queries choose a
Parallel Hash plan.  Not sure why yet.  Possibly because Gather Merge
and other things got better.  Will investigate.

3.  Gather Merge and Parallel Hash Join may have a deadlock problem.
Since Gather Merge needs to block waiting for tuples, but workers wait
for all participants (including the leader) to reach barriers.  TPCH
Q18 (with a certain set of indexes and settings, YMMV) has Gather
Merge over Sort over Parallel Hash Join, and although it usually runs
successfully I have observed one deadlock.  Ouch.  This seems to be a
more fundamental problem than the blocked TupleQueue scenario.  Not
sure what to do about that.

-- 
Thomas Munro
http://www.enterprisedb.com


parallel-hash-v20.patchset.tgz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Hash take II

2017-09-13 Thread Thomas Munro
On Thu, Sep 14, 2017 at 12:51 AM, Prabhat Sahu
 wrote:
> Setting with lower "shared_buffers" and "work_mem" as below,  query getting 
> crash but able to see explain plan.

Thanks Prabhat.  A small thinko in the batch reset code means that it
sometimes thinks the shared skew hash table is present and tries to
probe it after batch 1.  I have a fix for that and I will post a new
patch set just as soon as I have a good regression test figured out.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] <> join selectivity estimate question

2017-09-13 Thread Thomas Munro
On Wed, Sep 6, 2017 at 11:14 PM, Ashutosh Bapat
 wrote:
> On Fri, Jul 21, 2017 at 4:10 AM, Thomas Munro
>  wrote:
>> That just leaves the question of whether we should try to handle the
>> empty RHS and single-value RHS cases using statistics.  My intuition
>> is that we shouldn't, but I'll be happy to change my intuition and
>> code that up if that is the feedback from planner gurus.
>
> Empty RHS can result from dummy relations also, which are produced by
> constraint exclusion, so may be that's an interesting case. Single
> value RHS may be interesting with partitioned table with all rows in a
> given partition end up with the same partition key value. But may be
> those are just different patches. I am not sure.

Can you elaborate on the constraint exclusion case?  We don't care
about the selectivity of an excluded relation, do we?

Any other views on the empty and single value special cases, when
combined with [NOT] EXISTS (SELECT ... WHERE r.something <>
s.something)?  Looking at this again, my feeling is that they're too
obscure to spend time on, but others may disagree.

>> Please find attached a new version, and a test script I used, which
>> shows a bunch of interesting cases.  I'll add this to the commitfest.
>
> I added some "stable" tests to your patch taking inspiration from the
> test SQL file. I think those will be stable across machines and runs.
> Please let me know if those look good to you.

Hmm.  But they show actual rows, not plan->plan_rows, and although the
former is interesting as a sanity check the latter is the thing under
test here.  It seems like we don't have fine enough control of
EXPLAIN's output to show estimated rows but not cost.  I suppose we
could try to capture EXPLAIN's output somehow (plpgsql dynamic
execution or spool output from psql?) and then pull out just the row
estimates, maybe with extra rounding to cope with instability.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: psql: check env variable PSQL_PAGER

2017-09-13 Thread Thomas Munro
On Wed, Sep 6, 2017 at 4:12 AM, Pavel Stehule  wrote:
> 2017-09-05 18:06 GMT+02:00 Tom Lane :
>> Pushed, with some fooling with the documentation (notably,
>> re-alphabetizing relevant lists).
>>
> Thank you very much

I've started setting PSQL_PAGER="~/bin/pspg -s0" to try your new
column-aware pager from https://github.com/okbob/pspg for my regular
work.  Wow!  It could use some warning clean-up but it's a clever idea
and so far it works really well.  Thanks for making this.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting error message with latest PG source on Windows.

2017-09-13 Thread Thomas Munro
On Wed, Sep 13, 2017 at 9:11 PM, Thomas Munro
 wrote:
> On Wed, Sep 13, 2017 at 8:58 PM, Ashutosh Sharma  
> wrote:
>> I am getting the following error message when trying to build latest
>> PG source on Windows,
>>
>> Error 1 error C2065: 'LDAP_NO_ATTRS' : undeclared identifier
>> C:\Users\ashu\pgsql\src\backend\libpq\auth.c 2468
>
> Googling around I see some indications that the macro may not be
> defined in all implementations and that some other projects test if
> it's defined:

Does this work for you Ashutosh?

-- 
Thomas Munro
http://www.enterprisedb.com


0001-Define-LDAP_NO_ATTRS-if-necessary.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Getting error message with latest PG source on Windows.

2017-09-13 Thread Thomas Munro
On Wed, Sep 13, 2017 at 8:58 PM, Ashutosh Sharma  wrote:
> I am getting the following error message when trying to build latest
> PG source on Windows,
>
> Error 1 error C2065: 'LDAP_NO_ATTRS' : undeclared identifier
> C:\Users\ashu\pgsql\src\backend\libpq\auth.c 2468
>
> I think, it got introduced in the following git commit,
>
> commit 83aaac41c66959a3ebaec7daadc4885b5f98f561
> Author: Peter Eisentraut 
> Date:   Tue Sep 12 09:46:14 2017 -0400
>
>  Allow custom search filters to be configured for LDAP auth.
>
> Please ignore this email if this issue has already been reported.

Hmm.  LDAP_NO_ATTRS was an addition to the patch discussed here:

https://www.postgresql.org/message-id/7fcac549-0051-34c8-0d62-63b921029f20%402ndquadrant.com

Googling around I see some indications that the macro may not be
defined in all implementations and that some other projects test if
it's defined:

https://github.com/spinn3r/squid-deb/blob/master/squid-2.7.STABLE9/helpers/basic_auth/LDAP/squid_ldap_auth.c#L160

It looks like that's OK because it's required to have the value "1.1":

https://tools.ietf.org/html/rfc4511

  3. A list containing only the OID "1.1" indicates that no
 attributes are to be returned.  If "1.1" is provided with other
 attributeSelector values, the "1.1" attributeSelector is
 ignored.  This OID was chosen because it does not (and can not)
 correspond to any attribute in use.

It seems like we need to do that.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL logging problem in 9.4.3?

2017-09-12 Thread Thomas Munro
On Wed, Sep 13, 2017 at 1:04 PM, Kyotaro HORIGUCHI
 wrote:
> The CF status of this patch turned into "Waiting on Author" by
> automated CI checking. However, I still don't get any error even
> on the current master (69835bc) after make distclean. Also I
> don't see any difference between the "problematic" patch and my
> working branch has nothing different other than patching line
> shifts. (So I haven't post a new one.)
>
> I looked on the location heapam.c:2502 where the CI complains at
> in my working branch and I found a different code with the
> complaint.
>
> https://travis-ci.org/postgresql-cfbot/postgresql/builds/27450
>
> 1363 heapam.c:2502:18: error: ‘HEAP_INSERT_SKIP_WAL’ undeclared (first use in 
> this function)
> 1364   if (!(options & HEAP_INSERT_SKIP_WAL) && RelationNeedsWAL(relation))
>
> heapam.c:2502@work branch
> 2502:   /* XLOG stuff */
> 2503:   if (BufferNeedsWAL(relation, buffer))
>
> So I conclude that the CI mechinery failed to applly the patch
> correctly.

Hi Horiguchi-san,

Hmm.  Here is that line in heamap.c in unpatched master:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/heap/heapam.c;h=d20f0381f3bc23f99c505ef8609d63240ac5d44b;hb=HEAD#l2485

It says:

2485 if (!(options & HEAP_INSERT_SKIP_WAL) && RelationNeedsWAL(relation))

After applying fix-wal-level-minimal-michael-horiguchi-3.patch from
this message:

https://www.postgresql.org/message-id/20170912.131441.20602611.horiguchi.kyotaro%40lab.ntt.co.jp

... that line is unchanged, although it has moved to line number 2502.
It doesn't compile for me, because your patch removed the definition
of HEAP_INSERT_SKIP_WAL but hasn't removed that reference to it.

I'm not sure what happened.  Is it possible that your patch was not
created by diffing against master?

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Log LDAP "diagnostic messages"?

2017-09-12 Thread Thomas Munro
On Tue, Sep 12, 2017 at 11:23 PM, Ashutosh Bapat
 wrote:
> On Wed, Aug 16, 2017 at 11:13 AM, Ashutosh Bapat
>  wrote:
>> On Wed, Aug 16, 2017 at 8:44 AM, Alvaro Herrera
>>  wrote:
>>> Christoph Berg wrote:
>>>> "Diagnostic message" doesn't really mean anything, and printing
>>>> "DETAIL: Diagnostic message: " seems redundant to me. Maybe
>>>> drop that prefix? It should be clear from the context that this is a
>>>> message from the LDAP layer.
>>>
>>> I think making it visible that the message comes from LDAP (rather than
>>> Postgres or anything else) is valuable.  How about this?
>>>
>>> LOG:  could not start LDAP TLS session: Protocol error
>>> DETAIL:  LDAP diagnostics: unsupported extended operation.
>>>
>> +1, pretty neat.

Here is a new version adopting Alvaro's wording.  I'll set this back
to "Needs review" status.

-- 
Thomas Munro
http://www.enterprisedb.com


ldap-diagnostic-message-v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH v1] Add and report the new "in_hot_standby" GUC pseudo-variable.

2017-09-12 Thread Thomas Munro
On Wed, Sep 13, 2017 at 3:48 AM, Elvis Pranskevichus  wrote:
> I incorporated those bits into your patch and rebased in onto master.
> Please see attached.
>
> FWIW, I think that mixing the standby status and the default
> transaction writability is suboptimal.  They are related, yes, but not
> the same thing.  It is possible to have a master cluster in the
> read-only mode, and with this patch it would be impossible to
> distinguish from a hot-standby replica without also polling
> pg_is_in_recovery(), which defeats the purpose of having to do no
> database roundtrips.

Hi Elvis,

FYI the recovery test 001_stream_rep.pl fails with this patch applied.
You can see that if you configure with --enable-tap-tests, build and
then cd into src/test/recovery and "make check".

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More flexible LDAP auth search filters?

2017-09-12 Thread Thomas Munro
On Wed, Sep 13, 2017 at 1:55 AM, Peter Eisentraut
 wrote:
> On 9/11/17 23:58, Thomas Munro wrote:
>> Sounds good.  Here it is with $username.  It's nice not to have to
>> escape any characters in URLs.  I suppose more keywords could be added
>> in follow-up patches if someone thinks that would be useful
>> ($hostname, $dbname, ...?).  I got sick of that buffer sizing code and
>> changed it to use StringInfo.  Here also are your test patches tweaked
>> slightly: 0002 just adds FreeBSD support as per previous fixup and
>> 0003 changes to $username.
>
> Committed the feature patch.

Thanks!

> Any further thoughts on the test suite?  Otherwise I'll commit it as we
> have it, for manual use.

I wonder if there is a reasonable way to indicate or determine whether
you have slapd installed so that check-world could run this test...

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Automatic testing of patches in commit fest

2017-09-12 Thread Thomas Munro
On Tue, Sep 12, 2017 at 12:45 AM, Tomas Vondra
 wrote:
> That won't work until (2) is reliable enough. There are patches (for
> example my "multivariate MCV lists and histograms") which fails to apply
> only because the tool picks the wrong patch. Possibly because it does
> not recognize compressed patches, or something.

FWIW I told it how to handle your .patch.gz files and Alexander
Lakhin's .tar.bz2 files.  Your patch still didn't apply anyway due to
bitrot, but I see you've just posted a new one so it should hopefully
turn green soon.  (It can take a while because it rotates through the
submissions at a rate of one submission every 5 minutes after a new
commit to master is detected, since I don't want to get in trouble for
generating excessive load against the Commitfest, Github or (mainly)
Travis CI.  That's probably too cautious and over time we can revise
it.)

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Automatic testing of patches in commit fest

2017-09-12 Thread Thomas Munro
On Wed, Sep 13, 2017 at 2:34 AM, Alvaro Herrera  wrote:
> Tom Lane wrote:
>> Aleksander Alekseev  writes:
>> > I've ended up with this script [1]. It just generates a list of patches
>> > that are in "Needs Review" status but don't apply or don't compile. Here
>> > is the current list:
>>
>> > === Apply Failed: 29 ===
>> > https://commitfest.postgresql.org/14/1235/ (Support arrays over domain 
>> > types)
>>
>> Can you clarify what went wrong for you on that one?  I went to rebase it,
>> but I end up with the identical patch except for a few line-numbering
>> variations.
>
> I think "git apply" refuses to apply a patch if it doesn't apply
> exactly.  So you could use "git apply -3" (which merges) or just plain
> old "patch" and the patch would work fine.
>
> If the criteria is that strict, I think we should relax it a bit to
> avoid punting patches for pointless reasons.  IOW I think we should at
> least try "git apply -3".

The cfbot is not using git apply, it's using plain old GNU patch
invoked with "patch -p1".  From http://commitfest.cputube.org/ if you
click the "apply|failing" badge you can see the log from the failed
apply attempt.  It says:

== Fetched patches from message ID 3881.1502471872%40sss.pgh.pa.us
== Applying on top of commit 2d4a614e1ec34a746aca43d6a02aa3344dcf5fd4
== Applying patch 01-rationalize-coercion-APIs.patch...
== Applying patch 02-reimplement-ArrayCoerceExpr.patch...
1 out of 1 hunk FAILED -- saving rejects to file
src/pl/plpgsql/src/pl_exec.c.rej

It seems to be a legitimate complaint.  The rejected hunk is trying to
replace this line:

!   return exec_simple_check_node((Node *) ((ArrayCoerceExpr
*) node)->arg);

But you removed exec_simple_check_node in
00418c61244138bd8ac2de58076a1d0dd4f539f3, so this 02 patch needs to be
rebased.

> Also, at this point this should surely be just an experiment.

+1

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patches that don't apply or don't compile: 2017-09-12

2017-09-12 Thread Thomas Munro
On Wed, Sep 13, 2017 at 2:55 AM, Andreas Karlsson  wrote:
> On 09/12/2017 04:14 PM, Aleksander Alekseev wrote:
>>
>> Title: Foreign Key Arrays
>> Author: Mark Rofail
>> URL:https://commitfest.postgresql.org/14/1252/
>
>
> I am currently reviewing this one and it applies, compiles, and passes the
> test suite. It could be the compilation warnings which makes the system
> think it failed, but I could not find the log of the failed build.

I guess you didn't run "make check-world", because it crashes in the
contrib regression tests:

https://travis-ci.org/postgresql-cfbot/postgresql/builds/274732512

Sorry that the build logs are a bit hard to find at the moment.
Starting from http://commitfest.cputube.org/ if you click the
"build|failing" badge you'll land at
https://travis-ci.org/postgresql-cfbot/postgresql/branches and then
you have to locate the right branch, in this case commitfest/14/1252,
and then click the latest build link which (in this case) looks like
"# 4603 failed".  Eventually I'll have time to figure out how to make
the "build|failing" badge take you there directly...   Eventually I'll
also teach it how to dump a backtrace out of gdb the tests leave a
smouldering core.

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Automatic testing of patches in commit fest

2017-09-12 Thread Thomas Munro
On Tue, Sep 12, 2017 at 10:03 PM, Aleksander Alekseev
 wrote:
> Unless there are any objections I'm going to give these patches "Waiting
> on Author" status today (before doing this I'll re-run the script to
> make sure that the list is up-to-date). I'm also going to write one more
> email with CC to the authors of these patches to let them know that the
> status of their patch has changed.

I vote +1 with the caveat that you should investigate each one a bit
to make sure the cfbot isn't just confused about the patch first.
I've also been poking a few threads to ask for rebases + and report
build failures etc, though I haven't been changing statuses so far.

I like your idea of automating CF state changes, but I agree with
Tomas that the quality isn't high enough yet.  I think we should treat
this is a useful tool to guide humans for now, but start trying to
figure out how to integrate some kind of CI with the CF app.  It
probably involves some stricter rules about what exactly constitutes a
patch submission (acceptable formats, whether/how dependencies are
allowed etc).  Right now if cfbot fails to understand your patch
that's cfbot's fault, but if we were to nail down the acceptable
formats then it'd become your fault if it didn't understand your patch
:-D

-- 
Thomas Munro
http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


  1   2   3   4   5   6   7   8   9   >