Re: Feature Request - DDL deployment with logical replication
Hi Jeremy, > My whole point is that in most architectures, DBAs decide to deploy the same > SQL on providers and subscribers. Yes it isn't perfect, but IMO, it is very > helpful to try to automate that idea, as opposed to trying to actually > replicate DDL at the low level. The latter is better, yes, but seems to > have proven extremely difficult. Hence, why you see the advent of functions > to pipe DDL through the replication stream. > The community is currently working, in the current commitfest, on getting logical decoding of 2PC into core. Once something like that gets in, for a large subset of DDLs (those which work inside transaction blocks), one of the use cases of that functionality could be to trap these DDLs and convert them into implicit 2PC commands. Details need to be worked out, but we would get all the logical replication cluster nodes in sync with each other and issue a PREPARE TRANSACTION involving this DDL on all nodes in the logical replication cluster. If any of the nodes is not able to successfully prepare this DDL, then we can roll back, or else commit, the 2PC, thus moving the entire logical cluster consistently in terms of schema changes. Regards, Nikhils -- Nikhil Sontakke http://www.2ndQuadrant.com/ PostgreSQL/Postgres-XL Development, 24x7 Support, Training & Services
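To make the intended flow concrete, a minimal SQL sketch, assuming an illustrative GID and leaving aside how the DDL is trapped and broadcast to the nodes:

    -- run on every node in the logical replication cluster
    BEGIN;
    ALTER TABLE t ADD COLUMN extra integer;
    PREPARE TRANSACTION 'ddl_sync_0001';

    -- once every node has reported a successful prepare
    COMMIT PREPARED 'ddl_sync_0001';

    -- if any node failed to prepare, undo it everywhere instead
    -- ROLLBACK PREPARED 'ddl_sync_0001';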
Re: 2018-03 Commitfest Summary (Andres #1)
Hello Bruce, Has anyone considered moving pgbench out of our git tree and into a separate project where a separate team could maintain and improve it? The movement has been the exact reverse: it was initially in contrib where it had some independence, and was promoted to the main source tree by Peter Eisentraut in March 2015, effective in the 9.5 release. Not sure why... Handy dev tool for testing? One regression test in pgbench is really testing some postgres feature, and it could be used more for this purpose (eg generating a continuous stream of SQL to test failover, replication, ...). As pointed out by Pavel, there is significant code sharing with psql (scanner, \if stuff), which may grow even more if pgbench client-side expressions are moved there as well (whether this is actually desired is pretty unclear, though). So I do not think it would be desirable or practical to have it outside. However, it helps explain the disagreements about pgbench features: pgbench's internal, developer-oriented use for postgres is somewhat limited, and a lot of new features are suggested with an external performance benchmarking use in mind, in which core committers do not seem much interested. -- Fabien.
Re: csv format for psql
Bonjour Daniel, For csv, Fabien and Peter expressed the opinion that we shouldn't create another fieldsep-like variable specifically for it, but instead reuse fieldsep. That's what my latest patch does. Now it turns out that sharing fieldsep comes with some problems. Personally I do not have any problem with CSV defaulting to the '|' separator, given that people often use anything but a comma for the purpose anyway, including '|'. However Pavel wants to block the patch on this point. Too bad. Maybe you can do (again) a patch with a 'fieldsep_csv' or whatever additional variable? Not sure about 'recordsep'... And then we forward both versions to committers and they can choose whatever they think best, including not committing anything? -- Fabien.
Re: csv format for psql
On 31/03/18 21:33, Fabien COELHO wrote: Bonjour Daniel, For csv, Fabien and Peter expressed the opinion that we shouldn't create another fieldsep-like variable specifically for it, but instead reuse fieldsep. That's what my latest patch does. Now it turns out that sharing fieldsep comes with some problems. Personnaly I do not have any problem with CSV defaulting to '|' separator, given that anyway people often use anything but a comma for the purpose, including '|'. However Pavel wants to block the patch on this point. Too bad. Maybe you can do (again) a patch with a 'fieldsep_csv' or whatever additional variable? Not sure about 'recordsep'... And then we forward to committers both version and they can chose whatever they think best, including not committing anything? Would suggest the choice of 'fieldsep_csv' out of those two (a record would correspond to the whole row), and the default being a comma. Cheers, Gavin
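For what it's worth, a hypothetical psql session under that proposal might look like the sketch below; both the 'fieldsep_csv' variable and its comma default are exactly the points still under debate, so this is illustration only:

    \pset format csv
    \pset fieldsep_csv ','
    SELECT 1 AS a, 'hello, world' AS b;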
Re: Planning counters in pg_stat_statements
+1 Shouldn't this be added to the next CF? nb: As plan_time is not included in total_time, could it be added to usage (for statement eviction calculation)? I will try to include plan_time in my proposed version of pgss with planid. http://www.postgresql-archive.org/Poc-pg-stat-statements-with-planid-td6014027.html Regards PAscal -- Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html
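As a sketch of how such a counter would be consumed, the query below uses only existing pg_stat_statements columns; the planning-time column name is an assumption of the proposal, not something that exists today:

    SELECT query, calls, total_time        -- existing columns
           /* , plan_time  -- proposed planning counter */
      FROM pg_stat_statements
     ORDER BY total_time DESC
     LIMIT 5;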
Re: [HACKERS][PATCH] adding simple sock check for windows
On Sat, Mar 31, 2018 at 12:05 PM, CharSyam wrote: > Amit, I agree with you. > > I changed my patch :) to remove old check. > - if (slot->sock < 0) + if (slot->sock == PGINVALID_SOCKET || slot->sock < 0) I still see the same check. I think by mistake you have attached old patch. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [PROPOSAL] Shared Ispell dictionaries
Hello all, I'd like to add a new optional function to the text search template, named fini(), in addition to init() and lexize(). It will be called by RemoveTSDictionaryById() and AlterTSDictionary(). dispell_fini() will call ts_dict_shmem_release(). It doesn't change the segment-leaking situation. I think it makes the text search API more transparent. I'll update the existing documentation. And I think I can add text search API documentation in the 2018-09 commitfest, as Tom noticed that it doesn't exist. Any thoughts? -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
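At the SQL level the proposal would look roughly like the sketch below; INIT, LEXIZE and the dispell_* functions already exist, while the FINI option is the hypothetical addition:

    CREATE TEXT SEARCH TEMPLATE ispell_shared (
        INIT   = dispell_init,
        LEXIZE = dispell_lexize
        -- , FINI = dispell_fini  -- proposed cleanup hook, called from
        --   RemoveTSDictionaryById() and AlterTSDictionary()
    );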
bulk typos
I needed another distraction so bulk-checked for typos, limited to comments in *.[ch]. I'm not passionate about this, but it serves the purpose of reducing the overhead of fixing them individually. Also I heard something here recently about ugly languages.. time find . -name '*.c' -print0 |xargs -r0 sed -s '/.*\/\*/!d; s///; :l; /\*\/.*/!{N;b l}; s///; s/.*/\L&/' |grep -Eo '[[:alpha:]]{3,}' |sort |uniq -c |sort -nr |awk '$1==1{print $2}' |grep -xFvf /usr/share/dict/words |less If any of these are disputed or objectionable, I would summarily discard them, as I'm sure I missed some and fixing every last typo wasn't really the ghoul. Justin diff --git a/contrib/pgcrypto/rijndael.c b/contrib/pgcrypto/rijndael.c index 4c074ef..6938701 100644 --- a/contrib/pgcrypto/rijndael.c +++ b/contrib/pgcrypto/rijndael.c @@ -163,7 +163,7 @@ gen_tabs(void) q; /* log and power tables for GF(2**8) finite field with */ - /* 0x11b as modular polynomial - the simplest prmitive */ + /* 0x11b as modular polynomial - the simplest primitive */ /* root is 0x11, used here to generate the tables */ for (i = 0, p = 1; i < 256; ++i) diff --git a/src/backend/access/common/session.c b/src/backend/access/common/session.c index 617c3e1..ffa7432 100644 --- a/src/backend/access/common/session.c +++ b/src/backend/access/common/session.c @@ -60,7 +60,7 @@ InitializeSession(void) * Initialize the per-session DSM segment if it isn't already initialized, and * return its handle so that worker processes can attach to it. * - * Unlike the per-context DSM segment, this segement and its contents are + * Unlike the per-context DSM segment, this segment and its contents are * reused for future parallel queries. * * Return DSM_HANDLE_INVALID if a segment can't be allocated due to lack of diff --git a/src/backend/access/nbtree/nbtinsert.c b/src/backend/access/nbtree/nbtinsert.c index e85abcf..4011199 100644 --- a/src/backend/access/nbtree/nbtinsert.c +++ b/src/backend/access/nbtree/nbtinsert.c @@ -187,9 +187,9 @@ top: _bt_relbuf(rel, buf); /* -* Something did not workout. Just forget about the cached +* Something did not work out. Just forget about the cached * block and follow the normal path. It might be set again if -* the conditions are favourble. +* the conditions are favourable. */ RelationSetTargetBlock(rel, InvalidBlockNumber); } diff --git a/src/backend/executor/execProcnode.c b/src/backend/executor/execProcnode.c index 43a27a9..a3fb449 100644 --- a/src/backend/executor/execProcnode.c +++ b/src/backend/executor/execProcnode.c @@ -409,7 +409,7 @@ ExecSetExecProcNode(PlanState *node, ExecProcNodeMtd function) * Add a wrapper around the ExecProcNode callback that checks stack depth * during the first execution and maybe adds an instrumentation * wrapper. When the callback is changed after execution has already begun -* that means we'll superflously execute ExecProcNodeFirst, but that seems +* that means we'll superfluously execute ExecProcNodeFirst, but that seems * ok. 
*/ node->ExecProcNodeReal = function; diff --git a/src/backend/jit/llvm/llvmjit_expr.c b/src/backend/jit/llvm/llvmjit_expr.c index 0d8c2bd..f37ff82 100644 --- a/src/backend/jit/llvm/llvmjit_expr.c +++ b/src/backend/jit/llvm/llvmjit_expr.c @@ -1768,7 +1768,7 @@ llvm_compile_expr(ExprState *state) b_compare_result, b_null); - /* build block analying the !NULL comparator result */ + /* build block analyzing the !NULL comparator result */ LLVMPositionBuilderAtEnd(b, b_compare_result); /* if results equal, compare next, otherwise done */ diff --git a/src/backend/optimizer/geqo/geqo_misc.c b/src/backend/optimizer/geqo/geqo_misc.c index 919d288..0f96912 100644 --- a/src/backend/optimizer/geqo/geqo_misc.c +++ b/src/backend/optimizer/geqo/geqo_misc.c @@ -92,7 +92,7 @@ print_gen(FILE *fp, Pool *pool, int generation) { int lowest; - /* Get index to lowest ranking gene in poplulation. */ + /* Get index to lowest ranking gene in population. */ /* Use 2nd to last since last is buffer. */ lowest = pool->size > 1 ? pool->size - 2 : 0; diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 60e2f21..e3c5d4
Re: bulk typos
On Sat, Mar 31, 2018 at 12:56 PM, Justin Pryzby wrote: > I needed another distraction so bulk-checked for typos, limited to > comments in > *.[ch]. > I think you introduced another one while changing "explcitly" to "expilcitly" instead of "explicitly" :-) -- Félix
Re: Online enabling of checksums
On Sat, Mar 31, 2018 at 2:08 AM, Tomas Vondra wrote: > Hi, > > I've been looking at the patch a bit more, and I think there are a > couple of fairly serious issues in the error handling. > Thanks! > > Firstly ChecksumHelperLauncherMain spends quite a bit of effort on > skipping dropped databases, but ChecksumHelperWorkerMain does not do the > same thing with tables. I'm not exactly sure why, but I'd say dropped > tables are more likely than dropped databases (e.g. because of temporary > tables) and it's strange to gracefully handle the more rare case. > Uh, yes. I could've sworn we had code for that, but I fully agree with your assessment that it's not there :) Now, when a table gets dropped after BuildRelationList() does it's work, > we end up calling ProcessSingleRelationByOid() on that OID. Which calls > relation_open(), which fails with elog(ERROR), terminating the whole > bgworker with an error like this: > > ERROR: could not open relation with OID 16632 > LOG: background worker "checksumhelper worker" (PID 27152) exited > with exit code 1 > Yeah. We need to trap that error an not crash and burn. Which however means the error handling in ChecksumHelperWorkerMain() has > no chance to kick in, because the bgworker dies right away. The code > looks like this: > > foreach(lc, RelationList) > { > ChecksumHelperRelation *rel > = (ChecksumHelperRelation *) lfirst(lc); > > if (!ProcessSingleRelationByOid(rel->reloid, strategy)) > { > ChecksumHelperShmem->success = ABORTED; > break; > } > else > ChecksumHelperShmem->success = SUCCESSFUL; > } > list_free_deep(RelationList); > > Now, assume the first relation in the list still exists and gets > processed correctly, so "success" ends up being SUCCESSFUL. Then the > second OID is the dropped relation, which kills the bgworker ... > Indeed, that's just a very simple bug. It shouldn't be set to SUCCESSFUL until *all* tables have been processed. I believe the code needs to be this: IMHO this error handling is broken by design - two things need to > happen, I think: (a) graceful handling of dropped relations and (b) > proper error reporting from the bgworder. > > (a) Should not be difficult to do, I think. We don't have relation_open > with a missing_ok flag, but implementing something like that should not > be difficult. Even a simple "does OID exist" should be enough. > Not entirely sure what you mean with "even a simple does oid exist" means? I mean, if we check for the file, that won't help us -- there will still be a tiny race between the check and us opening it won't it? However, we have try_relation_open(). Which is documented as: * Same as relation_open, except return NULL instead of failing * if the relation does not exist. So I'm pretty sure it's just a matter of using try_relation_open() instead, and checking for NULL? (b) But just handling dropped relations is not enough, because I could > simply kill the bgworker directly, and it would have exactly the same > consequences. What needs to happen is something like this: > And now I see your code, which was below-fold when I first read. After having writing a very similar fix myself. I'm glad this code looks mostly identical to what I suggested above, so I think that's a good solution. > BTW I don't think handling dropped relations by letting the bgworker > crash and restart is an acceptable approach. That would pretty much mean > any DDL changes are prohibited on the system while the checksum process > is running, which is not quite possible (e.g. 
for systems doing stuff > with temporary tables). > No, I don't like that at all. We need to handle them gracefully, by skipping them - crash and restart is not acceptable for something that common. Which however reminds me I've also ran into a bug in the automated retry > system, because you may get messages like this: > > ERROR: failed to enable checksums in "test", giving up (attempts > 639968292). > > This happens because BuildDatabaseList() does just palloc() and does not > initialize the 'attempts' field. It may get initialized to 0 by chance, > but I'm running with -DRANDOMIZE_ALLOCATED_MEMORY, hence the insanely > high value. > Eh. I don't have that "(attempts" part in my code at all. Is that either from some earlier version of the patch, or did you add that yourself for testing? BTW both ChecksumHelperRelation and ChecksumHelperDatabase have > 'success' field which is actually unused (and uninitialized). > Correct. These are old leftovers from the "partial restart" logic, and should be removed. But wait - there is more ;-) BuildRelationList is using heap_beginscan > with the regular snapshot, so it does not see uncommitted transactions. > So if you do this: > > BEGIN; > CREATE TABLE t AS SELECT i FROM generate_series(1,1000) s(i); > -- run pg_enable_data_checksums() fro
Re: Problem while setting the fpw with SIGHUP
On Wed, Mar 28, 2018 at 12:10 PM, Kyotaro HORIGUCHI wrote: > At Tue, 27 Mar 2018 22:02:26 +0900, Michael Paquier > wrote in <20180327130226.ga1...@paquier.xyz> >> On Tue, Mar 27, 2018 at 09:01:20PM +0900, Kyotaro HORIGUCHI wrote: >> > The current UpdateFullPageWrites is safe on standby and promotion >> > so what we should consider is only the non-standby case. I think >> > what we should do is just calling RecoveryInProgress() at the >> > beginning of CheckPointerMain, which is just the same thing with >> > InitPostgres, but before setting up signal handler to avoid >> > processing SIGHUP before being ready to insert xlog. >> >> Your proposal does not fix the issue for a checkpointer process started >> on a standby. After a promotion, if SIGHUP is issued with a change in >> full_page_writes, then the initialization of InitXLogInsert() would >> happen again in the critical section of UpdateFullPageWrites(). The >> window is rather small for normal promotions as the startup process >> requests a checkpoint which would do the initialization, and much larger >> for fallback_promote where the startup process is in charge of doing the >> end-of-recovery checkpoint. > > Yeah. I realized that after sending the mail. > > I looked closer and found several problems there. > > - On standby, StartupXLOG calls UpdateFullPageWrites and > checkpointer can call the same function simultaneously, but it > doesn't assume concurrent call. > > - StartupXLOG can make a concurrent write to > Insert->fullPageWrite so it needs to be locked. > > - At the time of the very end of recovery, the startup process > ignores possible change of full_page_writes GUC. It sticks with > the startup value. It leads to loss of > XLOG_CHANGE_FPW. (StartXLOG is not considering GUC changes by > reload) > Won't this be covered by checkpointer process? Basically, the next time checkpointer sees that it has received the sighup, it will call UpdateFullPageWrites which will log the record if required. In general, I was wondering why in the first place this variable (full_page_writes) is a SIGHUP variable? I think if the user tries to switch it to 'on' from 'off', it won't guarantee the recovery from torn pages. Yeah, one can turn it to 'off' from 'on' without any problem, but as the reverse doesn't guarantee anything, it can confuse users. What do you think? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
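For reference, the reload-time change being discussed is simply the following (a minimal illustration; whether switching it back to 'on' at reload time should be allowed at all is the question raised above):

    ALTER SYSTEM SET full_page_writes = off;
    SELECT pg_reload_conf();  -- full_page_writes is a SIGHUP GUC, so the
                              -- checkpointer acts on the change at reload time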
Re: [HACKERS][PATCH] adding simple sock check for windows
Thanks Amit. I had a mistake. Thank you again to point it out :) 2018-03-31 19:33 GMT+09:00 Amit Kapila : > On Sat, Mar 31, 2018 at 12:05 PM, CharSyam wrote: >> Amit, I agree with you. >> >> I changed my patch :) to remove old check. >> > > - if (slot->sock < 0) > + if (slot->sock == PGINVALID_SOCKET || slot->sock < 0) > > I still see the same check. I think by mistake you have attached old patch. > > -- > With Regards, > Amit Kapila. > EnterpriseDB: http://www.enterprisedb.com simple_check.patch Description: Binary data
Re: [PATCH] Verify Checksums during Basebackups
Hi, On Fri, Mar 30, 2018 at 07:46:02AM -0400, Stephen Frost wrote: > * Magnus Hagander (mag...@hagander.net) wrote: > > On Fri, Mar 30, 2018 at 5:35 AM, David Steele wrote: > > > > > On 3/24/18 10:32 AM, Michael Banck wrote: > > > > Am Freitag, den 23.03.2018, 17:43 +0100 schrieb Michael Banck: > > > >> Am Freitag, den 23.03.2018, 10:54 -0400 schrieb David Steele: > > > >>> In my experience actual block errors are relatively rare, so there > > > >>> aren't likely to be more than a few in a file. More common are > > > >>> overwritten or transposed files, rogue files, etc. These produce a > > > >>> lot > > > >>> of output. > > > >>> > > > >>> Maybe stop after five? > > > > > > > > The attached patch does that, and outputs the total number of > > > > verification failures of that file after it got sent. > > > > > > > >> I'm on board with this, but I have the feeling that this is not a very > > > >> common pattern in Postgres, or might not be project style at all. I > > > >> can't remember even seen an error message like that. > > > >> > > > >> Anybody know whether we're doing this in a similar fashion elsewhere? > > > > > > > > I tried to have look around and couldn't find any examples, so I'm not > > > > sure that patch should go in. On the other hand, we abort on checksum > > > > failures usually (in pg_dump e.g.), so limiting the number of warnings > > > > does makes sense. > > > > > > > > I guess we need to see what others think. > > > > > > Well, at this point I would say silence more or less gives consent. > > > > > > Can you provide a rebased patch with the validation retry and warning > > > limiting logic added? I would like to take another pass through it but I > > > think this is getting close. > > > > I was meaning to mention it, but ran out of cycles. > > > > I think this is the right way to do it, except the 5 should be a #define > > and not an inline hardcoded value :) We could argue whether it should be "5 > > total" or "5 per file". When I read the emails I thought it was going to be > > 5 total, but I see the implementation does 5 / file. In a super-damanged > > system that will still lead to horrible amounts of logging, but I think > > maybe if your system is in that bad shape, then it's a lost cause anyway. > > 5/file seems reasonable to me as well. > > > I also think the "total number of checksum errors" should be logged if > > they're >0, not >5. And I think *that* one should be logged at the end of > > the entire process, not per file. That'd be the kind of output that would > > be the most interesting, I think (e.g. if I have it spread out with 1 block > > each across 4 files, I want that logged at the end because it's easy to > > otherwise miss one or two of them that may have happened a long time apart). > > I definitely like having a total # of checksum errors included at the > end, if there are any at all. When someone is looking to see why the > process returned a non-zero exit code, they're likely to start looking > at the end of the log, so having that easily available and clear as to > why the backup failed is definitely valuable. > > > I don't think we have a good comparison elsewhere, and that is as David > > mention because other codepaths fail hard when they run into something like > > that. And we explicitly want to *not* fail hard, per previous discussion. > > Agreed. Attached is a new and rebased patch which does the above, plus integrates the suggested changes by David Steele. 
The output is now: $ initdb -k --pgdata=data1 1> /dev/null 2> /dev/null $ pg_ctl --pgdata=data1 --log=pg1.log start > /dev/null $ dd conv=notrunc oflag=seek_bytes seek=4000 bs=8 count=1 if=/dev/zero of=data1/base/12374/2610 2> /dev/null $ for i in 4000 13000 21000 29000 37000 43000; do dd conv=notrunc oflag=seek_bytes seek=$i bs=8 count=1 if=/dev/zero of=data1/base/12374/1259; done 2> /dev/null $ pg_basebackup -v -h /tmp --pgdata=data2 pg_basebackup: initiating base backup, waiting for checkpoint to complete pg_basebackup: checkpoint completed pg_basebackup: write-ahead log start point: 0/260 on timeline 1 pg_basebackup: starting background WAL receiver pg_basebackup: created temporary replication slot "pg_basebackup_13882" WARNING: checksum verification failed in file "./base/12374/2610", block 0: calculated C2C9 but expected EC78 WARNING: checksum verification failed in file "./base/12374/1259", block 0: calculated 8BAE but expected 46B8 WARNING: checksum verification failed in file "./base/12374/1259", block 1: calculated E413 but expected 7701 WARNING: checksum verification failed in file "./base/12374/1259", block 2: calculated 5DA9 but expected D5AA WARNING: checksum verification failed in file "./base/12374/1259", block 3: calculated 5651 but expected 4F5E WARNING: checksum verification failed in file "./base/12374/1259", block 4: calculated DF39 but expected DF00 WARNING: further checksum verification failures in file "./base/12374/1259" will not be
Re: [HACKERS][PATCH] adding simple sock check for windows
On Sat, Mar 31, 2018 at 6:08 PM, CharSyam wrote: > Thanks Amit. > I had a mistake. Thank you again to point it out :) > Thanks, your new patch looks good. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Foreign keys and partitioned tables
On 3/29/18 23:19, Alvaro Herrera wrote: > 0001 prohibits having foreign keys pointing to partitioned tables, as > discussed above. This is already prohibited. You get an error ERROR: cannot reference partitioned table "fk_partitioned_pk" Your patch 0001 just adds the same error check a few lines above the existing one, while 0003 removes the existing one. I think you can squash 0001 and 0003 together. > 0002 is a fixup for a bug in the row triggers patch: I had a restriction > earlier that triggers declared internal were not cloned, and I seem to > have lost it in rebase. Reinstate it. Hmm, doesn't cause any test changes? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: hot_standby_feedback vs excludeVacuum and snapshots
On Thu, Mar 29, 2018 at 4:47 PM, Greg Stark wrote: > I'm poking around to see debug a vacuuming problem and wondering if > I've found something more serious. > > As far as I can tell the snapshots on HOT standby are built using a > list of running xids that the primary builds and puts in the WAL and > that seems to include all xids from transactions running in all > databases. The HOT standby would then build a snapshot and eventually > send the xmin of that snapshot back to the primary in the hot standby > feedback and that would block vacuuming tuples that might be visible > to the standby. > > Many ages ago Alvaro sweated blood to ensure vacuums could run for > long periods of time without holding back the xmin horizon and > blocking other vacuums from cleaning up tuples. That's the purpose of > the excludeVacuum flag in GetCurrentVirtualXIDs(). That's possible > because we know vacuums won't insert any tuples that queries might try > to view and also vacuums won't try to perform any sql queries on other > tables. > > I can't find anywhere that the standby snapshot building mechanism > gets this same information about which xids are actually vacuums that > can be ignored when building a snapshot. > I think the vacuum assigns xids only if it needs to truncate some of the pages in the relation which happens towards the end of vacuum. So, it shouldn't hold back the xmin horizon for long. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: WIP: a way forward on bootstrap data
Attached is v13, rebased against b0c90c85fc. Patch 0001: -The data files are formatted to at most 80 columns wide. -Rename rewrite_dat.pl to reformat_dat_file.pl. -Refactor Catalog.pm and reformat_dat_file.pl to have better separation of concerns. -Add src/include/catalog/Makefile with convenience targets for rewriting data files. Patch 0002: -Some adjustments to the post-conversion cleanup of data files. Patch 0005: -I made a stub version of Solution.pm to simulate testing the MSVC build. This found one bug, and also allowed me to bring in some of the more pedantic dependencies I added to utils/Makefile. Patch 0009: -New patch that puts all doc changes in one patch, for flexibility. -Split the parts of catalog/README having to do with data into a new README.data file. Add recipes for how to edit data, with code examples. On 3/26/18, Tom Lane wrote: > It would be more work, but maybe we should move this into the main > SGML docs. It seems rather silly to have SGML documentation for the > .BKI file format, which now will be an internal matter that hardly > any developers need worry about, but not for the .DAT file format. > But I understand if that seems a bridge too far for today --- certainly > a README file is way better than nothing. I had an idea to make it less silly without doing as much work: Get rid of the SGML docs for the BKI format, and turn them into bootstrap/README. Thoughts? And in the department of second thoughts, it occurred to me that the only reason that the .dat files are in include/catalog is because that's where the DATA() statements were. Since they are separate now, one could make the case that they actually belong in backend/catalog. One trivial advantage here is that there is already an existing Makefile in which to put convenience targets for formatting. On the other hand, it kind of makes sense to have the files describing the schema (.h) and the contents (.dat) in the same directory. I'm inclined to leave things as they are for that reason. -John Naylor v13-bootstrap-data-conversion.tar.gz Description: GNU Zip compressed data
Re: Small proposal to improve out-of-memory messages
Tom Lane wrote: > In the wake of commit 442accc3f one might think that the message should > also include the context "identifier" if any. But I'm specifically not > proposing that, because of the point made in the discussion of that patch > that some identifier strings might contain security-sensitive query > excerpts. Now that that commit has required all context "names" to be > compile-time constants, it's hard to see how there could be any security > downside to showing them in a user-visible message. How about using errdetail_log() to include the context identifier? -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Feature Request - DDL deployment with logical replication
On 31 March 2018 at 15:53, Nikhil Sontakke wrote: > Hi Jeremy, > > > My whole point is that in most architectures, DBAs decide to deploy the > same > > SQL on providers and subscribers. Yes it isn't perfect, but IMO, it is > very > > helpful to try to automate that idea, as opposed to trying to actually > > replicate DDL at the low level. The latter is better, yes, but seems to > > have proven extremely difficult. Hence, why you see the advent of > functions > > to pipe DDL through the replication stream. > > > > The community is currently working on in the current commitfest to try > and get logical decoding of 2PC in into the core. > > Once something like that gets in, for a majority of subset of DDLs > (which works inside transaction blocks), one of the use cases of that > functionality could be to trap these DDLs and convert them into > implicit 2PC commands. Details need to be worked out, but we would get > all the logical replication cluster nodes in sync with each other and > issue a PREPARE transaction involving this DDL on all nodes in the > logical replication cluster. If any of the nodes is not able to > successfully prepare this DDL, then we can rollback or else commit the > 2PC, thus moving the entire logical cluster consistently in terms of > schema changes. > > We'll still need a mechanism to transport them to downstreams (like WAL messages) and to send responses upstream. For responses I think we will finally want to add a backchannel to the logical replication protocol as I've wanted for a long while: downstream can send a COPY message on COPY BOTH proto back to upstream, which passes it to a callback on the output plugin for the output plugin to act on. The main issue I had when I tried to prototype this before was IIRC not knowing how to set up the right snapshot in which to execute the callback. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Online enabling of checksums
Hi, On 03/31/2018 02:02 PM, Magnus Hagander wrote: > On Sat, Mar 31, 2018 at 2:08 AM, Tomas Vondra > mailto:tomas.von...@2ndquadrant.com>> wrote: > > ... > > (a) Should not be difficult to do, I think. We don't have relation_open > with a missing_ok flag, but implementing something like that should not > be difficult. Even a simple "does OID exist" should be enough. > > > Not entirely sure what you mean with "even a simple does oid exist" > means? I mean, if we check for the file, that won't help us -- there > will still be a tiny race between the check and us opening it won't it? > I meant to say "even a simple check if the OID still exists" but it was a bit too late / not enough caffeine issue. You're right there would be a tiny window of race condition - it'd be much shorter, possibly enough to make the error+restart approach acceptable. > However, we have try_relation_open(). Which is documented as: > *Same as relation_open, except return NULL instead of failing > *if the relation does not exist. > > So I'm pretty sure it's just a matter of using try_relation_open() > instead, and checking for NULL? > Oh, right. I thought we had a relation_open variant that handles this, but have been looking for one with missing_ok flag and so I missed this. try_relation_open should do the trick when it comes to dropped tables. > > (b) But just handling dropped relations is not enough, because I could > simply kill the bgworker directly, and it would have exactly the same > consequences. What needs to happen is something like this: > > > > And now I see your code, which was below-fold when I first read. After > having writing a very similar fix myself. I'm glad this code looks > mostly identical to what I suggested above, so I think that's a good > solution. > Good ;-) > > > BTW I don't think handling dropped relations by letting the bgworker > crash and restart is an acceptable approach. That would pretty much mean > any DDL changes are prohibited on the system while the checksum process > is running, which is not quite possible (e.g. for systems doing stuff > with temporary tables). > > > No, I don't like that at all. We need to handle them gracefully, by > skipping them - crash and restart is not acceptable for something that > common. > Yeah, I was just thinking aloud. > > Which however reminds me I've also ran into a bug in the automated retry > system, because you may get messages like this: > > ERROR: failed to enable checksums in "test", giving up (attempts > 639968292). > > This happens because BuildDatabaseList() does just palloc() and does not > initialize the 'attempts' field. It may get initialized to 0 by chance, > but I'm running with -DRANDOMIZE_ALLOCATED_MEMORY, hence the insanely > high value. > > > Eh. I don't have that "(attempts" part in my code at all. Is that either > from some earlier version of the patch, or did you add that yourself for > testing? > Apologies, you're right I tweaked the message a bit (just adding the number of attempts to it). The logic however remains the same, and the bug is real. > > But wait - there is more ;-) BuildRelationList is using heap_beginscan > with the regular snapshot, so it does not see uncommitted transactions. > So if you do this: > > BEGIN; > CREATE TABLE t AS SELECT i FROM generate_series(1,1000) s(i); > -- run pg_enable_data_checksums() from another session > SELECT COUNT(*) FROM t; > > then the table will be invisible to the checksum worker, it won't have > checksums updated and the cluster will get checksums enabled. 
Which > means this: > > > Ugh. Interestingly enough I just put that on my TODO list *yesterday* > that I forgot to check that specific case :/ > But I was faster in reporting it ;-) > > test=# SELECT COUNT(*) FROM t; > WARNING: page verification failed, calculated checksum 27170 but > expected 0 > ERROR: invalid page in block 0 of relation base/16677/16683 > > Not sure what's the best way to fix this - maybe we could wait for all > running transactions to end, before starting the work. > > > I was thinking of that as one way to deal with it, yes. > > I guess a reasonable way to do that would be to do it as part of > BuildRelationList() -- basically have that one wait until there is no > other running transaction in that specific database before we take the > snapshot? > > A first thought I had was to try to just take an access exclusive lock > on pg_class for a very short time, but a transaction that does create > table doesn't actually keep it's lock on that table so there is no conflict. > Yeah, I don't think that's going to work, because you don't even know you need to lock/wait for something. I do think just waiting for all running transactions to complete is fine, and it's not the first place wher
Re: Parallel Aggregates for string_agg and array_agg
On 30 March 2018 at 02:55, Tomas Vondra wrote: > On 03/29/2018 03:09 PM, David Rowley wrote: >> I meant to mention earlier that I coded >> agg_args_have_sendreceive_funcs() to only check for send/receive >> functions. Really we could allow a byval types without send/receive >> functions, since the serial/deserial just send the raw datums in that >> case, but then the function becomes >> agg_byref_args_have_sendreceive_funcs(), which seemed a bit obscure, >> so I didn't do that. Maybe I should? > > I'd do that. Not sure the function name needs to change, but perhaps > agg_args_support_sendreceive() would be better - it covers both byref > types (which require send/receive functions) and byval (which don't). The attached patch implements this. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services combinefn_for_string_and_array_aggs_v10.patch Description: Binary data
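For anyone following along, the byval/byref distinction is visible directly in the catalog; byref argument types such as text need typsend/typreceive to be serialized for parallel workers, while byval types such as int4 can be sent as raw datums (illustrative query):

    SELECT typname, typbyval, typsend, typreceive
      FROM pg_type
     WHERE typname IN ('int4', 'text');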
Re: Online enabling of checksums
On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra wrote: > > On 03/31/2018 02:02 PM, Magnus Hagander wrote: > > On Sat, Mar 31, 2018 at 2:08 AM, Tomas Vondra > > mailto:tomas.von...@2ndquadrant.com>> > wrote: > > > > > > But wait - there is more ;-) BuildRelationList is using > heap_beginscan > > with the regular snapshot, so it does not see uncommitted > transactions. > > So if you do this: > > > > BEGIN; > > CREATE TABLE t AS SELECT i FROM generate_series(1,1000) s(i); > > -- run pg_enable_data_checksums() from another session > > SELECT COUNT(*) FROM t; > > > > then the table will be invisible to the checksum worker, it won't > have > > checksums updated and the cluster will get checksums enabled. Which > > means this: > > > > > > Ugh. Interestingly enough I just put that on my TODO list *yesterday* > > that I forgot to check that specific case :/ > > > > But I was faster in reporting it ;-) > Indeed you were :) > > test=# SELECT COUNT(*) FROM t; > > WARNING: page verification failed, calculated checksum 27170 but > > expected 0 > > ERROR: invalid page in block 0 of relation base/16677/16683 > > > > Not sure what's the best way to fix this - maybe we could wait for > all > > running transactions to end, before starting the work. > > > > > > I was thinking of that as one way to deal with it, yes. > > > > I guess a reasonable way to do that would be to do it as part of > > BuildRelationList() -- basically have that one wait until there is no > > other running transaction in that specific database before we take the > > snapshot? > > > > A first thought I had was to try to just take an access exclusive lock > > on pg_class for a very short time, but a transaction that does create > > table doesn't actually keep it's lock on that table so there is no > conflict. > > > > Yeah, I don't think that's going to work, because you don't even know > you need to lock/wait for something. > I do think just waiting for all running transactions to complete is > fine, and it's not the first place where we use it - CREATE SUBSCRIPTION > does pretty much exactly the same thing (and CREATE INDEX CONCURRENTLY > too, to some extent). So we have a precedent / working code we can copy. > Thinking again, I don't think it should be done as part of BuildRelationList(). We should just do it once in the launcher before starting, that'll be both easier and cleaner. Anything started after that will have checksums on it, so we should be fine. PFA one that does this. > And if you try this with a temporary table (not hidden in transaction, > > so the bgworker can see it), the worker will fail with this: > > > > ERROR: cannot access temporary tables of other sessions > > > > But of course, this is just another way how to crash without updating > > the result for the launcher, so checksums may end up being enabled > > anyway. > > > > > > Yeah, there will be plenty of side-effect issues from that > > crash-with-wrong-status case. Fixing that will at least make things > > safer -- in that checksums won't be enabled when not put on all pages. > > > > Sure, the outcome with checksums enabled incorrectly is a consequence of > bogus status, and fixing that will prevent that. But that wasn't my main > point here - not articulated very clearly, though. > > The bigger question is how to handle temporary tables gracefully, so > that it does not terminate the bgworker like this at all. 
This might be > even bigger issue than dropped relations, considering that temporary > tables are pretty common part of applications (and it also includes > CREATE/DROP). > > For some clusters it might mean the online checksum enabling would > crash+restart infinitely (well, until reaching MAX_ATTEMPTS). > > Unfortunately, try_relation_open() won't fix this, as the error comes > from ReadBufferExtended. And it's not a matter of simply creating a > ReadBuffer variant without that error check, because temporary tables > use local buffers. > > I wonder if we could just go and set the checksums anyway, ignoring the > local buffers. If the other session does some changes, it'll overwrite > our changes, this time with the correct checksums. But it seems pretty > dangerous (I mean, what if they're writing stuff while we're updating > the checksums? Considering the various short-cuts for temporary tables, > I suspect that would be a boon for race conditions.) > > Another option would be to do something similar to running transactions, > i.e. wait until all temporary tables (that we've seen at the beginning) > disappear. But we're starting to wait on more and more stuff. > > If we do this, we should clearly log which backends we're waiting for, > so that the admins can go and interrupt them manually. > Yeah, waiting for all transactions at the beginning is pretty simple. Making the worker simply ignore temporary tables would also be easy. One of
Re: Small proposal to improve out-of-memory messages
Alvaro Herrera writes: > Tom Lane wrote: >> In the wake of commit 442accc3f one might think that the message should >> also include the context "identifier" if any. But I'm specifically not >> proposing that, because of the point made in the discussion of that patch >> that some identifier strings might contain security-sensitive query >> excerpts. Now that that commit has required all context "names" to be >> compile-time constants, it's hard to see how there could be any security >> downside to showing them in a user-visible message. > How about using errdetail_log() to include the context identifier? Not really excited about that; we have no field experience to say it'd be useful. If we start finding ourselves asking "exactly which ExprState was that", we could revisit the question perhaps. Furthermore, because elog.c constructs the whole detail string in ErrorContext, doing this would create a significant OOM hazard anytime the context ID didn't fit into the preallocated ErrorContext space. Context names are generally short enough that they're not adding to our risk there, but the ID strings could be very long in some cases. (mcxt.c avoids this hazard by writing directly to stderr, but elog.c can't do that.) regards, tom lane
Re: csv format for psql
On 31 March 2018 at 04:33, Fabien COELHO wrote: > > Bonjour Daniel, > > For csv, Fabien and Peter expressed the opinion that we shouldn't >> create another fieldsep-like variable specifically for it, but instead >> reuse fieldsep. That's what my latest patch does. >> >> Now it turns out that sharing fieldsep comes with some problems. >> > > Personnaly I do not have any problem with CSV defaulting to '|' separator, > given that anyway people often use anything but a comma for the purpose, > including '|'. > > However Pavel wants to block the patch on this point. Too bad. > OK, mostly trying to avoid commenting because I doubt I have much to add. But. If I ask for CSV and don't specify any overrides, I expect to get "C"omma separated values, not some other character. More specifically, if I say --csv I expect to get files that are identical with what I would get if I used COPY ... CSV. Actually, COPY ... CSV HEADER, given that psql shows column headings. This also implies that I expect the quoting and related details that are associated with CSV. And I don't think I'm a weird user. If --csv does anything even a little different from a simple COPY invocation on the same query, at some point it's going to bite somebody and they will rightfully curse the design decisions taken in this thread.
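In other words, the expectation is that --csv output for a query should match what the server-side equivalent produces, quoting rules and all (query here is illustrative):

    COPY (SELECT oid, relname FROM pg_class ORDER BY oid LIMIT 3)
        TO STDOUT WITH (FORMAT csv, HEADER true);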
Re: WIP: a way forward on bootstrap data
John Naylor writes: > And in the department of second thoughts, it occurred to me that the > only reason that the .dat files are in include/catalog is because > that's where the DATA() statements were. Since they are separate now, > one could make the case that they actually belong in backend/catalog. > One trivial advantage here is that there is already an existing > Makefile in which to put convenience targets for formatting. On the > other hand, it kind of makes sense to have the files describing the > schema (.h) and the contents (.dat) in the same directory. I'm > inclined to leave things as they are for that reason. Yeah. The fact that, eg, both the .h and .dat files are inputs to duplicate_oids and unused_oids makes me think it's better to keep them together. I'd actually been thinking of something that's about the reverse: instead of building the derived .h files in backend/catalog and then symlinking them into include/catalog, it'd be saner to build them in include/catalog to begin with. However, that would mean that the Perl scripts need to produce output in two different places, so maybe it'd end up more complicated not less so. In any case, that seems like something to leave for another day. regards, tom lane
Re: Online enabling of checksums
On 03/31/2018 05:05 PM, Magnus Hagander wrote: > On Sat, Mar 31, 2018 at 4:21 PM, Tomas Vondra > mailto:tomas.von...@2ndquadrant.com>> wrote: > > ... > > I do think just waiting for all running transactions to complete is > fine, and it's not the first place where we use it - CREATE SUBSCRIPTION > does pretty much exactly the same thing (and CREATE INDEX CONCURRENTLY > too, to some extent). So we have a precedent / working code we can copy. > > > Thinking again, I don't think it should be done as part of > BuildRelationList(). We should just do it once in the launcher before > starting, that'll be both easier and cleaner. Anything started after > that will have checksums on it, so we should be fine. > > PFA one that does this. > Seems fine to me. I'd however log waitforxid, not the oldest one. If you're a DBA and you want to make the checksumming to proceed, knowing the oldest running XID is useless for that. If we log waitforxid, it can be used to query pg_stat_activity and interrupt the sessions somehow. > > > And if you try this with a temporary table (not hidden in > transaction, > > so the bgworker can see it), the worker will fail with this: > > > > ERROR: cannot access temporary tables of other sessions > > > > But of course, this is just another way how to crash without > updating > > the result for the launcher, so checksums may end up being enabled > > anyway. > > > > > > Yeah, there will be plenty of side-effect issues from that > > crash-with-wrong-status case. Fixing that will at least make things > > safer -- in that checksums won't be enabled when not put on all pages. > > > > Sure, the outcome with checksums enabled incorrectly is a consequence of > bogus status, and fixing that will prevent that. But that wasn't my main > point here - not articulated very clearly, though. > > The bigger question is how to handle temporary tables gracefully, so > that it does not terminate the bgworker like this at all. This might be > even bigger issue than dropped relations, considering that temporary > tables are pretty common part of applications (and it also includes > CREATE/DROP). > > For some clusters it might mean the online checksum enabling would > crash+restart infinitely (well, until reaching MAX_ATTEMPTS). > > Unfortunately, try_relation_open() won't fix this, as the error comes > from ReadBufferExtended. And it's not a matter of simply creating a > ReadBuffer variant without that error check, because temporary tables > use local buffers. > > I wonder if we could just go and set the checksums anyway, ignoring the > local buffers. If the other session does some changes, it'll overwrite > our changes, this time with the correct checksums. But it seems pretty > dangerous (I mean, what if they're writing stuff while we're updating > the checksums? Considering the various short-cuts for temporary tables, > I suspect that would be a boon for race conditions.) > > Another option would be to do something similar to running transactions, > i.e. wait until all temporary tables (that we've seen at the beginning) > disappear. But we're starting to wait on more and more stuff. > > If we do this, we should clearly log which backends we're waiting for, > so that the admins can go and interrupt them manually. > > > > Yeah, waiting for all transactions at the beginning is pretty simple. > > Making the worker simply ignore temporary tables would also be easy. 
> > One of the bigger issues here is temporary tables are *session* scope > and not transaction, so we'd actually need the other session to finish, > not just the transaction. > > I guess what we could do is something like this: > > 1. Don't process temporary tables in the checksumworker, period. > Instead, build a list of any temporary tables that existed when the > worker started in this particular database (basically anything that we > got in our scan). Once we have processed the complete database, keep > re-scanning pg_class until those particular tables are gone (search by oid). > > That means that any temporary tables that are created *while* we are > processing a database are ignored, but they should already be receiving > checksums. > > It definitely leads to a potential issue with long running temp tables. > But as long as we look at the *actual tables* (by oid), we should be > able to handle long-running sessions once they have dropped their temp > tables. > > Does that sound workable to you? > Yes, that's pretty much what I meant by 'wait until all temporary tables disappear'. Again, we need to make it easy to determine which OIDs are we waiting for, which sessions may need DBA's attention. I don't think it makes sense to log OIDs of the temporary tables. There can be many of them, and in most cases the connection/sess
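As an illustration of the "log waitforxid" suggestion above, a DBA could then find and, if needed, interrupt the offending sessions with something like this (sketch only):

    SELECT pid, backend_xid, backend_xmin, state, query
      FROM pg_stat_activity
     WHERE backend_xid IS NOT NULL OR backend_xmin IS NOT NULL;

    -- and, as a last resort, interrupt a blocker:
    -- SELECT pg_terminate_backend(<pid>);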
Re: [HACKERS] Runtime Partition Pruning
David Rowley wrote: > To put the new patch to the test, I tried pgbench -S -M prepared -s > 100 with and without having modified pgbench_accounts to separate into > 10 RANGE partitions of equal size. > > A non-partitioned table was getting 12503 TPS. > With partitioned tables, the old version of this patch was getting: 5470 TPS. > With partitioned tables, the attached version gets 11247 TPS. > For perspective, today's master with a partitioned table gets 4719 TPS. > > So you can see it's a pretty good performance boost by skipping > initialisation of the 9 non-matching subplans. It's not hard to > imagine the gains getting more significant with a larger number of > partitions. This is excellent news! -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
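The query pattern exercised by pgbench -S -M prepared is essentially the following; the win comes from initializing only the subplan of the one partition that can match $1 (sketch):

    PREPARE account_lookup(int) AS
        SELECT abalance FROM pgbench_accounts WHERE aid = $1;
    EXECUTE account_lookup(12345);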
Re: [PROPOSAL] Shared Ispell dictionaries
On 03/31/2018 12:42 PM, Arthur Zakirov wrote: > Hello all, > > I'd like to add new optional function to text search template named fini > in addition to init() and lexize(). It will be called by > RemoveTSDictionaryById() and AlterTSDictionary(). dispell_fini() will > call ts_dict_shmem_release(). > > It doesn't change segments leaking situation. I think it makes text > search API more transparent. > If it doesn't actually solve the problem, why add it? I don't see a point in adding functions for the sake of transparency, when it does not in fact serve any use cases. Can't we handle the segment-leaking by adding some sort of tombstone? For example, imagine that instead of removing the hash table entry we mark it as 'dropped'. And after that, after the lookup we would know the dictionary was removed, and the backends would load the dictionary into their private memory. Of course, this could mean we end up having many tombstones in the hash table. But those tombstones would be tiny, making it less painful than potentially leaking much more memory for the dictionaries. Also, I wonder if we might actually remove the dictionaries after a while, e.g. based on XID. Imagine that we note the XID of the transaction removing the dictionary, or perhaps XID of the most recent running transaction. Then we could use this to decide if all running transactions actually see the DROP, and we could remove the tombstone. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
On 31 March 2018 at 21:24, Anthony Iliopoulos wrote: > On Fri, Mar 30, 2018 at 10:18:14AM +1300, Thomas Munro wrote: > > > >> Yeah, I see why you want to PANIC. > > > > > > Indeed. Even doing that leaves question marks about all the kernel > > > versions before v4.13, which at this point is pretty much everything > > > out there, not even detecting this reliably. This is messy. > > There may still be a way to reliably detect this on older kernel > versions from userspace, but it will be messy whatsoever. On EIO > errors, the kernel will not restore the dirty page flags, but it > will flip the error flags on the failed pages. One could mmap() > the file in question, obtain the PFNs (via /proc/pid/pagemap) > and enumerate those to match the ones with the error flag switched > on (via /proc/kpageflags). This could serve at least as a detection > mechanism, but one could also further use this info to logically > map the pages that failed IO back to the original file offsets, > and potentially retry IO just for those file ranges that cover > the failed pages. Just an idea, not tested. > That sounds like a huge amount of complexity, with uncertainty as to how it'll behave kernel-to-kernel, for negligble benefit. I was exploring the idea of doing selective recovery of one relfilenode, based on the assumption that we know the filenode related to the fd that failed to fsync(). We could redo only WAL on that relation. But it fails the same test: it's too complex for a niche case that shouldn't happen in the first place, so it'll probably have bugs, or grow bugs in bitrot over time. Remember, if you're on ext4 with errors=remount-ro, you get shut down even harder than a PANIC. So we should just use the big hammer here. I'll send a patch this week. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
Craig Ringer writes: > So we should just use the big hammer here. And bitch, loudly and publicly, about how broken this kernel behavior is. If we make enough of a stink maybe it'll get fixed. regards, tom lane
Re: Feature Request - DDL deployment with logical replication
On 2018-03-31 22:13:42 +0800, Craig Ringer wrote: > We'll still need a mechanism to transport them to downstreams (like WAL > messages) and to send responses upstream. For responses I think we will > finally want to add a backchannel to the logical replication protocol as > I've wanted for a long while: downstream can send a COPY message on COPY > BOTH proto back to upstream, which passes it to a callback on the output > plugin for the output plugin to act on. Not necessarily? You can just send out the prepare, wait for all clients to ack it, and then commit/rollback prepared. Greetings, Andres Freund
Re: Foreign keys and partitioned tables
On 3/29/18 23:19, Alvaro Herrera wrote: > 0003 is the matter of interest. This is essentially the same code I > posted earlier, rebased to the committed row triggers patch, with a few > minor bug fixes and some changes in the regression tests to try and make > them more comprehensive, including leaving a partitioned table with an > FK to test that the whole pg_dump thing works via the pg_upgrade test. I've only read the tests so far. The functionality appears to work correctly. It's a bit strange how the tests are split up between alter_table.sql and foreign_key.sql, especially since the latter also contains ALTER TABLE checks and vice versa. Some tests are a bit redundant, e.g., this in alter_table.sql: +-- these fail: +INSERT INTO at_partitioned VALUES (1000, 42); +ERROR: insert or update on table "at_partitioned_0" violates foreign key constraint "at_partitioned_reg1_col1_fkey" +DETAIL: Key (reg1_col1)=(42) is not present in table "at_regular1". and +INSERT INTO at_partitioned VALUES (5000, 42); +ERROR: insert or update on table "at_partitioned_0" violates foreign key constraint "at_partitioned_reg1_col1_fkey" +DETAIL: Key (reg1_col1)=(42) is not present in table "at_regular1". There are no documentation changes. The foreign key section in CREATE TABLE does not contain anything about partitioned tables, which is probably an existing omission, but it might be good to fix this up now. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
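A condensed version of what these tests exercise, with the patch applied (table and column names are taken from the test output above; the rest is a sketch):

    CREATE TABLE at_regular1 (col1 int PRIMARY KEY);
    CREATE TABLE at_partitioned (id int, reg1_col1 int) PARTITION BY RANGE (id);
    ALTER TABLE at_partitioned
        ADD FOREIGN KEY (reg1_col1) REFERENCES at_regular1 (col1);
    CREATE TABLE at_partitioned_0 PARTITION OF at_partitioned
        FOR VALUES FROM (0) TO (10000);
    INSERT INTO at_partitioned VALUES (1000, 42);  -- rejected: 42 is not in at_regular1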
Re: [HACKERS][PATCH] adding simple sock check for windows
CharSyam writes: > [ simple_check.patch ] This is a good catch. However, it looks to me like the reason nobody has noticed a problem here is that actually, this error test is useless; we can never get here with a bad connection, because connectDatabase would have failed. I'm inclined to think we should just delete the whole if-statement, instead. BTW, I tried cross-checking for other errors of this ilk by doing diff --git a/src/include/port.h b/src/include/port.h index a514ab758b..9ba6a0df79 100644 --- a/src/include/port.h +++ b/src/include/port.h @@ -28,9 +28,9 @@ /* socket has a different definition on WIN32 */ #ifndef WIN32 -typedef int pgsocket; +typedef unsigned int pgsocket; -#define PGINVALID_SOCKET (-1) +#define PGINVALID_SOCKET ((pgsocket) -1) #else typedef SOCKET pgsocket; and then compiling with a compiler that would warn about "unsigned int < 0" tests (I used Apple's current compiler, but probably any recent clang would do as well). It found this case and no others, which is good. This is not a 100% cross-check, because I don't think it would notice "unsigned int == -1" which is another way somebody might misspell the test, and it definitely would not catch cases where somebody stored a socket identifier in plain int instead of pgsocket. But it's better than nothing... regards, tom lane
Re: [HACKERS][PATCH] adding simple sock check for windows
Hi, Tom, Thank you for your review. So, do you think it is better to remove the if statement like below? diff --git src/bin/scripts/vacuumdb.c src/bin/scripts/vacuumdb.c index 887fa48fbd..243d842d06 100644 --- src/bin/scripts/vacuumdb.c +++ src/bin/scripts/vacuumdb.c @@ -947,13 +947,6 @@ init_slot(ParallelSlot *slot, PGconn *conn, const char *progname) slot->connection = conn; slot->isFree = true; slot->sock = PQsocket(conn); - -if (slot->sock < 0) -{ -fprintf(stderr, _("%s: invalid socket: %s"), progname, -PQerrorMessage(conn)); -exit(1); -} } static void 2018-04-01 2:16 GMT+09:00 Tom Lane : > CharSyam writes: >> [ simple_check.patch ] > > This is a good catch. However, it looks to me like the reason nobody > has noticed a problem here is that actually, this error test is useless; > we can never get here with a bad connection, because connectDatabase > would have failed. I'm inclined to think we should just delete the whole > if-statement, instead. > > BTW, I tried cross-checking for other errors of this ilk by doing > > diff --git a/src/include/port.h b/src/include/port.h > index a514ab758b..9ba6a0df79 100644 > --- a/src/include/port.h > +++ b/src/include/port.h > @@ -28,9 +28,9 @@ > > /* socket has a different definition on WIN32 */ > #ifndef WIN32 > -typedef int pgsocket; > +typedef unsigned int pgsocket; > > -#define PGINVALID_SOCKET (-1) > +#define PGINVALID_SOCKET ((pgsocket) -1) > #else > typedef SOCKET pgsocket; > > and then compiling with a compiler that would warn about "unsigned int < 0" > tests (I used Apple's current compiler, but probably any recent clang > would do as well). It found this case and no others, which is good. > This is not a 100% cross-check, because I don't think it would notice > "unsigned int == -1" which is another way somebody might misspell the > test, and it definitely would not catch cases where somebody stored a > socket identifier in plain int instead of pgsocket. But it's better > than nothing... > > regards, tom lane remove_if_stat.patch Description: Binary data
some last patches breaks plan cache
Hi CREATE OR REPLACE PROCEDURE public.proc(a integer, INOUT b integer, c integer) LANGUAGE plpgsql AS $procedure$ begin b := a + c; end; $procedure$ CREATE OR REPLACE PROCEDURE public.testproc() LANGUAGE plpgsql AS $procedure$ declare r int; begin call proc(10, r, 20); end; $procedure$ postgres=# call testproc(); CALL postgres=# call testproc(); ERROR: SPI_execute_plan_with_paramlist failed executing query "CALL proc(10, r, 20)": SPI_ERROR_ARGUMENT CONTEXT: PL/pgSQL function testproc() line 4 at CALL postgres=# second call fails Regards Pavel
Re: some last patches breaks plan cache
On 03/31/2018 07:38 PM, Pavel Stehule wrote: > Hi > > CREATE OR REPLACE PROCEDURE public.proc(a integer, INOUT b integer, c > integer) > LANGUAGE plpgsql > AS $procedure$ > begin > b := a + c; > end; > $procedure$ > > CREATE OR REPLACE PROCEDURE public.testproc() > LANGUAGE plpgsql > AS $procedure$ > declare r int; > begin > call proc(10, r, 20); > end; > $procedure$ > > postgres=# call testproc(); > CALL > postgres=# call testproc(); > ERROR: SPI_execute_plan_with_paramlist failed executing query "CALL > proc(10, r, 20)": SPI_ERROR_ARGUMENT > CONTEXT: PL/pgSQL function testproc() line 4 at CALL > postgres=# > > second call fails Yeah. d92bc83c48bdea9888e64cf1e2edbac9693099c9 seems to have broken this :-/ regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS][PATCH] adding simple sock check for windows
... Oh, just to make things even more fun, PQsocket() returns int, not pgsocket; see its header comment. Therefore, that test is correctly coded as-is (though it's still useless), and the real problem is that ParallelSlot.sock ought to be declared int not pgsocket. If you look around at our other places that read PQsocket() and use its result in select() masks, they uniformly use int variables, and I think they're right. Actually, the more I look at this code, the more problems I'm finding. The wait-for-a-free-connection code is just broken on its face, because it supposes it can retry select() waits without reinitializing the FD_SET. The loop in vacuum_one_database() uses "conn" to call prepare_vacuum_command() without any thought as to whether that is a free connection --- which, typically, it isn't. It looks to me like libpq silently copes with this silliness, just waiting for the active command to finish and then doing what it's told, but the net result is that supposedly-concurrent vacuums get serialized. regards, tom lane
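To make the FD_SET point concrete, a minimal sketch of the pattern being described (illustrative names only, not vacuumdb's actual code): select() overwrites the fd_set it is given, so the set has to be rebuilt before every retry.

#include <errno.h>
#include <sys/select.h>

/*
 * Wait until one of the given sockets becomes readable and return its
 * index, or -1 on error.
 */
static int
wait_for_any_socket(const int *socks, int nsocks)
{
	for (;;)
	{
		fd_set		readfds;
		int			maxfd = -1;
		int			i;

		/* select() modifies the set, so rebuild it on every iteration */
		FD_ZERO(&readfds);
		for (i = 0; i < nsocks; i++)
		{
			FD_SET(socks[i], &readfds);
			if (socks[i] > maxfd)
				maxfd = socks[i];
		}

		if (select(maxfd + 1, &readfds, NULL, NULL, NULL) < 0)
		{
			if (errno == EINTR)
				continue;		/* interrupted: rebuild and retry */
			return -1;
		}

		for (i = 0; i < nsocks; i++)
		{
			if (FD_ISSET(socks[i], &readfds))
				return i;
		}
	}
}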
Re: [HACKERS] A design for amcheck heapam verification
On Fri, Mar 30, 2018 at 7:04 PM, Andres Freund wrote: > I'm just saying that there should be two functions here, rather than dropping > the old definition, and creating a new one with a default argument. So you're asking for something like bt_index_check_heap() + bt_index_parent_check_heap()? Or, are you talking about function overloading? -- Peter Geoghegan
Re: some last patches breaks plan cache
On 03/31/2018 07:56 PM, Tomas Vondra wrote: > On 03/31/2018 07:38 PM, Pavel Stehule wrote: >> Hi >> >> CREATE OR REPLACE PROCEDURE public.proc(a integer, INOUT b integer, c >> integer) >> LANGUAGE plpgsql >> AS $procedure$ >> begin >> b := a + c; >> end; >> $procedure$ >> >> CREATE OR REPLACE PROCEDURE public.testproc() >> LANGUAGE plpgsql >> AS $procedure$ >> declare r int; >> begin >> call proc(10, r, 20); >> end; >> $procedure$ >> >> postgres=# call testproc(); >> CALL >> postgres=# call testproc(); >> ERROR: SPI_execute_plan_with_paramlist failed executing query "CALL >> proc(10, r, 20)": SPI_ERROR_ARGUMENT >> CONTEXT: PL/pgSQL function testproc() line 4 at CALL >> postgres=# >> >> second call fails > > Yeah. > > d92bc83c48bdea9888e64cf1e2edbac9693099c9 seems to have broken this :-/ > FWIW it seems the issue is somewhere in exec_stmt_call, which does this: /* * Don't save the plan if not in atomic context. Otherwise, * transaction ends would cause warnings about plan leaks. */ exec_prepare_plan(estate, expr, 0, estate->atomic); When executed outside a transaction, CALL has estate->atomic=false, and so calls exec_prepare_plan() with keepplan=false. And on the second call it gets a bogus Plan, of course (with the usual 0x7f7f7f7f7f7f7f7f patterns). When in a transaction, it sets keepplan=true, and everything works fine. So either estate->atomic is not sufficient on its own, or we need to reset the expr->plan somewhere. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] [PATCH] Incremental sort
Hi, I've been doing a bit more review of the patch today, focusing on the planner part, and I'm starting to have some doubts regarding the way incremental sort paths are created. I do have some questions about the executor and other parts too. I'll mark this as 'waiting on author' to make it clear the patch is still being discussed; RFC is not an appropriate status for that. Attached is a patch that highlights some of the interesting places, and also suggests some minor changes to comments and other tweaks. 1) planning/costing of incremental vs. non-incremental sorts In short, all the various places that create/cost sorts: * createplan.c (make_sort) * planner.c (create_sort_path) * pathnode.c (cost_sort) seem to prefer incremental sorts whenever available. Consider for example this code from create_merge_append_plan(): if (!pathkeys_common_contained_in(pathkeys, subpath->pathkeys, &n_common_pathkeys)) { Sort *sort = make_sort(subplan, numsortkeys, n_common_pathkeys, sortColIdx, sortOperators, collations, nullsFirst); label_sort_with_costsize(root, sort, best_path->limit_tuples); subplan = (Plan *) sort; } This essentially says that when (n_common_pathkeys > 0), the sort is going to be incremental. That however seems to rely on an important assumption - when the input is presorted, the incremental sort is expected to be cheaper than regular sort. This assumption however seems to be proven invalid by cost_sort, which does the common part for both sort modes (incremental/non-incremental) first, and then does this: /* Extra costs of incremental sort */ if (presorted_keys > 0) { ... add something to the costs ... } That is, the incremental cost seems to be pretty much guaranteed to be more expensive than regular Sort (with the exception of LIMIT queries, where it's guaranteed to win thanks to lower startup cost). I don't know how significant the cost difference may be (perhaps not much), or if it may lead to inefficient plans. For example what if the cheapest total path is partially sorted by chance, and only has a single prefix group? Then all the comparisons with pivotSlot are unnecessary. But I'm pretty sure it may lead to surprising behavior - for example if you disable incremental sorts (enable_incrementalsort=off), the plan will switch to plain sort without the additional costs. So you'll get a cheaper plan by disabling some operation. That's surprising. So I think it would be more appropriate if those places actually did a costing of incremental vs. non-incremental sorts, and then constructed the cheaper option. Essentially we should consider both plain and incremental sort for each partially sorted input path, and then pick the right one. Of course, this is going to be tricky in createplan.c which builds the plans directly - in that case it might be integrated into make_sort() or something like that. Also, I wonder if we could run into problems, due to incremental sort not supporting things the regular sort does (rewind, backwards scans and mark/restore). 2) nodeIncrementalsort.c There's a couple of obsolete comments that came from nodeSort.c and did not get tweaked (and so talk about first-time through when incremental sort needs to do that for each group, etc.). The attached diff tweaks those, and clarifies a couple of others. I've also added some comments explaining what the pivotSlot is about etc. There's also a couple of XXX comments asking additional questions/clarifications. I'm wondering if a static MIN_GROUP_SIZE is a good idea. 
For example, what if the subplan is expected to return only very few tuples (say, 33), but the query includes LIMIT 1? Now, let's assume the startup/total cost of the subplan is 1 and 100. With MIN_GROUP_SIZE 32 we're bound to execute it pretty much till the end, while we could terminate after the first tuple (if the prefix changes). So I think we should use a Min(limit,MIN_GROUP_SIZE) here, and perhaps this should depend on average group size too. The other questionable thing seems to be this claim: * We unlikely will have huge groups with incremental sort. Therefore * usage of abbreviated keys would be likely a waste of time. followed by disabling abbreviated keys in the tuplesort_begin_heap call. I find this rather dubious and unsupported by any arguments (I certainly don't see any in the comments). It would be more acceptable if the estimated number of groups was used when deciding whether to use incremental sort or not, but that's not the case - as explained in the first part, we simply prefer incremental sorts whenever there is a prefix. In those cases we have very little idea (or even guarantees) regarding the average group size. Furt
Re: [HACKERS] [PATCH] Incremental sort
On 03/31/2018 10:43 PM, Tomas Vondra wrote: > ... > But I'm pretty sure it may lead to surprising behavior - for example if > you disable incremental sorts (enable_incrementalsort=off), the plan > will switch to plain sort without the additional costs. So you'll get a > cheaper plan by disabling some operation. That's surprising. > To illustrate this is a valid issue, consider this trivial example: create table t (a int, b int, c int); insert into t select 10*random(), 10*random(), 10*random() from generate_series(1,100) s(i); analyze t; explain select * from (select * from t order by a,b) foo order by a,b,c; QUERY PLAN Incremental Sort (cost=133100.48..264139.27 rows=100 width=12) Sort Key: t.a, t.b, t.c Presorted Key: t.a, t.b -> Sort (cost=132154.34..134654.34 rows=100 width=12) Sort Key: t.a, t.b -> Seq Scan on t (cost=0.00..15406.00 rows=100 width=12) (6 rows) set enable_incrementalsort = off; explain select * from (select * from t order by a,b) foo order by a,b,c; QUERY PLAN Sort (cost=261402.69..263902.69 rows=100 width=12) Sort Key: t.a, t.b, t.c -> Sort (cost=132154.34..134654.34 rows=100 width=12) Sort Key: t.a, t.b -> Seq Scan on t (cost=0.00..15406.00 rows=100 width=12) (5 rows) So the cost with incremental sort was 264139, and after disabling the incremental cost it dropped to 263902. Granted, the difference is negligible in this case, but it's still surprising. Also, it can be made much more significant by reducing the number of prefix groups in the data: truncate t; insert into t select 1,1,1 from generate_series(1,100) s(i); analyze t; set enable_incrementalsort = on; explain select * from (select * from t order by a,b) foo order by a,b,c; QUERY PLAN Incremental Sort (cost=324165.83..341665.85 rows=100 width=12) Sort Key: t.a, t.b, t.c Presorted Key: t.a, t.b -> Sort (cost=132154.34..134654.34 rows=100 width=12) Sort Key: t.a, t.b -> Seq Scan on t (cost=0.00..15406.00 rows=100 width=12) (6 rows) So that's 263902 vs. 341665, yet we still prefer the incremental mode. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [PROPOSAL] Shared Ispell dictionaries
Tomas Vondra wrote: > > On 03/31/2018 12:42 PM, Arthur Zakirov wrote: > > Hello all, > > > > I'd like to add new optional function to text search template named fini > > in addition to init() and lexize(). It will be called by > > RemoveTSDictionaryById() and AlterTSDictionary(). dispell_fini() will > > call ts_dict_shmem_release(). > > > > It doesn't change segments leaking situation. I think it makes text > > search API more transparent. > > > > If it doesn't actually solve the problem, why add it? I don't see a > point in adding functions for the sake of transparency, when it does not > in fact serve any use cases. It doesn't solve the problem. But it brings more clarity: if a dictionary requested a shared location then it should release/unpin it. There is no such scenario yet, but someone might want to release not only the shared segment but also other private data. Can't we handle the segment-leaking by adding some sort of tombstone? It is interesting that there are such tombstones already, without the patch. TSDictionaryCacheEntry entries aren't deleted after DROP, they are just marked isvalid = false. > For example, imagine that instead of removing the hash table entry we > mark it as 'dropped'. And after that, after the lookup we would know the > dictionary was removed, and the backends would load the dictionary into > their private memory. > > Of course, this could mean we end up having many tombstones in the hash > table. But those tombstones would be tiny, making it less painful than > potentially leaking much more memory for the dictionaries. Now, actually, it isn't guaranteed that the hash table entry will be removed, even if refcnt is 0. So I think I should remove refcnt and entries won't be removed. There are no big problems with leaking now. Memory may leak only if a dictionary was dropped or altered and there is no text search workload anymore and the backend is still alive. This is because the next use of text search functions will unpin segments used before for invalid dictionaries (isvalid == false). Also the segment is unpinned if the backend terminates. The segment is destroyed when all interested processes unpin the segment (as Tom noticed), and the hash table entry becomes a tombstone. I hope I described it clearly. > Also, I wonder if we might actually remove the dictionaries after a > while, e.g. based on XID. Imagine that we note the XID of the > transaction removing the dictionary, or perhaps XID of the most recent > running transaction. Then we could use this to decide if all running > transactions actually see the DROP, and we could remove the tombstone. Maybe autovacuum should work here too :) It is a joke of course. I'm not very familiar with the removal of dead tuples, but I think it is a similar case here. -- Arthur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
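For concreteness, one hypothetical shape of such a shared hash table entry with a tombstone flag (struct and field names are invented here, not taken from the patch):

#include "postgres.h"
#include "storage/dsm.h"

/* Hypothetical shared hash table entry for a compiled dictionary. */
typedef struct TsearchDictEntry
{
	Oid			dict_oid;		/* hash key: the dictionary's OID */
	dsm_handle	segment;		/* segment holding the compiled dictionary */
	bool		dropped;		/* tombstone: DROP/ALTER was seen, so backends
								 * must not reuse the segment and instead load
								 * the dictionary into private memory */
} TsearchDictEntry;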
Re: [HACKERS] A design for amcheck heapam verification
On 2018-03-31 11:27:14 -0700, Peter Geoghegan wrote: > On Fri, Mar 30, 2018 at 7:04 PM, Andres Freund wrote: > > I'm just saying that there should be two functions here, rather than > > dropping the old definition, and creating a new one with a default argument. > > So you're asking for something like bt_index_check_heap() + > bt_index_parent_check_heap()? Or, are you talking about function > overloading? The latter. That addresses my concerns about dropping the function and causing issues due to dependencies. Greetings, Andres Freund
Re: [HACKERS] A design for amcheck heapam verification
On Sat, Mar 31, 2018 at 2:56 PM, Andres Freund wrote: >> So you're asking for something like bt_index_check_heap() + >> bt_index_parent_check_heap()? Or, are you talking about function >> overloading? > > The latter. That addresses my concerns about dropping the function and > causing issues due to dependencies. WFM. I have all the information I need to produce the next revision now. -- Peter Geoghegan
Re: [HACKERS] A design for amcheck heapam verification
On Sat, Mar 31, 2018 at 2:59 PM, Peter Geoghegan wrote: > WFM. I have all the information I need to produce the next revision now. I might as well post this one first. I'll have 0002 for you in a short while. -- Peter Geoghegan v8-0001-Add-Bloom-filter-data-structure-implementation.patch Description: Binary data
Re: Foreign keys and partitioned tables
Peter Eisentraut wrote: > On 3/29/18 23:19, Alvaro Herrera wrote: > > 0003 is the matter of interest. This is essentially the same code I > > posted earlier, rebased to the committed row triggers patch, with a few > > minor bug fixes and some changes in the regression tests to try and make > > them more comprehensive, including leaving a partitioned table with an > > FK to test that the whole pg_dump thing works via the pg_upgrade test. > > I've only read the tests so far. The functionality appears to work > correctly. It's a bit strange how the tests are split up between > alter_table.sql and foreign_key.sql, especially since the latter also > contains ALTER TABLE checks and vice versa. Yeah, I started by putting what I thought was going to be just ALTER TABLE in that test, then moved to the other file and added what I thought were more complete tests there and failed to move stuff to alter_table. Honestly, I think these should mostly all belong in foreign_key, but of course the line is pretty blurry as to what to put in which file. > Some tests are a bit redundant, e.g., this in alter_table.sql: > > +-- these fail: > +INSERT INTO at_partitioned VALUES (1000, 42); > +ERROR: insert or update on table "at_partitioned_0" violates foreign > key constraint "at_partitioned_reg1_col1_fkey" > +DETAIL: Key (reg1_col1)=(42) is not present in table "at_regular1". > > and > > +INSERT INTO at_partitioned VALUES (5000, 42); > +ERROR: insert or update on table "at_partitioned_0" violates foreign > key constraint "at_partitioned_reg1_col1_fkey" > +DETAIL: Key (reg1_col1)=(42) is not present in table "at_regular1". Oh, right. I had some of these to support the case of a FK pointing to a partitioned PK, but then deleted the other partitioned table that this referred to, so the test looks kinda silly without the stuff that was previously interspersed there. I think I'll remove everything from alter_table and just add what's missing to foreign_key. > There are no documentation changes. The foreign key section in CREATE > TABLE does not contain anything about partitioned tables, which is > probably an existing omission, but it might be good to fix this up now. Good catch. I propose this in the PARTITIONED BY section: - Partitioned tables do not support EXCLUDE or - FOREIGN KEY constraints; however, you can define - these constraints on individual partitions. + Partitioned tables do not support EXCLUDE constraints; + however, you can define these constraints on individual partitions. + Also, while it's possible to define PRIMARY KEY + constraints on partitioned tables, it is not supported to create foreign + keys that reference them. This restriction will be lifted in a + future release. I propose this under the REFERENCES clause: These clauses specify a foreign key constraint, which requires that a group of one or more columns of the new table must only contain values that match values in the referenced column(s) of some row of the referenced table. If the refcolumn list is omitted, the primary key of the reftable is used. The referenced columns must be the columns of a non-deferrable unique or primary key constraint in the referenced table. The user must have REFERENCES permission on the referenced table (either the whole table, or the specific referenced columns). Note that foreign key constraints cannot be defined between temporary - tables and permanent tables. 
Also note that while it is permitted to + define a foreign key on a partitioned table, declaring a foreign key + that references a partitioned table is not allowed. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
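To illustrate the behavior the proposed wording describes (table names invented; this assumes the patch under discussion is applied): a foreign key defined on a partitioned table is accepted, while a foreign key that references a partitioned table is still rejected.

CREATE TABLE at_ref (a integer PRIMARY KEY);

-- allowed with the patch: the partitioned table is the referencing side
CREATE TABLE at_part (a integer REFERENCES at_ref) PARTITION BY LIST (a);

-- still not allowed: the partitioned table is the referenced side
CREATE TABLE at_pk_part (a integer PRIMARY KEY) PARTITION BY LIST (a);
CREATE TABLE at_fk (a integer REFERENCES at_pk_part);  -- fails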
Re: some last patches breaks plan cache
On 03/31/2018 08:28 PM, Tomas Vondra wrote: > > > On 03/31/2018 07:56 PM, Tomas Vondra wrote: >> On 03/31/2018 07:38 PM, Pavel Stehule wrote: >>> Hi >>> >>> CREATE OR REPLACE PROCEDURE public.proc(a integer, INOUT b integer, c >>> integer) >>> LANGUAGE plpgsql >>> AS $procedure$ >>> begin >>> b := a + c; >>> end; >>> $procedure$ >>> >>> CREATE OR REPLACE PROCEDURE public.testproc() >>> LANGUAGE plpgsql >>> AS $procedure$ >>> declare r int; >>> begin >>> call proc(10, r, 20); >>> end; >>> $procedure$ >>> >>> postgres=# call testproc(); >>> CALL >>> postgres=# call testproc(); >>> ERROR: SPI_execute_plan_with_paramlist failed executing query "CALL >>> proc(10, r, 20)": SPI_ERROR_ARGUMENT >>> CONTEXT: PL/pgSQL function testproc() line 4 at CALL >>> postgres=# >>> >>> second call fails >> >> Yeah. >> >> d92bc83c48bdea9888e64cf1e2edbac9693099c9 seems to have broken this :-/ >> > > FWIW it seems the issue is somewhere in exec_stmt_call, which does this: > > /* > * Don't save the plan if not in atomic context. Otherwise, > * transaction ends would cause warnings about plan leaks. > */ > exec_prepare_plan(estate, expr, 0, estate->atomic); > > When executed outside transaction, CALL has estate->atomic=false, and so > calls exec_prepare_plan() with keepplan=false. And on the second call it > gets bogus Plan, of course (with the usual 0x7f7f7f7f7f7f7f7f patterns). > > When in a transaction, it sets keepplan=true, and everything works fine. > > So either estate->atomic is not sufficient on it's own, or we need to > reset the expr->plan somewhere. > The attached patch fixes this, but I'm not really sure it's the right fix - I'd expect there to be a more principled way, doing resetting the plan pointer when 'plan->saved == false'. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c index fc0f0f0..ae8bed0 100644 --- a/src/pl/plpgsql/src/pl_exec.c +++ b/src/pl/plpgsql/src/pl_exec.c @@ -2193,6 +2193,9 @@ exec_stmt_call(PLpgSQL_execstate *estate, PLpgSQL_stmt_call *stmt) exec_eval_cleanup(estate); SPI_freetuptable(SPI_tuptable); + if (!expr->plan->saved) + expr->plan = NULL; + return PLPGSQL_RC_OK; }
Re: [HACKERS] A design for amcheck heapam verification
On Sat, Mar 31, 2018 at 3:15 PM, Peter Geoghegan wrote: >> WFM. I have all the information I need to produce the next revision now. > > I might as well post this one first. I'll have 0002 for you in a short while. Attached is 0002 -- the amcheck enhancement itself. As requested by Andres, this adds a new overloaded set of functions, rather than dropping and recreating functions to change their signature. I'm pretty sure that avoiding issues with dependencies by using this approach is unprecedented, so I had to use my own judgement on how to deal with a couple of things. I decided not to create a new C symbol for the new function versions, preferring to leave it to the existing PG_NARGS() tests. I guess this was probably what you intended I should do, based on your "Given the PG_NARGS() checks..." remark. I also chose to not document the single argument functions in the docs. I suppose that we should consider these to be implementation details of a work-around for dependency breakage, something that doesn't need to be documented. That's a bit like how we don't document functions within certain extensions that are designed just to get called within a view definition. I don't feel strongly about it, though. No other changes to report. I did mention that this would have a few small changes yesterday; no need to repeat the details now. Thanks -- Peter Geoghegan From 9e2c1469013374117209c9c0c00e12ecf10ab7b9 Mon Sep 17 00:00:00 2001 From: Peter Geoghegan Date: Tue, 2 May 2017 00:19:24 -0700 Subject: [PATCH v8 2/2] Add amcheck verification of heap relations. Add a new, optional capability to bt_index_check() and bt_index_parent_check(): check that each heap tuple that should have an index entry does in fact have one. The extra checking is performed at the end of the existing nbtree checks. This is implemented by using a Bloom filter data structure. The implementation performs set membership tests within a callback (the same type of callback that each index AM registers for CREATE INDEX). The Bloom filter is populated during the initial index verification scan. Reusing the CREATE INDEX infrastructure allows the new verification option to automatically benefit from the heap consistency checks that CREATE INDEX already performs. CREATE INDEX does thorough sanity checking of HOT chains, so the new check actually manages to detect problems in heap-only tuples. 
Author: Peter Geoghegan Reviewed-By: Pavan Deolasee, Andres Freund Discussion: https://postgr.es/m/CAH2-Wzm5VmG7cu1N-H=nnS57wZThoSDQU+F5dewx3o84M+jY=g...@mail.gmail.com --- contrib/amcheck/Makefile | 2 +- contrib/amcheck/amcheck--1.0--1.1.sql| 29 +++ contrib/amcheck/amcheck.control | 2 +- contrib/amcheck/expected/check_btree.out | 12 +- contrib/amcheck/sql/check_btree.sql | 7 +- contrib/amcheck/verify_nbtree.c | 343 --- doc/src/sgml/amcheck.sgml| 126 +--- 7 files changed, 458 insertions(+), 63 deletions(-) create mode 100644 contrib/amcheck/amcheck--1.0--1.1.sql diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile index 43bed91..c5764b5 100644 --- a/contrib/amcheck/Makefile +++ b/contrib/amcheck/Makefile @@ -4,7 +4,7 @@ MODULE_big = amcheck OBJS = verify_nbtree.o $(WIN32RES) EXTENSION = amcheck -DATA = amcheck--1.0.sql +DATA = amcheck--1.0--1.1.sql amcheck--1.0.sql PGFILEDESC = "amcheck - function for verifying relation integrity" REGRESS = check check_btree diff --git a/contrib/amcheck/amcheck--1.0--1.1.sql b/contrib/amcheck/amcheck--1.0--1.1.sql new file mode 100644 index 000..088416e --- /dev/null +++ b/contrib/amcheck/amcheck--1.0--1.1.sql @@ -0,0 +1,29 @@ +/* contrib/amcheck/amcheck--1.0--1.1.sql */ + +-- complain if script is sourced in psql, rather than via CREATE EXTENSION +\echo Use "ALTER EXTENSION amcheck UPDATE TO '1.1'" to load this file. \quit + +-- In order to avoid issues with dependencies when updating amcheck to 1.1, +-- create new, overloaded versions of the 1.0 functions + +-- +-- bt_index_check() +-- +CREATE FUNCTION bt_index_check(index regclass, +heapallindexed boolean) +RETURNS VOID +AS 'MODULE_PATHNAME', 'bt_index_check' +LANGUAGE C STRICT PARALLEL RESTRICTED; + +-- +-- bt_index_parent_check() +-- +CREATE FUNCTION bt_index_parent_check(index regclass, +heapallindexed boolean) +RETURNS VOID +AS 'MODULE_PATHNAME', 'bt_index_parent_check' +LANGUAGE C STRICT PARALLEL RESTRICTED; + +-- Don't want these to be available to public +REVOKE ALL ON FUNCTION bt_index_check(regclass, boolean) FROM PUBLIC; +REVOKE ALL ON FUNCTION bt_index_parent_check(regclass, boolean) FROM PUBLIC; diff --git a/contrib/amcheck/amcheck.control b/contrib/amcheck/amcheck.control index 05e2861..4690484 100644 --- a/contrib/amcheck/amcheck.control +++ b/contrib/amcheck/amcheck.control @@ -1,5 +1,5 @@ # amcheck extension comment = 'functions for verifying relation integrity' -default_version = '1.0' +default_vers
Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS
On Sat, Mar 31, 2018 at 12:38:12PM -0400, Tom Lane wrote: > Craig Ringer writes: >> So we should just use the big hammer here. > > And bitch, loudly and publicly, about how broken this kernel behavior is. > If we make enough of a stink maybe it'll get fixed. That won't fix anything released already, so, as per the information gathered, something has to be done anyway. The discussion in this thread is spreading quite a lot, actually. Handling things at a low level looks like a better plan for the backend. Tools like pg_basebackup and pg_dump also issue fsyncs on the data they create, so we should do the equivalent for them, with some exit() calls in file_utils.c. As of now, failures are logged to stderr but not considered fatal. -- Michael signature.asc Description: PGP signature
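As a sketch of what making those frontend fsync failures fatal could look like (the helper name and message wording are invented, not the actual file_utils.c change):

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical frontend helper: flush a file descriptor and treat any
 * failure as fatal instead of merely printing a warning.
 */
static void
fsync_or_die(int fd, const char *fname, const char *progname)
{
	if (fsync(fd) != 0)
	{
		fprintf(stderr, "%s: could not fsync file \"%s\": %s\n",
				progname, fname, strerror(errno));
		exit(EXIT_FAILURE);
	}
}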
Re: WIP: Covering + unique indexes.
On Fri, Mar 30, 2018 at 6:24 AM, Alexander Korotkov wrote: >> With an extreme enough case, this could result in a failure to find a >> split point. Or at least, if that isn't true then it's not clear why, >> and I think it needs to be explained. > > > I don't think this could result in a failure to find a split point. > So, it finds a split point without taking into account that the hikey > will be shorter. If such a split point exists then a split point with > a truncated hikey should also exist. If not, then it would be > a failure even without covering indexes. I've updated the comment > accordingly. You're right. We're going to truncate the unneeded trailing attributes from whatever tuple is to the immediate right of the final split point that we choose (that's the tuple that we'll copy to make a new high key for the left page). Truncation already has to result in a tuple that is less than or equal to the original tuple. I also agree that it isn't worth trying harder to make sure that space is distributed evenly when truncation will go ahead. It will only matter in very rare cases, but the computational overhead of having an accurate high key size for every candidate split point would be significant. -- Peter Geoghegan
Re: [HACKERS] A design for amcheck heapam verification
On Sat, Mar 31, 2018 at 3:15 PM, Peter Geoghegan wrote: > On Sat, Mar 31, 2018 at 2:59 PM, Peter Geoghegan wrote: >> WFM. I have all the information I need to produce the next revision now. > > I might as well post this one first. I'll have 0002 for you in a short while. Looks like thrips doesn't like this, though other Windows buildfarm animals are okay with it. round() is from C99, apparently. I'll investigate a fix. -- Peter Geoghegan
Re: [HACKERS] A design for amcheck heapam verification
On 2018-03-31 19:43:45 -0700, Peter Geoghegan wrote: > On Sat, Mar 31, 2018 at 3:15 PM, Peter Geoghegan wrote: > > On Sat, Mar 31, 2018 at 2:59 PM, Peter Geoghegan wrote: > >> WFM. I have all the information I need to produce the next revision now. > > > > I might as well post this one first. I'll have 0002 for you in a short > > while. > > Looks like thrips doesn't like this, though other Windows buildfarm > animals are okay with it. > > round() is from C99, apparently. I'll investigate a fix. Just replacing it with a floor(val + 0.5) ought to do the trick, right? Greetings, Andres Freund
Re: [HACKERS] A design for amcheck heapam verification
On Sat, Mar 31, 2018 at 8:08 PM, Andres Freund wrote: >> round() is from C99, apparently. I'll investigate a fix. > > Just replacing it with a floor(val + 0.5) ought to do the trick, right? I was thinking of using rint(), which is what you get if you call round(float8) from SQL. -- Peter Geoghegan
Re: [HACKERS] A design for amcheck heapam verification
On Sat, Mar 31, 2018 at 8:09 PM, Peter Geoghegan wrote: > I was thinking of using rint(), which is what you get if you call > round(float8) from SQL. Attached patch does it that way. Note that there are float/int cast regression tests that ensure that rint() behaves consistently on supported platforms -- see commit 06bf0dd6. -- Peter Geoghegan From ff0bd32d33ceb6b5650e28d76ee794961862a4fc Mon Sep 17 00:00:00 2001 From: Peter Geoghegan Date: Sat, 31 Mar 2018 19:54:48 -0700 Subject: [PATCH] Fix non-portable call to round(). round() is from C99. Apparently, not all supported versions of Visual Studio make it available. Use rint() instead. There are behavioral differences between round() and rint(), but they should not matter to the Bloom filter optimal_k() function. We already assume POSIX behavior for rint(), so there is no question of rint() not using "rounds towards nearest" as its rounding mode. Cleanup from commit 51bc271790eb234a1ba4d14d3e6530f70de92ab5. Per buildfarm member thrips. Author: Peter Geoghegan --- src/backend/lib/bloomfilter.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/backend/lib/bloomfilter.c b/src/backend/lib/bloomfilter.c index eb08f4a..3565480 100644 --- a/src/backend/lib/bloomfilter.c +++ b/src/backend/lib/bloomfilter.c @@ -240,7 +240,7 @@ my_bloom_power(uint64 target_bitset_bits) static int optimal_k(uint64 bitset_bits, int64 total_elems) { - int k = round(log(2.0) * bitset_bits / total_elems); + int k = rint(log(2.0) * bitset_bits / total_elems); return Max(1, Min(k, MAX_HASH_FUNCS)); } -- 2.7.4
Re: [HACKERS][PATCH] adding simple sock check for windows
On Sat, Mar 31, 2018 at 11:33 PM, Tom Lane wrote: > ... Oh, just to make things even more fun, PQsocket() returns int, not > pgsocket; see its header comment. Therefore, that test is correctly > coded as-is (though it's still useless), and the real problem is that > ParallelSlot.sock ought to be declared int not pgsocket. If you look > around at our other places that read PQsocket() and use its result in > select() masks, they uniformly use int variables, and I think they're > right. > There is one other place where we are using pgsocket, see walrcv_receive, but we are using Assert in that place. I think it would be better if the output of PQsocket() can be consistently used everywhere. If there is no particular restriction, then it will be good to make it as 'int' everywhere. > Actually, the more I look at this code, the more problems I'm finding. > The wait-for-a-free-connection code is just broken on its face, because > it supposes it can retry select() waits without reinitializing the > FD_SET. The loop in vacuum_one_database() uses "conn" to call > prepare_vacuum_command() without any thought as to whether that is a > free connection --- which, typically, it isn't. It looks to me like > libpq silently copes with this silliness, just waiting for the active > command to finish and then doing what it's told, but the net result > is that supposedly-concurrent vacuums get serialized. > I think it would have been better if this code would have found the free_slot before preparing the command and then used the connection from free slot. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] A design for amcheck heapam verification
On 2018-03-31 20:25:24 -0700, Peter Geoghegan wrote: > On Sat, Mar 31, 2018 at 8:09 PM, Peter Geoghegan wrote: > > I was thinking of using rint(), which is what you get if you call > > round(float8) from SQL. > > Attached patch does it that way. Note that there are float/int cast > regression tests that ensure that rint() behaves consistently on > supported platforms -- see commit 06bf0dd6. LGTM, pushed. Closing CF entry. Yay! Only 110 to go. - Andres
Re: [HACKERS] A design for amcheck heapam verification
On Sat, Mar 31, 2018 at 8:32 PM, Andres Freund wrote: > LGTM, pushed. Closing CF entry. Yay! Only 110 to go. Thanks Andres! -- Peter Geoghegan
Re: [HACKERS][PATCH] adding simple sock check for windows
On Sun, Apr 1, 2018 at 8:59 AM, Amit Kapila wrote: > On Sat, Mar 31, 2018 at 11:33 PM, Tom Lane wrote: >> ... Oh, just to make things even more fun, PQsocket() returns int, not >> pgsocket; see its header comment. Therefore, that test is correctly >> coded as-is (though it's still useless), and the real problem is that >> ParallelSlot.sock ought to be declared int not pgsocket. If you look >> around at our other places that read PQsocket() and use its result in >> select() masks, they uniformly use int variables, and I think they're >> right. >> > > There is one other place where we are using pgsocket, see > walrcv_receive, but we are using Assert in that place. I think it > would be better if the output of PQsocket() can be consistently used > everywhere. If there is no particular restriction, then it will be > good to make it as 'int' everywhere. > > >> Actually, the more I look at this code, the more problems I'm finding. >> The wait-for-a-free-connection code is just broken on its face, because >> it supposes it can retry select() waits without reinitializing the >> FD_SET. The loop in vacuum_one_database() uses "conn" to call >> prepare_vacuum_command() without any thought as to whether that is a >> free connection --- which, typically, it isn't. It looks to me like >> libpq silently copes with this silliness, just waiting for the active >> command to finish and then doing what it's told, but the net result >> is that supposedly-concurrent vacuums get serialized. >> > > I think it would have been better if this code would have found the > free_slot before preparing the command and then used the connection > from free slot. > oops, I just saw that you have already pushed a fix for it. I am not sure if we should try to do anything about walrcv_receive's output still uses pgsocket instead of int as the usage in itself doesn't have any problem. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] why not parallel seq scan for slow functions
On Fri, Mar 30, 2018 at 1:41 AM, Robert Haas wrote: > On Thu, Mar 29, 2018 at 12:55 AM, Amit Kapila wrote: >> I think that is under acceptable range. I am seeing few regression >> failures with the patch series. The order of targetlist seems to have >> changed for Remote SQL. Kindly find the failure report attached. I >> have requested my colleague Ashutosh Sharma to cross-verify this and >> he is also seeing the same failures. > > Oops. Those just require an expected output change. > >> It seems UPPERREL_TLIST is redundant in the patch now. I think we can >> remove it unless you have something else in mind. > > Yes. > >> I think the handling of partitioned rels looks okay, but we might want >> to once check the overhead of the same unless you are sure that this >> shouldn't be a problem. If you think, we should check it once, then >> is it possible that we can do it as a separate patch as this doesn't >> look to be directly linked to the main patch. It can be treated as an >> optimization for partitionwise aggregates. I think we can treat it >> along with the main patch as well, but it might be somewhat simpler to >> verify it if we do it separately. > > I don't think it should be a problem, although you're welcome to test > it if you're concerned about it. I think it would probably be > penny-wise and pound-foolish to worry about the overhead of > eliminating the Result nodes, which can occur not only with > partition-wise aggregate but also with partition-wise join and, I > think, really any case where the top scan/join plan would be an Append > node. We're talking about a very small amount of additional planning > time to potentially get a better plan. > > I've committed all of these now. > Cool, I have closed the corresponding CF entry. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
lazy detoasting
Hi, If I copy an out-of-line, on-disk TOAST pointer into a memory context with transaction lifetime, with an eye to detoasting it later in the same transaction, are there circumstances where it wouldn't work? Thanks, -Chap
Re: lazy detoasting
Chapman Flack writes: > If I copy an out-of-line, on-disk TOAST pointer into a memory context > with transaction lifetime, with an eye to detoasting it later in the > same transaction, are there circumstances where it wouldn't work? Should be safe *as long as you hold onto a snapshot that can see the toast value's parent row*. Otherwise, it's not. For a counterexample, see the changes we had to make to avoid depending on out-of-line toast values in the catcaches, commit 08e261cbc. regards, tom lane
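For illustration, a rough backend-code sketch of the pattern Tom describes: copy the datum into a transaction-lifetime context and keep a registered snapshot around until the value has been detoasted. This is an assumption-laden example (it assumes the active snapshot at copy time is one that can see the parent row), not code from any patch.

#include "postgres.h"
#include "access/tuptoaster.h"
#include "utils/datum.h"
#include "utils/memutils.h"
#include "utils/snapmgr.h"

static Datum	saved_value;		/* may still be an out-of-line TOAST pointer */
static Snapshot	saved_snapshot;		/* keeps the parent row's toast data reachable */

static void
save_for_later(Datum toastable_datum)
{
	MemoryContext oldcxt = MemoryContextSwitchTo(TopTransactionContext);

	/* varlena datum: pass-by-reference, variable length */
	saved_value = datumCopy(toastable_datum, false, -1);
	/* hold a snapshot that can still see the toast value's parent row */
	saved_snapshot = RegisterSnapshot(GetActiveSnapshot());
	MemoryContextSwitchTo(oldcxt);
}

static struct varlena *
detoast_later(void)
{
	/* still in the same transaction, snapshot still registered */
	struct varlena *result =
		heap_tuple_untoast_attr((struct varlena *) DatumGetPointer(saved_value));

	UnregisterSnapshot(saved_snapshot);
	return result;
}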
Re: [HACKERS][PATCH] adding simple sock check for windows
Amit Kapila writes: > oops, I just saw that you have already pushed a fix for it. I am not > sure if we should try to do anything about walrcv_receive's output > still uses pgsocket instead of int as the usage in itself doesn't have > any problem. I see a few places where we're still assigning PQsocket's result to a pgsocket variable, but none of them are worrying about the possibility that it's not a valid socket, and I'm not sure they need to. regards, tom lane
Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath
On 16 March 2018 at 02:46, Robert Haas wrote: > If we stick with your idea of using AppendPath, do we actually need > generate_proxy_paths()? What paths get lost if we don't have a > special case for them here? Actually, we do a surprisingly good job of allowing plan shapes to stay the same when planning for an only-child scan for an Append or MergeAppend. The main difference I did find was that Append and MergeAppend don't support Mark and Restore, so if you just generated the same paths and simply skipped over Appends and MergeAppends you'd still be left with Materialize nodes which might not actually be required at all. This might not be huge, but seeing this made me worried that there might be some valid reason, if not today, then sometime in the future, why it might not be safe to simply pluck the singleton Append/MergeAppend nodes out of the plan tree. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: csv format for psql
Hello Isaac, Personally I do not have any problem with CSV defaulting to '|' separator, given that anyway people often use anything but a comma for the purpose, including '|'. However Pavel wants to block the patch on this point. Too bad. OK, mostly trying to avoid commenting because I doubt I have much to add. But. If I ask for CSV and don't specify any overrides, I expect to get "C"omma separated values, not some other character. More specifically, if I say --csv I expect to get files that are identical with what I would get if I used COPY ... CSV. My summary was incomplete. The --csv option implementation by Daniel already does that. The issue Pavel is complaining about is that in interactive mode "\pset format csv" does not do the same: it triggers the csv-rule string-escaping mechanism, but does not reset the "fieldsep" variable (eh, it sets the "format" variable) so the default separator under this interactive use is "|" if the "fieldsep" variable is shared. I have suggested a "\csv" interactive command to set both as a convenient shorthand for "\pset format csv & \pset fieldsep ','", similarly to --csv, but Pavel still wants "\pset format csv" to trigger the full csv output. A consequence of adding "fieldsep_csv" which I forgot is that it probably has to duplicate the "_zero" ugly hack to be consistent with existing "*sep" variables, or else be inconsistent. Sigh. -- Fabien.
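As a concrete example, under the patch being discussed the two-step interactive form mentioned above would look like this (the proposed "\csv" shorthand, if accepted, would replace the two \pset calls):

\pset format csv
\pset fieldsep ','
SELECT 1 AS a, 'comma, inside' AS b;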