Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-08 Thread Amit Langote
On 2016/03/07 23:48, Robert Haas wrote:
> On Sun, Mar 6, 2016 at 11:02 PM, Kyotaro HORIGUCHI
>  wrote:
>> The 0001-P.. adds the following interface functions.
>>
>> +extern void pgstat_progress_set_command(BackendCommandType cmdtype);
>> +extern void pgstat_progress_set_command_target(Oid objid);
>> +extern void pgstat_progress_update_param(int index, uint32 val);
>> +extern void pgstat_reset_local_progress(void);
>> +extern int pgstat_progress_get_num_param(BackendCommandType cmdtype);
>>
>> I don't like to treat the target object id differently from other
>> parameters. It might not be needed at all, or conversely two or
>> more might be needed. Although OIDs are not guaranteed to fit in
>> uint32, we have already stored BlockNumber there.
> 
> Well...
> 
> There's not much point in deciding that the parameters are uint32,
> because we don't have that type at the SQL level.
> pgstat_progress_update_param() really ought to take either int32 or
> int64 as an argument, because that's what we can actually handle from
> SQL, and it seems pretty clear that int64 is better since otherwise we
> can't fit, among other things, a block number.
> 
> Given that, I tend to think that treating the command target specially
> and passing that as an OID is reasonable.  We're not going to be able
> to pass variable-sized arrays through this mechanism, ever, because
> our shared memory segment doesn't work like that.  And it seems to me
> that nearly every command somebody might want to report progress on
> will touch, basically, one relation at a time.  So I don't see the harm
> in hardcoding that idea into the facility.

Updated versions attached.

* changed st_progress_param to int64 and so did the argument of
pgstat_progress_update_param().  Likewise changed param1..param10 of
pg_stat_get_progress_info()'s output columns to bigint.

* Added back the Oid field st_command_target and corresponding function
pgstat_progress_set_command_target(Oid).

* I attempted to implement a method to report index blocks done from
lazy_tid_reaped() albeit with some limitations. Patch 0003 is that
attempt.  In summary, I modified the index bulk delete callback interface
to receive a BlockNumber argument index_blkno:

 /* Typedef for callback function to determine if a tuple is bulk-deletable */
-typedef bool (*IndexBulkDeleteCallback) (ItemPointer itemptr, void *state);
+typedef bool (*IndexBulkDeleteCallback) (ItemPointer itemptr,
+ BlockNumber index_blkno,
+ void *state);

Then added 2 more fields to LVRelStats:

@@ -143,6 +143,8 @@ typedef struct LVRelStats
 int num_index_scans;
 TransactionId latestRemovedXid;
 boollock_waiter_detected;
+BlockNumber last_index_blkno;
+BlockNumber index_blks_vacuumed;

Then in lazy_tid_reaped(), if the index block number received in the
index_blkno argument has changed from the previous call, increment the
count of index blocks processed and report it with
pgstat_progress_update_param(index_blks_done); a rough sketch of that
bookkeeping follows below. I wonder if we should reset
the saved block number and the count for every index vacuumed by
lazy_vacuum_index(). Right now, total_index_blks counts all indexes and
counting blocks using the rough method mentioned above is sensible only
for one index at a time.  Actually, the method has different kinds of
problems to deal with anyway. For example, with a btree index, one can
expect that the final count does not match total_index_blks obtained using
RelationGetNumberOfBlocks().  Moreover, each AM has its own idiosyncratic
way of traversing the index pages. I dared only touch the btree case to
make it pass the current block number to the callback. It finishes with
index_blks_done << total_index_blks since I guess the callback is called
only on the leaf pages. Any ideas?

* I am also tempted to add num_dead_tuples and dead_tuples_vacuumed to give
more granularity to the 'vacuuming heap' phase, but didn't in this version.
Should we?
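For reference, the bookkeeping in lazy_tid_reaped() mentioned above looks
roughly like this (just a sketch; PROG_PARAM_INDEX_BLKS_DONE is an
illustrative name for the parameter slot, not something in the patch):

	/* At the top of lazy_tid_reaped(), before the existing dead-tuple bsearch */
	if (index_blkno != vacrelstats->last_index_blkno)
	{
		/* we moved to a new index block, so count it and report progress */
		vacrelstats->last_index_blkno = index_blkno;
		vacrelstats->index_blks_vacuumed++;

		/* the parameter slot index used here is hypothetical */
		pgstat_progress_update_param(PROG_PARAM_INDEX_BLKS_DONE,
									 vacrelstats->index_blks_vacuumed);
	}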

Thanks,
Amit
>From 1608f3bd8153045da9cb3db269597b92042a4d9c Mon Sep 17 00:00:00 2001
From: Amit 
Date: Sun, 28 Feb 2016 01:50:07 +0900
Subject: [PATCH 1/3] Provide a way for utility commands to report progress

Commands can update a set of parameters in shared memory using:

  pgstat_progress_update_param(param_index, param_value)

Up to 10 independent 64-bit integer values can be published by commands.
In addition to those, a command should always report its BackendCommandType
and the OID of the relation it is about to begin processing at the beginning
of the processing using:

  pgstat_progress_start_command(cmdtype)
  pgstat_progress_set_command_target(relid)

A view can be defined in system_views.sql that outputs the values returned
by pg_stat_get_progress_info(cmdtype), where 'cmdtype' is a numeric value as
mentioned above.  Each such view has columns corresponding to the counters
published by the respective commands.

There is a SQL-callable function pg_stat_reset_local_progress() which
when ca

Re: [HACKERS] checkpointer continuous flushing - V18

2016-03-08 Thread Fabien COELHO



Now I cannot see how having one context per table space would have a
significant negative performance impact.


The 'dirty data' etc. limits are global, not per block device. By having
several contexts with unflushed dirty data the total amount of dirty
data in the kernel increases.


Possibly, but how much?  Do you have experimental data to back up that this 
is really an issue?


We are talking about 32 (context size) * #table spaces * 8KB buffers = 4MB of
dirty buffers to manage for 16 table spaces; I do not see that as a major
issue for the kernel.


More thoughts about your theoretical argument:

To complete the argument: the 4MB is just a worst-case scenario. In
reality, flushing the different contexts would be randomized over time, so
the frequency of flushing a context would be exactly the same in both
cases (shared or per-table-space context) if the checkpoints are the same
size; it is just that with a shared context each flush potentially targets
all tablespaces with a few pages each, while with the other version each
flush targets one table space only.


So my handwaving analysis is that the flow of dirty buffers is the same
with both approaches, but with the shared version the buffers are more
equally distributed over table spaces, hence reducing sequential write
effectiveness, while with the other the dirty buffers are grouped more
clearly per table space, so it should get better sequential write
performance.



--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] empty array case in plperl_ref_from_pg_array not handled correctly

2016-03-08 Thread Alex Hunsaker
On Mon, Mar 7, 2016 at 11:32 PM, Andres Freund  wrote:

> Hi,
>
> Per the new valgrind animal we get:
>
>
> http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=skink&dt=2016-03-08%2004%3A22%3A00
> 2016-03-08 05:56:05.566 UTC [56de6971.723:5] LOG:  statement: select
> plperl_sum_array('{}');
> ==1827== Invalid write of size 4
> ==1827==at 0x14E35DD1: plperl_ref_from_pg_array (plperl.c:1459)
>
>
[ I think you may have meant to CC me not Alexey K. I'm probably the person
responsible :D. ]

Indeed, I think the simplest fix is to make plperl_ref_from_pg_array()
return an "empty" array in that case. The attached fixes the valgrind
warning for me. (cassert enabled build on master).
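
For what it's worth, the general shape of such a guard (my sketch here, not
necessarily the attached patch; the variable name and reference-counting
details are assumptions) is an early exit returning a reference to an empty
Perl array when the datum has zero dimensions:

	/* sketch: early-exit in plperl_ref_from_pg_array() for a 0-D array */
	if (ARR_NDIM(ar) == 0)
		return newRV_noinc((SV *) newAV());	/* reference to an empty Perl AV */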


>
> ISTM the assumption that an array always has a dimension is a bit more
> widespread... E.g. split_array() looks like it'd not work nicely with a
> zero dimensional array...
>

Hrm, plperl_ref_from_pg_array() is the only caller of split_array(), so I
added an Assert() to make its contract more clear.

Recursive calls to split_array() should be fine, as nested 0-dim arrays get
collapsed down to a single 0-dim array (I hand-tested it nonetheless):
 # select ARRAY[ARRAY[ARRAY[]]]::int[];
 array
───
 {}
(1 row)

Looking around, everything that makes an array seems to pass through
plperl_ref_from_pg_array(), so I think once that is fixed all is good. I
also looked for dimensions and ARR_NDIM() just to make sure (didn't find
anything fishy).

Thanks,


plperl_array_valgrind.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-08 Thread Kyotaro HORIGUCHI
You're so quick.

At Tue, 8 Mar 2016 17:02:24 +0900, Amit Langote  
wrote in <56de8710.4070...@lab.ntt.co.jp>
> On 2016/03/07 23:48, Robert Haas wrote:
> >> I don't like to treat the target object id differently from other
> >> parameters. It might not be needed at all, or conversely two or
> >> more might be needed. Although OIDs are not guaranteed to fit in
> >> uint32, we have already stored BlockNumber there.
> > 
> > Well...
> > 
> > There's not much point in deciding that the parameters are uint32,
> > because we don't have that type at the SQL level.
> > pgstat_progress_update_param() really ought to take either int32 or
> > int64 as an argument, because that's what we can actually handle from
> > SQL, and it seems pretty clear that int64 is better since otherwise we
> > can't fit, among other things, a block number.
> > 
> > Given that, I tend to think that treating the command target specially
> > and passing that as an OID is reasonable.  We're not going to be able
> > to pass variable-sized arrays through this mechanism, ever, because
> > our shared memory segment doesn't work like that.  And it seems to me
> > that nearly every command somebody might want to report progress on
> > will touch, basically, one relation at a time.  So I don't see the harm
> > in hardcoding that idea into the facility.

We could concatenate two int32s into an int64, but widening each
parameter to int64 would be preferable. An additional 4 bytes times
the default maxbackends of 100 times 10 parameters = 4kB (40kB
for 1000 backends), which is not so big for modern machines?

> Updated versions attached.
> 
> * changed st_progress_param to int64 and so did the argument of
> pgstat_progress_update_param().  Likewise changed param1..param10 of
> pg_stat_get_progress_info()'s output columns to bigint.
> 
> * Added back the Oid field st_command_target and corresponding function
> pgstat_progress_set_command_target(Oid).

+   beentry->st_command = COMMAND_INVALID;
+   MemSet(&beentry->st_progress_param, 0, 
sizeof(beentry->st_progress_param));

The MemSet seems useless since it gets the same initialization on
setting st_command.

+   /*
+* Report values for only those backends which are running the 
given
+* command.  XXX - privilege check is maybe dubious.
+*/
+   if (!beentry ||
+   beentry->st_command != cmdtype ||
+   !has_privs_of_role(GetUserId(), beentry->st_userid))
+   continue;

We can simply ignore unprivileged backends, or return all zeros or
nulls to signal that the caller has no privilege.

0002

+   FROM pg_stat_get_progress_info(1) AS S;

Ah... This magic number seems quite bad. The function should
perhaps take the command type as a string.

+   FROM pg_stat_get_progress_info('lazy vacuum') AS S;

Using an array of the names would be acceptable, maybe.

| char *progress_command_names[] = {"lazy vacuum", NULL};
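
Something like this, perhaps (only a sketch of the lookup; BackendCommandType
and COMMAND_INVALID come from the patch, the rest is illustrative and assumes
COMMAND_INVALID is 0 with real commands starting at 1):

| static const char *progress_command_names[] = {"lazy vacuum", NULL};
|
| static BackendCommandType
| pgstat_progress_command_from_name(const char *name)
| {
| 	int		i;
|
| 	for (i = 0; progress_command_names[i] != NULL; i++)
| 		if (strcmp(progress_command_names[i], name) == 0)
| 			return (BackendCommandType) (i + 1);	/* 0 is COMMAND_INVALID */
|
| 	return COMMAND_INVALID;		/* unknown name */
| }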

However, the numbers for the phases ('scanning heap' and so on) are
acceptable to me, for reasons I cannot quite pin down; they could also be
represented as names, but that might be rather bothersome.

+ WHEN 0 THEN 100::numeric(5, 2)
+ ELSE ((S.param3 + 1)::numeric / S.param2 * 
100)::numeric(5, 2)

This usage of numeric seems overkill to me.



> * I attempted to implement a method to report index blocks done from
> lazy_tid_reaped() albeit with some limitations. Patch 0003 is that
> attempt.  In summary, I modified the index bulk delete callback interface
> to receive a BlockNumber argument index_blkno:
> 
>  /* Typedef for callback function to determine if a tuple is bulk-deletable */
> -typedef bool (*IndexBulkDeleteCallback) (ItemPointer itemptr, void *state);
> +typedef bool (*IndexBulkDeleteCallback) (ItemPointer itemptr,
> + BlockNumber index_blkno,
> + void *state);
> 
> Then added 2 more fields to LVRelStats:
> 
> @@ -143,6 +143,8 @@ typedef struct LVRelStats
>  int num_index_scans;
>  TransactionId latestRemovedXid;
>  boollock_waiter_detected;
> +BlockNumber last_index_blkno;
> +BlockNumber index_blks_vacuumed;
> 
> Then in lazy_tid_reaped(), if the index block number received in the
> index_blkno argument has changed from the previous call, increment the
> count of index blocks processed and report it with
> pgstat_progress_update_param(index_blks_done). I wonder if we should reset
> the saved block number and the count for every index vacuumed by
> lazy_vacuum_index(). Right now, total_index_blks counts all indexes and
> counting blocks using the rough method mentioned above is sensible only
> for one index at a time.  Actually, the method has different kinds of
> problems to deal with anyway. For example, with a btree index, one can
> expect that the final count does not match total_index_blks obtained using
> RelationGetNumberOfBlocks()

Re: [HACKERS] Relation extension scalability

2016-03-08 Thread Amit Kapila
On Mon, Mar 7, 2016 at 8:34 PM, Robert Haas  wrote:
>
> On Fri, Mar 4, 2016 at 11:49 PM, Amit Kapila 
wrote:
> > I think one thing which needs more thoughts about this approach is that
we
> > need to maintain some number of slots so that group extend for different
> > relations can happen in parallel.  Do we want to provide simultaneous
> > extension for 1, 2, 3, 4, 5 or more number of relations?  I think
providing
> > it for three or four relations should be okay as higher the number we
want
> > to provide, bigger the size of PGPROC structure will be.
>
> Hmm.  Can we drive this off of the heavyweight lock manager's idea of
> how big the relation extension lock wait queue is, instead of adding
> more stuff to PGPROC?
>

One idea to make it work without adding additional stuff in PGPROC is:
after acquiring the relation extension lock, check if there is any available
block in the FSM; if it finds any block, then release the lock and proceed,
else extend the relation by one block, then check the lock's wait queue size
or number of lock requests (nRequested) and extend the relation further in
proportion to the wait queue size, and then release the lock and proceed.
Here, I think we can check the wait queue size even before extending the
relation by one block.

The benefit of doing it with PGPROC is that there will be relatively fewer
LockAcquire calls compared to the heavyweight lock approach, which I
think should not matter much because we are planning to extend the relation
in proportion to the wait queue size (probably wait queue size * 10).


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] pam auth - add rhost item

2016-03-08 Thread Grzegorz Sampolski
Hi Hari.
To use pam modules you can use whatever backend authentication method
you want.

This is example configuration:

Install this library https://github.com/pam-pgsql/pam-pgsql
Create some example database, a schema named access, and two tables,
pam_auth and pam_account, with example definitions:

pam_account:
db_user character varying(16) NOT NULL,
host character varying(255) NOT NULL

pam_auth:
db_user character varying(16) NOT NULL,
password character varying(512) NOT NULL

Sample /etc/pam_pgsql.conf:
connect = dbname= user= password=
auth_query = SELECT password FROM access.pam_auth WHERE db_user = %u LIMIT 1
acct_query = SELECT '0','0','' FROM access.pam_account WHERE db_user =
%u AND (host = %h OR %h LIKE host) ORDER BY host DESC LIMIT 1;
pw_type = crypt

Sample pam config /etc/pam.d/postgres_auth:
authrequiredpam_pgsql.so
account requiredpam_pgsql.so

Sample pg_hba.conf:
host samerole all 0.0.0.0/0 pam pamservice=postgres_auth

This will let you define access restrictions per host, group of
hosts, etc.


I will try to update the documentation with regard to these changes, but
please take into account that my English isn't very fluent, so if I make
some mistakes please correct me.

Regards.
Grzegorz Sampolski.

On 03/08/2016 05:30 AM, Haribabu Kommi wrote:
> On Tue, Dec 29, 2015 at 10:46 AM, Grzegorz Sampolski  wrote:
>> Hi.
>> I thought link on commitfest to github url was sufficient.
>> Sorry. Attached new patch.
> 
> I reviewed and tested the patch. The addition of the
> new RHOST member to the items passed to PAM
> authentication doesn't have any impact on existing
> behavior.
> 
> As Tomas said upthread, RHOST is an item that I
> also think can be added to PAM authentication.
> 
> I am not able to test PAM authentication using the
> RHOST; can you please let me know how to do
> that?
> 
> Also, the patch lacks documentation changes.
> As it adds the new pamusedns option and also
> sends the RHOST, the documentation needs to be
> updated.
> 
> Regards,
> Hari Babu
> Fujitsu Australia
> 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-03-08 Thread Shulgin, Oleksandr
On Mon, Mar 7, 2016 at 6:02 PM, Jeff Janes  wrote:

> On Mon, Mar 7, 2016 at 3:17 AM, Shulgin, Oleksandr
>  wrote:
> >
> > They might get that different plan when they upgrade to the latest major
> > version anyway.  Is it set somewhere that minor version upgrades should
> > never affect the planner?  I doubt so.
>
> People with meticulous standards are expected to re-validate their
> application, including plans and performance, before doing major
> version updates into production. They can continue to use a *fully
> patched* server from a previous major release while they do that.
>
> This is not the case for minor version updates.  We do not want to put
> people in the position where getting a security or corruption-risk
> update forces them to also accept changes which may destroy the
> performance of their system.
>
> I don't know if it is set out somewhere else, but there are many
> examples in this list of us declining to back-patch performance bug
> fixes which might negatively impact some users.  The only times we
> have done it that I can think of are when there is almost no
> conceivable way it could have a meaningful negative effect, or if the
> bug was tied in with security or stability bugs that needed to be
> fixed anyway and couldn't be separated.
>

The necessity to perform security upgrades is indeed a valid argument
against back-patching this, since this is not a bug that causes incorrect
results or data corruption, etc.

Thank you all for the thoughtful replies!
--
Alex


Re: [HACKERS] Freeze avoidance of very large table.

2016-03-08 Thread Masahiko Sawada
On Tue, Mar 8, 2016 at 1:20 PM, Kyotaro HORIGUCHI
 wrote:
> Hello, thank you for updating this tool.
>
> At Mon, 7 Mar 2016 14:03:08 -0500, Robert Haas  wrote 
> in 
>> On Mon, Mar 7, 2016 at 12:41 PM, Masahiko Sawada  
>> wrote:
>> > Attached latest version optimisation patch.
>> > I'm still consider regarding pg_upgrade regression test code, so I
>> > will submit that patch later.
>>
> I was thinking more about this today and I think that we don't
> actually need the PD_ALL_FROZEN page-level bit for anything.  It's
> enough that the bit is present in the visibility map.  The only point
> of PD_ALL_VISIBLE is that it tells us that we need to clear the
> visibility map bit, but that bit is enough to tell us to clear both
> visibility map bits.  So I propose the attached cleanup patch.

Thank you for updating the tool and proposing it.
I agree with you, and the patch you attached looks good to me except
for Horiguchi-san's comment.

Regarding the pg_visibility module, I'd like to share some bugs and
propose adding a relation type condition to each function.
Including that, I've attached the remaining 2 patches; one removes the page
conversion code from pg_upgrade, and the other supports pg_upgrade
for the frozen bit.

Please have a look at them.

Regards,

--
Masahiko Sawada
diff --git a/contrib/pg_visibility/pg_visibility--1.0.sql b/contrib/pg_visibility/pg_visibility--1.0.sql
index 9616e1f..da511e5 100644
--- a/contrib/pg_visibility/pg_visibility--1.0.sql
+++ b/contrib/pg_visibility/pg_visibility--1.0.sql
@@ -12,7 +12,7 @@ AS 'MODULE_PATHNAME', 'pg_visibility_map'
 LANGUAGE C STRICT;
 
 -- Show visibility map and page-level visibility information.
-CREATE FUNCTION pg_visibility(regclass, blkno, bigint,
+CREATE FUNCTION pg_visibility(regclass, blkno bigint,
 			  all_visible OUT boolean,
 			  all_frozen OUT boolean,
 			  pd_all_visible OUT boolean)
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index d4336ce..2993bcb 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -14,6 +14,7 @@
 #include "funcapi.h"
 #include "miscadmin.h"
 #include "storage/bufmgr.h"
+#include "utils/rel.h"
 
 PG_MODULE_MAGIC;
 
@@ -55,6 +56,14 @@ pg_visibility_map(PG_FUNCTION_ARGS)
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("invalid block number")));
 
+	/* Check for relation type */
+	if (!(rel->rd_rel->relkind == RELKIND_RELATION ||
+		  rel->rd_rel->relkind == RELKIND_MATVIEW))
+		ereport(ERROR,
+(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a table or materialized view",
+		RelationGetRelationName(rel))));
+
 	tupdesc = pg_visibility_tupdesc(false, false);
 	MemSet(nulls, 0, sizeof(nulls));
 
@@ -94,6 +103,14 @@ pg_visibility(PG_FUNCTION_ARGS)
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("invalid block number")));
 
+	/* Check for relation type */
+	if (!(rel->rd_rel->relkind == RELKIND_RELATION ||
+		  rel->rd_rel->relkind == RELKIND_MATVIEW))
+		ereport(ERROR,
+(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a table or materialized view",
+		RelationGetRelationName(rel))));
+
 	tupdesc = pg_visibility_tupdesc(false, true);
 	MemSet(nulls, 0, sizeof(nulls));
 
@@ -147,9 +164,10 @@ pg_visibility_map_rel(PG_FUNCTION_ARGS)
 		HeapTuple	tuple;
 
 		MemSet(nulls, 0, sizeof(nulls));
-		values[0] = Int64GetDatum(info->next++);
+		values[0] = Int64GetDatum(info->next);
 		values[1] = BoolGetDatum((info->bits[info->next] & (1 << 0)) != 0);
 		values[2] = BoolGetDatum((info->bits[info->next] & (1 << 1)) != 0);
+		info->next++;
 
 		tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
 		SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(tuple));
@@ -190,10 +208,11 @@ pg_visibility_rel(PG_FUNCTION_ARGS)
 		HeapTuple	tuple;
 
 		MemSet(nulls, 0, sizeof(nulls));
-		values[0] = Int64GetDatum(info->next++);
+		values[0] = Int64GetDatum(info->next);
 		values[1] = BoolGetDatum((info->bits[info->next] & (1 << 0)) != 0);
 		values[2] = BoolGetDatum((info->bits[info->next] & (1 << 1)) != 0);
 		values[3] = BoolGetDatum((info->bits[info->next] & (1 << 2)) != 0);
+		info->next++;
 
 		tuple = heap_form_tuple(funcctx->tuple_desc, values, nulls);
 		SRF_RETURN_NEXT(funcctx, HeapTupleGetDatum(tuple));
@@ -223,6 +242,14 @@ pg_visibility_map_summary(PG_FUNCTION_ARGS)
 	rel = relation_open(relid, AccessShareLock);
 	nblocks = RelationGetNumberOfBlocks(rel);
 
+	/* Check for relation type */
+	if (!(rel->rd_rel->relkind == RELKIND_RELATION ||
+		  rel->rd_rel->relkind == RELKIND_MATVIEW))
+		ereport(ERROR,
+(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+ errmsg("\"%s\" is not a table or materialized view",
+		RelationGetRelationName(rel))));
+
 	for (blkno = 0; blkno < nblocks; ++blkno)
 	{
 		int32		mapbits;
@@ -296,6 +323,15 @@ collect_visibility_data(Oid relid, bool include_pd)
 	BufferAccessStrategy	bstrategy = GetAccessStrategy(BAS_BULKREAD);
 
 	rel = relation_open(relid, 

[HACKERS] Parallel query fails on standby server

2016-03-08 Thread Ashutosh Sharma
Hi All,

While testing the parallel scan feature on a standby server, it was found
that a parallel query fails with the error "ERROR:  failed to initialize
transaction_read_only to 0".

Following are the steps used to reproduce the issue:

Master:

edb=# create table ert(n int);
edb=# insert into  ert values (generate_series(1,500));
edb=# analyze ert;
edb=# vacuum ert;

Slave:

edb=# set max_parallel_degree =5;
SET
edb=# explain analyze verbose  select * from ert where n<=1000;
ERROR:  failed to initialize transaction_read_only to 0
CONTEXT:  parallel worker, PID 26042

Root cause analysis: After debugging the worker, it is observed that in
RestoreGUCState(), if a GUC variable can't be skipped it is initialized with
a default value, and in this process, when the GUC variable
"transaction_read_only" is being initialized, it calls the check hook
check_transaction_read_only(), which eventually fails due to the check
below, which says the GUC variable "transaction_read_only" can't be set
while recovery is in progress:

if (RecoveryInProgress())
{
	GUC_check_errcode(ERRCODE_FEATURE_NOT_SUPPORTED);
	GUC_check_errmsg("cannot set transaction read-write mode during recovery");
	return false;
}

Solution: Make use of the global variable "InitializingParallelWorker" to
protect the check for RecoveryInProgress() when a parallel worker is being
initialized.
PFA patch to fix the issue.

With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com
diff --git a/src/backend/commands/variable.c b/src/backend/commands/variable.c
index 903b3a6..c7173c9 100644
--- a/src/backend/commands/variable.c
+++ b/src/backend/commands/variable.c
@@ -482,11 +482,13 @@ show_log_timezone(void)
  * nothing since XactReadOnly will be reset by the next StartTransaction().
  * The IsTransactionState() test protects us against trying to check
  * RecoveryInProgress() in contexts where shared memory is not accessible.
+ * We can also skip the check for RecoveryInProgress while initializing the
+ * Parallel Workers by making use of the global variable InitializingParallelWorker.
  */
 bool
 check_transaction_read_only(bool *newval, void **extra, GucSource source)
 {
-	if (*newval == false && XactReadOnly && IsTransactionState())
+	if (*newval == false && XactReadOnly && IsTransactionState() && !InitializingParallelWorker)
 	{
 		/* Can't go to r/w mode inside a r/o transaction */
 		if (IsSubTransaction())

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel query fails on standby server

2016-03-08 Thread Craig Ringer
On 8 March 2016 at 20:30, Ashutosh Sharma  wrote:

>
> While testing a parallel scan feature on standby server, it is found that
> the parallel query fails with an error "ERROR:  failed to initialize
> transaction_read_only to 0".
>
>
Looks like it might be a good idea to add some tests to src/test/recovery
for parallel query on standby servers...

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Parallel query fails on standby server

2016-03-08 Thread Michael Paquier
On Tue, Mar 8, 2016 at 9:51 PM, Craig Ringer  wrote:
> On 8 March 2016 at 20:30, Ashutosh Sharma  wrote:
>>
>>
>> While testing a parallel scan feature on standby server, it is found that
>> the parallel query fails with an error "ERROR:  failed to initialize
>> transaction_read_only to 0".
>>
>
> Looks like it might be a good idea to add some tests to src/test/recovery
> for parallel query on standby servers...

An even better thing would be a set of read-only tests based on the
database "regression" generated by make check, itself run with
pg_regress.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2016-03-08 Thread Robert Haas
On Tue, Mar 8, 2016 at 7:26 AM, Masahiko Sawada  wrote:
> Regarding the pg_visibility module, I'd like to share some bugs and
> propose adding a relation type condition to each function.

OK, thanks.

> Including that, I've attached the remaining 2 patches; one removes the page
> conversion code from pg_upgrade, and the other supports pg_upgrade
> for the frozen bit.

Committed 001 with minor tweaks.

I find rewrite_vm_table to be pretty opaque.  There's not even a
comment explaining what it is supposed to do.  And I wonder why we
really need to be this efficient about it anyway.  Like, would it be
too expensive to just do this:

for (i = 0; i < BITS_PER_BYTE; ++i)
    if ((old & (1 << i)) != 0)
        new |= 1 << (2 * i);
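
(For example, an old visibility map byte of 0x05, meaning pages 0 and 2 are
all-visible, comes out of that loop as 0x11, with all of the new all-frozen
bits left clear.)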

And how about adding some more comments explaining why we are doing
this rewriting, like this:

In versions of PostgreSQL prior to catversion 201602181, PostgreSQL's
visibility map included one bit per heap page; it now includes two.
When upgrading a cluster from before that time to a current PostgreSQL
version, we could refuse to copy visibility maps from the old cluster
to the new cluster; the next VACUUM would recreate them, but at the
price of scanning the entire table.  So, instead, we rewrite the old
visibility maps in the new format.  That way, the all-visible bit
remains set for the pages for which it was set previously.  The
all-frozen bit is never set by this conversion; we leave that to
VACUUM.

Also, I'm slightly perplexed by the fact that I can't see how this
code succeeds in turning each page into two pages, which is something
that it seems like it would need to do.  Wouldn't we need to write out
the old page header twice, once for the first of the two new pages and
again for the second?  I probably need more caffeine here, so please
tell me what I'm missing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-03-08 Thread Michael Paquier
On Mon, Mar 7, 2016 at 10:40 PM, Michael Paquier
 wrote:
> On Sun, Mar 6, 2016 at 5:55 AM, Andrew Dunstan  wrote:
>> On 03/05/2016 01:31 PM, Michael Paquier wrote:
>>> On Sat, Mar 5, 2016 at 11:34 PM, Andrew Dunstan 
>>> wrote:

 Here is a translation into perl of the sed script, courtesy of the s2p
 incarnation of psed:
 
 The sed script appears to have been stable for a long time, so I don't
 think
 we need to be too concerned about possibly maintaining two versions.
>>>
>>> That's 95% of the work already done, nice! If I finish wrapping up a
>>> patch for this issue at least would you backpatch? It would be saner
>>> to get rid of this dependency everywhere I think regarding compilation
>>> with perl 5.22.
>>
>> Sure.
>
> OK, so after re-reading the script and perltidy-ing it, I finished
> with the attached. How does that look?

Attached is a new set for the support of MS 2015 + the psed issue,
please use those ones for testing:
- 0001 is the replacement of psed by a dedicated perl script to
generate probes.h
- 0002 Fix of the declaration of TIMEZONE_GLOBAL and TZNAME_GLOBAL for WIN32
- 0003 addresses the issue with lc_codepage missing from _locale_t.
- 0004 adds support for MS2015 in src/tools/scripts/
Regards,
-- 
Michael
From d7a100dae8816ff1287ae0eee2829d2b7ce6ef47 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 8 Mar 2016 22:18:16 +0900
Subject: [PATCH 1/4] Remove dependency to psed in MSVC scripts

psed has been removed from the core distribution of perl in 5.22, causing
the build of Postgres with MSVC on Windows to fail when at least this
version of perl is used. This commit replaces the dependency on psed with
an equivalent perl script; non-MSVC builds use this script as well, instead
of the former sed script, when dtrace is not enabled in a build.
---
 src/backend/utils/Gen_dummy_probes.pl  | 247 +
 src/backend/utils/Gen_dummy_probes.sed |  23 ---
 src/backend/utils/Makefile |   4 +-
 src/tools/msvc/Solution.pm |   2 +-
 4 files changed, 250 insertions(+), 26 deletions(-)
 create mode 100644 src/backend/utils/Gen_dummy_probes.pl
 delete mode 100644 src/backend/utils/Gen_dummy_probes.sed

diff --git a/src/backend/utils/Gen_dummy_probes.pl b/src/backend/utils/Gen_dummy_probes.pl
new file mode 100644
index 000..30c6d65
--- /dev/null
+++ b/src/backend/utils/Gen_dummy_probes.pl
@@ -0,0 +1,247 @@
+#! /usr/bin/perl -w
+#-
+#
+# Gen_dummy_probes.pl
+#Perl script that generates probes.h file when dtrace is not available
+#
+# Portions Copyright (c) 2008-2016, PostgreSQL Global Development Group
+#
+#
+# IDENTIFICATION
+#src/backend/utils/Gen_dummy_probes.pl
+#
+#-
+
+$0 =~ s/^.*?(\w+)[\.\w+]*$/$1/;
+
+use strict;
+use Symbol;
+use vars qw{ $isEOF $Hold %wFiles @Q $CondReg
+  $doAutoPrint $doOpenWrite $doPrint };
+$doAutoPrint = 1;
+$doOpenWrite = 1;
+
+# prototypes
+sub openARGV();
+sub getsARGV(;\$);
+sub eofARGV();
+sub printQ();
+
+# Run: the sed loop reading input and applying the script
+#
+sub Run()
+{
+	my ($h, $icnt, $s, $n);
+
+	# hack (not unbreakable :-/) to avoid // matching an empty string
+	my $z = "\000";
+	$z =~ /$z/;
+
+	# Initialize.
+	openARGV();
+	$Hold= '';
+	$CondReg = 0;
+	$doPrint = $doAutoPrint;
+  CYCLE:
+	while (getsARGV())
+	{
+		chomp();
+		$CondReg = 0;# cleared on t
+	  BOS:;
+
+		# /^[ 	]*probe /!d
+		unless (m /^[ \t]*probe /s)
+		{
+			$doPrint = 0;
+			goto EOS;
+		}
+
+		# s/^[ 	]*probe \([^(]*\)\(.*\);/\1\2/
+		{
+			$s = s /^[ \t]*probe ([^(]*)(.*);/${1}${2}/s;
+			$CondReg ||= $s;
+		}
+
+		# s/__/_/g
+		{
+			$s = s /__/_/sg;
+			$CondReg ||= $s;
+		}
+
+		# y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/
+		{ y{abcdefghijklmnopqrstuvwxyz}{ABCDEFGHIJKLMNOPQRSTUVWXYZ}; }
+
+		# s/^/#define TRACE_POSTGRESQL_/
+		{
+			$s = s /^/#define TRACE_POSTGRESQL_/s;
+			$CondReg ||= $s;
+		}
+
+		# s/([^,)]\{1,\})/(INT1)/
+		{
+			$s = s /\([^,)]+\)/(INT1)/s;
+			$CondReg ||= $s;
+		}
+
+		# s/([^,)]\{1,\}, [^,)]\{1,\})/(INT1, INT2)/
+		{
+			$s = s /\([^,)]+, [^,)]+\)/(INT1, INT2)/s;
+			$CondReg ||= $s;
+		}
+
+		# s/([^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\})/(INT1, INT2, INT3)/
+		{
+			$s = s /\([^,)]+, [^,)]+, [^,)]+\)/(INT1, INT2, INT3)/s;
+			$CondReg ||= $s;
+		}
+
+# s/([^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\})/(INT1, INT2, INT3, INT4)/
+		{
+			$s =
+s /\([^,)]+, [^,)]+, [^,)]+, [^,)]+\)/(INT1, INT2, INT3, INT4)/s;
+			$CondReg ||= $s;
+		}
+
+# s/([^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\})/(INT1, INT2, INT3, INT4, INT5)/
+		{
+			$s =
+s /\([^,)]+, [^,)]+, [^,)]+, [^,)]+, [^,)]+\)/(INT1, INT2, INT3, INT4, INT5)/s;
+			$CondReg ||= $s;
+		}
+
+# s/([^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\{1,\}, [^,)]\

Re: [HACKERS] Relation extension scalability

2016-03-08 Thread Robert Haas
On Tue, Mar 8, 2016 at 4:27 AM, Amit Kapila  wrote:
>> Hmm.  Can we drive this off of the heavyweight lock manager's idea of
>> how big the relation extension lock wait queue is, instead of adding
>> more stuff to PGPROC?
>
> One idea to make it work without adding additional stuff in PGPROC is that
> after acquiring relation extension lock, check if there is any available
> block in fsm, if it founds any block, then release the lock and proceed,
> else extend the relation by one block and then check lock's wait queue size
> or number of lock requests (nRequested) and extend the relation further in
> proportion to wait queue size and then release the lock and proceed.  Here,
> I think we can check for wait queue size even before extending the relation
> by one block.
>
> The benefit of doing it with PGPROC is that there will be relatively less
> number LockAcquire calls as compare to heavyweight lock approach, which I
> think should not matter much because we are planing to extend the relation
> in proportion to wait queue size (probably wait queue size * 10).

I don't think switching relation extension from heavyweight locks to
lightweight locks is going to work.  It would mean, for example, that
we lose the ability to service interrupts while extending a relation;
not to mention that we lose scalability if many relations are being
extended at once.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Splitting lengthy sgml files

2016-03-08 Thread Robert Haas
On Mon, Mar 7, 2016 at 10:09 AM, Tom Lane  wrote:
> Tatsuo Ishii  writes:
>> There are very lengthy (over 10k lines, for example) SGML files in
>> docs. While working on translating docs using GitHub, I noticed that
>> sometimes diffs are not showed in pull requests due to the limitation
>> of GitHub, which makes me pretty difficult to review PR. Any chance to
>> split those lengthy SGML files into smaller SGML files?
>
> Surely that's a github bug that you should be complaining to them about?

Well I'm sure it's not like they did it for no reason.  At some point
displaying a diff on a giant file is going to result in a page that
takes too long to load.

> I'm disinclined to split existing files because (a) it would complicate
> back-patching and (b) it would be completely destructive to git history.
> git claims to understand about file moves but it doesn't do a terribly
> good job with that history-wise (try git log or git blame on
> recently-moved files such as pgbench).  And I've never heard even
> a claim that it understands splits.

But we've split very long source files in the past, and I don't see
why splitting doc files is any stupider.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2016-03-08 Thread Robert Haas
On Tue, Mar 8, 2016 at 8:30 AM, Robert Haas  wrote:
> On Tue, Mar 8, 2016 at 7:26 AM, Masahiko Sawada  wrote:
>> Regarding the pg_visibility module, I'd like to share some bugs and
>> propose adding a relation type condition to each function.
>
> OK, thanks.

I left out the relkind check from the final commit because, for one
thing, the check you added isn't actually right: toast relations can
also have a visibility map.  And also, I'm sort of wondering what the
point of that check is.  What does it protect us from?  It doesn't
seem very future-proof ... what if we add a new relkind in the future?
 Do we really want to have to update this?

How about instead changing things so that we specifically reject
indexes?  And maybe some kind of a check that will reject anything
that lacks a relfilenode?  That seems like it would be more on point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allowing to run a buildfarm animal under valgrind

2016-03-08 Thread Andrew Dunstan



On 03/07/2016 08:39 PM, Andres Freund wrote:

Hi,

I'm setting up a buildfarm animal that runs under
valgrind. Unfortunately there's not really any good solution to force
make check et al. to start postgres wrapped in valgrind.  For now I've
resorted to adding something like

sub replace_postgres
{
 my $srcdir=$use_vpath ? "../pgsql/" : ".";
 my $builddir=abs_path("$pgsql");
 $srcdir=abs_path("$pgsql/$srcdir");
 chdir "$pgsql/src/backend/";
 rename "postgres", "postgres.orig";
 sysopen my $fh, "postgres", O_CREAT|O_TRUNC|O_RDWR, 0700
 or die "Could not create postgres wrapper";
 print $fh <<"END";
#!/bin/bash
~/src/valgrind/vg-in-place \\
 --quiet \\
 --error-exitcode=128 \\
 --suppressions=$srcdir/src/tools/valgrind.supp \\
 --trace-children=yes --track-origins=yes --read-var-info=yes \\
 --leak-check=no \\
 $builddir/src/backend/postgres.orig \\
 "\$@"
END
 close $fh;
 chdir $branch_root;
}
to the buildfarm client.

i.e. a script that replaces the postgres binary with a wrapper that
invokes postgres via valgrind.

That's obviously not a very good approach though. It's buildfarm
specific and thus can't be invoked by developers and it doesn't really
support being installed somewhere.

Does anybody have a better idea about how to do this?





Why not just create a make target which does this? It could be run after 
'make' and before 'make check'. I would make it assume valgrind was 
installed and in the path rather than using vg-in-place.


If that doesn't work and you want to do something with the buildfarm, 
probably a buildfarm module would be the way to go. We might need to add 
a new module hook to support it, to run right after make(), but that 
would be a one line addition to run_build.pl.


cheers

andrew









--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-03-08 Thread Joel Jacobson
Hi Alex,

Thanks for excellent research.

I've run your queries against Trustly's production database and I can
confirm your findings; the results are similar:

WITH ...
SELECT count(1),
   min(hist_ratio)::real,
   avg(hist_ratio)::real,
   max(hist_ratio)::real,
   stddev(hist_ratio)::real
  FROM stats2
 WHERE histogram_bounds IS NOT NULL;

-[ RECORD 1 ]
count  | 2814
min| 0.193548
avg| 0.927357
max| 1
stddev | 0.164134


WHERE distinct_hist < num_hist
-[ RECORD 1 ]
count  | 624
min| 0.193548
avg| 0.672407
max| 0.990099
stddev | 0.194901


WITH ..
SELECT schemaname ||'.'|| tablename ||'.'|| attname || (CASE inherited
WHEN TRUE THEN ' (inherited)' ELSE '' END) AS columnname,
   n_distinct, null_frac,
   num_mcv, most_common_vals, most_common_freqs,
   mcv_frac, (mcv_frac / (1 - null_frac))::real AS nonnull_mcv_frac,
   distinct_hist, num_hist, hist_ratio,
   histogram_bounds
  FROM stats2
 ORDER BY hist_ratio
 LIMIT 1;

 -[ RECORD 1 
]-+-
columnname| public.x.y
n_distinct| 103
null_frac | 0
num_mcv   | 10
most_common_vals  | {0,1,2,3,4,5,6,7,8,9}
most_common_freqs |
{0.4765,0.141733,0.1073,0.0830667,0.0559667,0.037,0.0251,0.0188,0.0141,0.0113667}
mcv_frac  | 0.971267
nonnull_mcv_frac  | 0.971267
distinct_hist | 18
num_hist  | 93
hist_ratio| 0.193548387096774
histogram_bounds  |
{10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,12,12,12,12,12,12,12,12,12,12,12,12,13,13,13,13,13,13,13,13,13,13,14,14,14,14,14,15,15,15,15,16,16,16,16,21,23,5074,5437,5830,6049,6496,7046,7784,14629,21285}



On Mon, Jan 18, 2016 at 4:46 PM, Shulgin, Oleksandr
 wrote:
> On Wed, Dec 2, 2015 at 10:20 AM, Shulgin, Oleksandr
>  wrote:
>>
>> On Tue, Dec 1, 2015 at 7:00 PM, Tom Lane  wrote:
>>>
>>> "Shulgin, Oleksandr"  writes:
>>> > This post summarizes a few weeks of research of ANALYZE statistics
>>> > distribution on one of our bigger production databases with some
>>> > real-world
>>> > data and proposes a patch to rectify some of the oddities observed.
>>>
>>> Please add this to the 2016-01 commitfest ...
>>
>>
>> Added: https://commitfest.postgresql.org/8/434/
>
>
> It would be great if some folks could find a moment to run the queries I was
> showing on their data to confirm (or refute) my findings, or to contribute
> to the picture in general.
>
> As I was saying, the queries were designed in such a way that even an
> unprivileged user can run them (the results will be limited to the stats
> data available to that user, obviously; and for custom-tailored statstarget
> one still needs superuser to join the pg_statistic table directly).  Also,
> on the scale of ~30k attribute statistics records, the queries take only a
> few seconds to finish.
>
> Cheers!
> --
> Alex
>



-- 
Joel Jacobson

Mobile: +46703603801
Trustly.com | Newsroom | LinkedIn | Twitter


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-08 Thread Amit Kapila
On Tue, Mar 8, 2016 at 7:23 PM, Robert Haas  wrote:
>
> On Tue, Mar 8, 2016 at 4:27 AM, Amit Kapila  wrote:
> >> Hmm.  Can we drive this off of the heavyweight lock manager's idea of
> >> how big the relation extension lock wait queue is, instead of adding
> >> more stuff to PGPROC?
> >
> > One idea to make it work without adding additional stuff in PGPROC is:
> > after acquiring the relation extension lock, check if there is any
> > available block in the FSM; if it finds any block, then release the lock
> > and proceed, else extend the relation by one block, then check the
> > lock's wait queue size or number of lock requests (nRequested) and
> > extend the relation further in proportion to the wait queue size, and
> > then release the lock and proceed.  Here, I think we can check the wait
> > queue size even before extending the relation by one block.
> >
> > The benefit of doing it with PGPROC is that there will be relatively
> > fewer LockAcquire calls compared to the heavyweight lock approach,
> > which I think should not matter much because we are planning to extend
> > the relation in proportion to the wait queue size (probably wait queue
> > size * 10).
>
> I don't think switching relation extension from heavyweight locks to
> lightweight locks is going to work.
>

Sorry, but I am not suggesting changing it to lightweight locks.  I am
just suggesting how to make batching work with heavyweight locks, as you
asked.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Optimizer questions

2016-03-08 Thread Alvaro Herrera
Tom Lane wrote:
> Konstantin Knizhnik  writes:
> > Attached please find improved version of the optimizer patch for LIMIT 
> > clause.

> For now, I've set this commitfest entry to Waiting on Author.  There's
> still time to consider a rewrite in this 'fest, if you can get it done
> in a week or two.

Yes.  Given that Konstantin will have to struggle to get this patch
rebased on top of upper-planner pathification which appeared out of the
blue at the last minute, it seems fair to give some additional time
for the required work.

However, we still have a commitfest schedule to adhere to, and
Konstantin has two other patches in the commitfest:
* Support ALTER INDEX ... WHERE ... clause
* eXtensible Transaction Manager API (v2)

and since we also need his contribution as a patch reviewer, it seems
unfair to just let all his patches move forward --- if we did that, he
would have no time at all to review others' patches, which is a
requirement.

Since we're only one week into the commitfest, I think it's his
prerogative to decide what to do.  I think there are two options: he can
either continue with this patch only, and get back from WoA to
Needs-Review in (hopefully) one week; or he can drop this one from the
commitfest right now and concentrate on the two other ones.  Either way,
as I already stated, we need his contribution as a reviewer for other
patches, too.

(If I were in his shoes, I wouldn't have any hope that the XTM patch
would go in for 9.6 at this point; the most I'd hope is to have lots of
feedback in order to have something to propose for early 9.7.  I don't
know the status of the ALTER INDEX one, so I can't comment there.)

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-08 Thread Robert Haas
On Mon, Mar 7, 2016 at 5:15 PM, David Rowley
 wrote:
> My concerns are:
> 1. Since there's no cheapest_partial_path in RelOptInfo the code is
> currently considering every partial_path for parallel hash aggregate.
> With normal aggregation we only ever use the cheapest path, so this
> may not be future proof. As of today we do only have at most one
> partial path in the list, but there's no reason to code this with that
> assumption. I didn't put in much effort to improve this as I see code
> in generate_gather_paths() which also makes assumptions about there
> just being 1 partial path. Perhaps we should expand RelOptInfo to
> track the cheapest partial path? or maybe allpaths.c should have a
> function to fetch the cheapest out of the list?

The first one in the list will be the cheapest; why not just look at
that?  Sorted partial paths are interesting if some subsequent path
construction step can make use of that sort ordering, but they're
never interesting from the point of view of matching the query's
pathkeys because of the fact that Gather is order-destroying.
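
In other words, something as simple as this should suffice (a sketch only,
relying on the ordering just described):

	/* the head of partial_pathlist is the cheapest partial path */
	if (rel->partial_pathlist != NIL)
	{
		Path	   *cheapest_partial_path = (Path *) linitial(rel->partial_pathlist);

		/* ... build the parallel aggregate paths on top of this one ... */
	}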

> 3. Nothing in create_grouping_paths() looks at the force_parallel_mode
> GUC. I had a quick look at this GUC and was a bit surprised to see 3
> possible states, but no explanation of what they do, so I've not added
> code which pays attention to this setting yet. I'd imagine this is
> just a matter of skipping serial path generation when parallel is
> possible when force_parallel_mode is FORCE_PARALLEL_ON. I've no idea
> what FORCE_PARALLEL_REGRESS is for yet.

The GUC is documented just like all the other GUCs are documented.
Maybe that's not enough, but I don't think "no explanation of what
they do" is accurate.  But I don't see why this patch should need to
care about force_parallel_mode at all.  force_parallel_mode is about
making queries that wouldn't choose to run in parallel do on their own
do so anyway, whereas this patch is about making more queries able to
do more work in parallel.

> The setrefs.c parts of the patch remain completely broken. I've not
> had time to look at this again yet, sorry.

I hope you get time soon.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Way to check whether a particular block is on the shared_buffer?

2016-03-08 Thread Robert Haas
On Mon, Mar 7, 2016 at 4:32 AM, Kouhei Kaigai  wrote:
>> Why not FileDescriptor(), FileFlags(), FileMode() as separate
>> functions like FilePathName()?
>>
> Here is no deep reason. The attached patch adds three individual
> functions.

This seems unobjectionable to me, so committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Recovery test failure for recovery_min_apply_delay on hamster

2016-03-08 Thread Alvaro Herrera
Michael Paquier wrote:
> On Wed, Mar 2, 2016 at 2:04 PM, Michael Paquier
>  wrote:
> > Here are a couple of ways to address this problem:
> > 1) Remove the check before applying the delay
> > 2) Increase recovery_min_apply_delay to a time that will allow even
> > slow machines to see a difference. By experience with the other tests
> > 30s would be enough. The sleep time needs to be increased as well,
> > making the time taken for the test to run longer
> > 3) Remove all together 005, because doing either 1) or 2) reduces the
> > value of the test.
> > I'd like 1) personally, I still see value in this test.
> 
> So, as doing 1) would be actually equivalent to simply having a master
> and checking that its standby replicates correctly, I have been
> looking at 2) to see to how long the delay has to be set to make the
> test failure-proof. After doing some measurements with hamster, 10s
> and 15s have proved to not be enough unfortunately, 20s has not failed
> in 10 attempts though. Attached is a patch to bump it to 20s, though I
> would not complain if the test is actually removed to accelerate the
> runs of this test suite.

Is there anything we can do to short-circuit the wait in the case that
replication happens promptly?  A one-minute wait would be acceptable if we
terminate it early by checking every second.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-08 Thread Robert Haas
On Tue, Mar 8, 2016 at 3:02 AM, Amit Langote
 wrote:
> Updated versions attached.
>
> * changed st_progress_param to int64 and so did the argument of
> pgstat_progress_update_param().  Likewise changed param1..param10 of
> pg_stat_get_progress_info()'s output columns to bigint.
>
> * Added back the Oid field st_command_target and corresponding function
> pgstat_progress_set_command_target(Oid).

What the heck do we have an SQL-visible pg_stat_reset_local_progress()
for?  Surely if we ever need that, it's a bug.

I think pgstat_progress_update_param() should Assert(index >= 0 &&
index < N_PROGRESS_PARAM).  But I'd rename N_PROGRESS_PARAM to
PGSTAT_NUM_PROGRESS_PARAM.

Regarding "XXX - privilege check is maybe dubious" - I think the
privilege check here should match pg_stat_activity.  If it does,
there's nothing dubious about that IMHO.

This patch has been worked on by so many people and reviewed by so
many people that I can't keep track of who should be credited when it
gets committed.  Could someone provide a list of author(s) and
reviewer(s)?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-03-08 Thread Oleg Bartunov
On Tue, Mar 8, 2016 at 6:40 AM, Craig Ringer  wrote:



> Either that, or bless experimental features/API as an official concept.
> I'd quite like that myself - stuff that's in Pg, but documented as "might
> change or go away in the next release, experimental feature". As we're
> doing more stuff that spans multiple release cycles, where patches in a
> prior cycle might need revision based on what we learn in a later one, we
> might need more freedom to change things that're committed and user visible.
>
>
+1


> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-03-08 Thread Andrew Dunstan



On 03/08/2016 08:32 AM, Michael Paquier wrote:

On Mon, Mar 7, 2016 at 10:40 PM, Michael Paquier
 wrote:

On Sun, Mar 6, 2016 at 5:55 AM, Andrew Dunstan  wrote:

On 03/05/2016 01:31 PM, Michael Paquier wrote:

On Sat, Mar 5, 2016 at 11:34 PM, Andrew Dunstan 
wrote:

Here is a translation into perl of the sed script, courtesy of the s2p
incarnation of psed:

The sed script appears to have been stable for a long time, so I don't
think
we need to be too concerned about possibly maintaining two versions.

That's 95% of the work already done, nice! If I finish wrapping up a
patch for this issue at least would you backpatch? It would be saner
to get rid of this dependency everywhere I think regarding compilation
with perl 5.22.

Sure.

OK, so after re-reading the script and perltidy-ing it, I finished
with the attached. How does that look?

Attached is a new set for the support of MS 2015 + the psed issue,
please use those ones for testing:
- 0001 is the replacement of psed by a dedicated perl script to
generate probes.h
- 0002 Fix of the declaration of TIMEZONE_GLOBAL and TZNAME_GLOBAL for WIN32
- 0003 addresses the issue with lc_codepage missing from _locale_t.
- 0004 adds support for MS2015 in src/tools/scripts/



Thanks for doing this work.

Do we already have a hard dependency on perl for all non-Windows builds? 
I know it's been discussed but I don't recall. If so, back to what version?


The comment in the codepage patch is a bit unclear. Can you reword it a bit?

cheers

andrew





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel query fails on standby server

2016-03-08 Thread Robert Haas
On Tue, Mar 8, 2016 at 8:23 AM, Michael Paquier
 wrote:
> On Tue, Mar 8, 2016 at 9:51 PM, Craig Ringer  wrote:
>> On 8 March 2016 at 20:30, Ashutosh Sharma  wrote:
>>>
>>> While testing a parallel scan feature on standby server, it is found that
>>> the parallel query fails with an error "ERROR:  failed to initialize
>>> transaction_read_only to 0".
>>>
>>
>> Looks like it might be a good idea to add some tests to src/test/recovery
>> for parallel query on standby servers...
>
> An even better thing would be a set of read-only tests based on the
> database "regression" generated by make check, itself run with
> pg_regress.

I'm not sure anything in the main regression suite actually goes
parallel right now, which is probably the first thing to fix.

Unless, of course, you use force_parallel_mode=regress, max_parallel_degree>0.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Odd warning from pg_dump

2016-03-08 Thread Tom Lane
In connection with a question on -general, I tried this:

$ pg_dump -n '*' regression >regression.dump
pg_dump: WARNING: typtype of data type "any" appears to be invalid
pg_dump: WARNING: typtype of data type "anyarray" appears to be invalid
pg_dump: WARNING: typtype of data type "anyelement" appears to be invalid
pg_dump: WARNING: typtype of data type "anyenum" appears to be invalid
pg_dump: WARNING: typtype of data type "anynonarray" appears to be invalid
pg_dump: WARNING: typtype of data type "anyrange" appears to be invalid
pg_dump: WARNING: typtype of data type "cstring" appears to be invalid
pg_dump: WARNING: typtype of data type "event_trigger" appears to be invalid
pg_dump: WARNING: typtype of data type "fdw_handler" appears to be invalid
pg_dump: WARNING: typtype of data type "index_am_handler" appears to be invalid
pg_dump: WARNING: typtype of data type "internal" appears to be invalid
pg_dump: WARNING: typtype of data type "language_handler" appears to be invalid
pg_dump: WARNING: typtype of data type "opaque" appears to be invalid
pg_dump: WARNING: typtype of data type "pg_ddl_command" appears to be invalid
pg_dump: WARNING: typtype of data type "record" appears to be invalid
pg_dump: WARNING: typtype of data type "trigger" appears to be invalid
pg_dump: WARNING: typtype of data type "tsm_handler" appears to be invalid
pg_dump: WARNING: typtype of data type "void" appears to be invalid
$

The datatypes being complained of are evidently all the pseudo-types.
I haven't looked into the code to figure out why this happens.
The dump is produced anyway, so it's only a cosmetic issue, but seems
probably worth fixing.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-03-08 Thread Tom Lane
Andrew Dunstan  writes:
> Do we already have a hard dependency on perl for all non-Windows builds? 
> I know it's been discussed but I don't recall. If so, back to what version?

I think the policy is we require perl for building from a git pull,
but not for building from a tarball.  Thus, any files that perl is used
to produce have to be shipped in tarballs.

However, there definitely *is* a hard requirement on perl for Windows
builds, even from tarballs, and I thought this patch was only about
the Windows build?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Odd warning from pg_dump

2016-03-08 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> In connection with a question on -general, I tried this:
> 
> $ pg_dump -n '*' regression >regression.dump
> pg_dump: WARNING: typtype of data type "any" appears to be invalid
> pg_dump: WARNING: typtype of data type "anyarray" appears to be invalid
> pg_dump: WARNING: typtype of data type "anyelement" appears to be invalid
> pg_dump: WARNING: typtype of data type "anyenum" appears to be invalid
> pg_dump: WARNING: typtype of data type "anynonarray" appears to be invalid
> pg_dump: WARNING: typtype of data type "anyrange" appears to be invalid
> pg_dump: WARNING: typtype of data type "cstring" appears to be invalid
> pg_dump: WARNING: typtype of data type "event_trigger" appears to be invalid
> pg_dump: WARNING: typtype of data type "fdw_handler" appears to be invalid
> pg_dump: WARNING: typtype of data type "index_am_handler" appears to be 
> invalid
> pg_dump: WARNING: typtype of data type "internal" appears to be invalid
> pg_dump: WARNING: typtype of data type "language_handler" appears to be 
> invalid
> pg_dump: WARNING: typtype of data type "opaque" appears to be invalid
> pg_dump: WARNING: typtype of data type "pg_ddl_command" appears to be invalid
> pg_dump: WARNING: typtype of data type "record" appears to be invalid
> pg_dump: WARNING: typtype of data type "trigger" appears to be invalid
> pg_dump: WARNING: typtype of data type "tsm_handler" appears to be invalid
> pg_dump: WARNING: typtype of data type "void" appears to be invalid
> $
> 
> The datatypes being complained of are evidently all the pseudo-types.
> I haven't looked into the code to figure out why this happens.
> The dump is produced anyway, so it's only a cosmetic issue, but seems
> probably worth fixing.

This is fixed in my changes to pg_dump, though I didn't expect you'd be
able to hit them in released versions and so hadn't been planning to
break it out.

This is the hunk for that particular change (line numbers are probably
off due to my other changes in the overall patch):

*** dumpType(Archive *fout, TypeInfo *tyinfo
*** 8722,8727 
--- 8727,8743 
dumpRangeType(fout, tyinfo);
else if (tyinfo->typtype == TYPTYPE_PSEUDO && !tyinfo->isDefined)
dumpUndefinedType(fout, tyinfo);
+   else if (tyinfo->typtype == TYPTYPE_PSEUDO && tyinfo->isDefined &&
+            strncmp(tyinfo->dobj.namespace->dobj.name, "pg_catalog", 10) == 0)
+       /*
+        * skip defined pseudo-types in pg_catalog, they are special cases in
+        * the type system which are defined at initdb time only.
+        *
+        * Should a user manage to create one (which would require hand hacking
+        * the catalog, currently), throwing the below error seems entirely
+        * reasonable.
+        */
+       return;
else
write_msg(NULL, "WARNING: typtype of data type \"%s\" appears to be 
invalid\n",
  tyinfo->dobj.name);

I can certainly look at committing that independently from the other
pg_dump changes that I'm working on.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] extend pgbench expressions with functions

2016-03-08 Thread Robert Haas
On Mon, Mar 7, 2016 at 4:58 AM, Fabien COELHO  wrote:
>> - 32-b: add double functions, including double variables
>> - 32-c: remove \setrandom support (advice to use \set + random instead)
>
> Here is a rebased version after Tom's updates, 33-b & 33-c. I also extended
> the floating point syntax to accept signed exponents, and changed the
> regexp style to match Tom's changes.

Having a look at 33-b, this looks pretty good now, but:

// comments are not allowed.  I'd just remove the two you have.

It makes no sense to exit(1) and then return 0, so don't do that.  I
might write this code as:

if (pval->type == PGBT_INT)
   return pval->u.ival;

Assert(pval->type == PGBT_DOUBLE);
/* do double stuff */

This would get rid of the internal-error case here altogether in favor
of testing it via an assertion.

I think that coerceToInt() should not exit(1) when an overflow occurs;
instead, I think the function should be declared as bool
coerceToInt(PgBenchValue *pval, int64 *result), and the error return
should be propagated back to the caller, which can then return false
as we do for other error cases.
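
In other words, roughly this shape (only a sketch; I'm assuming the
PgBenchValue union members are u.ival and u.dval as in the patch, and the
int64 range check at the boundary is approximate):

static bool
coerceToInt(PgBenchValue *pval, int64 *result)
{
    if (pval->type == PGBT_INT)
    {
        *result = pval->u.ival;
        return true;
    }

    Assert(pval->type == PGBT_DOUBLE);

    /* refuse doubles that cannot be represented as int64 */
    if (isnan(pval->u.dval) ||
        pval->u.dval < (double) PG_INT64_MIN ||
        pval->u.dval > (double) PG_INT64_MAX)
    {
        fprintf(stderr, "double to int overflow for %f\n", pval->u.dval);
        return false;           /* caller returns false in turn */
    }

    *result = (int64) pval->u.dval;
    return true;
}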

I think that some of the places you've used coerceToInt() are not
appropriate.  Like, for example:

+   if (coerceToInt(rval) == 0)
+   {
+       fprintf(stderr, "division by zero\n");
+       return false;
+   }

Now, if rval is out of range of an integer, that is going to overflow
while trying to see whether it should divide by zero.  Please work a
little harder here and in similar cases.  Maybe add a helper function
checkIntegerEquality(PgBenchValue *, int64).
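
The helper could be as simple as this (same caveats about the member names;
it compares without coercing, so an out-of-range double can't overflow):

static bool
checkIntegerEquality(PgBenchValue *pval, int64 val)
{
    if (pval->type == PGBT_INT)
        return pval->u.ival == val;

    Assert(pval->type == PGBT_DOUBLE);
    return pval->u.dval == (double) val;
}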

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Odd warning from pg_dump

2016-03-08 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> pg_dump: WARNING: typtype of data type "any" appears to be invalid

> This is fixed in my changes to pg_dump, though I didn't expect you'd be
> able to hit them in released versions and so hadn't been planning to
> break it out.

Oh, this was with HEAD; I've not checked it in released versions.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Odd warning from pg_dump

2016-03-08 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> pg_dump: WARNING: typtype of data type "any" appears to be invalid
> 
> > This is fixed in my changes to pg_dump, though I didn't expect you'd be
> > able to hit them in released versions and so hadn't been planning to
> > break it out.
> 
> Oh, this was with HEAD; I've not checked it in released versions.

Using your "-n '*'", I suspect you'd be able to hit it in released
versions also.

I think the real question is if "-n '*'" should still exclude
'pg_catalog'.  Fixing the issue with defined pseudo types is wonderful,
but aren't you going to end up with a dump you can't restore,
regardless?

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] Odd warning from pg_dump

2016-03-08 Thread Tom Lane
Stephen Frost  writes:
> I think the real question is if "-n '*'" should still exclude
> 'pg_catalog'.  Fixing the issue with defined pseudo types is wonderful,
> but aren't you going to end up with a dump you can't restore,
> regardless?

Yeah, perhaps so.  The thread on -general has also produced the
information that pg_dump -t '*' tries to dump system catalogs as if
they were user tables, which is another pretty useless bit of behavior.
So maybe we should drop the hunk you've got there (which frankly seems a
bit of a kluge) and instead hot-wire things so that stuff in pg_catalog
is excluded even if it would otherwise match the inclusion lists.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-03-08 Thread Andrew Dunstan



On 03/08/2016 10:43 AM, Tom Lane wrote:

Andrew Dunstan  writes:

Do we already have a hard dependency on perl for all non-Windows builds?
I know it's been discussed but I don't recall. If so, back to what version?

I think the policy is we require perl for building from a git pull,
but not for building from a tarball.  Thus, any files that perl is used
to produce have to be shipped in tarballs.

However, there definitely *is* a hard requirement on perl for Windows
builds, even from tarballs, and I thought this patch was only about
the Windows build?





Michael's patch proposes to replace the use of sed to generate probes.h 
with the perl equivalent everywhere. That has the advantage that we keep 
to one script to generate probes.h, but it does impose a perl dependency.


We could get around that by shipping probes.h in tarballs, which seems 
like a perfectly reasonable thing to do. If we don't like that we can 
fall back to using the perl script just for MSVC builds.


cheers

andrew


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-08 Thread Amit Langote
On Wed, Mar 9, 2016 at 12:24 AM, Robert Haas  wrote:
> This patch has been worked on by so many people and reviewed by so
> many people that I can't keep track of who should be credited when it
> gets committed.  Could someone provide a list of author(s) and
> reviewer(s)?

Original authors are Rahila Syed and Vinayak Pokale.

I have been reviewing this for the last few CFs, and I sent in the last
few revisions as well.

Thanks,
Amit


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-03-08 Thread Tom Lane
Andrew Dunstan  writes:
> On 03/08/2016 10:43 AM, Tom Lane wrote:
>> However, there definitely *is* a hard requirement on perl for Windows
>> builds, even from tarballs, and I thought this patch was only about
>> the Windows build?

> Michael's patch proposes to replace the use of sed to generate probes.h 
> with the perl equivalent everywhere. That has the advantage that we keep 
> to one script to generate probes.h, but it does impose a perl dependency.

Meh.

> We could get around that by shipping probes.h in tarballs, which seems 
> like a perfectly reasonable thing to do.

Well, it'd be messier than you think, because you could not just ship
a dummy probes.h, or make would think it was up to date and not replace
it even in a dtrace-using build.  You'd have to ship it as something
like "probes.dummy.h" and cp it into place at build time.

On the whole, I'd rather that this patch left the non-Windows side alone.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-08 Thread Dilip Kumar
On Tue, Mar 8, 2016 at 8:34 PM, Amit Kapila  wrote:

> > >> Hmm.  Can we drive this off of the heavyweight lock manager's idea of
> > >> how big the relation extension lock wait queue is, instead of adding
> > >> more stuff to PGPROC?
> > >
> > > One idea to make it work without adding additional stuff in PGPROC is
> > > that after acquiring the relation extension lock, we check if there is
> > > any available block in the FSM; if we find one, release the lock and
> > > proceed, else extend the relation by one block and then check the lock's
> > > wait queue size or number of lock requests (nRequested) and extend the
> > > relation further in proportion to the wait queue size, then release the
> > > lock and proceed.  Here, I think we can check the wait queue size even
> > > before extending the relation by one block.
>


I have come up with this patch.

If this approach looks fine, I will prepare a final patch (more comments,
better indentation, and some code cleanup) and do some long-run testing
(the current results are from 5-minute runs).

The idea is the same as what Robert and Amit suggested upthread.

/* First we try the lock; if we get it, just extend by one block.  This gives
two benefits:
1. Single-thread performance is not hurt by checking for lock waiters.
2. Checking the waiters in the else part gives time for more waiters to
   collect, so we get a better estimate of the contention. */

if (TryRelExtLock())
{
    extend one block
}
else
{
    RelExtLock();
    if (recheck the FSM: somebody may have added blocks for me)
        -- don't extend any block, just reuse one
    else
        -- we have to extend blocks
        -- get the waiters = lock->nRequested
        -- add extra blocks to the FSM: extraBlocks = waiters * 20;
}

The results look like this
---

Test1 (COPY)
./psql -d postgres -c "COPY (select g.i::text FROM generate_series(1, 1) g(i)) TO '/tmp/copybinary' WITH BINARY"
echo "COPY data FROM '/tmp/copybinary' WITH BINARY;" > copy_script
./pgbench -f copy_script -T 300 -c$ -j$ postgres

Shared Buffers: 8GB   max_wal_size: 10GB   Storage: Magnetic Disk   WAL: SSD
---
Client    Base    multi_extend by 20 pages    lock_waiter patch (waiter*20)
1         105     157                         148
2         217     255                         252
4         210     494                         442
8         166     702                         645
16        145     563                         773
32        124     480                         957

Note: at 1 thread there should not be any improvement, so this may just be
run-to-run variance.

Test2 (INSERT)
./psql -d postgres -c "create table test_data(a int, b text)"
./psql -d postgres -c "insert into test_data values(generate_series(1,1000),repeat('x', 1024))"
./psql -d postgres -c "create table data (a int, b text)"
echo "insert into data select * from test_data;" >> insert_script

shared_buffers=512GB  max_wal_size=20GB  checkpoint_timeout=10min
./pgbench -c $ -j $ -f insert_script -T 300 postgres

Client    Base    Multi-extend by 1000    lock_waiter patch (waiter*20)
1         117     118                     117
2         111     140                     132
4         51      190                     134
8         43      254                     148
16        40      243                     206
32        -       -                       264


* (waiter*20) -> the first process to get the lock finds the number of lock
waiters and adds waiter*20 extra blocks.

In the next run I will also go beyond 32 clients; as we can see, throughput
is still increasing at 32 clients, so it is clear that when the patch sees
more contention it adapts and adds more blocks.


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


multi_extend_v5.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2016-03-08 Thread Jeff Janes
On Tue, Mar 8, 2016 at 5:30 AM, Robert Haas  wrote:
> On Tue, Mar 8, 2016 at 7:26 AM, Masahiko Sawada  wrote:
>> Regarding pg_visibility module, I'd like to share some bugs and
>> propose to add a relation type condition to each functions.
>
> OK, thanks.
>
>> Including it, I've attached the remaining 2 patches; one removes the page
>> conversion code from pg_upgrade, and the other adds pg_upgrade support
>> for the frozen bit.
>
> Committed 001 with minor tweaks.
>
> I find rewrite_vm_table to be pretty opaque.  There's not even a
> comment explaining what it is supposed to do.  And I wonder why we
> really need to be this efficient about it anyway.  Like, would it be
> too expensive to just do this:
>
> for (i = 0; i < BITS_PER_BYTE; ++i)
> if ((old & (1 << i)) != 0)
> new |= 1 << (2 * i);
>
> And how about adding some more comments explaining why we are doing
> this rewriting, like this:
>
> In versions of PostgreSQL prior to catversion 201602181, PostgreSQL's
> visibility map included one bit per heap page; it now includes two.
> When upgrading a cluster from before that time to a current PostgreSQL
> version, we could refuse to copy visibility maps from the old cluster
> to the new cluster; the next VACUUM would recreate them, but at the
> price of scanning the entire table.  So, instead, we rewrite the old
> visibility maps in the new format.  That way, the all-visible bit
> remains set for the pages for which it was set previously.  The
> all-frozen bit is never set by this conversion; we leave that to
> VACUUM.
>
> Also, I'm slightly perplexed by the fact that I can't see how this
> code succeeds in turning each page into two pages, which is something
> that it seems like it would need to do.  Wouldn't we need to write out
> the old page header twice, one for the first of the two new pages and
> again for the second?  I probably need more caffeine here, so please
> tell me what I'm missing.

I think that this loop:

   while (blkend >= end)

executes exactly twice for each iteration of the outer loop.  I'd
rather see it written as a loop which explicitly executes twice,
rather than looking like it might execute a dynamic number of times.
I can't imagine that this code needs to be future-proof.  If we change
the format again in the future, surely we can't just change this code;
we would have to write new code for the new format.
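
To spell out the two-for-one expansion, here is a rough sketch (not the
pg_upgrade code itself) of how one old-format byte maps onto two new-format
bytes, using the loop Robert showed above:

/*
 * Expand one old-format visibility map byte (one bit per heap page) into
 * two new-format bytes (two bits per heap page).  The all-visible bit moves
 * to the even bit positions; the odd (all-frozen) positions stay clear for
 * a later VACUUM to set.
 */
static void
convert_vm_byte(uint8 old_byte, uint8 *new_lo, uint8 *new_hi)
{
    uint16      expanded = 0;
    int         i;

    for (i = 0; i < BITS_PER_BYTE; ++i)
        if ((old_byte & (1 << i)) != 0)
            expanded |= 1 << (2 * i);

    *new_lo = (uint8) (expanded & 0xFF);        /* heap pages 0..3 */
    *new_hi = (uint8) ((expanded >> 8) & 0xFF); /* heap pages 4..7 */
}

Since every old byte becomes two new bytes, the bitmap contents of each old
visibility map page necessarily end up spread across two new pages.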

Cheers,

Jeff


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VS 2015 support in src/tools/msvc

2016-03-08 Thread Andrew Dunstan



On 03/08/2016 11:17 AM, Tom Lane wrote:

On the whole, I'd rather that this patch left the non-Windows side alone.



OK, that's why I raised the issue. We'll do it that way.

As I noted upthread, the sed script has been very stable so the overhead 
of having to maintain two scripts is pretty minimal.


cheers

andrew



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Odd warning from pg_dump

2016-03-08 Thread Stephen Frost
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > I think the real question is if "-n '*'" should still exclude
> > 'pg_catalog'.  Fixing the issue with defined pseudo types is wonderful,
> > but aren't you going to end up with a dump you can't restore,
> > regardless?
> 
> Yeah, perhaps so.  The thread on -general has also produced the
> information that pg_dump -t '*' tries to dump system catalogs as if
> they were user tables, which is another pretty useless bit of behavior.

Indeed.

> So maybe we should drop the hunk you've got there (which frankly seems a
> bit of a kluge) and instead hot-wire things so that stuff in pg_catalog
> is excluded even if it would otherwise match the inclusion lists.

An idealistic viewpoint might be that we should provide a way to
actually create defined pseudo-types and then make pg_dump work with
them, but I have a hard time seeing why that would be worth the effort.
One might also argue that we should have a way of dealing with every
type of object which exists, and defined pseudo-types seem to be the
only thing left out.

I agree that it seems like the best idea is to just hot-wire pg_dump to
exclude pg_catalog, though I'm inclined to suggest that we just exclude
it from pattern matches.  We know that objects sometimes end up in
pg_catalog which aren't really supposed to be there and there might be
other reasons to want to dump it out, so having 'pg_dump -n pg_catalog'
still do its best to dump out what's been asked for seems reasonable.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] extend pgbench expressions with functions

2016-03-08 Thread Fabien COELHO


Hello Robert,


Having a look at 33-b, this looks pretty good now, but:

// comments are not allowed.  I'd just remove the two you have.


Oops, C99 comments have not made it everywhere yet!


It make no sense to exit(1) and then return 0, so don't do that. [...]
This would get rid of the internal-error case here altogether in favor
of testing it via an assertion.


Ok. Fine with me.


I think that coerceToInt() should not exit(1) when an overflow occurs;
instead, I think the function should be declared as bool
coerceToInt(PgBenchValue *pval, int64 *result), and the error return
should be propagated back to the caller, which can then return false
as we do for other error cases.


This is the pain (ugly repeated code) that I wish to avoid, because then
we cannot write a simple addition as just an addition, but have to add
several ugly tests instead.  Ok, elegance is probably not a sufficient
argument against doing that.


Moreover, I do not see any condition in which doing what you suggest makes
much sense from the user perspective: if there is an internal error in the
bench code it seems much more logical to ask the user for a sensible bench
script, because I would not know how to interpret tps on a bench with
internal failures in the client code anyway.


For me, exiting immediately on internal script errors is the right action.

If this is a blocker I'll do them, but I'm convinced it is not what should
be done.



I think that some of the places you've used coerceToInt() are not
appropriate.  Like, for example:

+   if (coerceToInt(rval) == 0)
+   {
+       fprintf(stderr, "division by zero\n");
+       return false;
+   }

Now, if rval is out of range of an integer, that is going to overflow
while trying to see whether it should divide by zero.  Please work a
little harder here and in similar cases.


Ok.


 Maybe add a helper function
checkIntegerEquality(PgBenchValue *, int64).


Maybe.

--
Fabien.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] raw output from copy

2016-03-08 Thread Pavel Stehule
2016-03-04 18:06 GMT+01:00 Pavel Stehule :

>
>
> 2016-03-04 15:54 GMT+01:00 Daniel Verite :
>
>> Corey Huinker wrote:
>>
>> > So, for me, RAW is the right solution, or at least *a* right solution.
>>
>> Questions on how to extract from a bytea column come up on a regular
>> basis, as in [1] [2] [3], or [4] a few days ago, and so far the answers
>> are to encode the contents in text and decode them in an additional
>> step, or use COPY BINARY and filter out the headers.
>>
>> But none of this is as straightforward and efficient as the proposed
>> COPY RAW.
>> Also the conversion to text can't be used at all on very large
>> contents (>512MB), as mentioned in another recent thread [5]
>> (this is the same reason why pg_dump can't dump such rows),
>> but COPY RAW doesn't have this limitation.
>>
>> Technically COPY BINARY should be sufficient, but it seems that
>> people dislike having to deal with its headers.
>>
>> Also it's not supported by any of the drivers of popular
>> script languages that otherwise provide COPY in text format
>> (DBD::Pg, php, psycopg2...)
>> Maybe the RAW format would have a better chance to get support
>> there, because of its simplicity.
>>
>
> exactly - I would like to decrease the dependency on PostgreSQL internals.
> Working with clean content is simple and possible in any environment,
> without messy extra operations.
>

COPY RAW can be used for import. I am not sure if this use case was tested.

cat image.jpg | psql -c "CREATE TEMP TABLE auxbuf(image bytea); COPY
auxbuf(image) FROM stdin RAW; ..." postgres

Regards

Pavel


> Regards
>
> Pavel
>
>
>>
>> [1]
>>
>> http://www.postgresql.org/message-id/038517CEB6DE43BD8422D7947B6BE8D8@fanlijing
>>
>> [2] http://www.postgresql.org/message-id/4c8272c4.1000...@arcor.de
>>
>> [3] http://stackoverflow.com/questions/6730729
>>
>> [4]
>> http://www.postgresql.org/message-id/56c66565.50...@consistentstate.com
>>
>> [5] http://www.postgresql.org/message-id/14620.1456851...@sss.pgh.pa.us
>>
>>
>> Best regards,
>> --
>> Daniel Vérité
>> PostgreSQL-powered mailer: http://www.manitou-mail.org
>> Twitter: @DanielVerite
>>
>
>


Re: [HACKERS] WIP: Upper planner pathification

2016-03-08 Thread Tom Lane
I wrote:
> There might be some other things we could do to provide a fast-path for
> particularly trivial cases.

I wanted to look into that before the code or tests had drifted far enough
to make comparisons dubious.  Attached is a simple patch that lets
grouping_planner fall out with a minimum amount of work if the query is
just "SELECT expression(s)", and a couple of scatter plots of regression
test query timing.  The first plot compares pre-pathification timing with
HEAD+this patch, and the second compares HEAD with HEAD+this patch.

I had hoped to see more of a benefit, actually, but it seems like this
might be enough of a win to be worth committing.  Probably the main
argument against it is that we'd have to remember to maintain the list
of things-to-check-before-using-the-fast-path.

Thoughts?

regards, tom lane

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5fc8e5b..151f27f 100644
*** a/src/backend/optimizer/plan/planner.c
--- b/src/backend/optimizer/plan/planner.c
*** grouping_planner(PlannerInfo *root, bool
*** 1458,1463 
--- 1458,1504 
  			parse->sortClause,
  			tlist);
  	}
+ 	else if (parse->rtable == NIL &&
+ 			 parse->commandType == CMD_SELECT &&
+ 			 parse->jointree->quals == NULL &&
+ 			 !parse->hasAggs && !parse->hasWindowFuncs &&
+ 			 parse->groupClause == NIL && parse->groupingSets == NIL &&
+ 			 !root->hasHavingQual &&
+ 			 parse->distinctClause == NIL &&
+ 			 parse->sortClause == NIL &&
+ 			 parse->limitOffset == NULL && parse->limitCount == NULL)
+ 	{
+ 		/*
+ 		 * Trivial "SELECT expression(s)" query.  This case would be handled
+ 		 * correctly by the code below, but it comes up often enough to be
+ 		 * worth having a simplified fast-path for.  Need only create a Result
+ 		 * path with the desired targetlist and shove it into the final rel.
+ 		 */
+ 		Path	   *path;
+ 		double		tlist_rows;
+ 
+ 		/* Need not bother with preprocess_targetlist in a SELECT */
+ 		root->processed_tlist = tlist;
+ 
+ 		final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL);
+ 
+ 		path = (Path *) create_result_path(final_rel,
+ 		   create_pathtarget(root, tlist),
+ 		   NIL);
+ 
+ 		/* We do take the trouble to fix the rows estimate for SRFs, though */
+ 		tlist_rows = tlist_returns_set_rows(tlist);
+ 		if (tlist_rows > 1)
+ 		{
+ 			path->total_cost += path->rows * (tlist_rows - 1) *
+ cpu_tuple_cost / 2;
+ 			path->rows *= tlist_rows;
+ 		}
+ 
+ 		add_path(final_rel, path);
+ 
+ 		return;
+ 	}
  	else
  	{
  		/* No set operations, do regular planning */

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Odd warning from pg_dump

2016-03-08 Thread Alvaro Herrera
Tom Lane wrote:
> Stephen Frost  writes:
> > I think the real question is if "-n '*'" should still exclude
> > 'pg_catalog'.  Fixing the issue with defined pseudo types is wonderful,
> > but aren't you going to end up with a dump you can't restore,
> > regardless?
> 
> Yeah, perhaps so.  The thread on -general has also produced the
> information that pg_dump -t '*' tries to dump system catalogs as if
> they were user tables, which is another pretty useless bit of behavior.
> So maybe we should drop the hunk you've got there (which frankly seems a
> bit of a kluge) and instead hot-wire things so that stuff in pg_catalog
> is excluded even if it would otherwise match the inclusion lists.

Not sure that's reasonable.  We have at least one extension in contrib
that creates objects in pg_catalog.  ISTM that's enough precedent that
more could be created in the future.  (Now of course extensions get
special treatment anyway, but my point is that there's no prohibition
against creating objects in pg_catalog.)

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2016-03-08 Thread Robert Haas
On Mon, Mar 7, 2016 at 12:41 PM, Masahiko Sawada  wrote:
> Attached is the latest version of the optimisation patch.
> I'm still considering the pg_upgrade regression test code, so I
> will submit that patch later.

I just spent some time looking at this and I'm a bit worried about the
following (existing) comment in vacuumlazy.c:

 * Note: The value returned by visibilitymap_get_status could be slightly
 * out-of-date, since we make this test before reading the corresponding
 * heap page or locking the buffer.  This is OK.  If we mistakenly think
 * that the page is all-visible when in fact the flag's just been cleared,
 * we might fail to vacuum the page.  But it's OK to skip pages when
 * scan_all is not set, so no great harm done; the next vacuum will find
 * them.  If we make the reverse mistake and vacuum a page unnecessarily,
 * it'll just be a no-op.

The patch makes some attempt to update the comment mechanically, but
that's not nearly enough.  That comment is explaining that you *can't*
rely on the visibility map to tell you *for sure* that a page does not
require vacuuming.  For current uses, that's OK, because if we miss a
page we'll pick it up later.  But now that we want to skip vacuuming
pages for relfrozenxid/relminmxid advancement, that rationale doesn't apply.
Missing pages that need to be frozen and advancing relfrozenxid anyway
would be _bad_.

However, after some further thought, I think we might actually be OK.
If a page goes from all-frozen to not-all-frozen while VACUUM is
running, any new XID added to the page must be newer than the
oldestXmin value computed by vacuum_set_xid_limits(), so it won't
affect the value to which we can safely set relfrozenxid.  Similarly,
any MXID added to the page will be newer than GetOldestMultiXactId(),
so setting relminmxid is still safe for similar reasons.

I'd appreciate it if any other senior hackers could review that chain
of reasoning.  It would be really bad to get this wrong.

On another note, I didn't really like the way you updated the
documentation.  "eager freezing" doesn't seem like a great term to me,
and I think your changes were a little too localized.  Here's a draft
alternative where I used the term "aggressive vacuum" to describe
freezing all of the pages except for those already known to be
all-frozen.  Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a09ceb2..2f72633 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5984,12 +5984,15 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
   

-VACUUM performs a whole-table scan if the table's
+VACUUM performs an aggressive scan if the table's
 pg_class.relfrozenxid field has reached
-the age specified by this setting.  The default is 150 million
-transactions.  Although users can set this value anywhere from zero to
-two billions, VACUUM will silently limit the effective value
-to 95% of , so that a
+the age specified by this setting.  An aggressive scan differs from
+a regular VACUUM in that it visits every page that might
+contain unfrozen XIDs or MXIDs, not just those that might contain dead
+tuples.  The default is 150 million transactions.  Although users can
+set this value anywhere from zero to two billions, VACUUM
+will silently limit the effective value to 95% of
+, so that a
 periodical manual VACUUM has a chance to run before an
 anti-wraparound autovacuum is launched for the table. For more
 information see
@@ -6028,9 +6031,12 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
   

-VACUUM performs a whole-table scan if the table's
+VACUUM performs an aggressive scan if the table's
 pg_class.relminmxid field has reached
-the age specified by this setting.  The default is 150 million multixacts.
+the age specified by this setting.  An aggressive scan differs from
+a regular VACUUM in that it visits every page that might
+contain unfrozen XIDs or MXIDs, not just those that might contain dead
+tuples.  The default is 150 million multixacts.
 Although users can set this value anywhere from zero to two billions,
 VACUUM will silently limit the effective value to 95% of
 , so that a
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index 5204b34..d742ec9 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -438,22 +438,27 @@

 

-VACUUM normally skips pages that don't have any dead row
-versions, but those pages might still have row versions with old XID
-values.  To ensure all old row versions have been frozen, a
-scan of the whole table is needed.
- 

Re: [HACKERS] extend pgbench expressions with functions

2016-03-08 Thread Robert Haas
On Tue, Mar 8, 2016 at 11:48 AM, Fabien COELHO  wrote:
> If this is a blocker I'll do them, but I'm convince it is not what should be
> done.

Well, I think it's a blocker.  Exiting within deeply-nested code
instead of propagating the error upward does not strike me as a good
practice.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Odd warning from pg_dump

2016-03-08 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> So maybe we should drop the hunk you've got there (which frankly seems a
>> bit of a kluge) and instead hot-wire things so that stuff in pg_catalog
>> is excluded even if it would otherwise match the inclusion lists.

> Not sure that's reasonable.  We have at least one extension in contrib
> that creates objects in pg_catalog.  ISTM that's enough precedent that
> more could be created in the future.  (Now of course extensions get
> special treatment anyway, but my point is that there's no prohibition
> against creating objects in pg_catalog.)

True, and given the lack of prior complaints, it might be better to
leave well enough alone here.  What the -general thread was actually
suggesting is that pg_dump needs a way to forcibly omit blobs; the
question about behavior of the pattern-match switches was a sideshow.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2016-03-08 Thread Tom Lane
Robert Haas  writes:
> The patch makes some attempt to update the comment mechanically, but
> that's not nearly enough.  That comment is explaining that you *can't*
> rely on the visibility map to tell you *for sure* that a page does not
> require vacuuming.  For current uses, that's OK, because if we miss a
> page we'll pick it up later.  But now we want to skip vacuuming pages
> for relfrozenxid/relminmxid advancement, that rationale doesn't apply.
> Missing pages that need to be frozen and advancing relfrozenxid anyway
> would be _bad_.

Check.

> However, after some further thought, I think we might actually be OK.
> If a page goes from all-frozen to not-all-frozen while VACUUM is
> running, any new XID added to the page must be newer than the
> oldestXmin value computed by vacuum_set_xid_limits(), so it won't
> affect the value to which we can safely set relfrozenxid.  Similarly,
> any MXID added to the page will be newer than GetOldestMultiXactId(),
> so setting relminmxid is still safe for similar reasons.

Yeah, I agree with this, as long as the issue is only that the visibility
map result is slightly stale and not that it's, say, not crash-safe.
We can reasonably assume that any newly-added XID must be one that was
in progress while VACUUM was running, and hence will be after the xmin
horizon we computed earlier.  This requires the existence of a read
barrier somewhere between computing xmin horizon and inspecting the
visibility map, but I find it hard to believe there aren't plenty.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] JPUG wants to have a copyright notice on the translated doc

2016-03-08 Thread Josh berkus
On 03/04/2016 06:01 PM, Tatsuo Ishii wrote:

> I imagine kind of an extreme case: a bad guy removes "Copyright
> 1996-2016 The PostgreSQL Global Development Group" and replaces it
> with his/her copyright.

The PostgreSQL license does not permit that; you have to retain the
original copyright notice.  You can *add* whatever you want.


-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2016-03-08 Thread Masahiko Sawada
On Wed, Mar 9, 2016 at 1:23 AM, Jeff Janes  wrote:
> On Tue, Mar 8, 2016 at 5:30 AM, Robert Haas  wrote:

> I left out the relkind check from the final commit because, for one
> thing, the check you added isn't actually right: toast relations can
> also have a visibility map.  And also, I'm sort of wondering what the
> point of that check is.  What does it protect us from?  It doesn't
> seem very future-proof ... what if we add a new relkind in the future?
>  Do we really want to have to update this?
>
> How about instead changing things so that we specifically reject
> indexes?  And maybe some kind of a check that will reject anything
> that lacks a relfilnode?  That seems like it would be more on point.
>

I agree; I don't have a strong opinion about this.
It would be good to add a condition rejecting only indexes.
The attached patches:
 - Change the heap2 rmgr description
 - Add a relkind condition to pg_visibility
 - Fix a typo in pgvisibility.sgml
(Sorry for the late notice.)

Regards,

--
Masahiko Sawada
diff --git a/doc/src/sgml/pgvisibility.sgml b/doc/src/sgml/pgvisibility.sgml
index 6a98b55..cdd6a6f 100644
--- a/doc/src/sgml/pgvisibility.sgml
+++ b/doc/src/sgml/pgvisibility.sgml
@@ -21,7 +21,7 @@
   until such time as a tuple is inserted, updated, deleted, or locked on
   that page.  The page-level PD_ALL_VISIBLE bit has the
   same meaning as the all-visible bit in the visibility map, but is stored
-  within the data page itself rather than a separate data tructure.  These
+  within the data page itself rather than a separate data structure.  These
   will normally agree, but the page-level bit can sometimes be set while the
   visibility map bit is clear after a crash recovery; or they can disagree
   because of a change which occurs after pg_visibility examines
diff --git a/contrib/pg_visibility/pg_visibility.c b/contrib/pg_visibility/pg_visibility.c
index 5e5c7cc..c916d0d 100644
--- a/contrib/pg_visibility/pg_visibility.c
+++ b/contrib/pg_visibility/pg_visibility.c
@@ -56,6 +56,13 @@ pg_visibility_map(PG_FUNCTION_ARGS)
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("invalid block number")));
 
+	/* Check for relation type */
+	if (rel->rd_rel->relkind == RELKIND_INDEX)
+		ereport(ERROR,
+				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+				 errmsg("cannot use for index \"%s\"",
+						RelationGetRelationName(rel))));
+
 	tupdesc = pg_visibility_tupdesc(false, false);
 	MemSet(nulls, 0, sizeof(nulls));
 
@@ -95,6 +102,13 @@ pg_visibility(PG_FUNCTION_ARGS)
 (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
  errmsg("invalid block number")));
 
+	/* Check for relation type */
+	if (rel->rd_rel->relkind == RELKIND_INDEX)
+		ereport(ERROR,
+				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+				 errmsg("cannot use for index \"%s\"",
+						RelationGetRelationName(rel))));
+
 	tupdesc = pg_visibility_tupdesc(false, true);
 	MemSet(nulls, 0, sizeof(nulls));
 
@@ -226,6 +240,13 @@ pg_visibility_map_summary(PG_FUNCTION_ARGS)
 	rel = relation_open(relid, AccessShareLock);
 	nblocks = RelationGetNumberOfBlocks(rel);
 
+	/* Check for relation type */
+	if (rel->rd_rel->relkind == RELKIND_INDEX)
+		ereport(ERROR,
+				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+				 errmsg("cannot use for index \"%s\"",
+						RelationGetRelationName(rel))));
+
 	for (blkno = 0; blkno < nblocks; ++blkno)
 	{
 		int32		mapbits;
@@ -300,6 +321,13 @@ collect_visibility_data(Oid relid, bool include_pd)
 
 	rel = relation_open(relid, AccessShareLock);
 
+	/* Check for relation type */
+	if (rel->rd_rel->relkind == RELKIND_INDEX)
+		ereport(ERROR,
+				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+				 errmsg("cannot use for index \"%s\"",
+						RelationGetRelationName(rel))));
+
 	nblocks = RelationGetNumberOfBlocks(rel);
 	info = palloc0(offsetof(vbits, bits) + nblocks);
 	info->next = 0;
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index a63162c..2b31ea4 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -125,7 +125,8 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
 
-		appendStringInfo(buf, "cutoff xid %u", xlrec->cutoff_xid);
+		appendStringInfo(buf, "cutoff xid %u flags %d",
+		 xlrec->cutoff_xid, xlrec->flags);
 	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Freeze avoidance of very large table.

2016-03-08 Thread Robert Haas
On Tue, Mar 8, 2016 at 12:49 PM, Tom Lane  wrote:
>> However, after some further thought, I think we might actually be OK.
>> If a page goes from all-frozen to not-all-frozen while VACUUM is
>> running, any new XID added to the page must be newer than the
>> oldestXmin value computed by vacuum_set_xid_limits(), so it won't
>> affect the value to which we can safely set relfrozenxid.  Similarly,
>> any MXID added to the page will be newer than GetOldestMultiXactId(),
>> so setting relminmxid is still safe for similar reasons.
>
> Yeah, I agree with this, as long as the issue is only that the visibility
> map result is slightly stale and not that it's, say, not crash-safe.

If the visibility map isn't crash safe, we've got big problems even
without this patch, but we dealt with that when index-only scans went
in.  Maybe this patch introduces more stringent requirements in this
area, but I can't think of any reason why that should be true.  If
anything occurs to you (or anyone else), it would be good to mention
that before I go further and destroy the world.

> We can reasonably assume that any newly-added XID must be one that was
> in progress while VACUUM was running, and hence will be after the xmin
> horizon we computed earlier.  This requires the existence of a read
> barrier somewhere between computing xmin horizon and inspecting the
> visibility map, but I find it hard to believe there aren't plenty.

I'll check that, but I agree that it should be OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] More stable query plans via more predictable column statistics

2016-03-08 Thread Shulgin, Oleksandr
On Tue, Mar 8, 2016 at 3:36 PM, Joel Jacobson  wrote:

> Hi Alex,
>
> Thanks for excellent research.
>

Joel,

Thank you for spending your time to run these :-)

> I've run your queries against Trustly's production database and I can
> confirm your findings, the results are similar:
>
> WITH ...
> SELECT count(1),
>min(hist_ratio)::real,
>avg(hist_ratio)::real,
>max(hist_ratio)::real,
>stddev(hist_ratio)::real
>   FROM stats2
>  WHERE histogram_bounds IS NOT NULL;
>
> -[ RECORD 1 ]
> count  | 2814
> min| 0.193548
> avg| 0.927357
> max| 1
> stddev | 0.164134
>
>
> WHERE distinct_hist < num_hist
> -[ RECORD 1 ]
> count  | 624
> min| 0.193548
> avg| 0.672407
> max| 0.990099
> stddev | 0.194901
>
>
> WITH ..
> SELECT schemaname ||'.'|| tablename ||'.'|| attname || (CASE inherited
> WHEN TRUE THEN ' (inherited)' ELSE '' END) AS columnname,
>n_distinct, null_frac,
>num_mcv, most_common_vals, most_common_freqs,
>mcv_frac, (mcv_frac / (1 - null_frac))::real AS nonnull_mcv_frac,
>distinct_hist, num_hist, hist_ratio,
>histogram_bounds
>   FROM stats2
>  ORDER BY hist_ratio
>  LIMIT 1;
>
> -[ RECORD 1 ]-+-
> columnname| public.x.y
> n_distinct| 103
> null_frac | 0
> num_mcv   | 10
> most_common_vals  | {0,1,2,3,4,5,6,7,8,9}
> most_common_freqs |
>
> {0.4765,0.141733,0.1073,0.0830667,0.0559667,0.037,0.0251,0.0188,0.0141,0.0113667}
> mcv_frac  | 0.971267
> nonnull_mcv_frac  | 0.971267
> distinct_hist | 18
> num_hist  | 93
> hist_ratio| 0.193548387096774
> histogram_bounds  |
>
> {10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,12,12,12,12,12,12,12,12,12,12,12,12,13,13,13,13,13,13,13,13,13,13,14,14,14,14,14,15,15,15,15,16,16,16,16,21,23,5074,5437,5830,6049,6496,7046,7784,14629,21285}
>

I don't want to be asking for too much here, but is there a chance you
could try the effects of the proposed patch on an offline copy of your
database?

Do you envision or maybe have experienced problems with query plans
referring to the columns that are near the top of the above hist_ratio
report?  In other words: what are the practical implications for you with
the values being duplicated rather badly throughout the histogram like in
the example you shown?

Thank you!
--
Alex


Re: [HACKERS] Freeze avoidance of very large table.

2016-03-08 Thread Robert Haas
On Tue, Mar 8, 2016 at 12:59 PM, Masahiko Sawada  wrote:
>> How about instead changing things so that we specifically reject
>> indexes?  And maybe some kind of a check that will reject anything
>> that lacks a relfilnode?  That seems like it would be more on point.
>
> I agree, I don't have strong opinion about this.
> It would be good to add condition for rejecting only indexes.
> Attached patches are,
>  - Change heap2 rmgr description
>  - Add condition to pg_visibility
>  - Fix typo in pgvisibility.sgml
> (Sorry for the late notice..)

OK, committed the first and last of those.  I think the other one
needs some work yet; the error message doesn't seem like it is quite
our usual style, and if we're going to do something here we should
probably also insert a check to throw a better error when there is no
relfilenode.
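
For instance, something shaped more like our usual checks, say (the wording
and the exact relkind list are only a suggestion, not a committed change):

    if (rel->rd_rel->relkind != RELKIND_RELATION &&
        rel->rd_rel->relkind != RELKIND_MATVIEW &&
        rel->rd_rel->relkind != RELKIND_TOASTVALUE)
        ereport(ERROR,
                (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                 errmsg("\"%s\" is not a table, materialized view, or TOAST table",
                        RelationGetRelationName(rel))));

That would also cover the relkinds without storage (views, foreign tables,
composite types) instead of singling out indexes.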

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Optimizer questions

2016-03-08 Thread Konstantin Knizhnik

On 03/08/2016 07:01 AM, Tom Lane wrote:

Konstantin Knizhnik  writes:

Attached please find improved version of the optimizer patch for LIMIT clause.

This patch isn't anywhere close to working after 3fc6e2d7f5b652b4.
(TBH, the reason I was negative about this upthread is that I had that
one in the oven and knew it would conflict spectacularly.)  I encourage
you to think about how an optimization of this sort could be made to
work in a non-kluge fashion in the new code structure.

I've not spent a lot of time on this, but I think maybe what would make
sense is to consider both the case where function calculations are
postponed to after ORDER BY and the case where they aren't, and generate
Paths for both.  Neither approach is a slam-dunk win.  For example,
suppose that one of the tlist columns is md5(wide_column) --- it will
likely not be preferable to pass the wide column data through the sort
step rather than reducing it to a hash first.  This would require some
work in grouping_planner to track two possible pathtargets, and work in
create_ordered_paths to generate paths for both possibilities.  A possible
objection is that this would add planning work even when no real benefit
is possible; so maybe we should only consider the new way if the tlist has
significant eval cost?  Not sure about that.  There is also something
to be said for the idea that we should try to guarantee consistent
semantics when the tlist contains volatile functions.

For now, I've set this commitfest entry to Waiting on Author.  There's
still time to consider a rewrite in this 'fest, if you can get it done
in a week or two.

regards, tom lane


Attached please find a rebased patch.
Unfortunately 3fc6e2d7f5b652b4 still doesn't fix the problem with "lazy"
evaluation of the target list.
This is why my patch is still useful. But frankly speaking, I am not sure
that it is the best way of fixing this problem, because it takes into
account only one case: sort+limit. Maybe the same optimization could be
useful for other queries.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5fc8e5b..709d1ad 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -126,8 +126,9 @@ static RelOptInfo *create_ordered_paths(PlannerInfo *root,
 	 RelOptInfo *input_rel,
 	 double limit_tuples);
 static PathTarget *make_scanjoin_target(PlannerInfo *root, List *tlist,
-	 AttrNumber **groupColIdx);
+		AttrNumber **groupColIdx, bool* splitted_projection);
 static int	get_grouping_column_index(Query *parse, TargetEntry *tle);
+static int	get_sort_column_index(Query *parse, TargetEntry *tle);
 static List *postprocess_setop_tlist(List *new_tlist, List *orig_tlist);
 static List *select_active_windows(PlannerInfo *root, WindowFuncLists *wflists);
 static List *make_windowInputTargetList(PlannerInfo *root,
@@ -1381,6 +1382,7 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 	RelOptInfo *current_rel;
 	RelOptInfo *final_rel;
 	ListCell   *lc;
+	bool splitted_projection = false;
 
 	/* Tweak caller-supplied tuple_fraction if have LIMIT/OFFSET */
 	if (parse->limitCount || parse->limitOffset)
@@ -1657,7 +1659,7 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 		 * that were obtained within query_planner().
 		 */
 		sub_target = make_scanjoin_target(root, tlist,
-		  &groupColIdx);
+		  &groupColIdx, &splitted_projection);
 
 		/*
 		 * Forcibly apply that tlist to all the Paths for the scan/join rel.
@@ -1801,6 +1803,13 @@ grouping_planner(PlannerInfo *root, bool inheritance_update,
 	{
 		Path	   *path = (Path *) lfirst(lc);
 
+		if (splitted_projection)
+		{			
+			path = apply_projection_to_path(root, current_rel,
+			path, create_pathtarget(root, tlist));
+		}
+
+
 		/*
 		 * If there is a FOR [KEY] UPDATE/SHARE clause, add the LockRows node.
 		 * (Note: we intentionally test parse->rowMarks not root->rowMarks
@@ -3775,15 +3784,17 @@ create_ordered_paths(PlannerInfo *root,
 static PathTarget *
 make_scanjoin_target(PlannerInfo *root,
 	 List *tlist,
-	 AttrNumber **groupColIdx)
+	 AttrNumber **groupColIdx,
+	 bool* splitted_projection)
 {
 	Query	   *parse = root->parse;
-	List	   *sub_tlist;
-	List	   *non_group_cols;
+	List	   *sub_tlist = NIL;
+	List	   *non_group_cols = NIL;
 	List	   *non_group_vars;
 	int			numCols;
 
 	*groupColIdx = NULL;
+	*splitted_projection = false;
 
 	/*
 	 * If we're not grouping or aggregating or windowing, there's nothing to
@@ -3791,14 +3802,66 @@ make_scanjoin_target(PlannerInfo *root,
 	 */
 	if (!parse->hasAggs && !parse->groupClause && !parse->groupingSets &&
 		!root->hasHavingQual && !parse->hasWindowFuncs)
+	{
+		if (parse->sortClause && limit_needed(parse)) {
+			ListCell   *tl;
+			bool contains_non_vars = false;
+			fore

Re: [HACKERS] Getting sorted data from foreign server for merge join

2016-03-08 Thread Robert Haas
On Thu, Jan 7, 2016 at 4:05 AM, Ashutosh Bapat
 wrote:
> In get_useful_ecs_for_relation(), while checking whether to use left or
> right argument of a mergejoinable operator, the arguments to bms_is_subset()
> are passed in reverse order. bms_is_subset() checks whether the first
> argument in subset of the second, but in this function the subset to be
> checked is passed as the second argument. Because of this following query
> when run in contrib_regression database after "make installcheck" in
> contrib/postgres_fdw trips assertion Assert(bms_is_subset(relids,
> restrictinfo->left_ec->ec_relids));
>
> EXPLAIN (COSTS false, VERBOSE)
> SELECT t1."C 1" FROM "S 1"."T 1" t1 left join ft1 t2 join ft2 t3 on
> (t2.c1 = t3.c1) on (t3.c1 = t1."C 1") OFFSET 100 LIMIT 10;
>
> PFA patch to fix it.

The test case failed for me, possibly because of Tom's upper planner
pathification, but the substantive part of the fix looks right to me,
so committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Pushing down sorted joins

2016-03-08 Thread Robert Haas
On Tue, Feb 23, 2016 at 7:48 AM, Ashutosh Bapat
 wrote:
> Rushabh pointed out that declarations of helper functions
> get_useful_ecs_for_relation and get_useful_pathkeys_for_relation() are part
> of FDW routines declarations rather than helper function declaration. Since
> those functions are related to this patch, the attached patch moves those
> declaration in their right place.

This patch needs to be rebased.

+   /*
+    * TODO: we should worry about EPQ path but should that path have
+    * pathkeys? I guess, that's not really important since it's just
+    * going to evaluate the join from whole-row references stuffed in
+    * the corresponding EPQ slots, for which the order doesn't matter.
+    */

The pathkeys for the EPQ path don't matter.  It'll only be called to
recheck one single row, and there's only one order in which you can
return one row.

-   if (bms_equal(em->em_relids, rel->relids))
+   if (bms_is_subset(em->em_relids, rel->relids))

Why do we need this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Minor documentation tweak to GetForeignPlan documentation

2016-03-08 Thread Robert Haas
On Fri, Jan 15, 2016 at 6:20 AM, Etsuro Fujita
 wrote:
> Attached patch makes minor modification to the GetForeignPlan
> documentation.  This adds the description about outer_plan, the new
> parameter added in 9.5.

Good catch.  Committed and back-patched to 9.5.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Allowing to run a buildfarm animal under valgrind

2016-03-08 Thread Andres Freund
On 2016-03-08 08:58:22 -0500, Andrew Dunstan wrote:
> On 03/07/2016 08:39 PM, Andres Freund wrote:
> >Does anybody have a better idea about how to do this?
> 
> Why not just create a make target which does this? It could be run after
> 'make' and before 'make check'. I would make it assume valgrind was
> installed and in the path rather than using vg-in-place.

I don't really see how that'd work. make check et al. start postgres via
pg_ctl, so we need to invoke valgrind from there. Additionally I don't
want to just be able to run make check via valgrind, but all contrib
modules et al too, including eventually the replication regression tests
and such.

> If that doesn't work and you want to do something with the buildfarm,
> probably a buildfarm module would be the way to go. We might need to add a
> new module hook to support it, to run right after make(), but that would be
> a one line addition to run_build.pl.

Yea, I think that's what it probably has to be. What I'm decidedly
unhappy with right now is that this seems to require moving make install
up, or manually adding a new file for installation. The reason for that is
that if we replace the postgres binary with the wrapper, the original
file obviously doesn't get installed anymore. So it's invoked at its
original location, which only works if the share files are installed in the
correct location.


I do wonder if adding a PGCTLPOSTGRESWRAPPER or something to pg_ctl
would be ok. That'd just supplant calling the postgres binary, making
all this easier.

Andres




Re: [HACKERS] Minor comment update in setrefs.c

2016-03-08 Thread Robert Haas
On Fri, Jan 15, 2016 at 5:36 AM, Etsuro Fujita
 wrote:
> The point in the previous patch was to update the list of expressions to be
> adjusted for the case of scanrelid=0 like that for the case of scanrelid>0
> case in set_foreignscan_references.  So, I'd like to propose to add
> *fdw_recheck_quals* to both lists, then.  Updated patch attached.

OK, sure.  Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] More stable query plans via more predictable column statistics

2016-03-08 Thread Robert Haas
On Wed, Jan 20, 2016 at 5:09 PM, Tom Lane  wrote:
> Alvaro Herrera  writes:
>> Tom Lane wrote:
>>> "Shulgin, Oleksandr"  writes:
 This post summarizes a few weeks of research of ANALYZE statistics
 distribution on one of our bigger production databases with some real-world
 data and proposes a patch to rectify some of the oddities observed.
>
>>> Please add this to the 2016-01 commitfest ...
>
>> Tom, are you reviewing this for the current commitfest?
>
> Um, I would like to review it, but I doubt I'll find time before the end
> of the month.

Tom, can you pick this up?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Relation extension scalability

2016-03-08 Thread Robert Haas
On Tue, Mar 8, 2016 at 11:20 AM, Dilip Kumar  wrote:
> I have come up with this patch..
>
> If this approach looks fine then I will prepare final patch (more comments,
> indentation, and improve some code) and do some long run testing (current
> results are 5 mins run).
>
> Idea is same what Robert and Amit suggested up thread.

So this seems promising, but I think the code needs some work.

LockWaiterCount() bravely accesses a shared memory data structure that
is mutable with no locking at all.  That might actually be safe for
our purposes, but I think it would be better to take the partition
lock in shared mode if it doesn't cost too much performance.  If
that's too expensive, then it should at least have a comment
explaining (1) that it is doing this without the lock and (2) why
that's safe (sketch: the LOCK can't go away because we hold it, and
nRequested could change but we don't mind a stale value, and a 32-bit
read won't be torn).
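
For reference, the shared-lock variant being suggested could look roughly
like this (a sketch only, assuming the function lives in lock.c next to the
lock manager's hash table; the patch's actual signature may differ):

int
LockWaiterCount(const LOCKTAG *locktag)
{
	uint32		hashcode = LockTagHashCode(locktag);
	LWLock	   *partitionLock = LockHashPartitionLock(hashcode);
	LOCK	   *lock;
	int			waiters = 0;

	/* shared mode is enough here: we only read nRequested */
	LWLockAcquire(partitionLock, LW_SHARED);
	lock = (LOCK *) hash_search_with_hash_value(LockMethodLockHash,
												(const void *) locktag,
												hashcode, HASH_FIND, NULL);
	if (lock != NULL)
		waiters = lock->nRequested;
	LWLockRelease(partitionLock);

	return waiters;
}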

A few of the other functions in here also lack comments, and perhaps
should have them.

The changes to RelationGetBufferForTuple need a visit from the
refactoring police.  Look at the way you are calling
RelationAddOneBlock.  The logic looks about like this:

if (needLock)
{
    if (trylock relation for extension)
        RelationAddOneBlock();
    else
    {
        lock relation for extension;
        if (use fsm)
        {
            complicated;
        }
        else
            RelationAddOneBlock();
    }
}
else
    RelationAddOneBlock();

So basically you want to do the RelationAddOneBlock() thing if
!needLock || !use_fsm || can't trylock.  See if you can rearrange the
code so that there's only one fallthrough call to
RelationAddOneBlock() instead of three separate ones.

Also, consider moving the code that adds multiple blocks at a time to
its own function instead of including it all in line.
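
Concretely, the shape being asked for is roughly the following fragment
(illustrative only: RelationAddOneBlock() is the patch's helper,
RelationAddExtraBlocks() stands in for the factored-out multi-block code,
and buffer/otherBuffer/bistate are the surrounding
RelationGetBufferForTuple() locals):

	bool		waited_for_lock = false;

	if (needLock &&
		!ConditionalLockRelationForExtension(relation, ExclusiveLock))
	{
		LockRelationForExtension(relation, ExclusiveLock);
		waited_for_lock = true;
	}

	if (waited_for_lock && use_fsm)
	{
		/* contention: add several blocks / retry the FSM ("complicated") */
		buffer = RelationAddExtraBlocks(relation, bistate);
	}
	else
	{
		/* every other case falls through to this single call */
		buffer = RelationAddOneBlock(relation, otherBuffer, bistate);
	}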

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] More stable query plans via more predictable column statistics

2016-03-08 Thread Joel Jacobson
On Wed, Mar 9, 2016 at 1:25 AM, Shulgin, Oleksandr
 wrote:
> Thank you for spending your time to run these :-)

n/p, it took like 30 seconds :-)

> I don't want to be asking for too much here, but is there a chance you could
> try the effects of the proposed patch on an offline copy of your database?

Yes, I think that should be possible.

> Do you envision or maybe have experienced problems with query plans
> referring to the columns that are near the top of the above hist_ratio
> report?  In other words: what are the practical implications for you with
> the values being duplicated rather badly throughout the histogram like in
> the example you shown?

I don't know much about the internals of query planner,
I just read the "57.1. Row Estimation Examples" to get a basic understanding.

If I understand it correctly, if the histogram_bounds contains a lot
of duplicated values,
then the row estimation will be inaccurate, which in turn will trick
the query planner
into a sub-optimal plan?

We've had some problems lately with the query planner (or actually we've always
had them but never noticed or cared about them), but now during peak times
we've had short periods where we haven't been able to fully cope
with the traffic.

I tracked down the most self_time-consuming functions and quickly saw
how to optimize them.
Many of them were of the form:
SELECT .. FROM SomeBigTable WHERE Col1 = [some dynamic value] AND Col2
= [some constant value] AND Col3 = [some other constant value]
The number of rows matching the WHERE clause were very tiny, perfect
match for a partial index:
CREATE INDEX .. ON SomeBigTable USING btree (Col1) WHERE Col2 = [some
constant value] AND Col3 = [some other constant value];

Even though the new partial index matched the query perfectly, the
query planner didn't want to use it. Instead it continued to use some
other sub-optimal index.

The only way to force it to use the correct index was to use the
"+0"-trick which I recently learned from one of my colleagues:
SELECT .. FROM SomeBigTable WHERE Col1 = [some dynamic value] AND
Col2+0 = [some constant value] AND Col3+0 = [some other constant
value]
CREATE INDEX .. ON SomeBigTable USING btree (Col1) WHERE Col2+0 =
[some constant value] AND Col3+0 = [some other constant value];

By adding +0 to the columns, the query planner will as I understand it
be extremely motivated to use the correct index, as otherwise it would
have to do a seq scan on the entire big table, which would be very
costly.

I'm glad the trick worked, now the system is fast again.

We're still on 9.1, so maybe these problems will go away once we upgrade to 9.5.

I don't know if these problems I described can be fixed by your patch,
but I wanted to share this story since I know our systems (Trustly's
and Zalando's) are quite similar in design,
so maybe you have experienced something similar.

(Side note: My biggest wish would be some way to specify explicitly on
a per top-level function level a list of indexes the query planner is
allowed to consider or is NOT allowed to consider.)




Re: [HACKERS] More stable query plans via more predictable column statistics

2016-03-08 Thread Alvaro Herrera
Shulgin, Oleksandr wrote:

> Alright.  I'm attaching the latest version of this patch split in two
> parts: the first one is NULLs-related bugfix and the second is the
> "improvement" part, which applies on top of the first one.

I went over patch 0001 and it seems pretty reasonable.  It's missing
some comment updates -- at least the large comments that talk about Duj1
should be modified to indicate why the code is now subtracting the null
count.  Also, I can't quite figure out why the "else" now in line 2131
is now "else if track_cnt != 0".  What happens if track_cnt is zero?
The comment above the "if" block doesn't provide any guidance.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Proposal: RETURNING primary_key()

2016-03-08 Thread Robert Haas
On Mon, Mar 7, 2016 at 11:18 PM, Tom Lane  wrote:
> Craig Ringer  writes:
>> On 8 March 2016 at 08:56, Igal @ Lucee.org  wrote:
>>> I'm not sure why it was not accepted at the end?
>
>> The biggest issue, though it might not be clear from that thread, is that
>> what exactly it means to "return generated keys" is poorly defined by JDBC,
>> and not necessarily the same thing as "return the PRIMARY KEY".
>>
>> Should we return the DEFAULT on a UNIQUE column, for example?
>>
>> IMO other vendors' drivers should be tested for behaviour in a variety of
>> cases.
>
> Yeah.  It was asserted in the earlier thread that other vendors implement
> this feature as "return the pkey", but that seems to conflict with the
> plain language of the JDBC spec: generated columns are an entirely
> different thing than primary key columns.  So really what I'd like to see
> is some work on surveying other implementations to confirm exactly what
> behavior they implement.  If we're to go against what the spec seems to
> say, I want to see a whole lot of evidence that other people do it
> consistently in a different way.

I agree that some research should be done on how this works in other
systems, but I think we have a general problem with the server lacking
certain capabilities that make it easy to implement a high-quality
JDBC driver.  And I think it would be good to work on figuring out how
to fix that.  I feel that some of the replies on this thread were
rather hostile considering that the goal -- good connectors for the
database server -- is extremely important.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Proposal: Generic WAL logical messages

2016-03-08 Thread Artur Zakirov

I think here


+const char *
+logicalmsg_identify(uint8 info)
+{
+   if (info & ~XLR_INFO_MASK == XLOG_LOGICAL_MESSAGE)
+   return "MESSAGE";
+
+   return NULL;
+}


we should use brackets

const char *
logicalmsg_identify(uint8 info)
{
if ((info & ~XLR_INFO_MASK) == XLOG_LOGICAL_MESSAGE)
return "MESSAGE";

return NULL;
}

Because of operator precedence
(http://en.cppreference.com/w/c/language/operator_precedence) we may get
errors.
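
A tiny standalone illustration of the difference (the constant values are
chosen only for demonstration):

#include <stdio.h>

#define XLR_INFO_MASK        0x0F
#define XLOG_LOGICAL_MESSAGE 0x00

int
main(void)
{
	unsigned char info = XLOG_LOGICAL_MESSAGE;

	/* == binds tighter than &, so this is info & (~MASK == MSG), i.e. 0 */
	printf("%d\n", (info & ~XLR_INFO_MASK == XLOG_LOGICAL_MESSAGE) ? 1 : 0);

	/* with the brackets we compare the masked info bits, as intended */
	printf("%d\n", ((info & ~XLR_INFO_MASK) == XLOG_LOGICAL_MESSAGE) ? 1 : 0);

	return 0;
}

For a record whose info bits really are XLOG_LOGICAL_MESSAGE, the first line
prints 0 and the second prints 1, i.e. the unbracketed test gives the wrong
answer.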


On 01.03.2016 00:10, Petr Jelinek wrote:

Hi,

attached is the newest version of the patch.

I removed the registry, renamed the 'send' to 'emit', documented the
callback parameters properly. I also added the test to ddl.sql for the
serialization and deserialization (and of course found a bug there) and
in general fixed all the stuff Andres reported.

(see more inline)


--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company




Re: [HACKERS] extend pgbench expressions with functions

2016-03-08 Thread Fabien COELHO


Hello Robert.

Here is a v34 b & c.


// comments are not allowed.  I'd just remove the two you have.


Back to the eighties!


It makes no sense to exit(1) and then return 0, so don't do that.  I
might write this code as:
This would get rid of the internal-error case here altogether in favor
of testing it via an assertion.


I've put assertions instead of exit in some places.


I think that coerceToInt() should not exit(1) when an overflow occurs;


I think that it should, because the only sane option for the user is to 
fix the script and relaunch the bench: counting errors has no added value 
for the user.


The attached version does some error handling instead, too bad.


Now, if rval is out of range of an integer, that is going to overflow
while trying to see whether it should divide by zero.


I could not find a place where there was such a potential issue. If the 
value is zero, it cannot overflow when cast to int. If it is not zero and 
it overflows, then it is a genuine overflow, so it should be reported as one. 
Maybe I misunderstood your point.


--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index cc80b3f..2133bf7 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -794,9 +794,10 @@ pgbench  options  dbname
 
 
  
-  Sets variable varname to an integer value calculated
+  Sets variable varname to a value calculated
   from expression.
   The expression may contain integer constants such as 5432,
+  double constants such as 3.14159,
   references to variables :variablename,
   and expressions composed of unary (-) or binary operators
   (+, -, *, /,
@@ -809,7 +810,7 @@ pgbench  options  dbname
   Examples:
 
 \set ntellers 10 * :scale
-\set aid (1021 * :aid) % (10 * :scale) + 1
+\set aid (1021 * random(1, 10 * :scale)) % (10 * :scale) + 1
 
 

@@ -829,66 +830,35 @@ pgbench  options  dbname
  
 
  
-  By default, or when uniform is specified, all values in the
-  range are drawn with equal probability.  Specifying gaussian
-  or  exponential options modifies this behavior; each
-  requires a mandatory parameter which determines the precise shape of the
-  distribution.
- 
+  
+   
+
+ \setrandom n 1 10 or \setrandom n 1 10 uniform
+ is equivalent to \set n random(1, 10) and uses a uniform
+ distribution.
+
+   
 
- 
-  For a Gaussian distribution, the interval is mapped onto a standard
-  normal distribution (the classical bell-shaped Gaussian curve) truncated
-  at -parameter on the left and +parameter
-  on the right.
-  Values in the middle of the interval are more likely to be drawn.
-  To be precise, if PHI(x) is the cumulative distribution
-  function of the standard normal distribution, with mean mu
-  defined as (max + min) / 2.0, with
-
- f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
-(2.0 * PHI(parameter) - 1.0)
-
-  then value i between min and
-  max inclusive is drawn with probability:
-  f(i + 0.5) - f(i - 0.5).
-  Intuitively, the larger parameter, the more
-  frequently values close to the middle of the interval are drawn, and the
-  less frequently values close to the min and
-  max bounds. About 67% of values are drawn from the
-  middle 1.0 / parameter, that is a relative
-  0.5 / parameter around the mean, and 95% in the middle
-  2.0 / parameter, that is a relative
-  1.0 / parameter around the mean; for instance, if
-  parameter is 4.0, 67% of values are drawn from the
-  middle quarter (1.0 / 4.0) of the interval (i.e. from
-  3.0 / 8.0 to 5.0 / 8.0) and 95% from
-  the middle half (2.0 / 4.0) of the interval (second and
-  third quartiles). The minimum parameter is 2.0 for
-  performance of the Box-Muller transform.
- 
+  
+   
+\setrandom n 1 10 exponential 3.0 is equivalent to
+\set n random_exponential(1, 10, 3.0) and uses an
+exponential distribution.
+   
+  
 
- 
-  For an exponential distribution, parameter
-  controls the distribution by truncating a quickly-decreasing
-  exponential distribution at parameter, and then
-  projecting onto integers between the bounds.
-  To be precise, with
-
-f(x) = exp(-parameter * (x - min) / (max - min + 1)) / (1.0 - exp(-parameter))
-
-  Then value i between min and
-  max inclusive is drawn with probability:
-  f(x) - f(x + 1).
-  Intuitively, the larger parameter, the more
-  frequently values close to min are accessed, and the
-  less frequently values close to max are accessed.
-  The closer to 0 parameter, the flatter (more uniform)
-  the access distribution.
-  A crude approximation of the distribution is that the most frequent 1%
-  values in the range, close to min, are drawn
-  parameter% o

Re: [HACKERS] empty array case in plperl_ref_from_pg_array not handled correctly

2016-03-08 Thread Oleksii Kliukin

> On 08 Mar 2016, at 10:11, Alex Hunsaker  wrote:
> 
> 
> 
> On Mon, Mar 7, 2016 at 11:32 PM, Andres Freund  wrote:
> Hi,
> 
> Per the new valgrind animal we get:
> 
> http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=skink&dt=2016-03-08%2004%3A22%3A00
> 2016-03-08 05:56:05.566 UTC [56de6971.723:5] LOG:  statement: select 
> plperl_sum_array('{}');
> ==1827== Invalid write of size 4
> ==1827==at 0x14E35DD1: plperl_ref_from_pg_array (plperl.c:1459)
> 
> 
> [ I think you may have meant to CC me not Alexey K. I'm probably the person 
> responsible :D. ]
> 
> Indeed, I think the simplest fix is to make plperl_ref_from_pg_array() return 
> an "empty" array in that case. The attached fixes the valgrind warning for 
> me. (cassert enabled build on master).

Looks good to me, thank you. Judging from the size in the error message it’s 
likely the

info->nelems[0] = items

line that caused this issue. The patch fixes it at first glance, although I 
have yet to get my valgrind setup on OS X working to check this for real :-)

Kind regards,
--
Oleksii



Re: [HACKERS] Allowing to run a buildfarm animal under valgrind

2016-03-08 Thread Alvaro Herrera
Andres Freund wrote:

> I do wonder if adding a PGCTLPOSTGRESWRAPPER or something to pg_ctl
> would be ok. That'd just supplant calling the postgres binary, making
> all this easier.

This seems a reasonably principled way to go about this.  Eventually we
might plug other things in it ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Allowing to run a buildfarm animal under valgrind

2016-03-08 Thread Andres Freund
On 2016-03-08 18:24:23 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> 
> > I do wonder if adding a PGCTLPOSTGRESWRAPPER or something to pg_ctl
> > would be ok. That'd just supplant calling the postgres binary, making
> > all this easier.
> 
> This seems a reasonably principled way to go about this.  Eventually we
> might plug other things in it ...

It's not that easy to write such a wrapper though - you *have* to exec
the final binary, because -w assumes that the child pid is going to
appear in postmaster.pid...
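
To illustrate the constraint, a wrapper along these lines only works because
it exec()s (and valgrind runs its target in the same process), so the pid
that pg_ctl started is the one that ends up in postmaster.pid.  The paths,
valgrind options and binary name below are made up for the example:

#include <stdio.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
	char	   *args[64];
	int			n = 0;
	int			i;

	args[n++] = "valgrind";
	args[n++] = "--quiet";
	args[n++] = "--leak-check=no";
	args[n++] = "/usr/local/pgsql/bin/postgres.orig";	/* the real binary */
	for (i = 1; i < argc && n < 63; i++)
		args[n++] = argv[i];
	args[n] = NULL;

	execvp("valgrind", args);	/* replaces this process; pid is preserved */
	perror("could not exec valgrind");
	return 1;
}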

To be actually useful we kinda would have to backpatch this to 9.4
(where the valgrind hardening stuff started)...

Andres




Re: [HACKERS] More stable query plans via more predictable column statistics

2016-03-08 Thread Tom Lane
Robert Haas  writes:
> On Wed, Jan 20, 2016 at 5:09 PM, Tom Lane  wrote:
>> Um, I would like to review it, but I doubt I'll find time before the end
>> of the month.

> Tom, can you pick this up?

Yes, now that I've gotten out from under the pathification thing,
I have cycles for patch review.  I'll take this one.

regards, tom lane




Re: [HACKERS] Parallel Aggregate

2016-03-08 Thread David Rowley
On 9 March 2016 at 04:06, Robert Haas  wrote:
> On Mon, Mar 7, 2016 at 5:15 PM, David Rowley
>  wrote:
>> My concerns are:
>> 1. Since there's no cheapest_partial_path in RelOptInfo the code is
>> currently considering every partial_path for parallel hash aggregate.
>> With normal aggregation we only ever use the cheapest path, so this
>> may not be future proof. As of today we do only have at most one
>> partial path in the list, but there's no reason to code this with that
>> assumption. I didn't put in much effort to improve this as I see code
>> in generate_gather_paths() which also makes assumptions about there
>> just being 1 partial path. Perhaps we should expand RelOptInfo to
>> track the cheapest partial path? or maybe allpaths.c should have a
>> function to fetch the cheapest out of the list?
>
> The first one in the list will be the cheapest; why not just look at
> that?  Sorted partial paths are interesting if some subsequent path
> construction step can make use of that sort ordering, but they're
> never interesting from the point of view of matching the query's
> pathkeys because of the fact that Gather is order-destroying.

In this case a sorted partial path is useful as the partial agg node
sits below Gather. The sorted input is very interesting for the
partial agg node with a strategy of AGG_SORTED. In most cases with
parallel aggregate it's the partial stage that will take the most
time, so if we do get pre-sorted partial paths, this will be very good
indeed for parallel agg.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Proposal: RETURNING primary_key()

2016-03-08 Thread Igal @ Lucee.org

On 3/8/2016 12:12 PM, Robert Haas wrote:

I agree that some research should be done on how this works in other
systems, but I think we have a general problem with the server lacking
certain capabilities that make it easy to implement a high-quality
JDBC driver.  And I think it would be good to work on figuring out how
to fix that.
I will try to gather more information about the other DBMSs and drivers 
and will post my findings here when I have them.


Best,


Igal




Re: [HACKERS] Idle In Transaction Session Timeout, revived

2016-03-08 Thread Robert Haas
On Sun, Jan 31, 2016 at 8:33 AM, Vik Fearing  wrote:
> Attached is a rebased and revised version of my
> idle_in_transaction_session_timeout patch from last year.
>
> This version does not suffer the problems the old one did where it would
> jump out of SSL code thanks to Andres' patch in commit
> 4f85fde8eb860f263384fffdca660e16e77c7f76.
>
> The basic idea is if a session remains idle in a transaction for longer
> than the configured time, that connection will be dropped thus releasing
> the connection slot and any locks that may have been held by the broken
> client.
>
> Added to the March commitfest.

I see this patch has been marked Ready for Committer despite the lack
of any really substantive review.  Generally, I think it looks good.
But I have a couple of questions/comments:

- I really wonder if the decision to ignore sessions that are idle in
transaction (aborted) should be revisited.  Consider this:

rhaas=# begin;
BEGIN
rhaas=# lock table pg_class;
LOCK TABLE
rhaas=# savepoint a;
SAVEPOINT
rhaas=# select 1/0;
ERROR:  division by zero

At this point the session shows as "idle in transaction (aborted)", yet it
still holds the AccessExclusiveLock on pg_class taken before the savepoint.

- I wonder if the documentation should mention potential advantages
related to holding back xmin.

- What's the right order of events in PostgresMain?  Right now the
patch disables the timeout after checking for interrupts and clearing
DoingCommandRead, but maybe it would be better to disable the timeout
first, so that we can't have it happen that start to execute the
command and then, in medias res, bomb out because of the idle timeout.
Then again, maybe you had some compelling reason for doing it this
way, in which case we should document that in the comments.

- It would be nice if you reviewed someone else's patch in turn.

I'm attaching a lightly-edited version of your patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index a09ceb2..cdd5d77 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5976,6 +5976,30 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
   
  
 
+ 
+  idle_in_transaction_session_timeout (integer)
+  
+   idle_in_transaction_session_timeout configuration parameter
+  
+  
+  
+   
+   Terminate any session with an open transaction that has been idle for
+   longer than the specified duration in milliseconds. This allows any
+   locks to be released and the connection slot to be reused.
+   
+   
+   Sessions in the state "idle in transaction (aborted)" occupy a
+   connection slot, but because they hold no locks, they are not considered
+   by this parameter.
+   
+   
+   The default value of 0 means that sessions which are idle in transaction
+   will not be terminated.
+   
+  
+ 
+
  
   vacuum_freeze_table_age (integer)
   
diff --git a/doc/src/sgml/mvcc.sgml b/doc/src/sgml/mvcc.sgml
index dd3c775..6352e12 100644
--- a/doc/src/sgml/mvcc.sgml
+++ b/doc/src/sgml/mvcc.sgml
@@ -725,7 +725,9 @@ ERROR:  could not serialize access due to read/write dependencies among transact
  
   
Don't leave connections dangling idle in transaction
-   longer than necessary.
+   longer than necessary.  The configuration parameter
+may be used to
+   automatically disconnect lingering sessions.
   
  
  
diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 6453b88..109d090 100644
--- a/src/backend/storage/lmgr/proc.c
+++ b/src/backend/storage/lmgr/proc.c
@@ -57,6 +57,7 @@
 int			DeadlockTimeout = 1000;
 int			StatementTimeout = 0;
 int			LockTimeout = 0;
+int			IdleInTransactionSessionTimeout = 0;
 bool		log_lock_waits = false;
 
 /* Pointer to this process's PGPROC and PGXACT structs, if any */
diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index 115166b..8a75dd2 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -2978,6 +2978,11 @@ ProcessInterrupts(void)
 		}
 	}
 
+	if (IdleInTransactionTimeoutSessionPending)
+		ereport(FATAL,
+(errcode(ERRCODE_IDLE_IN_TRANSACTION_SESSION_TIMEOUT),
+ errmsg("terminating connection due to idle-in-transaction timeout")));
+
 	if (ParallelMessagePending)
 		HandleParallelMessages();
 }
@@ -3947,6 +3952,11 @@ PostgresMain(int argc, char *argv[],
 			{
 set_ps_display("idle in transaction", false);
 pgstat_report_activity(STATE_IDLEINTRANSACTION, NULL);
+
+/* Start the idle-in-transaction timer */
+if (IdleInTransactionSessionTimeout > 0)
+	enable_timeout_after(IDLE_IN_TRANSACTION_SESSION_TIMEOUT,
+		 IdleInTransactionSessionTimeout);
 			}
 			else
 			{
@@ -3987,7 +3997,13 @@ PostgresMain(int argc, char *argv[],
 		DoingCommandRead = false;
 
 		/*
-		 * (5) check for any other interesting events that happened while we
+		 * (5) turn off the idle-in-transaction timeout
+		 */
+		if (IdleInTransaction

Re: [HACKERS] GCC 6 warning fixes

2016-03-08 Thread Robert Haas
On Mon, Feb 29, 2016 at 4:50 PM, Thomas Munro
 wrote:
> On Sat, Feb 20, 2016 at 5:14 PM, Peter Eisentraut  wrote:
>> Here are three patches to fix new warnings in GCC 6.
>>
>> 0001 is apparently a typo.
>
> Right, looks like it.  Builds and tests OK with this change (though I
> didn't get any warning from GCC6.0.0 -Wall for this one).
>
>> 0002 was just (my?) stupid code to begin with.
>
> Right, it makes sense to define QL_HELP in just one translation unit
> with external linkage.  Builds and works fine.  I got the 'defined but
> not used' warning from GCC6 and it went away with this patch.
>
>> 0003 is more of a workaround.  There could be other ways address this, too.
>
> This way seems fine to me (you probably want the function to continue
> to exist rather than, say, becoming a macro evaluating to false on
> non-WIN32, if this gets backpatched).  I got this warning from GCC6
> and it went away with this patch.

Peter, are you going to commit this?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] empty array case in plperl_ref_from_pg_array not handled correctly

2016-03-08 Thread Andres Freund
On 2016-03-08 02:11:03 -0700, Alex Hunsaker wrote:
> On Mon, Mar 7, 2016 at 11:32 PM, Andres Freund  wrote:
> 
> > Hi,
> >
> > Per the new valgrind animal we get:
> >
> >
> > http://pgbuildfarm.org/cgi-bin/show_log.pl?nm=skink&dt=2016-03-08%2004%3A22%3A00
> > 2016-03-08 05:56:05.566 UTC [56de6971.723:5] LOG:  statement: select
> > plperl_sum_array('{}');
> > ==1827== Invalid write of size 4
> > ==1827==at 0x14E35DD1: plperl_ref_from_pg_array (plperl.c:1459)
> >
> >

> [ I think you may have meant to CC me not Alexey K. I'm probably the person
> responsible :D. ]

I just took the first person mentioned in the commit message
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=87bb2ade2ce646083f39d5ab3e3307490211ad04
sorry for leaving you out ;)

> Recursive calls to split_array() should be fine as nested 0 dim arrays get
> collapsed down to single 0 dim array (I hand tested it nonetheless):
>  # select ARRAY[ARRAY[ARRAY[]]]::int[];
>  array
> ───
>  {}
> (1 row)
> 
> Looking around, everything that makes an array seems to pass through
> plperl_ref_from_pg_array(), so I think once that is fixed all is good. I
> also looked for dimensions and ARR_NDIM() just to make sure (didn't find
> anything fishy).

Thanks for looking.

backpatched all the way.

Greetings,

Andres Freund




Re: [HACKERS] pgbench small bug fix

2016-03-08 Thread Robert Haas
On Wed, Jan 27, 2016 at 2:31 PM, Fabien COELHO  wrote:
>  - when a duration (-T) is specified, ensure that pgbench ends at that
>time (i.e. do not wait for a transaction beyond the end of the run).

Every other place where doCustom() returns false is implemented as
return clientDone(...).  I think this should probably do the same.

I also think that we should probably store the end time someplace
instead of recomputing it in multiple places (this patch adds two such
places).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] extend pgbench expressions with functions

2016-03-08 Thread Robert Haas
On Tue, Mar 8, 2016 at 3:52 PM, Fabien COELHO  wrote:
> [ new patch and assorted color commentary ]

This is not acceptable:

+   /* guess double type (n for "inf",
"-inf" and "nan") */
+   if (strchr(var, '.') != NULL ||
strchr(var, 'n') != NULL)
+   {
+   double dv;
+   sscanf(var, "%lf", &dv);
+   setDoubleValue(retval, dv);
+   }
+   else
+   setIntValue(retval, strtoint64(var));

That totally breaks the error handling which someone carefully established here.

+   PgBenchValue *varg = & vargs[0];

Extra space.

+   if (!coerceToDouble(st, & vargs[0], &dval))
+   return false;

Another extra space.

-   int nargs = 0;
-   int64   iargs[MAX_FARGS];
-   PgBenchExprLink *l = args;
+   int nargs = 0;
+   PgBenchValuevargs[MAX_FARGS];
+   PgBenchExprLink*l = args;

Completely unnecessary reindentation of the first and third lines.

+   setIntValue(retval, i < 0? -i: i);

Not project style.

Please fix the whitespace damage git diff --check complains about, and
check for other places where you haven't followed project style.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-08 Thread Robert Haas
On Mon, Mar 7, 2016 at 10:18 PM, Andres Freund  wrote:
>> Instead of "durable" I think that "persistent" makes more sense.
>
> I find durable a lot more descriptive. persistent could refer to
> retrying the rename or something.

Yeah, I like durable, too.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] SP-GiST support for inet datatypes

2016-03-08 Thread Oleg Bartunov
On Thu, Mar 3, 2016 at 11:45 AM, Emre Hasegeli  wrote:

> > Emre, I checked original thread and didn't find sample data. Could you
> provide them for testing ?
>
> I found it on the Git history:
>
>
> https://github.com/job/irrexplorer/blob/9e8b5330d7ef0022abbe1af18291257e044eb24b/data/irrexplorer_dump.sql.gz?raw=true
>

Thanks !

the spgist index builds 2 times faster than gist, but the index size is
noticeably bigger

\di+ route_*
List of relations
 Schema | Name | Type  |  Owner   | Table  |  Size  | Description
+--+---+--+++-
 public | route_gist   | index | postgres | routes | 96 MB  |
 public | route_spgist | index | postgres | routes | 132 MB |
(2 rows)

Spgist index tree is much better than gist - 12149 pages vs 1334760!



EXPLAIN (ANALYZE, buffers) SELECT routes.route FROM routes JOIN hmm ON
routes.route && hmm.route;
   QUERY PLAN

 Nested Loop  (cost=0.41..570430.27 rows=2338 width=7) (actual
time=5.730..12085.747 rows=8127 loops=1)
   Buffers: shared hit=1334760
   ->  Seq Scan on hmm  (cost=0.00..11.32 rows=732 width=7) (actual
time=0.013..0.528 rows=732 loops=1)
 Buffers: shared hit=4
   ->  Index Only Scan using route_gist on routes  (cost=0.41..550.26
rows=22900 width=7) (actual time=2.491..16.503 rows=11 loops=732)
 Index Cond: (route && (hmm.route)::inet)
 Heap Fetches: 8127
 Buffers: shared hit=1334756
 Planning time: 0.827 ms
 Execution time: 12086.513 ms
(10 rows)

EXPLAIN (ANALYZE, buffers) SELECT routes.route FROM routes JOIN hmm ON
routes.route && hmm.route;
   QUERY PLAN
-
 Nested Loop  (cost=0.41..588634.27 rows=2338 width=7) (actual
time=0.043..12.150 rows=8127 loops=1)
   Buffers: shared hit=12149
   ->  Seq Scan on hmm  (cost=0.00..11.32 rows=732 width=7) (actual
time=0.013..0.075 rows=732 loops=1)
 Buffers: shared hit=4
   ->  Index Only Scan using route_spgist on routes  (cost=0.41..575.13
rows=22900 width=7) (actual time=0.011..0.015 rows=11 loops=732)
 Index Cond: (route && (hmm.route)::inet)
 Heap Fetches: 8127
 Buffers: shared hit=12145
 Planning time: 0.779 ms
 Execution time: 12.603 ms
(10 rows)


Re: [HACKERS] Add generate_series(date,date) and generate_series(date,date,integer)

2016-03-08 Thread Robert Haas
On Fri, Mar 4, 2016 at 3:17 PM, Corey Huinker  wrote:
>>
>> I feel rather uneasy about simply removing the 'infinity' checks. Is there
>> a way to differentiate those two cases, i.e. when the generate_series is
>> called in target list and in the FROM part? If yes, we could do the check
>> only in the FROM part, which is the case that does not work (and consumes
>> arbitrary amounts of memory).
>
> It would be simple enough to remove the infinity test on the "stop" and
> leave it on the "start". Or yank both. Just waiting for others to agree
> which checks should remain.

Let's yank 'em.  This is a minor issue which is distracting us from
the main point of this patch, and I don't think it's worth getting
distracted.

+ 
+  generate_series(start,
stop, step
integer)
+  date
+  setof date
+  
+   Generate a series of values, from start
to stop
+   with a step size of step

I think this should be followed by the word "days" and a period.

+   else
+   /* do when there is no more left */
+   SRF_RETURN_DONE(funcctx);

I think we should drop the "else" and unindent the next two lines.
That's the style I have seen elsewhere.  Plus less indentation equals
more happiness.

I'm pretty meh about the whole idea of this function, though,
actually, and I don't see a single clear +1 vote for this
functionality upthread.  (Apologies if I've missed one.)  In the
absence of a few of those, I recommend we reject this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] SP-GiST support for inet datatypes

2016-03-08 Thread Oleg Bartunov
On Tue, Mar 8, 2016 at 11:17 PM, Oleg Bartunov  wrote:

>
>
> On Thu, Mar 3, 2016 at 11:45 AM, Emre Hasegeli  wrote:
>
>> > Emre, I checked original thread and didn't find sample data. Could you
>> provide them for testing ?
>>
>> I found it on the Git history:
>>
>>
>> https://github.com/job/irrexplorer/blob/9e8b5330d7ef0022abbe1af18291257e044eb24b/data/irrexplorer_dump.sql.gz?raw=true
>>
>
> Thanks !
>
> spgist index creates 2 times faster than gist, but index size is
> noticeably  bugger
>
> \di+ route_*
> List of relations
>  Schema | Name | Type  |  Owner   | Table  |  Size  | Description
> +--+---+--+++-
>  public | route_gist   | index | postgres | routes | 96 MB  |
>  public | route_spgist | index | postgres | routes | 132 MB |
> (2 rows)
>
> Spgist index tree is much better  than gist - 12149 pages vs 1334760 !
>

I also noticed that spgist is much faster than gist for other inet
operators. I'd like to see it in 9.6.



>
>
>
> EXPLAIN (ANALYZE, buffers) SELECT routes.route FROM routes JOIN hmm ON
> routes.route && hmm.route;
>QUERY PLAN
>
> 
>  Nested Loop  (cost=0.41..570430.27 rows=2338 width=7) (actual
> time=5.730..12085.747 rows=8127 loops=1)
>Buffers: shared hit=1334760
>->  Seq Scan on hmm  (cost=0.00..11.32 rows=732 width=7) (actual
> time=0.013..0.528 rows=732 loops=1)
>  Buffers: shared hit=4
>->  Index Only Scan using route_gist on routes  (cost=0.41..550.26
> rows=22900 width=7) (actual time=2.491..16.503 rows=11 loops=732)
>  Index Cond: (route && (hmm.route)::inet)
>  Heap Fetches: 8127
>  Buffers: shared hit=1334756
>  Planning time: 0.827 ms
>  Execution time: 12086.513 ms
> (10 rows)
>
> EXPLAIN (ANALYZE, buffers) SELECT routes.route FROM routes JOIN hmm ON
> routes.route && hmm.route;
>QUERY PLAN
>
> -
>  Nested Loop  (cost=0.41..588634.27 rows=2338 width=7) (actual
> time=0.043..12.150 rows=8127 loops=1)
>Buffers: shared hit=12149
>->  Seq Scan on hmm  (cost=0.00..11.32 rows=732 width=7) (actual
> time=0.013..0.075 rows=732 loops=1)
>  Buffers: shared hit=4
>->  Index Only Scan using route_spgist on routes  (cost=0.41..575.13
> rows=22900 width=7) (actual time=0.011..0.015 rows=11 loops=732)
>  Index Cond: (route && (hmm.route)::inet)
>  Heap Fetches: 8127
>  Buffers: shared hit=12145
>  Planning time: 0.779 ms
>  Execution time: 12.603 ms
> (10 rows)
>
>
>
>
>


[HACKERS] [PATCH v6] GSSAPI encryption support

2016-03-08 Thread Robbie Harwood
Hello friends,

Here's yet another version of GSSAPI encryption support.  It's also
available for viewing on my github:

https://github.com/frozencemetery/postgres/tree/feature/gssencrypt6

Let me hit the highlights of this time around:

- Fallback code is back!  It's almost unchanged from early versions of
  this patchset.  Corresponding doc changes for this and the next item
  are of course included.

- Minor protocol change.  I did not realize that connection parameters
  were not read until after auth was complete, which means that in this
  version I go back to sending the AUTH_REQ_OK in the clear.  Though I
  found this initially irritating since it required re-working the
  should_crypto conditions, it ends up being a net positive since I can
  trade a library call for a couple variables.

- Client buffer flush on completion of authentication.  This should
  prevent the issue with the client getting unexpected message type of
  NUL due to encrypted data not getting decrypted.  I continue to be
  unable to replicate this issue, but since the codepath triggers in the
  "no data buffered" case, all the math is sound.  (Famous last words I'm
  sure.)

- Code motion is its own patch.  This was requested and hopefully
  clarifies what's going on.

- Some GSSAPI authentication fixes have been applied.  I've been staring
  at this code too long now and writing this made me feel better.  If it
  should be a separate change that's fine and easy to do.

Thanks!
From 5674aa74effab4931bac1044f32dee83d915aa90 Mon Sep 17 00:00:00 2001
From: Robbie Harwood 
Date: Fri, 26 Feb 2016 16:07:05 -0500
Subject: [PATCH 1/3] Move common GSSAPI code into its own files

On both the frontend and backend, prepare for GSSAPI encryption support
by moving common code for error handling into a common file.  Other than
build-system changes, no code changes occur in this patch.
---
 configure   |  2 +
 configure.in|  1 +
 src/Makefile.global.in  |  1 +
 src/backend/libpq/Makefile  |  4 ++
 src/backend/libpq/auth.c| 63 +--
 src/backend/libpq/be-gssapi-common.c| 75 +
 src/include/libpq/be-gssapi-common.h| 26 
 src/interfaces/libpq/Makefile   |  4 ++
 src/interfaces/libpq/fe-auth.c  | 48 +
 src/interfaces/libpq/fe-gssapi-common.c | 63 +++
 src/interfaces/libpq/fe-gssapi-common.h | 21 +
 11 files changed, 199 insertions(+), 109 deletions(-)
 create mode 100644 src/backend/libpq/be-gssapi-common.c
 create mode 100644 src/include/libpq/be-gssapi-common.h
 create mode 100644 src/interfaces/libpq/fe-gssapi-common.c
 create mode 100644 src/interfaces/libpq/fe-gssapi-common.h

diff --git a/configure b/configure
index b3f3abe..a5bd629 100755
--- a/configure
+++ b/configure
@@ -713,6 +713,7 @@ with_systemd
 with_selinux
 with_openssl
 krb_srvtab
+with_gssapi
 with_python
 with_perl
 with_tcl
@@ -5491,6 +5492,7 @@ $as_echo "$with_gssapi" >&6; }
 
 
 
+
 #
 # Kerberos configuration parameters
 #
diff --git a/configure.in b/configure.in
index 0bd90d7..4fd8f05 100644
--- a/configure.in
+++ b/configure.in
@@ -636,6 +636,7 @@ PGAC_ARG_BOOL(with, gssapi, no, [build with GSSAPI support],
   krb_srvtab="FILE:\$(sysconfdir)/krb5.keytab"
 ])
 AC_MSG_RESULT([$with_gssapi])
+AC_SUBST(with_gssapi)
 
 
 AC_SUBST(krb_srvtab)
diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index e94d6a5..3dbc5c2 100644
--- a/src/Makefile.global.in
+++ b/src/Makefile.global.in
@@ -183,6 +183,7 @@ with_perl	= @with_perl@
 with_python	= @with_python@
 with_tcl	= @with_tcl@
 with_openssl	= @with_openssl@
+with_gssapi	= @with_gssapi@
 with_selinux	= @with_selinux@
 with_systemd	= @with_systemd@
 with_libxml	= @with_libxml@
diff --git a/src/backend/libpq/Makefile b/src/backend/libpq/Makefile
index 09410c4..a8cc9a0 100644
--- a/src/backend/libpq/Makefile
+++ b/src/backend/libpq/Makefile
@@ -21,4 +21,8 @@ ifeq ($(with_openssl),yes)
 OBJS += be-secure-openssl.o
 endif
 
+ifeq ($(with_gssapi),yes)
+OBJS += be-gssapi-common.o
+endif
+
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 57c2f48..73d493e 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -136,11 +136,7 @@ bool		pg_krb_caseins_users;
  *
  */
 #ifdef ENABLE_GSS
-#if defined(HAVE_GSSAPI_H)
-#include 
-#else
-#include 
-#endif
+#include "libpq/be-gssapi-common.h"
 
 static int	pg_GSS_recvauth(Port *port);
 #endif   /* ENABLE_GSS */
@@ -715,63 +711,6 @@ recv_and_check_password_packet(Port *port, char **logdetail)
  */
 #ifdef ENABLE_GSS
 
-#if defined(WIN32) && !defined(WIN32_ONLY_COMPILER)
-/*
- * MIT Kerberos GSSAPI DLL doesn't properly export the symbols for MingW
- * that contain the OIDs required. Redefine here, values copied
- 

[HACKERS] fun with "Ready for Committer" patches

2016-03-08 Thread Robert Haas
OK, so I made a pass through the "Ready for Committer" patches in the
current CF.  One I committed; for several others I replied to the thread with
review comments and set them back to "Waiting on Author". Here's where we
are with the rest:

Silent data loss with ext4 / all current versions - It looks to me
like Andres is handling this.  I set him as the committer.
pg_resetxlog reference page reorganization - Peter Eisentraut wrote
this patch.  I assume he will commit it.  If not, I'm not sure why
anyone else should.
GCC 6 warning fixes - Ditto.
TAP test enhancements - It looks to me like Alvaro is handling this.
I set him as the committer.
Unique Joins - This patch has had a lot of review and discussion.  It
would be best if Tom Lane looked at it.  If not, one of us lesser
mortals will have to have a go.
index-only scans with partial indexes - Kevin has claimed this as committer.
Fix lock contention for HASHHDR.mutex - I guess I need to go revisit
this one, unless somebody else is willing to jump in.  I wouldn't mind
a few more opinions on this patch.
plpgsql - possibility to get element or array type for referenced
variable type - No committer yet.  I don't have enough interest or
knowledge to want to handle this.
pl/pgSQL, get diagnostics and big data - No committer yet.  Seems like
a reasonable concept.  A borderline bug fix.
enhanced custom error in PLPythonu - No committer yet.  Tom Lane and
Peter Eisentraut are the usual suspects for PL/python.  Again, I have
neither the interest nor the knowledge.

It's hard to miss the fact that there are an absolutely breathtaking
number of patches in this CommitFest - 80! - that are in the "needs
review" state.  We really, really, really need more review to happen -
and there's no place for that to come from except other people who
would like their own patches reviewed in turn, other than the limited
bandwidth of committers and of course the absolutely stunning efforts
of Michael Paquier to review everything in sight.  But that's not
going to be enough: we really, really need other people to be
reviewing also.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-08 Thread Tomas Vondra
Hi,

On Mon, 2016-03-07 at 21:55 -0800, Andres Freund wrote:
> On 2016-03-08 12:26:34 +0900, Michael Paquier wrote:
> > On Tue, Mar 8, 2016 at 12:18 PM, Andres Freund  wrote:
> > > On 2016-03-08 12:01:18 +0900, Michael Paquier wrote:
> > >> I have spent a couple of hours looking at that in details, and the
> > >> patch is neat.
> > >
> > > Cool. Doing some more polishing right now. Will be back with an updated
> > > version soonish.
> > >
> > > Did you do some testing?
> > 
> > Not much in details yet, I just ran a check-world with fsync enabled
> > for the recovery tests, plus quick manual tests with a cluster
> > manually set up. I'll do more with your new version now that I know
> > there will be one.
> 
> Here's my updated version.
> 
> Note that I've split the patch into two. One for the infrastructure, and
> one for the callsites.

I've repeated the power-loss testing today. With the patches applied I'm
no longer able to reproduce the issue (despite trying about 10x), while
without them I've hit it on the first try. This is on kernel 4.4.2.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-08 Thread Andres Freund
On 2016-03-08 23:47:48 +0100, Tomas Vondra wrote:
> I've repeated the power-loss testing today. With the patches applied I'm
> not longer able to reproduce the issue (despite trying about 10x), while
> without them I've hit it on the first try. This is on kernel 4.4.2.

Yay, thanks for testing!

Andres




Re: [HACKERS] fun with "Ready for Committer" patches

2016-03-08 Thread Joe Conway
On 03/08/2016 02:45 PM, Robert Haas wrote:
> OK, so I made a pass through the "Ready for Committer" patches in the
> current CF.  One I committed, several I replied to the thread with
> review comments and set back to "Waiting on Author". Here's where we
> are with the rest:

> plpgsql - possibility to get element or array type for referenced
> variable type - No committer yet.  I don't have enough interest or
> knowledge to want to handle this.

I'll try to move this one forward, although later in the week as I am
traveling all day tomorrow

> pl/pgSQL, get diagnostics and big data - No committer yet.  Seems like
> a reasonable concept.  A borderline bug fix.

possibly this one too...

Joe

-- 
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development





Re: [HACKERS] Add generate_series(date,date) and generate_series(date,date,integer)

2016-03-08 Thread Corey Huinker
>
> > It would be simple enough to remove the infinity test on the "stop" and
> > leave it on the "start". Or yank both. Just waiting for others to agree
> > which checks should remain.
>
> Let's yank 'em.  This is a minor issue which is distracting us from
> the main point of this patch, and I don't think it's worth getting
> distracted.
>

+1. It leaves this function consistent with the others, and if we want to
add checks later we can do them all at the same time.


>
> + 
> +
> generate_series(start,
> stop, step
> integer)
> +  date
> +  setof date
> +  
> +   Generate a series of values, from start
> to stop
> +   with a step size of step
>
> I think this should be followed by the word "days" and a period.
>
>
No objections. I just followed the pattern of the other generate_series()
docs.


> +   else
> +   /* do when there is no more left */
> +   SRF_RETURN_DONE(funcctx);
>
> I think we should drop the "else" and unindent the next two lines.
> That's the style I have seen elsewhere.  Plus less indentation equals
> more happiness.
>

No objections here either. I just followed the pattern of generate_series()
for int there.


>
> I'm pretty meh about the whole idea of this function, though,
> actually, and I don't see a single clear +1 vote for this
> functionality upthread.  (Apologies if I've missed one.)  In the
> absence of a few of those, I recommend we reject this.
>

Just David and Vik so far. The rest were either against (Simon), meh (Robert)
or +1ed/-1ed the backpatch, leaving their thoughts on the function itself
unspoken.

Happy to make the changes above if we're moving forward with it.


Re: [HACKERS] Idle In Transaction Session Timeout, revived

2016-03-08 Thread Andres Freund
On 2016-03-08 16:42:37 -0500, Robert Haas wrote:
> - I really wonder if the decision to ignore sessions that are idle in
> transaction (aborted) should revisited.  Consider this:
> 
> rhaas=# begin;
> BEGIN
> rhaas=# lock table pg_class;
> LOCK TABLE
> rhaas=# savepoint a;
> SAVEPOINT
> rhaas=# select 1/0;
> ERROR:  division by zero

Probably only if the toplevel transaction is also aborted. Might not be
entirely trivial to determine.

> - What's the right order of events in PostgresMain?  Right now the
> patch disables the timeout after checking for interrupts and clearing
> DoingCommandRead, but maybe it would be better to disable the timeout
> first, so that we can't have it happen that start to execute the
> command and then, in medias res, bomb out because of the idle timeout.
> Then again, maybe you had some compelling reason for doing it this
> way, in which case we should document that in the comments.

Hm, we should never bomb out of the middle of anything with this, right?
I mean, all the timeout handler does is set a volatile var and set a
latch; the rest is done in the interrupt handler, which is not called from
the signal handler?
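
For readers following along, such a timeout handler amounts to roughly this
(a sketch using the flag name from the patch above; the committed
identifiers may differ):

static void
IdleInTransactionSessionTimeoutHandler(void)
{
	/* just record the condition and wake the process ... */
	IdleInTransactionTimeoutSessionPending = true;
	InterruptPending = true;
	SetLatch(MyLatch);
	/* ... the FATAL exit happens later, in ProcessInterrupts() */
}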

I'm not arguing for the current order, just making that one argument ;). FWIW,
I think Vik wrote this after discussing with me, and IIRC there was not
a lot of reasoning going into the specific location for doing this.

Greetings,

Andres Freund




Re: [HACKERS] PATCH: index-only scans with partial indexes

2016-03-08 Thread Tom Lane
Kyotaro HORIGUCHI  writes:
> Hello, This is a (maybe) committer-ready patch of a Tomas
> Vondra's project.

I think this needs quite a bit of work yet.  A few comments:

* If we're going to pay the price of identifying implied restriction
conditions in check_partial_indexes(), we should at least recoup some
of that investment by not doing it again in create_indexscan_plan().

* create_indexscan_plan() has this comment about what it's doing:
 * We can also discard quals that are implied by a partial index's
 * predicate, but only in a plain SELECT; when scanning a target relation
 * of UPDATE/DELETE/SELECT FOR UPDATE, we must leave such quals in the
 * plan so that they'll be properly rechecked by EvalPlanQual testing.
I believe that that problem applies for this optimization as well,
and thus that you can only remove implied quals in plain SELECT.
At least, if there's some reason why that problem does *not* apply,
there had darn well better be a comment explaining why it's safe.

* Adding indexrinfos to IndexPath seems unnecessary, since that struct
already has the "index" pointer --- you can just get the field out of the
IndexOptInfo when you need it.  If you insist on having the extra field,
this patch is short of the threshold for correctness on adding fields to
paths.  It missed _outIndexPath for instance.

* The additional #include in costsize.c has no apparent reason.

* The changes in cost_index() really ought to come with some change
in the associated comments.

* Personally I'd not change the signature of
match_restriction_clauses_to_index; that's just code churn that
may have to get undone someday.

* The block comment in check_index_only needs more thought than this,
as the phrase "The same is true" is now invalid; the "same" it refers
to *isn't* the same anymore.

* I'm not too thrilled with injecting the initialization of
index->indrinfos into the initial loop in check_partial_indexes().
If it stays there, I'd certainly expect the comment ahead of the
loop to be changed to have something to do with reality.  But can't
we find some more-appropriate place to initialize it?  Like maybe
where the IndexOptInfo is first created?  I would not really expect
check_partial_indexes() to have side-effects on non-partial indexes.

* I think the double loop in check_partial_indexes() is too cute by half.
I'd be inclined to just build the replacement list unconditionally while
we do the predicate_implied_by() tests.  Those are expensive enough that
saving one lappend per implication-test is a useless optimization,
especially if it requires code as contorted and bug-prone as this.
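
[A minimal sketch of the single-pass shape suggested above, building the
trimmed clause list while running the implication tests. The indrinfos
field is the one proposed by the patch; the rest is illustrative rather
than actual patch code.]

#include "postgres.h"
#include "nodes/pg_list.h"
#include "nodes/relation.h"
#include "optimizer/predtest.h"

static void
build_index_restrictinfos(RelOptInfo *rel, IndexOptInfo *index)
{
	List	   *result = NIL;
	ListCell   *lc;

	foreach(lc, rel->baserestrictinfo)
	{
		RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);

		/* keep every clause NOT implied by the partial-index predicate */
		if (index->indpred == NIL ||
			!predicate_implied_by(list_make1(rinfo->clause), index->indpred))
			result = lappend(result, rinfo);
	}

	index->indrinfos = result;	/* field proposed by the patch */
}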

* The comment added to IndexOptInfo is not very adequate, and not spelled
correctly either.  There's a block comment you should be adding a para to
(probably take the text you added for struct IndexPath).  And again,
there is more work to do to add a field to such a struct, eg outfuncs.c.
Usually a good way to find all the places to touch is to grep for some of
the existing field names in the struct.

* I don't much care for the field name "indrinfos"; it's neither very
readable nor descriptive.  Don't have a better suggestion right now
though.

* Not sure if new regression test cases would be appropriate.  The changes
in the existing cases seem a bit unfortunate actually; I'm afraid that
this may be defeating the original intent of those tests.

I'm setting this back to Waiting on Author.

regards, tom lane




Re: [HACKERS] Add generate_series(date,date) and generate_series(date,date,integer)

2016-03-08 Thread David G. Johnston
On Tue, Mar 8, 2016 at 3:57 PM, Corey Huinker  wrote:

>
>> I'm pretty meh about the whole idea of this function, though,
>> actually, and I don't see a single clear +1 vote for this
>> functionality upthread.  (Apologies if I've missed one.)  In the
>> absence of a few of those, I recommend we reject this.
>>
>
> Just David and Vik so far. The rest were either against(Simon),
> meh(Robert) or +1ed/-1ed the backpatch, leaving their thoughts on the
> function itself unspoken.
>
> Happy to make the changes above if we're moving forward with it.
>

I'll toss a user-land +1 for this.

David J.


Re: [HACKERS] Add generate_series(date,date) and generate_series(date,date,integer)

2016-03-08 Thread Alvaro Herrera
Robert Haas wrote:

> Let's yank 'em.  This is a minor issue which is distracting us from
> the main point of this patch, and I don't think it's worth getting
> distracted.

+1

> I'm pretty meh about the whole idea of this function, though,
> actually, and I don't see a single clear +1 vote for this
> functionality upthread.  (Apologies if I've missed one.)  In the
> absence of a few of those, I recommend we reject this.

+1

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] fun with "Ready for Committer" patches

2016-03-08 Thread Tom Lane
Robert Haas  writes:
> Unique Joins - This patch has had a lot of review and discussion.  It
> would be best if Tom Lane looked at it.

Yeah, I'll pick it up soon.  I've basically been kicking as much as
I could down the road for the last couple of months, trying to get the
pathification changes done.  Now that that's in, I expect to be
substantially less AWOL from the commitfest.

> enhanced custom error in PLPythonu - No committer yet.  Tom Lane and
> Peter Eisentraut are the usual suspects for PL/python.  Again, I have
> neither the interest nor the knowledge.

I don't mind touching plpython at the C level, but this one requires
somebody who uses Python enough to have an informed opinion on the
tastefulness of the proposed language features.  That's not me.

regards, tom lane




Re: [HACKERS] fun with "Ready for Committer" patches

2016-03-08 Thread Tatsuo Ishii
> It's hard to miss the fact that there are an absolutely breathtaking
> number of patches in this CommitFest - 80! - that are in the "needs
> review" state.  We really, really, really need more review to happen -

Many of "needs review" state patches already have reviewer(s). Do you
mean we want more reviewers in addition to them for such patches?

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] silent data loss with ext4 / all current versions

2016-03-08 Thread Joshua D. Drake

On 03/08/2016 02:16 PM, Robert Haas wrote:
> On Mon, Mar 7, 2016 at 10:18 PM, Andres Freund  wrote:
>>> Instead of "durable" I think that "persistent" makes more sense.
>>
>> I find durable a lot more descriptive. persistent could refer to
>> retrying the rename or something.
>
> Yeah, I like durable, too.

There is also precedent, DURABLE as in aciD

JD



--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] Performance improvement for joins where outer side is unique

2016-03-08 Thread David Rowley
On 23 January 2016 at 05:36, Tomas Vondra  wrote:
> Hi,
>
> On 12/17/2015 02:17 PM, David Rowley wrote:
>>
>> On 17 December 2015 at 19:11, Simon Riggs  wrote:
>>
>> On 17 December 2015 at 00:17, Tomas Vondra  wrote:
>>
>> I'd go with match_first_tuple_only.
>>
>>
>> +1
>>
>> unique_inner is a state that has been detected,
>> match_first_tuple_only is the action we take as a result.
>>
>>
>> Ok great. I've made it so in the attached. This means the comment in the
>> join code where we perform the skip can be a bit less verbose and all
>> the details can go in where we're actually setting the
>> match_first_tuple_only to true.
>
>
> OK. I've looked at the patch again today, and it seems broken by 45be99f8, as
> the partial paths were not passing unique_inner to the create_*_path()
> functions. The attached patch should fix that.
>
> Otherwise I think the patch is ready for committer - I think this is a
> valuable optimization and I see nothing wrong with the code.

I've attached an updated patch which fixes the conflicts with the
recent upper planner changes.
I also notice that some regression tests, some of which I think Tom
updated in the upper planner changes, have now changed back again due
to the slightly reduced costs on hash and nested loop joins where the
inner side is unique. I checked the costs of one of these by disabling
hash join and noticed that the final total cost is the same, so it's
not too surprising that the plans keep switching with these planner
changes going in. I verified that they remain as-is when I comment out
the cost-changing code in this patch.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


unique_joins_dbcecda_2016-03-09.patch
Description: Binary data



Re: [HACKERS] pgcrypto: add s2k-count

2016-03-08 Thread Alvaro Herrera
Jeff Janes wrote:
> pgcrypto supports s2k-mode for key-stretching during symmetric
> encryption, and even defaults to s2k-mode=3, which means configurable
> iterations.  But it doesn't support s2k-count to actually set those
> iterations to be anything other than the default.  If you are
> interested in key-stretching, the default is not going to cut it.

Talking about deep rabbit holes ...

I gave this a look.  I adjusted it here and there for project style and
general cleanliness (something that could be applied to pgcrypto much
more generally) and eventually arrived at trying to figure out how the
s2k count is actually used by the underlying algorithm.  I arrived at
the function calc_s2k_iter_salted which is where it is actually used to
encrypt things.  But that function is completely devoid of comments and
not completely trivial.  At this point I cannot for the life of me
determine whether that function should use the one-byte format specified
by the relevant RFC (4880) or the decoded, human-understandable number
of iterations.
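
[For context, RFC 4880 section 3.7.1.3 encodes the iterated-and-salted S2K
count in a single octet; below is a reference sketch of the RFC's decode
formula, plus rounding a requested count up to the nearest representable
value. This is the RFC's formula, not pgcrypto's own s2k_iter_to_count().]

#include <stdint.h>

#define S2K_EXPBIAS 6

/* Decode the one-octet count c (RFC 4880, 3.7.1.3); range 1024..65011712. */
static uint32_t
s2k_decode_count(uint8_t c)
{
	return ((uint32_t) 16 + (c & 15)) << ((c >> 4) + S2K_EXPBIAS);
}

/* Round a requested count up to the code of the nearest representable value. */
static uint8_t
s2k_encode_count(uint32_t count)
{
	uint8_t		c = 0;

	while (c < 255 && s2k_decode_count(c) < count)
		c++;
	return c;
}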

I would love to be able to read gnupg's code to figure out what it is
that they do, but the structure of their code is even more impenetrable
than pgcrypto's.  Perhaps it would be easier to measure the time it
takes to run some s2k operations ...

I CCed Marko here.  Hopefully he can chime in on whether this patch is
correct.

Anyway, assuming that the iteration count was already being used
correctly, then as far as I'm concerned we're ready to go.  The attached
patch is what I would commit.

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/contrib/pgcrypto/expected/pgp-encrypt.out b/contrib/pgcrypto/expected/pgp-encrypt.out
index b35de79..8fc558c 100644
--- a/contrib/pgcrypto/expected/pgp-encrypt.out
+++ b/contrib/pgcrypto/expected/pgp-encrypt.out
@@ -103,6 +103,25 @@ select pgp_sym_decrypt(
  Secret.
 (1 row)
 
+-- s2k count change
+select pgp_sym_decrypt(
+	pgp_sym_encrypt('Secret.', 'key', 's2k-count=1024'),
+	'key', 'expect-s2k-count=1024');
+ pgp_sym_decrypt 
+-
+ Secret.
+(1 row)
+
+-- s2k_count rounds up
+select pgp_sym_decrypt(
+	pgp_sym_encrypt('Secret.', 'key', 's2k-count=6500'),
+	'key', 'expect-s2k-count=6500');
+NOTICE:  pgp_decrypt: unexpected s2k_count: expected 6500 got 65011712
+ pgp_sym_decrypt 
+-
+ Secret.
+(1 row)
+
 -- s2k digest change
 select pgp_sym_decrypt(
 	pgp_sym_encrypt('Secret.', 'key', 's2k-digest-algo=md5'),
diff --git a/contrib/pgcrypto/pgp-decrypt.c b/contrib/pgcrypto/pgp-decrypt.c
index 5c69745..37f374a 100644
--- a/contrib/pgcrypto/pgp-decrypt.c
+++ b/contrib/pgcrypto/pgp-decrypt.c
@@ -643,6 +643,7 @@ parse_symenc_sesskey(PGP_Context *ctx, PullFilter *src)
 	if (res < 0)
 		return res;
 	ctx->s2k_mode = ctx->s2k.mode;
+	ctx->s2k_count = s2k_iter_to_count(ctx->s2k.iter);
 	ctx->s2k_digest_algo = ctx->s2k.digest_algo;
 
 	/*
diff --git a/contrib/pgcrypto/pgp-encrypt.c b/contrib/pgcrypto/pgp-encrypt.c
index 2320c75..c9148fd 100644
--- a/contrib/pgcrypto/pgp-encrypt.c
+++ b/contrib/pgcrypto/pgp-encrypt.c
@@ -567,7 +567,7 @@ init_s2k_key(PGP_Context *ctx)
 	if (ctx->s2k_cipher_algo < 0)
 		ctx->s2k_cipher_algo = ctx->cipher_algo;
 
-	res = pgp_s2k_fill(&ctx->s2k, ctx->s2k_mode, ctx->s2k_digest_algo);
+	res = pgp_s2k_fill(&ctx->s2k, ctx->s2k_mode, ctx->s2k_digest_algo, ctx->s2k_count);
 	if (res < 0)
 		return res;
 
diff --git a/contrib/pgcrypto/pgp-pgsql.c b/contrib/pgcrypto/pgp-pgsql.c
index 1842985..1f65b66 100644
--- a/contrib/pgcrypto/pgp-pgsql.c
+++ b/contrib/pgcrypto/pgp-pgsql.c
@@ -181,6 +181,7 @@ struct debug_expect
 	int			expect;
 	int			cipher_algo;
 	int			s2k_mode;
+	int			s2k_count;
 	int			s2k_cipher_algo;
 	int			s2k_digest_algo;
 	int			compress_algo;
@@ -196,6 +197,7 @@ fill_expect(struct debug_expect * ex, int text_mode)
 	ex->expect = 0;
 	ex->cipher_algo = -1;
 	ex->s2k_mode = -1;
+	ex->s2k_count = -1;
 	ex->s2k_cipher_algo = -1;
 	ex->s2k_digest_algo = -1;
 	ex->compress_algo = -1;
@@ -218,6 +220,7 @@ check_expect(PGP_Context *ctx, struct debug_expect * ex)
 {
 	EX_CHECK(cipher_algo);
 	EX_CHECK(s2k_mode);
+	EX_CHECK(s2k_count);
 	EX_CHECK(s2k_digest_algo);
 	EX_CHECK(use_sess_key);
 	if (ctx->use_sess_key)
@@ -247,6 +250,8 @@ set_arg(PGP_Context *ctx, char *key, char *val,
 		res = pgp_set_sess_key(ctx, atoi(val));
 	else if (strcmp(key, "s2k-mode") == 0)
 		res = pgp_set_s2k_mode(ctx, atoi(val));
+	else if (strcmp(key, "s2k-count") == 0)
+		res = pgp_set_s2k_count(ctx, atoi(val));
 	else if (strcmp(key, "s2k-digest-algo") == 0)
 		res = pgp_set_s2k_digest_algo(ctx, val);
 	else if (strcmp(key, "s2k-cipher-algo") == 0)
@@ -286,6 +291,11 @@ set_arg(PGP_Context *ctx, char *key, char *val,
 		ex->expect = 1;
 		ex->s2k_mode = atoi(val);
 	}
+	else if (ex != NULL && strcmp(key, "expect-s2k-count") == 0)
+	{
+		ex->expect = 1;
+		ex->s2k_count = atoi(val);
+	}
 	else if (ex != NULL && str

Re: [HACKERS] Declarative partitioning

2016-03-08 Thread Corey Huinker
>
> Sorry for replying so late.
>

No worries! We have jobs to do aside from this.


>
> Everything seemed to go dandy until I tried FOR VALUES (blah , blah],
> where psql wouldn't send the command string until it saw a matching closing
> parenthesis, :(.  So maybe I should try to put the whole thing in '', that
> is, accept the full range_spec in a string, but then we are back to
> requiring full-blown range parse function which I was trying to avoid by
> using the aforementioned grammar.  So, I decided to move ahead with the
> following grammar for time being:
>
> START (lower-bound) [ EXCLUSIVE ]
> | END (upper-bound) [ INCLUSIVE ]
> | START (lower-bound) [ EXCLUSIVE ] END (upper-bound) [ INCLUSIVE ]
>
> Where,
>
> *-bound: a_expr
>  | *-bound ',' a_expr
>
> Note that in the absence of explicit specification, lower-bound is
> inclusive and upper-bound is exclusive.
>

Thanks for trying. I agree that it would require a full-blown range parser, and
I'm not yet advanced enough to help you with that.

So presently partitions that are unbounded on the lower end aren't
possible, but that's a creation syntax issue, not an infrastructure issue.
Correct?


> Okay, perhaps I should not presume a certain usage.  However, as you know,
> the usage like yours requires some mechanism of data redistribution (also
> not without some syntax), which I am not targeting with the initial patch.
>

I'm quite fine with limitations in this initial patch, especially if they
don't limit what's possible in the future.


> > Question: I haven't dove into the code, but I was curious about your
> tuple
> > routing algorithm. Is there any way for the algorithm to begin its scan
> of
> > candidate partitions based on the destination of the last row inserted
> this
> > statement? I ask because most use cases (that I am aware of) have data
> that
> > would naturally cluster in the same partition.
>
> No.  Actually the tuple-routing function starts afresh for each row.  For
> range partitions, it's binary search over an array of upper bounds.  There
> is no row-to-row state caching in the partition module itself.
>
>
bsearch should be fine, that's what I've used in my own custom partitioning
schemes.
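
[A small illustrative sketch (not the actual patch code) of the routing
described above: binary search over an array of exclusive upper bounds,
returning the index of the first partition whose upper bound exceeds the
key. Lower-bound checks are omitted for brevity.]

#include <stdint.h>

/*
 * Route a key to a range partition: partitions are ordered by their
 * exclusive upper bounds, and partition i covers [lower_i, upper_bounds[i]).
 * Returns -1 when the key is beyond the last upper bound.
 */
static int
route_range_partition(const int64_t *upper_bounds, int nparts, int64_t key)
{
	int			lo = 0;
	int			hi = nparts;	/* search the half-open range [lo, hi) */

	while (lo < hi)
	{
		int			mid = lo + (hi - lo) / 2;

		if (key < upper_bounds[mid])
			hi = mid;			/* key falls at or before partition 'mid' */
		else
			lo = mid + 1;		/* key is at or above this upper bound */
	}

	return (lo < nparts) ? lo : -1;
}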

Was there a new patch, and if so, is it the one you want me to kick the
tires on?


Re: [HACKERS] Performance improvement for joins where outer side is unique

2016-03-08 Thread Tom Lane
David Rowley  writes:
> I also notice that some regression tests, some of which I think Tom
> updated in the upper planner changes, have now changed back again due
> to the slightly reduced costs on hash and nested loop joins where the
> inner side is unique.

?? I don't see anything in this patch that touches the same queries
that changed plans in my previous patch.

I do think that the verbosity this adds to the EXPLAIN output is not
desirable at all, at least not in text mode.  Another line for every
darn join is a pretty high price.

regards, tom lane




Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2016-03-08 Thread Kyotaro HORIGUCHI
At Wed, 9 Mar 2016 01:16:26 +0900, Amit Langote  wrote:
> On Wed, Mar 9, 2016 at 12:24 AM, Robert Haas  wrote:
> > This patch has been worked on by so many people and reviewed by so
> > many people that I can't keep track of who should be credited when it
> > gets committed.  Could someone provide a list of author(s) and
> > reviewer(s)?
> 
> Original authors are Rahila Syed and Vinayak Pokale.
> 
> I have been reviewing this for last few CFs. I sent in last few
> revisions as well.

The owner of this is Vinayak and, ah, I forgot to add myself as a
reviewer. I have also reviewed this for the last few CFs.

So, looking into the CF app, the list does not seem inconsistent with the
people who appear in this thread over these three CFs.

Authors: Vinayak Pokale, Rahila Syed, Amit Langote
Reviewers: Amit Langote, Kyotaro Horiguchi

Is there anyone who should be added to this list?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





