Re: [HACKERS] Runtime Partition Pruning

2017-12-18 Thread Amit Langote
Hi.

On 2017/12/16 15:05, David Rowley wrote:
> On 13 December 2017 at 00:33, Beena Emerson  wrote:
>> PFA the updated patch, this can be applied over the v13 patches [1]
>> over commit 487a0c1518af2f3ae2d05b7fd23d636d687f28f3
> 
> Hi Beena,
> 
> Thanks for posting an updated patch.
> 
> I've been looking over this and I think that the use of the
> PartitionDispatch in set_append_subplan_indexes is not correct. What
> we need here is the index of the Append's subnode and that's not what
> RelationGetPartitionDispatchInfo() gives you. Remember that some
> partitions could have been pruned away already during planning.

A somewhat similar concern is being discussed on the "UPDATE partition
key" thread [1].  In that case, ExecInitModifyTable(), when initializing
tuple routing information to handle the "update partition key" case, will
have to deal with the fact that there might be fewer sub-plans in the
ModifyTable node than there are partitions in the partition tree.  That
is, the source partitions that the planner determines after pruning could
be fewer than the possible target partitions to which rows from the source
partitions may move; the latter consists of *all* partitions.
So, we have to have a mapping from leaf partition indexes as figured out
by RelationGetPartitionDispatchInfo() (indexes that are offsets into a
global array for *all* partitions), to sub-plan indexes which are offsets
into the array for only those partitions that have a sub-plan.  Such
mapping is built (per the latest patch on that thread) by
ExecSetupPartitionTupleRouting() in execPartition.c.
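
The mapping described above can be sketched roughly as follows. This is a
minimal illustration only; the function name, array shapes, and use of plain
unsigned OIDs are all hypothetical, not the actual patch's data structures:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative sketch: given the global array of all leaf partition OIDs
 * (as laid out by RelationGetPartitionDispatchInfo) and the OIDs for which
 * the planner kept a sub-plan, build a map from leaf partition index to
 * sub-plan index; partitions pruned at plan time map to -1.
 */
void
build_subplan_map(const unsigned *leaf_part_oids, size_t nleaves,
                  const unsigned *subplan_oids, size_t nsubplans,
                  int *subplan_map)
{
    for (size_t i = 0; i < nleaves; i++)
    {
        subplan_map[i] = -1;        /* assume pruned unless found below */
        for (size_t j = 0; j < nsubplans; j++)
        {
            if (leaf_part_oids[i] == subplan_oids[j])
            {
                subplan_map[i] = (int) j;
                break;
            }
        }
    }
}
```

Partitions pruned at plan time simply have no sub-plan, so they map to -1;
that is what makes the global leaf indexes usable from a node that only
carries surviving sub-plans.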

We could do something similar here using a similar code structure.  Maybe,
add an ExecSetupPartitionRuntimePruning() in execPartition.c (mimicking
ExecSetupPartitionTupleRouting), that accepts an AppendState node.
Furthermore, it might be a good idea to have something similar to
ExecFindPartition(), say, ExecGetPartitions().  That is, we have new
functions for run-time pruning that are counterparts to corresponding
functions for tuple routing.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/c5e1d4ad-d243-52c5-608b-5dbb7183e465%40lab.ntt.co.jp




Re: [HACKERS] Runtime Partition Pruning

2017-12-18 Thread Amit Langote
On 2017/12/09 0:57, Robert Haas wrote:
> On Thu, Dec 7, 2017 at 2:22 AM, Beena Emerson  wrote:
>> I have added the partition quals that are used for pruning.
>>
>> PFA the updated patch. I have changed the names of variables to make
>> it more appropriate, along with adding more code comments and doing
>> some refactoring and other code cleanups.
> 
> - I am surprised that set_append_subplan_indexes() needs to worry
> about multi-level partitioning directly.  I would have thought that
> Amit's patch would take care of that, just returning a Bitmapset of
> indexes which this function could use directly.

Actually, the partition.c code that my patch adds is limited to considering
one partitioned table at a time, not the whole tree.  As of 0a480502b09
[1], we call set_append_rel_size() separately for each partitioned table
in a partition tree.  In each such call, we call partition.c to perform
partition pruning for the given partitioned table.

In the run-time pruning case, we should get, via Append, a list of pruning
clauses for each partitioned table in the tree that survived plan-time
pruning.  Then, just like ExecFindPartition() calls
get_partition_for_tuple() for each partitioned table until we get to a
leaf partition, we should call partition.c for each un-pruned partitioned
table that has run-time pruning clauses associated with it.
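
The per-level descent could look roughly like this. A sketch only: the
PartNode shape and all names are invented for illustration, and real
run-time pruning would evaluate each table's pruning clauses against
parameter values rather than read a precomputed flag:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: walk the un-pruned part of the partition tree the
 * way ExecFindPartition walks it for tuple routing, but at each partitioned
 * table apply that table's run-time pruning result and descend only into
 * surviving children, collecting the indexes of surviving leaves.
 */
typedef struct PartNode
{
    int     leaf_index;         /* >= 0 for a leaf, -1 for partitioned */
    struct PartNode *children;
    size_t  nchildren;
    int     pruned_at_runtime;  /* stand-in for evaluating pruning clauses */
} PartNode;

size_t
collect_unpruned_leaves(const PartNode *node, int *out, size_t n)
{
    if (node->pruned_at_runtime)
        return n;               /* whole subtree eliminated */
    if (node->leaf_index >= 0)
    {
        out[n++] = node->leaf_index;
        return n;
    }
    for (size_t i = 0; i < node->nchildren; i++)
        n = collect_unpruned_leaves(&node->children[i], out, n);
    return n;
}
```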

Thanks,
Amit

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0a480502b09




Re: [HACKERS] Custom compression methods

2017-12-18 Thread Ildus Kurbangaliev
On Thu, 14 Dec 2017 10:29:10 -0500
Robert Haas  wrote:

> On Wed, Dec 13, 2017 at 7:18 AM, Ildus Kurbangaliev
>  wrote:
> > Since we agreed on ALTER syntax, I want to clear things up about
> > CREATE. Should it be CREATE ACCESS METHOD .. TYPE COMPRESSION or
> > CREATE COMPRESSION METHOD? I like the access method approach, and it
> > simplifies the code, but I'm just not sure whether compression is an
> > access method or not.
> 
> +1 for ACCESS METHOD.

An access method then.

> 
> > Current implementation
> > --
> >
> > To avoid extra patches I also want to clear things about current
> > implementation. Right now there are two tables, "pg_compression" and
> > "pg_compression_opt". When a compression method is linked to a column
> > it creates a record in pg_compression_opt. This record's Oid is
> > stored in the varlena. These Oids are kept in the first column so I can
> > move them in pg_upgrade, but in all other aspects they behave like
> > usual Oids. Also it's easy to restore them.
> 
> pg_compression_opt -> pg_attr_compression, maybe.
> 
> > Compression options are linked to a specific column. When a tuple is
> > moved between relations it will be decompressed.
> 
> Can we do this only if the compression method isn't OK for the new
> column?  For example, if the old column is COMPRESS foo PRESERVE bar
> and the new column is COMPRESS bar PRESERVE foo, we don't need to
> force decompression in any case.
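
Robert's suggestion amounts to a membership test. A minimal sketch, where
the function name, the Oid typedef, and the representation of the target
column's accepted methods are all invented for illustration:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned Oid;   /* stand-in for the real Oid type */

/*
 * Sketch: force decompression of a moved value only when its current
 * compression method is not among the methods the target column accepts
 * (its COMPRESS method plus any PRESERVEd ones).  So for old COMPRESS foo
 * PRESERVE bar and new COMPRESS bar PRESERVE foo, nothing is decompressed.
 */
int
needs_decompression(Oid value_cm, const Oid *target_cms, size_t ntarget)
{
    for (size_t i = 0; i < ntarget; i++)
        if (target_cms[i] == value_cm)
            return 0;           /* target column accepts this method */
    return 1;                   /* must decompress (and recompress) */
}
```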

Thanks, sounds right, I will add it to the patch.

-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] Add support for tuple routing to foreign partitions

2017-12-18 Thread Amit Langote
Fujita-san,

On 2017/12/12 21:21, Etsuro Fujita wrote:
> Hi Maksim,
> 
> (2017/12/12 17:57), Maksim Milyutin wrote:
>> Your patch no longer applies on master. Please rebase it.
> 
> Done.  Please find attached an updated version of the patch.

Thanks for sending the updated version of the patch.

I noticed that this patch introduces a partition_rels field in
ModifyTable, which contains the RT indexes of *all* leaf partitions in the
partition tree.  That means we now expand the partition inheritance tree
in the planner even in the INSERT case, simply because some of the leaf
partitions might be foreign tables which must be looked at by the planner.
I'm somewhat concerned about the performance implications of that.  Now,
to insert even a single row into a partitioned table, which may not even
be routed to any of its foreign partitions, we are going to have to expand
the inheritance twice -- once by the planner to handle foreign partitions
and then by the executor when setting up the tuple routing information.

Is there any reason why we have to do things this way, besides the fact
that the PlanForeignModify() API appears to be callable only from within a
valid planner context?

Thanks,
Amit




Re: GSoC 2018

2017-12-18 Thread Aleksander Alekseev
Hello hackers,

Thank you a lot for your feedback. I modified the project description
to make it more clear that it implies augmenting an existing HA
solution, particularly Stolon, and doesn't imply solving existing
limitations of logical replication like lack of DDL replication.

Removing some of these limitations or adding logical replication support
to other existing HA applications like RepMgr or Patroni or corosync /
Pacemaker or maybe Postgres-XL sounds like a good idea for other GSoC
projects. I strongly encourage anyone who is willing to be a mentor to
add these ideas to the wiki.

Unfortunately I don't have any experience using these applications
and have no idea how they behave in different corner cases like
netsplit, slow network, etc. Also I don't have much motivation to figure
this out because I'm pretty happy with Stolon. For these reasons I
personally can't be a mentor for the corresponding projects.

Also today I'm going to contact Simone Gotti, the main developer of
Stolon, to let him know about this thread. I wonder what his thoughts
are on all this.

-- 
Best regards,
Aleksander Alekseev


signature.asc
Description: PGP signature


Re: Small typo in comment in json_agg_transfn

2017-12-18 Thread Magnus Hagander
On Sun, Dec 17, 2017 at 2:43 PM, David Rowley 
wrote:

> The attached fixed a small typo in json_agg_transfn.
>

Applied, thanks.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] Runtime Partition Pruning

2017-12-18 Thread David Rowley
On 18 December 2017 at 21:31, Amit Langote
 wrote:
> On 2017/12/16 15:05, David Rowley wrote:
>> I've been looking over this and I think that the use of the
>> PartitionDispatch in set_append_subplan_indexes is not correct. What
>> we need here is the index of the Append's subnode and that's not what
>> RelationGetPartitionDispatchInfo() gives you. Remember that some
>> partitions could have been pruned away already during planning.
>
> A somewhat similar concern is being discussed on the "UPDATE partition
> key" thread [1].  In that case, ExecInitModifyTable(), when initializing
> tuple routing information to handle the "update partition key" case, will
> have to deal with the fact that there might be fewer sub-plans in the
> ModifyTable node than there are partitions in the partition tree.  That
> is, the source partitions that the planner determines after pruning could
> be fewer than the possible target partitions to which rows from the source
> partitions may move; the latter consists of *all* partitions.
> So, we have to have a mapping from leaf partition indexes as figured out
> by RelationGetPartitionDispatchInfo() (indexes that are offsets into a
> global array for *all* partitions), to sub-plan indexes which are offsets
> into the array for only those partitions that have a sub-plan.  Such
> mapping is built (per the latest patch on that thread) by
> ExecSetupPartitionTupleRouting() in execPartition.c.

Surely this is a different problem? With UPDATE of a partition key, if
the planner eliminates all but 1 partition the UPDATE could cause that
tuple to be "moved" into any leaf partition, very possibly one that's
been eliminated during planning.

In the case of runtime Append pruning, we can forget about all
partitions that the planner managed to eliminate; we'll never need to
touch those, ever. All we care about here is trying to reduce the
number of partitions down further using values that were not available
during planning.

> We could do something similar here using a similar code structure.  Maybe,
> add a ExecSetupPartitionRuntimePruning() in execPartition.c (mimicking
> ExecSetupPartitionTupleRouting), that accepts AppendState node.
> Furthermore, it might be a good idea to have something similar to
> ExecFindPartition(), say, ExecGetPartitions().  That is, we have new
> functions for run-time pruning that are counterparts to corresponding
> functions for tuple routing.

Seems to me in this case we're better to build this structure during
planning and save it with the plan so that it can be used over and
over, rather than building it again and again each time the plan is
executed. Likely a common use case for run-time pruning is when the
plan is going to be used multiple times with different parameters, so
we really don't want to repeat any work that we don't have to here.
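
The split David suggests might be sketched like this. Purely illustrative
names and structures; the point is only that the support structure is built
once at plan time and merely read on each execution of a cached plan:

```c
#include <assert.h>

/*
 * Sketch: the planner builds the pruning support structure once and hangs
 * it off the plan; each execution just reads it, so repeated executions of
 * a prepared/cached plan pay the setup cost only once.
 */
static int n_builds = 0;        /* counts planner-time construction work */

typedef struct PlannedPruneInfo
{
    int nparts;
    int subplan_map[8];
} PlannedPruneInfo;

PlannedPruneInfo *
plan_build_prune_info(PlannedPruneInfo *info, int nparts)
{
    n_builds++;
    info->nparts = nparts;
    for (int i = 0; i < nparts; i++)
        info->subplan_map[i] = i;   /* trivial identity map for the sketch */
    return info;
}

int
exec_lookup_subplan(const PlannedPruneInfo *info, int leaf_index)
{
    return info->subplan_map[leaf_index];   /* no rebuilding at exec time */
}

int
get_build_count(void)
{
    return n_builds;
}
```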

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: genomic locus

2017-12-18 Thread PostgreSQL - Hans-Jürgen Schönig
hello …

maybe this one is also helpful: 
https://wiki.postgresql.org/images/1/1b/Postbis_pgcon_eu_2012.pdf
it seems they have put a lot of work into that.

regards,

hans


> On 15 Dec 2017, at 20:49, Gene Selkov  wrote:
> 
> Greetings everyone,
> 
> I need a data type to represent genomic positions, which will consist of a 
> string and a pair of integers with interval logic and access methods. Sort of 
> like my seg type, but more straightforward.
> 
> I noticed somebody took good care of seg while I was away for the last 20 
> years, and I am extremely grateful for that. I have been using it. In the 
> meantime, things have changed and now I am almost clueless about how you deal 
> with contributed modules and what steps I should take once I get it to work. 
> Also, is it a good idea to clone and fix seg for this purpose, or is there a 
> more appropriate template? Or maybe just building it from scratch will be a 
> better idea?
> 
> I have seen a lot of bit rot in other extensions (never contributed) that I 
> have not maintained since 2009 and now I am unable to fix some of them, so 
> I wonder how much of old knowledge is still applicable. In other words, is 
> what I see in new code just a change of macros or the change of principles?
> 
> Thanks,
> 
> --Gene

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de , 
http://www.cybertec.at 






Re: [HACKERS] Add support for tuple routing to foreign partitions

2017-12-18 Thread Etsuro Fujita

(2017/12/18 18:14), Amit Langote wrote:

> I noticed that this patch introduces a partition_rels field in
> ModifyTable, which contains the RT indexes of *all* leaf partitions in the
> partition tree.  That means we now expand the partition inheritance tree
> in the planner even in the INSERT case, simply because some of the leaf
> partitions might be foreign tables which must be looked at by the planner.


Yeah, the patch expands the inheritance tree at planning time as well to
build an RTE for each partition so that the FDW can use that RTE, e.g.,
when called from PlanForeignModify.



> I'm somewhat concerned about the performance implications of that.  Now,
> to insert even a single row into a partitioned table, which may not even
> be routed to any of its foreign partitions, we are going to have to expand
> the inheritance twice -- once by the planner to handle foreign partitions
> and then by the executor when setting up the tuple routing information.
>
> Is there any reason why we have to do things this way, besides the fact
> that the PlanForeignModify() API appears to be callable only from within a
> valid planner context?


Another reason is that set_plan_references needs such RTEs to record
plan dependencies, so that plancache.c invalidates cached plans
for foreign partitions.


I suspect that we could avoid the planning-time expansion by doing some 
optimization when inserting a single row into a partitioned table.


Best regards,
Etsuro Fujita



Re: Tracking of page changes for backup purposes. PTRACK [POC]

2017-12-18 Thread Aleksander Alekseev
Hello Anastasia,

Personally I'm very glad PTRACK is finally proposed to be ported to the
community PostgreSQL.

> Since ptrack is basically just an API for use in backup tools, it is
> impossible to test the patch independently.

I believe it's worth creating an extension that provides access to
PTRACK's public API. Even if there are not many users interested
in this extension, we should have at least some basic tests like "if we
write to the table, its pages change", "if we update a tuple,
corresponding indexes change", "if we clean up a bitmap, PTRACK says there
are no changed pages", etc.

-- 
Best regards,
Aleksander Alekseev




Re: Tracking of page changes for backup purposes. PTRACK [POC]

2017-12-18 Thread Aleksander Alekseev
Hello Anastasia,

> ptrack_9.6.6_v1.4.patch

Also, I'm surprised you proposed a patch for 9.6. Since PTRACK is a new
feature I don't believe we are going to backport it.

-- 
Best regards,
Aleksander Alekseev




File name as application name in regression tests and elsewhere

2017-12-18 Thread Andrew Dunstan

I was doing some work over the weekend and it occurred to me that it
would be quite nice to have the input file name from regression tests
set as the application name, and then use a log_line_prefix setting
including %a, so that this would appear in the logs.


My first thought was to alter all the tests with something like "SET
application_name = 'testname.sql';", but then I thought maybe a better way
would be to provide psql with a switch (--file-application-name ?) that
would do this for us, and use it in the regression suite. That should be
a pretty small patch and could be more generally useful.
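
A sketch of what such a switch might do internally. The switch itself is
only a proposal and does not exist; the helper below is entirely
hypothetical, kept here just to show the intended behavior (take the input
file's base name, use it as the application_name, and let a
log_line_prefix containing %a surface it in the logs):

```c
#include <string.h>

/*
 * Hypothetical helper: derive an application_name from an input file
 * path by taking its base name, e.g. "src/test/regress/sql/boolean.sql"
 * becomes "boolean.sql".  psql would then issue the equivalent of
 * SET application_name = '<base name>' before running the script.
 */
const char *
file_application_name(const char *path)
{
    const char *slash = strrchr(path, '/');

    return slash ? slash + 1 : path;
}
```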


Thoughts?


cheers


andrew

-- 
Andrew Dunstan   https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: File name as application name in regression tests and elsewhere

2017-12-18 Thread Aleksander Alekseev
Hello Andrew,

> I was doing some work over the weekend and it occurred to me that it
> would be quite nice to have the input file name from regression tests
> set as the application name, and then use a log_line_prefix setting
> including %a, so that this would appear on the logs.
> 
> My first thought was to alter all the tests with something like "SET
> application_name = 'testname.sql';", but then I thought maybe a better way
> would be to provide psql with a switch (--file-application-name ?) that
> would do this for us, and use it in the regression suite. That should be
> a pretty small patch and could be more generally useful.
> 
> Thoughts?

Personally, I see how it could be useful and don't see any drawbacks of
such a patch. I think it's a good idea.

-- 
Best regards,
Aleksander Alekseev




Re: Tracking of page changes for backup purposes. PTRACK [POC]

2017-12-18 Thread Andrey Borodin
Hello!

Thanks for sharing this work! I think this is an important feature for making
backups more efficient.
> On 18 Dec 2017, at 15:18, Anastasia Lubennikova 
>  wrote:
> 
> Patches for v 10.1 and v 9.6 are attached.
> Since ptrack is basically just an API for use in backup tools, it is
> impossible to test the patch independently.
> Now it is integrated with our backup utility, called pg_probackup. You can
> find it here: https://github.com/postgrespro/pg_probackup
I can add experimental support for WAL-G too. We have QA tools for delta
backups as well.
> 
> Spoiler: Please consider this patch and README as a proof of concept. It
> can be improved in some ways, but in its current state PTRACK is a
> stable prototype, reviewed and tested well enough to find many
> non-trivial corner cases and subtle problems. And any discussion of
> change track algorithm must be aware of them. Feel free to share your
> concerns and point out any shortcomings of the idea or the implementation.
This version of the patch is quite big - basically it accompanies most of 
START_CRIT_SECTION() calls with PTRACK calls.
I have two concerns about this:
1. Does this affect the performance of the database when PTRACK is not enabled?
2. What is the cost of having PTRACK enabled?

Best regards, Andrey Borodin.


Re: genomic locus

2017-12-18 Thread Craig Ringer
On 16 December 2017 at 03:49, Gene Selkov  wrote:

> Greetings everyone,
>
> I need a data type to represent genomic positions, which will consist of a
> string and a pair of integers with interval logic and access methods. Sort
> of like my seg type, but more straightforward.
>
> I noticed somebody took good care of seg while I was away for the last
> 20 years, and I am extremely grateful for that. I have been using it. In
> the meantime, things have changed and now I am almost clueless about how
> you deal with contributed modules and what steps I should take once I get
> it to work. Also, is it a good idea to clone and fix seg for this purpose,
> or is there a more appropriate template? Or maybe just building it from
> scratch will be a better idea?
>
> I have seen a lot of bit rot in other extensions (never contributed) that
> I have not maintained since 2009 and now I am unable to fix some of them,
> so I wonder how much of old knowledge is still applicable. In other words,
> is what I see in new code just a change of macros or the change of
> principles?
>

Welcome back!

I think your chances of modifying 'seg' are slim, since you'd surely have
to change the on-disk format and that'd break existing users. Plus probably
change the API.

If you think it'd make logical sense to extend seg with a string descriptor
of some sort and could come up with a name/use case that's not quite so
narrowly focused as genetics alone, then I could see adding it as a
secondary type in the same extension.

But it's more likely that the best course would be to extract the seg
extension from core, rename it, hack it as desired, and build it as an
extension maintained out-of-tree.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


RE: User defined data types in Logical Replication

2017-12-18 Thread Huong Dangminh
Hi Sawada-san,

> Sent: Tuesday, December 12, 2017 9:41 AM
> To: Dang Minh Huong 
> Cc: PostgreSQL Hackers ; Yanagisawa
> Hiroshi(柳澤 博) ; Dangminh Huong(ダンM
> フーン) 
> Subject: Re: User defined data types in Logical Replication
> 
> On Sun, Dec 10, 2017 at 12:33 AM, Dang Minh Huong 
> wrote:
> > On 2017/12/08 13:18, Huong Dangminh wrote:
> >
> >> Hi Sawada-san,
> >>
> >>> On Thu, Dec 7, 2017 at 11:07 AM, Masahiko Sawada
> >>> 
> >>> wrote:
> 
>  On Thu, Dec 7, 2017 at 12:23 AM, Dang Minh Huong
>  
> >>>
> >>> wrote:
> >
> > Hi Sawada-san,
> >
> > Sorry for my late response.
> >
> > On 2017/12/05 0:11, Masahiko Sawada wrote:
> >
> > There is one more case that user-defined data type is not
> > supported in Logical Replication.
> > That is when the remote data type's name does not exist on the subscriber.
> >
> > In relation.c:logicalrep_typmap_gettypname
> > We search for the OID in syscache by the remote's data type name and map
> > it; if it does not exist in syscache we will be faced with the
> > below error.
> >
> >  if (!OidIsValid(entry->typoid))
> >  ereport(ERROR,
> >
> > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> >   errmsg("data type \"%s.%s\"
> > required for logical replication does not exist",
> >  entry->nspname,
> > entry->typname)));
> >
> > I think, it is not necessary to check typoid here in order to
> > avoid above case, is that right?
> >
> > I think it's not right. We should end up with an error in the case
> > where the same type name doesn't exist on subscriber. With your
> > proposed patch,
> > logicalrep_typmap_gettypname() can return an invalid string
> > (entry->typname) in that case, which can be a cause of SEGV.
> >
> > Thanks, I think you are right.
> > # I thought that entry->typname was received from walsender, and
> > it is already be qualified in logicalrep_write_typ.
> > # But we also need check it in subscriber, because it could be
> > lost info in transmission.
> 
>  Oops, the last sentence of my previous mail was wrong.
>  logicalrep_typmap_gettypname() doesn't return an invalid string
>  since
>  entry->typname is initialized with a type name got from wal sender.
> >
> > Yeah, so we do not need to check the existence of the publisher's type name
> > on the subscriber.
> 
>  After more thought, we might not need to raise an error even if
>  there is not the same data type on both publisher and subscriber.
>  Because data is sent after converted to the text representation and
>  is converted to a data type according to the local table definition
>  subscribers don't always need to have the same data type. If it's
>  right, slot_store_error_callback() doesn't need to find a
>  corresponding local data type OID but just finds the corresponding
>  type name by remote data type OID. What do you think?
> >
> > I totally agree. It will make logical replication more flexible with
> > data type.
> 
>  --- a/src/backend/replication/logical/worker.c
>  +++ b/src/backend/replication/logical/worker.c
>  @@ -100,8 +100,9 @@ static dlist_head lsn_mapping =
>  DLIST_STATIC_INIT(lsn_mapping);
> 
>    typedef struct SlotErrCallbackArg
>    {
>  -LogicalRepRelation *rel;
>  -intattnum;
>  +LogicalRepRelMapEntry *rel;
>  +intremote_attnum;
>  +intlocal_attnum;
>    } SlotErrCallbackArg;
> 
>  Since LogicalRepRelMapEntry has a map of local attributes to remote
>  ones we don't need to have two attribute numbers.
> 
> >> Sorry for the late reply.
> >>
> >>> Attached the patch incorporated what I have on mind. Please review it.
> >>
> >> Thanks for the patch, I will do it at this weekend.
> >
> > Your patch is fine for me.
> > But logicalrep_typmap_getid will be unused.
> 
> Yeah, actually we don't use LogicalRepTyp.typoid as well. If we allow to
> replicate data to an another data type of column, what is the data type
> mapping on subscriber for? It seems enough to have remote data type oid,
> namespace and name.

Yeah, so the meaning of the type map will disappear, won't it?
I updated the patch, removing LogicalRepTyp.typoid and changing the
concept from "typmap" to "remote type".
What do you think?

---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/





fix_slot_store_error_callback_v6.patch
Description: fix_slot_store_error_callback_v6.patch


RE: User defined data types in Logical Replication

2017-12-18 Thread Huong Dangminh
Hi Sawada-san,

> Yeah, so the meaning of the type map will disappear, won't it?
> I updated the patch, removing LogicalRepTyp.typoid and changing the
> concept from "typmap" to "remote type".
> What do you think?

Sorry, please check v7 of the patch (attached to this mail).


---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/





fix_slot_store_error_callback_v7.patch
Description: fix_slot_store_error_callback_v7.patch

Re: Tracking of page changes for backup purposes. PTRACK [POC]

2017-12-18 Thread Aleksander Alekseev
Hello hackers,

> I have two concerns about this:
> 1. Does this affect the performance of the database when PTRACK is not 
> enabled?
> 2. What is the cost of having PTRACK enabled?

I played with this patch a bit and did a simple benchmark on my laptop.

Configuration:

```
make distclean && ./configure prefix=/some/path/ && make -s -j4
```

postgresql.conf:

```
max_prepared_transactions = 100
wal_level = logical
wal_keep_segments = 128
max_connections = 100
wal_log_hints = on
max_wal_senders = 8
wal_keep_segments = 64
listen_addresses = '*'
hot_standby = on
log_statement = all
max_locks_per_transaction = 256
shared_buffers = 1GB
```

The benchmark:

```
pgbench -i && pgbench -j 4 -c 4 -T 300 -P 10
```

Here are the results.

10.1, ptrack_enable=false

transaction type: 
scaling factor: 1
query mode: simple
number of clients: 4
number of threads: 4
duration: 300 s
number of transactions actually processed: 28772
latency average = 41.705 ms
latency stddev = 18.274 ms
tps = 95.895605 (including connections establishing)
tps = 95.906434 (excluding connections establishing)

10.1, ptrack_enable=true

scaling factor: 1
query mode: simple
number of clients: 4
number of threads: 4
duration: 300 s
number of transactions actually processed: 28607
latency average = 41.951 ms
latency stddev = 18.454 ms
tps = 95.344363 (including connections establishing)
tps = 95.345290 (excluding connections establishing)


10.1, without ptrack

transaction type: 
scaling factor: 1
query mode: simple
number of clients: 4
number of threads: 4
duration: 300 s
number of transactions actually processed: 28622
latency average = 41.928 ms
latency stddev = 18.238 ms
tps = 95.396155 (including connections establishing)
tps = 95.397406 (excluding connections establishing)


At first glance PTRACK doesn't seem to affect the overall performance
significantly.

There are a few minor issues with the patch. There is a missing '/'
symbol in the comment before the ptrack_get_and_clear procedure:

```
 * Get ptrack file as bytea and clear it */
bytea *
ptrack_get_and_clear(Oid tablespace_oid, Oid table_oid)
{
```

Also I believe the patch should include some changes to
postgresql.conf.sample.

I suggest adding this patch to the next commitfest. Otherwise it can
get lost.

-- 
Best regards,
Aleksander Alekseev


signature.asc
Description: PGP signature


Re: User defined data types in Logical Replication

2017-12-18 Thread Petr Jelinek
Hi,

On 18/12/17 14:28, Huong Dangminh wrote:

 Your patch is fine for me.
 But logicalrep_typmap_getid will be unused.
>>>
>>> Yeah, actually we don't use LogicalRepTyp.typoid either. If we allow
>>> replicating data to a column of another data type, what is the data
>>> type mapping on the subscriber for? It seems enough to have the remote
>>> data type's oid, namespace, and name.
>>
>> Yeah, so the meaning of the type map will disappear, won't it?
>> I updated the patch, removing LogicalRepTyp.typoid and changing the
>> concept from "typmap" to "remote type".
>> What do you think?
> 
> Sorry, Please confirm V7 of patch (attached in this mail).
> 

I think the changes make sense in terms of how it all works now.

That said, I don't think the renaming idea is a good one. The naming was
chosen to be future-proof: eventually we'll need to map types to a
local oid (and possibly more), with the local info cached so that we
can interpret the binary representation of replicated data (which we'll
add at some point, since it's a big performance boost).

So I am afraid that if we do the rename of typmap to remotetype in this
patch it will a) make backports of fixes in the related code harder, b)
force us to rename it back again in the future.

I'd keep your general approach but keep using typmap naming.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: GSoC 2018

2017-12-18 Thread Stephen Frost
Aleksander,

* Aleksander Alekseev (a.aleks...@postgrespro.ru) wrote:
> Thanks a lot for your feedback. I modified the project description
> to make it more clear that it implies augmenting an existing HA
> solution, particularly Stolon, and doesn't imply solving existing
> limitations of logical replication like lack of DDL replication.

Sounds great, thanks!

> Removing some of these limitations, or adding logical replication support
> to other existing HA applications like RepMgr, Patroni, corosync /
> Pacemaker, or maybe Postgres-XL, sounds like a good idea for other GSoC
> projects. I strongly encourage anyone who is willing to be a mentor to
> add these ideas to the wiki.

Yes, it'd be great for anyone who can mentor a project along these lines
to add it to the wiki.

> Also today I'm going to contact Simone Gotti, the main developer of
> Stolon, to let him know about this thread. I wonder what his thoughts
> are on all this.

Great.

Thanks again!

Stephen


signature.asc
Description: Digital signature


RE: User defined data types in Logical Replication

2017-12-18 Thread Huong Dangminh
Hi Petr Jelinek, Sawada-san

> I think the changes make sense in terms of how it all works now.
> 
> That said, I don't think the renaming idea is a good one. The naming was
> chosen to be future-proof: eventually we'll need to map types to a
> local oid (and possibly more), with the local info cached so that we
> can interpret the binary representation of replicated data (which we'll
> add at some point, since it's a big performance boost).
> 
> So I am afraid that if we do the rename of typmap to remotetype in this
> patch it will a) make backports of fixes in the related code harder, b)
> force us to rename it back again in the future.

Thanks for your comment.

> I'd keep your general approach but keep using typmap naming.

I updated the patch as Petr Jelinek mentioned, keeping the typmap naming.


---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/


fix_slot_store_error_callback_v8.patch
Description: fix_slot_store_error_callback_v8.patch


Re: [HACKERS] Add support for tuple routing to foreign partitions

2017-12-18 Thread Alvaro Herrera
InitResultRelInfo becomes unintelligible after this patch -- it was
straightforward but adding partition_root makes things shaky.  Please
add a proper comment indicating what each argument is.  (I wonder why
this function needs a local variable "partition_check" -- seems
pointless).

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] [PATCH] Improve geometric types

2017-12-18 Thread Alvaro Herrera
Does this patch series fix bug #14971?
https://www.postgresql.org/message-id/20171213172806.20142.73...@wrigleys.postgresql.org

Eric: see
https://www.postgresql.org/message-id/CAE2gYzzvp=uvpw4fytoaevyk-wze4jw7u2s1glrok35mr1-...@mail.gmail.com
for last version of patch.

Motivation for patch is at
https://www.postgresql.org/message-id/CAE2gYzxF7-5djV6-cEvqQu-fNsnt%3DEqbOURx7ZDg%2BVv6ZMTWbg%40mail.gmail.com

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pltcl valgrind output

2017-12-18 Thread Andrew Dunstan

I've been adding support for valgrind to the buildfarm client (code will
hit the git repo shortly). Mostly the results have been pretty clean,
but the pltcl tests generated the attached output. Perhaps someone with
more valgrind-fu than I have so far would like to use the information to
extend our supp file appropriately (or fix what it's complaining about).


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

==10747== Conditional jump or move depends on uninitialised value(s)
==10747==at 0x19A0AD6F: ??? (in /usr/lib64/libtcl8.6.so)
==10747==by 0x1997E7B6: TclNRRunCallbacks (in /usr/lib64/libtcl8.6.so)
==10747==by 0x19A93F4C: ??? (in /usr/lib64/libtcl8.6.so)
==10747==by 0x19981DB2: ??? (in /usr/lib64/libtcl8.6.so)
==10747==by 0x1997E7B6: TclNRRunCallbacks (in /usr/lib64/libtcl8.6.so)
==10747==by 0x199808C1: ??? (in /usr/lib64/libtcl8.6.so)
==10747==by 0x19980E72: Tcl_EvalEx (in /usr/lib64/libtcl8.6.so)
==10747==by 0x19980E94: Tcl_Eval (in /usr/lib64/libtcl8.6.so)
==10747==by 0x19A88C4E: ??? (in /usr/lib64/libtcl8.6.so)
==10747==by 0x19982E8F: Tcl_CreateInterp (in /usr/lib64/libtcl8.6.so)
==10747==by 0x19732869: _PG_init (pltcl.c:432)
==10747==by 0x995105: internal_load_library (dfmgr.c:281)
==10747==by 0x994B8D: load_external_function (dfmgr.c:105)
==10747==by 0x57149B: fmgr_c_validator (pg_proc.c:819)
==10747==by 0x998459: OidFunctionCall1Coll (fmgr.c:1327)
==10747==by 0x57117A: ProcedureCreate (pg_proc.c:722)
==10747==by 0x62C700: CreateProceduralLanguage (proclang.c:122)
==10747==by 0x83C598: ProcessUtilitySlow (utility.c:1489)
==10747==by 0x83B580: standard_ProcessUtility (utility.c:932)
==10747==by 0x83A6F4: ProcessUtility (utility.c:357)
==10747==  Uninitialised value was created by a stack allocation
==10747==at 0x19A0AAFD: ??? (in /usr/lib64/libtcl8.6.so)
==10747== 
{
   
   Memcheck:Cond
   obj:/usr/lib64/libtcl8.6.so
   fun:TclNRRunCallbacks
   obj:/usr/lib64/libtcl8.6.so
   obj:/usr/lib64/libtcl8.6.so
   fun:TclNRRunCallbacks
   obj:/usr/lib64/libtcl8.6.so
   fun:Tcl_EvalEx
   fun:Tcl_Eval
   obj:/usr/lib64/libtcl8.6.so
   fun:Tcl_CreateInterp
   fun:_PG_init
   fun:internal_load_library
   fun:load_external_function
   fun:fmgr_c_validator
   fun:OidFunctionCall1Coll
   fun:ProcedureCreate
   fun:CreateProceduralLanguage
   fun:ProcessUtilitySlow
   fun:standard_ProcessUtility
   fun:ProcessUtility
}
==10747== Conditional jump or move depends on uninitialised value(s)
==10747==at 0x19A0AD6F: ??? (in /usr/lib64/libtcl8.6.so)
==10747==by 0x1997E7B6: TclNRRunCallbacks (in /usr/lib64/libtcl8.6.so)
==10747==by 0x199808C1: ??? (in /usr/lib64/libtcl8.6.so)
==10747==by 0x19980E72: Tcl_EvalEx (in /usr/lib64/libtcl8.6.so)
==10747==by 0x19980E94: Tcl_Eval (in /usr/lib64/libtcl8.6.so)
==10747==by 0x19A88C61: ??? (in /usr/lib64/libtcl8.6.so)
==10747==by 0x19982E8F: Tcl_CreateInterp (in /usr/lib64/libtcl8.6.so)
==10747==by 0x19732869: _PG_init (pltcl.c:432)
==10747==by 0x995105: internal_load_library (dfmgr.c:281)
==10747==by 0x994B8D: load_external_function (dfmgr.c:105)
==10747==by 0x57149B: fmgr_c_validator (pg_proc.c:819)
==10747==by 0x998459: OidFunctionCall1Coll (fmgr.c:1327)
==10747==by 0x57117A: ProcedureCreate (pg_proc.c:722)
==10747==by 0x62C700: CreateProceduralLanguage (proclang.c:122)
==10747==by 0x83C598: ProcessUtilitySlow (utility.c:1489)
==10747==by 0x83B580: standard_ProcessUtility (utility.c:932)
==10747==by 0x83A6F4: ProcessUtility (utility.c:357)
==10747==by 0x60E62C: execute_sql_string (extension.c:763)
==10747==by 0x60EAB2: execute_extension_script (extension.c:924)
==10747==by 0x60F983: CreateExtensionInternal (extension.c:1529)
==10747==  Uninitialised value was created by a stack allocation
==10747==at 0x19A0AAFD: ??? (in /usr/lib64/libtcl8.6.so)
==10747== 
{
   
   Memcheck:Cond
   obj:/usr/lib64/libtcl8.6.so
   fun:TclNRRunCallbacks
   obj:/usr/lib64/libtcl8.6.so
   fun:Tcl_EvalEx
   fun:Tcl_Eval
   obj:/usr/lib64/libtcl8.6.so
   fun:Tcl_CreateInterp
   fun:_PG_init
   fun:internal_load_library
   fun:load_external_function
   fun:fmgr_c_validator
   fun:OidFunctionCall1Coll
   fun:ProcedureCreate
   fun:CreateProceduralLanguage
   fun:ProcessUtilitySlow
   fun:standard_ProcessUtility
   fun:ProcessUtility
   fun:execute_sql_string
   fun:execute_extension_script
   fun:CreateExtensionInternal
}
==10834== Conditional jump or move depends on uninitialised value(s)
==10834==at 0x1A20AD6F: ??? (in /usr/lib64/libtcl8.6.so)
==10834==by 0x1A17E7B6: TclNRRunCallbacks (in /usr/lib64/libtcl8.6.so)
==10834==by 0x1A293F4C: ??? (in /usr/lib64/libtcl8.6.so)
==10834==by 0x1A181DB2: ??? (in /usr/lib64/libtcl8.6.so)
==10834==by 0x1A17E7B6: TclNRRunCallbacks (in /usr/lib64/libtcl8.6.so)
==10834==by 0x1A1808C1

Re: [HACKERS] [PATCH] Improve geometric types

2017-12-18 Thread Emre Hasegeli
> Does this patch series fix bug #14971?

No, it doesn't.  I am trying to improve things without touching the EPSILON.



Re: [HACKERS] Custom compression methods

2017-12-18 Thread Tomas Vondra


On 12/17/2017 04:32 AM, Robert Haas wrote:
> On Thu, Dec 14, 2017 at 12:23 PM, Tomas Vondra
>  wrote:
>> Can you give an example of such algorithm? Because I haven't seen such
>> example, and I find arguments based on hypothetical compression methods
>> somewhat suspicious.
>>
>> FWIW I'm not against considering such compression methods, but OTOH it
>> may not be such a great primary use case to drive the overall design.
> 
> Well it isn't, really.  I am honestly not sure what we're arguing
> about at this point.  I think you've agreed that (1) opening avenues
> for extensibility is useful, (2) substitution a general-purpose
> compression algorithm could be useful, and (3) having datatype
> compression that is enabled through TOAST rather than built into the
> datatype might sometimes be desirable.  That's more than adequate
> justification for this proposal, whether half-general compression
> methods exist or not.  I am prepared to concede that there may be no
> useful examples of such a thing.
> 

I don't think we're arguing - we're discussing whether a proposed patch
is the right design for the relevant use cases.

I personally am not quite convinced about that, for the reason I tried
to explain in my previous messages. I see it as a poor alternative to
compression built into the data type. I do like the idea of compression
with external dictionary, however.

But don't forget that it's not me in this thread - it's my evil twin,
moonlighting as Mr. Devil's lawyer ;-)

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-12-18 Thread Alvaro Herrera
After this discussion, this is how I see things working:

1. pg_dump
   a) creates indexes on partitions normally
   b) once all existing indexes are done, index on parent is created,
  with ONLY.  No cascading occurs, no indexes are attached.
   c) ATTACH is run for each existing partition index.  After each
  ATTACH, we check that all indexes exist.  If so, the parent is
  marked valid.
   d) if not all indexes existed in partitions, index on parent remains
  invalid.  (It was invalid in the dumped database, so this is
  correct.)

2. other uses
   Normal CREATE INDEX (without ONLY) recurses and attaches the first
   matching index it finds (no duplicate indexes are created);
   partitions without a matching index get one created.

3. ALTER INDEX DETACH is not provided.  Therefore: once index is valid,
   it remains so forever.

I think this satisfies all concerns.
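
Concretely, the dump sequence in 1(a)-(c) might look like the following
sketch (the ONLY and ATTACH PARTITION syntax here illustrates the
proposal and is not necessarily final):

```sql
-- a) indexes on partitions, created normally
CREATE TABLE p (a int) PARTITION BY RANGE (a);
CREATE TABLE p1 PARTITION OF p FOR VALUES FROM (0) TO (10);
CREATE INDEX p1_a_idx ON p1 (a);

-- b) index on the parent, with ONLY: no cascading, no attachment yet,
--    so the parent index starts out invalid
CREATE INDEX p_a_idx ON ONLY p (a);

-- c) attach each existing partition index; once every partition is
--    covered, the parent index is marked valid
ALTER INDEX p_a_idx ATTACH PARTITION p1_a_idx;
```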

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-12-18 Thread Robert Haas
On Mon, Dec 18, 2017 at 11:02 AM, Alvaro Herrera
 wrote:
> After this discussion, this is how I see things working:
>
> 1. pg_dump
>a) creates indexes on partitions normally
>b) once all existing indexes are done, index on parent is created,
>   with ONLY.  No cascading occurs, no indexes are attached.
>c) ATTACH is run for each existing partition index.  After each
>   ATTACH, we check that all indexes exist.  If so, the parent is
>   marked valid.
>d) if not all indexes existed in partitions, index on parent remains
>   invalid.  (It was invalid in the dumped database, so this is
>   correct.)
>
> 2. other uses
>Normal CREATE INDEX (without ONLY) recurses and attaches the first
>matching index it finds (no duplicate indexes are created);
>partitions without a matching index get one created.
>
> 3. ALTER INDEX DETACH is not provided.  Therefore: once index is valid,
>it remains so forever.
>
> I think this satisfies all concerns.

Sounds great to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Custom compression methods

2017-12-18 Thread Robert Haas
On Mon, Dec 18, 2017 at 10:43 AM, Tomas Vondra
 wrote:
> I personally am not quite convinced about that, for the reason I tried
> to explain in my previous messages. I see it as a poor alternative to
> compression built into the data type. I do like the idea of compression
> with external dictionary, however.

I think that compression built into the datatype and what is proposed
here are both useful, and everybody's free to work on either one as they
prefer, so I don't see that as a reason not to accept this patch.  And
I think this patch can be a stepping stone toward compression with an
external dictionary, so that seems like an affirmative reason to
accept this patch.

> But don't forget that it's not me in this thread - it's my evil twin,
> moonlighting as Mr. Devil's lawyer ;-)

Well, I don't mind you objecting to the patch under any persona, but
so far I'm not finding your reasons convincing...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Protect syscache from bloating with negative cache entries

2017-12-18 Thread Andres Freund
On 2017-12-17 19:23:45 -0500, Robert Haas wrote:
> On Sat, Dec 16, 2017 at 11:42 PM, Andres Freund  wrote:
> >> I'm not sure we should regard very quick bloating as a problem in need
> >> of solving.  Doesn't that just mean we need the cache to be bigger, at
> >> least temporarily?
> >
> > Leaving that aside, is that actually not at least to a good degree,
> > solved by that problem? By bumping the generation on hash resize, we
> > have recency information we can take into account.
>
> I agree that we can do it.  I'm just not totally sure it's a good
> idea.  I'm also not totally sure it's a bad idea, either.  That's why
> I asked the question.

I'm not 100% convinced either - but I also don't think it matters all
that terribly much. As long as the overall hash hit rate is decent,
minor increases in the absolute number of misses don't really matter
that much for syscache imo.  I'd personally go for something like:

1) When about to resize, check if there are entries of generation -2
   around.

   Don't resize if more than 15% of entries could be freed. Also, stop
   reclaiming at that threshold, to avoid unnecessarily purging cache
   entries.

   Using two generations allows a bit more time for cache entries to
   be marked as fresh before the next resize.

2) While resizing increment generation count by one.

3) Once a minute, increment generation count by one.


The one thing I don't quite have a good handle on is how much cache
reclamation, if any, to do at 3). We don't really want to throw away
all the caches just because a connection has been idle for a few
minutes, in a connection pool that can happen occasionally. I think I'd
for now *not* do any reclamation except at resize boundaries.

Greetings,

Andres Freund



Re: Protect syscache from bloating with negative cache entries

2017-12-18 Thread Robert Haas
On Mon, Dec 18, 2017 at 11:46 AM, Andres Freund  wrote:
> I'm not 100% convinced either - but I also don't think it matters all
> that terribly much. As long as the overall hash hit rate is decent,
> minor increases in the absolute number of misses don't really matter
> that much for syscache imo.  I'd personally go for something like:
>
> 1) When about to resize, check if there are entries of generation -2
>    around.
>
>    Don't resize if more than 15% of entries could be freed. Also, stop
>    reclaiming at that threshold, to avoid unnecessarily purging cache
>    entries.
>
>    Using two generations allows a bit more time for cache entries to
>    be marked as fresh before the next resize.
>
> 2) While resizing increment generation count by one.
>
> 3) Once a minute, increment generation count by one.
>
>
> The one thing I don't quite have a good handle on is how much cache
> reclamation, if any, to do at 3). We don't really want to throw away
> all the caches just because a connection has been idle for a few
> minutes, in a connection pool that can happen occasionally. I think I'd
> for now *not* do any reclamation except at resize boundaries.

My starting inclination was almost the opposite.  I think that you
might be right that a minute or two of idle time isn't sufficient
reason to flush our local cache, but I'd be inclined to fix that by
incrementing the generation count every 10 minutes or so rather than
every minute, and still flush things more than 1 generation old.  The
reason for that is that I think we should ensure that the system
doesn't sit there idle forever with a giant cache.  If it's not using
those cache entries, I'd rather have it discard them and rebuild the
cache when it becomes active again.

Now, I also see your point about trying to clean up before
resizing.  That does seem like a good idea, although we have to be
careful not to be too eager to clean up there, or we'll just end up
artificially limiting the cache size when it's unwise to do so.  But I
guess that's what you meant by "Also, stop reclaiming at that
threshold, to avoid unnecessarily purging cache entries."  I think the
idea you are proposing is that:

1. The first time we are due to expand the hash table, we check
whether we can forestall that expansion by doing a cleanup; if so, we
do that instead.

2. After that, we just expand.

That seems like a fairly good idea, although it might be a better idea
to allow cleanup if enough time has passed.  If we hit the expansion
threshold twice an hour apart, there's no reason not to try cleanup
again.

Generally, the way I'm viewing this is that a syscache entry means
paying memory to save CPU time.  Each 8kB of memory we use to store
system cache entries is one less block we have for the OS page cache
to hold onto our data blocks.  If we had an oracle (the kind from
Delphi, not Redwood City) that told us with perfect accuracy when to
discard syscache entries, it would throw away syscache entries
whenever the marginal execution-time performance we could buy from
another 8kB in the page cache is greater than the marginal
execution-time performance we could buy from those syscache entries.
In reality, it's hard to know which of those things is of greater
value.  If the system isn't meaningfully memory-constrained, we ought
to just always hang onto the syscache entries, as we do today, but
it's hard to know that.  I think the place where this really becomes a
problem is on systems with hundreds of connections + thousands of
tables + connection pooling; without some back-pressure, every backend
eventually caches everything, putting the system under severe memory
pressure for basically no performance gain.  Each new use of the
connection is probably for a limited set of tables, and only those
tables really need syscache entries; holding onto things used long in the
past doesn't save enough to justify the memory used.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: es_query_dsa is broken

2017-12-18 Thread Robert Haas
On Sun, Dec 17, 2017 at 7:35 PM, Thomas Munro
 wrote:
> Andreas Seltenreich found a query[1] that suffers from this bug
> (thanks!), and Amit confirmed that this patch fixes it, but further
> sqlsmith fuzz testing with the patch applied also revealed that it
> failed to handle the case where pei is NULL because parallelism was
> inhibited.  Here's a new version to fix that.  Someone might prefer a
> coding with "if" rather than the ternary operator, but I didn't have a
> strong preference.

Looks OK to me.  Committed and back-patched to v10.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: autoprewarm is forgetting to register a tranche.

2017-12-18 Thread Robert Haas
On Fri, Dec 15, 2017 at 3:32 AM, Kyotaro HORIGUCHI
 wrote:
> Hello, I noticed during an investigation that pg_prewarm is
> forgetting to register a tranche.

Oops.

> In passing, I found it hard to register a tranche in
> apw_init_shmem() because a process using the initialized shared
> memory struct cannot find the tranche id (without intruding into
> the internal of LWLock strcut). So I'd like to propose another
> tranche registering interface LWLockRegisterTrancheOfLock(LWLock
> *, int). The interface gets rid of the necessity of storing
> tranche id separately from LWLock. (in patch 0001)

I don't think we really need this.  If it suits a particular caller to
pass somelock->tranche to LWLockRegisterTranche, fine, but they can
do that with or without this function.  It's basically a one-line
function, so I don't see the point.

> The comment cited above looks a bit different from how the
> interface is actually used. So I rearranged the comment following
> a typical use I have seen in several cases. (in patch 0001)

I don't really see a need to change this.  It's true that what's
currently #3 could be done before #2, but I hesitate to call that a
typical practice.  Also, I'm worried that it could create the
impression that it's OK to use an LWLock before registering the
tranche, and it's really not.

> The final patch 0003 is arguable, and may well be rejected. It cannot
> be detected that a tranche is not registered until its name is
> actually accessed (and many won't care even if the name is printed as
> '(null)' in an error message that is the result of the developer's
> own bug). On the other hand there's no chance to detect that before
> the lock is actually used. In the last patch I added an assertion in
> LWLockAcquire(), but it must hit performance to a certain degree
> (even if only in an --enable-cassert build), so I don't insist that
> it must be there.

Actually, I think this is a good idea as long as it doesn't hurt
performance too much.  It catches something that would otherwise be
hard to check.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Tracking of page changes for backup purposes. PTRACK [POC]

2017-12-18 Thread Robert Haas
On Mon, Dec 18, 2017 at 8:34 AM, Aleksander Alekseev
 wrote:
> 10.1, without ptrack
>
> transaction type: 
> scaling factor: 1
> query mode: simple
> number of clients: 4
> number of threads: 4
> duration: 300 s
> number of transactions actually processed: 28622
> latency average = 41.928 ms
> latency stddev = 18.238 ms
> tps = 95.396155 (including connections establishing)
> tps = 95.397406 (excluding connections establishing)
>
>
> At first glance PTRACK doesn't seem to affect the overall performance
> significantly.

I think this doesn't really show much because it's apparently limited
by the speed of fsync() on your filesystem.  You might try running the
test with synchronous_commit=off.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

2017-12-18 Thread Fujii Masao
On Fri, Dec 15, 2017 at 9:51 AM, Masahiko Sawada  wrote:
> On Fri, Dec 15, 2017 at 6:46 AM, Michael Paquier
>  wrote:
>> On Fri, Dec 15, 2017 at 3:52 AM, Fujii Masao  wrote:
>>> On Mon, Dec 11, 2017 at 2:16 PM, Michael Paquier
>>>  wrote:
 On Mon, Dec 11, 2017 at 2:03 PM, Masahiko Sawada  
 wrote:
> On Sat, Dec 9, 2017 at 2:24 AM, Robert Haas  wrote:
>> On Fri, Dec 8, 2017 at 4:13 AM, Michael Paquier
>>  wrote:
>>> I would just write "To
>>> avoid calling CHECK_FOR_INTERRUPTS which can happen when releasing a
>>> LWLock" and be done with it. There is no point to list a full function
>>> dependency list, which could change in the future with static routines
>>> of lwlock.c.
>
> Agreed. Updated the comment.

 Robert actually liked adding the complete routine list. Let's see what
 Fujii-san thinks at the end, there is still some time until the next
 round of minor releases.
>>>
>>> What I think is the patch I attached. Thought?
>>
>> That's OK for me. Thanks.
>
> +1 from me.

Committed. Thanks!

Regards,

-- 
Fujii Masao



Re: File name as application name in regression tests and elsewhere

2017-12-18 Thread Peter Eisentraut
On 12/18/17 06:59, Andrew Dunstan wrote:
> I was doing some work over the weekend and it occurred to me that it
> would be quite nice to have the input file name from regression tests
> set as the application name, and then use a log_line_prefix setting
> including %a, so that this would appear on the logs.

It does do that already, as of commit
a4327296df7366ecc657b706a9b5e87aa921311a.  Is it not working for you?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] static assertions in C++

2017-12-18 Thread Peter Eisentraut
On 12/11/17 17:12, Tom Lane wrote:
> Peter Eisentraut  writes:
>> On 12/11/17 16:45, Tom Lane wrote:
>>> (BTW, why is it that we can't fall back on the negative-width-bitfield
>>> trick for old g++?)
> 
>> The complaint is
>> error: types may not be defined in 'sizeof' expressions
> 
> Hmm, well, surely there's more than one way to do that; the sizeof
> is just a convenient way to wrap it in C.  Wouldn't a typedef serve
> just as well?

Here is another attempt, which has the desired effect with the handful
of compilers I have available.

(With the recent changes to AllocSetContextCreate() using a static
assertion, the current state now breaks actual extensions written in C++
in some configurations, so this has become a bit of a must-fix for PG11.)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From d631fcb1fb2babef618bc9b3eba3f5591970a609 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 30 Aug 2016 12:00:00 -0400
Subject: [PATCH v4] Add support for static assertions in C++

This allows modules written in C++ to use or include header files that
use StaticAssertStmt() etc.
---
 src/include/c.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/src/include/c.h b/src/include/c.h
index 11fcffbae3..22535a7deb 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -754,6 +754,7 @@ typedef NameData *Name;
  * about a negative width for a struct bit-field.  This will not include a
  * helpful error message, but it beats not getting an error at all.
  */
+#ifndef __cplusplus
 #ifdef HAVE__STATIC_ASSERT
 #define StaticAssertStmt(condition, errmessage) \
do { _Static_assert(condition, errmessage); } while(0)
@@ -765,6 +766,19 @@ typedef NameData *Name;
 #define StaticAssertExpr(condition, errmessage) \
StaticAssertStmt(condition, errmessage)
 #endif /* HAVE__STATIC_ASSERT */
+#else  /* C++ */
+#if defined(__cpp_static_assert) && __cpp_static_assert >= 200410
+#define StaticAssertStmt(condition, errmessage) \
+   static_assert(condition, errmessage)
+#define StaticAssertExpr(condition, errmessage) \
+   StaticAssertStmt(condition, errmessage)
+#else
+#define StaticAssertStmt(condition, errmessage) \
+   do { struct static_assert_struct { int static_assert_failure : (condition) ? 1 : -1; }; } while(0)
+#define StaticAssertExpr(condition, errmessage) \
+   ({ StaticAssertStmt(condition, errmessage); })
+#endif
+#endif /* C++ */
 
 
 /*

base-commit: 56a95ee5118bf6d46e261b8d352f7dafac10587d
-- 
2.15.1



Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

2017-12-18 Thread Masahiko Sawada
On Tue, Dec 19, 2017 at 3:52 AM, Fujii Masao  wrote:
> On Fri, Dec 15, 2017 at 9:51 AM, Masahiko Sawada  
> wrote:
>> On Fri, Dec 15, 2017 at 6:46 AM, Michael Paquier
>>  wrote:
>>> On Fri, Dec 15, 2017 at 3:52 AM, Fujii Masao  wrote:
 On Mon, Dec 11, 2017 at 2:16 PM, Michael Paquier
  wrote:
> On Mon, Dec 11, 2017 at 2:03 PM, Masahiko Sawada  
> wrote:
>> On Sat, Dec 9, 2017 at 2:24 AM, Robert Haas  
>> wrote:
>>> On Fri, Dec 8, 2017 at 4:13 AM, Michael Paquier
>>>  wrote:
 I would just write "To
 avoid calling CHECK_FOR_INTERRUPTS which can happen when releasing a
 LWLock" and be done with it. There is no point to list a full function
 dependency list, which could change in the future with static routines
 of lwlock.c.
>>
>> Agreed. Updated the comment.
>
> Robert actually liked adding the complete routine list. Let's see what
> Fujii-san thinks at the end, there is still some time until the next
> round of minor releases.

 What I think is the patch I attached. Thought?
>>>
>>> That's OK for me. Thanks.
>>
>> +1 from me.
>
> Committed. Thanks!
>

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-12-18 Thread David Rowley
On 19 December 2017 at 05:08, Robert Haas  wrote:
> On Mon, Dec 18, 2017 at 11:02 AM, Alvaro Herrera
>  wrote:
>> After this discussion, this is how I see things working:
>>
>> 1. pg_dump
>>a) creates indexes on partitions normally
>>b) once all existing indexes are done, index on parent is created,
>>   with ONLY.  No cascading occurs, no indexes are attached.
>>c) ATTACH is run for each existing partition index.  After each
>>   ATTACH, we check that all indexes exist.  If so, the parent is
>>   marked valid.
>>d) if not all indexes existed in partitions, index on parent remains
>>   invalid.  (It was invalid in the dumped database, so this is
>>   correct.)
>>
>> 2. other uses
>>Normal CREATE INDEX (without ONLY) recurses and attaches the first
>>matching index it finds (no duplicate indexes are created);
>>partitions without a matching index get one created.
>>
>> 3. ALTER INDEX DETACH is not provided.  Therefore: once index is valid,
>>it remains so forever.
>>
>> I think this satisfies all concerns.
>
> Sounds great to me.

and me.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical replication without a Primary Key

2017-12-18 Thread Joshua D. Drake

On 12/07/2017 12:39 PM, Andres Freund wrote:


Not a problem. If you updated both rows, then there are two cases:
a) the update actually changed the column values. In that case the
first per-row change that's replicated updates the first row, but the
second one won't again find it as matching in all columns.
b) the update didn't actually change anything. In that case the same
row gets updated twice, but because the column values didn't change,
that doesn't matter.


I may be misunderstanding what is said above but if I ran a test:

Publisher:
reptest=# \d foorep
              Table "public.foorep"
 Column | Type | Collation | Nullable | Default
--------+------+-----------+----------+---------
 one    | text |           |          |
 two    | text |           |          |
Publications:
    "reptestpub"

reptest=# select * from foorep;
 one | two
-----+-----
 c   | b
 c   | b
 c   | b
(3 rows)

reptest=# update foorep set one = 'd';
UPDATE 3
reptest=# select * from foorep;
 one | two
-----+-----
 d   | b
 d   | b
 d   | b
(3 rows)

Subscriber before Publisher update:
reptest=# select * from foorep ;
 one | two
-----+-----
 c   | b
 c   | b
 c   | b
(3 rows)

Subscriber after Publisher update:
reptest=# select * from foorep ;
 one | two
-----+-----
 d   | b
 d   | b
 d   | b
(3 rows)

This is the behavior I was expecting. As I said, I may have 
misunderstood the responses but it is acting as I would expect.


Thanks!

JD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
* Unless otherwise stated, opinions are my own.   *




Re: Logical replication without a Primary Key

2017-12-18 Thread Andres Freund
On 2017-12-18 12:43:24 -0800, Joshua D. Drake wrote:
> This is the behavior I was expecting. As I said, I may have misunderstood
> the responses but it is acting as I would expect.

Just to make sure: You're saying there's no problem here, and that
logical rep is behaving correctly, right?

FWIW, I wonder if we need to add a warning somewhere about FULL
replication, given it's essentially O(#changes * #rows) -> O(n^2) for
updating the whole table.

Greetings,

Andres Freund



Re: Logical replication without a Primary Key

2017-12-18 Thread Joshua D. Drake

On 12/18/2017 12:52 PM, Andres Freund wrote:


Just to make sure: You're saying there's no problem here, and that
logical rep is behaving correctly, right?


Correct. I am not sure where the miscommunication was (fully willing to 
accept it was on my side) but if I update multiple rows in a single 
statement, all the rows that were modified get replicated. That is the 
behavior I would have expected.



FWIW, I wonder if we need to add a warning somewhere about FULL
replication, given it's essentially O(#changes * #rows) -> O(n^2) for
updating the whole table.


The docs do mention it is a performance hit but something a little more 
"IF YOU DO THIS, BEWARE" may be good.


Thanks,

JD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
* Unless otherwise stated, opinions are my own.   *




Re: Logical replication without a Primary Key

2017-12-18 Thread Petr Jelinek
On 18/12/17 21:57, Joshua D. Drake wrote:
> On 12/18/2017 12:52 PM, Andres Freund wrote:
>>
>> Just to make sure: You're saying there's no problem here, and that
>> logical rep is behaving correctly, right?
> 
> Correct. I am not sure where the miscommunication was (fully willing to
> accept it was on my side) but if I update multiple rows in a single
> statement, all the rows that were modified get replicated. That is the
> behavior I would have expected.
> 

I think it's because we said that updating a single row on the
upstream will update a single row on the downstream even if there are
several identical rows on the downstream, which may be confusing in
the multiple-row case. That's still true. In your test, however, you
have identical rows on the downstream, but you also have the same
identical rows on the upstream, all of which you updated. So you got
one row update per upstream row, each of which was replicated and
changed one row.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: BUG #14941: Vacuum crashes

2017-12-18 Thread Masahiko Sawada
On Tue, Dec 19, 2017 at 2:35 AM, Bossart, Nathan  wrote:
> On 12/11/17, 9:41 PM, "Masahiko Sawada"  wrote:
>> For 0003 patch, if we do VACUUM (NOWAIT) to whole database we don't
>> get the above WARNING messages, but I think we should get them. The
>> reported issue also did VACUUM FULL and REINDEX to whole database.
>> Especially when VACUUM to whole database the information of skipped
>> relations would be useful for users.
>
> Here is a new set of patches.  0001 and 0002 are essentially the same
> as before except for a rebase and some small indentation fixes.  0003
> now includes logic to log all skipped relations regardless of whether
> they were specified in the command.  I've also modified the isolation
> test slightly to use partitioned tables instead of running VACUUM and
> ANALYZE on the whole database.  This helps prevent timeouts on build-
> farm machines with CLOBBER_CACHE_ALWAYS enabled (like with 0a3edbb3).
>

Thank you for updating the patches.

According to the following old comment, there might be a reason why we
didn't pass the information to vacuum_rel(). But your patch fetches the
relation name even if the "relation" is not provided. I wonder if that
can be a problem in some cases.

-     * If the RangeVar is not defined, we do not have enough information
-     * to provide a meaningful log statement.  Chances are that
-     * vacuum_rel's caller has intentionally not provided this information
-     * so that this logging is skipped, anyway.
+     * If relname is NULL, we do not have enough information to provide a
+     * meaningful log statement.  Chances are that this information was
+     * intentionally not provided so that this logging is skipped, anyway.


It would be another discussion, but autovacuum logs usually use
explicit names. Since the following two logs could be emitted by
autovacuum, I wonder if we can use an explicit relation name when we're
in autovacuum.

    if (elevel != 0)
    {
        if (!rel_lock)
            ereport(elevel,
                    (errcode(ERRCODE_LOCK_NOT_AVAILABLE),
                     errmsg("skipping vacuum of \"%s\" --- lock not available",
                            relname)));
        else
            ereport(elevel,
                    (errcode(ERRCODE_UNDEFINED_TABLE),
                     errmsg("skipping vacuum of \"%s\" --- relation no longer exists",
                            relname)));
    }

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: BUG #14941: Vacuum crashes

2017-12-18 Thread Bossart, Nathan
On 12/18/17, 3:30 PM, "Masahiko Sawada"  wrote:
> According to the following old comment, there might be reason why we
> didn't pass the information to vacuum_rel(). But your patch fetches
> the relation
> name even if the "relation" is not provided. I wonder if it can be
> problem in some cases.

Thanks for taking another look.

I've thought through a few edge cases here, but I haven't noticed
anything that I think is a problem.  If an unspecified relation is
renamed prior to get_rel_name(), we'll use the updated name, which
doesn't seem like an issue.  If an unspecified relation is renamed
between get_rel_name() and the log statement, we'll use the old name,
which seems possible in the current logging logic for VACUUM/ANALYZE.
And if an unspecified relation is dropped just prior to
get_rel_name(), the result will be NULL, and the logging will be
skipped (as it already is for concurrently dropped relations that are
not specified in the command).

Nathan



Re: [HACKERS] Parallel Hash take II

2017-12-18 Thread Andres Freund
On 2017-12-14 21:06:34 +1300, Thomas Munro wrote:
> On Thu, Dec 14, 2017 at 11:45 AM, Andres Freund  wrote:
> > +   if (accessor->write_pointer + accessor->sts->meta_data_size >=
> > +       accessor->write_end)
> > +       elog(ERROR, "meta-data too long");
> >
> > That seems more like an Assert than a proper elog? Given that we're
> > calculating size just a few lines above...
> 
> It's an error because the logic is not smart enough to split the
> optional meta-data and tuple size over multiple chunks.  I have added
> comments there to explain.

I don't see how that requires it to be an elog rather than an assert. As
far as I can tell this is only reachable if
meta_data_size + sizeof(uint32) >= STS_CHUNK_DATA_SIZE. For anything
smaller the sts_flush_chunk() a few lines above would have started a new
chunk.

IOW, we should detect this at sts_initialize() time, elog there, and
Assert() out in puttuple.

I've changed the code like that, fixed my own < vs >= confusion during
the conversion, fixed a typo, and pushed.  If you're not ok with that
change, we can easily whack this around.

I've not changed this, but I wonder whether we should rename
sts_puttuple() to sts_put_tuple(), for consistency with
sts_read_tuple(). Or the other way round. Or...

Greetings,

Andres Freund



Re: [HACKERS] Moving relation extension locks out of heavyweight lock manager

2017-12-18 Thread Masahiko Sawada
On Mon, Dec 18, 2017 at 2:04 PM, Masahiko Sawada  wrote:
> On Sun, Dec 17, 2017 at 12:27 PM, Robert Haas  wrote:
>> On Thu, Dec 14, 2017 at 5:45 AM, Masahiko Sawada  
>> wrote:
>>> Here is the result.
>>> I've measured the throughput in several cases on my virtual machine.
>>> Each client loads a 48k file into its own relation, located on either
>>> an xfs or an ext4 filesystem, for 30 sec.
>>>
>>> Case 1: COPYs to relations on different filesystems (xfs and ext4) and
>>> N_RELEXTLOCK_ENTS is 1024
>>>
>>> clients = 2, avg = 296.2068
>>> clients = 5, avg = 372.0707
>>> clients = 10, avg = 389.8850
>>> clients = 50, avg = 428.8050
>>>
>>> Case 2: COPYs to relations on different filesystems (xfs and ext4) and
>>> N_RELEXTLOCK_ENTS is 1
>>>
>>> clients = 2, avg = 294.3633
>>> clients = 5, avg = 358.9364
>>> clients = 10, avg = 383.6945
>>> clients = 50, avg = 424.3687
>>>
>>> And the result of current HEAD is following.
>>>
>>> clients = 2, avg = 284.9976
>>> clients = 5, avg = 356.1726
>>> clients = 10, avg = 375.9856
>>> clients = 50, avg = 429.5745
>>>
>>> In case 2, the throughput decreased compared to case 1, but it seems
>>> to be almost the same as current HEAD. Because acquiring and
>>> releasing the extension lock got x10 faster than current HEAD, as I
>>> mentioned before, the performance degradation may be smaller than I
>>> expected even in case 2.
>>> Since my machine doesn't have enough resources, the result for
>>> clients = 50 might not be valid.
>>
>> I have to admit that result is surprising to me.
>>
>
> I think the environment I used for performance measurement did not
> have enough resources. I will do the same benchmark on an another
> environment to see if it was a valid result, and will share it.
>

I did a performance measurement in a different environment that has 4
cores and two physically separated disk volumes. I also changed the
benchmark so that each COPY loads only 300 integer tuples, which do not
fit within a single page, and changed the tables to unlogged tables to
observe the overhead of locking/unlocking relation extension locks.

Case 1: COPYs to relations on different filesystems (xfs and ext4) and
N_RELEXTLOCK_ENTS is 1024

clients = 1, avg = 3033.8933
clients = 2, avg = 5992.9077
clients = 4, avg = 8055.9515
clients = 8, avg = 8468.9306
clients = 16, avg = 7718.6879

Case 2: COPYs to relations on different filesystems (xfs and ext4) and
N_RELEXTLOCK_ENTS is 1

clients = 1, avg = 3012.4993
clients = 2, avg = 5854.9966
clients = 4, avg = 7380.6082
clients = 8, avg = 7091.8367
clients = 16, avg = 7573.2904

And the result of current HEAD is following.

clients = 1, avg = 2962.2416
clients = 2, avg = 5856.9774
clients = 4, avg = 7561.1376
clients = 8, avg = 7252.0192
clients = 16, avg = 7916.7651

As per the above results, compared with current HEAD, the throughput
of case 1 increased by up to 17%. On the other hand, the throughput of
case 2 decreased by 2% to 5%.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

2017-12-18 Thread Michael Paquier
On Tue, Dec 19, 2017 at 5:11 AM, Masahiko Sawada  wrote:
> On Tue, Dec 19, 2017 at 3:52 AM, Fujii Masao  wrote:
>> Committed. Thanks!
>
> Thank you!

Thanks all. I can see that I have been credited as author as well,
though it seems to me that I played mainly a reviewer role.
-- 
Michael



Re: Tracking of page changes for backup purposes. PTRACK [POC]

2017-12-18 Thread Michael Paquier
On Mon, Dec 18, 2017 at 7:18 PM, Anastasia Lubennikova
 wrote:
> Patches for v 10.1 and v 9.6 are attached.

Why no patch for HEAD? If you are planning to show performance numbers
of some kind, you had better run on the latest development version,
which would also avoid interference from any load bottleneck that may
have been removed during the development of v10 or v11. At a quick
glance the proposed patch does not interfere with such areas.
-- 
Michael



Re: Tracking of page changes for backup purposes. PTRACK [POC]

2017-12-18 Thread Michael Paquier
On Tue, Dec 19, 2017 at 2:57 AM, Robert Haas  wrote:
> I think this doesn't really show much because it's apparently limited
> by the speed of fsync() on your filesystem.  You might try running the
> test with synchronous_commit=off.

You may want to run Postgres on scissors as much as possible by
decreasing checkpoint frequency, etc.

+/*
+ * Do not track changes for unlogged and temp relations,
+ * since we are not going to backup them anyway.
+ */
+if (rel->rd_rel->relpersistence != RELPERSISTENCE_PERMANENT)
+return;
This is not true for init forks of unlogged tables, which are logged
and included in backups.
-- 
Michael



Re: [HACKERS] Add support for tuple routing to foreign partitions

2017-12-18 Thread Amit Langote
On 2017/12/18 23:25, Alvaro Herrera wrote:
> (I wonder why
> this function needs a local variable "partition_check" -- seems
> pointless).

Before 15ce775faa4 [1], there was more than one line where
partition_check was being set, but maybe it still didn't need to be a
separate variable.

Thanks,
Amit

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=15ce775faa4




Statically linking ICU with Postgres

2017-12-18 Thread leoaaryan
I was able to link ICU library with postgres as shared objects using the
configure command:
./configure --prefix=/leoaaryan/postgres-10 ... --with-icu
ICU_CFLAGS="-I/leoaaryan/postgres-10/include"
ICU_LIBS="-L/leoaaryan/postgres-10/lib -licuuc -licudata -licui18n"

Now I'm trying to link ICU with postgres as a static library.
This is how I have compiled and installed ICU:
./runConfigureICU Linux/gcc --prefix=/leoaaryan/postgres-10
--enable-shared=no --enable-static
make
make install

I can see libicu*.a in the directory /leoaaryan/postgres-10/lib. But I'm not
able to build the Postgres source code against it.
./configure --prefix=/leoaaryan/postgres-10 ... --with-icu
ICU_CFLAGS="-I/leoaaryan/postgres-10/include"
ICU_LIBS="-L/leoaaryan/postgres-10/lib"

Is there a way to link ICU library to Postgres statically?





--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: Statically linking ICU with Postgres

2017-12-18 Thread Michael Paquier
On Tue, Dec 19, 2017 at 12:10 PM, leoaaryan  wrote:
> I can see libicu*.a in the directory /leoaaryan/postgres-10/lib. But I'm not
> able to make the postgres source code with it.
> ./configure --prefix=/leoaaryan/postgres-10 ... --with-icu
> ICU_CFLAGS="-I/leoaaryan/postgres-10/include"
> ICU_LIBS="-L/leoaaryan/postgres-10/lib"
>
> Is there a way to link ICU library to Postgres statically?

Why would you want to do that? This does not improve the user
experience and you are asking for more pain with your package
management by not relying on what the OS provides.
-- 
Michael



Re: [HACKERS] path toward faster partition pruning

2017-12-18 Thread David Rowley
On 12 December 2017 at 22:13, Amit Langote
 wrote:
> Attached updated patches.

Hi Amit,

I'm sorry to say this is another micro-review, covering code I stumbled
over while looking at the run-time partition pruning stuff.

1. In get_partitions_from_clauses_recurse(), since you're assigning
the result to the first input, the following should use
bms_add_members and not bms_union. The logical end result is the same,
but using bms_union means a wasted palloc and a small memory leak
within the memory context.

/*
* Partition sets obtained from mutually-disjunctive clauses are
* combined using set union.
*/
or_partset = bms_union(or_partset, arg_partset);


2. Also in get_partitions_from_clauses_recurse(), it might also be
worth putting in a bms_free(or_partset) after:

/*
* Partition sets obtained from mutually-conjunctive clauses are
* combined using set intersection.
*/
result = bms_intersect(result, or_partset);

Also, instead of using bms_intersect here, would it be better to do:

result = bms_del_members(result, or_partset); ?

That way you don't do a bms_copy and leak members for each OR branch,
since bms_intersect also does a bms_copy().

The resulting set could end up with a few more trailing 0 words than
what you have now, but it seems a better idea not to allocate a new set
each time.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] path toward faster partition pruning

2017-12-18 Thread David Rowley
On 19 December 2017 at 17:36, David Rowley  wrote:
> Also, instead of using bms_intersect here, would it be better to do:
>
> result = bms_del_members(result, or_partset); ?

I should have said bms_int_members() rather than bms_del_members()

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Statically linking ICU with Postgres

2017-12-18 Thread leoaaryan
IMO the whole idea behind building Postgres with ICU was to remove the
dependency on the OS package. Installing ICU as a shared object and
configuring Postgres against the libicu*.so files leaves a dependency on
LD_LIBRARY_PATH, and un-setting/re-setting LD_LIBRARY_PATH gives
inconsistent behavior. If I'm able to link the ICU library to Postgres
statically, I'll be able to remove these dependencies.

I know there are libraries that are statically linked with the Postgres
code, and even Postgres internally links modules statically. I'm aiming
to install ICU outside the Postgres source code directory and then link
it statically.






Re: [HACKERS] path toward faster partition pruning

2017-12-18 Thread David Rowley
On 12 December 2017 at 22:13, Amit Langote
 wrote:
> Attached updated patches.

Hi Amit,

I was also wondering about your thoughts on the design of
get_partitions_for_keys() and more generally how there are many
functions which have some special treatment doing something based on
->strategy == PARTITION_STRATEGY_XXX.

If I do:

git grep PARTITION_STRATEGY -- src/backend/catalog/partition.c | wc -l

I get 62 matches, most of which are case statements, and most of the
remainder are things like if (key->strategy ==
PARTITION_STRATEGY_HASH).

git grep --show-function PARTITION_STRATEGY -- src/backend/catalog/partition.c

shows that get_partitions_for_keys() is probably the most guilty of
having the most strategy condition tests.

Also, if we look at get_partitions_for_keys() there's an unconditional:

memset(hash_isnull, false, sizeof(hash_isnull));

which is only used for PARTITION_STRATEGY_HASH, but LIST and RANGE
must pay the price of that memset. Perhaps it's not expensive enough
to warrant only doing that when partkey->strategy ==
PARTITION_STRATEGY_HASH, but it does make me question if we should
have 3 separate functions for this and just have a case statement to
call the correct one.

I think if we were to put this off as something we'll fix later, then
the job would just become harder and harder as time goes on.

It might have been fine when we just had RANGE and LIST partitioning,
but I think HASH really tips the scales over to this being needed.

What do you think?

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2017-12-18 Thread Amit Langote
Hi.

On 2017/12/16 15:05, David Rowley wrote:
> On 13 December 2017 at 00:33, Beena Emerson  wrote:
>> PFA the updated patch, this can be applied over the v13 patches [1]
>> over commit 487a0c1518af2f3ae2d05b7fd23d636d687f28f3
> 
> Hi Beena,
> 
> Thanks for posting an updated patch.
> 
> I've been looking over this and I think that the use of the
> PartitionDispatch in set_append_subplan_indexes is not correct. What
> we need here is the index of the Append's subnode and that's not what
> RelationGetPartitionDispatchInfo() gives you. Remember that some
> partitions could have been pruned away already during planning.

A somewhat similar concern is being discussed on the "UPDATE partition
key" thread [1].  In that case, ExecInitModifyTable(), when initializing
tuple routing information to handle the "update partition key" case, will
have to deal with the fact that there might be fewer sub-plans in the
ModifyTable node than there are partitions in the partition tree.  That
is, the source partitions that the planner would have determined after
pruning could be fewer than the possible target partitions for rows from
the source partitions to move to, since the latter consist of *all*
partitions.
So, we have to have a mapping from leaf partition indexes as figured out
by RelationGetPartitionDispatchInfo() (indexes that are offsets into a
global array for *all* partitions), to sub-plan indexes which are offsets
into the array for only those partitions that have a sub-plan.  Such
mapping is built (per the latest patch on that thread) by
ExecSetupPartitionTupleRouting() in execPartition.c.

We could do something similar here using a similar code structure.  Maybe,
add a ExecSetupPartitionRuntimePruning() in execPartition.c (mimicking
ExecSetupPartitionTupleRouting), that accepts AppendState node.
Furthermore, it might be a good idea to have something similar to
ExecFindPartition(), say, ExecGetPartitions().  That is, we have new
functions for run-time pruning that are counterparts to corresponding
functions for tuple routing.

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/c5e1d4ad-d243-52c5-608b-5dbb7183e465%40lab.ntt.co.jp




Re: [HACKERS] Runtime Partition Pruning

2017-12-18 Thread Amit Langote
On 2017/12/09 0:57, Robert Haas wrote:
> On Thu, Dec 7, 2017 at 2:22 AM, Beena Emerson  wrote:
>> I have added the partition quals that are used for pruning.
>>
>> PFA the updated patch. I have changed the names of variables to make
>> it more appropriate, along with adding more code comments and doing
>> some refactoring and other code cleanups.
> 
> - I am surprised that set_append_subplan_indexes() needs to worry
> about multi-level partitioning directly.  I would have thought that
> Amit's patch would take care of that, just returning a Bitmapset of
> indexes which this function could use directly.

Actually, the partition.c code that my patch adds is limited to considering
one partitioned table at a time, not the whole tree.  As of 0a480502b09
[1], we call set_append_rel_size() separately for each partitioned table
in a partition tree.  In each such call, we call partition.c to perform
partition pruning for the given partitioned table.

In the run-time pruning case, we should get, via Append, a list of pruning
clauses for each partitioned table in the tree that survived plan-time
pruning.  Then, just like ExecFindPartition() calls
get_partition_for_tuple() for each partitioned table until we get to a
leaf partition, we should call partition.c for each un-pruned partitioned
table that has run-time pruning clauses associated with it.

Thanks,
Amit

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=0a480502b09




Re: [HACKERS] Custom compression methods

2017-12-18 Thread Ildus Kurbangaliev
On Thu, 14 Dec 2017 10:29:10 -0500
Robert Haas  wrote:

> On Wed, Dec 13, 2017 at 7:18 AM, Ildus Kurbangaliev
>  wrote:
> > Since we agreed on ALTER syntax, i want to clear things about
> > CREATE. Should it be CREATE ACCESS METHOD .. TYPE СOMPRESSION or
> > CREATE COMPRESSION METHOD? I like the access method approach, and it
> > simplifies the code, but I'm just not sure a compression is an
> > access method or not.  
> 
> +1 for ACCESS METHOD.

An access method then.

> 
> > Current implementation
> > --
> >
> > To avoid extra patches I also want to clear things about current
> > implementation. Right now there are two tables, "pg_compression" and
> > "pg_compression_opt". When compression method is linked to a column
> > it creates a record in pg_compression_opt. This record's Oid is
> > stored in the varlena. These Oids kept in first column so I can
> > move them in pg_upgrade but in all other aspects they behave like
> > usual Oids. Also it's easy to restore them.  
> 
> pg_compression_opt -> pg_attr_compression, maybe.
> 
> > Compression options linked to a specific column. When tuple is
> > moved between relations it will be decompressed.  
> 
> Can we do this only if the compression method isn't OK for the new
> column?  For example, if the old column is COMPRESS foo PRESERVE bar
> and the new column is COMPRESS bar PRESERVE foo, we don't need to
> force decompression in any case.

Thanks, sounds right, i will add it to the patch.

-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: [HACKERS] Add support for tuple routing to foreign partitions

2017-12-18 Thread Amit Langote
Fujita-san,

On 2017/12/12 21:21, Etsuro Fujita wrote:
> Hi Maksim,
> 
> (2017/12/12 17:57), Maksim Milyutin wrote:
>> Your patch already is not applied on master. Please rebase it.
> 
> Done.  Please find attached an updated version of the patch.

Thanks for sending the updated version of the patch.

I noticed that this patch introduces a partition_rels field in
ModifyTable, which contains the RT indexes of *all* leaf partitions in the
partition tree.  That means we now expand the partition inheritance tree
in the planner even in the INSERT case, simply because some of the leaf
partitions might be foreign tables which must be looked at by the planner.
 I'm somewhat concerned about the performance implications of that.  Now,
to insert even a single row into a partitioned table, which may not even
be routed to any of its foreign partitions, we are going to have to expand
the inheritance twice -- once by the planner to handle foreign partitions
and then by the executor when setting up the tuple routing information.

Is there any reason why we have to do things this way, besides the fact
that the PlanForeignModify() API appears to be callable only from within
a valid planner context?

Thanks,
Amit




Re: GSoC 2018

2017-12-18 Thread Aleksander Alekseev
Hello hackers,

Thank you a lot for your feedback. I modified the project description
to make it clearer that it implies augmenting an existing HA
solution, particularly Stolon, and doesn't imply solving existing
limitations of logical replication like the lack of DDL replication.

Removing some of these limitations, or adding logical replication
support to other existing HA applications like RepMgr, Patroni,
corosync/Pacemaker, or maybe Postgres-XL, sounds like a good idea for
other GSoC projects. I strongly encourage anyone who is willing to be a
mentor to add these ideas to the wiki.

Unfortunately I don't have any experience of using these applications
and have no idea how they behave in different corner cases like
netsplit, slow network, etc. Also I don't have much motivation to figure
this out because I'm pretty happy with Stolon. For these reasons I
personally can't be a mentor for corresponding projects.

Also today I'm going to contact Simone Gotti, the main developer of
Stolon, to let him know about this thread. I wonder what are his
thoughts on all this.

-- 
Best regards,
Aleksander Alekseev




Re: Small typo in comment in json_agg_transfn

2017-12-18 Thread Magnus Hagander
On Sun, Dec 17, 2017 at 2:43 PM, David Rowley 
wrote:

> The attached fixed a small typo in json_agg_transfn.
>

Applied, thanks.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] Runtime Partition Pruning

2017-12-18 Thread David Rowley
On 18 December 2017 at 21:31, Amit Langote
 wrote:
> On 2017/12/16 15:05, David Rowley wrote:
>> I've been looking over this and I think that the use of the
>> PartitionDispatch in set_append_subplan_indexes is not correct. What
>> we need here is the index of the Append's subnode and that's not what
>> RelationGetPartitionDispatchInfo() gives you. Remember that some
>> partitions could have been pruned away already during planning.
>
> A somewhat similar concern is being discussed on the "UPDATE partition
> key" thread [1].  In that case, ExecInitModifyTable(), when initializing
> tuple routing information to handle the "update partition key" case, will
> have to deal with the fact that there might be fewer sub-plans in the
> ModifyTable node than there are partitions in the partition tree.  That
> is, source partitions that planner would have determined after pruning,
> could be fewer than possible target partitions for rows from the source
> partitions to move to, of which the latter consists of *all* partitions.
> So, we have to have a mapping from leaf partition indexes as figured out
> by RelationGetPartitionDispatchInfo() (indexes that are offsets into a
> global array for *all* partitions), to sub-plan indexes which are offsets
> into the array for only those partitions that have a sub-plan.  Such
> mapping is built (per the latest patch on that thread) by
> ExecSetupPartitionTupleRouting() in execPartition.c.

Surely this is a different problem? With UPDATE of a partition key, if
the planner eliminates all but 1 partition the UPDATE could cause that
tuple to be "moved" into any leaf partition, very possibly one that's
been eliminated during planning.

In the case of runtime Append pruning, we can forget about all
partitions that the planner managed to eliminate, we'll never need to
touch those, ever. All we care about here is trying to reduce the
number of partitions down further using values that were not available
during planning.

> We could do something similar here using a similar code structure.  Maybe,
> add a ExecSetupPartitionRuntimePruning() in execPartition.c (mimicking
> ExecSetupPartitionTupleRouting), that accepts AppendState node.
> Furthermore, it might be a good idea to have something similar to
> ExecFindPartition(), say, ExecGetPartitions().  That is, we have new
> functions for run-time pruning that are counterparts to corresponding
> functions for tuple routing.

Seems to me in this case we're better to build this structure during
planning and save it with the plan so that it can be used over and
over, rather than building it again and again each time the plan is
executed. Likely a common use case for run-time pruning is when the
plan is going to be used multiple times with different parameters, so
we really don't want to repeat any work that we don't have to here.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: genomic locus

2017-12-18 Thread PostgreSQL - Hans-Jürgen Schönig
hello …

maybe this one is also helpful: 
https://wiki.postgresql.org/images/1/1b/Postbis_pgcon_eu_2012.pdf
it seems they have put a lot of work into that.

regards,

hans


> On 15 Dec 2017, at 20:49, Gene Selkov  wrote:
> 
> Greetings everyone,
> 
> I need a data type to represent genomic positions, which will consist of a 
> string and a pair of integers with interval logic and access methods. Sort of 
> like my seg type, but more straightforward.
> 
> I noticed somebody took good care of seg while I was away for the last 20 
> years, and I am extremely grateful for that. I have been using it. In the 
> meantime, things have changed and now I am almost clueless about how you deal 
> with contributed modules and what steps I should take once I get it to work. 
> Also, is it a good idea to clone and fix seg for this purpose, or is there a 
> more appropriate template? Or maybe just building it from scratch will be a 
> better idea?
> 
> I have seen a lot of bit rot in other extensions (never contributed) that I 
> have not maintained since 2009, and now I am unable to fix some of them, so 
> I wonder how much of old knowledge is still applicable. In other words, is 
> what I see in new code just a change of macros or the change of principles?
> 
> Thanks,
> 
> --Gene

--
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de , 
http://www.cybertec.at 






Re: [HACKERS] Add support for tuple routing to foreign partitions

2017-12-18 Thread Etsuro Fujita

(2017/12/18 18:14), Amit Langote wrote:

I noticed that this patch introduces a partition_rels field in
ModifyTable, which contains the RT indexes of *all* leaf partitions in the
partition tree.  That means we now expand the partition inheritance tree
in the planner even in the INSERT case, simply because some of the leaf
partitions might be foreign tables which must be looked at by the planner.


Yeah, the patch expands the inheritance tree at planning time as well to 
build an RTE for each partition so that the FDW can use that RTE, e.g., 
when called from PlanForeignModify.



  I'm somewhat concerned about the performance implications of that.  Now,
to insert even a single row into a partitioned table, which may not even
be routed to any of its foreign partitions, we are going to have to expand
the inheritance tree twice -- once by the planner to handle foreign partitions
and then by the executor when setting up the tuple routing information.

Is there any reason why we have to do things this way, besides the fact
that the PlanForeignModify() API appears to be callable from only within a
valid planner context?


Another reason is that set_plan_references needs such RTEs to record 
plan dependencies, so that plancache.c invalidates cached plans for 
foreign partitions.


I suspect that we could avoid the planning-time expansion by doing some 
optimization when inserting a single row into a partitioned table.


Best regards,
Etsuro Fujita



Re: Tracking of page changes for backup purposes. PTRACK [POC]

2017-12-18 Thread Aleksander Alekseev
Hello Anastasia,

Personally I'm very glad PTRACK is finally proposed to be ported to the
community PostgreSQL.

> Since ptrack is basically just an API for use in backup tools, it is
> impossible to test the patch independently.

I believe it's worth creating an extension that provides access to
PTRACK's public API. Even if not many users are interested in this
extension, we should have at least some basic tests like "if we
write to the table, its pages change", "if we update a tuple,
corresponding indexes change", "if we clean up a bitmap, PTRACK says there
are no changed pages", etc.

-- 
Best regards,
Aleksander Alekseev


signature.asc
Description: PGP signature


Re: Tracking of page changes for backup purposes. PTRACK [POC]

2017-12-18 Thread Aleksander Alekseev
Hello Anastasia,

> ptrack_9.6.6_v1.4.patch

Also I'm surprised you proposed a patch for 9.6. Since PTRACK is a new
feature I don't believe we are going to backport it.

-- 
Best regards,
Aleksander Alekseev


signature.asc
Description: PGP signature


File name as application name in regression tests and elsewhere

2017-12-18 Thread Andrew Dunstan

I was doing some work over the weekend and it occurred to me that it
would be quite nice to have the input file name from regression tests
set as the application name, and then use a log_line_prefix setting
including %a, so that this would appear on the logs.


My first thought was to alter all the tests with something like "SET
application_name = 'testname.sql';", but then I thought maybe a better way
would be to provide psql with a switch (--file-application-name ?) that
would do this for us, and use it in the regression suite. That should be
a pretty small patch and could be more generally useful.


Thoughts?


cheers


andrew

-- 
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: File name as application name in regression tests and elsewhere

2017-12-18 Thread Aleksander Alekseev
Hello Andrew,

> I was doing some work over the weekend and it occurred to me that it
> would be quite nice to have the input file name from regression tests
> set as the application name, and then use a log_line_prefix setting
> including %a, so that this would appear on the logs.
> 
> My first thought was to alter all the tests with something like "SET
> application_name = 'testname.sql';", but then I thought maybe a better way
> would be to provide psql with a switch (--file-application-name ?) that
> would do this for us, and use it in the regression suite. That should be
> a pretty small patch and could be more generally useful.
> 
> Thoughts?

Personally, I see how it could be useful and don't see any drawbacks of
such a patch. I think it's a good idea.

-- 
Best regards,
Aleksander Alekseev


signature.asc
Description: PGP signature


Re: Tracking of page changes for backup purposes. PTRACK [POC]

2017-12-18 Thread Andrey Borodin
Hello!

Thanks for sharing this work! I think this is an important feature for
making backups more efficient.
> On 18 Dec 2017, at 15:18, Anastasia Lubennikova 
>  wrote:
> 
> Patches for v 10.1 and v 9.6 are attached.
> Since ptrack is basically just an API for use in backup tools, it is
> impossible to test the patch independently.
> Now it is integrated with our backup utility, called pg_probackup. You can
> find it here: https://github.com/postgrespro/pg_probackup
I can add experimental support for WAL-G as well; we have QA tools for 
delta backups too.
> 
> Spoiler: Please consider this patch and README as a proof of concept. It
> can be improved in some ways, but in its current state PTRACK is a
> stable prototype, reviewed and tested well enough to find many
> non-trivial corner cases and subtle problems. And any discussion of
> change track algorithm must be aware of them. Feel free to share your
> concerns and point out any shortcomings of the idea or the implementation.
This version of the patch is quite big - basically it pairs most 
START_CRIT_SECTION() calls with PTRACK calls.
I have two concerns about this:
1. Does this affect the performance of the database when PTRACK is not enabled?
2. What is the cost of having PTRACK enabled?

Best regards, Andrey Borodin.


Re: genomic locus

2017-12-18 Thread Craig Ringer
On 16 December 2017 at 03:49, Gene Selkov  wrote:

> Greetings everyone,
>
> I need a data type to represent genomic positions, which will consist of a
> string and a pair of integers with interval logic and access methods. Sort
> of like my seg type, but more straightforward.
>
> I noticed somebody took good care of seg while I was away for the last
> 20 years, and I am extremely grateful for that. I have been using it. In
> the meantime, things have changed and now I am almost clueless about how
> you deal with contributed modules and what steps I should take once I get
> it to work. Also, is it a good idea to clone and fix seg for this purpose,
> or is there a more appropriate template? Or maybe just building it from
> scratch will be a better idea?
>
> I have seen a lot of bit rot in other extensions (never contributed) that
> I have not maintained since 2009, and now I am unable to fix some of them,
> so I wonder how much of old knowledge is still applicable. In other words,
> is what I see in new code just a change of macros or the change of
> principles?
>

Welcome back!

I think your chances of modifying 'seg' are slim, since you'd surely have
to change the on-disk format and that'd break existing users. Plus probably
change the API.

If you think it'd make logical sense to extend seg with a string descriptor
of some sort and could come up with a name/use case that's not quite so
narrowly focused as genetics alone, then I could see adding it as a
secondary type in the same extension.

But it's more likely that the best course would be to extract the seg
extension from core, rename it, hack it as desired, and build it as an
extension maintained out-of-tree.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


RE: User defined data types in Logical Replication

2017-12-18 Thread Huong Dangminh
Hi Sawada-san,

> Sent: Tuesday, December 12, 2017 9:41 AM
> To: Dang Minh Huong 
> Cc: PostgreSQL Hackers ; Yanagisawa
> Hiroshi(柳澤 博) ; Dangminh Huong(ダンM
> フーン) 
> Subject: Re: User defined data types in Logical Replication
> 
> On Sun, Dec 10, 2017 at 12:33 AM, Dang Minh Huong 
> wrote:
> > On 2017/12/08 13:18, Huong Dangminh wrote:
> >
> >> Hi Sawada-san,
> >>
> >>> On Thu, Dec 7, 2017 at 11:07 AM, Masahiko Sawada
> >>> 
> >>> wrote:
> 
>  On Thu, Dec 7, 2017 at 12:23 AM, Dang Minh Huong
>  
> >>>
> >>> wrote:
> >
> > Hi Sawada-san,
> >
> > Sorry for my late response.
> >
> > On 2017/12/05 0:11, Masahiko Sawada wrote:
> >
> > There is one more case that user-defined data type is not
> > supported in Logical Replication.
> > That is when remote data type's name does not exist in SUBSCRIBE.
> >
> > In relation.c:logicalrep_typmap_gettypname
> > We search OID in syscache by remote's data type name and mapping
> > it, if it does not exist in syscache We will be faced with the
> > bellow error.
> >
> >  if (!OidIsValid(entry->typoid))
> >  ereport(ERROR,
> >
> > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> >   errmsg("data type \"%s.%s\"
> > required for logical replication does not exist",
> >  entry->nspname,
> > entry->typname)));
> >
> > I think, it is not necessary to check typoid here in order to
> > avoid above case, is that right?
> >
> > I think it's not right. We should end up with an error in the case
> > where the same type name doesn't exist on subscriber. With your
> > proposed patch,
> > logicalrep_typmap_gettypname() can return an invalid string
> > (entry->typname) in that case, which can be a cause of SEGV.
> >
> > Thanks, I think you are right.
> > # I thought that entry->typname was received from walsender, and
> > it is already be qualified in logicalrep_write_typ.
> > # But we also need check it in subscriber, because it could be
> > lost info in transmission.
> 
>  Oops, the last sentence of my previous mail was wrong.
>  logicalrep_typmap_gettypname() doesn't return an invalid string
>  since
>  entry->typname is initialized with a type name got from wal sender.
> >
> > Yeah, so we do not need to check the existing of publisher's type name
> > in subscriber.
> 
>  After more thought, we might not need to raise an error even if
>  there is not the same data type on both publisher and subscriber.
>  Because data is sent after converted to the text representation and
>  is converted to a data type according to the local table definition
>  subscribers don't always need to have the same data type. If it's
>  right, slot_store_error_callback() doesn't need to find a
>  corresponding local data type OID but just finds the corresponding
>  type name by remote data type OID. What do you think?
> >
> > I totally agree. It will make logical replication more flexible with
> > data type.
> 
>  --- a/src/backend/replication/logical/worker.c
>  +++ b/src/backend/replication/logical/worker.c
>  @@ -100,8 +100,9 @@ static dlist_head lsn_mapping =
>  DLIST_STATIC_INIT(lsn_mapping);
> 
>    typedef struct SlotErrCallbackArg
>    {
>  -LogicalRepRelation *rel;
>  -intattnum;
>  +LogicalRepRelMapEntry *rel;
>  +intremote_attnum;
>  +intlocal_attnum;
>    } SlotErrCallbackArg;
> 
>  Since LogicalRepRelMapEntry has a map of local attributes to remote
>  ones we don't need to have two attribute numbers.
> 
> >> Sorry for the late reply.
> >>
> >>> Attached the patch incorporated what I have on mind. Please review it.
> >>
> >> Thanks for the patch, I will do it at this weekend.
> >
> > Your patch is fine for me.
> > But logicalrep_typmap_getid will be unused.
> 
> Yeah, actually we don't use LogicalRepTyp.typoid as well. If we allow to
> replicate data to an another data type of column, what is the data type
> mapping on subscriber for? It seems enough to have remote data type oid,
> namespace and name.

Yeah, so the type map loses its meaning then, doesn't it?
I updated the patch, removing LogicalRepTyp.typoid and changing the
concept from "typmap" to "remote type".
What do you think?

---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/





fix_slot_store_error_callback_v6.patch
Description: fix_slot_store_error_callback_v6.patch


RE: User defined data types in Logical Replication

2017-12-18 Thread Huong Dangminh
Hi Sawada-san,

> Yeah, so the type map loses its meaning then, doesn't it?
> I updated the patch, removing LogicalRepTyp.typoid and changing the
> concept from "typmap" to "remote type".
> What do you think?

Sorry, please check v7 of the patch (attached to this mail).


---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/





fix_slot_store_error_callback_v7.patch
Description: fix_slot_store_error_callback_v7.

Re: Tracking of page changes for backup purposes. PTRACK [POC]

2017-12-18 Thread Aleksander Alekseev
Hello hackers,

> I have two concerns about this:
> 1. Does this affect the performance of the database when PTRACK is not 
> enabled?
> 2. What is the cost of having PTRACK enabled?

I played with this patch a bit and did a simple benchmark on my laptop.

Configuration:

```
make distclean && ./configure prefix=/some/path/ && make -s -j4
```

postgresql.conf:

```
max_prepared_transactions = 100
wal_level = logical
wal_keep_segments = 128
max_connections = 100
wal_log_hints = on
max_wal_senders = 8
wal_keep_segments = 64
listen_addresses = '*'
hot_standby = on
log_statement = all
max_locks_per_transaction = 256
shared_buffers = 1GB
```

The benchmark:

```
pgbench -i && pgbench -j 4 -c 4 -T 300 -P 10
```

Here are the results.

10.1, ptrack_enable=false

transaction type: 
scaling factor: 1
query mode: simple
number of clients: 4
number of threads: 4
duration: 300 s
number of transactions actually processed: 28772
latency average = 41.705 ms
latency stddev = 18.274 ms
tps = 95.895605 (including connections establishing)
tps = 95.906434 (excluding connections establishing)

10.1, ptrack_enable=true

scaling factor: 1
query mode: simple
number of clients: 4
number of threads: 4
duration: 300 s
number of transactions actually processed: 28607
latency average = 41.951 ms
latency stddev = 18.454 ms
tps = 95.344363 (including connections establishing)
tps = 95.345290 (excluding connections establishing)


10.1, without ptrack

transaction type: 
scaling factor: 1
query mode: simple
number of clients: 4
number of threads: 4
duration: 300 s
number of transactions actually processed: 28622
latency average = 41.928 ms
latency stddev = 18.238 ms
tps = 95.396155 (including connections establishing)
tps = 95.397406 (excluding connections establishing)


At first glance PTRACK doesn't seem to affect the overall performance
significantly.

There are a few minor issues with the patch. There is a missing '/'
symbol in the comment before ptrack_get_and_clear procedure:

```
 * Get ptrack file as bytea and clear it */
bytea *
ptrack_get_and_clear(Oid tablespace_oid, Oid table_oid)
{
```

Also I believe the patch should include some changes of
postgresql.conf.sample.

I suggest adding this patch to the next commitfest; otherwise it may
get lost.

-- 
Best regards,
Aleksander Alekseev


signature.asc
Description: PGP signature


Re: User defined data types in Logical Replication

2017-12-18 Thread Petr Jelinek
Hi,

On 18/12/17 14:28, Huong Dangminh wrote:

>>>> Your patch is fine for me.
>>>> But logicalrep_typmap_getid will be unused.
>>>
>>> Yeah, actually we don't use LogicalRepTyp.typoid as well. If we allow
>>> to replicate data to an another data type of column, what is the data
>>> type mapping on subscriber for? It seems enough to have remote data
>>> type oid, namespace and name.
>>
>> Yeah, so the type map loses its meaning then, doesn't it?
>> I updated the patch, removing LogicalRepTyp.typoid and changing the
>> concept from "typmap" to "remote type".
>> What do you think?
> 
> Sorry, Please confirm V7 of patch (attached in this mail).
> 

I think the changes make sense in terms of how it all works now.

That said, I don't think the renaming idea is a good one. The naming was
chosen to be future-proof: eventually we'll need to map remote types to
local OIDs (and possibly more), with the local info cached so that we
can interpret the binary representation of replicated data (which we'll
add at some point, since it's a big performance boost).

So I am afraid that if we do the rename of typmap to remotetype in this
patch it will a) make backports of fixes in the related code harder, b)
force us to rename it back again in the future.

I'd keep your general approach but keep using typmap naming.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: GSoC 2018

2017-12-18 Thread Stephen Frost
Aleksander,

* Aleksander Alekseev (a.aleks...@postgrespro.ru) wrote:
> Thanks you a lot for your feedback. I modified the project description
> to make it more clear that it implies augmenting an existing HA
> solution, particularly Stolon, and doesn't imply solving existing
> limitations of logical replication like lack of DDL replication.

Sounds great, thanks!

> Removing some of these limitations or adding logical replication support
> to other existing HA applications like RepMgr or Patroni or corosync /
> Pacemaker or maybe Postgres-XL sounds like good ideas for other GSoC
> projects. I strongly encourage to add these ideas to wiki anyone who is
> willing to be a mentor.

Yes, it'd be great for anyone who can mentor a project along these lines
to add it to the wiki.

> Also today I'm going to contact Simone Gotti, the main developer of
> Stolon, to let him know about this thread. I wonder what are his
> thoughts on all this.

Great.

Thanks again!

Stephen


signature.asc
Description: Digital signature


RE: User defined data types in Logical Replication

2017-12-18 Thread Huong Dangminh
Hi Petr Jelinek, Sawada-san

> I think the changes make sense in terms of how it all works now.
> 
> That said I don't think the renaming idea is a good one, the naming was
> chosen to be future proof because eventually we'll need to map types to
> local oid (and possibly more) where the local info is cached so that we
> can interpret binary representation of replicated data (which we'll add
> at some point since it's big performance boost).
> 
> So I am afraid that if we do the rename of typmap to remotetype in this
> patch it will a) make backports of fixes in the related code harder, b)
> force us to rename it back again in the future.

Thanks for your comment.

> I'd keep your general approach but keep using typmap naming.

I updated the patch as Petr Jelinek suggested, keeping the typmap naming.


---
Thanks and best regards,
Dang Minh Huong
NEC Solution Innovators, Ltd.
http://www.nec-solutioninnovators.co.jp/en/


fix_slot_store_error_callback_v8.patch
Description: fix_slot_store_error_callback_v8.patch


Re: [HACKERS] Add support for tuple routing to foreign partitions

2017-12-18 Thread Alvaro Herrera
InitResultRelInfo becomes unintelligible after this patch -- it was
straightforward but adding partition_root makes things shaky.  Please
add a proper comment indicating what each argument is.  (I wonder why
this function needs a local variable "partition_check" -- seems
pointless).

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] [PATCH] Improve geometric types

2017-12-18 Thread Alvaro Herrera
Does this patch series fix bug #14971?
https://www.postgresql.org/message-id/20171213172806.20142.73...@wrigleys.postgresql.org

Eric: see
https://www.postgresql.org/message-id/CAE2gYzzvp=uvpw4fytoaevyk-wze4jw7u2s1glrok35mr1-...@mail.gmail.com
for last version of patch.

Motivation for patch is at
https://www.postgresql.org/message-id/CAE2gYzxF7-5djV6-cEvqQu-fNsnt%3DEqbOURx7ZDg%2BVv6ZMTWbg%40mail.gmail.com

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



pltcl valgrind output

2017-12-18 Thread Andrew Dunstan

I've been adding support for valgrind to the buildfarm client (code will
hit the git repo shortly). Mostly the results have been pretty clean,
but the pltcl tests generated the attached output. Perhaps someone with
more valgrind-fu than I have so far would like to use the information to
extend our supp file appropriately (or fix what it's complaining about).


cheers


andrew


-- 
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

==10747== Conditional jump or move depends on uninitialised value(s)
==10747==at 0x19A0AD6F: ??? (in /usr/lib64/libtcl8.6.so)
==10747==by 0x1997E7B6: TclNRRunCallbacks (in /usr/lib64/libtcl8.6.so)
==10747==by 0x19A93F4C: ??? (in /usr/lib64/libtcl8.6.so)
==10747==by 0x19981DB2: ??? (in /usr/lib64/libtcl8.6.so)
==10747==by 0x1997E7B6: TclNRRunCallbacks (in /usr/lib64/libtcl8.6.so)
==10747==by 0x199808C1: ??? (in /usr/lib64/libtcl8.6.so)
==10747==by 0x19980E72: Tcl_EvalEx (in /usr/lib64/libtcl8.6.so)
==10747==by 0x19980E94: Tcl_Eval (in /usr/lib64/libtcl8.6.so)
==10747==by 0x19A88C4E: ??? (in /usr/lib64/libtcl8.6.so)
==10747==by 0x19982E8F: Tcl_CreateInterp (in /usr/lib64/libtcl8.6.so)
==10747==by 0x19732869: _PG_init (pltcl.c:432)
==10747==by 0x995105: internal_load_library (dfmgr.c:281)
==10747==by 0x994B8D: load_external_function (dfmgr.c:105)
==10747==by 0x57149B: fmgr_c_validator (pg_proc.c:819)
==10747==by 0x998459: OidFunctionCall1Coll (fmgr.c:1327)
==10747==by 0x57117A: ProcedureCreate (pg_proc.c:722)
==10747==by 0x62C700: CreateProceduralLanguage (proclang.c:122)
==10747==by 0x83C598: ProcessUtilitySlow (utility.c:1489)
==10747==by 0x83B580: standard_ProcessUtility (utility.c:932)
==10747==by 0x83A6F4: ProcessUtility (utility.c:357)
==10747==  Uninitialised value was created by a stack allocation
==10747==at 0x19A0AAFD: ??? (in /usr/lib64/libtcl8.6.so)
==10747== 
{
   
   Memcheck:Cond
   obj:/usr/lib64/libtcl8.6.so
   fun:TclNRRunCallbacks
   obj:/usr/lib64/libtcl8.6.so
   obj:/usr/lib64/libtcl8.6.so
   fun:TclNRRunCallbacks
   obj:/usr/lib64/libtcl8.6.so
   fun:Tcl_EvalEx
   fun:Tcl_Eval
   obj:/usr/lib64/libtcl8.6.so
   fun:Tcl_CreateInterp
   fun:_PG_init
   fun:internal_load_library
   fun:load_external_function
   fun:fmgr_c_validator
   fun:OidFunctionCall1Coll
   fun:ProcedureCreate
   fun:CreateProceduralLanguage
   fun:ProcessUtilitySlow
   fun:standard_ProcessUtility
   fun:ProcessUtility
}
==10747== Conditional jump or move depends on uninitialised value(s)
==10747==at 0x19A0AD6F: ??? (in /usr/lib64/libtcl8.6.so)
==10747==by 0x1997E7B6: TclNRRunCallbacks (in /usr/lib64/libtcl8.6.so)
==10747==by 0x199808C1: ??? (in /usr/lib64/libtcl8.6.so)
==10747==by 0x19980E72: Tcl_EvalEx (in /usr/lib64/libtcl8.6.so)
==10747==by 0x19980E94: Tcl_Eval (in /usr/lib64/libtcl8.6.so)
==10747==by 0x19A88C61: ??? (in /usr/lib64/libtcl8.6.so)
==10747==by 0x19982E8F: Tcl_CreateInterp (in /usr/lib64/libtcl8.6.so)
==10747==by 0x19732869: _PG_init (pltcl.c:432)
==10747==by 0x995105: internal_load_library (dfmgr.c:281)
==10747==by 0x994B8D: load_external_function (dfmgr.c:105)
==10747==by 0x57149B: fmgr_c_validator (pg_proc.c:819)
==10747==by 0x998459: OidFunctionCall1Coll (fmgr.c:1327)
==10747==by 0x57117A: ProcedureCreate (pg_proc.c:722)
==10747==by 0x62C700: CreateProceduralLanguage (proclang.c:122)
==10747==by 0x83C598: ProcessUtilitySlow (utility.c:1489)
==10747==by 0x83B580: standard_ProcessUtility (utility.c:932)
==10747==by 0x83A6F4: ProcessUtility (utility.c:357)
==10747==by 0x60E62C: execute_sql_string (extension.c:763)
==10747==by 0x60EAB2: execute_extension_script (extension.c:924)
==10747==by 0x60F983: CreateExtensionInternal (extension.c:1529)
==10747==  Uninitialised value was created by a stack allocation
==10747==at 0x19A0AAFD: ??? (in /usr/lib64/libtcl8.6.so)
==10747== 
{
   
   Memcheck:Cond
   obj:/usr/lib64/libtcl8.6.so
   fun:TclNRRunCallbacks
   obj:/usr/lib64/libtcl8.6.so
   fun:Tcl_EvalEx
   fun:Tcl_Eval
   obj:/usr/lib64/libtcl8.6.so
   fun:Tcl_CreateInterp
   fun:_PG_init
   fun:internal_load_library
   fun:load_external_function
   fun:fmgr_c_validator
   fun:OidFunctionCall1Coll
   fun:ProcedureCreate
   fun:CreateProceduralLanguage
   fun:ProcessUtilitySlow
   fun:standard_ProcessUtility
   fun:ProcessUtility
   fun:execute_sql_string
   fun:execute_extension_script
   fun:CreateExtensionInternal
}
==10834== Conditional jump or move depends on uninitialised value(s)
==10834==at 0x1A20AD6F: ??? (in /usr/lib64/libtcl8.6.so)
==10834==by 0x1A17E7B6: TclNRRunCallbacks (in /usr/lib64/libtcl8.6.so)
==10834==by 0x1A293F4C: ??? (in /usr/lib64/libtcl8.6.so)
==10834==by 0x1A181DB2: ??? (in /usr/lib64/libtcl8.6.so)
==10834==by 0x1A17E7B6: TclNRRunCallbacks (in /usr/lib64/libtcl8.6.so)
==10834==by 0x1A1808C1

Re: [HACKERS] [PATCH] Improve geometric types

2017-12-18 Thread Emre Hasegeli
> Does this patch series fix bug #14971?

No, it doesn't.  I am trying to improve things without touching the EPSILON.



Re: [HACKERS] Custom compression methods

2017-12-18 Thread Tomas Vondra


On 12/17/2017 04:32 AM, Robert Haas wrote:
> On Thu, Dec 14, 2017 at 12:23 PM, Tomas Vondra
>  wrote:
>> Can you give an example of such algorithm? Because I haven't seen such
>> example, and I find arguments based on hypothetical compression methods
>> somewhat suspicious.
>>
>> FWIW I'm not against considering such compression methods, but OTOH it
>> may not be such a great primary use case to drive the overall design.
> 
> Well it isn't, really.  I am honestly not sure what we're arguing
> about at this point.  I think you've agreed that (1) opening avenues
> for extensibility is useful, (2) substitution a general-purpose
> compression algorithm could be useful, and (3) having datatype
> compression that is enabled through TOAST rather than built into the
> datatype might sometimes be desirable.  That's more than adequate
> justification for this proposal, whether half-general compression
> methods exist or not.  I am prepared to concede that there may be no
> useful examples of such a thing.
> 

I don't think we're arguing - we're discussing if a proposed patch is
the right design solving relevant use cases.

I personally am not quite convinced about that, for the reason I tried
to explain in my previous messages. I see it as a poor alternative to
compression built into the data type. I do like the idea of compression
with external dictionary, however.

But don't forget that it's not me in this thread - it's my evil twin,
moonlighting as Mr. Devil's lawyer ;-)

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-12-18 Thread Alvaro Herrera
After this discussion, this is how I see things working:

1. pg_dump
   a) creates indexes on partitions normally
   b) once all existing indexes are done, index on parent is created,
  with ONLY.  No cascading occurs, no indexes are attached.
   c) ATTACH is run for each existing partition index.  After each
  ATTACH, we check that all indexes exist.  If so, the parent is
  marked valid.
   d) if not all indexes existed in partitions, index on parent remains
  invalid.  (It was invalid in the dumped database, so this is
  correct.)

2. other uses
   Normal CREATE INDEX (without ONLY) recurses and attaches the first
   matching index it finds (no duplicate indexes are created);
   partitions without a matching index get one created.

3. ALTER INDEX DETACH is not provided.  Therefore: once index is valid,
   it remains so forever.

I think this satisfies all concerns.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-12-18 Thread Robert Haas
On Mon, Dec 18, 2017 at 11:02 AM, Alvaro Herrera
 wrote:
> After this discussion, this is how I see things working:
>
> 1. pg_dump
>a) creates indexes on partitions normally
>b) once all existing indexes are done, index on parent is created,
>   with ONLY.  No cascading occurs, no indexes are attached.
>c) ATTACH is run for each existing partition index.  After each
>   ATTACH, we check that all indexes exist.  If so, the parent is
>   marked valid.
>d) if not all indexes existed in partitions, index on parent remains
>   invalid.  (It was invalid in the dumped database, so this is
>   correct.)
>
> 2. other uses
>Normal CREATE INDEX (without ONLY) recurses and attaches the first
>matching index it finds (no duplicate indexes are created);
>partitions without a matching index get one created.
>
> 3. ALTER INDEX DETACH is not provided.  Therefore: once index is valid,
>it remains so forever.
>
> I think this satisfies all concerns.

Sounds great to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Custom compression methods

2017-12-18 Thread Robert Haas
On Mon, Dec 18, 2017 at 10:43 AM, Tomas Vondra
 wrote:
> I personally am not quite convinced about that, for the reason I tried
> to explain in my previous messages. I see it as a poor alternative to
> compression built into the data type. I do like the idea of compression
> with external dictionary, however.

I think that compression built into the datatype and what is proposed
here are both useful and everybody's free to work on either one as the
prefer, so I don't see that as a reason not to accept this patch.  And
I think this patch can be a stepping stone toward compression with an
external dictionary, so that seems like an affirmative reason to
accept this patch.

> But don't forget that it's not me in this thread - it's my evil twin,
> moonlighting as Mr. Devil's lawyer ;-)

Well, I don't mind you objecting to the patch under any persona, but
so far I'm not finding your reasons convincing...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Protect syscache from bloating with negative cache entries

2017-12-18 Thread Andres Freund
On 2017-12-17 19:23:45 -0500, Robert Haas wrote:
> On Sat, Dec 16, 2017 at 11:42 PM, Andres Freund  wrote:
> >> I'm not sure we should regard very quick bloating as a problem in need
> >> of solving.  Doesn't that just mean we need the cache to be bigger, at
> >> least temporarily?
> >
> > Leaving that aside, is that actually not at least to a good degree,
> > solved by that problem? By bumping the generation on hash resize, we
> > have recency information we can take into account.
>
> I agree that we can do it.  I'm just not totally sure it's a good
> idea.  I'm also not totally sure it's a bad idea, either.  That's why
> I asked the question.

I'm not 100% convinced either - but I also don't think it matters all
that terribly much. As long as the overall hash hit rate is decent,
minor increases in the absolute number of misses don't really matter
that much for syscache imo.  I'd personally go for something like:

1) When about to resize, check if there are entries two generations old
   around.

   Don't resize if more than 15% of entries could be freed. Also, stop
   reclaiming at that threshold, to avoid unnecessarily purging cache
   entries.

   Using two generations allows a bit more time for cache entries to be
   marked as fresh before the next resize.

2) While resizing increment generation count by one.

3) Once a minute, increment generation count by one.


The one thing I don't quite have a good handle on is how much cache
reclamation, if any, to do at 3). We don't really want to throw away
all the caches just because a connection has been idle for a few
minutes; in a connection pool that can happen occasionally. I think I'd
for now *not* do any reclamation except at resize boundaries.

Greetings,

Andres Freund



Re: Protect syscache from bloating with negative cache entries

2017-12-18 Thread Robert Haas
On Mon, Dec 18, 2017 at 11:46 AM, Andres Freund  wrote:
> I'm not 100% convinced either - but I also don't think it matters all
> that terribly much. As long as the overall hash hit rate is decent,
> minor increases in the absolute number of misses don't really matter
> that much for syscache imo.  I'd personally go for something like:
>
> 1) When about to resize, check if there are entries two generations old
>    around.
>
>    Don't resize if more than 15% of entries could be freed. Also, stop
>    reclaiming at that threshold, to avoid unnecessarily purging cache
>    entries.
>
>    Using two generations allows a bit more time for cache entries to be
>    marked as fresh before the next resize.
>
> 2) While resizing increment generation count by one.
>
> 3) Once a minute, increment generation count by one.
>
>
> The one thing I don't quite have a good handle on is how much cache
> reclamation, if any, to do at 3). We don't really want to throw away
> all the caches just because a connection has been idle for a few
> minutes; in a connection pool that can happen occasionally. I think I'd
> for now *not* do any reclamation except at resize boundaries.

My starting inclination was almost the opposite.  I think that you
might be right that a minute or two of idle time isn't sufficient
reason to flush our local cache, but I'd be inclined to fix that by
incrementing the generation count every 10 minutes or so rather than
every minute, and still flush things more than 1 generation old.  The
reason for that is that I think we should ensure that the system
doesn't sit there idle forever with a giant cache.  If it's not using
those cache entries, I'd rather have it discard them and rebuild the
cache when it becomes active again.

Now, I also see your point about trying to clean up before
resizing.  That does seem like a good idea, although we have to be
careful not to be too eager to clean up there, or we'll just end up
artificially limiting the cache size when it's unwise to do so.  But I
guess that's what you meant by "Also, stop reclaiming at that
threshold, to avoid unnecessarily purging cache entries."  I think the
idea you are proposing is that:

1. The first time we are due to expand the hash table, we check
whether we can forestall that expansion by doing a cleanup; if so, we
do that instead.

2. After that, we just expand.

That seems like a fairly good idea, although it might be a better idea
to allow cleanup if enough time has passed.  If we hit the expansion
threshold twice an hour apart, there's no reason not to try cleanup
again.

Generally, the way I'm viewing this is that a syscache entry means
paying memory to save CPU time.  Each 8kB of memory we use to store
system cache entries is one less block we have for the OS page cache
to hold onto our data blocks.  If we had an oracle (the kind from
Delphi, not Redwood City) that told us with perfect accuracy when to
discard syscache entries, it would throw away syscache entries
whenever the marginal execution-time performance we could buy from
another 8kB in the page cache is greater than the marginal
execution-time performance we could buy from those syscache entries.
In reality, it's hard to know which of those things is of greater
value.  If the system isn't meaningfully memory-constrained, we ought
to just always hang onto the syscache entries, as we do today, but
it's hard to know that.  I think the place where this really becomes a
problem is on systems with hundreds of connections + thousands of
tables + connection pooling; without some back-pressure, every backend
eventually caches everything, putting the system under severe memory
pressure for basically no performance gain.  Each new use of the
connection is probably for a limited set of tables, and only those
tables really need syscache entries; holding onto things used long in the
past doesn't save enough to justify the memory used.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: es_query_dsa is broken

2017-12-18 Thread Robert Haas
On Sun, Dec 17, 2017 at 7:35 PM, Thomas Munro
 wrote:
> Andreas Seltenreich found a query[1] that suffers from this bug
> (thanks!), and Amit confirmed that this patch fixes it, but further
> sqlsmith fuzz testing with the patch applied also revealed that it
> failed to handle the case where pei is NULL because parallelism was
> inhibited.  Here's a new version to fix that.  Someone might prefer a
> coding with "if" rather than the ternary operator, but I didn't have a
> strong preference.

Looks OK to me.  Committed and back-patched to v10.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: autoprewarm is forgetting to register a tranche.

2017-12-18 Thread Robert Haas
On Fri, Dec 15, 2017 at 3:32 AM, Kyotaro HORIGUCHI
 wrote:
> Hello, I noticed while an investigation that pg_prewarm is
> forgetting to register a tranche.

Oops.

> In passing, I found it hard to register a tranche in
> apw_init_shmem() because a process using the initialized shared
> memory struct cannot find the tranche id (without intruding into
> the internals of the LWLock struct). So I'd like to propose another
> tranche registering interface LWLockRegisterTrancheOfLock(LWLock
> *, int). The interface gets rid of the necessity of storing
> tranche id separately from LWLock. (in patch 0001)

I don't think we really need this.  If it suits a particular caller
to pass somelock->tranche to LWLockRegisterTranche, fine, but they can
do that with or without this function.  It's basically a one-line
function, so I don't see the point.

> The comment cited above looks a bit different from how the
> interface is actually used. So I rearranged the comment following
> a typical use I have seen in several cases. (in patch 0001)

I don't really see a need to change this.  It's true that what's
currently #3 could be done before #2, but I hesitate to call that a
typical practice.  Also, I'm worried that it could create the
impression that it's OK to use an LWLock before registering the
tranche, and it's really not.

> The final patch 0003 may be arguable, or even anticipated to be
> rejected. It cannot be detected that a tranche is not registered
> until its name is actually accessed (and many won't care even if
> the name were printed as '(null)' in an error message that is the
> result of the developer's own bug). On the other hand there's no
> chance to detect that before the lock is actually used. In the
> last patch I added an assertion in LWLockAcquire() but it must
> hurt performance to a certain degree (even if only in an
> --enable-cassert build) so I don't insist that it must be there.

Actually, I think this is a good idea as long as it doesn't hurt
performance too much.  It catches something that would otherwise be
hard to check.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Tracking of page changes for backup purposes. PTRACK [POC]

2017-12-18 Thread Robert Haas
On Mon, Dec 18, 2017 at 8:34 AM, Aleksander Alekseev
 wrote:
> 10.1, without ptrack
>
> transaction type: 
> scaling factor: 1
> query mode: simple
> number of clients: 4
> number of threads: 4
> duration: 300 s
> number of transactions actually processed: 28622
> latency average = 41.928 ms
> latency stddev = 18.238 ms
> tps = 95.396155 (including connections establishing)
> tps = 95.397406 (excluding connections establishing)
>
>
> At first glance PTRACK doesn't seem to affect the overall performance
> significantly.

I think this doesn't really show much because it's apparently limited
by the speed of fsync() on your filesystem.  You might try running the
test with synchronous_commit=off.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

2017-12-18 Thread Fujii Masao
On Fri, Dec 15, 2017 at 9:51 AM, Masahiko Sawada  wrote:
> On Fri, Dec 15, 2017 at 6:46 AM, Michael Paquier
>  wrote:
>> On Fri, Dec 15, 2017 at 3:52 AM, Fujii Masao  wrote:
>>> On Mon, Dec 11, 2017 at 2:16 PM, Michael Paquier
>>>  wrote:
 On Mon, Dec 11, 2017 at 2:03 PM, Masahiko Sawada  
 wrote:
> On Sat, Dec 9, 2017 at 2:24 AM, Robert Haas  wrote:
>> On Fri, Dec 8, 2017 at 4:13 AM, Michael Paquier
>>  wrote:
>>> I would just write "To
>>> avoid calling CHECK_FOR_INTERRUPTS which can happen when releasing a
>>> LWLock" and be done with it. There is no point to list a full function
>>> dependency list, which could change in the future with static routines
>>> of lwlock.c.
>
> Agreed. Updated the comment.

 Robert actually liked adding the complete routine list. Let's see what
 Fujii-san thinks at the end, there is still some time until the next
 round of minor releases.
>>>
>>> What I think is the patch I attached. Thoughts?
>>
>> That's OK for me. Thanks.
>
> +1 from me.

Committed. Thanks!

Regards,

-- 
Fujii Masao



Re: File name as application name in regression tests and elsewhere

2017-12-18 Thread Peter Eisentraut
On 12/18/17 06:59, Andrew Dunstan wrote:
> I was doing some work over the weekend and it occurred to me that it
> would be quite nice to have the input file name from regression tests
> set as the application name, and then use a log_line_prefix setting
> including %a, so that this would appear on the logs.

It does do that already, as of commit
a4327296df7366ecc657b706a9b5e87aa921311a.  Is it not working for you?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] static assertions in C++

2017-12-18 Thread Peter Eisentraut
On 12/11/17 17:12, Tom Lane wrote:
> Peter Eisentraut  writes:
>> On 12/11/17 16:45, Tom Lane wrote:
>>> (BTW, why is it that we can't fall back on the negative-width-bitfield
>>> trick for old g++?)
> 
>> The complaint is
>> error: types may not be defined in 'sizeof' expressions
> 
> Hmm, well, surely there's more than one way to do that; the sizeof
> is just a convenient way to wrap it in C.  Wouldn't a typedef serve
> just as well?

Here is another attempt, which has the desired effect with the handful
of compilers I have available.

(With the recent changes to AllocSetContextCreate() using a static
assertion, the current state now breaks actual extensions written in C++
in some configurations, so this has become a bit of a must-fix for PG11.)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From d631fcb1fb2babef618bc9b3eba3f5591970a609 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 30 Aug 2016 12:00:00 -0400
Subject: [PATCH v4] Add support for static assertions in C++

This allows modules written in C++ to use or include header files that
use StaticAssertStmt() etc.
---
 src/include/c.h | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/src/include/c.h b/src/include/c.h
index 11fcffbae3..22535a7deb 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -754,6 +754,7 @@ typedef NameData *Name;
  * about a negative width for a struct bit-field.  This will not include a
  * helpful error message, but it beats not getting an error at all.
  */
+#ifndef __cplusplus
 #ifdef HAVE__STATIC_ASSERT
 #define StaticAssertStmt(condition, errmessage) \
do { _Static_assert(condition, errmessage); } while(0)
@@ -765,6 +766,19 @@ typedef NameData *Name;
 #define StaticAssertExpr(condition, errmessage) \
StaticAssertStmt(condition, errmessage)
#endif /* HAVE__STATIC_ASSERT */
+#else  /* C++ */
+#if defined(__cpp_static_assert) && __cpp_static_assert >= 200410
+#define StaticAssertStmt(condition, errmessage) \
+   static_assert(condition, errmessage)
+#define StaticAssertExpr(condition, errmessage) \
+   StaticAssertStmt(condition, errmessage)
+#else
+#define StaticAssertStmt(condition, errmessage) \
+   do { struct static_assert_struct { int static_assert_failure : 
(condition) ? 1 : -1; }; } while(0)
+#define StaticAssertExpr(condition, errmessage) \
+   ({ StaticAssertStmt(condition, errmessage); })
+#endif
+#endif /* C++ */
 
 
 /*

base-commit: 56a95ee5118bf6d46e261b8d352f7dafac10587d
-- 
2.15.1



Re: [HACKERS] Assertion failure when the non-exclusive pg_stop_backup aborted.

2017-12-18 Thread Masahiko Sawada
On Tue, Dec 19, 2017 at 3:52 AM, Fujii Masao  wrote:
> On Fri, Dec 15, 2017 at 9:51 AM, Masahiko Sawada  
> wrote:
>> On Fri, Dec 15, 2017 at 6:46 AM, Michael Paquier
>>  wrote:
>>> On Fri, Dec 15, 2017 at 3:52 AM, Fujii Masao  wrote:
 On Mon, Dec 11, 2017 at 2:16 PM, Michael Paquier
  wrote:
> On Mon, Dec 11, 2017 at 2:03 PM, Masahiko Sawada  
> wrote:
>> On Sat, Dec 9, 2017 at 2:24 AM, Robert Haas  
>> wrote:
>>> On Fri, Dec 8, 2017 at 4:13 AM, Michael Paquier
>>>  wrote:
 I would just write "To
 avoid calling CHECK_FOR_INTERRUPTS which can happen when releasing a
 LWLock" and be done with it. There is no point to list a full function
 dependency list, which could change in the future with static routines
 of lwlock.c.
>>
>> Agreed. Updated the comment.
>
> Robert actually liked adding the complete routine list. Let's see what
> Fujii-san thinks at the end, there is still some time until the next
> round of minor releases.

 What I think is the patch I attached. Thought?
>>>
>>> That's OK for me. Thanks.
>>
>> +1 from me.
>
> Committed. Thanks!
>

Thank you!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Proposal: Local indexes for partitioned table

2017-12-18 Thread David Rowley
On 19 December 2017 at 05:08, Robert Haas  wrote:
> On Mon, Dec 18, 2017 at 11:02 AM, Alvaro Herrera
>  wrote:
>> After this discussion, this is how I see things working:
>>
>> 1. pg_dump
>>a) creates indexes on partitions normally
>>b) once all existing indexes are done, index on parent is created,
>>   with ONLY.  No cascading occurs, no indexes are attached.
>>c) ATTACH is run for each existing partition index.  After each
>>   ATTACH, we check that all indexes exist.  If so, the parent is
>>   marked valid.
>>d) if not all indexes existed in partitions, index on parent remains
>>   invalid.  (It was invalid in the dumped database, so this is
>>   correct.)
>>
>> 2. other uses
>>Normal CREATE INDEX (without ONLY) recurses and attaches the first
>>matching index it finds (no duplicate indexes are created);
>>partitions without a matching index get one created.
>>
>> 3. ALTER INDEX DETACH is not provided.  Therefore: once index is valid,
>>it remains so forever.
>>
>> I think this satisfies all concerns.
>
> Sounds great to me.

and me.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Logical replication without a Primary Key

2017-12-18 Thread Joshua D. Drake

On 12/07/2017 12:39 PM, Andres Freund wrote:


Not a problem. If you updated both rows, then there's two cases:
a) the update actually changed the column values. In which case the first 
per-row
change that's replicated updates the first row, but the second one won't
again find it as matching in all columns.
b) the update didn't actually change anything. In which case the same
row gets updated twice, but because the column values didn't change,
that doesn't matter.


I may be misunderstanding what is said above but if I ran a test:

Publisher:
reptest=# \d foorep
 Table "public.foorep"
 Column | Type | Collation | Nullable | Default
+--+---+--+-
 one    | text |   |  |
 two    | text |   |  |
Publications:
    "reptestpub"

reptest=# select * from foorep;
 one | two
-+-
 c   | b
 c   | b
 c   | b
(3 rows)

reptest=# update foorep set one = 'd';
UPDATE 3
reptest=# select * from foorep;
 one | two
-+-
 d   | b
 d   | b
 d   | b
(3 rows)

Subscriber before Publisher update:
reptest=# select * from foorep ;
 one | two
-+-
 c   | b
 c   | b
 c   | b
(3 rows)

Subscriber after Publisher update:
reptest=# select * from foorep ;
 one | two
-+-
 d   | b
 d   | b
 d   | b
(3 rows)

This is the behavior I was expecting. As I said, I may have 
misunderstood the responses but it is acting as I would expect.


Thanks!

JD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
* Unless otherwise stated, opinions are my own.   *




Re: Logical replication without a Primary Key

2017-12-18 Thread Andres Freund
On 2017-12-18 12:43:24 -0800, Joshua D. Drake wrote:
> This is the behavior I was expecting. As I said, I may have misunderstood
> the responses but it is acting as I would expect.

Just to make sure: You're saying there's no problem here, and that
logical rep is behaving correctly, right?

FWIW, I wonder if we need to add a warning somewhere about FULL
replication, given it's essentially O(#changes * #rows) -> O(n^2) for
updating the whole table.

Greetings,

Andres Freund



Re: Logical replication without a Primary Key

2017-12-18 Thread Joshua D. Drake

On 12/18/2017 12:52 PM, Andres Freund wrote:


Just to make sure: You're saying there's no problem here, and that
logical rep is behaving correctly, right?


Correct. I am not sure where the miscommunication was (fully willing to 
accept it was on my side) but if I update multiple rows in a single 
statement, all the rows that were modified get replicated. That is the 
behavior I would have expected.



FWIW, I wonder if we need to add a warning somewhere about FULL
replication, given it's essentially O(#changes * #rows) -> O(n^2) for
updating the whole table.


The docs do mention it is a performance hit but something a little more 
"IF YOU DO THIS, BEWARE" may be good.


Thanks,

JD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
* Unless otherwise stated, opinions are my own.   *




Re: Logical replication without a Primary Key

2017-12-18 Thread Petr Jelinek
On 18/12/17 21:57, Joshua D. Drake wrote:
> On 12/18/2017 12:52 PM, Andres Freund wrote:
>>
>> Just to make sure: You're saying there's no problem here, and that
>> logical rep is behaving correctly, right?
> 
> Correct. I am not sure where the miscommunication was (fully willing to
> accept it was on my side) but if I update multiple rows in a single
> statement, all the rows that were modified get replicated. That is the
> behavior I would have expected.
> 

I think it's because we said that if you update a single row on the
upstream, a single row will be updated on the downstream even though
there are several identical rows there, which may be confusing in the
multiple-row case. That's still true. In your test, however, you have
3 identical rows on the downstream, but you also updated the same 3
rows on the upstream. So you got 3 row updates, which were replicated,
and each of those updates changed one row.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


