Re: [HACKERS] Setting pd_lower in GIN metapage

2017-06-22 Thread Masahiko Sawada
On Fri, Jun 23, 2017 at 11:17 AM, Amit Langote
 wrote:
> On 2017/06/23 10:22, Masahiko Sawada wrote:
>> On Thu, Jun 22, 2017 at 6:55 PM, Amit Langote
>>  wrote:
>>> On 2017/06/22 16:56, Michael Paquier wrote:
>>>> Did you check this patch with wal_consistency_checking? I am getting
>>>> failures so your patch does not have the masking of GIN pages
>>>> completely right:
>>>> FATAL:  inconsistent page found, rel 1663/16385/28133, forknum 0, blkno 0
>>>> CONTEXT:  WAL redo at 0/39379EB8 for Gin/UPDATE_META_PAGE:
>>>> That's easily reproducible with installcheck and a standby replaying
>>>> the changes. I did not look at the code in detail to see what you may
>>>> be missing here.
>>>
>>> Oh, wasn't sure about the gin_mask() changes myself.  Thanks for checking.
>>>
>>> Actually, the WAL consistency check fails even without patching
>>> gin_mask(), so the problem may be with the main patch itself.  That is,
>>> the patch needs to do something else other than just teaching
>>> GinInitMetabuffer() to initialize pd_lower.  Will look into that.
>>>
>>
>> I've not read the code deeply but I guess we should use
>> GinInitMetabuffer() in ginRedoUpdateMetapage() instead of
>> GinInitPage(). Maybe also GinInitPage() in ginRedoDeleteListPages() is
>> the same.
>
> That was it, thanks for the pointer.
>
> Attached updated patch, which I confirmed passes wal_consistency_checking = gin.

Thank you for updating the patch. It looks good to me.
BTW, I'm inclined to add a regression test case that runs 'make check'
against a streaming replication environment with wal_consistency_checking
enabled on the standby server, so that we can detect bugs around the WAL.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [POC] hash partitioning

2017-06-22 Thread Yugo Nagata
On Fri, 23 Jun 2017 13:41:15 +0900
Yugo Nagata  wrote:

> On Tue, 6 Jun 2017 13:03:58 +0530
> amul sul  wrote:
> 
> 
> > Updated patch attached.
> 
> I looked into the latest patch (v13) and have some comments,
> although they might be trivial.

One more comment:

+   if (spec->remainder < 0)
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_TABLE_DEFINITION),
+errmsg("remainder for hash partition must be a non-negative integer")));

The value of remainder is defined as Iconst in gram.y, so it can never be negative.
Hence, I think this check is not necessary, or an Assert is enough.

> 
> First, I couldn't apply this patch to the latest HEAD due to
> a documentation fix and pgindent updates. It needs a rebase.
> 
> $ git apply /tmp/0002-hash-partitioning_another_design-v13.patch  
> error: patch failed: doc/src/sgml/ref/create_table.sgml:87
> error: doc/src/sgml/ref/create_table.sgml: patch does not apply
> error: patch failed: src/backend/catalog/partition.c:76
> error: src/backend/catalog/partition.c: patch does not apply
> error: patch failed: src/backend/commands/tablecmds.c:13371
> error: src/backend/commands/tablecmds.c: patch does not apply
> 
> 
>
> +   Hash Partitioning
> +
> +   
> +
> + The table is partitioned by specifying modulus and remainder for each
> + partition. Each partition holds rows for which the hash value of
> + partition keys when divided by specified modulus produces specified
> + remainder. For more clarification on modulus and remainder please refer
> + .
> +
> +   
> +  
> +
> +  
> Range Partitioning
> 
> I think this section should be inserted after the List Partitioning section
> because the order of the descriptions is Range, List, then Hash in other
> places of the documentation. At least, 
> 
> 
> -partition bounds.  Currently supported
> -partitioning methods include range and list, where each partition is
> -assigned a range of keys and a list of keys, respectively.
> +partition bounds.  The currently supported
> +partitioning methods are list, range, and hash.
> 
> 
> Also in this hunk. I think "The currently supported partitioning methods are
> range, list, and hash." is better. We don't need to change the order of
> the original description.
>  
> 
>
> 
> -Declarative partitioning only supports list and range partitioning,
> -whereas table inheritance allows data to be divided in a manner of
> -the user's choosing.  (Note, however, that if constraint exclusion is
> -unable to prune partitions effectively, query performance will be very
> -poor.)
> +Declarative partitioning only supports hash, list and range
> +partitioning, whereas table inheritance allows data to be divided in a
> +manner of the user's choosing.  (Note, however, that if constraint
> +exclusion is unable to prune partitions effectively, query performance
> +will be very poor.)
> 
> Similarly, I think "Declarative partitioning only supports range, list and
> hash partitioning," is better.
> 
> 
> +
> +  
> +   Create a hash partitioned table:
> +
> +CREATE TABLE orders (
> +order_id bigint not null,
> +cust_id  bigint not null,
> +status   text
> +) PARTITION BY HASH (order_id);
> +
> +
> 
> This paragraph should be inserted between the "Create a list partitioned table:"
> paragraph and the "Create partition of a range partitioned table:" paragraph,
> as with range and list.
> 
> 
>   *strategy = PARTITION_STRATEGY_LIST;
>   else if (pg_strcasecmp(partspec->strategy, "range") == 0)
>   *strategy = PARTITION_STRATEGY_RANGE;
> + else if (pg_strcasecmp(partspec->strategy, "hash") == 0)
> + *strategy = PARTITION_STRATEGY_HASH;
>   else
>   ereport(ERROR,
> 
> In most of the code, the order is hash, range, then list, but only
> in transformPartitionSpec() is the order list, range, then hash,
> as above. Maybe it is better to make them uniform.
> 
> 
> + {
> + if (strategy == PARTITION_STRATEGY_HASH)
> + ereport(ERROR,
> + (errcode(ERRCODE_UNDEFINED_OBJECT),
> +  errmsg("data type %s has no default hash operator class",
> + format_type_be(atttype)),
> +  errhint("You must specify a hash operator class or define a default hash operator class for the data type.")));
> + else
> + ereport(ERROR,
> +

[HACKERS] Small bug in replication lag tracking

2017-06-22 Thread Thomas Munro
Hi,

I discovered a thinko in the new replication lag interpolation code
that can cause a strange number to be reported occasionally.

The interpolation code is designed to report increasing lag when
replay gets stuck somewhere between two LSNs for which we have
timestamp samples.  The bug is that after sitting idle and fully
replayed for a while and then encountering a new burst of WAL
activity, we interpolate between an ancient sample and the
not-yet-reached one for the new traffic, which is inappropriate.  It's
hard to see obviously strange lag times, because they normally only
exist for a very short moment in between receiving the first and
second replies from the standby, and they often look reasonable even
if you do manage to catch one in pg_stat_replication.  You can see the
problem by pausing replay on the standby in between two bursts of
WAL with a long period of idleness in between.
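The class of bug Thomas describes can be sketched in plain Python (made-up sample numbers; this is an illustration of interpolating between (LSN, timestamp) samples, not the actual walsender code):

```python
def interpolate_lag(samples, replay_lsn, now):
    """Estimate when replay_lsn was originally written by interpolating
    between the surrounding (lsn, write_time) samples; lag = now - that."""
    for (lsn1, t1), (lsn2, t2) in zip(samples, samples[1:]):
        if lsn1 <= replay_lsn <= lsn2:
            frac = (replay_lsn - lsn1) / (lsn2 - lsn1)
            write_time = t1 + frac * (t2 - t1)
            return now - write_time
    return 0.0

# A sample at t=0 for LSN 1000, then an hour of idleness, then a new burst
# produces a sample at t=3600 for LSN 2000.  Replay has just reached 1001:
samples = [(1000, 0.0), (2000, 3600.0)]

# Interpolating between the ancient sample and the not-yet-reached one
# reports nearly an hour of lag, even though the standby was fully caught
# up moments ago -- the inappropriate interpolation described above.
lag = interpolate_lag(samples, 1001, now=3600.0)  # ~3596.4 seconds
```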

Please find attached a patch to fix that, with comments to explain.

-- 
Thomas Munro
http://www.enterprisedb.com


fix-lag-interpolation-bug.patch
Description: Binary data



Re: [HACKERS] Fix a typo in partition.c

2017-06-22 Thread Masahiko Sawada
On Thu, Jun 22, 2017 at 10:38 PM, Magnus Hagander  wrote:
>
>
> On Thu, Jun 22, 2017 at 4:16 AM, Masahiko Sawada 
> wrote:
>>
>> On Wed, Jun 21, 2017 at 3:32 AM, Peter Eisentraut
>>  wrote:
>> > On 6/19/17 23:02, Masahiko Sawada wrote:
>> >> Hi,
>> >>
>> >> Attached patch for $subject.
>> >>
>> >> s/opreator/operator/
>> >
>> > fixed
>> >
>>
>> Thank you!
>> I found another one.
>>
>> s/retrived/retrieved/
>>
>
> Thanks, applied.
>
>

Thanks too!

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Autovacuum launcher occurs error when cancelled by SIGINT

2017-06-22 Thread Yugo Nagata
On Thu, 22 Jun 2017 13:55:26 -0400
Alvaro Herrera  wrote:

> Yugo Nagata wrote:
> > Hi,
> > 
> > As I reported in another thread[1], I found that the autovacuum launcher raises
> > the following error in PG 10 when it receives SIGINT. I can reproduce
> > this with pg_cancel_backend or `kill -2 `.
> 
> Thanks for the report, BTW!

Thank you for fixing it!

> 
> -- 
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Yugo Nagata 




Re: [HACKERS] Getting server crash on Windows when using ICU collation

2017-06-22 Thread Ashutosh Sharma
Hi,

On Wed, Jun 21, 2017 at 9:50 AM, Amit Kapila  wrote:
> On Wed, Jun 21, 2017 at 12:15 AM, Peter Eisentraut
>  wrote:
>> On 6/20/17 09:23, Amit Kapila wrote:
>>> To avoid that why can't we use the same icu path for executing uconv
>>> as we are using for linking?
>>
>> Yeah, you'd need to use prefix $self->{options}->{icu} . '\bin\uconv' or
>> something like that.
>>
>
> That's what I had in mind as well.
>

Okay, attached is the patch which first detects the platform type and
runs the uconv commands accordingly from the corresponding ICU bin
path. Please have a look and let me know of any issues. Thanks.

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com


detect_ICU_version_v4.patch
Description: Binary data



Re: [HACKERS] Multi column range partition table

2017-06-22 Thread amul sul
On Fri, Jun 23, 2017 at 6:58 AM, Amit Langote
 wrote:
> On 2017/06/22 20:48, amul sul wrote:
>> Hi,
>>
>> While working on another patch, I came across a case where
>> I need an auto-generated partition for a multi-column range partitioned
>> table having the following range bound:
>>
>> PARTITION p1 FROM  (UNBOUNDED, UNBOUNDED) TO (10, 10)
>> PARTITION p2 FROM  (10, 10)  TO (10, UNBOUNDED)
>> PARTITION p3 FROM  (10, UNBOUNDED) TO (20, 10)
>> PARTITION p4 FROM (20, 10) TO (20, UNBOUNDED)
>> PARTITION p5 FROM (20, UNBOUNDED) TO (UNBOUNDED, UNBOUNDED)
>>
>> In this, a lower bound of the partition is an upper bound of the
>> previous partition.
>>
>> While trying to create the p3 partition with (10, UNBOUNDED) to (20, 10) bounds,
>> I got a partition overlap error.
>>
>> Here is the SQL to reproduced this error:
>>
>> CREATE TABLE range_parted ( i1 int,  i2 int ) PARTITION BY RANGE (i1, i2);
>> CREATE TABLE p1 PARTITION OF range_parted FOR VALUES FROM (UNBOUNDED,
>> UNBOUNDED) TO (10, 10);
>> CREATE TABLE p2 PARTITION OF range_parted FOR VALUES FROM (10, 10) TO
>> (10, UNBOUNDED);
>> CREATE TABLE p3 PARTITION OF tab1 FOR VALUES FROM (10, UNBOUNDED) TO (20, 10);
>>
>> ERROR:  partition "p3" would overlap partition "tab1_p_10_10"
>>
>> This happened because of UNBOUNDED handling, where it is treated as negative
>> infinity when it appears in the FROM clause.  Wondering, can't we explicitly
>> treat it as positive infinity here?
>
> No, we cannot.  What would be greater than (or equal to) +infinity?
> Nothing.  So, even if you want p3 to accept (10, 9890148), it won't,
> because 9890148 is not >= +infinity.  It will accept only the rows where
> the first column is > 10 (the second column is not checked in that case).
>
> You will have to define p3 as follows:
>
> CREATE TABLE p3 PARTITION OF tab1 FOR VALUES FROM (11, UNBOUNDED) TO (20, 10);
>
What if the partition key column is FLOAT?
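Amit's point about lower bounds can be sketched with plain tuple comparison, modelling UNBOUNDED as ±infinity (an illustration of the lexicographic comparison semantics, not the server's actual data structures):

```python
NEG_INF = float("-inf")
POS_INF = float("inf")

def in_range(row, lower, upper):
    """A row belongs to the partition iff lower <= row < upper, comparing
    multi-column bounds lexicographically, as range partitioning does."""
    return lower <= row < upper

# If (10, UNBOUNDED) in FROM meant +infinity, no row with i1 == 10 could
# satisfy lower <= row, because (10, x) < (10, +inf) for every finite x:
lower_as_pos_inf = (10, POS_INF)
accepts_with_pos_inf = in_range((10, 9890148), lower_as_pos_inf, (20, 10))

# With the actual negative-infinity interpretation, (10, anything) is in:
lower_as_neg_inf = (10, NEG_INF)
accepts_with_neg_inf = in_range((10, 9890148), lower_as_neg_inf, (20, 10))
```

So a +infinity reading would make p3 reject (10, 9890148), exactly as Amit describes.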

Regards,
Amul




Re: [HACKERS] [POC] hash partitioning

2017-06-22 Thread Yugo Nagata
On Tue, 6 Jun 2017 13:03:58 +0530
amul sul  wrote:


> Updated patch attached.

I looked into the latest patch (v13) and have some comments,
although they might be trivial.

First, I couldn't apply this patch to the latest HEAD due to
a documentation fix and pgindent updates. It needs a rebase.

$ git apply /tmp/0002-hash-partitioning_another_design-v13.patch  
error: patch failed: doc/src/sgml/ref/create_table.sgml:87
error: doc/src/sgml/ref/create_table.sgml: patch does not apply
error: patch failed: src/backend/catalog/partition.c:76
error: src/backend/catalog/partition.c: patch does not apply
error: patch failed: src/backend/commands/tablecmds.c:13371
error: src/backend/commands/tablecmds.c: patch does not apply


   
+   Hash Partitioning
+
+   
+
+ The table is partitioned by specifying modulus and remainder for each
+ partition. Each partition holds rows for which the hash value of
+ partition keys when divided by specified modulus produces specified
+ remainder. For more clarification on modulus and remainder please refer
+ .
+
+   
+  
+
+  
Range Partitioning

I think this section should be inserted after List Partitioning section because
the order of the descriptions is Range, List, then Hash in other places of
the documentation. At least, 


-partition bounds.  Currently supported
-partitioning methods include range and list, where each partition is
-assigned a range of keys and a list of keys, respectively.
+partition bounds.  The currently supported
+partitioning methods are list, range, and hash.


Also in this hunk. I think "The currently supported partitioning methods are
range, list, and hash." is better. We don't need to change the order of
the original description.
 

   

-Declarative partitioning only supports list and range partitioning,
-whereas table inheritance allows data to be divided in a manner of
-the user's choosing.  (Note, however, that if constraint exclusion is
-unable to prune partitions effectively, query performance will be very
-poor.)
+Declarative partitioning only supports hash, list and range
+partitioning, whereas table inheritance allows data to be divided in a
+manner of the user's choosing.  (Note, however, that if constraint
+exclusion is unable to prune partitions effectively, query performance
+will be very poor.)

Similarly, I think "Declarative partitioning only supports range, list and hash
partitioning," is better.


+
+  
+   Create a hash partitioned table:
+
+CREATE TABLE orders (
+order_id bigint not null,
+cust_id  bigint not null,
+status   text
+) PARTITION BY HASH (order_id);
+
+

This paragraph should be inserted between the "Create a list partitioned table:"
paragraph and the "Create partition of a range partitioned table:" paragraph,
as with range and list.
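The modulus/remainder routing the documentation hunk above describes can be sketched in a few lines of Python (an illustration of the rule only; PostgreSQL's actual hash functions and catalog machinery are not modelled here):

```python
def route(key_hash, partitions):
    """partitions: list of (modulus, remainder) bounds.  A row is placed in
    the partition whose remainder equals key_hash modulo its modulus."""
    for modulus, remainder in partitions:
        if key_hash % modulus == remainder:
            return (modulus, remainder)
    return None  # no matching partition

# Four partitions, all with modulus 4: every hash value maps to exactly one.
parts = [(4, 0), (4, 1), (4, 2), (4, 3)]
routed_everywhere = all(route(h, parts) is not None for h in range(100))
target = route(10, parts)  # 10 % 4 == 2, so (4, 2)
```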


*strategy = PARTITION_STRATEGY_LIST;
else if (pg_strcasecmp(partspec->strategy, "range") == 0)
*strategy = PARTITION_STRATEGY_RANGE;
+   else if (pg_strcasecmp(partspec->strategy, "hash") == 0)
+   *strategy = PARTITION_STRATEGY_HASH;
else
ereport(ERROR,

In most of the code, the order is hash, range, then list, but only
in transformPartitionSpec() is the order list, range, then hash,
as above. Maybe it is better to make them uniform.


+   {
+   if (strategy == PARTITION_STRATEGY_HASH)
+   ereport(ERROR,
+   (errcode(ERRCODE_UNDEFINED_OBJECT),
+errmsg("data type %s has no default hash operator class",
+   format_type_be(atttype)),
+errhint("You must specify a hash operator class or define a default hash operator class for the data type.")));
+   else
+   ereport(ERROR,
+   (errcode(ERRCODE_UNDEFINED_OBJECT),
+errmsg("data type %s has no default btree operator class",
+   format_type_be(atttype)),
+errhint("You must specify a btree operator class or define a default btree operator class for the data type.")));
+


   atttype,
-   
   "btree",
- 

Re: [HACKERS] Regarding Postgres Dynamic Shared Memory (DSA)

2017-06-22 Thread Mahendranath Gurram
Hi Thomas,

> When you create or attach to a DSA area, a detach callback is
> automatically registered on the control segment (or containing
> segment, for an in-place DSA area). See the code like this in dsa.c:
>
> /* Clean up when the control segment detaches. */
> on_dsm_detach(segment, dsa_on_dsm_detach_release_in_place,
>   PointerGetDatum(dsm_segment_address(segment)));
>
> So the reference counting is automatically looked after and you don't
> need to do anything. There are four possibilities: (1) you explicitly
> detach from the area, (2) the ResourceManager active when you created
> or attached goes out of scope, detaching you automatically (you
> probably want to disable that with dsa_pin_mapping(area)), (3) your
> backend exits normally and it's automatically detached, (4) you crash
> and the postmaster restarts the whole cluster on the basis that shared
> memory could be arbitrarily corrupted.

The third possibility is our use case. We want the dsa_area to be alive
throughout the lifetime of the cluster, and we want each backend to hold
the attached DSA throughout the lifetime of the backend, irrespective of
ResourceManager scope.

Hence we are calling dsa_pin(area) after dsa_create(), and
dsa_pin_mapping(area) after dsa_create() and dsa_attach().

The DSA implementation in autovacuum.c helped me in understanding these
concepts. :)

So in my case, I can safely assume that Postgres automatically takes care
of DSA detaching and reference-count decrements during backend exit.

It's so nice of you to take the time to explain this and to help solve
our use cases.

Best Regards,
-Mahi
Teamwork divides the task and multiplies the success.


 On Thu, 22 Jun 2017 17:08:01 +0530 Thomas Munro
thomas.mu...@enterprisedb.com wrote 

> On Thu, Jun 22, 2017 at 10:59 PM, Mahendranath Gurram
> mahendran...@zohocorp.com wrote:
>> I'm implementing the In-Memory index as per your suggestion. So far it's
>> good.
>
> Great news.
>
>> As of now, only one thing is unclear for me. How could I detach the
>> DSA (the dsa_detach() call) in a backend (typically during backend exit)?
>>
>> Of course when a process quits, all its associated memory will be
>> cleared/destroyed. But we have to decrement the DSM reference counts as
>> part of the Postgres DSA framework implementation, which cannot be done
>> now.
>>
>> I have gone through the Postgres source code. But unfortunately, I
>> couldn't see anything handling this case.
>
> When you create or attach to a DSA area, a detach callback is
> automatically registered on the control segment (or containing
> segment, for an in-place DSA area). See the code like this in dsa.c:
>
> /* Clean up when the control segment detaches. */
> on_dsm_detach(segment, dsa_on_dsm_detach_release_in_place,
>   PointerGetDatum(dsm_segment_address(segment)));
>
> So the reference counting is automatically looked after and you don't
> need to do anything. There are four possibilities: (1) you explicitly
> detach from the area, (2) the ResourceManager active when you created
> or attached goes out of scope, detaching you automatically (you
> probably want to disable that with dsa_pin_mapping(area)), (3) your
> backend exits normally and it's automatically detached, (4) you crash
> and the postmaster restarts the whole cluster on the basis that shared
> memory could be arbitrarily corrupted.
>
> Note that if you call dsa_pin(area) after creating the DSA area, the
> reference count cannot reach zero until you either call
> dsa_unpin(area) or the cluster exits. That's the right thing to do
> for a case where backends might come and go and it's possible for no
> one to be attached for periods of time, but you want the DSA area to
> continue to exist. I think that's probably what you need for a
> DSA-backed in-memory index.
>
> --
> Thomas Munro
> http://www.enterprisedb.com
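The reference-counting lifecycle Thomas describes can be modelled in a few lines (a toy model for illustration only, not the real dsm/dsa API):

```python
class Area:
    """Toy model: attach registers a detach callback that drops a reference
    (as on_dsm_detach does); pin holds an extra reference until unpin."""
    def __init__(self):
        self.refcount = 0
        self.pinned = False

    def attach(self):
        self.refcount += 1          # detach callback registered here

    def detach(self):
        self.refcount -= 1          # callback fires on backend exit too

    def pin(self):
        self.pinned = True
        self.refcount += 1          # area survives with no backends attached

    def unpin(self):
        self.pinned = False
        self.refcount -= 1

area = Area()
area.pin()                          # as done right after creating the area
area.attach(); area.attach()        # two backends attach
area.detach(); area.detach()        # both exit normally
assert area.refcount == 1           # the pin keeps the area alive
area.unpin()
assert area.refcount == 0           # now the area can be destroyed
```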

Re: [HACKERS] Guarding against bugs-of-omission in initdb's setup_depend

2017-06-22 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jun 22, 2017 at 2:11 PM, Tom Lane  wrote:
>> So I'm thinking about adding a regression test case, say in dependency.sql,
>> that looks for unpinned objects with OIDs in the hand-assigned range,
>> along the lines of this prototype code:

> I don't have specific thoughts, but I like the general idea.

Here's a more refined proposed patch.  After I tightened the pg_depend
probes to check refclassid, I started to get complaints about
pg_largeobject_metadata.  It turns out that large_object.sql creates a
large object with OID 3001, which triggered the check (but without the
classid test, it'd accidentally been fooled by the existence of a pin
entry for pg_proc OID 3001).  I'm not sure if it's such a hot idea to make
that large object; but on the whole it seems like this needs to be done
early in the regression tests so that it's not fooled by whatever random
objects might be created later in the tests.  I could not quite persuade
myself that checking pg_depend belonged in either opr_sanity or
type_sanity, so this patch sets up a "misc_sanity" test script to go
alongside those two.

regards, tom lane

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 0f22e6d..b76eb1e 100644
*** a/src/bin/initdb/initdb.c
--- b/src/bin/initdb/initdb.c
*** setup_depend(FILE *cmdfd)
*** 1478,1486 
  		 * for instance) but generating only the minimum required set of
  		 * dependencies seems hard.
  		 *
! 		 * Note that we deliberately do not pin the system views, which
! 		 * haven't been created yet.  Also, no conversions, databases, or
! 		 * tablespaces are pinned.
  		 *
  		 * First delete any already-made entries; PINs override all else, and
  		 * must be the only entries for their objects.
--- 1478,1493 
  		 * for instance) but generating only the minimum required set of
  		 * dependencies seems hard.
  		 *
! 		 * Catalogs that are intentionally not scanned here are:
! 		 *
! 		 * pg_database: it's a feature, not a bug, that template1 is not
! 		 * pinned.
! 		 *
! 		 * pg_extension: a pinned extension isn't really an extension, hmm?
! 		 *
! 		 * pg_tablespace: tablespaces don't participate in the dependency
! 		 * code, and DropTableSpace() explicitly protects the built-in
! 		 * tablespaces.
  		 *
  		 * First delete any already-made entries; PINs override all else, and
  		 * must be the only entries for their objects.
*** setup_depend(FILE *cmdfd)
*** 1501,1506 
--- 1508,1515 
  		"INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
  		" FROM pg_constraint;\n\n",
  		"INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
+ 		" FROM pg_conversion;\n\n",
+ 		"INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
  		" FROM pg_attrdef;\n\n",
  		"INSERT INTO pg_depend SELECT 0,0,0, tableoid,oid,0, 'p' "
  		" FROM pg_language;\n\n",
*** initialize_data_directory(void)
*** 2906,2911 
--- 2915,2925 
  
  	setup_depend(cmdfd);
  
+ 	/*
+ 	 * Note that no objects created after setup_depend() will be "pinned".
+ 	 * They are all droppable at the whim of the DBA.
+ 	 */
+ 
  	setup_sysviews(cmdfd);
  
  	setup_description(cmdfd);
diff --git a/src/test/regress/expected/misc_sanity.out b/src/test/regress/expected/misc_sanity.out
index ...f026896 .
*** a/src/test/regress/expected/misc_sanity.out
--- b/src/test/regress/expected/misc_sanity.out
***
*** 0 
--- 1,78 
+ --
+ -- MISC_SANITY
+ -- Sanity checks for common errors in making system tables that don't fit
+ -- comfortably into either opr_sanity or type_sanity.
+ --
+ -- Every test failure in this file should be closely inspected.
+ -- The description of the failing test should be read carefully before
+ -- adjusting the expected output.  In most cases, the queries should
+ -- not find *any* matching entries.
+ --
+ -- NB: run this test early, because some later tests create bogus entries.
+ --  pg_depend 
+ -- Look for illegal values in pg_depend fields.
+ -- classid/objid can be zero, but only in 'p' entries
+ SELECT *
+ FROM pg_depend as d1
+ WHERE refclassid = 0 OR refobjid = 0 OR
+   deptype NOT IN ('a', 'e', 'i', 'n', 'p') OR
+   (deptype != 'p' AND (classid = 0 OR objid = 0)) OR
+   (deptype = 'p' AND (classid != 0 OR objid != 0 OR objsubid != 0));
+  classid | objid | objsubid | refclassid | refobjid | refobjsubid | deptype 
+ -+---+--++--+-+-
+ (0 rows)
+ 
+ --  pg_shdepend 
+ -- Look for illegal values in pg_shdepend fields.
+ -- classid/objid can be zero, but only in 'p' entries
+ SELECT *
+ FROM pg_shdepend as d1
+ WHERE refclassid = 0 OR refobjid = 0 OR
+   deptype NOT IN ('a', 'o', 'p', 'r') OR
+   (deptype != 'p' AND (dbid = 0 OR classid = 0 OR objid = 0)) OR
+   (deptype = 'p' AND (dbid != 0 OR classid 

Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests

2017-06-22 Thread Tom Lane
Andrew Dunstan  writes:
> On 06/22/2017 10:24 AM, Tom Lane wrote:
>> That earlier request is still valid.  Also, if you can reproduce the
>> symptom that lorikeet just showed and get a stack trace from the
>> (hypothetical) postmaster core dump, that would be hugely valuable.

> See attached log and stacktrace

The stacktrace seems to be from the parallel-query-leader backend.
Was there another core file?

The lack of any indication in the postmaster log that the postmaster saw
the parallel worker's crash sure looks the same as lorikeet's failure.
But if the postmaster didn't dump core, then we're still at a loss as to
why it didn't respond.

regards, tom lane




Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

2017-06-22 Thread Chapman Flack
On 06/21/17 04:51, Heikki Linnakangas wrote:
> (I'm cleaning up my inbox, hence the delayed reply)

I had almost completely forgotten ever bringing it up. :)

> When I wrote that code, I don't remember if I realized that we're
> initializing the page headers, or if I thought that it's good enough even if
> we do. I guess I didn't realize it, because a comment would've been in order
> if it was intentional.
> 
> So +1 on fixing that, a patch would be welcome.

Ok, that sounds like something I could take a whack at. Overall, xlog.c
is a bit daunting, but this particular detail seems fairly approachable.

> In the meanwhile, have you
> tried using a different compression program? Something else than gzip might
> do a better job at the almost zero pages.

Well, gzip was doing pretty well; it could get a 16 MB segment file down
to under 27 kB, or less than 14 bytes for each of 2000 pages, when a page
header is what, 20 bytes, it looks like? I'm not sure how much better
I'd expect a (non-custom) compression scheme to do. The real difference
comes between compressing (even well) a large unchanged area, versus being
able to recognize (again with a non-custom tool) that the whole area is
unchanged.
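Chapman's observation — that a run of truly zero pages collapses under even a generic compressor, while per-page headers cost real output — is easy to reproduce standalone (a sketch with zlib; the sizes mirror a WAL segment and page, and the "header" bytes here are made up, nothing touches PostgreSQL):

```python
import zlib

SEGMENT = 16 * 1024 * 1024          # one WAL segment
PAGE = 8192                         # XLOG block size
HEADER = 24                         # rough size of a page header

all_zero = bytes(SEGMENT)

# Same segment, but every page starts with a small distinct "header",
# as AdvanceXLInsertBuffer leaves behind on otherwise-unused pages.
# (Invented magic bytes; only the per-page variation matters here.)
pages = bytearray(all_zero)
for off in range(0, SEGMENT, PAGE):
    pages[off:off + HEADER] = (b"\xd0\x93\x05\x00"
                               + off.to_bytes(8, "big")
                               + bytes(HEADER - 12))

zero_size = len(zlib.compress(all_zero, 9))
headered_size = len(zlib.compress(bytes(pages), 9))
# The fully-zero segment compresses dramatically better than the one
# with initialized page headers.
```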

-Chap




Re: [HACKERS] Missing comment for create_modifytable_path

2017-06-22 Thread Etsuro Fujita

On 2017/06/23 2:53, Robert Haas wrote:
> On Thu, Jun 15, 2017 at 4:40 AM, Etsuro Fujita
>  wrote:
>> While working on adding support for tuple routing for foreign partitions, I
>> noticed that in create_modifytable_path, we forgot to add a comment on its
>> new argument 'partitioned_rels'.  Attached a patch for including that in the
>> comments for that function.
>
> Committed with a slight adjustment.

Thank you for committing this patch (and another one)!

Best regards,
Etsuro Fujita





Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests

2017-06-22 Thread Andrew Dunstan


On 06/22/2017 10:24 AM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> Please let me know if there are tests I can run. I  missed your earlier
>> request in this thread, sorry about that.
> That earlier request is still valid.  Also, if you can reproduce the
> symptom that lorikeet just showed and get a stack trace from the
> (hypothetical) postmaster core dump, that would be hugely valuable.
>
>   


See attached log and stacktrace

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

??
??:0
??
??:0
??
??:0
??
??:0
??
??:0
??
??:0
errfinish
/home/andrew/bf64/root/HEAD/pgsql/src/backend/utils/error/elog.c:555
elog_finish
/home/andrew/bf64/root/HEAD/pgsql/src/backend/utils/error/elog.c:1378
s_lock_stuck
/home/andrew/bf64/root/HEAD/pgsql/src/backend/storage/lmgr/s_lock.c:83
s_lock
/home/andrew/bf64/root/HEAD/pgsql/src/backend/storage/lmgr/s_lock.c:98
shm_mq_detach
/home/andrew/bf64/root/HEAD/pgsql/src/backend/storage/ipc/shm_mq.c:783
slist_is_empty
/home/andrew/bf64/root/HEAD/pgsql/src/include/lib/ilist.h:567
DestroyParallelContext
/home/andrew/bf64/root/HEAD/pgsql/src/backend/access/transam/parallel.c:629
ExecParallelCleanup
/home/andrew/bf64/root/HEAD/pgsql/src/backend/executor/execParallel.c:672
ExecShutdownGather
/home/andrew/bf64/root/HEAD/pgsql/src/backend/executor/nodeGather.c:408


pglog
Description: Binary data



Re: [HACKERS] Can ICU be used for a database's default sort order?

2017-06-22 Thread Peter Geoghegan
On Thu, Jun 22, 2017 at 7:10 PM, Tom Lane  wrote:
> Is there some way I'm missing, or is this just a not-done-yet feature?

It's a not-done-yet feature.


-- 
Peter Geoghegan




Re: [HACKERS] shift_sjis_2004 related autority files are remaining

2017-06-22 Thread Kyotaro HORIGUCHI
At Fri, 23 Jun 2017 10:04:26 +0900 (JST), Tatsuo Ishii  
wrote in <20170623.100426.157023025943107410.t-is...@sraoss.co.jp>
> > Hm. I am wondering about licensing issues here to keep those files in
> > the tree. I am no lawyer.
> 
> Of course. Regarding euc-jis-2004-std.txt and sjis-0213-2004-std.txt,
> it seems safe to keep them.
> 
> > ## Date: 13 May 2006
> > ## License:
> > ##  Copyright (C) 2001 earth...@tama.or.jp, All Rights Reserved.
> > ##  Copyright (C) 2001 I'O, All Rights Reserved.
> > ##  Copyright (C) 2006 Project X0213, All Rights Reserved.
> > ##  You can use, modify, distribute this table freely.
> 
> >> - It allows to track the changes in the original file if we decide to
> >>   change the map files.
> > 
> > You have done that in the past for a couple of codepoints, didn't you?
> 
> I believe the reason why I didn't keep the other txt files was that they were
> prohibited from being copied according to their licenses.

For clarity, I personally prefer to keep all the source text files
in the repository, especially so that we can detect changes to
them. But since we decided that at least most of them should not be
there (for license reasons), I just don't see a reason to
keep only the rest, even without the restriction.

> >> - The site http://x0213.org/ may disappear in the future. If that
> >>   happens, we will lose track data how we create the map files.
> > 
> > There are other problems then as there are 3 sites in use to fetch the data:
> > - GB2312.TXT comes from greenstone.org.
> > - Some from icu-project.org.
> > - The rest is from unicode.org.
> 
> Maybe, but I don't know how to deal with them.

Aside from detecting changes, as mentioned upthread, if we should ever
need the authority files (though I'm not sure why we would) after the
authority site is gone, we can regenerate a linear mapping from a .map
file. But I believe that further changes (that we would need to
follow) are unlikely to come.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests

2017-06-22 Thread Amit Kapila
On Thu, Jun 22, 2017 at 7:54 PM, Tom Lane  wrote:
> Andrew Dunstan  writes:
>> Please let me know if there are tests I can run. I  missed your earlier
>> request in this thread, sorry about that.
>
> That earlier request is still valid.
>

Yeah, that makes sense, and maybe we can also try printing the
dsm_segment handle value both before launching the worker and in the
parallel worker (ParallelWorkerMain).  That might help us identify the
reason for the error "could not map dynamic shared memory segment".


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] psql: Activate pager only for height, not width

2017-06-22 Thread Brendan Jurd
On Tue, 30 May 2017 at 05:30 Christoph Berg  wrote:

> Oh interesting, I didn't know about pager_min_lines. That sounds
> useful as well. +1 on the analogous pager_min_cols option.
>

On closer inspection, I note that psql already has a 'columns' \pset
option, which does control the width for triggering the pager, but also
controls the width for wrapping and auto-expanded mode purposes.  So, I can
set 'columns' to get the pager behaviour that I want, but I also get
unwanted effects where I'd rather let the default (terminal width) apply.

So if we were to add a 'pager_min_cols', we'd have to decide how it
interacts with 'columns'.  For example, to determine whether to trigger the
pager, we would look for 'pager_min_cols' first and, if that is not set,
fall back to 'columns'.  For all other width purposes, 'columns' would
continue to apply as it does at present.

However, my feeling is that this is becoming a bit too fiddly.  If
'columns' did not already exist then 'pager_min_cols' would make more
sense, but as it does already exist, my preference is to leave 'columns'
as-is, and go for a height-only option to 'pager' instead.

Thoughts?

Cheers,
BJ


Re: [HACKERS] Broken hint bits (freeze)

2017-06-22 Thread Amit Kapila
On Wed, Jun 21, 2017 at 10:03 PM, Bruce Momjian  wrote:
> On Wed, Jun 21, 2017 at 07:49:21PM +0530, Amit Kapila wrote:
>> On Tue, Jun 20, 2017 at 7:24 PM, Amit Kapila  wrote:
>> > Hmm.  I think we need something that works with lesser effort because
>> > not all users will be as knowledgeable as you are, so if they make any
>> > mistakes in copying the file manually, it can lead to problems.  How
>> > about issuing a notification (XLogArchiveNotifySeg) in shutdown
>> > checkpoint if archiving is enabled?
>> >
>>
>> I have thought more about the above solution and it seems risky to
>> notify archiver for incomplete WAL segments (which will be possible in
>> this case as there is no guarantee that Checkpoint record will fill
>> the segment).  So, it seems to me we should update the document unless
>> you or someone has some solution to this problem.
>
> The over-arching question is how do we tell users to verify that the WAL
> has been replayed on the standby?  I am thinking we would say that for
> streaming replication, the "Latest checkpoint location" should match on
> the primary and standby, while for log shipping, the standbys should be
> exactly one WAL file less than the primary.
>

I am not sure we can say "standbys should be exactly one WAL file less
than the primary", because a checkpoint can create a few more WAL
segments for future use.  I think to make this work the user needs to
carefully copy just the next WAL segment (the one after the last file
on the standby), which will contain the checkpoint record.  Ideally
there would be some way, either a tool or functionality in the
database server, to copy this last file, but in the absence of that we
can at least document this fact.
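One concrete way to compare the positions discussed above is pg_controldata; the small awk helper below just extracts the relevant line (the data-directory paths in the comments are placeholders):

```shell
# Extract "Latest checkpoint location" from pg_controldata output.
latest_ckpt() { awk -F': *' '/^Latest checkpoint location/ {print $2}'; }

# On each server (paths are examples):
#   pg_controldata /var/lib/postgresql/primary | latest_ckpt
#   pg_controldata /var/lib/postgresql/standby | latest_ckpt
#
# For streaming replication the two values should match; for log
# shipping, per the caveat above, the relationship is less clear-cut.
```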



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Setting pd_lower in GIN metapage

2017-06-22 Thread Amit Langote
On 2017/06/23 10:22, Masahiko Sawada wrote:
> On Thu, Jun 22, 2017 at 6:55 PM, Amit Langote
>  wrote:
>> On 2017/06/22 16:56, Michael Paquier wrote:
>>> Did you check this patch with wal_consistency_checking? I am getting
>>> failures so your patch does not have the masking of GIN pages
>>> completely right:
>>> FATAL:  inconsistent page found, rel 1663/16385/28133, forknum 0, blkno 0
>>> CONTEXT:  WAL redo at 0/39379EB8 for Gin/UPDATE_META_PAGE:
>>> That's easily reproducible with installcheck and a standby replaying
>>> the changes. I did not look at the code in details to see what you may
>>> be missing here.
>>
>> Oh, wasn't sure about the gin_mask() changes myself.  Thanks for checking.
>>
>> Actually, the WAL consistency check fails even without patching
>> gin_mask(), so the problem may be with the main patch itself.  That is,
>> the patch needs to do something else other than just teaching
>> GinInitMetabuffer() to initialize pd_lower.  Will look into that.
>>
> 
> I've not read the code deeply but I guess we should use
> GinInitMetabuffer() in ginRedoUpdateMetapage() instead of
> GinInitPage(). Maybe also GinInitPage() in ginRedoDeleteListPages() is
> the same.

That was it, thanks for the pointer.

Attached updated patch, which I have confirmed passes wal_consistency_checking = gin.
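For anyone wanting to reproduce the check: with a primary/standby pair set up, the relevant setting goes in the primary's postgresql.conf (it adds full-page images to WAL records, which the standby then verifies during redo):

```
# postgresql.conf on the primary; the standby verifies pages during redo
wal_consistency_checking = 'gin'    # or 'all' to cover every resource manager
```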

Thanks,
Amit
diff --git a/src/backend/access/gin/ginutil.c b/src/backend/access/gin/ginutil.c
index 91e4a8cf70..52de599b29 100644
--- a/src/backend/access/gin/ginutil.c
+++ b/src/backend/access/gin/ginutil.c
@@ -372,6 +372,13 @@ GinInitMetabuffer(Buffer b)
 	metadata->nDataPages = 0;
 	metadata->nEntries = 0;
 	metadata->ginVersion = GIN_CURRENT_VERSION;
+
+	/*
+	 * Set pd_lower just past the end of the metadata.  This is not essential
+	 * but it makes the page look compressible to xlog.c.
+	 */
+	((PageHeader) page)->pd_lower =
+		((char *) metadata + sizeof(GinMetaPageData)) - (char *) page;
 }
 
 /*
diff --git a/src/backend/access/gin/ginxlog.c b/src/backend/access/gin/ginxlog.c
index 7ba04e324f..6d2e528852 100644
--- a/src/backend/access/gin/ginxlog.c
+++ b/src/backend/access/gin/ginxlog.c
@@ -514,7 +514,7 @@ ginRedoUpdateMetapage(XLogReaderState *record)
 	Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
 	metapage = BufferGetPage(metabuffer);
 
-	GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
+	GinInitMetabuffer(metabuffer);
 	memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
 	PageSetLSN(metapage, lsn);
 	MarkBufferDirty(metabuffer);
@@ -656,7 +656,7 @@ ginRedoDeleteListPages(XLogReaderState *record)
 	Assert(BufferGetBlockNumber(metabuffer) == GIN_METAPAGE_BLKNO);
 	metapage = BufferGetPage(metabuffer);
 
-	GinInitPage(metapage, GIN_META, BufferGetPageSize(metabuffer));
+	GinInitMetabuffer(metabuffer);
 
 	memcpy(GinPageGetMeta(metapage), &data->metadata, sizeof(GinMetaPageData));
 	PageSetLSN(metapage, lsn);
@@ -776,18 +776,11 @@ gin_mask(char *pagedata, BlockNumber blkno)
 	mask_page_hint_bits(page);
 
 	/*
-	 * GIN metapage doesn't use pd_lower/pd_upper. Other page types do. Hence,
-	 * we need to apply masking for those pages.
+	 * For GIN_DELETED page, the page is initialized to empty. Hence, mask
+	 * the page content.
 	 */
-	if (opaque->flags != GIN_META)
-	{
-		/*
-		 * For GIN_DELETED page, the page is initialized to empty. Hence, mask
-		 * the page content.
-		 */
-		if (opaque->flags & GIN_DELETED)
-			mask_page_content(page);
-		else
-			mask_unused_space(page);
-	}
+	if (opaque->flags & GIN_DELETED)
+		mask_page_content(page);
+	else
+		mask_unused_space(page);
 }



[HACKERS] Can ICU be used for a database's default sort order?

2017-06-22 Thread Tom Lane
I tried to arrange $subject via

create database icu encoding 'utf8' lc_ctype "en-US-x-icu" lc_collate "en-US-x-icu" template template0;

and got only

ERROR:  invalid locale name: "en-US-x-icu"

which is unsurprising after looking into the code, because createdb()
checks those parameters with check_locale() which only knows about
libc-defined locale names.

Is there some way I'm missing, or is this just a not-done-yet feature?

regards, tom lane




Re: [HACKERS] TRUE and true

2017-06-22 Thread Kyotaro HORIGUCHI
At Thu, 22 Jun 2017 09:13:56 -0400, Peter Eisentraut wrote in
<08678a07-3967-8567-59e5-b9bcced7f...@2ndquadrant.com>
> On 6/22/17 07:09, Kyotaro HORIGUCHI wrote:
> > What makes us have both TRUE/true or FALSE/false as constants of
> > bool?
> 
> Historical reasons, probably.  I plan to submit a patch to phase out or
> remove TRUE/FALSE as part of a migration toward stdbool.h.  So I suggest
> you use lower case and don't worry about the other ones for now.

Oh. I'm happy to hear that. Thanks for the information.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: [HACKERS] TRUE and true

2017-06-22 Thread Kyotaro HORIGUCHI
At Thu, 22 Jun 2017 15:35:13 +0300, Alexander Korotkov 
 wrote in 

Re: [HACKERS] Fast promotion not used when doing a recovery_target PITR restore?

2017-06-22 Thread Michael Paquier
On Fri, Jun 23, 2017 at 2:34 AM, Andres Freund  wrote:
> I don't think it's really a bug - just a missed optimization.  I'd
> personally not be in favor of backpatching this - it'll have some chance
> of screwing things up, even if I hope that chance is fairly small.

It would be better to wait until the branch for PG11 opens then.

> As a wider discussion, I wonder if we should keep non-fast promotion for
> anything but actual crash recovery?

Yes, I would push a bit forward and remove fallback_promote.

> And even there it might actually be
> a pretty good idea to not force a full checkpoint - getting up fast
> after a crash is kinda important..

But not that. Crash recovery is designed to be simple and robust, with
only the postmaster and the startup process running while it happens.
Having the startup process not perform checkpoints by itself would
require the bgwriter, which increases the likelihood of bugs. In
short, I don't think performance is what matters for crash recovery;
robustness and simplicity are.
-- 
Michael




Re: [HACKERS] transition table behavior with inheritance appears broken

2017-06-22 Thread Noah Misch
On Sun, Jun 18, 2017 at 11:59:43PM +0100, Andrew Gierth wrote:
> > "Andrew" == Andrew Gierth  writes:
> 
>  Andrew> Unfortunately I've been delayed over the past couple of days,
>  Andrew> but I have Thomas' latest patchset in hand and will be working
>  Andrew> on it over the rest of the week. Status update by 23:59 BST on
>  Andrew> Sun 18th, by which time I hope to have everything finalized
>  Andrew> (all three issues, not just the inheritance one).
> 
> I have, I believe, completed my review of the patchset. My conclusion is
> that the fix appears to be sound and I haven't been able to find any
> further issues with it; so I think Thomas's patches should be committed
> as-is. Unless anyone objects I will do this within the next few days.
> 
> (Any preferences for whether it should be one commit or 3 separate ones?)

This PostgreSQL 10 open item is past due for your status update.  Kindly send
a status update within 24 hours, and include a date for your subsequent status
update.  Refer to the policy on open item ownership:
https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com




[HACKERS] ICU non-utf8 code path leaks memory like there's no tomorrow

2017-06-22 Thread Tom Lane
In a database with utf8 encoding, this behaves reasonably:

select count(*) from
(select * from generate_series(1,1000) x
 order by x::text collate "en-x-icu") ss;

It eats circa 25MB, not a lot worse than the libc-collation equivalent.
But try it in say LATIN1, and it eats multiple gigabytes.

I believe the reason is that the code paths in varstr_cmp that make
use of icu_to_uchar() have forgotten to free the palloc'd output
of the latter.  I have not looked to see where else the users of
that and the reverse function made this mistake.

regards, tom lane




Re: [HACKERS] Multi column range partition table

2017-06-22 Thread Amit Langote
On 2017/06/22 20:48, amul sul wrote:
> Hi,
> 
> While working on another patch, I came across a case where
> I needed an auto-generated partition for a multi-column range partitioned
> table with the following range bounds:
> 
> PARTITION p1 FROM  (UNBOUNDED, UNBOUNDED) TO (10, 10)
> PARTITION p2 FROM  (10, 10)  TO (10, UNBOUNDED)
> PARTITION p3 FROM  (10, UNBOUNDED) TO (20, 10)
> PARTITION p4 FROM (20, 10) TO (20, UNBOUNDED)
> PARTITION p5 FROM (20, UNBOUNDED) TO (UNBOUNDED, UNBOUNDED)
> 
> Here, the lower bound of each partition is the upper bound of the
> previous partition.
> 
> While trying to create the p3 partition with bounds (10, UNBOUNDED) to
> (20, 10), I got a partition-overlap error.
> 
> Here is the SQL to reproduce this error:
> 
> CREATE TABLE range_parted ( i1 int,  i2 int ) PARTITION BY RANGE (i1, i2);
> CREATE TABLE p1 PARTITION OF range_parted FOR VALUES FROM (UNBOUNDED,
> UNBOUNDED) TO (10, 10);
> CREATE TABLE p2 PARTITION OF range_parted FOR VALUES FROM (10, 10) TO
> (10, UNBOUNDED);
> CREATE TABLE p3 PARTITION OF tab1 FOR VALUES FROM (10, UNBOUNDED) TO (20, 10);
> 
> ERROR:  partition "p3" would overlap partition "tab1_p_10_10"
> 
> This happens because of the UNBOUNDED handling, where it means negative
> infinity when it appears in the FROM clause.  Can't we explicitly treat
> it as positive infinity in this case?

No, we cannot.  What would be greater than (or equal to) +infinity?
Nothing.  So even if you want p3 to accept (10, 9890148), it won't,
because 9890148 is not >= +infinity.  It will accept only rows where
the first column is > 10 (the second column is not checked in that case).

You will have to define p3 as follows:

CREATE TABLE p3 PARTITION OF tab1 FOR VALUES FROM (11, UNBOUNDED) TO (20, 10);

It's fine to use the previous partition's upper bound as the lower bound
of the current partition if the former does not contain an UNBOUNDED value,
because whereas a finite value divides the range into two parts (assigned
to the two partitions respectively), an UNBOUNDED value does not.  The
latter represents an abstract end of the range (on either the positive
side or the negative).

Does that make sense?

Thanks,
Amit





Re: [HACKERS] Setting pd_lower in GIN metapage

2017-06-22 Thread Masahiko Sawada
On Thu, Jun 22, 2017 at 6:55 PM, Amit Langote
 wrote:
> On 2017/06/22 16:56, Michael Paquier wrote:
>> On Wed, Jun 21, 2017 at 9:42 AM, Amit Langote
>>  wrote:
>>> On 2017/06/20 20:37, Amit Kapila wrote:
 On Tue, Jun 20, 2017 at 1:50 PM, Amit Langote
  wrote:
> On 2017/06/19 23:31, Tom Lane wrote:
>> I'd suggest a rule like "if pd_lower is smaller than SizeOfPageHeaderData
>> then don't trust it, but assume all of the page is valid data".
>
> Actually, such a check is already in place in the tool, whose condition
> looks like:
>
> if (PageGetPageSize(header) == BLCKSZ &&
> PageGetPageLayoutVersion(header) == PG_PAGE_LAYOUT_VERSION &&
> (header->pd_flags & ~PD_VALID_FLAG_BITS) == 0 &&
> header->pd_lower >= SizeOfPageHeaderData &&
> header->pd_lower <= header->pd_upper &&
> header->pd_upper <= header->pd_special &&
> header->pd_special <= BLCKSZ &&
> header->pd_special == MAXALIGN(header->pd_special) && ...
>
> which even GIN metapage passes, making it an eligible data page and hence
> for omitting the hole between pd_lower and pd_upper.
>

 Won't checking for GIN_META in header->pd_flags gives you what you want?
>>>
>>> GIN_META flag is not written into pd_flags but GinPageOpaqueData.flags,
>>> which still requires including GIN's private header.
>>
>> Did you check this patch with wal_consistency_checking? I am getting
>> failures so your patch does not have the masking of GIN pages
>> completely right:
>> FATAL:  inconsistent page found, rel 1663/16385/28133, forknum 0, blkno 0
>> CONTEXT:  WAL redo at 0/39379EB8 for Gin/UPDATE_META_PAGE:
>> That's easily reproducible with installcheck and a standby replaying
>> the changes. I did not look at the code in details to see what you may
>> be missing here.
>
> Oh, wasn't sure about the gin_mask() changes myself.  Thanks for checking.
>
> Actually, the WAL consistency check fails even without patching
> gin_mask(), so the problem may be with the main patch itself.  That is,
> the patch needs to do something else other than just teaching
> GinInitMetabuffer() to initialize pd_lower.  Will look into that.
>

I've not read the code deeply but I guess we should use
GinInitMetabuffer() in ginRedoUpdateMetapage() instead of
GinInitPage(). Maybe also GinInitPage() in ginRedoDeleteListPages() is
the same.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] shift_sjis_2004 related autority files are remaining

2017-06-22 Thread Tatsuo Ishii
> Hm. I am wondering about licensing issues here to keep those files in
> the tree. I am no lawyer.

Of course. Regarding euc-jis-2004-std.txt and sjis-0213-2004-std.txt,
it seems safe to keep them.

> ## Date: 13 May 2006
> ## License:
> ##Copyright (C) 2001 earth...@tama.or.jp, All Rights Reserved.
> ##Copyright (C) 2001 I'O, All Rights Reserved.
> ##Copyright (C) 2006 Project X0213, All Rights Reserved.
> ##You can use, modify, distribute this table freely.

>> - It allows to track the changes in the original file if we decide to
>>   change the map files.
> 
> You have done that in the past for a couple of codepoints, didn't you?

I believe the reason I didn't keep the other txt files was that their
licenses prohibited keeping copies of them.

>> - The site http://x0213.org/ may disappear in the future. If that
>>   happens, we will lose track data how we create the map files.
> 
> There are other problems then as there are 3 sites in use to fetch the data:
> - GB2312.TXT comes from greenstone.org.
> - Some from icu-project.org.
> - The rest is from unicode.org.

Maybe, but I don't know how to deal with them.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] shift_sjis_2004 related autority files are remaining

2017-06-22 Thread Michael Paquier
On Fri, Jun 23, 2017 at 9:39 AM, Tatsuo Ishii  wrote:
> I think we should keep the original .txt files because:

Hm. I am wondering about licensing issues here to keep those files in
the tree. I am no lawyer.

> - It allows to track the changes in the original file if we decide to
>   change the map files.

You have done that in the past for a couple of codepoints, didn't you?

> - The site http://x0213.org/ may disappear in the future. If that
>   happens, we will lose track data how we create the map files.

There are other problems then as there are 3 sites in use to fetch the data:
- GB2312.TXT comes from greenstone.org.
- Some from icu-project.org.
- The rest is from unicode.org.

> I believe we'd better to follow the same way how src/timezone keeps
> the original timezone data.
>
> Above reasoning will not valid if we have a way to reconstruct the
> original txt files from the map files, I doubt it's worth the
> trouble to create such tools however.

That's true as well. No need for reverse-engineering if there is no
reason to. That would be possible though.
-- 
Michael




Re: [HACKERS] shift_sjis_2004 related autority files are remaining

2017-06-22 Thread Tatsuo Ishii
>> Why do you believe so?
> 
> Unicode/Makefile includes that:
> euc-jis-2004-std.txt sjis-0213-2004-std.txt:
> $(DOWNLOAD) http://x0213.org/codetable/$(@F)
> 
> So those files ought to be downloaded when rebuilding the maps, and
> they should not be in the tree. In short, I think that Horiguchi-san
> is right. On top of the two pointed out by Horiguchi-san,
> gb-18030-2000.xml should not be in the tree.

I think we should keep the original .txt files because:

- It allows us to track changes to the original files if we decide to
  change the map files.

- The site http://x0213.org/ may disappear in the future. If that
  happens, we will lose track of how we created the map files.

I believe we'd better follow the same approach src/timezone uses to
keep the original timezone data.

The above reasoning would not be valid if we had a way to reconstruct
the original txt files from the map files, but I doubt it's worth the
trouble to create such tools.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] Fix comment in xlog.c

2017-06-22 Thread Amit Langote
On 2017/06/23 5:44, Alvaro Herrera wrote:
> Amit Langote wrote:
>> Attached a patch for $SUBJECT.
>>
>> - * If RecPtr is not NULL, try to read a record at that position.  Otherwise
>> + * If RecPtr is valid, try to read a record at that position.  Otherwise
>>
>> Commit 4d6d425ab8d addressed the comment above XLogReadRecord() in
>> xlogreader.c, but missed the same above ReadRecord() in xlog.c.
>>
>> Backpatchable to 9.3, I'd think.
> 
> Yeah, fixed, thanks.

And thanks too.

Regards,
Amit





Re: [HACKERS] Redundant check of em_is_child

2017-06-22 Thread Amit Langote
On 2017/06/23 0:00, Robert Haas wrote:
> On Fri, May 19, 2017 at 3:46 AM, Amit Langote
>  wrote:
>> In match_eclasses_to_foreign_key_col(), there is this:
>>
>> if (em->em_is_child)
>> continue;   /* ignore children here */
>>
>> ISTM, it might as well be:
>>
>> Assert(!em->em_is_child);/* no children yet */
>>
>> That's because, I think it's still too early in query_planner() to be
>> expecting any child EC members.
> 
> I'm not sure there's really any benefit to this change.  In the
> future, somebody might want to use the function from someplace later
> on in the planner.  If the logic as-written would work correctly in
> that case now, I can't see why we should turn it into an assertion
> failure instead.  This isn't really costing us anything, is it?

Not much perhaps.  Just thought it might be an oversight (and potentially
a source of confusion today to someone trying to understand this code),
but no harm in leaving it the way it is, as you say, so that someone in
the future doesn't have to worry about checking here.

Sorry for the noise.

Thanks,
Amit





Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-06-22 Thread Thom Brown
On 22 June 2017 at 22:52, Robert Haas  wrote:
> On Thu, Jun 15, 2017 at 12:35 AM, Mithun Cy  
> wrote:
>> [ new patch ]
>
> I think this is looking better.  I have some suggestions:
>
> * I suggest renaming launch_autoprewarm_dump() to
> autoprewarm_start_worker().  I think that will be clearer.  Remember
> that user-visible names, internal names, and the documentation should
> all match.

+1

I like related functions and GUCs to be similarly named so that they
have the same prefix.

>
> * I think the GUC could be pg_prewarm.autoprewarm rather than
> pg_prewarm.enable_autoprewarm.  It's shorter and, I think, no less
> clear.

+1

I also think pg_prewarm.dump_interval should be renamed to
pg_prewarm.autoprewarm_interval.

>
> * In the documentation, don't say "This is a SQL callable function
> to".  This is a list of SQL-callable functions, so each thing in
> the list is one.  Just delete this from the beginning of each
> sentence.

I've made a pass at the documentation and ended up removing those
intros.  I haven't made any GUC/function renaming changes, but I have
rewritten some paragraphs for clarity.  Updated patch attached.

One thing I couldn't quite make sense of is:

"The autoprewarm process will start loading blocks recorded in
$PGDATA/autoprewarm.blocks until there is a free buffer left in the
buffer pool."

Is this saying "until there is a single free buffer remaining in
shared buffers"?  I haven't corrected or clarified this as I don't
understand it.

Also, I find it a bit messy that launch_autoprewarm_dump() doesn't
detect an autoprewarm process already running.  I'd want this to
return NULL or an error if called for a 2nd time.

>
> * The reason for the AT_PWARM_* naming is not very obvious.  Does AT
> mean "at" or "auto" or something else?  How about
> AUTOPREWARM_INTERVAL_DISABLED, AUTOPREWARM_INTERVAL_SHUTDOWN_ONLY,
> AUTOPREWARM_INTERVAL_DEFAULT?
>
> * Instead of defining apw_sigusr1_handler, I think you could just use
> procsignal_sigusr1_handler.  Instead of defining apw_sigterm_handler,
> perhaps you could just use die().  got_sigterm would go away, and
> you'd just CHECK_FOR_INTERRUPTS().
>
> * The PG_TRY()/PG_CATCH() block in autoprewarm_dump_now() could reuse
> reset_apw_state(), which might be better named detach_apw_shmem().
> Similarly, init_apw_state() could be init_apw_shmem().
>
> * Instead of load_one_database(), I suggest
> autoprewarm_database_main().  That is more parallel to
> autoprewarm_main(), which you have elsewhere, and makes it more
> obvious that it's the main entrypoint for some background worker.
>
> * Instead of launch_and_wait_for_per_database_worker(), I suggest
> autoprewarm_one_database(), and instead of prewarm_buffer_pool(), I
> suggest autoprewarm_buffers().   The motivation for changing prewarm
> to autoprewarm is that we want the names here to be clearly distinct
> from the other parts of pg_prewarm that are not related to
> autoprewarm.  The motivation for changing buffer_pool to buffers is
> just that it's a little shorter.  Personally I  also like the sound it
> of it better, but YMMV.
>
> * prewarm_buffer_pool() ends with a useless return statement.  I
> suggest removing it.
>
> * Instead of creating our own buffering system via buffer_file_write()
> and buffer_file_flush(), why not just use the facilities provided by
> the operating system?  fopen() et. al. provide buffering, and we have
> AllocateFile() to provide a FILE *; it's just like
> OpenTransientFile(), which you are using, but you'll get the buffering
> stuff for free.  Maybe there's some reason why this won't work out
> nicely, but off-hand it seems like it might.  It looks like you are
> already using AllocateFile() to read the dump, so using it to write
> the dump as well seems like it would be logical.
>
> * I think that it would be cool if, when autoprewarm completed, it
> printed a message at LOG rather than DEBUG1, and with a few more
> details, like "autoprewarm successfully prewarmed %d of %d
> previously-loaded blocks".  This would require some additional
> tracking that you haven't got right now; you'd have to keep track not
> only of the number of blocks read from the file but how many of those
> some worker actually loaded.  You could do that with an extra counter
> in the shared memory area that gets incremented by the per-database
> workers.
>
> * dump_block_info_periodically() calls ResetLatch() immediately before
> WaitLatch; that's backwards.  See the commit message for commit
> 887feefe87b9099c2967ec31ce20df4dfa9b and the comments it added to
> the top of latch.h for details on how to do this correctly.
>
> * dump_block_info_periodically()'s main loop is a bit confusing.  I
> think that after calling dump_now(true) it should just "continue",
> which will automatically retest got_sigterm.  You could rightly object
> to that plan on the grounds that we then wouldn't recheck got_sighup
> promptly, but you can fix 

Re: [HACKERS] shift_sjis_2004 related autority files are remaining

2017-06-22 Thread Michael Paquier
On Fri, Jun 23, 2017 at 8:12 AM, Tatsuo Ishii  wrote:
>> Hi, I happened to notice that the backend/utils/mb/Unicode directory
>> contains two encoding authority files, which I believe should not
>> be there.

(Worked on that with Horiguchi-san a couple of weeks back.)

>> euc-jis-2004-std.txt
>> sjis-0213-2004-std.txt
>
> Why do you believe so?

Unicode/Makefile includes that:
euc-jis-2004-std.txt sjis-0213-2004-std.txt:
$(DOWNLOAD) http://x0213.org/codetable/$(@F)

So those files ought to be downloaded when rebuilding the maps, and
they should not be in the tree. In short, I think that Horiguchi-san
is right. On top of the two pointed out by Horiguchi-san,
gb-18030-2000.xml should not be in the tree.
-- 
Michael




Re: [HACKERS] Pluggable storage

2017-06-22 Thread Michael Paquier
On Thu, Jun 22, 2017 at 11:27 PM, Robert Haas  wrote:
> On Thu, Jun 22, 2017 at 8:32 AM, Alexander Korotkov
>  wrote:
>> On Thu, Jun 22, 2017 at 4:01 AM, Michael Paquier 
>> wrote:
>>> Putting that in a couple of words.
>>> 1. Table AM with a 6-byte TID.
>>> 2. Table AM with a custom locator format, which could be TID-like.
>>> 3. Table AM with no locators.
>>>
>>> Getting into having #1 first to work out would already be really
>>> useful for users.
>>
>> What exactly would be useful for *users*?  Any kind of API itself is
>> completely useless for users, because they are users, not developers.
>> A storage API could be useful for developers to implement storage AMs,
>> which in turn could be useful for users.
>
> What's your point?  I assume that is what Michael meant.

Sorry for the confusion. I implied here that users are the ones
developing modules.
-- 
Michael




Re: [HACKERS] Autovacuum launcher occurs error when cancelled by SIGINT

2017-06-22 Thread Michael Paquier
On Fri, Jun 23, 2017 at 7:02 AM, Alvaro Herrera
 wrote:
> Thomas Munro wrote:
>> I thought about this when designing the DSA API.  I couldn't think of
>> any good reason to provide an 'am-I-already-attached?' function
>> equivalent to dsm_find_mapping.  It seemed to me that the client code
>> shouldn't ever be in any doubt about whether it's attached, and that
>> wilfully or absent-mindedly throwing away dsa_area pointers and having
>> to ask for them again doesn't seem like a very good design.  I suspect
>> the same applies to dsm_find_mapping, and I don't see any callers in
>> the source tree or indeed anywhere on the internet (based on a quick
>> Google search).  But I could be missing something.

I don't think that's completely exact. dsm_attach() duplicates the logic
of dsm_find_mapping() to see if a segment is already attached, so
dsm_attach() could be refactored to use dsm_find_mapping() directly.

> I think such an API call is potentially useful for things like assertion
> checks, if nothing else.

Indeed, that's useful here as well.
-- 
Michael




Re: [HACKERS] shift_sjis_2004 related autority files are remaining

2017-06-22 Thread Tatsuo Ishii
> Hi, I happened to notice that backend/utils/mb/Unicode directory
> contains two encoding authority files, which I believe are not to
> be there.
> 
> euc-jis-2004-std.txt
> sjis-0213-2004-std.txt

Why do you believe so?
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] initdb initalization failure for collation "ja_JP"

2017-06-22 Thread Tom Lane
I wrote:
> Now the hard part of that is that because pg_import_system_collations
> isn't using a temporary staging table, but is just inserting directly
> into pg_collation, there isn't any way for it to eliminate duplicates
> unless it uses if_not_exists behavior all the time.  So there seem to
> be two ways to proceed:
> 1. Drop pg_import_system_collations' if_not_exists argument and just
> define it as adding any collations not already known in pg_collation.
> 2. Significantly rewrite it so that it de-dups the collation set by
> hand before trying to insert into pg_collation.

After further thought, I realized that there's another argument for
making pg_import_system_collations() always do if-not-exists, which
is that as it stands it's inconsistent anyway because it silently
uses if-not-exists behavior for aliases.

So attached is a draft patch that drops if_not_exists and tweaks the
alias insertion code to guarantee deterministic behavior.  I also
corrected failure to use CommandCounterIncrement() in the ICU part
of the code, which would cause problems if ICU can ever report
duplicate names; something I don't especially care to assume is
impossible.  Also, I fixed things so that pg_import_system_collations()
doesn't emit any chatter about duplicate existing names, because it
didn't take long to realize that that behavior was useless and
irritating.  (Perhaps we should change the function to return the
number of entries successfully added?  But I didn't do that here.)

I've tested this with a faked version of "locale -a" that generates
duplicated entries, and it does what I think it should.
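As an aside, the skip-duplicates behavior is easy to model outside the backend; the sketch below is a hypothetical illustration (invented names, not the C implementation) of an import that silently skips entries that already exist and returns the number actually added, per the parenthetical suggestion above:

```python
def import_collations(existing, locale_names):
    """Add each locale name not already present in 'existing'.

    Duplicates within the input and names already present are silently
    skipped, mirroring the always-if-not-exists behavior described above.
    Returns the number of entries actually added.
    """
    added = 0
    for name in locale_names:
        if name not in existing:
            existing.add(name)
            added += 1
    return added
```

Running it a second time over the same list adds nothing, which is the property that makes re-running the import after installing new OS locales safe.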

One question that I've got is why the ICU portion refuses to load
any entries unless is_encoding_supported_by_icu(GetDatabaseEncoding()).
Surely this is completely wrong?  I should think that what we load into
pg_collation ought to be independent of template1's encoding, the same
as it is for libc collations, and the right place to be making a test
like that is where somebody attempts to use an ICU collation.  But
I've not tried to test it.

Also, I'm a bit tempted to remove setup_collation()'s manual insertion of
'ucs_basic' in favor of adding a DATA() line for that to pg_collation.h.
As things stand, if for some reason "locale -a" were to report a locale
named that, initdb would fail.  In the old code, it would not have failed
--- it's uncertain whether you would have gotten the forced UTF8/C
definition or libc's definition, but you would have gotten only one.
However, that approach would result in 'ucs_basic' being treated as
pinned, which perhaps we don't want.  If we don't want it to be pinned,
another idea is just to make setup_collation() insert it first not last,
thereby ensuring that the SQL definition wins over any libc entries.

Comments?

regards, tom lane

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e073f7b..bd2eaaa 100644
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** postgres=# SELECT * FROM pg_walfile_name
*** 19711,19717 

 
  pg_import_system_collations
! pg_import_system_collations(if_not_exists boolean, schema regnamespace)
 
 void
 Import operating system collations
--- 19711,19717 

 
  pg_import_system_collations
! pg_import_system_collations(schema regnamespace)
 
 void
 Import operating system collations
*** postgres=# SELECT * FROM pg_walfile_name
*** 19730,19747 
 
  
 
! pg_import_system_collations populates the system
! catalog pg_collation with collations based on all the
! locales it finds on the operating system.  This is
  what initdb uses;
  see  for more details.  If additional
  locales are installed into the operating system later on, this function
! can be run again to add collations for the new locales.  In that case, the
! parameter if_not_exists should be set to true to
! skip over existing collations.  The schema
! parameter would typically be pg_catalog, but that is
! not a requirement.  (Collation objects based on locales that are no longer
! present on the operating system are never removed by this function.)
 
  

--- 19730,19748 
 
  
 
! pg_import_system_collations adds collations to the system
! catalog pg_collation based on all the
! locales it finds in the operating system.  This is
  what initdb uses;
  see  for more details.  If additional
  locales are installed into the operating system later on, this function
! can be run again to add collations for the new locales.  Locales that
! match existing entries in pg_collation will be skipped.
! (But collation objects based on locales that are no longer
! present in the operating system are not removed by this function.)
! The schema parameter would typically
! be pg_catalog, but that is 

Re: [HACKERS] Autovacuum launcher occurs error when cancelled by SIGINT

2017-06-22 Thread Alvaro Herrera
Thomas Munro wrote:

> I thought about this when designing the DSA API.  I couldn't think of
> any good reason to provide an 'am-I-already-attached?' function
> equivalent to dsm_find_mapping.  It seemed to me that the client code
> shouldn't ever be in any doubt about whether it's attached, and that
> wilfully or absent-mindedly throwing away dsa_area pointers and having
> to ask for them again doesn't seem like a very good design.  I suspect
> the same applies to dsm_find_mapping, and I don't see any callers in
> the source tree or indeed anywhere on the internet (based on a quick
> Google search).  But I could be missing something.

I think such an API call is potentially useful for things like assertion
checks, if nothing else.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-06-22 Thread Robert Haas
On Thu, Jun 15, 2017 at 12:35 AM, Mithun Cy  wrote:
> [ new patch ]

I think this is looking better.  I have some suggestions:

* I suggest renaming launch_autoprewarm_dump() to
autoprewarm_start_worker().  I think that will be clearer.  Remember
that user-visible names, internal names, and the documentation should
all match.

* I think the GUC could be pg_prewarm.autoprewarm rather than
pg_prewarm.enable_autoprewarm.  It's shorter and, I think, no less
clear.

* In the documentation, don't say "This is a SQL callable function
to".  This is a list of SQL-callable functions, so each thing in
the list is one.  Just delete this from the beginning of each
sentence.

* The reason for the AT_PWARM_* naming is not very obvious.  Does AT
mean "at" or "auto" or something else?  How about
AUTOPREWARM_INTERVAL_DISABLED, AUTOPREWARM_INTERVAL_SHUTDOWN_ONLY,
AUTOPREWARM_INTERVAL_DEFAULT?

* Instead of defining apw_sigusr1_handler, I think you could just use
procsignal_sigusr1_handler.  Instead of defining apw_sigterm_handler,
perhaps you could just use die().  got_sigterm would go away, and
you'd just CHECK_FOR_INTERRUPTS().

* The PG_TRY()/PG_CATCH() block in autoprewarm_dump_now() could reuse
reset_apw_state(), which might be better named detach_apw_shmem().
Similarly, init_apw_state() could be init_apw_shmem().

* Instead of load_one_database(), I suggest
autoprewarm_database_main().  That is more parallel to
autoprewarm_main(), which you have elsewhere, and makes it more
obvious that it's the main entrypoint for some background worker.

* Instead of launch_and_wait_for_per_database_worker(), I suggest
autoprewarm_one_database(), and instead of prewarm_buffer_pool(), I
suggest autoprewarm_buffers().   The motivation for changing prewarm
to autoprewarm is that we want the names here to be clearly distinct
from the other parts of pg_prewarm that are not related to
autoprewarm.  The motivation for changing buffer_pool to buffers is
just that it's a little shorter.  Personally I also like the sound
of it better, but YMMV.

* prewarm_buffer_pool() ends with a useless return statement.  I
suggest removing it.

* Instead of creating our own buffering system via buffer_file_write()
and buffer_file_flush(), why not just use the facilities provided by
the operating system?  fopen() et. al. provide buffering, and we have
AllocateFile() to provide a FILE *; it's just like
OpenTransientFile(), which you are using, but you'll get the buffering
stuff for free.  Maybe there's some reason why this won't work out
nicely, but off-hand it seems like it might.  It looks like you are
already using AllocateFile() to read the dump, so using it to write
the dump as well seems like it would be logical.

* I think that it would be cool if, when autoprewarm completed, it
printed a message at LOG rather than DEBUG1, and with a few more
details, like "autoprewarm successfully prewarmed %d of %d
previously-loaded blocks".  This would require some additional
tracking that you haven't got right now; you'd have to keep track not
only of the number of blocks read from the file but how many of those
some worker actually loaded.  You could do that with an extra counter
in the shared memory area that gets incremented by the per-database
workers.

* dump_block_info_periodically() calls ResetLatch() immediately before
WaitLatch; that's backwards.  See the commit message for commit
887feefe87b9099c2967ec31ce20df4dfa9b and the comments it added to
the top of latch.h for details on how to do this correctly.

* dump_block_info_periodically()'s main loop is a bit confusing.  I
think that after calling dump_now(true) it should just "continue",
which will automatically retest got_sigterm.  You could rightly object
to that plan on the grounds that we then wouldn't recheck got_sighup
promptly, but you can fix that by moving the got_sighup test to the
top of the loop, which is a good idea anyway for at least two other
reasons.  First, you probably want to check for a pending SIGHUP on
initially entering this function, because something might have changed
during the prewarm phase, and second, see the previous comment about
using the "another valid coding pattern" from latch.h, which puts the
ResetLatch() at the bottom of the loop.
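That coding pattern from latch.h can be simulated in a few lines; the sketch below (Python, with threading.Event standing in for the latch, and flag names invented for illustration) shows the shape being recommended: flag checks at the top of the loop, wait-then-reset at the bottom, so a wakeup arriving between the checks and the wait is never lost:

```python
import threading

def worker_loop(latch, state, max_iterations=10):
    """Skeleton of the latch.h coding pattern discussed above."""
    for _ in range(max_iterations):
        # Check pending-signal flags first, on every iteration.
        if state.get("got_sighup"):
            state["got_sighup"] = False
            state["reloads"] = state.get("reloads", 0) + 1
        if state.get("got_sigterm"):
            return
        # ... periodic work would go here ...
        latch.wait(timeout=0.01)  # WaitLatch()
        latch.clear()             # ResetLatch() at the bottom, not the top
```

Because the flags are rechecked before every wait, a SIGHUP received during the work phase is picked up on the very next iteration.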

* I think that launch_autoprewarm_dump() should ereport(ERROR, ...)
rather than just return NULL if the feature is disabled.  Maybe
something like ... ERROR: pg_prewarm.dump_interval must be
non-negative in order to launch worker

* Not sure about this one, but maybe we should consider just getting
rid of pg_prewarm.dump_interval = -1 altogether and make the minimum
value 0. If pg_prewarm.autoprewarm = on, then we start the worker and
dump according to the dump interval; if pg_prewarm.autoprewarm = off
then we don't start the worker automatically, but we still let you
start it manually.  If you do, it respects the configured
dump_interval.  With this design, we don't need the error suggested in
the 

Re: [HACKERS] Pluggable storage

2017-06-22 Thread Tomas Vondra

Hi,

On 6/22/17 4:36 PM, Alexander Korotkov wrote:
On Thu, Jun 22, 2017 at 5:27 PM, Robert Haas wrote:


On Thu, Jun 22, 2017 at 8:32 AM, Alexander Korotkov wrote:
> On Thu, Jun 22, 2017 at 4:01 AM, Michael Paquier
> wrote:
>> Putting that in a couple of words.
>> 1. Table AM with a 6-byte TID.
>> 2. Table AM with a custom locator format, which could be TID-like.
>> 3. Table AM with no locators.
>>
>> Getting into having #1 first to work out would already be really
>> useful for users.
>
> What exactly would be useful for *users*?  Any kind of API itself is
> completely useless for users, because they are users, not developers.
> Storage API could be useful for developers to implement storage AMs which
> in turn could be useful for users.

What's your point?  I assume that is what Michael meant.


TBH, I don't understand what particular enhancements we expect from
having #1.


I'd say that's one of the things we're trying to figure out in this 
thread. Does it make sense to go with #1 in v1 of the patch, or do we 
have to implement #2 or #3 to get some real benefit for the users?


This is why it's hard for me to say if #1 is a good idea.  It's also hard
for me to say if the patch upthread is the right way of implementing #1.
But I have a gut feeling that even if #1 is a good idea itself, it's
definitely not what users expect from "pluggable storages".


The question is "why" do you think that. What features do you expect 
from pluggable storage API that would be impossible to implement with 
option #1?


I'm not trying to be annoying or anything like that - I don't know the 
answer and discussing those things is exactly why this thread exists.


I do agree #1 has limitations, and that it'd be great to get an API that
supports all kinds of pluggable storage implementations. But I guess
that'll take some time.



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Autovacuum launcher occurs error when cancelled by SIGINT

2017-06-22 Thread Thomas Munro
On Thu, Jun 22, 2017 at 4:29 AM, Kuntal Ghosh
 wrote:
> On Wed, Jun 21, 2017 at 7:52 PM, Dilip Kumar  wrote:
>> On Wed, Jun 21, 2017 at 7:47 PM, Kuntal Ghosh
>>  wrote:
>>>> IMHO, It's not a good idea to use DSM call to verify the DSA handle.

>>> Okay. Is there any particular scenario you've in mind where this may fail?
>>
>> It's not about failure, but about the abstraction.  When we are using
>> the DSA we should not directly access the DSM which is under DSA.
>>
> Okay. I thought that I've found at least one usage of
> dsm_find_mapping() in the code. :-)
>
> But, I've some more doubts.
> 1. When should we use dsm_find_mapping()? (The first few lines of
> dsm_attach is same as dsm_find_mapping().)
> 2. As a user of dsa, how should we check whether my dsa handle is
> already attached? I guess this is required because, if a user tries to
> re-attach a dsa handle,  it's punishing the user by throwing an error
> and the user wants to avoid such errors.

I thought about this when designing the DSA API.  I couldn't think of
any good reason to provide an 'am-I-already-attached?' function
equivalent to dsm_find_mapping.  It seemed to me that the client code
shouldn't ever be in any doubt about whether it's attached, and that
wilfully or absent-mindedly throwing away dsa_area pointers and having
to ask for them again doesn't seem like a very good design.  I suspect
the same applies to dsm_find_mapping, and I don't see any callers in
the source tree or indeed anywhere on the internet (based on a quick
Google search).  But I could be missing something.

-- 
Thomas Munro
http://www.enterprisedb.com




Re: [HACKERS] Fix a typo in README.dependencies

2017-06-22 Thread Alvaro Herrera
Ashutosh Bapat wrote:
> On Mon, Jun 5, 2017 at 8:22 AM, atorikoshi
>  wrote:
> > Hi,
> >
> > I found below formula to compute selectivities, but
> > I think the last Probability 'P(b=?)' should be 'P(c=?)'.
> >
> >>  P(a=?,b=?,c=?) = P(a=?,b=?) * (d + (1-d)*P(b=?))
> >
> >
> > Attached patch fixes it, and it also adds some spaces
> > following another formula which is on line 86 and
> > computes P(a=?, b=?).
> 
> Agree. Also using "d" for "degree of functional dependence (b=>c)" as
> well is confusing. We are already using "d" for "degree of functional
> dependence (a=>b)". Here's a patch to use "d'" instead of "d".

Since the surrounding text uses single quotes to talk about each letter,
I thought it was better to use a new letter (e) so that we don't require
the "prime" notation, which would end up being either inconsistent,
confusing, stupid-looking, or combinations thereof.

Also, your proposed text had a slight mistake: it's not (b=>c) that
d' is for, but (a,b=>c). 

Pushed with those corrections.
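Numerically, the corrected formula behaves as expected at the boundaries; a quick sanity check (writing "e" for the degree of functional dependence (a,b=>c), per the pushed wording):

```python
def selectivity_abc(p_ab, p_c, e):
    """P(a=?,b=?,c=?) = P(a=?,b=?) * (e + (1 - e) * P(c=?))

    e = 1 (full dependence): knowing c adds nothing, result is P(a,b).
    e = 0 (independence): plain multiplication, P(a,b) * P(c).
    """
    return p_ab * (e + (1.0 - e) * p_c)
```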

Thanks for the reports and patches!

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Pluggable storage

2017-06-22 Thread Tomas Vondra

Hi,

On 6/21/17 9:47 PM, Robert Haas wrote:
> ...
> like int8 or numeric, it won't work at all.  Even for other things,
> it's going to cause problems because the bit patterns won't be what
> the code is expecting; e.g. bitmap scans care about the structure of
> the TID, not just how many bits it is.  (Due credit: Somebody, maybe
> Alvaro, pointed out this problem before, at PGCon.)


Can you elaborate a bit more on these TID bit pattern issues? I do
remember that not all TIDs are valid due to safeguards on individual 
fields, like for example


Assert(iptr->ip_posid < (1 << MaxHeapTuplesPerPageBits))

But perhaps there are some other issues?
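For context on why the bit patterns matter: a heap TID is a (block number, offset number) pair squeezed into six bytes, and consumers such as bitmap scans decompose it into those fields rather than treating it as an opaque 48-bit value. A rough sketch of the packing (32-bit block plus 16-bit offset; the backend actually stores three uint16 fields, and the byte order here is for illustration only):

```python
import struct

def pack_tid(block, offset):
    """Pack a TID as a 4-byte block number plus a 2-byte offset number."""
    if not (0 <= block < 2**32 and 0 < offset < 2**16):
        raise ValueError("out of range for a heap TID")
    return struct.pack(">IH", block, offset)

def unpack_tid(raw):
    """Recover (block, offset) from the 6-byte form."""
    return struct.unpack(">IH", raw)
```

A table AM that handed back a 6-byte value with a different internal structure would break such consumers even though the width matches.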

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Dynamic instrumentation of lwlock wait times (lwlock flamegraphs)

2017-06-22 Thread Andres Freund
Hi,

At pgcon some people were talking about the difficulty of instrumenting
the time actually spent waiting for lwlocks and related measurements.
I'd mentioned that linux these days provides infrastructure to measure
such things in unmodified binaries.

Attached is a prototype of a script that measures the time spent inside
PGSemaphoreLock(), aggregates that inside the kernel, grouped by pid and
stacktrace.  That allows one to generate nice flame graphs showing which
part of the code waits how long for lwlocks.

The attached script clearly needs improvements, but I thought I'd show
some of the results it can get.  To run it you need the Python
library of the 'bcc' project [1], and a sufficiently new kernel.  Some
distributions, e.g. newer debian versions, package this as python-bpfcc
and similar.

The output of the script can be turned into a flamegraph with
https://github.com/brendangregg/FlameGraph 's flamegraph.pl.
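For anyone trying this: flamegraph.pl consumes "folded" stacks, one line per unique stack with frames joined by semicolons followed by a value. A minimal folder for the (frame list, total nanoseconds) pairs this kind of script collects might look like the sketch below (frame names are invented):

```python
def fold_stacks(samples):
    """samples: iterable of (frames, total_ns), frames ordered root-first.

    Returns 'frame1;frame2;... value' lines as flamegraph.pl expects,
    dropping stacks that accumulated no wait time.
    """
    lines = []
    for frames, total_ns in samples:
        if total_ns == 0:
            continue
        lines.append("%s %d" % (";".join(frames), total_ns))
    return lines
```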

Here's a few examples of a pgbench run. The number is the number of
clients, sync/async indicates synchronous_commit on/off.  All the
numbers here were generated with the libpq & pgbench batch patches
applied and in use, but that's just because that was the state of my
working tree.

http://anarazel.de/t/2017-06-22/pgsemwait_8_sync.svg
http://anarazel.de/t/2017-06-22/pgsemwait_8_async.svg
http://anarazel.de/t/2017-06-22/pgsemwait_64_sync.svg
http://anarazel.de/t/2017-06-22/pgsemwait_64_async.svg

As a bonus, though not that representative: the wait profile of a pgbench
read-only run after the above, with autovacuum previously disabled:
http://anarazel.de/t/2017-06-22/pgsemwait_64_select.svg
It is interesting to see how the backends themselves never end up having to
flush WAL, or at least not in a manner triggering contention.

I plan to write a few more of these, because they're hugely useful to
understand actual locking behaviour. Among them:
- time between Acquire/Release of lwlocks, grouped by lwlock
- time between Acquire/Release of lwlocks, grouped by stack
- callstack of acquirer and waker of lwlocks, grouped by caller stack, waiter stack
- ...

I think it might be interesting to collect a few of these somewhere
centrally once halfway mature.  Maybe in src/tools or such.

Greetings,

Andres Freund

[1] https://github.com/iovisor/bcc
#!/usr/bin/python

from __future__ import print_function
from bcc import BPF
from time import sleep
from sys import stderr

bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>

struct stats_key_t {
u32 pid;
int user_stack_id;
};

struct stats_value_t {
u64 total_time;
};

struct start_key_t {
u32 pid;
};

struct start_value_t {
u64 last_start;
};

// map_type, key_type, leaf_type, table_name, num_entry
BPF_HASH(stats, struct stats_key_t, struct stats_value_t);
BPF_HASH(start, struct start_key_t, struct start_value_t);

BPF_STACK_TRACE(stack_traces, STACK_STORAGE_SIZE);

int trace_sem_entry(struct pt_regs *ctx)
{
    struct start_key_t start_key = {};
    struct start_value_t start_value = {};

    start_key.pid = bpf_get_current_pid_tgid();
    start_value.last_start = bpf_ktime_get_ns();

    start.update(&start_key, &start_value);

    return 0;
}

int trace_sem_return(struct pt_regs *ctx)
{
    struct stats_key_t stats_key = {};
    struct start_key_t start_key = {};
    struct stats_value_t zero = {};
    struct stats_value_t *stats_value;
    struct start_value_t *start_value;
    u64 delta;

    start_key.pid = bpf_get_current_pid_tgid();
    start_value = start.lookup(&start_key);

    if (!start_value)
        return 0; /* missed start */

    delta = bpf_ktime_get_ns() - start_value->last_start;
    start.delete(&start_key);

    stats_key.pid = bpf_get_current_pid_tgid();
    stats_key.user_stack_id = stack_traces.get_stackid(ctx, BPF_F_REUSE_STACKID | BPF_F_USER_STACK);

    stats_value = stats.lookup_or_init(&stats_key, &zero);
    stats_value->total_time += delta;

    return 0;
}

"""

# set stack storage size
bpf_text = bpf_text.replace('STACK_STORAGE_SIZE', str(1000))

b = BPF(text=bpf_text)


libpath = BPF.find_exe(
    '/home/andres/build/postgres/dev-optimize/vpath/src/backend/postgres')
if not libpath:
    exit("can't resolve postgres binary path")
library = libpath

b.attach_uprobe(name=library, sym_re='PGSemaphoreLock',
                fn_name="trace_sem_entry",
                pid=-1)

b.attach_uretprobe(name=library, sym_re='PGSemaphoreLock',
                   fn_name="trace_sem_return",
                   pid=-1)

matched = b.num_open_uprobes()
if matched == 0:
    print("error: 0 functions traced. Exiting.", file=stderr)
    exit(1)

sleep(3)

stats = b.get_table("stats")
stack_traces = b.get_table("stack_traces")
folded = True  # False

for k, v in stats.items():  # , key=lambda v: v.total_time
    # print(dir(k))
    # print(dir(v))
    user_stack = [] if k.user_stack_id < 0 else \
        stack_traces.walk(k.user_stack_id)
    name = 'postgres'
    if v.total_time == 0:
        continue
    if folded:
        # print folded stack output
        user_stack = list(user_stack)
        line = [name] + \
            [b.sym(addr, k.pid) for addr 

Re: [HACKERS] Fix comment in xlog.c

2017-06-22 Thread Alvaro Herrera
Amit Langote wrote:
> Attached a patch for $SUBJECT.
> 
> - * If RecPtr is not NULL, try to read a record at that position.  Otherwise
> + * If RecPtr is valid, try to read a record at that position.  Otherwise
> 
> Commit 4d6d425ab8d addressed the comment above XLogReadRecord() in
> xlogreader.c, but missed the same above ReadRecord() in xlog.c.
> 
> Backpatchable to 9.3, I'd think.

Yeah, fixed, thanks.



-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Guarding against bugs-of-omission in initdb's setup_depend

2017-06-22 Thread Robert Haas
On Thu, Jun 22, 2017 at 2:11 PM, Tom Lane  wrote:
> While thinking about something else, it started to bother me that
> initdb's setup_depend() function knows exactly which catalogs might
> contain pinnable objects.  It is not very hard to imagine that somebody
> might add a DATA() line to, say, pg_transform.h and expect that the
> represented object could not get dropped.  Well, tain't so, because
> setup_depend() doesn't collect OIDs from there.
>
> So I'm thinking about adding a regression test case, say in dependency.sql,
> that looks for unpinned objects with OIDs in the hand-assigned range,
> along the lines of this prototype code:

I don't have specific thoughts, but I like the general idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Adding support for Default partition in partitioning

2017-06-22 Thread Robert Haas
On Wed, Jun 21, 2017 at 8:47 PM, Amit Langote
 wrote:
> It's the comma inside the error message that suggests to me that it's a
> style that I haven't seen elsewhere in the backend code.

Exactly.  Avoid that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] possible self-deadlock window after bad ProcessStartupPacket

2017-06-22 Thread Tom Lane
Andres Freund  writes:
> Or, probably more robust: Simply _exit(2) without further ado, and rely
> on postmaster to output an appropriate error message. Arguably it's not
> actually useful to see hundreds of "WARNING: terminating connection because of
> crash of another server process" messages in the log anyway.

At that point you might as well skip the entire mechanism and go straight
to SIGKILL.  IMO the only reason quickdie() exists at all is to try to
send a helpful message to the client.  And it is helpful --- your attempt
to claim that it isn't sounds very much like wishful thinking.

regards, tom lane




[HACKERS] Guarding against bugs-of-omission in initdb's setup_depend

2017-06-22 Thread Tom Lane
While thinking about something else, it started to bother me that
initdb's setup_depend() function knows exactly which catalogs might
contain pinnable objects.  It is not very hard to imagine that somebody
might add a DATA() line to, say, pg_transform.h and expect that the
represented object could not get dropped.  Well, tain't so, because
setup_depend() doesn't collect OIDs from there.

So I'm thinking about adding a regression test case, say in dependency.sql,
that looks for unpinned objects with OIDs in the hand-assigned range,
along the lines of this prototype code:

do $$
declare relnm text;
  shared bool;
  lowoid oid;
  pinned bool;
begin
for relnm, shared in
   select relname, relisshared from pg_class
   where relhasoids and oid < 16384 order by 1
loop
  execute 'select min(oid) from ' || relnm into lowoid;
  continue when lowoid is null or lowoid >= 10000;
  if shared then
    pinned := exists(select 1 from pg_shdepend
                     where refobjid = lowoid and deptype = 'p');
  else
    pinned := exists(select 1 from pg_depend
                     where refobjid = lowoid and deptype = 'p');
  end if;
  if not pinned then
raise notice '% contains unpinned object with OID %', relnm, lowoid;
  end if;
end loop;
end$$;

At the moment, this prints

NOTICE:  pg_database contains unpinned object with OID 1
NOTICE:  pg_tablespace contains unpinned object with OID 1663

It's intentional that no databases are pinned, so I'm inclined to
explicitly skip pg_database in the test, or else to allow the above
to remain in the test's expected output.  The fact that the builtin
tablespaces aren't pinned is okay, because DropTablespace contains an
explicit privilege check preventing dropping them, but I wonder whether
we shouldn't adjust setup_depend() to pin them anyway.  Otherwise we
can do one of the above two things for pg_tablespace as well.

Another thought is that while this test would be enough to ensure that
there are no unpinned OIDs that the C code knows about explicitly through
#defines, it's certainly conceivable that there might be DATA() lines
without hand-assigned OIDs that the system nonetheless is really reliant
on being there.  So it feels like maybe the test isn't strong enough.
We could change the OID cutoff to be 16384 not 10000, in which case
we also get complaints about pg_constraint, pg_conversion, pg_extension,
and pg_rewrite.  pg_constraint and pg_rewrite are safe enough, because
setup_depend() does scan those, and surely pg_extension is too (a pinned
extension seems like a contradiction in terms).  But as for pg_conversion,
it seems a bit scary that setup_depend() doesn't scan it.

So what I'm thinking about doing is adding a test like the above,
with OID cutoff 16384 and no printing of specific OIDs (because they
could change).  The expected output would be annotated with a comment
like "if a new catalog starts appearing here, that's probably okay,
but make sure setup_depend() is scanning it for OIDs that should be
pinned".  I'd want to add pg_conversion to setup_depend(), too.

We might want to pre-emptively add some other catalogs such as pg_policy
and pg_transform while we are at it.  But creating this regression test
ought to be enough to ensure that we don't start needing to scan those
catalogs without knowing it, so I don't feel that it's urgent to do so.

Thoughts?

regards, tom lane




Re: [HACKERS] Autovacuum launcher occurs error when cancelled by SIGINT

2017-06-22 Thread Alvaro Herrera
Yugo Nagata wrote:
> Hi,
> 
> As I report in another thread[1], I found the autovacuum launcher occurs
> the following error in PG 10 when this received SIGINT. I can repuroduce
> this by pg_cancel_backend or `kill -2 `.

Thanks for the report, BTW!

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re-indent HEAD tomorrow?

2017-06-22 Thread Bruce Momjian
On Thu, Jun 22, 2017 at 10:38:41AM -0400, Robert Haas wrote:
> On Wed, Jun 21, 2017 at 5:28 PM, Tom Lane  wrote:
> > Right now we're really just speculating about how much pain there will
> > be, on either end of this.  So it'd be interesting for somebody who's
> > carrying large out-of-tree patches (EDB? Citus?) to try the new
> > pgindent version on a back branch and see how much of their patches no
> > longer apply afterwards.
> 
> EDB is not continuously reapplying patches; we have branches into
> which the upstream reindents would have to be merged.  As a broad
> statement, reindenting all of the back branches is surely going to
> create some extra work for whoever has to do those merges, but if
> that's what the community thinks is best, we will of course manage.
> It's not *that* bad.
> 
> It would be slightly less annoying for us, I think, if the reindent
> were done immediately after a minor-release rather than at some other
> random point in time.

Also keep in mind that if we don't reindent all active branches then
even forks of Postgres will have merge conflicts in backporting of their
own patches, meaning the community and forks will have backbranch patch
difficulties.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: [HACKERS] Missing comment for create_modifytable_path

2017-06-22 Thread Robert Haas
On Thu, Jun 15, 2017 at 4:40 AM, Etsuro Fujita
 wrote:
> While working on adding support for tuple routing for foreign partitions, I
> noticed that in create_modifytable_path, we forgot to add a comment on its
> new argument 'partitioned_rels'.  Attached a patch for including that in the
> comments for that function.

Committed with a slight adjustment.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Autovacuum launcher occurs error when cancelled by SIGINT

2017-06-22 Thread Alvaro Herrera
Thomas Munro wrote:

> Hmm.  So the problem here is that AutoVacLauncherMain assumes that
> there are only two possibilities: (1) there is no handle published in
> shmem yet, so we should create a DSA area and publish the handle, and
> (2) there is a handle published in shmem so we should attach to it.
> But there is another possibility: (3) there is a handle published in
> shmem, but we are already attached to it (because we've restarted our
> main loop after SIGINT).
> 
> The suggestion so far was to check if we're already attached to that
> segment (making use of the implementation detail that a DSA handle is
> also a DSM segment handle), but don't we know if we're already
> attached by checking our AutoVacuumDSA variable?

This seems like a good fix to me.  Pushed this way, with an additional
comment.
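The three cases Thomas describes reduce to an attach-once guard keyed on a process-local pointer. A minimal sketch in C (names and the stand-in attach function are hypothetical, not PostgreSQL's actual DSA API):

```c
#include <stddef.h>
#include <stdint.h>

/* Process-local pointer: non-NULL means we are already attached (case 3). */
static void *MyArea = NULL;
static int attach_calls = 0;

/* Stand-in for dsa_attach(); counts invocations so the guard is checkable. */
static void *fake_attach(uintptr_t handle)
{
    attach_calls++;
    return (void *) handle;
}

/* Create-or-attach logic guarded against re-entry after a restarted loop. */
static void *ensure_attached(uintptr_t published_handle)
{
    if (MyArea != NULL)
        return MyArea;          /* case 3: already attached, do nothing */
    if (published_handle != 0)
        MyArea = fake_attach(published_handle);   /* case 2: attach */
    /* case 1 (no handle published yet: create and publish) elided */
    return MyArea;
}
```

Calling this twice must attach only once, which is exactly the invariant the fix restores.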

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] possible self-deadlock window after bad ProcessStartupPacket

2017-06-22 Thread Andres Freund
On 2017-06-22 10:41:41 -0700, Andres Freund wrote:
> On 2017-02-02 12:18:22 -0800, Jimmy Yih wrote:
> > In the above pull request, Heikki also mentions that a similar scenario can
> > happen during palloc() as well... which is similar to what we saw in
> > Greenplum a couple years back for a deadlock in a malloc() call where we
> > responded by changing exit() to _exit() in quickdie as a fix.  That could
> > possibly be applicable to latest Postgres as well.
> 
> Isn't the quickdie() issue that we palloc/malloc in the first place,
> rather than the exit call?  There's some risk for exit() too, but we
> reset our own atexit handlers before exiting, so the risk seems fairly
> small.
> 
> 
> In my opinion the fix here would be to do it right and never emit error
> messages from signal handlers via elog.c - we've progressed a lot
> towards the goal and do a lot less in signal handlers these days - but
> quickdie() is one glaring exception to that.  I think a reasonable fix
> here would be to use write() of a statically allocated message, rather
> than elog proper, and not to send the message to the client.  Libpq, and
> I presume other clients, synthesize a message upon unexpected socket
> closure anyway, and it's completely unclear whether we can send a
> message anyway.

Or, probably more robust: Simply _exit(2) without further ado, and rely
on postmaster to output an appropriate error message. Arguably it's not
actually useful to see hundreds of "WARNING: terminating connection because of
crash of another server process" messages in the log anyway.

- Andres




Re: [HACKERS] shift_sjis_2004 related autority files are remaining

2017-06-22 Thread Robert Haas
On Fri, Apr 7, 2017 at 1:59 AM, Kyotaro HORIGUCHI
 wrote:
> Hi, I happened to notice that the backend/utils/mb/Unicode directory
> contains two encoding authority files, which I believe are not to
> be there.
>
> euc-jis-2004-std.txt
> sjis-0213-2004-std.txt
>
> And what is more astonishing, make distclean didn't do its work.
>
> | $ make distclean
> | rm -f
>
> The Makefile there is missing the definition of TEXT.
>
> # Sorry for the bogus patch by me..
>
> The attached is the *first patch* that fixes distclean and adds
> the two files into GENERICTEXTS.
>
> =
>
> I don't attach the *second* patch since it's too large for the
> triviality and can be made by the following steps.
>
> $ cd src/backend/utils/mb/Unicode
> $ git rm *.txt
> $ git commit

I think you are right about all of this, although I am not an expert
in this area.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-22 Thread Daniel Verite
Andres Freund wrote:

> > One option may be to leave that decision to the user by providing a
> > PQBatchAutoFlush(true|false) property, along with a PQBatchFlush()
> > function.
> 
> What'd be the difference between PQflush() and PQbatchFlush()?

I guess no difference, I was just not seeing that libpq already provides
this functionality...

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: [HACKERS] possible self-deadlock window after bad ProcessStartupPacket

2017-06-22 Thread Andres Freund
On 2017-02-02 12:18:22 -0800, Jimmy Yih wrote:
> In the above pull request, Heikki also mentions that a similar scenario can
> happen during palloc() as well... which is similar to what we saw in
> Greenplum a couple years back for a deadlock in a malloc() call where we
> responded by changing exit() to _exit() in quickdie as a fix.  That could
> possibly be applicable to latest Postgres as well.

Isn't the quickdie() issue that we palloc/malloc in the first place,
rather than the exit call?  There's some risk for exit() too, but we
reset our own atexit handlers before exiting, so the risk seems fairly
small.


In my opinion the fix here would be to do it right and never emit error
messages from signal handlers via elog.c - we've progressed a lot
towards the goal and do a lot less in signal handlers these days - but
quickdie() is one glaring exception to that.  I think a reasonable fix
here would be to use write() of a statically allocated message, rather
than elog proper, and not to send the message to the client.  Libpq, and
I presume other clients, synthesize a message upon unexpected socket
closure anyway, and it's completely unclear whether we can send a
message anyway.
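A sketch of the write()-of-a-statically-allocated-message idea (illustrative only, not the actual quickdie() code): the message and its length are known at compile time, so nothing in the handler needs to allocate.

```c
#include <unistd.h>

/* Statically allocated: no palloc/malloc needed at signal time. */
static const char crash_msg[] =
    "WARNING:  terminating connection because of crash of another server process\n";

/* Async-signal-safe reporting: write(2) is on the POSIX signal-safe list.
 * A real handler would follow this with _exit(2), skipping atexit hooks. */
static ssize_t report_crash(int fd)
{
    return write(fd, crash_msg, sizeof(crash_msg) - 1);
}
```

The `sizeof - 1` avoids even a strlen() call inside the handler.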

Greetings,

Andres Freund




Re: [HACKERS] Stale comments in vacuumlazy.c

2017-06-22 Thread Robert Haas
On Sun, Mar 26, 2017 at 3:22 PM, Pavan Deolasee
 wrote:
> I happened to notice a stale comment at the very beginning of vacuumlazy.c.
> ISTM we forgot to fix that when we introduced FSM. With FSM, vacuum no
> longer needed to track per-page free space info. I propose attached fix.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Fast promotion not used when doing a recovery_target PITR restore?

2017-06-22 Thread Andres Freund
On 2017-06-22 14:04:42 +0900, Michael Paquier wrote:
> On Thu, Jun 22, 2017 at 3:04 AM, Andres Freund  wrote:
> > When doing a PITR style recovery, with recovery target set, we're
> > currently not doing a fast promotion, in contrast to the handling when
> > doing a pg_ctl or trigger file based promotion. That can prolong making
> > the server available for writes.
> >
> > I can't really see a reason for this?
> 
> Yes, you are right. I see no reason either why this cannot be done.
> Why not just switch fast_promote to true when using
> RECOVERY_TARGET_ACTION_PROMOTE? That's a bug, not a critical one
> though.

I don't think it's really a bug - just a missed optimization.  I'd
personally not be in favor of backpatching this - it'll have some chance
of screwing things up, even if I hope that chance is fairly small.

As a wider discussion, I wonder if we should keep non-fast promotion for
anything but actual crash recovery?  And even there it might actually be
a pretty good idea to not force a full checkpoint - getting up fast
after a crash is kinda important..

Andres Freund




Re: [HACKERS] possible self-deadlock window after bad ProcessStartupPacket

2017-06-22 Thread Robert Haas
On Thu, Feb 2, 2017 at 3:18 PM, Jimmy Yih  wrote:
> In that pull request, we fix the issue by checking for proc_exit_inprogress.
> Is there a reason why startup_die should not check for proc_exit_inprogress?

startup_die() is just calling proc_exit(), so it seems like it might
be better to fix it by putting the check into proc_exit itself.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] GSoC 2017 Proposal for predicate locking in hash index

2017-06-22 Thread Shubham Barai
Hi,

On 22 June 2017 at 21:12, Alvaro Herrera  wrote:

> Shubham Barai wrote:
> > Hi,
> >
> > Now that hash indexes support write-ahead logging, it will be great if
> > we add support for predicate locking to it.
> > Implementation of predicate locking in hash index seems very simple.
> > I have already made changes in the code. I am currently working on
> > testing.
>
> So if I understand correctly, this would only cause a false positive if
> two transactions have a rw/ww conflict in different tuples in the same
> bucket.  Is that what we expect?
>
Yes, I think so. Is there any way to further reduce false positives in
the same bucket?

Regards,
Shubham




Re: [HACKERS] SQL MERGE patches for PostgreSQL Versions

2017-06-22 Thread Peter Geoghegan
On Thu, Jun 22, 2017 at 9:52 AM, Jan de Visser  wrote:
> I am not quite sure what you're trying to achieve, but are you aware that
> pgsql 9.6 introduced the ON CONFLICT clause, which allows you to do the same
> with a different syntax?
>
> https://www.postgresql.org/docs/9.6/static/sql-insert.html

I don't think it's the same thing. I think we could reasonably have
both SQL MERGE and ON CONFLICT. Or at least, I think that that makes
sense. Teradata already has both (their own custom UPSERT syntax, plus
an implementation of SQL MERGE).

Boxuan Zhai's patch didn't try to do anything special about
concurrency. At the time, this was controversial. However, we now
understand that SQL MERGE really isn't obligated to handle that at all
[1]. Besides, we have ON CONFLICT for those use-cases.

[1] https://wiki.postgresql.org/wiki/UPSERT#MERGE_disadvantages
-- 
Peter Geoghegan




Re: [HACKERS] SQL MERGE patches for PostgreSQL Versions

2017-06-22 Thread Kang Yuzhe
On Thu, Jun 22, 2017 at 3:13 PM, Michael Paquier
 wrote:
> On Thu, Jun 22, 2017 at 5:30 PM, Kang Yuzhe  wrote:
>> I wish I could but it's because I don't believe that I have the right
>> capability to fix code conflicts. My ultimate goal is to be PG hacker
>> like YOU.  Anyway, I will consider your perspective.
>
> Nice to see such a goal, though as a first patch presented to the
> community you may want something less ambitious. That's a complicated
> topic you are trying to deal with.

I think so!
Given my skill level, would you please point me to a less ambitious
patch (task) that I can work on, so that I can find my way into PG
hacking?

Regards,
Zeray.




Re: [HACKERS] SQL MERGE patches for PostgreSQL Versions

2017-06-22 Thread Kang Yuzhe
On Thu, Jun 22, 2017 at 7:52 PM, Jan de Visser  wrote:
> On Thursday, June 22, 2017 12:32:14 PM EDT Kang Yuzhe wrote:
>> Here is a sample what I did after applying the patch.
>>
>> testdb=# BEGIN;
>> BEGIN
>> testdb=#
>> testdb=# MERGE INTO Stock USING Buy ON Stock.item_id = Buy.item_id
>> testdb-#  WHEN MATCHED THEN UPDATE SET balance = balance + Buy.volume
>> testdb-#  WHEN NOT MATCHED THEN DO NOTHING;
>> MERGE 1
>> testdb=# SELECT * FROM Stock;
>>  item_id | balance
>> ---------+---------
>>       20 |    1900
>>       10 |    3200
>> (2 rows)
>>
>> testdb=# ROLLBACK;
>> ROLLBACK
>
> I am not quite sure what you're trying to achieve, but are you aware that
> pgsql 9.6 introduced the ON CONFLICT clause, which allows you to do the same
> with a different syntax?
>
> https://www.postgresql.org/docs/9.6/static/sql-insert.html
>
> Look for ON CONFLICT.

Yes, I am aware of ON CONFLICT.
DO NOTHING in SQL MERGE is one scenario that resembles ON CONFLICT.
My goal is to understand how SQL MERGE, which is in the ANSI/ISO SQL
standard, works, and I would implement it as a patch against the latest
PG if I could.

Regards,
Zeray.




Re: [HACKERS] SQL MERGE patches for PostgreSQL Versions

2017-06-22 Thread Jan de Visser
On Thursday, June 22, 2017 12:32:14 PM EDT Kang Yuzhe wrote:
> Here is a sample what I did after applying the patch.
> 
> testdb=# BEGIN;
> BEGIN
> testdb=#
> testdb=# MERGE INTO Stock USING Buy ON Stock.item_id = Buy.item_id
> testdb-#  WHEN MATCHED THEN UPDATE SET balance = balance + Buy.volume
> testdb-#  WHEN NOT MATCHED THEN DO NOTHING;
> MERGE 1
> testdb=# SELECT * FROM Stock;
>  item_id | balance
> ---------+---------
>       20 |    1900
>       10 |    3200
> (2 rows)
> 
> testdb=# ROLLBACK;
> ROLLBACK

I am not quite sure what you're trying to achieve, but are you aware that 
pgsql 9.6 introduced the ON CONFLICT clause, which allows you to do the same 
with a different syntax?

https://www.postgresql.org/docs/9.6/static/sql-insert.html

Look for ON CONFLICT.






Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..

2017-06-22 Thread Andres Freund
On 2017-06-22 12:43:16 -0400, Robert Haas wrote:
> On Wed, Jan 25, 2017 at 2:52 PM, Andres Freund  wrote:
> > You'll, depending on your workload, still have a lot of lseeks even if
> > we were to use pread/pwrite because we do lseek(SEEK_END) to get file
> > sizes.
> 
> I'm pretty convinced that the lseek overhead that we're incurring
> right now is excessive.

No argument there.


> I mean, the Linux kernel guys fixed lseek to
> scale better more or less specifically because of PostgreSQL, which
> indicates that we're hitting it harder than most people.[1] And, more
> concretely, I've seen strace -c output where the time spent in lseek
> is far ahead of any other system call -- so if lseek overhead is
> negligible, then all of our system call overhead taken together is
> negligible, too.

That'll partially be because syscalls are where the kernel "prefers" to
switch between processes, and lseek is the most frequent one.


> Having said that, it's probably not a big percentage of our runtime
> right now -- on normal workloads, it's probably some number of tenths
> of one percent. But I'm not sure that's a good reason to ignore it.
> The more we CPU-optimize other things (say, expression evaluation!)
> the more significant the things that remain will become.  And we've
> certainly made performance fixes to save far fewer cycles than we're
> talking about here[2].

Well, there's some complexity / simplicity tradeoffs as everywhere ;)


> I'm no longer very sure fixing this is a very simple thing to do,
> partly because of the use of lseek to get the file size which you note
> above, and partly because of the possibility that this may, for
> example, break read-ahead, as Tom worried about previously[3].  But I
> think dismissing this as not-really-a-problem is the wrong approach.

I suspect this'll become a larger problem once we fix a few other
issues.  Right now I've a hard time measuring this, but if we'd keep
file sizes cached in shared memory, and we'd use direct IO, then we'd
potentially be able to have high enough IO throughput for this to
matter.  At the moment 8kb memcpys (instead of DMA into the user buffer)
are nearly always going to dwarf the overhead of the lseek().

Greetings,

Andres Freund




Re: [HACKERS] Misplacement of function declaration in contrib/postgres_fdw/postgres_fdw.h

2017-06-22 Thread Robert Haas
On Wed, Jan 11, 2017 at 10:19 PM, Etsuro Fujita
 wrote:
> While working on pushing down more joins/updates to the remote, I noticed
> that in contrib/postgres_fdw/postgres_fdw.h the declaration of
> get_jointype_name is misplaced in the section of shippable.c.  Since that
> function is defined in contrib/postgres_fdw/deparse.c, we should put that
> declaration in the section of deparse.c in the header file. Attached is a
> patch for that.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] SQL MERGE patches for PostgreSQL Versions

2017-06-22 Thread Kang Yuzhe
On Thu, Jun 22, 2017 at 3:51 PM, Peter Eisentraut
 wrote:
> On 6/22/17 05:13, Craig Ringer wrote:
>> On 22 June 2017 at 17:00, Kang Yuzhe  wrote:
>>
>>> diff --git a/src/backend/executor/nodeModifyTable.c
>>> b/src/backend/executor/nodeModifyTable.c
>>> index 8619ce3..e3ac758 100644
>>> --- a/src/backend/executor/nodeModifyTable.c
>>> +++ b/src/backend/executor/nodeModifyTable.c
>>
>> The first entry in the 'index' is the git commit hash of the base commit, 
>> IIRC.
> I don't know what the technical term is, but these values are hashes of
> the file before and after, or something like that.  They are not Git
> commits or trees.
>
> See git format-patch option --base for how to communicate the base commit.
>
> (The above patch was possibly created from an earlier unofficial Git
> repository, because I can't find those hashes.)

Thanks Peter for trying to help.
According to Thomas Munro advice, I did git log --since="Aug 4 2010"
--until="Aug 29 2010"
Picking a random commit, 005e427a22e3bb7fa01a84a7b476a3d6359a0344, I did
git checkout 005e427a22e3bb7fa01a84a7b476a3d6359a0344.

Next I did git checkout -b pgSQLMerge

And finally I did the following and it worked.

patch -p1 < merge_v201.patch

Regards,
Zeray




Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..

2017-06-22 Thread Robert Haas
On Wed, Jan 25, 2017 at 2:52 PM, Andres Freund  wrote:
> You'll, depending on your workload, still have a lot of lseeks even if
> we were to use pread/pwrite because we do lseek(SEEK_END) to get file
> sizes.

I'm pretty convinced that the lseek overhead that we're incurring
right now is excessive.  I mean, the Linux kernel guys fixed lseek to
scale better more or less specifically because of PostgreSQL, which
indicates that we're hitting it harder than most people.[1] And, more
concretely, I've seen strace -c output where the time spent in lseek
is far ahead of any other system call -- so if lseek overhead is
negligible, then all of our system call overhead taken together is
negligible, too.

Having said that, it's probably not a big percentage of our runtime
right now -- on normal workloads, it's probably some number of tenths
of one percent. But I'm not sure that's a good reason to ignore it.
The more we CPU-optimize other things (say, expression evaluation!)
the more significant the things that remain will become.  And we've
certainly made performance fixes to save far fewer cycles than we're
talking about here[2].

I'm no longer very sure fixing this is a very simple thing to do,
partly because of the use of lseek to get the file size which you note
above, and partly because of the possibility that this may, for
example, break read-ahead, as Tom worried about previously[3].  But I
think dismissing this as not-really-a-problem is the wrong approach.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] https://www.postgresql.org/message-id/201110282133.18125.and...@anarazel.de
[2] 
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=2781b4bea7db357be59f9a5fd73ca1eb12ff5a79
[3] https://www.postgresql.org/message-id/6352.1471461075%40sss.pgh.pa.us




Re: [HACKERS] SQL MERGE patches for PostgreSQL Versions

2017-06-22 Thread Kang Yuzhe
On Thu, Jun 22, 2017 at 12:10 PM, Thomas Munro
 wrote:
> On Thu, Jun 22, 2017 at 9:00 PM, Kang Yuzhe  wrote:
>> I just downloaded the patch from GSoC site.
>
> I just looked at
> https://wiki.postgresql.org/wiki/Add_MERGE_command_GSoC_2010 and saw
> that the file https://wiki.postgresql.org/wiki/File:Merge_v201.tar was
> uploaded on 24 Aug 2010.  So I picked a random commit from that date,
> git checkout 005e427a22e3bb7fa01a84a7b476a3d6359a0344, and then I was
> able to apply that patch with patch -p1 < merge_v201.patch without any
> failures.

Thanks so much Thomas! I have managed to apply without any failures by
following your approach.
Here is a sample what I did after applying the patch.

testdb=# BEGIN;
BEGIN
testdb=#
testdb=# MERGE INTO Stock USING Buy ON Stock.item_id = Buy.item_id
testdb-#  WHEN MATCHED THEN UPDATE SET balance = balance + Buy.volume
testdb-#  WHEN NOT MATCHED THEN DO NOTHING;
MERGE 1
testdb=# SELECT * FROM Stock;
 item_id | balance
---------+---------
      20 |    1900
      10 |    3200
(2 rows)

testdb=# ROLLBACK;
ROLLBACK

Though it may be tedious, I believe the approach Thomas followed to
help me is what hackers should do to help newbies, so that they can
pursue their ambitions.

I am humbled to say that I didn't know how to search git commits by a
specific day like what Thomas did.

Regards,
Zeray




Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server

2017-06-22 Thread Masahiko Sawada
On Thu, Jun 22, 2017 at 10:36 PM, Magnus Hagander  wrote:
>
>
> On Thu, Jun 22, 2017 at 10:12 AM, Masahiko Sawada 
> wrote:
>>
>> Hi,
>>
>> Since an optional second argument wait_for_archive of pg_stop_backup
>> has been  introduced in PostgreSQL 10 we can choose whether wait for
>> archiving. But my colleagues found that we can do pg_stop_backup with
>> wait_for_archive = true on the standby server but it actually doesn't
>> wait for WAL archiving. Because this behavior is not documented and
>> cannot be discovered without reading the source code, it will confuse
>> the user.
>>
>> I think we can raise an error when pg_stop_backup with
>> wait_for_archive = true is executed on the standby. The attached
>> patch changes it accordingly.
>
>
> Wouldn't it be better to make it *work*? If you have archive_mode=always, it
> makes sense to want to wait on the standby as well, does it not?
>

Yes, ideally it would be better to make it wait for WAL archiving on
the standby server when archive_mode=always. But I think that would be
a PG11 item, while this one is for PG10.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: [HACKERS] Optional message to user when terminating/cancelling backend

2017-06-22 Thread Dilip Kumar
On Thu, Jun 22, 2017 at 7:18 PM, Daniel Gustafsson  wrote:
> Good point.  I’ve attached a new version which issues a NOTICE on truncation
> and also addresses all other comments so far in this thread.  Thanks a lot for
> the early patch reviews!
>
> cheers ./daniel

I have done an initial review of the patch. I have some comments/suggestions.


+int
+GetCancelMessage(char **buffer, size_t buf_len)
+{
+ volatile BackendCancelShmemStruct *slot = MyCancelSlot;
+ int msg_length = 0;
+

The returned value of this function is never used; better to convert it
to just void.

---

+bool
+HasCancelMessage(void)
+{
+ volatile BackendCancelShmemStruct *slot = MyCancelSlot;


+/*
+ * Return the configured cancellation message and its length. If the returned
+ * length is greater than the size of the passed buffer, truncation has been
+ * performed. The message is cleared on reading.
+ */
+int
+GetCancelMessage(char **buffer, size_t buf_len)

I think it would be good to merge these two functions: GetCancelMessage
would first check whether a message exists; if it does, allocate a
buffer of size slot->len and copy the slot message into it, otherwise
just return NULL.

So it will look like "char *GetCancelMessage()"

-

+ SpinLockAcquire(&slot->mutex);
+ strlcpy(*buffer, (const char *) slot->message, buf_len);

strlcpy(*buffer, (const char *) slot->message, slot->len);

Isn't it better to copy only up to slot->len? It seems like it will
always be <= buf_len, and if not then we can do min(buf_len, slot->len).
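That is, the copy could be bounded by both lengths, along the lines of the sketch below (names are illustrative, not the patch's actual code):

```c
#include <string.h>

/* Copy at most dst_len-1 bytes of a msg_len-byte message, NUL-terminate,
 * and return the full message length so the caller can detect truncation
 * (returned value >= dst_len means the message was cut short). */
static size_t copy_cancel_message(char *dst, size_t dst_len,
                                  const char *msg, size_t msg_len)
{
    size_t n = (msg_len < dst_len - 1) ? msg_len : dst_len - 1;

    memcpy(dst, msg, n);
    dst[n] = '\0';
    return msg_len;
}
```

This is the min(buf_len, slot->len) idea with the terminator accounted for.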



-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] PATCH: Batch/pipelining support for libpq

2017-06-22 Thread Andres Freund
Hi,

On 2017-06-22 13:43:35 +0200, Daniel Verite wrote:
> But OTOH there are certainly batch workloads where it will be preferable
> for the first query to reach the server ASAP, rather than waiting to be
> coalesced with the next ones.

Is that really something people expect from a batch API?  I suspect it's
not really, and nothing would stop one from adding PQflush() or similar
calls if desirable anyway.

FWIW, the way I did that in the hack clearly isn't ok: If you were to
send a gigabyte of queries, it'd buffer them all up in memory... So some
more intelligence is going to be needed.
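One way to bound that memory is a fixed flush threshold, roughly as sketched below (the threshold and all names are made up for illustration, not libpq's):

```c
#include <string.h>

#define FLUSH_THRESHOLD 8192    /* made-up bound, not libpq's */

typedef struct OutBuffer
{
    char   data[FLUSH_THRESHOLD];
    size_t used;
    int    flushes;             /* counts sends, for illustration */
} OutBuffer;

static void out_flush(OutBuffer *b)
{
    /* real code would write b->data to the socket here */
    b->flushes++;
    b->used = 0;
}

/* Queue a query's wire bytes, flushing once the buffer would overflow,
 * so memory use stays bounded no matter how many queries are batched. */
static void out_append(OutBuffer *b, const char *bytes, size_t len)
{
    if (b->used + len > sizeof(b->data))
        out_flush(b);
    /* sketch assumes len <= FLUSH_THRESHOLD */
    memcpy(b->data + b->used, bytes, len);
    b->used += len;
}
```

A gigabyte of queued queries then costs at most one buffer of memory, at the price of earlier network writes.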


> libpq is not going to know what's best.
> One option may be to leave that decision to the user by providing a
> PQBatchAutoFlush(true|false) property, along with a PQBatchFlush()
> function.

What'd be the difference between PQflush() and PQbatchFlush()?

Greetings,

Andres Freund




Re: [HACKERS] GSoC 2017 Proposal for predicate locking in hash index

2017-06-22 Thread Alvaro Herrera
Shubham Barai wrote:
> Hi,
> 
> Now that hash indexes support write-ahead logging, it will be great if we add
> support for predicate locking to it.
> Implementation of predicate locking in hash index seems very simple.
> I have already made changes in the code. I am currently working on testing.

So if I understand correctly, this would only cause a false positive if
two transactions have a rw/ww conflict in different tuples in the same
bucket.  Is that what we expect?

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [patch] pg_dump/pg_restore zerror() and strerror() mishap

2017-06-22 Thread Alvaro Herrera
Kunshchikov Vladimir wrote:

Hi,

> our testing team has noticed apparently wrong backup/restore error 
> messages like this:
> 
> 
> pg_restore: [compress_io] could not read from input file: success
> pg_dump: [directory archiver] could not write to output file: success
> 
> 
> 
> Such "success" messages are caused by calling strerror() after 
> gzread()/gzwrite() failures.

Yeah, I've complained about this before but never got around to actually
fixing it.  Thanks for the patch, I'll see about putting it on all
branches soon.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Multiple TO version in ALTER EXTENSION UPDATE

2017-06-22 Thread Daniel Gustafsson
> On 22 Jun 2017, at 17:02, Tom Lane  wrote:
> 
> Robert Haas  writes:
>> On Wed, Mar 29, 2017 at 11:13 AM, Daniel Gustafsson  wrote:
>>> While reading I noticed that we allow multiple TO <version> in ALTER 
>>> EXTENSION
>>> UPDATE, and defer throwing a syntax error until command processing.  Is 
>>> there a
>>> reason for deferring and not handling it in gram.y directly as in the 
>>> attached
>>> patch since it is in fact a syntax error?  It yields a different error 
>>> message
>>> to the user, but makes for easier to read code (IMH-and-biased-O).
> 
>> I think the idea of the current implementation was probably that the
>> grammar should leave room to support multiple options in arbitrary
>> order at that point in the syntax.  I'm not sure whether that's
>> something we'll ever actually need, or not.
> 
> It certainly seems plausible to me that we might someday grow additional
> options to control the UPDATE,

Fair enough, I was mainly curious about the reasoning; future-proofing support
for additional options makes perfect sense.

> so I'm inclined to reject this patch.

I completely agree, I was using the patch to illustrate my question but wasn’t
very clear about that.

Thanks!

cheers ./daniel



Re: [HACKERS] [psql] patch to fix ordering in words_after_create array

2017-06-22 Thread Robert Haas
On Fri, Mar 24, 2017 at 7:10 AM, Rushabh Lathia
 wrote:
> While looking at the code around tab-complete.c, I
> found the ordering in words_after_create array is not
> correct for DEFAULT PRIVILEGES, which been added
> under below commit:
>
> commit d7d77f3825122bde55be9e06f6c4851028b99795
> Author: Peter Eisentraut 
> Date:   Thu Mar 16 18:54:28 2017 -0400
>
> psql: Add completion for \help DROP|ALTER
>
> PFA patch to fix the same.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Multiple TO version in ALTER EXTENSION UPDATE

2017-06-22 Thread Tom Lane
Robert Haas  writes:
> On Wed, Mar 29, 2017 at 11:13 AM, Daniel Gustafsson  wrote:
>> While reading I noticed that we allow multiple TO  in ALTER EXTENSION
>> UPDATE, and defer throwing a syntax error until command processing.  Is there a
>> reason for deferring and not handling it in gram.y directly as in the attached
>> patch since it is in fact a syntax error?  It yields a different error message
>> to the user, but makes for easier to read code (IMH-and-biased-O).

> I think the idea of the current implementation was probably that the
> grammar should leave room to support multiple options in arbitrary
> order at that point in the syntax.  I'm not sure whether that's
> something we'll ever actually need, or not.

It certainly seems plausible to me that we might someday grow additional
options to control the UPDATE, so I'm inclined to reject this patch.
If it were saving a meaningful amount of grammar code, I might think
differently, but it's not really.  And it's not like we don't use the
same grammar pattern in lots of other places.

regards, tom lane




Re: [HACKERS] Redundant check of em_is_child

2017-06-22 Thread Robert Haas
On Fri, May 19, 2017 at 3:46 AM, Amit Langote
 wrote:
> In match_eclasses_to_foreign_key_col(), there is this:
>
> if (em->em_is_child)
> continue;   /* ignore children here */
>
> ISTM, it might as well be:
>
> Assert(!em->em_is_child);  /* no children yet */
>
> That's because, I think it's still too early in query_planner() to be
> expecting any child EC members.

I'm not sure there's really any benefit to this change.  In the
future, somebody might want to use the function from someplace later
on in the planner.  If the logic as-written would work correctly in
that case now, I can't see why we should turn it into an assertion
failure instead.  This isn't really costing us anything, is it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Prologue of set_append_rel_size() and partitioned tables

2017-06-22 Thread Robert Haas
On Wed, Mar 29, 2017 at 3:18 AM, Ashutosh Bapat
 wrote:
>> Update patch attached.
>
> Looks good to me.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Multiple TO version in ALTER EXTENSION UPDATE

2017-06-22 Thread Robert Haas
On Wed, Mar 29, 2017 at 11:13 AM, Daniel Gustafsson  wrote:
> While reading I noticed that we allow multiple TO  in ALTER EXTENSION
> UPDATE, and defer throwing a syntax error until command processing.  Is there a
> reason for deferring and not handling it in gram.y directly as in the attached
> patch since it is in fact a syntax error?  It yields a different error message
> to the user, but makes for easier to read code (IMH-and-biased-O).

I think the idea of the current implementation was probably that the
grammar should leave room to support multiple options in arbitrary
order at that point in the syntax.  I'm not sure whether that's
something we'll ever actually need, or not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Re-indent HEAD tomorrow?

2017-06-22 Thread Robert Haas
On Wed, Jun 21, 2017 at 5:28 PM, Tom Lane  wrote:
> Right now we're really just speculating about how much pain there will
> be, on either end of this.  So it'd be interesting for somebody who's
> carrying large out-of-tree patches (EDB? Citus?) to try the new
> pgindent version on a back branch and see how much of their patches no
> longer apply afterwards.

EDB is not continuously reapplying patches; we have branches into
which the upstream reindents would have to be merged.  As a broad
statement, reindenting all of the back branches is surely going to
create some extra work for whoever has to do those merges, but if
that's what the community thinks is best, we will of course manage.
It's not *that* bad.

It would be slightly less annoying for us, I think, if the reindent
were done immediately after a minor-release rather than at some other
random point in time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Pluggable storage

2017-06-22 Thread Alexander Korotkov
On Thu, Jun 22, 2017 at 5:27 PM, Robert Haas  wrote:

> On Thu, Jun 22, 2017 at 8:32 AM, Alexander Korotkov
>  wrote:
> > On Thu, Jun 22, 2017 at 4:01 AM, Michael Paquier <
> michael.paqu...@gmail.com>
> > wrote:
> >> Putting that in a couple of words.
> >> 1. Table AM with a 6-byte TID.
> >> 2. Table AM with a custom locator format, which could be TID-like.
> >> 3. Table AM with no locators.
> >>
> >> Getting into having #1 first to work out would already be really
> >> useful for users.
> >
> > What exactly would be useful for *users*?  Any kind of API itself is
> > completely useless for users, because they are users, not developers.
> > Storage API could be useful for developers to implement storage AMs
> > which in turn could be useful for users.
>
> What's your point?  I assume that is what Michael meant.
>

TBH, I don't understand what particular enhancements we expect from having #1.
This is why it's hard for me to say if #1 is a good idea.  It's also hard for
me to say if the patch upthread is the right way of implementing #1.
But I have a gut feeling that even if #1 is a good idea in itself, it's
definitely not what users expect from "pluggable storages".
From the user side, it would be normal to expect that a "pluggable storage"
could be an index-organized table where the index could be, say, an LSM tree...

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Pluggable storage

2017-06-22 Thread Alexander Korotkov
On Wed, Jun 21, 2017 at 10:47 PM, Robert Haas  wrote:

> On Mon, Jun 12, 2017 at 9:50 PM, Haribabu Kommi
>  wrote:
> > Open Items:
> >
> > 1. The BitmapHeapScan and TableSampleScan are tightly coupled with
> > HeapTuple and HeapScanDesc, So these scans are directly operating
> > on those structures and providing the result.
> >
> > These scan types may not be applicable to different storage formats.
> > So how to handle them?
>
> I think that BitmapHeapScan, at least, is applicable to any table AM
> that has TIDs.   It seems to me that in general we can imagine three
> kinds of table AMs:
>
> 1. Table AMs where a tuple can be efficiently located by a real TID.
> By a real TID, I mean that the block number part is really a block
> number and the item ID is really a location within the block.  These
> are necessarily quite similar to our current heap, but they can change
> the tuple format and page format to some degree, and it seems like in
> many cases it should be possible to plug them into our existing index
> AMs without too much heartache.  Both index scans and bitmap index
> scans ought to work.
>

If #1 is only about changing tuple and page formats, couldn't it be much
simpler than the patch upthread?  We could implement "page format access
methods" with routines for insertion, update, pruning and deletion of
tuples *in a particular page*.  There would be no need to redefine the
high-level logic for scanning the heap, doing updates and so on...

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Pluggable storage

2017-06-22 Thread Robert Haas
On Thu, Jun 22, 2017 at 8:32 AM, Alexander Korotkov
 wrote:
> On Thu, Jun 22, 2017 at 4:01 AM, Michael Paquier 
> wrote:
>> Putting that in a couple of words.
>> 1. Table AM with a 6-byte TID.
>> 2. Table AM with a custom locator format, which could be TID-like.
>> 3. Table AM with no locators.
>>
>> Getting into having #1 first to work out would already be really
>> useful for users.
>
> What exactly would be useful for *users*?  Any kind of API itself is
> completely useless for users, because they are users, not developers.
> Storage API could be useful for developers to implement storage AMs which in
> turn could be useful for users.

What's your point?  I assume that is what Michael meant.

> Then while saying that #1 is useful for
> users, it would be nice to keep in mind particular storage AMs which can be
> implemented using #1.

I don't think anybody's arguing with that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests

2017-06-22 Thread Tom Lane
Andrew Dunstan  writes:
> Please let me know if there are tests I can run. I  missed your earlier
> request in this thread, sorry about that.

That earlier request is still valid.  Also, if you can reproduce the
symptom that lorikeet just showed and get a stack trace from the
(hypothetical) postmaster core dump, that would be hugely valuable.

regards, tom lane




Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests

2017-06-22 Thread Andrew Dunstan


On 06/21/2017 06:44 PM, Tom Lane wrote:
> Today, lorikeet failed with a new variant on the bgworker start crash:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lorikeet&dt=2017-06-21%2020%3A29%3A10
>
> This one is even more exciting than the last one, because it sure looks
> like the crashing bgworker took the postmaster down with it.  That is
> Not Supposed To Happen.
>
> Wondering if we broke something here recently, I tried to reproduce it
> on a Linux machine by adding a randomized Assert failure in
> shm_mq_set_sender.  I don't see any such problem, even with EXEC_BACKEND;
> we recover from the crash as-expected.
>
> So I'm starting to get a distinct feeling that there's something wrong
> with the cygwin port.  But I dunno what.
>
>   


:-(

Please let me know if there are tests I can run. I missed your earlier
request in this thread, sorry about that.

cheers

andrew

-- 
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] Optional message to user when terminating/cancelling backend

2017-06-22 Thread Daniel Gustafsson
> On 22 Jun 2017, at 10:24, Yugo Nagata  wrote:
> 
> On Thu, 22 Jun 2017 09:24:54 +0900
> Michael Paquier  wrote:
> 
>> On Wed, Jun 21, 2017 at 11:42 PM, Daniel Gustafsson  wrote:
>>> The message is truncated in SetBackendCancelMessage() for safety, but
>>> pg_{cancel|terminate}_backend() could throw an error on too long message, or
>>> warning truncation, to the caller as well.  Personally I think a warning is the
>>> appropriate response, but I don’t really have a strong opinion.
>> 
>> And a NOTICE? That's what happens for relation name truncation. You
>> are right that having a check in SetBackendCancelMessage() makes the
>> most sense as bgworkers could just call the low level API. Isn't the
>> concept actually closer to just a backend message? This slot could be
>> used for other purposes than cancellation.
> 
> +1 for NOTICE. The message truncation seems to be a kind of helpful
> information rather than a likely problem as long as pg_terminated_backend
> exits successfully.
> 
> https://www.postgresql.org/docs/10/static/runtime-config-logging.html#runtime-config-severity-levels

Good point.  I’ve attached a new version which issues a NOTICE on truncation
and also addresses all other comments so far in this thread.  Thanks a lot for
the early patch reviews!

cheers ./daniel



terminate_msg_v3.patch
Description: Binary data



Re: [HACKERS] Fix a typo in partition.c

2017-06-22 Thread Magnus Hagander
On Thu, Jun 22, 2017 at 4:16 AM, Masahiko Sawada 
wrote:

> On Wed, Jun 21, 2017 at 3:32 AM, Peter Eisentraut
>  wrote:
> > On 6/19/17 23:02, Masahiko Sawada wrote:
> >> Hi,
> >>
> >> Attached patch for $subject.
> >>
> >> s/opreator/operator/
> >
> > fixed
> >
>
> Thank you!
> I found another one.
>
> s/retrived/retrieved/
>
>
Thanks, applied.


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server

2017-06-22 Thread Magnus Hagander
On Thu, Jun 22, 2017 at 10:12 AM, Masahiko Sawada 
wrote:

> Hi,
>
> Since an optional second argument wait_for_archive of pg_stop_backup
> has been introduced in PostgreSQL 10, we can choose whether to wait for
> archiving. But my colleagues found that we can do pg_stop_backup with
> wait_for_archive = true on the standby server but it actually doesn't
> wait for WAL archiving. Because this behavior is not documented and we
> cannot find it out without reading the source code, it will confuse the
> user.
>
> I think we can raise an error when pg_stop_backup with
> wait_for_archive = true is executed on the standby. The attached patch
> changes it accordingly.
>

Wouldn't it be better to make it *work*? If you have archive_mode=always,
it makes sense to want to wait on the standby as well, does it not?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [HACKERS] TRUE and true

2017-06-22 Thread Peter Eisentraut
On 6/22/17 07:09, Kyotaro HORIGUCHI wrote:
> What makes us have both TRUE/true or FALSE/false as constants of
> bool?

Historical reasons, probably.  I plan to submit a patch to phase out or
remove TRUE/FALSE as part of a migration toward stdbool.h.  So I suggest
you use lower case and don't worry about the other ones for now.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] SQL MERGE patches for PostgreSQL Versions

2017-06-22 Thread Peter Eisentraut
On 6/22/17 05:13, Craig Ringer wrote:
> On 22 June 2017 at 17:00, Kang Yuzhe  wrote:
> 
>> diff --git a/src/backend/executor/nodeModifyTable.c
>> b/src/backend/executor/nodeModifyTable.c
>> index 8619ce3..e3ac758 100644
>> --- a/src/backend/executor/nodeModifyTable.c
>> +++ b/src/backend/executor/nodeModifyTable.c
> 
> The first entry in the 'index' is the git commit hash of the base commit, 
> IIRC.

I don't know what the technical term is, but these values are hashes of
the file before and after, or something like that.  They are not Git
commits or trees.

See git format-patch option --base for how to communicate the base commit.

(The above patch was possibly created from an earlier unofficial Git
repository, because I can't find those hashes.)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] [patch] pg_dump/pg_restore zerror() and strerror() mishap

2017-06-22 Thread Kunshchikov Vladimir
Hello,

our testing team has noticed apparently wrong backup/restore error messages 
like this:


pg_restore: [compress_io] could not read from input file: success
pg_dump: [directory archiver] could not write to output file: success



Such "success" messages are caused by calling strerror() after 
gzread()/gzwrite() failures.

In order to decode errors properly, gzerror() should be used instead of
strerror():
http://refspecs.linuxbase.org/LSB_2.1.0/LSB-generic/LSB-generic/zlib-gzerror-1.html

Errors should be like this:
pg_restore: [compress_io] could not read from input file: d3/2811.dat.gz: 
invalid distance too far back

Attached is a small fix for this issue.

You can view that patch  online on our github:
https://github.com/Infotecs/postgres/commit/1578f5011ad22d78ae059a4ef0924426fd6db762


--
Best regards,
Vladimir Kunschikov
Lead software developer
IDS project
InfoTeCS JSC


commit 1578f5011ad22d78ae059a4ef0924426fd6db762
Author: Kunshchikov Vladimir 
Date:   Wed Jun 21 11:33:47 2017 +0300

fixed gz_error/strerror mishap

diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c
index 631fa337e6..6e48d9d765 100644
--- a/src/bin/pg_dump/compress_io.c
+++ b/src/bin/pg_dump/compress_io.c
@@ -422,23 +422,6 @@ WriteDataToArchiveNone(ArchiveHandle *AH, CompressorState *cs,
 }
 
 
-/*--
- * Compressed stream API
- *--
- */
-
-/*
- * cfp represents an open stream, wrapping the underlying FILE or gzFile
- * pointer. This is opaque to the callers.
- */
-struct cfp
-{
-	FILE	   *uncompressedfp;
-#ifdef HAVE_LIBZ
-	gzFile		compressedfp;
-#endif
-};
-
 #ifdef HAVE_LIBZ
 static int	hasSuffix(const char *filename, const char *suffix);
 #endif
@@ -593,7 +576,7 @@ cfread(void *ptr, int size, cfp *fp)
 		ret = gzread(fp->compressedfp, ptr, size);
 		if (ret != size && !gzeof(fp->compressedfp))
 			exit_horribly(modulename,
-	"could not read from input file: %s\n", strerror(errno));
+	"could not read from input file: %s\n", get_gz_error(fp->compressedfp));
 	}
 	else
 #endif
@@ -629,7 +612,7 @@ cfgetc(cfp *fp)
 		{
 			if (!gzeof(fp->compressedfp))
 exit_horribly(modulename,
-	"could not read from input file: %s\n", strerror(errno));
+	"could not read from input file: %s\n", get_gz_error(fp->compressedfp));
 			else
 exit_horribly(modulename,
 			"could not read from input file: end of file\n");
@@ -710,4 +693,16 @@ hasSuffix(const char *filename, const char *suffix)
   suffixlen) == 0;
 }
 
+const char *
+get_gz_error(gzFile gzf)
+{
+	int errnum;
+	static const char fallback[] = "Zlib error";
+	const int maxlen = 255;
	const char *errmsg = gzerror(gzf, &errnum);
+	if(!errmsg || !memchr(errmsg, 0, maxlen))
+		errmsg = fallback;
+
+	return errnum == Z_ERRNO ? strerror(errno) : errmsg;
+}
 #endif
diff --git a/src/bin/pg_dump/compress_io.h b/src/bin/pg_dump/compress_io.h
index 8f2e752cba..91007b042b 100644
--- a/src/bin/pg_dump/compress_io.h
+++ b/src/bin/pg_dump/compress_io.h
@@ -27,6 +27,24 @@ typedef enum
 	COMPR_ALG_LIBZ
 } CompressionAlgorithm;
 
+/*--
+ * Compressed stream API
+ *--
+ */
+
+/*
+ * cfp represents an open stream, wrapping the underlying FILE or gzFile
+ * pointer. This is opaque to the callers.
+ */
+typedef struct 
+{
+	FILE	   *uncompressedfp;
+#ifdef HAVE_LIBZ
+	gzFile		compressedfp;
+#endif
+} cfp;
+
+
 /* Prototype for callback function to WriteDataToArchive() */
 typedef void (*WriteFunc) (ArchiveHandle *AH, const char *buf, size_t len);
 
@@ -52,10 +70,6 @@ extern void ReadDataFromArchive(ArchiveHandle *AH, int compression,
 extern void WriteDataToArchive(ArchiveHandle *AH, CompressorState *cs,
    const void *data, size_t dLen);
 extern void EndCompressor(ArchiveHandle *AH, CompressorState *cs);
-
-
-typedef struct cfp cfp;
-
 extern cfp *cfopen(const char *path, const char *mode, int compression);
 extern cfp *cfopen_read(const char *path, const char *mode);
 extern cfp *cfopen_write(const char *path, const char *mode, int compression);
@@ -65,5 +79,6 @@ extern int	cfgetc(cfp *fp);
 extern char *cfgets(cfp *fp, char *buf, int len);
 extern int	cfclose(cfp *fp);
 extern int	cfeof(cfp *fp);
+extern const char * get_gz_error(gzFile gzf);
 
 #endif
diff --git a/src/bin/pg_dump/pg_backup_directory.c b/src/bin/pg_dump/pg_backup_directory.c
index 79922da8ba..6196227a04 100644
--- a/src/bin/pg_dump/pg_backup_directory.c
+++ b/src/bin/pg_dump/pg_backup_directory.c
@@ -42,6 +42,15 @@
 #include 
 #include 
 
+#ifdef WRITE_ERROR_EXIT
+#undef WRITE_ERROR_EXIT
+#endif
+#define WRITE_ERROR_EXIT \
+	do { \
+		exit_horribly(modulename, "could not write to output file: %s\n", \
+		ctx && ctx->dataFH && ctx->dataFH->compressedfp? get_gz_error(ctx->dataFH->compressedfp) : strerror(errno)); \
+	} while (0)
+
 typedef struct
 {
 	/*
diff --git 

Re: [HACKERS] TRUE and true

2017-06-22 Thread Alexander Korotkov
On Thu, Jun 22, 2017 at 2:09 PM, Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp> wrote:

> Hello, I have a maybe-silly question.
>
> What makes us have both TRUE/true or FALSE/false as constants of
> bool?
>
> The following definitions in c.h didn't mess anything up.
>
> #define TRUEtrue
> #define FALSE   false
>

+1
I don't know any policy about where to use TRUE and FALSE over true and
false.

# NIL seems causing similar mess.
>

NIL is used for lists, while NULL is used for other pointers.
That may cause redundant, but at least there is clear policy.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Pluggable storage

2017-06-22 Thread Alexander Korotkov
On Thu, Jun 22, 2017 at 4:01 AM, Michael Paquier 
wrote:

> Putting that in a couple of words.
> 1. Table AM with a 6-byte TID.
> 2. Table AM with a custom locator format, which could be TID-like.
> 3. Table AM with no locators.
>
> Getting into having #1 first to work out would already be really
> useful for users.


What exactly would be useful for *users*?  Any kind of API itself is
completely useless for users, because they are users, not developers.
Storage API could be useful for developers to implement storage AMs which
in turn could be useful for users.  Then while saying that #1 is useful for
users, it would be nice to keep in mind particular storage AMs which can be
implemented using #1.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Pluggable storage

2017-06-22 Thread Alexander Korotkov
On Tue, Jun 13, 2017 at 4:50 AM, Haribabu Kommi 
wrote:

> On Fri, Oct 14, 2016 at 7:26 AM, Alvaro Herrera 
> wrote:
>
>> I have sent the partial patch I have to Hari Babu Kommi.  We expect that
>> he will be able to further this goal some more.
>
>
> Thanks Alvaro for sharing your development patch.
>
> Most of the patch design is same as described by Alvaro in the first mail
> [1].
> I will detail the modifications, pending items and open items (needs
> discussion)
> to implement proper pluggable storage.
>
> Here I attached WIP patches to support pluggable storage. The patch series
> are may not work individually. Still so many things are under development.
> These patches are just to share the approach of the current development.
>
> Some notable changes that I did to make the patch work:
>
> 1. Added storageam handler to the slot, this is because not all places
> the relation is not available in handy.
> 2. Retained the minimal Tuple in the slot, as this is used in HASH join.
> As per the first version, I feel it is fine to allow creating HeapTuple
> format data.
>
> Thanks everyone for sharing their ideas in the developer's unconference at
> PGCon Ottawa.
>
> Pending items:
>
> 1. Replacement of Tuple with slot in Trigger functionality
> 2. Replacement of Tuple with Slot from storage handler functions.
> 3. Remove/minimize the use of HeapTuple as a Datum.
> 4. Replace all references of HeapScanDesc with StorageScanDesc
> 5. Planner changes to consider the relation storage during the planning.
> 6. Any planner changes based on the discussion of open items?
> 7. some Executor changes to consider the storage advantages?
>
> Open Items:
>
> 1. The BitmapHeapScan and TableSampleScan are tightly coupled with
> HeapTuple and HeapScanDesc, So these scans are directly operating
> on those structures and providing the result.
>

What about vacuum?  I see vacuum is untouched in the patchset and it is not
mentioned in this discussion.
Do you plan to override low-level function like heap_page_prune(),
lazy_vacuum_page() etc., but preserve high-level logic of vacuum?
Or do you plan to let pluggable storage implement its own high-level vacuum
algorithm?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] SQL MERGE patches for PostgreSQL Versions

2017-06-22 Thread Michael Paquier
On Thu, Jun 22, 2017 at 5:30 PM, Kang Yuzhe  wrote:
> I wish I could but it's because I don't believe that I have the right
> capability to fix code conflicts. My ultimate goal is to be PG hacker
> like YOU.  Anyway, I will consider your perspective.

Nice to see such a goal, though as a first patch presented to the
community you may want something less ambitious. That's a complicated
topic you are trying to deal with.
-- 
Michael




[HACKERS] Multi column range partition table

2017-06-22 Thread amul sul
Hi,

While working on another patch, I came across a case where
I need an auto-generated partition for a multi-column range partitioned
table having the following range bounds:

PARTITION p1 FROM  (UNBOUNDED, UNBOUNDED) TO (10, 10)
PARTITION p2 FROM  (10, 10)  TO (10, UNBOUNDED)
PARTITION p3 FROM  (10, UNBOUNDED) TO (20, 10)
PARTITION p4 FROM (20, 10) TO (20, UNBOUNDED)
PARTITION p5 FROM (20, UNBOUNDED) TO (UNBOUNDED, UNBOUNDED)

Here, the lower bound of each partition is the upper bound of the
previous partition.

While trying to create the p3 partition with the (10, UNBOUNDED) to (20, 10)
bound, I got a partition overlap error.

Here is the SQL to reproduce this error:

CREATE TABLE range_parted ( i1 int,  i2 int ) PARTITION BY RANGE (i1, i2);
CREATE TABLE p1 PARTITION OF range_parted FOR VALUES FROM (UNBOUNDED,
UNBOUNDED) TO (10, 10);
CREATE TABLE p2 PARTITION OF range_parted FOR VALUES FROM (10, 10) TO
(10, UNBOUNDED);
CREATE TABLE p3   PARTITION OF tab1 FOR VALUES FROM (10, UNBOUNDED) TO (20, 10);

ERROR:  partition "p3" would overlap partition "tab1_p_10_10"

This happens because of the UNBOUNDED handling: UNBOUNDED is treated as
negative infinity when it appears in the FROM clause.  Couldn't we
explicitly treat it as positive infinity here instead?

Thoughts/Comments?

Regards,
Amul



