Extending a bit string

2018-04-23 Thread Evan Carroll
Currently the behavior of bit-string extensions is pretty insane.

SELECT b'01'::bit(2)::bit(4),
  b'01'::bit(2)::int::bit(4);
 bit  | bit
------+------
 0100 | 0001
(1 row)

I'd like to propose we standardize this a bit. Previously, in version 8,
compatibility was broken. From the version 8 release notes (thanks to
Rhodium Toad for the research),

> Casting an integer to BIT(N) selects the rightmost N bits of the integer,
> not the leftmost N bits as before.

Everything should select the right-most bits and extend on the left, per
the docs:

> Casting an integer to a bit string width wider than the integer itself
> will sign-extend on the left.

That makes sense to me. Integers sign-extend on the left, and the behavior
is currently undefined for bit->bit extensions. What say you?
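
To make the contrast concrete, here is a quick illustration of the casts
involved; the expected results in the comments follow from the documented
rules quoted above (the int casts select the rightmost bits and sign-extend
on the left, while the bit->bit cast currently zero-pads on the right):

SELECT (-1)::int::bit(8)    AS neg_one,  -- 11111111: rightmost 8 bits, sign-extended
       1::int::bit(8)       AS pos_one,  -- 00000001: extends on the left
       b'1'::bit(1)::bit(8) AS bit_ext;  -- 10000000: zero-padded on the right

It's that last cast whose behavior I'd like to see defined consistently with
the other two.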

-- 
Evan Carroll - m...@evancarroll.com
System Lord of the Internets
web: http://www.evancarroll.com
ph: 281.901.0011


Re: minor fix for acquire_inherited_sample_rows

2018-04-23 Thread Amit Langote
On 2018/04/24 13:29, Ashutosh Bapat wrote:
> On Mon, Apr 23, 2018 at 6:45 PM, Amit Langote  wrote:
>> On Mon, Apr 23, 2018 at 8:25 PM, Ashutosh Bapat wrote:
>>> On Mon, Apr 23, 2018 at 3:44 PM, Amit Langote wrote:
 Hi.

 acquire_inherited_sample_rows() currently uses equalTupleDescs() being
 false as the condition for going to tupconv.c to determine whether tuple
 conversion is needed.  But equalTupleDescs() will always return false if
 it's passed TupleDesc's of two different tables, which is the most common
 case here.  So I first thought we should just unconditionally go to
 tupconv.c, but there is still one case where we don't need to, which is
 the case where the child table is same as the parent table.  However, it
 would be much cheaper to just check if the relation OIDs are different
 instead of calling equalTupleDescs, which the attached patch teaches it to 
 do.
>>>
>>> We want to check whether tuple conversion is needed. Theoretically
>>> (not necessarily practically), one could have tuple descs of two
>>> different tables stamped with the same tdtypeid. From that POV,
>>> equalTupleDescs seems to be a stronger check than what you have in the
>>> patch.
>>
>> If I'm reading right, that sounds like a +1 to the patch. :)
> 
> Not really! To make things clear, I am not in favour of this patch.

Oh okay.  I read your last sentence as "equalTupleDescs() seems to be a
stronger check than needed" (which is how I see it), but apparently that's
not what you meant to say.  Anyway, thanks for clarifying.

Regards,
Amit




Re: minor fix for acquire_inherited_sample_rows

2018-04-23 Thread Ashutosh Bapat
On Mon, Apr 23, 2018 at 8:46 PM, Alvaro Herrera  wrote:
> Hello Amit
>
> Amit Langote wrote:
>
>> acquire_inherited_sample_rows() currently uses equalTupleDescs() being
>> false as the condition for going to tupconv.c to determine whether tuple
>> conversion is needed.  But equalTupleDescs() will always return false if
>> it's passed TupleDesc's of two different tables, which is the most common
>> case here.  So I first thought we should just unconditionally go to
>> tupconv.c, but there is still one case where we don't need to, which is
>> the case where the child table is same as the parent table.  However, it
>> would be much cheaper to just check if the relation OIDs are different
>> instead of calling equalTupleDescs, which the attached patch teaches it to 
>> do.
>
> When (not if) we get around to updating equalTupleDescs to cope, we will
> need this call right where it is (and we'd have a hard time finding this
> potential callsite later, I think).  If this were a hot spot I would be
> happy to change it, but it's not so I'd rather leave it alone.
>
+1




-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



Re: minor fix for acquire_inherited_sample_rows

2018-04-23 Thread Ashutosh Bapat
On Mon, Apr 23, 2018 at 6:45 PM, Amit Langote  wrote:
> Thanks for the review.
>
> On Mon, Apr 23, 2018 at 8:25 PM, Ashutosh Bapat
>  wrote:
>> On Mon, Apr 23, 2018 at 3:44 PM, Amit Langote
>>  wrote:
>>> Hi.
>>>
>>> acquire_inherited_sample_rows() currently uses equalTupleDescs() being
>>> false as the condition for going to tupconv.c to determine whether tuple
>>> conversion is needed.  But equalTupleDescs() will always return false if
>>> it's passed TupleDesc's of two different tables, which is the most common
>>> case here.  So I first thought we should just unconditionally go to
>>> tupconv.c, but there is still one case where we don't need to, which is
>>> the case where the child table is same as the parent table.  However, it
>>> would be much cheaper to just check if the relation OIDs are different
>>> instead of calling equalTupleDescs, which the attached patch teaches it to 
>>> do.
>>
>> We want to check whether tuple conversion is needed. Theoretically
>> (not necessarily practically), one could have tuple descs of two
>> different tables stamped with the same tdtypeid. From that POV,
>> equalTupleDescs seems to be a stronger check than what you have in the
>> patch.
>
> If I'm reading right, that sounds like a +1 to the patch. :)

Not really! To make things clear, I am not in favour of this patch.

>
>> BTW the patch you have posted also has a fix you proposed in some
>> other thread. Probably you want to get rid of it.
>
> Oops, fixed that in the attached.

Thanks.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



community bonding

2018-04-23 Thread Charles Cui
Hi PostgreSQL community and mentors:

Thanks for selecting my project as one of the GSoC student projects! It is
pretty exciting and an honor to join development on PostgreSQL (the best
database in the world :)). For the first phase of this project
(community bonding), I am planning to set up my development environment
and familiarize myself with the related code bases. I have several
questions regarding this.
1. What IDE or command line tools do you use most for PostgreSQL
development?
2. What version control do you use for Postgres? Is GitHub acceptable?
3. How do you read code in the PostgreSQL code base? Is there a
cross-function reference where I can search for functions and definitions
online?

Also let me know if you have requirements for this phase which are not
covered by my plan.

Thanks, Charles.


Re: wal_consistency_checking reports an inconsistency on master branch

2018-04-23 Thread Michael Paquier
On Mon, Apr 23, 2018 at 07:58:30AM -0700, Andres Freund wrote:
> On 2018-04-23 13:22:21 +0300, Heikki Linnakangas wrote:
>> Why does HeapTupleHeaderSetMovedPartitions() leave the offset number
>> unchanged? The old offset number is meaningless without the block number.
>> Also, bits and magic values in the tuple header are scarce. We're
>> squandering a whole range of values in the ctid, everything with
>> ip_blkid==InvalidBlockNumber, to mean "moved to different partition", when a
>> single value would suffice.
> 
> Yes, I agree on that.

True that speculative inserts and the partition move are handled in
inconsistent ways now.

>> I kept using InvalidBlockNumber there, so ItemPointerIsValid() still
>> considers those item pointers as invalid. But my gut feeling is actually
>> that it would be better to use e.g. 0 as the block number, so that these
>> item pointers would appear valid. Again, to follow the precedent of
>> speculative insertion tokens. But I'm not sure if there was some
>> well-thought-out reason to make them appear invalid. A comment on that would
>> be nice, at least.
> 
> That seems risky to me. We want something that stops EPQ style chasing
> without running into asserts for invalid offsets...

-/*
- * Special value used in t_ctid.ip_posid, to indicate that it holds a
- * speculative insertion token rather than a real TID.  This must be higher
- * than MaxOffsetNumber, so that it can be distinguished from a valid
- * offset number in a regular item pointer.
- */
-#define SpecTokenOffsetNumber  0xfff
Moving those definitions from htup_details.h to itemptr.h seems
confusing to me.  As those are heap-related operations, I would
recommend keeping them where they are, and also moving these two to
htup_details.h, renaming them on the way so that they are more
consistent:
- ItemPointerIndicatesMovedPartitions
- ItemPointerSetMovedPartitions
--
Michael




Re: Searching for: Fast windows buildfarm animal

2018-04-23 Thread Andres Freund
On 2018-04-20 19:55:26 -0400, Stephen Frost wrote:
> Greetings,
> 
> * Andres Freund (and...@anarazel.de) wrote:
> > It's common that half the buildfarm has reported back before a single
> > windows buildfarm animal reports. And if they report a failure one often
> > has to wait for hours for the next run.
> 
> Yes, that's been rather annoying.
> 
> > It'd be awesome if somebody could set up a windows animal that runs
> > frequently (i.e. checks for build needed every minute or five) and is
> > fast enough to not take ages to finish a build.
> 
> Done.
> 
> 'dory' was approved today which checks for new commits every 5 minutes
> and, as an added bonus, builds with a somewhat more recent version of
> Visual Studio (2015) and a recent Windows Server version (2016) as
> compared to the other Windows systems.  While it's still not quite as
> fast as our faster buildfarm animals, it's at least able to report in
> within 20-30 minutes or so, as compared to hours.
> 
> We'll probably continue to tweak the system a bit, but it's at least up
> and running now for HEAD, REL_10 and REL9_6 (older branches don't work
> easily with VS2015, as discussed elsewhere).
> 
> If anyone has any specific questions about it or suggestions, please
> feel free to reach out.

Thanks!

Greetings,

Andres Freund



Re: Toast issues with OldestXmin going backwards

2018-04-23 Thread Amit Kapila
On Mon, Apr 23, 2018 at 1:34 PM, Andrew Gierth
 wrote:
>> "Amit" == Amit Kapila  writes:
>
>  >> Your patch would actually be needed if (and only if) autovacuum was
>  >> changed back to its old behavior of never vacuuming toast tables
>  >> independently, and if manual VACUUM pg_toast.*; was disabled. But in
>  >> the presence of either of those two possibilities, it does nothing
>  >> useful.
>
>  Amit> Yeah, right, I have missed the point that they can be vacuumed
>  Amit> separately, however, I think that decision is somewhat
>  Amit> questionable.
>
> Some previous discussion links for reference, for the background to the
> thread containing the patch:
>
> https://www.postgresql.org/message-id/flat/87y7gpiqx3.fsf%40oxford.xeocode.com
> https://www.postgresql.org/message-id/flat/20080608230348.GD11028%40alvh.no-ip.org
>

If I read correctly, it seems one of the main reasons [1] is to save
the extra pass over the heap and improve the code.  Now, ideally, that
extra pass over the heap should also free up some space (occupied by the
rows containing the old toast pointers whose corresponding rows we are
going to remove from the toast table), but it is quite possible that
the heap is already clean due to a previous separate vacuum pass.  I
think it is good to save an extra pass over the heap which might not
be as useful as we expect, but that can cost us correctness issues in
boundary cases, as in the one being discussed in this thread.
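
For reference, the kind of standalone toast-table vacuum being discussed can
be run with something like the following (the table name and the
pg_toast_16384 relation name are only placeholders):

-- find the toast table backing a given heap table
SELECT reltoastrelid::regclass FROM pg_class WHERE relname = 'my_table';
-- vacuum just the toast table, independently of its parent
VACUUM pg_toast.pg_toast_16384;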

>  Amit> I think it would have been better if along with decoupling of
>  Amit> vacuum for main heap and toast tables, we would have come up with
>  Amit> a way to selectively remove the corresponding rows from the main
>  Amit> heap, say by just vacuuming heap pages/rows which have toast
>  Amit> pointers but maybe that is not viable or involves much more work
>  Amit> without equivalent benefit.
>
> It should be fairly obvious why this is unworkable - most toast-using
> tables will have toast pointers on every page, but without making a
> whole new index of toast pointer OIDs (unacceptable overhead), it would
> be impossible to find the toast pointers pointing to a specific item
> without searching the whole rel (in which case we might just as well
> have vacuumed it).
>

Okay, such an optimization might not be very effective and it would
anyway lead to an extra pass over the heap; however, it would resolve
the correctness issue.  Now, I understand that it is not advisable to
go back to the previous behavior because of performance concerns, but I
think it would be better if we found a bullet-proof way to fix this
symptom, rather than just fixing the issue reported in this thread.

[1] - From one of the emails: "We go certain lengths in autovacuum to
make sure tables are vacuumed when their toast table needs vacuuming
and the main table does not, which is all quite kludgy. So we already
look at their stats and make decisions about them. But what we do
after that is force a vacuum to the main table, even if that one does
not need any vacuuming, which is dumb."

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-04-23 Thread David Gould
On Mon, 23 Apr 2018 20:09:21 -0500
Justin Pryzby  wrote:

> Just want to add for the archive that I happened to run across what appears to
> be a 7-year old report of (I think) both of these vacuum/analyze bugs:
> 
> Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
> Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means
>
> 
> https://www.postgresql.org/message-id/flat/6373488.29.1323717222303.JavaMail.root%40store1.zcs.ext.wpsrv.net#6373488.29.1323717222303.javamail.r...@store1.zcs.ext.wpsrv.net

Nice find. It does look like both, although since the info is not in the
thread, it is not certain that relpages was large enough to hit the analyze
issue. Unless that was with the old default statistics target, in which case
it would be pretty easy to hit.
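
For anyone checking their own tables against this, the numbers in question
are visible directly in pg_class and can be compared with an actual count
(the table name below is a placeholder):

-- the estimates that VACUUM/ANALYZE maintain
SELECT relname, relpages, reltuples FROM pg_class WHERE relname = 'my_table';
-- the true row count, for comparison
SELECT count(*) FROM my_table;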

-dg


-- 
David Gould   da...@sonic.net
If simplicity worked, the world would be overrun with insects.



Re: Reopen logfile on SIGHUP

2018-04-23 Thread Kyotaro HORIGUCHI
At Fri, 20 Apr 2018 21:51:49 +0300, Alexander Kuzmenkov 
 wrote in 

> On 04/16/2018 05:54 AM, Kyotaro HORIGUCHI wrote:
> > We can provide a new command "pg_ctl logrotate" to hide the
> > details. (It cannot be executed by root, though.)
> 
> I like this approach.

Thanks. I found that the SIGUSR1 path is already ignoring
rotation_disabled, so we don't need to bother with the flag.

> I looked at the patch and changed some things:
> - cleaned up the error messages

Thanks for cleaning up crude copy'n pastes and wording.

> - moved checkLogrotateSignal to postmaster.c, since it has no reason to
> - be in xlog.c

Agreed. But there seems to be no convention that static functions start
with a lowercase letter; checkControlFile and initMasks are minor
instances, but...

> - added some documentation I had from my older patch
> 
> Other than that, it looks good to me. The updated patch is attached.

Thanks for the documentation, but I see a description with the same
objective and a different approach just above it.

https://www.postgresql.org/docs/devel/static/logfile-maintenance.html

> Alternatively, you might prefer to use an external log rotation
> program if you have one that you are already using with other
...
> rotation, the logrotate program can be configured to work with
> log files from syslog.

It seems that the additional description needs to be melded into this in
the first place? Some caveat may also be needed for failure cases. In the
attached, the comment for "if (rotation_requested)" is edited.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
index c0bf632cd4..068aec7b47 100644
--- a/doc/src/sgml/maintenance.sgml
+++ b/doc/src/sgml/maintenance.sgml
@@ -930,10 +930,30 @@ analyze threshold = analyze base threshold + analyze scale factor * number of tu
   
Alternatively, you might prefer to use an external log rotation
program if you have one that you are already using with other
-   server software. For example, the rotatelogs
-   tool included in the Apache distribution
-   can be used with PostgreSQL.  To do this,
-   just pipe the server's
+
+   server software. For example, logrotate can
+   also be used to collect log files produced by the built-in logging
+   collector. In this configuration, the location and names of the log files
+   are determined by logging collector,
+   and logrotate periodically archives these files.
+   There must be some way for logrotate to inform
+   the application that the log file it is using is going to be archived, and
+   that it must direct further output to a new file. This is commonly done
+   with a postrotate script that sends
+   a SIGHUP signal to the application, which then reopens
+   the log file. In PostgreSQL, this is achieved
+   using a
+   logrotate command
+   of pg_ctl. When the server receives this
+   command, it switches to the new log files according to its configuration
+   (see ). If it is configured
+   to use a static file name, it will just reopen the file.
+  
+  
+   For another example, the rotatelogs tool included in
+   the Apache distribution can also be used
+   with PostgreSQL.  To do this, just pipe the
+   server's
stderr output to the desired program.
If you start the server with
pg_ctl, then stderr
@@ -958,23 +978,6 @@ pg_ctl start | rotatelogs /var/log/pgsql_log 86400
configured to work with log files from
syslog.
   
-
-  
-   An external log rotation tool, such as logrotate, can also 
-   be used to collect log files produced by the built-in logging collector. In this 
-   configuration, the location and names of the log files are determined by logging 
-   collector, and logrotate periodically archives these files.
-   There must be some way for logrotate to inform the application
-   that the log file it is using is going to be archived, and that it must direct further 
-   output to a new file. This is commonly done with a postrotate script
-   that sends a SIGHUP signal to the application, which then reopens the 
-   log file. In PostgreSQL, this is achieved using a 
-   logrotate command of pg_ctl. When the
-   server receives this command, it switches to the new log files according to its 
-   configuration (see ). If it is configured
-   to use a static file name, it will just reopen the file.
-  
-
   
On many systems, however, syslog is not very reliable,
particularly with large log messages; it might truncate or drop messages
diff --git a/src/backend/postmaster/syslogger.c b/src/backend/postmaster/syslogger.c
index 58b759f305..41f036faad 100644
--- a/src/backend/postmaster/syslogger.c
+++ b/src/backend/postmaster/syslogger.c
@@ -388,7 +388,7 @@ SysLoggerMain(int argc, char *argv[])
 		{
 			/*
 			 * Force rotation when both values are zero. It means the request
-			 * was sent by pg_rotate_logfile.
+			 * was sent 

Re: Should we add GUCs to allow partition pruning to be disabled?

2018-04-23 Thread Amit Langote
On 2018/04/24 6:10, Alvaro Herrera wrote:
>> BTW, while we're at it, would it also be a good idea to consider the patch
>> you had proposed, which I then posted an updated version of, to adjust the
>> documentation in ddl.sgml (in the section 5.10. Table Partitioning)
>> regarding the relationship between constraint exclusion and declarative
>> partitioning?
> 
> I looked at this one.  That patch has two hunks.  I applied a change
> where the first hunk is, to replace constraint_exclusion with the new
> GUC -- seemed easy enough.

Looks good.

> However, the second hunk is on "5.10.4.
> Partitioning and Constraint Exclusion" which needs major editing.

Reading 5.10.4 again, I tend to agree with this.

> Not really sure how best to handle that one.  For starters, I think it needs
> to stop mentioning the GUC name in the title;

Hmm, "Constraint Exclusion" that's used in the title is a concept, not a
GUC, although pretty close.

> maybe rename it to
> "Partition Pruning" or some such, and then in the text explain that
> sometimes the enable_partition_pruning param is used in one case and
> constraint_exclusion in the other, and approximately what effects they
> have.  I don't think it's worth going into too much detail on exactly
> how they differ, but then I'm not 100% sure of that either.

Just a thought -- how about making 5.10.4 cover partitioning-based
optimizations in general?  I see that a number of partitioning-based
optimizations have been developed in this release cycle, but I only see
various enable_partition* GUCs listed in config.sgml and not much else.
Although the config.sgml coverage of the new capabilities seems pretty
good, some may find it helpful to have them mentioned in 5.10 Table
Partitioning as well.  Or if we don't want to hijack 5.10.4, maybe create
a 5.10.5.

Thanks,
Amit




Re: [HACKERS] Runtime Partition Pruning

2018-04-23 Thread Alvaro Herrera
Anybody wanna argue against pushing this patch now?  I'm inclined to
push it on the grounds that it's closure for already-committed work, but
there are possible arguments about this being new development after
feature freeze.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Built-in connection pooling

2018-04-23 Thread Bruce Momjian
On Mon, Apr 23, 2018 at 09:53:37PM -0400, Bruce Momjian wrote:
> On Mon, Apr 23, 2018 at 09:47:07PM -0400, Robert Haas wrote:
> > On Mon, Apr 23, 2018 at 7:59 PM, Bruce Momjian  wrote:
> > > So, instead of trying to multiplex multiple sessions in a single
> > > operating system process, why don't we try to reduce the overhead of
> > > idle sessions that each have an operating system process?  We already
> > > use procArray to reduce the number of _assigned_ PGPROC entries we have
> > > to scan.  Why can't we create another array that only contains _active_
> > > sessions, i.e. those not in a transaction.  In what places can procArray
> > > scans be changed to use this new array?
> > 
> > There are lots of places where scans would benefit, but the cost of
> > maintaining the new array would be very high in some workloads, so I
> > don't think you'd come out ahead overall.  Feel free to code it up and
> > test it, though.
> 
> Well, it would be nice if we knew exactly which scans are slow for a
> large number of idle sessions, and then we could determine what criteria
> for that array would be beneficial --- that seems like the easiest place
> to start.

I guess my point is that if we are looking at trying to store all the session
state in shared memory so any process can resume it, we might as well
see if we can find a way to store the state more cheaply in an idle
process.
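
As a rough starting point, the idle-vs-active mix that such an array would
target is already visible from pg_stat_activity, e.g.:

-- how many sessions are idle versus actually doing something
SELECT state, count(*) FROM pg_stat_activity GROUP BY state ORDER BY count(*) DESC;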

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Built-in connection pooling

2018-04-23 Thread Bruce Momjian
On Mon, Apr 23, 2018 at 09:47:07PM -0400, Robert Haas wrote:
> On Mon, Apr 23, 2018 at 7:59 PM, Bruce Momjian  wrote:
> > So, instead of trying to multiplex multiple sessions in a single
> > operating system process, why don't we try to reduce the overhead of
> > idle sessions that each have an operating system process?  We already
> > use procArray to reduce the number of _assigned_ PGPROC entries we have
> > to scan.  Why can't we create another array that only contains _active_
> > sessions, i.e. those not in a transaction.  In what places can procArray
> > scans be changed to use this new array?
> 
> There are lots of places where scans would benefit, but the cost of
> maintaining the new array would be very high in some workloads, so I
> don't think you'd come out ahead overall.  Feel free to code it up and
> test it, though.

Well, it would be nice if we knew exactly which scans are slow for a
large number of idle sessions, and then we could determine what criteria
for that array would be beneficial --- that seems like the easiest place
to start.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Boolean partitions syntax

2018-04-23 Thread Kyotaro HORIGUCHI
At Tue, 24 Apr 2018 09:23:03 +0900, Amit Langote 
 wrote in 

> On 2018/04/24 4:33, Tom Lane wrote:
> > Amit Langote  writes:
> >> On 2018/04/22 2:29, Tom Lane wrote:
> >>> I propose the attached slightly-less-invasive version of Amit's original
> >>> patch as what we should do in v10 and v11, and push the patch currently
> >>> under discussion out to v12.

Agreed that it is too much for v10. (I had hoped that something more
might be introduced in v11 :p)

> >> Here too.
> > 
> > Pushed.  It occurred to me at the last moment that we could partially
> > address one of my original concerns about this hack by converting TRUE
> > and FALSE to strings 'true' and 'false' not just 't' and 'f'.  Those
> > will be accepted by boolin just as well, and doing it like that slightly
> > reduces the astonishment factor if somebody writes TRUE for, say, a
> > text column's partbound.  I'd still prefer that we throw an error for
> > such a case, but that'll have to wait for v12.
> 
> Thanks for making that change and committing.

+1, and thank you for discussing. I'll add this item to the next
CF.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Built-in connection pooling

2018-04-23 Thread Robert Haas
On Mon, Apr 23, 2018 at 7:59 PM, Bruce Momjian  wrote:
> So, instead of trying to multiplex multiple sessions in a single
> operating system process, why don't we try to reduce the overhead of
> idle sessions that each have an operating system process?  We already
> use procArray to reduce the number of _assigned_ PGPROC entries we have
> to scan.  Why can't we create another array that only contains _active_
> sessions, i.e. those not in a transaction.  In what places can procArray
> scans be changed to use this new array?

There are lots of places where scans would benefit, but the cost of
maintaining the new array would be very high in some workloads, so I
don't think you'd come out ahead overall.  Feel free to code it up and
test it, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Problem while setting the fpw with SIGHUP

2018-04-23 Thread Kyotaro HORIGUCHI
At Tue, 24 Apr 2018 08:52:17 +0900, Michael Paquier  wrote 
in <20180423235217.gb1...@paquier.xyz>
> On Mon, Apr 23, 2018 at 12:21:04PM -0400, Robert Haas wrote:
> > Fine, but that doesn't answer the question of whether we actually need
> > to or should change the behavior in the first place.
> 
> Per the last arguments that would be "No, we don't want to change it as
> it would surprise some users":
> https://www.postgresql.org/message-id/20180420010402.gf2...@paquier.xyz

The answer is that the change of behavior is not required to fix
the bug. So I'm fine with applying only (0001 and) 0002 here.

https://www.postgresql.org/message-id/20180420010402.gf2...@paquier.xyz

One reason that I introduced the "restriction" was that it was
useful to avoid concurrent updates of the flag. It is now avoided
in another way.

Just my opinion on the behavior follows.

I don't think anyone confirms that FPIs come back after switching
full_page_writes (one of the reasons is that it needs pg_waldump).
pg_start/stop_backup() on a standby says "You need to turn on
FPW, then do a checkpoint". It suggests that FPIs don't take effect
before the next checkpoint. If we keep the current behavior, the
documentation might need an additional phrase, something like "FPW
ensures that data is protected from partial writes after the next
checkpoint starts". On the other hand, honestly, I don't have
confidence that the WAL reduction is worth the additional complexity
of 0003.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Corrupted btree index on HEAD because of covering indexes

2018-04-23 Thread Peter Geoghegan
On Sat, Apr 21, 2018 at 6:02 PM, Peter Geoghegan  wrote:
> I refined the amcheck enhancement quite a bit today. It will not just
> check that a downlink is not missing; it will also confirm that it
> wasn't a legitimately interrupted multi-level deletion, by descending
> to the leaf page to match the leaf high key pointer to the topmost
> parent, which should be the target page (the page that lacked a
> downlink according to the new Bloom filter). We need to worry about
> multi-level deletions that are interrupted by an error or a hard
> crash, which presents a legitimate case where there'll be no downlink
> for an internal page in its parent. VACUUM is okay with that, so we
> must be too.

The attached patch lets amcheck detect the issue when
bt_index_parent_check() is called, though only when heapallindexed
verification was requested (that is, only when bt_index_parent_check()
is doing something with a Bloom filter already). The new checks will
probably also detect any possible issue with multi-level page
deletions. The patch tightens up our general expectations around
half-dead and fully deleted pages, which seems necessary but also
independently useful.

I'm using work_mem to constrain the size of the second Bloom filter,
whereas the heapallindexed Bloom filter is constrained by
maintenance_work_mem. This seems fine to me, since we have always used
an additional work_mem budget for spool2 when building a unique index
within nbtsort.c. Besides, it will probably be very common for the
downlink Bloom filter to be less than 1% the size of the first Bloom
filter when we have adequate memory for both Bloom filters (i.e. very
small). I thought about mentioning this work_mem allocation in the
docs, but decided that there was no need, since the CREATE INDEX
spool2 work_mem stuff isn't documented anywhere either.

Note that the "c.relkind = 'i'" change in the docs is about not
breaking the amcheck query when local partitioned indexes happen to be
in use (when the user changed the sample SQL query to not just look at
pg_catalog indexes). See the "Local partitioned indexes and
pageinspect" thread I just started for full details.

The new P_ISDELETED() test within bt_downlink_missing_check() is what
actually detects the corruption that the test case causes, since the
fully deleted leaf page still has a sane top parent block number left
behind (that is, we don't get as far as testing "if
(BTreeTupleGetTopParent(itup) == state->targetblock)"; that's not how
the the leaf page can get corrupt in the test case that Michael
posted). Note that there are also two new similar P_ISDELETED() tests
added to two existing functions (bt_downlink_check() and
bt_check_level_from_leftmost()), but those tests won't detect the
corruption that we saw. They're really there to nail down how we think
about fully deleted pages.

-- 
Peter Geoghegan
From 75cd75e0b9e0d2d0c98a1222ee01208f83bae8ce Mon Sep 17 00:00:00 2001
From: Peter Geoghegan 
Date: Thu, 19 Apr 2018 17:45:26 -0700
Subject: [PATCH] Add missing and dangling downlink checks to amcheck.

When bt_index_parent_check() is called with the heapallindexed option,
allocate a second Bloom filter to fingerprint block numbers that appear
in the downlinks of internal pages.  Use Bloom filter probes when
walking the B-Tree to detect missing downlinks.  This can detect subtle
problems with page deletion/VACUUM, such as corruption caused by the bug
just fixed in commit 6db4b499.

The downlink Bloom filter is bound in size by work_mem.  Its optimal
size is typically far smaller than that of the regular heapallindexed
Bloom filter, especially when the index has high fan-out.

Author: Peter Geoghegan
Discussion: https://postgr.es/m/cah2-wznuzy4fwtjm1tbb3jpvz8ccfz7k_qvp5bhupyhivmw...@mail.gmail.com
---
 contrib/amcheck/verify_nbtree.c | 397 ++--
 doc/src/sgml/amcheck.sgml   |   7 +-
 2 files changed, 385 insertions(+), 19 deletions(-)

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index 1a605f9..9449529 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -91,6 +91,10 @@ typedef struct BtreeCheckState
 
 	/* Bloom filter fingerprints B-Tree index */
 	bloom_filter *filter;
+	/* Bloom filter fingerprints downlink blocks within tree */
+	bloom_filter *downlinkfilter;
+	/* Right half of incomplete split marker */
+	bool		rightsplit;
 	/* Debug counter */
 	int64		heaptuplespresent;
 } BtreeCheckState;
@@ -124,6 +128,7 @@ static void bt_target_page_check(BtreeCheckState *state);
 static ScanKey bt_right_page_check_scankey(BtreeCheckState *state);
 static void bt_downlink_check(BtreeCheckState *state, BlockNumber childblock,
   ScanKey targetkey);
+static void bt_downlink_missing_check(BtreeCheckState *state);
 static void bt_tuple_present_callback(Relation index, HeapTuple htup,
 		  Datum *values, bool *isnull,
 		  bool tupleIsAlive, void *checkstate);
@@ 

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

2018-04-23 Thread Justin Pryzby
Just want to add for the archive that I happened to run across what appears to
be a 7-year old report of (I think) both of these vacuum/analyze bugs:

Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
Re: [HACKERS] VACUUM and ANALYZE disagreeing on what reltuples means

https://www.postgresql.org/message-id/flat/6373488.29.1323717222303.JavaMail.root%40store1.zcs.ext.wpsrv.net#6373488.29.1323717222303.javamail.r...@store1.zcs.ext.wpsrv.net



Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-23 Thread Bruce Momjian
On Mon, Apr 23, 2018 at 01:14:48PM -0700, Andres Freund wrote:
> Hi,
> 
> On 2018-03-28 10:23:46 +0800, Craig Ringer wrote:
> > TL;DR: Pg should PANIC on fsync() EIO return. Retrying fsync() is not OK at
> > least on Linux. When fsync() returns success it means "all writes since the
> > last fsync have hit disk" but we assume it means "all writes since the last
> > SUCCESSFUL fsync have hit disk".
> 
> > But then we retried the checkpoint, which retried the fsync(). The retry
> > succeeded, because the prior fsync() *cleared the AS_EIO bad page flag*.
> 
> Random other thing we should look at: Some filesystems (nfs yes, xfs
> ext4 no) flush writes at close(2). We check close() return code, just
> log it... So close() counts as an fsync for such filesystems.

Well, that's interesting.  You might remember that NFS does not reserve
space for writes the way local file systems like ext4/xfs do.  For that
reason, we might be able to capture the out-of-space error on close and
exit sooner for NFS.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Built-in connection pooling

2018-04-23 Thread Bruce Momjian
On Fri, Apr 20, 2018 at 11:40:59AM +0300, Konstantin Knizhnik wrote:
> 
> Sorry, may we do not understand each other.
> There are the following facts:
> 1. There are some entities in Postgres which are local to a backend:
> temporary tables, GUCs, prepared statement, relation and catalog caches,...
> 2. Postgres doesn't "like"  larger number of backends. Even only few of them
> are actually active. Large number of backends means large procarray, large
> snapshots,...
> Please refere to my measurement at the beginning of this thread which
> illustrate how performance of Pastgres degrades with increasing number of
> backends.

So, instead of trying to multiplex multiple sessions in a single
operating system process, why don't we try to reduce the overhead of
idle sessions that each have an operating system process?  We already
use procArray to reduce the number of _assigned_ PGPROC entries we have
to scan.  Why can't we create another array that only contains _active_
sessions, i.e. those not in a transaction.  In what places can procArray
scans be changed to use this new array?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: perltidy version

2018-04-23 Thread Michael Paquier
On Mon, Apr 23, 2018 at 12:40:00PM -0400, Tom Lane wrote:
> Alvaro Herrera  writes:
>> I still vote we use 20170521 which is recent enough that we won't have
>> to change it in a few years.

That's the version available on Debian sid, so from the perspective of
any Debian user this is a handy version to use when sending patches:
perltidy/unstable,now 20170521-1 all [installed]

>> I further vote that we change the URL in pgindent/README from
>> sourceforge to metacpan.org,
>> https://metacpan.org/pod/distribution/Perl-Tidy/lib/Perl/Tidy.pod
> 
> I have no particular opinion on which version to use, but if we're going
> to change for this cycle, it's time to do so.  I'm intending to do the
> initial pgindent run pretty soon ...

I would vote for using a newer version with v11.
--
Michael




Re: Make description of heap records more talkative for flags

2018-04-23 Thread Michael Paquier
On Mon, Apr 23, 2018 at 01:02:16PM -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2018-04-23 12:37:20 -0300, Alvaro Herrera wrote:
> > > Me too.  Should we consider this for pg11?  My vote is yes, with no
> > > backpatch (might break scripts reading pg_{wal,xlog}dump output.
> > > Though, really!?).

Breaking the output format is exactly one reason to not patch v10 and
older versions.

> > It's low risk enough, but why should we try to get it into v11? Flags
> > been around for years now.
> 
> I was thinking the newly added flags would make for possibly more
> confusing scenarios, but now that I look again, the new ones are XLHL
> flags not the XLOG flags being handled in this patch.
> 
> Now, frankly, this being mostly a debugging tool, I think it would be
> better to have the output as complete as we can.  Otherwise, when
> debugging some hideous problem we find ourselves patching the tool
> during an emergency in order to figure out what is going on.  For this,
> I would rather patch the earliest version we conveniently can.  We
> cannot patch pg10 because it's already out there, but not so for pg11.

OK, that works for me.  Attached is a patch using hex values for the
flags of all those records (the comment could be fixed further down, but
let's not bother about it).  There are a couple of things I noticed on
the way:
1) xl_heap_lock prints flags, but not in hex, using %u.
2) xl_heap_visible prints flags, not in hex but using %d.
3) xl_heap_lock_updated prints flags, not in hex but using %u.
4) xl_hash_split_allocate_page has flags; those are handled directly as
text (see hashdesc.c).
5) Most GIN records also have flags, which are handled.
6) xl_smgr_truncate has flags, which are not shown yet.
So I have hacked the patch so that all flags show up in hex format,
except those which are already handled and printed as text.
--
Michael
diff --git a/src/backend/access/rmgrdesc/heapdesc.c b/src/backend/access/rmgrdesc/heapdesc.c
index 318a281d7f..f63b8f844d 100644
--- a/src/backend/access/rmgrdesc/heapdesc.c
+++ b/src/backend/access/rmgrdesc/heapdesc.c
@@ -42,22 +42,26 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_insert *xlrec = (xl_heap_insert *) rec;
 
-		appendStringInfo(buf, "off %u", xlrec->offnum);
+		appendStringInfo(buf, "off %u flags %02X", xlrec->offnum,
+		 xlrec->flags);
 	}
 	else if (info == XLOG_HEAP_DELETE)
 	{
 		xl_heap_delete *xlrec = (xl_heap_delete *) rec;
 
-		appendStringInfo(buf, "off %u ", xlrec->offnum);
+		appendStringInfo(buf, "off %u flags %02X ",
+		 xlrec->offnum,
+		 xlrec->flags);
 		out_infobits(buf, xlrec->infobits_set);
 	}
 	else if (info == XLOG_HEAP_UPDATE)
 	{
 		xl_heap_update *xlrec = (xl_heap_update *) rec;
 
-		appendStringInfo(buf, "off %u xmax %u ",
+		appendStringInfo(buf, "off %u xmax %u flags %02X ",
 		 xlrec->old_offnum,
-		 xlrec->old_xmax);
+		 xlrec->old_xmax,
+		 xlrec->flags);
 		out_infobits(buf, xlrec->old_infobits_set);
 		appendStringInfo(buf, "; new off %u xmax %u",
 		 xlrec->new_offnum,
@@ -67,9 +71,10 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_update *xlrec = (xl_heap_update *) rec;
 
-		appendStringInfo(buf, "off %u xmax %u ",
+		appendStringInfo(buf, "off %u xmax %u flags %02X ",
 		 xlrec->old_offnum,
-		 xlrec->old_xmax);
+		 xlrec->old_xmax,
+		 xlrec->flags);
 		out_infobits(buf, xlrec->old_infobits_set);
 		appendStringInfo(buf, "; new off %u xmax %u",
 		 xlrec->new_offnum,
@@ -98,7 +103,7 @@ heap_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_lock *xlrec = (xl_heap_lock *) rec;
 
-		appendStringInfo(buf, "off %u: xid %u: flags %u ",
+		appendStringInfo(buf, "off %u: xid %u: flags %02X ",
 		 xlrec->offnum, xlrec->locking_xid, xlrec->flags);
 		out_infobits(buf, xlrec->infobits_set);
 	}
@@ -139,20 +144,21 @@ heap2_desc(StringInfo buf, XLogReaderState *record)
 	{
 		xl_heap_visible *xlrec = (xl_heap_visible *) rec;
 
-		appendStringInfo(buf, "cutoff xid %u flags %d",
+		appendStringInfo(buf, "cutoff xid %u flags %02X",
 		 xlrec->cutoff_xid, xlrec->flags);
 	}
 	else if (info == XLOG_HEAP2_MULTI_INSERT)
 	{
 		xl_heap_multi_insert *xlrec = (xl_heap_multi_insert *) rec;
 
-		appendStringInfo(buf, "%d tuples", xlrec->ntuples);
+		appendStringInfo(buf, "%d tuples flags %02X", xlrec->ntuples,
+		 xlrec->flags);
 	}
 	else if (info == XLOG_HEAP2_LOCK_UPDATED)
 	{
 		xl_heap_lock_updated *xlrec = (xl_heap_lock_updated *) rec;
 
-		appendStringInfo(buf, "off %u: xmax %u: flags %u ",
+		appendStringInfo(buf, "off %u: xmax %u: flags %02X ",
 		 xlrec->offnum, xlrec->xmax, xlrec->flags);
 		out_infobits(buf, xlrec->infobits_set);
 	}
diff --git a/src/backend/access/rmgrdesc/smgrdesc.c b/src/backend/access/rmgrdesc/smgrdesc.c
index df1ad38b5a..604604ac04 100644
--- a/src/backend/access/rmgrdesc/smgrdesc.c
+++ b/src/backend/access/rmgrdesc/smgrdesc.c
@@ -36,7 +36,7 @@ smgr_desc(StringInfo buf, 

Re: "could not reattach to shared memory" on buildfarm member dory

2018-04-23 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> So far, dory has failed three times with essentially identical symptoms:
> 
> 2018-04-23 19:57:10.624 GMT [2240] FATAL:  could not reattach to shared 
> memory (key=0190, addr=018E): error code 487
> 2018-04-23 15:57:10.657 EDT [8836] ERROR:  lost connection to parallel worker
> 2018-04-23 15:57:10.657 EDT [8836] STATEMENT:  select count(*) from tenk1 
> group by twenty;
> 2018-04-23 15:57:10.660 EDT [3820] LOG:  background worker "parallel worker" 
> (PID 2240) exited with exit code 1
> 
> Now how can this be?  We've successfully reserved and released the address
> range we want to use, so it *should* be free at the instant we try to map.

Yeah, that's definitely interesting.

> I guess the good news is that we're seeing this in a reasonably
> reproducible fashion, so there's some hope of digging down to find
> out the actual cause.

I've asked Heath to take a look at the system again and see if there are
any Windows logs or such that might help us understand what's happening.
AV was disabled on the box, so I don't think it's that, at least.

Thanks!

Stephen




Re: Boolean partitions syntax

2018-04-23 Thread Jonathan S. Katz

> On Apr 23, 2018, at 12:33 PM, Tom Lane  wrote:
> 
> Amit Langote  writes:
>> On 2018/04/22 2:29, Tom Lane wrote:
>>> I propose the attached slightly-less-invasive version of Amit's original
>>> patch as what we should do in v10 and v11, and push the patch currently
>>> under discussion out to v12.
> 
>> Here too.
> 
> Pushed.  It occurred to me at the last moment that we could partially
> address one of my original concerns about this hack by converting TRUE
> and FALSE to strings 'true' and 'false' not just 't' and 'f'.

I tried the earlier test case I presented against HEAD and it worked
like a charm. Thank you!

Jonathan




Re: [HACKERS] Clock with Adaptive Replacement

2018-04-23 Thread Thomas Munro
On Fri, Feb 12, 2016 at 10:02 AM, Konstantin Knizhnik
 wrote:
> Hi hackers,
>
> What do you think about improving the cache replacement clock-sweep
> algorithm in PostgreSQL with the adaptive version proposed in this article:
>
> http://www-cs.stanford.edu/~sbansal/pubs/fast04.pdf
>
> Are there some well-known drawbacks of this approach, or would it be
> interesting to adapt this algorithm to PostgreSQL and measure its impact on
> performance under different workloads?

I'm not currently planning to work in this area and have done no real
investigation, so please consider the following to be "water cooler
talk".

I also came across that paper while reading about buffering as general
background.  Yeah, CAR[T] looks pretty interesting at first glance.
Automatic scan resistance seems like something we'd want, and its approach
to frequency sounds smarter than GCLOCK's usage counter with arbitrary
parameter 5.  Here's a nice slide deck from the authors:

http://www.cse.iitd.ernet.in/~sbansal/pubs/fast04ppt.pdf

There doesn't seem to be as much written about GCLOCK beyond the 1992
TPC-A paper[1], possibly because as the CAR paper says "[a]
fundamental disadvantage of GCLOCK is that it requires counter
increment on every page hit which makes it infeasible for virtual
memory", making it less interesting to the OS guys.  That is, we
actually have slightly more information, because we have an
opportunity to bump the usage counter when pinning and the VM guys
don't, probably explaining why the paper compares with CLOCK and not
GCLOCK.

One question is whether SPC-1 looks anything like a typical database
workload.  Say someone wrote a prototype CAR[T] patch for PostgreSQL,
how would you validate it?

I'm also curious about how the algorithms at different levels interact
when using buffered IO.  While other databases typically do direct IO,
you can still find a trail of papers about this topic since disk
arrays and network-attached storage systems also have caches and page
replacement problems[2][3], and of course multi-level caching problems
apply to non-database applications too.  I wonder to what extent
Linux's use of "a mash of a number of different algorithms with a
number of modifications for catching corner cases and various
optimisations"[4] affects the performance of different clock-based
algorithms operating higher up.

If we had a page replacement algorithm with CART's magical claimed
properties and it really does perform better than GCLOCK + our scan
resistance special cases, I wonder if it would be tempting to stick
SLRU contents into shared buffers.  slru.c could gain a
smgr-compatible interface so that bufmgr.c could treat those buffers
like any other, using smgr_which to select the appropriate storage
manager.  (I'm going to be proposing something similar for undo log
storage, more on that later.)

[1] 
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.452.9699=rep1=pdf
[2] https://www.usenix.org/legacy/event/usenix01/full_papers/zhou/zhou.pdf
[3] https://www.usenix.org/legacy/event/usenix02/full_papers/wong/wong_html/
[4] http://www.spinics.net/lists/linux-mm/msg13385.html

-- 
Thomas Munro
http://www.enterprisedb.com



Re: bms_prev_member won't work correctly if bitmapword is 64-bits

2018-04-23 Thread David Rowley
On 24 April 2018 at 03:00, Teodor Sigaev  wrote:
> Thank you, pushed

Thanks!

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Should we add GUCs to allow partition pruning to be disabled?

2018-04-23 Thread Alvaro Herrera
Hi,

I just pushed David's patch, with some pretty minor changes.  I hope not
to have broken anything.

Amit Langote wrote:

> Your proposed changes to inheritance_planner() look fine to me. In the
> comment added by the patch in relation_excluded_by_constraints():
> 
> + /*
> +  * When constraint_exclusion is set to 'partition' we only handle
> +  * OTHER_MEMBER_RELs, or BASERELs in cases where the result target is an
> +  * inheritance parent or a partitioned table.
> +  */
> 
> Just to clarify this a bit, would it be a good idea to be specific by
> appending " (see inheritance_planner() where this is determined)" or some
> such to this sentence?

I didn't think that was really required.

> BTW, while we're at it, would it also be a good idea to consider the patch
> you had proposed, which I then posted an updated version of, to adjust the
> documentation in ddl.sgml (in the section 5.10. Table Partitioning)
> regarding the relationship between constraint exclusion and declarative
> partitioning?

I looked at this one.  That patch has two hunks.  I applied a change
where the first hunk is, to replace constraint_exclusion with the new
GUC -- seemed easy enough.  However, the second hunk is on "5.10.4.
Partitioning and Constraint Exclusion" which needs major editing.  Not
really sure how best to handle that one.  For starters, I think it needs
to stop mentioning the GUC name in the title; maybe rename it to
"Partition Pruning" or some such, and then in the text explain that
sometimes the enable_partition_pruning param is used in one case and
constraint_exclusion in the other, and approximately what effects they
have.  I don't think it's worth going into too much detail on exactly
how they differ, but then I'm not 100% sure of that either.
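
To sketch roughly what the rewritten section would need to convey, using the
measurement/logdate example table already in ddl.sgml:

-- plan-time pruning of declarative partitions is governed by the new GUC
SET enable_partition_pruning = on;
-- inheritance-based setups still rely on constraint exclusion
SET constraint_exclusion = partition;
EXPLAIN SELECT count(*) FROM measurement WHERE logdate >= DATE '2018-04-01';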

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



"could not reattach to shared memory" on buildfarm member dory

2018-04-23 Thread Tom Lane
So far, dory has failed three times with essentially identical symptoms:

2018-04-23 19:57:10.624 GMT [2240] FATAL:  could not reattach to shared memory 
(key=0190, addr=018E): error code 487
2018-04-23 15:57:10.657 EDT [8836] ERROR:  lost connection to parallel worker
2018-04-23 15:57:10.657 EDT [8836] STATEMENT:  select count(*) from tenk1 group 
by twenty;
2018-04-23 15:57:10.660 EDT [3820] LOG:  background worker "parallel worker" 
(PID 2240) exited with exit code 1

Now how can this be?  We've successfully reserved and released the address
range we want to use, so it *should* be free at the instant we try to map.

Another thing that seems curious, though it may just be an artifact of
not having many data points yet, is that these failures all occurred
during pg_upgradecheck.  You'd think the "check" and "install-check"
steps would be equally vulnerable to the failure.

I guess the good news is that we're seeing this in a reasonably
reproducible fashion, so there's some hope of digging down to find
out the actual cause.

regards, tom lane



Re: PostgreSQL's handling of fsync() errors is unsafe and risks data loss at least on XFS

2018-04-23 Thread Andres Freund
Hi,

On 2018-03-28 10:23:46 +0800, Craig Ringer wrote:
> TL;DR: Pg should PANIC on fsync() EIO return. Retrying fsync() is not OK at
> least on Linux. When fsync() returns success it means "all writes since the
> last fsync have hit disk" but we assume it means "all writes since the last
> SUCCESSFUL fsync have hit disk".

> But then we retried the checkpoint, which retried the fsync(). The retry
> succeeded, because the prior fsync() *cleared the AS_EIO bad page flag*.

Random other thing we should look at: Some filesystems (nfs yes, xfs
ext4 no) flush writes at close(2). We check close() return code, just
log it... So close() counts as an fsync for such filesystems.

I'm at LSF/MM to discuss the future behaviour of Linux here, but that's how
it is right now.

Greetings,

Andres Freund



Re: Boolean partitions syntax

2018-04-23 Thread Tom Lane
Amit Langote  writes:
> On 2018/04/22 2:29, Tom Lane wrote:
>> I propose the attached slightly-less-invasive version of Amit's original
>> patch as what we should do in v10 and v11, and push the patch currently
>> under discussion out to v12.

> Here too.

Pushed.  It occurred to me at the last moment that we could partially
address one of my original concerns about this hack by converting TRUE
and FALSE to strings 'true' and 'false' not just 't' and 'f'.  Those
will be accepted by boolin just as well, and doing it like that slightly
reduces the astonishment factor if somebody writes TRUE for, say, a
text column's partbound.  I'd still prefer that we throw an error for
such a case, but that'll have to wait for v12.
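
For the archives, a minimal example of what the committed behavior allows
(not taken from the regression tests):

CREATE TABLE flags (b boolean) PARTITION BY LIST (b);
-- TRUE/FALSE are accepted here; each bound is stored as the string
-- 'true'/'false', which boolin will accept
CREATE TABLE flags_true  PARTITION OF flags FOR VALUES IN (TRUE);
CREATE TABLE flags_false PARTITION OF flags FOR VALUES IN (FALSE);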

regards, tom lane



Re: Built-in connection pooling

2018-04-23 Thread Robert Haas
On Fri, Jan 19, 2018 at 11:59 AM, Tomas Vondra
 wrote:
> Hmmm, that's unfortunate. I guess you'll have to process the startup packet
> in the main process, before it gets forked. At least partially.

I'm not keen on a design that would involve doing more stuff in the
postmaster, because that would increase the chances of the postmaster
accidentally dying, which is really bad.  I've been thinking about the
idea of having a separate "listener" process that receives
connections, and that the postmaster can restart if it fails.  Or
there could even be multiple listeners if needed.  When the listener
gets a connection, it hands it off to another process that then "owns"
that connection.

One problem with this is that the process that's going to take over
the connection needs to get started by the postmaster, not the
listener.  The listener could signal the postmaster to start it, just
like we do for background workers, but that might add a bit of
latency.   So what I'm thinking is that the postmaster could maintain
a small (and configurably-sized) pool of preforked workers.  That
might be worth doing independently, as a way to reduce connection
startup latency, although somebody would have to test it to see
whether it really works... a lot of the startup work can't be done
until we know which database the user wants.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Custom compression methods

2018-04-23 Thread Ildus Kurbangaliev
On Mon, 23 Apr 2018 19:34:38 +0300
Konstantin Knizhnik  wrote:

> >  
> Sorry, I am really looking at this patch from a different angle,
> and this is why I have some doubts about the general idea.
> Postgres allows defining custom types, access methods,...
> But do you know any production system using some special data types
> or custom indexes which are not included in the standard Postgres
> distribution or popular extensions (like PostGIS)?
> 
> IMHO end users do not have the skills and time to create their own
> compression algorithms. And without knowledge of the specifics of a
> particular data set, it is very hard to implement something more
> efficient than a universal compression library.
> But if you think that this is not the right place and time to discuss it,
> I do not insist.
> 
> But in any case, I think that it will be useful to provide some more
> examples of custom compression API usage.
> From my point of view the most useful would be integration with zstd.
> But if it is possible to find some example of data-specific
> compression algorithms which show better results than universal
> compression, it would be even more impressive.
> 
> 

OK, let me clear up the purpose of this patch. I understand that you
want to compress everything with it, but for now the idea is just to bring
in basic functionality to compress TOASTed values with external
compression algorithms. It's unlikely that compression algorithms like
zstd, snappy and others will be in Postgres core, but with this patch
it's really easy to make an extension and start compressing values
with it right away. And the end user does not need to be an expert in
compression algorithms to make such an extension. One of these algorithms
could end up in core if its license allows it.

I'm not trying to cover all the places in Postgres which would benefit
from compression, and this patch is only the first step. It's quite big
already, and with every new feature that increases its size, the chances
of it being reviewed and committed decrease.

The API is very simple now and contains what every compression method
can do: take a block of data and return a compressed form of it. It can
be extended with streaming and other features in the future.

Maybe the reason for your confusion is that there is no GUC that changes
pglz to some custom compression method so that all new attributes use it.
I will think about adding one. There was also a discussion about
specifying the compression method for a type, and it was decided that
it's better to do that later in a separate patch.

An example of specialized compression could be the time series
compression described in [1]. [2] contains an example of an extension
that adds lz4 compression using this patch.

[1] http://www.vldb.org/pvldb/vol8/p1816-teller.pdf
[2] https://github.com/zilder/pg_lz4
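
As a concrete taste of what such data-specific compression exploits, the
timestamp half of [1] is built around delta-of-delta encoding: for a
regularly sampled series almost every stored value becomes zero, which the
paper then packs into a couple of bits. A rough sketch of just that step
(the actual bit-level encoding from the paper is omitted):

/*
 * Delta-of-delta sketch for the timestamp encoding idea from [1].
 * For a regular series the printed delta-of-delta values are mostly 0.
 */
#include <stdio.h>
#include <stdint.h>

int
main(void)
{
	int64_t		ts[] = {1523000000, 1523000060, 1523000120, 1523000181, 1523000240};
	int			n = sizeof(ts) / sizeof(ts[0]);
	int64_t		prev_delta = 0;

	for (int i = 1; i < n; i++)
	{
		int64_t		delta = ts[i] - ts[i - 1];
		int64_t		dod = delta - prev_delta;	/* 0 for perfectly regular data */

		printf("delta=%lld delta-of-delta=%lld\n",
			   (long long) delta, (long long) dod);
		prev_delta = delta;
	}
	return 0;
}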


-- 

Regards,
Ildus Kurbangaliev



Re: Built-in connection pooling

2018-04-23 Thread Vladimir Borodin


> On 19 Apr 2018, at 23:59, Andres Freund wrote:
> 
> I think there's plenty things that don't really make sense solving
> outside of postgres:
> - additional added hop / context switches due to external pooler
> - temporary tables
> - prepared statements
> - GUCs and other session state

+1

> 
> I think there's at least one thing that we should attempt to make
> easier for external pooler:
> - proxy authorization

I suggested it here [1] but a fair amount of people argued against it in that
thread.

[1] 
https://www.postgresql.org/message-id/98C8F3EF-52F0-4AF9-BE81-405C15D77DEA%40simply.name

--
May the force be with you…
https://simply.name



Re: documentation is now XML

2018-04-23 Thread Bruce Momjian
On Mon, Apr 23, 2018 at 12:54:40PM +0300, Liudmila Mantrova wrote:
> Hi everyone,
> 
> Reading this thread, I got an impression that everyone would benefit from
> converting back branches to XML, but the main concern is lack of resources to
> complete this task. Are there any other issues that affect this decision? 
> Looks
> like Aleksander Lakhin's offer to prepare patches was missed somehow as the
> discussion sidetracked to other issues:
> 
> I can prepare such patches (scripts to generate them). In fact we 
> (Postgres
> Pro) perform such conversion (SGML->XML) on-fly when building docs 
> starting
> from 9.6. So it's not problem to convert *.sgml and replace Makefile and
> *.xsl. But I would prefer to perform the conversion when we finish the 
> move
> on 11devel (renaming sgml to xml, maybe optimizing xsl's...).
> 
> Do you think it's worth considering after May releases? I could also help with
> manual testing of the prepared patches if required. Supporting documentation
> and translation in several branches will be much easier if the sources are
> consistent.

Well, I will say that backpatching doc changes is more complex when the
supported versions mix SGML and XML, particularly because of the
different tag endings between them.  Therefore, having all the docs be
XML would be nice from my perspective.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: perltidy version

2018-04-23 Thread Tom Lane
Alvaro Herrera  writes:
> Peter Eisentraut wrote:
>> Did we decide on this?

> No agreement yet apparently :-)

> I still vote we use 20170521 which is recent enough that we won't have
> to change it in a few years.

> I further vote that we change the URL in pgindent/README from
> sourceforge to metacpan.org,
> https://metacpan.org/pod/distribution/Perl-Tidy/lib/Perl/Tidy.pod

I have no particular opinion on which version to use, but if we're going
to change for this cycle, it's time to do so.  I'm intending to do the
initial pgindent run pretty soon ...

regards, tom lane



Re: [HACKERS] Custom compression methods

2018-04-23 Thread Konstantin Knizhnik



On 23.04.2018 18:32, Alexander Korotkov wrote:
But that's the main goal of this patch: let somebody implement their own
compression algorithm which best fits a particular dataset.


Hmmm...Frankly speaking I don't believe in this "somebody".



From my point of view the main value of this patch is that it
allows replacing the pglz algorithm with a more efficient one, for
example zstd.
On some data sets zstd provides a more than 10 times better
compression ratio and at the same time is faster than pglz.


Not exactly.  If we want to replace pglz with a more efficient algorithm,
then we should just replace pglz with a better algorithm.  Pluggable
compression methods are definitely not worth it just for replacing pglz
with zstd.


As far as I understand, it is not possible for many reasons (portability,
patents,...) to replace pglz with zstd.
I think that even replacing pglz with zlib (which is much worse than
zstd) will not be accepted by the community.
So from my point of view the main advantage of custom compression methods
is to replace the built-in pglz compression with a more advanced one.




Some blob-like datatypes might not be long enough to let generic
compression algorithms like zlib or zstd train a dictionary.  For example,
MySQL successfully utilizes column-level dictionaries for JSON [1].  Also,
JSON(B) might utilize some compression which lets the user extract
particular attributes without decompressing the whole document.


Well, I am not an expert in compression.
But I will be very surprised if somebody shows me a real example
with a large enough compressed data buffer (>2kb) where some specialized
algorithm provides a significantly
better compression ratio than an advanced universal compression algorithm.

Also, maybe I missed something, but the current compression API doesn't
support partial extraction (extracting some particular attribute or range).
If we really need it, then it should be expressed in the custom compressor
API. But I am not sure how frequently it will be needed.
Large values are split into 2kb TOAST chunks. With compression that can
be about 4-8k of raw data. IMHO storing larger JSON objects is a database
design flaw.
And taking into account that for JSONB we also need to extract the header
(so at least two chunks), the advantages of partial JSONB decompression
become even less clear.






I do not think that assigning the default compression method through a
GUC is such a bad idea.


It's probably not so bad, but it's a different story. Unrelated to
this patch, I think.


Maybe. But in any case, there are several directions where compression
can be used:

- custom compression algorithms
- libpq compression
- page level compression
...

and they should all somehow finally be "married" with each other.



I think streaming compression seems like a completely different story.
Client-server traffic compression is not just a server feature.  It must
also be supported on the client side.  And I really doubt it should be
pluggable.

In my opinion, you propose good things like compression of WAL
with a better algorithm and compression of client-server traffic.
But I think those features are unrelated to this patch and should
be considered separately.  They are not features which should be
added to this patch.  Regarding this patch, the points you provided
seem more like criticism of the general idea.

I think the problem of this patch is that it lacks a good example.
It would be nice if Ildus implemented simple compression with a
column-defined dictionary (like [1] does), and showed its efficiency
on real-life examples which can't be achieved with generic
compression methods (like zlib or zstd).  That would be a good
answer to the criticism you provide.

*Links*

1. 
https://www.percona.com/doc/percona-server/LATEST/flexibility/compressed_columns.html


--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com


The Russian Postgres Company


Sorry, I am really looking at this patch from a different angle.
And this is why I have some doubts about the general idea.
Postgres allows defining custom types, access methods,...
But do you know any production system using special data types or
custom indexes which are not included in the standard Postgres distribution
or popular extensions (like postgis)?

IMHO end users do not have the skills and time to create their own
compression algorithms. And without knowledge of the specifics of a
particular data set,
it is very hard to implement something more efficient than a universal
compression library.
But if you think that this is not the right place and time to discuss it, I
do not insist.


But in any case, I think that it will be useful to provide some more
examples of custom compression API usage.

From my point of view the most useful would be integration with zstd.
But if it is possible to find some examples of data-specific compression
algorithms which show better results than universal compression,
it will be even more impressive.

Re: Problem while setting the fpw with SIGHUP

2018-04-23 Thread Robert Haas
On Wed, Apr 18, 2018 at 9:49 PM, Michael Paquier  wrote:
> On Wed, Apr 18, 2018 at 10:52:51AM -0400, Robert Haas wrote:
>> I would just document the risks.  If the documentation says that you
>> can't rely on the value until after the next checkpoint, or whatever
>> the rule is, then I think we're fine.  I don't think that we really
>> have the infrastructure to do any better; if we try, we'll just end up
>> with odd warts.  Documenting the current set of warts is less churn
>> and less work.
>
> The last version of the patch proposed has eaten this diff which was
> part of one of the past versions (v2-0001-Change-FPW-handling.patch from
> https://www.postgresql.org/message-id/20180412.103430.133595350.horiguchi.kyotaro%40lab.ntt.co.jp):
> +The default is on. The change of the parameter takes
> +effect at the next checkpoint time.
> So there was some documentation about the behavior change, for what it's
> worth.

Fine, but that doesn't answer the question of whether we actually need
to or should change the behavior in the first place.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Make description of heap records more talkative for flags

2018-04-23 Thread Alvaro Herrera
Hello

Andres Freund wrote:

> On 2018-04-23 12:37:20 -0300, Alvaro Herrera wrote:
> > Michael Paquier wrote:
> > > On Thu, Apr 12, 2018 at 08:49:03PM -0700, Andres Freund wrote:
> > > > OTOH, that also kinda bloats the output noticeably... I'm
> > > > somewhat inclined to just put the hex value or such there?
> > > 
> > > That would do as well for me.
> > 
> > Me too.  Should we consider this for pg11?  My vote is yes, with no
> > backpatch (might break scripts reading pg_{wal,xlog}dump output.
> > Though, really!?).
> 
It's low risk enough, but why should we try to get it into v11? Flags
have been around for years now.

I was thinking the newly added flags would make for possibly more
confusing scenarios, but now that I look again, the new ones are XLHL
flags not the XLOG flags being handled in this patch.

Now, frankly, this being mostly a debugging tool, I think it would be
better to have the output as complete as we can.  Otherwise, when
debugging some hideous problem we find ourselves patching the tool
during an emergency in order to figure out what is going on.  For this,
I would rather patch the earliest version we conveniently can.  We
cannot patch pg10 because it's already out there, but not so for pg11.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: perltidy version

2018-04-23 Thread Alvaro Herrera
Peter Eisentraut wrote:
> On 3/5/18 09:02, Magnus Hagander wrote:
> > I think we should just pick some recent one and use it for X years; use
> > that one for all backbranches.  I propose X=3.  I propose 20170521
> > (newer ones seem to cater for stuff that I think we mostly don't use).
> > 
> > 20140328 seems to cover *most* versions. Another argument for that one
> > would be it's the one that we have on Borka, which is where we build the
> > official release tarballs, so we can use that as a stable fallback.
> > 
> > Those are both fairly weak arguments though. As long as we have good
> > instructions for how to make a local install of it that doesn't affect
> > the rest of the system, then that should not matter. And we need such
> > instructions anyway, since it won't be on every distribution. 
> 
> Did we decide on this?

No agreement yet apparently :-)

I still vote we use 20170521 which is recent enough that we won't have
to change it in a few years.

I further vote that we change the URL in pgindent/README from
sourceforge to metacpan.org,
https://metacpan.org/pod/distribution/Perl-Tidy/lib/Perl/Tidy.pod

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: perltidy version

2018-04-23 Thread Peter Eisentraut
On 3/5/18 09:02, Magnus Hagander wrote:
> I think we should just pick some recent one and use it for X years; use
> that one for all backbranches.  I propose X=3.  I propose 20170521
> (newer ones seem to cater for stuff that I think we mostly don't use).
> 
> 
> 20140328 seems to cover *most* versions. Another argument for that one
> would be it's the one that we have on Borka, which is where we build the
> official release tarballs, so we can use that as a stable fallback.
> 
> Those are both fairly weak arguments though. As long as we have good
> instructions for how to make a local install of it that doesn't affect
> the rest of the system, then that should not matter. And we need such
> instructions anyway, since it won't be on every distribution. 

Did we decide on this?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Make description of heap records more talkative for flags

2018-04-23 Thread Andres Freund
Hi,

On 2018-04-23 12:37:20 -0300, Alvaro Herrera wrote:
> Michael Paquier wrote:
> > On Thu, Apr 12, 2018 at 08:49:03PM -0700, Andres Freund wrote:
> > > OTOH, that also kinda bloats the output noticeably... I'm somewhat
> > > inclined to just put the hex value or such there?
> > 
> > That would do as well for me.
> 
> Me too.  Should we consider this for pg11?  My vote is yes, with no
> backpatch (might break scripts reading pg_{wal,xlog}dump output.
> Though, really!?).

It's low risk enough, but why should we try to get it into v11? Flags
have been around for years now.

Greetings,

Andres Freund



Re: Build fails with different versions of clang and LLVM

2018-04-23 Thread Heikki Linnakangas

On 23/04/18 18:35, Andres Freund wrote:

Hi,

On 2018-04-23 18:33:30 +0300, Heikki Linnakangas wrote:

To add to the story: I installed clang 3.9, but I still got the same error.
I now had both clang 3.8 and 3.9 installed, but /usr/bin/clang still pointed
to clang-3.8, so autoconf still picked that up. Only after removing
clang-3.8 altogether, it started working.


And CLANG=clang-3.9 to autoconf didn't cure that?


Oh, I'm sure it would have, I didn't try that :-).


I've experimented a bit w/ this and there seem to be fewer cross-version
issues in newer clang/llvm versions.


Good to hear. Nevertheless, I think we should try a bit harder to ensure 
that we pick the same version of clang and LLVM. Even if they're 
compatible, just seems more straightforward that way. And we now know 
there is an incompatibility between 3.8 and 3.9, at least.


- Heikki



Re: Make description of heap records more talkative for flags

2018-04-23 Thread Alvaro Herrera
Michael Paquier wrote:
> On Thu, Apr 12, 2018 at 08:49:03PM -0700, Andres Freund wrote:
> > OTOH, that also kinda bloats the output noticeably... I'm somewhat
> > inclined to just put the hex value or such there?
> 
> That would do as well for me.

Me too.  Should we consider this for pg11?  My vote is yes, with no
backpatch (might break scripts reading pg_{wal,xlog}dump output.
Though, really!?).

Please submit a patch implementing this.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Build fails with different versions of clang and LLVM

2018-04-23 Thread Andres Freund
Hi,

On 2018-04-23 18:33:30 +0300, Heikki Linnakangas wrote:
> To add to the story: I installed clang 3.9, but I still got the same error.
> I now had both clang 3.8 and 3.9 installed, but /usr/bin/clang still pointed
> to clang-3.8, so autoconf still picked that up. Only after removing
> clang-3.8 altogether, it started working.

And CLANG=clang-3.9 to autoconf didn't cure that?

I've experimented a bit w/ this and there seem to be fewer cross-version
issues in newer clang/llvm versions.


Greetings,

Andres Freund



Re: Build fails with different versions of clang and LLVM

2018-04-23 Thread Heikki Linnakangas

On 23/04/18 18:28, Andres Freund wrote:

Hi,

On 2018-04-23 04:33:04 -0400, Heikki Linnakangas wrote:

So, I have LLVM 3.9 installed, but no clang 3.9. Only clang 3.8.


Any specific reason, or just happenstance?


Just happenstance. I had clang and llvm 3.8 installed previously. But 
PostgreSQL needs llvm 3.9, so I installed that, but I didn't think to 
install clang 3.9.


To add to the story: I installed clang 3.9, but I still got the same 
error. I now had both clang 3.8 and 3.9 installed, but /usr/bin/clang 
still pointed to clang-3.8, so autoconf still picked that up. Only after 
removing clang-3.8 altogether, it started working.


For the record, this is on a pretty normal Debian Stretch installation.

- Heikki



Re: [HACKERS] Custom compression methods

2018-04-23 Thread Alexander Korotkov
On Mon, Apr 23, 2018 at 12:40 PM, Konstantin Knizhnik <
k.knizh...@postgrespro.ru> wrote:

> On 22.04.2018 16:21, Alexander Korotkov wrote:
>
> On Fri, Apr 20, 2018 at 7:45 PM, Konstantin Knizhnik <
> k.knizh...@postgrespro.ru> wrote:
>
>> On 30.03.2018 19:50, Ildus Kurbangaliev wrote:
>>
>>> On Mon, 26 Mar 2018 20:38:25 +0300
>>> Ildus Kurbangaliev  wrote:
>>>
>>> Attached rebased version of the patch. Fixed conflicts in pg_class.h.

 New rebased version due to conflicts in master. Also fixed few errors
>>> and removed cmdrop method since it couldnt be tested.
>>>
>>>  I seems to be useful (and not so difficult) to use custom compression
>> methods also for WAL compression: replace direct calls of pglz_compress in
>> xloginsert.c
>
>
> I'm going to object this at point, and I've following arguments for that:
>
> 1) WAL compression is much more critical for durability than datatype
> compression.  Imagine, compression algorithm contains a bug which
> cause decompress method to issue a segfault.  In the case of datatype
> compression, that would cause crash on access to some value which
> causes segfault; but in the rest database will be working giving you
> a chance to localize the issue and investigate that.  In the case of
> WAL compression, recovery would cause a server crash.  That seems
> to be much more serious disaster.  You wouldn't be able to make
> your database up and running and the same happens on the standby.
>
> Well, I do not think that somebody will try to implement its own
> compression algorithm...
>

But that's the main goal of this patch: let somebody implement their own
compression algorithm which best fits a particular dataset.


> From my point of view the main value of this patch is that it allows
> replacing the pglz algorithm with a more efficient one, for example zstd.
> On some data sets zstd provides a more than 10 times better compression
> ratio and at the same time is faster than pglz.
>

Not exactly.  If we want to replace pglz with a more efficient algorithm,
then we should just replace pglz with a better algorithm.  Pluggable
compression methods are definitely not worth it just for replacing pglz
with zstd.


> I do not think that risk of data corruption caused by WAL compression with
> some alternative compression algorithm (zlib, zstd,...) is higher than in
> case of using builtin Postgres compression.
>

If speaking about zlib or zstd, then yes, the risk of corruption is very low.
But again,
switching to zlib or zstd doesn't justify this patch.

> 2) Idea of custom compression method is that some columns may
> have specific data distribution, which could be handled better with
> particular compression method and particular parameters.  In the
> WAL compression you're dealing with the whole WAL stream containing
> all the values from database cluster.  Moreover, if custom compression
> method are defined for columns, then in WAL stream you've values
> already compressed in the most efficient way.  However, it might
> appear that some compression method is better for WAL in general
> case (there are benchmarks showing our pglz is not very good in
> comparison to the alternatives).  But in this case I would prefer to just
> switch our WAL to different compression method one day.  Thankfully
> we don't preserve WAL compatibility between major releases.
>
>
> Frankly speaking I do not believe that somebody will use custom
> compression in this way: implement its own compression methods for the
> specific data type.
> May be just for json/jsonb, but also only in the case when custom
> compression API allows to separately store compression dictionary (which as
> far as I understand is not currently supported).
>
> When I worked for SciDB (database for scientists which has to deal mostly
> with multidimensional arrays of data) our first intention was to implement
> custom compression methods for the particular data types and data
> distributions. For example, there are very fast, simple and efficient
> algorithms for encoding sequence of monotonically increased integers, 
> But after several experiments we rejected this idea and switch to using
> generic compression methods. Mostly because we do not want compressor to
> know much about page layout, data type representation,... In Postgres, from
> my point of view,  we have similar situation. Assume that we have column of
> serial type. So it is good candidate of compression, isn't it?
>

No, it's not.  Exactly because compressor shouldn't deal with page layout
etc.
But it's absolutely OK for datatype compressor to deal with particular type
representation.


> But this approach deals only with particular attribute values. It can not
> take any advantages from the fact that this particular column is
> monotonically increased. It can be done only with page level compression,
> but it is a different story.
>

Yes, compression of data series spread across multiple rows is a different
story.


> So current approach 

Re: Build fails with different versions of clang and LLVM

2018-04-23 Thread Andres Freund
Hi,

On 2018-04-23 04:33:04 -0400, Heikki Linnakangas wrote:
> So, I have LLVM 3.9 installed, but no clang 3.9. Only clang 3.8.

Any specific reason, or just happenstance?


> Googling around, the LLVM bitcode format is supposed to be compatible across
> versions, but I'm not sure I trust that. Perhaps we should check that the
> LLVM and clang versions match? Instead of searching for any 'clang' program
> with PGAC_PATH_PROGS, perhaps we should always use "`llvm-config-3.9
> --bindir`/clang", throwing an error if it doesn't exist.

That'll commonly not exist I think.  I kinda think we should just wait a
bit till we've collected more experience..


> BTW, I'm surprised that llvm-lto is invoked in the "make install" stage. I
> would expect it to be done by plain "make" already, and "make install" would
> just copy the files to the right places.

If you can come up with decent make foo, I'm open to changing that, but
I couldn't come up with something neatly encapsulated...  It's just an
index over the individual files, so it didn't seem that crazy.

Greetings,

Andres Freund



Re: Oddity in tuple routing for foreign partitions

2018-04-23 Thread Alvaro Herrera
Robert, I think this is your turf, per 3d956d9562aa.  Are you looking
into it?

Thanks,

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: minor fix for acquire_inherited_sample_rows

2018-04-23 Thread Alvaro Herrera
Hello Amit

Amit Langote wrote:

> acquire_inherited_sample_rows() currently uses equalTupleDescs() being
> false as the condition for going to tupconv.c to determine whether tuple
> conversion is needed.  But equalTupleDescs() will always return false if
> it's passed TupleDesc's of two different tables, which is the most common
> case here.  So I first thought we should just unconditionally go to
> tupconv.c, but there is still one case where we don't need to, which is
> the case where the child table is same as the parent table.  However, it
> would be much cheaper to just check if the relation OIDs are different
> instead of calling equalTupleDescs, which the attached patch teaches it to do.

When (not if) we get around to updating equalTupleDescs to cope, we will
need this call right where it is (and we'd have a hard time finding this
potential callsite later, I think).  If this were a hot spot I would be
happy to change it, but it's not so I'd rather leave it alone.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: minor fix for CloneRowTriggersToPartition

2018-04-23 Thread Alvaro Herrera
Amit Langote wrote:
> Hi.
> 
> I think we should apply the attached patch so that a CreateTriggerStmt
> that CloneRowTriggersToPartition creates for a partition doesn't contain
> pointers that point to the information in the parent table's relcache,
> which may go stale before the pointers in question are used.

Thanks for catching that!  Pushed.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: bms_prev_member won't work correctly if bitmapword is 64-bits

2018-04-23 Thread Teodor Sigaev

Thank you, pushed

David Rowley wrote:

bms_prev_member mistakenly has a hardcoded 24 in the function. This
should really be BITS_PER_BITMAPWORD - 8 so that it is properly set to
56 if someone compiles with 64-bit bitmapwords.

The attached fixes this and also adds a test to exercise the function
a bit. [1] indicates there's currently no coverage of this function at
all.

[1] https://coverage.postgresql.org/src/backend/nodes/bitmapset.c.gcov.html
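
(For illustration, a simplified sketch of the byte-at-a-time scan that
constant feeds, not the actual bitmapset.c code: to start at the most
significant byte of a word, the initial shift has to be the word width
minus 8, i.e. 24 for 32-bit bitmapwords and 56 for 64-bit ones, which is
why a hardcoded 24 breaks the 64-bit case.)

/*
 * Simplified illustration of finding the highest set bit by scanning the
 * word one byte at a time from the top; the real code uses a lookup table
 * instead of the inner loop.
 */
#include <stdio.h>
#include <stdint.h>

#define BITS_PER_BITMAPWORD 64
typedef uint64_t bitmapword;

static int
highest_set_bit(bitmapword w)
{
	for (int shift = BITS_PER_BITMAPWORD - 8; shift >= 0; shift -= 8)
	{
		unsigned int byte = (w >> shift) & 0xFF;

		if (byte != 0)
		{
			int			bit = 7;

			while ((byte & (1u << bit)) == 0)
				bit--;
			return shift + bit;
		}
	}
	return -1;					/* no bits set */
}

int
main(void)
{
	printf("%d\n", highest_set_bit((bitmapword) 1 << 40));		/* prints 40 */
	return 0;
}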



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: wal_consistency_checking reports an inconsistency on master branch

2018-04-23 Thread Andres Freund
On 2018-04-23 13:22:21 +0300, Heikki Linnakangas wrote:
> On 13/04/18 13:08, Michael Paquier wrote:
> > On Fri, Apr 13, 2018 at 02:15:35PM +0530, amul sul wrote:
> > > I have looked into this and found that the issue is in heap_xlog_delete 
> > > -- we
> > > have missed to set the correct offset number from the target_tid when
> > > XLH_DELETE_IS_PARTITION_MOVE flag is set.
> > 
> > Oh, this looks good to me.  So when a row was moved across partitions
> > this could have caused incorrect tuple references on a standby, which
> > could have caused corruptions.
> 
> Hmm. So, the problem was that HeapTupleHeaderSetMovedPartitions() only sets
> the block number to InvalidBlockNumber, and leaves the offset number
> unchanged. WAL replay didn't preserve the offset number, so the master and
> the standby had a different offset number in the ctid.

Right.

> Why does HeapTupleHeaderSetMovedPartitions() leave the offset number
> unchanged? The old offset number is meaningless without the block number.
> Also, bits and magic values in the tuple header are scarce. We're
> squandering a whole range of values in the ctid, everything with
> ip_blkid==InvalidBlockNumber, to mean "moved to different partition", when a
> single value would suffice.

Yes, I agree on that.


> I kept using InvalidBlockNumber there, so ItemPointerIsValid() still
> considers those item pointers as invalid. But my gut feeling is actually
> that it would be better to use e.g. 0 as the block number, so that these
> item pointers would appear valid. Again, to follow the precedent of
> speculative insertion tokens. But I'm not sure if there was some
> well-thought-out reason to make them appear invalid. A comment on that would
> be nice, at least.

That seems risky to me. We want something that stops EPQ style chasing
without running into asserts for invalid offsets...

Greetings,

Andres Freund



Re: JIT flag definitions not parenthesized

2018-04-23 Thread Andres Freund
Hi,

On 2018-04-23 04:49:17 -0400, Heikki Linnakangas wrote:
> On 22/04/18 23:21, David Rowley wrote:
> > Patch attached.
> 
> Fixed, thanks!

Thanks David, Heikki.

- Andres



Re: [HACKERS] port of INSTALL file generation to XSLT

2018-04-23 Thread Peter Eisentraut
On 2/27/17 10:55, Peter Eisentraut wrote:
> It appears we need pandoc 1.13 to get the good output.  This won't be
> available until Debian stretch.

I understand that borka is updated to stretch now.  So we could give
this another try.  Updated patch attached.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From dfed8d84d1738ecca6721e97b13e86cf481aaf6c Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 23 Apr 2018 10:40:17 -0400
Subject: [PATCH] Create INSTALL file using Pandoc

Replace using lynx with using pandoc.  Pandoc creates better looking
output and it avoids the delicate locale/encoding issues of lynx because
it always uses UTF-8 for both input and output.

Note: requires Pandoc >=1.13
---
 doc/src/sgml/Makefile | 14 +++---
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/doc/src/sgml/Makefile b/doc/src/sgml/Makefile
index 74aac01c39..93b0b60cdb 100644
--- a/doc/src/sgml/Makefile
+++ b/doc/src/sgml/Makefile
@@ -100,18 +100,10 @@ errcodes-table.sgml: 
$(top_srcdir)/src/backend/utils/errcodes.txt generate-errco
 ##
 
 ICONV = iconv
-LYNX = lynx
-
-# The documentation may contain non-ASCII characters (mostly for
-# contributor names), which lynx converts to the encoding determined
-# by the current locale.  To get text output that is deterministic and
-# easily readable by everyone, we make lynx produce LATIN1 and then
-# convert that to ASCII with transliteration for the non-ASCII characters.
-# Official releases were historically built on FreeBSD, which has limited
-# locale support and is very picky about locale name spelling.  The
-# below has been finely tuned to run on FreeBSD and Linux/glibc.
+PANDOC = pandoc
+
 INSTALL: % : %.html
-   $(PERL) -p -e 's, $@
+   $(PANDOC) $< -t plain | $(ICONV) -f utf8 -t us-ascii//TRANSLIT > $@
 
 INSTALL.html: %.html : stylesheet-text.xsl %.xml
$(XMLLINT) --noout --valid $*.xml

base-commit: 56811e57323faa453947eb82f007e323a952e1a1
-- 
2.17.0



Re: Is a modern build system acceptable for older platforms

2018-04-23 Thread Robert Haas
On Thu, Apr 19, 2018 at 10:16 AM, Tom Lane  wrote:
> The other side of that argument is that allowing a build system we haven't
> even adopted yet to dictate which platforms we can support is definitely
> letting the tail wag the dog.
>
> My gut reaction to Catalin's list is that requiring C++11 is a pretty
> darn high bar to clear for older platforms.  I have a positive impression
> of python's portability, so requiring a recent python version might not
> be too awful ... but then requiring ninja pretty much tosses away the
> advantage again.  So, while in principle you could probably get these
> toolchains going on an old platform, the reality is that moving to either
> will amount to "we're desupporting everything that wasn't released in
> this decade".  That's a pretty big shift from the project's traditional
> mindset.  It's possible that our users wouldn't care; I don't know.
> But to me it's a significant minus that we'd have to set against whatever
> pluses are claimed for a move.

Yeah, I agree.  I am not deathly opposed to moving, but I'd like to be
convinced that we're going to get real advantages from such a move,
and so far I'm not.  The arguments thus far advanced for moving boil
down to (1) the current system is kind of old and creaky, which is
true but which I'm not sure is by itself a compelling argument for
changing anything, and (2) it might make things easier on Windows,
which could be a sufficiently good reason but I don't think I've seen
anyone explain exactly how much easier it will make things and in what
ways.

I think it's inevitable that a move like this will create some
disruption -- developers will need to install and learn new tools,
buildfarm members will need updating, and there will be some bugs.
None of that is a blocker, but the gains need to outweigh those
disadvantages, and we can't judge whether they do without a clear
explanation of what the gains will be.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: psql leaks memory on query cancellation

2018-04-23 Thread Komяpa
>
> Therefore, I propose the attached patch, which simply sees to it that
> we discard any partial query result at the start of error message
> collection not the end.  This makes the behavior very much better,
> at least on Linux.
>

I have tested a build of Postgres with the attached patch and confirm that I
don't see this problematic behavior anymore, even with the default allocator.
Thank you!

> I think this is a back-patchable bug fix; certainly so at least back
> to 9.6 where \errverbose was added.  Versions before that do not show
> the persistent memory bloat the OP is complaining of, so that what
> we have here is arguably a performance regression.  Comments?
>

This should not bring a regression, since the memory is freed anyway, but it
is valuable as it puts less pressure on client memory requirements for manual
data inspection workflows.

Darafei Praliaskouski,
GIS Engineer / Juno Minsk


Re: minor fix for acquire_inherited_sample_rows

2018-04-23 Thread Amit Langote
Thanks for the review.

On Mon, Apr 23, 2018 at 8:25 PM, Ashutosh Bapat
 wrote:
> On Mon, Apr 23, 2018 at 3:44 PM, Amit Langote
>  wrote:
>> Hi.
>>
>> acquire_inherited_sample_rows() currently uses equalTupleDescs() being
>> false as the condition for going to tupconv.c to determine whether tuple
>> conversion is needed.  But equalTupleDescs() will always return false if
>> it's passed TupleDesc's of two different tables, which is the most common
>> case here.  So I first thought we should just unconditionally go to
>> tupconv.c, but there is still one case where we don't need to, which is
>> the case where the child table is same as the parent table.  However, it
>> would be much cheaper to just check if the relation OIDs are different
>> instead of calling equalTupleDescs, which the attached patch teaches it to 
>> do.
>
> We want to check whether tuple conversion is needed. Theoretically
> (not necessarily practically), one could have tuple descs of two
> different tables stamped with the same tdtypeid. From that POV,
> equalTupleDescs seems to be a stronger check than what you have in the
> patch.

If I'm reading right, that sounds like a +1 to the patch. :)

> BTW the patch you have posted also has a fix you proposed in some
> other thread. Probably you want to get rid of it.

Oops, fixed that in the attached.

Thanks,
Amit


condition-may-need-tuple-conversion.patch
Description: Binary data


Re: Corrupted btree index on HEAD because of covering indexes

2018-04-23 Thread Teodor Sigaev

Thank you, pushed


I see your point. Maybe don't have the newline between the get and
set, though, to match the existing style. And, the note about the
assertion seems unnecessary.

Removed newline and note


"Get/set leaf page highkey's link. During the second phase of
deletion, the target leaf page's high key may point to an ancestor
page (at all other times, the leaf level high key's link is not used).
See the nbtree README for full details."

Changed as you suggested.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/



Re: minor fix for acquire_inherited_sample_rows

2018-04-23 Thread Ashutosh Bapat
On Mon, Apr 23, 2018 at 3:44 PM, Amit Langote
 wrote:
> Hi.
>
> acquire_inherited_sample_rows() currently uses equalTupleDescs() being
> false as the condition for going to tupconv.c to determine whether tuple
> conversion is needed.  But equalTupleDescs() will always return false if
> it's passed TupleDesc's of two different tables, which is the most common
> case here.  So I first thought we should just unconditionally go to
> tupconv.c, but there is still one case where we don't need to, which is
> the case where the child table is same as the parent table.  However, it
> would be much cheaper to just check if the relation OIDs are different
> instead of calling equalTupleDescs, which the attached patch teaches it to do.

We want to check whether tuple conversion is needed. Theoretically
(not necessarily practically), one could have tuple descs of two
different tables stamped with the same tdtypeid. From that POV,
equalTupleDescs seems to be a stronger check than what you have in the
patch.

BTW the patch you have posted also has a fix you proposed in some
other thread. Probably you want to get rid of it.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company



lingering references to V0 calling convention

2018-04-23 Thread John Naylor
I noticed one in the SPI docs and tried to look for more. The attached
patch is an attempt at removing the remaining references. The original
SPI example returned a uint64 as a signed int8 SQL type, so I'm not
sure if I handled that correctly.

However, I didn't touch the documentation of the configure options for
--disable-floatN-byval, since this thread proposed something a bit
more invasive:

https://www.postgresql.org/message-id/flat/10862.1519228208%40sss.pgh.pa.us#10862.1519228...@sss.pgh.pa.us

-John Naylor
diff --git a/doc/src/sgml/plhandler.sgml b/doc/src/sgml/plhandler.sgml
index 363f84b..4f8c4d0 100644
--- a/doc/src/sgml/plhandler.sgml
+++ b/doc/src/sgml/plhandler.sgml
@@ -11,9 +11,8 @@

 All calls to functions that are written in a language other than
 the current version 1 interface for compiled
-languages (this includes functions in user-defined procedural languages,
-functions written in SQL, and functions using the version 0 compiled
-language interface) go through a call handler
+languages (this includes functions in user-defined procedural languages
+and functions written in SQL) go through a call handler
 function for the specific language.  It is the responsibility of
 the call handler to execute the function in a meaningful way, such
 as by interpreting the supplied source text.  This chapter outlines
diff --git a/doc/src/sgml/spi.sgml b/doc/src/sgml/spi.sgml
index 0bac342..1260275 100644
--- a/doc/src/sgml/spi.sgml
+++ b/doc/src/sgml/spi.sgml
@@ -4583,17 +4583,19 @@ INSERT INTO a SELECT * FROM a;
 
 PG_MODULE_MAGIC;
 
-int64 execq(text *sql, int cnt);
+PG_FUNCTION_INFO_V1(execq);
 
-int64
-execq(text *sql, int cnt)
+Datum
+execq(PG_FUNCTION_ARGS)
 {
 char *command;
+int cnt;
 int ret;
 uint64 proc;
 
 /* Convert given text object to a C string */
-command = text_to_cstring(sql);
+command = text_to_cstring(PG_GETARG_TEXT_PP(1));
+cnt = PG_GETARG_INT32(2);
 
 SPI_connect();
 
@@ -4626,17 +4628,11 @@ execq(text *sql, int cnt)
 SPI_finish();
 pfree(command);
 
-return proc;
+PG_RETURN_INT64(proc);
 }
 
 
   
-   (This function uses call convention version 0, to make the example
-   easier to understand.  In real applications you should use the new
-   version 1 interface.)
-  
-
-  
This is how you declare the function after having compiled it into
a shared library (details are in .):
 


Re: wal_consistency_checking reports an inconsistency on master branch

2018-04-23 Thread Heikki Linnakangas

On 13/04/18 13:08, Michael Paquier wrote:

On Fri, Apr 13, 2018 at 02:15:35PM +0530, amul sul wrote:

I have looked into this and found that the issue is in heap_xlog_delete -- we
have missed to set the correct offset number from the target_tid when
XLH_DELETE_IS_PARTITION_MOVE flag is set.


Oh, this looks good to me.  So when a row was moved across partitions
this could have caused incorrect tuple references on a standby, which
could have caused corruptions.


Hmm. So, the problem was that HeapTupleHeaderSetMovedPartitions() only 
sets the block number to InvalidBlockNumber, and leaves the offset 
number unchanged. WAL replay didn't preserve the offset number, so the 
master and the standby had a different offset number in the ctid.


Why does HeapTupleHeaderSetMovedPartitions() leave the offset number 
unchanged? The old offset number is meaningless without the block 
number. Also, bits and magic values in the tuple header are scarce. 
We're squandering a whole range of values in the ctid, everything with 
ip_blkid==InvalidBlockNumber, to mean "moved to different partition", 
when a single value would suffice.


Let's tighten that up. In the attached (untested) patch, I changed the 
macros so that "moved to different partition" is indicated by the magic 
TID (InvalidBlockNumber, 0xfffd). Offset number 0xfffe was already used 
for speculative insertion tokens, so this follows that precedent.


I kept using InvalidBlockNumber there, so ItemPointerIsValid() still 
considers those item pointers as invalid. But my gut feeling is actually 
that it would be better to use e.g. 0 as the block number, so that these 
item pointers would appear valid. Again, to follow the precedent of 
speculative insertion tokens. But I'm not sure if there was some 
well-thought-out reason to make them appear invalid. A comment on that 
would be nice, at least.


(Amit hinted at this in 
https://www.postgresql.org/message-id/CAA4eK1KtsTqsGDggDCrz2O9Jgo7ma-Co-B8%2Bv3L2zWMA2NHm6A%40mail.gmail.com. 
He was OK with the current approach, but I feel pretty strongly that we 
should also set the offset number.)


- Heikki
diff --git a/src/include/access/htup_details.h b/src/include/access/htup_details.h
index cf56d4ace4..1867a70f6f 100644
--- a/src/include/access/htup_details.h
+++ b/src/include/access/htup_details.h
@@ -83,13 +83,15 @@
  *
  * A word about t_ctid: whenever a new tuple is stored on disk, its t_ctid
  * is initialized with its own TID (location).  If the tuple is ever updated,
- * its t_ctid is changed to point to the replacement version of the tuple or
- * the block number (ip_blkid) is invalidated if the tuple is moved from one
- * partition to another partition relation due to an update of the partition
- * key.  Thus, a tuple is the latest version of its row iff XMAX is invalid or
+ * its t_ctid is changed to point to the replacement version of the tuple.  Or
+ * if the tuple is moved from one partition to another, due to an update of
+ * the partition key, t_ctid is set to a special value to indicate that
+ * (see ItemPointerSetMovedPartitions).  Thus, a tuple is the latest version
+ * of its row iff XMAX is invalid or
  * t_ctid points to itself (in which case, if XMAX is valid, the tuple is
  * either locked or deleted).  One can follow the chain of t_ctid links
- * to find the newest version of the row.  Beware however that VACUUM might
+ * to find the newest version of the row, unless it was moved to a different
+ * partition.  Beware however that VACUUM might
  * erase the pointed-to (newer) tuple before erasing the pointing (older)
  * tuple.  Hence, when following a t_ctid link, it is necessary to check
  * to see if the referenced slot is empty or contains an unrelated tuple.
@@ -288,14 +290,6 @@ struct HeapTupleHeaderData
 #define HEAP_TUPLE_HAS_MATCH	HEAP_ONLY_TUPLE /* tuple has a join match */
 
 /*
- * Special value used in t_ctid.ip_posid, to indicate that it holds a
- * speculative insertion token rather than a real TID.  This must be higher
- * than MaxOffsetNumber, so that it can be distinguished from a valid
- * offset number in a regular item pointer.
- */
-#define SpecTokenOffsetNumber		0xfffe
-
-/*
  * HeapTupleHeader accessor macros
  *
  * Note: beware of multiple evaluations of "tup" argument.  But the Set
@@ -447,11 +441,12 @@ do { \
 	ItemPointerSet(&(tup)->t_ctid, token, SpecTokenOffsetNumber) \
 )
 
-#define HeapTupleHeaderSetMovedPartitions(tup) \
-	ItemPointerSetMovedPartitions(&(tup)->t_ctid)
-
 #define HeapTupleHeaderIndicatesMovedPartitions(tup) \
-	ItemPointerIndicatesMovedPartitions(&(tup)->t_ctid)
+	(ItemPointerGetOffsetNumber(&(tup)->t_ctid) == MovedPartitionsOffsetNumber && \
+	 ItemPointerGetBlockNumberNoCheck(&(tup)->t_ctid) == MovedPartitionsBlockNumber)
+
+#define HeapTupleHeaderSetMovedPartitions(tup) \
+	ItemPointerSet(&(tup)->t_ctid, MovedPartitionsBlockNumber, MovedPartitionsOffsetNumber)
 
 #define HeapTupleHeaderGetDatumLength(tup) \
 	VARSIZE(tup)
diff --git 

minor fix for acquire_inherited_sample_rows

2018-04-23 Thread Amit Langote
Hi.

acquire_inherited_sample_rows() currently uses equalTupleDescs() being
false as the condition for going to tupconv.c to determine whether tuple
conversion is needed.  But equalTupleDescs() will always return false if
it's passed TupleDesc's of two different tables, which is the most common
case here.  So I first thought we should just unconditionally go to
tupconv.c, but there is still one case where we don't need to, which is
the case where the child table is same as the parent table.  However, it
would be much cheaper to just check if the relation OIDs are different
instead of calling equalTupleDescs, which the attached patch teaches it to do.

Thanks,
Amit
diff --git a/src/backend/commands/analyze.c b/src/backend/commands/analyze.c
index 25194e871c..262299eaee 100644
--- a/src/backend/commands/analyze.c
+++ b/src/backend/commands/analyze.c
@@ -1489,8 +1489,7 @@ acquire_inherited_sample_rows(Relation onerel, int elevel,
 
 /* We may need to convert from child's rowtype to parent's */
 if (childrows > 0 &&
-    !equalTupleDescs(RelationGetDescr(childrel),
-                     RelationGetDescr(onerel)))
+    RelationGetRelid(childrel) != RelationGetRelid(onerel))
{
TupleConversionMap *map;
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index fe4265d4bb..9b1194d15f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -14560,7 +14560,8 @@ CloneRowTriggersToPartition(Relation parent, Relation partition)
 
 col = TupleDescAttr(parent->rd_att,
                     trigForm->tgattr.values[i] - 1);
-cols = lappend(cols, makeString(NameStr(col->attname)));
+cols = lappend(cols,
+               makeString(pstrdup(NameStr(col->attname))));
 }
 }
 


Re: documentation is now XML

2018-04-23 Thread Liudmila Mantrova

Hi everyone,

Reading this thread, I got an impression that everyone would benefit 
from converting back branches to XML, but the main concern is lack of 
resources to complete this task. Are there any other issues that affect 
this decision? Looks like Aleksander Lakhin's offer to prepare patches 
was missed somehow as the discussion sidetracked to other issues:


I can prepare such patches (scripts to generate them). In fact we 
(Postgres Pro) perform such conversion (SGML->XML) on-fly when 
building docs starting from 9.6. So it's not problem to convert *.sgml 
and replace Makefile and *.xsl. But I would prefer to perform the 
conversion when we finish the move on 11devel (renaming sgml to xml, 
maybe optimizing xsl's...).


Do you think it's worth considering after May releases? I could also 
help with manual testing of the prepared patches if required. Supporting 
documentation and translation in several branches will be much easier if 
the sources are consistent.


--
Liudmila Mantrova
Technical writer at Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



minor fix for CloneRowTriggersToPartition

2018-04-23 Thread Amit Langote
Hi.

I think we should apply the attached patch so that a CreateTriggerStmt
that CloneRowTriggersToPartition creates for a partition doesn't contain
pointers that point to the information in the parent table's relcache,
which may go stale before the pointers in question are used.

Thanks,
Amit
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index fe4265d4bb..9b1194d15f 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -14560,7 +14560,8 @@ CloneRowTriggersToPartition(Relation parent, Relation partition)
 
 col = TupleDescAttr(parent->rd_att,
                     trigForm->tgattr.values[i] - 1);
-cols = lappend(cols, makeString(NameStr(col->attname)));
+cols = lappend(cols,
+               makeString(pstrdup(NameStr(col->attname))));
 }
 }
 


Re: [HACKERS] Custom compression methods

2018-04-23 Thread Konstantin Knizhnik



On 22.04.2018 16:21, Alexander Korotkov wrote:
On Fri, Apr 20, 2018 at 7:45 PM, Konstantin Knizhnik 
> wrote:


On 30.03.2018 19:50, Ildus Kurbangaliev wrote:

On Mon, 26 Mar 2018 20:38:25 +0300
Ildus Kurbangaliev > wrote:

Attached rebased version of the patch. Fixed conflicts in
pg_class.h.

New rebased version due to conflicts in master. Also fixed few
errors
and removed cmdrop method since it couldnt be tested.

 I seems to be useful (and not so difficult) to use custom
compression methods also for WAL compression: replace direct calls
of pglz_compress in xloginsert.c


I'm going to object this at point, and I've following arguments for that:

1) WAL compression is much more critical for durability than datatype
compression.  Imagine, compression algorithm contains a bug which
cause decompress method to issue a segfault.  In the case of datatype
compression, that would cause crash on access to some value which
causes segfault; but in the rest database will be working giving you
a chance to localize the issue and investigate that. In the case of
WAL compression, recovery would cause a server crash. That seems
to be much more serious disaster.  You wouldn't be able to make
your database up and running and the same happens on the standby.

Well, I do not think that somebody will try to implement its own 
compression algorithm...
From my point of view the main value of this patch is that it allows
replacing the pglz algorithm with a more efficient one, for example zstd.
On some data sets zstd provides a more than 10 times better compression
ratio and at the same time is faster than pglz.
I do not think that the risk of data corruption caused by WAL compression
with some alternative compression algorithm (zlib, zstd,...) is higher
than in the case of using the builtin Postgres compression.





2) Idea of custom compression method is that some columns may
have specific data distribution, which could be handled better with
particular compression method and particular parameters.  In the
WAL compression you're dealing with the whole WAL stream containing
all the values from database cluster.  Moreover, if custom compression
method are defined for columns, then in WAL stream you've values
already compressed in the most efficient way.  However, it might
appear that some compression method is better for WAL in general
case (there are benchmarks showing our pglz is not very good in
comparison to the alternatives).  But in this case I would prefer to just
switch our WAL to different compression method one day.  Thankfully
we don't preserve WAL compatibility between major releases.


Frankly speaking I do not believe that somebody will use custom 
compression in this way: implement its own compression methods for the 
specific data type.
May be just for json/jsonb, but also only in the case when custom 
compression API allows to separately store compression dictionary (which 
as far as I understand is not currently supported).


When I worked for SciDB (database for scientists which has to deal 
mostly with multidimensional arrays of data) our first intention was to 
implement custom compression methods for the particular data types and 
data distributions. For example, there are very fast, simple and 
efficient algorithms for encoding sequence of monotonically increased 
integers, 
But after several experiments we rejected this idea and switch to using 
generic compression methods. Mostly because we do not want compressor to 
know much about page layout, data type representation,... In Postgres, 
from my point of view,  we have similar situation. Assume that we have 
column of serial type. So it is good candidate of compression, isn't it?
But this approach deals only with particular attribute values. It can 
not take any advantages from the fact that this particular column is 
monotonically increased. It can be done only with page level 
compression, but it is a different story.


So the current approach works only for blob-like types: text, json,... But
they usually have quite a complex internal structure, and for them
universal compression algorithms tend to be more efficient than any
hand-written specific implementation. Also, algorithms like zstd are
able to efficiently recognize and compress many common data
distributions, like monotonic sequences, duplicates, repeated series,...




3) This patch provides custom compression methods recorded in
the catalog.  During recovery you don't have access to the system
catalog, because it's not recovered yet, and can't fetch compression
method metadata from there.  The possible thing is to have GUC,
which stores shared module and function names for WAL compression.
But that seems like quite different mechanism from the one present
in this patch.

I do not think 

Re: [HACKERS] Custom compression methods

2018-04-23 Thread Ildus Kurbangaliev
On Sun, 22 Apr 2018 16:21:31 +0300
Alexander Korotkov  wrote:

> On Fri, Apr 20, 2018 at 7:45 PM, Konstantin Knizhnik <
> k.knizh...@postgrespro.ru> wrote:  
> 
> > On 30.03.2018 19:50, Ildus Kurbangaliev wrote:
> >  
> >> On Mon, 26 Mar 2018 20:38:25 +0300
> >> Ildus Kurbangaliev  wrote:
> >>
> >> Attached rebased version of the patch. Fixed conflicts in
> >> pg_class.h.  
> >>>
> >>> New rebased version due to conflicts in master. Also fixed few
> >>> errors  
> >> and removed cmdrop method since it couldnt be tested.
> >>
> >>  I seems to be useful (and not so difficult) to use custom
> >> compression  
> > methods also for WAL compression: replace direct calls of
> > pglz_compress in xloginsert.c  
> 
> 
> I'm going to object this at point, and I've following arguments for
> that:
> 
> 1) WAL compression is much more critical for durability than datatype
> compression.  Imagine, compression algorithm contains a bug which
> cause decompress method to issue a segfault.  In the case of datatype
> compression, that would cause crash on access to some value which
> causes segfault; but in the rest database will be working giving you
> a chance to localize the issue and investigate that.  In the case of
> WAL compression, recovery would cause a server crash.  That seems
> to be much more serious disaster.  You wouldn't be able to make
> your database up and running and the same happens on the standby.
> 
> 2) Idea of custom compression method is that some columns may
> have specific data distribution, which could be handled better with
> particular compression method and particular parameters.  In the
> WAL compression you're dealing with the whole WAL stream containing
> all the values from database cluster.  Moreover, if custom compression
> method are defined for columns, then in WAL stream you've values
> already compressed in the most efficient way.  However, it might
> appear that some compression method is better for WAL in general
> case (there are benchmarks showing our pglz is not very good in
> comparison to the alternatives).  But in this case I would prefer to
> just switch our WAL to different compression method one day.
> Thankfully we don't preserve WAL compatibility between major releases.
> 
> 3) This patch provides custom compression methods recorded in
> the catalog.  During recovery you don't have access to the system
> catalog, because it's not recovered yet, and can't fetch compression
> method metadata from there.  The possible thing is to have GUC,
> which stores shared module and function names for WAL compression.
> But that seems like quite different mechanism from the one present
> in this patch.
> 
> Taking into account all of above, I think we would give up with custom
> WAL compression method.  Or, at least, consider it unrelated to this
> patch.

I agree with these points. I also think this should be done in another
patch. It's not so hard to implement, but it would make sense if there were
a few more builtin compression methods suitable for WAL compression.
Some static array could contain function pointers for direct calls.
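
A hedged sketch of that last idea, with invented names and signatures (not
taken from the patch): an id stored in the WAL record picks an entry from a
fixed table of builtin compressors, so recovery never needs a catalog
lookup.

/*
 * Invented sketch: builtin WAL compression methods selected through a
 * static array of function pointers, indexed by an id carried in the WAL
 * record itself.
 */
#include <string.h>
#include <stdint.h>

typedef int32_t (*wal_compress_fn) (const char *src, int32_t srclen,
									char *dst, int32_t dstcap);

/* placeholder standing in for real pglz/lz4/zstd implementations */
static int32_t
copy_compress(const char *src, int32_t srclen, char *dst, int32_t dstcap)
{
	if (srclen > dstcap)
		return -1;				/* does not fit, store uncompressed */
	memcpy(dst, src, srclen);
	return srclen;
}

typedef enum WalCompressionId
{
	WAL_COMPRESSION_PGLZ = 0,
	WAL_COMPRESSION_LZ4,
	WAL_COMPRESSION_ZSTD
} WalCompressionId;

/* all builtin, so no catalog access is ever needed during recovery */
static const wal_compress_fn wal_compressors[] = {
	[WAL_COMPRESSION_PGLZ] = copy_compress,
	[WAL_COMPRESSION_LZ4] = copy_compress,
	[WAL_COMPRESSION_ZSTD] = copy_compress
};

int
main(void)
{
	char		dst[16];
	int32_t		len;

	len = wal_compressors[WAL_COMPRESSION_LZ4] ("abc", 3, dst, sizeof(dst));
	return (len == 3) ? 0 : 1;
}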

-- 
---
Ildus Kurbangaliev
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company



Re: BGWORKER_BYPASS_ALLOWCONN used nowhere (infra part of on-line checksum switcher)

2018-04-23 Thread Michael Paquier
On Mon, Apr 23, 2018 at 10:31:43AM +0200, Magnus Hagander wrote:
> Yeah, that makes sense. Will add as well.

Thanks for the addition!
--
Michael


signature.asc
Description: PGP signature


Re: JIT flag definitions not parenthesized

2018-04-23 Thread Heikki Linnakangas

On 22/04/18 23:21, David Rowley wrote:

Looks like the JIT flags definitions are not properly parenthesized.
It might not matter too much for their current usage, but it might
save a bit of confusion later.

git grep "^#define .*<<.*[^)]$" indicates these are the only offenders.

Patch attached.


Fixed, thanks!

- Heikki



Re: Build fails with different versions of clang and LLVM

2018-04-23 Thread Pierre Ducroquet
On Monday, April 23, 2018 10:33:04 AM CEST Heikki Linnakangas wrote:
> Hi!
> 
> I tried compiling with --with-llvm on my laptop, and got this:
> > $ make -s
> > All of PostgreSQL successfully made. Ready to install.
> > $ make -s install
> > Invalid Summary Block: version expected
> > error: can't create ModuleSummaryIndexObjectFile for buffer: Corrupted
> > bitcode #0 0x7fbe98032085
> > llvm::sys::PrintStackTrace(llvm::raw_ostream&)
> > (/usr/lib/llvm-3.9/bin/../lib/libLLVM-3.9.so.1+0x707085) #1
> > 0x7fbe9803023e llvm::sys::RunSignalHandlers()
> > (/usr/lib/llvm-3.9/bin/../lib/libLLVM-3.9.so.1+0x70523e) #2
> > 0x7fbe98030362
> > (/usr/lib/llvm-3.9/bin/../lib/libLLVM-3.9.so.1+0x705362) #3
> > 0x7fbe9771f160 __restore_rt
> > (/lib/x86_64-linux-gnu/libpthread.so.0+0x12160) #4 0x7fbe985cfba4
> > (/usr/lib/llvm-3.9/bin/../lib/libLLVM-3.9.so.1+0xca4ba4) #5
> > 0x7fbe985e3318 llvm::WriteIndexToFile(llvm::ModuleSummaryIndex
> > const&, llvm::raw_ostream&, std::map<...> const*)
> > (/usr/lib/llvm-3.9/bin/../lib/libLLVM-3.9.so.1+0xcb8318) #6
> > 0x55d1120f4ac7 (/usr/lib/llvm-3.9/bin/llvm-lto+0x19ac7)
> > #7 0x55d1120f5f95 (/usr/lib/llvm-3.9/bin/llvm-lto+0x1af95)
> > #8 0x55d1120ea0f0 _init (/usr/lib/llvm-3.9/bin/llvm-lto+0xf0f0)
> > #9 0x7fbe96a96f2a __libc_start_main
> > (/lib/x86_64-linux-gnu/libc.so.6+0x20f2a) #10 0x55d1120ebe8a _init
> > (/usr/lib/llvm-3.9/bin/llvm-lto+0x10e8a) Stack dump:
> > 0.  Program arguments: /usr/lib/llvm-3.9/bin/llvm-lto -thinlto
> > -thinlto-action=thinlink -o postgres.index.bc
> > postgres/access/brin/brin.bc postgres/access/brin/brin_pageops.bc [LOT OF
> > FILES REMOVED TO KEEP THIS SHORT] postgres/utils/fmgrtab.bc
> > postgres/jit/jit.bc Segmentation fault
> > Makefile:246: recipe for target 'install-postgres-bitcode' failed
> > make[2]: *** [install-postgres-bitcode] Error 139
> > Makefile:42: recipe for target 'install-backend-recurse' failed
> > make[1]: *** [install-backend-recurse] Error 2
> > GNUmakefile:11: recipe for target 'install-src-recurse' failed
> > make: *** [install-src-recurse] Error 2
> 
> I think this is because I have a funny combination of clang and LLVM:
> > $ clang --version
> > clang version 3.8.1-24 (tags/RELEASE_381/final)
> > Target: x86_64-pc-linux-gnu
> > Thread model: posix
> > InstalledDir: /usr/bin
> > $ llvm-config --version
> > bash: llvm-config: command not found
> > $ llvm-config-3.9 --version
> > 3.9.1
> 
> So, I have LLVM 3.9 installed, but no clang 3.9. Only clang 3.8.
> 
> The autoconf macro in llvm.m4 says:
> >   # Could check clang version, but it doesn't seem that
> >   # important. Systems with a new enough LLVM version are usually
> >   # going to have a decent clang version too. It's also not entirely
> >   # clear what the minimum version is.
> 
> In light of the above error, I think we should do more. Apparently LLVM
> 3.9 cannot read bitcode files generated by clang 3.8, so we should at
> least check that the clang version is 3.9 or above.
> 
> Googling around, the LLVM bitcode format is supposed to be compatible
> across versions, but I'm not sure I trust that. Perhaps we should check
> that the LLVM and clang versions match? Instead of searching for any
> 'clang' program with PGAC_PATH_PROGS, perhaps we should always use
> "`lvm-config-3.9 --bindir`/clang", throwing an error if it doesn't exist.
> 
> BTW, I'm surprised that llvm-lto is invoked in the "make install" stage.
> I would expect it to be done by plain "make" already, and "make install"
> would just copy the files to the right places.

Hi

The bitcode format does not respect strict compatibility rules. LLVM and clang 
versions must match if we don't want any surprises.
See for instance
https://stackoverflow.com/questions/15836430/how-stable-is-the-llvm-assembly-language

In your case this should have worked, but I am not surprised at all that it 
failed; I had similar issues when testing 3.9/4.0 on the same system. I 
thought the build system already checked for version equality.


 Pierre





Build fails with different versions of clang and LLVM

2018-04-23 Thread Heikki Linnakangas

Hi!

I tried compiling with --with-llvm on my laptop, and got this:

$ make -s 
All of PostgreSQL successfully made. Ready to install.

$ make -s install
Invalid Summary Block: version expected
error: can't create ModuleSummaryIndexObjectFile for buffer: Corrupted bitcode
#0 0x7fbe98032085 llvm::sys::PrintStackTrace(llvm::raw_ostream&) 
(/usr/lib/llvm-3.9/bin/../lib/libLLVM-3.9.so.1+0x707085)
#1 0x7fbe9803023e llvm::sys::RunSignalHandlers() 
(/usr/lib/llvm-3.9/bin/../lib/libLLVM-3.9.so.1+0x70523e)
#2 0x7fbe98030362 (/usr/lib/llvm-3.9/bin/../lib/libLLVM-3.9.so.1+0x705362)
#3 0x7fbe9771f160 __restore_rt 
(/lib/x86_64-linux-gnu/libpthread.so.0+0x12160)
#4 0x7fbe985cfba4 (/usr/lib/llvm-3.9/bin/../lib/libLLVM-3.9.so.1+0xca4ba4)
#5 0x7fbe985e3318 llvm::WriteIndexToFile(llvm::ModuleSummaryIndex const&, 
llvm::raw_ostream&, std::map<...> const*) 
(/usr/lib/llvm-3.9/bin/../lib/libLLVM-3.9.so.1+0xcb8318)
#6 0x55d1120f4ac7 (/usr/lib/llvm-3.9/bin/llvm-lto+0x19ac7)
#7 0x55d1120f5f95 (/usr/lib/llvm-3.9/bin/llvm-lto+0x1af95)
#8 0x55d1120ea0f0 _init (/usr/lib/llvm-3.9/bin/llvm-lto+0xf0f0)
#9 0x7fbe96a96f2a __libc_start_main 
(/lib/x86_64-linux-gnu/libc.so.6+0x20f2a)
#10 0x55d1120ebe8a _init (/usr/lib/llvm-3.9/bin/llvm-lto+0x10e8a)
Stack dump:
0.	Program arguments: /usr/lib/llvm-3.9/bin/llvm-lto -thinlto -thinlto-action=thinlink -o postgres.index.bc postgres/access/brin/brin.bc postgres/access/brin/brin_pageops.bc [LOT OF FILES REMOVED TO KEEP THIS SHORT] postgres/utils/fmgrtab.bc postgres/jit/jit.bc 
Segmentation fault

Makefile:246: recipe for target 'install-postgres-bitcode' failed
make[2]: *** [install-postgres-bitcode] Error 139
Makefile:42: recipe for target 'install-backend-recurse' failed
make[1]: *** [install-backend-recurse] Error 2
GNUmakefile:11: recipe for target 'install-src-recurse' failed
make: *** [install-src-recurse] Error 2


I think this is because I have a funny combination of clang and LLVM:


$ clang --version
clang version 3.8.1-24 (tags/RELEASE_381/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
$ llvm-config --version
bash: llvm-config: command not found
$ llvm-config-3.9 --version
3.9.1


So, I have LLVM 3.9 installed, but no clang 3.9. Only clang 3.8.

The autoconf macro in llvm.m4 says:


  # Could check clang version, but it doesn't seem that
  # important. Systems with a new enough LLVM version are usually
  # going to have a decent clang version too. It's also not entirely
  # clear what the minimum version is.


In light of the above error, I think we should do more. Apparently LLVM 
3.9 cannot read bitcode files generated by clang 3.8, so we should at 
least check that the clang version is 3.9 or above.


Googling around, the LLVM bitcode format is supposed to be compatible 
across versions, but I'm not sure I trust that. Perhaps we should check 
that the LLVM and clang versions match? Instead of searching for any 
'clang' program with PGAC_PATH_PROGS, perhaps we should always use 
"`lvm-config-3.9 --bindir`/clang", throwing an error if it doesn't exist.


BTW, I'm surprised that llvm-lto is invoked in the "make install" stage. 
I would expect it to be done by plain "make" already, and "make install" 
would just copy the files to the right places.


- Heikki



Re: BGWORKER_BYPASS_ALLOWCONN used nowhere (infra part of on-line checksum switcher)

2018-04-23 Thread Magnus Hagander
On Mon, Apr 23, 2018 at 9:25 AM, Michael Paquier wrote:

> On Sun, Apr 22, 2018 at 03:56:32PM +0200, Daniel Gustafsson wrote:
> > +1, I think this is worth keeping in for 11.
>
> Okay, that's fine for me.  Thanks for updating the documentation.
> Perhaps it would be nicer for plugin developers to add a comment in
> bgworker.h as well about what BGWORKER_BYPASS_ALLOWCONN allows one to
> achieve?
>

Yeah, that makes sense. Will add as well.
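
Something along these lines is probably what's wanted; this is only a rough
sketch, and both the wording and the numeric value are illustrative rather
than quoted from the tree:

/*
 * Flags for BackgroundWorkerInitializeConnection() and
 * BackgroundWorkerInitializeConnectionByOid().
 *
 * BGWORKER_BYPASS_ALLOWCONN allows a background worker to connect to a
 * database even when its pg_database.datallowconn flag is false.  This
 * is intended for maintenance-style workers that must be able to reach
 * every database, such as an online checksum switcher.
 */
#define BGWORKER_BYPASS_ALLOWCONN	0x0001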


-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Toast issues with OldestXmin going backwards

2018-04-23 Thread Andrew Gierth
> "Amit" == Amit Kapila  writes:

 >> Your patch would actually be needed if (and only if) autovacuum was
 >> changed back to its old behavior of never vacuuming toast tables
 >> independently, and if manual VACUUM pg_toast.*; was disabled. But in
 >> the presence of either of those two possibilities, it does nothing
 >> useful.

 Amit> Yeah, right, I missed the point that they can be vacuumed
 Amit> separately; however, I think that decision is somewhat
 Amit> questionable.

Some previous discussion links for reference, for the background to the
thread containing the patch:

https://www.postgresql.org/message-id/flat/87y7gpiqx3.fsf%40oxford.xeocode.com
https://www.postgresql.org/message-id/flat/20080608230348.GD11028%40alvh.no-ip.org

 Amit> I think it would have been better if, along with decoupling
 Amit> vacuum of the main heap and toast tables, we had come up with
 Amit> a way to selectively remove the corresponding rows from the main
 Amit> heap, say by vacuuming only heap pages/rows which have toast
 Amit> pointers; but maybe that is not viable, or involves much more
 Amit> work without an equivalent benefit.

It should be fairly obvious why this is unworkable - most toast-using
tables will have toast pointers on every page, but without making a
whole new index of toast pointer OIDs (unacceptable overhead), it would
be impossible to find the toast pointers pointing to a specific item
without searching the whole rel (in which case we might just as well
have vacuumed it).

 Amit> Also, we can think along the lines of another idea suggested by
 Amit> Andres [2] on the thread mentioned by you.

That one is tricky for various reasons (locking, order of operations in
vacuum_rel, having to mess with the API of vacuum(), etc.)

-- 
Andrew (irc:RhodiumToad)



Re: [HACKERS] [BUGS] Bug in Physical Replication Slots (at least 9.5)?

2018-04-23 Thread Heikki Linnakangas

On 18/01/18 20:54, Kyotaro HORIGUCHI wrote:

At Thu, 18 Jan 2018 11:52:52 -0800, Andres Freund  wrote in 
<20180118195252.hyxgkb3krcqjz...@alap3.anarazel.de>

On 2018-01-18 20:58:27 +0900, Michael Paquier wrote:

b) The second patch that I would like to mention is doing things on the
standby side by being able to change a WAL source when fetching a single
record so as it is possible to fetch the beginning of a cross-segment
record from say an archive or the local xlog, and then request the rest
on the primary. This patch can be found in
https://www.postgresql.org/message-id/20171026.190551.208996945.horiguchi.kyotaro%40lab.ntt.co.jp
and the diff in WaitForWALToBecomeAvailable() scares me a lot because,
it seems to me that this actually breaks timeline switch
consistency. The concept of switching a WAL source in the middle of a
WAL segment is risky anyway, because we perform sanity checks while
switching sources. The bug we are dealing with here is very rare, and
seeing a problem like that for a timeline switch is even more rare, but
the risk is not zero and we may finish with a corrupted instance, so we
should not, in my opinion, take this approach. Also, this actually achieves
point 2) above (though not completely), but not 1).


I don't agree with the sentiment about the approach, but I agree there
might be weaknesses in the implementation (which I have not reviewed). I
think it's perfectly sensible to require fetching one segment from
pg_xlog and the next one via another method. Currently the inability to
do so often leads to us constantly refetching the same segment.


Though I'm still not fully confident, if reading a set of
continuation records from two (or more) sources can lead to
inconsistency, then two successive (complete) records face the
same risk.


This seems like the best approach to me as well.

Looking at the patch linked above 
(https://www.postgresql.org/message-id/20171026.190551.208996945.horiguchi.kyotaro%40lab.ntt.co.jp):



--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11693,6 +11693,10 @@ retry:
 	Assert(reqLen <= readLen);
 
 	*readTLI = curFileTLI;
+
+	if (!XLogReaderValidatePageHeader(xlogreader, targetPagePtr, readBuf))
+		goto next_record_is_invalid;
+
 	return readLen;
 
 next_record_is_invalid:


What is the point of adding this XLogReaderValidatePageHeader() call? 
The caller of this callback function, ReadPageInternal(), checks the 
page header anyway. Earlier in this thread, you said:



What remains to do is to let XLogPageRead retry other sources
immediately. To do this I made ValidXLogPageHeader in xlogreader.c
public (and renamed it to XLogReaderValidatePageHeader).


but I don't understand that.

- Heikki



Re: BGWORKER_BYPASS_ALLOWCONN used nowhere (infra part of on-line checksum switcher)

2018-04-23 Thread Michael Paquier
On Sun, Apr 22, 2018 at 03:56:32PM +0200, Daniel Gustafsson wrote:
> +1, I think this is worth keeping in for 11.

Okay, that's fine for me.  Thanks for updating the documentation.
Perhaps it would be nicer for plugin developers to add a comment in
bgworker.h as well about what BGWORKER_BYPASS_ALLOWCONN allows one to
achieve?
--
Michael

