RE: Timeout parameters

2019-03-24 Thread Nagaura, Ryohei
Hi,

First, thank you for your insightful discussion.
I remade the patches and attached them to this mail.

> From: Tsunakawa, Takayuki
> OTOH, it may be better to commit the tcp_user_timeout patch when
> Nagaura-san has refined the documentation, and then continue
> socket_timeout.
Yes, I want to commit the TCP_USER_TIMEOUT patches in PG12.
Also, I'd like to continue discussing socket_timeout after this CF.
# Thank you for your answer.


About TCP_USER_TIMEOUT:
1) Documentation and checking on UNIX system
As far as I surveyed (Solaris and the BSD family), these UNIX OSes don't have
a "TCP_USER_TIMEOUT" parameter.
Accordingly, I have not checked on a real machine.
Also, I modified the documentation to remove "equivalent to socket option".


As for socket_timeout:
1) documentation
> From: Robert Haas 
> This can't be used to force a global query timeout, because any kind of 
> protocol
> message - e.g. a NOTICE or WARNING - would reset the timeout, allowing the
> query to continue past the supposedly-global query timeout.  There may be some other
> ways that can happen, too, either currently or in the future.
Understood. Indeed, the doc was not correct.

Mikalai-san,
You previously advised me about connections via UNIX domain sockets, but it
turned out to work there as well.
Therefore, the documentation no longer has a sentence about it.

Mikalai-san and Tsunakawa-san,
I modified the documentation based on your comments about socket_timeout
being greater than the other timeout parameters.

2) Checking whether the value is an int type
I implemented this in the current patch. Please see lines 73 and 85 in the patch.

3) Setting inheritance by the "\c" command
Inheritance of optional parameters such as this one
(but not of basic connection parameters such as dbname)
should be implemented as one feature of "\c".
This is because, from the user's point of view,
it is strange that some parameters can be taken over while others can't.


Sorry for my late reply again and again...

Best regards,
-
Ryohei Nagaura



socket_timeout_v9.patch
Description: socket_timeout_v9.patch


TCP_backend_v10.patch.patch
Description: TCP_backend_v10.patch.patch


TCP_interface_v10.patch
Description: TCP_interface_v10.patch


Re: Removing unneeded self joins

2019-03-24 Thread David Rowley
On Sat, 23 Mar 2019 at 04:13, David Rowley  wrote:
>
> On Sat, 23 Mar 2019 at 03:39, Alexander Kuzmenkov
> > This is a bug indeed. Unique index search is not exhaustive, so if many
> > indexes match the join quals, we might not find the same index for both
> > sides. I think this can be overcome if we switch to exhaustive search in
> > relation_has_unique_index_for, and then try to find matching indexes in
> > is_unique_self_join. Please see the new patch for the fix.
>
> I really don't think modifying relation_has_unique_index_for to
> collect details for up to 5 indexes is a good fix for this.  It looks
> like it's still possible to trigger this, just the example would need
> to be more complex. Also, you've likely just more than doubled the
> runtime of a successful match in relation_has_unique_index_for().
> Previously, on average we'd have found that match halfway through the
> list of indexes. Now it's most likely you'll end up looking at every
> index, even after a match has been found. That does not seem well
> aligned to keeping the CPU overhead for the patch low.
>
> What is wrong with just weeding out join quals that don't have the
> equivalent expression on either side before passing them to
> relation_has_unique_index_for()? That'll save you from getting false
> matches like I showed. To make that work it just seems mostly like
> you'd mostly just need to swap the order of operations in the patch,
> but you'd also likely end up needing to rip out all the UniqueRelInfo
> code, since I don't think that'll be needed any longer. Likely that
> means your entire patch would be limited to analyzejoins.c, although
> I'm unsure what of the eclass editing code should be moved into
> equivclass.c.

I've done some minimal modifications to your v12 patch to make it work
the way I described. I ended up splitting out the joinqual list into
two lists; one that contains just self join quals that match on either
side, and the other with the remaining quals.  We just pass the self
join matching quals to innerrel_is_unique() so we no longer can trick
it into matching with the wrong index.
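
The splitting step is roughly of the following shape (a simplified sketch
assuming the usual planner headers, not the patch code; it only handles
simple Vars and omits the equality-operator and varno checks):

static void
split_selfjoin_quals(List *joinquals, List **selfjoinquals,
                     List **otherjoinquals)
{
    ListCell   *lc;

    *selfjoinquals = NIL;
    *otherjoinquals = NIL;

    foreach(lc, joinquals)
    {
        RestrictInfo *rinfo = (RestrictInfo *) lfirst(lc);

        if (IsA(rinfo->clause, OpExpr))
        {
            OpExpr     *opexpr = (OpExpr *) rinfo->clause;

            if (list_length(opexpr->args) == 2 &&
                IsA(linitial(opexpr->args), Var) &&
                IsA(lsecond(opexpr->args), Var) &&
                ((Var *) linitial(opexpr->args))->varattno ==
                ((Var *) lsecond(opexpr->args))->varattno)
            {
                /* t1.x op t2.x -- candidate self-join qual */
                *selfjoinquals = lappend(*selfjoinquals, rinfo);
                continue;
            }
        }
        /* anything else goes in the "other" list */
        *otherjoinquals = lappend(*otherjoinquals, rinfo);
    }
}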

I didn't modify remove_self_join_rel(). It could make use of the split
lists instead of checking what we've already checked in
split_selfjoin_quals(); i.e. the selfjoinquals get an IS NOT NULL test
added to the basequals, and the otherjoinquals have their varnos
changed and are then applied to the basequals too.

There were a couple of regression test failures using your version of
the tests. One test just went back to what it was before you changed
the output and the other seems like a missed optimisation in your
version of the patch.

Namely, you didn't remove the self join in a case like:

explain select * from t1 inner join t1 t2 on t1.a=t2.a where t1.b = 1
and t2.b = 2;

but my version does. You had commented the test with:

-- If index conditions are different for each side, we won't select the same
-- row on both sides, so the join can't be removed.

but I don't quite understand why we can't remove the join in this
situation.  For this particular case no rows can match, so maybe the
plan should really be the same as what happens when you do:

# explain select * from t1 where b = 1 and b = 2;
   QUERY PLAN
-----------------------------------------------------------------------------
 Result  (cost=0.15..8.17 rows=1 width=8)
   One-Time Filter: false
   ->  Index Scan using t1_b_key on t1  (cost=0.15..8.17 rows=1 width=8)
 Index Cond: (b = 1)
(4 rows)

For now, it still produces a plan like:

  QUERY PLAN
------------------------------------------------------------------------
 Index Scan using t1_b_key on t1 t2  (cost=0.15..8.17 rows=1 width=16)
   Index Cond: ((b = 2) AND (b = 1))
   Filter: (a IS NOT NULL)
(3 rows)

My implementation of split_selfjoin_quals() still needs work. I only
wrote code to handle simple Vars.  There could be Unique indexes on
exprs too, which the current code will fail to detect. I don't think
the join quals can contain vars from higher levels, but I didn't go to
the trouble of checking that in detail. If it's possible then we'll
need to reject those.

I also finished off the renaming of remove_useless_left_joins().   I
didn't go into any detail on the actual removal code.

I've attached a complete patch and also a delta against v12.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v14-remove-unique-self-joins.patch
Description: Binary data


v14-remove-unique-self-joins_delta.patch
Description: Binary data


Re: Usage of epoch in txid_current

2019-03-24 Thread Thomas Munro
On Mon, Feb 4, 2019 at 8:41 PM Andres Freund  wrote:
> On 2018-09-19 13:58:36 +1200, Thomas Munro wrote:
>
> > +/*
> > + * Advance nextFullXid to the value after a given xid.  The epoch is inferred.
> > + * If lock_free_check is true, then the caller must be sure that it's safe to
> > + * read nextFullXid without holding XidGenLock (ie during recovery).
> > + */
> > +void
> > +AdvanceNextFullTransactionIdPastXid(TransactionId xid, bool lock_free_check)
> > +{
> > +    TransactionId current_xid;
> > +    uint32 epoch;
> > +
> > +    if (lock_free_check &&
> > +        !TransactionIdFollowsOrEquals(xid,
> > +            XidFromFullTransactionId(ShmemVariableCache->nextFullXid)))
> > +        return;
> > +
> > +    LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
> > +    current_xid = XidFromFullTransactionId(ShmemVariableCache->nextFullXid);
> > +    if (TransactionIdFollowsOrEquals(xid, current_xid))
> > +    {
> > +        epoch = EpochFromFullTransactionId(ShmemVariableCache->nextFullXid);
> > +        if (xid < current_xid)
> > +            ++epoch;    /* epoch wrapped */
> > +        ShmemVariableCache->nextFullXid = MakeFullTransactionId(epoch, xid);
> > +        FullTransactionIdAdvance(&ShmemVariableCache->nextFullXid);
> > +    }
> > +    LWLockRelease(XidGenLock);
> >  }
>
> Is this really a good idea? Seems like it's going to perpetuate the
> problematic epoch logic we had before?

We could get rid of this in future, if certain WAL records and 2PC
state start carrying full xids.  That would be much bigger surgery
than I wanted to do in this patch.  This is the only place that
converts 32 bit -> 64 bit with an inferred epoch component.  I have
explained why I think that's correct in new comments along with a new
assertion.

The theory is that the xids encountered in recovery and 2PC startup
can never be too far out of phase with the current nextFullXid.  In
contrast, the original epoch tracking from commit 35af5422 wasn't
bounded in the same sort of way.  Certainly no other code should be
allowed to do this sort of thing without very careful consideration of
how the epoch is bounded.  The patch deliberately provides no general
purpose make-me-a-FullTransactionId-by-guessing-the-epoch facility.
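
To spell out the inference rule used in that one place, distilled into a
standalone sketch (the helper name and signature are illustrative, not the
patch's actual code):

static inline FullTransactionId
InferFullXidAhead(FullTransactionId next_fxid, TransactionId xid)
{
    uint32      epoch = EpochFromFullTransactionId(next_fxid);

    /* assumes xid is at or circularly ahead of next_fxid's xid part */
    Assert(TransactionIdFollowsOrEquals(xid,
                                        XidFromFullTransactionId(next_fxid)));
    if (xid < XidFromFullTransactionId(next_fxid))
        epoch++;            /* the 32-bit counter wrapped */

    return FullTransactionIdFromEpochAndXid(epoch, xid);
}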

While reviewing this, I also realised that the "lock_free_check"
function argument was unnecessary.  The only place that was setting
that to false, namely RecordKnownAssignedTransactionIds(), might as
well just use the same behaviour.  I've now rewritten
AdvanceNextFullTransactionIdPastXid() completely in light of all that;
please review.

> >   printf(_("Latest checkpoint's full_page_writes: %s\n"),
> >  ControlFile.checkPointCopy.fullPageWrites ? _("on") : 
> > _("off"));
> >   printf(_("Latest checkpoint's NextXID:  %u:%u\n"),
> > -ControlFile.checkPointCopy.nextXidEpoch,
> > -ControlFile.checkPointCopy.nextXid);
> > +
> > EpochFromFullTransactionId(ControlFile.checkPointCopy.nextFullXid),
> > +
> > XidFromFullTransactionId(ControlFile.checkPointCopy.nextFullXid));

> I still think it's a mistake to not display the full xid here, and
> rather perpetuate the epoch stuff. I'm ok with splitting that into a
> separate commit, but this ought to be fixed in the same release imo.

Ok.

> > +/*
> > + * A 64 bit value that contains an epoch and a TransactionId.  This is
> > + * wrapped in a struct to prevent implicit conversion to/from 
> > TransactionId.
> > + * Allowing such conversions seems likely to be error-prone.
> > + */
> > +typedef struct FullTransactionId
> > +{
> > + uint64  value;
> > +} FullTransactionId;
>
> Probably good to note here that not all values are valid normal xids.

Done.

> > +static inline FullTransactionId
> > +MakeFullTransactionId(uint32 epoch, TransactionId xid)
> > +{
> > + FullTransactionId result;
> > +
> > + result.value = ((uint64) epoch) << 32 | xid;
> > +
> > + return result;
> > +}
>
> Make sounds a bit like it's allocating...

Changed to FullTransactionIdFromEpochAndXid().

> > + dest->value++;
> > + if (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)
> > + *dest = 
> > MakeFullTransactionId(EpochFromFullTransactionId(*dest),

> Hm, this seems pretty odd coding to me. Why not do something like
>
> dest->value++;
> while (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)
> dest->value++;
>
> That'd a) be a bit more readable, b) possible to do in a lockfree way at
> a later point.

Done.
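
The resulting helper is roughly of this shape (a sketch of the agreed
coding, not a quote of the committed code):

static inline void
FullTransactionIdAdvance(FullTransactionId *dest)
{
    dest->value++;
    /* skip over the special xids below FirstNormalTransactionId */
    while (XidFromFullTransactionId(*dest) < FirstNormalTransactionId)
        dest->value++;
}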

> > - TransactionId nextXid;  /* copy of 
> > ShmemVariableCache->nextXid */
> > + TransactionId nextXid;  /* xid from ShmemVariableCache->nextFullXid */
>
> Probably worthwhile to pgindent partially.

Done.

I also added FullTransactionId to typedefs.list, bumped
PG_CONTROL_VERSION and did some performance 

Re: Error message inconsistency

2019-03-24 Thread Amit Kapila
On Sun, Mar 24, 2019 at 7:11 PM Greg Steiner  wrote:
>
> To me the error message that includes more detail is superior. Even though 
> you can get the detail from the logs, it seems like it would be much more 
> convenient for it to be reported out via the error to allow 
> users/applications to identify the problem relation without fetching logs. I 
> understand if that's not worth breaking numerous tests, though.
>

Yeah, I think that is the main point.  There will be quite some
churn in the regression test output, but OTOH, if it is for the good of
users, then it might be worth it.

> Personally, I think consistency here is important enough to warrant it.
>

Fair point.  Can such an error message change break any application?
I see some cases where users have checks based on the error code, but I'm
not sure if there are people who have checks based on error messages.

Anyone else having an opinion on this matter?  Basically, I would like
to hear if anybody thinks that this change can cause any sort of
problem.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Fix XML handling with DOCTYPE

2019-03-24 Thread Tom Lane
Chapman Flack  writes:
> On 03/24/19 21:04, Ryan Lambert wrote:
>> I am unable to get the latest patches I found [1] to apply cleanly to current
>> branches. It's possible I missed the latest patches so if I'm using the
>> wrong ones please let me know.  I tried against master, 11.2 stable and the
>> 11.2 tag with similar results.

> Tom pushed the content-with-DOCTYPE patch, it's now included in master,
> REL_11_STABLE, REL_10_STABLE, REL9_6_STABLE, REL9_5_STABLE, and
> REL9_4_STABLE.

Right.  If you want to test (and please do!) you could grab the relevant
branch tip from our git repo, or download a nightly snapshot tarball from

https://www.postgresql.org/ftp/snapshot/

regards, tom lane



Re: Error message inconsistency

2019-03-24 Thread Amit Kapila
On Sun, Mar 24, 2019 at 11:53 PM Simon Riggs  wrote:
>
> On Sun, 24 Mar 2019 at 13:02, Amit Kapila  wrote:
>>
>> I think we are inconsistent for a similar message at a few other
>> places as well.  See, below two messages:
>>
>> column \"%s\" contains null values
>> column \"%s\" of table \"%s\" contains null values
>>
>> If we decide to change this case, then why not change another place
>> which has a similar symptom?
>
>
> Yes, lets do that.
>
> I'm passing on feedback, so if it applies in other cases, I'm happy to change 
> other common cases also for the benefit of users.
>
> Do you have a list of cases you'd like to see changed?
>

I think we can scrutinize all the error messages with error codes
ERRCODE_NOT_NULL_VIOLATION and ERRCODE_CHECK_VIOLATION to see if
anything else needs to change.

>>
>> > > It causes breakage in multiple tests, which is easy to fix once/if we 
>> > > agree to change.
>> > >
>> >
>> > I totally agree with that change because I already get some negative 
>> > feedback from users about this message too.
>> >
>>
>> What kind of negative feedback did you get from users?  If I see in
>> the log file, the message is displayed as :
>>
>> 2019-03-24 18:12:49.331 IST [6348] ERROR:  null value in column "id"
>> violates not-null constraint
>> 2019-03-24 18:12:49.331 IST [6348] DETAIL:  Failing row contains (null).
>> 2019-03-24 18:12:49.331 IST [6348] STATEMENT:  insert into nn values (NULL);
>>
>> So, it is not difficult to identify the relation.
>
>
> The user is not shown the failing statement, and if they are, it might have 
> been generated for them.
>

I can imagine that in cases where queries/statements are generated
by some application, users might be presented just with the errors that
occurred during execution, and then it will be difficult to identify
the relation for which the problem occurred.

> Your example assumes the user has access to the log, that 
> log_min_error_statement is set appropriately and that the user can locate 
> their log entries to identify the table name. The log contains timed entries 
> but the user may not be aware of the time of the error accurately enough to 
> locate the correct statement amongst many others.
>
> If the statement is modified by triggers or rules, then you have no chance.
>
> e.g. add this to the above example:
>
> create or replace rule rr as on insert to nn2 do instead insert into nn 
> values (new.*);
>
>
> and it's clear that the LOG of the statement, even if it is visible, is 
> misleading since the SQL refers to table nn, but the error is generated by 
> the insert into table nn2.
>

This example also indicates that it will be helpful for users to see
the relation name in the error message.
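
That is, something along these lines (a sketch of the proposed wording only,
not the actual patch; att, resultRel and val_desc stand for the attribute,
result relation and row description available at that point in the executor):

    ereport(ERROR,
            (errcode(ERRCODE_NOT_NULL_VIOLATION),
             errmsg("null value in column \"%s\" for relation \"%s\" violates not-null constraint",
                    NameStr(att->attname),
                    RelationGetRelationName(resultRel)),
             errdetail("Failing row contains %s.", val_desc)));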

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



RE: Protect syscache from bloating with negative cache entries

2019-03-24 Thread Ideriha, Takeshi

>From: Vladimir Sitnikov [mailto:sitnikov.vladi...@gmail.com]
>
>Robert> This email thread is really short on clear demonstrations that X
>Robert> or Y is useful.
>
>It is useful when the whole database does **not** crash, isn't it?
>
>Case A (==current PostgreSQL mode): syscache grows, then OOMkiller chimes in, 
>kills
>the database process, and it leads to the complete cluster failure (all other 
>PG
>processes terminate themselves).
>
>Case B (==limit syscache by 10MiB or whatever as Tsunakawa, Takayuki
>asks):  a single ill-behaved process works a bit slower and/or consumes more 
>CPU
>than the other ones. The whole DB is still alive.
>
>I'm quite sure "case B" is much better for the end users and for the database
>administrators.
>
>So, +1 to Tsunakawa, Takayuki, it would be so great if there was a way to 
>limit the
>memory consumption of a single process (e.g. syscache, workmem, etc, etc).
>
>Robert> However, memory usage is quite unpredictable.  It depends on how
>Robert> many backends are active
>
>The number of backends can be limited by ensuring proper limits at the application
>application
>connection pool level and/or pgbouncer and/or things like that.
>
>Robert>how many copies of work_mem and/or  maintenance_work_mem are in
>Robert>use
>
>There might be other patches to cap the total use of
>work_mem/maintenance_work_mem,
>
>Robert>I don't think we
>Robert> can say that just imposing a limit on the size of the system
>Robert>caches is  going to be enough to reliably prevent an out of
>Robert>memory condition
>
>The less possibilities there are for OOM the better. Quite often it is much 
>better to fail
>a single SQL rather than kill all the DB processes.

Yeah, I agree. This limit would be useful in such an extreme situation.

Regards,
Takeshi Ideriha


RE: Problem with default partition pruning

2019-03-24 Thread Yuzuko Hosoya
Hi,

> 
> Hi,
> 
> On 2019/03/23 2:36, Thibaut Madelaine wrote:
> > I tested your last patch and if I didn't mix up patches on the end of
> > a too long week, I get a problem when querying the sub-sub partition:
> >
> > test=# explain select * from test2_0_10 where id = 25;
> >  QUERY PLAN
> > 
> >  Seq Scan on test2_0_10  (cost=0.00..25.88 rows=6 width=36)
> >Filter: (id = 25)
> > (2 rows)
> 
> The problem here is not really related to partition pruning, but another 
> problem I recently sent an
> email about:
> 
> https://www.postgresql.org/message-id/9813f079-f16b-61c8-9ab7-4363cab28d80%40lab.ntt.co.jp
> 
> The problem in this case is that *constraint exclusion* is not working, 
> because partition constraint
> is not loaded by the planner.  Note that pruning is only used if a query 
> specifies the parent table,
> not a partition.

Thanks for the comments.

I saw that email.  Also, I checked that the query Thibaut mentioned works
correctly with Amit's patch discussed in that thread.


Best regards,
Yuzuko Hosoya





Re: Fix XML handling with DOCTYPE

2019-03-24 Thread Chapman Flack
On 03/24/19 21:04, Ryan Lambert wrote:
> I am unable to get the latest patches I found [1] to apply cleanly to current
> branches. It's possible I missed the latest patches so if I'm using the
> wrong ones please let me know.  I tried against master, 11.2 stable and the
> 11.2 tag with similar results.

Tom pushed the content-with-DOCTYPE patch, it's now included in master,
REL_11_STABLE, REL_10_STABLE, REL9_6_STABLE, REL9_5_STABLE, and
REL9_4_STABLE.

The only patch that's left to be reviewed and applied is the documentation
fix, latest in [1].

If you are interested in giving a review opinion on some XML documentation,
that would be the one to look at.

Regards,
-Chap


[1] https://www.postgresql.org/message-id/5c96dbb5.2080...@anastigmatix.net



RE: speeding up planning with partitions

2019-03-24 Thread Imai, Yoshikazu
On Fri, Mar 22, 2019 at 9:07 PM, Tom Lane wrote:
> BTW, it strikes me that we could take advantage of the fact that baserels
> must all appear before otherrels in the rel array, by having loops over
> that array stop early if they're only interested in baserels.  We could
> either save the index of the last baserel, or just extend the loop logic
> to fall out upon hitting an otherrel.
> Seems like this'd save some cycles when there are lots of partitions,
> although perhaps loops like that would be fast enough to not matter.

Actually, this speeds up planning time a little when scanning a lot of otherrels, 
like selecting from thousands of partitions. I tested as below.

* rt with 8192 partitions
* execute "select * from rt;" for 60 seconds.

[results]
HEAD: 4.19 TPS (4.06, 4.34, 4.17)
(v34 patches) + (looping over only baserels): 4.26 TPS (4.31, 4.28, 4.19)


Attached is the diff I used for this test.
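
(For reference, the shape of the loop change tested is roughly the following
sketch; it is not the attached diff itself, and it relies on the premise above
that baserels are stored before otherrels in the rel array:)

    Index       rti;

    for (rti = 1; rti < root->simple_rel_array_size; rti++)
    {
        RelOptInfo *rel = root->simple_rel_array[rti];

        if (rel == NULL)
            continue;           /* empty slot for an unused RT index */
        if (rel->reloptkind != RELOPT_BASEREL)
            break;              /* baserels come first, so we are done */

        /* ... per-baserel work ... */
    }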

--
Yoshikazu Imai



looping-over-last-baserel-idx.diff
Description: looping-over-last-baserel-idx.diff


Re: Fix XML handling with DOCTYPE

2019-03-24 Thread Ryan Lambert
I am unable to get the latest patches I found [1] to apply cleanly to current
branches. It's possible I missed the latest patches so if I'm using the
wrong ones please let me know.  I tried against master, 11.2 stable and the
11.2 tag with similar results.  It's quite possible it's user error on my
end, I am new to this process but didn't have issues with the previous
patches when I tested those a couple weeks ago.

[1] https://www.postgresql.org/message-id/5c8ecaa4.3090...@anastigmatix.net

Ryan Lambert


On Sat, Mar 23, 2019 at 5:07 PM Chapman Flack  wrote:

> On 03/23/19 18:22, Tom Lane wrote:
> >> Out of curiosity, what further processing would you expect libxml to do?
> >
> > Hm, I'd have thought it'd try to parse the arguments to some extent,
> > but maybe not.  Does everybody reimplement attribute parsing for
> > themselves when using PIs?
>
> Yeah, the content of a PI (whatever's after the target name) is left
> all to be defined by whatever XML-using application might care about
> that PI.
>
> It could have an attribute=value syntax inspired by XML elements, or
> some other form entirely, but there'd just better not be any ?> in it.
>
> Regards,
> -Chap
>


Re: [HACKERS] CLUSTER command progress monitor

2019-03-24 Thread Tatsuro Yamada

Hi Robert!

On 2019/03/23 3:31, Robert Haas wrote:

On Tue, Mar 19, 2019 at 2:47 PM Robert Haas  wrote:

how close you were getting to rewriting the entire heap.  This is the
one thing I found but did not fix; any chance you could make this
change and update the documentation to match?


Hi, is anybody working on this?


I sent this email using my personal email address: yamatattsu@gmail-.
I am re-sending it with the patch and my test result.

Thank you so much for reviewing the patch, and sorry for the late reply.
Today, I realized that you had sent the email about the patch while I was
away from work on sick leave for a while. So, I created a new patch based on
your comments asap.
I hope it is acceptable to you. :)

Changes
  - Add new column *heap_tuples_written* in the view
  This column is updated when the phases are "seq scanning heap",
  "index scanning heap" or "writing new heap".
  - Fix document
  - Revised the patch on the current head: 
940311e4bb32a5fe99155052e41179c88b5d48af.

Please find attached files. :)


Regards,
Tatsuro Yamada



diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index ac2721c..26a6899 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -340,7 +340,15 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
   pg_stat_progress_vacuumpg_stat_progress_vacuum
   One row for each backend (including autovacuum worker processes) running
VACUUM, showing current progress.
-   See .
+   See .
+  
+ 
+
+ 
+  pg_stat_progress_clusterpg_stat_progress_cluster
+  One row for each backend running
+   CLUSTER or VACUUM FULL, showing current progress.
+   See .
   
  
 
@@ -3394,9 +3402,9 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
 
   
PostgreSQL has the ability to report the progress of
-   certain commands during command execution.  Currently, the only command
-   which supports progress reporting is VACUUM.  This may be
-   expanded in the future.
+   certain commands during command execution.  Currently, the only commands
+   which support progress reporting are VACUUM and
+   CLUSTER. This may be expanded in the future.
   
 
  
@@ -3408,9 +3416,11 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
one row for each backend (including autovacuum worker processes) that is
currently vacuuming.  The tables below describe the information
that will be reported and provide information about how to interpret it.
-   Progress reporting is not currently supported for VACUUM FULL
-   and backends running VACUUM FULL will not be listed in this
-   view.
+   Progress for VACUUM FULL commands is reported via
+   pg_stat_progress_cluster
+   because both VACUUM FULL and CLUSTER 
+   rewrite the table, while regular VACUUM only modifies it 
+   in place. See .
   
 
   
@@ -3588,6 +3598,183 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   
 
  
+
+ 
+  CLUSTER Progress Reporting
+
+  
+   Whenever CLUSTER or VACUUM FULL is
+   running, the pg_stat_progress_cluster view will
+   contain a row for each backend that is currently running either command. 
+   The tables below describe the information that will be reported and
+   provide information about how to interpret it.
+  
+
+  
+   pg_stat_progress_cluster View
+   
+
+
+  Column
+  Type
+  Description
+ 
+
+
+   
+
+ pid
+ integer
+ Process ID of backend.
+
+
+ datid
+ oid
+ OID of the database to which this backend is connected.
+
+
+ datname
+ name
+ Name of the database to which this backend is connected.
+
+
+ relid
+ oid
+ OID of the table being clustered.
+
+
+ command
+ text
+ 
+   The command that is running. Either CLUSTER or VACUUM FULL.
+ 
+
+
+ phase
+ text
+ 
+   Current processing phase. See .
+ 
+
+
+ cluster_index_relid
+ bigint
+ 
+   If the table is being scanned using an index, this is the OID of the
+   index being used; otherwise, it is zero.
+ 
+
+
+ heap_tuples_scanned
+ bigint
+ 
+   Number of heap tuples scanned.
+   This counter only advances when the phase is
+   seq scanning heap,
+   index scanning heap
+   or writing new heap.
+ 
+
+
+ heap_tuples_written
+ bigint
+ 
+   Number of heap tuples written.
+   This counter only advances when the phase is
+   seq scanning heap,
+   index scanning heap
+   or writing new heap.
+ 
+
+
+ heap_blks_total
+ bigint
+ 
+   Total number of heap blocks in the table.  This number is reported
+   as of the beginning of seq scanning heap.
+ 
+
+
+ heap_blks_scanned
+ bigint
+ 
+   Number of heap blocks scanned.  This counter only advances when the
+   phase is seq scanning heap.

Re: selecting from partitions and constraint exclusion

2019-03-24 Thread Amit Langote
On 2019/03/22 17:17, Amit Langote wrote:
> I'll add this to July fest to avoid forgetting about this.

I'd forgotten to do this, but done today. :)

Thanks,
Amit




Re: Problem with default partition pruning

2019-03-24 Thread Amit Langote
Hi,

On 2019/03/23 2:36, Thibaut Madelaine wrote:
> I tested your last patch and if I didn't mix up patches on the end of a
> too long week, I get a problem when querying the sub-sub partition:
> 
> test=# explain select * from test2_0_10 where id = 25;
>  QUERY PLAN
> 
>  Seq Scan on test2_0_10  (cost=0.00..25.88 rows=6 width=36)
>    Filter: (id = 25)
> (2 rows)

The problem here is not really related to partition pruning, but another
problem I recently sent an email about:

https://www.postgresql.org/message-id/9813f079-f16b-61c8-9ab7-4363cab28d80%40lab.ntt.co.jp

The problem in this case is that *constraint exclusion* is not working,
because partition constraint is not loaded by the planner.  Note that
pruning is only used if a query specifies the parent table, not a partition.

Thanks,
Amit




Re: CPU costs of random_zipfian in pgbench

2019-03-24 Thread Fabien COELHO


Hello Tom,


If this is done, some people with zipfian distribution that currently
work might be unhappy.



After giving it some thought, I think that this cannot be fully fixed for
12.


Just to clarify --- my complaint about "over engineering" referred to
the fact that a cache exists at all; fooling around with the overflow
behavior isn't really going to answer that.


Having some cache really makes sense for s < 1, AFAICS.


The bigger picture here is that a benchmarking tool that contains its
own performance surprises is not a nice thing to have.


Hmmm. It seems that it depends. Sometimes self-inflicted wounds are the 
soldier's responsibility, sometimes they must be prevented.


It's not very hard to imagine somebody wasting a lot of time trying to 
explain weird results, only to find out that the cause is unstable 
performance of random_zipfian.  Or worse, people might draw totally 
incorrect conclusions if they fail to drill down enough to realize that 
there's a problem in pgbench rather than on the server side.


Yep, benchmarking is tougher than it looks: it is very easy to get it 
wrong without knowing, whatever tool you use.



Given the constraint of Jim Gray's approximated method for s in (0, 1),
which really does zipfian for the first two integers and then uses an
exponential approximation, the only approach is that the parameters must
be computed in a partial eval preparation phase before the bench code is
run. This means that only (mostly) constants would be allowed as
parameters when s is in (0, 1), but I think that this is acceptable
because anyway the method fundamentally requires it.


Yeah, if we could store all the required harmonic numbers before the
test run timing starts, that would address the concern about stable
performance. But I have to wonder whether zipfian with s < 1 is useful
enough to justify so much code.


I do not know about that. I do not think that Jim Gray chose to invent an 
approximated method for s < 1 because he thought it was fun. I think he 
did it because his benchmark data required it. If you need it, you need 
it. If you do not need it, you do not care.



The other attached patch illustrates what I call poor performance
for stupid parameters (no point in doing zipfian on 2 integers…):
...
Maybe the implementation could impose that s is at least 1.001 to avoid
the lower performance?


I was wondering about that too.  It seems like it'd be a wise idea to
further constrain s and/or n to ensure that the s > 1 code path doesn't do
anything too awful ...


Yep. The attached version enforces s >= 1.001, which avoids the worst of the
iteration cost, according to my small tests.

unless someone comes up with a better implementation that has stable 
performance without such constraints.


Hmmm…

--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 24833f46bc..d9c2b71c75 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1598,23 +1598,21 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
 
 
  
-  random_zipfian generates an approximated bounded Zipfian
-  distribution. For parameter in (0, 1), an
-  approximated algorithm is taken from
-  "Quickly Generating Billion-Record Synthetic Databases",
-  Jim Gray et al, SIGMOD 1994. For parameter
-  in (1, 1000), a rejection method is used, based on
+  random_zipfian generates a bounded Zipfian distribution,
+  for parameter in [1.001, 1000].
+  A rejection method is used, based on
   "Non-Uniform Random Variate Generation", Luc Devroye, p. 550-551,
   Springer 1986. The distribution is not defined when the parameter's
-  value is 1.0. The function's performance is poor for parameter values
-  close and above 1.0 and on a small range.
+  value is 1.0.
+  The implementation enforces
+  parameter >= 1.001
+  in order to avoid the method's poor drawing performance
+  for parameter values close and above 1.0 and on a small range.
  
  
   parameter defines how skewed the distribution
   is. The larger the parameter, the more
   frequently values closer to the beginning of the interval are drawn.
-  The closer to 0 parameter is,
-  the flatter (more uniform) the output distribution.
   The distribution is such that, assuming the range starts from 1,
   the ratio of the probability of drawing k
   versus drawing k+1 is
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 4789ab92ee..81c64c7d71 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -137,7 +137,9 @@ static int	pthread_join(pthread_t th, void **thread_return);
 #define ZIPF_CACHE_SIZE	15		/* cache cells number */
 
 #define MIN_GAUSSIAN_PARAM		2.0 /* minimum parameter for gauss */
-#define MAX_ZIPFIAN_PARAM		1000	/* maximum parameter for zipfian */
+
+#define MIN_ZIPFIAN_PARAM		1.001	/* minimum parameter for 

Re: CPU costs of random_zipfian in pgbench

2019-03-24 Thread Tom Lane
Fabien COELHO  writes:
 I remain of the opinion that we ought to simply rip out support for
 zipfian with s < 1.

>>> +1 to that

>> If this is done, some people with zipfian distribution that currently 
>> work might be unhappy.

> After giving it some thought, I think that this cannot be fully fixed for 
> 12.

Just to clarify --- my complaint about "over engineering" referred to
the fact that a cache exists at all; fooling around with the overflow
behavior isn't really going to answer that.

The bigger picture here is that a benchmarking tool that contains its
own performance surprises is not a nice thing to have.  It's not very
hard to imagine somebody wasting a lot of time trying to explain weird
results, only to find out that the cause is unstable performance of
random_zipfian.  Or worse, people might draw totally incorrect conclusions
if they fail to drill down enough to realize that there's a problem in
pgbench rather than on the server side.

> Given the constraint of Jim Gray's approximated method for s in (0, 1), 
> which really does zipfian for the first two integers and then uses an 
> exponential approximation, the only approach is that the parameters must 
> be computed in a partial eval preparation phase before the bench code is 
> run. This means that only (mostly) constants would be allowed as 
> parameters when s is in (0, 1), but I think that this is acceptable 
> because anyway the method fundamentaly requires it.

Yeah, if we could store all the required harmonic numbers before the
test run timing starts, that would address the concern about stable
performance.  But I have to wonder whether zipfian with s < 1 is useful
enough to justify so much code.

> The other attached patch illustrates what I call poor performance 
> for stupid parameters (no point in doing zipfian on 2 integers…):
> ...
> Maybe the implementation could impose that s is at least 1.001 to avoid
> the lower performance?

I was wondering about that too.  It seems like it'd be a wise idea to
further constrain s and/or n to ensure that the s > 1 code path doesn't do
anything too awful ... unless someone comes up with a better implementation
that has stable performance without such constraints.

regards, tom lane



Re: Error message inconsistency

2019-03-24 Thread Simon Riggs
On Sun, 24 Mar 2019 at 13:02, Amit Kapila  wrote:

> On Sat, Mar 23, 2019 at 4:33 AM Fabrízio de Royes Mello
>  wrote:
> >
> > On Fri, Mar 22, 2019 at 2:25 PM Simon Riggs 
> wrote:
> > >
> > > As noted by a PostgreSQL user to me, error messages for NOT NULL
> constraints are inconsistent - they do not mention the relation name in the
> message, as all other variants of this message do. e.g.
> > >
> > > postgres=# create table nn (id integer not null);
> > > CREATE TABLE
> > > postgres=# insert into nn values (NULL);
> > > ERROR: null value in column "id" violates not-null constraint
> > > DETAIL: Failing row contains (null).
> > >
> > > postgres=# create table nn2 (id integer check (id is not null));
> > > CREATE TABLE
> > > postgres=# insert into nn2 values (NULL);
> > > ERROR: new row for relation "nn2" violates check constraint
> "nn2_id_check"
> > > DETAIL: Failing row contains (null).
> > >
> > > I propose the attached patch as a fix, changing the wording (of the
> first case) to
> > > ERROR: null value in column "id" for relation "nn" violates not-null
> constraint
> > >
>
> I think we are inconsistent for a similar message at a few other
> places as well.  See, below two messages:
>
> column \"%s\" contains null values
> column \"%s\" of table \"%s\" contains null values
>
> If we decide to change this case, then why not change another place
> which has a similar symptom?
>

Yes, lets do that.

I'm passing on feedback, so if it applies in other cases, I'm happy to
change other common cases also for the benefit of users.

Do you have a list of cases you'd like to see changed?


> > > It causes breakage in multiple tests, which is easy to fix once/if we
> agree to change.
> > >
> >
> > I totally agree with that change because I already get some negative
> feedback from users about this message too.
> >
>
> What kind of negative feedback did you get from users?  If I see in
> the log file, the message is displayed as :
>
> 2019-03-24 18:12:49.331 IST [6348] ERROR:  null value in column "id"
> violates not-null constraint
> 2019-03-24 18:12:49.331 IST [6348] DETAIL:  Failing row contains (null).
> 2019-03-24 18:12:49.331 IST [6348] STATEMENT:  insert into nn values
> (NULL);
>
> So, it is not difficult to identify the relation.
>

The user is not shown the failing statement, and if they are, it might have
been generated for them.

Your example assumes the user has access to the log, that
log_min_error_statement is set appropriately and that the user can locate
their log entries to identify the table name. The log contains timed
entries but the user may not be aware of the time of the error accurately
enough to locate the correct statement amongst many others.

If the statement is modified by triggers or rules, then you have no chance.

e.g. add this to the above example:

create or replace rule rr as on insert to nn2 do instead insert into nn
values (new.*);


and it's clear that the LOG of the statement, even if it is visible, is
misleading since the SQL refers to table nn, but the error is generated by
the insert into table nn2.


-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: jsonpath

2019-03-24 Thread Alexander Korotkov
On Sun, Mar 24, 2019 at 7:45 PM Andres Freund  wrote:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snapper=2019-03-23%2013%3A01%3A28
>
> 2019-03-23 14:28:31.147 CET [18056:45] pg_regress/jsonpath LOG:  statement: 
> select '$.g ? (@.a == 1 || !(@.x >= 123 || @.a == 4) && @.b == 7)'::jsonpath;
> 2019-03-23 14:28:31.157 CET [18055:59] pg_regress/jsonb_jsonpath LOG:  
> statement: select jsonb_path_query('1', 'lax $.a');
> 2019-03-23 14:28:31.163 CET [9597:311] LOG:  server process (PID 18056) was 
> terminated by signal 11: Segmentation fault
> 2019-03-23 14:28:31.163 CET [9597:312] DETAIL:  Failed process was running: 
> select '$.g ? (@.a == 1 || !(@.x >= 123 || @.a == 4) && @.b == 7)'::jsonpath;
> 2019-03-23 14:28:31.164 CET [9597:313] LOG:  terminating any other
> active server processes
>
> Something's not quite right... Note this appears to be 32bit sparc.

Thank you for pointing.  Will investigate.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: warning to publication created and wal_level is not set to logical

2019-03-24 Thread Tom Lane
David Fetter  writes:
> On Thu, Mar 21, 2019 at 07:45:59PM -0300, Lucas Viecelli wrote:
>> I am sending a patch so that, when a PUBLICATION is created and
>> wal_level is different from logical, a WARNING is printed in the console/log:

> Is a WARNING sufficient?  Maybe I'm misunderstanding something
> important, but I think the attempt should fail with a HINT to set the
> wal_level ahead of time.

That would be a booby-trap for dump/restore and pg_upgrade, so I don't
think making CREATE PUBLICATION fail outright would be wise.
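The patch's non-fatal approach, roughly of the following shape (sketched here
for illustration only; the message and errcode wording are illustrative, not
the submitted patch), seems fine in that respect:

    /* In CreatePublication(): warn if WAL level cannot feed logical decoding */
    if (wal_level < WAL_LEVEL_LOGICAL)
        ereport(WARNING,
                (errcode(ERRCODE_WARNING),
                 errmsg("wal_level is insufficient to publish logical changes"),
                 errhint("Set wal_level to \"logical\" before creating subscriptions.")));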

> Possibly in a separate patch, setting the wal_level to anything lower
> than logical when publications exist should also fail.

I do not believe this is practical either.  GUC manipulation cannot
look at the catalogs.

I agree that it'd be nice to be noisier about the problem, but I'm
not sure we can do more than bleat in the postmaster log from time
to time if a publication is active and wal_level is too low.
(And we'd better be careful about the log-spam aspect of that...)

regards, tom lane



Re: CPU costs of random_zipfian in pgbench

2019-03-24 Thread Fabien COELHO


Hello Tom & Tomas,


If the choice is between reporting the failure to the user, and
addressing the failure, surely the latter would be the default option?
Particularly if the user can't really address the issue easily
(recompiling psql is not very practical solution).


I remain of the opinion that we ought to simply rip out support for
zipfian with s < 1.


+1 to that


If this is done, some people with zipfian distribution that currently 
work might be unhappy.


After giving it some thought, I think that this cannot be fully fixed for 
12.


The attached patch removes the code for param in (0, 1), and slightly 
improves the documentation about the performance, if you want to proceed.


For s > 1, there is no such constraint, and it works fine, there is no 
reason to remove it.


Given the constraint of Jim Gray's approximated method for s in (0, 1), 
which really does zipfian for the first two integers and then uses an 
exponential approximation, the only approach is that the parameters must 
be computed in a partial eval preparation phase before the bench code is 
run. This means that only (mostly) constants would be allowed as 
parameters when s is in (0, 1), but I think that this is acceptable 
because anyway the method fundamentally requires it. I think that it can be 
implemented reasonably well (meaning not too much code), but it would 
require a few rounds of reviews if someone implements it (as a reminder, 
I was only the reviewer on this one). An added benefit would be that the 
parameter cache could be shared between threads, which would be a good 
thing.
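
(For reference, the per-(n, s) constant that such a preparation phase would
have to compute and cache is the generalized harmonic number, essentially the
helper pgbench already uses, sketched roughly as follows:)

    /* H(n, s) = sum_{i=1..n} 1 / i^s; needs <math.h> for pow() */
    static double
    generalizedHarmonicNumber(int64 n, double s)
    {
        int64       i;
        double      ans = 0.0;

        for (i = n; i > 1; i--)
            ans += pow(i, -s);
        return ans + 1.0;
    }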


The other attached patch illustrates what I call poor performance 
for stupid parameters (no point in doing zipfian on 2 integers…):


  ./pgbench -T 3 -D n=2 -D s=1.01 -f zipf_perf.sql   # 46981 tps
  ./pgbench -T 3 -D n=2 -D s=1.001 -f zipf_perf.sql   # 6187 tps
  ./pgbench -T 3 -D n=2 -D s=1.0001 -f zipf_perf.sql   # 710 tps

  ./pgbench -T 3 -D n=100 -D s=1.01 -f zipf_perf.sql  # 142910 tps
  ./pgbench -T 3 -D n=100 -D s=1.001 -f zipf_perf.sql  # 21214 tps
  ./pgbench -T 3 -D n=100 -D s=1.0001 -f zipf_perf.sql  # 2466 tps

  ./pgbench -T 3 -D n=100 -D s=1.01 -f zipf_perf.sql # 376453 tps
  ./pgbench -T 3 -D n=100 -D s=1.001 -f zipf_perf.sql # 57441 tps
  ./pgbench -T 3 -D n=100 -D s=1.0001 -f zipf_perf.sql # 6780 tps

Maybe the implementation could impose that s is at least 1.001 to avoid
the lower performance?

--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 24833f46bc..6241b9f9c5 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1598,23 +1598,18 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
 
 
  
-  random_zipfian generates an approximated bounded Zipfian
-  distribution. For parameter in (0, 1), an
-  approximated algorithm is taken from
-  "Quickly Generating Billion-Record Synthetic Databases",
-  Jim Gray et al, SIGMOD 1994. For parameter
-  in (1, 1000), a rejection method is used, based on
+  random_zipfian generates a bounded Zipfian distribution,
+  for parameter in (1, 1000].
+  A rejection method is used, based on
   "Non-Uniform Random Variate Generation", Luc Devroye, p. 550-551,
   Springer 1986. The distribution is not defined when the parameter's
-  value is 1.0. The function's performance is poor for parameter values
-  close and above 1.0 and on a small range.
+  value is 1.0.  The function's drawing performance is poor for parameter
+  values close and above 1.0 (eg under 1.001) and on a small range (eg 10).
  
  
   parameter defines how skewed the distribution
   is. The larger the parameter, the more
   frequently values closer to the beginning of the interval are drawn.
-  The closer to 0 parameter is,
-  the flatter (more uniform) the output distribution.
   The distribution is such that, assuming the range starts from 1,
   the ratio of the probability of drawing k
   versus drawing k+1 is
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 4789ab92ee..e62a7d1bff 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -409,35 +409,6 @@ typedef struct
 	int			ecnt;			/* error count */
 } CState;
 
-/*
- * Cache cell for random_zipfian call
- */
-typedef struct
-{
-	/* cell keys */
-	double		s;/* s - parameter of random_zipfian function */
-	int64		n;/* number of elements in range (max - min + 1) */
-
-	double		harmonicn;		/* generalizedHarmonicNumber(n, s) */
-	double		alpha;
-	double		beta;
-	double		eta;
-
-	uint64		last_used;		/* last used logical time */
-} ZipfCell;
-
-/*
- * Zipf cache for zeta values
- */
-typedef struct
-{
-	uint64		current;		/* counter for LRU cache replacement algorithm */
-
-	int			nb_cells;		/* number of filled cells */
-	int			overflowCount;	/* number of cache overflows */
-	ZipfCell	cells[ZIPF_CACHE_SIZE];
-} 

Re: warning to publication created and wal_level is not set to logical

2019-03-24 Thread David Fetter
On Thu, Mar 21, 2019 at 07:45:59PM -0300, Lucas Viecelli wrote:
> Hi everyone,
> 
> A very common question among new users is how wal_level works and what its
> levels mean. I have heard about situations like this: a user creates a new
> publication in their master database and simply does not change
> wal_level to logical; sometimes this person has lost the maintenance
> window, or the chance to restart the postgres service, usually on a
> production database, and only discovers that wal_level is not
> right at subscription creation time.  To help new (and even
> experienced) users with logical replication, I am
> sending a patch so that, when a PUBLICATION is created and
> wal_level is different from logical, a WARNING is printed in the console/log:

Is a WARNING sufficient?  Maybe I'm misunderstanding something
important, but I think the attempt should fail with a HINT to set the
wal_level ahead of time.

Possibly in a separate patch, setting the wal_level to anything lower
than logical when publications exist should also fail.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] CLUSTER command progress monitor

2019-03-24 Thread Tattsu Yama
Hi Robert!

>On Tue, Mar 19, 2019 at 2:47 PM Robert Haas  wrote:
>> how close you were getting to rewriting the entire heap. This is the
>> one thing I found but did not fix; any chance you could make this
>> change and update the documentation to match?
>
>Hi, is anybody working on this?

Thank you so much for reviewing the patch and sorry for the late reply.
Today, I realized that you sent the email for the patch because I took a
sick leave from work for a while. So, I created a new patch based on your
comments asap. I hope it is acceptable to you. :)

Please find the attached file.

Changes
 - Add new column *heap_tuples_written* in the view.
   This column is updated when the phases are "seq scanning heap",
   "index scanning heap" or "writing new heap".
 - Fix document
 - Revised the patch on 280a408b48d5ee42969f981bceb9e9426c3a344c

Regards,



Tatsuro Yamada


progress_monitor_for_cluster_command_v12.patch
Description: Binary data


Re: GiST VACUUM

2019-03-24 Thread Andrey Borodin



> 22 марта 2019 г., в 17:03, Heikki Linnakangas  написал(а):
> 

I was working on a new version of the GiST check in amcheck and realized one 
more thing:

/* Can this page be recycled yet? */
bool
gistPageRecyclable(Page page)
{
return PageIsNew(page) ||
(GistPageIsDeleted(page) &&
 TransactionIdPrecedes(GistPageGetDeleteXid(page), RecentGlobalXmin));
}

Here RecentGlobalXmin can wrap around, and the page will become unrecyclable for 
half of the xid cycle. Can we prevent this by resetting PageDeleteXid to 
InvalidTransactionId before doing RecordFreeIndexPage()?
(Seems like the same applies to GIN...)
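
Something like the following sketch is what I have in mind (illustrative
only; the setter name is assumed to exist in the patch, and rel/blkno/page
are the usual variables at the point where the page is freed):

    /* Clear the stored delete XID before handing the page to the FSM,
     * so a stale value cannot be misread after RecentGlobalXmin wraps. */
    if (gistPageRecyclable(page))
    {
        GistPageSetDeleteXid(page, InvalidTransactionId);
        RecordFreeIndexPage(rel, blkno);
    }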

Best regards, Andrey Borodin.


Re: jsonpath

2019-03-24 Thread Andres Freund
Hi,

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=snapper=2019-03-23%2013%3A01%3A28

2019-03-23 14:28:31.147 CET [18056:45] pg_regress/jsonpath LOG:  statement: 
select '$.g ? (@.a == 1 || !(@.x >= 123 || @.a == 4) && @.b == 7)'::jsonpath;
2019-03-23 14:28:31.157 CET [18055:59] pg_regress/jsonb_jsonpath LOG:  
statement: select jsonb_path_query('1', 'lax $.a');
2019-03-23 14:28:31.163 CET [9597:311] LOG:  server process (PID 18056) was 
terminated by signal 11: Segmentation fault
2019-03-23 14:28:31.163 CET [9597:312] DETAIL:  Failed process was running: 
select '$.g ? (@.a == 1 || !(@.x >= 123 || @.a == 4) && @.b == 7)'::jsonpath;
2019-03-23 14:28:31.164 CET [9597:313] LOG:  terminating any other
active server processes

Something's not quite right... Note this appears to be 32bit sparc.

- Andres



Re: chained transactions

2019-03-24 Thread Fabien COELHO



Hallo Peter,


In "xact.c", maybe I'd assign blockState in the else branch, instead of
overriding it?


I think it was better the way it is, since logically the block state is
first set, then set again after the new transaction starts.


Ok.


About the static _SPI_{commit,rollback} functions: I'm fine with these
functions, but I'm not sure about their name. Maybe
_SPI_chainable_{commit,rollback} would be is clearer about their content?


I kept it as is, to mirror the names of the SQL commands.


Hmmm. Function _SPI_commit does not implement just COMMIT, it implements 
both "COMMIT" and "COMMIT AND CHAIN"?


I'm fine with SPI_commit and SPI_commit_and_chain, and the rollback 
variants.


Maybe _SPI_commit_chainable? Well, you do as you please.

--
Fabien.



Re: CPU costs of random_zipfian in pgbench

2019-03-24 Thread Fabien COELHO


Hello Tomas,


What would a user do with this information, and how would they know
what to do?


Sure, but it was unclear what to do. Extending the cache to avoid that 
would look like over-engineering.


That seems like a rather strange argument. What exactly is so complex about
resizing the cache that it qualifies as over-engineering?


Because the cache size can diverge on a bad script, and the search is 
currently linear. Even with a log search it does not look attractive.



If the choice is between reporting the failure to the user, and
addressing the failure, surely the latter would be the default option?
Particularly if the user can't really address the issue easily
(recompiling psql is not very practical solution).


I remain of the opinion that we ought to simply rip out support for
zipfian with s < 1.


+1 to that


If this is done, some people with zipfian distribution that currently 
work might be unhappy.



Some people really want zipfian because it reflects their data access
pattern, possibly with s < 1.

We cannot help it: real life seems zipfianly distributed :-)


Sure. But that hardly means we ought to provide algorithms that we know
are not suitable for benchmarking tools, which I'd argue is this case.


Hmmm. It really depends on the values actually used.


Also, we have two algorithms for generating zipfian distributions. Why
wouldn't it be sufficient to keep just one of them?


Because the two methods work for different values of the parameter, so it 
depends on the distribution which is targeted. If you want s < 1, the 
rejection method does not help (precisely, the real "infinite" zipfian 
distribution is not mathematically defined when s < 1 because the underlying 
sum diverges. Having s < 1 can only be done on a finite set).



It's not useful for benchmarking purposes to have a random-number
function with such poor computational properties.


This is mostly a startup cost, the generation cost when a bench is
running is reasonable. How to best implement the precomputation is an
open question.


... which means it's not a startup cost. IMHO this simply shows pgbench
does not have the necessary infrastructure to provide this feature.


Well, a quick one has been proposed with a cache. Now I can imagine 
alternatives, but I'm wondering how much it is worth it.


E.g., restrict random_zipfian to more or less constant parameters when s < 
1, so that a partial evaluation could collect the needed pairs and perform 
the pre-computations before the bench is started.


Now, that looks like over-engineering, and then someone would complain of 
the restrictions.



As a reviewer I was not thrilled by the cache stuff, but I had no better
idea that would not fall under "over-over engineering" or the like.

Maybe it could error out and say "recompile me", but then someone
would have said "that is unacceptable".

Maybe it could auto extend the cache, but that is still more unnecessary
over-engineering, IMHO.


I'm puzzled. Why would that be over-engineering?


As stated above, cache size divergence and O(n) search. Even a log2(n) 
search would be bad, IMO.



I think leaving it in there is just a foot-gun: we'd be a lot better
off throwing an error that tells people to use some other distribution.


When s < 1, the startup cost is indeed a pain. However, it is a pain
prescribed by a Turing Award.


Then we shouldn't have it, probably. Or we should at least implement a
proper startup phase, so that the costly precomputation is not included
in the test interval.


That can be done, but I'm not sure of a very elegant design. And I was 
just the reviewer on this one.



Or if we do leave it in there, we for sure have to have documentation
that *actually* explains how to use it, which this patch still doesn't.


I'm not sure what explanation there could be about how to use it: one
calls the function to obtain pseudo-random integers with the desired
distribution?


Well, I'd argue the current description "performance is poor" is not
particularly clear.


There's nothing suggesting that you'd better not use a large number of
different (n,s) combinations.


Indeed, there is no caveat about this point, as noted above.

Please find an updated patch for the documentation, pointing out the
existence of the cache and advice not to overdo it.
It does not solve the underlying problem which raised your complaint,
but at least it is documented.



I think you forgot to attach the patch ...


Indeed, this is becoming a habit :-( Attached. Hopefully.

--
Fabien.diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index 9d18524834..7c961842fe 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -1605,7 +1605,8 @@ f(x) = PHI(2.0 * parameter * (x - mu) / (max - min + 1)) /
   "Non-Uniform Random Variate Generation", Luc Devroye, p. 550-551,
   Springer 1986. The distribution is not defined when the parameter's
   value is 1.0. The function's performance is 

Re: CPU costs of random_zipfian in pgbench

2019-03-24 Thread Fabien COELHO




What is the point of that, and if there is a point, why is it nowhere
mentioned in pgbench.sgml?


The attached patch simplifies the code by erroring on cache overflow,
instead of the LRU replacement strategy and unhelpful final report.
The above lines are removed.


Eh? Do I understand correctly that pgbench might start failing after
some random amount of time, instead of reporting the overflow at the
end?


Indeed, that's what this patch would induce, although very probably within a 
*short* random amount of time.



I'm not sure that's really an improvement ...


Depends. If the cache is full it means repeating the possibly expensive 
constant computations, which looks like a very bad idea for the user 
anyway.



Why is the cache fixed-size at all?


The cache can diverge and the search is linear, so it does not seem a good 
idea to keep it for very long:


  \set size random(10, 100)
  \set i random_zipfian(1, :size, ...)

The implementation only makes some sense if there are very few values 
(param & size pairs with param < 1) used in the whole script.


Tom is complaining of over engineering, and he has a point, so I'm trying 
to simplify (eg dropping LRU and erroring) for cases where the feature is 
not really appropriate anyway.


--
Fabien.



Re: Error message inconsistency

2019-03-24 Thread Greg Steiner
To me, the error message that includes more detail is superior. Even though
you can get the detail from the logs, it seems like it would be much more
convenient for it to be reported via the error, to allow
users/applications to identify the problem relation without fetching logs.
I understand if that's not worth breaking numerous tests, though.
Personally, I think consistency here is important enough to warrant it.

On Sun, Mar 24, 2019, 9:02 AM Amit Kapila  wrote:

> On Sat, Mar 23, 2019 at 4:33 AM Fabrízio de Royes Mello
>  wrote:
> >
> > On Fri, Mar 22, 2019 at 2:25 PM Simon Riggs 
> wrote:
> > >
> > > As noted by a PostgreSQL user to me, error messages for NOT NULL
> constraints are inconsistent - they do not mention the relation name in the
> message, as all other variants of this message do. e.g.
> > >
> > > postgres=# create table nn (id integer not null);
> > > CREATE TABLE
> > > postgres=# insert into nn values (NULL);
> > > ERROR: null value in column "id" violates not-null constraint
> > > DETAIL: Failing row contains (null).
> > >
> > > postgres=# create table nn2 (id integer check (id is not null));
> > > CREATE TABLE
> > > postgres=# insert into nn2 values (NULL);
> > > ERROR: new row for relation "nn2" violates check constraint
> "nn2_id_check"
> > > DETAIL: Failing row contains (null).
> > >
> > > I propose the attached patch as a fix, changing the wording (of the
> first case) to
> > > ERROR: null value in column "id" for relation "nn" violates not-null
> constraint
> > >
>
> I think we are inconsistent for a similar message at a few other
> places as well.  See, below two messages:
>
> column \"%s\" contains null values
> column \"%s\" of table \"%s\" contains null values
>
> If we decide to change this case, then why not change another place
> which has a similar symptom?
>
> > > It causes breakage in multiple tests, which is easy to fix once/if we
> agree to change.
> > >
> >
> > I totally agree with that change because I already get some negative
> feedback from users about this message too.
> >
>
> What kind of negative feedback did you get from users?  If I look in
> the log file, the message is displayed as:
>
> 2019-03-24 18:12:49.331 IST [6348] ERROR:  null value in column "id"
> violates not-null constraint
> 2019-03-24 18:12:49.331 IST [6348] DETAIL:  Failing row contains (null).
> 2019-03-24 18:12:49.331 IST [6348] STATEMENT:  insert into nn values
> (NULL);
>
> So, it is not difficult to identify the relation.
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>
>


Re: Connections hang indefinitely while taking a gin index's LWLock buffer_content lock

2019-03-24 Thread Alexander Korotkov
On Sat, Mar 23, 2019 at 11:43 AM Alexander Korotkov
 wrote:
> On Fri, Mar 22, 2019 at 11:05 AM Alexander Korotkov
>  wrote:
> > On Fri, Mar 22, 2019 at 12:06 AM Alvaro Herrera
> >  wrote:
> > > On 2019-Mar-21, Alexander Korotkov wrote:
> > >
> > > > However, I think this still can be backpatched correctly.  We can
> > > > determine whether xlog record data contains deleteXid by its size.
> > > > See the attached patch.  I haven't test this yet.  I'm going to test
> > > > it.  If OK, then push.
> > >
> > > Wow, this is so magical that I think it merits a long comment explaining
> > > what is going on.
> >
> > Yeah, have to do magic to fix such weird things :)
> > Please find the next revision of the patch attached.  It uses a static
> > assertion to check whether the data structure size has changed.  Also, an
> > assertion checks that the record data length matches one of the structures'
> > lengths.
>
> I'm going to backpatch this on no objections.

So, pushed.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Error message inconsistency

2019-03-24 Thread Amit Kapila
On Sat, Mar 23, 2019 at 4:33 AM Fabrízio de Royes Mello
 wrote:
>
> On Fri, Mar 22, 2019 at 2:25 PM Simon Riggs  wrote:
> >
> > As noted by a PostgreSQL user to me, error messages for NOT NULL 
> > constraints are inconsistent - they do not mention the relation name in the 
> > message, as all other variants of this message do. e.g.
> >
> > postgres=# create table nn (id integer not null);
> > CREATE TABLE
> > postgres=# insert into nn values (NULL);
> > ERROR: null value in column "id" violates not-null constraint
> > DETAIL: Failing row contains (null).
> >
> > postgres=# create table nn2 (id integer check (id is not null));
> > CREATE TABLE
> > postgres=# insert into nn2 values (NULL);
> > ERROR: new row for relation "nn2" violates check constraint "nn2_id_check"
> > DETAIL: Failing row contains (null).
> >
> > I propose the attached patch as a fix, changing the wording (of the first 
> > case) to
> > ERROR: null value in column "id" for relation "nn" violates not-null 
> > constraint
> >

I think we are inconsistent for a similar message at a few other
places as well.  See, below two messages:

column \"%s\" contains null values
column \"%s\" of table \"%s\" contains null values

If we decide to change this case, then why not change another place
which has a similar symptom?
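
For reference, the first of those two messages can be triggered with
something like this (a minimal illustrative example, not taken from this
thread):

create table t (id integer);
insert into t values (NULL);
alter table t alter column id set not null;
-- ERROR:  column "id" contains null values

That variant also omits the relation name, just like the not-null
constraint message Simon quoted.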

> > It causes breakage in multiple tests, which is easy to fix once/if we agree 
> > to change.
> >
>
> I totally agree with that change because I already get some negative feedback 
> from users about this message too.
>

What kind of negative feedback did you get from users?  If I look in
the log file, the message is displayed as:

2019-03-24 18:12:49.331 IST [6348] ERROR:  null value in column "id"
violates not-null constraint
2019-03-24 18:12:49.331 IST [6348] DETAIL:  Failing row contains (null).
2019-03-24 18:12:49.331 IST [6348] STATEMENT:  insert into nn values (NULL);

So, it is not difficult to identify the relation.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Adding a TAP test checking data consistency on standby with minRecoveryPoint

2019-03-24 Thread Michael Paquier
On Sat, Mar 23, 2019 at 04:08:42PM -0700, Peter Geoghegan wrote:
> Seems like there might be a problem either caused by or detected by
> 016_min_consistency.pl on piculet:
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=piculet=2019-03-23%2022%3A28%3A59

Interesting.  Based on what regress_log_016_min_consistency tells us,
the test attempts to stop the standby in fast mode, but it fails
because of a timeout:
### Stopping node "standby" using mode fast
[...]
pg_ctl: server does not shut down
Bail out!  system pg_ctl failed

There is only one place in the tests where that happens, and before
attempting to stop the standby we issue a checkpoint on it with its
primary killed:
# Issue a restart point on the standby now, which makes the checkpointer
# update minRecoveryPoint.
$standby->safe_psql('postgres', 'CHECKPOINT;');
[...]
$primary->stop('immediate');
$standby->stop('fast');

The failure is a bit weird, as I would expect all those three actions
to be sequential.  piculet is the only failure happening on the
buildfarm and it uses --disable-atomics, so I am wondering if that is
related and if 0dfe3d0 is part of that.  With a primary/standby set,
it could be possible to test that scenario pretty easily.  I'll give
it a shot.
--
Michael


signature.asc
Description: PGP signature


Re: current_logfiles not following group access and instead follows log_file_mode permissions

2019-03-24 Thread Michael Paquier
On Sun, Mar 24, 2019 at 09:16:44PM +0900, Michael Paquier wrote:
> After testing and reviewing the patch, I noticed that all versions
> sent up to now missed two things done by logfile_open():
> - Output is line-buffered.  For current_logfiles it may not matter
> much, as the contents are first written into a temporary file and then
> the file is renamed, but for debugging, having the assurance of
> consistent contents is nice even for the temporary file.
> - current_logfiles uses \r\n.  While it does not have a consequence
> for the parsing of the file by pg_current_logfile, it breaks the
> readability of the file on Windows, which is not nice.
> So I have kept the patch with the previous defaults for consistency.
> Perhaps they could be changed, but the current set is a good set.

By the way, this also fixes a cosmetic issue with a failure in
creating current_logfiles: when update_metainfo_datafile() fails to
create the file, it logs a LOG message, but logfile_open() does the
same thing, so this finishes with two log entries for the same
failure.  v10 still has that issue; I don't think that it is worth
fixing as it has no actual consequence except perhaps bringing some
confusion.
--
Michael


signature.asc
Description: PGP signature


Re: current_logfiles not following group access and instead follows log_file_mode permissions

2019-03-24 Thread Michael Paquier
On Fri, Mar 22, 2019 at 01:01:44PM +0900, Michael Paquier wrote:
> On Fri, Mar 22, 2019 at 02:35:41PM +1100, Haribabu Kommi wrote:
> > Thanks for the correction.  Yes, that is correct and it works fine.
> 
> Thanks for double-checking.  Are there any objections with this patch?

Done and committed down to v11 where group access has been added.
There could be an argument to do the same in v10, but as the root of
the problem is the interaction between a data folder using 0640 as
base mode for files and log_file_mode being more restrictive, then it
cannot apply to v10.

After testing and reviewing the patch, I noticed that all versions
sent up to now missed two things done by logfile_open():
- Output is line-buffered.  For current_logfiles it may not matter
much, as the contents are first written into a temporary file and then
the file is renamed, but for debugging, having the assurance of
consistent contents is nice even for the temporary file.
- current_logfiles uses \r\n.  While it does not have a consequence
for the parsing of the file by pg_current_logfile, it breaks the
readability of the file on Windows, which is not nice.
So I have kept the patch with the previous defaults for consistency.
Perhaps they could be changed, but the current set is a good set.
--
Michael


signature.asc
Description: PGP signature


Re: pg_basebackup ignores the existing data directory permissions

2019-03-24 Thread Haribabu Kommi
On Sat, Mar 23, 2019 at 2:23 AM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 2019-03-22 05:00, Michael Paquier wrote:
> > On Fri, Mar 22, 2019 at 02:45:24PM +1100, Haribabu Kommi wrote:
> >> How about letting the pg_basebackup to decide group permissions of the
> >> standby directory irrespective of the primary directory permissions.
> >>
> >> Default - permissions are same as primary
> >> --allow-group-access - standby directory have group access permissions
> >> --no-group--access - standby directory doesn't have group permissions
> >>
> >> The last two options behave irrespective of the primary directory
> >> permissions.
> >
> > Yes, I'd imagine that we would want to be able to define three
> > different behaviors, by either having a set of options, or a single
> > option with a switch, say --group-access:
> > - "inherit" causes the permissions to be inherited from the source
> > node, and that's the default.
> > - "none" enforces the default 0700/0600.
> > - "group" enforces group read access.
>
> Yes, we could use those three behaviors.
>

Thanks for all your opinions, here I attached an updated patch as discussed.

A new option, -g/--group-mode, is added to pg_basebackup to specify the
group access permissions.

inherit - same permissions as the source instance (default)
none - no group permissions, irrespective of the source instance
group - group permissions, irrespective of the source instance

With the above additional options, pg_basebackup is able to control the
access permissions of the backup files. In tar mode, however, all the files
are sent from the server and stored as-is in the backup; to support group
access control in tar mode, the BASE BACKUP protocol is enhanced with a new
option, GROUP_MODE 'none' or GROUP_MODE 'group', which controls the file
permissions before they are sent to the backup. Whether GROUP_MODE is sent
to the server depends on the -g option given to the pg_basebackup utility.

comments?

Regards,
Haribabu Kommi
Fujitsu Australia


0001-New-pg_basebackup-g-option-to-control-the-group-acce.patch
Description: Binary data


Re: BUG #15708: RLS 'using' running as wrong user when called from a view

2019-03-24 Thread Dean Rasheed
On Thu, 21 Mar 2019 at 00:39, PG Bug reporting form
 wrote:
>
> This fails, seemingly because the RLS on 'bar' is being checked by alice,
> instead of the view owner bob:
>

Yes I agree, that appears to be a bug. The subquery in the RLS policy
should be checked as the view owner -- i.e., we need to propagate the
checkAsUser for the RTE with RLS to any subqueries in its RLS
policies.
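
To make that concrete, here is a minimal sketch of the kind of setup being
described (role and table names are illustrative, not the reporter's exact
script):

create role alice login;
create role bob login;
create table bar (b int);
create table foo (a int);
grant select on bar to alice, bob;
grant select on foo to bob;
alter table bar enable row level security;
alter table foo enable row level security;
create policy bar_policy on bar using (b > 0);
create policy foo_policy on foo
  using (exists (select 1 from bar where bar.b = foo.a));

set role bob;
create view v as select * from foo;   -- view owned by bob
grant select on v to alice;
reset role;

set role alice;
select * from v;
-- The subquery on bar inside foo's policy should have bar's RLS checked
-- as bob, the view owner, rather than as alice.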

It looks like the best place to fix it is in
get_policies_for_relation(), since that's where all the policies to be
applied for a given RTE are pulled together. Patch attached.

Regards,
Dean


rls-perm-check-fix.patch
Description: Binary data


Re: chained transactions

2019-03-24 Thread Peter Eisentraut
Patch has been committed, thanks.

On 2019-03-18 21:20, Fabien COELHO wrote:
> Minor remarks:
> 
> In "xact.c", maybe I'd assign blockState in the else branch, instead of 
> overriding it?

I think it was better the way it is, since logically the block state is
first set, then set again after the new transaction starts.

> About the static _SPI_{commit,rollback} functions: I'm fine with these 
> functions, but I'm not sure about their name. Maybe 
> _SPI_chainable_{commit,rollback} would be clearer about their content?

I kept it as is, to mirror the names of the SQL commands.

> Doc looks clear to me. ISTM "chain" should be added as an index term?

Added, good idea.
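
For the archives, the committed feature looks like this at the SQL level (a
small illustrative example, not one of the regression tests):

create table accounts (id int, balance numeric);

begin isolation level repeatable read;
insert into accounts values (1, 100);
commit and chain;   -- commits, then immediately starts a new transaction
                    -- with the same transaction characteristics
insert into accounts values (2, 200);
rollback;           -- only the second transaction is rolled back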

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Assert failure when validating foreign keys

2019-03-24 Thread David Rowley
This results in an Assert failure on master and an elog ERROR prior to
c2fe139c201:

create role test_role with login;
create table ref(a int primary key);
grant references on ref to test_role;
set role test_role;
create table t1(a int, b int);
insert into t1 values(1,1);
alter table t1 add constraint t1_b_key foreign key (b) references ref(a);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

Fails in heapam_tuple_satisfies_snapshot() at
Assert(BufferIsValid(bslot->buffer));

c2fe139c201~1:
ERROR:  expected buffer tuple

The test case is just a variation of the case in [1], but it's a different
bug, so I'm reporting it on a different thread.

I've not looked into the cause or when it started happening.

[1] 
https://www.postgresql.org/message-id/CAK%3D1%3DWrnNmBbe5D9sm3t0a6dnAq3cdbF1vXY816j1wsMqzC8bw%40mail.gmail.com

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Avoid full GIN index scan when possible

2019-03-24 Thread Julien Rouhaud
Hi,

Marc (in Cc) reported to me a problematic query using a GIN index, hit in
production.  The issue is that even if a GIN opclass says that the
index can be used for an operator, it's still possible that some
values aren't really compatible and require a full index scan.

One simple example is with a GIN pg_trgm index (but other opclasses
have similar restrictions), doing a LIKE with a wildcard on both sides,
where the pattern is shorter than a trigram, e.g. col LIKE '%a%'.  So,
a WHERE clause of the form:

WHERE col LIKE '%verylongpattern%' AND col LIKE '%a%'

is much more expensive than

WHERE col LIKE '%verylongpattern%'

While there's nothing to do if the unhandled const is the only
predicate, if there are multiple AND-ed predicates and at least one of
them doesn't require a full index scan, we can avoid it.
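
For reference, the kind of setup where this shows up is simply something
like the following (illustrative, not Marc's exact schema):

create extension if not exists pg_trgm;
create table docs (col text);
-- ... populate docs ...
create index docs_trgm_idx on docs using gin (col gin_trgm_ops);

-- The short pattern cannot be turned into trigrams and degenerates into a
-- full GIN scan, while the long pattern is selective on its own:
explain select * from docs
  where col like '%verylongpattern%' and col like '%a%';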

The attached patch tries to fix the issue by detecting such cases and
dropping the unhandled quals from the BitmapIndexScan, letting the
recheck in the BitmapHeapScan do the proper filtering.  I'm not happy to
call the extractQuery support functions an additional time, but I
didn't find a cleaner way.  This is of course intended for pg13.
diff --git a/contrib/pg_trgm/expected/pg_trgm.out b/contrib/pg_trgm/expected/pg_trgm.out
index b3e709f496..6d96fca65f 100644
--- a/contrib/pg_trgm/expected/pg_trgm.out
+++ b/contrib/pg_trgm/expected/pg_trgm.out
@@ -3498,6 +3498,37 @@ select count(*) from test_trgm where t ~ '[qwerty]{2}-?[qwerty]{2}';
   1000
 (1 row)
 
+explain (costs off)
+  select * from test_trgm where t like '%0%' and t like '%azerty%';
+ QUERY PLAN  
+-
+ Bitmap Heap Scan on test_trgm
+   Recheck Cond: (t ~~ '%azerty%'::text)
+   Filter: (t ~~ '%0%'::text)
+   ->  Bitmap Index Scan on trgm_idx
+ Index Cond: (t ~~ '%azerty%'::text)
+(5 rows)
+
+select * from test_trgm where t like '%0%' and t like '%azerty%';
+ t 
+---
+(0 rows)
+
+explain (costs off)
+  select * from test_trgm where t like '%0%' and t like '%az%';
+QUERY PLAN
+--
+ Bitmap Heap Scan on test_trgm
+   Recheck Cond: ((t ~~ '%0%'::text) AND (t ~~ '%az%'::text))
+   ->  Bitmap Index Scan on trgm_idx
+ Index Cond: ((t ~~ '%0%'::text) AND (t ~~ '%az%'::text))
+(4 rows)
+
+select * from test_trgm where t like '%0%' and t like '%az%';
+ t 
+---
+(0 rows)
+
 create table test2(t text COLLATE "C");
 insert into test2 values ('abcdef');
 insert into test2 values ('quark');
diff --git a/contrib/pg_trgm/sql/pg_trgm.sql b/contrib/pg_trgm/sql/pg_trgm.sql
index 08459e64c3..9cdb6fda14 100644
--- a/contrib/pg_trgm/sql/pg_trgm.sql
+++ b/contrib/pg_trgm/sql/pg_trgm.sql
@@ -55,6 +55,13 @@ select t,similarity(t,'gwertyu0988') as sml from test_trgm where t % 'gwertyu098
 select t,similarity(t,'gwertyu1988') as sml from test_trgm where t % 'gwertyu1988' order by sml desc, t;
 select count(*) from test_trgm where t ~ '[qwerty]{2}-?[qwerty]{2}';
 
+explain (costs off)
+  select * from test_trgm where t like '%0%' and t like '%azerty%';
+select * from test_trgm where t like '%0%' and t like '%azerty%';
+explain (costs off)
+  select * from test_trgm where t like '%0%' and t like '%az%';
+select * from test_trgm where t like '%0%' and t like '%az%';
+
 create table test2(t text COLLATE "C");
 insert into test2 values ('abcdef');
 insert into test2 values ('quark');
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 3434219dbd..19e68980c3 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -33,6 +33,7 @@
 #include "optimizer/prep.h"
 #include "optimizer/restrictinfo.h"
 #include "utils/lsyscache.h"
+#include "utils/index_selfuncs.h"
 #include "utils/selfuncs.h"
 
 
@@ -973,6 +974,24 @@ build_index_paths(PlannerInfo *root, RelOptInfo *rel,
 			return NIL;
 	}
 
+	/*
+	 * GIN access method can require a full index scan.  However, if there are
+	 * multiple AND-ed quals and at least one of them doesn't require a full
+	 * index scan, we can avoid the full scan by dropping all the quals
+	 * requiring it and let the recheck do the proper filtering.
+	 */
+	if (index_clauses != NIL && list_length(index_clauses) > 1 &&
+			index->relam == GIN_AM_OID)
+	{
+		Relids old_outer_relids = bms_copy(outer_relids);
+
+		bms_free(outer_relids);
+		outer_relids = bms_copy(rel->lateral_relids);
+
+		index_clauses = gin_get_optimizable_quals(root, index, index_clauses,
+_relids, old_outer_relids);
+	}
+
 	/* We do not want the index's rel itself listed in outer_relids */
 	outer_relids = bms_del_member(outer_relids, rel->relid);
 	/* Enforce convention that outer_relids is exactly NULL if empty */
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 12d30d7d63..dbab64a329 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ 

Re: Planning counters in pg_stat_statements (using pgss_store)

2019-03-24 Thread Julien Rouhaud
On Sat, Mar 23, 2019 at 11:08 PM legrand legrand
 wrote:
>
> > This patch has multiple trailing whitespace, indent and coding style
> > issues.  You should consider running pg_indent before submitting a
> > patch.  I attach the diff after running pgindent if you want more
> > details about the various issues.
>
> fixed

There are still trailing whitespaces and comments wider than 80
characters in the C code that should be fixed.

> > +   pgss_store("",
> > +  parse->queryId,  /* signal that it's a
> > utility stmt */
> > +  -1,
>
> > the comment makes no sense, and also you can't pass an empty query
> > string / unknown len.  There's no guarantee that the entry for the
> > given queryId won't have been evicted, and in this case you'll create
> > a new and unrelated entry.
>
> Fixed, comment was wrong
> Query text is not available in pgss_planner_hook
> that's why pgss_store execution is forced in pgss_post_parse_analyze
> (to initialize pgss entry with its query text).
>
> There is a very small risk that query has been evicted between
> pgss_post_parse_analyze and pgss_planner_hook.
>
> > @@ -832,13 +931,13 @@ pgss_post_parse_analyze(ParseState *pstate, Query 
> > *query)
> >  * the normalized string would be the same as the query text anyway, so
> >  * there's no need for an early entry.
> >  */
> > -   if (jstate.clocations_count > 0)
> > pgss_store(pstate->p_sourcetext,
>
> > Why did you remove this?  pgss_store() isn't free, and asking to
> > generate a normalized query for a query that doesn't have any constant
> > or storing the entry early won't do anything useful AFAICT.  Though if
> > that's useful, you definitely can't remove the test without adapting
> > the comment and the indentation.
>
> See explanation in previous answer (comments have been updated accordingly)

The alternative would be to expose the query text to the planner, which could
fix this (unlikely) issue and could also sometimes save a pgss_store()
call.  I did a quick check, and at least the AQO and pg_hint_plan
extensions have some hacks to be able to access the query text from
the planner, so there are multiple needs for that.  Perhaps
it's time to do it?

> > there are 4 tests to check if planning_time is zero or not, it's quite
> > messy.  Could you refactor the code to avoid so many tests?  It would
> > probably be useful to add some asserts to check that we don't provide
> > both planning_time == 0 and execution related values.  The function's
> > comment would also need to be adapted to mention the new rationale
> > with planning_time.
>
> Fixed

+   /* updating counters for execute OR planning */
+   Assert(planning_time > 0 && total_time > 0);
+   if (planning_time == 0)

This is obviously incorrect.  The general sanity check for exclusion
between planning_time and total_time should be at the beginning of
pgss_store.  Maybe some other asserts are needed to verify that
planning_time cannot be provided along with jstate or under other conditions.



Re: Ordered Partitioned Table Scans

2019-03-24 Thread David Rowley
On Sat, 23 Mar 2019 at 19:42, Tom Lane  wrote:
>
> David Rowley  writes:
> > On Sat, 23 Mar 2019 at 05:40, Tom Lane  wrote:
> >> BTW, another thing we could possibly do to answer this objection is to
> >> give the ordered-Append node an artificially pessimistic startup cost,
> >> such as the sum or the max of its children's startup costs.  That's
> >> pretty ugly and unprincipled, but maybe it's better than not having the
> >> ability to generate the plan shape at all?
>
> > I admit to having thought of that while trying to get to sleep last
> > night, but I was too scared to even suggest it.  It's pretty much how
> > MergeAppend would cost it anyway.  I agree it's not pretty to lie
> > about the startup cost, but it does kinda seem silly to fall back on a
> > more expensive MergeAppend when we know fine well Append is cheaper.
>
> Yeah.  I'm starting to think that this might actually be the way to go,

Here's a version with it done that way.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


mergeappend_to_append_conversion_v13.patch
Description: Binary data


Re: Ordered Partitioned Table Scans

2019-03-24 Thread David Rowley
On Sun, 24 Mar 2019 at 05:16, Julien Rouhaud  wrote:
> ISTM that a query like
> SELECT * FROM nested ORDER BY 1, 2;
> could simply append all the partitions in the right order (or generate
> a tree of ordered appends), but:
>
> QUERY PLAN
> ---
>  Append
>->  Merge Append
>  Sort Key: nested_1_1.id1, nested_1_1.id2
>  ->  Index Scan using nested_1_1_id1_id2_idx on nested_1_1
>  ->  Index Scan using nested_1_2_id1_id2_idx on nested_1_2
>  ->  Index Scan using nested_1_3_id1_id2_idx on nested_1_3
>->  Merge Append
>  Sort Key: nested_2_1.id1, nested_2_1.id2
>  ->  Index Scan using nested_2_1_id1_id2_idx on nested_2_1
>  ->  Index Scan using nested_2_2_id1_id2_idx on nested_2_2
>  ->  Index Scan using nested_2_3_id1_id2_idx on nested_2_3
> (11 rows)
>
>
> Also, a query like
> SELECT * FROM nested_1 ORDER BY 1, 2;
> could generate an append path, since the first column is guaranteed to
> be identical in all partitions, but instead:
>
>  QUERY PLAN
> -
>  Merge Append
>Sort Key: nested_1_1.id1, nested_1_1.id2
>->  Index Scan using nested_1_1_id1_id2_idx on nested_1_1
>->  Index Scan using nested_1_2_id1_id2_idx on nested_1_2
>->  Index Scan using nested_1_3_id1_id2_idx on nested_1_3
> (5 rows)
>
> and of course
>
> # EXPLAIN (costs off) SELECT * FROM nested_1 ORDER BY 2;
>  QUERY PLAN
> 
>  Sort
>Sort Key: nested_1_1.id2
>->  Append
>  ->  Seq Scan on nested_1_1
>  ->  Seq Scan on nested_1_2
>  ->  Seq Scan on nested_1_3
> (6 rows)
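
For context, a layout along these lines (my assumption; the original table
definitions are not quoted above) would produce plans like those:

create table nested (id1 int, id2 int) partition by list (id1);
create table nested_1 partition of nested for values in (1)
  partition by list (id2);
create table nested_1_1 partition of nested_1 for values in (1);
create table nested_1_2 partition of nested_1 for values in (2);
create table nested_1_3 partition of nested_1 for values in (3);
create table nested_2 partition of nested for values in (2)
  partition by list (id2);
create table nested_2_1 partition of nested_2 for values in (1);
create table nested_2_2 partition of nested_2 for values in (2);
create table nested_2_3 partition of nested_2 for values in (3);
create index on nested (id1, id2);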

I think both these cases could be handled, but the way it would likely
have to be done is to run the partition constraints through equivalence
class processing. Likely doing that would need
some new field in EquivalenceClass that indicated that the eclass did
not need to be applied to the partition.  If it was done that way then
pathkey_is_redundant() would be true for the id1 column's pathkey in
the sub-partitioned tables. The last plan you show above could also
use an index scan too since build_index_pathkeys() would also find the
pathkey redundant.  Doing this would also mean that a query like "select *
from nested_1_1 where id2=1;" would not apply "id2 = 1" as a base qual
to the partition. That's good for 2 reasons.  1) No wasted effort
filtering rows that always match; and 2) A Seq scan can be used
instead of the planner possibly thinking that an index scan might be
useful to filter rows. Stats might tell the planner that anyway...
but...

I suggested some changes to equivalence classes a few years ago in [1],
but failed to get that idea off the ground.  In some ways, this is similar, as
it requires having equivalence classes that are not used in all cases.
I think to get something working a week before code cutoff is a step
too far for this, but certainly, it would be interesting to look into
fixing it in a later release.

[1] 
https://www.postgresql.org/message-id/flat/CAKJS1f9FK_X_5HKcPcSeimy16Owe3EmPmmGsGWLcKkj_rW9s6A%40mail.gmail.com

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] proposal: schema variables

2019-03-24 Thread Pavel Stehule
ne 24. 3. 2019 v 10:25 odesílatel Erik Rijkers  napsal:

> On 2019-03-24 06:57, Pavel Stehule wrote:
> > Hi
> >
> > rebase against current master
> >
>
> I ran into this:
>
> (schema 'varschema2' does not exist):
>
> drop variable varschema2.testv cascade;
> ERROR:  schema "varschema2" does not exist
> create variable if not exists testv as text;
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> connection to server was lost
>
>
> (both statements are needed to force the crash)
>

I cannot reproduce it.

Please try recompiling after "make distclean"; configure ..

Or, if the problem persists, please send a test case or a backtrace.

Regards

Pavel

>
>
> thanks,
>
> Erik Rijkers
>
>
>


Re: [HACKERS] proposal: schema variables

2019-03-24 Thread Erik Rijkers

On 2019-03-24 06:57, Pavel Stehule wrote:

Hi

rebase against current master



I ran into this:

(schema 'varschema2' does not exist):

drop variable varschema2.testv cascade;
ERROR:  schema "varschema2" does not exist
create variable if not exists testv as text;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost


(both statements are needed to force the crash)


thanks,

Erik Rijkers





Re: [GSoC] application ideas

2019-03-24 Thread Andrey Borodin
Hi, Michael!

> On 19 March 2019, at 14:53, pantilimonov misha 
> wrote:
> 
> 2) Changing the buffer manager strategy.
>    Somewhere in 2016 Andres Freund made a presentation[6] of possible
>    improvements that can be done in the buffer manager. I find the idea of
>    changing the hashtable to trees of radix trees[7] promising. Most likely,
>    taking into account the program's time constraints, this task won't be
>    done as a "ready to deploy" solution.  Instead, some kind of prototype
>    can be implemented and benchmarked.

I like the idea of a more efficient BufferTag->Page data structure. I'm not
sure cache locality is a real problem there, but I believe this idea deserves
a shot.
I'd happily review your proposal and co-mentor the project if it is chosen
for GSoC.

Also, please check some work by my students in a related area [0].


Best regards, Andrey Borodin.

[0] 
https://www.postgresql.org/message-id/flat/89A121E3-B593-4D65-98D9-BBC210B87268%40yandex-team.ru


Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-24 Thread Dean Rasheed
On Sun, 24 Mar 2019 at 00:17, David Rowley  wrote:
>
> On Sun, 24 Mar 2019 at 12:41, Tomas Vondra  
> wrote:
> >
> > On 3/21/19 4:05 PM, David Rowley wrote:
>
> > > 29. Looking at the tests I see you're testing that you get bad
> > > estimates without extended stats.  That does not really seem like
> > > something that should be done in tests that are meant for extended
> > > statistics.
> > >
> >
> > True, it might be a bit unnecessary. Initially the tests were meant to
> > show old/new estimates for development purposes, but it might not be
> > appropriate for regression tests. I don't think it's a big issue, it's
> > not like it'd slow down the tests significantly. Opinions?
>
> My thoughts were that if someone did something to improve non-MV
> stats, then is it right for these tests to fail? What should the
> developer do in the case? update the expected result? remove the test?
> It's not so clear.
>

I think the tests are fine as they are. Don't think of these as "good"
and "bad" estimates. They should both be "good" estimates, but under
different assumptions -- one assuming no correlation between columns,
and one taking into account the relationship between the columns. If
someone does do something to "improve" the non-MV stats, then the
former tests ought to tell us whether it really was an improvement. If
so, then the test result can be updated and perhaps whatever was done
ought to be factored into the MV-stats' calculation of base
frequencies. If not, the test is providing valuable feedback that
perhaps it wasn't such a good improvement after all.
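
As a concrete illustration of the two kinds of estimate (a sketch, not one of
the actual test cases):

create table t (a int, b int);
insert into t select i/100, i/100 from generate_series(1, 10000) i;
analyze t;
explain select * from t where a = 1 and b = 1;  -- assumes independent columns

create statistics s (mcv) on a, b from t;       -- syntax from this patch series
analyze t;
explain select * from t where a = 1 and b = 1;  -- can use the MCV list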

Regards,
Dean



Re: SQL statement PREPARE does not work in ECPG

2019-03-24 Thread Michael Meskes
Matsumura-san,

> Therefore, I think that the quoting statement name is needed in
> PREPARE/EXECUTE case, too.

I agree that we have to accept a quoted statement name, and your
observations are of course correct; I am merely wondering if we need
the escaped quotes in the call to the ecpg functions or the libpq
functions.

> > I would prefer to merge as much as possible, as I am afraid that if we
> > do not merge the approaches, we will run into issues later. There was a
> > reason why we added PQprepare, but I do not remember it from the top of
> > my head. Need to check when I'm online again.
> 
> I will also consider it.

Thank you.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL