Re: Choosing values for multivariate MCV lists

2019-06-19 Thread Dean Rasheed
On Tue, 18 Jun 2019 at 21:59, Tomas Vondra  wrote:
>
> The current implementation of multi-column MCV lists (added in this
> cycle) uses a fairly simple algorithm to pick combinations to include in
> the MCV list. We just compute a minimum number of occurrences, and then
> include all entries sampled more often. See get_mincount_for_mcv_list().
>
> [snip]
>
> This however means that by choosing the MCV entries solely based on the
> number of occurrences in the sample, we end up with MCV lists where the
> vast majority of entries have a NULL street name.
>
> That's why we got such a poor estimate in the example query, despite the
> fact that the city/street combination is the most frequent in the data
> set (with non-NULL street name).
>

I think the fact that they're NULL is a bit of a red herring because
we're treating NULL just like any other value. The same thing would
happen if there were some other very common non-NULL value that
dominated the dataset.

> The other weird thing is that frequency of NULL street names is fairly
> uniform in the whole data set. In total about 50% of addresses match that,
> and for individual cities it's generally between 25% and 100%, so the
> estimate is less than 2x off in those cases.
>
> But for addresses with non-NULL street names, the situation is very
> different. Some street names are unique to a single city, etc.
>
> Overall, this means we end up with MCV list with entries representing
> the mostly-uniform part of the data set, instead of preferring the
> entries that are truly skewed.
>
> So I'm wondering if we could/should rethink the algorithm to look at
> the frequency and base_frequency, and maybe pick entries based on their
> ratio, or something like that.
>

Hmm, interesting. I think I would like to see a more rigorous
justification for changing the algorithm deciding which values to
keep.

If I've understood correctly, I think the problem is this: The
mincount calculation is a good way of identifying MCV candidates to
keep, because it ensures that we don't keep values that don't appear
sufficiently often to produce accurate estimates, and ideally we'd
keep everything with count >= mincount. However, in the case where
there are more than stats_target items with count >= mincount, simply
ordering by count and keeping the most commonly seen values isn't
necessarily the best strategy in the case of multivariate statistics.

To identify what the best strategy might be, I think you need to
examine the errors that would occur as a result of *not* keeping a
value in the multivariate MCV list. Given a value that appears with
count >= mincount, N*freq ought to be a reasonable estimate for the
actual number of occurrences of that value in the table, and
N*base_freq ought to be a reasonable estimate for the
univariate-stats-based estimate that it would be given if it weren't
kept in the multivariate MCV list. So the absolute error resulting
from not keeping that value would be

  N * Abs(freq - base_freq)

But then I think we ought to take into account how often we're likely
to get that error. If we're simply picking values at random, the
likelihood of getting that value is just its frequency, so the average
absolute error would be

  Sum( N * freq[i] * Abs(freq[i] - base_freq[i]) )

which suggests that, if we wanted to reduce the average absolute error
of the estimates, we should order by freq*Abs(freq-base_freq) and keep
the top n in the MCV list.
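
To make that concrete, a rough standalone sketch of such a selection step
could look like the following (the struct and function names here are
invented for illustration -- this is not the actual extended-statistics
code):

#include <stdlib.h>
#include <math.h>

/* hypothetical per-combination candidate, not the real MCVItem layout */
typedef struct McvCandidate
{
    double      freq;       /* sampled frequency of the combination */
    double      base_freq;  /* product of the per-column frequencies */
} McvCandidate;

/* sort descending by freq * |freq - base_freq| */
static int
compare_mcv_error(const void *a, const void *b)
{
    const McvCandidate *ca = (const McvCandidate *) a;
    const McvCandidate *cb = (const McvCandidate *) b;
    double      ea = ca->freq * fabs(ca->freq - ca->base_freq);
    double      eb = cb->freq * fabs(cb->freq - cb->base_freq);

    if (ea > eb)
        return -1;
    if (ea < eb)
        return 1;
    return 0;
}

/*
 * Given candidates that already passed the mincount filter, keep the
 * nkeep entries whose omission would contribute most to the average
 * absolute estimation error.
 */
static void
choose_mcv_items(McvCandidate *items, int nitems, int nkeep)
{
    qsort(items, nitems, sizeof(McvCandidate), compare_mcv_error);
    /* items[0 .. nkeep-1] would go into the MCV list */
}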

On the other hand, if we wanted to reduce the average *relative* error
of the estimates, we might instead order by Abs(freq-base_freq).

> For example, we might sort the entries by
>
> abs(freq - base_freq) / freq
>

I'm not sure it's easy to justify ordering by Abs(freq-base_freq)/freq
though, because that would seem likely to put too much weight on the
least commonly occurring values.

> Of course, this is a challenging problem, for a couple of reasons.
>
> Firstly, picking simply the most frequent groups is very simple and it
> gives us additional information about the largest group (which may be
> useful elsewhere, e.g. the incremental sort patch).
>

Yes, you would have to keep in mind that changing the algorithm would
mean that the MCV list no longer represented all the most common
values. For example, it would no longer be valid to assume that no
value appeared more often than the first value in the MCV list. I'm
not sure that we currently do that though.

> Secondly, if the correlated combinations are less frequent, how reliably
> can we even estimate the frequency from a sample? The combination in the
> example query was ~0.02% of the data, so how likely is it to be sampled?
>

I think that's OK as long as we keep the mincount filter in the new
algorithm. I think some experimentation is definitely worthwhile here,
but it looks plausible that a decent approach might be:

1). Discard all values with count < mincount.
2). Order by freq*Abs(freq-base_freq) (or possibly just
Abs(freq-base_freq)) and keep the top n.

Re: Inconsistent error message wording for REINDEX CONCURRENTLY

2019-06-19 Thread Michael Paquier
On Wed, Jun 19, 2019 at 11:29:37PM -0400, Alvaro Herrera wrote:
> Looks good.

Thanks for the review, and reminding me about it :)

While on it, I have removed some comments around the error messages
because they actually don't bring more information.
--
Michael




Re: doc: update "relam" description in pg_class catalog reference

2019-06-19 Thread Ian Barwick

On 6/20/19 1:07 PM, Michael Paquier wrote:

On Thu, Jun 20, 2019 at 11:20:46AM +0900, Ian Barwick wrote:

Whoops, correct version attached. Sorry about the noise.


v2 looks fine to me, committed.  Thanks!


Thanks!


Regards

Ian Barwick


--
 Ian Barwick   https://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: doc: update "relam" description in pg_class catalog reference

2019-06-19 Thread Michael Paquier
On Thu, Jun 20, 2019 at 11:20:46AM +0900, Ian Barwick wrote:
> Whoops, correct version attached. Sorry about the noise.

v2 looks fine to me, committed.  Thanks!
--
Michael




Re: Inconsistent error message wording for REINDEX CONCURRENTLY

2019-06-19 Thread Alvaro Herrera
On 2019-May-27, Michael Paquier wrote:

> On Mon, May 27, 2019 at 12:20:58AM -0400, Alvaro Herrera wrote:

> > I notice your patch changes "catalog relations" to "system catalogs".
> > I think we predominantly prefer the latter, so that part of your change
> > seems OK.  (In passing, I noticed we have a couple of places using
> > "system catalog tables", which is weird.)
> 
> Good point.  These are not new though, so I would prefer not touch
> those parts for this patch.

Sure.

> > We do have "is not yet implemented" in a
> > couple of other places, so all things considered I'm not so sure about
> > changing that one to "cannot".
> 
> Okay.  I can live with this difference.  Not changing the string in
> ReindexRelationConcurrently() has the merit to be consistent with the
> existing ones in reindex_relation() and ReindexPartitionedIndex().
> Please find attached an updated version.  What do you think?

Looks good.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




JOIN_SEMI planning question

2019-06-19 Thread Thomas Munro
Hello,

While looking at bug #15857[1], I wondered why the following two
queries get different plans, given the schema and data from the bug
report:

(1) SELECT COUNT (*)
  FROM a
  JOIN b
ON a.id=b.base_id
 WHERE EXISTS (
   SELECT 1
 FROM c
WHERE c.base_id = a.id
   );

(2) SELECT COUNT (*)
  FROM a
  JOIN b
ON a.id=b.base_id
 WHERE EXISTS (
   SELECT 1
 FROM c
WHERE c.base_id = b.base_id
   );

The only difference is a.id vs b.base_id in the WHERE clause, and
those are equivalent, and the planner knows it as we can see from the
join cond visible in the plan.  Query 1 gets a JOIN_UNIQUE_INNER for
BC (C is sorted, made unique and then hashed for an inner join with B)
while query 2 gets a JOIN_SEMI for BC (C is hashed for a semi join
with B).  In both cases JOIN_UNIQUE_INNER is considered for BC (though
it's later blocked for PHJ since commit aca127c1), but the better
JOIN_SEMI is considered only for query 2.

The relevant decision logic is in populate_joinrel_with_paths()'s
JOIN_SEMI case, where it considers which of JOIN_SEMI,
JOIN_UNIQUE_INNER and JOIN_UNIQUE_OUTER to add.

Here's my question: how could it ever be OK to sort/unique something
and put it in a hash table, but not OK to put exactly the same thing
in the hash table directly, with JOIN_SEMI logic to prevent multiple
matches?  And likewise for the inner side of other join strategies.
Or to put it in plain C, in what case would the following change be
wrong?

            /*
             * We might have a normal semijoin, or a case where we don't have
             * enough rels to do the semijoin but can unique-ify the RHS and
             * then do an innerjoin (see comments in join_is_legal).  In the
             * latter case we can't apply JOIN_SEMI joining.
             */
-           if (bms_is_subset(sjinfo->min_lefthand, rel1->relids) &&
-               bms_is_subset(sjinfo->min_righthand, rel2->relids))
+           if ((bms_is_subset(sjinfo->min_lefthand, rel1->relids) &&
+                bms_is_subset(sjinfo->min_righthand, rel2->relids)) ||
+               bms_equal(sjinfo->syn_righthand, rel2->relids))
            {
                if (is_dummy_rel(rel1) || is_dummy_rel(rel2) ||
                    restriction_is_constant_false(restrictlist, joinrel, false))

Or to put it in the language of the comment, how could you ever have
enough rels to do a join between B and unique(C), but not enough rels
to do a semi-join between B and C?

I admit that I don't have a great grasp of equivalence classes,
(min|syn)_(left|right)hand or the join planning code in general,
having focused mostly on execution so far, so the above is a
cargo-cult change and I may be missing something fundamental...

Which plan wins is of course a costing matter, but having the
JOIN_SEMI option available has the advantage of being more profitably
parallelisable, which led me here.  With the above hack you get a
[Parallel] Hash Semi Join for BC with both queries (unless you set
work_mem higher and then you get a much slower merge join, but that's
an entirely separate problem).  Only one regression test plan changes
-- it's structurally similar to the bug report query, but at a glance
the new plan is better anyway.  The test still demonstrates what it
wants to demonstrate (namely that the absence or presence of a unique
index causes the plan to change, and with the above hack that is even
clearer because the two plans now differ only in "Nested Loop Semi
Join" vs "Nested Loop").

[1] 
https://www.postgresql.org/message-id/flat/15857-d1ba2a64bce0795e%40postgresql.org

-- 
Thomas Munro
https://enterprisedb.com




Re: doc: update "relam" description in pg_class catalog reference

2019-06-19 Thread Ian Barwick

On 6/20/19 11:17 AM, Ian Barwick wrote:

Hi

Here:

   https://www.postgresql.org/docs/devel/catalog-pg-class.html

the description for "relam" has not been updated to take into account
table access methods; patch attached.


Whoops, correct version attached. Sorry about the noise.


Regards


Ian Barwick

--
 Ian Barwick   https://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
new file mode 100644
index 1300c7b..84f730c
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
*** SCRAM-SHA-256$iteration
*** 1730,1736 
relam
oid
pg_am.oid
!   If this is an index, the access method used (B-tree, hash, etc.)
   
  
   
--- 1730,1736 
relam
oid
pg_am.oid
!   If this is a table or an index, the access method used (heap, B-tree, hash, etc.)
   
  
   


doc: update "relam" description in pg_class catalog reference

2019-06-19 Thread Ian Barwick

Hi

Here:

  https://www.postgresql.org/docs/devel/catalog-pg-class.html

the description for "relam" has not been updated to take into account
table access methods; patch attached.


Regards

Ian Barwick

--
 Ian Barwick   https://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
new file mode 100644
index 1300c7b..df68cf7
*** a/doc/src/sgml/catalogs.sgml
--- b/doc/src/sgml/catalogs.sgml
*** SCRAM-SHA-256$iteration
*** 1730,1736 
relam
oid
pg_am.oid
!   If this is an index, the access method used (B-tree, hash, etc.)
   
  
   
--- 1730,1736 
relam
oid
pg_am.oid
!   If this is table or an index, the access method used (heap, B-tree, hash, etc.)
   
  
   


Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-19 Thread Tom Lane
I wrote:
> So I'm toying with the idea of extending Andrew's patch to put a negative
> preference on "localtime", ensuring we'll use some other name for the zone
> if one is available.

Oh ... after further review it seems like "posixrules" should be
de-preferred on the same basis: it's uninformative and unportable,
and it's short enough to have a good chance of capturing initdb's
attention.  I recall having seen at least one machine picking it
recently.

Moreover, while I think most tzdb installations have that file (ours
certainly do), the handwriting is on the wall for it to go away,
leaving only postmaster startup failures behind:

http://mm.icann.org/pipermail/tz/2019-June/028172.html

regards, tom lane




Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-19 Thread Thomas Munro
On Thu, Jun 20, 2019 at 10:48 AM Tom Lane  wrote:
> As of now, six of the seven UCT-reporting members have switched to UTC;
> the lone holdout is elver which hasn't run in ten days.  (Perhaps it
> needs unwedged.)  There are no other changes, so it seems like Andrew's
> patch is doing what it says on the tin.

Oops.  Apparently REL_10 of the build farm scripts lost the ability
to find "buildroot" in the current working directory automatically.  I
have updated eelpout and elver's .conf file to have an explicit path,
and they are now busily building stuff.

-- 
Thomas Munro
https://enterprisedb.com




Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-19 Thread Tom Lane
BTW ... now that that patch has been in long enough to collect some
actual data on what it's doing, I set out to scrape the buildfarm logs
to see what is happening in the farm.  Here are the popularities of
various timezone settings, as of the end of May:

  3 America/Los_Angeles
  9 America/New_York
  3 America/Sao_Paulo
  2 Asia/Tokyo
  2 CET
 24 Etc/UTC
  3 Europe/Amsterdam
 11 Europe/Berlin
  1 Europe/Brussels
  1 Europe/Helsinki
  1 Europe/Isle_of_Man
  2 Europe/London
  7 Europe/Paris
  6 Europe/Prague
  5 Europe/Stockholm
  1 ROK
  7 UCT
  1 US/Central
  7 US/Eastern
  2 US/Pacific
 15 UTC
  1 localtime

(These are the zone choices reported in the initdb-C step for the
animal's last successful run before 06-01.  I excluded animals for which
the configuration summary shows that their choice is being forced by a
TZ environment variable.)

As of now, six of the seven UCT-reporting members have switched to UTC;
the lone holdout is elver which hasn't run in ten days.  (Perhaps it
needs unwedged.)  There are no other changes, so it seems like Andrew's
patch is doing what it says on the tin.

However, that one entry for 'localtime' disturbs me. (It's from snapper.)
That seems like a particularly useless choice of representation: it's not
informative, it's not portable, and it would lead to postmaster startup
failure if someone were to remove the machine's localtime file, which
I assume is a nonstandard insertion into /usr/share/zoneinfo.  Very
likely the only reason we don't see this behavior more is that sticking
a "localtime" file into /usr/share/zoneinfo is an obsolescent practice.
On machines that have such a file, it has a good chance of winning on
the grounds of being a short name.

So I'm toying with the idea of extending Andrew's patch to put a negative
preference on "localtime", ensuring we'll use some other name for the zone
if one is available.

Also, now that we have this mechanism, maybe we should charge it with
de-preferencing the old "Factory" zone, removing the hard-wired kluge
that we currently have for rejecting that.  (Modern tzdb doesn't install
"Factory" at all, but some installations might still do so in the service
of blind backwards compatibility.)

Thoughts?

regards, tom lane




Re: more Unicode data updates

2019-06-19 Thread Thomas Munro
On Thu, Jun 20, 2019 at 8:35 AM Peter Eisentraut
 wrote:
> src/include/common/unicode_norm_table.h also should be updated to the
> latest Unicode tables, as described in src/common/unicode.  See attached
> patches.  This also passes the tests described in
> src/common/unicode/README.  (That is, the old code does not pass the
> current Unicode test file, but the updated code does pass it.)
>
> I also checked contrib/unaccent/ but it seems up to date.
>
> It seems to me that we ought to make this part of the standard major
> release preparations.  There is a new Unicode standard approximately
> once a year; see .  (The 13.0.0 listed
> there is not released yet.)
>
> It would also be nice to unify and automate all these "update to latest
> Unicode" steps.

+1, great idea.  Every piece of the system that derives from Unicode
data should derive from the same version, and the version should be
mentioned in the release notes when it changes, and should be
documented somewhere centrally.  I wondered about that when working on
the unaccent generator script but didn't wonder hard enough.

-- 
Thomas Munro
https://enterprisedb.com




Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-19 Thread Tom Lane
Robert Haas  writes:
> On Mon, Jun 17, 2019 at 2:41 PM Stephen Frost  wrote:
>> Ah, ok, I agree that would have been good to do.  Of course, hindsight
>> being 20/20 and all that.  Something to keep in mind for the future
>> though.

> I think it was inappropriate to commit this at all.  You can't just
> say "some other committer objects, but I think I'm right so I'll just
> ignore them and commit anyway."  If we all do that it'll be chaos.

FWIW, that was my concern about this.

> I don't know exactly how many concurring votes it takes to override
> somebody else's -1, but it's got to be more than zero.

If even one other person had +1'd Andrew's proposal, I'd have yielded
to the consensus --- this was certainly an issue on which it's not
totally clear what to do.  But unless I missed some traffic, the vote
was exactly 1 to 1.  There is no way that that represents consensus to
commit.

Also on the topic of process: 48 hours before a wrap deadline is
*particularly* not the time to play fast and loose with this sort of
thing.  It'd have been better to wait till after this week's releases,
so there'd at least be time to reconsider if the patch turned out to
have unexpected side-effects.

regards, tom lane




Re: UCT (Re: pgsql: Update time zone data files to tzdata release 2019a.)

2019-06-19 Thread Robert Haas
On Mon, Jun 17, 2019 at 2:41 PM Stephen Frost  wrote:
> Ah, ok, I agree that would have been good to do.  Of course, hindsight
> being 20/20 and all that.  Something to keep in mind for the future
> though.

I think it was inappropriate to commit this at all.  You can't just
say "some other committer objects, but I think I'm right so I'll just
ignore them and commit anyway."  If we all do that it'll be chaos.

I don't know exactly how many concurring votes it takes to override
somebody else's -1, but it's got to be more than zero.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: pg_upgrade: Improve invalid option handling

2019-06-19 Thread Daniel Gustafsson
> On 19 Jun 2019, at 21:51, Peter Eisentraut  
> wrote:
> 
> On 2019-06-19 04:24, Michael Paquier wrote:
>> On Tue, Jun 18, 2019 at 10:25:44AM +0200, Daniel Gustafsson wrote:
>>> Correct, that matches how pg_basebackup and psql does it.
>> 
>> Perhaps you have a patch at hand?  I can see four strings in
>> pg_upgrade, two in exec.c and two in option.c, which could be
>> improved.
> 
> Committed my patch and the fixes for those error messages.

Looks good, thanks!

cheers ./daniel



Re: Race conditions with TAP test for syncrep

2019-06-19 Thread Alvaro Herrera
On 2019-Jun-18, Michael Paquier wrote:

> On Mon, Jun 17, 2019 at 10:50:39AM -0400, Alvaro Herrera wrote:
> > Hmm, this introduces a bit of latency: it waits for each standby to be
> > fully up before initializing the next standby.  Maybe it would be more
> > convenient to split the primitives: keep the current one to start the
> > standby, and add a separate one to wait for it to be registered.  Then
> > we could do
> > standby1->start;
> > standby2->start;
> > standby3->start;
> > foreach my $sby (@standbys) {
> > $sby->wait_for_standby
> > }
> 
> It seems to me that this sequence could still lead to inconsistencies:
> 1) standby 1 starts, reaches consistency so pg_ctl start -w exits.
> 2) standby 2 starts, reaches consistency.
> 3) standby 2 starts a WAL receiver, gets the first WAL sender slot of
> the primary.
> 4) standby 1 starts a WAL receiver, gets the second slot.

Ho ho .. you know what misled me into thinking that that would work?
Just look at the name of the test that failed, "asterisk comes before
another standby name".  That doesn't seem to be what the test is
testing!

# poll_query_until timed out executing this query:
# SELECT application_name, sync_priority, sync_state FROM pg_stat_replication 
ORDER BY application_name;
# expecting this output:
# standby1|1|sync
# standby2|2|sync
# standby3|2|potential
# standby4|2|potential
# last actual query output:
# standby1|1|sync
# standby2|2|potential
# standby3|2|sync
# standby4|2|potential
# with stderr:

#   Failed test 'asterisk comes before another standby name'

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Update list of combining characters

2019-06-19 Thread Tom Lane
Peter Eisentraut  writes:
>> Indeed.  Here is an updated script and patch.

> committed (to master)

Cool, but should we also put your recalculation script into git, to help
the next time we decide that we need to update this list?  It's
demonstrated to be nontrivial to get it right ;-)

regards, tom lane




Re: New vacuum option to do only freezing

2019-06-19 Thread Peter Geoghegan
On Tue, Jun 18, 2019 at 10:39 PM Michael Paquier  wrote:
> +INSERT INTO no_index_cleanup(i, t) VALUES(1, repeat('1234567890',3));
> Do we really need a string as long as that?

Specifying EXTERNAL storage might make things easier. I have used
PLAIN storage to test the 1/3 of a page restriction within nbtree, and
to test a bug in amcheck that was related to TOAST compression.

> It seems to me that we'd want tests to make sure that indexes are
> actually cleaned up, where pageinspect could prove to be useful.

That definitely seems preferable, but it'll be a bit tricky to do it
in a way that doesn't run into buildfarm issues due to alignment. I
suggest an index on a text column to avoid problems.

-- 
Peter Geoghegan




Re: pg_upgrade: Improve invalid option handling

2019-06-19 Thread Peter Eisentraut
On 2019-06-19 04:24, Michael Paquier wrote:
> On Tue, Jun 18, 2019 at 10:25:44AM +0200, Daniel Gustafsson wrote:
>> Correct, that matches how pg_basebackup and psql do it.
> 
> Perhaps you have a patch at hand?  I can see four strings in
> pg_upgrade, two in exec.c and two in option.c, which could be
> improved.

Committed my patch and the fixes for those error messages.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Update list of combining characters

2019-06-19 Thread Peter Eisentraut
On 2019-06-14 11:36, Peter Eisentraut wrote:
> On 2019-06-13 15:52, Alvaro Herrera wrote:
>> I think there's an off-by-one bug in your script.
> 
> Indeed.  Here is an updated script and patch.

committed (to master)

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: SQL/JSON path issues/questions

2019-06-19 Thread Alexander Korotkov
Hi, Liudmila!

> While I have no objections to the proposed fixes, I think we can further
> improve patch 0003 and the text it refers to.
> In attempt to clarify jsonpath docs and address the concern that ? is
> hard to trace in the current text, I'd also like to propose patch 0004.
> Please see both of them attached.

Thank you for your editing.  I'm going to commit them as well.

But I'm going to commit your changes separately from the 0003 I posted
before, because 0003 fixes a factual error, while you're proposing a set
of grammar/style fixes.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: SQL/JSON path issues/questions

2019-06-19 Thread Alexander Korotkov
On Wed, Jun 19, 2019 at 7:07 PM Thom Brown  wrote:
> On Thu, 13 Jun 2019 at 14:59, Thom Brown  wrote:
> >
> > Hi,
> >
> > I've been reading through the documentation regarding jsonpath and
> > jsonb_path_query etc., and I have found it lacking explanation for
> > some functionality, and I've also had some confusion when using the
> > feature.
> >
> > ? operator
> > ==
> > The first mention of '?' is in section 9.15, where it says:
> >
> > "Suppose you would like to retrieve all heart rate values higher than
> > 130. You can achieve this using the following expression:
> > '$.track.segments[*].HR ? (@ > 130)'"
> >
> > So what is the ? operator doing here?  Sure, there's the regular ?
> > operator, which is given as an example further down the page:
> >
> > '{"a":1, "b":2}'::jsonb ? 'b'
> >
> > But this doesn't appear to have the same purpose.
> >
> >
> > like_regex
> > ==
> > Then there's like_regex, which shows an example that uses the keyword
> > "flag", but that is the only instance of that keyword being mentioned,
> > and the flags available to this expression aren't anywhere to be seen.
> >
> >
> > is unknown
> > ==
> > "is unknown" suggests a boolean output, but the example shows an
> > output of "infinity".  While I understand what it does, this appears
> > inconsistent with all other "is..." functions (e.g. is_valid(lsn),
> > pg_is_other_temp_schema(oid), pg_opclass_is_visible(opclass_oid),
> > pg_is_in_backup() etc.).
> >
> >
> > $varname
> > ==
> > The jsonpath variable, $varname, has an incomplete description: "A
> > named variable. Its value must be set in the PASSING clause of an
> > SQL/JSON query function. for details."
> >
> >
> > Binary operation error
> > ==
> > I get an error when I run this query:
> >
> > postgres=# SELECT jsonb_path_query('[2]', '2 + $[1]');
> > psql: ERROR:  right operand of jsonpath operator + is not a single numeric 
> > value
> >
> > While I know it's correct to get an error in this scenario as there is
> > no element beyond 0, the message I get is confusing.  I'd expect this
> > if it encountered another array in that position, but not for
> > exceeding the upper bound of the array.
> >
> >
> > Cryptic error
> > ==
> > postgres=# SELECT jsonb_path_query('[1, "2",
> > {},[{"a":2}],2.3,null,"2019-06-05T13:25:43.511Z"]','$[**].type()');
> > psql: ERROR:  syntax error, unexpected ANY_P at or near "**" of jsonpath 
> > input
> > LINE 1: ...},[{"a":2}],2.3,null,"2019-06-05T13:25:43.511Z"]','$[**].typ...
> >  ^
> > Again, I expect an error, but the message produced doesn't help me.
> > I'll remove the ANY_P if I can find it.
> >
> >
> > Can't use nested arrays with jsonpath
> > ==
> >
> > I encounter an error in this scenario:
> >
> > postgres=# select jsonb_path_query('[1, 2, 1, [1,2], 3]','$[*] ? (@ == 
> > [1,2])');
> > psql: ERROR:  syntax error, unexpected '[' at or near "[" of jsonpath input
> > LINE 1: select jsonb_path_query('[1, 2, 1, [1,2], 3]','$[*] ? (@ == ...
> >
> > So these filter operators only work with scalars?
> >
> >
>
> Another observation about the documentation is that the examples given
> in 9.15. JSON Functions, Operators, and Expressions aren't all
> functional.  Some example JSON is provided, followed by example
> jsonpath queries which could be used against it.  These will produce
> results for the reader wishing to test them out until this example:
>
> '$.track.segments[*].HR ? (@ > 130)'
>
> This is because there is no HR value greater than 130.  May I propose
> setting this and all similar examples to (@ > 120) instead?

Makes sense to me.

> Also, this example doesn't work:
>
> '$.track ? (@.segments[*] ? (@.HR > 130)).segments.size()'
>
> This gives me:
>
> psql: ERROR:  syntax error, unexpected $end at end of jsonpath input
> LINE 13: }','$.track ? (@.segments[*]');
> ^

Perhaps it should be the following:

'$.track ? (exists(@.segments[*] ? (@.HR > 130))).segments.size()'

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: Minimal logical decoding on standbys

2019-06-19 Thread Andres Freund
Hi,

On 2019-06-12 17:30:02 +0530, Amit Khandekar wrote:
> On Tue, 11 Jun 2019 at 12:24, Amit Khandekar  wrote:
> > On Mon, 10 Jun 2019 at 10:37, Amit Khandekar  wrote:
> > > Since this requires the test to handle the
> > > fire-create-slot-and-then-fire-checkpoint-from-master actions, I was
> > > modifying the test file to do this. After doing that, I found that the
> > > slave gets an assertion failure in XLogReadRecord()=>XRecOffIsValid().
> > > This happens only when the restart_lsn is set to ReplayRecPtr.
> > > Somehow, this does not happen when I manually create the logical slot.
> > > It happens only while running testcase. Working on it ...
> >
> > Like I mentioned above, I get an assertion failure for
> > Assert(XRecOffIsValid(RecPtr)) while reading WAL records looking for a
> > start position (DecodingContextFindStartpoint()). This is because in
> > CreateInitDecodingContext()=>ReplicationSlotReserveWal(), I now set
> > the logical slot's restart_lsn to XLogCtl->lastReplayedEndRecPtr. And
> > just after bringing up slave, lastReplayedEndRecPtr's initial values
> > are in this order : 0/228, 0/260, 0/2D8, 0/2000100,
> > 0/300, 0/360. You can see that 0/300 is not a valid value
> > because it points to the start of a WAL block, meaning it points to
> > the XLog page header (I think it's possible because it is 1 + endof
> > last replayed record, which can be start of next block). So when we
> > try to create a slot when it's in that position, then XRecOffIsValid()
> > fails while looking for a starting point.
> >
> > One option I considered was : If lastReplayedEndRecPtr points to XLog
> > page header, get a position of the first record on that WAL block,
> > probably with XLogFindNextRecord(). But it is not trivial because
> > while in ReplicationSlotReserveWal(), XLogReaderState is not created
> > yet.
>
> In the attached v6 version of the patch, I did the above. That is, I
> used XLogFindNextRecord() to bump up the restart_lsn of the slot to
> the first valid record. But since XLogReaderState is not available in
> ReplicationSlotReserveWal(), I did this in
> DecodingContextFindStartpoint(). And then updated the slot restart_lsn
> with this corrected position.

> Since XLogFindNextRecord() is currently disabled using #if 0, removed
> this directive.

Well, ifdef FRONTEND. I don't think that's a problem. It's a bit
overkill here, because I think we know the address has to be on a record
boundary (rather than being in the middle of a page spanning WAL
record). So we could just add the size of the header manually - but
I think that's not worth doing.


> > Or else, do you think we can just increment the record pointer by
> > doing something like (lastReplayedEndRecPtr % XLOG_BLCKSZ) +
> > SizeOfXLogShortPHD() ?
>
> I found out that we can't do this, because we don't know whether the
> xlog header is SizeOfXLogShortPHD or SizeOfXLogLongPHD. In fact, in
> our context, it is SizeOfXLogLongPHD. So we indeed need the
> XLogReaderState handle.

Well, we can determine whether a long or a short header is going to be
used, as that's solely dependent on the LSN:

    /*
     * If first page of an XLOG segment file, make it a long header.
     */
    if ((XLogSegmentOffset(NewPage->xlp_pageaddr, wal_segment_size)) == 0)
    {
        XLogLongPageHeader NewLongPage = (XLogLongPageHeader) NewPage;

        NewLongPage->xlp_sysid = ControlFile->system_identifier;
        NewLongPage->xlp_seg_size = wal_segment_size;
        NewLongPage->xlp_xlog_blcksz = XLOG_BLCKSZ;
        NewPage->xlp_info |= XLP_LONG_HEADER;
    }

but I don't think that's worth it.
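
(For completeness, if we did want the manual adjustment, I'd expect it to
look roughly like the sketch below -- untested and only meant to
illustrate the idea; it relies on the existing XLogSegmentOffset,
SizeOfXLogLongPHD and SizeOfXLogShortPHD macros, and on the assumption
above that the pointer sits on a record boundary:)

#include "postgres.h"
#include "access/xlog.h"            /* wal_segment_size */
#include "access/xlog_internal.h"   /* XLogSegmentOffset, page header sizes */

/*
 * If 'lsn' points at the start of a WAL page (as lastReplayedEndRecPtr can,
 * when the last replayed record ended exactly at a page boundary), advance
 * it past the page header so it points at the first possible record
 * position on that page.
 */
static XLogRecPtr
skip_xlog_page_header(XLogRecPtr lsn)
{
    if (lsn % XLOG_BLCKSZ == 0)
    {
        if (XLogSegmentOffset(lsn, wal_segment_size) == 0)
            lsn += SizeOfXLogLongPHD;   /* first page of a segment */
        else
            lsn += SizeOfXLogShortPHD;
    }
    return lsn;
}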


> > Do you think that we can solve this using some other approach ? I am
> > not sure whether it's only the initial conditions that cause
> > lastReplayedEndRecPtr value to *not* point to a valid record, or is it
> > just a coincidence and that lastReplayedEndRecPtr can also have such a
> > value any time afterwards.

It's always possible. All that means is that the last record filled the
entire last WAL page.


> > If it's only possible initially, we can
> > just use GetRedoRecPtr() instead of lastReplayedEndRecPtr if
> > lastReplayedEndRecPtr is invalid.

I don't think so? The redo pointer will point to something *much*
earlier, where we'll not yet have done all the necessary conflict
handling during recovery? So we'd not necessarily notice that a slot
is not actually usable for decoding.

We could instead just handle that by starting decoding at the redo
pointer, and just ignore all WAL records until they're after
lastReplayedEndRecPtr, but that has no advantages, and will read a lot
more WAL.




>  static void _bt_cachemetadata(Relation rel, BTMetaPageData *input);
> @@ -773,6 +774,7 @@ _bt_log_reuse_page(Relation rel, BlockNumber blkno, TransactionId latestRemovedX
>

Re: SQL/JSON path issues/questions

2019-06-19 Thread Alexander Korotkov
On Mon, Jun 17, 2019 at 8:40 PM Thom Brown  wrote:
> On Fri, 14 Jun 2019 at 08:16, Kyotaro Horiguchi  
> wrote:
> >
> > Hi, Thom.
> >
> > At Thu, 13 Jun 2019 14:59:51 +0100, Thom Brown  wrote
> > in 
> > > Hi,
> > >
> > > I've been reading through the documentation regarding jsonpath and
> > > jsonb_path_query etc., and I have found it lacking explanation for
> > > some functionality, and I've also had some confusion when using the
> > > feature.
> > >
> > > ? operator
> > > ==
> > > The first mention of '?' is in section 9.15, where it says:
> > >
> > > "Suppose you would like to retrieve all heart rate values higher than
> > > 130. You can achieve this using the following expression:
> > > '$.track.segments[*].HR ? (@ > 130)'"
> > >
> > > So what is the ? operator doing here?  Sure, there's the regular ?
> >
> > It is described just above as:
> >
> > | Each filter expression must be enclosed in parentheses and
> > | preceded by a question mark.
>
> Can I suggest that, rather than using "question mark", we use the "?"
> symbol, or provide a syntax structure which shows something like:
>
>  ? 
>
> This not only makes this key information clearer and more prominent,
> but it also makes the "?" symbol searchable in a browser for anyone
> wanting to find out what that symbol is doing.

Sounds good for me.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: POC: Cleaning up orphaned files using undo logs

2019-06-19 Thread Robert Haas
On Wed, Jun 19, 2019 at 9:13 AM Dilip Kumar  wrote:
> I think it's a fair point.   We can keep pointer to
> UndoRecordTransaction(urec_progress, dbid, uur_next) and
> UndoRecordLogSwitch(urec_prevurp, urec_prevlogstart) in
> UnpackedUndoRecord and include them whenever undo record contain these
> headers.  Transaction header in the first record of the transaction
> and log-switch header in the first record after undo-log switch during
> a transaction.  IMHO uur_fxid, we can keep as part of the main
> UnpackedUndoRecord, because as part of the other work  "Compression
> for undo records to consider rmgrid, xid,cid,reloid for each record",
> the FullTransactionId, will be present in every UnpackedUndoRecord
> (although it will not be stored in every undo record).

I agree that fxid needs to be set all the time.

I'm not sure I'm entirely following the rest of what you are saying
here, but let me say again that I don't think UnpackedUndoRecord
should include a bunch of stuff that callers (1) don't need to set
when inserting and (2) can't count on having set when fetching.  Stuff
of that type should be handled in some way that spares clients of the
undo system from having to worry about it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: POC: Cleaning up orphaned files using undo logs

2019-06-19 Thread Robert Haas
On Tue, Jun 18, 2019 at 2:07 PM Robert Haas  wrote:
> On Tue, Jun 18, 2019 at 7:31 AM Amit Kapila  wrote:
> > [ new patches ]
>
> I tried writing some code [ to use these patches ].

I spent some more time experimenting with this patch set today and I
think that the UndoFetchRecord interface is far too zheap-centric.  I
expected that I would be able to do this:

UnpackedUndoRecord *uur = UndoFetchRecord(urp);
// do stuff with uur
UndoRecordRelease(uur);

But I can't, because the UndoFetchRecord API requires me to pass not
only an undo record but also a block number, an offset number, an XID,
and a callback.  I think I could get the effect that I want by
defining a callback that always returns true.  Then I could do:

UndoRecPtr junk;
UnpackedUndoRecord *uur = UndoFetchRecord(urp, InvalidBlockNumber,
InvalidOffsetNumber, , always_returns_true);
// do stuff with uur
UndoRecordRelease(uur);

That seems ridiculously baroque.  I think the most common thing that
an AM will want to do with an UndoRecPtr is look up that exact record;
that is, for example, what zedstore will want to do. However, even if
some AMs, like zheap, want to search backward through a chain of
records, there's no real reason to suppose that all of them will want
to search by block number + offset.  They might want to search by some
bit of data buried in the payload, for example.

I think the basic question here is whether we really need anything
more complicated than:

extern UnpackedUndoRecord *UndoFetchRecord(UndoRecPtr urp);

I mean, if you had that, the caller can implement looping easily
enough, and insert any test they want:

for (;;)
{
UnpackedUndoRecord *uur = UndoFetchRecord(urp);
if (i like this one)
break;
urp = uur->uur_blkprev; // should be renamed, since zedstore +
probably others will have tuple chains not block chains
UndoRecordRelease(uur);
}

The question in my mind is whether there's some performance advantage
of having the undo layer manage the looping rather than the caller do
it.  If there is, then there's a lot of zheap code that ought to be
changed to use it, because it's just using the same satisfies-callback
everywhere.  If there's not, we should just simplify the undo record
lookup along the lines mentioned above and put all the looping into
the callers.  zheap could provide a wrapper around UndoFetchRecord
that does a search by block and offset, so that we don't have to
repeat that logic in multiple places.

BTW, an actually generic iterator interface would probably look more like this:

typedef bool (*SatisfyUndoRecordCallback)(void *callback_data,
UnpackedUndoRecord *uur);
extern UnpackedUndoRecord *UndoFetchRecord(UndoRecPtr urp, UndoRecPtr
*found, SatisfyUndoRecordCallback callback, void *callback_data);

Now we're not assuming anything about what parts of the record the
callback wants to examine.  It can do whatever it likes.  Typically
with this sort of interface a caller will define a file-private struct
that is known to both the callback and the caller of UndoFetchRecord,
but not elsewhere.
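
For example, a rough sketch of the zheap-side wrapper mentioned earlier,
written against that interface (the uur_block/uur_offset field names and
the required undo headers are assumptions for illustration, not
necessarily what the patch set actually exposes):

/* file-private state shared between the wrapper and its callback */
typedef struct ZHeapUndoSearch
{
    BlockNumber blkno;
    OffsetNumber offnum;
} ZHeapUndoSearch;

static bool
zheap_undo_record_matches(void *callback_data, UnpackedUndoRecord *uur)
{
    ZHeapUndoSearch *search = (ZHeapUndoSearch *) callback_data;

    return uur->uur_block == search->blkno &&
        uur->uur_offset == search->offnum;
}

static UnpackedUndoRecord *
ZHeapFetchUndoRecordByItem(UndoRecPtr urp, BlockNumber blkno,
                           OffsetNumber offnum, UndoRecPtr *found)
{
    ZHeapUndoSearch search;

    search.blkno = blkno;
    search.offnum = offnum;

    return UndoFetchRecord(urp, found, zheap_undo_record_matches, &search);
}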

If we decide we need an iterator within the undo machinery itself,
then I think it should look like the above, and I think it should
accept NULL for found, callback, and callback_data, so that somebody
who wants to just look up a record, full stop, can do just:

UnpackedUndoRecord *uur = UndoFetchRecord(urp, NULL, NULL, NULL);

which seems tolerable.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: idea: log_statement_sample_rate - bottom limit for sampling

2019-06-19 Thread Pavel Stehule
On Wed 19. 6. 2019 at 10:49, Adrien Nayrat wrote:

> On 6/18/19 8:29 PM, Pavel Stehule wrote:
> >
> >
> > On Tue 18. 6. 2019 at 14:03, Adrien Nayrat <adrien.nay...@anayrat.info>
> > wrote:
> >
> > Hi,
> >
> > I tried the patch; here are my comments:
> >
> > > gettext_noop("Zero effective disables sampling. "
> > >  "-1 use sampling every time (without limit)."),
> >
> > I do not agree with the zero case. In fact, sampling is disabled as soon
> > as the setting is less than log_min_duration_statement. Furthermore, I
> > think we should provide a more straightforward description for users.
> >
> >
> > You are right, but I have no idea how to describe it in one line. In
> > this case zero is a corner case, and sampling is disabled without any
> > dependency on log_min_duration_statement.
> >
> > I think this design has only a few useful values and ranges:
> >
> > a) higher than log_min_duration_statement .. sampling is active with a
> > limit
> > b) 0 .. another way to effectively disable sampling - no dependency on
> > anything else
> > c) -1 or any negative value - sampling is allowed every time.
> >
> > Sure, there is the range (0..log_min_duration_statement), but for this
> > range the value makes no sense. I think this case cannot be mentioned in
> > the short description, but it should be mentioned in the documentation.
>
> Yes, it took me a while to understand :) I am ok to keep simple in GUC
> description and give more information in documentation.
>

Maybe something like: "Zero blocks sampling. A negative value forces
sampling without limit."


> >
> >
> > I changed a few comments and the documentation:
> >
> >   * As we added much more logic in this function with statement and
> > transaction sampling, and now with statement_sample_rate, it is not easy
> > to understand the logic at first look. I reworded the comment in
> > check_log_duration; I hope it is more straightforward.
> >
> >   * I am not sure if "every_time" is a good name for the variable. In
> > fact, if the duration exceeds the limit we disable sampling. Maybe
> > sampling_disabled is clearer?
> >
> >
> > For me the important part is the following line:
> >
> > (exceeded && (in_sample || every_time))
> >
> > I think "every_time" or "always" or "every" is more illustrative in this
> > context than "sampling_disabled". But my opinion is not strong in this
> > case, and I have no problem accepting a common opinion.
>
> Oh, yes, that's correct. I do not have a strong opinion either. Maybe
> someone else will have a better idea.
>

The naming in this case is not a hard issue, and the committer can decide.

Regards

Pavel

>
> --
> Adrien
>
>


Re: Remove one last occurrence of "replication slave" in comments

2019-06-19 Thread Dagfinn Ilmari Mannsåker
Daniel Gustafsson  writes:

> A Twitter thread today regarding the use of master/slave [1] made me curious
> and so I had a look.  It seems that commit a1ef920e27ba6ab3602aaf6d6751d8628
> replaced most instances but missed at least one which is fixed in the 
> attached.
>
> cheers ./daniel

There were some more master/slave references in the plpgsql foreign key
tests, which the attached changes to base/leaf instead.

I didn't touch the last mention of "slave", in the pltcl code, because
it's calling the Tcl_CreateSlave() API function.

- ilmari
-- 
"The surreality of the universe tends towards a maximum" -- Skud's Law
"Never formulate a law or axiom that you're not prepared to live with
 the consequences of."  -- Skud's Meta-Law


>From 4b9f6cd9e019e61118e38caad452caf2ec23af35 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Wed, 19 Jun 2019 17:59:19 +0100
Subject: [PATCH] Remove master/slave usage from plpgsql tests

---
 src/pl/plpgsql/src/expected/plpgsql_trap.out | 24 ++--
 src/pl/plpgsql/src/sql/plpgsql_trap.sql  | 12 +-
 2 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/src/pl/plpgsql/src/expected/plpgsql_trap.out b/src/pl/plpgsql/src/expected/plpgsql_trap.out
index 881603f5c4..bb8f065bc0 100644
--- a/src/pl/plpgsql/src/expected/plpgsql_trap.out
+++ b/src/pl/plpgsql/src/expected/plpgsql_trap.out
@@ -186,17 +186,17 @@ NOTICE:  should see this only if -100 fits in smallint
 --
 -- test foreign key error trapping
 --
-create temp table master(f1 int primary key);
-create temp table slave(f1 int references master deferrable);
-insert into master values(1);
-insert into slave values(1);
-insert into slave values(2);	-- fails
-ERROR:  insert or update on table "slave" violates foreign key constraint "slave_f1_fkey"
-DETAIL:  Key (f1)=(2) is not present in table "master".
+create temp table base(f1 int primary key);
+create temp table leaf(f1 int references base deferrable);
+insert into base values(1);
+insert into leaf values(1);
+insert into leaf values(2);	-- fails
+ERROR:  insert or update on table "leaf" violates foreign key constraint "leaf_f1_fkey"
+DETAIL:  Key (f1)=(2) is not present in table "base".
 create function trap_foreign_key(int) returns int as $$
 begin
 	begin	-- start a subtransaction
-		insert into slave values($1);
+		insert into leaf values($1);
 	exception
 		when foreign_key_violation then
 			raise notice 'caught foreign_key_violation';
@@ -238,8 +238,8 @@ begin;
 
   savepoint x;
 set constraints all immediate; -- fails
-ERROR:  insert or update on table "slave" violates foreign key constraint "slave_f1_fkey"
-DETAIL:  Key (f1)=(2) is not present in table "master".
+ERROR:  insert or update on table "leaf" violates foreign key constraint "leaf_f1_fkey"
+DETAIL:  Key (f1)=(2) is not present in table "base".
   rollback to x;
   select trap_foreign_key_2();  -- detects FK violation
 NOTICE:  caught foreign_key_violation
@@ -249,7 +249,7 @@ NOTICE:  caught foreign_key_violation
 (1 row)
 
 commit;-- still fails
-ERROR:  insert or update on table "slave" violates foreign key constraint "slave_f1_fkey"
-DETAIL:  Key (f1)=(2) is not present in table "master".
+ERROR:  insert or update on table "leaf" violates foreign key constraint "leaf_f1_fkey"
+DETAIL:  Key (f1)=(2) is not present in table "base".
 drop function trap_foreign_key(int);
 drop function trap_foreign_key_2();
diff --git a/src/pl/plpgsql/src/sql/plpgsql_trap.sql b/src/pl/plpgsql/src/sql/plpgsql_trap.sql
index 7e75f46934..b658cb1888 100644
--- a/src/pl/plpgsql/src/sql/plpgsql_trap.sql
+++ b/src/pl/plpgsql/src/sql/plpgsql_trap.sql
@@ -127,18 +127,18 @@ select test_variable_storage();
 -- test foreign key error trapping
 --
 
-create temp table master(f1 int primary key);
+create temp table base(f1 int primary key);
 
-create temp table slave(f1 int references master deferrable);
+create temp table leaf(f1 int references base deferrable);
 
-insert into master values(1);
-insert into slave values(1);
-insert into slave values(2);	-- fails
+insert into base values(1);
+insert into leaf values(1);
+insert into leaf values(2);	-- fails
 
 create function trap_foreign_key(int) returns int as $$
 begin
 	begin	-- start a subtransaction
-		insert into slave values($1);
+		insert into leaf values($1);
 	exception
 		when foreign_key_violation then
 			raise notice 'caught foreign_key_violation';
-- 
2.20.1



Re: SQL/JSON path issues/questions

2019-06-19 Thread Thom Brown
On Thu, 13 Jun 2019 at 14:59, Thom Brown  wrote:
>
> Hi,
>
> I've been reading through the documentation regarding jsonpath and
> jsonb_path_query etc., and I have found it lacking explanation for
> some functionality, and I've also had some confusion when using the
> feature.
>
> ? operator
> ==
> The first mention of '?' is in section 9.15, where it says:
>
> "Suppose you would like to retrieve all heart rate values higher than
> 130. You can achieve this using the following expression:
> '$.track.segments[*].HR ? (@ > 130)'"
>
> So what is the ? operator doing here?  Sure, there's the regular ?
> operator, which is given as an example further down the page:
>
> '{"a":1, "b":2}'::jsonb ? 'b'
>
> But this doesn't appear to have the same purpose.
>
>
> like_regex
> ==
> Then there's like_regex, which shows an example that uses the keyword
> "flag", but that is the only instance of that keyword being mentioned,
> and the flags available to this expression aren't anywhere to be seen.
>
>
> is unknown
> ==
> "is unknown" suggests a boolean output, but the example shows an
> output of "infinity".  While I understand what it does, this appears
> inconsistent with all other "is..." functions (e.g. is_valid(lsn),
> pg_is_other_temp_schema(oid), pg_opclass_is_visible(opclass_oid),
> pg_is_in_backup() etc.).
>
>
> $varname
> ==
> The jsonpath variable, $varname, has an incomplete description: "A
> named variable. Its value must be set in the PASSING clause of an
> SQL/JSON query function. for details."
>
>
> Binary operation error
> ==
> I get an error when I run this query:
>
> postgres=# SELECT jsonb_path_query('[2]', '2 + $[1]');
> psql: ERROR:  right operand of jsonpath operator + is not a single numeric 
> value
>
> While I know it's correct to get an error in this scenario as there is
> no element beyond 0, the message I get is confusing.  I'd expect this
> if it encountered another array in that position, but not for
> exceeding the upper bound of the array.
>
>
> Cryptic error
> ==
> postgres=# SELECT jsonb_path_query('[1, "2",
> {},[{"a":2}],2.3,null,"2019-06-05T13:25:43.511Z"]','$[**].type()');
> psql: ERROR:  syntax error, unexpected ANY_P at or near "**" of jsonpath input
> LINE 1: ...},[{"a":2}],2.3,null,"2019-06-05T13:25:43.511Z"]','$[**].typ...
>  ^
> Again, I expect an error, but the message produced doesn't help me.
> I'll remove the ANY_P if I can find it.
>
>
> Can't use nested arrays with jsonpath
> ==
>
> I encounter an error in this scenario:
>
> postgres=# select jsonb_path_query('[1, 2, 1, [1,2], 3]','$[*] ? (@ == 
> [1,2])');
> psql: ERROR:  syntax error, unexpected '[' at or near "[" of jsonpath input
> LINE 1: select jsonb_path_query('[1, 2, 1, [1,2], 3]','$[*] ? (@ == ...
>
> So these filter operators only work with scalars?
>
>

Another observation about the documentation is that the examples given
in 9.15. JSON Functions, Operators, and Expressions aren't all
functional.  Some example JSON is provided, followed by example
jsonpath queries which could be used against it.  These will produce
results for the reader wishing to test them out until this example:

'$.track.segments[*].HR ? (@ > 130)'

This is because there is no HR value greater than 130.  May I propose
setting this and all similar examples to (@ > 120) instead?

Also, this example doesn't work:

'$.track ? (@.segments[*] ? (@.HR > 130)).segments.size()'

This gives me:

psql: ERROR:  syntax error, unexpected $end at end of jsonpath input
LINE 13: }','$.track ? (@.segments[*]');
^

Thanks

Thom




Re: POC: Cleaning up orphaned files using undo logs

2019-06-19 Thread Robert Haas
On Wed, Jun 19, 2019 at 2:45 AM Amit Kapila  wrote:
> The reason for the same is that currently, the undo worker keep on
> executing the requests if there are any.  I think this is good when
> there are different requests, but getting the same request from error
> queue and doing it, again and again, doesn't seem to be good and I
> think it will not help either.

Even if there are multiple requests involved, you don't want a tight
loop like this.

> > I assume we want some kind of cool-down between retries.
> > 10 seconds?  A minute?  Some kind of back-off algorithm that gradually
> > increases the retry time up to some maximum?
>
> Yeah, something on these lines would be good.  How about if we add
> failure_count with each request in error queue?   Now, it will get
> incremented on each retry and we can wait in proportion to that, say
> 10s after the first retry, 20s after second and so on and maximum up
> to 10 failure_count (100s) will be allowed after which worker will
> exit considering it has no more work to do.
>
> Actually, we also need to think about what we should do with such
> requests because even if undo worker exits after retrying for some
> threshold number of times, undo launcher will again launch a new
> worker for this request unless we have some special handling for the
> same.
>
> We can issue some WARNING once any particular request reached the
> maximum number of retries but not sure if that is enough because the
> user might not notice it or might not take any action.  Do we want to
> PANIC at some point of time?  If so, when?  Or, as an alternative, we
> can try at regular intervals till we succeed?

PANIC is a terrible idea.  How would that fix anything?  You'll very
possibly still have the same problem after restarting, and so you'll
just keep on hitting the PANIC. That will mean that in addition to
whatever problem with undo you already had, you now have a system that
you can't use for anything at all, because it keeps restarting.

The design goal here should be that if undo for a transaction fails,
we keep retrying periodically, but with minimal adverse impact on the
rest of the system.  That means you can't retry in a loop. It also
means that the system needs to provide fairness: that is, it shouldn't
be possible to create a system where one or more transactions for
which undo keeps failing cause other transactions that could have been
undone to get starved.

It seems to me that thinking of this in terms of what the undo worker
does and what the undo launcher does is probably not the right
approach. We need to think of it more as an integrated system. Instead
of storing a failure_count with each request in the error queue, how
about storing a next retry time?  I think the error queue needs to be
ordered by database_id, then by next_retry_time, and then by order of
insertion.  (The last part is important because next_retry_time is
going to be prone to having ties, and we need to break those ties in
the right way.) So, when a per-database worker starts up, it's pulling
from the queues in alternation, ignoring items that are not for the
current database.  When it pulls from the error queue, it looks at the
item for the current database that has the lowest retry time - if
that's still in the future, then it ignores the queue until something
new (perhaps with a lower retry_time) is added, or until the first
next_retry_time arrives.  If the item that it pulls again fails, it
gets inserted back into the error queue but with a higher next retry
time.
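
To make that a bit more concrete, a hypothetical queue entry and ordering
function (the names and types here are invented, not taken from the patch
set) might look like:

#include <stdint.h>

/* hypothetical error-queue entry */
typedef struct UndoErrorQueueEntry
{
    uint32_t    database_id;
    int64_t     next_retry_time;    /* e.g. microseconds since epoch */
    uint64_t    insertion_seq;      /* monotonically increasing tie-breaker */
} UndoErrorQueueEntry;

/* order by database_id, then next_retry_time, then insertion order */
static int
undo_error_queue_cmp(const void *a, const void *b)
{
    const UndoErrorQueueEntry *ea = (const UndoErrorQueueEntry *) a;
    const UndoErrorQueueEntry *eb = (const UndoErrorQueueEntry *) b;

    if (ea->database_id != eb->database_id)
        return (ea->database_id < eb->database_id) ? -1 : 1;
    if (ea->next_retry_time != eb->next_retry_time)
        return (ea->next_retry_time < eb->next_retry_time) ? -1 : 1;
    if (ea->insertion_seq != eb->insertion_seq)
        return (ea->insertion_seq < eb->insertion_seq) ? -1 : 1;
    return 0;
}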

This might not be exactly right, but the point is that there should
probably be NO logic that causes a worker to retry the same
transaction immediately afterward, with or without a delay. It should
be all be driven off what gets pulled out of the error queue.  In the
above sketch, if a worker gets to the point where there's nothing in
the error queue for the current database with a timestamp that is <=
the current time, then it can't pull anything else from that queue; if
there's no other work to do, it exits.  If there is other work to do,
it does that and then maybe enough time will have passed to allow
something to be pulled from the error queue, or maybe not.  Meanwhile
some other worker running in the same database might pull the item
before the original worker gets back to it.  Meanwhile if the worker
exits because there's nothing more to do in that database, the
launcher can also see the error queue.  When enough time has passed,
it can notice that there is an item (or items) that could be pulled
from the error queue for that database and launch a worker for that
database if necessary (or else let an existing worker take care of
it).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: New EXPLAIN option: ALL

2019-06-19 Thread David Fetter
On Wed, Jun 19, 2019 at 02:08:21PM +0200, Daniel Gustafsson wrote:
> > On 19 Jun 2019, at 08:15, Peter Eisentraut 
> >  wrote:
> > 
> > On 2019-06-18 23:15, David Fetter wrote:
> >> Are you proposing something along the lines of this?
> >> 
> >> PROFILE [statement]; /* Shows the plan */
> >> PROFILE RUN [statement]; /* Actually executes the query */
> > 
> > No, it would be
> > 
> > EXPLAIN statement; /* Shows the plan */
> > PROFILE statement; /* Actually executes the query */
> 
> That makes a lot of sense.
> 
> cheers ./daniel

+1

Thanks for clarifying.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: psql UPDATE field [tab] expands to DEFAULT?

2019-06-19 Thread Tom Lane
[ moving thread to -hackers ]

So I propose the attached patch for fixing the clear bugs that have
emerged in this discussion: don't confuse UPDATE ... SET ... with
GUC-setting commands, and don't offer just DEFAULT in contexts where
that's unlikely to be the only valid completion.

Nosing around in tab-complete.c, I notice a fair number of other
places where we're doing COMPLETE_WITH() with just a single possible
completion.  Knowing what we know now, in each one of those places
libreadline will suppose that that completion is the only syntactically
legal continuation, and throw away anything else the user might've typed.
We should probably inspect each of those places to see if that's really
desirable behavior ... but I didn't muster the energy to do that this
morning.

regards, tom lane

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5e38f46..3530570 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -3361,8 +3361,13 @@ psql_completion(const char *text, int start, int end)
 	else if (HeadMatches("ALTER", "DATABASE|FUNCTION|PROCEDURE|ROLE|ROUTINE|USER") &&
 			 TailMatches("SET", MatchAny))
 		COMPLETE_WITH("FROM CURRENT", "TO");
-	/* Suggest possible variable values */
-	else if (TailMatches("SET", MatchAny, "TO|="))
+
+	/*
+	 * Suggest possible variable values in SET variable TO|=, along with the
+	 * preceding ALTER syntaxes.
+	 */
+	else if (TailMatches("SET", MatchAny, "TO|=") &&
+			 !TailMatches("UPDATE", MatchAny, "SET", MatchAny, "TO|="))
 	{
 		/* special cased code for individual GUCs */
 		if (TailMatches("DateStyle", "TO|="))
@@ -3390,8 +3395,18 @@ psql_completion(const char *text, int start, int end)
 			else if (guctype && strcmp(guctype, "bool") == 0)
 COMPLETE_WITH("on", "off", "true", "false", "yes", "no",
 			  "1", "0", "DEFAULT");
-			else
-COMPLETE_WITH("DEFAULT");
+
+			/*
+			 * Note: if we didn't recognize the GUC name, it's important to
+			 * not offer any completions, as most likely we've misinterpreted
+			 * the context and this isn't a GUC-setting command at all.  If we
+			 * did recognize the GUC but it's not enum or bool type, it still
+			 * seems better to do nothing.  We used to offer (only) "DEFAULT"
+			 * as a possible completion, but that turns out to be a bad idea,
+			 * because with a single completion libreadline will decide that
+			 * that's the only legal text and throw away whatever the user
+			 * might've typed already.
+			 */
 
 			if (guctype)
 free(guctype);


Re: How to produce a Soft Block case of Deadlock Detection?

2019-06-19 Thread Rui Hai Jiang
I finally found this.

https://www.postgresql.org/message-id/29104.1182785028%40sss.pgh.pa.us

This is very useful for understanding the Soft Block case.

On Wed, Jun 19, 2019 at 7:18 PM Rui Hai Jiang  wrote:

> Hello, hackers.
>
> Does anybody know how to produce a Soft Block case of Deadlock Detection?
> I have produced the Hard Block case, but can't produce the Soft Block case.
>
>
> I read the design: src/backend/storage/lmgr/README. It reads,
>
> "If a process A is behind a process B in some lock's wait queue, and
> their requested locks conflict, then we must say that A waits for B, since
> ProcLockWakeup will never awaken A before B.  This creates additional
> edges in the WFG.  We call these "soft" edges, as opposed to the "hard"
> edges induced by locks already held.  Note that if B already holds any
> locks conflicting with A's request, then their relationship is a hard edge
> not a soft edge."
>
>
> But after many tests, I couldn't figure out how to produce a Soft
> Block.
>
> Following is what I did.
>
> * Hard Block Case
>
> ** Prepare
>
> create table t1 ( id int primary key, test int );
> create table t2 ( id int primary key, test int );
>
> insert into t1 values (10,10);
> insert into t2 values (20,20);
>
> ** test
>
> step1/backend1:
> begin;
> update t1 set test=11 where id=10;
>
> step2/backend2:
>   begin;
>   update t2 set test=21 where id=20;
>
> step3/backend1:
>   update t2 set test=21 where id=20;
>
> step4/process2: /*deadlock detected*/
>   update t1 set test=11 where id=10;
>
>
>
>
> * Soft Block Case
>
> ** Prepare
>
> create table t1 ( id int primary key, test int );
>
> create table t2 ( id int primary key, test int );
>
> create table t3 ( id int primary key, test int );
>
> insert into t1 values (10,10);
> insert into t2 values (20,20);
> insert into t3 values (30,30);
>
> ** test
>
> step1/backend1: /*lock t1.row1*/
> begin;
> select * from t1 where id=10 for update;
>
>
> step2/backend2: /*lock t2.row1*/
> begin;
> select * from t2 where id=20 for no key update;
>
> step3/backend3: /*lock t2.row1*/
> begin;
> select * from t2 where id=20 for key share;
>
> step4/backend4:/*lock t3.row1*/
> begin;
> select * from t3 where id=30 for update;
>
> step5/backend4:/*try to lock t1.row1*/
> update t1 set id=11 where id=10;
>
> step6/backend3:/*try to lock t3.row1*/
> update t3 set id=31 where id=30;
>
> step7/backend5:/*try to lock t2.row1, hope to create a soft edge*/
> begin;
> update t2 set id=21 where id=20;
>
> step8/backend1:/*try to lock t2.row1*/   /*Expect to detect soft block,
> but there seems no soft block*/
> update t2 set test=21 where id=20;
>
>


Re: pgsql: Avoid spurious deadlocks when upgrading a tuple lock

2019-06-19 Thread Oleksii Kliukin
Alvaro Herrera  wrote:

> On 2019-Jun-18, Oleksii Kliukin wrote:
> 
>> Sorry, I was confused, as I was looking only at
>> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=de87a084c0a5ac927017cd0834b33a932651cfc9
>> 
>> without taking your subsequent commit that silences compiler warnings at
>> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=3da73d6839dc47f1f47ca57974bf28e5abd9b572
>> into consideration. With that commit, the danger is indeed in resetting the
>> skip mechanism on each jump and potentially causing deadlocks.
> 
> Yeah, I understand the confusion.
> 
> Anyway, as bugs go, this one seems pretty benign.  It would result in an
> unexplained deadlock, very rarely, and only for people who use a very
> strange locking pattern that includes (row-level) lock upgrades.  I
> think it also requires aborted savepoints, though I don't rule out
> the possibility that there might be a way to reproduce this without
> that.
> 
> I pushed the patch again just now, with the new permutation.

Thank you very much for working on it and committing the fix!

Cheers,
Oleksii



Re: Index Skip Scan

2019-06-19 Thread Dmitry Dolgov
> On Sun, Jun 16, 2019 at 5:03 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> > I also agree with James that this should not be limited to Index Only
> > Scans. From testing the patch, the following seems pretty strange to
> > me:
> > ...
> > explain analyze select distinct on (a) a,b from abc order by a,b;
> > explain analyze select distinct on (a) a,b,c from abc order by a,b;
> > ...
>
> Yes, but I believe this limitation is not intrinsic to the idea of the patch,
> and the very same approach can be used for IndexScan in the second example.
> I've already prepared a new version to enable it for IndexScan with minimal
> modifications, just need to rebase it on top of the latest changes and then
> can post it. Although still there would be some limitations I guess (e.g. the
> first thing I've stumbled upon is that an index scan with a filter wouldn't
> work well, because with a filter the qual checking happens after
> ExecScanFetch)

Here is what I was talking about: a POC for an integration with index scan.  As
for the use of create_skipscan_unique_path and the suggested planner
improvements, I hope that together with Jesper we can come up with something soon.


v18-0001-Index-skip-scan.patch
Description: Binary data


[proposal] de-TOAST'ing using a iterator

2019-06-19 Thread Binguo Bao
Hi hackers!
This proposal aims to provide the ability to de-TOAST a fully TOAST'd and
compressed field using an iterator and then update the appropriate parts of
the code to use the iterator where possible instead of de-TOAST'ing and
de-compressing the entire value. Examples where this can be helpful include
using position() from the beginning of the value, or doing a pattern or
substring match.

de-TOAST iterator overview:
1. The caller requests a slice of the attribute value from the de-TOAST
iterator.
2. The de-TOAST iterator checks whether the slice is available in the output
buffer; if it is, the result is returned directly, otherwise go to step 3.
3. The de-TOAST iterator checks whether the slice is available in the input
buffer; if it is, go to step 4.  Otherwise, call fetch_datum_iterator to fetch
datums from disk into the input buffer.
4. If the data in the input buffer is compressed, decompress data from the
input buffer into the output buffer until the caller's needs are met.

I've implemented the prototype and applied it to the position() function to
test performance.
Test tables:
-
create table detoast_c (id serial primary key,
a text
);
insert into detoast_c (a) select
repeat('1234567890-=abcdefghijklmnopqrstuvwxyz', 100)||'321' as a from
generate_series(1,100);

create table detoast_u (id serial primary key,
a text
);
alter table detoast_u alter a set storage external;
insert into detoast_u (a) select
repeat('1234567890-=abcdefghijklmnopqrstuvwxyz', 100)||'321' as a from
generate_series(1,100);
 query                                       | master (ms) | patch (ms)
---------------------------------------------+-------------+------------
 select position('123' in a) from detoast_c; |    4054.838 |   1440.735
 select position('321' in a) from detoast_c; |   25549.270 |  27696.245
 select position('123' in a) from detoast_u; |    8116.996 |   1386.802
 select position('321' in a) from detoast_u; |   28442.116 |  27672.319
It can be seen that the iterator greatly improves the efficiency of partial
de-TOAST while causing almost no degradation in full de-TOAST efficiency.
Next, I will continue to study how to apply iterators to more queries
and improve iterator efficiency, such as using macros instead of function
calls.

The patch is also available on github[1].
Any suggestions or comments would be much appreciated:)

Best regards, Binguo Bao.

[1] https://github.com/djydewang/postgres/pull/1/files
From b071bf9801f45d5ff48422b44bd5042ae19ea20c Mon Sep 17 00:00:00 2001
From: BBG 
Date: Tue, 4 Jun 2019 22:56:42 +0800
Subject: [PATCH] de-TOASTing using a iterator

---
 src/backend/access/heap/tuptoaster.c | 488 +++
 src/backend/utils/adt/varlena.c  |  48 ++--
 src/include/access/tuptoaster.h  |  92 +++
 3 files changed, 612 insertions(+), 16 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c
index 55d6e91..92dc87a 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -83,6 +83,13 @@ static int	toast_open_indexes(Relation toastrel,
 static void toast_close_indexes(Relation *toastidxs, int num_indexes,
 LOCKMODE lock);
 static void init_toast_snapshot(Snapshot toast_snapshot);
+static FetchDatumIterator create_fetch_datum_iterator(struct varlena *attr);
+static bool free_fetch_datum_iterator(FetchDatumIterator iter);
+static int32 fetch_datum_iterate(FetchDatumIterator iter);
+static void init_toast_buffer(ToastBuffer *buf, int size, bool compressed);
+static bool free_toast_buffer(ToastBuffer *buf);
+static int32 pglz_decompress_iterate(ToastBuffer *source, ToastBuffer *dest,
+	 DetoastIterator iter, int32 length);
 
 
 /* --
@@ -347,6 +354,145 @@ heap_tuple_untoast_attr_slice(struct varlena *attr,
 
 
 /* --
+ * create_detoast_iterator -
+ *
+ * Initialize detoast iterator.
+ * --
+ */
+DetoastIterator create_detoast_iterator(struct varlena *attr) {
+	struct varatt_external toast_pointer;
+	DetoastIterator iterator = NULL;
+	if 

Re: POC: Cleaning up orphaned files using undo logs

2019-06-19 Thread Dilip Kumar
On Wed, Jun 19, 2019 at 2:40 AM Robert Haas  wrote:
>
> On Tue, Jun 18, 2019 at 2:07 PM Robert Haas  wrote:
> > On Tue, Jun 18, 2019 at 7:31 AM Amit Kapila  wrote:
> > > [ new patches ]
> >
> > I tried writing some code that throws an error from an undo log
> > handler and the results were not good.
>
> I discovered another bothersome thing here: if you have a single
> transaction that generates a bunch of undo records, the first one has
> uur_dbid set correctly and the remaining records have uur_dbid set to
> 0.  That means if you try to write a sanity check like if
> (record->uur_dbid != MyDatabaseId) elog(ERROR, "the undo system messed
> up") it fails.
>
> The original idea of UnpackedUndoRecord was this: you would put a
> bunch of data into an UnpackedUndoRecord that you wanted written to
> undo, and the undo system would find ways to compress stuff out of the
> on-disk representation by e.g. omitting the fork number if it's
> MAIN_FORKNUM. Then, when you read an undo record, it would decompress
> so that you ended up with the same UnpackedUndoRecord that you had at
> the beginning. However, the inclusion of transaction headers has made
> this a bit confusing: that stuff isn't being added by the user but by
> the undo system itself. It's not very clear from the comments what the
> contract is around these things: do you need to set uur_dbid to
> MyDatabaseId when preparing to insert an undo record? Or can you just
> leave it unset and then it'll possibly be set at decoding time?  The
> comments for the UnpackedUndoRecord structure don't explain this.
>
> I'd really like to see this draw a cleaner distinction between the
> stuff that the user is expected to set and the other stuff we deal
> with internally to the undo subsystem.  For example, suppose that
> UnpackedUndoRecord didn't include any of the fields that are only
> present in the transaction header.  Maybe there's another structure,
> like UndoTransactionHeader, that includes those fields.  The client of
> the undo subsystem creates a bunch of UnpackedUndoRecords and inserts
> them.  At undo time, the callback gets back an identical set of
> UnpackedUndoRecords.  And maybe it also gets a pointer to the
> UndoTransactionHeader which contains all of the system-generated
> stuff. Under this scheme, uur_xid, uur_xidepoch (which still need to
> be combined into uur_fxid), uur_progress, uur_dbid, uur_next,
> uur_prevlogstart, and uur_prevurp would all move out of the
> UnpackedUndoRecord and into the UndoTransactionHeader. The user would
> supply none of those things when inserting undo records, but the
> rm_undo callback could examine those values if it wished.
>
> A weaker approach would be to at least clean up the structure
> definition so that the transaction-header fields set by the system are
> clearly segregated from the per-record fields set by the
> undo-inserter, with comments explaining that those fields don't need
> to be set but will (or may?) be set at undo time. That would be better
> than what we have right now - because it would hopefully make it much
> more clear which fields need to be set on insert and which fields can
> be expected to be set when decoding - but I think it's probably not
> going far enough.

I think it's a fair point.  We can keep pointers to
UndoRecordTransaction (urec_progress, dbid, uur_next) and
UndoRecordLogSwitch (urec_prevurp, urec_prevlogstart) in
UnpackedUndoRecord and include them whenever the undo record contains these
headers: the transaction header in the first record of the transaction,
and the log-switch header in the first record after an undo-log switch
during a transaction.  IMHO we can keep uur_fxid as part of the main
UnpackedUndoRecord, because as part of the other work ("Compression
for undo records to consider rmgrid, xid, cid, reloid for each record")
the FullTransactionId will be present in every UnpackedUndoRecord
(although it will not be stored in every undo record).
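
For illustration only, a rough standalone sketch of that layout (the field
names come from this discussion; the type names below are mere stand-ins,
not the actual definitions from the patch set):

#include <stdint.h>

/* Stand-in types for illustration only. */
typedef uint64_t UndoRecPtr;
typedef uint64_t FullTransactionId;
typedef uint32_t Oid;

/* Present only in the first record of a transaction. */
typedef struct UndoRecordTransaction
{
    uint32_t    urec_progress;
    Oid         urec_dbid;
    UndoRecPtr  urec_next;
} UndoRecordTransaction;

/* Present only in the first record after an undo-log switch. */
typedef struct UndoRecordLogSwitch
{
    UndoRecPtr  urec_prevurp;
    UndoRecPtr  urec_prevlogstart;
} UndoRecordLogSwitch;

typedef struct UnpackedUndoRecord
{
    FullTransactionId uur_fxid;     /* kept in every record */
    /* ... per-record fields supplied by the caller ... */

    /* NULL unless the corresponding optional header is present. */
    UndoRecordTransaction *uur_txn;
    UndoRecordLogSwitch   *uur_logswitch;
} UnpackedUndoRecord;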


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Typo in tableamapi.c

2019-06-19 Thread Magnus Hagander
On Wed, Jun 19, 2019 at 2:57 PM Daniel Gustafsson  wrote:

> s/hte/the/ fixed in the attached.
>

Might as well keep being a commit-pipeline for you today :) applied,
thanks!
-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Typo in tableamapi.c

2019-06-19 Thread Daniel Gustafsson
s/hte/the/ fixed in the attached.

cheers ./daniel



tableamapi_typo.patch
Description: Binary data


Re: Remove one last occurrence of "replication slave" in comments

2019-06-19 Thread Magnus Hagander
On Wed, Jun 19, 2019 at 2:35 PM Daniel Gustafsson  wrote:

> A Twitter thread today regarding the use of master/slave [1] made me
> curious
> and so I had a look.  It seems that commit
> a1ef920e27ba6ab3602aaf6d6751d8628
> replaced most instances but missed at least one which is fixed in the
> attached.
>

Applied, thanks.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Remove one last occurrence of "replication slave" in comments

2019-06-19 Thread Daniel Gustafsson
A Twitter thread today regarding the use of master/slave [1] made me curious
and so I had a look.  It seems that commit a1ef920e27ba6ab3602aaf6d6751d8628
replaced most instances but missed at least one which is fixed in the attached.

cheers ./daniel

[1] https://twitter.com/Xof/status/1141040942645776384



slave_comment.patch
Description: Binary data


Re: New EXPLAIN option: ALL

2019-06-19 Thread Daniel Gustafsson
> On 19 Jun 2019, at 08:15, Peter Eisentraut wrote:
> 
> On 2019-06-18 23:15, David Fetter wrote:
>> Are you proposing something along the lines of this?
>> 
>> PROFILE [statement]; /* Shows the plan */
>> PROFILE RUN [statement]; /* Actually executes the query */
> 
> No, it would be
> 
> EXPLAIN statement; /* Shows the plan */
> PROFILE statement; /* Actually executes the query */

That makes a lot of sense.

cheers ./daniel



How to produce a Soft Block case of Deadlock Detection?

2019-06-19 Thread Rui Hai Jiang
Hello, hackers.

Does anybody know how to produce a Soft Block case of Deadlock Detection?
I have produced the Hard Block case, but can't produce the Soft Block case.


I read the design: src/backend/storage/lmgr/README. It reads,

"If a process A is behind a process B in some lock's wait queue, and
their requested locks conflict, then we must say that A waits for B, since
ProcLockWakeup will never awaken A before B.  This creates additional
edges in the WFG.  We call these "soft" edges, as opposed to the "hard"
edges induced by locks already held.  Note that if B already holds any
locks conflicting with A's request, then their relationship is a hard edge
not a soft edge."


But after many tests, I couldn't figure out how to produce a Soft
Block.

Following is what I did.

* Hard Block Case

** Prepare

create table t1 ( id int primary key, test int );
create table t2 ( id int primary key, test int );

insert into t1 values (10,10);
insert into t2 values (20,20);

** test

step1/backend1:
begin;
update t1 set test=11 where id=10;

step2/backend2:
  begin;
  update t2 set test=21 where id=20;

step3/backend1:
  update t2 set test=21 where id=20;

step4/process2: /*deadlock detected*/
  update t1 set test=11 where id=10;




* Soft Block Case

** Prepare

create table t1 ( id int primary key, test int );

create table t2 ( id int primary key, test int );

create table t3 ( id int primary key, test int );

insert into t1 values (10,10);
insert into t2 values (20,20);
insert into t3 values (30,30);

** test

step1/backend1: /*lock t1.row1*/
begin;
select * from t1 where id=10 for update;


step2/backend2: /*lock t2.row1*/
begin;
select * from t2 where id=20 for no key update;

step3/backend3: /*lock t2.row1*/
begin;
select * from t2 where id=20 for key share;

step4/backend4:/*lock t3.row1*/
begin;
select * from t3 where id=30 for update;

step5/backend4:/*try to lock t1.row1*/
update t1 set id=11 where id=10;

step6/backend3:/*try to lock t3.row1*/
update t3 set id=31 where id=30;

step7/backend5:/*try to lock t2.row1, hope to create a soft edge*/
begin;
update t2 set id=21 where id=20;

step8/backend1:/*try to lock t2.row1*/   /*Expect to detect soft block, but
there seems no soft block*/
update t2 set test=21 where id=20;


Re: [PATCH] Stop ALTER SYSTEM from making bad assumptions

2019-06-19 Thread Amit Kapila
On Wed, Jun 19, 2019 at 10:27 AM Ian Barwick wrote:
>
> On 6/18/19 12:41 AM, Stephen Frost wrote:
>  > Greetings,
>  >
>  > * Ian Barwick (ian.barw...@2ndquadrant.com) wrote
> (...)
>
>  >> I suggest explicitly documenting postgresql.auto.conf behaviour (and the 
> circumstances
>  >> where it's acceptable to modify it outside of ALTER SYSTEM calls) in the 
> documentation
>  >> (and possibly in the code), so anyone writing utilities which need to
>  >> append to postgresql.auto.conf knows what the situation is.
>  >
>  > Yeah, I would think that, ideally, we'd have some code in the common
>  > library that other utilities could leverage and which the backend would
>  > also use.
>
> So maybe something along the lines of creating a stripped-down variant of
> AlterSystemSetConfigFile() (from "src/backend/utils/misc/guc.c") which can be
> called from front-end code to safely modify .auto.conf while the server is 
> *not*
> running.
>
> I'm not terribly familiar with the GUC code, but that would presumably mean 
> making
> parts of the GUC parsing/handling code linkable externally (ParseConfigFp() 
> etc.)
>

Yeah, this looks a bit tricky as we can't use ereport in the frontend
code and that is used at multiple places in that code path.

> as you'd need to parse the file before rewriting it. Something like (minimal
> pseudo-code):
>
>  void
>  alter_system_set(char *name, char *value)
>  {
>  /*
>   * check here that the server is *not* running
>   */
>  ...
>  ParseConfigFp(infile, AutoConfFileName, 0, LOG, &head, &tail)
>  ...
>
>  /*
>   * some robust portable way of ensuring another process can't
>   * modify the file(s) until we're done
>   */
>  lock_file(AutoConfFileName);
>
>  replace_auto_config_value(&head, &tail, name, value);
>
>  write_auto_conf_file(AutoConfTmpFileName, head)
>
>  durable_rename(AutoConfTmpFileName, AutoConfFileName, ERROR);
>
>  FreeConfigVariables(head);
>  unlock_file(AutoConfFileName);
>  }
>
> I'm not sure how feasible it is to validate the provided parameter
> without the server running, but if not possible, that's not any worse than the
> status quo, i.e. the utility has to be trusted to write the correct parameters
> to the file anyway.
>

Right.  I think even if someone has given wrong values, they will get
detected on the next reload.

> The question is though - is this a change which is practical to make at this 
> point
> in the release cycle for Pg12?
>

It depends on the solution/patch we come up with to solve this issue.
What is the alternative?  If we allow ALTER SYSTEM to remove the
duplicate entries and call the current situation good, then we are, in a
way, leaving room for similar or worse problems in the future.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: PG 12 beta 1 segfault during analyze

2019-06-19 Thread Dagfinn Ilmari Mannsåker
Andres Freund  writes:

> Hi,
>
> On 2019-06-17 21:46:02 -0400, Steve Singer wrote:
>> On 6/15/19 10:18 PM, Tom Lane wrote:
>> > Steve Singer  writes:
>> > > I encountered the following segfault when running against a  PG 12 beta1
>> > > during a analyze against a table.
>> > Nobody else has reported this, so you're going to have to work on
>> > producing a self-contained test case, or else debugging it yourself.
>> > 
>> >regards, tom lane
>> > 
>> > 
>> > 
>> The attached patch fixes the issue.
>
> Thanks for the bug report, diagnosis and patch. Pushed.

I was going to suggest trying to prevent similar bugs by declaring these
and other output parameters as `double *const foo` in tableam.h, but
doing that without adding the corresponding `const` in heapam_handler.c
doesn't even raise a warning.

Still, declaring them as *const in both places might serve as an
example/reminder for people writing their own table AMs.
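
For illustration only, a self-contained sketch of the effect (this is not
the actual tableam code; the function and parameter names are made up):

#include <stdio.h>

void fill_stats(double *const totalrows);   /* what the header would declare */

void
fill_stats(double *totalrows)               /* definition quietly drops the const */
{
    /*
     * Top-level qualifiers on parameters are ignored for type compatibility
     * (C11 6.7.6.3p15), so this mismatch compiles without any warning.
     */
    totalrows = NULL;       /* clobbering the pointer itself goes unnoticed here; */
    (void) totalrows;       /* with *const in the definition it would be rejected  */
}

int
main(void)
{
    double rows = 0;
    fill_stats(&rows);
    printf("%f\n", rows);   /* still 0: the output parameter was never set */
    return 0;
}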

- ilmari
-- 
"I use RMS as a guide in the same way that a boat captain would use
 a lighthouse.  It's good to know where it is, but you generally
 don't want to find yourself in the same spot." - Tollef Fog Heen




Re: idea: log_statement_sample_rate - bottom limit for sampling

2019-06-19 Thread Adrien Nayrat
On 6/18/19 8:29 PM, Pavel Stehule wrote:
> 
> 
> út 18. 6. 2019 v 14:03 odesílatel Adrien Nayrat  > napsal:
> 
> Hi,
> 
> I tried the patch, here my comment:
> 
> > gettext_noop("Zero effective disables sampling. "
> >                          "-1 use sampling every time (without limit)."),
> 
> I do not agree with the zero case. In fact, sampling is disabled as soon 
> as
> setting is less than log_min_duration_statements. Furthermore, I think we 
> should
> provide a more straightforward description for users.
> 
> 
> That's true, but I have no idea how to describe it in one line. In this
> case zero is a corner case, and sampling is disabled without any dependency on
> log_min_duration_statement.
>  
> I think this design has only a few useful values and ranges:
> 
> a) higher than log_min_duration_statement .. sampling is active, with a limit
> b) 0 .. another way to effectively disable sampling, with no dependency on
> other settings
> c) -1 or any negative value .. sampling is applied every time.
> 
> Sure, there is the range (0..log_min_duration_statement), but in that range the
> value makes no sense. I don't think this case can be covered in the short
> description, but it should be mentioned in the documentation.

Yes, it took me a while to understand :) I am OK with keeping the GUC
description simple and giving more information in the documentation.

> 
> 
> I changed few comments and documentation:
> 
>   * We added much more logic to this function with statement and transaction
> sampling, and now with statement_sample_rate it is not easy to understand the
> logic at first glance. I reworded the comment in check_log_duration; I hope
> it is more straightforward.
> 
>   * I am not sure "every_time" is a good name for the variable. In fact, if
> the duration exceeds the limit, we disable sampling. Maybe sampling_disabled
> is clearer?
> 
> 
> For me the important line is the following:
> 
> (exceeded && (in_sample || every_time))
> 
> I think "every_time" or "always" or "every" is more illustrative in this
> context than "sampling_disabled". But my opinion is not strong in this
> case, and I have no problem accepting the common opinion.

Oh, yes, that's correct. I don't have a strong opinion either. Maybe someone
else will have a better idea.

-- 
Adrien





Re: New EXPLAIN option: ALL

2019-06-19 Thread Gavin Flower

On 19/06/2019 18:15, Peter Eisentraut wrote:

> On 2019-06-18 23:15, David Fetter wrote:
>> Are you proposing something along the lines of this?
>>
>> PROFILE [statement]; /* Shows the plan */
>> PROFILE RUN [statement]; /* Actually executes the query */
>
> No, it would be
>
> EXPLAIN statement; /* Shows the plan */
> PROFILE statement; /* Actually executes the query */


I think that looks good, and the verbs seem entirely appropriate, IMnsHO.





Re: Some reloptions non-initialized when loaded

2019-06-19 Thread Kuntal Ghosh
Hello,

On Wed, Jun 19, 2019 at 10:23 AM Michael Paquier  wrote:
>
> Hi all,
>
> While looking at this code, I have noticed that a couple of reloptions
> which are not toast-specific don't get properly initialized.
> toast_tuple_target and parallel_workers are the ones standing out.
>
Do we also need to initialize vacuum_cleanup_index_scale_factor?



-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com




Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2019-06-19 Thread Paul Guo
On Mon, May 27, 2019 at 9:39 PM Paul Guo  wrote:

>
>
> On Tue, May 14, 2019 at 11:06 AM Kyotaro HORIGUCHI <
> horiguchi.kyot...@lab.ntt.co.jp> wrote:
>
>> Hello.
>>
>> At Mon, 13 May 2019 17:37:50 +0800, Paul Guo  wrote in <
>> caeet0zf9yn4daxyuflzocayyxuff1ms_oqwea+rwv3gha5q...@mail.gmail.com>
>> > Thanks for the reply.
>> >
>> > On Tue, May 7, 2019 at 2:47 PM Kyotaro HORIGUCHI <
>> > horiguchi.kyot...@lab.ntt.co.jp> wrote:
>> >
>> > >
>> > > +  if (!(stat(parent_path, ) == 0 && S_ISDIR(st.st_mode)))
>> > > +  {
>> > >
>> > > This patch is allowing missing source and destination directory
>> > > even in consistent state. I don't think it is safe.
>> > >
>> >
>> > I do not understand this. Can you elaborate?
>>
>> Suppose we were recoverying based on a backup at LSN1 targeting
>> to LSN3 then it crashed at LSN2, where LSN1 < LSN2 <= LSN3. LSN2
>> is called as "consistency point", before where the database is
>> not consistent. It's because we are applying WAL recored older
>> than those that were already applied in the second trial. The
>> same can be said for crash recovery, where LSN1 is the latest
>> checkpoint ('s redo LSN) and LSN2=LSN3 is the crashed LSN.
>>
>> Creation of an existing directory or dropping of a non-existent
>> directory are apparently inconsistent or "broken" so we should
>> stop recovery when seeing such WAL records while database is in
>> consistent state.
>>
>
> This seems hard to detect. I thought about using the invalid_page mechanism
> a while ago,
> but it seems hard to fully detect a dropped tablespace.
>
> > > > 2) Fixed dbase_desc(). Now the xlog output looks correct.
>> > > >
>> > > > rmgr: Databaselen (rec/tot): 42/42, tx:486, lsn:
>> > > > 0/016386A8, prev 0/01638630, desc: CREATE copy dir base/1 to
>> > > > pg_tblspc/16384/PG_12_201904281/16386
>> > > >
>> > > > rmgr: Databaselen (rec/tot): 34/34, tx:487, lsn:
>> > > > 0/01638EB8, prev 0/01638E40, desc: DROP dir
>> > > > pg_tblspc/16384/PG_12_201904281/16386
>> > >
>> > > WAL records don't convey such information. The previous
>> > > description seems right to me.
>> > >
>> >
>> > 2019-04-17 14:52:14.951 CST [23030] CONTEXT:  WAL redo at 0/3011650 for
>> > Database/CREATE: copy dir 1663/1 to 65546/65547
>> > The directories are definitely wrong and misleading.
>>
>> The original description is right in the light of how the server
>> recognizes. The record exactly says that "copy dir 1663/1 to
>> 65546/65547" and the latter path is converted in filesystem layer
>> via a symlink.
>>
>
> In either $PG_DATA/pg_tblspc or the symlinked real tablespace directory,
> there is an additional directory like PG_12_201905221 between the
> tablespace oid and the database oid. See the directory layout below;
> that is why the directory info in the xlog dump output was not correct.
>
> $ ls -lh data/pg_tblspc/
> total 0
> lrwxrwxrwx. 1 gpadmin gpadmin 6 May 27 17:23 16384 -> /tmp/2
>
> $ ls -lh /tmp/2
> total 0
> drwx------. 3 gpadmin gpadmin 18 May 27 17:24 PG_12_201905221
>
>>
>>
>> > > > > Also I'd suggest we should use pg_mkdir_p() in
>> > > TablespaceCreateDbspace()
>> > > > > to replace
>> > > > > the code block includes a lot of
>> > > > > get_parent_directory(), MakePGDirectory(), etc even it
>> > > > > is not fixing a bug since pg_mkdir_p() code change seems to be
>> more
>> > > > > graceful and simpler.
>> > >
>> > > But I don't agree to this. pg_mkdir_p goes above two-parents up,
>> > > which would be unwanted here.
>> > >
>> > I do not understand this also. pg_mkdir_p() is similar to 'mkdir -p'.
>> > This change just makes the code concise. Though in theory the change is
>> > not needed.
>>
>> We don't want to create tablespace direcotory after concurrent
>> DROPing, as the comment just above is saying:
>>
>> |  * Acquire TablespaceCreateLock to ensure that no DROP TABLESPACE
>> |  * or TablespaceCreateDbspace is running concurrently.
>>
>> If the concurrent DROP TABLESPACE destroyed the grand parent
>> directory, we mustn't create it again.
>>
>
> Yes, this is a good reason to keep the original code. Thanks.
>
> By the way, based on your previous test patch I added another test which
> could easily detect
> the missing src directory case.
>
>

I updated the patch to v3.  In this version, we skip the error if copydir
fails due to a missing src/dst directory, but to make sure that ignoring it
is legal, I added a simple log/forget mechanism (using a List), similar to
the xlog invalid-page checking mechanism.  Two TAP tests are included: one is
from a previous patch by Kyotaro earlier in this thread, and the other was
added by me.  In addition, dbase_desc() is fixed to make the message accurate.

Thanks.


v3-0001-skip-copydir-if-either-src-directory-or-dst-direc.patch
Description: Binary data


Re: POC: Cleaning up orphaned files using undo logs

2019-06-19 Thread Amit Kapila
On Tue, Jun 18, 2019 at 11:37 PM Robert Haas  wrote:
>
> On Tue, Jun 18, 2019 at 7:31 AM Amit Kapila  wrote:
> > [ new patches ]
>
> I tried writing some code that throws an error from an undo log
> handler and the results were not good.  It appears that the code will
> retry in a tight loop:
>
> 2019-06-18 13:58:53.262 EDT [42803] ERROR:  robert_undo
> 2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
> 2019-06-18 13:58:53.263 EDT [42803] ERROR:  robert_undo
..
>
> It seems clear that the error-handling aspect of this patch has not
> been given enough thought.  It's debatable what strategy should be
> used when undo fails, but retrying 40 times per millisecond isn't the
> right answer.
>

The reason for that is that currently the undo worker keeps on
executing requests as long as there are any.  I think this is good when
the requests are different, but fetching the same request from the error
queue and executing it again and again doesn't seem good, and I don't
think it will help either.

> I assume we want some kind of cool-down between retries.
> 10 seconds?  A minute?  Some kind of back-off algorithm that gradually
> increases the retry time up to some maximum?
>

Yeah, something along these lines would be good.  How about adding a
failure_count to each request in the error queue?  It would get
incremented on each retry, and we could wait in proportion to it, say
10s after the first retry, 20s after the second, and so on, with a
maximum of 10 failures (100s) allowed, after which the worker would
exit, considering that it has no more work to do.
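
For illustration only, a minimal standalone sketch of that linear back-off
(the struct, the constants, and the function below are invented for the
example and are not part of the patch set):

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical error-queue entry; field names follow the proposal above. */
typedef struct UndoErrorRequest
{
    int64_t next_retry_time;    /* e.g. microseconds since epoch */
    int     failure_count;
} UndoErrorRequest;

#define RETRY_STEP_USECS    (10LL * 1000 * 1000)    /* 10 s per failure */
#define MAX_FAILURE_COUNT   10                      /* give up after ~100 s */

/* Returns false once the request has exhausted its retries. */
static bool
schedule_retry(UndoErrorRequest *req, int64_t now)
{
    if (++req->failure_count > MAX_FAILURE_COUNT)
        return false;           /* caller: stop retrying, let the worker exit */

    /* linear back-off: 10 s, 20 s, ... up to 100 s */
    req->next_retry_time = now + (int64_t) req->failure_count * RETRY_STEP_USECS;
    return true;
}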

Actually, we also need to think about what we should do with such
requests, because even if the undo worker exits after retrying some
threshold number of times, the undo launcher will launch a new
worker for this request again unless we have some special handling for
it.

We can issue a WARNING once a particular request has reached the
maximum number of retries, but I am not sure that is enough, because the
user might not notice it or might not take any action.  Do we want
to PANIC at some point, and if so, when?  The other alternative is to
keep retrying at regular intervals until we succeed.

>  Should there be one or
> more GUCs?
>

Yeah, we could do that, something like undo_apply_error_retry_count, but
I am not completely sure about it; maybe some predefined number, say
10 or 20, would be enough.  However, I am fine with it if you or others
think that a GUC can help users in this case.

> Another thing that is not very nice is that when I tried to shut down
> the server via 'pg_ctl stop' while the above was happening, it did not
> shut down.  I had to use an immediate shutdown.  That's clearly not
> OK.
>

CHECK_FOR_INTERRUPTS is missing at one place, will fix.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: New EXPLAIN option: ALL

2019-06-19 Thread Peter Eisentraut
On 2019-06-18 23:15, David Fetter wrote:
> Are you proposing something along the lines of this?
> 
> PROFILE [statement]; /* Shows the plan */
> PROFILE RUN [statement]; /* Actually executes the query */

No, it would be

EXPLAIN statement; /* Shows the plan */
PROFILE statement; /* Actually executes the query */

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services