Re: [HACKERS] Fix bloom WAL tap test

2017-11-13 Thread Alexander Korotkov
Hi!

On Fri, Nov 10, 2017 at 9:12 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> I wrote:
> > Is there anything we can do to cut the runtime of the TAP test to
> > the point where running it by default wouldn't be so painful?
>
> As an experiment, I tried simply cutting the size of the test table 10X:
>
> diff --git a/contrib/bloom/t/001_wal.pl b/contrib/bloom/t/001_wal.pl
> index 1b319c9..566abf9 100644
> --- a/contrib/bloom/t/001_wal.pl
> +++ b/contrib/bloom/t/001_wal.pl
> @@ -57,7 +57,7 @@ $node_standby->start;
>  $node_master->safe_psql("postgres", "CREATE EXTENSION bloom;");
>  $node_master->safe_psql("postgres", "CREATE TABLE tst (i int4, t
> text);");
>  $node_master->safe_psql("postgres",
> -"INSERT INTO tst SELECT i%10, substr(md5(i::text), 1, 1) FROM
> generate_series(1,100000) i;"
> +"INSERT INTO tst SELECT i%10, substr(md5(i::text), 1, 1) FROM
> generate_series(1,10000) i;"
>  );
>  $node_master->safe_psql("postgres",
> "CREATE INDEX bloomidx ON tst USING bloom (i, t) WITH (col1 =
> 3);");
> @@ -72,7 +72,7 @@ for my $i (1 .. 10)
> test_index_replay("delete $i");
> $node_master->safe_psql("postgres", "VACUUM tst;");
> test_index_replay("vacuum $i");
> -   my ($start, $end) = (100001 + ($i - 1) * 10000, 100000 + $i *
> 10000);
> +   my ($start, $end) = (10001 + ($i - 1) * 1000, 10000 + $i * 1000);
> $node_master->safe_psql("postgres",
>  "INSERT INTO tst SELECT i%10, substr(md5(i::text), 1, 1) FROM
> generate_series($start,$end) i;"
> );
>
> This about halved the runtime of the TAP test, and it changed the coverage
> footprint not at all according to lcov.  (Said coverage is only marginally
> better than what we get without running the bloom TAP test, AFAICT.)
>
> It seems like some effort could be put into both shortening this test
> and improving the amount of code it exercises.
>

Thank you for committing the patch which fixes the TAP test.
I'll try to improve the coverage of this test and reduce its run time.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] How to implement a SP-GiST index as a extension module?

2017-11-13 Thread Alexander Korotkov
Hi!

On Mon, Nov 13, 2017 at 6:47 AM, Connor Wolf <connorw@imaginaryindustries.com>
wrote:

> Ok, I've managed to get my custom index working.
>

Good!

> It's all on github here: https://github.com/fake-name/pg-spgist_hamming, if
> anyone else needs a fuzzy-image searching system
> that can integrate into postgresql..
>
> It should be a pretty good basis for anyone else to use if they want to
> implement a SP-GiST index too.
>

I took a look at the code, and I'm a bit confused :)
It appears that you're indexing int8 values.  That seems like an
unrealistically short representation for an image signature.
Also, the name of the repository made me think that Hamming distance would
be used to compare signatures.  But after looking at the code, I see that
the plain absolute value of the difference is used for that purpose.

static double
getDistance(Datum v1, Datum v2)
{
int64_t a1 = DatumGetInt64(v1);
int64_t a2 = DatumGetInt64(v2);
int64_t diff = Abs(a1 - a2);
fprintf_to_ereport("getDistance %ld <-> %ld : %ld", a1, a2, diff);
return diff;
}

For such a notion of distance you don't need a VP-tree or any other complex
indexing; a B-tree is quite enough in this case.  Alternatively, the
distance function is not doing what it was meant to do.
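For comparison, a true Hamming distance between two int8 signatures counts differing bits rather than taking the numeric difference. A minimal sketch (hypothetical code, not from the repository; assumes GCC/Clang's __builtin_popcountll):

```c
#include <assert.h>
#include <stdint.h>

/* Hamming distance between two 64-bit signatures: the number of bit
 * positions in which they differ, i.e. the popcount of their XOR. */
static inline double
hamming_distance(int64_t a1, int64_t a2)
{
    return (double) __builtin_popcountll((uint64_t) a1 ^ (uint64_t) a2);
}
```

Note the difference from Abs(a1 - a2): two signatures differing only in the high bit are maximally far apart numerically, but only distance 1 apart in Hamming space, which is why a plain B-tree cannot index this metric.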

It would be useful if you provided a complete usage example of this
extension: from image-to-signature conversion to search queries.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] index-only count(*) for indexes supporting bitmap scans

2017-11-13 Thread Alexander Kuzmenkov

I've pushed the executor part of this, but mostly not the planner part,
because I didn't think the latter was anywhere near ready for prime
time: the regression test changes it was causing were entirely bogus.


Hi Tom,

Thanks for the commit and the explanation. I'll try to address your 
comments if I continue working on the planner part.


--

Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Fix bloom WAL tap test

2017-11-09 Thread Alexander Korotkov
On Wed, Nov 8, 2017 at 5:46 AM, Masahiko Sawada <sawada.m...@gmail.com>
wrote:

> > So I think
> > that you should instead do something like that:
> >
> > --- a/contrib/bloom/Makefile
> > +++ b/contrib/bloom/Makefile
> > @@ -20,5 +20,12 @@ include $(top_builddir)/src/Makefile.global
> >  include $(top_srcdir)/contrib/contrib-global.mk
> >  endif
> >
> > +installcheck: wal-installcheck
> > +
> > +check: wal-check
> > +
> > +wal-installcheck:
> > +   $(prove_installcheck)
> > +
> >  wal-check: temp-install
> > $(prove_check)
> >
> > This works for me as I would expect it should.
>
> Looks good to me too.


OK, then so be it :)

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


wal-check-on-bloom-check-3.patch
Description: Binary data



Re: [HACKERS] Fix bloom WAL tap test

2017-11-07 Thread Alexander Korotkov
Hi!

On Tue, Nov 7, 2017 at 4:34 PM, Masahiko Sawada <sawada.m...@gmail.com>
wrote:

> I understood the necessity of this patch and reviewed two patches.
>

Good, thank you.


> For /fix-bloom-wal-check.patch, it looks good to me. I found no
> problem. But for wal-check-on-bloom-check.patch, if you want to run
> wal-check even when running 'make check' or 'make check-world', we can
> just add wal-check test as follows?
>
> diff --git a/contrib/bloom/Makefile b/contrib/bloom/Makefile
> index 13bd397..c1566d4 100644
> --- a/contrib/bloom/Makefile
> +++ b/contrib/bloom/Makefile
> @@ -20,5 +20,7 @@ include $(top_builddir)/src/Makefile.global
>  include $(top_srcdir)/contrib/contrib-global.mk
>  endif
>
> +check: wal-check
> +
>  wal-check: temp-install
> $(prove_check)
>

I've tried this version of the Makefile.  The only difference I've seen is
that, when TAP tests are enabled, this version of the Makefile runs the TAP
tests before the regression tests, while my version runs them after the
regression tests.  The latter seems like more desirable behavior to me.  But
it would still be nice to simplify the Makefile.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Fix bloom WAL tap test

2017-11-07 Thread Alexander Korotkov
On Tue, Nov 7, 2017 at 4:26 PM, Fabrízio Mello <fabriziome...@gmail.com>
wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  not tested
> Implements feature:       not tested
> Spec compliant:           not tested
> Documentation:            not tested
>
> The patch doesn't apply against master:
>
> fabrizio@macanudo:/d/postgresql (master)
> $ git apply /tmp/wal-check-on-bloom-check.patch
> error: contrib/bloom/Makefile: already exists in working directory
>

Apparently I sent patches which are OK for "patch -p1", but not for "git
apply".  Sorry for that.  I resubmitted both of them.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


fix-bloom-wal-check-2.patch
Description: Binary data


wal-check-on-bloom-check-2.patch
Description: Binary data



Re: [HACKERS] Display number of heap accesses for index scans

2017-11-06 Thread Alexander Korotkov
On Mon, Nov 6, 2017 at 6:33 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Peter Geoghegan <p...@bowt.ie> writes:
> > Andres Freund <and...@anarazel.de> wrote:
> >> The number of index lookups that failed to return anything can be a
> >> critical performance factor in OLTP workloads.  Therefore it seems like
> >> it'd be a good idea to extend the explain analyze output to include that
> >> information.
>
> > I certainly agree.
>
> Doesn't the EXPLAIN (BUFFERS) output already address this?
>

In a plain index scan, EXPLAIN (ANALYZE, BUFFERS) doesn't distinguish
buffers accessed in the index from buffers accessed in the heap.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] How to implement a SP-GiST index as a extension module?

2017-11-03 Thread Alexander Korotkov
On Fri, Nov 3, 2017 at 12:37 PM, Connor Wolf <
conn...@imaginaryindustries.com> wrote:

> EDIT: That's actually exactly how the example I'm working off of works.
> DERP. The SQL is
>
> CREATE TYPE vptree_area AS
> (
> center _int4,
> distance float8
> );
>
> CREATE OR REPLACE FUNCTION vptree_area_match(_int4, vptree_area) RETURNS
> boolean AS
> 'MODULE_PATHNAME','vptree_area_match'
> LANGUAGE C IMMUTABLE STRICT;
>
> CREATE OPERATOR <@ (
> LEFTARG = _int4,
> RIGHTARG = vptree_area,
> PROCEDURE = vptree_area_match,
> RESTRICT = contsel,
> JOIN = contjoinsel);
>
> so I just need to understand how to parse out the custom type in my index
> operator.
>

You can see the implementation of the vptree_area_match function in
vptree.c.  It just calls the GetAttributeByNum() function.

There is also an alternative approach to this, implemented in the pg_trgm
contrib module.  It has a "text % text" operator which checks whether two
strings are similar enough.  The similarity threshold is defined by the
pg_trgm.similarity_threshold GUC.  Thus, you could also define a GUC with a
threshold distance value.  However, that would impose some limitations.  For
instance, you wouldn't be able to use different distance thresholds in the
same query.
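As an illustration only (hypothetical names, not pg_trgm's actual code), the GUC-threshold approach boils down to a boolean match operator that compares a distance against a module-level setting, which is exactly why one threshold applies to the whole session rather than per query:

```c
#include <stdbool.h>
#include <stdint.h>

/* In a real extension this would be registered as a GUC via
 * DefineCustomRealVariable(); here it is just a module-level variable. */
static double signature_distance_threshold = 4.0;

/* Boolean "similar enough" operator in the spirit of pg_trgm's
 * "text % text": true when the Hamming distance between two 64-bit
 * signatures is within the threshold. */
static bool
signature_match(int64_t a, int64_t b)
{
    int distance = __builtin_popcountll((uint64_t) a ^ (uint64_t) b);
    return distance <= signature_distance_threshold;
}
```

The trade-off mentioned above is visible here: since the threshold is session state rather than an operator argument, two predicates in the same query cannot use different thresholds.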

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-11-03 Thread Alexander Korotkov
On Wed, Nov 1, 2017 at 5:55 AM, Masahiko Sawada <sawada.m...@gmail.com>
wrote:

> On Tue, Oct 31, 2017 at 6:17 PM, Alexander Korotkov
> <a.korot...@postgrespro.ru> wrote:
> > On Tue, Oct 31, 2017 at 5:16 AM, Masahiko Sawada <sawada.m...@gmail.com>
> > wrote:
> >>
> >> On Mon, Oct 30, 2017 at 10:16 PM, Robert Haas <robertmh...@gmail.com>
> >> wrote:
> >> > On Tue, Oct 24, 2017 at 1:26 PM, Ivan Kartyshov
> >> > <i.kartys...@postgrespro.ru> wrote:
> >> >> Hello. I made some bugfixes and rewrite the patch.
> >> >
> >> > I don't think it's a good idea to deliberately leave the state of the
> >> > standby different from the state of the  master on the theory that it
> >> > won't matter.  I feel like that's something that's likely to come back
> >> > to bite us.
> >>
> >> I agree with Robert. What happens if we intentionally don't apply the
> >> truncation WAL and then switch over? If we insert a tuple on the new
> >> master server into a block that has been truncated on the old master,
> >> won't the WAL apply on the new standby fail? I guess there are such
> >> corner cases causing failures of WAL replay after switch-over.
> >
> >
> > Yes, that looks dangerous.  One approach to cope with that could be
> > teaching the heap redo function to handle such situations.  But I expect
> > this approach to be criticized for bad design.  And I understand the
> > fairness of this criticism.
> >
> > However, from the user's perspective, the current behavior of
> > hot_standby_feedback is just broken, because it both increases bloat and
> > doesn't guarantee that a read-only query on the standby won't be
> > cancelled because of vacuum.  Therefore, we should be looking for a
> > solution: if one approach isn't good enough, then we should look for
> > another approach.
> >
> > I can propose the following alternative approach: teach read-only
> > queries on hot standby to tolerate concurrent relation truncation.
> > Then, when a non-existent heap page is accessed on hot standby, we can
> > know that it was deleted by concurrent truncation and should be assumed
> > to be empty.  Any thoughts?
> >
>
> You also meant that the applying WAL for AccessExclusiveLock is always
> skipped on standby servers to allow scans to access the relation?


Definitely not every AccessExclusiveLock WAL record should be skipped, but
only those which were emitted during heap truncation.  There are other cases
when AccessExclusiveLock WAL records are emitted, for instance during DDL
operations.  But I'd like to focus on the AccessExclusiveLock WAL records
caused by VACUUM for now.  It's understandable to users that DDL might
cancel a read-only query on the standby.  So, if you're running a long
report query, then you should wait before running your DDL.  But VACUUM is a
different story.  It runs automatically when you do normal DML queries.

The AccessExclusiveLock WAL records produced by VACUUM could either not be
emitted at all, or be somehow distinguished and skipped on the standby.  I
haven't thought out that level of detail yet.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-11-03 Thread Alexander Korotkov
On Thu, Nov 2, 2017 at 5:56 AM, Robert Haas <robertmh...@gmail.com> wrote:

> On Tue, Oct 31, 2017 at 2:47 PM, Alexander Korotkov
> <a.korot...@postgrespro.ru> wrote:
> > However, from the user's perspective, the current behavior of
> > hot_standby_feedback is just broken, because it both increases bloat and
> > doesn't guarantee that a read-only query on the standby won't be
> > cancelled because of vacuum.  Therefore, we should be looking for a
> > solution: if one approach isn't good enough, then we should look for
> > another approach.
> >
> > I can propose the following alternative approach: teach read-only
> > queries on hot standby to tolerate concurrent relation truncation.
> > Then, when a non-existent heap page is accessed on hot standby, we can
> > know that it was deleted by concurrent truncation and should be assumed
> > to be empty.  Any thoughts?
>
> Sounds like it might break MVCC?


I don't see why it would be broken.  VACUUM truncates the heap only when the
tail to be truncated is already empty.  By the time the truncate WAL record
is applied, the previous WAL records deleting all the tuples in that tail
have already been applied.  Thus, even if MVCC were broken and we missed
some tuples after the heap truncation, they were already gone before the
heap truncation anyway.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-10-31 Thread Alexander Korotkov
On Tue, Oct 31, 2017 at 5:16 AM, Masahiko Sawada <sawada.m...@gmail.com>
wrote:

> On Mon, Oct 30, 2017 at 10:16 PM, Robert Haas <robertmh...@gmail.com>
> wrote:
> > On Tue, Oct 24, 2017 at 1:26 PM, Ivan Kartyshov
> > <i.kartys...@postgrespro.ru> wrote:
> >> Hello. I made some bugfixes and rewrite the patch.
> >
> > I don't think it's a good idea to deliberately leave the state of the
> > standby different from the state of the  master on the theory that it
> > won't matter.  I feel like that's something that's likely to come back
> > to bite us.
>
> I agree with Robert. What happens if we intentionally don't apply the
> truncation WAL and then switch over? If we insert a tuple on the new
> master server into a block that has been truncated on the old master,
> won't the WAL apply on the new standby fail? I guess there are such
> corner cases causing failures of WAL replay after switch-over.
>

Yes, that looks dangerous.  One approach to cope with that could be teaching
the heap redo function to handle such situations.  But I expect this
approach to be criticized for bad design, and I understand the fairness of
this criticism.

However, from the user's perspective, the current behavior of
hot_standby_feedback is just broken, because it both increases bloat and
doesn't guarantee that a read-only query on the standby won't be cancelled
because of vacuum.  Therefore, we should be looking for a solution: if one
approach isn't good enough, then we should look for another approach.

I can propose the following alternative approach: teach read-only queries on
hot standby to tolerate concurrent relation truncation.  Then, when a
non-existent heap page is accessed on hot standby, we can know that it was
deleted by concurrent truncation and should be assumed to be empty.  Any
thoughts?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed

2017-10-31 Thread Alexander Korotkov
On Sun, Oct 29, 2017 at 12:47 AM, Pavel Stehule <pavel.steh...@gmail.com>
wrote:

> 2017-10-28 23:35 GMT+02:00 Alexander Korotkov <a.korot...@postgrespro.ru>:
>
>> On Sat, Oct 28, 2017 at 3:46 PM, Pavel Stehule <pavel.steh...@gmail.com>
>> wrote:
>>
>>> 2017-09-22 21:31 GMT+02:00 Pavel Stehule <pavel.steh...@gmail.com>:
>>>
>>>>
>>>>
>>>> 2017-09-22 21:12 GMT+02:00 Peter Eisentraut <
>>>> peter.eisentr...@2ndquadrant.com>:
>>>>
>>>>> On 9/22/17 09:16, Pavel Stehule wrote:
>>>>> > Example: somebody set SORT_COLUMNS to schema_name value. This is
>>>>> > nonsense for \l command
>>>>> >
>>>>> > Now, I am thinking so more correct and practical design is based on
>>>>> > special mode, activated by variable
>>>>> >
>>>>> > PREFER_SIZE_SORT .. (off, asc, desc)
>>>>> >
>>>>> > This has sense for wide group of commands that can show size. And
>>>>> when
>>>>> > size is not visible, then this option is not active.
>>>>>
>>>>> Maybe this shouldn't be a variable at all.  It's not like you'll set
>>>>> this as a global preference.  You probably want it for one command
>>>>> only.
>>>>>  So a per-command option might make more sense.
>>>>>
>>>>
>>>> Sure, I cannot to know, what users will do. But, when I need to see a
>>>> size of objects, then I prefer the sort by size desc every time. If I need
>>>> to find some object, then I can to use a searching in pager. So in my case,
>>>> this settings will be in psqlrc. In GoodData we used years own
>>>> customization - the order by size was hardcoded and nobody reported me any
>>>> issue.
>>>>
>>>> Alexander proposed some per command option, but current syntax of psql
>>>> commands don't allows some simple parametrization. If it can be user
>>>> friendly, then it should be short. From implementation perspective, it
>>>> should be simply parsed. It should be intuitive too - too much symbols
>>>> together is not good idea.
>>>>
>>>> Maybe some prefix design - but it is not design for common people
>>>> (although these people don't use psql usually)
>>>>
>>>> '\sort size \dt ?
>>>>
>>>> \dt:sort_by_size
>>>> \dt+:sort_by_size ?
>>>>
>>>> I don't see any good design in this direction
>>>>
>>>>
>>> I thought about Alexander's proposal, and I think it is probably best if
>>> we respect the psql design. I implemented two command suffixes
>>> (supported only where it makes sense): "s" sorted by size and "d" for descending
>>>
>>> so list of tables can be sorted with commands:
>>>
>>> \dt+sd (in this case, the order is not strict), so command
>>> \dtsd+ is working too (same \disd+ or \di+sd)
>>>
>>> These two chars are acceptable. Same principle is used for \l command
>>>
>>> \lsd+ or \l+sd
>>>
>>> What do you think about it?
>>>
>>
>> I think \lsd+ command would be another postgres meme :)
>> BTW, are you going to provide an ability to sort by name, schema?
>>
>
> It makes sense only for tables - probably only \dtn, "n" like name
>

In general, this approach looks good to me.
Regarding the current state of the patch, I'd like to see the new options
documented.  Also, it would be better to replace "bool sort_size" with an
enum, assuming there could be other sorting orders in the future.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] CUBE seems a bit confused about ORDER BY

2017-10-30 Thread Alexander Korotkov
On Sun, Oct 29, 2017 at 1:30 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Fri, Oct 20, 2017 at 1:01 AM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
> I've reviewed the code of the ~> operator and its KNN-GiST support.
> Unfortunately, it appears that it's broken by design...  The explanation
> is below.
>
> We have the following implementation of the ~> operator.
>
> if (coord <= DIM(cube))
>> {
>> if (IS_POINT(cube))
>> PG_RETURN_FLOAT8(cube->x[coord - 1]);
>> else
>> PG_RETURN_FLOAT8(Min(cube->x[coord - 1],
>> cube->x[coord - 1 + DIM(cube)]));
>> }
>> else
>> {
>> if (IS_POINT(cube))
>> PG_RETURN_FLOAT8(cube->x[(coord - 1) % DIM(cube)]);
>> else
>> PG_RETURN_FLOAT8(Max(cube->x[coord - 1],
>> cube->x[coord - 1 - DIM(cube)]));
>> }
>
>
> Thus, for a cube of N dimensions, coordinates [1; N] are lower bounds and
> [N + 1; 2*N] are upper bounds.  However, we might have indexed cubes of
> different dimensions.  For example, for a cube of 2 dimensions "cube ~> 3"
> selects the upper bound of the 1st dimension, while for a cube of 3
> dimensions "cube ~> 3" selects the lower bound of the 3rd dimension.
>
> Therefore, despite the fact that the ~> operator was specially invented to
> be supported by KNN-GiST, it can't serve that way.
>
> Regarding the particular case discovered by Tomas, I think the error is in
> the GiST supporting code.
>
> if (strategy == CubeKNNDistanceCoord)
>> {
>> int coord = PG_GETARG_INT32(1);
>> if (DIM(cube) == 0)
>> retval = 0.0;
>> else if (IS_POINT(cube))
>> retval = cube->x[(coord - 1) % DIM(cube)];
>> else
>> retval = Min(cube->x[(coord - 1) % DIM(cube)],
>> cube->x[(coord - 1) % DIM(cube) + DIM(cube)]);
>> }
>
>
> g_cube_distance() always returns the lower bound of the cube.  It should
> return the upper bound for coord > DIM(cube).
>
> It would also be nice to provide descending ordering using KNN-GiST.  It
> would be especially effective for the upper bound.  Since KNN-GiST doesn't
> support explicit descending ordering, it might be useful to make the ~>
> operator return the negative of the coordinate when a negative argument is
> provided.  For instance, '(1,2), (3,4)'::cube ~> -1 would return -1.
>
> I'd like to propose the following way to resolve the design issue: cube ~>
> (2*N - 1) should return the lower bound of the Nth coordinate of the cube,
> while cube ~> 2*N should return the upper bound of the Nth coordinate.
> Then it would be independent of the number of coordinates in a particular
> cube.  For sure, it would be a user-visible incompatible change.  But I
> think there are not so many users of this operator yet.  Also, the current
> behavior of ~> seems quite useless.
>

Thus, I heard no objections to my approach.  The attached patch changes the
~> operator in the proposed way and fixes the KNN-GiST search accordingly.
I'm going to register this patch in the nearest commitfest.
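The proposed mapping is easy to state outside the cube code. A simplified sketch (plain arrays instead of the cube type's internal NDBOX struct, hypothetical helper name):

```c
#include <assert.h>

/* Proposed ~> semantics: for a cube with dim dimensions, argument 2*n - 1
 * selects the lower bound of dimension n and argument 2*n selects the upper
 * bound, so the argument's meaning does not depend on the dimensionality of
 * the particular cube being examined. */
static double
cube_bound(const double *lower, const double *upper, int dim, int coord)
{
    int n = (coord + 1) / 2;        /* 1-based dimension number */

    assert(n >= 1 && n <= dim);
    return (coord % 2 == 1) ? lower[n - 1] : upper[n - 1];
}
```

For the cube '(1,2),(3,4)', arguments 1 through 4 would yield 1, 3, 2, and 4 respectively under this scheme.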

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


cube-knn-fix-1.patch
Description: Binary data



Re: [HACKERS] PoC: full merge join on comparison clause

2017-10-30 Thread Alexander Kuzmenkov

Hi,

I am attaching the updated patch, rebased to 820c03.

On 09.10.2017 13:47, Ashutosh Bapat wrote:

Hi Alexander,
Commit c7a9fa399 has added another test on mergeopfamilies. I think
your patch will need to take care of that test.

On Wed, Oct 4, 2017 at 6:38 PM, Alexander Kuzmenkov
<a.kuzmen...@postgrespro.ru> wrote:

As discussed earlier, I changed the way we work with mergeopfamilies. I use
the "is_equality" flag to indicate whether the clause is an equality one,
and fill mergeopfamilies for both equality and inequality operators.
The updated patch is attached (rebased to 20b6552242).


--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 925b4cf553..73e6a4ca74 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -172,31 +172,32 @@ typedef enum
  * to the opfamily and collation, with nulls at the indicated end of the range.
  * This allows us to obtain the needed comparison function from the opfamily.
  */
-static MergeJoinClause
+static void
 MJExamineQuals(List *mergeclauses,
 			   Oid *mergefamilies,
 			   Oid *mergecollations,
 			   int *mergestrategies,
 			   bool *mergenullsfirst,
-			   PlanState *parent)
+			   MergeJoinState *parent)
 {
-	MergeJoinClause clauses;
 	int			nClauses = list_length(mergeclauses);
 	int			iClause;
 	ListCell   *cl;
 
-	clauses = (MergeJoinClause) palloc0(nClauses * sizeof(MergeJoinClauseData));
+	parent->mj_Clauses = (MergeJoinClause) palloc0(nClauses * sizeof(MergeJoinClauseData));
+	parent->mj_UseEqual = (bool *) palloc0(nClauses * sizeof(bool));
+	parent->mj_UseLesser = (bool *) palloc0(nClauses * sizeof(bool));
 
 	iClause = 0;
 	foreach(cl, mergeclauses)
 	{
 		OpExpr	   *qual = (OpExpr *) lfirst(cl);
-		MergeJoinClause clause = &clauses[iClause];
+		MergeJoinClause clause = &parent->mj_Clauses[iClause];
 		Oid			opfamily = mergefamilies[iClause];
 		Oid			collation = mergecollations[iClause];
-		StrategyNumber opstrategy = mergestrategies[iClause];
+		StrategyNumber sort_op_strategy = mergestrategies[iClause];
 		bool		nulls_first = mergenullsfirst[iClause];
-		int			op_strategy;
+		int			join_op_strategy;
 		Oid			op_lefttype;
 		Oid			op_righttype;
 		Oid			sortfunc;
@@ -207,28 +208,55 @@ MJExamineQuals(List *mergeclauses,
 		/*
 		 * Prepare the input expressions for execution.
 		 */
-		clause->lexpr = ExecInitExpr((Expr *) linitial(qual->args), parent);
-		clause->rexpr = ExecInitExpr((Expr *) lsecond(qual->args), parent);
+		clause->lexpr = ExecInitExpr((Expr *) linitial(qual->args), (PlanState *) parent);
+		clause->rexpr = ExecInitExpr((Expr *) lsecond(qual->args), (PlanState *) parent);
 
 		/* Set up sort support data */
 		clause->ssup.ssup_cxt = CurrentMemoryContext;
 		clause->ssup.ssup_collation = collation;
-		if (opstrategy == BTLessStrategyNumber)
+		if (sort_op_strategy == BTLessStrategyNumber)
 			clause->ssup.ssup_reverse = false;
-		else if (opstrategy == BTGreaterStrategyNumber)
+		else if (sort_op_strategy == BTGreaterStrategyNumber)
 			clause->ssup.ssup_reverse = true;
 		else	/* planner screwed up */
-			elog(ERROR, "unsupported mergejoin strategy %d", opstrategy);
+			elog(ERROR, "unsupported mergejoin strategy %d", sort_op_strategy);
 		clause->ssup.ssup_nulls_first = nulls_first;
 
 		/* Extract the operator's declared left/right datatypes */
 		get_op_opfamily_properties(qual->opno, opfamily, false,
-								   &op_strategy,
+								   &join_op_strategy,
 								   &op_lefttype,
 								   &op_righttype);
-		if (op_strategy != BTEqualStrategyNumber)	/* should not happen */
-			elog(ERROR, "cannot merge using non-equality operator %u",
-				 qual->opno);
+
+		/*
+		 * Determine whether we accept lesser and/or equal tuples of the inner
+		 * relation.
+		 */
+		switch (join_op_strategy)
+		{
+			case BTEqualStrategyNumber:
+				parent->mj_UseEqual[iClause] = true;
+				break;
+
+			case BTLessEqualStrategyNumber:
+				parent->mj_UseEqual[iClause] = true;
+				/* fall through */
+
+			case BTLessStrategyNumber:
+				parent->mj_UseLesser[iClause] = true;
+				break;
+
+			case BTGreaterEqualStrategyNumber:
+				parent->mj_UseEqual[iClause] = true;
+				/* fall through */
+
+			case BTGreaterStrategyNumber:
+				parent->mj_UseLesser[iClause] = true;
+				break;
+
+			default:
+				elog(ERROR, "unsupported join strategy %d", join_op_strategy);
+		}
 
 		/*
 		 * sortsupport routine must know if abbreviation optimization is
@@ -265,8 +293,6 @@ MJExamineQuals(List *mergeclauses,
 
 		iClause++;
 	}
-
-	return clauses;
 }
 
 /*
@@ -378,6 +404,14 @@ MJEvalInnerValues(MergeJoinState *mergestate, TupleTableSlot *innerslot)
 	

Re: [HACKERS] How to implement a SP-GiST index as a extension module?

2017-10-30 Thread Alexander Korotkov
On Mon, Oct 30, 2017 at 2:07 PM, Oleg Bartunov <obartu...@gmail.com> wrote:

> On Mon, Oct 30, 2017 at 12:05 PM, Oleg Bartunov <obartu...@gmail.com>
> wrote:
> > On Sun, Oct 29, 2017 at 10:07 AM, Connor Wolf
> > <w...@imaginaryindustries.com> wrote:
> >> Hi there!
> >>
> >> I'm looking at implementing a custom indexing scheme, and I've been
> having
> >> trouble understanding the proper approach.
> >>
> >> Basically, I need a BK tree, which is a tree-structure useful for
> indexing
> >> arbitrary discrete metric-spaces (in my case, I'm interested in indexing
> >> across the hamming edit-distance of perceptual hashes, for fuzzy image
> >> searching). I'm pretty sure a SP-GiST index is the correct index type,
> as my
> >> tree is intrinsically unbalanced.
> >>
> >> I have a functional stand-alone implementation of a BK-Tree, and it
> works
> >> very well, but the complexity of managing what is basically a external
> index
> >> for my database has reached the point where it's significantly
> problematic,
> >> and it seems to be it should be moved into the database.
> >>
> >> Anyways, looking at the contents of postgres/src/backend/access/spgist,
> it
> >> looks pretty straightforward in terms of the actual C implementation,
> but
> >> I'm stuck understanding how to "install" a custom SP-GiST
> implementation.
> >> There are several GiST indexing implementations in the contrib
> directory,
> >> but no examples for how I'd go about implementing a loadable SP-GiST
> index.
> >>
> >> Basically, my questions are:
> >>
> >> Is it possible to implement a SP-GiST indexing scheme as a loadable
> module?
> >>
> >> If so, how?
> >> And is there an example I can base my implementation off of?
> >
> > Look on RUM access method ( https://github.com/postgrespro/rum ) we
> > developed using
> > api available since 9.6.
>
> or even simple, there is contrib/bloom access method, which illustrates
> developing access method as an extension.


I think Connor is struggling to implement just an operator class.  Advising
him to implement an index access method is a good way to keep him away from
PostgreSQL hacking for a long time :)

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] How to implement a SP-GiST index as a extension module?

2017-10-30 Thread Alexander Korotkov
Hi!

On Sun, Oct 29, 2017 at 12:07 PM, Connor Wolf <w...@imaginaryindustries.com>
wrote:

> I'm looking at implementing a custom indexing scheme, and I've been having
> trouble understanding the proper approach.
>
> Basically, I need a BK tree, which is a tree-structure useful for indexing
> arbitrary discrete metric-spaces (in my case, I'm interested in indexing
> across the hamming edit-distance of perceptual hashes, for fuzzy image
> searching). I'm *pretty* sure a SP-GiST index is the correct index type,
> as my tree is intrinsically unbalanced.
>

Yes, SP-GiST is an appropriate index type for a BK-tree.  I'm pretty sure a
BK-tree could be implemented as an SP-GiST opclass.
The only thing worrying me is the selection of pivot values for nodes.
SP-GiST builds the tree by inserting index tuples one by one.  The first
pivot value for the root node in SP-GiST would be created once the first
leaf page overflows.  Thus, you would have to select this pivot value based
on the very small fraction of the dataset seen at the beginning.
As far as I know, a BK-tree is most efficient when the root pivot value is
selected after looking at the whole dataset, and then hierarchically for the
subtrees.

BTW, did you try my extension for searching for similar images?  It's quite
primitive, but it works for some cases.
https://github.com/postgrespro/imgsmlr
https://wiki.postgresql.org/images/4/43/Pgcon_2013_similar_images.pdf

> I have a functional stand-alone implementation of a BK-Tree, and it works
> very well, but the complexity of managing what is basically an external
> index for my database has reached the point where it's significantly
> problematic, and it seems to be it should be moved into the database.
>

Sure, moving this index into the database is the right decision.

Anyways, looking at the contents of postgres/src/backend/access/spgist, it
> looks pretty straightforward in terms of the actual C implementation, but
> I'm stuck understanding how to "install" a custom SP-GiST implementation.
> There are several GiST indexing implementations in the contrib directory,
> but no examples for how I'd go about implementing a loadable SP-GiST index.
>
> Basically, my questions are:
>
>- Is it possible to implement a SP-GiST indexing scheme as a loadable
>module?
>
>  Yes, it's possible to define an SP-GiST opclass in a loadable module.

>
>- If so, how?
>
> In pretty much the same way as a GiST opclass extension.  You have to define
the supporting functions and operators, and then define an operator class over
them.

>
>- And is there an example I can base my implementation off of?
>
>
>
> I'm relatively comfortable with C (much moreso with C++), but I haven't
> spent a lot of time looking at the postgresql codebase.  I don't think I
> could start from a empty folder and make a properly-implemented module in
> any reasonable period of time, so if I have a working example for some sort
> of index that uses the same interfaces that would really help a lot
>
I don't think there is an example in the PostgreSQL source tree or on
GitHub.  But I've attached my early experiment with a VP-tree (which seems to
be pretty much the same as a BK-tree) using SP-GiST (see vptree.tar.gz).
Based on this experiment, I realized that it's important to select the root
pivot value based on the whole dataset.  However, for your
metric/dataset/queries it might turn out to be different.

It would also be nice to someday improve SP-GiST to support some global
strategies during index creation.  In particular, that would allow resolving
the pivot value selection problem I mentioned above.  Right now my colleagues
and I don't have time for that, but I can assist you with advice if you decide
to implement it.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


vptree.tar.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CUBE seems a bit confused about ORDER BY

2017-10-28 Thread Alexander Korotkov
On Fri, Oct 20, 2017 at 1:01 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Fri, Oct 20, 2017 at 12:52 AM, Tomas Vondra <
> tomas.von...@2ndquadrant.com> wrote:
>
>> I've noticed this suspicious behavior of "cube" data type with ORDER BY,
>> which I believe is a bug in the extension (or the GiST index support).
>> The following example comes directly from regression tests added by
>> 33bd250f (so CC Teodor and Stas, who are mentioned in the commit).
>>
>> This query should produce results with ordering "ascending by 2nd
>> coordinate or upper right corner". To make it clear, I've added the
>> "c~>4" expression to the query, otherwise it's right from the test.
>>
>> test=# SELECT c~>4 "c~>4", * FROM test_cube ORDER BY c~>4 LIMIT 15;
>>  c~>4 | c
>> --+---
>>50 | (30333, 50),(30273, 6)
>>75 | (43301, 75),(43227, 43)
>>   142 | (19650, 142),(19630, 51)
>>   160 | (2424, 160),(2424, 81)
>>   171 | (3449, 171),(3354, 108)
>>   155 | (18037, 155),(17941, 109)
>>   208 | (28511, 208),(28479, 114)
>>   217 | (19946, 217),(19941, 118)
>>   191 | (16906, 191),(16816, 139)
>>   187 | (759, 187),(662, 163)
>>   266 | (22684, 266),(22656, 181)
>>   255 | (24423, 255),(24360, 213)
>>   249 | (45989, 249),(45910, 222)
>>   377 | (11399, 377),(11360, 294)
>>   389 | (12162, 389),(12103, 309)
>> (15 rows)
>>
>> As you can see, it's not actually sorted by the c~>4 coordinate (but by
>> c~>2, which it the last number).
>>
>> Moreover, disabling index scans fixes the ordering:
>>
>> test=# set enable_indexscan = off;
>> SET
>> test=# SELECT c~>4, * FROM test_cube ORDER BY c~>4 LIMIT 15; --
>> ascending by 2nd coordinate or upper right corner
>>  ?column? | c
>> --+---
>>50 | (30333, 50),(30273, 6)
>>75 | (43301, 75),(43227, 43)
>>   142 | (19650, 142),(19630, 51)
>>   155 | (18037, 155),(17941, 109)
>>   160 | (2424, 160),(2424, 81)
>>   171 | (3449, 171),(3354, 108)
>>   187 | (759, 187),(662, 163)
>>   191 | (16906, 191),(16816, 139)
>>   208 | (28511, 208),(28479, 114)
>>   217 | (19946, 217),(19941, 118)
>>   249 | (45989, 249),(45910, 222)
>>   255 | (24423, 255),(24360, 213)
>>   266 | (22684, 266),(22656, 181)
>>   367 | (31018, 367),(30946, 333)
>>   377 | (11399, 377),(11360, 294)
>> (15 rows)
>>
>>
>> Seems like a bug somewhere in gist_cube_ops, I guess?
>>
>
> +1,
> that definitely looks like a bug. Thank you for reporting!
> I'll take a look at it in a couple of days.
>

I've reviewed the code of the ~> operator and its KNN-GiST support.
Unfortunately, it appears to be broken by design...  The explanation follows.

We have the following implementation of the ~> operator.

if (coord <= DIM(cube))
> {
> if (IS_POINT(cube))
> PG_RETURN_FLOAT8(cube->x[coord - 1]);
> else
> PG_RETURN_FLOAT8(Min(cube->x[coord - 1],
> cube->x[coord - 1 + DIM(cube)]));
> }
> else
> {
> if (IS_POINT(cube))
> PG_RETURN_FLOAT8(cube->x[(coord - 1) % DIM(cube)]);
> else
> PG_RETURN_FLOAT8(Max(cube->x[coord - 1],
> cube->x[coord - 1 - DIM(cube)]));
> }


Thus, for a cube of N dimensions, coordinates [1; N] are lower bounds and
[N + 1; 2*N] are upper bounds.  However, we might have indexed cubes of
different dimensions.  For example, for a cube of 2 dimensions "cube ~> 3"
selects the upper bound of the 1st dimension, while for a cube of 3 dimensions
"cube ~> 3" selects the lower bound of the 3rd dimension.

Therefore, although the ~> operator was specially invented to be supported by
KNN-GiST, it can't serve that purpose this way.

Regarding the particular case discovered by Tomas, I think the error is in the
GiST support code.

if (strategy == CubeKNNDistanceCoord)
> {
> int coord = PG_GETARG_INT32(1);
> if (DIM(cube) == 0)
> retval = 0.0;
> else if (IS_POINT(cube))
> retval = cube->x[(coord - 1) % DIM(cube)];
> else
> retval = Min(cube->x[(coord - 1) % DIM(cube)],
> cube->x[(coord - 1) % DIM(cube) + DIM(cube)]);
> }


g_cube_distance() always returns the lower bound of the cube.  It should
return the upper bound for coord > DIM(cube).
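A self-contained model of the fixed logic might look like this (plain C over a
double array mimicking the NDBOX layout, with x[0..dim-1] holding one corner
and x[dim..2*dim-1] the other; this illustrates the intended behavior, not the
actual patch):

```c
/*
 * Model of g_cube_distance()'s coordinate logic after the fix:
 * coord in [1; dim] yields the lower bound, coord in [dim+1; 2*dim]
 * yields the upper bound of dimension ((coord - 1) % dim) + 1.
 */
static double
coord_distance_fixed(const double *x, int dim, int is_point, int coord)
{
    int     i = (coord - 1) % dim;
    double  lo,
            hi;

    if (is_point)
        return x[i];

    lo = x[i] < x[i + dim] ? x[i] : x[i + dim];
    hi = x[i] < x[i + dim] ? x[i + dim] : x[i];

    /* The fix: coord beyond dim asks for the upper bound, not the lower */
    return (coord <= dim) ? lo : hi;
}
```

With the cube (30333, 50),(30273, 6) from Tomas's example, coord = 4 now
yields 50 rather than 6, so the index scan order matches the sequential scan.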

It would also be nice to provide descending ordering using KNN-GiST.  That
would be especially effective for the upper bound.  Since KNN-GiST doesn't
support explicit descending ordering, it might be useful to make the ~>
operator return the negated coordinate when a negative argument is provided.
For instance, '(1,2),(3,4)'::cube ~> -1 would return -1.

I'd like to propose the following way to resolve this design issue:
cube ~> (2*N - 1) should return the lower bound of the Nth coordinate of the
cube, while cube ~> 2*N should return the upper bound of the Nth coordinate.
Then the meaning of the argument would be independent of the number of
coordinates in a particular cube.  For sure, this would be a user-visible
incompatible change, but I think there are not many users of this operator
yet.  Also, the current behavior of ~> seems quite useless.
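To make the proposal concrete, here is a sketch of the suggested mapping
(again plain C over a flattened corner array, purely illustrative):

```c
/*
 * Sketch of the proposed semantics: coord = 2*N - 1 returns the lower
 * bound of the Nth dimension, coord = 2*N the upper bound, regardless
 * of the cube's own dimensionality.  x[0..dim-1] is one corner,
 * x[dim..2*dim-1] the other.
 */
static double
cube_coord_proposed(const double *x, int dim, int coord)
{
    int     n = (coord + 1) / 2;    /* target dimension, 1-based */
    double  lo,
            hi;

    if (n > dim)
        return 0.0;                 /* the real operator would raise an error */

    lo = x[n - 1] < x[n - 1 + dim] ? x[n - 1] : x[n - 1 + dim];
    hi = x[n - 1] < x[n - 1 + dim] ? x[n - 1 + dim] : x[n - 1];

    return (coord % 2 == 1) ? lo : hi;  /* odd -> lower, even -> upper */
}
```

Under this scheme "cube ~> 3" always means "lower bound of the 2nd dimension",
for cubes of any dimensionality.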

Any thoughts?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed

2017-10-28 Thread Alexander Korotkov
On Sat, Oct 28, 2017 at 3:46 PM, Pavel Stehule <pavel.steh...@gmail.com>
wrote:

> 2017-09-22 21:31 GMT+02:00 Pavel Stehule <pavel.steh...@gmail.com>:
>
>>
>>
>> 2017-09-22 21:12 GMT+02:00 Peter Eisentraut <
>> peter.eisentr...@2ndquadrant.com>:
>>
>>> On 9/22/17 09:16, Pavel Stehule wrote:
>>> > Example: somebody set SORT_COLUMNS to schema_name value. This is
>>> > nonsense for \l command
>>> >
>>> > Now, I am thinking so more correct and practical design is based on
>>> > special mode, activated by variable
>>> >
>>> > PREFER_SIZE_SORT .. (off, asc, desc)
>>> >
>>> > This has sense for wide group of commands that can show size. And when
>>> > size is not visible, then this option is not active.
>>>
>>> Maybe this shouldn't be a variable at all.  It's not like you'll set
>>> this as a global preference.  You probably want it for one command only.
>>>  So a per-command option might make more sense.
>>>
>>
>> Sure, I cannot to know, what users will do. But, when I need to see a
>> size of objects, then I prefer the sort by size desc every time. If I need
>> to find some object, then I can to use a searching in pager. So in my case,
>> this settings will be in psqlrc. In GoodData we used years own
>> customization - the order by size was hardcoded and nobody reported me any
>> issue.
>>
>> Alexander proposed some per command option, but current syntax of psql
>> commands don't allows some simple parametrization. If it can be user
>> friendly, then it should be short. From implementation perspective, it
>> should be simply parsed. It should be intuitive too - too much symbols
>> together is not good idea.
>>
>> Maybe some prefix design - but it is not design for common people
>> (although these people don't use psql usually)
>>
>> '\sort size \dt ?
>>
>> \dt:sort_by_size
>> \dt+:sort_by_size ?
>>
>> I don't see any good design in this direction
>>
>>
> I though about Alexander proposal, and I am thinking so it can be probably
> best if we respect psql design. I implemented two command suffixes
> (supported only when it has sense) "s" sorted by size and "d" as descent
>
> so list of tables can be sorted with commands:
>
> \dt+sd (in this case, the order is not strict), so command
> \dtsd+ is working too (same \disd+ or \di+sd)
>
> These two chars are acceptable. Same principle is used for \l command
>
> \lsd+ or \l+sd
>
> What do you think about it?
>

I think the \lsd+ command would become another Postgres meme :)
BTW, are you going to provide the ability to sort by name or schema?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Index only scan for cube and seg

2017-10-28 Thread Alexander Korotkov
On Fri, Oct 27, 2017 at 9:54 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Robert Haas <robertmh...@gmail.com> writes:
> > On Thu, Oct 26, 2017 at 12:22 PM, Andrey Borodin <x4...@yandex-team.ru>
> wrote:
> >> For cube there is new default opclass.
>
> > I seem to recall that changing the default opclass causes unsolvable
> > problems with upgrades.  You might want to check the archives for
> > previous discussions of this issue; unfortunately, I don't recall the
> > details off-hand.
>
> Quite aside from that, replacing the opclass with a new one creates
> user-visible headaches that I don't think are justified, i.e. having to
> reconstruct indexes in order to get the benefit.
>
> Maybe I'm missing something, but ISTM you could just drop the compress
> function and call it good.  This would mean that an IOS scan would
> sometimes hand back a toast-compressed value, but what's the problem
> with that?
>

+1,
I think in this case replacing the default opclass, or even duplicating the
opclass, isn't justified.

(The only reason for making a decompress function that just detoasts
> is that your other support functions then do not have to consider
> the possibility that they're handed a toast-compressed value.  I have
> not checked whether that really matters for cube.)
>

As far as I can see, the cube GiST code always uses the DatumGetNDBOX() macro
to transform a Datum into an (NDBOX *):

#define DatumGetNDBOX(x) ((NDBOX *) PG_DETOAST_DATUM(x))

Thus, it should be safe to just remove both the compress and decompress
methods from the existing opclass.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] WIP: long transactions on hot standby feedback replica / proof of concept

2017-10-24 Thread Alexander Korotkov
On Tue, Oct 24, 2017 at 10:56 AM, Ivan Kartyshov <i.kartys...@postgrespro.ru
> wrote:

> Hello. I made some bugfixes and rewrite the patch.
>
> Simon Riggs писал 2017-09-05 14:44:
>
>> As Alexander says, simply skipping truncation if standby is busy isn't
>> a great plan.
>>
>> If we defer an action on standby replay, when and who will we apply
>> it? What happens if the standby is shutdown or crashes while an action
>> is pending.
>>
>
> After crash standby server will go to the nearest checkout and will replay
> all his wal`s to reach consistent state and then open for read-only load.
> (all collected wal`s will be replayed in right way, I meant wal`s that
> truncates table and wal`s that make table grow)


--- a/src/backend/storage/lmgr/lock.c
> +++ b/src/backend/storage/lmgr/lock.c
> @@ -49,6 +49,8 @@
>  #include "utils/ps_status.h"
>  #include "utils/resowner_private.h"
>
> +#include 
> +#include 
>
>  /* This configuration variable is used to set the lock table size */
>  int max_locks_per_xact; /* set by guc.c */


Why do you need these includes?

+ FreeFakeRelcacheEntry(rel);
> + } else
> + {
> + XLogFlush(lsn);
> + }


This code violates our coding style: "else" shouldn't share its line with
braces.

Also, the patch definitely needs a general comment explaining what's going
on.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-10-23 Thread Alexander Korotkov
On Mon, Oct 23, 2017 at 12:42 PM, Ivan Kartyshov <i.kartys...@postgrespro.ru
> wrote:

> New little cleanup code changes
>

Despite the code cleanup, you still have some random empty-line removals in
your patch.

@@ -149,7 +150,6 @@ const struct config_enum_entry sync_method_options[] = {
>   {NULL, 0, false}
>  };
>
> -
>  /*
>   * Although only "on", "off", and "always" are documented,
>   * we accept all the likely variants of "on" and "off".



@@ -141,7 +141,6 @@
>  #include "utils/timestamp.h"
>  #include "utils/tqual.h"
>
> -
>  /*
>   * Maximum size of a NOTIFY payload, including terminating NULL.  This
>   * must be kept small enough so that a notification message fits on one


--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] CUBE seems a bit confused about ORDER BY

2017-10-19 Thread Alexander Korotkov
Hi!

On Fri, Oct 20, 2017 at 12:52 AM, Tomas Vondra <tomas.von...@2ndquadrant.com
> wrote:

> I've noticed this suspicious behavior of "cube" data type with ORDER BY,
> which I believe is a bug in the extension (or the GiST index support).
> The following example comes directly from regression tests added by
> 33bd250f (so CC Teodor and Stas, who are mentioned in the commit).
>
> This query should produce results with ordering "ascending by 2nd
> coordinate or upper right corner". To make it clear, I've added the
> "c~>4" expression to the query, otherwise it's right from the test.
>
> test=# SELECT c~>4 "c~>4", * FROM test_cube ORDER BY c~>4 LIMIT 15;
>  c~>4 | c
> --+---
>50 | (30333, 50),(30273, 6)
>75 | (43301, 75),(43227, 43)
>   142 | (19650, 142),(19630, 51)
>   160 | (2424, 160),(2424, 81)
>   171 | (3449, 171),(3354, 108)
>   155 | (18037, 155),(17941, 109)
>   208 | (28511, 208),(28479, 114)
>   217 | (19946, 217),(19941, 118)
>   191 | (16906, 191),(16816, 139)
>   187 | (759, 187),(662, 163)
>   266 | (22684, 266),(22656, 181)
>   255 | (24423, 255),(24360, 213)
>   249 | (45989, 249),(45910, 222)
>   377 | (11399, 377),(11360, 294)
>   389 | (12162, 389),(12103, 309)
> (15 rows)
>
> As you can see, it's not actually sorted by the c~>4 coordinate (but by
> c~>2, which it the last number).
>
> Moreover, disabling index scans fixes the ordering:
>
> test=# set enable_indexscan = off;
> SET
> test=# SELECT c~>4, * FROM test_cube ORDER BY c~>4 LIMIT 15; --
> ascending by 2nd coordinate or upper right corner
>  ?column? | c
> --+---
>50 | (30333, 50),(30273, 6)
>75 | (43301, 75),(43227, 43)
>   142 | (19650, 142),(19630, 51)
>   155 | (18037, 155),(17941, 109)
>   160 | (2424, 160),(2424, 81)
>   171 | (3449, 171),(3354, 108)
>   187 | (759, 187),(662, 163)
>   191 | (16906, 191),(16816, 139)
>   208 | (28511, 208),(28479, 114)
>   217 | (19946, 217),(19941, 118)
>   249 | (45989, 249),(45910, 222)
>   255 | (24423, 255),(24360, 213)
>   266 | (22684, 266),(22656, 181)
>   367 | (31018, 367),(30946, 333)
>   377 | (11399, 377),(11360, 294)
> (15 rows)
>
>
> Seems like a bug somewhere in gist_cube_ops, I guess?
>

+1,
that definitely looks like a bug. Thank you for reporting!
I'll take a look at it in a couple of days.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


[HACKERS] Re: Is anything preventing us from allowing write to foreign tables from standby?

2017-10-17 Thread Alexander Korotkov
On Wed, Sep 6, 2017 at 4:42 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> We're currently blocking write queries on standby even if they only modify
> the contents of foreign tables.  But do we have serious reasons for that?
> Keeping FDW-sharding in mind, making FDW tables writable from standby would
> be good to prevent a single-master bottleneck.
> I wrote a simple patch enabling writes to foreign tables from standbys.  At
> first glance, it works for me.
>

No interest yet, but no objections either :-)
I'm going to add this to the next commitfest.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Pluggable storage

2017-10-13 Thread Alexander Korotkov
On Fri, Oct 13, 2017 at 9:41 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Fri, Oct 13, 2017 at 1:59 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> >> Fully agreed.
> >
> > If we implement that interface, where does that leave EvalPlanQual()?
>

At first glance, it seems that pluggable storage should
override EvalPlanQualFetch(); the rest of EvalPlanQual() looks quite generic.


> > Do those semantics have to be preserved?
>
> For a general-purpose heap storage format, I would say yes.


+1

I mean, we don't really have control over how people use the API.  If
> somebody decides to implement a storage API that breaks EvalPlanQual
> semantics horribly, I can't stop them, and I don't want to stop them.
> Open source FTW.
>

Yeah.  We don't have any kind of "safe extensions".  Any extension can
break things really horribly.
For me, that means the user must absolutely trust the extension developer.

But I don't really want that code in our tree, either.


We keep things in our tree as correct as we can.  And for sure, we should
follow the same policy for pluggable storages too.


> I think a
> storage engine is and should be about the format in which data gets
> stored on disk, and that it should only affect the performance of
> queries not the answers that they give.


It's pretty much the same idea as index access methods: they also affect
performance, but not query answers.  When that's not true, the situation
is considered a bug and needs to be fixed.


> I am sure there will be cases
> where, for reasons of implementation complexity, that turns out not to
> be true, but I think in general we should try to avoid it as much as
> we can.


I think in some cases we can tolerate missing features (and document them),
but we shouldn't tolerate wrong features.  For instance, we may have a
pluggable storage which doesn't support transactions at all (and that should
certainly be documented), but we shouldn't have a pluggable storage whose
transaction support is wrong.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Pluggable storage

2017-10-13 Thread Alexander Korotkov
On Fri, Oct 13, 2017 at 9:37 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Fri, Oct 13, 2017 at 5:25 AM, Kuntal Ghosh
> <kuntalghosh.2...@gmail.com> wrote:
> > For some other
> > storage engine, if we maintain the older version in different storage,
> > undo for example, and don't require a new index entry, should we still
> > call it HOT-chain?
>
> I would say, emphatically, no.  HOT is a creature of the existing
> heap.  If it's creeping into storage APIs they are not really
> abstracted from what we have currently.


+1,
a different storage may need to insert entries into only *some* of the
indexes, and these new index entries may have either the same or new TIDs.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Predicate Locks for writes?

2017-10-13 Thread Alexander Korotkov
On Wed, Oct 11, 2017 at 7:27 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Wed, Oct 11, 2017 at 11:51 AM, Simon Riggs <si...@2ndquadrant.com>
> wrote:
> > I'm inclined to believe Robert's hypothesis that there is some race
> > condition there.
> >
> > The question is whether that still constitutes a serializability
> > problem since we haven't done anything with the data, just passed it
> > upwards to be modified.
>
> Well, the question is whether passing it upwards constitutes reading
> it.  I kind of suspect it does.  The plan tree isn't just blindly
> propagating values upward but stuff with them.  There could be quite a
> bit between the ModifyTable node and the underlying scan if DELETE ..
> FROM or UPDATE .. USING is in use.
>

Right.  In general, we can't skip SIReadLock just based on the fact that
we're under a ModifyTable node.
It still seems possible to me to skip SIReadLock in some very limited (but
probably typical) cases.
But before analyzing this deeper, it would be nice to estimate the possible
benefits.
We could try a deliberately broken version which skips all SIReadLocks under
ModifyTable nodes and benchmark it.
If it doesn't show a significant performance benefit, then there is no
reason to go further...

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Pluggable storage

2017-10-12 Thread Alexander Korotkov
On Wed, Oct 11, 2017 at 11:08 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Mon, Oct 9, 2017 at 10:22 AM, Alexander Korotkov
> <a.korot...@postgrespro.ru> wrote:
> > For me, it's crucial point that pluggable storages should be able to have
> > different MVCC implementation, and correspondingly have full control over
> > its interactions with indexes.
> > Thus, it would be good if we would get consensus on that point.  I'd like
> > other discussion participants to comment whether they agree/disagree and
> > why.
> > Any comments?
>
> I think it's good for new storage managers to have full control over
> interactions with indexes.  I'm not sure about the MVCC part.  I think
> it would be legitimate to want a storage manager to ignore MVCC
> altogether - e.g. to build a non-transactional table.  I don't know
> that it would be a very good idea to have two different full-fledged
> MVCC implementations, though.  Like Tom says, that would be
> replicating a lot of the awfulness of the MySQL model.


We probably attach different meanings to "MVCC implementation".  By "MVCC
implementation" I meant that, for instance, an alternative storage may
implement UNDO chains to store versions of the same row.  Correspondingly, it
may not have any analogue of our HOT.

However, I assume that an alternative storage would share our "MVCC model".
That is, it should share our transactional model, including transactions,
subtransactions, snapshots, etc.  Therefore, if an alternative storage is
transactional, then in particular it should be able to fetch a tuple with a
given TID according to a given snapshot.  However, how that's implemented
internally is a black box for us.  Thus, we don't insist that a tuple has a
different TID after an update; we don't insist there is any analogue of HOT;
we don't insist an alternative storage needs vacuum (or, even if it needs
vacuum, it might be performed in a completely different way); and so on.

During conversations with you at PGCon and other conferences, I had the
impression that you share this view on pluggable storages and MVCC.
Probably we just express this view in different words.  Or, alternatively,
I might understand you terribly wrong.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Pluggable storage

2017-10-09 Thread Alexander Korotkov
On Mon, Oct 9, 2017 at 5:32 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Alexander Korotkov <a.korot...@postgrespro.ru> writes:
> > For me, it's crucial point that pluggable storages should be able to have
> > different MVCC implementation, and correspondingly have full control over
> > its interactions with indexes.
> > Thus, it would be good if we would get consensus on that point.  I'd like
> > other discussion participants to comment whether they agree/disagree and
> > why.
> > Any comments?
>
> TBH, I think that's a good way of ensuring that nothing will ever get
> committed.  You're trying to draw the storage layer boundary at a point
> that will take in most of the system.  If we did build it like that,
> what we'd end up with would be very reminiscent of mysql's storage
> engines, complete with inconsistent behaviors and varying feature sets
> across engines.  I don't much want to go there.
>

However, if we insist that pluggable storages should have the same MVCC
implementation, interact with indexes the same way, and also use TIDs as
tuple identifiers, then what useful implementations might we have?  Per-page
heap compression and encryption?  A different heap page layout or tuple
format?  OK, but that doesn't justify as wide an API as is implemented in the
current version of the patch in this thread.  If we really want to restrict
the applicability of pluggable storages that way, then we should probably
give up on "pluggable storages" and make it "pluggable heap page format", as
I proposed upthread.

Implementing an alternative storage would be a hard and challenging task.
Yes, it would include reimplementing a significant part of the system.  But
that seems inevitable if we're going to implement really alternative storages
(not just hacks over the existing storage).  And I don't think our pluggable
storages would be reminiscent of MySQL's storage engines as long as we keep
two properties:
1) All the storages use the same WAL stream,
2) All the storages use the same transactions and snapshots.
If we keep these two properties, we need neither 2PC to run transactions
across different storages nor a separate log for replication.  Those two are
major drawbacks of the MySQL model.
Varying feature sets across engines seem inevitable and natural.  We invent
alternative storages to have features that are hard to provide in our current
storage, so it's no wonder that feature sets would vary...

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Pluggable storage

2017-10-09 Thread Alexander Korotkov
On Wed, Sep 27, 2017 at 7:51 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> I took a look at this patch.  I have the following notes for now.
>
> tuple_insert_hook tuple_insert; /* heap_insert */
>> tuple_update_hook tuple_update; /* heap_update */
>> tuple_delete_hook tuple_delete; /* heap_delete */
>
>
> I don't think this set of functions provides good enough level of
> abstraction for storage AM.  This functions encapsulate only low-level work
> of insert/update/delete tuples into heap itself.  However, it still assumed
> that indexes are managed outside of storage AM.  I don't think this is
> right, assuming that one of most demanded storage API usage would be
> different MVCC implementation (for instance, UNDO log based as discussed
> upthread).  Different MVCC implementation is likely going to manage indexes
> in a different way.  For example, storage AM utilizing UNDO would implement
> in-place update even when indexed columns are modified.  Therefore this
> piece of code in ExecUpdate() wouldn't be relevant anymore.
>
> /*
>> * insert index entries for tuple
>> *
>> * Note: heap_update returns the tid (location) of the new tuple in
>> * the t_self field.
>> *
>> * If it's a HOT update, we mustn't insert new index entries.
>> */
>> if ((resultRelInfo->ri_NumIndices > 0) && 
>> !storage_tuple_is_heaponly(resultRelationDesc,
>> tuple))
>> recheckIndexes = ExecInsertIndexTuples(slot, &(slot->tts_tid),
>>   estate, false, NULL, NIL);
>
>
> I'm firmly convinced that this logic should be encapsulated into storage
> AM altogether with inserting new index tuples on storage insert.  Also, HOT
> should be completely encapsulated into heapam.  It's quite evident for me
> that storage utilizing UNDO wouldn't have anything like our current HOT.
> Therefore, I think there shouldn't be hot_search_buffer() API function.
>  tuple_fetch() may cover hot_search_buffer(). That might require some
> signature change of tuple_fetch() (probably, extra arguments).
>

For me, it's a crucial point that pluggable storages should be able to have
different MVCC implementations, and correspondingly have full control over
their interactions with indexes.
Thus, it would be good if we could reach consensus on that point.  I'd like
other discussion participants to comment on whether they agree or disagree,
and why.
Any comments?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] GSoC 2017 : Patch for predicate locking in Gist index

2017-10-09 Thread Alexander Korotkov
On Thu, Oct 5, 2017 at 9:48 PM, Shubham Barai <shubhambara...@gmail.com>
wrote:

> On 3 October 2017 at 00:32, Alexander Korotkov <a.korot...@postgrespro.ru>
> wrote:
>
>> On Mon, Oct 2, 2017 at 9:11 PM, Andrew Borodin <amborodi...@gmail.com>
>> wrote:
>>
>>> On Mon, Oct 2, 2017 at 8:00 PM, Alexander Korotkov
>>> <a.korot...@postgrespro.ru> wrote:
>>> > What happen if exactly this "continue" fires?
>>> >
>>> >> if (GistFollowRight(stack->page))
>>> >> {
>>> >> if (!xlocked)
>>> >> {
>>> >> LockBuffer(stack->buffer, GIST_UNLOCK);
>>> >> LockBuffer(stack->buffer, GIST_EXCLUSIVE);
>>> >> xlocked = true;
>>> >> /* someone might've completed the split when we unlocked */
>>> >> if (!GistFollowRight(stack->page))
>>> >> continue;
>>> >
>>> >
>>> > In this case we might get xlocked == true without calling
>>> > CheckForSerializableConflictIn().
>>> Indeed! I've overlooked it. I'm remembering this issue, we were
>>> considering not fixing splits if in Serializable isolation, but
>>> dropped the idea.
>>>
>>
>> Yeah, current insert algorithm assumes that split must be fixed before we
>> can correctly traverse the tree downwards.
>>
>>
>>> CheckForSerializableConflictIn() must be after every exclusive lock.
>>>
>>
>> I'm not sure, that fixing split is the case to necessary call
>> CheckForSerializableConflictIn().  This lock on leaf page is not taken
>> to do modification of the page.  It's just taken to ensure that nobody else
>> is fixing this split the same this.  After fixing the split, it might
>> appear that insert would go to another page.
>>
>> > I think it would be rather safe and easy for understanding to more
>>> > CheckForSerializableConflictIn() directly before gistinserttuple().
>>> The difference is that after lock we have conditions to change page,
>>> and before gistinserttuple() we have actual intent to change page.
>>>
>>> From the point of future development first version is better (if some
>>> new calls occasionally spawn in), but it has 3 calls while your
>>> proposal have 2 calls.
>>> It seems to me that CheckForSerializableConflictIn() before
>>> gistinserttuple() is better as for now.
>>>
>>
>> Agree.
>>
>
> I have updated the location of  CheckForSerializableConflictIn()  and
> changed the status of the patch to "needs review".
>

Now, ITSM that predicate locks and conflict checks are placed correctly.
However, it would be good to add a couple of comments to gistdoinsert()
stating why we call CheckForSerializableConflictIn() in these particular
places.

I also take a look at isolation tests.  You made two separate test specs:
one to verify that serialization failures do fire, and another to check
there are no false positives.
I wonder if we could merge these two test specs into one, but use a wider
variety of statements with different keys for both inserts and selects.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Predicate Locks for writes?

2017-10-09 Thread Alexander Korotkov
On Sat, Oct 7, 2017 at 2:26 PM, Simon Riggs <si...@2ndquadrant.com> wrote:

> SERIALIZABLE looks for chains of rw cases.
>
> When we perform UPDATEs and DELETEs we search for rows and then modify
> them. The current implementation views that as a read followed by a
> write because we issue PredicateLockTuple() during the index fetch.
>
> Is it correct that a statement that only changes data will add
> predicate locks for the rows that it modifies?
>

I'm not very aware of how this SI code exactly works.  But it seems that
once we update a row, the SIReadLock on that tuple is released, because
we're already holding a stronger lock.  I made a simple experiment.

# begin;
BEGIN
Time: 0,456 ms
smagen@postgres=*# select * from test where id = 1;
 id │ val
────┼─────
  1 │ xyz
(1 row)

*# select locktype, database, relation, page, tuple, mode, granted from
pg_locks where pid = pg_backend_pid();
  locktype  │ database │ relation │ page │ tuple │      mode       │ granted
────────────┼──────────┼──────────┼──────┼───────┼─────────────────┼─────────
 relation   │    12558 │    11577 │ NULL │  NULL │ AccessShareLock │ t
 relation   │    12558 │    65545 │ NULL │  NULL │ AccessShareLock │ t
 relation   │    12558 │    65538 │ NULL │  NULL │ AccessShareLock │ t
 virtualxid │     NULL │     NULL │ NULL │  NULL │ ExclusiveLock   │ t
 page       │    12558 │    65545 │    1 │  NULL │ SIReadLock      │ t
 tuple      │    12558 │    65538 │    0 │     3 │ SIReadLock      │ t
(6 rows)

*# update test set val = 'xyz' where id = 1;

*# select locktype, database, relation, page, tuple, mode, granted from
pg_locks where pid = pg_backend_pid();
    locktype   │ database │ relation │ page │ tuple │       mode       │ granted
───────────────┼──────────┼──────────┼──────┼───────┼──────────────────┼─────────
 relation      │    12558 │    11577 │ NULL │  NULL │ AccessShareLock  │ t
 relation      │    12558 │    65545 │ NULL │  NULL │ AccessShareLock  │ t
 relation      │    12558 │    65545 │ NULL │  NULL │ RowExclusiveLock │ t
 relation      │    12558 │    65538 │ NULL │  NULL │ AccessShareLock  │ t
 relation      │    12558 │    65538 │ NULL │  NULL │ RowExclusiveLock │ t
 virtualxid    │     NULL │     NULL │ NULL │  NULL │ ExclusiveLock    │ t
 transactionid │     NULL │     NULL │ NULL │  NULL │ ExclusiveLock    │ t
 page          │    12558 │    65545 │    1 │  NULL │ SIReadLock       │ t
(8 rows)

Once we update the same tuple that we read before, SIReadLock on that tuple
disappears.

PredicateLockTuple() specifically avoids adding an SIRead lock if the
> tuple already has a write lock on it, so surely it must also be
> correct to skip the SIRead lock if we are just about to update the
> row?
>
> I am suggesting that we ignore PredicateLockTuple() for cases where we
> are about to update or delete the found tuple.
>

If my thoughts above are correct, then it would be a reasonable
optimization.  We would skip the cycle of taking an SIReadLock on the tuple
and then releasing it, without any change of behavior.
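The proposed skip can be modeled as follows (an illustrative Python sketch of the idea only; the names are made up, and the real predicate-lock manager in C is far more involved):

```python
def predicate_lock_tuple(tid, write_locked, siread_locks):
    # Proposed optimization: if this transaction already holds (or is about
    # to take) a write lock on the tuple, don't bother taking an SIRead
    # lock that the write lock would immediately supersede.
    if tid in write_locked:
        return
    siread_locks.add(tid)
```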


> ISTM that a Before Row Trigger would need to add an SIRead lock since
> that is clearly a read.
>

Yes, because depending on the result of the Before Row Trigger, the update
might not really happen, so that SIReadLock on the tuple is necessary.
However, ISTM that we could place the SIReadLock on the tuple after the
Before Row Trigger, and only in the case when the trigger has cancelled the
update.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Proposal: Improve bitmap costing for lossy pages

2017-10-06 Thread Alexander Kuzmenkov



Analysis: The estimated value of lossy_pages is way higher than
its actual value, and the reason is that the total_pages calculated by the
"Mackert and Lohman formula" is not correct.


I think the problem might be that the total_pages includes cache effects 
and rescans. For bitmap entries we should use something like relation 
pages * selectivity.
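The two estimates in play can be sketched as follows (illustrative Python, not the planner's actual code): the simple "relation pages × selectivity" figure suggested above, and the textbook "expected number of distinct pages touched" approximation that, unlike Mackert and Lohman, ignores caching and rescans.

```python
def simple_bitmap_pages(rel_pages, selectivity):
    # The estimate suggested above: relation pages * selectivity
    return rel_pages * selectivity

def expected_distinct_pages(rel_pages, rel_tuples, selectivity):
    # Expected number of distinct heap pages holding at least one match,
    # assuming matching rows are spread uniformly over the pages
    tuples_fetched = rel_tuples * selectivity
    return rel_pages * (1.0 - (1.0 - 1.0 / rel_pages) ** tuples_fetched)
```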


Meanwhile, I ran TPC-H briefly with the v3 patch: scale factor 25, 2 
workers, SSD storage.

It shows significant improvement on 4MB work_mem and no change on 2GB.

Here are the results (execution time in seconds, average of 5 runs):
work_mem      4MB             2GB
Query     master  patch   master  patch
4            179    174      168    167
5            190    168      155    156
6            280     87      227    229
8            197    114      179    172
10           269    227      190    192
14           110    108      106    105

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PoC: full merge join on comparison clause

2017-10-04 Thread Alexander Kuzmenkov
As discussed earlier, I changed the way we work with mergeopfamilies. I 
use the "is_equality" flag to indicate whether the clause is an equality 
one, and fill mergeopfamilies for both equality and inequality operators.

The updated patch is attached (rebased to 20b6552242).
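The core of the change is how the join operator's btree strategy determines which inner tuples the merge can accept; the mapping implemented by the mj_UseEqual/mj_UseLesser flags in the patch below can be summarized as (an illustrative Python mirror of the C switch):

```python
# btree strategy numbers, as in PostgreSQL's access/stratnum.h
BT_LESS, BT_LESS_EQUAL, BT_EQUAL, BT_GREATER_EQUAL, BT_GREATER = 1, 2, 3, 4, 5

def join_acceptance(join_op_strategy):
    """Return (use_equal, use_lesser): whether inner tuples comparing equal
    to / lesser than the outer tuple can still produce join matches."""
    if join_op_strategy == BT_EQUAL:
        return (True, False)
    if join_op_strategy == BT_LESS_EQUAL:
        return (True, True)
    if join_op_strategy == BT_LESS:
        return (False, True)
    if join_op_strategy == BT_GREATER_EQUAL:
        return (True, True)
    if join_op_strategy == BT_GREATER:
        return (False, True)
    raise ValueError("unsupported join strategy")
```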

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/src/backend/executor/nodeMergejoin.c b/src/backend/executor/nodeMergejoin.c
index 925b4cf553..73e6a4ca74 100644
--- a/src/backend/executor/nodeMergejoin.c
+++ b/src/backend/executor/nodeMergejoin.c
@@ -172,31 +172,32 @@ typedef enum
  * to the opfamily and collation, with nulls at the indicated end of the range.
  * This allows us to obtain the needed comparison function from the opfamily.
  */
-static MergeJoinClause
+static void
 MJExamineQuals(List *mergeclauses,
 			   Oid *mergefamilies,
 			   Oid *mergecollations,
 			   int *mergestrategies,
 			   bool *mergenullsfirst,
-			   PlanState *parent)
+			   MergeJoinState *parent)
 {
-	MergeJoinClause clauses;
 	int			nClauses = list_length(mergeclauses);
 	int			iClause;
 	ListCell   *cl;
 
-	clauses = (MergeJoinClause) palloc0(nClauses * sizeof(MergeJoinClauseData));
+	parent->mj_Clauses = (MergeJoinClause) palloc0(nClauses * sizeof(MergeJoinClauseData));
+	parent->mj_UseEqual = (bool *) palloc0(nClauses * sizeof(bool));
+	parent->mj_UseLesser = (bool *) palloc0(nClauses * sizeof(bool));
 
 	iClause = 0;
 	foreach(cl, mergeclauses)
 	{
 		OpExpr	   *qual = (OpExpr *) lfirst(cl);
-		MergeJoinClause clause = &clauses[iClause];
+		MergeJoinClause clause = &parent->mj_Clauses[iClause];
 		Oid			opfamily = mergefamilies[iClause];
 		Oid			collation = mergecollations[iClause];
-		StrategyNumber opstrategy = mergestrategies[iClause];
+		StrategyNumber sort_op_strategy = mergestrategies[iClause];
 		bool		nulls_first = mergenullsfirst[iClause];
-		int			op_strategy;
+		int			join_op_strategy;
 		Oid			op_lefttype;
 		Oid			op_righttype;
 		Oid			sortfunc;
@@ -207,28 +208,55 @@ MJExamineQuals(List *mergeclauses,
 		/*
 		 * Prepare the input expressions for execution.
 		 */
-		clause->lexpr = ExecInitExpr((Expr *) linitial(qual->args), parent);
-		clause->rexpr = ExecInitExpr((Expr *) lsecond(qual->args), parent);
+		clause->lexpr = ExecInitExpr((Expr *) linitial(qual->args), (PlanState *) parent);
+		clause->rexpr = ExecInitExpr((Expr *) lsecond(qual->args), (PlanState *) parent);
 
 		/* Set up sort support data */
 		clause->ssup.ssup_cxt = CurrentMemoryContext;
 		clause->ssup.ssup_collation = collation;
-		if (opstrategy == BTLessStrategyNumber)
+		if (sort_op_strategy == BTLessStrategyNumber)
 			clause->ssup.ssup_reverse = false;
-		else if (opstrategy == BTGreaterStrategyNumber)
+		else if (sort_op_strategy == BTGreaterStrategyNumber)
 			clause->ssup.ssup_reverse = true;
 		else	/* planner screwed up */
-			elog(ERROR, "unsupported mergejoin strategy %d", opstrategy);
+			elog(ERROR, "unsupported mergejoin strategy %d", sort_op_strategy);
 		clause->ssup.ssup_nulls_first = nulls_first;
 
 		/* Extract the operator's declared left/right datatypes */
 		get_op_opfamily_properties(qual->opno, opfamily, false,
-								   &op_strategy,
+								   &join_op_strategy,
 								   &op_lefttype,
 								   &op_righttype);
-		if (op_strategy != BTEqualStrategyNumber)	/* should not happen */
-			elog(ERROR, "cannot merge using non-equality operator %u",
-				 qual->opno);
+
+		/*
+		 * Determine whether we accept lesser and/or equal tuples of the inner
+		 * relation.
+		 */
+		switch (join_op_strategy)
+		{
+			case BTEqualStrategyNumber:
+				parent->mj_UseEqual[iClause] = true;
+				break;
+
+			case BTLessEqualStrategyNumber:
+				parent->mj_UseEqual[iClause] = true;
+				/* fall through */
+
+			case BTLessStrategyNumber:
+				parent->mj_UseLesser[iClause] = true;
+				break;
+
+			case BTGreaterEqualStrategyNumber:
+				parent->mj_UseEqual[iClause] = true;
+				/* fall through */
+
+			case BTGreaterStrategyNumber:
+				parent->mj_UseLesser[iClause] = true;
+				break;
+
+			default:
+				elog(ERROR, "unsupported join strategy %d", join_op_strategy);
+		}
 
 		/*
 		 * sortsupport routine must know if abbreviation optimization is
@@ -265,8 +293,6 @@ MJExamineQuals(List *mergeclauses,
 
 		iClause++;
 	}
-
-	return clauses;
 }
 
 /*
@@ -378,6 +404,14 @@ MJEvalInnerValues(MergeJoinState *mergestate, TupleTableSlot *innerslot)
 	return result;
 }
 
+/* Tuple comparison result */
+typedef enum
+{
+	MJCR_NextInner = 1,
+	MJCR_NextOuter = -1,
+	MJCR_Join = 0
+} MJCompareResult;
+
 /*
  * MJCompare
  *
@@ -388,10 +422,10 @@ MJEvalInnerValues(MergeJoinState *mergestate, TupleTableSlot *innerslot)
  * MJEvalOuterValues and MJEvalInnerValues must already have been called
  * for the current outer and inner tuples, respectively.
  */
-static int
+static MJCompareResult
 MJC

Re: [HACKERS] Add TOAST to system tables with ACL?

2017-10-04 Thread Alexander Korotkov
On Tue, Oct 3, 2017 at 9:19 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> For pg_database, you'd have to make sure that the startup-time check of
> database CONNECT privilege still works if the ACL's been pushed out of
> line.
>

I've tried this case.

At first, I make a database temp with no connect privilege from public, and
create 10000 users.
create database temp;
revoke connect on database temp from public;
\copy (select 'create user u' || i || ';' from generate_series(1,10000) i)
to 'script.sql'
\i script.sql

I've checked that user u1 can't login to database temp.
$ psql temp -U u1
psql: FATAL:  permission denied for database "temp"
DETAIL:  User does not have CONNECT privilege.

Then I grant connect privilege to all those 10000 users.
\copy (select 'grant connect on database temp to u' || i || ';' from
generate_series(1,10000) i) to 'script.sql'
\i script.sql

Then user u1 can login successfully.
$ psql temp -U u1
psql (11devel)
Type "help" for help.

u1@temp=#

Thus, in this simple case database CONNECT privilege works with out-of-line
ACL for me.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Pageinspect - add functions on GIN and GiST indexes from gevel

2017-10-03 Thread Alexander Korotkov
On Wed, Sep 13, 2017 at 10:57 AM, Ashutosh Sharma <ashu.coe...@gmail.com>
wrote:

> On Wed, Sep 13, 2017 at 1:15 PM, Alexey Chernyshov
> <a.chernys...@postgrespro.ru> wrote:
> > On Sat, 9 Sep 2017 13:53:35 +0530
> > Ashutosh Sharma <ashu.coe...@gmail.com> wrote:
> >
> >>
> >> Finally, i got some time to look into this patch and surprisingly I
> >> didn't find any function returning information at page level instead
> >> all the SQL functions are returning information at index level.
> >> Therefore, i too feel that it should be either integrated with
> >> pgstattuple which could a better match as Tomas also mentioned or may
> >> be add a new contrib module itself. I think, adding a new contrib
> >> module looks like a better option. The reason being, it doesn't simply
> >> include the function for showing index level statistics (for e.g..
> >> index size, no of rows, values..e.t.c) like pgstattuple does but,
> >> also, displays information contained in a page for e.g. the object
> >> stored in the page,  it's tid e.t.c. So, more or less, I would say
> >> that, it's the mixture of pageinspect and pgstattuple module and
> >> therefore, i feel, adding it as a new extension would be a better
> >> choice. Thought?
> >
> > Thank you for your interest, I will add a new contrib module named,
> > say, indexstat. I think we can add statistics on other indexes in the
> > future.
> >
>
> I think we should wait for experts opinion and then take a call. I am
> not expert. I just gave my opinion as i have worked in this area
> earlier when working for hash index. Thanks.


I'm not sure that we should port these functions from gevel directly.  We
could try to re-implement similar functionality which fits the pageinspect
approach better.  In particular, we can implement some low-level functions
which show detailed information at the page level.  And on top of them we can
implement analogues of gevel functions in SQL or PL/pgSQL.  Is it feasible?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Add TOAST to system tables with ACL?

2017-10-03 Thread Alexander Korotkov
On Tue, Oct 3, 2017 at 9:19 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Alexander Korotkov <a.korot...@postgrespro.ru> writes:
> > This topic was already discussed (at least one time) in 2011.  See [1]
> for
> > details.  I'd like to raise that again.
>
> I'm a bit worried about adding a toast table to pg_class, and more so
> about pg_database, because both of those have to be accessed in situations
> where it's not clear that we could successfully fetch from a toast table,
> because too little of the catalog access infrastructure is alive.
>
> pg_class is probably all right as long as only the ACL field could ever
> get toasted, since it's unlikely that any low-level accesses would be
> paying attention to that field anyway.
>
> For pg_database, you'd have to make sure that the startup-time check of
> database CONNECT privilege still works if the ACL's been pushed out of
> line.
>

Thank you for pointing that out.  I'll check this.

> > Also, I've noticed performance degradation of GRANT statements themselves.
> > 1000 GRANT statements are executed in 1.5 seconds while 10000 GRANT
> > statements are executed in 42 seconds.  On average, a single GRANT
> > statement becomes 2.8 times slower.  That's significant degradation, but
> > it doesn't seem to be fatal degradation for me.
>
> Seems all right, since we could just say "we don't really recommend that
> usage pattern".
>

Yes, sure.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


[HACKERS] Add TOAST to system tables with ACL?

2017-10-03 Thread Alexander Korotkov
Hi!

This topic was already discussed (at least one time) in 2011.  See [1] for
details.  I'd like to raise that again.

Currently, we have tables in the system catalog with ACLs, but without TOAST.
Thus, it might happen that we can't fit a new ACL item, because the row
becomes too large for in-line storage.

You can easily reproduce this situation in psql.

create table t (col int);
\copy (select 'create user u' || i || ';' from generate_series(1,10000) i)
to 'script.sql'
\i script.sql
\copy (select 'grant all on t to u' || i || ';' from
generate_series(1,10000) i) to 'script.sql'
\i script.sql

Eventually GRANT statements start to raise error.
psql:script.sql:2447: ERROR:  row is too big: size 8168, maximum size 8160
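A back-of-envelope view of why this happens (the sizes below are rough assumptions for illustration; in reality the ACL array is compressed in-line before the error fires, so the exact failure point varies):

```python
PAGE_TUPLE_LIMIT = 8160   # maximum in-line tuple size from the error message
ACL_ITEM_SIZE = 12        # assumed bytes per ACL item (grantee, grantor, privs)

def max_inline_acl_items(other_columns_overhead=200):
    # Roughly how many uncompressed ACL items fit before the row
    # exceeds the in-line page limit
    return (PAGE_TUPLE_LIMIT - other_columns_overhead) // ACL_ITEM_SIZE
```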

I understand that we shouldn't encourage users to behave like this.  We
should rather advise them to avoid adding too many ACL items to a single
object by using roles.  And that would be way easier to manage too.

However, current PostgreSQL behavior is rather unexpected and
undocumented.  Thus, I would say it's a bug.  This bug would be nice to fix
even if long ACL lists would work slower than normal.

In the discussion of the post [1], Tom commented that he isn't excited about
out-of-line ACL entries unless it's proven that performance doesn't
completely tank in this case.

I've done some basic experiments in this field on my laptop.  Attached
draft patch adds TOAST to all system catalog tables with ACL.  I've run
pgbench with a custom script "SELECT * FROM t;" where t is an empty table
with a long ACL entry.  I've compared results with 1000 ACL items (in-line
storage) and 10000 ACL items (out-of-line storage).

Also, I've noticed performance degradation of GRANT statements themselves.
1000 GRANT statements are executed in 1.5 seconds while 10000 GRANT
statements are executed in 42 seconds.  On average, a single GRANT statement
becomes 2.8 times slower.  That's significant degradation, but it doesn't
seem to be fatal degradation for me.

Results of pgbench are presented in following table.

     Number of ACL items     | -M simple | -M prepared
-----------------------------+-----------+-------------
  1000 (in-line storage)     |      6623 |        7242
 10000 (out-of-line storage) |     14498 |       17827

So, it's a 2.1-2.4 times degradation in this case.  That's very significant
degradation, but I wouldn't say that "performance completely tanks".

Any thoughts?  Should we consider TOASTing ACL entries, or should we give up
on this?

Links:

 1. https://www.commandprompt.com/blog/grant_schema_usage_
to_2500_users_no_can_do/

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


acl-toast-1.patch
Description: Binary data



Re: [HACKERS] 64-bit queryId?

2017-10-03 Thread Alexander Korotkov
On Tue, Oct 3, 2017 at 5:55 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Mon, Oct 2, 2017 at 8:07 PM, Alexander Korotkov
> <a.korot...@postgrespro.ru> wrote:
> > +1,
> > I see 3 options there:
> > 1) Drop high-order bit, as you proposed.
> > 2) Allow negative queryIds.
> > 3) Implement unsigned 64-type.
> >
> > #1 causes minor loss of precision which looks rather insignificant in
> given
> > context.
> > #2 might be rather unexpected for users who previously had non-negative
> > queryIds.  Changing queryId from 32-bit to 64-bit itself might require
> some
> > adaptation from monitoring software.  But queryIds are user-visible, and
> > negative queryIds would look rather illogical.
> > #3 would be taking on a hard, long-term problem for insufficient reason.
> > Thus, #1 looks like the most harmless solution.
>
> I think we should just allow negative queryIds.  I mean, the hash
> functions have behaved that way for a long time:
>
> rhaas=# select hashtext('');
>   hashtext
> -
>  -1477818771
> (1 row)
>
> It seems silly to me to throw away a perfectly good bit from the hash
> value just because of some risk of minor user confusion.  I do like
> Michael's suggestion of outputing hexa-like text, but changing the
> types of the user-visible output columns seems like a job for another
> patch.  Similarly, if we were to adopt Andres's suggestions of a new
> type or using numeric or Alexander's suggestion of implementing a
> 64-bit unsigned type, I think it should be a separate patch from this
> one.
>
> I would note that such a patch will actually create real
> incompatibility -- extension upgrades might fail if there are
> dependent views, for example -- whereas merely having query IDs start
> to sometimes be negative only creates an incompatibility for people
> who assumed that the int64 type wouldn't actually use its full range
> of allowable values.  I don't deny that someone may have done that, of
> course, but I wouldn't guess that it's a common thing... maybe I'm
> wrong.
>

BTW, you didn't comment on Tom's suggestion about dropping the high-order
bit, which trades minor user confusion for a minor loss of precision.
TBH, for me it's not so important whether we allow negative queryIds or
drop the high-order bit.  It would anyway be very good to have 64- (or
63-)bit queryIds committed.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Incremental sort

2017-10-03 Thread Alexander Korotkov
On Tue, Oct 3, 2017 at 2:52 PM, Robert Haas <robertmh...@gmail.com> wrote:

> On Mon, Oct 2, 2017 at 12:37 PM, Alexander Korotkov
> <a.korot...@postgrespro.ru> wrote:
> > I've applied patch on top of c12d570f and rerun the same benchmarks.
> > CSV-file with results is attached.  There is no dramatical changes.
> There
> > is still minority of performance regression cases while majority of cases
> > has improvement.
>
> Yes, I think these results look pretty good.  But are these times in
> seconds?  You might need to do some testing with bigger sorts.


Good point.  I'll rerun benchmarks with larger dataset size.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] 64-bit queryId?

2017-10-02 Thread Alexander Korotkov
On Tue, Oct 3, 2017 at 12:32 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Peter Geoghegan <p...@bowt.ie> writes:
> > You need to change the SQL interface as well, although I'm not sure
> > exactly how. The problem is that you are now passing a uint64 queryId
> > to Int64GetDatumFast() within pg_stat_statements_internal(). That
> > worked when queryId was a uint32, because you can easily represent
> > values <= UINT_MAX as an int64/int8. However, you cannot represent the
> > second half of the range of uint64 within a int64/int8. I think that
> > this will behave different depending on USE_FLOAT8_BYVAL, if nothing
> > else.
>
> Maybe intentionally drop the high-order bit, so that it's a 63-bit ID?
>

+1,
I see 3 options there:
1) Drop high-order bit, as you proposed.
2) Allow negative queryIds.
3) Implement unsigned 64-type.

#1 causes minor loss of precision which looks rather insignificant in given
context.
#2 might be rather unexpected for users who previously had non-negative
queryIds.  Changing queryId from 32-bit to 64-bit itself might require some
adaptation from monitoring software.  But queryIds are user-visible, and
negative queryIds would look rather illogical.
#3 would be taking on a hard, long-term problem for insufficient reason.
Thus, #1 looks like the most harmless solution.
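The practical difference between options #1 and #2 can be sketched as follows (illustrative Python only; the function names are made up):

```python
def query_id_drop_high_bit(hash64):
    # Option #1: mask off the high-order bit, yielding a 63-bit ID that
    # always fits in a signed int64 as a non-negative value
    return hash64 & ((1 << 63) - 1)

def query_id_signed(hash64):
    # Option #2: reinterpret the full 64-bit hash as a signed int64,
    # so the upper half of the range shows up as negative IDs
    return hash64 - (1 << 64) if hash64 >= (1 << 63) else hash64
```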

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS][BUG] Cache invalidation for queries that contains const of temporary composite type

2017-10-02 Thread Alexander Korotkov
On Tue, Oct 3, 2017 at 12:20 AM, Maksim Milyutin <milyuti...@gmail.com>
wrote:

> I have tested the following case:
>
> create type pair as (x int, y int);
> prepare test as select json_populate_record(null::pair, '{"x": 1, "y":
> 2}'::json);
> drop type pair cascade;
>
> execute test;
>
> -- The following output is obtained before patch
> ERROR:  cache lookup failed for type 16419
>
> -- After applying patch
> ERROR:  type "pair" does not exist
>
> But after recreating 'pair' type I'll get the following message:
> ERROR:  cached plan must not change result type
>
> I don't know whether it's right behavior. Anyhow your point is a good
> motivation to experiment and investigate different scenarios of work with
> cached plan that depends on non-stable type. Thanks for that.
>

I think that, ideally, the cached plan should be automatically invalidated
and the prepared statement should work without error.
Not really sure if that's feasible...

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] GSoC 2017 : Patch for predicate locking in Gist index

2017-10-02 Thread Alexander Korotkov
On Mon, Oct 2, 2017 at 9:11 PM, Andrew Borodin <amborodi...@gmail.com>
wrote:

> On Mon, Oct 2, 2017 at 8:00 PM, Alexander Korotkov
> <a.korot...@postgrespro.ru> wrote:
> > What happens if exactly this "continue" fires?
> >
> >> if (GistFollowRight(stack->page))
> >> {
> >> if (!xlocked)
> >> {
> >> LockBuffer(stack->buffer, GIST_UNLOCK);
> >> LockBuffer(stack->buffer, GIST_EXCLUSIVE);
> >> xlocked = true;
> >> /* someone might've completed the split when we unlocked */
> >> if (!GistFollowRight(stack->page))
> >> continue;
> >
> >
> > In this case we might get xlocked == true without calling
> > CheckForSerializableConflictIn().
> Indeed! I've overlooked it. I'm remembering this issue, we were
> considering not fixing splits if in Serializable isolation, but
> dropped the idea.
>

Yeah, current insert algorithm assumes that split must be fixed before we
can correctly traverse the tree downwards.


> CheckForSerializableConflictIn() must be after every exclusive lock.
>

I'm not sure that fixing a split is a case that requires calling
CheckForSerializableConflictIn().  This lock on the leaf page is not taken to
modify the page.  It's just taken to ensure that nobody else is
fixing this split at the same time.  After fixing the split, it might appear
that the insert would go to another page.

> > I think it would be rather safe and easy for understanding to move
> > CheckForSerializableConflictIn() directly before gistinserttuple().
> The difference is that after lock we have conditions to change page,
> and before gistinserttuple() we have actual intent to change page.
>
> From the point of future development first version is better (if some
> new calls occasionally spawn in), but it has 3 calls while your
> proposal have 2 calls.
> It seems to me that CheckForSerializableConflictIn() before
> gistinserttuple() is better as for now.
>

Agree.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS][BUG] Cache invalidation for queries that contains const of temporary composite type

2017-10-02 Thread Alexander Korotkov
On Mon, Oct 2, 2017 at 3:43 PM, Maksim Milyutin <milyuti...@gmail.com>
wrote:

> On 26.09.2017 23:25, Maksim Milyutin wrote:
>
> Updated patchset contains more transparent definition of composite type
> for constant node and regression test for described above buggy case.
>
> Is there any interest on the problem in this thread?
>

Probably everybody is too busy with the upcoming release and other patches.
Since this patch is a bug fix, it definitely should be considered.  Please
register this patch at the upcoming commitfest to ensure it won't be
forgotten.
Regression test changes (both .sql and .out) are essential parts of the
patch.  I see no point in posting them separately.  Please incorporate
them into your patch.
Did you check this patch with a manually created composite type (made by
CREATE TYPE typname AS ...)?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

2017-10-02 Thread Alexander Korotkov
On Sun, Oct 1, 2017 at 11:53 AM, Shubham Barai <shubhambara...@gmail.com>
wrote:

> Yes, page-level predicate locking should happen only when fast update is
> off.
> Actually, I forgot to put conditions in updated patch. Does everything
> else look ok ?
>

I think that the isolation tests should be improved.  It doesn't seem that
any posting tree would be generated by the tests that you've provided,
because all the TIDs could fit in a single posting list.  Note that you can
get some insight into GIN physical structure using the pageinspect contrib
module.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Commitfest 201709 is now closed

2017-10-02 Thread Alexander Korotkov
On Mon, Oct 2, 2017 at 6:57 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Daniel Gustafsson <dan...@yesql.se> writes:
> > Thanks to everyone who participated, and to everyone who have responded
> to my
> > nagging via the CF app email function. This is clearly an awesome
> community.
>
> And thanks to you for your hard work as CFM!  That's tedious and
> largely thankless work, but it's needed to keep things moving.
>

+1,
Thank you very much, Daniel!  It was a pleasure working with you at
commitfest.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files

2017-10-02 Thread Alexander Korotkov
On Sun, Oct 1, 2017 at 8:27 PM, chenhj <chjis...@163.com> wrote:

> On  2017-10-01 04:09:19,"Alexander Korotkov" <a.korot...@postgrespro.ru>
> wrote:
>
> On Sat, Sep 30, 2017 at 8:18 PM, chenhj <chjis...@163.com> wrote:
>
>> On 2017-09-30 02:17:54,"Alexander Korotkov" <a.korot...@postgrespro.ru
>> > wrote:
>>
>>
>> Great.  Now code of this patch looks good for me.
>> However, we forgot about documentation.
>>
>>   
>>>The result is equivalent to replacing the target data directory with
>>> the
>>>source one. Only changed blocks from relation files are copied;
>>>all other files are copied in full, including configuration files. The
>>>advantage of pg_rewind over taking a new base backup,
>>> or
>>>tools like rsync, is that pg_rewind
>>> does
>>>not require reading through unchanged blocks in the cluster. This
>>> makes
>>>it a lot faster when the database is large and only a small
>>>fraction of blocks differ between the clusters.
>>>   
>>
>>
>> At least, this paragraph need to be adjusted, because it states whose
>> files are copied.  And probably latter paragraphs whose state about WAL
>> files.
>>
>>
>>
>> You are right.
>> I wrote a draft as follows, but I'm afraid the English
>> statement may not be accurate.
>>
>
> I'm not native english speaker too :(
>
> Only the WAL files between the point of divergence and the current WAL
>> insert location of the source server are copied, *for* other WAL files are
>> useless for the target server.
>
>
> I'm not sure about this usage of word *for*.  For me, it probably should
> be just removed.  Rest of changes looks good for me.  Please, integrate
> them into the patch.
>
>
> I have removed the *for*.  Please check the new patch again.
>

Now this patch looks good to me.  It applies cleanly, builds cleanly, and
passes regression tests, and the new functionality is covered by regression
tests.  The code is OK for me, and the docs too.

I'm marking this patch as "Ready for committer".  BTW, the authors field in
the commitfest app is empty (https://commitfest.postgresql.org/15/1302/).
Please put your name there.

I hope this patch will be committed during 2017-11 commitfest.  Be ready to
rebase this patch if needed.  Thank you for your work.
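For readers following along, the segment range the patch keeps ("only the WAL files between the point of divergence and the current WAL insert location of the source server") can be sketched as (an illustrative model; it ignores timeline IDs and PostgreSQL's real XLogSegNo arithmetic):

```python
WAL_SEG_SIZE = 16 * 1024 * 1024  # default 16MB WAL segment size

def wal_segments_to_copy(divergence_lsn, source_insert_lsn):
    # Copy only the segments covering [divergence point, current insert
    # location]; earlier segments are useless for the rewound target
    first = divergence_lsn // WAL_SEG_SIZE
    last = source_insert_lsn // WAL_SEG_SIZE
    return list(range(first, last + 1))
```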

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] GSoC 2017 : Patch for predicate locking in Gist index

2017-10-02 Thread Alexander Korotkov
Hi, Andrew!

On Mon, Oct 2, 2017 at 1:40 PM, Andrew Borodin <amborodi...@gmail.com>
wrote:

> Thanks for looking into the patch!
>
> On Thu, Sep 28, 2017 at 3:59 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>>
>>
>> In gistdoinsert() you do CheckForSerializableConflictIn() only if page
>> wasn't exclusively locked before (xlocked is false).
>>
>> if (!xlocked)
>>> {
>>> LockBuffer(stack->buffer, GIST_UNLOCK);
>>> LockBuffer(stack->buffer, GIST_EXCLUSIVE);
>>> CheckForSerializableConflictIn(r, NULL, stack->buffer);
>>> xlocked = true;
>>
>>
>> However, page might be exclusively locked before.  And in this case
>> CheckForSerializableConflictIn() would be skipped.  That happens very
>> rarely (someone fixes incomplete split before we did), but nevertheless.
>>
>
> if xlocked = true, the page was already checked for conflict after setting
> an exclusive lock on its buffer.  I still do not see any problem here...
>

What happens if exactly this "continue" fires?

if (GistFollowRight(stack->page))
> {
> if (!xlocked)
> {
> LockBuffer(stack->buffer, GIST_UNLOCK);
> LockBuffer(stack->buffer, GIST_EXCLUSIVE);
> xlocked = true;
> /* someone might've completed the split when we unlocked */
> if (!GistFollowRight(stack->page))
> continue;


In this case we might get xlocked == true without
calling CheckForSerializableConflictIn().  This is a very rare codepath, but
still...
I think it would be rather safe and easy for understanding to
move CheckForSerializableConflictIn() directly before gistinserttuple().

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Incremental sort

2017-09-30 Thread Alexander Korotkov
On Sat, Sep 16, 2017 at 2:46 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Thu, Sep 14, 2017 at 2:48 AM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> Patch rebased to current master is attached.  I'm going to improve my
>> testing script and post new results.
>>
>
> New benchmarking script and results are attached.  There new dataset
> parameter is introduced: skew factor.  Skew factor defines skew in
> distribution of groups sizes.
> My idea of generating is just usage of power function where power is from
> 0 to 1.  Following formula is used to get group number for particular item
> number i.
> [((i / number_of_indexes) ^ power) * number_of_groups]
> For example, power = 1/6 gives following distribution of groups sizes:
> group number    group size
> 0   2
> 1   63
> 2   665
> 3   3367
> 4   11529
> 5   31031
> 6   70993
> 7   144495
> 8   269297
> 9   468558
>
> For convenience, instead of power itself, I use skew factor where power =
> 1.0 / (1.0 + skew).  Therefore, with skew = 0.0, distribution of groups
> sizes is uniform.  Larger skew gives more skewed distribution (and that
> seems to be quite intuitive).  For, negative skew, group sizes are mirrored
> as for corresponding positive skew.  For example, skew factor = -5.0 gives
> following groups sizes distribution:
> group number    group size
> 0   468558
> 1   269297
> 2   144495
> 3   70993
> 4   31031
> 5   11529
> 6   3367
> 7   665
> 8   63
> 9   2
>
> Results shows that between 2172 test cases, in 2113 incremental sort gives
> speedup while in 59 it causes slowdown.  The following 4 test cases show
> most significant slowdown (>10% of time).
>
> Table   GroupedCols GroupCount Skew PreorderedFrac
> FullSortMedian IncSortMedian TimeChangePercent
> int4|int4|numeric   1  100  -10  0
> 1.5688240528  2.0607631207 31.36
> text|int8|text|int4 110  0
> 1.7785198689  2.1816160679 22.66
> int8|int8|int4  1   10  -10  0
>  1.136412859  1.3166360855 15.86
> numeric|text|int4|int8  2   10  -10  1
> 0.4403841496  0.5070910454 15.15
>
> As you can see, 3 of this 4 test cases have skewed distribution while one
> of them is related to costly location-aware comparison of text.  I've no
> particular idea of how to cope these slowdowns.  Probably, it's OK to have
> slowdown in some cases while have speedup in majority of cases (assuming
> there is an option to turn off new behavior).  Probably, we should teach
> optimizer more about skewed distributions of groups, but that doesn't seem
> feasible for me.
>
> Any thoughts?
>

BTW, replacement selection sort was removed by 8b304b8b.  I think it's worth
rerunning the benchmarks after that, because the results might change.  Will do.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

2017-09-30 Thread Alexander Korotkov
On Sat, Sep 30, 2017 at 6:12 PM, Shubham Barai <shubhambara...@gmail.com>
wrote:

> I have made changes according to your suggestions. Please have a look at
> the updated patch.
> I am also considering your suggestions for my other patches also. But, I
> will need some time to
> make changes as I am currently busy doing my master's.
>

I don't understand why you sometimes call PredicateLockPage() only when
fast update is off.  For example:

@@ -94,6 +101,9 @@ scanPostingTree(Relation index, GinScanEntry scanEntry,
>   break; /* no more pages */
>
>   buffer = ginStepRight(buffer, index, GIN_SHARE);
> +
> + if (!GinGetUseFastUpdate(index))
> + PredicateLockPage(index, BufferGetBlockNumber(buffer), snapshot);
>   }
>
>   UnlockReleaseBuffer(buffer);


But sometimes you call PredicateLockPage() unconditionally.

@@ -131,6 +141,8 @@ collectMatchBitmap(GinBtreeData *btree, GinBtreeStack
> *stack,
>   attnum = scanEntry->attnum;
>   attr = btree->ginstate->origTupdesc->attrs[attnum - 1];
>
> + PredicateLockPage(btree->index, stack->buffer, snapshot);
> +
>   for (;;)
>   {
>   Page page;


As I understand it, all page-level predicate locking should happen only when
fast update is off.

Also, although the general idea is described in README-SSI, in-code comments
would be useful.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files

2017-09-30 Thread Alexander Korotkov
On Sat, Sep 30, 2017 at 8:18 PM, chenhj <chjis...@163.com> wrote:

> On 2017-09-30 02:17:54,"Alexander Korotkov" <a.korot...@postgrespro.ru>
> wrote:
>
>
> Great.  Now code of this patch looks good for me.
> However, we forgot about documentation.
>
>   
>>The result is equivalent to replacing the target data directory with
>> the
>>source one. Only changed blocks from relation files are copied;
>>all other files are copied in full, including configuration files. The
>>advantage of pg_rewind over taking a new base backup,
>> or
>>tools like rsync, is that pg_rewind
>> does
>>not require reading through unchanged blocks in the cluster. This makes
>>it a lot faster when the database is large and only a small
>>fraction of blocks differ between the clusters.
>>   
>
>
> At least, this paragraph need to be adjusted, because it states whose
> files are copied.  And probably latter paragraphs whose state about WAL
> files.
>
>
>
> Your are rigth.
> I wrote a draft as following, but i'm afraid whether the english statement
> is accurate.
>

I'm not a native English speaker either :(

Only the WAL files between the point of divergence and the current WAL
> insert location of the source server are copied, *for* other WAL files are
> useless for the target server.


I'm not sure about this usage of the word *for*.  In my opinion, it should
probably just be removed.  The rest of the changes look good to me.  Please
integrate them into the patch.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] 64-bit queryId?

2017-09-30 Thread Alexander Korotkov
On Sat, Sep 30, 2017 at 6:39 PM, Peter Geoghegan <p...@bowt.ie> wrote:

> On Sat, Sep 30, 2017 at 7:34 AM, Robert Haas <robertmh...@gmail.com>
> wrote:
> > Assuming, however, that you don't manage to prove all known
> > mathematics inconsistent, what one might reasonably hope to do is
> > render collisions remote enough that one need not worry about them too
> > much in practice.
>
> Isn't that already true in the case of queryId? I've never heard any
> complaints about collisions. Most people don't change
> pg_stat_statements.max, so the probability of a collision is more like
> 1%. And, that's the probability of *any* collision, not the
> probability of a collision that the user actually cares about. The
> majority of entries in pg_stat_statements among those ten thousand
> will not be interesting. Often, 90%+ will be one-hit wonders. If that
> isn't true, then you're probably not going to find pg_stat_statements
> very useful, because you have nothing to focus on.
>

I've heard from customers that they periodically dump the contents of
pg_stat_statements and then build statistics over a long period of time.  Even
if they leave the default pg_stat_statements.max unchanged, the probability of
collision would be significantly higher.

I have heard complaints about a number of different things in
> pg_stat_statements, like the fact that we don't always manage to
> replace constants with '?'/'$n' characters in all cases. I heard about
> that quite a few times during my time at Heroku. But never this.
>

I'm not sure that we should be guided only by user complaints on this
problem, for a couple of reasons.

1) Some defects can remain unnoticed by users for long periods of
time.  We have examples of bad GiST picksplit algorithms resulting in bad
query performance.  However, users never complained about that, because they
didn't know what the real fair performance for this kind of query is.  In the
same way, users may suffer from queryId collisions for a long time without
noticing.  They might have inaccurate statistics of query execution,
but they don't notice it because they have nothing to compare against.

2) Ideally, we should fix potential problems before they materialize, not
only after user complaints.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files

2017-09-29 Thread Alexander Korotkov
On Fri, Sep 29, 2017 at 8:10 PM, chenhj <chjis...@163.com> wrote:

> On 2017-09-30 00:53:31,"chenhj" <chjis...@163.com> wrote:
>
> On 2017-09-29 19:29:40,"Alexander Korotkov" <a.korot...@postgrespro.ru>
> wrote:
>
> On Fri, Sep 29, 2017 at 10:07 AM, chenhj <chjis...@163.com> wrote:
>>
>>
>>
> OK.  That makes sense.  Thank you for the explanation.
>
> I still have some minor comments.
>
>
>> /*
>> +* Save the WAL filenames of the divergence and the current WAL insert
>> +* location of the source server. Later only the WAL files between
>> those
>> +* would be copied to the target data directory.
>>
>
> Comment is outdated.  We don't save filenames anymore, now we save segment
> numbers.
>
>
>> +* Note:The later generated WAL files in the source server before the
>> end
>> +* of the copy of the data files must be made available when the
>> target
>> +* server is started. This can be done by configuring the target
>> server as
>> +* a standby of the source server.
>> +*/
>
>
> You miss space after "Note:".  Also, it seems reasonable for me to leave
> empty line before "Note:".
>
> # Setup parameter for WAL reclaim
>
>
> Parameter*s*, because you're setting up multiple of them.
>
> # The accuracy of imodification from pg_ls_waldir() is seconds, so sleep
>> one seconds
>
>
> One second without "s".
>
> Also, please check empty lines in 006_wal_copy.pl to be just empty lines
> without tabs.
>
>
> Thanks for your comments, i had fix above problems.
> And also add several line breaks at long line in 006_wal_copy.pl
> Please check this patch again.
>
>
> Sorry, patch v6 did not remove tabs in two empty lines, please use the new
> one.
>

Great.  Now the code of this patch looks good to me.
However, we forgot about the documentation.

  
>The result is equivalent to replacing the target data directory with the
>source one. Only changed blocks from relation files are copied;
>all other files are copied in full, including configuration files. The
>advantage of pg_rewind over taking a new base backup, or
>tools like rsync, is that pg_rewind does
>not require reading through unchanged blocks in the cluster. This makes
>it a lot faster when the database is large and only a small
>fraction of blocks differ between the clusters.
>   


At least this paragraph needs to be adjusted, because it states which files
are copied.  The same probably applies to later paragraphs that talk about WAL
files.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Multicolumn hash indexes

2017-09-29 Thread Alexander Korotkov
On Fri, Sep 29, 2017 at 6:33 PM, Robert Haas <robertmh...@gmail.com> wrote:

> Now you could also imagine something where we keep a separate set of
> hash buckets for each column and multi-column searches are handled by
> searching each hash table separately and taking the intersection of
> the sets of TIDs found for each column.  Not sure that's a good idea,
> but it's sort of like what GIN does.
>

I'd like to note that what GIN does for multicolumn indexes seems like quite
an artificial thing to me.  Logically, a multicolumn GIN index is the same
thing as multiple single-column GIN indexes.  The reason multicolumn GIN
exists is that in one particular case GIN overlaps scan results over multiple
attributes more efficiently than our executor does.  That particular case is
when scans over multiple columns return ordered sets of TIDs.  GIN implements
a kind of merge join over TIDs, while our executor can only do a BitmapAnd in
this case.

However, if we imagine that, at the moment we developed GIN, our executor
could do a merge join of TIDs produced by multiple index scans, then we
wouldn't have invented multicolumn GIN at all.  Or we would probably have
invented a very different kind of multicolumn GIN.

I mean that multicolumn GIN was a good solution to a particular problem from a
practical point of view.  It allowed implementing very efficient algorithms
with minimal changes to the planner/executor infrastructure.  However,
multicolumn GIN doesn't look like a good design pattern that we would like to
repeat.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] GSoC 2017: weekly progress reports (week 4) and patch for hash index

2017-09-29 Thread Alexander Korotkov
On Fri, Sep 8, 2017 at 4:07 AM, Thomas Munro <thomas.mu...@enterprisedb.com>
wrote:

> Hi Shubham,
>
> On Tue, Jun 27, 2017 at 9:21 PM, Shubham Barai <shubhambara...@gmail.com>
> wrote:
> > If these two hash keys (78988658 and 546789888) mapped to the same
> bucket, this will result in false serialization failure.
> > Please correct me if this assumption about false positives is wrong.
>
> I wonder if there is an opportunity to use computed hash values
> directly in predicate lock tags, rather than doing it on the basis of
> buckets.  Perhaps I'm missing something important about the way that
> locks need to escalate that would prevent that from working.


+1,
Very nice idea!  Locking hash values directly seems to be superior to
locking hash index pages.
Shubham, do you have any comment on this?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Fix bloom WAL tap test

2017-09-29 Thread Alexander Korotkov
On Wed, Sep 6, 2017 at 5:06 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Wed, Sep 6, 2017 at 4:08 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> I just realized that these lines of contrib/bloom/t/001_wal.pl don't
>> check that queries give same results on master and standby.  They just
>> check that *return codes* of psql are equal.
>>
>> # Run test queries and compare their result
>>> my $master_result = $node_master->psql("postgres", $queries);
>>> my $standby_result = $node_standby->psql("postgres", $queries);
>>> is($master_result, $standby_result, "$test_name: query result matches");
>>
>>
>> Attached patch fixes this problem by using safe_psql() which returns psql
>> output instead of return code.  For safety, this patch replaces psql() with
>> safe_psql() in other places too.
>>
>> I think this should be backpatched to 9.6 as bugfix.
>>
>
> Also, it would be nice to call wal-check on bloom check (now wal-check
> isn't called even on check-world).
> See attached patch.  My changes to Makefile could be cumbersome.  Sorry
> for that, I don't have much experience with them...
>

This topic hasn't received any attention yet.  Apparently, that's because of
the in-progress commitfest and the upcoming release.
I've registered it in the upcoming commitfest as a bug fix.
https://commitfest.postgresql.org/15/1313/

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files

2017-09-29 Thread Alexander Korotkov
On Fri, Sep 29, 2017 at 10:07 AM, chenhj <chjis...@163.com> wrote:

> On 2017-09-29 05:31:51, "Alexander Korotkov" <a.korot...@postgrespro.ru>
> wrote:
>
> On Thu, Sep 28, 2017 at 10:52 PM, chenhj <chjis...@163.com> wrote:
>
>> On 2017-09-29 00:43:18,"Alexander Korotkov" <a.korot...@postgrespro.ru>
>> wrote:
>>
>> On Thu, Sep 28, 2017 at 6:44 PM, chenhj <chjis...@163.com> wrote:
>>
>>> On 2017-09-28 01:29:29,"Alexander Korotkov" <a.korot...@postgrespro.ru>
>>> wrote:
>>>
>>> It appears that your patch conflicts with fc49e24f.  Please, rebase it.
>>>
>>>
>>> Yes, i had rebased it, Please check the new patch.
>>>
>>
>> Good, now it applies cleanly.
>>
>> else if (strncmp(path, XLOGDIR"/", strlen(XLOGDIR"/")) == 0 &&
>>>  IsXLogFileName(path + strlen(XLOGDIR"/")) &&
>>>  (strcmp(path + strlen(XLOGDIR"/") + 8, divergence_wal_filename + 8) < 0
>>> ||
>>>   strcmp(path + strlen(XLOGDIR"/") + 8, last_source_wal_filename + 8) >
>>> 0))
>>
>>
>> According to our conding style, you should leave a space betwen XLOGDIF
>> and "/".
>> Also, you do a trick by comparison xlog segment numbers using strcmp().
>> It's nice, but I would prefer seeing XLogFromFileName() here.  It would
>> improve code readability and be less error prone during further
>> modifications.
>>
>>
>> Thanks for advice!
>> I had modified it.
>>
>
> OK. Patch becomes better.
> I also have more general question.  Why do we need upper bound for segment
> number (last_source_segno)?  I understand the purpose of lower bound
> (divergence_segno) which save us from copying extra WAL files, but what is
> upper bound for?  As I understood, we anyway need to replay most recent WAL
> records to reach consistent state after pg_rewind.  I propose to
> remove last_source_segno unless I'm missing something.
>
>
> Thanks for relay!
> When checkpoint occurs, some old WAL files will be renamed as future WAL
> files for later use.
> The upper bound for segment number (last_source_segno) is used to avoid
> copying these extra WAL files.
>
> When the parameter max_wal_size or max_min_size is large,these may be many
> renamed old WAL files for reused.
>
> For example, I have just looked at one of our production systems
> (max_wal_size = 64GB, min_wal_size = 2GB),
> the total size of WALs is about 30GB, and contains about 4GB renamed old
> WAL files.
>
> [postgres@hostxxx pg_xlog]$ ll
> ...
> -rw--- 1 postgres postgres 16777216 Sep 29 14:05
> 00010BCF0078
> -rw--- 1 postgres postgres 16777216 Sep 29 14:05
> 00010BCF0079
> -rw--- 1 postgres postgres 16777216 Sep 29 14:05
> 00010BCF007A
> -rw--- 1 postgres postgres 16777216 Sep 29 14:05
> 00010BCF007B
> -rw--- 1 postgres postgres 16777216 Sep 29 14:05
> 00010BCF007C
> -rw--- 1 postgres postgres 16777216 Sep 29 14:05
> 00010BCF007D
> -rw--- 1 postgres postgres 16777216 Sep 29 11:22
> 00010BCF007E //after this, there are about 4GB WALs for reuse
> -rw--- 1 postgres postgres 16777216 Sep 29 11:08
> 00010BCF007F
> -rw--- 1 postgres postgres 16777216 Sep 29 11:06
> 00010BCF0080
> -rw--- 1 postgres postgres 16777216 Sep 29 12:05
> 00010BCF0081
> -rw--- 1 postgres postgres 16777216 Sep 29 11:28
> 00010BCF0082
> -rw--- 1 postgres postgres 16777216 Sep 29 11:06
> 00010BCF0083
> ...
>

OK.  That makes sense.  Thank you for the explanation.

I still have some minor comments.


> /*
> +* Save the WAL filenames of the divergence and the current WAL insert
> +* location of the source server. Later only the WAL files between
> those
> +* would be copied to the target data directory.
>

The comment is outdated.  We don't save filenames anymore; now we save segment
numbers.


> +* Note:The later generated WAL files in the source server before the
> end
> +* of the copy of the data files must be made available when the target
> +* server is started. This can be done by configuring the target
> server as
> +* a standby of the source server.
> +*/


You're missing a space after "Note:".  Also, it seems reasonable to me to
leave an empty line before "Note:".

# Setup parameter for WAL reclaim


Parameter*s*, because you're setting up multiple of them.

# The accuracy of imodification from pg_ls_waldir() is seconds, so sleep
> one seconds


One second without "s".

Also, please make sure that the empty lines in 006_wal_copy.pl are just empty
lines without tabs.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] GSoC 2017: weekly progress reports (week 8)

2017-09-28 Thread Alexander Korotkov
On Thu, Sep 28, 2017 at 2:22 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Fri, Jul 28, 2017 at 7:58 AM, Shubham Barai <shubhambara...@gmail.com>
> wrote:
>
>> I am attaching a patch for predicate locking in SP-Gist index
>>
>
> I took a look over this patch.
>
> newLeafBuffer = SpGistGetBuffer(index,
>> GBUF_LEAF | (isNulls ? GBUF_NULLS : 0),
>> Min(totalLeafSizes,
>> SPGIST_PAGE_CAPACITY),
>> );
>> PredicateLockPageSplit(index,
>> BufferGetBlockNumber(current->buffer),
>> BufferGetBlockNumber(newLeafBuffer));
>>
>
> You move predicate lock during split only when new leaf page is
> allocated.  However SP-GiST may move items to the free space of another
> busy page during split (see other branches in doPickSplit()).  Your patch
> doesn't seem to handle this case correctly.
>

I have more thoughts about this patch.

+ * SPGist searches acquire predicate lock only on the leaf pages.
> + During a page split, a predicate lock is copied from the original
> + page to the new page.


This approach isn't going to work.  When a new tuple is inserted into
SP-GiST, the choose method can return spgAddNode.  In this case a new branch
of the tree is added.  And this new branch could be relevant to scans we
made in the past.  Therefore, you need to do predicate locking for internal
pages too in order to detect all the possible conflicts.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files

2017-09-28 Thread Alexander Korotkov
On Thu, Sep 28, 2017 at 10:52 PM, chenhj <chjis...@163.com> wrote:

> On 2017-09-29 00:43:18,"Alexander Korotkov" <a.korot...@postgrespro.ru>
> wrote:
>
> On Thu, Sep 28, 2017 at 6:44 PM, chenhj <chjis...@163.com> wrote:
>
>> On 2017-09-28 01:29:29,"Alexander Korotkov" <a.korot...@postgrespro.ru>
>> wrote:
>>
>> It appears that your patch conflicts with fc49e24f.  Please, rebase it.
>>
>>
>> Yes, i had rebased it, Please check the new patch.
>>
>
> Good, now it applies cleanly.
>
> else if (strncmp(path, XLOGDIR"/", strlen(XLOGDIR"/")) == 0 &&
>>  IsXLogFileName(path + strlen(XLOGDIR"/")) &&
>>  (strcmp(path + strlen(XLOGDIR"/") + 8, divergence_wal_filename + 8) < 0
>> ||
>>   strcmp(path + strlen(XLOGDIR"/") + 8, last_source_wal_filename + 8) >
>> 0))
>
>
> According to our conding style, you should leave a space betwen XLOGDIF
> and "/".
> Also, you do a trick by comparison xlog segment numbers using strcmp().
> It's nice, but I would prefer seeing XLogFromFileName() here.  It would
> improve code readability and be less error prone during further
> modifications.
>
>
> Thanks for advice!
> I had modified it.
>

OK.  The patch has become better.
I also have a more general question.  Why do we need an upper bound for the
segment number (last_source_segno)?  I understand the purpose of the lower
bound (divergence_segno), which saves us from copying extra WAL files, but
what is the upper bound for?  As I understand it, we anyway need to replay the
most recent WAL records to reach a consistent state after pg_rewind.  I
propose removing last_source_segno unless I'm missing something.


--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files

2017-09-28 Thread Alexander Korotkov
On Thu, Sep 28, 2017 at 6:44 PM, chenhj <chjis...@163.com> wrote:

> On 2017-09-28 01:29:29,"Alexander Korotkov" <a.korot...@postgrespro.ru>
> wrote:
>
> It appears that your patch conflicts with fc49e24f.  Please, rebase it.
>
>
> Yes, i had rebased it, Please check the new patch.
>

Good, now it applies cleanly.

else if (strncmp(path, XLOGDIR"/", strlen(XLOGDIR"/")) == 0 &&
> IsXLogFileName(path + strlen(XLOGDIR"/")) &&
> (strcmp(path + strlen(XLOGDIR"/") + 8, divergence_wal_filename + 8) < 0 ||
>  strcmp(path + strlen(XLOGDIR"/") + 8, last_source_wal_filename + 8) > 0))


According to our coding style, you should leave a space between XLOGDIR and
"/".
Also, you do a trick by comparing xlog segment numbers using strcmp().
It's nice, but I would prefer to see XLogFromFileName() here.  It would
improve code readability and be less error prone during further
modifications.


--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] PoC: full merge join on comparison clause

2017-09-28 Thread Alexander Kuzmenkov

Hi Ashutosh,

Thanks for the review.

*Jeff*, I'm copying you because this is relevant to our discussion about 
what to do with mergeopfamilies when adding new merge join types.



You have renamed RestrictInfo member mergeopfamilies as
equivopfamilies. I don't think that's a good name; it doesn't convey
that these are opfamilies containing merge operators. The changes in
check_mergejoinable() suggest that an operator may act as equality
operator in few operator families and comparison operator in others.
That looks odd. Actually an operator family contains operators other
than equality operators, so you may want to retain this member and add
a new member to specify whether the clause is an equality clause or
not.


For mergeopfamilies, I'm not sure what the best thing to do is. I'll try 
to explain my understanding of the situation; please correct me if I'm 
wrong.


Before the patch, mergeopfamilies was used for two things: creating 
equivalence classes and performing merge joins.


For equivalence classes: we look at the restriction clauses, and if they 
have mergeopfamilies set, it means that these clauses are based on an 
equality operator, and the left and right variables must be equal. To 
record this fact, we create an equivalence class. The variables might be 
equal for one equality operator and not equal for another, so we record 
the particular operator families to which our equality operator belongs.


For merge join: we look at the join clauses, and if they have 
mergeopfamilies set, it means that these clauses are based on an 
equality operator, and we can try performing this particular join as 
merge join. These opfamilies are also used beforehand to create the 
equivalence classes for left and right variables. The equivalence 
classes are used to match the join clauses to pathkeys describing the 
ordering of join inputs.


So, if we want to start doing merge joins for operators other than 
equality, we still need to record their opfamilies, but only use them 
for the second case and not the first. I chose to put these opfamilies 
to different variables, and
name the one used for equivalence classes 'equivopfamilies' and the one 
used for merge joins 'mergeopfamilies'. The equality operators are used 
for both cases, so we put their opfamilies into both of these variables.


I agree this might look confusing. Indeed, we could keep a single 
variable for opfamilies, and add separate flags that show how they can 
be used, be that for equivalence classes, merge joins, range joins or 
some combination of them. This is similar to what Jeff did in his range 
merge join patch [1]. I will think more about this and try to produce an 
updated patch.




In mergejoinscansel() you have just removed Assert(op_strategy ==
BTEqualStrategyNumber); Probably this function is written considering
on equality operators. But now that we are using this for all other
operators, we will need more changes to this function. That may be the
reason why INNER join in your earlier example doesn't choose right
costing.


I changed mergejoinscansel() slightly to reflect the fact that the inner 
relation is scanned from the beginning if we have an inequality merge 
clause.




The comment change in final_cost_mergejoin() needs more work. n1, n2,
n3 are number of rows on inner side with values 1, 2, 3 resp. So n1 +
n2 + n3 + ... = size of inner relation is correct. In that context I
am not able to understand your change
+* If the merge clauses contain inequality, (n1 + n2 + ...) ~=
+* (size of inner relation)^2.


I extended the comment in final_cost_mergejoin(). Not sure if that 
approximation makes any sense, but this is the best I could think of.


Style problems are fixed.

Attached please find the new version of the patch that addresses all the 
review comments except mergeopfamilies.


The current commitfest is ending, but I'd like to continue working on 
this patch, so I am moving it to the next one.



[1] 
https://www.postgresql.org/message-id/flat/CAMp0ubfwAFFW3O_NgKqpRPmm56M4weTEXjprb2gP_NrDaEC4Eg%40mail.gmail.com#camp0ubfwaffw3o_ngkqprpmm56m4wetexjprb2gp_nrdaec...@mail.gmail.com 



--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 32dc4e6301..2958a9e53d 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -722,19 +722,19 @@ get_useful_ecs_for_relation(PlannerInfo *root, RelOptInfo *rel)
 	{
 		RestrictInfo *restrictinfo = (RestrictInfo *) lfirst(lc);
 
-		/* Consider only mergejoinable clauses */
-		if (restrictinfo->mergeopfamilies == NIL)
+		/* Consider only mergejoinable equality clauses */
+		if (restrictinfo->equivopfamilies == NIL)
 			continue;
 
 		/* Make sure we've got canonical ECs. */
-		update_mergeclause_eclasses(root, restrictinfo);
+		update_equivclause_eclasses(root, restri

Re: [HACKERS] GSoC 2017: weekly progress reports (week 8)

2017-09-28 Thread Alexander Korotkov
Hi!

On Fri, Jul 28, 2017 at 7:58 AM, Shubham Barai <shubhambara...@gmail.com>
wrote:

> I am attaching a patch for predicate locking in SP-Gist index
>

I took a look over this patch.

newLeafBuffer = SpGistGetBuffer(index,
> GBUF_LEAF | (isNulls ? GBUF_NULLS : 0),
> Min(totalLeafSizes,
> SPGIST_PAGE_CAPACITY),
> );
> PredicateLockPageSplit(index,
> BufferGetBlockNumber(current->buffer),
> BufferGetBlockNumber(newLeafBuffer));
>

You move the predicate lock during a split only when a new leaf page is
allocated.  However, SP-GiST may move items to the free space of another busy
page during a split (see the other branches in doPickSplit()).  Your patch
doesn't seem to handle this case correctly.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] GSoC 2017 : Patch for predicate locking in Gist index

2017-09-28 Thread Alexander Korotkov
Hi!

On Wed, Jun 21, 2017 at 10:52 AM, Shubham Barai <shubhambara...@gmail.com>
wrote:

> Hi,
>
> On 21 June 2017 at 13:11, Heikki Linnakangas <hlinn...@iki.fi> wrote:
>
>> On 06/16/2017 01:24 PM, Shubham Barai wrote:
>>
>>> @@ -497,6 +499,13 @@ gistplacetopage(Relation rel, Size freespace,
>>> GISTSTATE *giststate,
>>> for (ptr = dist->next; ptr; ptr = ptr->next)
>>> UnlockReleaseBuffer(ptr->buffer);
>>> }
>>> +
>>> +   for (ptr = dist; ptr; ptr = ptr->next)
>>> +   PredicateLockPageSplit(rel,
>>> +
>>>  BufferGetBlockNumber(buffer),
>>> +
>>>  BufferGetBlockNumber(ptr->buffer));
>>> +
>>> +
>>>
>>
>> I think this new code needs to go before the UnlockReleaseBuffer() calls
>> above. Calling BufferGetBlockNumber() on an already-released buffer is not
>> cool.
>>
>> - Heikki
>>
>> I know that. This is the old version of the patch. I had sent updated
> patch later. Please have a look at updated patch.
>

I took a look at this patch.

In gistdoinsert() you do CheckForSerializableConflictIn() only if the page
wasn't exclusively locked before (xlocked is false).

if (!xlocked)
> {
> LockBuffer(stack->buffer, GIST_UNLOCK);
> LockBuffer(stack->buffer, GIST_EXCLUSIVE);
> CheckForSerializableConflictIn(r, NULL, stack->buffer);
> xlocked = true;


However, the page might have been exclusively locked before, and in this case
CheckForSerializableConflictIn() would be skipped.  That happens very
rarely (someone fixes an incomplete split before we do), but nevertheless.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

2017-09-28 Thread Alexander Korotkov
On Thu, Sep 28, 2017 at 12:45 AM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Wed, Aug 9, 2017 at 6:30 PM, Shubham Barai <shubhambara...@gmail.com>
> wrote:
>
>> Please find the updated patch for predicate locking in gin index here.
>>
>> There was a small issue in the previous patch. I didn't consider the case
>> where only root page exists in the tree, and there is a predicate lock on
>> it,
>> and it gets split.
>>
>> If we treat the original page as a left page and create a new root and
>> right
>> page, then we just need to copy a predicate lock from the left to the
>> right
>> page (this is the case in B-tree).
>>
>> But if we treat the original page as a root and create a new left and
>> right
>> page, then we need to copy a predicate lock on both new pages (in the
>> case of rum and gin).
>>
>> link to updated code and tests: https://github.com/shub
>> hambaraiss/postgres/commit/6172639a104785f051cb4aa0d511c58f2bae65a6
>>
>
> I've been assigned to review this patch.  First of all, I'd like to
> understand the general idea of this patch.
>
> As I understand it, you're placing predicate locks on both entry tree leaf
> pages and posting tree leaf pages.  But GIN implements the so-called "fast
> scan" technique, which allows it to skip some leaf pages of the posting
> tree when those pages are guaranteed not to contain matching item
> pointers.  Moreover, the particular order in which posting lists are
> scanned or skipped depends on their estimated sizes (so it's largely a
> matter of chance).
>
> But thinking about this more generally, I found that the proposed locking
> scheme is redundant.  Currently, when an entry has a posting tree, you're
> locking both:
> 1) the entry tree leaf page containing the pointer to the posting tree,
> 2) the leaf pages of the corresponding posting tree.
> Therefore, conflicting transactions accessing the same entry would
> conflict anyway while accessing the same entry tree leaf page.  So there
> is no need to lock posting tree leaf pages at all.  Alternatively, if an
> entry has a posting tree, you can skip locking the entry tree leaf page
> and lock the posting tree leaf pages instead.
>

I'd like to note that I got the following warnings when compiling with
clang.

gininsert.c:519:47: warning: incompatible pointer to integer conversion
> passing 'void *' to parameter of type 'Buffer' (aka 'int')
> [-Wint-conversion]
> CheckForSerializableConflictIn(index, NULL, NULL);
> ^~~~
> /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/../lib/clang/8.0.0/include/stddef.h:105:16:
> note: expanded from macro 'NULL'
> #  define NULL ((void*)0)
>^~
> ../../../../src/include/storage/predicate.h:64:87: note: passing argument
> to parameter 'buffer' here
> extern void CheckForSerializableConflictIn(Relation relation, HeapTuple
> tuple, Buffer buffer);
>
> ^
> 1 warning generated.
> ginvacuum.c:163:2: warning: implicit declaration of function
> 'PredicateLockPageCombine' is invalid in C99
> [-Wimplicit-function-declaration]
> PredicateLockPageCombine(gvs->index, deleteBlkno, rightlink);
> ^
> 1 warning generated.


Also, I tried removing the predicate locks from posting tree leaves.  At
least the isolation tests still passed correctly after this change.

However, after a Telegram discussion with Andrew Borodin, we decided that
it would be better to do predicate locking and conflict checking for
posting tree leaves, but skip it for entry tree leaves (in the case when
the entry has a posting tree).  That would give us more granular locking
and fewer false positives.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)

2017-09-27 Thread Alexander Korotkov
Hi!

On Wed, Aug 9, 2017 at 6:30 PM, Shubham Barai <shubhambara...@gmail.com>
wrote:

> Please find the updated patch for predicate locking in gin index here.
>
> There was a small issue in the previous patch. I didn't consider the case
> where only root page exists in the tree, and there is a predicate lock on
> it,
> and it gets split.
>
> If we treat the original page as a left page and create a new root and
> right
> page, then we just need to copy a predicate lock from the left to the
> right
> page (this is the case in B-tree).
>
> But if we treat the original page as a root and create a new left and right
> page, then we need to copy a predicate lock on both new pages (in the case
> of rum and gin).
>
> link to updated code and tests: https://github.com/
> shubhambaraiss/postgres/commit/6172639a104785f051cb4aa0d511c58f2bae65a6
>

I've been assigned to review this patch.  First of all, I'd like to
understand the general idea of this patch.

As I understand it, you're placing predicate locks on both entry tree leaf
pages and posting tree leaf pages.  But GIN implements the so-called "fast
scan" technique, which allows it to skip some leaf pages of the posting
tree when those pages are guaranteed not to contain matching item
pointers.  Moreover, the particular order in which posting lists are
scanned or skipped depends on their estimated sizes (so it's largely a
matter of chance).

But thinking about this more generally, I found that the proposed locking
scheme is redundant.  Currently, when an entry has a posting tree, you're
locking both:
1) the entry tree leaf page containing the pointer to the posting tree,
2) the leaf pages of the corresponding posting tree.
Therefore, conflicting transactions accessing the same entry would
conflict anyway while accessing the same entry tree leaf page.  So there
is no need to lock posting tree leaf pages at all.  Alternatively, if an
entry has a posting tree, you can skip locking the entry tree leaf page
and lock the posting tree leaf pages instead.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files

2017-09-27 Thread Alexander Korotkov
On Mon, Sep 25, 2017 at 6:26 PM, chenhj <chjis...@163.com> wrote:

> On 2017-09-23 01:59:0, "Alexander Korotkov" <a.korot...@postgrespro.ru>
> wrote:
>
> On Fri, Sep 22, 2017 at 7:16 PM, chenhj <chjis...@163.com> wrote:
>
>> This is the new patch with a TAP test; it uses the XLOGDIR macro.
>>
>
> Good.  I took a quick look over the patch.
> Why do you need master_query(), standby_query() and run_query() in
> RewindTest.pm?
> You can do just $node_master->safe_psql() and $node_slave->safe_psql()
> instead.
>
>
> Ooh, I did not notice that function. Thank you for your advice!
>

Great.  I tried this patch.  It applies cleanly, but doesn't compile.

pg_rewind.c:310:36: error: too few arguments provided to function-like
> macro invocation
> XLByteToSeg(divergerec, startsegno);
>   ^
> ../../../src/include/access/xlog_internal.h:118:9: note: macro
> 'XLByteToSeg' defined here
> #define XLByteToSeg(xlrp, logSegNo, wal_segsz_bytes) \
> ^
> pg_rewind.c:310:2: error: use of undeclared identifier 'XLByteToSeg'
> XLByteToSeg(divergerec, startsegno);
> ^
> pg_rewind.c:311:89: error: too few arguments provided to function-like
> macro invocation
> XLogFileName(divergence_wal_filename,
> targetHistory[lastcommontliIndex].tli, startsegno);
>
>  ^
> ../../../src/include/access/xlog_internal.h:155:9: note: macro
> 'XLogFileName' defined here
> #define XLogFileName(fname, tli, logSegNo, wal_segsz_bytes) \
> ^
> pg_rewind.c:311:2: error: use of undeclared identifier 'XLogFileName'
> XLogFileName(divergence_wal_filename,
> targetHistory[lastcommontliIndex].tli, startsegno);
> ^
> pg_rewind.c:312:34: error: too few arguments provided to function-like
> macro invocation
> XLByteToPrevSeg(endrec, endsegno);
> ^
> ../../../src/include/access/xlog_internal.h:121:9: note: macro
> 'XLByteToPrevSeg' defined here
> #define XLByteToPrevSeg(xlrp, logSegNo, wal_segsz_bytes) \
> ^
> pg_rewind.c:312:2: error: use of undeclared identifier 'XLByteToPrevSeg'
> XLByteToPrevSeg(endrec, endsegno);
> ^
> pg_rewind.c:313:57: error: too few arguments provided to function-like
> macro invocation
> XLogFileName(last_source_wal_filename, endtli, endsegno);
>^
> ../../../src/include/access/xlog_internal.h:155:9: note: macro
> 'XLogFileName' defined here
> #define XLogFileName(fname, tli, logSegNo, wal_segsz_bytes) \
> ^
> pg_rewind.c:313:2: error: use of undeclared identifier 'XLogFileName'
> XLogFileName(last_source_wal_filename, endtli, endsegno);
> ^
> 8 errors generated.


It appears that your patch conflicts with fc49e24f.  Please rebase it.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Pluggable storage

2017-09-27 Thread Alexander Korotkov
Hi!

I took a look at this patch.  I have the following notes for now.

tuple_insert_hook tuple_insert; /* heap_insert */
> tuple_update_hook tuple_update; /* heap_update */
> tuple_delete_hook tuple_delete; /* heap_delete */


I don't think this set of functions provides a good enough level of
abstraction for a storage AM.  These functions encapsulate only the
low-level work of inserting/updating/deleting tuples in the heap itself.
However, it is still assumed that indexes are managed outside the storage
AM.  I don't think that's right, given that one of the most demanded uses
of a storage API would be a different MVCC implementation (for instance,
UNDO-log based, as discussed upthread).  A different MVCC implementation
is likely to manage indexes differently.  For example, a storage AM
utilizing UNDO would implement in-place updates even when indexed columns
are modified.  Therefore this piece of code in ExecUpdate() wouldn't be
relevant anymore.

/*
> * insert index entries for tuple
> *
> * Note: heap_update returns the tid (location) of the new tuple in
> * the t_self field.
> *
> * If it's a HOT update, we mustn't insert new index entries.
> */
> if ((resultRelInfo->ri_NumIndices > 0) &&
> !storage_tuple_is_heaponly(resultRelationDesc, tuple))
> recheckIndexes = ExecInsertIndexTuples(slot, &(slot->tts_tid),
>   estate, false, NULL, NIL);


I'm firmly convinced that this logic should be encapsulated into the
storage AM, together with inserting new index tuples on storage insert.
Also, HOT should be completely encapsulated into heapam.  It's quite
evident to me that a storage utilizing UNDO wouldn't have anything like
our current HOT.  Therefore, I don't think there should be a
hot_search_buffer() API function; tuple_fetch() may cover it.  That might
require a signature change for tuple_fetch() (probably extra arguments).

LockTupleMode and HeapUpdateFailureData shouldn't be private to heapam.
Any full-weight OLTP storage AM should support our tuple lock modes and
should be able to report update failures.  HeapUpdateFailureData should be
renamed to something like StorageUpdateFailureData.  The contents of
HeapUpdateFailureData seem general enough to be supported by any storage
with an ItemPointer tuple locator.

storage_setscanlimits() is used only during index build.  Since a storage
AM may have a different MVCC implementation, the storage AM should decide
how to communicate with indexes, including during index build.  Therefore,
instead of exposing storage_setscanlimits(), the whole
IndexBuildHeapScan() should be encapsulated into the storage AM.

Also, BulkInsertState should be private to heapam.  Other storages may
have different state for bulk inserts.  At the API level we might have an
abstract pointer instead of BulkInsertState, with GetBulkInsertState and
friends as API methods.

storage_freeze_tuple() is called only once, from rewrite_heap_tuple().
That makes me think the tuple_freeze API method is the wrong abstraction.
We should probably make rewrite_heap_tuple(), or even the whole
rebuild_relation(), an API method...

Heap reloptions are untouched for now.  A storage AM should be able to
provide its own specific options, just like index AMs do.

That's all I have for now.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Pluggable storage

2017-09-27 Thread Alexander Korotkov
On Wed, Aug 23, 2017 at 8:26 AM, Haribabu Kommi <kommi.harib...@gmail.com>
wrote:

> - Minor: don't think the _function suffix for StorageAmRoutine is
>>   necessary, just makes things long, and every member has it. Besides
>>   that, it's also easy to misunderstand - for a second I understood
>>   scan_getnext_function to be about getting the next function...
>>
>
> OK. How about adding _hook?
>

I've already answered Andrew about why I think the _function suffix is OK
for now.  And I don't particularly like the _hook suffix for this purpose,
because those functions are parts of an API implementation, not hooks.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Pluggable storage

2017-09-27 Thread Alexander Korotkov
Hi,

Thank you for the review of this subject.  I think it's extremely
important for PostgreSQL to eventually get a pluggable storage API.
In general I agree with all your points, but I'd like to make a couple of
comments.

On Tue, Aug 15, 2017 at 9:53 AM, Andres Freund <and...@anarazel.de> wrote:

> - I don't think we should introduce this without a user besides
>   heapam. The likelihood that API will be usable by anything else
>   without a testcase seems fairly remote.  I think some src/test/modules
>   type implementation of a per-session, in-memory storage - relatively
>   easy to implement - or such is necessary.
>

+1 for having a user before committing the API.  However, I'd like to note
that the sample storage implementation should do something really
different from our current heap.  In particular, if a per-session,
in-memory storage were just another way to keep a heap in local buffers,
it wouldn't be OK for me, because that kind of storage could be achieved
much more easily without such a complex API.  But if a per-session,
in-memory storage, for instance, utilized a different MVCC implementation,
it would be a very good sample of storage API usage.


> - Minor: don't think the _function suffix for StorageAmRoutine is
>   necessary, just makes things long, and every member has it. Besides
>   that, it's also easy to misunderstand - for a second I understood
>   scan_getnext_function to be about getting the next function...
>

The _function suffix looks long to me too, but we should look at this
question from the point of view of uniformity: FdwRoutine, TsmRoutine and
IndexAmRoutine all use the _function suffix.  This is why I think we
should use it for StorageAmRoutine too, unless we're going to change it
for the other *Routines as well.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] make async slave to wait for lsn to be replayed

2017-09-26 Thread Alexander Korotkov
On Mon, Jan 23, 2017 at 2:56 PM, Ivan Kartyshov <i.kartys...@postgrespro.ru>
wrote:

> How to use it
> ==
> WAITLSN ‘LSN’ [, timeout in ms];
> WAITLSN_INFINITE ‘LSN’;
> WAITLSN_NO_WAIT ‘LSN’;


Adding a suffix to the command name looks weird; we don't do that for any
other command.
I propose the following syntax options.

WAITLSN lsn;
WAITLSN lsn TIMEOUT delay;
WAITLSN lsn INFINITE;
WAITLSN lsn NOWAIT;

That looks better to me.  What do you think?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files

2017-09-22 Thread Alexander Korotkov
On Fri, Sep 22, 2017 at 7:16 PM, chenhj <chjis...@163.com> wrote:

> This is the new patch with a TAP test; it uses the XLOGDIR macro.
>

Good.  I took a quick look over the patch.
Why do you need master_query(), standby_query() and run_query() in
RewindTest.pm?
You can do just $node_master->safe_psql() and $node_slave->safe_psql()
instead.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Should we cacheline align PGXACT?

2017-09-22 Thread Alexander Korotkov
On Thu, Sep 21, 2017 at 12:08 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Mon, Sep 18, 2017 at 12:41 PM, Daniel Gustafsson <dan...@yesql.se>
> wrote:
>
>> > On 16 Sep 2017, at 01:51, Alexander Korotkov <a.korot...@postgrespro.ru>
>> wrote:
>> >
>> > On Tue, Sep 5, 2017 at 2:47 PM, Daniel Gustafsson <dan...@yesql.se> wrote:
>> > > On 04 Apr 2017, at 14:58, David Steele <da...@pgmasters.net> wrote:
>> > >
>> > > On 4/4/17 8:55 AM, Alexander Korotkov wrote:
>> > >> On Mon, Apr 3, 2017 at 9:58 PM, Andres Freund <and...@anarazel.de>
>> > >>
>> > >>I'm inclined to push this to the next CF, it seems we need a lot
>> more
>> > >>benchmarking here.
>> > >>
>> > >> No objections.
>> > >
>> > > This submission has been moved to CF 2017-07.
>> >
>> > This CF has now started (well, 201709 but that’s what was meant in
>> above), can
>> > we reach closure on this patch in this CF?
>> >
>> > During the previous commitfest I came to doubt whether this patch is
>> really needed, when the same effect could be achieved by another (perhaps
>> random) change of alignment.  The thing I can do now is retry my
>> benchmark on current master and check what effect this patch has now.
>>
>> Considering this I’ll mark this as Waiting on Author, in case you come to
>> conclusion that another patch is required then we’ll bump to a return
>> status.
>>
>
> I've done some read-only benchmarking.  There is a clear win in this
> case.  The only point where the median of master is higher than the
> median of the patched version is 40 clients.
>
> At this point the observations are as follows:
> master:   982432 939483 932075
> pgxact-align: 913151 921300 938132
> So, the groups of observations form overlapping ranges, and this anomaly
> can be explained by statistical error.
>
> I'm going to make some experiments with read-write and mixed workloads too.
>

I've run benchmarks with two more workloads.
scalability-rw.png – read-write tpcb-like workload (pgbench default)
scalability-rrw.png – 90% read-only transactions, 10% read-write
transactions (-btpcb-like@1 -bselect-only@9)
It became clear that this patch causes a regression.  On the pure
read-write workload it's not so evident due to the high variability of
observations, but on the mixed workload the regression is very clear.
It would be ridiculous to consider a patch which improves read-only
performance but degrades read-write performance to nearly the same degree.
Thus, this topic needs more investigation into whether it's possible to at
least get the same benefit on the read-only workload without a penalty on
the read-write workload (ideally the read-write workload should benefit
too).  I'm going to mark this patch as "returned with feedback".

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should we cacheline align PGXACT?

2017-09-21 Thread Alexander Korotkov
On Mon, Sep 18, 2017 at 12:41 PM, Daniel Gustafsson <dan...@yesql.se> wrote:

> > On 16 Sep 2017, at 01:51, Alexander Korotkov <a.korot...@postgrespro.ru>
> wrote:
> >
> > On Tue, Sep 5, 2017 at 2:47 PM, Daniel Gustafsson <dan...@yesql.se> wrote:
> > > On 04 Apr 2017, at 14:58, David Steele <da...@pgmasters.net> wrote:
> > >
> > > On 4/4/17 8:55 AM, Alexander Korotkov wrote:
> > >> On Mon, Apr 3, 2017 at 9:58 PM, Andres Freund <and...@anarazel.de>
> > >>
> > >>I'm inclined to push this to the next CF, it seems we need a lot
> more
> > >>benchmarking here.
> > >>
> > >> No objections.
> > >
> > > This submission has been moved to CF 2017-07.
> >
> > This CF has now started (well, 201709 but that’s what was meant in
> above), can
> > we reach closure on this patch in this CF?
> >
> > During the previous commitfest I came to doubt whether this patch is
> really needed, when the same effect could be achieved by another (perhaps
> random) change of alignment.  The thing I can do now is retry my
> benchmark on current master and check what effect this patch has now.
>
> Considering this I’ll mark this as Waiting on Author, in case you come to
> conclusion that another patch is required then we’ll bump to a return
> status.
>

I've done some read-only benchmarking.  There is a clear win in this case.
The only point where the median of master is higher than the median of the
patched version is 40 clients.

At this point the observations are as follows:
master:   982432 939483 932075
pgxact-align: 913151 921300 938132
So, the groups of observations form overlapping ranges, and this anomaly
can be explained by statistical error.

I'm going to make some experiments with read-write and mixed workloads too.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] compress method for spgist - 2

2017-09-21 Thread Alexander Korotkov
On Thu, Sep 21, 2017 at 3:14 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Alexander Korotkov <aekorot...@gmail.com> writes:
> > On Thu, Sep 21, 2017 at 2:06 AM, Darafei "Komяpa" Praliaskouski <
> >> It seems to me that any circle with radius of any Infinity should
> become a
> >> [-Infinity .. Infinity, -Infinity .. Infinity] box. Then you won't have
> >> NaNs, and index structure shouldn't be broken.
>
> > We probably should produce [-Infinity .. Infinity, -Infinity .. Infinity]
> > box for any geometry containing inf or nan.
>
> Hm, we can do better in at least some cases, eg for a box ((0,1),(1,inf))
> there's no reason to give up our knowledge of finite bounds for the
> other three boundaries.  But certainly for a NaN circle radius
> what you suggest seems the most sensible thing to do.
>

OK.  I'll try to implement this for circles.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed

2017-09-21 Thread Alexander Korotkov
On Thu, Sep 21, 2017 at 1:53 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 9/8/17 00:13, Pavel Stehule wrote:
> > I am sending rebased patch
> >
> > rebased again + fix obsolete help
>
> Why are the variables called VERBOSE_SORT_* ?  What is verbose about them?


I assume Pavel called them that because they work only in the "verbose"
mode of a command, i.e. for \dt+ but not \dt.
However, in \dt two of the three sorting modes could still work:
schema_name and name_schema.  Thus, I think it's worth enabling these
variables for the "non-verbose" mode of commands too.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed

2017-09-21 Thread Alexander Korotkov
On Thu, Sep 21, 2017 at 1:52 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 9/19/17 12:54, Pavel Stehule wrote:
> > However, patch misses regression tests covering added functionality.
> >
> > I am not sure if there are any tests related to output of \dt+ commands
> > - there result is platform depend.
>
> How so?


\dt+ reports relation sizes, which are platform-dependent.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] compress method for spgist - 2

2017-09-20 Thread Alexander Korotkov
On Thu, Sep 21, 2017 at 2:06 AM, Darafei "Komяpa" Praliaskouski <
m...@komzpa.net> wrote:

> It is possible for bbox->low.x to be NaN when circle->center.x and
>> circle->radius are both +Infinity.
>>
>
> What is rationale behind this circle?
>

I would rather forbid any geometries with infs and nans altogether.
However, the upgrade process would then suffer: users with such geometries
would get errors during dump/restore, and pg_upgraded instances would
still contain invalid values...


> It seems to me that any circle with radius of any Infinity should become a
> [-Infinity .. Infinity, -Infinity .. Infinity] box. Then you won't have
> NaNs, and index structure shouldn't be broken.
>

We should probably produce a [-Infinity .. Infinity, -Infinity ..
Infinity] box for any geometry containing inf or nan.  That MBR would be
found by any query, effectively saying: "the index can't help you with
this kind of value; only recheck can deal with it".  That way we would at
least guarantee that the results of a sequential scan and an index scan
are the same.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] compress method for spgist - 2

2017-09-20 Thread Alexander Korotkov
On Wed, Sep 20, 2017 at 11:07 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Darafei Praliaskouski <m...@komzpa.net> writes:
> > I have some questions about the circles example though.
>
> >  * What is the reason for isnan check and swap of box ordinates for
> circle? It wasn't in the code previously.
>
> I hadn't paid any attention to this patch previously, but this comment
> excited my curiosity, so I went and looked:
>
> +   bbox->high.x = circle->center.x + circle->radius;
> +   bbox->low.x = circle->center.x - circle->radius;
> +   bbox->high.y = circle->center.y + circle->radius;
> +   bbox->low.y = circle->center.y - circle->radius;
> +
> +   if (isnan(bbox->low.x))
> +   {
> +   double tmp = bbox->low.x;
> +   bbox->low.x = bbox->high.x;
> +   bbox->high.x = tmp;
> +   }
>
> Maybe I'm missing something, but it appears to me that it's impossible for
> bbox->low.x to be NaN unless circle->center.x and/or circle->radius is a
> NaN, in which case bbox->high.x would also have been computed as a NaN,
> making the swap entirely useless.  Likewise for the Y case.  There may be
> something useful to do about NaNs here, but this doesn't seem like it.
>

Yeah, +1.

> How about removing circle from the scope of this patch, so it is smaller
> and cleaner?
>
> Neither of those patches would be particularly large, and since they'd need
> to touch adjacent code in some places, the diffs wouldn't be independent.
> I think splitting them is just make-work.
>

I've extracted the polygon opclass into a separate patch (attached).  I'll
rework and resubmit the circle patch later.
I'm not particularly sure that polygon.sql is a good place for testing an
SP-GiST opclass for polygons...  But we've already done the same in
box.sql.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-spgist-compress-method-7.patch
Description: Binary data


0002-spgist-polygon-7.patch
Description: Binary data



Re: [HACKERS] compress method for spgist - 2

2017-09-20 Thread Alexander Korotkov
On Wed, Sep 20, 2017 at 10:00 PM, Darafei Praliaskouski <m...@komzpa.net>
wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  not tested
> Implements feature:   not tested
> Spec compliant:   not tested
> Documentation:tested, passed
>
> Hi,
>
> I like the SP-GiST part of the patch. Looking forward to it, so PostGIS
> can benefit from SP-GiST infrastructure.
>
> I have some questions about the circles example though.
>
>  * What is the reason for isnan check and swap of box ordinates for
> circle? It wasn't in the code previously.
>
>  * There are tests for infinities in circles, but checks are for NaNs.
>

This code was migrated from the original patch by Nikita.  I assume he
meant that NaN should be treated as the greatest possible floating point
value (as float4_cmp_internal() does).  However, our current
implementation of geometric datatypes doesn't correctly handle all
combinations of infs and nans.  Most of that code was written without
taking infs and nans into account.  Also, I'm not sure this code fixes all
possible issues with infs and nans in the SP-GiST opclass for circles.
This is why I'm going to remove the NaN handling from this place.


>  * It seems to me that circle can be implemented without recheck, by
> making direct exact calculations.
>

Right.  Holding circles in the leaves instead of bounding boxes would both
allow exact calculations and take less space.


> How about removing circle from the scope of this patch, so it is smaller
> and cleaner?


Good point.  Polygons are enough as an example for the compress function.
An opclass for circles could be submitted as a separate patch.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Allow GiST opcalsses without compress\decompres functions

2017-09-20 Thread Alexander Korotkov
On Wed, Sep 20, 2017 at 5:02 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Alexander Korotkov <a.korot...@postgrespro.ru> writes:
> > On Wed, Sep 20, 2017 at 4:25 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> >> Yes.  We don't allow out-of-line values, but we do allow compressed and
> >> short-header values.
>
> > BTW, this comment still looks wrong to me...
>
> >> #define SIGLENINT  4 /* >122 => key will *toast*, so very slow!!! */
>
> I haven't done the math to see if 122 is exactly the right value,
> but at some point index_form_tuple would decide that the tuple was
> uncomfortably big and try to compress it.  So the comment is right
> in general terms.
>

If the comment means TOAST as compression, not out-of-line storage, then
it's true.  However, it would be good to rephrase the comment to clarify
that.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Allow GiST opcalsses without compress\decompres functions

2017-09-20 Thread Alexander Korotkov
On Wed, Sep 20, 2017 at 4:25 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> Andrey Borodin <x4...@yandex-team.ru> writes:
> > You mentioned that probably there cannot be TOASTed values in the index.
> > I need to settle this question in more deterministic way.
> > Can you point where to look at the code or who to ask:
> > Can there be TOASTed values in index tuples?
>
> Yes.  We don't allow out-of-line values, but we do allow compressed and
> short-header values.
>
> If you don't believe me, look at index_form_tuple().
>

OK.
So a GiST opclass should either implement a decompress method that
detoasts, or detoast every input key in all the other methods.

BTW, this comment still looks wrong to me...

> #define SIGLENINT  4 /* >122 => key will *toast*, so very slow!!! */


--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] compress method for spgist - 2

2017-09-20 Thread Alexander Korotkov
On Mon, Sep 18, 2017 at 6:21 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> On Tue, Aug 25, 2015 at 4:05 PM, Michael Paquier <
> michael.paqu...@gmail.com> wrote:
>
>> On Thu, Jul 23, 2015 at 6:18 PM, Teodor Sigaev <teo...@sigaev.ru> wrote:
>> >>> Poorly, by hanging boxes that straddled dividing lines off the parent
>> >>> node in a big linear list. The hope would be that the case was
>> >>
>> >> Ok, I see, but that's not really what I was wondering. My question is
>> >> this:
>> >> SP-GiST partitions the space into non-overlapping sections. How can you
>> >> store
>> >> polygons - which can overlap - in an SP-GiST index? And how does the
>> >> compress
>> >> method help with that?
>> >
>> >
>> > I believe if we found a way to index boxes then we will need a compress
>> > method to build index over polygons.
>> >
>> > BTW, we are working on investigation a index structure for box where
>> 2d-box
>> > is treated as 4d-point.
>>
>> There has been no activity on this patch for some time now, and a new
>> patch version has not been submitted, so I am marking it as return
>> with feedback.
>>
>
> There is interest in this patch from PostGIS users.  It would be nice to
> pick up this patch.
> AFAICS, progress on this patch was suspended because we had no
> example of an SP-GiST compress method in core/contrib.
> However, we now have acdf2a8b committed, with 2d-to-4d indexing of boxes
> using SP-GiST.  So, extending this 2d-to-4d approach to polygons would be
> a good example of an SP-GiST compress method in core.  Would anyone
> volunteer to write such a patch?
> If nobody does, then I could do it.
>

Nobody has answered yet, so I decided to nail down this long-standing issue.
Please find the following patches attached.

0001-spgist-compress-method-6.patch

The SP-GiST compress method patch rebased on current master.  The index AM
interface has changed since then.  I've added validation for the compress
method: it validates the input and output types of the compress method,
which requires calling the config method first.  That is a controversial
solution; in particular, no collation is provided in the config method
call.  It would be weird if collation could affect the data types in the
SP-GiST config method output, but anyway...

0002-spgist-circle-polygon-6.patch

This patch provides an example of SP-GiST compress method usage.  It adds
SP-GiST indexing for circles and polygons by mapping their bounding boxes
to 4d points.  This patch is based on prior work by Nikita Glukhov on
SP-GiST KNN.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-spgist-compress-method-6.patch
Description: Binary data


0002-spgist-circle-polygon-6.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow GiST opcalsses without compress\decompres functions

2017-09-20 Thread Alexander Korotkov
On Wed, Sep 20, 2017 at 12:43 PM, Andrey Borodin <x4...@yandex-team.ru>
wrote:

> Hello Tom! Thanks for committing the patch!
>
> > 20 сент. 2017 г., в 8:38, Tom Lane <t...@sss.pgh.pa.us> написал(а):
> >
> > Andrey Borodin <x4...@yandex-team.ru> writes:
> >> [ 0001-Allow-uncompressed-GiST-4.patch ]
> >
> > ... There's still room
> > to improve the contrib opclasses
>
> I have one important question regarding compress\decompress functions.
>
> You mentioned that probably there cannot be TOASTed values in the index.
> I need to settle this question in more deterministic way.
> Can you point where to look at the code or who to ask:
> Can there be TOASTed values in index tuples?
>
> If the answer is NO, we can get rid of much more useless code.
>

And get rid of misleading comments too.  For instance, this comment in
hstore_gist.c seems misleading to me.

#define SIGLENINT  4 /* >122 => key will *toast*, so very slow!!! */


--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Index Only Scan support for cube

2017-09-20 Thread Alexander Korotkov
On Wed, Sep 20, 2017 at 8:26 AM, Andrey Borodin <x4...@yandex-team.ru>
wrote:

> Hi hackers!
> > 23 мая 2017 г., в 14:53, Andrew Borodin <boro...@octonica.com>
> написал(а):
> >
> > Here's a small patch that implements fetch function necessary for
> > Index Only Scans that use cube data type.
>
> Tom Lane has just committed d3a4f89 (Allow no-op GiST support functions
> to be omitted). Thanks, Tom! : )
> The "Index Only Scan support for cube" patch is now obsolete. I'm working
> on another similar patch for contribs to support GiST IOS and remove
> no-op support functions.
>

Good.
BTW, a strangeness of g_cube_decompress() caught my eye.  It compares the
results of two evaluations of the same expression, DatumGetNDBOXP(entry->key).

NDBOX   *key = DatumGetNDBOXP(entry->key);
> if (key != DatumGetNDBOXP(entry->key))


In fact it's correct, because it compares the results of two detoastings.
If the datum isn't toasted, the results are the same pointer.  And if the
datum is toasted, the results are two different allocations of the
detoasted datum.  However, we do an extra detoasting here.

For example, see gbt_var_decompress(); there is no extra detoasting there.

GBT_VARKEY *key = (GBT_VARKEY *) PG_DETOAST_DATUM(entry->key);
> if (key != (GBT_VARKEY *) DatumGetPointer(entry->key))
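To make the difference between the two patterns concrete, here is a toy model of the detoasting behavior described above. The `Datum`/`detoast()` names below are illustrative stand-ins, not the PostgreSQL API: a "toasted" value must be expanded into a fresh allocation on every call, while an untoasted value comes back as the same pointer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a possibly-toasted datum. */
typedef struct { int toasted; const char *payload; } Datum;

static const char *detoast(const Datum *d)
{
    if (!d->toasted)
        return d->payload;          /* untoasted: same pointer comes back */
    return strdup(d->payload);      /* toasted: new allocation every call */
}

/* gbt_var_decompress() pattern: detoast once, then compare the result
 * against the original datum to see whether detoasting did anything. */
static int decompress_cheap(const Datum *d, const char **key)
{
    *key = detoast(d);
    return *key != d->payload;      /* true => datum was toasted */
}

/* g_cube_decompress() pattern: detoast twice and compare the two results.
 * Gives the same answer, but pays for a second detoast (and, for toasted
 * input, leaks the second allocation). */
static int decompress_wasteful(const Datum *d, const char **key)
{
    *key = detoast(d);
    return *key != detoast(d);      /* extra detoasting here */
}
```

Both functions classify toasted vs. untoasted input identically; only the second one performs the redundant detoast.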


--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed

2017-09-19 Thread Alexander Korotkov
On Tue, Sep 19, 2017 at 7:54 PM, Pavel Stehule <pavel.steh...@gmail.com>
wrote:

> 2017-09-19 16:14 GMT+02:00 Alexander Korotkov <a.korot...@postgrespro.ru>:
>
>> On Fri, Sep 8, 2017 at 7:13 AM, Pavel Stehule <pavel.steh...@gmail.com>
>> wrote:
>>
>>> 2017-08-16 14:06 GMT+02:00 Pavel Stehule <pavel.steh...@gmail.com>:
>>>
>>>> Hi
>>>>
>>>> 2017-08-15 4:37 GMT+02:00 Peter Eisentraut <
>>>> peter.eisentr...@2ndquadrant.com>:
>>>>
>>>>> On 3/11/17 07:06, Pavel Stehule wrote:
>>>>> > I am sending a updated version with separated sort direction in
>>>>> special
>>>>> > variable
>>>>>
>>>>> This patch also needs a rebase.
>>>>>
>>>>
>>>> I am sending rebased patch
>>>>
>>>
>>> rebased again + fix obsolete help
>>>
>>
>> For me, the patch applies cleanly, builds, and passes regression tests.
>> However, the patch lacks regression tests covering the added functionality.
>>
>
> I am not sure if there are any tests related to the output of \dt+
> commands - their result is platform dependent.
>

BTW, why isn't ordering by name_schema available for \dt?  If it were
available, we could at least cover this case with plain regression tests.
\dt+ could be covered by TAP tests, but it isn't yet.  I think one day we
should add them.  However, I don't think we should force you to write them
in order to push this simple patch.

>> The patch is definitely harmless, i.e. it doesn't affect anybody who
>> doesn't use the new functionality.
>> But I would still prefer ordering to be an option of the \d* commands,
>> with psql variables serving as defaults for those options...
>>
>
> I understand
>
> a) I don't think commands like \dt++ (or similar) are a good idea - these
> commands should be as simple as possible
>

I don't particularly like \dt++, but a second argument is probably an option.


> b) this patch doesn't block any other design - rather, it opens the door,
> because the executive part will be implemented and users can gain
> experience with different output sorts - so if people need a quicker way
> to change the result sort, then the work in this area will continue.
>

OK.  As a reviewer, I'm not going to block this patch if you want its
functionality limited to just psql variables.
I think you should add support of name_schema for \dt and some regression
tests for this case before I can mark this as "ready for committer".

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Re: proposal - psql: possibility to specify sort for describe commands, when size is printed

2017-09-19 Thread Alexander Korotkov
On Fri, Sep 8, 2017 at 7:13 AM, Pavel Stehule <pavel.steh...@gmail.com>
wrote:

> 2017-08-16 14:06 GMT+02:00 Pavel Stehule <pavel.steh...@gmail.com>:
>
>> Hi
>>
>> 2017-08-15 4:37 GMT+02:00 Peter Eisentraut <peter.eisentraut@2ndquadrant.
>> com>:
>>
>>> On 3/11/17 07:06, Pavel Stehule wrote:
>>> > I am sending a updated version with separated sort direction in special
>>> > variable
>>>
>>> This patch also needs a rebase.
>>>
>>
>> I am sending rebased patch
>>
>
> rebased again + fix obsolete help
>

For me, the patch applies cleanly, builds, and passes regression tests.
However, the patch lacks regression tests covering the added functionality.
The patch is definitely harmless, i.e. it doesn't affect anybody who
doesn't use the new functionality.
But I would still prefer ordering to be an option of the \d* commands, with
psql variables serving as defaults for those options...

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] type cache for concat functions

2017-09-19 Thread Alexander Kuzmenkov
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   tested, passed
Documentation:tested, passed

Looks good to me.

The new status of this patch is: Ready for Committer



Re: [HACKERS] compress method for spgist - 2

2017-09-18 Thread Alexander Korotkov
On Tue, Aug 25, 2015 at 4:05 PM, Michael Paquier <michael.paqu...@gmail.com>
wrote:

> On Thu, Jul 23, 2015 at 6:18 PM, Teodor Sigaev <teo...@sigaev.ru> wrote:
> >>> Poorly, by hanging boxes that straddled dividing lines off the parent
> >>> node in a big linear list. The hope would be that the case was
> >>
> >> Ok, I see, but that's not really what I was wondering. My question is
> >> this:
> >> SP-GiST partitions the space into non-overlapping sections. How can you
> >> store
> >> polygons - which can overlap - in an SP-GiST index? And how does the
> >> compress
> >> method help with that?
> >
> >
> > I believe that if we find a way to index boxes, then we will need a
> > compress method to build indexes over polygons.
> >
> > BTW, we are working on investigating an index structure for boxes
> > where a 2d box is treated as a 4d point.
>
> There has been no activity on this patch for some time now, and a new
> patch version has not been submitted, so I am marking it as return
> with feedback.
>

There is interest in this patch from PostGIS users.  It would be nice to
pick up this patch.
AFAICS, progress on this patch was suspended because we had no example of
an SP-GiST compress method in core/contrib.
However, we now have acdf2a8b committed, with 2d-to-4d indexing of boxes
using SP-GiST.  So, extending this 2d-to-4d approach to polygons would be
a good example of an SP-GiST compress method in core.  Would anyone
volunteer to write such a patch?
If nobody does, then I could do it.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] SQL/JSON in PostgreSQL

2017-09-17 Thread Alexander Korotkov
On Sun, Sep 17, 2017 at 11:08 AM, Oleg Bartunov <obartu...@gmail.com> wrote:

> On 16 Sep 2017 02:32, "Nikita Glukhov" <n.glu...@postgrespro.ru> wrote:
>
> On 15.09.2017 22:36, Oleg Bartunov wrote:
>
> On Fri, Sep 15, 2017 at 7:31 PM, Robert Haas <robertmh...@gmail.com>
>> wrote:
>>
>>> On Fri, Sep 15, 2017 at 10:10 AM, Daniel Gustafsson <dan...@yesql.se>
>>> wrote:
>>>
>>>> Can we expect a rebased version of this patch for this commitfest?
>>>> Since it’s
>>>> a rather large feature it would be good to get it in as early as we can
>>>> in the
>>>> process.
>>>>
>>> Again, given that this needs a "major" rebase and hasn't been updated
>>> in a month, and given that the CF is already half over, this should
>>> just be bumped to the next CF.  We're supposed to be trying to review
>>> things that were ready to go by the start of the CF, not the end.
>>>
>> We are supporting v10 branch in our github repository
>> https://github.com/postgrespro/sqljson/tree/sqljson_v10
>>
>> Since the first post we made a lot of changes, mostly because of
>> better understanding the standard and availability of technical report
>> (http://standards.iso.org/ittf/PubliclyAvailableStandards/c0
>> 67367_ISO_IEC_TR_19075-6_2017.zip).
>> Most important are:
>>
>> 1.We abandoned FORMAT support, which could confuse our users, since we
>> have data types json[b].
>>
>> 2. We use XMLTABLE infrastructure, extended for  JSON_TABLE support.
>>
>> 3. Reorganize commits, so we could split one big patch by several
>> smaller patches, which could be reviewed independently.
>>
>> 4. The biggest problem is documentation, we are working on it.
>>
>> Nikita will submit patches soon.
>>
>
> Attached archive with 9 patches rebased onto latest master.
>
> 0001-jsonpath-v02.patch:
>  - jsonpath type
>  - jsonpath execution on jsonb type
>  - jsonpath operators for jsonb type
>  - GIN support for jsonpath operators
>
> 0002-jsonpath-json-v02.patch:
>  - jsonb-like iterators for json type
>  - jsonpath execution on json type
>  - jsonpath operators for json type
>
> 0003-jsonpath-extensions-v02.patch:
> 0004-jsonpath-extensions-tests-for-json-v02.patch:
>  - some useful standard extensions with tests
>  0005-sqljson-v02.patch:
>  - SQL/JSON constructors (JSON_OBJECT[AGG], JSON_ARRAY[AGG])
>  - SQL/JSON query functions (JSON_VALUE, JSON_QUERY, JSON_EXISTS)
>  - IS JSON predicate
>
> 0006-sqljson-json-v02.patch:
>  - SQL/JSON support for json type and tests
>
> 0007-json_table-v02.patch:
>  - JSON_TABLE using XMLTABLE infrastructure
>
> 0008-json_table-json-v02.patch:
>  - JSON_TABLE support for json type
>
> 0009-wip-extensions-v02.patch:
>  - FORMAT JSONB
>  - jsonb to/from bytea casts
>  - jsonpath operators
>  - some unfinished jsonpath extensions
>
>
> Originally, JSON path was implemented only for the jsonb type, and I
> decided to add jsonb-like iterators for the json type so that json support
> could be implemented with minimal changes to the JSON path code.  This
> solution (see jsonpath_json.c from patch 0002) looks a little dubious to
> me, so I separated json support into independent patches.
>
> The last WIP patch 0009 is unfinished and contains a lot of FIXMEs.  But
> the ability to use arbitrary Postgres operators in JSON path with
> explicitly specified types is rather interesting, and I think it should
> be shown now to get some kind of pre-review.
>
> We are supporting v11 and v10 branches in our github repository:
>
> https://github.com/postgrespro/sqljson/tree/sqljson
> https://github.com/postgrespro/sqljson/tree/sqljson_wip
> https://github.com/postgrespro/sqljson/tree/sqljson_v10
> https://github.com/postgrespro/sqljson/tree/sqljson_v10_wip
>
>
> We provide web interface to our build
> http://sqlfiddle.postgrespro.ru/#!21/
>

+1.
To experiment with SQL/JSON, select "PostgreSQL 10dev+SQL/JSON" in the
version field on the top toolbar.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH]make pg_rewind to not copy useless WAL files

2017-09-16 Thread Alexander Korotkov
Hi!

On Sat, Sep 16, 2017 at 5:56 PM, chenhj <chjis...@163.com> wrote:

> This patch optimizes the above mentioned issues, as follows:
> 1. In the target data directory, do not delete the WAL files before the
> divergence.
> 2. When copying files from the source server, do not copy the WAL files
> before the divergence and the WAL files after the current WAL insert
> location.
>

Looks like a cool optimization to me.  Please add this patch to the next
commitfest.
Do you think this patch should modify the pg_rewind TAP tests too?  It
would be nice to have WAL file fetching better covered by TAP tests.  In
particular, new tests could generate more WAL files and make sure that
pg_rewind fetches only the required files among them.
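The filtering rule the patch describes — skip WAL segments wholly before the divergence point or after the source's current insert location — can be sketched as follows. This is a simplified model (fixed 16MB segments, no timeline handling); `segment_needed` and `lsn_to_segno` are illustrative names, not pg_rewind's actual functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WAL_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)  /* default 16MB segments */

/* Map an LSN (byte position in the WAL stream) to its segment number. */
static uint64_t lsn_to_segno(uint64_t lsn)
{
    return lsn / WAL_SEG_SIZE;
}

/* A segment is worth copying from the source only if it may contain WAL
 * between the point of divergence and the source's current insert
 * location; everything outside that window can be skipped, which is the
 * optimization the patch describes. */
static bool segment_needed(uint64_t segno, uint64_t divergence_lsn,
                           uint64_t insert_lsn)
{
    return segno >= lsn_to_segno(divergence_lsn) &&
           segno <= lsn_to_segno(insert_lsn);
}
```

A TAP test along the lines suggested above could generate segments on both sides of the divergence and assert that only the in-window ones are fetched.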

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Should we cacheline align PGXACT?

2017-09-15 Thread Alexander Korotkov
On Tue, Sep 5, 2017 at 2:47 PM, Daniel Gustafsson <dan...@yesql.se> wrote:

> > On 04 Apr 2017, at 14:58, David Steele <da...@pgmasters.net> wrote:
> >
> > On 4/4/17 8:55 AM, Alexander Korotkov wrote:
> >> On Mon, Apr 3, 2017 at 9:58 PM, Andres Freund <and...@anarazel.de
> >>
> >>I'm inclined to push this to the next CF, it seems we need a lot more
> >>benchmarking here.
> >>
> >> No objections.
> >
> > This submission has been moved to CF 2017-07.
>
> This CF has now started (well, 201709 but that’s what was meant in above),
> can
> we reach closure on this patch in this CF?
>

During the previous commitfest I came to doubt whether this patch is really
needed, since the same effect could be achieved by another (perhaps random)
change of alignment.  What I can do now is retry my benchmark on current
master and check what effect this patch has now.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Allow GiST opcalsses without compress\decompres functions

2017-09-15 Thread Alexander Korotkov
On Fri, Sep 15, 2017 at 3:36 PM, Andrey Borodin <x4...@yandex-team.ru>
wrote:

> > 14 сент. 2017 г., в 18:42, Alexander Korotkov <a.korot...@postgrespro.ru>
> написал(а):
> > It would be good if someone would write patch for removing useless
> compress/decompress methods from builtin and contrib GiST opclasses.
> Especially when it gives benefit in IndexOnlyScan enabling.
> If the patch will be accepted, I'll either do this myself for commitfest
> at November or charge some students from Yandex Data School to do this
> (they earn some points in algorithms course for contributing to OSS).
>

Great!  I'm looking forward to seeing their patches in the commitfest!

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] COMMIT TRIGGERs, take n, implemented with CONSTRAINT TRIGGERS

2017-09-15 Thread Alexander Korotkov
On Thu, Sep 14, 2017 at 10:41 PM, Nico Williams <n...@cryptonector.com>
wrote:

> https://github.com/twosigma/postgresql-contrib/
> https://github.com/twosigma/postgresql-contrib/blob/
> master/commit_trigger.sql
> https://raw.githubusercontent.com/twosigma/postgresql-
> contrib/master/commit_trigger.sql


Do I understand correctly that this is an SQL implementation of COMMIT
TRIGGER functionality, a prototype used to demonstrate it?  And if this
prototype is well received, you're going to write a patch for built-in
COMMIT TRIGGER functionality.  Is that right?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [PATCH] Call RelationDropStorage() for broader range of object drops.

2017-09-14 Thread Alexander Korotkov
On Thu, Sep 14, 2017 at 1:05 PM, Alvaro Herrera <alvhe...@alvh.no-ip.org>
wrote:

> Hadi Moshayedi wrote:
> > To provide more context, in cstore_fdw creating the storage is easy, we
> > only need to hook into CREATE FOREIGN TABLE using event triggers.
> Removing
> > the storage is not that easy, for DROP FOREIGN TABLE we can use event
> > triggers.
>
> This all sounds a little more complicated than it should ... but since
> FDW are not really IMO the right interface to be implementing a
> different storage format, I'm not terribly on board with supporting this
> more completely.  So what you're doing now is probably acceptable.


+1.
FDWs look OK for prototyping pluggable storage, but they don't seem
suitable as a permanent implementation.
BTW, Hadi, could you visit the "Pluggable storage" thread and check how
suitable the upcoming pluggable storage API is for cstore?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Pluggable storage

2017-09-14 Thread Alexander Korotkov
On Thu, Sep 14, 2017 at 8:17 AM, Haribabu Kommi <kommi.harib...@gmail.com>
wrote:

> Instead of modifying the Bitmap Heap and Sample scan's to avoid referring
> the internal members of the HeapScanDesc, I divided the HeapScanDesc
> into two parts.
>
> 1. StorageScanDesc
> 2. HeapPageScanDesc
>
> The StorageScanDesc contains the minimal information that is required
> outside the storage routine, and it must be provided by all storage
> routines. This structure contains minimal information such as the
> relation, snapshot, buffer, etc.
>
> The HeapPageScanDesc contains the extra information that is required for
> Bitmap Heap and Sample scans to work, such as block numbers and visible
> offsets. Currently this structure is used only in Bitmap Heap and Sample
> scans and their supporting contrib modules, except the pgstattuple
> module. pgstattuple needs some additional changes.
>
> I added an additional storage API to return a HeapPageScanDesc, as
> required by the Bitmap Heap and Sample scans; this API is called only by
> these two scans. These scan methods are chosen by the planner only when
> the storage routine supports returning a HeapPageScanDesc. Currently the
> planner support is implemented only for Bitmap scans; it is yet to be
> done for Sample scans.
>
> With the above approach, I removed all the references of HeapScanDesc
> outside the heap. The changes of this approach is available in the
> 0008-Remove-HeapScanDesc-usage-outside-heap.patch
>
> Suggestions/comments with the above approach.
>

For me, that's an interesting idea.  Naturally, the way BitmapHeapScan and
SampleScan work, even at a very high level, is applicable only to some
storage AMs (i.e. heap-like storage AMs).  For example, an index-organized
table would never support BitmapHeapScan, because it refers to tuples by PK
values, not TIDs.  However, in this case, the storage AM might have some
alternative to our BitmapHeapScan.  So, an index-organized table might keep
some compressed representation of an ordered set of PK values and use it
for bulk fetches from the PK index.

Therefore, I think it would be nice to make BitmapHeapScan a
heap-storage-AM-specific scan method, while other storage AMs could provide
other storage-AM-specific scan methods.  That is probably too much for this
patchset and should be done during one of the next work cycles on storage
AMs (I'm sure that such a huge project as pluggable storage AMs will take
multiple iterations).

Similarly, SampleScans contain storage-AM-specific logic.  For instance,
our SYSTEM sampling method fetches random blocks from the heap, providing a
high-performance way to sample it.  Coming back to the example of an
index-organized table, it could provide its own storage-AM-specific table
sampling methods, including a sophisticated PK tree traversal that fetches
small random ranges of PKs.  Given that tablesample methods are already
pluggable, making them storage-AM-specific would lead to user-visible
changes: a tablesample method would be created for a particular storage AM
or set of storage AMs.  However, I haven't yet figured out what exactly the
API should look like...
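The two-level split described in the quoted proposal might look roughly like this. The struct and field names below are assumptions inferred from the description (with a nod to the existing HeapScanDesc fields), not the actual patch:

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t Relation;   /* opaque stand-ins for PostgreSQL types */
typedef uintptr_t Snapshot;
typedef int Buffer;
typedef uint32_t BlockNumber;

/* Minimal state every storage routine must provide for a scan. */
typedef struct StorageScanDesc
{
    Relation    rs_rd;        /* relation being scanned */
    Snapshot    rs_snapshot;  /* snapshot to use */
    Buffer      rs_cbuf;      /* current buffer, if any */
} StorageScanDesc;

/* Heap-specific page-at-a-time state, needed only by Bitmap Heap and
 * Sample scans; returned via an extra storage API, and only by storage
 * routines that support it. */
typedef struct HeapPageScanDesc
{
    StorageScanDesc base;          /* common part first, so a pointer to
                                    * the struct doubles as a pointer to
                                    * the common part */
    BlockNumber rs_nblocks;        /* total blocks in relation */
    BlockNumber rs_cblock;         /* current block */
    int         rs_ntuples;        /* visible tuples on current page */
    uint16_t    rs_vistuples[291]; /* their offsets; 291 matches
                                    * MaxHeapTuplesPerPage for 8K pages */
} HeapPageScanDesc;
```

With this layout, code that only needs the common part can work with a `StorageScanDesc *`, while Bitmap Heap and Sample scans downcast to `HeapPageScanDesc *` after checking that the storage routine provides one.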

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] Allow GiST opcalsses without compress\decompres functions

2017-09-14 Thread Alexander Korotkov
On Thu, Sep 14, 2017 at 2:20 PM, Dmitriy Sarafannikov <
dsarafanni...@yandex.ru> wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   tested, passed
> Documentation:tested, passed
>
> This is a simple and intuitive patch. The code looks pretty clear and
> well documented.
>
> I think it is a good idea to decrease the overhead of calling fake
> compression/decompression
> functions. This eliminates the need to implement the fetch function in
> such cases.
>
> I also tried to disable compress and decompress functions in contrib/cube:
> diff --git a/contrib/cube/cube--1.2.sql b/contrib/cube/cube--1.2.sql
> index dea2614888..26301b71d7 100644
> --- a/contrib/cube/cube--1.2.sql
> +++ b/contrib/cube/cube--1.2.sql
> @@ -370,8 +370,6 @@ CREATE OPERATOR CLASS gist_cube_ops
>
> FUNCTION1   g_cube_consistent (internal, cube,
> smallint, oid, internal),
> FUNCTION2   g_cube_union (internal, internal),
> -   FUNCTION3   g_cube_compress (internal),
> -   FUNCTION4   g_cube_decompress (internal),
> FUNCTION5   g_cube_penalty (internal, internal,
> internal),
> FUNCTION6   g_cube_picksplit (internal, internal),
> FUNCTION7   g_cube_same (cube, cube, internal),
>
> And it enables IndexOnlyScan; this is the expected behaviour.
> + -- test explain
> + set enable_seqscan to 'off';
> + set enable_bitmapscan to 'off';
> + select count(*) from test_cube where c && '(3000,1000),(0,0)';
> +  count
> + ---
> +  5
> + (1 row)
> +
> + explain analyze select c from test_cube where c && '(3000,1000),(0,0)';
> +QUERY PLAN
> + 
> 
> +  Index Only Scan using test_cube_ix on test_cube  (cost=0.15..56.43
> rows=16 width=32) (actual time=0.015..0.018 rows=5 loops=1)
> +Index Cond: (c && '(3000, 1000),(0, 0)'::cube)
> +Heap Fetches: 5
> +  Planning time: 0.038 ms
> +  Execution time: 0.032 ms
> + (5 rows)
>
> The new status of this patch is: Ready for Committer


It would be good if someone wrote a patch removing useless
compress/decompress methods from the built-in and contrib GiST opclasses,
especially where that enables index-only scans.

BTW, this change in the patch looks suspicious to me.

> @@ -913,7 +931,6 @@ gistproperty(Oid index_oid, int attno,
>   ReleaseSysCache(tuple);
>
>   /* Now look up the opclass family and input datatype. */
> -
>   tuple = SearchSysCache1(CLAOID, ObjectIdGetDatum(opclass));
>   if (!HeapTupleIsValid(tuple))
>   {

We typically avoid changing the formatting of code fragments not directly
touched by the patch.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [POC] Faster processing at Gather node

2017-09-11 Thread Alexander Kuzmenkov
Thanks Rafia, Amit, now I understand the ideas behind the patch better. 
I'll see if I can look at the new one.


--

Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Red-black trees: why would anyone want preorder or postorder traversal?

2017-09-10 Thread Alexander Korotkov
On Sun, Sep 10, 2017 at 3:25 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:

> In short, therefore, I propose we rip out the DirectWalk and InvertedWalk
> options along with their support code, and then drop the portions of
> test_rbtree that are needed to exercise them.  Any objections?
>

+1,
I don't see any point in leaving DirectWalk and InvertedWalk in RB-tree.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] [POC] Faster processing at Gather node

2017-09-08 Thread Alexander Kuzmenkov

Hi Rafia,

I like the idea of reducing locking overhead by sending tuples in bulk. 
The implementation could probably be simpler: you could extend the API 
of shm_mq to decouple notifying the receiver from actually putting data 
into the queue (i.e., make shm_mq_notify_receiver public and add a 
variant of shm_mq_sendv that doesn't send the notification). From Amit's 
letter I understand that you have already tried something along these 
lines and the performance wasn't good. What was the bottleneck then? If 
it's the locking around mq_bytes_read/written, it can be rewritten with 
atomics. I think it would be great to try this approach because it 
doesn't add much code, doesn't add any additional copying, and improves 
shm_mq performance in general.
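As a sketch of the atomics idea: with a single producer advancing one counter and a single consumer advancing the other, the bookkeeping can use atomic loads and stores with acquire/release ordering instead of taking a lock on every update. This toy model uses C11 atomics for illustration; PostgreSQL would use its port/atomics.h equivalents (e.g. pg_atomic_uint64), and the real shm_mq bookkeeping is more involved.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Toy single-producer/single-consumer byte counters, modelled on shm_mq's
 * mq_bytes_written/mq_bytes_read. */
typedef struct
{
    _Atomic uint64_t bytes_written; /* advanced only by the sender */
    _Atomic uint64_t bytes_read;    /* advanced only by the receiver */
} toy_mq;

static void toy_mq_send(toy_mq *mq, uint64_t nbytes)
{
    /* ... copy nbytes into the ring buffer here ... */
    uint64_t w = atomic_load_explicit(&mq->bytes_written,
                                      memory_order_relaxed);
    /* release ordering publishes the copied data to the receiver */
    atomic_store_explicit(&mq->bytes_written, w + nbytes,
                          memory_order_release);
}

static uint64_t toy_mq_available(toy_mq *mq)
{
    /* acquire pairs with the sender's release store above */
    uint64_t w = atomic_load_explicit(&mq->bytes_written,
                                      memory_order_acquire);
    uint64_t r = atomic_load_explicit(&mq->bytes_read,
                                      memory_order_relaxed);
    return w - r;
}

static void toy_mq_consume(toy_mq *mq, uint64_t nbytes)
{
    /* release lets the sender safely reuse the consumed buffer space */
    atomic_fetch_add_explicit(&mq->bytes_read, nbytes, memory_order_release);
}
```

Because only one side writes each counter, no read-modify-write is needed on the hot send path; the load-then-store in `toy_mq_send` is safe for a single producer.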


--
Alexander Kuzmenkov
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Fix bloom WAL tap test

2017-09-06 Thread Alexander Korotkov
On Wed, Sep 6, 2017 at 4:08 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> I just realized that these lines of contrib/bloom/t/001_wal.pl don't
> check that queries give the same results on master and standby.  They just
> check that *return codes* of psql are equal.
>
> # Run test queries and compare their result
>> my $master_result = $node_master->psql("postgres", $queries);
>> my $standby_result = $node_standby->psql("postgres", $queries);
>> is($master_result, $standby_result, "$test_name: query result matches");
>
>
> The attached patch fixes this problem by using safe_psql(), which returns
> the psql output instead of the return code.  For safety, this patch
> replaces psql() with safe_psql() in other places too.
>
> I think this should be backpatched to 9.6 as bugfix.
>

Also, it would be nice to run wal-check as part of the bloom check target
(currently wal-check isn't run even by check-world).
See the attached patch.  My Makefile changes may be cumbersome; sorry for
that, I don't have much experience with Makefiles...

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


wal-check-on-bloom-check.patch
Description: Binary data



[HACKERS] Is anything preventing us from allowing write to foreign tables from standby?

2017-09-06 Thread Alexander Korotkov
Hi!

We currently block writing queries on a standby even if they only modify
the contents of foreign tables.  But do we have serious reasons for that?
Keeping FDW-sharding in mind, making FDW tables writable from a standby
would help prevent a single-master bottleneck.
I wrote a simple patch enabling writes to foreign tables from standbys.  At
first glance, it works for me.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


standby-fdw-writable.patch
Description: Binary data


