[HACKERS] Migration to pglister - Before
Greetings,

We will be migrating these lists to pglister in the next few minutes. This final email on the old list system is intended to let you know that future emails will have different headers and you will need to adjust your filters.

The changes which we expect to be most significant to users can be found on the wiki here:

https://wiki.postgresql.org/wiki/PGLister_Announce

Once the migration of these lists is complete, an 'after' email will be sent out.

Thanks!

Stephen
Re: [HACKERS] UPDATE of partition key
On 10 November 2017 at 16:42, Amit Khandekar wrote:
[ update-partition-key_v23.patch ]

Hi Amit,

Thanks for working on this. I'm looking forward to seeing this go in. So... I've signed myself up to review the patch, and I've just had a look at it (after first reading this entire email thread!).

Overall the patch looks like it's in quite good shape. I think I do agree with Robert about the UPDATE anomaly that's been discussed. I don't think we're painting ourselves into any corner by not having this working correctly right away. Anyone who's using trigger workarounds for the current lack of support for updating the partition key is already going to have the same issues, so at least this will save them some trouble implementing triggers and give them much better performance. I see you've documented this fact too, which is good.

I'm writing this email now as I've just run out of review time for today. Here's what I noted down during my first pass:

1. Closing command tags in docs should not be abbreviated:

   triggers are concerned, AFTER DELETE and

This changed in c29c5789. I think Peter will be happy if you don't abbreviate the closing tags.

2. "about to do" would read better as "about to perform":

   concurrent session, and it is about to do an UPDATE

I think this paragraph could be clearer if we identified the sessions with a number. Perhaps:

   Suppose session 1 is performing an UPDATE on a partition key; meanwhile, session 2 tries to perform an UPDATE or DELETE operation on the same row. Session 2 can silently miss the row due to session 1's activity. In such a case, session 2's UPDATE/DELETE, being unaware of the row's movement, interprets this as the row having just been deleted, so there is nothing to be done for this row. Whereas, in the usual case where the table is not partitioned, or where there is no row movement, the second session would have identified the newly updated row and carried out the UPDATE/DELETE on this new row version.

3. Integer width.
get_partition_natts() returns int, but we assign it to int16:

   int16 partnatts = get_partition_natts(key);

Confusingly, get_partition_col_attnum() returns int16 instead of AttrNumber, but that's a pre-existing inconsistency.

4. The following code could be replaced by a single call: pull_varattnos((Node *) partexprs, 1, &child_keycols);

   foreach(lc, partexprs)
   {
       Node   *expr = (Node *) lfirst(lc);

       pull_varattnos(expr, 1, &child_keycols);
   }

5. Triggers. Do we need a new "TG_" tag to allow trigger functions to do something special when the DELETE/INSERT is a partition move? I have audit tables in mind here; it may appear as though a user performed a DELETE when they actually performed an UPDATE. Giving visibility of this to the trigger function will allow the application to work around it.

6. Change "row" to "a row" and "old" to "the old":

   * depending on whether the event is for row being deleted from old

But to be honest, I'm having trouble parsing the comment. I think it would be better to say explicitly when the row will be NULL rather than "depending on whether the event ...".

7. I'm confused about how this change came about. If the old comment was correct here, then the comment you're referring to should remain in ExecPartitionCheck(), but you're saying it's in ExecConstraints():

   /* See the comments in ExecConstraints. */

If the comment really is in ExecConstraints(), then you might want to give an overview of what you mean here, and then reference ExecConstraints() if more details are required.

8. I'm having trouble parsing this comment:

   * 'update_rri' has the UPDATE per-subplan result rels.

I think "has" should be "contains"?

9. Also, this should likely be reworded:

   * 'num_update_rri' : number of UPDATE per-subplan result rels. For INSERT,
   * this is 0.

Perhaps: 'num_update_rri' is the number of elements in the 'update_rri' array, or zero for INSERT.

10. There should be no space before the '?':

   /* Is this leaf partition present in the update resultrel ? */

11.
I'm struggling to understand this comment:

   * This is required when converting tuple as per root
   * partition tuple descriptor.

"tuple" should probably be "the tuple", but I'm not quite sure what you mean by "as per root". I may have misunderstood, but maybe it should read:

   * This is required when we convert the partition's tuple to
   * be compatible with the partitioned table's tuple descriptor.

12. I think "as well" would be better written as "either":

   * If we didn't open the partition rel, it means we haven't
   * initialized the result rel as well.

13. I'm unsure what is meant by the following comment:

   * Verify result relation is a valid target for insert operation. Even
   * for updates, we are doing this for tuple-routing, so again, we need
   * to check the validity for insert operation.

I'm not quite sure where UPDATE comes in here, as we're only checking for INSERT.

14. Use of underscores instead of camelCase:

   COPY_SCALAR_FIELD(part_cols_updated)
Re: [HACKERS] Parallel Hash take II
Hi Andres and Peter,

Please see below for inline responses to your feedback. New patch attached.

On Wed, Nov 8, 2017 at 10:01 AM, Andres Freund wrote:

> +set min_parallel_table_scan_size = 0;
> +set parallel_setup_cost = 0;
> +-- Make a simple relation with well distributed keys and correctly
> +-- estimated size.
> +create table simple as
> +  select generate_series(1, 2) AS id, 'aa';
> +alter table simple set (parallel_workers = 2);
> +analyze simple;
> +-- Make a relation whose size we will under-estimate. We want stats
> +-- to say 1000 rows, but actually there are 20,000 rows.
> +create table bigger_than_it_looks as
> +  select generate_series(1, 2) as id, 'aa';
> +alter table bigger_than_it_looks set (autovacuum_enabled = 'false');
> +alter table bigger_than_it_looks set (parallel_workers = 2);
> +delete from bigger_than_it_looks where id <= 19000;
> +vacuum bigger_than_it_looks;
> +analyze bigger_than_it_looks;
> +insert into bigger_than_it_looks
> +  select generate_series(1, 19000) as id, 'aa';
>
> It seems kinda easier to just manipulate ndistinct and reltuples...

Done.

> +set max_parallel_workers_per_gather = 0;
> +set work_mem = '4MB';
>
> I hope there's a fair amount of slop here - with different archs you're
> going to see quite some size differences.

Yeah, this is a problem I wrestled with. See next.

> +-- The "good" case: batches required, but we plan the right number; we
> +-- plan for 16 batches, and we stick to that number, and peak memory
> +-- usage stays within our work_mem budget
> +-- non-parallel
> +set max_parallel_workers_per_gather = 0;
> +set work_mem = '128kB';
>
> So how do we know that's actually the case we're testing rather than
> something arbitrarily different? There's IIRC tests somewhere that just
> filter the json explain output to the right parts...

Yeah, good idea.
My earlier attempts to dump out the hash join dimensions ran into problems with architecture sensitivity and then some run-to-run non-determinism in the parallel case (due to varying fragmentation depending on how many workers get involved in time).

The attached version tells you about batch growth without reporting the exact numbers, except in the "ugly" case, where we know that there is only one possible outcome because the extreme skew detector is guaranteed to go off after the first nbatch increase (I got rid of all other tuples except ones with the same key to make this true).

This exercise did reveal a bug in 0008-Show-hash-join-per-worker-information-in-EXPLAIN-ANA.patch though: it is capturing shared instrumentation too soon in the non-Parallel Hash case, so the nbatch reported by EXPLAIN ANALYZE might be too low if we grew while probing. Oops. Will post a fix for that.

> +/*
> + * Build the name for a given segment of a given BufFile.
> + */
> +static void
> +MakeSharedSegmentName(char *name, const char *buffile_name, int segment)
> +{
> +	snprintf(name, MAXPGPATH, "%s.%d", buffile_name, segment);
> +}
>
> Not a fan of this name - you're not "making" a filename here (as in
> allocating or such). I think I'd just remove the Make prefix.

Done. I also changed some similar code where I'd used GetXXX when building paths.

> +/*
> + * Open a file that was previously created in another backend with
> + * BufFileCreateShared in the same SharedFileSet using the same name. The
> + * backend that created the file must have called BufFileClose() or
> + * BufFileExport() to make sure that it is ready to be opened by other
> + * backends and render it read-only.
> + */
>
> Is it actually guaranteed that it's another backend / do we rely on
> that?

No, it could be any backend that is attached to the SharedFileSet, including the current one. Wording improved.
> +BufFile *
> +BufFileOpenShared(SharedFileSet *fileset, const char *name)
> +{
>
> +	/*
> +	 * If we didn't find any files at all, then no BufFile exists with this
> +	 * tag.
> +	 */
> +	if (nfiles == 0)
> +		return NULL;
>
> s/tag/name/?

Fixed.

> +/*
> + * Delete a BufFile that was created by BufFileCreateShared in the given
> + * SharedFileSet using the given name.
> + *
> + * It is not necessary to delete files explicitly with this function.  It is
> + * provided only as a way to delete files proactively, rather than waiting for
> + * the SharedFileSet to be cleaned up.
> + *
> + * Only one backend should attempt to delete a given name, and should know
> + * that it exists and has been exported or closed.
> + */
> +void
> +BufFileDeleteShared(SharedFileSet *fileset, const char *name)
> +{
> +	char		segment_name[MAXPGPATH];
> +	int			segment = 0;
> +	bool		found = false;
>
> +	/*
> +	 * We don't know how many segments the file has. We'll keep deleting
> +	 * until we run out. If we don't m
Re: [HACKERS] proposal: schema variables
Hi

2017-11-13 13:15 GMT+01:00 Pavel Golub :

> Hello, Pavel.
>
> You wrote:
>
> PS> Hi,
>
> PS> I propose a new database object - a variable. The variable is
> PS> persistent object, that holds unshared session based not
> PS> transactional in memory value of any type. Like variables in any
> PS> other languages. The persistence is required for possibility to do
> PS> static checks, but can be limited to session - the variables can be
> temporal.
>
> Great idea.
>
> PS> My proposal is related to session variables from Sybase, MSSQL or
> PS> MySQL (based on prefix usage @ or @@), or package variables from
> PS> Oracle (access is controlled by scope), or schema variables from
> PS> DB2. Any design is coming from different sources, traditions and
> PS> has some advantages or disadvantages. The base of my proposal is
> PS> usage schema variables as session variables for stored procedures.
> PS> It should to help to people who try to port complex projects to
> PostgreSQL from other databases.
>
> PS> The Sybase (T-SQL) design is good for interactive work, but it
> PS> is weak for usage in stored procedures - the static check is not
> PS> possible. Is not possible to set some access rights on variables.
>
> PS> The ADA design (used on Oracle) based on scope is great, but our
> PS> environment is not nested. And we should to support other PL than
> PLpgSQL more strongly.
>
> PS> There is not too much other possibilities - the variable that
> PS> should be accessed from different PL, different procedures (in
> PS> time) should to live somewhere over PL, and there is the schema only.
>
> PS> The variable can be created by CREATE statement:
>
> PS> CREATE VARIABLE public.myvar AS integer;
> PS> CREATE VARIABLE myschema.myvar AS mytype;
>
> PS> CREATE [TEMP] VARIABLE [IF NOT EXISTS] name AS type
> PS>   [ DEFAULT expression ] [[NOT] NULL]
> PS>   [ ON TRANSACTION END { RESET | DROP } ]
> PS>   [ { VOLATILE | STABLE } ];
>
> PS> It is dropped by command DROP VARIABLE [ IF EXISTS] varname.
>
> PS> The access rights is controlled by usual access rights - by
> PS> commands GRANT/REVOKE. The possible rights are: READ, WRITE
>
> PS> The variables can be modified by SQL command SET (this is taken from
> standard, and it natural)
>
> PS> SET varname = expression;
>
> I propose LET keyword for this to distinguish GUC from variables, e.g.
>
> LET varname = expression;

It is one possible variant. I plan to implement more variants and then choose one.

Regards

Pavel

> PS> Unfortunately we use the SET command for different purpose. But I
> PS> am thinking so we can solve it with few tricks. The first is
> PS> moving our GUC to pg_catalog schema. We can control the strictness
> PS> of SET command. In one variant, we can detect custom GUC and allow
> PS> it, in another we can disallow a custom GUC and allow only schema
> PS> variables. A new command LET can be alternative.
>
> PS> The variables should be used in queries implicitly (without JOIN)
>
> PS> SELECT varname;
>
> PS> The SEARCH_PATH is used, when varname is located. The variables
> PS> can be used everywhere where query parameters are allowed.
>
> PS> I hope so this proposal is good enough and simple.
>
> PS> Comments, notes?
>
> PS> regards
>
> PS> Pavel
>
> --
> With best wishes,
> Pavel  mailto:pa...@gf.microolap.com
Re: [HACKERS] proposal: schema variables
Hello, Pavel.

You wrote:

PS> Hi,

PS> I propose a new database object - a variable. The variable is
PS> persistent object, that holds unshared session based not
PS> transactional in memory value of any type. Like variables in any
PS> other languages. The persistence is required for possibility to do
PS> static checks, but can be limited to session - the variables can be temporal.

Great idea.

PS> My proposal is related to session variables from Sybase, MSSQL or
PS> MySQL (based on prefix usage @ or @@), or package variables from
PS> Oracle (access is controlled by scope), or schema variables from
PS> DB2. Any design is coming from different sources, traditions and
PS> has some advantages or disadvantages. The base of my proposal is
PS> usage schema variables as session variables for stored procedures.
PS> It should to help to people who try to port complex projects to PostgreSQL from other databases.

PS> The Sybase (T-SQL) design is good for interactive work, but it
PS> is weak for usage in stored procedures - the static check is not
PS> possible. Is not possible to set some access rights on variables.

PS> The ADA design (used on Oracle) based on scope is great, but our
PS> environment is not nested. And we should to support other PL than
PS> PLpgSQL more strongly.

PS> There is not too much other possibilities - the variable that
PS> should be accessed from different PL, different procedures (in
PS> time) should to live somewhere over PL, and there is the schema only.

PS> The variable can be created by CREATE statement:

PS> CREATE VARIABLE public.myvar AS integer;
PS> CREATE VARIABLE myschema.myvar AS mytype;

PS> CREATE [TEMP] VARIABLE [IF NOT EXISTS] name AS type
PS>   [ DEFAULT expression ] [[NOT] NULL]
PS>   [ ON TRANSACTION END { RESET | DROP } ]
PS>   [ { VOLATILE | STABLE } ];

PS> It is dropped by command DROP VARIABLE [ IF EXISTS] varname.

PS> The access rights is controlled by usual access rights - by
PS> commands GRANT/REVOKE.
The possible rights are: READ, WRITE

PS> The variables can be modified by SQL command SET (this is taken from
PS> standard, and it natural)

PS> SET varname = expression;

I propose LET keyword for this to distinguish GUC from variables, e.g.

LET varname = expression;

PS> Unfortunately we use the SET command for different purpose. But I
PS> am thinking so we can solve it with few tricks. The first is
PS> moving our GUC to pg_catalog schema. We can control the strictness
PS> of SET command. In one variant, we can detect custom GUC and allow
PS> it, in another we can disallow a custom GUC and allow only schema
PS> variables. A new command LET can be alternative.

PS> The variables should be used in queries implicitly (without JOIN)

PS> SELECT varname;

PS> The SEARCH_PATH is used, when varname is located. The variables
PS> can be used everywhere where query parameters are allowed.

PS> I hope so this proposal is good enough and simple.

PS> Comments, notes?

PS> regards

PS> Pavel

--
With best wishes,
Pavel  mailto:pa...@gf.microolap.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fix bloom WAL tap test
Hi!

On Fri, Nov 10, 2017 at 9:12 PM, Tom Lane wrote:
> I wrote:
> > Is there anything we can do to cut the runtime of the TAP test to
> > the point where running it by default wouldn't be so painful?
>
> As an experiment, I tried simply cutting the size of the test table 10X:
>
> diff --git a/contrib/bloom/t/001_wal.pl b/contrib/bloom/t/001_wal.pl
> index 1b319c9..566abf9 100644
> --- a/contrib/bloom/t/001_wal.pl
> +++ b/contrib/bloom/t/001_wal.pl
> @@ -57,7 +57,7 @@ $node_standby->start;
>  $node_master->safe_psql("postgres", "CREATE EXTENSION bloom;");
>  $node_master->safe_psql("postgres", "CREATE TABLE tst (i int4, t text);");
>  $node_master->safe_psql("postgres",
> -"INSERT INTO tst SELECT i%10, substr(md5(i::text), 1, 1) FROM generate_series(1,10) i;"
> +"INSERT INTO tst SELECT i%10, substr(md5(i::text), 1, 1) FROM generate_series(1,1) i;"
>  );
>  $node_master->safe_psql("postgres",
>  	"CREATE INDEX bloomidx ON tst USING bloom (i, t) WITH (col1 = 3);");
> @@ -72,7 +72,7 @@ for my $i (1 .. 10)
>  	test_index_replay("delete $i");
>  	$node_master->safe_psql("postgres", "VACUUM tst;");
>  	test_index_replay("vacuum $i");
> -	my ($start, $end) = (11 + ($i - 1) * 1, 10 + $i * 1);
> +	my ($start, $end) = (10001 + ($i - 1) * 1000, 1 + $i * 1000);
>  	$node_master->safe_psql("postgres",
>  		"INSERT INTO tst SELECT i%10, substr(md5(i::text), 1, 1) FROM generate_series($start,$end) i;"
>  	);
>
> This about halved the runtime of the TAP test, and it changed the coverage
> footprint not at all according to lcov. (Said coverage is only marginally
> better than what we get without running the bloom TAP test, AFAICT.)
>
> It seems like some effort could be put into both shortening this test
> and improving the amount of code it exercises.

Thank you for committing the patch which fixes the TAP test. I'll try to improve the coverage of this test and reduce its run time.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] How to implement a SP-GiST index as a extension module?
Hi!

On Mon, Nov 13, 2017 at 6:47 AM, Connor Wolf wrote:
> Ok, I've managed to get my custom index working.

Good!

> It's all on github here: https://github.com/fake-name/pg-spgist_hamming, if
> anyone else needs a fuzzy-image searching system
> that can integrate into postgresql..
>
> It should be a pretty good basis for anyone else to use if they want to
> implement a SP-GiST index too.

I took a look at the code, and I feel a bit confused :)

It appears that you're indexing int8 values. That seems like an unrealistically short representation for an image signature. Also, the name of the repository made me think that hamming distance would be used to compare signatures. But after looking at the code, I see that the plain absolute value of the difference is used for that purpose.

static double
getDistance(Datum v1, Datum v2)
{
	int64_t		a1 = DatumGetInt64(v1);
	int64_t		a2 = DatumGetInt64(v2);
	int64_t		diff = Abs(a1 - a2);

	fprintf_to_ereport("getDistance %ld <-> %ld : %ld", a1, a2, diff);
	return diff;
}

For such a notion of distance, you don't need a VP-tree or other complex indexing; a B-tree is quite enough in this case. Alternatively, the distance function is not what it was meant to be. It would be useful if you provided a complete usage example of this extension: from image-to-signature conversion to search queries.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: [HACKERS] Jsonb transform for pl/python
On Thu, 09 Nov 2017 12:26:46 + Aleksander Alekseev wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, failed
> Implements feature:       tested, passed
> Spec compliant:           tested, passed
> Documentation:            tested, passed
>
> Hello Anthony,
>
> Great job!
>
> I decided to take a closer look at your patch. Here are some defects
> I discovered.
>
> > +     Additional extensions are available that implement transforms for
> > +     the jsonb type for the language PL/Python. The
> > +     extensions for PL/Perl are called
>
> 1. The part regarding PL/Perl is obviously from another patch.
>
> 2. jsonb_plpython2u and jsonb_plpythonu are marked as relocatable,
> while jsonb_plpython3u is not. Is it a mistake? Anyway, if an
> extension is relocatable there should be a test that checks this.
>
> 3. Not all json types are test-covered. Tests for 'true' :: jsonb,
> '3.14' :: jsonb and 'null' :: jsonb are missing.
>
> 4. jsonb_plpython.c:133 - "Iterate trhrough Jsonb object." Typo, it
> should be "through" or probably even "over".
>
> 5. It looks like you've implemented the transform in two directions,
> Python <-> JSONB, however I see tests only for the Python <- JSONB case.
>
> 6. Tests passed on Python 2.7.14 but failed on 3.6.2:
>
> > CREATE EXTENSION jsonb_plpython3u CASCADE;
> > + ERROR: could not access file "$libdir/jsonb_plpython3": No such
> > file or directory
>
> module_pathname in jsonb_plpython3u.control should be
> $libdir/jsonb_plpython3u, not $libdir/jsonb_plpython3.
>
> Tested on Arch Linux x64, GCC 7.2.0.
>
> The new status of this patch is: Waiting on Author

Hello, Aleksander.

Thank you for your time. The defects you noticed have been fixed. Please find attached the new version of the patch (it is called 0001-jsonb_plpython-extension-v2.patch).
Most of the changes were made to fix the defects (the list of defects may be found in the citation at the beginning of this message), but the algorithm for iterating through the incoming jsonb was also changed so that it looks tidier.

--
Anthony Bykov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/contrib/Makefile b/contrib/Makefile
index 8046ca4..d9d9817 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -81,9 +81,9 @@ ALWAYS_SUBDIRS += hstore_plperl
 endif
 
 ifeq ($(with_python),yes)
-SUBDIRS += hstore_plpython ltree_plpython
+SUBDIRS += hstore_plpython ltree_plpython jsonb_plpython
 else
-ALWAYS_SUBDIRS += hstore_plpython ltree_plpython
+ALWAYS_SUBDIRS += hstore_plpython ltree_plpython jsonb_plpython
 endif
 
 # Missing:
diff --git a/contrib/jsonb_plpython/Makefile b/contrib/jsonb_plpython/Makefile
new file mode 100644
index 000..6371d11
--- /dev/null
+++ b/contrib/jsonb_plpython/Makefile
@@ -0,0 +1,39 @@
+# contrib/jsonb_plpython/Makefile
+
+MODULE_big = jsonb_plpython$(python_majorversion)u
+OBJS = jsonb_plpython.o $(WIN32RES)
+PGFILEDESC = "jsonb_plpython - transform between jsonb and plpythonu"
+
+PG_CPPFLAGS = -I$(top_srcdir)/src/pl/plpython $(python_includespec) -DPLPYTHON_LIBNAME='"plpython$(python_majorversion)"'
+
+EXTENSION = jsonb_plpythonu jsonb_plpython2u jsonb_plpython3u
+DATA = jsonb_plpythonu--1.0.sql jsonb_plpython2u--1.0.sql jsonb_plpython3u--1.0.sql
+
+REGRESS = jsonb_plpython$(python_majorversion)
+REGRESS_PLPYTHON3_MANGLE := $(REGRESS)
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/jsonb_plpython
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
+
+# We must link libpython explicitly
+ifeq ($(PORTNAME), win32)
+# ... see silliness in plpython Makefile ...
+SHLIB_LINK += $(sort $(wildcard ../../src/pl/plpython/libpython*.a))
+else
+rpathdir = $(python_libdir)
+SHLIB_LINK += $(python_libspec) $(python_additional_libs)
+endif
+
+ifeq ($(python_majorversion),2)
+REGRESS_OPTS += --load-extension=plpython2u
+else
+REGRESS_OPTS += --load-extension=plpython3u
+endif
diff --git a/contrib/jsonb_plpython/expected/jsonb_plpython2.out b/contrib/jsonb_plpython/expected/jsonb_plpython2.out
new file mode 100644
index 000..8ad5338
--- /dev/null
+++ b/contrib/jsonb_plpython/expected/jsonb_plpython2.out
@@ -0,0 +1,478 @@
+CREATE EXTENSION jsonb_plpython2u CASCADE;
+-- test jsonb -> python dict
+CREATE FUNCTION test1(val jsonb) RETURNS int
+LANGUAGE plpython2u
+TRANSFORM FOR TYPE jsonb
+AS $$
+assert isinstance(val, dict)
+plpy.info(sorted(val.items()))
+return len(val)
+$$;
+SELECT test1('{"a":1, "c":"NULL"}'::jsonb);
+INFO: [('a', Decimal('1')), ('c', 'NULL')]
+ test1
+---
+     2
+(1 row)
+
+-- test jsonb -> python dict
+-- complex dict with dicts as value
+CREATE FUNCTION test1complex(val jsonb) RETURNS int
+LANGUAGE plpython2u
+TRANSFORM FOR TYPE jsonb
+AS $$
+assert isinstance(val, dict)
+assert(val == {"d":{"d": 1}})
+return len(val)
+$$;
+SELECT test1comple
Re: [HACKERS] path toward faster partition pruning
Horiguchi-san,

Thanks for taking a look. Replying to all your emails here.

On 2017/11/10 12:30, Kyotaro HORIGUCHI wrote:
> In 0002, bms_add_range has a bit naive-looking loop
>
> +	while (wordnum <= uwordnum)
> +	{
> +		bitmapword	mask = (bitmapword) ~0;
> +
> +		/* If working on the lower word, zero out bits below 'lower'. */
> +		if (wordnum == lwordnum)
> +		{
> +			int			lbitnum = BITNUM(lower);
>
> +			mask >>= lbitnum;
> +			mask <<= lbitnum;
> +		}
>
> +		/* Likewise, if working on the upper word, zero bits above 'upper' */
> +		if (wordnum == uwordnum)
> +		{
> +			int			ushiftbits = BITS_PER_BITMAPWORD - (BITNUM(upper) + 1);
>
> +			mask <<= ushiftbits;
> +			mask >>= ushiftbits;
> +		}
>
> +		a->words[wordnum++] |= mask;
> +	}
>
> Without some aggressive optimization, the loop takes most of the
> time to check-and-jump for nothing especially with many
> partitions and somewhat unintuitive.
>
> The following uses a bit tricky bitmap operation but
> is straightforward as a whole.
>
> =
> /* fill the bits upper from BITNUM(lower) (0-based) of the first word */
> a->words[wordnum++] |= ~(bitmapword) ((1 << BITNUM(lower)) - 1);
>
> /* fill up intermediate words */
> while (wordnum < uwordnum)
>     a->words[wordnum++] = ~(bitmapword) 0;
>
> /* fill up to BITNUM(upper) bit (0-based) of the last word */
> a->words[wordnum++] |=
>     (~(bitmapword) 0) >> (BITS_PER_BITMAPWORD - (BITNUM(upper) + 1));
> =

Considering also David's comment downthread, I will try to incorporate this into bms_add_range().

> In 0003,
>
> +match_clauses_to_partkey(RelOptInfo *rel,
> ...
> +		if (rinfo->pseudoconstant &&
> +			(IsA(clause, Const) &&
> +			 ((((Const *) clause)->constisnull) ||
> +			  !DatumGetBool(((Const *) clause)->constvalue))))
> +		{
> +			*constfalse = true;
> +			continue;
>
> If we call this function in both conjunction and disjunction
> context (the latter is only in recursive case). constfalse ==
> true means no need of any clauses for the former case.
> Since (I think) just a list of RestrictInfo is expected to be
> treated as a conjunction (it's quite dubious, though..),

I think it makes sense to consider a list of RestrictInfos, such as baserestrictinfo, that is passed as input to match_clauses_to_partkey(), to be mutually conjunctive for our purpose here.

> we might be better call this for each subnodes of a disjunction. Or,
> like match_clauses_to_index, we might be better provide
> match_clause_to_partkey(rel, rinfo, contains_const), which
> returns NULL if constfalse. (I'm not self-confident on this..)

After reading your comment, I realized that it was wrong that the recursive call to match_clauses_to_partkey() passed the arguments of an OR clause all at once. That broke the assumption mentioned above that all of the clauses in the list passed to match_clauses_to_partkey() are mutually conjunctive. Instead, we must make a single-member list for each of the OR clause's arguments and pass the same. Then, if we get constfalse for all of the OR's arguments, we return constfalse=true to the original caller.

> +		/*
> +		 * If no commutator exists, cannot flip the qual's args,
> +		 * so give up.
> +		 */
> +		if (!OidIsValid(expr_op))
> +			continue;
>
> I suppose it's better to leave leftop and rightop as is rather than
> flipping over so that var is placed left-side. Does that make
> things so complex?

The reason to do it that way is that code placed far away (the code in partition.c that extracts constant values to use for pruning from matched clauses) can then always assume that the clauses determined to be useful for partition pruning come in the 'partkey op constant' form.

> + * It the operator happens to be '<>', which is never listed

If? Right, will fix.
> +		if (!op_in_opfamily(expr_op, partopfamily))
> +		{
> +			Oid			negator = get_negator(expr_op);
>
> +			if (!OidIsValid(negator) ||
> +				!op_in_opfamily(negator, partopfamily))
> +				continue;
>
> classify_partition_bounding_keys() checks the same thing by
> checking whether the negator's strategy is
> BTEqualStrategyNumber. (I'm not sure the operator is guaranteed
> to be of btree, though..) Aren't they needed to be in similar
> way?

You're right. The <>'s negator may not always be a btree operator. So, we should add a check in match_clauses_to_partkey() that list or range partitioning is in use, because only those require a btree operator family. We now have hash partitioning, so we need to be careful not to make the assumption that all partitioning operators are from btree operator families. If match_clauses_to_partkey()
Re: [HACKERS] 10beta1 sequence regression failure on sparc64
Christoph, what beta2 change was it that fixed that problem? I'm having exactly the same regression test failure in version 10.1 (not beta) running on Solaris 9.

Compiler used: GCC 4.6.4
CFLAGS: -O2 -m64

On 13/07/2017 20:05, Christoph Berg wrote:
> Re: To Andres Freund 2017-05-24 <20170524170921.7pykzbt54dlfk...@msg.df7cb.de>
> > > If we had a typo or something in that code, the build farm should have
> > > caught it by now. I would try compiling with lower -O and see what happens.
> >
> > Trying -O0 now.
>
> Sorry for the late reply, I was hoping to catch you on IRC, but then
> didn't manage to be around at a time that would fit the timezone shift.
>
> The -O0 build passed the regression tests. Not sure what that means for
> our problem, though.
>
> Fwiw, the problem is fixed with beta2, even with -O2:
> https://buildd.debian.org/status/logs.php?pkg=postgresql-10&arch=sparc64
>
> Christoph
Re: [HACKERS] [Patch] Log SSL certificate verification errors
Graham Leggett wrote:
> Currently neither the server side nor the client side SSL certificate verify
> callback does anything, leading to potential hair-tearing-out moments.
>
> The following patch to master implements logging of all certificate
> verification failures, as well as (crucially) which certificates failed to
> verify, and at what depth, so the admin can zoom in straight onto the
> problem without any guessing.

+1 for the idea. I have been in this situation before, and any information that helps to clarify what the problem is would be a great help.

Yours,
Laurenz Albe
Re: [HACKERS] Partition-wise aggregation/grouping
On 11.11.2017 23:29, Konstantin Knizhnik wrote:

On 10/27/2017 02:01 PM, Jeevan Chalke wrote:
> Hi,
> Attached new patch-set here. Changes include:
> 1. Added separate patch for costing Append node as discussed up-front in the patch-set.
> 2. Since we now cost Append node, we don't need partition_wise_agg_cost_factor GUC. So removed that. The remaining patch hence merged into main implementation patch.
> 3. Updated rows in test-cases so that we will get partition-wise plans.
> Thanks

Thanks. I applied the partition-wise-agg-v6.tar.gz patch to the master and used the shard.sh example from https://www.postgresql.org/message-id/14577.1509723225%40localhost

The plan for count(*) is the following:

shard=# explain select count(*) from orders;
                                       QUERY PLAN
---
 Finalize Aggregate  (cost=100415.29..100415.30 rows=1 width=8)
   ->  Append  (cost=50207.63..100415.29 rows=2 width=8)
         ->  Partial Aggregate  (cost=50207.63..50207.64 rows=1 width=8)
               ->  Foreign Scan on orders_0  (cost=101.00..50195.13 rows=5000 width=0)
         ->  Partial Aggregate  (cost=50207.63..50207.64 rows=1 width=8)
               ->  Foreign Scan on orders_1  (cost=101.00..50195.13 rows=5000 width=0)

We really calculate a partial aggregate for each partition, but to do so we still have to fetch all data from the remote host. So for foreign partitions such plans are absolutely inefficient. Maybe it should be combined with some other patch? For example, with the agg_pushdown_v4.tgz patch https://www.postgresql.org/message-id/14577.1509723225%40localhost ? But it does not apply after the partition-wise-agg-v6.tar.gz patch.

Also, postgres_fdw in 11dev is able to push down aggregates without the agg_pushdown_v4.tgz patch. In 0009-Teach-postgres_fdw-to-push-aggregates-for-child-rela.patch there is the following check:

/* Partial aggregates are not supported.
 */
+ if (extra->isPartial)
+     return;

If we just comment out this check, the produced plan becomes:

shard=# explain select sum(product_id) from orders;
                          QUERY PLAN
---
 Finalize Aggregate  (cost=308.41..308.42 rows=1 width=8)
   ->  Append  (cost=144.18..308.41 rows=2 width=8)
         ->  Foreign Scan  (cost=144.18..154.20 rows=1 width=8)
               Relations: Aggregate on (public.orders_0 orders)
         ->  Foreign Scan  (cost=144.18..154.20 rows=1 width=8)
               Relations: Aggregate on (public.orders_1 orders)
(6 rows)

And that is actually the desired plan! Obviously such an approach will not always work: the FDW really doesn't support partial aggregates now. But for the most frequently used aggregates (sum, min, max, count), aggtype == aggtranstype, and there is no difference between partial and normal aggregate calculation. So instead of the (extra->isPartial) condition we could add a more complex check that traverses the pathtarget expressions and checks whether they can be evaluated this way. Or... extend the FDW API to support partial aggregation.

But even the last plan is not ideal: it will evaluate the aggregates at each remote node sequentially. There is the parallel append patch: https://www.postgresql.org/message-id/CAJ3gD9ctEcrVUmpY6fq_JUB6WDKGXAGd70EY68jVFA4kxMbKeQ%40mail.gmail.com but... the FDW doesn't support parallel scan, so parallel append cannot be applied in this case. And we actually do not need parallel append, with all its dynamic workers, here. We just need to start the commands on all remote servers, and only afterwards fetch the results (which can be done sequentially).

I am investigating the problem of efficient execution of OLAP queries on sharded tables (tables with remote partitions). After reading all these threads and the corresponding patches, it seems to me that we already have most parts of the puzzle; what we need is to put them in the right places and maybe add the missing ones. I wonder whether somebody is already busy with this, and whether I can somehow help here?
Also, I am not quite sure about the best approach to parallel execution of a distributed query on all nodes. Should we make postgres_fdw parallel-safe and use parallel append? How difficult would that be? Or, in addition to parallel append, should we also have an "asynchronous append" which would be able to initiate execution on all nodes? It seems to be close to merge append, because it should traverse all cursors simultaneously. The second approach looks easier to implement. But in the case of a sharded table, a distributed query may need to traverse both remote and local shards, and this approach doesn't allow several local shards to be processed in parallel.

I attach a small patch for postgres_fdw.c which allows concurrent execution of aggregates by all remote servers (when they are accessed through postgres_fdw). I have added a "postgres_fdw.use_prefetch" GUC to enable/disable prefetching data in postgres_fdw. This
Re: [HACKERS] index-only count(*) for indexes supporting bitmap scans
I've pushed the executor part of this, but mostly not the planner part, because I didn't think the latter was anywhere near ready for prime time: the regression test changes it was causing were entirely bogus.

Hi Tom, Thanks for the commit and the explanation. I'll try to address your comments if I continue working on the planner part.

--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
[HACKERS] ginInsertCleanup called from vacuum could still miss tuples to be deleted
Hi,

Commit e2c79e14 prevented multiple concurrent cleanup processes for the pending list in GIN indexes. But I think there is still a possibility that vacuum could miss tuples to be deleted if someone else is cleaning up the pending list. In ginInsertCleanup(), we lock the GIN meta page with LockPage and can wait for a concurrent cleanup process if stats == NULL. And the source code comment says that this happens when ginInsertCleanup is called by [auto]vacuum/analyze or gin_clean_pending_list(). I agree with this behavior. However, looking at the callers, stats is NULL only either if the pending list exceeds the threshold during insertion or if only analyzing is performed by an autovacuum worker or the ANALYZE command. So I think we should use inVacuum = (stats != NULL) instead. Also, we might want autoanalyze and the ANALYZE command to wait for a concurrent process as well. The attached patch fixes these two issues. If this is a bug we should back-patch it to 9.6.

void
ginInsertCleanup(GinState *ginstate, bool full_clean,
                 bool fill_fsm, IndexBulkDeleteResult *stats)
{
    (snip)
    bool        inVacuum = (stats == NULL);

    /*
     * We would like to prevent concurrent cleanup process. For that we will
     * lock metapage in exclusive mode using LockPage() call. Nobody other
     * will use that lock for metapage, so we keep possibility of concurrent
     * insertion into pending list
     */
    if (inVacuum)
    {
        /*
         * We are called from [auto]vacuum/analyze or gin_clean_pending_list()
         * and we would like to wait concurrent cleanup to finish.
         */
        LockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock);
        workMemory =
            (IsAutoVacuumWorkerProcess() && autovacuum_work_mem != -1) ?
            autovacuum_work_mem : maintenance_work_mem;
    }
    else
    {
        /*
         * We are called from regular insert and if we see concurrent cleanup
         * just exit in hope that concurrent process will clean up pending
         * list.
         */
        if (!ConditionalLockPage(index, GIN_METAPAGE_BLKNO, ExclusiveLock))
            return;
        workMemory = work_mem;
    }

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

fix_ginInsertCleanup.patch (attached)