[HACKERS] Incremental refresh of materialized view - Patch
Hi all, I am building a patch to refresh materialized views incrementally from the change set obtained by logical decoding of WAL. As of now I am able to generate the changes that have to be applied to the materialized view, but the problem is that it is not possible to do any DML operations on matviews; DML is only allowed from within the concurrent refresh path. However, if the matview.c functions OpenMatViewIncrementalMaintenance & CloseMatViewIncrementalMaintenance are made extern, it is possible to do DML from a patch like the one I am building. Is there any other way of doing DML operations on materialized views from a patch? cheers - Harry
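For context, the restriction Harry is running into is easy to reproduce from SQL; direct DML on a materialized view is rejected unless the maintenance guard in matview.c has been opened. A small sketch (the matview name is invented for illustration):

```sql
CREATE MATERIALIZED VIEW mv_orders AS
    SELECT 1 AS order_id, 100 AS amount;

-- Outside of the (concurrent) refresh machinery, any direct DML fails:
DELETE FROM mv_orders;
-- ERROR:  cannot change materialized view "mv_orders"
```

This is why the patch needs OpenMatViewIncrementalMaintenance/CloseMatViewIncrementalMaintenance exposed: they are what the concurrent-refresh code uses internally to temporarily lift that check.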
Re: [HACKERS] silent data loss with ext4 / all current versions
On Mon, Mar 28, 2016 at 8:25 AM, Andres Freund wrote: > I've also noticed that > a) pg_basebackup doesn't do anything about durability (it probably needs > a very similar patch to the one pg_rewind just received). Coming back to this issue because I think that one of the QE tests running here just got bitten by that. A base backup was taken with pg_basebackup and, more or less right after, a VM was plugged off. The trick is that for pg_basebackup we cannot rely on initdb: pg_basebackup is a client-side utility. In most of the PG packages (Fedora, RHEL), it is shipped in the client-side package, where initdb is not. So it seems to me that the correct fix is not to use initdb -S but to have copies of fsync_parent_path, durable_rename and fsync_fname_ext in streamutil.c, and then we reuse them for both pg_receivexlog and pg_basebackup. At least that's less risky for back-branches this way. > b) nor does pg_dump[all] I have not hacked up that one yet, but I would think that we would need one extra copy of some of those fsync_* routines in src/bin/pg_dump/. There is another thread for that already... On master I guess we'd end up with something centralized in src/common/, but let's close the existing holes first. > So we're going to have another round of fsync stuff in the next set of > releases anyway... The sooner the better, I think. People caring about this problem are for now limited to running initdb -S after calling pg_basebackup or pg_dump. That's a solution, though the flushes should be contained inside each utility. -- Michael 0001-Issue-fsync-more-carefully-in-pg_receivexlog-and-pg_.patch Description: application/download -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
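The durable_rename/fsync_parent_path logic being copied into streamutil.c boils down to a small POSIX pattern: flush the file itself, rename it, then flush the parent directory so the new directory entry also reaches disk. A minimal self-contained sketch of the idea — not the actual PostgreSQL code; the helper names are merely modeled on the real ones and error handling is simplified:

```c
#include <fcntl.h>
#include <libgen.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Open a file (or directory) by path and fsync it. */
static int
fsync_path(const char *path)
{
	int		fd = open(path, O_RDONLY);

	if (fd < 0)
		return -1;
	if (fsync(fd) < 0)
	{
		close(fd);
		return -1;
	}
	return close(fd);
}

/*
 * Rename oldpath to newpath so that the result survives a crash:
 * fsync the source file first, then rename, then fsync the parent
 * directory so the new directory entry itself is durable.
 */
int
durable_rename(const char *oldpath, const char *newpath)
{
	char		parent[1024];

	if (fsync_path(oldpath) < 0)
		return -1;
	if (rename(oldpath, newpath) < 0)
		return -1;

	/* dirname() may modify its argument, so work on a copy. */
	strncpy(parent, newpath, sizeof(parent) - 1);
	parent[sizeof(parent) - 1] = '\0';
	return fsync_path(dirname(parent));
}
```

Without the final directory fsync, the rename itself can be lost after a power failure even though the file's contents were flushed — which is exactly the class of problem being discussed for pg_basebackup output.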
Re: [HACKERS] Academic help for Postgres
On 12 May 2016 at 11:16, Rajeev rastogi wrote: > >Any others? > GPU offload. Some work on that already got done as part of the AXLE project, but there's still a lot more to do to get anything that can be usefully integrated into Pg. This likely ties in with batching work, since without batching it's unlikely you can get much benefit from GPU offload. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Use %u to print user mapping's umid and userid
On 2016/05/12 13:02, Tom Lane wrote: Etsuro Fujita writes: On 2016/05/11 18:03, Ashutosh Bapat wrote: A call to GetForeignTable would incur a catalog lookup which means a catalog table/index scan if the corresponding entry is not in the cache. This is followed by GetUserMapping() which is another catalog access. That's bound to be more expensive than a makeOid(), oidVal() call. Right, but such lookups have been incurred at planning time (ie, build_simple_rel), and the corresponding entries would be in the cache. So, the overhead of that recalculation at execution time would be not that large in practice. No? It's a mistake to assume that execution immediately follows planning. Yeah, that would not be the case in PREPARE/EXECUTE, right? Having said that, I wonder whether you should be thinking less about performance and more about correctness. Is a user mapping lookup done at plan time still valid at execution, and if so what ensures that? I think that when scanning a foreign join, the user mapping is still valid at execution time, and that is ensured by RevalidateCachedQuery, IIUC. Best regards, Etsuro Fujita
Re: [HACKERS] Use %u to print user mapping's umid and userid
Etsuro Fujita writes: > On 2016/05/11 18:03, Ashutosh Bapat wrote: >> A call to GetForeignTable would incur a catalog lookup which means a >> catalog table/index scan if the corresponding entry is not in the cache. >> This is followed by GetUserMapping() which is another catalog access. >> That's bound to be more expensive than a makeOid(), oidVal() call. > Right, but such lookups have been incurred at planning time (ie, > build_simple_rel), and the corresponding entries would be in the cache. So, > the overhead of that recalculation at execution time would be not > that large in practice. No? It's a mistake to assume that execution immediately follows planning. Having said that, I wonder whether you should be thinking less about performance and more about correctness. Is a user mapping lookup done at plan time still valid at execution, and if so what ensures that? regards, tom lane
Re: [HACKERS] Perf Benchmarking and regression.
On Wed, May 11, 2016 at 12:51 AM, Ashutosh Sharma wrote: > I am extremely sorry for the delayed response. As suggested by you, I have > taken the performance readings at 128 client counts after making the > following two changes: > > 1). Removed AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, -1, NULL, > NULL); from pq_init(). Below is the git diff for the same. > > diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c > index 8d6eb0b..399d54b 100644 > --- a/src/backend/libpq/pqcomm.c > +++ b/src/backend/libpq/pqcomm.c > @@ -206,7 +206,9 @@ pq_init(void) > AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE, > MyProcPort->sock, > NULL, NULL); > AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, -1, MyLatch, NULL); > +#if 0 > AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, -1, NULL, NULL); > +#endif > > 2). Disabled the guc vars "bgwriter_flush_after", "checkpointer_flush_after" > and "backend_flush_after" by setting them to zero. > > After doing the above two changes below are the readings I got for 128 > client counts: > > CASE : Read-Write Tests when data exceeds shared buffers. > > Non Default settings and test > ./postgres -c shared_buffers=8GB -N 200 -c min_wal_size=15GB -c > max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c > checkpoint_completion_target=0.9 & > > ./pgbench -i -s 1000 postgres > > ./pgbench -c 128 -j 128 -T 1800 -M prepared postgres > > Run1 : tps = 9690.678225 > Run2 : tps = 9904.320645 > Run3 : tps = 9943.547176 > > Please let me know if I need to take readings with other client counts as > well. Can you please take four new sets of readings, like this: - Unpatched master, default *_flush_after - Unpatched master, *_flush_after=0 - That line removed with #if 0, default *_flush_after - That line removed with #if 0, *_flush_after=0 128 clients is fine. But I want to see four sets of numbers that were all taken by the same person at the same time using the same script.
Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Vacuum Full by Scheme
On Thu, May 12, 2016 at 7:31 AM, Filho Arrais wrote: > Is there some development to allow vacuum full by schema? > > Ex: vacuumdb -d postgres -parameter public > > In databases with thousands of tables and multiple schemas, it would be > interesting to have this feature. Not that I know of. The only way to achieve that now is to use a script that first fetches the list of tables in the schema and then runs VACUUM on those tables individually, as VACUUM cannot be run in a stored procedure. -- Michael
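The script-based workaround Michael describes can be sketched as a query that generates one VACUUM statement per table in the target schema and is fed back to psql. A hedged sketch — the schema name and the psql invocation in the comment are just examples:

```sql
-- e.g.: psql -d postgres -AtX -c "<this query>" | psql -d postgres
SELECT format('VACUUM FULL %I.%I;', schemaname, tablename)
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY tablename;
```

Using format() with %I quotes identifiers safely, so tables with unusual names are handled correctly.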
Re: [HACKERS] Use %u to print user mapping's umid and userid
On 2016/05/11 18:03, Ashutosh Bapat wrote: On Wed, May 11, 2016 at 1:34 PM, Etsuro Fujita <fujita.ets...@lab.ntt.co.jp> wrote: On 2016/05/11 16:49, Ashutosh Bapat wrote: The patch is calculating the user mapping when it's readily available through RelOptInfo::fdw_private. That incurs a catalog lookup unnecessarily. Instead, can we add new functions makeOid, oidVal along the lines of makeInteger and intVal to store and retrieve an OID resp., and also a corresponding print function? It might be helpful in future. That might be an idea, but is the overhead of that re-calculation so large? A call to GetForeignTable would incur a catalog lookup which means a catalog table/index scan if the corresponding entry is not in the cache. This is followed by GetUserMapping() which is another catalog access. That's bound to be more expensive than a makeOid(), oidVal() call. Right, but such lookups have been incurred at planning time (ie, build_simple_rel), and the corresponding entries would be in the cache. So, the overhead of that recalculation at execution time would be not that large in practice. No? Best regards, Etsuro Fujita
Re: [HACKERS] Academic help for Postgres
On 11 May 2016 19:50, Bruce Momjian wrote: >I am giving a keynote at an IEEE database conference in Helsinki next >week (http://icde2016.fi/). (Yes, I am not attending PGCon Ottawa >because I accepted the Helsinki conference invitation before the PGCon >Ottawa date was changed from June to May). > >As part of the keynote, I would like to mention areas where academia can >help us. The topics I can think of are: > > Query optimization > Optimizer statistics > Indexing structures > Reducing function call overhead > CPU locality > Sorting > Parallelism > Sharding > >Any others? How about: 1. Considering NUMA-aware architecture. 2. Optimizer tuning as per new hardware trends. 3. More effective versions of join algorithms (e.g. compared to the traditional "build and then probe" mechanism of hash join, there are now pipelined hash joins where probe and build happen together). Thanks and Regards, Kumar Rajeev Rastogi
Re: [HACKERS] Academic help for Postgres
On 11 May 2016 at 22:20, Bruce Momjian wrote: > I am giving a keynote at an IEEE database conference in Helsinki next > week (http://icde2016.fi/). (Yes, I am not attending PGCon Ottawa > because I accepted the Helsinki conference invitation before the PGCon > Ottawa date was changed from June to May). > > As part of the keynote, I would like to mention areas where academia can > help us. The topics I can think of are: > [snip] > > Any others? > When publishing work, publish source code somewhere stable that won't just vanish. And build on the latest stable release; don't build your prototype on Pg 8.0. Don't just publish a tarball with no information about what revision it's based on; publish a git tree or a patch series. While academic prototype source is rarely usable directly, it can serve a valuable role in helping to understand the changes that were made, reproducing results, exploring further related work, etc. Include your dummy data or data generators, setup scripts, etc. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Does Type Have = Operator?
On May 11, 2016, at 11:01 AM, Fabrízio de Royes Mello wrote: > I know... but you can do that just in case the current behaviour fails, by > catching it with "begin...exception...", so you'll minimize the lookups > on the catalog. Yeah, I guess. Honestly 90% of this issue would go away for me if there was a `json = json` operator. I know there are a couple different ways to interpret JSON equality, though. David
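One of the "couple different ways" David mentions — semantic equality, ignoring key order, whitespace, and duplicate keys — can be had by piggybacking on jsonb, which does have a native = operator. A hedged sketch, assuming 9.4+ where jsonb exists (the function name is invented):

```sql
CREATE FUNCTION json_eq(json, json) RETURNS boolean
    LANGUAGE sql IMMUTABLE STRICT
    AS 'SELECT $1::jsonb = $2::jsonb';

CREATE OPERATOR = (
    LEFTARG = json,
    RIGHTARG = json,
    PROCEDURE = json_eq,
    COMMUTATOR = =
);

-- Key order no longer matters under jsonb semantics:
SELECT '{"a":1,"b":2}'::json = '{"b":2,"a":1}'::json;  -- true
```

The other interpretation — exact textual equality — would instead compare `$1::text = $2::text`, preserving order and whitespace distinctions; which is right depends on what the tester wants to assert.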
Re: [HACKERS] Does Type Have = Operator?
On May 11, 2016, at 10:34 AM, Kevin Grittner wrote: > I'm not clear enough on your intended usage to know whether these > operators are a good fit, but they are sitting there waiting to be > used if they do fit. Huh. I haven’t had any problems with IS DISTINCT FROM for rows, except for the situation in which a failure is thrown because the types vary, say between TEXT and CITEXT. That can drive the tester crazy, since it says something like: Results differ beginning at row 3: have: (44,Anna) want: (44,Anna) But overall I think that’s okay; the tester really does want to make sure the type is correct. Thanks, David
Re: [HACKERS] parallel.c is not marked as test covered
On 12 May 2016 at 07:04, Robert Haas wrote: > On Wed, May 11, 2016 at 1:57 PM, Robert Haas wrote: >> I don't immediately understand what's going wrong here. It looks to >> me like make_group_input_target() was already called, and that worked OK, >> but now make_partialgroup_input_target() is failing using more-or-less >> the same logic. Presumably that's because make_group_input_target() >> was called on final_target as returned by create_pathtarget(root, >> tlist), but make_partialgroup_input_target() is being called on >> grouping_target, which I'm guessing came from >> make_window_input_target, which somehow lacks sortgroupref labeling. >> But I don't immediately see how that would happen, so there's >> obviously something I'm missing here. > > So, it turns out you can reproduce this bug pretty easily without > force_parallel_mode, like this: > > alter table int4_tbl set (parallel_degree = 4); > SELECT SUM(COUNT(f1)) OVER () FROM int4_tbl WHERE f1=42; > > Or you can just use a query that involves a window function on a table > large enough for parallelism to be considered: > > SELECT SUM(COUNT(aid)) OVER () FROM pgbench_accounts; > > The crash goes away if the target list involves at least one plain > column that uses no aggregate or window function, because then the > PathTarget list has a sortgrouprefs array. Trivial fix patch > attached, although some more review from you (Tom Lane, PathTarget > inventor and planner whiz) and David Rowley (author of this function) > would be appreciated in case there are deeper issues here. The problem is make_group_input_target() only calls add_column_to_pathtarget() (which allocates this array) when there's a GROUP BY, otherwise it just appends to the non_group_col list. Since your query has no GROUP BY it means that add_column_to_pathtarget() is never called with a non-zero sortgroupref.
It looks like Tom has intended that PathTarget->sortgrouprefs can be NULL going by both the comment /* corresponding sort/group refnos, or 0 */, and the coding inside add_column_to_pathtarget(), which does not allocate the array if given a 0 sortgroupref. It looks like make_sort_input_target(), make_window_input_target() and make_group_input_target() all get away without this check because they're all using final_target, which was built by make_pathtarget_from_tlist() which *always* allocates the sortgrouprefs array, even if it's left filled with zeros. It might be better if this was all consistent. Perhaps it would be worth modifying make_pathtarget_from_tlist() to only allocate the sortgrouprefs array if there's any non-zero tle->ressortgroupref, then modify the other make_*_input_target() functions to handle a NULL array, similar to the fix that's in your patch. This saves an allocation which is likely much more expensive than the NULL check later. Alternatively add_column_to_pathtarget() could be modified to allocate the array even if sortgroupref is zero. I think consistency is good here, as if this had been consistent this would not be a bug. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Vacuum Full by Scheme
Hello, Is there some development to allow vacuum full by schema? Ex: vacuumdb -d postgres -parameter public In databases with thousands of tables and multiple schemas, it would be interesting to have this feature. Filho Arrais Information Technology Analyst, MBA in Information Technology Management
Re: [HACKERS] Academic help for Postgres
On 12/05/16 02:20, Bruce Momjian wrote: I am giving a keynote at an IEEE database conference in Helsinki next week (http://icde2016.fi/). (Yes, I am not attending PGCon Ottawa because I accepted the Helsinki conference invitation before the PGCon Ottawa date was changed from June to May). As part of the keynote, I would like to mention areas where academia can help us. The topics I can think of are: Query optimization Optimizer statistics Indexing structures Reducing function call overhead CPU locality Sorting Parallelism Sharding Any others? Optimization of performance under very heavy loads, ranging from almost all reads to almost all writes/updates, & other usage profiles; single box, and multiple boxen; large numbers of CPUs; most efficient use of SSDs; best use of insanely large amounts of RAM; optimization of handling arrays & JSON structures. Cheers, Gavin
Re: [HACKERS] parallel.c is not marked as test covered
On Wed, May 11, 2016 at 1:57 PM, Robert Haas wrote: > I don't immediately understand what's going wrong here. It looks to > me like make_group_input_target() was already called, and that worked OK, > but now make_partialgroup_input_target() is failing using more-or-less > the same logic. Presumably that's because make_group_input_target() > was called on final_target as returned by create_pathtarget(root, > tlist), but make_partialgroup_input_target() is being called on > grouping_target, which I'm guessing came from > make_window_input_target, which somehow lacks sortgroupref labeling. > But I don't immediately see how that would happen, so there's > obviously something I'm missing here. So, it turns out you can reproduce this bug pretty easily without force_parallel_mode, like this: alter table int4_tbl set (parallel_degree = 4); SELECT SUM(COUNT(f1)) OVER () FROM int4_tbl WHERE f1=42; Or you can just use a query that involves a window function on a table large enough for parallelism to be considered: SELECT SUM(COUNT(aid)) OVER () FROM pgbench_accounts; The crash goes away if the target list involves at least one plain column that uses no aggregate or window function, because then the PathTarget list has a sortgrouprefs array. Trivial fix patch attached, although some more review from you (Tom Lane, PathTarget inventor and planner whiz) and David Rowley (author of this function) would be appreciated in case there are deeper issues here.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 6770836..5c5e5ab 100644 --- a/src/backend/optimizer/plan/planner.c +++ b/src/backend/optimizer/plan/planner.c @@ -4305,7 +4305,10 @@ make_partialgroup_input_target(PlannerInfo *root, PathTarget *final_target) foreach(lc, final_target->exprs) { Expr *expr = (Expr *) lfirst(lc); - Index sgref = final_target->sortgrouprefs[i]; + Index sgref = 0; + + if (final_target->sortgrouprefs != NULL) + sgref = final_target->sortgrouprefs[i]; if (sgref && parse->groupClause && get_sortgroupref_clause_noerr(sgref, parse->groupClause) != NULL)
Re: [HACKERS] Does Type Have = Operator?
On Wed, May 11, 2016 at 2:07 AM, David E. Wheeler wrote: > > On May 10, 2016, at 5:56 PM, Fabrízio de Royes Mello <fabriziome...@gmail.com> wrote: > > > Searching for the operator in the pg_operator catalog isn't enough? > > Seems like overkill, but will do if there’s nothing else. > I know... but you can do that just in case the current behaviour fails, by catching it with "begin...exception...", so you'll minimize the lookups on the catalog. Regards, -- Fabrízio de Royes Mello Consultoria/Coaching PostgreSQL >> Timbira: http://www.timbira.com.br >> Blog: http://fabriziomello.github.io >> Linkedin: http://br.linkedin.com/in/fabriziomello >> Twitter: http://twitter.com/fabriziomello >> Github: http://github.com/fabriziomello
Re: [HACKERS] parallel.c is not marked as test covered
On Wed, May 11, 2016 at 1:48 PM, David G. Johnston wrote: > What happens when there are no workers available due to max_worker_processes > already being assigned? Then the leader runs the plan after all. > Related question, if max_parallel_degree is >1 and "the requested number of > workers may not actually be available at runtime" is true, does the degree > of parallelism minimize at 1 worker + leader or will the leader simply run > the query by itself? If the leader can get no workers at all, it will simply run the query by itself. Of course, it tries to get as many as it can. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] parallel.c is not marked as test covered
On Wed, May 11, 2016 at 1:38 PM, Robert Haas wrote: >> I would just go fix this, along the lines of >> >> *** create_plain_partial_paths(PlannerInfo * >> *** 702,708 >> * with all of its inheritance siblings it may well pay off. >> */ >> if (rel->pages < parallel_threshold && >> ! rel->reloptkind == RELOPT_BASEREL) >> return; >> >> /* >> --- 703,710 >> * with all of its inheritance siblings it may well pay off. >> */ >> if (rel->pages < parallel_threshold && >> ! rel->reloptkind == RELOPT_BASEREL && >> ! force_parallel_mode == FORCE_PARALLEL_OFF) >> return; >> >> /* >> >> except that doing so and running the regression tests with >> force_parallel_mode = on results in core dumps. > > Nonetheless, that is a bug. (I only get one crash - do you get more?) This looks like a bug in the parallel aggregate code. #0 make_partialgroup_input_target (root=0x7faa5f002d20, final_target=0x7faa5f004270) at planner.c:4308 4308    Index sgref = final_target->sortgrouprefs[i]; (gdb) p debug_query_string $1 = 0x7faa5d00b638 "SELECT SUM(COUNT(f1)) OVER () FROM int4_tbl WHERE f1=42;" Current language: auto; currently minimal (gdb) bt #0 make_partialgroup_input_target (root=0x7faa5f002d20, final_target=0x7faa5f004270) at planner.c:4308 #1 0x000101df4889 in create_grouping_paths (root=0x7faa5f002d20, input_rel=0x7faa5f0034b0, target=0x7faa5f004270, rollup_lists=0x0, rollup_groupclauses=0x0) at planner.c:3421 #2 0x000101df19a0 in grouping_planner (root=0x7faa5f002d20, inheritance_update=0 '\0', tuple_fraction=0) at planner.c:1796 #3 0x000101def276 in subquery_planner (glob=0x7faa5f002b90, parse=0x7faa5d00cb80, parent_root=0x0, hasRecursion=0 '\0', tuple_fraction=0) at planner.c:758 #4 0x000101dee2de in standard_planner (parse=0x7faa5d00cb80, cursorOptions=256, boundParams=0x0) at planner.c:307 #5 0x000101dedf81 in planner (parse=0x7faa5d00cb80, cursorOptions=256, boundParams=0x0) at planner.c:177 #6 0x000101eed7b6 in pg_plan_query (querytree=0x7faa5d00cb80, cursorOptions=256, boundParams=0x0)
at postgres.c:798 #7 0x000101eed8a3 in pg_plan_queries (querytrees=0x7faa5f002cc0, cursorOptions=256, boundParams=0x0) at postgres.c:857 #8 0x000101ef05ad in exec_simple_query (query_string=0x7faa5d00b638 "SELECT SUM(COUNT(f1)) OVER () FROM int4_tbl WHERE f1=42;") at postgres.c:1022 #9 0x000101eefa8f in PostgresMain (argc=1, argv=0x7faa5b005be0, dbname=0x7faa5b005940 "regression", username=0x7faa5b005920 "rhaas") at postgres.c:4059 #10 0x000101e45209 in BackendRun (port=0x7faa5ae01770) at postmaster.c:4258 #11 0x000101e9 in BackendStartup (port=0x7faa5ae01770) at postmaster.c:3932 #12 0x000101e433ef in ServerLoop () at postmaster.c:1690 #13 0x000101e40a23 in PostmasterMain (argc=8, argv=0x7faa5ac099b0) at postmaster.c:1298 #14 0x000101d6e160 in main (argc=8, argv=0x7faa5ac099b0) at main.c:228 (gdb) p final_target->sortgrouprefs $2 = (Index *) 0x0 I don't immediately understand what's going wrong here. It looks to me like make_group_input_target() was already called, and that worked OK, but now make_partialgroup_input_target() is failing using more-or-less the same logic. Presumably that's because make_group_input_target() was called on final_target as returned by create_pathtarget(root, tlist), but make_partialgroup_input_target() is being called on grouping_target, which I'm guessing came from make_window_input_target, which somehow lacks sortgroupref labeling. But I don't immediately see how that would happen, so there's obviously something I'm missing here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] parallel.c is not marked as test covered
On Wed, May 11, 2016 at 10:38 AM, Robert Haas wrote: > On Wed, May 11, 2016 at 12:34 AM, Tom Lane wrote: > >> Hmm, that is strange. I would have expected that to stuff a Gather on > >> top of the Aggregate. I wonder why it's not doing that. > > > > The reason is that create_plain_partial_paths() contains a hard-wired > > decision not to generate any partial paths for relations smaller than > > 1000 blocks, which means that no partial path will ever be generated > > for *any* relation in the standard regression tests, force_parallel_mode > > or no. > > Well that's an interesting theory, except that you've completely > missed the point of force_parallel_mode. force_parallel_mode pushes a > special Gather node on top of any plan that is not already parallel in > some way but which is parallel-safe. That special Gather node runs > only in the worker, not the leader, and always uses just one worker. > What happens when there are no workers available due to max_worker_processes already being assigned? Related question, if max_parallel_degree is >1 and "the requested number of workers may not actually be available at runtime" is true, does the degree of parallelism minimize at 1 worker + leader or will the leader simply run the query by itself? David J.
Re: [HACKERS] Re: Need help debugging why autovacuum seems "stuck" -- until I use superuser to vacuum freeze pg_database
Shawn wrote: > The problem is though, when I look at age(datfrozenxid) for > "my_database_name" it is the oldest at 654,189,546 and should trigger the > autovacuum_freeze_max_age of 200,000,000 but it doesn't. The relation with > this age is "pg_database". If I issue a manual "vacuum freeze pg_database" > in the database, autovacuum will run on the next oldest database but get > stuck again with the same symptoms (i.e. pg_database is the relation with the > oldest age). If I didn't issue the manual "vacuum freeze" for all databases > and left things alone, it would get to the wrap-around state. I was thinking about this in connection with the shared catalog vacuuming bug I fixed yesterday. One problem here is that the relfrozenxid/relminmxid settings are local to each database, so a for-wraparound vacuum would have to scan the table in all databases, and advance the counter in all databases, before the overall "age" of the system goes down again, which is inconvenient. This is a problem inherent to having these columns in a per-database catalog, and the only solution I can see is to have another shared catalog containing age state for the shared catalogs. (However, this issue isn't terribly serious -- it just means you have to scan the catalog N times instead of once.) However, this doesn't explain why the vacuuming of pg_database in one database doesn't actually advance the relfrozenxid in that database; it certainly should. This is the serious problem you're facing, it seems. I wonder if you've tweaked the freeze settings somehow. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
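The symptom Shawn describes can be observed directly with standard catalog queries (these are generic monitoring queries, not from the original mail):

```sql
-- Oldest datfrozenxid per database; autovacuum launches a wraparound
-- scan of a database once this exceeds autovacuum_freeze_max_age.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;

-- Within a database, the relations (including shared catalogs such as
-- pg_database itself) holding back datfrozenxid:
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY xid_age DESC
LIMIT 10;
```

If pg_database tops the second query in every database, that matches the per-database relfrozenxid bookkeeping for shared catalogs that Álvaro describes.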
Re: [HACKERS] Academic help for Postgres
On 11 May 2016 at 12:58, Josh berkus wrote: > Together with that, automated substitution of materialized views for > query clauses. > > Also: optimizing for new hardware, like persistent memory. I recently saw some material in ACM SIGOPS on tuning filesystems to play better with some of the new sorts of storage. One interesting such article: <http://dl.acm.org/citation.cfm?id=2819002> The idea of it was to research better ways of doing hash table updates with PCM (Phase Change Memory), which apparently may be up-and-coming but with fairly different write characteristics than we're used to. You essentially write a fairly large page at a time, and can only do limited numbers of updates to any given page. That encourages things like log-structured filesystems, but with further efforts to reduce there being "hot spots." The paper was focused on hash tables; if the hardware turns out to be important, it'll also be important to have better variations on B-trees. -- When confronted by a difficult problem, solve it by reducing it to the question, "How would the Lone Ranger handle this?"
Re: [HACKERS] parallel.c is not marked as test covered
On Wed, May 11, 2016 at 12:34 AM, Tom Lane wrote: >> Hmm, that is strange. I would have expected that to stuff a Gather on >> top of the Aggregate. I wonder why it's not doing that. > > The reason is that create_plain_partial_paths() contains a hard-wired > decision not to generate any partial paths for relations smaller than > 1000 blocks, which means that no partial path will ever be generated > for *any* relation in the standard regression tests, force_parallel_mode > or no. Well that's an interesting theory, except that you've completely missed the point of force_parallel_mode. force_parallel_mode pushes a special Gather node on top of any plan that is not already parallel in some way but which is parallel-safe. That special Gather node runs only in the worker, not the leader, and always uses just one worker. The point is to test that queries run in the worker in the same way that they do in the leader. This has already found lots of bugs, so it's clearly useful, despite all the confusion it's created. > I would just go fix this, along the lines of > > *** create_plain_partial_paths(PlannerInfo * > *** 702,708 > * with all of its inheritance siblings it may well pay off. > */ > if (rel->pages < parallel_threshold && > ! rel->reloptkind == RELOPT_BASEREL) > return; > > /* > --- 703,710 > * with all of its inheritance siblings it may well pay off. > */ > if (rel->pages < parallel_threshold && > ! rel->reloptkind == RELOPT_BASEREL && > ! force_parallel_mode == FORCE_PARALLEL_OFF) > return; > > /* > > except that doing so and running the regression tests with > force_parallel_mode = on results in core dumps. Nonetheless, that is a bug. (I only get one crash - do you get more?) > Some debugging seems > indicated ... and we should at this point assume that there has been no > useful testing of parallel query in the buildfarm, not even on Noah's > machines. Yes and no. 
The force_parallel_mode stuff is primarily there to tell us about things like "you marked a function as parallel-safe, but it actually blows up when run in a worker". It won't catch a case where a function is marked parallel-safe but silently does the wrong thing instead of failing, but it will catch quite a few mistakes of that kind, and I think that's good. What it won't do, though, is actually run "real" parallel queries - that is, multiple workers doing a Parallel Seq Scan with some other stuff pushed on top, and then a Gather. And we should have test coverage for that, too, so that we're testing real concurrent behavior, and not just our ability to run plans in a worker. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
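[Editor's note: a minimal sketch of how to exercise the behavior described above. GUC names are from the 9.6 development cycle (max_parallel_degree was later renamed), and the exact plan shape will vary with version and settings.]

```sql
-- Run an otherwise-serial, parallel-safe query entirely in one worker.
SET force_parallel_mode = on;
SET max_parallel_degree = 2;  -- 9.6-era name for the worker-count GUC

EXPLAIN (COSTS OFF) SELECT count(*) FROM pg_class;
-- The plan should show a Gather on top with "Single Copy: true",
-- meaning the whole plan runs in the worker, not the leader.
```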
Re: [HACKERS] Does Type Have = Operator?
On Wed, May 11, 2016 at 12:23 PM, David E. Wheeler wrote: > Oh, well crap. Maybe I’d be better off just comparing the plain > text of the expressions as Tom suggested. At the other extreme are the row comparison operators that only consider values equal if they have the same storage value. See the last paragraph of: http://www.postgresql.org/docs/9.5/static/functions-comparisons.html#COMPOSITE-TYPE-COMPARISON | To support matching of rows which include elements without a | default B-tree operator class, the following operators are | defined for composite type comparison: *=, *<>, *<, *<=, *>, and | *>=. These operators compare the internal binary representation | of the two rows. Two rows might have a different binary | representation even though comparisons of the two rows with the | equality operator is true. The ordering of rows under these | comparison operators is deterministic but not otherwise | meaningful. These operators are used internally for materialized | views and might be useful for other specialized purposes such as | replication but are not intended to be generally useful for | writing queries. I'm not clear enough on your intended usage to know whether these operators are a good fit, but they are sitting there waiting to be used if they do fit. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
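[Editor's note: a concrete illustration of the binary-representation comparison the quoted documentation describes, as a sketch. It relies on numeric values with different display scales comparing equal under = while having different stored representations; results assume a stock PostgreSQL 9.4+ server.]

```sql
-- Default equality: numeric comparison ignores trailing zeroes.
SELECT ROW(1.0::numeric) = ROW(1.00::numeric);   -- expected: true

-- Binary-image equality: the stored representations differ (display
-- scale), so the whole-row operator reports the rows as distinct.
SELECT ROW(1.0::numeric) *= ROW(1.00::numeric);  -- expected: false
```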
Re: [HACKERS] Does Type Have = Operator?
On May 11, 2016, at 10:19 AM, Kevin Grittner wrote: > As long as you don't assume too much about *what* is equal. > > test=# select '(1,1)(2,2)'::box = '(-4.5,1000)(-2.5,1000.5)'::box; > ?column? > -- > t > (1 row) Oh, well crap. Maybe I’d be better off just comparing the plain text of the expressions as Tom suggested. David smime.p7s Description: S/MIME cryptographic signature
Re: [HACKERS] Does Type Have = Operator?
On Wed, May 11, 2016 at 12:01 PM, David G. Johnston wrote:
> It's hard to imagine defining "=" to mean something different in logic,
> though, without intentionally trying to be cryptic.

As long as you don't assume too much about *what* is equal.

test=# select '(1,1)(2,2)'::box = '(-4.5,1000)(-2.5,1000.5)'::box;
 ?column?
----------
 t
(1 row)

-- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Does Type Have = Operator?
On Wed, May 11, 2016 at 9:54 AM, Robert Haas wrote: > On Tue, May 10, 2016 at 9:16 PM, David G. Johnston > wrote: > > Brute force: you'd have to query pg_amop and note the absence of a row > with > > a btree (maybe hash too...) family strategy 3 (1 for hash) [equality] > where > > the left and right types are the same and match the type in question. > > The core system uses this kind of thing to find equality operators in > a number of cases. > > We often assume that the operator which implements equality for the > type's default btree operator class is the canonical one for some > purpose. Ditto for the default hash operator class. > Yeah, the user-facing documentation covers it pretty deeply if not in one central location. But apparently the core system also uses the fact that "=", if present, is an equality operator and, less so, that no other operator is expected to be used for equality. I suspect that such an expectation is not enforced though - e.g., someone could define "==" to mean equality if they so choose (the lesser property). It's hard to imagine defining "=" to mean something different in logic, though, without intentionally trying to be cryptic. David J.
Re: [HACKERS] Academic help for Postgres
On 05/11/2016 07:54 AM, Bruce Momjian wrote: > On Wed, May 11, 2016 at 05:41:21PM +0300, Heikki Linnakangas wrote: >> On 11/05/16 17:32, Bruce Momjian wrote: >>> On Wed, May 11, 2016 at 05:31:10PM +0300, Konstantin Knizhnik wrote: On 11.05.2016 17:20, Bruce Momjian wrote: > I am giving a keynote at an IEEE database conference in Helsinki next > week (http://icde2016.fi/). (Yes, I am not attending PGCon Ottawa > because I accepted the Helsinki conference invitation before the PGCon > Ottawa date was changed from June to May). > > As part of the keynote, I would like to mention areas where academia can > help us. The topics I can think of are: > > Query optimization > Optimizer statistics > Indexing structures > Reducing function call overhead > CPU locality > Sorting > Parallelism > Sharding > > Any others? > Incremental materialized views? >>> >>> I don't know. Is that something academics would research? >> >> Absolutely! There are plenty of papers on how to keep materialized views >> up-to-date. > > Oh, OK. I will add it. > Together with that, automated substitution of materialized views for query clauses. Also: optimizing for new hardware, like persistent memory. -- -- Josh Berkus Red Hat OSAS (any opinions are my own) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Does Type Have = Operator?
On Tue, May 10, 2016 at 9:16 PM, David G. Johnston wrote: > Brute force: you'd have to query pg_amop and note the absence of a row with > a btree (maybe hash too...) family strategy 3 (1 for hash) [equality] where > the left and right types are the same and match the type in question. The core system uses this kind of thing to find equality operators in a number of cases. We often assume that the operator which implements equality for the type's default btree operator class is the canonical one for some purpose. Ditto for the default hash operator class. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
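[Editor's note: the catalog probe David describes might look something like the sketch below (untested; assumes 9.1+ catalogs, where pg_amop has an amopmethod column). Strategy 3 is btree equality; an analogous query against am.amname = 'hash' with strategy 1 covers the hash case.]

```sql
-- Find the default btree equality operator for a given type, if any.
-- No rows back means the type lacks one (e.g. box).
SELECT amop.amopopr::regoperator
FROM pg_amop amop
JOIN pg_am am ON am.oid = amop.amopmethod
WHERE am.amname = 'btree'
  AND amop.amopstrategy = 3
  AND amop.amoplefttype = 'box'::regtype
  AND amop.amoprighttype = 'box'::regtype;
```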
Re: [HACKERS] asynchronous and vectorized execution
On Wed, May 11, 2016 at 12:30 PM, Andres Freund wrote: > On 2016-05-11 12:27:55 -0400, Robert Haas wrote: >> On Wed, May 11, 2016 at 11:49 AM, Andres Freund wrote: >> > On 2016-05-11 10:12:26 -0400, Robert Haas wrote: >> >> > Hm. Do we really have to keep the page locked in the page-at-a-time >> >> > mode? Shouldn't the pin suffice? >> >> >> >> I think we need a lock to examine MVCC visibility information. A pin >> >> is enough to prevent a tuple from being removed, but not from having >> >> its xmax and cmax overwritten at almost but not quite exactly the same >> >> time. >> > >> > We already batch visibility lookups in page-at-a-time >> > mode. Cf. heapgetpage() / scan->rs_vistuples. So we can evaluate quals >> > after releasing the lock, but before the pin is released, without that >> > much effort. IIRC that isn't used for index lookups, but that's >> > probably a good idea. >> >> The trouble with that is that if you fail the qual, you have to relock >> the page. Which kinda sucks, if the qual is really simple. > > Hm? I'm missing something here? We currently do the visibility checks in > bulk for the whole page. After that we release the page lock. What > prevents us from executing the quals directly after that? And why would > you need to relock the page? Oh, yeah, in page-at-a-time mode we can release the lock first. I was thinking about what to do in tuple-at-a-time mode (i.e. when the page is not all-visible). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] asynchronous and vectorized execution
On 2016-05-11 12:27:55 -0400, Robert Haas wrote: > On Wed, May 11, 2016 at 11:49 AM, Andres Freund wrote: > > On 2016-05-11 10:12:26 -0400, Robert Haas wrote: > >> > Hm. Do we really have to keep the page locked in the page-at-a-time > >> > mode? Shouldn't the pin suffice? > >> > >> I think we need a lock to examine MVCC visibility information. A pin > >> is enough to prevent a tuple from being removed, but not from having > >> its xmax and cmax overwritten at almost but not quite exactly the same > >> time. > > > > We already batch visibility lookups in page-at-a-time > > mode. Cf. heapgetpage() / scan->rs_vistuples. So we can evaluate quals > > after releasing the lock, but before the pin is released, without that > > much effort. IIRC that isn't used for index lookups, but that's > > probably a good idea. > > The trouble with that is that if you fail the qual, you have to relock > the page. Which kinda sucks, if the qual is really simple. Hm? I'm missing something here? We currently do the visibility checks in bulk for the whole page. After that we release the page lock. What prevents us from executing the quals directly after that? And why would you need to relock the page? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] asynchronous and vectorized execution
On Wed, May 11, 2016 at 11:49 AM, Andres Freund wrote: > On 2016-05-11 10:12:26 -0400, Robert Haas wrote: >> > I've to admit I'm not that convinced about the speedups in the !fdw >> > case. There seems to be a lot easier avenues for performance >> > improvements. >> >> What I'm talking about is a query like this: >> >> SELECT * FROM inheritance_tree_of_foreign_tables WHERE very_rarely_true; > > Note that I said "!fdw case". Oh, wow, I totally missed that exclamation point. >> > FWIW, I've even hacked something up for a bunch of simple queries, and >> > the performance improvements were significant. Besides it only being a >> > weekend hack project, the big thing I got stuck on was considering how >> > to exactly determine when to batch and not to batch. >> >> Yeah. I think we need a system for signalling nodes as to when they >> will be run to completion. But a Boolean is somehow unsatisfying; >> LIMIT 100 is more like no LIMIT than it it is like LIMIT 1. I'm >> tempted to add a numTuples field to every ExecutorState and give upper >> nodes some way to set it, as a hint. > > I was wondering whether we should hand down TupleVectorStates to lower > nodes, and their size determines the max batch size... There's some appeal to that, but it seems complicated to make work. >> >> Some care is required here because any >> >> functions we execute as scan keys are run with the buffer locked, so >> >> we had better not run anything very complicated. But doing this for >> >> simple things like integer equality operators seems like it could save >> >> quite a few buffer lock/unlock cycles and some other executor overhead >> >> as well. >> > >> > Hm. Do we really have to keep the page locked in the page-at-a-time >> > mode? Shouldn't the pin suffice? >> >> I think we need a lock to examine MVCC visibility information. A pin >> is enough to prevent a tuple from being removed, but not from having >> its xmax and cmax overwritten at almost but not quite exactly the same >> time. 
> > We already batch visibility lookups in page-at-a-time > mode. Cf. heapgetpage() / scan->rs_vistuples. So we can evaluate quals > after releasing the lock, but before the pin is released, without that > much effort. IIRC that isn't used for index lookups, but that's > probably a good idea. The trouble with that is that if you fail the qual, you have to relock the page. Which kinda sucks, if the qual is really simple. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] asynchronous and vectorized execution
On 2016-05-11 10:32:20 -0400, Robert Haas wrote:
> On Tue, May 10, 2016 at 8:50 PM, Andres Freund wrote:
> > That seems to suggest that we need to restructure how we get to calling
> > fmgr functions, before worrying about the actual fmgr call.
>
> Any ideas on how to do that? ExecMakeFunctionResultNoSets() isn't
> really doing a heck of a lot. Changing FuncExprState to use an array
> rather than a linked list to store its arguments might help some. We
> could also consider having an optimized path that skips the fn_strict
> stuff if we can somehow deduce that no NULLs can occur in this
> context, but that's a lot of work and new infrastructure. I feel like
> maybe there's something higher-level we could do that would help more,
> but I don't know what it is.

I think it's not just ExecMakeFunctionResultNoSets, it's the whole call-stack which needs to be optimized together. E.g. look at a few performance metrics for a simple seqscan query with a bunch of ORed equality constraints:

SELECT count(*) FROM pgbench_accounts WHERE abalance = -1 OR abalance = -2 OR abalance = -3 OR abalance = -4 OR abalance = -5 OR abalance = -6 OR abalance = -7 OR abalance = -8 OR abalance = -9 OR abalance = -10;

perf record -g -p 27286 -F 5000 -e cycles:ppp,branch-misses,L1-icache-load-misses,iTLB-load-misses,L1-dcache-load-misses,dTLB-load-misses,LLC-load-misses sleep 3

6K cycles:ppp
6K branch-misses
1K L1-icache-load-misses
472 iTLB-load-misses
5K L1-dcache-load-misses
6K dTLB-load-misses
6K LLC-load-misses

You can see that a number of events sample at a high rate, especially when you take the cycle samples into account.

cycles:
+ 32.35% postgres postgres [.] ExecMakeFunctionResultNoSets
+ 14.51% postgres postgres [.] slot_getattr
+  5.50% postgres postgres [.] ExecEvalOr
+  5.22% postgres postgres [.] check_stack_depth

branch-misses:
+ 73.77% postgres postgres [.] ExecQual
+ 17.83% postgres postgres [.] ExecEvalOr
+  1.49% postgres postgres [.] heap_getnext

L1-icache-load-misses:
+  4.71% postgres [kernel.kallsyms] [k] update_curr
+  4.37% postgres postgres [.] hash_search_with_hash_value
+  3.91% postgres postgres [.] heap_getnext
+  3.81% postgres [kernel.kallsyms] [k] task_tick_fair

iTLB-load-misses:
+ 27.57% postgres postgres [.] LWLockAcquire
+ 18.32% postgres postgres [.] hash_search_with_hash_value
+  7.09% postgres postgres [.] ExecMakeFunctionResultNoSets
+  3.06% postgres postgres [.] ExecEvalConst

L1-dcache-load-misses:
+ 20.35% postgres postgres [.] ExecMakeFunctionResultNoSets
+ 12.31% postgres postgres [.] check_stack_depth
+  8.84% postgres postgres [.] heap_getnext
+  8.00% postgres postgres [.] slot_deform_tuple
+  7.15% postgres postgres [.] HeapTupleSatisfiesMVCC

dTLB-load-misses:
+ 50.13% postgres postgres [.] ExecQual
+ 41.36% postgres postgres [.] ExecEvalOr
+  2.96% postgres postgres [.] hash_search_with_hash_value
+  1.30% postgres postgres [.] PinBuffer.isra.3
+  1.19% postgres postgres [.] heap_page_prune_op

LLC-load-misses:
+ 24.25% postgres postgres [.] slot_deform_tuple
+ 17.45% postgres postgres [.] CheckForSerializableConflictOut
+ 10.52% postgres postgres [.] heapgetpage
+  9.55% postgres postgres [.] HeapTupleSatisfiesMVCC
+  7.52% postgres postgres [.] ExecMakeFunctionResultNoSets

For this workload, we expect a lot of LLC-load-misses as the workload is a lot bigger than memory, and it makes sense that they're in slot_deform_tuple(), heapgetpage(), HeapTupleSatisfiesMVCC() (but uh, CheckForSerializableConflictOut?). One avenue to optimize is to make those accesses easier to predict/prefetch, which they're atm likely not. But leaving that aside, we can see that a lot of the cost is distributed over ExecQual, ExecEvalOr, ExecMakeFunctionResultNoSets - all of which judiciously use linked lists. I suspect that by simplifying these functions / datastructures *AND* by calling them over a batch of tuples, instead of one-by-one, we'd limit the time spent in them considerably.
Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE lock downgrades have broken pg_upgrade
Peter Eisentraut wrote: > On 5/3/16 1:25 PM, Alvaro Herrera wrote: > >If we can put together a script that runs test.sh for various versions > >and then verifies the runs, we could use it in both buildfarm and > >coverage. > > Not that that would be useless, but note that the value in this case (and > most others) comes from having a candidate object in the database before > upgrade that exercises the particular problem, mostly independent of what > version you upgrade from and to. So far the way to do that is to leave > "junk" in the regression test database, but that's clearly a bit silly. True. We have quite a few places in the standard regression tests that leave junk behind purposefully for this reason. > I think the way forward is to create a TAP test suite for pg_upgrade that > specifically exercises a lot of scenarios with small purpose-built test > databases. That works for me. > Then, the problem of having to compare dump output across versions also goes > away more easily. Not sure why, but if you think it does, then it sounds good. Andrew's current approach of counting lines in the diff seems brittle and not entirely trustworthy. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] asynchronous and vectorized execution
On 2016-05-11 10:12:26 -0400, Robert Haas wrote: > > I've to admit I'm not that convinced about the speedups in the !fdw > > case. There seems to be a lot easier avenues for performance > > improvements. > > What I'm talking about is a query like this: > > SELECT * FROM inheritance_tree_of_foreign_tables WHERE very_rarely_true; Note that I said "!fdw case". > > FWIW, I've even hacked something up for a bunch of simple queries, and > > the performance improvements were significant. Besides it only being a > > weekend hack project, the big thing I got stuck on was considering how > > to exactly determine when to batch and not to batch. > > Yeah. I think we need a system for signalling nodes as to when they > will be run to completion. But a Boolean is somehow unsatisfying; > LIMIT 100 is more like no LIMIT than it it is like LIMIT 1. I'm > tempted to add a numTuples field to every ExecutorState and give upper > nodes some way to set it, as a hint. I was wondering whether we should hand down TupleVectorStates to lower nodes, and their size determines the max batch size... > >> Some care is required here because any > >> functions we execute as scan keys are run with the buffer locked, so > >> we had better not run anything very complicated. But doing this for > >> simple things like integer equality operators seems like it could save > >> quite a few buffer lock/unlock cycles and some other executor overhead > >> as well. > > > > Hm. Do we really have to keep the page locked in the page-at-a-time > > mode? Shouldn't the pin suffice? > > I think we need a lock to examine MVCC visibility information. A pin > is enough to prevent a tuple from being removed, but not from having > its xmax and cmax overwritten at almost but not quite exactly the same > time. We already batch visibility lookups in page-at-a-time mode. Cf. heapgetpage() / scan->rs_vistuples. So we can evaluate quals after releasing the lock, but before the pin is released, without that much effort. 
IIRC that isn't used for index lookups, but that's probably a good idea. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Academic help for Postgres
2016-05-11 23:20 GMT+09:00 Bruce Momjian : > I am giving a keynote at an IEEE database conference in Helsinki next > week (http://icde2016.fi/). (Yes, I am not attending PGCon Ottawa > because I accepted the Helsinki conference invitation before the PGCon > Ottawa date was changed from June to May). > > As part of the keynote, I would like to mention areas where academia can > help us. The topics I can think of are: > > Query optimization > Optimizer statistics > Indexing structures > Reducing function call overhead > CPU locality > Sorting > Parallelism > Sharding > > Any others? > How about NVRAM utilization? -- KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors)
On May 11, 2016 7:06:05 AM PDT, Teodor Sigaev wrote: >>> Allow Pin/UnpinBuffer to operate in a lockfree manner. >>> I get the errors: >>> >>> ERROR: attempted to delete invisible tuple >>> ERROR: unexpected chunk number 1 (expected 2) for toast value > >Just reminder, you investigate "unexpected chunk number" problem, but, seems, we >have another bug (first ERROR: attempted to delete invisible tuple). >IMHO, it's >a separate bug, not related to oid. Unfortunately, I've never seen such >error on >my notebook. Same issue. If the dead tuple is noticed by heap_delete (when it should have deleted the live version elsewhere) you get the invisible tuple error. The unexpected chunk bit only triggers with hint bits set differently across chunks. Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Academic help for Postgres
On Wed, May 11, 2016 at 9:32 AM, Bruce Momjian wrote: > On Wed, May 11, 2016 at 05:31:10PM +0300, Konstantin Knizhnik wrote: >> Incremental materialized views? > > I don't know. Is that something academics would research? One paper I have found particularly good is this: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.3208 Tantalizingly, there is mention that there is a longer version of the paper, but I have been unable to find it. There is enough in this paper, I think, to fill in the blanks and do a much better job with implementation than any ad hoc approach is likely to manage, but if there is anything that extends this work (or subsequent work which seems to be an improvement) that would be great. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Academic help for Postgres
On 11/05/16 17:32, Bruce Momjian wrote:
> On Wed, May 11, 2016 at 05:31:10PM +0300, Konstantin Knizhnik wrote:
>> On 11.05.2016 17:20, Bruce Momjian wrote:
>>> I am giving a keynote at an IEEE database conference in Helsinki next
>>> week (http://icde2016.fi/). (Yes, I am not attending PGCon Ottawa
>>> because I accepted the Helsinki conference invitation before the PGCon
>>> Ottawa date was changed from June to May).
>>>
>>> As part of the keynote, I would like to mention areas where academia can
>>> help us. The topics I can think of are:
>>>
>>>     Query optimization
>>>     Optimizer statistics
>>>     Indexing structures
>>>     Reducing function call overhead
>>>     CPU locality
>>>     Sorting
>>>     Parallelism
>>>     Sharding
>>>
>>> Any others?
>>
>> Incremental materialized views?
>
> I don't know. Is that something academics would research?

Absolutely! There are plenty of papers on how to keep materialized views up-to-date.

- Heikki

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Academic help for Postgres
On Wed, May 11, 2016 at 05:41:21PM +0300, Heikki Linnakangas wrote: > On 11/05/16 17:32, Bruce Momjian wrote: > >On Wed, May 11, 2016 at 05:31:10PM +0300, Konstantin Knizhnik wrote: > >>On 11.05.2016 17:20, Bruce Momjian wrote: > >>>I am giving a keynote at an IEEE database conference in Helsinki next > >>>week (http://icde2016.fi/). (Yes, I am not attending PGCon Ottawa > >>>because I accepted the Helsinki conference invitation before the PGCon > >>>Ottawa date was changed from June to May). > >>> > >>>As part of the keynote, I would like to mention areas where academia can > >>>help us. The topics I can think of are: > >>> > >>> Query optimization > >>> Optimizer statistics > >>> Indexing structures > >>> Reducing function call overhead > >>> CPU locality > >>> Sorting > >>> Parallelism > >>> Sharding > >>> > >>>Any others? > >>> > >>Incremental materialized views? > > > >I don't know. Is that something academics would research? > > Absolutely! There are plenty of papers on how to keep materialized views > up-to-date. Oh, OK. I will add it. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Academic help for Postgres
On 11.05.2016 17:32, Bruce Momjian wrote:
> On Wed, May 11, 2016 at 05:31:10PM +0300, Konstantin Knizhnik wrote:
>> On 11.05.2016 17:20, Bruce Momjian wrote:
>>> I am giving a keynote at an IEEE database conference in Helsinki next
>>> week (http://icde2016.fi/). (Yes, I am not attending PGCon Ottawa
>>> because I accepted the Helsinki conference invitation before the PGCon
>>> Ottawa date was changed from June to May).
>>>
>>> As part of the keynote, I would like to mention areas where academia can
>>> help us. The topics I can think of are:
>>>
>>>     Query optimization
>>>     Optimizer statistics
>>>     Indexing structures
>>>     Reducing function call overhead
>>>     CPU locality
>>>     Sorting
>>>     Parallelism
>>>     Sharding
>>>
>>> Any others?
>>
>> Incremental materialized views?
>
> I don't know. Is that something academics would research?

I am not sure. There is definitely a question of which views can be incrementally recalculated and which inductive extension has to be constructed to make that possible. If you google for "incremental materialized views phd", you will get a large number of references to articles. But I do not know whether all questions in this area are already closed or not...

-- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Academic help for Postgres
On Wed, May 11, 2016 at 5:20 PM, Bruce Momjian wrote:
> I am giving a keynote at an IEEE database conference in Helsinki next
> week (http://icde2016.fi/). (Yes, I am not attending PGCon Ottawa
> because I accepted the Helsinki conference invitation before the PGCon
> Ottawa date was changed from June to May).
>
> As part of the keynote, I would like to mention areas where academia can
> help us. The topics I can think of are:
>
>     Query optimization
>     Optimizer statistics
>     Indexing structures
>     Reducing function call overhead
>     CPU locality
>     Sorting
>     Parallelism
>     Sharding
>
> Any others?

machine learning for adaptive planning
distributed stuff

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] asynchronous and vectorized execution
On Tue, May 10, 2016 at 8:50 PM, Andres Freund wrote: > That seems to suggest that we need to restructure how we get to calling > fmgr functions, before worrying about the actual fmgr call. Any ideas on how to do that? ExecMakeFunctionResultNoSets() isn't really doing a heck of a lot. Changing FuncExprState to use an array rather than a linked list to store its arguments might help some. We could also consider having an optimized path that skips the fn_strict stuff if we can somehow deduce that no NULLs can occur in this context, but that's a lot of work and new infrastructure. I feel like maybe there's something higher-level we could do that would help more, but I don't know what it is. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Academic help for Postgres
On Wed, May 11, 2016 at 05:31:10PM +0300, Konstantin Knizhnik wrote: > > > On 11.05.2016 17:20, Bruce Momjian wrote: > >I am giving a keynote at an IEEE database conference in Helsinki next > >week (http://icde2016.fi/). (Yes, I am not attending PGCon Ottawa > >because I accepted the Helsinki conference invitation before the PGCon > >Ottawa date was changed from June to May). > > > >As part of the keynote, I would like to mention areas where academia can > >help us. The topics I can think of are: > > > > Query optimization > > Optimizer statistics > > Indexing structures > > Reducing function call overhead > > CPU locality > > Sorting > > Parallelism > > Sharding > > > >Any others? > > > Incremental materialized views? I don't know. Is that something academics would research? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Academic help for Postgres
On Wed, May 11, 2016 at 05:24:44PM +0300, Oleg Bartunov wrote: > On Wed, May 11, 2016 at 5:20 PM, Bruce Momjian wrote: > > As part of the keynote, I would like to mention areas where academia can > > help us. The topics I can think of are: > > > > Query optimization > > Optimizer statistics > > Indexing structures > > Reducing function call overhead > > CPU locality > > Sorting > > Parallelism > > Sharding > > > > Any others? > > machine learning for adaptive planning Do these fall in the "Query optimization" item? Does that need different text? > distributed stuff Oh, yes, distributed transactions. Good. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Academic help for Postgres
On 11.05.2016 17:20, Bruce Momjian wrote: I am giving a keynote at an IEEE database conference in Helsinki next week (http://icde2016.fi/). (Yes, I am not attending PGCon Ottawa because I accepted the Helsinki conference invitation before the PGCon Ottawa date was changed from June to May). As part of the keynote, I would like to mention areas where academia can help us. The topics I can think of are: Query optimization Optimizer statistics Indexing structures Reducing function call overhead CPU locality Sorting Parallelism Sharding Any others? Incremental materialized views? -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ALTER TABLE lock downgrades have broken pg_upgrade
On Wed, May 11, 2016 at 09:40:09AM -0400, Peter Eisentraut wrote: > Not that that would be useless, but note that the value in this case (and > most others) comes from having a candidate object in the database before > upgrade that exercises the particular problem, mostly independent of what > version you upgrade from and to. So far the way to do that is to leave > "junk" in the regression test database, but that's clearly a bit silly. > > I think the way forward is to create a TAP test suite for pg_upgrade that > specifically exercises a lot of scenarios with small purpose-built test > databases. > > Then, the problem of having to compare dump output across versions also goes > away more easily. I do have some small tests like for tablespaces. I am attaching the SQL script, if that is helpful. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + pg_upgrade_test.sql Description: application/sql -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] asynchronous and vectorized execution
On 10.05.2016 20:26, Robert Haas wrote: At this moment (February) they have implemented translation of only a few PostgreSQL operators used by ExecQuals and do not support aggregates. They get about a 2x increase of speed on synthetic queries and a 25% increase on TPC-H Q1 (for Q1 the most critical part is generation of native code for aggregates, because ExecQual itself takes only 6% of the time for this query). Actually these 25% for Q1 were achieved not by using dynamic code generation, but by switching from a PULL to a PUSH model in the executor. It seems to be yet another interesting PostgreSQL executor transformation. As far as I know, they are going to publish the result of their work as open source... Interesting. You may notice that in "asynchronous mode" my prototype works using a push model of sorts. Maybe that should be taken further. Latest information from the ISP RAS guys: they have made good progress since February: they have rewritten most of the methods of Scan, Aggregate and Join to the LLVM API. They have also implemented automatic translation of PostgreSQL backend functions to the LLVM API. As a result, the time of the TPC-H Q1 query is reduced four times. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
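The PULL-to-PUSH switch mentioned above can be illustrated with a toy sketch (Python used as runnable pseudocode; all names are invented for illustration, this is not PostgreSQL code). In the pull model each node asks its child for the next tuple; in the push model the scan drives the loop and pushes tuples into a consuming aggregate, which lets the filter be fused into the scan loop:

```python
# Toy contrast of pull vs. push executor models (invented names,
# not PostgreSQL internals).

rows = list(range(10))

# PULL: each node pulls the next tuple from its child.
def seq_scan():
    for r in rows:
        yield r

def filter_node(child, pred):
    for r in child:
        if pred(r):
            yield r

def agg_sum(child):
    total = 0
    for r in child:
        total += r
    return total

pull_result = agg_sum(filter_node(seq_scan(), lambda r: r % 2 == 0))

# PUSH: the scan drives the loop and pushes tuples into its consumer;
# the aggregate keeps running state and exposes the final value at the end.
class SumConsumer:
    def __init__(self):
        self.total = 0
    def consume(self, r):
        self.total += r

def push_scan(pred, consumer):
    for r in rows:
        if pred(r):            # filter fused into the scan loop
            consumer.consume(r)

push_agg = SumConsumer()
push_scan(lambda r: r % 2 == 0, push_agg)

assert pull_result == push_agg.total
```

Both models compute the same answer; the difference is who owns the loop, which is what makes code generation for the whole pipeline easier in the push case.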
Re: [HACKERS] asynchronous and vectorized execution
On Wed, May 11, 2016 at 10:17 AM, Konstantin Knizhnik wrote: > Yes, I agree with you that a complete rewrite of the optimizer is a huge project > with unpredictable influence on the performance of some queries. > Changing things incrementally is a good approach, but only if we are moving in > the right direction. > I am still not sure that the introduction of async operations is a step in the right > direction. Async ops tend to significantly complicate code (since you > have to maintain state yourself). It will be bad if the implementation of each > node has to deal with async state itself in its own manner. I don't really think so. The design I've proposed makes adding asynchronous capability to a node pretty easy, with only minor changes. > My suggestion is to try to provide some generic mechanism for managing state > transitions and have some scheduler which controls this process. It should > not be the responsibility of the node implementation to organize > asynchronous/parallel execution. Instead, it should just produce a set > of jobs whose execution is controlled by the scheduler. The first > implementation of the scheduler can be quite simple. But later it can become > more clever: try to bind data to processors and do many other optimizations. Whereas this would require a massive rewrite. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] asynchronous and vectorized execution
On Tue, May 10, 2016 at 8:23 PM, Andres Freund wrote: >> c. Modify some nodes (perhaps start with nodeAgg.c) to allow them to >> process a batch TupleTableSlot. This will require some tight loop to >> aggregate the entire TupleTableSlot at once before returning. >> d. Add function in execAmi.c which returns true or false depending on >> if the node supports batch TupleTableSlots or not. >> e. At executor startup determine if the entire plan tree supports >> batch TupleTableSlots, if so enable batch scan mode. > > It doesn't really need to be the entire tree. Even if you have a subtree > (say a parametrized index nested loop join) which doesn't support batch > mode, you'll likely still see performance benefits by building a batch > one layer above the non-batch-supporting node. +1. I've also wondered about building a new executor node that is sort of a combination of Nested Loop and Hash Join, but capable of performing multiple joins in a single operation. (Merge Join is different, because it's actually matching up the two sides, not just doing probing once per outer tuple.) So the plan tree would look something like this:

Multiway Join
-> Seq Scan on driving_table
-> Index Scan on something
-> Index Scan on something_else
-> Hash
   -> Seq Scan on other_thing
-> Hash
   -> Seq Scan on other_thing_2
-> Index Scan on another_one

With the current structure, every level of the plan tree has its own TupleTableSlot and we have to project into each new slot. Every level has to go through ExecProcNode. So it seems to me that this sort of structure might save quite a few cycles on deep join nests. I haven't tried it, though. With batching, things get even better for this sort of thing.
Assuming the joins are all basically semi-joins, either because they were written that way or because they are probing unique indexes or whatever, you can fetch a batch of tuples from the driving table, do the first join for each tuple to create a matching batch of tuples, and repeat for each join step. Then at the end you project. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
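The batched multiway-join idea above can be sketched in miniature (Python as runnable pseudocode; all table and column names are invented): fetch a batch from the driving table, run each join step over the whole batch before moving to the next step, and project once at the end instead of at every level.

```python
# Toy sketch of a batched multiway semi-join pipeline. Each dict stands in
# for a hash table built from one inner side; names are hypothetical.
driving_table = [{"id": 1, "x": 10}, {"id": 2, "x": 20}, {"id": 3, "x": 30}]
other_thing   = {1: "a", 3: "c"}          # Hash over first inner side
other_thing_2 = {1: 100, 2: 200, 3: 300}  # Hash over second inner side

def multiway_join(batch, hash_sides):
    """Apply each join step to the entire batch before the next step."""
    for key_col, attach_as, hashtable in hash_sides:
        next_batch = []
        for row in batch:
            match = hashtable.get(row[key_col])
            if match is not None:          # semi-join style: keep on match
                next_batch.append(dict(row, **{attach_as: match}))
        batch = next_batch
    # Project once, at the very end of the pipeline.
    return [(r["id"], r["letter"], r["num"]) for r in batch]

result = multiway_join(driving_table,
                       [("id", "letter", other_thing),
                        ("id", "num", other_thing_2)])
assert result == [(1, "a", 100), (3, "c", 300)]
```

In the real executor the per-level costs being saved are slot projection and ExecProcNode dispatch; here the analogue is that only one projection happens, after all probes.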
[HACKERS] Academic help for Postgres
I am giving a keynote at an IEEE database conference in Helsinki next week (http://icde2016.fi/). (Yes, I am not attending PGCon Ottawa because I accepted the Helsinki conference invitation before the PGCon Ottawa date was changed from June to May). As part of the keynote, I would like to mention areas where academia can help us. The topics I can think of are:

	Query optimization
	Optimizer statistics
	Indexing structures
	Reducing function call overhead
	CPU locality
	Sorting
	Parallelism
	Sharding

Any others? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: [HACKERS] asynchronous and vectorized execution
On 11.05.2016 17:00, Robert Haas wrote: On Tue, May 10, 2016 at 3:42 PM, Konstantin Knizhnik wrote: Doesn't this actually mean that we need a normal job scheduler which, given a queue of jobs and a pool of threads, will be able to organize efficient execution of queries? The optimizer can build a pipeline (graph) of tasks which corresponds to execution plan nodes, i.e. SeqScan, Sort, ... Each task is split into several jobs which can be concurrently scheduled by a task dispatcher. So you will not have a blocked worker waiting for something, and all system resources will be utilized. Such an approach with a dispatcher makes it possible to implement quotas, priorities, ... Also the dispatcher can take care of NUMA and cache optimizations, which is especially critical on modern architectures. One more reference: http://db.in.tum.de/~leis/papers/morsels.pdf I read this as a proposal to redesign the entire optimizer and executor to use some new kind of plan. That's not a project I'm willing to entertain; it is hard to imagine we could do it in a reasonable period of time without introducing bugs and performance regressions. I think there is a great deal of performance benefit that we can get by changing things incrementally. Yes, I agree with you that a complete rewrite of the optimizer is a huge project with unpredictable influence on the performance of some queries. Changing things incrementally is a good approach, but only if we are moving in the right direction. I am still not sure that the introduction of async operations is a step in the right direction. Async ops tend to significantly complicate code (since you have to maintain state yourself). It will be bad if the implementation of each node has to deal with async state itself in its own manner. My suggestion is to try to provide some generic mechanism for managing state transitions and have some scheduler which controls this process. It should not be the responsibility of the node implementation to organize asynchronous/parallel execution. Instead, it should just produce a set of jobs whose execution is controlled by the scheduler. The first implementation of the scheduler can be quite simple. But later it can become more clever: try to bind data to processors and do many other optimizations. -- Konstantin Knizhnik Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
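The "nodes produce jobs, a scheduler runs them" proposal can be sketched with a tiny toy scheduler (Python as runnable pseudocode; this is an invented structure, not a proposed PostgreSQL design): a scan node is split into per-chunk jobs, and a simple fixed worker pool executes them.

```python
# Minimal sketch of a job-producing node plus a simple scheduler.
# All names are hypothetical; a real dispatcher could later add
# priorities, quotas, NUMA binding, etc.
from concurrent.futures import ThreadPoolExecutor

def seq_scan_job(chunk):
    # One job: process one chunk of the "table" (here: double each value).
    return [r * 2 for r in chunk]

def make_jobs(data, chunk_size):
    # The node does not run anything itself; it only produces jobs.
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def schedule(jobs, workers=4):
    # First implementation of the scheduler: just a fixed thread pool.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(seq_scan_job, jobs)
    return [row for chunk in results for row in chunk]

data = list(range(10))
out = schedule(make_jobs(data, chunk_size=3))
assert sorted(out) == [r * 2 for r in data]
```

The point of the sketch is the separation of concerns: the node emits a list of jobs, and all policy about concurrency lives in `schedule()`, which can grow smarter without touching node code.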
Re: [HACKERS] HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors)
Allow Pin/UnpinBuffer to operate in a lockfree manner. I get the errors: ERROR: attempted to delete invisible tuple ERROR: unexpected chunk number 1 (expected 2) for toast value Just a reminder: you are investigating the "unexpected chunk number" problem, but it seems we have another bug (the first one, "attempted to delete invisible tuple"). IMHO, it's a separate bug, not related to the OID. Unfortunately, I've never seen such an error on my notebook. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/
Re: [HACKERS] asynchronous and vectorized execution
On Tue, May 10, 2016 at 7:57 PM, Andres Freund wrote: >> 1. asynchronous execution, by which I mean the ability of a node to >> somehow say that it will generate a tuple eventually, but is not yet >> ready, so that the executor can go run some other part of the plan >> tree while it waits. [...]. It is also a problem >> for parallel query: in a parallel sequential scan, the next worker can >> begin reading the next block even if the current block hasn't yet been >> received from the OS. Whether or not this will be efficient is a >> research question, but it can be done. However, imagine a parallel >> scan of a btree index: we don't know what page to scan next until we >> read the previous page and examine the next-pointer. In the meantime, >> any worker that arrives at that scan node has no choice but to block. >> It would be better if the scan node could instead say "hey, thanks for >> coming but I'm really not ready to be on-CPU just at the moment" and >> potentially allow the worker to go work in some other part of the >> query tree. For that worker to actually find useful work to do >> elsewhere, we'll probably need it to be the case either that the table >> is partitioned or the original query will need to involve UNION ALL, >> but those are not silly cases to worry about, particularly if we get >> native partitioning in 9.7. > > I've to admit I'm not that convinced about the speedups in the !fdw > case. There seems to be a lot easier avenues for performance > improvements. What I'm talking about is a query like this: SELECT * FROM inheritance_tree_of_foreign_tables WHERE very_rarely_true; What we do today is run the remote query on the first child table to completion, then start it on the second child table, and so on. Sending all the queries at once can bring a speed-up of a factor of N to a query with N children, and it's completely independent of every other speed-up that we might attempt. 
This has been under discussion for years on FDW-related threads as a huge problem that we need to fix someday, and I really don't see how it's sane not to try. The shape of what that looks like is of course arguable, but saying the optimization isn't valuable blows my mind. Whether you care about this case or not, this is also important for parallel query. > FWIW, I've even hacked something up for a bunch of simple queries, and > the performance improvements were significant. Besides it only being a > weekend hack project, the big thing I got stuck on was considering how > to exactly determine when to batch and not to batch. Yeah. I think we need a system for signalling nodes as to when they will be run to completion. But a Boolean is somehow unsatisfying; LIMIT 100 is more like no LIMIT than it is like LIMIT 1. I'm tempted to add a numTuples field to every ExecutorState and give upper nodes some way to set it, as a hint. >> For asynchronous execution, I have gone so far as to mock up a bit of >> what this might look like. This shouldn't be taken very seriously at >> this point, but I'm attaching a few very-much-WIP patches to show the >> direction of my line of thinking. Basically, I propose to have >> ExecBlah (that is, ExecBitmapHeapScan, ExecAppend, etc.) return tuples >> by putting them into a new PlanState member called "result", which is >> just a Node * so that we can support multiple types of results, >> instead of returning them. > > What different types of results are you envisioning? TupleTableSlots and TupleTableVectors, mostly. I think the stuff that is currently going through MultiExecProcNode() could probably be folded in as just another type of result. >> Some care is required here because any >> functions we execute as scan keys are run with the buffer locked, so >> we had better not run anything very complicated. 
But doing this for >> simple things like integer equality operators seems like it could save >> quite a few buffer lock/unlock cycles and some other executor overhead >> as well. > > Hm. Do we really have to keep the page locked in the page-at-a-time > mode? Shouldn't the pin suffice? I think we need a lock to examine MVCC visibility information. A pin is enough to prevent a tuple from being removed, but not from having its xmax and cmax overwritten at almost but not quite exactly the same time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
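The "hey, thanks for coming but I'm really not ready to be on-CPU" protocol described earlier in the thread can be modeled with a toy asynchronous Append (Python as runnable pseudocode; the None-means-not-ready convention is invented purely for illustration): children that would block report "not ready", and the parent moves on to another child instead of waiting.

```python
# Toy model of asynchronous Append: a child may answer "not ready yet"
# (None here) and the parent visits another child instead of blocking.
def child(rows, delays):
    """Yield None while 'waiting', then a row; exhausts when rows run out."""
    for row, delay in zip(rows, delays):
        for _ in range(delay):
            yield None       # "not ready to be on-CPU just at the moment"
        yield row

def async_append(children):
    pending = list(children)
    out = []
    while pending:
        still_running = []
        for c in pending:
            try:
                item = next(c)
            except StopIteration:
                continue                 # child finished; drop it
            if item is not None:
                out.append(item)         # child produced a tuple
            still_running.append(c)
        pending = still_running
    return out

a = child(["a1", "a2"], delays=[2, 0])   # slow remote child
b = child(["b1"], delays=[0])            # fast child
result = async_append([a, b])
# b1 is consumed while a is still "waiting"; nothing blocked on a.
assert result == ["b1", "a1", "a2"]
```

This is the shape of the win for an inheritance tree of foreign tables: all child queries can be in flight at once, and results are consumed in whatever order they become ready.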
Re: [HACKERS] asynchronous and vectorized execution
On Tue, May 10, 2016 at 3:42 PM, Konstantin Knizhnik wrote: > Doesn't this actually mean that we need a normal job scheduler which, > given a queue of jobs and a pool of threads, will be able to > organize efficient execution of queries? The optimizer can build a pipeline > (graph) of tasks which corresponds to execution plan nodes, i.e. SeqScan, > Sort, ... Each task is split into several jobs which can be concurrently > scheduled by a task dispatcher. So you will not have a blocked worker waiting > for something, and all system resources will be utilized. Such an approach with > a dispatcher makes it possible to implement quotas, priorities, ... Also the dispatcher can > take care of NUMA and cache optimizations, which is especially critical on > modern architectures. One more reference: > http://db.in.tum.de/~leis/papers/morsels.pdf I read this as a proposal to redesign the entire optimizer and executor to use some new kind of plan. That's not a project I'm willing to entertain; it is hard to imagine we could do it in a reasonable period of time without introducing bugs and performance regressions. I think there is a great deal of performance benefit that we can get by changing things incrementally. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] asynchronous and vectorized execution
On Tue, May 10, 2016 at 4:57 PM, Jim Nasby wrote: > Even so, I would think that the simplification in the executor would be > worth it. If you need to add a new node there are dozens of places where you > might have to mess with these giant case statements. Dozens? I think the number is in the single digits. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Add jsonb_compact(...) for whitespace-free jsonb to text
On 4/29/16 8:56 AM, Shulgin, Oleksandr wrote: It would probably make sense to model this function after Python's "dump-to-JSON-string" function: https://docs.python.org/2/library/json.html#json.dumps With optional parameters for sorting the keys, indentation size and punctuation. This way all the prettiness enhancements could be contained in a single function w/o the need for a generalized interface used in many places. +1. I've found the output functions of json.dumps to be very handy. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461
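For reference, the `json.dumps` behavior being pointed at looks like this (standard-library Python; the comparison to a hypothetical `jsonb_compact()` is only an analogy):

```python
import json

doc = {"b": 1, "a": [1, 2]}

# Compact, whitespace-free output -- roughly what a jsonb_compact()
# would aim for -- via the separators and sort_keys parameters.
compact = json.dumps(doc, separators=(",", ":"), sort_keys=True)
assert compact == '{"a":[1,2],"b":1}'

# Pretty-printed variant from the same function's optional parameters.
pretty = json.dumps(doc, indent=2, sort_keys=True)
assert pretty.startswith('{\n  "a": [')
```

One function with optional knobs covers both the compact and the pretty case, which is the design point being made upthread.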
Re: [HACKERS] ALTER TABLE lock downgrades have broken pg_upgrade
On 5/3/16 1:25 PM, Alvaro Herrera wrote: If we can put together a script that runs test.sh for various versions and then verifies the runs, we could use it in both buildfarm and coverage. Not that that would be useless, but note that the value in this case (and most others) comes from having a candidate object in the database before upgrade that exercises the particular problem, mostly independent of what version you upgrade from and to. So far the way to do that is to leave "junk" in the regression test database, but that's clearly a bit silly. I think the way forward is to create a TAP test suite for pg_upgrade that specifically exercises a lot of scenarios with small purpose-built test databases. Then, the problem of having to compare dump output across versions also goes away more easily. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Minor documentation patch
Hi, Yesterday I was going over some consultancy and went to check some syntax for CREATE FUNCTION, particularly related to the SECURITY DEFINER part. Reading there I saw a paragraph which had a sentence that wasn't very clear at first. The patch's description gives a better idea of the change, and how I got there, and I believe it gives better meaning to the sentence in question. I applied the same change to another part which had the same phrase. Cheers, -- Martín Marqués http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services >From fbf6b9f6df20d38b5f16c6af94424042b41d7fad Mon Sep 17 00:00:00 2001 From: Martin Date: Tue, 10 May 2016 21:31:24 -0300 Subject: [PATCH] While reading the CREATE FUNCTION reference docs for some reference on SECURITY DEFINER usage I ran into this phrase: Particularly important in this regard is the temporary-table schema, which is searched first by default, and is normally writeable by anyone. A secure arrangement can be had by forcing the temporary schema to be searched last. The last sentence there was not clear at first. I feel that the word *obtained* instead of *had* gives a clearer understanding. I found a similar phrase in the PL/PgSQL documentation as well, and so applied the same fix. --- doc/src/sgml/plpgsql.sgml | 2 +- doc/src/sgml/ref/create_function.sgml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml index a27bbc5..4ecd9e3 100644 --- a/doc/src/sgml/plpgsql.sgml +++ b/doc/src/sgml/plpgsql.sgml @@ -528,7 +528,7 @@ $$ LANGUAGE plpgsql; - The same effect can be had by declaring one or more output parameters as + The same effect can be obtained by declaring one or more output parameters as polymorphic types. In this case the special $0 parameter is not used; the output parameters themselves serve the same purpose. 
For example: diff --git a/doc/src/sgml/ref/create_function.sgml b/doc/src/sgml/ref/create_function.sgml index bd11d2b..583cdf5 100644 --- a/doc/src/sgml/ref/create_function.sgml +++ b/doc/src/sgml/ref/create_function.sgml @@ -715,7 +715,7 @@ SELECT * FROM dup(42); malicious users from creating objects that mask objects used by the function. Particularly important in this regard is the temporary-table schema, which is searched first by default, and -is normally writable by anyone. A secure arrangement can be had +is normally writable by anyone. A secure arrangement can be obtained by forcing the temporary schema to be searched last. To do this, write pg_temp as the last entry in search_path. This function illustrates safe usage: -- 2.5.5
Re: [HACKERS] Use %u to print user mapping's umid and userid
On Wed, May 11, 2016 at 1:34 PM, Etsuro Fujita wrote: > On 2016/05/11 16:49, Ashutosh Bapat wrote: > >> The patch is calculating the user mapping when it's readily available >> through RelOptInfo::fdw_private. That incurs a catalog lookup >> unnecessarily. Instead, can we add new functions makeOid and oidVal on the >> lines of makeInteger and intVal to store and retrieve an OID resp. and >> also a corresponding print function? It might be helpful in future. >> > > That might be an idea, but is the overhead in that re-calculation so large? > > A call to GetForeignTable incurs a catalog lookup, which means a catalog table/index scan if the corresponding entry is not in the cache. This is followed by GetUserMapping(), which is another catalog access. That's bound to be more expensive than a makeOid()/oidVal() call. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] Use %u to print user mapping's umid and userid
On 2016/05/11 16:49, Ashutosh Bapat wrote: The patch is calculating user mapping when it's readily available through RelOptInfo::fdw_private. That incurs a catalog lookup unnecessarily. Instead, can we add new function makeOid, oidVal on the lines of makeInteger and intVal to store and retrieve an OID resp. and also corresponding print function? It might be helpful in future. That might be an idea, but is the overhead in that re-calculation so large? Best regards, Etsuro Fujita
Re: [HACKERS] Use %u to print user mapping's umid and userid
On Wed, May 11, 2016 at 1:10 PM, Etsuro Fujita wrote: > On 2016/05/10 16:56, Etsuro Fujita wrote: > >> Here is a patch to fix this. >> > > I found that the previous patch handles the ForeignScan's fs_relids > Bitmapset destructively. Also, I noticed that I removed some existing > comments inadvertently. So, I'm attaching the updated patch to fix those > things. I'll add this to the next CF. I think this should be addressed in > advance of the release of 9.6, though. > > The patch is calculating user mapping when it's readily available through RelOptInfo::fdw_private. That incurs a catalog lookup unnecessarily. Instead, can we add new function makeOid, oidVal on the lines of makeInteger and intVal to store and retrieve an OID resp. and also corresponding print function? It might be helpful in future. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] Odd oid-system-column handling in postgres_fdw
On 2016/04/05 17:15, Etsuro Fujita wrote: On 2016/03/16 16:25, Etsuro Fujita wrote: PG9.5 allows us to add an oid system column to foreign tables, using ALTER FOREIGN TABLE SET WITH OIDS, but currently, that column reads as zeroes in postgres_fdw. That seems to me like a bug. So, I'd like to propose to fix that, by retrieving that column from the remote server when requested. I'm attaching a proposed patch for that. I rebased the patch against HEAD. Updated patch attached. I'll add this to the next CF. Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Use %u to print user mapping's umid and userid
On 2016/05/10 16:56, Etsuro Fujita wrote: Here is a patch to fix this. I found that the previous patch handles the ForeignScan's fs_relids Bitmapset destructively. Also, I noticed that I removed some existing comments inadvertently. So, I'm attaching the updated patch to fix those things. I'll add this to the next CF. I think this should be addressed in advance of the release of 9.6, though. Best regards, Etsuro Fujita *** a/contrib/postgres_fdw/postgres_fdw.c --- b/contrib/postgres_fdw/postgres_fdw.c *** *** 67,74 enum FdwScanPrivateIndex FdwScanPrivateRetrievedAttrs, /* Integer representing the desired fetch_size */ FdwScanPrivateFetchSize, - /* Oid of user mapping to be used while connecting to the foreign server */ - FdwScanPrivateUserMappingOid, /* * String describing join i.e. names of relations being joined and types --- 67,72 *** *** 1198,1208 postgresGetForeignPlan(PlannerInfo *root, * Build the fdw_private list that will be available to the executor. * Items in the list must match order in enum FdwScanPrivateIndex. */ ! fdw_private = list_make5(makeString(sql.data), remote_conds, retrieved_attrs, ! makeInteger(fpinfo->fetch_size), ! makeInteger(foreignrel->umid)); if (foreignrel->reloptkind == RELOPT_JOINREL) fdw_private = lappend(fdw_private, makeString(fpinfo->relation_name->data)); --- 1196,1205 * Build the fdw_private list that will be available to the executor. * Items in the list must match order in enum FdwScanPrivateIndex. */ ! fdw_private = list_make4(makeString(sql.data), remote_conds, retrieved_attrs, ! 
makeInteger(fpinfo->fetch_size)); if (foreignrel->reloptkind == RELOPT_JOINREL) fdw_private = lappend(fdw_private, makeString(fpinfo->relation_name->data)); *** *** 1234,1240 postgresBeginForeignScan(ForeignScanState *node, int eflags) --- 1231,1241 ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan; EState *estate = node->ss.ps.state; PgFdwScanState *fsstate; + RangeTblEntry *rte; + Oid userid; + ForeignTable *table; UserMapping *user; + int rtindex; int numParams; /* *** *** 1256,1285 postgresBeginForeignScan(ForeignScanState *node, int eflags) * planning to ensure that the join is safe to pushdown. In case the * information goes stale between planning and execution, plan will be * invalidated and replanned. */ if (fsplan->scan.scanrelid > 0) { - ForeignTable *table; - /* ! * Identify which user to do the remote access as. This should match ! * what ExecCheckRTEPerms() does. */ ! RangeTblEntry *rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table); ! Oid userid = rte->checkAsUser ? rte->checkAsUser : GetUserId(); ! ! fsstate->rel = node->ss.ss_currentRelation; ! table = GetForeignTable(RelationGetRelid(fsstate->rel)); ! ! user = GetUserMapping(userid, table->serverid); } ! else ! { ! Oid umid = intVal(list_nth(fsplan->fdw_private, FdwScanPrivateUserMappingOid)); ! user = GetUserMappingById(umid); ! Assert(fsplan->fs_server == user->serverid); ! } /* * Get connection to the foreign server. Connection manager will --- 1257,1283 * planning to ensure that the join is safe to pushdown. In case the * information goes stale between planning and execution, plan will be * invalidated and replanned. + * + * This should match what ExecCheckRTEPerms() does. */ if (fsplan->scan.scanrelid > 0) + rtindex = fsplan->scan.scanrelid; + else { /* ! * It is ensured that foreign tables appearing in a foreign join ! * belong to the same server and use the same user mapping, so pick ! * the lowest-numbered one as a representative. */ ! rtindex = -1; ! 
rtindex = bms_next_member(fsplan->fs_relids, rtindex); ! Assert(rtindex > 0); } ! rte = rt_fetch(rtindex, estate->es_range_table); ! userid = rte->checkAsUser ? rte->checkAsUser : GetUserId(); ! table = GetForeignTable(rte->relid); ! user = GetUserMapping(userid, table->serverid); /* * Get connection to the foreign server. Connection manager will *** *** 1316,1324 postgresBeginForeignScan(ForeignScanState *node, int eflags) --- 1314,1328 * into local representation and error reporting during that process. */ if (fsplan->scan.scanrelid > 0) + { + fsstate->rel = node->ss.ss_currentRelation; fsstate->tupdesc = RelationGetDescr(fsstate->rel); + } else + { + fsstate->rel = NULL; fsstate->tupdesc = node->ss.ss_ScanTupleSlot->tts_tupleDescriptor; + } fsstate->attinmeta = TupleDescGetAttInMetadata(fsstate->tupdesc); *** a/src/include/nodes/pg_list.h --- b/src/include/nodes/pg_list.h *** *** 134,152 list_length(const List *l) #define list_make2(x1
Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW
On 2016/04/28 13:45, Michael Paquier wrote: On Wed, Apr 27, 2016 at 12:16 PM, Etsuro Fujita wrote: On 2016/04/26 21:45, Etsuro Fujita wrote: While re-reviewing the fix, I noticed that since PQcancel we added to pgfdw_xact_callback to cancel a DML pushdown query isn't followed by a ROLLBACK, the connection to the remote server will be discarded at the end of the while loop in that function, which will cause a FATAL error of "connection to client lost". Probably, that was proposed by me in the first version of the patch, but I don't think that's a good idea. Shouldn't we execute ROLLBACK after that PQcancel? Another thing I noticed is, ISTM that we miss the case where DML pushdown queries are performed in subtransactions. I think cancellation logic would also need to be added to pgfdw_subxact_callback. Attached is a patch for that. I have spent some time looking at that... And yeah, losing the connection because of that is a little bit annoying if there are ways to make things clean, and as a START TRANSACTION is always sent for such queries it seems really better to issue a ROLLBACK in any case. Actually, by using PQcancel there is no way to be sure if the cancel will be effective or not. So it could be possible that the command is still able to complete correctly, or it could be able to cancel correctly and it would return an ERROR earlier. In any case, doing the ROLLBACK unconditionally seems adapted to me because we had better clean up the remote state in both cases. Thanks for the review! I'll add this to the next CF. I think this should be addressed in advance of the release of 9.6, though. Best regards, Etsuro Fujita -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] asynchronous and vectorized execution
On Wed, May 11, 2016 at 3:52 AM, Andres Freund wrote: > On 2016-05-11 03:20:12 +0300, Ants Aasma wrote: >> On Tue, May 10, 2016 at 7:56 PM, Robert Haas wrote: >> > On Mon, May 9, 2016 at 8:34 PM, David Rowley >> > wrote: >> > I don't have any at the moment, but I'm not keen on hundreds of new >> > vector functions that can all have bugs or behavior differences versus >> > the unvectorized versions of the same code. That's a substantial tax >> > on future development. I think it's important to understand what >> > sorts of queries we are targeting here. KaiGai's GPU-acceleration >> > stuff does great on queries with complex WHERE clauses, but most >> > people don't care not only because it's out-of-core but because who >> > actually looks for the records where (a + b) % c > (d + e) * f / g? >> > This seems like it has the same issue. If we can speed up common >> > queries people are actually likely to run, OK, that's interesting. >> >> I have seen pretty complex expressions in the projection and >> aggregation. A couple dozen SUM(CASE WHEN a THEN b*c ELSE MIN(d,e)*f >> END) type of expressions. In critical places I had to replace them with >> a C coded function that processed a row at a time to avoid the >> executor dispatch overhead. > > I've seen that as well, but was it the actual fmgr indirection causing > the overhead, or was it ExecQual/ExecMakeFunctionResultNoSets et al? I don't remember what the exact profile looked like, but IIRC it was mostly Exec* stuff with advance_aggregates also up there. Regards, Ants Aasma
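The workaround Ants describes, replacing a per-row expression-tree walk with one flat function, can be mimicked in a toy form (Python as runnable pseudocode; the expression tree encoding and all names are invented, and this only illustrates where the dispatch overhead lives, not PostgreSQL's actual ExecQual machinery):

```python
# Toy contrast: interpreting an expression tree node-by-node for every row
# (the analogue of per-node executor dispatch) vs. one flat per-row function
# (the analogue of a hand-coded C function or generated code).
# Expression: CASE WHEN a THEN b*c ELSE MIN(d, e)*f END

rows = [dict(a=(i % 2 == 0), b=i, c=2, d=i, e=5, f=3) for i in range(6)]

expr = ("case", ("var", "a"),
        ("mul", ("var", "b"), ("var", "c")),
        ("mul", ("min", ("var", "d"), ("var", "e")), ("var", "f")))

def eval_expr(node, row):
    # One dynamic dispatch per tree node, per row.
    op = node[0]
    if op == "var":
        return row[node[1]]
    if op == "mul":
        return eval_expr(node[1], row) * eval_expr(node[2], row)
    if op == "min":
        return min(eval_expr(node[1], row), eval_expr(node[2], row))
    if op == "case":
        return (eval_expr(node[2], row) if eval_expr(node[1], row)
                else eval_expr(node[3], row))

interpreted = sum(eval_expr(expr, r) for r in rows)

def compiled(row):
    # One flat function: no per-node dispatch inside the per-row loop.
    return row["b"] * row["c"] if row["a"] else min(row["d"], row["e"]) * row["f"]

assert interpreted == sum(compiled(r) for r in rows)
```

Both paths compute the same aggregate; the profiling question in the thread is exactly which layer of the interpreted path (fmgr indirection vs. the ExecQual walk itself) dominates.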