[HACKERS] Incremental refresh of materialized view - Patch

2016-05-11 Thread hari.prasath
Hi all

  I am building a patch to refresh a materialized view incrementally from the
change set decoded from WAL using logical decoding.

  As of now I am able to generate the changes that have to be applied to the
materialized view, but the problem is that it is not possible to do any DML
operations on matviews.

  DML operations are only allowed during a concurrent refresh of a matview.

  However, if the matview.c functions OpenMatViewIncrementalMaintenance and
CloseMatViewIncrementalMaintenance are made extern, it becomes possible to do
DML from a patch like the one I am building.

  Is there any other way of doing DML operations on materialized views from a
patch?

cheers

- Harry







Re: [HACKERS] silent data loss with ext4 / all current versions

2016-05-11 Thread Michael Paquier
On Mon, Mar 28, 2016 at 8:25 AM, Andres Freund  wrote:
> I've also noticed that

Coming back to this issue because...

> a) pg_basebackup doesn't do anything about durability (it probably needs
>    a very similar patch to the one pg_rewind just received).

I think that one of the QE tests running here just got bitten by that.
A base backup was taken with pg_basebackup, and more or less right after
that the VM was pulled from power. The trick is that for pg_basebackup we
cannot rely on initdb: pg_basebackup is a client-side utility. In most of
the PG packages (Fedora, RHEL), it is shipped in the client-side package,
where initdb is not. So it seems to me that the correct fix is not to use
initdb -S, but to have copies of fsync_parent_path, durable_rename and
fsync_fname_ext in streamutil.c, and then reuse them for both
pg_receivexlog and pg_basebackup. At least it's less risky for the
back-branches this way.

> b) nor does pg_dump[all]

I have not hacked that up yet, but I would think that we would need
one extra copy of some of those fsync_* routines in src/bin/pg_dump/.
There is another thread for that already... On master I guess we'd end
up with something centralized in src/common/, but let's close the
existing holes first.

> So we're going to have another round of fsync stuff in the next set of
> releases anyway...

The sooner the better, I think. Anyone caring about this problem is for
now limited to running initdb -S after calling pg_basebackup or pg_dump.
That's a workaround, though the flushes should really happen inside each
utility.
-- 
Michael


0001-Issue-fsync-more-carefully-in-pg_receivexlog-and-pg_.patch
Description: application/download



Re: [HACKERS] Academic help for Postgres

2016-05-11 Thread Craig Ringer
On 12 May 2016 at 11:16, Rajeev rastogi  wrote:


> >Any others?
>

GPU offload.

Some work on that already got done as part of the AXLE project, but there's
still a lot more to do to get anything that can be usefully integrated into
Pg.

This likely ties in with batching work, since without batching it's
unlikely you can get much benefit from GPU offload.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Use %u to print user mapping's umid and userid

2016-05-11 Thread Etsuro Fujita

On 2016/05/12 13:02, Tom Lane wrote:
> Etsuro Fujita  writes:
>> On 2016/05/11 18:03, Ashutosh Bapat wrote:
>>> A call to GetForeignTable would incur a catalog lookup, which means a
>>> catalog table/index scan if the corresponding entry is not in the cache.
>>> This is followed by GetUserMapping(), which is another catalog access.
>>> That's bound to be more expensive than a makeOid(), oidVal() call.
>
>> Right, but such lookups have already been incurred at planning time (ie,
>> build_simple_rel), and the corresponding entries would be in the cache.  So,
>> the overhead of that recalculation at execution time would not be that
>> large in practice.  No?
>
> It's a mistake to assume that execution immediately follows planning.

Yeah, that would not be the case in PREPARE/EXECUTE, right?

> Having said that, I wonder whether you should be thinking less about
> performance and more about correctness.  Is a user mapping lookup done
> at plan time still valid at execution, and if so what ensures that?

I think when scanning a foreign join, the user mapping is still valid at
execution, and that is ensured by RevalidateCachedQuery, IIUC.
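
To make the gap concrete ("ft" is a hypothetical foreign table here):

PREPARE q AS SELECT * FROM ft WHERE id = $1;
-- ... arbitrary time passes; the user mapping for ft's server could be
-- altered or dropped in the meantime ...
EXECUTE q(42);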


Best regards,
Etsuro Fujita






Re: [HACKERS] Use %u to print user mapping's umid and userid

2016-05-11 Thread Tom Lane
Etsuro Fujita  writes:
> On 2016/05/11 18:03, Ashutosh Bapat wrote:
>> A call to GetForeignTable would incur a catalog lookup, which means a
>> catalog table/index scan if the corresponding entry is not in the cache.
>> This is followed by GetUserMapping(), which is another catalog access.
>> That's bound to be more expensive than a makeOid(), oidVal() call.

> Right, but such lookups have already been incurred at planning time (ie,
> build_simple_rel), and the corresponding entries would be in the cache.  So,
> the overhead of that recalculation at execution time would not be that
> large in practice.  No?

It's a mistake to assume that execution immediately follows planning.

Having said that, I wonder whether you should be thinking less about
performance and more about correctness.  Is a user mapping lookup done
at plan time still valid at execution, and if so what ensures that?

regards, tom lane




Re: [HACKERS] Perf Benchmarking and regression.

2016-05-11 Thread Robert Haas
On Wed, May 11, 2016 at 12:51 AM, Ashutosh Sharma  wrote:
> I am extremely sorry for the delayed response.  As suggested by you, I have
> taken the performance readings at 128 client counts after making the
> following two changes:
>
> 1). Removed AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, -1, NULL,
> NULL); from pq_init(). Below is the git diff for the same.
>
> diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
> index 8d6eb0b..399d54b 100644
> --- a/src/backend/libpq/pqcomm.c
> +++ b/src/backend/libpq/pqcomm.c
> @@ -206,7 +206,9 @@ pq_init(void)
> AddWaitEventToSet(FeBeWaitSet, WL_SOCKET_WRITEABLE,
> MyProcPort->sock,
>   NULL, NULL);
> AddWaitEventToSet(FeBeWaitSet, WL_LATCH_SET, -1, MyLatch, NULL);
> +#if 0
> AddWaitEventToSet(FeBeWaitSet, WL_POSTMASTER_DEATH, -1, NULL, NULL);
> +#endif
>
> 2). Disabled the guc vars "bgwriter_flush_after", "checkpointer_flush_after"
> and "backend_flush_after" by setting them to zero.
>
> After doing the above two changes, below are the readings I got for 128
> client counts:
>
> CASE : Read-Write Tests when data exceeds shared buffers.
>
> Non Default settings and test
> ./postgres -c shared_buffers=8GB -N 200 -c min_wal_size=15GB -c
> max_wal_size=20GB -c checkpoint_timeout=900 -c maintenance_work_mem=1GB -c
> checkpoint_completion_target=0.9 &
>
> ./pgbench -i -s 1000 postgres
>
> ./pgbench -c 128 -j 128 -T 1800 -M prepared postgres
>
> Run1 : tps = 9690.678225
> Run2 : tps = 9904.320645
> Run3 : tps = 9943.547176
>
> Please let me know if I need to take readings with other client counts as
> well.

Can you please take four new sets of readings, like this:

- Unpatched master, default *_flush_after
- Unpatched master, *_flush_after=0
- That line removed with #if 0, default *_flush_after
- That line removed with #if 0, *_flush_after=0

128 clients is fine.  But I want to see four sets of numbers that were
all taken by the same person at the same time using the same script.

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Vacuum Full by Scheme

2016-05-11 Thread Michael Paquier
On Thu, May 12, 2016 at 7:31 AM, Filho Arrais  wrote:
> Is there any development toward running VACUUM FULL by schema?
>
> Ex: vacuumdb -d postgres -parameter public
>
> In databases with thousands of tables and multiple schemas, it would be
> interesting to have this feature.

Not that I know of. The only way to achieve that now is to use a script
that first fetches the list of tables in the schema and then runs VACUUM
on those tables individually, as VACUUM cannot be run in a stored
procedure.
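
For example, something along these lines (a rough sketch; "public" is just an
example schema), feeding the output back into psql or running it with \gexec:

-- generate one VACUUM FULL statement per ordinary table in a given schema
SELECT format('VACUUM FULL %I.%I;', n.nspname, c.relname)
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname = 'public';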
-- 
Michael




Re: [HACKERS] Use %u to print user mapping's umid and userid

2016-05-11 Thread Etsuro Fujita

On 2016/05/11 18:03, Ashutosh Bapat wrote:
> On Wed, May 11, 2016 at 1:34 PM, Etsuro Fujita
>  wrote:
>> On 2016/05/11 16:49, Ashutosh Bapat wrote:
>>> The patch is calculating the user mapping when it's readily available
>>> through RelOptInfo::fdw_private. That incurs a catalog lookup
>>> unnecessarily. Instead, can we add new functions makeOid, oidVal on the
>>> lines of makeInteger and intVal to store and retrieve an OID resp., and
>>> also a corresponding print function? It might be helpful in future.
>
>> That might be an idea, but is the overhead in that re-calculation so
>> large?
>
> A call to GetForeignTable would incur a catalog lookup, which means a
> catalog table/index scan if the corresponding entry is not in the cache.
> This is followed by GetUserMapping(), which is another catalog access.
> That's bound to be more expensive than a makeOid(), oidVal() call.

Right, but such lookups have already been incurred at planning time (ie,
build_simple_rel), and the corresponding entries would be in the cache.  So,
the overhead of that recalculation at execution time would not be that
large in practice.  No?


Best regards,
Etsuro Fujita






Re: [HACKERS] Academic help for Postgres

2016-05-11 Thread Rajeev rastogi
On 11 May 2016 19:50, Bruce Momjian Wrote:


>I am giving a keynote at an IEEE database conference in Helsinki next
>week (http://icde2016.fi/).  (Yes, I am not attending PGCon Ottawa
>because I accepted the Helsinki conference invitation before the PGCon
>Ottawa date was changed from June to May).
>
>As part of the keynote, I would like to mention areas where academia can
>help us.  The topics I can think of are:
>
>   Query optimization
>   Optimizer statistics
>   Indexing structures
>   Reducing function call overhead
>   CPU locality
>   Sorting
>   Parallelism
>   Sharding
>
>Any others?

How about:
1. Considering NUMA-aware architecture.
2. Optimizer tuning for new hardware trends.
3. More effective versions of join algorithms (e.g., compared to the
traditional "build and then probe" mechanism of hash join, there are now
pipelined hash joins where build and probe happen together).

Thanks and Regards,
Kumar Rajeev Rastogi





Re: [HACKERS] Academic help for Postgres

2016-05-11 Thread Craig Ringer
On 11 May 2016 at 22:20, Bruce Momjian  wrote:

> I am giving a keynote at an IEEE database conference in Helsinki next
> week (http://icde2016.fi/).  (Yes, I am not attending PGCon Ottawa
> because I accepted the Helsinki conference invitation before the PGCon
> Ottawa date was changed from June to May).
>
> As part of the keynote, I would like to mention areas where academia can
> help us.  The topics I can think of are:
>

[snip]


>
> Any others?
>

When publishing work, publish source code somewhere stable that won't just
vanish. And build on the latest stable release, don't build your prototype
on Pg 8.0. Don't just publish a tarball with no information about what
revision it's based on, publish a git tree or a patch series.

While academic prototype source is rarely usable directly, it can serve a
valuable role in helping to understand the changes that were made,
reproducing results, exploring further related work, etc.

Include your dummy data or data generators, setup scripts, etc.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Does Type Have = Operator?

2016-05-11 Thread David E. Wheeler
On May 11, 2016, at 11:01 AM, Fabrízio de Royes Mello  
wrote:

> I know... but you could do that only in case the current behaviour fails, by
> catching it with "begin...exception...", so you minimize the catalog
> lookups.

Yeah, I guess. Honestly 90% of this issue would go away for me if there was a 
`json = json` operator. I know there are a couple different ways to interpret 
JSON equality, though.

David





Re: [HACKERS] Does Type Have = Operator?

2016-05-11 Thread David E. Wheeler
On May 11, 2016, at 10:34 AM, Kevin Grittner  wrote:

> I'm not clear enough on your intended usage to know whether these
> operators are a good fit, but they are sitting there waiting to be
> used if they do fit.

Huh. I haven’t had any problems with IS DISTINCT FROM for rows, except for the 
situation in which a failure is thrown because the types vary, say between TEXT 
and CITEXT. That can drive the tester crazy, since it says something like:

Results differ beginning at row 3:
have: (44,Anna)
want: (44,Anna)

But overall I think that’s okay; the tester really does want to make sure the 
type is correct.

Thanks,

David






Re: [HACKERS] parallel.c is not marked as test covered

2016-05-11 Thread David Rowley
On 12 May 2016 at 07:04, Robert Haas  wrote:
> On Wed, May 11, 2016 at 1:57 PM, Robert Haas 
wrote:
>> I don't immediately understand what's going wrong here.  It looks to
>> me like make_group_input_target() already called, and that worked OK,
>> but now make_partialgroup_input_target() is failing using more-or-less
>> the same logic.  Presumably that's because make_group_input_target()
>> was called on final_target as returned by create_pathtarget(root,
>> tlist), but make_partialgroup_input_target() is being called on
>> grouping_target, which I'm guessing came from
>> make_window_input_target, which somehow lacks sortgroupref labeling.
>> But I don't immediately see how that would happen, so there's
>> obviously something I'm missing here.
>
> So, it turns out you can reproduce this bug pretty easily without
> force_parallel_mode, like this:
>
> alter table int4_tbl set (parallel_degree = 4);
> SELECT SUM(COUNT(f1)) OVER () FROM int4_tbl WHERE f1=42;
>
> Or you can just use a query that involves a window function on a table
> large enough for parallelism to be considered:
>
> SELECT SUM(COUNT(aid)) OVER () FROM pgbench_accounts;
>
> The crash goes away if the target list involves at least one plain
> column that uses no aggregate or window function, because then the
> PathTarget list has a sortgrouprefs array.  Trivial fix patch
> attached, although some more review from you (Tom Lane, PathTarget
> inventor and planner whiz) and David Rowley (author of this function)
> would be appreciated in case there are deeper issues here.

The problem is make_group_input_target() only calls
add_column_to_pathtarget() (which allocates this array) when there's a
GROUP BY, otherwise it just appends to the non_group_col list. Since your
query has no GROUP BY it means that add_column_to_pathtarget() is never
called with a non-zero sortgroupref.

It looks like Tom has intended that PathTarget->sortgrouprefs can be NULL
going by both the comment /* corresponding sort/group refnos, or 0 */, and
the coding inside add_column_to_pathtarget(), which does not allocate the
array if given a 0 sortgroupref.

It looks like make_sort_input_target(), make_window_input_target() and
make_group_input_target() all get away without this check because they're
all using final_target, which was built by make_pathtarget_from_tlist()
which *always* allocates the sortgrouprefs array, even if it's left filled
with zeros.

It might be better if this was all consistent. Perhaps it would be worth
modifying make_pathtarget_from_tlist() to only allocate the sortgrouprefs
array if there's any non-zero tle->ressortgroupref, then modify the other
make_*_input_target() functions to handle a NULL array, similar to the fix
that's in your patch. This saves an allocation which is likely much more
expensive than the NULL check later. Alternatively
add_column_to_pathtarget() could be modified to allocate the array even if
sortgroupref is zero.

I think consistency is good here, as if this had been consistent this would
not be a bug.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


[HACKERS] Vacuum Full by Scheme

2016-05-11 Thread Filho Arrais
Hello,

Is there any development toward running VACUUM FULL by schema?

Ex: vacuumdb -d postgres -parameter public

In databases with thousands of tables and multiple schemas, it would be
interesting to have this feature.


Filho Arrais

Information Technology Analyst

MBA in Information Technology Management


Re: [HACKERS] Academic help for Postgres

2016-05-11 Thread Gavin Flower

On 12/05/16 02:20, Bruce Momjian wrote:

> I am giving a keynote at an IEEE database conference in Helsinki next
> week (http://icde2016.fi/).  (Yes, I am not attending PGCon Ottawa
> because I accepted the Helsinki conference invitation before the PGCon
> Ottawa date was changed from June to May).
>
> As part of the keynote, I would like to mention areas where academia can
> help us.  The topics I can think of are:
>
>   Query optimization
>   Optimizer statistics
>   Indexing structures
>   Reducing function call overhead
>   CPU locality
>   Sorting
>   Parallelism
>   Sharding
>
> Any others?


optimization of performance under very heavy loads,
ranging from almost all reads to almost all writes/updates, and other usage
profiles, on a single box and on multiple boxen

large numbers of CPUs
most efficient use of SSDs
best use of insanely large amounts of RAM

optimization of handling arrays and JSON structures


Cheers,
Gavin





Re: [HACKERS] parallel.c is not marked as test covered

2016-05-11 Thread Robert Haas
On Wed, May 11, 2016 at 1:57 PM, Robert Haas  wrote:
> I don't immediately understand what's going wrong here.  It looks to
> me like make_group_input_target() already called, and that worked OK,
> but now make_partialgroup_input_target() is failing using more-or-less
> the same logic.  Presumably that's because make_group_input_target()
> was called on final_target as returned by create_pathtarget(root,
> tlist), but make_partialgroup_input_target() is being called on
> grouping_target, which I'm guessing came from
> make_window_input_target, which somehow lacks sortgroupref labeling.
> But I don't immediately see how that would happen, so there's
> obviously something I'm missing here.

So, it turns out you can reproduce this bug pretty easily without
force_parallel_mode, like this:

alter table int4_tbl set (parallel_degree = 4);
SELECT SUM(COUNT(f1)) OVER () FROM int4_tbl WHERE f1=42;

Or you can just use a query that involves a window function on a table
large enough for parallelism to be considered:

SELECT SUM(COUNT(aid)) OVER () FROM pgbench_accounts;

The crash goes away if the target list involves at least one plain
column that uses no aggregate or window function, because then the
PathTarget list has a sortgrouprefs array.  Trivial fix patch
attached, although some more review from you (Tom Lane, PathTarget
inventor and planner whiz) and David Rowley (author of this function)
would be appreciated in case there are deeper issues here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 6770836..5c5e5ab 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -4305,7 +4305,10 @@ make_partialgroup_input_target(PlannerInfo *root, PathTarget *final_target)
 	foreach(lc, final_target->exprs)
 	{
 		Expr	   *expr = (Expr *) lfirst(lc);
-		Index		sgref = final_target->sortgrouprefs[i];
+		Index		sgref = 0;
+
+		if (final_target->sortgrouprefs != NULL)
+			sgref = final_target->sortgrouprefs[i];
 
 		if (sgref && parse->groupClause &&
 			get_sortgroupref_clause_noerr(sgref, parse->groupClause) != NULL)



Re: [HACKERS] Does Type Have = Operator?

2016-05-11 Thread Fabrízio de Royes Mello
On Wed, May 11, 2016 at 2:07 AM, David E. Wheeler 
wrote:
>
> On May 10, 2016, at 5:56 PM, Fabrízio de Royes Mello <
fabriziome...@gmail.com> wrote:
>
> > Searching for the operator in pg_operator catalog isn't enought?
>
> Seems like overkill, but will do if there’s nothing else.
>

I know... but you could do that only in case the current behaviour fails, by
catching it with "begin...exception...", so you minimize the catalog
lookups.
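
Roughly like this, as a sketch (the function name is made up, and it relies on
the missing operator raising SQLSTATE 42883, i.e. undefined_function):

CREATE OR REPLACE FUNCTION eq_works(a anyelement, b anyelement)
RETURNS boolean LANGUAGE plpgsql AS $$
BEGIN
    PERFORM a = b;              -- try the type's own = operator
    RETURN true;
EXCEPTION WHEN undefined_function THEN
    RETURN false;               -- no = operator exists for this type
END;
$$;

It only tells you whether an = operator exists for the argument type, without
querying the catalogs directly.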

Regards,

--
Fabrízio de Royes Mello
Consultoria/Coaching PostgreSQL
>> Timbira: http://www.timbira.com.br
>> Blog: http://fabriziomello.github.io
>> Linkedin: http://br.linkedin.com/in/fabriziomello
>> Twitter: http://twitter.com/fabriziomello
>> Github: http://github.com/fabriziomello


Re: [HACKERS] parallel.c is not marked as test covered

2016-05-11 Thread Robert Haas
On Wed, May 11, 2016 at 1:48 PM, David G. Johnston
 wrote:
> What happens when there are no workers available due to max_worker_processes
> already being assigned?

Then the leader runs the plan after all.

> Related question, if max_parallel_degree is >1 and "the requested number of
> workers may not actually be available at runtime" is true, does the degree
> of parallelism minimize at 1 worker + leader or will the leader simply run
> the query by itself?

If the leader can get no workers at all, it will simply run the query
by itself.  Of course, it tries to get as many as it can.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] parallel.c is not marked as test covered

2016-05-11 Thread Robert Haas
On Wed, May 11, 2016 at 1:38 PM, Robert Haas  wrote:
>> I would just go fix this, along the lines of
>>
>> *** create_plain_partial_paths(PlannerInfo *
>> *** 702,708 
>>  * with all of its inheritance siblings it may well pay off.
>>  */
>> if (rel->pages < parallel_threshold &&
>> !   rel->reloptkind == RELOPT_BASEREL)
>> return;
>>
>> /*
>> --- 703,710 
>>  * with all of its inheritance siblings it may well pay off.
>>  */
>> if (rel->pages < parallel_threshold &&
>> !   rel->reloptkind == RELOPT_BASEREL &&
>> !   force_parallel_mode == FORCE_PARALLEL_OFF)
>> return;
>>
>> /*
>>
>> except that doing so and running the regression tests with
>> force_parallel_mode = on results in core dumps.
>
> Nonetheless, that is a bug.  (I only get one crash - do you get more?)

This looks like a bug in the parallel aggregate code.

#0  make_partialgroup_input_target (root=0x7faa5f002d20,
final_target=0x7faa5f004270) at planner.c:4308
4308            Index       sgref = final_target->sortgrouprefs[i];
(gdb) p debug_query_string
$1 = 0x7faa5d00b638 "SELECT SUM(COUNT(f1)) OVER () FROM int4_tbl WHERE f1=42;"
Current language:  auto; currently minimal
(gdb) bt
#0  make_partialgroup_input_target (root=0x7faa5f002d20,
final_target=0x7faa5f004270) at planner.c:4308
#1  0x000101df4889 in create_grouping_paths (root=0x7faa5f002d20,
input_rel=0x7faa5f0034b0, target=0x7faa5f004270, rollup_lists=0x0,
rollup_groupclauses=0x0) at planner.c:3421
#2  0x000101df19a0 in grouping_planner (root=0x7faa5f002d20,
inheritance_update=0 '\0', tuple_fraction=0) at planner.c:1796
#3  0x000101def276 in subquery_planner (glob=0x7faa5f002b90,
parse=0x7faa5d00cb80, parent_root=0x0, hasRecursion=0 '\0',
tuple_fraction=0) at planner.c:758
#4  0x000101dee2de in standard_planner (parse=0x7faa5d00cb80,
cursorOptions=256, boundParams=0x0) at planner.c:307
#5  0x000101dedf81 in planner (parse=0x7faa5d00cb80,
cursorOptions=256, boundParams=0x0) at planner.c:177
#6  0x000101eed7b6 in pg_plan_query (querytree=0x7faa5d00cb80,
cursorOptions=256, boundParams=0x0) at postgres.c:798
#7  0x000101eed8a3 in pg_plan_queries (querytrees=0x7faa5f002cc0,
cursorOptions=256, boundParams=0x0) at postgres.c:857
#8  0x000101ef05ad in exec_simple_query
(query_string=0x7faa5d00b638 "SELECT SUM(COUNT(f1)) OVER () FROM
int4_tbl WHERE f1=42;") at postgres.c:1022
#9  0x000101eefa8f in PostgresMain (argc=1, argv=0x7faa5b005be0,
dbname=0x7faa5b005940 "regression", username=0x7faa5b005920 "rhaas")
at postgres.c:4059
#10 0x000101e45209 in BackendRun (port=0x7faa5ae01770) at postmaster.c:4258
#11 0x000101e9 in BackendStartup (port=0x7faa5ae01770) at
postmaster.c:3932
#12 0x000101e433ef in ServerLoop () at postmaster.c:1690
#13 0x000101e40a23 in PostmasterMain (argc=8, argv=0x7faa5ac099b0)
at postmaster.c:1298
#14 0x000101d6e160 in main (argc=8, argv=0x7faa5ac099b0) at main.c:228
(gdb) p final_target->sortgrouprefs
$2 = (Index *) 0x0

I don't immediately understand what's going wrong here.  It looks to
me like make_group_input_target() already called, and that worked OK,
but now make_partialgroup_input_target() is failing using more-or-less
the same logic.  Presumably that's because make_group_input_target()
was called on final_target as returned by create_pathtarget(root,
tlist), but make_partialgroup_input_target() is being called on
grouping_target, which I'm guessing came from
make_window_input_target, which somehow lacks sortgroupref labeling.
But I don't immediately see how that would happen, so there's
obviously something I'm missing here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] parallel.c is not marked as test covered

2016-05-11 Thread David G. Johnston
On Wed, May 11, 2016 at 10:38 AM, Robert Haas  wrote:

> On Wed, May 11, 2016 at 12:34 AM, Tom Lane  wrote:
> >> Hmm, that is strange.  I would have expected that to stuff a Gather on
> >> top of the Aggregate.  I wonder why it's not doing that.
> >
> > The reason is that create_plain_partial_paths() contains a hard-wired
> > decision not to generate any partial paths for relations smaller than
> > 1000 blocks, which means that no partial path will ever be generated
> > for *any* relation in the standard regression tests, force_parallel_mode
> > or no.
>
> Well that's an interesting theory, except that you've completely
> missed the point of force_parallel_mode.  force_parallel_mode pushes a
> special Gather node on top of any plan that is not already parallel in
> some way but which is parallel-safe.   That special Gather node runs
> only in the worker, not the leader, and always uses just one worker.
>

What happens when there are no workers available due to
max_worker_processes already being assigned?

Related question, if max_parallel_degree is >1 and "the requested number of
workers may not actually be available at runtime" is true, does the degree
of parallelism minimize at 1 worker + leader or will the leader simply run
the query by itself?

David J.


Re: [HACKERS] Re: Need help debugging why autovacuum seems "stuck" -- until I use superuser to vacuum freeze pg_database

2016-05-11 Thread Alvaro Herrera
Shawn wrote:
 
> The problem is though, when I look at age(datfrozenxid) for
> "my_database_name" it is the oldest at 654,189,546 and should trigger the
> autovacuum_freeze_max_age of 200,000,000 but it doesn't.  The relation with
> this age is "pg_database".  If I issue a manual "vacuum freeze pg_database"
> in the database, autovacuum will run on the next oldest database but get
> stuck again with same symptoms (i.e. pg_database is the relation with the
> oldest age).  If I don't issue the manual "vacuum freeze" for all databases
> and left things alone, it would get to wrap-around state.

I was thinking about this in connection with the shared catalog
vacuuming bug I fixed yesterday.  One problem here is that the
relfrozenxid/relminmxid settings are local to each database, so a
for-wraparound vacuum would have to scan the table in all databases, and
advance the counter in all databases, before the overall "age" of the
system goes down again, which is inconvenient.  This is a problem
inherent to having these columns in a per-database catalog, and the only
solution I can see is to have another shared catalog containing age
state for the shared catalogs.  (However, this issue isn't terribly
serious -- it just means you have to scan the catalog N times instead of
once.)

However this doesn't explain why the vacuuming of pg_database in one
database doesn't actually advance the relfrozenxid in that database; it
certainly should.  This is the serious problem you're facing, it seems.
I wonder if you've tweaked the freeze settings somehow.
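
For reference, the ages in question can be checked straight from the catalogs,
per database and then per relation inside a database:

-- which database's datfrozenxid is the oldest
SELECT datname, age(datfrozenxid) FROM pg_database ORDER BY 2 DESC;

-- within a database: which relations (shared catalogs and toast tables
-- included) are holding back relfrozenxid
SELECT oid::regclass, relisshared, age(relfrozenxid)
FROM pg_class
WHERE relkind IN ('r', 't')
ORDER BY age(relfrozenxid) DESC
LIMIT 10;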

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Academic help for Postgres

2016-05-11 Thread Christopher Browne
On 11 May 2016 at 12:58, Josh berkus  wrote:

> Together with that, automated substitution of materialized views for
> query clauses.
>
> Also: optimizing for new hardware, like persistent memory.


I recently saw some material in ACM SIGOPS on tuning filesystems to play
better with some of the new sorts of storage.

One such interesting article is this:
<http://dl.acm.org/citation.cfm?id=2819002>.  Its idea was to research
better ways of doing hash table updates with PCM (Phase Change Memory),
which apparently may be up-and-coming but with fairly different write
characteristics than we're used to.  You essentially write a fairly large
page at a time, and can only do a limited number of updates to any given
page.  That encourages things like log-structured filesystems, but with
further efforts to avoid "hot spots."

The paper was focused on hash tables; if the hardware turns out to be
important, it'll also be important to have better variations on B-trees.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: [HACKERS] parallel.c is not marked as test covered

2016-05-11 Thread Robert Haas
On Wed, May 11, 2016 at 12:34 AM, Tom Lane  wrote:
>> Hmm, that is strange.  I would have expected that to stuff a Gather on
>> top of the Aggregate.  I wonder why it's not doing that.
>
> The reason is that create_plain_partial_paths() contains a hard-wired
> decision not to generate any partial paths for relations smaller than
> 1000 blocks, which means that no partial path will ever be generated
> for *any* relation in the standard regression tests, force_parallel_mode
> or no.

Well that's an interesting theory, except that you've completely
missed the point of force_parallel_mode.  force_parallel_mode pushes a
special Gather node on top of any plan that is not already parallel in
some way but which is parallel-safe.   That special Gather node runs
only in the worker, not the leader, and always uses just one worker.
The point is to test that queries run in the worker in the same way
that they do in the leader.  This has already found lots of bugs, so
it's clearly useful, despite all the confusion it's created.
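
For anyone following along, a minimal way to see that behaviour (a sketch
assuming a current 9.6 build, with the GUC spellings used in this thread;
some_table stands for any ordinary table):

SET force_parallel_mode = on;
SET max_parallel_degree = 2;
-- a parallel-safe plan that is not already parallel gets a single-copy
-- Gather pushed on top, so the plan below it runs entirely in one worker
EXPLAIN (COSTS OFF) SELECT count(*) FROM some_table;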

> I would just go fix this, along the lines of
>
> *** create_plain_partial_paths(PlannerInfo *
> *** 702,708 
>  * with all of its inheritance siblings it may well pay off.
>  */
> if (rel->pages < parallel_threshold &&
> !   rel->reloptkind == RELOPT_BASEREL)
> return;
>
> /*
> --- 703,710 
>  * with all of its inheritance siblings it may well pay off.
>  */
> if (rel->pages < parallel_threshold &&
> !   rel->reloptkind == RELOPT_BASEREL &&
> !   force_parallel_mode == FORCE_PARALLEL_OFF)
> return;
>
> /*
>
> except that doing so and running the regression tests with
> force_parallel_mode = on results in core dumps.

Nonetheless, that is a bug.  (I only get one crash - do you get more?)

> Some debugging seems
> indicated ... and we should at this point assume that there has been no
> useful testing of parallel query in the buildfarm, not even on Noah's
> machines.

Yes and no.  The force_parallel_mode stuff is primarily there to tell
us about things like "you marked a function as parallel-safe, but it
actually blows up when run in a worker".  It won't catch a case where
a function is marked parallel-safe but silently does the wrong thing
instead of failing, but it will catch quite a few mistakes of that
kind, and I think that's good.

What it won't do, though, is actually run "real" parallel queries -
that is, multiple workers doing a Parallel Seq Scan with some other
stuff pushed on top, and then a Gather.  And we should have test
coverage for that, too, so that we're testing real concurrent
behavior, and not just our ability to run plans in a worker.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Does Type Have = Operator?

2016-05-11 Thread Kevin Grittner
On Wed, May 11, 2016 at 12:23 PM, David E. Wheeler
 wrote:

> Oh, well crap. Maybe I’d be better off just comparing the plain
> text of the expressions as Tom suggested.

At the other extreme are the row comparison operators that only
consider values equal if they have the same storage value.  See the
last paragraph of:

http://www.postgresql.org/docs/9.5/static/functions-comparisons.html#COMPOSITE-TYPE-COMPARISON

| To support matching of rows which include elements without a
| default B-tree operator class, the following operators are
| defined for composite type comparison: *=, *<>, *<, *<=, *>, and
| *>=. These operators compare the internal binary representation
| of the two rows. Two rows might have a different binary
| representation even though comparisons of the two rows with the
| equality operator is true. The ordering of rows under these
| comparison operators is deterministic but not otherwise
| meaningful. These operators are used internally for materialized
| views and might be useful for other specialized purposes such as
| replication but are not intended to be generally useful for
| writing queries.

I'm not clear enough on your intended usage to know whether these
operators are a good fit, but they are sitting there waiting to be
used if they do fit.
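
A contrived sketch of what that looks like (the composite type "pair" is made
up here, and json stands in for a type without a default btree equality
operator):

CREATE TYPE pair AS (j json, note text);

-- identical stored bytes: identity-equal
SELECT ROW('{"a":1}'::json, 'x')::pair *= ROW('{"a":1}'::json, 'x')::pair;

-- semantically the same JSON, but the extra space changes the stored bytes
SELECT ROW('{"a":1}'::json, 'x')::pair *= ROW('{"a": 1}'::json, 'x')::pair;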

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Does Type Have = Operator?

2016-05-11 Thread David E. Wheeler
On May 11, 2016, at 10:19 AM, Kevin Grittner  wrote:

> As long as you don't assume too much about *what* is equal.
> 
> test=# select '(1,1)(2,2)'::box = '(-4.5,1000)(-2.5,1000.5)'::box;
> ?column?
> --
> t
> (1 row)

Oh, well crap. Maybe I’d be better off just comparing the plain text of the 
expressions as Tom suggested.

David





Re: [HACKERS] Does Type Have = Operator?

2016-05-11 Thread Kevin Grittner
On Wed, May 11, 2016 at 12:01 PM, David G. Johnston
 wrote:

> It's hard to imagine defining "=" to mean something different in logic,
> though, without intentionally trying to be cryptic.

As long as you don't assume too much about *what* is equal.

test=# select '(1,1)(2,2)'::box = '(-4.5,1000)(-2.5,1000.5)'::box;
 ?column?
--
 t
(1 row)
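
(The = operator on box compares areas, which is why those two boxes come out
equal:)

select area('(1,1)(2,2)'::box), area('(-4.5,1000)(-2.5,1000.5)'::box);
-- both areas are 1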

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Does Type Have = Operator?

2016-05-11 Thread David G. Johnston
On Wed, May 11, 2016 at 9:54 AM, Robert Haas  wrote:

> On Tue, May 10, 2016 at 9:16 PM, David G. Johnston
>  wrote:
> > Brute force: you'd have to query pg_amop and note the absence of a row
> with
> > a btree (maybe hash too...) family strategy 3 (1 for hash) [equality]
> where
> > the left and right types are the same and match the type in question.
>
> The core system uses this kind of thing to find equality operators in
> a number of cases.
>
> We often assume that the operator which implements equality for the
> type's default btree operator class is the canonical one for some
> purpose.  Ditto for the default hash operator class.
>

Yeah, the user-facing documentation covers it pretty deeply, if not in one
central location.

But apparently the core system also uses the fact that "=", if present, is
an equality operator and, less so, that no other operator is expected to be
used for equality.

I suspect that such an expectation is not enforced, though - e.g., someone
could define "==" to mean equality if they so choose (the lesser property).
It's hard to imagine defining "=" to mean something different in logic,
though, without intentionally trying to be cryptic.
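
For instance, nothing stops this (a contrived sketch; the system attaches no
special meaning to the spelling "=="):

CREATE FUNCTION int4_eq2(int4, int4) RETURNS boolean
    LANGUAGE sql IMMUTABLE AS 'SELECT $1 = $2';
CREATE OPERATOR == (LEFTARG = int4, RIGHTARG = int4, PROCEDURE = int4_eq2);
SELECT 1 == 1;   -- true, but nothing marks "==" as an equality operator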

David J.


Re: [HACKERS] Academic help for Postgres

2016-05-11 Thread Josh berkus
On 05/11/2016 07:54 AM, Bruce Momjian wrote:
> On Wed, May 11, 2016 at 05:41:21PM +0300, Heikki Linnakangas wrote:
>> On 11/05/16 17:32, Bruce Momjian wrote:
>>> On Wed, May 11, 2016 at 05:31:10PM +0300, Konstantin Knizhnik wrote:
 On 11.05.2016 17:20, Bruce Momjian wrote:
> I am giving a keynote at an IEEE database conference in Helsinki next
> week (http://icde2016.fi/).  (Yes, I am not attending PGCon Ottawa
> because I accepted the Helsinki conference invitation before the PGCon
> Ottawa date was changed from June to May).
>
> As part of the keynote, I would like to mention areas where academia can
> help us.  The topics I can think of are:
>
>   Query optimization
>   Optimizer statistics
>   Indexing structures
>   Reducing function call overhead
>   CPU locality
>   Sorting
>   Parallelism
>   Sharding
>
> Any others?
>
 Incremental materialized views?
>>>
>>> I don't know.  Is that something academics would research?
>>
>> Absolutely! There are plenty of papers on how to keep materialized views
>> up-to-date.
> 
> Oh, OK. I will add it.
> 

Together with that, automated substitution of materialized views for
query clauses.

Also: optimizing for new hardware, like persistent memory.

-- 
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] Does Type Have = Operator?

2016-05-11 Thread Robert Haas
On Tue, May 10, 2016 at 9:16 PM, David G. Johnston
 wrote:
> Brute force: you'd have to query pg_amop and note the absence of a row with
> a btree (maybe hash too...) family strategy 3 (1 for hash) [equality] where
> the left and right types are the same and match the type in question.

The core system uses this kind of thing to find equality operators in
a number of cases.

We often assume that the operator which implements equality for the
type's default btree operator class is the canonical one for some
purpose.  Ditto for the default hash operator class.
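
Spelled out, that brute-force probe looks roughly like this for the btree side
and the text type (the hash check is analogous, with strategy 1):

SELECT amopopr::regoperator
FROM pg_amop
WHERE amopmethod = (SELECT oid FROM pg_am WHERE amname = 'btree')
  AND amopstrategy = 3              -- btree equality strategy
  AND amoplefttype = 'text'::regtype
  AND amoprighttype = 'text'::regtype;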

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] asynchronous and vectorized execution

2016-05-11 Thread Robert Haas
On Wed, May 11, 2016 at 12:30 PM, Andres Freund  wrote:
> On 2016-05-11 12:27:55 -0400, Robert Haas wrote:
>> On Wed, May 11, 2016 at 11:49 AM, Andres Freund  wrote:
>> > On 2016-05-11 10:12:26 -0400, Robert Haas wrote:
>> >> > Hm. Do we really have to keep the page locked in the page-at-a-time
>> >> > mode? Shouldn't the pin suffice?
>> >>
>> >> I think we need a lock to examine MVCC visibility information.  A pin
>> >> is enough to prevent a tuple from being removed, but not from having
>> >> its xmax and cmax overwritten at almost but not quite exactly the same
>> >> time.
>> >
>> > We already batch visibility lookups in page-at-a-time
>> > mode. Cf. heapgetpage() / scan->rs_vistuples. So we can evaluate quals
>> > after releasing the lock, but before the pin is released, without that
>> > much effort.  IIRC that isn't used for index lookups, but that's
>> > probably a good idea.
>>
>> The trouble with that is that if you fail the qual, you have to relock
>> the page.  Which kinda sucks, if the qual is really simple.
>
> Hm? I'm missing something here? We currently do the visibility checks in
> bulk for the whole page. After that we release the page lock. What
> prevents us from executing the quals directly after that? And why would
> you need to relock the page?

Oh, yeah, in page-at-a-time mode we can release the lock first.  I was
thinking about what to do in tuple-at-a-time mode (i.e. when the page is
not all-visible).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] asynchronous and vectorized execution

2016-05-11 Thread Andres Freund
On 2016-05-11 12:27:55 -0400, Robert Haas wrote:
> On Wed, May 11, 2016 at 11:49 AM, Andres Freund  wrote:
> > On 2016-05-11 10:12:26 -0400, Robert Haas wrote:
> >> > Hm. Do we really have to keep the page locked in the page-at-a-time
> >> > mode? Shouldn't the pin suffice?
> >>
> >> I think we need a lock to examine MVCC visibility information.  A pin
> >> is enough to prevent a tuple from being removed, but not from having
> >> its xmax and cmax overwritten at almost but not quite exactly the same
> >> time.
> >
> > We already batch visibility lookups in page-at-a-time
> > mode. Cf. heapgetpage() / scan->rs_vistuples. So we can evaluate quals
> > after releasing the lock, but before the pin is released, without that
> > much effort.  IIRC that isn't used for index lookups, but that's
> > probably a good idea.
> 
> The trouble with that is that if you fail the qual, you have to relock
> the page.  Which kinda sucks, if the qual is really simple.

Hm? I'm missing something here? We currently do the visibility checks in
bulk for the whole page. After that we release the page lock. What
prevents us from executing the quals directly after that? And why would
you need to relock the page?




Re: [HACKERS] asynchronous and vectorized execution

2016-05-11 Thread Robert Haas
On Wed, May 11, 2016 at 11:49 AM, Andres Freund  wrote:
> On 2016-05-11 10:12:26 -0400, Robert Haas wrote:
>> > I've to admit I'm not that convinced about the speedups in the !fdw
>> > case. There seems to be a lot easier avenues for performance
>> > improvements.
>>
>> What I'm talking about is a query like this:
>>
>> SELECT * FROM inheritance_tree_of_foreign_tables WHERE very_rarely_true;
>
> Note that I said "!fdw case".

Oh, wow, I totally missed that exclamation point.

>> > FWIW, I've even hacked something up for a bunch of simple queries, and
>> > the performance improvements were significant.  Besides it only being a
>> > weekend hack project, the big thing I got stuck on was considering how
>> > to exactly determine when to batch and not to batch.
>>
>> Yeah.  I think we need a system for signalling nodes as to when they
>> will be run to completion.  But a Boolean is somehow unsatisfying;
>> LIMIT 100 is more like no LIMIT than it is like LIMIT 1.  I'm
>> tempted to add a numTuples field to every ExecutorState and give upper
>> nodes some way to set it, as a hint.
>
> I was wondering whether we should hand down TupleVectorStates to lower
> nodes, and their size determines the max batch size...

There's some appeal to that, but it seems complicated to make work.

>> >> Some care is required here because any
>> >> functions we execute as scan keys are run with the buffer locked, so
>> >> we had better not run anything very complicated.  But doing this for
>> >> simple things like integer equality operators seems like it could save
>> >> quite a few buffer lock/unlock cycles and some other executor overhead
>> >> as well.
>> >
>> > Hm. Do we really have to keep the page locked in the page-at-a-time
>> > mode? Shouldn't the pin suffice?
>>
>> I think we need a lock to examine MVCC visibility information.  A pin
>> is enough to prevent a tuple from being removed, but not from having
>> its xmax and cmax overwritten at almost but not quite exactly the same
>> time.
>
> We already batch visibility lookups in page-at-a-time
> mode. Cf. heapgetpage() / scan->rs_vistuples. So we can evaluate quals
> after releasing the lock, but before the pin is released, without that
> much effort.  IIRC that isn't used for index lookups, but that's
> probably a good idea.

The trouble with that is that if you fail the qual, you have to relock
the page.  Which kinda sucks, if the qual is really simple.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] asynchronous and vectorized execution

2016-05-11 Thread Andres Freund
On 2016-05-11 10:32:20 -0400, Robert Haas wrote:
> On Tue, May 10, 2016 at 8:50 PM, Andres Freund  wrote:
> > That seems to suggest that we need to restructure how we get to calling
> > fmgr functions, before worrying about the actual fmgr call.
> 
> Any ideas on how to do that?  ExecMakeFunctionResultNoSets() isn't
> really doing a heck of a lot.  Changing FuncExprState to use an array
> rather than a linked list to store its arguments might help some.   We
> could also consider having an optimized path that skips the fn_strict
> stuff if we can somehow deduce that no NULLs can occur in this
> context, but that's a lot of work and new infrastructure.  I feel like
> maybe there's something higher-level we could do that would help more,
> but I don't know what it is.

I think it's not just ExecMakeFunctionResultNoSets, it's the whole
call-stack which needs to be optimized together.

E.g. look at a few performance metrics for a simple seqscan query with a
bunch of ORed equality constraints:
SELECT count(*) FROM pgbench_accounts WHERE abalance = -1 OR abalance = -2 OR 
abalance = -3 OR abalance = -4 OR abalance = -5 OR abalance = -6 OR abalance = 
-7 OR abalance = -8 OR abalance = -9 OR abalance = -10;

perf record -g -p 27286 -F 5000 -e 
cycles:ppp,branch-misses,L1-icache-load-misses,iTLB-load-misses,L1-dcache-load-misses,dTLB-load-misses,LLC-load-misses
 sleep 3
6K cycles:ppp
6K branch-misses
1K L1-icache-load-misses
472 iTLB-load-misses
5K L1-dcache-load-misses
6K dTLB-load-misses
6K LLC-load-misses

You can see that a number of events sample at a high rate, especially
when you take the cycle samples into account.

cycles:
+   32.35%  postgres  postgres   [.] ExecMakeFunctionResultNoSets
+   14.51%  postgres  postgres   [.] slot_getattr
+5.50%  postgres  postgres   [.] ExecEvalOr
+5.22%  postgres  postgres   [.] check_stack_depth

branch-misses:
+   73.77%  postgres  postgres   [.] ExecQual
+   17.83%  postgres  postgres   [.] ExecEvalOr
+1.49%  postgres  postgres   [.] heap_getnext

L1-icache-load-misses:
+4.71%  postgres  [kernel.kallsyms]  [k] update_curr
+4.37%  postgres  postgres   [.] hash_search_with_hash_value
+3.91%  postgres  postgres   [.] heap_getnext
+3.81%  postgres  [kernel.kallsyms]  [k] task_tick_fair

iTLB-load-misses:
+   27.57%  postgres  postgres   [.] LWLockAcquire
+   18.32%  postgres  postgres   [.] hash_search_with_hash_value
+7.09%  postgres  postgres   [.] ExecMakeFunctionResultNoSets
+3.06%  postgres  postgres   [.] ExecEvalConst

L1-dcache-load-misses:
+   20.35%  postgres  postgres   [.] ExecMakeFunctionResultNoSets
+   12.31%  postgres  postgres   [.] check_stack_depth
+8.84%  postgres  postgres   [.] heap_getnext
+8.00%  postgres  postgres   [.] slot_deform_tuple
+7.15%  postgres  postgres   [.] HeapTupleSatisfiesMVCC

dTLB-load-misses:
+   50.13%  postgres  postgres   [.] ExecQual
+   41.36%  postgres  postgres   [.] ExecEvalOr
+2.96%  postgres  postgres   [.] hash_search_with_hash_value
+1.30%  postgres  postgres   [.] PinBuffer.isra.3
+1.19%  postgres  postgres   [.] heap_page_prune_op

LLC-load-misses:
+   24.25%  postgres  postgres   [.] slot_deform_tuple
+   17.45%  postgres  postgres   [.] CheckForSerializableConflictOut
+   10.52%  postgres  postgres   [.] heapgetpage
+9.55%  postgres  postgres   [.] HeapTupleSatisfiesMVCC
+7.52%  postgres  postgres   [.] ExecMakeFunctionResultNoSets


For this workload, we expect a lot of LLC-load-misses as the workload is
a lot bigger than memory, and it makes sense that they're in
slot_deform_tuple(),heapgetpage(), HeapTupleSatisfiesMVCC() (but uh
CheckForSerializableConflictOut?).  One avenue to optimize is to make
those accesses easier to predict/prefetch, which they're atm likely not.

But leaving that aside, we can see that a lot of the cost is distributed
over ExecQual, ExecEvalOr, ExecMakeFunctionResultNoSets - all of which
judiciously use linked lists.  I suspect that by simplifying these
functions / datastructures *AND* by calling them over a batch of tuples,
instead of one-by-one we'd limit the time spent in them considerably.

Greetings,

Andres Freund




Re: [HACKERS] ALTER TABLE lock downgrades have broken pg_upgrade

2016-05-11 Thread Alvaro Herrera
Peter Eisentraut wrote:
> On 5/3/16 1:25 PM, Alvaro Herrera wrote:
> >If we can put together a script that runs test.sh for various versions
> >and then verifies the runs, we could use it in both buildfarm and
> >coverage.
> 
> Not that that would be useless, but note that the value in this case (and
> most others) comes from having a candidate object in the database before
> upgrade that exercises the particular problem, mostly independent of what
> version you upgrade from and to.  So far the way to do that is to leave
> "junk" in the regression test database, but that's clearly a bit silly.

True.  We have quite a few places in the standard regression tests that
leave junk behind purposefully for this reason.

> I think the way forward is to create a TAP test suite for pg_upgrade that
> specifically exercises a lot of scenarios with small purpose-built test
> databases.

That works for me.

> Then, the problem of having to compare dump output across versions also goes
> away more easily.

Not sure why, but if you think it does, then it sounds good.  Andrew's
current approach of counting lines in the diff seems brittle and not
entirely trustworthy.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] asynchronous and vectorized execution

2016-05-11 Thread Andres Freund
On 2016-05-11 10:12:26 -0400, Robert Haas wrote:
> > I've to admit I'm not that convinced about the speedups in the !fdw
> > case. There seems to be a lot easier avenues for performance
> > improvements.
> 
> What I'm talking about is a query like this:
> 
> SELECT * FROM inheritance_tree_of_foreign_tables WHERE very_rarely_true;

Note that I said "!fdw case".


> > FWIW, I've even hacked something up for a bunch of simple queries, and
> > the performance improvements were significant.  Besides it only being a
> > weekend hack project, the big thing I got stuck on was considering how
> > to exactly determine when to batch and not to batch.
> 
> Yeah.  I think we need a system for signalling nodes as to when they
> will be run to completion.  But a Boolean is somehow unsatisfying;
> LIMIT 100 is more like no LIMIT than it is like LIMIT 1.  I'm
> tempted to add a numTuples field to every ExecutorState and give upper
> nodes some way to set it, as a hint.

I was wondering whether we should hand down TupleVectorStates to lower
nodes, and their size determines the max batch size...

> >> Some care is required here because any
> >> functions we execute as scan keys are run with the buffer locked, so
> >> we had better not run anything very complicated.  But doing this for
> >> simple things like integer equality operators seems like it could save
> >> quite a few buffer lock/unlock cycles and some other executor overhead
> >> as well.
> >
> > Hm. Do we really have to keep the page locked in the page-at-a-time
> > mode? Shouldn't the pin suffice?
> 
> I think we need a lock to examine MVCC visibility information.  A pin
> is enough to prevent a tuple from being removed, but not from having
> its xmax and cmax overwritten at almost but not quite exactly the same
> time.

We already batch visibility lookups in page-at-a-time
mode. Cf. heapgetpage() / scan->rs_vistuples. So we can evaluate quals
after releasing the lock, but before the pin is released, without that
much effort.  IIRC that isn't used for index lookups, but that's
probably a good idea.

Greetings,

Andres Freund




Re: [HACKERS] Academic help for Postgres

2016-05-11 Thread Kohei KaiGai
2016-05-11 23:20 GMT+09:00 Bruce Momjian :
> I am giving a keynote at an IEEE database conference in Helsinki next
> week (http://icde2016.fi/).  (Yes, I am not attending PGCon Ottawa
> because I accepted the Helsinki conference invitation before the PGCon
> Ottawa date was changed from June to May).
>
> As part of the keynote, I would like to mention areas where academia can
> help us.  The topics I can think of are:
>
> Query optimization
> Optimizer statistics
> Indexing structures
> Reducing function call overhead
> CPU locality
> Sorting
> Parallelism
> Sharding
>
> Any others?
>
How about NVRAM utilization?

-- 
KaiGai Kohei 




Re: [HACKERS] HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors)

2016-05-11 Thread Andres Freund


On May 11, 2016 7:06:05 AM PDT, Teodor Sigaev  wrote:
>>>  Allow Pin/UnpinBuffer to operate in a lockfree manner.
>>> I get the errors:
>>>
>>> ERROR:  attempted to delete invisible tuple
>>> ERROR:  unexpected chunk number 1 (expected 2) for toast value
>
>Just a reminder: you are investigating the "unexpected chunk number"
>problem, but it seems we have another bug (the first one, ERROR: attempted
>to delete invisible tuple). IMHO, it's a separate bug, not related to OIDs.
>Unfortunately, I've never seen such an error on my notebook.

Same issue. If the dead tuple is noticed by heap_delete (when it should have
deleted the live version elsewhere), you get the "invisible tuple" error.  The
"unexpected chunk" bit only triggers when hint bits are set differently across
chunks.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.




Re: [HACKERS] Academic help for Postgres

2016-05-11 Thread Kevin Grittner
On Wed, May 11, 2016 at 9:32 AM, Bruce Momjian  wrote:
> On Wed, May 11, 2016 at 05:31:10PM +0300, Konstantin Knizhnik wrote:

>> Incremental materialized views?
>
> I don't know.  Is that something academics would research?

One paper I have found particularly good is this:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.3208

Tantalizingly, there is mention that there is a longer version of
the paper, but I have been unable to find it.  There is enough in
this paper, I think, to fill in the blanks and do a much better job
with implementation than any ad hoc approach is likely to manage,
but if there is anything that extends this work (or subsequent work
which seems to be an improvement) that would be great.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Academic help for Postgres

2016-05-11 Thread Heikki Linnakangas

On 11/05/16 17:32, Bruce Momjian wrote:

On Wed, May 11, 2016 at 05:31:10PM +0300, Konstantin Knizhnik wrote:

On 11.05.2016 17:20, Bruce Momjian wrote:

I am giving a keynote at an IEEE database conference in Helsinki next
week (http://icde2016.fi/).  (Yes, I am not attending PGCon Ottawa
because I accepted the Helsinki conference invitation before the PGCon
Ottawa date was changed from June to May).

As part of the keynote, I would like to mention areas where academia can
help us.  The topics I can think of are:

Query optimization
Optimizer statistics
Indexing structures
Reducing function call overhead
CPU locality
Sorting
Parallelism
Sharding

Any others?


Incremental materialized views?


I don't know.  Is that something academics would research?


Absolutely! There are plenty of papers on how to keep materialized views 
up-to-date.


- Heikki





Re: [HACKERS] Academic help for Postgres

2016-05-11 Thread Bruce Momjian
On Wed, May 11, 2016 at 05:41:21PM +0300, Heikki Linnakangas wrote:
> On 11/05/16 17:32, Bruce Momjian wrote:
> >On Wed, May 11, 2016 at 05:31:10PM +0300, Konstantin Knizhnik wrote:
> >>On 11.05.2016 17:20, Bruce Momjian wrote:
> >>>I am giving a keynote at an IEEE database conference in Helsinki next
> >>>week (http://icde2016.fi/).  (Yes, I am not attending PGCon Ottawa
> >>>because I accepted the Helsinki conference invitation before the PGCon
> >>>Ottawa date was changed from June to May).
> >>>
> >>>As part of the keynote, I would like to mention areas where academia can
> >>>help us.  The topics I can think of are:
> >>>
> >>>   Query optimization
> >>>   Optimizer statistics
> >>>   Indexing structures
> >>>   Reducing function call overhead
> >>>   CPU locality
> >>>   Sorting
> >>>   Parallelism
> >>>   Sharding
> >>>
> >>>Any others?
> >>>
> >>Incremental materialized views?
> >
> >I don't know.  Is that something academics would research?
> 
> Absolutely! There are plenty of papers on how to keep materialized views
> up-to-date.

Oh, OK. I will add it.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] Academic help for Postgres

2016-05-11 Thread Konstantin Knizhnik



On 11.05.2016 17:32, Bruce Momjian wrote:

On Wed, May 11, 2016 at 05:31:10PM +0300, Konstantin Knizhnik wrote:


On 11.05.2016 17:20, Bruce Momjian wrote:

I am giving a keynote at an IEEE database conference in Helsinki next
week (http://icde2016.fi/).  (Yes, I am not attending PGCon Ottawa
because I accepted the Helsinki conference invitation before the PGCon
Ottawa date was changed from June to May).

As part of the keynote, I would like to mention areas where academia can
help us.  The topics I can think of are:

Query optimization
Optimizer statistics
Indexing structures
Reducing function call overhead
CPU locality
Sorting
Parallelism
Sharding

Any others?


Incremental materialized views?

I don't know.  Is that something academics would research?


I am not sure.
There is definitely the question of which views can be recalculated 
incrementally and which inductive extension has to be constructed to make 
that possible. If you google for "incremental materialized views phd", you 
will get a large number of references to articles.

But I do not know whether all questions in this area are already closed or not...


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] Academic help for Postgres

2016-05-11 Thread Oleg Bartunov
On Wed, May 11, 2016 at 5:20 PM, Bruce Momjian  wrote:
> I am giving a keynote at an IEEE database conference in Helsinki next
> week (http://icde2016.fi/).  (Yes, I am not attending PGCon Ottawa
> because I accepted the Helsinki conference invitation before the PGCon
> Ottawa date was changed from June to May).
>
> As part of the keynote, I would like to mention areas where academia can
> help us.  The topics I can think of are:
>
> Query optimization
> Optimizer statistics
> Indexing structures
> Reducing function call overhead
> CPU locality
> Sorting
> Parallelism
> Sharding
>
> Any others?

machine learning  for adaptive planning
distributed stuff

>
> --
>   Bruce Momjian  http://momjian.us
>   EnterpriseDB http://enterprisedb.com
>
> + As you are, so once was I. As I am, so you will be. +
> + Ancient Roman grave inscription +
>
>




Re: [HACKERS] asynchronous and vectorized execution

2016-05-11 Thread Robert Haas
On Tue, May 10, 2016 at 8:50 PM, Andres Freund  wrote:
> That seems to suggest that we need to restructure how we get to calling
> fmgr functions, before worrying about the actual fmgr call.

Any ideas on how to do that?  ExecMakeFunctionResultNoSets() isn't
really doing a heck of a lot.  Changing FuncExprState to use an array
rather than a linked list to store its arguments might help some.   We
could also consider having an optimized path that skips the fn_strict
stuff if we can somehow deduce that no NULLs can occur in this
context, but that's a lot of work and new infrastructure.  I feel like
maybe there's something higher-level we could do that would help more,
but I don't know what it is.
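
To make the array idea a bit more concrete, here is a hand-wavy sketch
(the struct and function below are made up for illustration, not existing
or proposed code), with the per-argument null bookkeeping skipped when the
caller can somehow prove that no NULLs are possible:

/* Hypothetical flat-array replacement for FuncExprState's args list. */
typedef struct FuncArgArray
{
    int         nargs;
    ExprState **args;           /* flat array instead of a List of cells */
} FuncArgArray;

static Datum
ExecMakeFunctionResultArrayArgs(FmgrInfo *flinfo, FuncArgArray *fa,
                                ExprContext *econtext, bool check_nulls,
                                bool *isNull)
{
    FunctionCallInfoData fcinfo;
    Datum       result;
    int         i;

    InitFunctionCallInfoData(fcinfo, flinfo, fa->nargs, InvalidOid,
                             NULL, NULL);

    for (i = 0; i < fa->nargs; i++)
    {
        fcinfo.arg[i] = ExecEvalExpr(fa->args[i], econtext,
                                     &fcinfo.argnull[i], NULL);

        /*
         * Strict-function fast exit; skipped entirely when the caller could
         * prove that no NULL can appear here (check_nulls == false).
         */
        if (check_nulls && flinfo->fn_strict && fcinfo.argnull[i])
        {
            *isNull = true;
            return (Datum) 0;
        }
    }

    fcinfo.isnull = false;
    result = FunctionCallInvoke(&fcinfo);
    *isNull = fcinfo.isnull;
    return result;
}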

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Academic help for Postgres

2016-05-11 Thread Bruce Momjian
On Wed, May 11, 2016 at 05:31:10PM +0300, Konstantin Knizhnik wrote:
> 
> 
> On 11.05.2016 17:20, Bruce Momjian wrote:
> >I am giving a keynote at an IEEE database conference in Helsinki next
> >week (http://icde2016.fi/).  (Yes, I am not attending PGCon Ottawa
> >because I accepted the Helsinki conference invitation before the PGCon
> >Ottawa date was changed from June to May).
> >
> >As part of the keynote, I would like to mention areas where academia can
> >help us.  The topics I can think of are:
> >
> > Query optimization
> > Optimizer statistics
> > Indexing structures
> > Reducing function call overhead
> > CPU locality
> > Sorting
> > Parallelism
> > Sharding
> >
> >Any others?
> >
> Incremental materialized views?

I don't know.  Is that something academics would research?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] Academic help for Postgres

2016-05-11 Thread Bruce Momjian
On Wed, May 11, 2016 at 05:24:44PM +0300, Oleg Bartunov wrote:
> On Wed, May 11, 2016 at 5:20 PM, Bruce Momjian  wrote:
> > As part of the keynote, I would like to mention areas where academia can
> > help us.  The topics I can think of are:
> >
> > Query optimization
> > Optimizer statistics
> > Indexing structures
> > Reducing function call overhead
> > CPU locality
> > Sorting
> > Parallelism
> > Sharding
> >
> > Any others?
> 
> machine learning  for adaptive planning

Do these fall in the "Query optimization" item?  Does that need
different text?

> distributed stuff

Oh, yes, distributed transactions.  Good.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] Academic help for Postgres

2016-05-11 Thread Konstantin Knizhnik



On 11.05.2016 17:20, Bruce Momjian wrote:

I am giving a keynote at an IEEE database conference in Helsinki next
week (http://icde2016.fi/).  (Yes, I am not attending PGCon Ottawa
because I accepted the Helsinki conference invitation before the PGCon
Ottawa date was changed from June to May).

As part of the keynote, I would like to mention areas where academia can
help us.  The topics I can think of are:

Query optimization
Optimizer statistics
Indexing structures
Reducing function call overhead
CPU locality
Sorting
Parallelism
Sharding

Any others?


Incremental materialized views?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] ALTER TABLE lock downgrades have broken pg_upgrade

2016-05-11 Thread Bruce Momjian
On Wed, May 11, 2016 at 09:40:09AM -0400, Peter Eisentraut wrote:
> Not that that would be useless, but note that the value in this case (and
> most others) comes from having a candidate object in the database before
> upgrade that exercises the particular problem, mostly independent of what
> version you upgrade from and to.  So far the way to do that is to leave
> "junk" in the regression test database, but that's clearly a bit silly.
> 
> I think the way forward is to create a TAP test suite for pg_upgrade that
> specifically exercises a lot of scenarios with small purpose-built test
> databases.
> 
> Then, the problem of having to compare dump output across versions also goes
> away more easily.

I do have some small tests like for tablespaces.  I am attaching the SQL
script, if that is helpful.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +


pg_upgrade_test.sql
Description: application/sql



Re: [HACKERS] asynchronous and vectorized execution

2016-05-11 Thread Konstantin Knizhnik



On 10.05.2016 20:26, Robert Haas wrote:

At this moment (February) they have implemented translation of only a few
PostgreSQL operators used by ExecQual and do not support aggregates.
They get about a 2x speedup on synthetic queries and a 25% improvement on
TPC-H Q1 (for Q1 the most critical part is generation of native code for
aggregates, because ExecQual itself takes only 6% of the time for this query).
Actually these 25% for Q1 were achieved not by using dynamic code
generation, but by switching from a PULL to a PUSH model in the executor.
It seems to be yet another interesting PostgreSQL executor transformation.
As far as I know, they are going to publish the result of their work as open
source...
Interesting.  You may notice that in "asynchronous mode" my prototype
works using a push model of sorts.  Maybe that should be taken
further.

Latest information from the ISP RAS guys: they have made good progress since 
February. They have rewritten most of the methods of Scan, Aggregate and 
Join to the LLVM API, and have also implemented automatic translation of 
PostgreSQL backend functions to the LLVM API.

As a result, the execution time of the TPC-H Q1 query is reduced four times.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] asynchronous and vectorized execution

2016-05-11 Thread Robert Haas
On Wed, May 11, 2016 at 10:17 AM, Konstantin Knizhnik
 wrote:
> Yes, I agree with you that completely rewriting the optimizer is a huge
> project with unpredictable influence on the performance of some queries.
> Changing things incrementally is a good approach, but only if we are moving
> in the right direction.
> I am still not sure that the introduction of async operations is a step in
> the right direction. Async operations tend to significantly complicate code
> (since you have to maintain state yourself). It will be bad if the
> implementation of each node has to deal with async state itself in its own
> manner.

I don't really think so.  The design I've proposed makes adding
asynchronous capability to a node pretty easy, with only minor
changes.

> My suggestion is to try to provide some generic mechanism for managing state
> transitions and to have a scheduler which controls this process. It should
> not be the responsibility of the node implementation to organize
> asynchronous/parallel execution. Instead, it should just produce a set
> of jobs whose execution is controlled by the scheduler. A first
> implementation of the scheduler can be quite simple, but later it can become
> more clever: try to bind data to processors and do many other optimizations.

Whereas this would require a massive rewrite.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] asynchronous and vectorized execution

2016-05-11 Thread Robert Haas
On Tue, May 10, 2016 at 8:23 PM, Andres Freund  wrote:
>> c. Modify some nodes (perhaps start with nodeAgg.c) to allow them to
>> process a batch TupleTableSlot. This will require some tight loop to
>> aggregate the entire TupleTableSlot at once before returning.
>> d. Add function in execAmi.c which returns true or false depending on
>> if the node supports batch TupleTableSlots or not.
>> e. At executor startup determine if the entire plan tree supports
>> batch TupleTableSlots, if so enable batch scan mode.
>
> It doesn't really need to be the entire tree. Even if you have a subtree
> (say a parametrized index nested loop join) which doesn't support batch
> mode, you'll likely still see performance benefits by building a batch
> one layer above the non-batch-supporting node.

+1.

I've also wondered about building a new executor node that is sort of
a combination of Nested Loop and Hash Join, but capable of performing
multiple joins in a single operation. (Merge Join is different,
because it's actually matching up the two sides, not just doing
probing once per outer tuple.) So the plan tree would look something
like this:

Multiway Join
-> Seq Scan on driving_table
-> Index Scan on something
-> Index Scan on something_else
-> Hash
  -> Seq Scan on other_thing
-> Hash
  -> Seq Scan on other_thing_2
-> Index Scan on another_one

With the current structure, every level of the plan tree has its own
TupleTableSlot and we have to project into each new slot.  Every level
has to go through ExecProcNode.  So it seems to me that this sort of
structure might save quite a few cycles on deep join nests.  I haven't
tried it, though.

With batching, things get even better for this sort of thing.
Assuming the joins are all basically semi-joins, either because they
were written that way or because they are probing unique indexes or
whatever, you can fetch a batch of tuples from the driving table, do
the first join for each tuple to create a matching batch of tuples,
and repeat for each join step.  Then at the end you project.
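
Purely schematically, the batched probing loop could look something like
this (JoinStep and join_step_probe are invented names for illustration,
not proposed APIs):

/*
 * Schematic sketch: push one batch of outer tuples through successive
 * semi-join steps; only survivors move on, and projection happens once.
 */
typedef struct JoinStep JoinStep;   /* hash table or index probe state */

extern bool join_step_probe(JoinStep *step, TupleTableSlot *slot);

static int
probe_batch(TupleTableSlot **batch, int ntuples,
            JoinStep *steps, int nsteps)
{
    int         s;

    for (s = 0; s < nsteps; s++)
    {
        int         kept = 0;
        int         i;

        for (i = 0; i < ntuples; i++)
        {
            /* keep only tuples that find a match at this join step */
            if (join_step_probe(&steps[s], batch[i]))
                batch[kept++] = batch[i];
        }
        ntuples = kept;
        if (ntuples == 0)
            break;              /* the whole batch was eliminated */
    }

    return ntuples;             /* project the survivors at the very end */
}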

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] Academic help for Postgres

2016-05-11 Thread Bruce Momjian
I am giving a keynote at an IEEE database conference in Helsinki next
week (http://icde2016.fi/).  (Yes, I am not attending PGCon Ottawa
because I accepted the Helsinki conference invitation before the PGCon
Ottawa date was changed from June to May).

As part of the keynote, I would like to mention areas where academia can
help us.  The topics I can think of are:

Query optimization
Optimizer statistics
Indexing structures
Reducing function call overhead
CPU locality
Sorting
Parallelism
Sharding

Any others?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Ancient Roman grave inscription +




Re: [HACKERS] asynchronous and vectorized execution

2016-05-11 Thread Konstantin Knizhnik



On 11.05.2016 17:00, Robert Haas wrote:

On Tue, May 10, 2016 at 3:42 PM, Konstantin Knizhnik
 wrote:

Doesn't this actually mean that we need to have a normal job scheduler which
is given a queue of jobs and, having some pool of threads, will be able to
organize efficient execution of queries? The optimizer can build a pipeline
(graph) of tasks which correspond to execution plan nodes, i.e. SeqScan,
Sort, ... Each task is split into several jobs which can be concurrently
scheduled by the task dispatcher.  So you will not have a blocked worker
waiting for something, and all system resources will be utilized. Such an
approach with a dispatcher allows implementing quotas, priorities, ... Also,
the dispatcher can take care of NUMA and cache optimizations, which is
especially critical on modern architectures. One more reference:
http://db.in.tum.de/~leis/papers/morsels.pdf

I read this as a proposal to redesign the entire optimizer and
executor to use some new kind of plan.  That's not a project I'm
willing to entertain; it is hard to imagine we could do it in a
reasonable period of time without introducing bugs and performance
regressions.  I think there is a great deal of performance benefit
that we can get by changing things incrementally.

Yes, I agree with you that completely rewriting the optimizer is a huge 
project with unpredictable influence on the performance of some queries.
Changing things incrementally is a good approach, but only if we are 
moving in the right direction.
I am still not sure that the introduction of async operations is a step in 
the right direction. Async operations tend to significantly complicate code 
(since you have to maintain state yourself). It will be bad if the 
implementation of each node has to deal with async state itself in its own 
manner.


My suggestion is to try to provide some generic mechanism for managing 
state transitions and to have a scheduler which controls this process. It 
should not be the responsibility of the node implementation to organize 
asynchronous/parallel execution. Instead, it should just produce a set of 
jobs whose execution is controlled by the scheduler. A first implementation 
of the scheduler can be quite simple, but later it can become more clever: 
try to bind data to processors and do many other optimizations.
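
Purely to illustrate what I mean (every name below is hypothetical and
this is not working code): nodes would only enqueue jobs, and the
scheduler alone would decide where and when they run.

/* Hypothetical job/scheduler split, sketched for illustration only. */
typedef struct ExecJob
{
    struct ExecJob *next;
    PlanState  *node;                   /* plan node this job advances */
    void       *arg;                    /* e.g. block range or partition */
    void        (*run) (struct ExecJob *job);
} ExecJob;

typedef struct ExecScheduler
{
    ExecJob    *head;                   /* runnable jobs */
    ExecJob    *tail;
    int         nworkers;               /* worker pool size */
} ExecScheduler;

/* A node implementation would only ever call this... */
static void
sched_enqueue(ExecScheduler *sched, ExecJob *job)
{
    job->next = NULL;
    if (sched->tail)
        sched->tail->next = job;
    else
        sched->head = job;
    sched->tail = job;
}

/* ...while the scheduler decides about workers, NUMA binding, quotas, etc. */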




--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors)

2016-05-11 Thread Teodor Sigaev

 Allow Pin/UnpinBuffer to operate in a lockfree manner.
I get the errors:

ERROR:  attempted to delete invisible tuple
ERROR:  unexpected chunk number 1 (expected 2) for toast value


Just a reminder: you are investigating the "unexpected chunk number" problem, 
but it seems we have another bug (the first ERROR: attempted to delete 
invisible tuple). IMHO, it's a separate bug, not related to oids. 
Unfortunately, I've never seen such an error on my notebook.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/




Re: [HACKERS] asynchronous and vectorized execution

2016-05-11 Thread Robert Haas
On Tue, May 10, 2016 at 7:57 PM, Andres Freund  wrote:
>> 1. asynchronous execution, by which I mean the ability of a node to
>> somehow say that it will generate a tuple eventually, but is not yet
>> ready, so that the executor can go run some other part of the plan
>> tree while it waits.  [...].  It is also a problem
>> for parallel query: in a parallel sequential scan, the next worker can
>> begin reading the next block even if the current block hasn't yet been
>> received from the OS.  Whether or not this will be efficient is a
>> research question, but it can be done.  However, imagine a parallel
>> scan of a btree index: we don't know what page to scan next until we
>> read the previous page and examine the next-pointer.  In the meantime,
>> any worker that arrives at that scan node has no choice but to block.
>> It would be better if the scan node could instead say "hey, thanks for
>> coming but I'm really not ready to be on-CPU just at the moment" and
>> potentially allow the worker to go work in some other part of the
>> query tree.  For that worker to actually find useful work to do
>> elsewhere, we'll probably need it to be the case either that the table
>> is partitioned or the original query will need to involve UNION ALL,
>> but those are not silly cases to worry about, particularly if we get
>> native partitioning in 9.7.
>
> I have to admit I'm not that convinced about the speedups in the !fdw
> case. There seem to be a lot of easier avenues for performance
> improvements.

What I'm talking about is a query like this:

SELECT * FROM inheritance_tree_of_foreign_tables WHERE very_rarely_true;

What we do today is run the remote query on the first child table to
completion, then start it on the second child table, and so on.
Sending all the queries at once can bring a speed-up of a factor of N
to a query with N children, and it's completely independent of every
other speed-up that we might attempt.  This has been under discussion
for years on FDW-related threads as a huge problem that we need to fix
someday, and I really don't see how it's sane not to try.  The shape
of what that looks like is of course arguable, but saying the
optimization isn't valuable blows my mind.

Whether you care about this case or not, this is also important for
parallel query.
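
At the libpq level, the shape of "send them all first, then collect" is
roughly the following (a sketch only: connection setup, error handling
and the per-child SQL are elided, and the helper name is made up):

/* Sketch: start the remote query on every child connection, then drain. */
static void
run_children_async(PGconn **conns, int nconns, const char *sql)
{
    int         i;

    /* fire off all remote queries without waiting for any of them */
    for (i = 0; i < nconns; i++)
    {
        if (!PQsendQuery(conns[i], sql))
            elog(ERROR, "could not send query: %s",
                 PQerrorMessage(conns[i]));
    }

    /* collect results; a real implementation would wait on a WaitEventSet */
    for (i = 0; i < nconns; i++)
    {
        PGresult   *res;

        while ((res = PQgetResult(conns[i])) != NULL)
        {
            /* process or forward the rows here */
            PQclear(res);
        }
    }
}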

> FWIW, I've even hacked something up for a bunch of simple queries, and
> the performance improvements were significant.  Besides it only being a
> weekend hack project, the big thing I got stuck on was considering how
> to exactly determine when to batch and not to batch.

Yeah.  I think we need a system for signalling nodes as to when they
will be run to completion.  But a Boolean is somehow unsatisfying;
LIMIT 100 is more like no LIMIT than it is like LIMIT 1.  I'm
tempted to add a numTuples field to every ExecutorState and give upper
nodes some way to set it, as a hint.
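
Something as small as this might already be enough as a starting point
(all names here are hypothetical, just to show the intent):

/* Hypothetical executor-wide hint block; nothing like this exists today. */
typedef struct ExecRunHint
{
    double      tuples_needed;  /* tuples the consumer expects to pull;
                                 * 0 means "run to completion" */
} ExecRunHint;

/* An upper node such as Limit would fill it in before execution starts... */
static void
set_run_hint(ExecRunHint *hint, double ntuples)
{
    hint->tuples_needed = ntuples;
}

/*
 * ...and a lower node would consult it, e.g. to pick a top-N sort or to
 * decide how aggressively to batch.
 */
static bool
run_to_completion(const ExecRunHint *hint)
{
    return (hint == NULL || hint->tuples_needed <= 0);
}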

>> For asynchronous execution, I have gone so far as to mock up a bit of
>> what this might look like.  This shouldn't be taken very seriously at
>> this point, but I'm attaching a few very-much-WIP patches to show the
>> direction of my line of thinking.  Basically, I propose to have
>> ExecBlah (that is, ExecBitmapHeapScan, ExecAppend, etc.) return tuples
>> by putting them into a new PlanState member called "result", which is
>> just a Node * so that we can support multiple types of results,
>> instead of returning them.
>
> What different types of results are you envisioning?

TupleTableSlots and TupleTableVectors, mostly.  I think the stuff that
is currently going through MultiExecProcNode() could probably be
folded in as just another type of result.
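
For what it's worth, the batch result could be as simple as something
like this (illustrative only; no such node type exists today):

/* Illustrative sketch of a batch result type; not an existing structure. */
typedef struct TupleTableVector
{
    NodeTag     type;               /* so it can hang off PlanState->result */
    int         ntuples;            /* slots actually filled */
    int         maxtuples;          /* allocated capacity of the batch */
    TupleTableSlot **slots;         /* one entry per returned tuple */
} TupleTableVector;

/*
 * A node would then publish either a single tuple or a whole batch:
 *
 *     node->result = (Node *) slot;        -- one tuple at a time
 *     node->result = (Node *) vector;      -- a batch
 *
 * and the parent would dispatch on nodeTag(node->result).
 */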

>> Some care is required here because any
>> functions we execute as scan keys are run with the buffer locked, so
>> we had better not run anything very complicated.  But doing this for
>> simple things like integer equality operators seems like it could save
>> quite a few buffer lock/unlock cycles and some other executor overhead
>> as well.
>
> Hm. Do we really have to keep the page locked in the page-at-a-time
> mode? Shouldn't the pin suffice?

I think we need a lock to examine MVCC visibility information.  A pin
is enough to prevent a tuple from being removed, but not from having
its xmax and cmax overwritten at almost but not quite exactly the same
time.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] asynchronous and vectorized execution

2016-05-11 Thread Robert Haas
On Tue, May 10, 2016 at 3:42 PM, Konstantin Knizhnik
 wrote:
> Doesn't this actually mean that we need to have a normal job scheduler which
> is given a queue of jobs and, having some pool of threads, will be able to
> organize efficient execution of queries? The optimizer can build a pipeline
> (graph) of tasks which correspond to execution plan nodes, i.e. SeqScan,
> Sort, ... Each task is split into several jobs which can be concurrently
> scheduled by the task dispatcher.  So you will not have a blocked worker
> waiting for something, and all system resources will be utilized. Such an
> approach with a dispatcher allows implementing quotas, priorities, ... Also,
> the dispatcher can take care of NUMA and cache optimizations, which is
> especially critical on modern architectures. One more reference:
> http://db.in.tum.de/~leis/papers/morsels.pdf

I read this as a proposal to redesign the entire optimizer and
executor to use some new kind of plan.  That's not a project I'm
willing to entertain; it is hard to imagine we could do it in a
reasonable period of time without introducing bugs and performance
regressions.  I think there is a great deal of performance benefit
that we can get by changing things incrementally.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] asynchronous and vectorized execution

2016-05-11 Thread Robert Haas
On Tue, May 10, 2016 at 4:57 PM, Jim Nasby  wrote:
> Even so, I would think that the simplification in the executor would be
> worth it. If you need to add a new node there's dozens of places where you
> might have to mess with these giant case statements.

Dozens? I think the number is in the single digits.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Add jsonb_compact(...) for whitespace-free jsonb to text

2016-05-11 Thread Jim Nasby

On 4/29/16 8:56 AM, Shulgin, Oleksandr wrote:

It would probably make sense to model this function after Python's
"dump-to-JSON-string"
function: https://docs.python.org/2/library/json.html#json.dumps, with
optional parameters for sorting the keys, indentation size and
punctuation.  This way all the prettiness enhancements could be
contained in a single function without the need for a generalized interface
used in many places.


+1. I've found the output functions of json.dumps to be very handy.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461




Re: [HACKERS] ALTER TABLE lock downgrades have broken pg_upgrade

2016-05-11 Thread Peter Eisentraut

On 5/3/16 1:25 PM, Alvaro Herrera wrote:

If we can put together a script that runs test.sh for various versions
and then verifies the runs, we could use it in both buildfarm and
coverage.


Not that that would be useless, but note that the value in this case 
(and most others) comes from having a candidate object in the database 
before upgrade that exercises the particular problem, mostly independent 
of what version you upgrade from and to.  So far the way to do that is 
to leave "junk" in the regression test database, but that's clearly a 
bit silly.


I think the way forward is to create a TAP test suite for pg_upgrade 
that specifically exercises a lot of scenarios with small purpose-built 
test databases.


Then, the problem of having to compare dump output across versions also 
goes away more easily.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[HACKERS] Minor documentation patch

2016-05-11 Thread Martín Marqués
Hi,

Yesterday I was doing some consultancy work and went to check some
syntax for CREATE FUNCTION, particularly the part related to SECURITY DEFINER.

Reading there I saw a paragraph which had a sentence that wasn't very
clear at first.

The patch's description gives a better idea of the change and how I got
there, and I believe it gives better meaning to the sentence in question.

I applied the same change on another part which had the same phrase.

Cheers,

-- 
Martín Marquéshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
>From fbf6b9f6df20d38b5f16c6af94424042b41d7fad Mon Sep 17 00:00:00 2001
From: Martin 
Date: Tue, 10 May 2016 21:31:24 -0300
Subject: [PATCH] While reading the CREATE FUNCTION reference docs for some
 reference on SECURITY DEFINER usage I ran on this phrase:

Particularly important in this regard is the temporary-table schema,
which is searched first by default, and is normally writeable by anyone.
A secure arrangement can be had by forcing the temporary schema to be
searched last.

The last sentence there was not clear at first. I feel that the word
*obtained* instead of *had* gives a more clear understanding.

I found a similar phase in the PL/PgSQL documentation as well, and
so applied the same fix.
---
 doc/src/sgml/plpgsql.sgml | 2 +-
 doc/src/sgml/ref/create_function.sgml | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/plpgsql.sgml b/doc/src/sgml/plpgsql.sgml
index a27bbc5..4ecd9e3 100644
--- a/doc/src/sgml/plpgsql.sgml
+++ b/doc/src/sgml/plpgsql.sgml
@@ -528,7 +528,7 @@ $$ LANGUAGE plpgsql;
  
 
  
-  The same effect can be had by declaring one or more output parameters as
+  The same effect can be obtained by declaring one or more output parameters as
   polymorphic types.  In this case the
   special $0 parameter is not used; the output
   parameters themselves serve the same purpose.  For example:
diff --git a/doc/src/sgml/ref/create_function.sgml b/doc/src/sgml/ref/create_function.sgml
index bd11d2b..583cdf5 100644
--- a/doc/src/sgml/ref/create_function.sgml
+++ b/doc/src/sgml/ref/create_function.sgml
@@ -715,7 +715,7 @@ SELECT * FROM dup(42);
 malicious users from creating objects that mask objects used by the
 function.  Particularly important in this regard is the
 temporary-table schema, which is searched first by default, and
-is normally writable by anyone.  A secure arrangement can be had
+is normally writable by anyone.  A secure arrangement can be obtained
 by forcing the temporary schema to be searched last.  To do this,
 write pg_temppg_tempsecuring functions as the last entry in search_path.
 This function illustrates safe usage:
-- 
2.5.5




Re: [HACKERS] Use %u to print user mapping's umid and userid

2016-05-11 Thread Ashutosh Bapat
On Wed, May 11, 2016 at 1:34 PM, Etsuro Fujita 
wrote:

> On 2016/05/11 16:49, Ashutosh Bapat wrote:
>
>> The patch is calculating user mapping when it's readily available
>> through RelOptInfo::fdw_private. That incurs a catalog lookup
>> unnecessarily. Instead, can we add new function makeOid, oidVal on the
>> lines of makeInteger and intVal to store and retrieve an OID resp. and
>> also corresponding print function? It might be helpful in future.
>>
>
> That might be an idea, but is the overhead in that re-calculation so large?
>
>
A call to GetForeignTable would incur a catalog lookup, which means a
catalog table/index scan if the corresponding entry is not in the cache. This
is followed by GetUserMapping(), which is another catalog access. That's
bound to be more expensive than a makeOid()/oidVal() call.
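
To be concrete, what I have in mind is something along these lines,
modeled on makeInteger()/intVal(); the T_OidValue tag and the storage
choice are assumptions for illustration, not existing code:

/* Sketch only, modeled on makeInteger()/intVal() in nodes/value.h|c. */
Value *
makeOid(Oid o)
{
    Value      *v = makeNode(Value);

    v->type = T_OidValue;           /* hypothetical new tag; one could also
                                     * just reuse the T_Integer storage */
    v->val.ival = (long) o;
    return v;
}

#define oidVal(v)       ((Oid) (((Value *) (v))->val.ival))

/* outfuncs.c would then print it with %u rather than %ld. */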

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Use %u to print user mapping's umid and userid

2016-05-11 Thread Etsuro Fujita

On 2016/05/11 16:49, Ashutosh Bapat wrote:

The patch is calculating user mapping when it's readily available
through RelOptInfo::fdw_private. That incurs a catalog lookup
unnecessarily. Instead, can we add new function makeOid, oidVal on the
lines of makeInteger and intVal to store and retrieve an OID resp. and
also corresponding print function? It might be helpful in future.


That might be an idea, but is the overhead in that re-calculation so large?

Best regards,
Etsuro Fujita






Re: [HACKERS] Use %u to print user mapping's umid and userid

2016-05-11 Thread Ashutosh Bapat
On Wed, May 11, 2016 at 1:10 PM, Etsuro Fujita 
wrote:

> On 2016/05/10 16:56, Etsuro Fujita wrote:
>
>> Here is a patch to fix this.
>>
>
> I found that the previous patch handles the ForeignScan's fs_relids
> Bitmapset destructively.  Also, I noticed that I removed some existing
> comments inadvertently.  So, I'm attaching the updated patch to fix those
> things.  I'll add this to the next CF.  I think this should be addressed in
> advance of the release of 9.6, though.
>
>
The patch is calculating the user mapping when it's readily available through
RelOptInfo::fdw_private. That incurs a catalog lookup unnecessarily.
Instead, can we add new functions makeOid and oidVal, on the lines of
makeInteger and intVal, to store and retrieve an OID respectively, and also a
corresponding print function? It might be helpful in the future.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Odd oid-system-column handling in postgres_fdw

2016-05-11 Thread Etsuro Fujita

On 2016/04/05 17:15, Etsuro Fujita wrote:

On 2016/03/16 16:25, Etsuro Fujita wrote:

PG9.5 allows us to add an oid system column to foreign tables, using
ALTER FOREIGN TABLE SET WITH OIDS, but currently, that column reads as
zeroes in postgres_fdw.  That seems to me like a bug.  So, I'd like to
propose to fix that, by retrieving that column from the remote server
when requested.  I'm attaching a proposed patch for that.



I rebased the patch against HEAD.  Updated patch attached.


I'll add this to the next CF.

Best regards,
Etsuro Fujita






Re: [HACKERS] Use %u to print user mapping's umid and userid

2016-05-11 Thread Etsuro Fujita

On 2016/05/10 16:56, Etsuro Fujita wrote:

Here is a patch to fix this.


I found that the previous patch handles the ForeignScan's fs_relids 
Bitmapset destructively.  Also, I noticed that I removed some existing 
comments inadvertently.  So, I'm attaching the updated patch to fix 
those things.  I'll add this to the next CF.  I think this should be 
addressed in advance of the release of 9.6, though.


Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***
*** 67,74  enum FdwScanPrivateIndex
  	FdwScanPrivateRetrievedAttrs,
  	/* Integer representing the desired fetch_size */
  	FdwScanPrivateFetchSize,
- 	/* Oid of user mapping to be used while connecting to the foreign server */
- 	FdwScanPrivateUserMappingOid,
  
  	/*
  	 * String describing join i.e. names of relations being joined and types
--- 67,72 
***
*** 1198,1208  postgresGetForeignPlan(PlannerInfo *root,
  	 * Build the fdw_private list that will be available to the executor.
  	 * Items in the list must match order in enum FdwScanPrivateIndex.
  	 */
! 	fdw_private = list_make5(makeString(sql.data),
  			 remote_conds,
  			 retrieved_attrs,
! 			 makeInteger(fpinfo->fetch_size),
! 			 makeInteger(foreignrel->umid));
  	if (foreignrel->reloptkind == RELOPT_JOINREL)
  		fdw_private = lappend(fdw_private,
  			  makeString(fpinfo->relation_name->data));
--- 1196,1205 
  	 * Build the fdw_private list that will be available to the executor.
  	 * Items in the list must match order in enum FdwScanPrivateIndex.
  	 */
! 	fdw_private = list_make4(makeString(sql.data),
  			 remote_conds,
  			 retrieved_attrs,
! 			 makeInteger(fpinfo->fetch_size));
  	if (foreignrel->reloptkind == RELOPT_JOINREL)
  		fdw_private = lappend(fdw_private,
  			  makeString(fpinfo->relation_name->data));
***
*** 1234,1240  postgresBeginForeignScan(ForeignScanState *node, int eflags)
--- 1231,1241 
  	ForeignScan *fsplan = (ForeignScan *) node->ss.ps.plan;
  	EState	   *estate = node->ss.ps.state;
  	PgFdwScanState *fsstate;
+ 	RangeTblEntry *rte;
+ 	Oid			userid;
+ 	ForeignTable *table;
  	UserMapping *user;
+ 	int			rtindex;
  	int			numParams;
  
  	/*
***
*** 1256,1285  postgresBeginForeignScan(ForeignScanState *node, int eflags)
  	 * planning to ensure that the join is safe to pushdown. In case the
  	 * information goes stale between planning and execution, plan will be
  	 * invalidated and replanned.
  	 */
  	if (fsplan->scan.scanrelid > 0)
  	{
- 		ForeignTable *table;
- 
  		/*
! 		 * Identify which user to do the remote access as.  This should match
! 		 * what ExecCheckRTEPerms() does.
  		 */
! 		RangeTblEntry *rte = rt_fetch(fsplan->scan.scanrelid, estate->es_range_table);
! 		Oid			userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
! 
! 		fsstate->rel = node->ss.ss_currentRelation;
! 		table = GetForeignTable(RelationGetRelid(fsstate->rel));
! 
! 		user = GetUserMapping(userid, table->serverid);
  	}
! 	else
! 	{
! 		Oid			umid = intVal(list_nth(fsplan->fdw_private, FdwScanPrivateUserMappingOid));
  
! 		user = GetUserMappingById(umid);
! 		Assert(fsplan->fs_server == user->serverid);
! 	}
  
  	/*
  	 * Get connection to the foreign server.  Connection manager will
--- 1257,1283 
  	 * planning to ensure that the join is safe to pushdown. In case the
  	 * information goes stale between planning and execution, plan will be
  	 * invalidated and replanned.
+ 	 *
+ 	 * This should match what ExecCheckRTEPerms() does.
  	 */
  	if (fsplan->scan.scanrelid > 0)
+ 		rtindex = fsplan->scan.scanrelid;
+ 	else
  	{
  		/*
! 		 * It is ensured that foreign tables appearing in a foreign join
! 		 * belong to the same server and use the same user mapping, so pick
! 		 * the lowest-numbered one as a representative.
  		 */
! 		rtindex = -1;
! 		rtindex = bms_next_member(fsplan->fs_relids, rtindex);
! 		Assert(rtindex > 0);
  	}
! 	rte = rt_fetch(rtindex, estate->es_range_table);
! 	userid = rte->checkAsUser ? rte->checkAsUser : GetUserId();
  
! 	table = GetForeignTable(rte->relid);
! 	user = GetUserMapping(userid, table->serverid);
  
  	/*
  	 * Get connection to the foreign server.  Connection manager will
***
*** 1316,1324  postgresBeginForeignScan(ForeignScanState *node, int eflags)
--- 1314,1328 
  	 * into local representation and error reporting during that process.
  	 */
  	if (fsplan->scan.scanrelid > 0)
+ 	{
+ 		fsstate->rel = node->ss.ss_currentRelation;
  		fsstate->tupdesc = RelationGetDescr(fsstate->rel);
+ 	}
  	else
+ 	{
+ 		fsstate->rel = NULL;
  		fsstate->tupdesc = node->ss.ss_ScanTupleSlot->tts_tupleDescriptor;
+ 	}
  
  	fsstate->attinmeta = TupleDescGetAttInMetadata(fsstate->tupdesc);
  
*** a/src/include/nodes/pg_list.h
--- b/src/include/nodes/pg_list.h
***
*** 134,152  list_length(const List *l)
  #define list_make2(x1

Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW

2016-05-11 Thread Etsuro Fujita

On 2016/04/28 13:45, Michael Paquier wrote:

On Wed, Apr 27, 2016 at 12:16 PM, Etsuro Fujita
 wrote:

On 2016/04/26 21:45, Etsuro Fujita wrote:

While re-reviewing the fix, I noticed that since PQcancel we added to
pgfdw_xact_callback to cancel a DML pushdown query isn't followed by a
ROLLBACK, the connection to the remote server will be discarded at the
end of the while loop in that function, which will cause a FATAL error
of "connection to client lost".  Probably, that was proposed by me in
the first version of the patch, but I don't think that's a good idea.
Shouldn't we execute ROLLBACK after that PQcancel?

Another thing I noticed is, ISTM that we miss the case where DML
pushdown queries are performed in subtransactions.  I think cancellation
logic would also need to be added to pgfdw_subxact_callback.



Attached is a patch for that.



I have spent some time looking at that...

And yeah, losing the connection because of that is a little bit
annoying when there are ways to clean things up, and as a START
TRANSACTION is always sent for such queries it seems really better to
issue a ROLLBACK in any case. Actually, by using PQcancel there is no
way to be sure whether the cancel will be effective or not. So it is
possible that the command still completes correctly, or that it gets
cancelled and returns an ERROR earlier. In either case, doing the
ROLLBACK unconditionally seems appropriate to me, because we had better
clean up the remote state in both cases.


Thanks for the review!

I'll add this to the next CF.  I think this should be addressed in 
advance of the release of 9.6, though.
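
Just to spell out the sequence being discussed, it would look roughly
like this at the libpq level (a simplified sketch; the helper name is
made up, and error handling and result processing are reduced to a
minimum):

/* Simplified sketch: cancel whatever is running, then roll back anyway. */
static void
pgfdw_cancel_and_rollback(PGconn *conn)
{
    PGcancel   *cancel = PQgetCancel(conn);
    PGresult   *res;
    char        errbuf[256];

    if (cancel != NULL)
    {
        if (!PQcancel(cancel, errbuf, sizeof(errbuf)))
            elog(WARNING, "could not send cancel request: %s", errbuf);
        PQfreeCancel(cancel);
    }

    /* Discard any result of the (possibly cancelled) statement... */
    while ((res = PQgetResult(conn)) != NULL)
        PQclear(res);

    /* ...and clean up the remote transaction state unconditionally. */
    res = PQexec(conn, "ROLLBACK TRANSACTION");
    PQclear(res);
}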


Best regards,
Etsuro Fujita






Re: [HACKERS] asynchronous and vectorized execution

2016-05-11 Thread Ants Aasma
On Wed, May 11, 2016 at 3:52 AM, Andres Freund  wrote:
> On 2016-05-11 03:20:12 +0300, Ants Aasma wrote:
>> On Tue, May 10, 2016 at 7:56 PM, Robert Haas  wrote:
>> > On Mon, May 9, 2016 at 8:34 PM, David Rowley
>> >  wrote:
>> > I don't have any at the moment, but I'm not keen on hundreds of new
>> > vector functions that can all have bugs or behavior differences versus
>> > the unvectorized versions of the same code.  That's a substantial tax
>> > on future development.  I think it's important to understand what
>> > sorts of queries we are targeting here.  KaiGai's GPU-acceleration
>> > stuff does great on queries with complex WHERE clauses, but most
>> > people don't care not only because it's out-of-core but because who
>> > actually looks for the records where (a + b) % c > (d + e) * f / g?
>> > This seems like it has the same issue.  If we can speed up common
>> > queries people are actually likely to run, OK, that's interesting.
>>
>> I have seen pretty complex expressions in the projection and
>> aggregation. A couple dozen SUM(CASE WHEN a THEN b*c ELSE MIN(d,e)*f
>> END) type expressions. In critical places I had to replace them with
>> a C-coded function that processed a row at a time to avoid the
>> executor dispatch overhead.
>
> I've seen that as well, but Was it the actual fmgr indirection causing
> the overhead, or was it ExecQual/ExecMakeFunctionResultNoSets et al?

I don't remember what the exact profile looked like, but IIRC it was
mostly Exec* stuff with advance_aggregates also up there.

Regards,
Ants Aasma

