Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-03-13 Thread Noah Misch
[Aside: your new mail editor is rewrapping lines in quoted material, and the
result is messy.  I have rerewrapped one paragraph below.]

On Mon, Mar 14, 2016 at 02:00:03AM +0100, Tomas Vondra wrote:
> On Sun, 2016-03-13 at 18:46 -0400, Noah Misch wrote:
> > I've not attempted to study the behavior on slow hardware.  Instead, my
> > report used stat-coalesce-v1.patch[1] to simulate slow writes.  (That
> > diagnostic patch no longer applies cleanly, so I'm attaching a rebased
> > version.  I've changed the patch name from "stat-coalesce" to
> > "slow-stat-simulate" to more-clearly distinguish it from the
> > "pgstat-coalesce" patch.)  Problems remain after applying your patch;
> > consider "VACUUM pg_am" behavior:
> > 
> > 9.2 w/ stat-coalesce-v1.patch:
> >   VACUUM returns in 3s, stats collector writes each file 1x over 3s
> > HEAD w/ slow-stat-simulate-v2.patch:
> >   VACUUM returns in 3s, stats collector writes each file 5x over 15s
> > HEAD w/ slow-stat-simulate-v2.patch and your patch:
> >   VACUUM returns in 10s, stats collector writes no files over 10s
> 
> Oh damn, the timestamp comparison in pgstat_recv_inquiry should be in
> the opposite direction. After fixing that "VACUUM pg_am" completes in 3
> seconds and writes each file just once.

That fixes things.  "make check" passes under an 8s stats write time.

> --- a/src/backend/postmaster/pgstat.c
> +++ b/src/backend/postmaster/pgstat.c
> @@ -4836,6 +4836,20 @@ pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
>   }
>  
>   /*
> +  * Ignore requests that are already resolved by the last write.
> +  *
> +  * We discard the queue of requests at the end of pgstat_write_statsfiles(),
> +  * so the requests already waiting on the UDP socket at that moment can't
> +  * be discarded in the previous loop.
> +  *
> +  * XXX Maybe this should also care about the clock skew, just like the
> +  * block a few lines down.

Yes, it should.  (The problem is large (>~100s), backward clock resets, not
skew.)  A clock reset causing "msg->clock_time < dbentry->stats_timestamp"
will usually also cause "msg->cutoff_time < dbentry->stats_timestamp".  Such
cases need the correction a few lines down.

The other thing needed here is to look through and update comments about
last_statrequests.  For example, this loop is dead code due to us writing
files as soon as we receive one inquiry:

/*
 * Find the last write request for this DB.  If it's older than the
 * request's cutoff time, update it; otherwise there's nothing to do.
 *
 * Note that if a request is found, we return early and skip the below
 * check for clock skew.  This is okay, since the only way for a DB
 * request to be present in the list is that we have been here since the
 * last write round.
 */
slist_foreach(iter, &last_statrequests) ...

I'm okay keeping the dead code for future flexibility, but the comment should
reflect that.

Thanks,
nm


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [HACKERS] Re: [HACKERS] Re: [HACKERS] Windows service is not starting so there’s message in log: FATAL: "could not create shared memory segment “Global/PostgreSQL.851401618”: Permission

2016-03-13 Thread Amit Kapila
On Fri, Mar 11, 2016 at 5:21 PM, Haribabu Kommi wrote:
>
> On Fri, Mar 11, 2016 at 12:00 AM, Amit Kapila wrote:
>
>
> >> I am not able to find the reason for this error. This error is occurring
> >> only when the PostgreSQL is started as a service only.
> >>
> >
> > Did you use pg_ctl register/unregister to register different services. Can
> > you share the detail steps and OS version on which you saw this behaviour?
>
> Operating system - windows 7
> Binary - PostgreSQL 9.5 (This doesn't matter, 9.4+ can produce the problem)
>
> 1. Create two standard users in the system (test_user1 and test_user2)
> 2. Create two databases belongs each user listed above.
> 3. Now using pg_ctl register the services for the two users.
> 4. Provide logon permissions to these users to run the services by changing
> service properties.

Did you mean to say that you changed "Log on as:" to Local System Account in
the service properties, or something else?

> 5. Now try to start the services, the second service fails with the
> error message.
> 6. Error details can be found out in Event log viewer.
>

If I follow the above steps and do as I mentioned for step 4, I am not able to
reproduce the issue on a Windows 7 machine using code from HEAD.

> Yes, it is working as same user services. The main problem is, PostgreSQL
> as a service for two different users in the same system is not working
> because of same random getting generated for two services.
>

I am not sure why you think the same random number is the problem; as
mentioned above, even if the dsm name is the same due to the same random
number, the code has logic to handle it appropriately (regenerate the name of
the dsm).  Having said that, I don't mean to say that we shouldn't have logic
to generate a unique name, and I think we might want to add the data
directory path to name generation as we do for the main shared memory;
however, it is better to first completely understand the underlying issue.

If I understand correctly, the problem here is that the second user doesn't
have the appropriate access rights to access the object created by the first
user.  On reading the documentation of CreateFileMapping(), it seems that the
user needs the SeCreateGlobalPrivilege privilege to create an object in the
Global namespace.  Can you try giving that privilege to the users you
created?  To grant this privilege, go to Control Panel->System and
Security->Administrative Tools->Local Security Policy->Local Policies->User
Rights Assignment; in the right-hand pane, double-click "Create global
objects" and add the newly created users.  Rerun your test after these steps.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread David Rowley
On 14 March 2016 at 17:05, James Sewell  wrote:
>
> Hi again,
>
> I've been playing around with inheritance combined with this patch. Currently 
> it looks like you are taking max(parallel_degree) from all the child tables 
> and using that for the number of workers.
>
> For large machines it makes much more sense to use sum(parallel_degree) - but 
> I've just seen this comment in the code:
>
> /*
>  * Decide what parallel degree to request for this append path.  For
>  * now, we just use the maximum parallel degree of any member.  It
>  * might be useful to use a higher number if the Append node were
>  * smart enough to spread out the workers, but it currently isn't.
>  */
>
> Does this mean that even though we are aggregating in parallel, we are only 
> operating on one child table at a time currently?

There is nothing in the planner yet, or any patch that I know of to
push the Partial Aggregate node to below an Append node. That will
most likely come in 9.7.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread James Sewell
On Mon, Mar 14, 2016 at 3:05 PM, David Rowley wrote:

>
> Things to try:
> 1. alter table a add column ts_date date; update a set ts_date =
> date_trunc('DAY',ts); vacuum full analyze a;
> 2. or, create index on a (date_trunc('DAY',ts)); analyze a;
> 3. or for testing, set the work_mem higher.
>
>
Ah, that makes sense.

Tried with a BTREE index, and it works perfectly, but the index is 428MB
- which is a bit rough.

Removed that and put a BRIN index on instead; same result for only 48kB - perfect!
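
(For reference, the BRIN version would presumably be something along these
lines; the index name is made up, table and column names as upthread:)

-- ANALYZE gathers statistics on the indexed expression for any expression
-- index, BRIN included, which is what fixes the group-count estimate here.
create index a_day_brin_idx on a using brin (date_trunc('DAY', ts));
analyze a;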

Thanks for the help,

James

-- 


--
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.


Re: [HACKERS] pg_stat_get_progress_info(NULL) blows up

2016-03-13 Thread Amit Langote
On 2016/03/14 10:54, Thomas Munro wrote:
> Hi
> 
> I guess pg_stat_get_progress_info should either be strict (see
> attached) or check for NULL.

Thanks a lot for reporting and the patch.  I think that's an oversight.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread James Sewell
Hi again,

I've been playing around with inheritance combined with this patch.
Currently it looks like you are taking max(parallel_degree) from all the
child tables and using that for the number of workers.

For large machines it makes much more sense to use sum(parallel_degree) -
but I've just seen this comment in the code:

/*
 * Decide what parallel degree to request for this append path.  For
 * now, we just use the maximum parallel degree of any member.  It
 * might be useful to use a higher number if the Append node were
 * smart enough to spread out the workers, but it currently isn't.
 */

Does this mean that even though we are aggregating in parallel, we are only
operating on one child table at a time currently?

Cheers,

James Sewell,
 Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099


On Mon, Mar 14, 2016 at 2:39 PM, James Sewell wrote:

> Cool,
>
> I've been testing how this works with partitioning (which seems to be
> strange, but I'll post separately about that) and something odd seems to be
> going on now with the parallel triggering:
>
> postgres=# create table a as select * from base_p2015_11;
> SELECT 2000
>
> postgres=# select * from a limit 1;
>  ts | count |  a  |  b   |  c   |  d   | e
> +---+-+--+--+--+---
>  2015-11-26 21:10:04.856828 |   860 | 946 | 1032 | 1118 | 1204 |
> (1 row)
>
> postgres-# \d a
>  Table "datamart_owner.a"
>  Column |Type | Modifiers
> +-+---
>  ts | timestamp without time zone |
>  count  | integer |
>  a  | integer |
>  b  | integer |
>  c  | integer |
>  d  | integer |
>  e  | integer |
>
> postgres=# select pg_size_pretty(pg_relation_size('a'));
>  pg_size_pretty
> 
>  1149 MB
>
> postgres=# explain  select sum(count) from a group by date_trunc('DAY',ts);
>   QUERY PLAN
>
> --
>  Finalize GroupAggregate  (cost=218242.96..218254.46 rows=200 width=16)
>Group Key: (date_trunc('DAY'::text, ts))
>->  Sort  (cost=218242.96..218245.96 rows=1200 width=16)
>  Sort Key: (date_trunc('DAY'::text, ts))
>  ->  Gather  (cost=218059.08..218181.58 rows=1200 width=16)
>Number of Workers: 5
>->  Partial HashAggregate  (cost=217059.08..217061.58
> rows=200 width=16)
>  Group Key: date_trunc('DAY'::text, ts)
>  ->  Parallel Seq Scan on a  (cost=0.00..197059.06
> rows=405 width=12)
> (9 rows)
>
> postgres=# analyze a;
>
> postgres=# explain  select sum(count) from a group by date_trunc('DAY',ts);
> QUERY PLAN
> --
>  GroupAggregate  (cost=3164211.55..3564212.03 rows=2024 width=16)
>Group Key: (date_trunc('DAY'::text, ts))
>->  Sort  (cost=3164211.55..3214211.61 rows=2024 width=12)
>  Sort Key: (date_trunc('DAY'::text, ts))
>  ->  Seq Scan on a  (cost=0.00..397059.30 rows=2024 width=12)
> (5 rows)
>
> Unsure what's happening here.
>
>
>
> James Sewell,
> PostgreSQL Team Lead / Solutions Architect
> __
>
>
> Level 2, 50 Queen St, Melbourne VIC 3000
>
> *P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099
>
>
> On Mon, Mar 14, 2016 at 1:31 PM, David Rowley <david.row...@2ndquadrant.com> wrote:
>
>> On 14 March 2016 at 14:52, James Sewell wrote:
>> > One question - how is the upper limit of workers chosen?
>>
>> See create_parallel_paths() in allpaths.c. Basically the bigger the
>> relation (in pages) the more workers will be allocated, up until
>> max_parallel_degree.
>>
>> There is also a comment in that function which states:
>> /*
>> * Limit the degree of parallelism logarithmically based on the size of the
>> * relation.  This probably needs to be a good deal more sophisticated,
>> but we
>> * need something here for now.
>> */
>>
>> So this will likely see some revision at some point, after 9.6.
>>
>> --
>>  David Rowley   http://www.2ndQuadrant.com/
>>  PostgreSQL Development, 24x7 Support, Training & Services
>>
>
>

-- 


--
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. 

Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread David Rowley
On 14 March 2016 at 16:39, James Sewell  wrote:
>
> I've been testing how this works with partitioning (which seems to be 
> strange, but I'll post separately about that) and something odd seems to be 
> going on now with the parallel triggering:
>
> postgres=# create table a as select * from base_p2015_11;
> SELECT 2000
>
> postgres=# explain  select sum(count) from a group by date_trunc('DAY',ts);
>   QUERY PLAN
> --
>  Finalize GroupAggregate  (cost=218242.96..218254.46 rows=200 width=16)
>Group Key: (date_trunc('DAY'::text, ts))
>->  Sort  (cost=218242.96..218245.96 rows=1200 width=16)
>  Sort Key: (date_trunc('DAY'::text, ts))
>  ->  Gather  (cost=218059.08..218181.58 rows=1200 width=16)
>Number of Workers: 5
>->  Partial HashAggregate  (cost=217059.08..217061.58 rows=200 
> width=16)
>  Group Key: date_trunc('DAY'::text, ts)
>  ->  Parallel Seq Scan on a  (cost=0.00..197059.06 
> rows=405 width=12)
> (9 rows)
>
> postgres=# analyze a;
>
> postgres=# explain  select sum(count) from a group by date_trunc('DAY',ts);
> QUERY PLAN
> --
>  GroupAggregate  (cost=3164211.55..3564212.03 rows=2024 width=16)
>Group Key: (date_trunc('DAY'::text, ts))
>->  Sort  (cost=3164211.55..3214211.61 rows=2024 width=12)
>  Sort Key: (date_trunc('DAY'::text, ts))
>  ->  Seq Scan on a  (cost=0.00..397059.30 rows=2024 width=12)
> (5 rows)
>
> Unsure what's happening here.

This just comes down to the fact that PostgreSQL is quite poor at
estimating the number of groups that will be produced by the
expression date_trunc('DAY',ts). Due to lack of stats when you run the
query before ANALYZE, PostgreSQL just uses a hardcoded guess of 200,
which it thinks will fit quite nicely in the HashAggregate node's hash
table. After you run ANALYZE this estimate goes up to 2024, and
the grouping planner thinks that's a little too much to be storing in a
hash table, based on the size of your work_mem setting, so it
uses a Sort plan instead.

Things to try:
1. alter table a add column ts_date date; update a set ts_date =
date_trunc('DAY',ts); vacuum full analyze a;
2. or, create index on a (date_trunc('DAY',ts)); analyze a;
3. or for testing, set the work_mem higher.

So, basically, it's no fault of this patch. It's just that there's no
really good way for the planner to estimate something like
date_trunc('DAY',ts) without either adding a column which explicitly
stores that value (1), or collecting stats on the expression (2), or
teaching the planner about the internals of that function, which is
likely just never going to happen. Option (3) is just going to make the
outlook of a hash plan look a little brighter, although you'll likely
need a work_mem of over 1GB to make the plan change.
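
For what it's worth, option (2) would look something like the following (a
sketch using the table and column names from the plans above; the index
name is made up):

-- The expression index makes ANALYZE collect statistics on
-- date_trunc('DAY', ts), so the planner gets a real group-count estimate
-- instead of the default guess of 200.
create index a_day_idx on a (date_trunc('DAY', ts));
analyze a;
explain select sum(count) from a group by date_trunc('DAY', ts);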

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Upper planner pathification

2016-03-13 Thread Kouhei Kaigai
> -Original Message-
> From: Petr Jelinek [mailto:p...@2ndquadrant.com]
> Sent: Monday, March 14, 2016 12:18 PM
> To: Kaigai Kouhei(海外 浩平); Tom Lane; David Rowley
> Cc: Robert Haas; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] WIP: Upper planner pathification
> 
> On 14/03/16 02:43, Kouhei Kaigai wrote:
> >
> > CustomPath node is originally expected to generate various kind of plan
> > node, not only scan/join, and its interface is designed to support them.
> > For example, we can expect a CustomPath that generates "CustomSort".
> >
> > On the other hands, upper path consideration is more variable than the
> > case of scan/join path consideration. Probably, we can have no centralized
> > point to add custom-paths for sort, group-by, ...
> > So, I think we have hooks for each (supported) upper path workload.
> >
> > In case of sorting for example, the best location of the hook is just
> > above of the Assert() in the create_ordered_paths(). It allows to compare
> > estimated cost between SortPath and CustomPath.
> > However, it does not allow to inject CustomPath(for sort) into the path
> > node that may involve sorting, like WindowPath or AggPath.
> > Thus, another hook may be put on create_window_paths and
> > create_grouping_paths in my thought.
> >
> > Some other good idea?
> >
> > Even though I couldn't check the new planner implementation entirely,
> > it seems to be the points below are good candidate to inject CustomPath
> > (and potentially ForeignScan).
> >
> > - create_grouping_paths
> > - create_window_paths
> > - create_distinct_paths
> > - create_ordered_paths
> > - just below of the create_modifytable_path
> >(may be valuable for foreign-update pushdown)
> >
> 
> To me that seems too low inside the planning tree, perhaps adding it
> just to the subquery_planner before SS_identify_outer_params would be
> better, that's the place where you see the path for the whole (sub)query
> so you can search and modify what you need from there.
>
Thanks for your idea. Yes, I also considered a similar point: the place
where all the path consideration has been completed. It would indeed allow
an extension to walk down the path tree and replace parts of it.
However, when we want to inject a CustomPath underneath the built-in paths,
the extension has to recalculate the cost of the built-in paths again,
and that may affect the selection of the built-in paths.
So I concluded that it is not realistic to re-implement the equivalent
upper-planning work on the extension side if we put the hook after all
the planning work is done.

If an extension can add its CustomPath at create_grouping_paths(), the
later steps, like create_window_paths, build on the estimated cost of
that CustomPath. Thus the extension doesn't need to know the details of
the entire upper planning.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread James Sewell
Cool,

I've been testing how this works with partitioning (which seems to be
strange, but I'll post separately about that) and something odd seems to be
going on now with the parallel triggering:

postgres=# create table a as select * from base_p2015_11;
SELECT 2000

postgres=# select * from a limit 1;
 ts | count |  a  |  b   |  c   |  d   | e
+---+-+--+--+--+---
 2015-11-26 21:10:04.856828 |   860 | 946 | 1032 | 1118 | 1204 |
(1 row)

postgres-# \d a
 Table "datamart_owner.a"
 Column |Type | Modifiers
+-+---
 ts | timestamp without time zone |
 count  | integer |
 a  | integer |
 b  | integer |
 c  | integer |
 d  | integer |
 e  | integer |

postgres=# select pg_size_pretty(pg_relation_size('a'));
 pg_size_pretty

 1149 MB

postgres=# explain  select sum(count) from a group by date_trunc('DAY',ts);
  QUERY PLAN
--
 Finalize GroupAggregate  (cost=218242.96..218254.46 rows=200 width=16)
   Group Key: (date_trunc('DAY'::text, ts))
   ->  Sort  (cost=218242.96..218245.96 rows=1200 width=16)
 Sort Key: (date_trunc('DAY'::text, ts))
 ->  Gather  (cost=218059.08..218181.58 rows=1200 width=16)
   Number of Workers: 5
   ->  Partial HashAggregate  (cost=217059.08..217061.58
rows=200 width=16)
 Group Key: date_trunc('DAY'::text, ts)
 ->  Parallel Seq Scan on a  (cost=0.00..197059.06
rows=405 width=12)
(9 rows)

postgres=# analyze a;

postgres=# explain  select sum(count) from a group by date_trunc('DAY',ts);
QUERY PLAN
--
 GroupAggregate  (cost=3164211.55..3564212.03 rows=2024 width=16)
   Group Key: (date_trunc('DAY'::text, ts))
   ->  Sort  (cost=3164211.55..3214211.61 rows=2024 width=12)
 Sort Key: (date_trunc('DAY'::text, ts))
 ->  Seq Scan on a  (cost=0.00..397059.30 rows=2024 width=12)
(5 rows)

Unsure what's happening here.



James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099


On Mon, Mar 14, 2016 at 1:31 PM, David Rowley wrote:

> On 14 March 2016 at 14:52, James Sewell  wrote:
> > One question - how is the upper limit of workers chosen?
>
> See create_parallel_paths() in allpaths.c. Basically the bigger the
> relation (in pages) the more workers will be allocated, up until
> max_parallel_degree.
>
> There is also a comment in that function which states:
> /*
> * Limit the degree of parallelism logarithmically based on the size of the
> * relation.  This probably needs to be a good deal more sophisticated, but
> we
> * need something here for now.
> */
>
> So this will likely see some revision at some point, after 9.6.
>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>

-- 


--
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.


[HACKERS] Obsolete comment in postgres_fdw.c

2016-03-13 Thread Etsuro Fujita
Hi,

Here are the comments for foreign_join_ok() in postgres_fdw.c:

/*
 * Assess whether the join between inner and outer relations can be pushed down
 * to the foreign server. As a side effect, save information we obtain in this
 * function to PgFdwRelationInfo passed in.
 *
 * Joins that satisfy conditions below are safe to push down.
 *
 * 1) Join type is INNER or OUTER (one of LEFT/RIGHT/FULL)
 * 2) Both outer and inner portions are safe to push-down
 * 3) All foreign tables in the join belong to the same foreign server and use
 *    the same user mapping.
 * 4) All join conditions are safe to push down
 * 5) No relation has local filter (this can be relaxed for INNER JOIN, if we
 *    can move unpushable clauses upwards in the join tree).
 */

Condition 3 is now checked by the core code, so I'd like to remove that
condition from the above comment.

In addition, I'd like to update some related comments in
src/include/nodes/relation.h and src/backend/optimizer/path/joinpath.c.

Attached is a patch for that.

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***
*** 3345,3354  postgresImportForeignSchema(ImportForeignSchemaStmt *stmt, Oid serverOid)
   *
   * 1) Join type is INNER or OUTER (one of LEFT/RIGHT/FULL)
   * 2) Both outer and inner portions are safe to push-down
!  * 3) All foreign tables in the join belong to the same foreign server and use
!  *	  the same user mapping.
!  * 4) All join conditions are safe to push down
!  * 5) No relation has local filter (this can be relaxed for INNER JOIN, if we
   *	  can move unpushable clauses upwards in the join tree).
   */
  static bool
--- 3345,3352 
   *
   * 1) Join type is INNER or OUTER (one of LEFT/RIGHT/FULL)
   * 2) Both outer and inner portions are safe to push-down
!  * 3) All join conditions are safe to push down
!  * 4) No relation has local filter (this can be relaxed for INNER JOIN, if we
   *	  can move unpushable clauses upwards in the join tree).
   */
  static bool
*** a/src/backend/optimizer/path/joinpath.c
--- b/src/backend/optimizer/path/joinpath.c
***
*** 213,219  add_paths_to_joinrel(PlannerInfo *root,
  
  	/*
  	 * 5. If inner and outer relations are foreign tables (or joins) belonging
! 	 * to the same server, give the FDW a chance to push down joins.
  	 */
  	if (joinrel->fdwroutine &&
  		joinrel->fdwroutine->GetForeignJoinPaths)
--- 213,220 
  
  	/*
  	 * 5. If inner and outer relations are foreign tables (or joins) belonging
! 	 * to the same server and using the same user mapping, give the FDW a
! 	 * chance to push down joins.
  	 */
  	if (joinrel->fdwroutine &&
  		joinrel->fdwroutine->GetForeignJoinPaths)
*** a/src/include/nodes/relation.h
--- b/src/include/nodes/relation.h
***
*** 416,421  typedef struct PlannerInfo
--- 416,422 
   * all belong to the same foreign server, these fields will be set:
   *
   *		serverid - OID of foreign server, if foreign table (else InvalidOid)
+  *		umid - OID of user mapping, if foreign table (else InvalidOid)
   *		fdwroutine - function hooks for FDW, if foreign table (else NULL)
   *		fdw_private - private state for FDW, if foreign table (else NULL)
   *

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Only try to push down foreign joins if the user mapping OIDs mat

2016-03-13 Thread Etsuro Fujita
On 2016/03/14 11:51, Tom Lane wrote:
> Etsuro Fujita  writes:
>> On 2016/03/13 4:46, Andres Freund wrote:
>>> ... The difference apears to be the
>>> check that's now in build_simple_rel() - there was nothing hitting the
>>> user mapping code before for file_fdw.

>> Exactly.

>> I'm not sure it's worth complicating the code to keep that behavior, so
>> I'd vote for adding the change notice to 9.6 release notes or something
>> like that in addition to updating file-fdw.sgml.

> Perhaps it would be useful for an FDW to be able to specify that user
> mappings are meaningless to it?  And then bypass this check for such FDWs?
> 
> I'm not really sold on enforcing that people create meaningless user
> mappings.  For one thing, they're likely to be sloppy about it, which
> could lead to latent security problems if the FDW later acquires a
> concept that user mappings mean something.

Seems reasonable.

Best regards,
Etsuro Fujita




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: Upper planner pathification

2016-03-13 Thread Petr Jelinek
On 14/03/16 02:43, Kouhei Kaigai wrote:
>
> CustomPath node is originally expected to generate various kind of plan
> node, not only scan/join, and its interface is designed to support them.
> For example, we can expect a CustomPath that generates "CustomSort".
> 
> On the other hands, upper path consideration is more variable than the
> case of scan/join path consideration. Probably, we can have no centralized
> point to add custom-paths for sort, group-by, ...
> So, I think we have hooks for each (supported) upper path workload.
> 
> In case of sorting for example, the best location of the hook is just
> above of the Assert() in the create_ordered_paths(). It allows to compare
> estimated cost between SortPath and CustomPath.
> However, it does not allow to inject CustomPath(for sort) into the path
> node that may involve sorting, like WindowPath or AggPath.
> Thus, another hook may be put on create_window_paths and
> create_grouping_paths in my thought.
> 
> Some other good idea?
> 
> Even though I couldn't check the new planner implementation entirely,
> it seems to be the points below are good candidate to inject CustomPath
> (and potentially ForeignScan).
> 
> - create_grouping_paths
> - create_window_paths
> - create_distinct_paths
> - create_ordered_paths
> - just below of the create_modifytable_path
>(may be valuable for foreign-update pushdown)
> 

To me that seems too low inside the planning tree. Perhaps adding it
just to subquery_planner() before SS_identify_outer_params() would be
better; that's the place where you see the path for the whole (sub)query,
so you can search for and modify what you need from there.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reworks of CustomScan serialization/deserialization

2016-03-13 Thread Kouhei Kaigai
> On 14/03/16 02:53, Kouhei Kaigai wrote:
> >> -Original Message-
> >> From: pgsql-hackers-ow...@postgresql.org
> >> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Petr Jelinek
> >> Sent: Friday, March 11, 2016 12:27 AM
> >> To: Kaigai Kouhei(海外 浩平); pgsql-hackers@postgresql.org
> >> Subject: Re: [HACKERS] Reworks of CustomScan serialization/deserialization
> >>
> >> On 10/03/16 08:08, Kouhei Kaigai wrote:
> 
> >> Also in RegisterCustomScanMethods
> >> +  Assert(strlen(methods->CustomName) <= CUSTOM_NAME_MAX_LEN);
> >>
> >> Shouldn't this be actually "if" with ereport() considering this is
> >> public API and extensions can pass anything there? (for that matter 
> >> same
> >> is true for RegisterExtensibleNodeMethods but that's already 
> >> committed).
> >>
> > Hmm. I don't have clear answer which is better. The reason why I put
> > Assert() here is that only c-binary extension uses this interface, thus,
> > author will fix up the problem of too long name prior to its release.
> > Of course, if-with-ereport() also informs extension author the name is
> > too long.
> > One downside of Assert() may be, it makes oversight if --enable-cassert
> > was not specified.
> >
> 
>  Well that's exactly my problem, this should IMHO throw error even
>  without --enable-cassert. It's not like it's some performance sensitive
>  API where if would be big problem, ensuring correctness of the input is
>  more imporant here IMHO.
> 
> >>> We may need to fix up RegisterExtensibleNodeMethods() first.
> >>>
> >>> Also, length limitation is (EXTNODENAME_MAX_LEN-1) because the last byte
> >>> is consumed by '\0' character. In fact, hash, match and keycopy function
> >>> of HTAB for string keys deal with the first (keysize - 1) bytes.
> >>> So, strkey(extnodename) == EXTNODENAME_MAX_LEN is not legal.
> >>>
> >>
> >> Yes, my thoughts as well but that can be separate tiny patch that does
> >> not have to affect this one. In my opinion if we fixed this one it would
> >> be otherwise ready to go in, and I definitely prefer this approach to
> >> the previous one.
> >>
> > OK, I split the previous small patch into two tiny patches.
> > The one is bugfix around max length of the extnodename.
> > The other replaces Assert() by ereport() according to the upthread 
> > discussion.
> >
> 
> Okay, it's somewhat akin to hairsplitting but works for me. Do you plan
> to do same thing with the CustomScan patch itself as well?.
>
Yes, I'll fix up the patch to follow the same approach.

--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reworks of CustomScan serialization/deserialization

2016-03-13 Thread Petr Jelinek
On 14/03/16 02:53, Kouhei Kaigai wrote:
>> -Original Message-
>> From: pgsql-hackers-ow...@postgresql.org
>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Petr Jelinek
>> Sent: Friday, March 11, 2016 12:27 AM
>> To: Kaigai Kouhei(海外 浩平); pgsql-hackers@postgresql.org
>> Subject: Re: [HACKERS] Reworks of CustomScan serialization/deserialization
>>
>> On 10/03/16 08:08, Kouhei Kaigai wrote:

>> Also in RegisterCustomScanMethods
>> +Assert(strlen(methods->CustomName) <= CUSTOM_NAME_MAX_LEN);
>>
>> Shouldn't this be actually "if" with ereport() considering this is
>> public API and extensions can pass anything there? (for that matter same
>> is true for RegisterExtensibleNodeMethods but that's already committed).
>>
> Hmm. I don't have clear answer which is better. The reason why I put
> Assert() here is that only c-binary extension uses this interface, thus,
> author will fix up the problem of too long name prior to its release.
> Of course, if-with-ereport() also informs extension author the name is
> too long.
> One downside of Assert() may be, it makes oversight if --enable-cassert
> was not specified.
>

 Well that's exactly my problem, this should IMHO throw error even
 without --enable-cassert. It's not like it's some performance sensitive
 API where if would be big problem, ensuring correctness of the input is
 more imporant here IMHO.

>>> We may need to fix up RegisterExtensibleNodeMethods() first.
>>>
>>> Also, length limitation is (EXTNODENAME_MAX_LEN-1) because the last byte
>>> is consumed by '\0' character. In fact, hash, match and keycopy function
>>> of HTAB for string keys deal with the first (keysize - 1) bytes.
>>> So, strkey(extnodename) == EXTNODENAME_MAX_LEN is not legal.
>>>
>>
>> Yes, my thoughts as well but that can be separate tiny patch that does
>> not have to affect this one. In my opinion if we fixed this one it would
>> be otherwise ready to go in, and I definitely prefer this approach to
>> the previous one.
>>
> OK, I split the previous small patch into two tiny patches.
> The one is bugfix around max length of the extnodename.
> The other replaces Assert() by ereport() according to the upthread discussion.
> 

Okay, it's somewhat akin to hairsplitting, but it works for me. Do you plan
to do the same thing with the CustomScan patch itself as well?

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-13 Thread Petr Jelinek

On 14/03/16 03:29, Dilip Kumar wrote:


> On Mon, Mar 14, 2016 at 5:02 AM, Jim Nasby  wrote:
>
>> Well, 16MB is 2K pages, which is what you'd get if 100 connections
>> were all blocked and we're doing 20 pages per waiter. That seems
>> like a really extreme scenario, so maybe 4MB is a good compromise.
>> That's unlikely to be hit in most cases, unlikely to put a ton of
>> stress on IO, even with magnetic media (assuming the whole 4MB is
>> queued to write in one shot...). 4MB would still reduce the number
>> of locks by 500x.
>
> In my performance results given up thread, we are getting max
> performance at 32 clients, means at a time we are extending 32*20 ~= max
> (600) pages at a time. So now with 4MB limit (max 512 pages) Results
> will looks similar. So we need to take a decision whether 4MB is good
> limit, should I change it ?




Well, any value we choose will be very arbitrary. If we look at it from
the point of view of the maximum absolute disk space we allocate for a
relation at once, the 4MB limit would represent a 2.5-order-of-magnitude
change. That sounds like enough for one release cycle; I think we can
tune it further if the need arises in the next one. (With my love for
round numbers I would have suggested 8MB, as that's 3 orders of
magnitude, but I am fine with 4MB as well.)


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Only try to push down foreign joins if the user mapping OIDs mat

2016-03-13 Thread Tom Lane
Etsuro Fujita  writes:
> On 2016/03/13 4:46, Andres Freund wrote:
>> ... The difference apears to be the
>> check that's now in build_simple_rel() - there was nothing hitting the
>> user mapping code before for file_fdw.

> Exactly.

> I'm not sure it's worth complicating the code to keep that behavior, so 
> I'd vote for adding the change notice to 9.6 release notes or something 
> like that in addition to updating file-fdw.sgml.

Perhaps it would be useful for an FDW to be able to specify that user
mappings are meaningless to it?  And then bypass this check for such FDWs?

I'm not really sold on enforcing that people create meaningless user
mappings.  For one thing, they're likely to be sloppy about it, which
could lead to latent security problems if the FDW later acquires a
concept that user mappings mean something.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [COMMITTERS] pgsql: Only try to push down foreign joins if the user mapping OIDs mat

2016-03-13 Thread Etsuro Fujita

Hi,

On 2016/03/13 4:46, Andres Freund wrote:

On 2016-03-12 11:56:24 -0500, Robert Haas wrote:

On Fri, Mar 11, 2016 at 10:15 PM, Andres Freund  wrote:

On 2016-01-28 19:09:01 +, Robert Haas wrote:

Only try to push down foreign joins if the user mapping OIDs match.

Previously, the foreign join pushdown infrastructure left the question
of security entirely up to individual FDWs, but it would be easy for
a foreign data wrapper to inadvertently open up subtle security holes
that way.  So, make it the core code's job to determine which user
mapping OID is relevant, and don't attempt join pushdown unless it's
the same for all relevant relations.

Per a suggestion from Tom Lane.  Shigeru Hanada and Ashutosh Bapat,
reviewed by Etsuro Fujita and KaiGai Kohei, with some further
changes by me.



I noticed that this breaks some citus regression tests in a minor
manner. Namely previously file_fdw worked without a user mapping, now it
doesn't appear to anymore.

This is easy enough to fix, and it's perfectly ok for us to fix this,
but I do wonder if that's not going to cause trouble for others.



Hmm, I didn't intend to change that.  If that commit broke something,
there's obviously a hole in our regression test coverage.



CREATE EXTENSION file_fdw;
CREATE SERVER file_server FOREIGN DATA WRAPPER file_fdw;

CREATE FOREIGN TABLE agg_csv (
 a   int2,
 b   float4
) SERVER file_server
OPTIONS (format 'csv', filename 
'/home/andres/src/postgresql/contrib/file_fdw/data/agg.csv', header 'true', 
delimiter ';', quote '@', escape '"', null '');

SELECT * FROM agg_csv;



worked in 9.5, but doesn't in master. The difference apears to be the
check that's now in build_simple_rel() - there was nothing hitting the
user mapping code before for file_fdw.


Exactly.

I'm not sure it's worth complicating the code to keep that behavior, so 
I'd vote for adding the change notice to 9.6 release notes or something 
like that in addition to updating file-fdw.sgml.
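
For anyone whose setup trips over this in the meantime, the immediate
workaround should just be to create a (meaningless) mapping for the
server, e.g. for the file_fdw example quoted above:

-- A dummy mapping that satisfies the user-mapping lookup now done in
-- build_simple_rel().
CREATE USER MAPPING FOR PUBLIC SERVER file_server;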


Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread David Rowley
On 14 March 2016 at 14:52, James Sewell  wrote:
> One question - how is the upper limit of workers chosen?

See create_parallel_paths() in allpaths.c. Basically the bigger the
relation (in pages) the more workers will be allocated, up until
max_parallel_degree.

There is also a comment in that function which states:
/*
* Limit the degree of parallelism logarithmically based on the size of the
* relation.  This probably needs to be a good deal more sophisticated, but we
* need something here for now.
*/

So this will likely see some revision at some point, after 9.6.
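
For anyone who wants to see that in action, something along these lines
should do it (a rough sketch: the table is made up, max_parallel_degree is
the setting discussed upthread, and a parallel grouping plan assumes the
parallel aggregate patch from this thread is applied):

set max_parallel_degree = 8;

-- A throwaway table large enough to trigger a parallel plan.
create table worker_test as
  select g as id, g % 100 as grp from generate_series(1, 10000000) g;
analyze worker_test;

-- The "Number of Workers" shown under the Gather node grows roughly
-- logarithmically with the relation's size in pages, capped at
-- max_parallel_degree.
explain select grp, count(*) from worker_test group by grp;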

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Relation extension scalability

2016-03-13 Thread Dilip Kumar
On Mon, Mar 14, 2016 at 5:02 AM, Jim Nasby  wrote:

> Well, 16MB is 2K pages, which is what you'd get if 100 connections were
> all blocked and we're doing 20 pages per waiter. That seems like a really
> extreme scenario, so maybe 4MB is a good compromise. That's unlikely to be
> hit in most cases, unlikely to put a ton of stress on IO, even with
> magnetic media (assuming the whole 4MB is queued to write in one shot...).
> 4MB would still reduce the number of locks by 500x.


In my performance results given upthread, we get maximum performance at
32 clients, which means we are extending at most 32*20 (~600) pages at a
time. So with a 4MB limit (max 512 pages) the results should look similar.
So we need to decide whether 4MB is a good limit; should I change it?
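
(For concreteness, assuming the default 8 kB block size, the figures being
discussed work out to roughly:)

$$100 \text{ waiters} \times 20 \text{ pages} \times 8\,\mathrm{kB} \approx 16\,\mathrm{MB}, \qquad 4\,\mathrm{MB} \div 8\,\mathrm{kB/page} = 512 \text{ pages}$$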


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


[HACKERS] pg_stat_get_progress_info(NULL) blows up

2016-03-13 Thread Thomas Munro
Hi

I guess pg_stat_get_progress_info should either be strict (see
attached) or check for NULL.
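
For the record, triggering it is as simple as this on a build without the
fix:

-- With the function marked strict, a NULL argument should simply return
-- zero rows instead of blowing up.
SELECT * FROM pg_stat_get_progress_info(NULL);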

-- 
Thomas Munro
http://www.enterprisedb.com


pg_stat_get_progress_info-strict.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Reworks of CustomScan serialization/deserialization

2016-03-13 Thread Kouhei Kaigai
> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Petr Jelinek
> Sent: Friday, March 11, 2016 12:27 AM
> To: Kaigai Kouhei(海外 浩平); pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Reworks of CustomScan serialization/deserialization
> 
> On 10/03/16 08:08, Kouhei Kaigai wrote:
> >>
>  Also in RegisterCustomScanMethods
>  +Assert(strlen(methods->CustomName) <= CUSTOM_NAME_MAX_LEN);
> 
>  Shouldn't this be actually "if" with ereport() considering this is
>  public API and extensions can pass anything there? (for that matter same
>  is true for RegisterExtensibleNodeMethods but that's already committed).
> 
> >>> Hmm. I don't have clear answer which is better. The reason why I put
> >>> Assert() here is that only c-binary extension uses this interface, thus,
> >>> author will fix up the problem of too long name prior to its release.
> >>> Of course, if-with-ereport() also informs extension author the name is
> >>> too long.
> >>> One downside of Assert() may be, it makes oversight if --enable-cassert
> >>> was not specified.
> >>>
> >>
> >> Well that's exactly my problem, this should IMHO throw error even
> >> without --enable-cassert. It's not like it's some performance sensitive
> >> API where if would be big problem, ensuring correctness of the input is
> >> more imporant here IMHO.
> >>
> > We may need to fix up RegisterExtensibleNodeMethods() first.
> >
> > Also, length limitation is (EXTNODENAME_MAX_LEN-1) because the last byte
> > is consumed by '\0' character. In fact, hash, match and keycopy function
> > of HTAB for string keys deal with the first (keysize - 1) bytes.
> > So, strkey(extnodename) == EXTNODENAME_MAX_LEN is not legal.
> >
> 
> Yes, my thoughts as well but that can be separate tiny patch that does
> not have to affect this one. In my opinion if we fixed this one it would
> be otherwise ready to go in, and I definitely prefer this approach to
> the previous one.
>
OK, I split the previous small patch into two tiny patches.
One is a bugfix around the maximum length of the extnodename.
The other replaces the Assert() with an ereport(), per the upthread discussion.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 



pgsql-v9.6-fix-extnodename-max-len.patch
Description: pgsql-v9.6-fix-extnodename-max-len.patch


pgsql-v9.6-replace-assert-by-ereport-on-register-extnode.patch
Description: pgsql-v9.6-replace-assert-by-ereport-on-register-extnode.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread James Sewell
Hi,

Happy to test, really looking forward to seeing this stuff in core.

The explain analyze is below:

Finalize HashAggregate  (cost=810142.42..810882.62 rows=59216 width=16) (actual time=2282.092..2282.202 rows=15 loops=1)
   Group Key: (date_trunc('DAY'::text, pageview_start_tstamp))
   ->  Gather  (cost=765878.46..808069.86 rows=414512 width=16) (actual time=2281.749..2282.060 rows=105 loops=1)
         Number of Workers: 6
         ->  Partial HashAggregate  (cost=764878.46..765618.66 rows=59216 width=16) (actual time=2276.879..2277.030 rows=15 loops=7)
               Group Key: date_trunc('DAY'::text, pageview_start_tstamp)
               ->  Parallel Seq Scan on celebrus_fact_agg_1_p2015_12  (cost=0.00..743769.76 rows=4221741 width=12) (actual time=0.066..1631.650 rows=3618887 loops=7)

One question - how is the upper limit of workers chosen?


James Sewell,
Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099


On Mon, Mar 14, 2016 at 12:30 PM, David Rowley  wrote:

> On 14 March 2016 at 14:16, James Sewell  wrote:
>
>> I've done some testing with one of my data sets in an 8VPU virtual
>> environment and this is looking really, really good.
>>
>> My test query is:
>>
>> SELECT pageview, sum(pageview_count)
>> FROM fact_agg_2015_12
>> GROUP BY date_trunc('DAY'::text, pageview);
>>
>> The query returns 15 rows. The fact_agg table is 5398MB and holds around
>> 25 million records.
>>
>> Explain with a max_parallel_degree of 8 tells me that the query will
>> only use 6 background workers. I have no indexes on the table currently.
>>
>> Finalize HashAggregate  (cost=810142.42..810882.62 rows=59216 width=16)
>>Group Key: (date_trunc('DAY'::text, pageview))
>>->  Gather  (cost=765878.46..808069.86 rows=414512 width=16)
>>  Number of Workers: 6
>>  ->  Partial HashAggregate  (cost=764878.46..765618.66 rows=59216
>> width=16)
>>Group Key: date_trunc('DAY'::text, pageview)
>>->  Parallel Seq Scan on fact_agg_2015_12
>>  (cost=0.00..743769.76 rows=4221741 width=12)
>>
>
> Great! Thanks for testing this.
>
> If you run EXPLAIN ANALYZE on this with the 6 workers, does the actual
> number of Gather rows come out at 105? I'd just like to get an idea of my
> cost estimate for the Gather are going to be accurate for real world data
> sets.
>
>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>

-- 


--
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.


Re: [HACKERS] WIP: Upper planner pathification

2016-03-13 Thread Kouhei Kaigai
Hello,

I'm now checking the new planner implementation to find a way to
integrate CustomPath into the upper planner as well.
The CustomPath node was originally expected to generate various kinds of
plan nodes, not only scan/join, and its interface is designed to support
them. For example, we can imagine a CustomPath that generates a "CustomSort".

On the other hand, upper-path consideration is more varied than
scan/join path consideration. We probably have no centralized point at
which to add custom paths for sort, group-by, and so on, so I think we
need hooks for each (supported) upper-path workload.

In the case of sorting, for example, the best location for the hook is
just above the Assert() in create_ordered_paths(). That allows the
estimated costs of a SortPath and a CustomPath to be compared.
However, it does not allow injecting a CustomPath (for sort) into a path
node that may itself involve sorting, like WindowPath or AggPath.
Thus, in my view, another hook may need to be put in create_window_paths
and create_grouping_paths.

Any other good ideas?

Even though I couldn't check the new planner implementation entirely,
the points below seem to be good candidates for injecting a CustomPath
(and potentially a ForeignScan):

- create_grouping_paths
- create_window_paths
- create_distinct_paths
- create_ordered_paths
- just below create_modifytable_path
  (may be valuable for foreign-update pushdown)

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


> -Original Message-
> From: pgsql-hackers-ow...@postgresql.org
> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane
> Sent: Saturday, March 05, 2016 3:02 AM
> To: David Rowley
> Cc: Robert Haas; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] WIP: Upper planner pathification
> 
> OK, here is a version that I think addresses all of the recent comments:
> 
> * I refactored the grouping-sets stuff as suggested by Robert and David.
> The GroupingSetsPath code is now used *only* when there are grouping sets,
> otherwise what you get is a plain AGG_SORTED AggPath.  This allowed
> removal of a boatload of weird corner cases in the GroupingSets code path,
> so it was a good change.  (Fundamentally, that's cleaning up some
> questionable coding in the grouping sets patch rather than fixing anything
> directly related to pathification, but I like the code better now.)
> 
> * I refactored the handling of targetlists in createplan.c.  After some
> reflection I decided that the disuse_physical_tlist callers fell into
> three separate categories: those that actually needed the exact requested
> tlist to be returned, those that wanted non-bloated tuples because they
> were going to put them into sort or hash storage, and those that needed
> grouping columns to be properly labeled.  The new approach is to pass down
> a "flags" word that specifies which if any of these cases apply at a
> specific plan level.  use_physical_tlist now always makes the right
> decision to start with, and disuse_physical_tlist is gone entirely, which
> should make things a bit faster since we won't uselessly construct and
> discard physical tlists.  The missing logic from make_subplanTargetList
> and locate_grouping_columns is reincarnated in the physical-tlist code.
> 
> * Added explicit limit/offset fields to LimitPath, as requested by Teodor.
> 
> * Removed SortPath.sortgroupclauses.
> 
> * Fixed handling of parallel-query fields in new path node types.
> (BTW, I found what seemed to be a couple of pre-existing bugs of
> the same kind, eg create_mergejoin_path was different from the
> other two kinds of join as to setting parallel_degree.)
> 
> 
> What remains to be done, IMV:
> 
> * Performance testing as per yesterday's discussion.
> 
> * Debug support in outfuncs.c and print_path() for new node types.
> 
> * Clean up unfinished work on function header comments.
> 
> * Write some documentation about how FDWs might use this.
> 
> I'll work on the performance testing next.  Barring unsatisfactory
> results from that, I think this could be committable in a couple
> of days.
> 
>   regards, tom lane



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread David Rowley
On 14 March 2016 at 14:16, James Sewell  wrote:

> I've done some testing with one of my data sets in an 8VPU virtual
> environment and this is looking really, really good.
>
> My test query is:
>
> SELECT pageview, sum(pageview_count)
> FROM fact_agg_2015_12
> GROUP BY date_trunc('DAY'::text, pageview);
>
> The query returns 15 rows. The fact_agg table is 5398MB and holds around
> 25 million records.
>
> Explain with a max_parallel_degree of 8 tells me that the query will only
> use 6 background workers. I have no indexes on the table currently.
>
> Finalize HashAggregate  (cost=810142.42..810882.62 rows=59216 width=16)
>Group Key: (date_trunc('DAY'::text, pageview))
>->  Gather  (cost=765878.46..808069.86 rows=414512 width=16)
>  Number of Workers: 6
>  ->  Partial HashAggregate  (cost=764878.46..765618.66 rows=59216
> width=16)
>Group Key: date_trunc('DAY'::text, pageview)
>->  Parallel Seq Scan on fact_agg_2015_12
>  (cost=0.00..743769.76 rows=4221741 width=12)
>

Great! Thanks for testing this.

If you run EXPLAIN ANALYZE on this with the 6 workers, does the actual
number of Gather rows come out at 105? I'd just like to get an idea of
whether my cost estimates for the Gather are going to be accurate for
real-world data sets.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread James Sewell
Hi,

I've done some testing with one of my data sets in an 8VPU virtual
environment and this is looking really, really good.

My test query is:

SELECT pageview, sum(pageview_count)
FROM fact_agg_2015_12
GROUP BY date_trunc('DAY'::text, pageview);

The query returns 15 rows. The fact_agg table is 5398MB and holds around 25
million records.

Explain with a max_parallel_degree of 8 tells me that the query will only
use 6 background workers. I have no indexes on the table currently.

Finalize HashAggregate  (cost=810142.42..810882.62 rows=59216 width=16)
   Group Key: (date_trunc('DAY'::text, pageview))
   ->  Gather  (cost=765878.46..808069.86 rows=414512 width=16)
 Number of Workers: 6
 ->  Partial HashAggregate  (cost=764878.46..765618.66 rows=59216
width=16)
   Group Key: date_trunc('DAY'::text, pageview)
   ->  Parallel Seq Scan on fact_agg_2015_12
 (cost=0.00..743769.76 rows=4221741 width=12)


I am getting the following timings (everything was cached before I started
testing). I didn't average the runtimes, but I ran each one three times and
took the middle value.

max_parallel_degree    runtime
0                      11693.537 ms
1                       6387.937 ms
2                       4328.629 ms
3                       3292.376 ms
4                       2743.148 ms
5                       2278.449 ms
6                       2000.599 ms


I'm pretty happy!

Cheers,


James Sewell,
PostgreSQL Team Lead / Solutions Architect
__


Level 2, 50 Queen St, Melbourne VIC 3000

*P *(+61) 3 8370 8000  *W* www.lisasoft.com  *F *(+61) 3 8370 8099


On Mon, Mar 14, 2016 at 8:44 AM, David Rowley wrote:

> On 12 March 2016 at 16:31, David Rowley wrote:
> > I've attached an updated patch which is based on commit 7087166,
> > things are really changing fast in the grouping path area at the
> > moment, but hopefully the dust is starting to settle now.
>
> The attached patch fixes a harmless compiler warning about a possible
> uninitialised variable.
>
> --
>  David Rowley   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>

-- 


--
The contents of this email are confidential and may be subject to legal or 
professional privilege and copyright. No representation is made that this 
email is free of viruses or other defects. If you have received this 
communication in error, you may not copy or distribute any part of it or 
otherwise disclose its contents to anyone. Please advise the sender of your 
incorrect receipt of this correspondence.


Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-03-13 Thread Tomas Vondra
On Sun, 2016-03-13 at 18:46 -0400, Noah Misch wrote:
> On Thu, Mar 03, 2016 at 06:08:07AM +0100, Tomas Vondra wrote:
> > 
> > On 02/03/2016 06:46 AM, Noah Misch wrote:
> > > 
> > > On Mon, Feb 01, 2016 at 07:03:45PM +0100, Tomas Vondra wrote:
> > > > 
> > > > On 12/22/2015 03:49 PM, Noah Misch wrote:
> > > > > 
> > > > > On Mon, Feb 18, 2013 at 06:19:12PM -0300, Alvaro Herrera
> > > > > wrote:
> > > > > > 
> > > > > > I have pushed it now.  Further testing, of course, is
> > > > > > always welcome.
> > > > > While investigating stats.sql buildfarm failures, mostly on
> > > > > animals
> > > > > axolotl and shearwater, I found that this patch (commit
> > > > > 187492b)
> > > > > inadvertently removed the collector's ability to coalesce
> > > > > inquiries.
> > > > > Every PGSTAT_MTYPE_INQUIRY received now causes one stats file
> > > > > write.
> > > > > Before, pgstat_recv_inquiry() did:
> > > > > 
> > > > >   if (msg->inquiry_time > last_statrequest)
> > > > >   last_statrequest = msg->inquiry_time;
> > > > > 
> > > > > and pgstat_write_statsfile() did:
> > > > > 
> > > > >   globalStats.stats_timestamp = GetCurrentTimestamp();
> > > > > ... (work of writing a stats file) ...
> > > > >   last_statwrite = globalStats.stats_timestamp;
> > > > >   last_statrequest = last_statwrite;
> > > > > 
> > > > > If the collector entered pgstat_write_statsfile() with more
> > > > > inquiries
> > > > > waiting in its socket receive buffer, it would ignore them as
> > > > > being too
> > > > > old once it finished the write and resumed message
> > > > > processing. Commit
> > > > > 187492b converted last_statrequest to a "last_statrequests"
> > > > > list that we
> > > > > wipe after each write.
> > So I've been looking at this today, and I think the attached patch should
> > do the trick. I can't really verify it, because I've been unable to
> > reproduce the non-coalescing - I presume it requires a much slower system
> > (axolotl is RPi, not sure about shearwater).
> >
> > The patch simply checks DBEntry.stats_timestamp in pgstat_recv_inquiry()
> > and ignores requests that are already resolved by the last write (maybe
> > this should accept a file written up to PGSTAT_STAT_INTERVAL in the past).
> >
> > The required field is already in DBEntry (otherwise it'd be impossible to
> > determine if backends need to send inquiries or not), so this is a pretty
> > trivial change. I can't think of a simpler patch.
> >
> > Can you try applying the patch on a machine where the problem is
> > reproducible? I might have some RPi machines laying around (for other
> > purposes).
> I've not attempted to study the behavior on slow hardware.  Instead,
> my report
> used stat-coalesce-v1.patch[1] to simulate slow writes.  (That
> diagnostic
> patch no longer applies cleanly, so I'm attaching a rebased
> version.  I've
> changed the patch name from "stat-coalesce" to "slow-stat-simulate"
> to
> more-clearly distinguish it from the "pgstat-coalesce"
> patch.)  Problems
> remain after applying your patch; consider "VACUUM pg_am" behavior:
> 
> 9.2 w/ stat-coalesce-v1.patch:
>   VACUUM returns in 3s, stats collector writes each file 1x over 3s
> HEAD w/ slow-stat-simulate-v2.patch:
>   VACUUM returns in 3s, stats collector writes each file 5x over 15s
> HEAD w/ slow-stat-simulate-v2.patch and your patch:
>   VACUUM returns in 10s, stats collector writes no files over 10s

Oh damn, the timestamp comparison in pgstat_recv_inquiry should be in
the opposite direction. After fixing that, "VACUUM pg_am" completes in 3
seconds and writes each file just once.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 14afef6..e750d46 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -4836,6 +4836,20 @@ pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
 	}
 
 	/*
+	 * Ignore requests that are already resolved by the last write.
+	 *
+	 * We discard the queue of requests at the end of pgstat_write_statsfiles(),
+	 * so the requests already waiting on the UDP socket at that moment can't
+	 * be discarded in the previous loop.
+	 *
+	 * XXX Maybe this should also care about the clock skew, just like the
+	 * block a few lines down.
+	 */
+	dbentry = pgstat_get_db_entry(msg->databaseid, false);
+	if ((dbentry != NULL) && (msg->cutoff_time <= dbentry->stats_timestamp))
+		return;
+
+	/*
 	 * There's no request for this DB yet, so create one.
 	 */
 	newreq = palloc(sizeof(DBWriteRequest));
@@ -4852,7 +4866,6 @@ pgstat_recv_inquiry(PgStat_MsgInquiry *msg, int len)
 	 * retreat in the system clock reading could otherwise cause us to neglect
 	 * to update the stats file for a long time.
 	 */
-	dbentry = pgstat_get_db_entry(msg->databaseid, false);
 	if ((dbentry != NULL) &&

Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread Haribabu Kommi
On Mon, Mar 14, 2016 at 8:44 AM, David Rowley
 wrote:
> On 12 March 2016 at 16:31, David Rowley  wrote:
>> I've attached an updated patch which is based on commit 7087166,
>> things are really changing fast in the grouping path area at the
>> moment, but hopefully the dust is starting to settle now.
>
> The attached patch fixes a harmless compiler warning about a possible
> uninitialised variable.

The setrefs.c fix for updating the finalize-aggregate target list is nice.
I tested all the float aggregates and they are working fine.

Overall the patch is fine. I will do some more testing and provide an update later.

Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] WIP: Detecting SSI conflicts before reporting constraint violations

2016-03-13 Thread Thomas Munro
On Sat, Mar 12, 2016 at 1:25 AM, Kevin Grittner  wrote:
> On Thu, Mar 10, 2016 at 11:31 PM, Thomas Munro
>  wrote:
>
>> Here's a much simpler version with more comments
>
>> It handles the same set of isolation test specs.
>
> I'm impressed that you found a one-line patch that seems to get us
> 90% of the way to a new guarantee; but I think if we're going to do
> something here it should take us from one clear guarantee to
> another.  We really don't have a new guarantee here that is easy to
> express without weasel-words.  :-(

+1

> Let me see whether I can figure
> out how to cover that one permutation that is left after this
> one-liner.
>
> In terms of theory, one way to look at this is that an insert of an
> index tuple to a unique index involves a read from that index which
> finds a "gap", which in SSI is normally covered by a predicate lock
> on the appropriate part of the index.  (Currently that is done at
> the page level, although hopefully we will eventually enhance that
> to use "next-key gap locking".)  Treating the index tuple insertion
> as having an implied read of that gap is entirely justified and
> proper -- internally the read actually does happen.

Right, that is what I was trying to achieve in earlier versions
immediately after _bt_check_unique returns without error, with
something like this:

+   /*
+* By determining that there is no duplicate key, we have read an index
+* gap, so we predicate lock it.
+*/
+   PredicateLockPage(rel, buf, GetTransactionSnapshot());

But since a conflict-in check immediately follows (because we insert),
wouldn't that be redundant (ie self-created SIREAD locks are dropped
at that point)?

Next I thought that a conflict-out check on the heap tuple might be
necessary to detect a problem here.  Back in the v2 patch I was making
an explicit call to CheckForSerializationConflictOut, but I now see
that a plain old heap_fetch would do that.  I was also using the wrong
htid because I missed that the CIC special case code reassigns htid.
See the attached experimental patch, which applies on top of the v4
one-liner patch and adds a call to heap_fetch for the conflicting tuple.  This
way, read-write-unique-4.out permutation "r1 w1 w2 c1 c2" now raises a
40001 after c1 completes and w2 continues, and no other isolation test
result changes.  So still no cigar for "r2 w1 w2 c1 c2".  Stepping
through CheckForSerializationConflictOut when it is called before w2
reports a UCV, I see that it returns here:

if (RWConflictExists(MySerializableXact, sxact))
{
/* We don't want duplicate conflict records in the list. */
LWLockRelease(SerializableXactHashLock);
return;
}

So the RW conflict (ie one edge) is detected (and already had been?),
but not a dangerous structure (there are not two consecutive edges in
the graph? We know that we have read something written by an
overlapping transaction, but not that it has read something
[hypothetically] written by us?).  I will have another look at this in
a few days but for now I need to do some other things, so I'm posting
these observations in case they are in some way helpful...
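
For readers not following the isolation specs, a rough sketch of the
schedule in question (hypothetical table; permutation "r1 w1 w2 c1 c2" of
read-write-unique-4, both sessions at SERIALIZABLE isolation):

CREATE TABLE invoice (id int PRIMARY KEY);

-- s1: BEGIN ISOLATION LEVEL SERIALIZABLE;
-- s1: SELECT * FROM invoice WHERE id = 3;   -- r1: finds no row
-- s1: INSERT INTO invoice VALUES (3);       -- w1
-- s2: BEGIN ISOLATION LEVEL SERIALIZABLE;
-- s2: INSERT INTO invoice VALUES (3);       -- w2: blocks on the unique index
-- s1: COMMIT;                               -- c1
-- s2: w2 resumes; with the experimental patch it now fails with
--     SQLSTATE 40001 (serialization_failure) instead of a unique violation

The schedule that still escapes detection is the analogous "r2 w1 w2 c1 c2"
mentioned above.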

-- 
Thomas Munro
http://www.enterprisedb.com


experiment.patch
Description: Binary data



Re: [HACKERS] checkpointer continuous flushing - V18

2016-03-13 Thread Jim Nasby

On 3/13/16 6:30 PM, Peter Geoghegan wrote:

On Sat, Mar 12, 2016 at 5:21 PM, Jeff Janes  wrote:

Would the wiki be a good place for such tips?  Not as formal as the
documentation, and more centralized (and editable) than a collection
of blog posts.


That general direction makes sense, but I'm not sure if the Wiki is
something that this will work for. I fear that it could become
something like the TODO list page: a page that contains theoretically
accurate information, but isn't very helpful. The TODO list needs to
be heavily pruned, but that seems like something that will never
happen.

A centralized location for performance tips will probably only work
well if there are still high standards that are actively enforced.
There still needs to be tight editorial control.


I think there are ways to significantly restrict who can edit a page, so 
this could probably still be done via the wiki. IMO we should also be 
encouraging users to test various tips and provide feedback, so maybe a 
wiki page with a big fat request at the top asking users to submit any 
feedback about the page to -performance.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Relation extension scalability

2016-03-13 Thread Jim Nasby

On 3/11/16 9:57 PM, Petr Jelinek wrote:

I also think some kind of clamp is a good idea. It's not that
uncommon to run max_connections significantly higher than 100, so
the extension could be way larger than 16MB. In those cases this
patch could actually make things far worse as everyone backs up
waiting on the OS to extend many MB when all you actually needed
were a couple dozen more pages.


I agree. We can have some max limit on the number of extra pages. What do
others think?



Well, that's what I meant with clamping originally. I don't know what is
a good value though.


Well, 16MB is 2K pages, which is what you'd get if 100 connections were 
all blocked and we're doing 20 pages per waiter. That seems like a 
really extreme scenario, so maybe 4MB is a good compromise. That's 
unlikely to be hit in most cases, unlikely to put a ton of stress on IO, 
even with magnetic media (assuming the whole 4MB is queued to write in 
one shot...). 4MB would still reduce the number of locks by 500x.
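
Spelling out the arithmetic (assuming the default 8kB block size):

    16MB / 8kB = 2048 pages;  100 waiters * 20 pages = 2000 pages ~= 16MB
     4MB / 8kB =  512 pages;  one lock acquisition per 512 pages  ~= 500x fewer

which is where the 16MB and ~500x figures above come from.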

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] checkpointer continuous flushing - V18

2016-03-13 Thread Peter Geoghegan
On Sat, Mar 12, 2016 at 5:21 PM, Jeff Janes  wrote:
> Would the wiki be a good place for such tips?  Not as formal as the
> documentation, and more centralized (and editable) than a collection
> of blog posts.

That general direction makes sense, but I'm not sure if the Wiki is
something that this will work for. I fear that it could become
something like the TODO list page: a page that contains theoretically
accurate information, but isn't very helpful. The TODO list needs to
be heavily pruned, but that seems like something that will never
happen.

A centralized location for performance tips will probably only work
well if there are still high standards that are actively enforced.
There still needs to be tight editorial control.

-- 
Peter Geoghegan




Re: [HACKERS] [patch] Proposal for \crosstabview in psql

2016-03-13 Thread Jim Nasby

On 3/13/16 12:48 AM, Pavel Stehule wrote:

crosstabview is really a visualization tool. **But right now, there is no
other tool available from the terminal.** So this can be a significant help to
all the people who would like to use this functionality.


Not just the terminal either. Offhand I'm not aware of *any* fairly 
simple tool that provides crosstab. There's a bunch of 
complicated/expensive BI tools that do, but unless you've gone through 
the trouble of getting one of those set up, you're currently pretty stuck.


Ultimately I'd really like some way to remove/reduce the restriction of 
result set definitions needing to be determined at plan time. That would 
open the door for server-side crosstab/pivot as well a a host of other 
things (such as dynamically turning a hstore/json/xml field into a 
recordset).
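
As a concrete illustration of that restriction, this is the sort of manual
pivot you can write in plain SQL today (hypothetical sales table); every
output column has to be spelled out by hand precisely because the result
shape must be fixed at plan time:

SELECT region,
       sum(amount) FILTER (WHERE quarter = 1) AS q1,
       sum(amount) FILTER (WHERE quarter = 2) AS q2,
       sum(amount) FILTER (WHERE quarter = 3) AS q3,
       sum(amount) FILTER (WHERE quarter = 4) AS q4
FROM sales
GROUP BY region;

A client-side \crosstabview, or a server-side pivot with a dynamic result
set, would make the per-quarter columns implicit.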

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-03-13 Thread Jim Nasby

On 3/11/16 6:45 PM, Peter Geoghegan wrote:

I'll add that if people like the interface you propose. (Overloading
the VACUUM cost delay functions to cause a delay for amcheck
functions, too).


I thought that had already been overloaded by CIC, but I'm not finding 
reference to it... ANALYZE does use it though, so the ship has already 
sorta sailed.


I'm actually a bit surprised cost delay isn't used anywhere else. As 
more background operations are added I suspect users will want it at 
some point.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] Re: PATCH: Split stats file per database WAS: autovacuum stress-testing our system

2016-03-13 Thread Noah Misch
On Thu, Mar 03, 2016 at 06:08:07AM +0100, Tomas Vondra wrote:
> On 02/03/2016 06:46 AM, Noah Misch wrote:
> >On Mon, Feb 01, 2016 at 07:03:45PM +0100, Tomas Vondra wrote:
> >>On 12/22/2015 03:49 PM, Noah Misch wrote:
> >>>On Mon, Feb 18, 2013 at 06:19:12PM -0300, Alvaro Herrera wrote:
> I have pushed it now.  Further testing, of course, is always welcome.
> >>>
> >>>While investigating stats.sql buildfarm failures, mostly on animals
> >>>axolotl and shearwater, I found that this patch (commit 187492b)
> >>>inadvertently removed the collector's ability to coalesce inquiries.
> >>>Every PGSTAT_MTYPE_INQUIRY received now causes one stats file write.
> >>>Before, pgstat_recv_inquiry() did:
> >>>
> >>>   if (msg->inquiry_time > last_statrequest)
> >>>   last_statrequest = msg->inquiry_time;
> >>>
> >>>and pgstat_write_statsfile() did:
> >>>
> >>>   globalStats.stats_timestamp = GetCurrentTimestamp();
> >>>... (work of writing a stats file) ...
> >>>   last_statwrite = globalStats.stats_timestamp;
> >>>   last_statrequest = last_statwrite;
> >>>
> >>>If the collector entered pgstat_write_statsfile() with more inquiries
> >>>waiting in its socket receive buffer, it would ignore them as being too
> >>>old once it finished the write and resumed message processing. Commit
> >>>187492b converted last_statrequest to a "last_statrequests" list that we
> >>>wipe after each write.
> 
> So I've been looking at this today, and I think the attached patch should do
> the trick. I can't really verify it, because I've been unable to reproduce the
> non-coalescing - I presume it requires a much slower system (axolotl is RPi, not
> sure about shearwater).
> 
> The patch simply checks DBEntry.stats_timestamp in pgstat_recv_inquiry() and
> ignores requests that are already resolved by the last write (maybe this
> should accept a file written up to PGSTAT_STAT_INTERVAL in the past).
> 
> The required field is already in DBEntry (otherwise it'd be impossible to
> determine if backends need to send inquiries or not), so this is a pretty
> trivial change. I can't think of a simpler patch.
> 
> Can you try applying the patch on a machine where the problem is
> reproducible? I might have some RPi machines laying around (for other
> purposes).

I've not attempted to study the behavior on slow hardware.  Instead, my report
used stat-coalesce-v1.patch[1] to simulate slow writes.  (That diagnostic
patch no longer applies cleanly, so I'm attaching a rebased version.  I've
changed the patch name from "stat-coalesce" to "slow-stat-simulate" to
more-clearly distinguish it from the "pgstat-coalesce" patch.)  Problems
remain after applying your patch; consider "VACUUM pg_am" behavior:

9.2 w/ stat-coalesce-v1.patch:
  VACUUM returns in 3s, stats collector writes each file 1x over 3s
HEAD w/ slow-stat-simulate-v2.patch:
  VACUUM returns in 3s, stats collector writes each file 5x over 15s
HEAD w/ slow-stat-simulate-v2.patch and your patch:
  VACUUM returns in 10s, stats collector writes no files over 10s


[1] 
http://www.postgresql.org/message-id/20151222144950.ga2553...@tornado.leadboat.com
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 14afef6..4308df2 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -3444,6 +3444,7 @@ pgstat_send_bgwriter(void)
 NON_EXEC_STATIC void
 PgstatCollectorMain(int argc, char *argv[])
 {
+   unsigned total = 0;
int len;
PgStat_Msg  msg;
int wr;
@@ -3555,6 +3556,10 @@ PgstatCollectorMain(int argc, char *argv[])
 errmsg("could not read 
statistics message: %m")));
}
 
+   elog(LOG, "stats %d: %u + %u = %u",
+msg.msg_hdr.m_type, total, len, total + len);
+   total += len;
+
/*
 * We ignore messages that are smaller than our common 
header
 */
@@ -3947,6 +3952,13 @@ pgstat_write_statsfiles(bool permanent, bool allDbs)
 */
fputc('E', fpout);
 
+   if (1)
+   {
+   PG_SETMASK(&BlockSig);
+   pg_usleep(3 * 100L);
+   PG_SETMASK(&UnBlockSig);
+   }
+
if (ferror(fpout))
{
ereport(LOG,
diff --git a/src/test/regress/expected/stats.out 
b/src/test/regress/expected/stats.out
index a811265..064cf9f 100644
--- a/src/test/regress/expected/stats.out
+++ b/src/test/regress/expected/stats.out
@@ -63,6 +63,9 @@ begin
 SELECT (n_tup_ins > 0) INTO updated3
   FROM pg_stat_user_tables WHERE relname='trunc_stats_test';
 
+raise log 'stats updated as of % snapshot: 1:% 2:% 3:%',
+  pg_stat_get_snapshot_timestamp(), updated1, updated2, updated3;
+
 exit when updated1 and updated2 and updated3;
 
 -- wait a little
diff --git a/src/test/regress/

Re: [HACKERS] improving GROUP BY estimation

2016-03-13 Thread Tomas Vondra
Hi,

On Sun, 2016-03-13 at 15:24 +, Dean Rasheed wrote:
> On 4 March 2016 at 13:10, Tomas Vondra 
> wrote:
> > 
> > The risk associated with over-estimation is that we may switch from
> > HashAggregate to GroupAggregate, and generally select paths better
> > suited for large number of rows. Not great, because the plan may
> > not be
> > optimal and thus run much slower - but that usually only happens on
> > small tables, because on large tables we would probably end up
> > using
> > those same paths anyway.
> > 
> > With under-estimation, the main risks are much graver, as we may
> > end up
> > using HashAggregate only to get killed by OOM. When we're lucky not
> > to
> > hit that, in my experience this often leads to a cascade of NestedLoop
> > nodes processing orders of magnitude more tuples than expected.
> > Awful.
> > 
> > So I think the risk is asymmetric, and that's why I like the new
> > estimator more, because it tends to over-estimate, but in a very
> > reasonable way.
> > 
> Agreed. Under-estimating is worse than over-estimating.
> 
> 
> -   reldistinct *= rel->rows / rel->tuples;
> +   reldistinct *= (1 - powl((reldistinct - 1) / reldistinct, rel->rows))
> 
> Looking at this, I agree that this new formula from [1] is generally
> better than the old one in most (but not all) cases.
> 
> One problem with it is that it no longer takes into account
> rel->tuples, which means that when returning all the tuples from the
> table, it no longer gives an estimate equal to the total number of
> distinct values in the table. In fact it tends to underestimate when
> returning nearly all the rows from the table.

Yes, that's a good point.

> 
> The old formula is clearly not a good general-purpose estimate.
> However, there is one case where it does return an accurate estimate
> -- when all the values in the underlying table are distinct. In this
> case the new formula consistently produces underestimates, while the
> old formula works well. For example:
> 
> CREATE TABLE t AS SELECT (1 * random())::int a,
>  i::int b
> FROM generate_series(1,100) s(i);
> ANALYSE t;
> 
> EXPLAIN ANALYSE
> SELECT a FROM t GROUP BY a;
> 
> So there are clearly around 1 million distinct values for "a", but
> the new formula gives an estimate of around 630,000. That's not a
> massive underestimate, but it's sufficiently obvious and visible that
> it would be good to do better if we can.
> 
> 
> I think that a better formula to use would be
> 
> reldistinct *= (1 - powl(1 - rel->rows / rel->tuples, rel->tuples / reldistinct))
> 
> This comes from [2], which gives a general formula for selection
> without replacement, and then gives the above as an approximation to
> the uniform distribution case. It's not really made clear in that
> paper, but that approximation is actually the "with replacement"
> approximation, but starting from different initial assumptions to
> give a formula that has better boundary conditions and special-case
> behaviour, as described below.
> 
> ...
> 
> For most values of N and n, the approximation from [2] is almost
> indistinguishable from the exact formula, whereas the formula from
> [1] tends to underestimate when selecting a large number of distinct
> values (e.g., try setting n=90 in the plot above).

Yes, I agree that the formula you propose seems to behave better.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: [HACKERS] psql metaqueries with \gexec

2016-03-13 Thread Jim Nasby
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

Still needs documentation.

The new status of this patch is: Waiting on Author



Re: [HACKERS] psql metaqueries with \gexec

2016-03-13 Thread Jim Nasby

On 2/22/16 1:01 PM, Corey Huinker wrote:

In the mean time, update patch attached.


Really attached this time.


I'm getting a warning from this patch:

common.c:947:8: warning: variable 'success' is used uninitialized 
whenever 'if' condition is true [-Wsometimes-uninitialized]

if (pset.gexec_flag)
^~~
common.c:995:9: note: uninitialized use occurs here
return success;
   ^~~
common.c:947:4: note: remove the 'if' if its condition is always false
if (pset.gexec_flag)
^~~~
common.c:937:15: note: initialize the variable 'success' to silence this 
warning

boolsuccess;
   ^
= '\0'
1 warning generated.

(note that I'm using CC='ccache clang -Qunused-arguments 
-fcolor-diagnostics')



for (r = 0; r < nrows; r++)
{
for (c = 0; c < ncolumns; c++)
{

etc...

Normally we don't use gratuitous {'s, and I don't think it's helping 
anything in this case. But I'll let whoever commits this decide.



diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 5f27120..0f87f29 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -1280,8 +1280,8 @@ psql_completion(const char *text, int start, int end)
"\\dm", "\\dn", "\\do", "\\dO", "\\dp", "\\drds", "\\ds", 
"\\dS",
"\\dt", "\\dT", "\\dv", "\\du", "\\dx", "\\dy",
"\\e", "\\echo", "\\ef", "\\encoding", "\\ev",
-   "\\f", "\\g", "\\gset", "\\h", "\\help", "\\H", "\\i", "\\ir", 
"\\l",
-   "\\lo_import", "\\lo_export", "\\lo_list", "\\lo_unlink",
+   "\\f", "\\g", "\\gexec", "\\gset", "\\h", "\\help", "\\H", "\\i", 
"\\ir",
+   "\\l", "\\lo_import", "\\lo_export", "\\lo_list", "\\lo_unlink",


FWIW, it's generally better to leave that kind of re-wrapping to the 
next pg_indent run.


I added tests for ON_ERROR_STOP. New patch attached.

The patch still needs to document this feature in the psql docs (and 
maybe the manpage? not sure how that's generated...)
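
For reference while writing those docs, here's a minimal usage sketch,
assuming the semantics in the patch (the query is sent, and each non-NULL
cell of the result is then executed as its own statement):

-- run VACUUM ANALYZE on every table in the public schema
SELECT format('VACUUM ANALYZE %I.%I', schemaname, tablename)
FROM pg_tables
WHERE schemaname = 'public'
\gexec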

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 9750a5b..5ca769f 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -849,6 +849,13 @@ exec_command(const char *cmd,
status = PSQL_CMD_SEND;
}
 
+   /* \gexec -- send query and treat every result cell as a query to be 
executed */
+   else if (strcmp(cmd, "gexec") == 0)
+   {
+   pset.gexec_flag = true;
+   status = PSQL_CMD_SEND;
+   }
+
/* \gset [prefix] -- send query and store result into variables */
else if (strcmp(cmd, "gset") == 0)
{
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index 2cb2e9b..54b7790 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -710,6 +710,46 @@ StoreQueryTuple(const PGresult *result)
return success;
 }
 
+/*
+ * ExecQueryTuples: assuming query result is OK, execute every query
+ * result as its own statement
+ *
+ * Returns true if successful, false otherwise.
+ */
+static bool
+ExecQueryTuples(const PGresult *result)
+{
+   boolsuccess = true;
+   int nrows = PQntuples(result);
+   int ncolumns = PQnfields(result);
+   int r, c;
+
+   for (r = 0; r < nrows; r++)
+   {
+   for (c = 0; c < ncolumns; c++)
+   {
+   if (! PQgetisnull(result, r, c))
+   {
+   if ( ! SendQuery(PQgetvalue(result, r, c)) )
+   {
+   if (pset.on_error_stop)
+   {
+   return false;
+   }
+   else
+   {
+   success = false;
+   }
+   }
+   }
+   }
+   }
+
+   /* Return true if all queries were successful */
+   return success;
+}
+
+
 
 /*
  * ProcessResult: utility function for use by SendQuery() only
@@ -903,8 +943,14 @@ PrintQueryResults(PGresult *results)
switch (PQresultStatus(results))
{
case PGRES_TUPLES_OK:
-   /* store or print the data ... */
-   if (pset.gset_prefix)
+   /* execute or store or print the data ... */
+   if

Re: [HACKERS] Parallel Aggregate

2016-03-13 Thread David Rowley
On 12 March 2016 at 16:31, David Rowley  wrote:
> I've attached an updated patch which is based on commit 7087166,
> things are really changing fast in the grouping path area at the
> moment, but hopefully the dust is starting to settle now.

The attached patch fixes a harmless compiler warning about a possible
uninitialised variable.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


parallel_aggregation_015971a_2016-03-14.patch
Description: Binary data



Re: [HACKERS] Move PinBuffer and UnpinBuffer to atomics

2016-03-13 Thread Alexander Korotkov
On Fri, Mar 11, 2016 at 7:08 AM, Dilip Kumar  wrote:

>
> On Thu, Mar 10, 2016 at 8:26 PM, Alexander Korotkov <
> a.korot...@postgrespro.ru> wrote:
>
>> I don't think we can rely on median that much if we have only 3 runs.
>> For 3 runs we can only apply Kornfeld method which claims that confidence
>> interval should be between lower and upper values.
>> Since confidence intervals for master and patched versions are
>> overlapping we can't conclude that expected TPS numbers are different.
>> Dilip, could you do more runs? 10, for example. Using such statistics we
>> would be able to conclude something.
>>
>
> Here is the reading for 10 runs
>
>
> Median Result
> ---
>
> Client   Base     Patch
> -----------------------------
> 1        19873    19739
> 2        38658    38276
> 4        68812    62075
>
> Full Results of 10 runs...
>
>  Base
> -----------------------------------------
>  Runs   1 Client   2 Client   4 Client
> -----------------------------------------
> 1       19442      34866      49023
> 2       19605      35139      51695
> 3       19726      37104      53899
> 4       19835      38439      55708
> 5       19866      38638      67473
> 6       19880      38679      70152
> 7       19973      38720      70920
> 8       20048      38737      71342
> 9       20057      38815      71403
> 10      20344      41423      77953
> -----------------------------------------
>
>
> Patch
> -----------------------------------------
> Runs    1 Client   2 Client   4 Client
> -----------------------------------------
> 1       18881      30161      54928
> 2       19415      32031      59741
> 3       19564      35022      61060
> 4       19627      36712      61839
> 5       19670      37659      62011
> 6       19808      38894      62139
> 7       19857      39081      62983
> 8       19910      39923      75358
> 9       20169      39937      77481
> 10      20181      40003      78462
> -----------------------------------------
>

I've drawn graphs for these measurements. The variation doesn't look random
here: TPS goes higher from measurement to measurement.  I bet you ran the
measurements sequentially.
I think we should do more measurements until TPS stops growing, and beyond
that, to accumulate average statistics.  Could you please do the same tests
with 50 runs?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Recovery test failure for recovery_min_apply_delay on hamster

2016-03-13 Thread Michael Paquier
On Wed, Mar 9, 2016 at 9:05 PM, Michael Paquier
 wrote:
> On Wed, Mar 9, 2016 at 12:29 PM, Alvaro Herrera
>  wrote:
>> Michael Paquier wrote:
>>> After sleeping (best debugger ever) on that, actually a way popped up
>>> in my mind, and I propose the attached, which refactors a bit 005 and
>>> checks that the LSN position of master has been applied on standby
>>> after at least the delay wanted. A maximum delay of 90s is authorized,
>>> like poll_query_until.
>>
>> Hmm, okay, that's great.  A question: what happens if the test itself is
>> slow and the servers are fast, and the test doesn't manage to run two
>> iterations before the two seconds have elapsed?  This may happen on
>> overloaded or slow servers, if you're unlucky.
>
> Yes, a failure would happen. The same thought occurred to me during a
> long flight. And this is why the previous patch was full of meh.
>
>> I don't have any ideas on ensuring that we don't apply earlier than the
>> given period at the moment.
>
> Attached is one, which is based on timestamp values queried from the
> standby server. We could use as well perl's localtime call to
> calculate the time delay.

Actually, the attached is better. This one relies on time() to perform
the delay checks, and takes care of things even for slow machines.
-- 
Michael
diff --git a/src/test/recovery/t/005_replay_delay.pl b/src/test/recovery/t/005_replay_delay.pl
index 986851b..78bcff6 100644
--- a/src/test/recovery/t/005_replay_delay.pl
+++ b/src/test/recovery/t/005_replay_delay.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 2;
+use Test::More tests => 1;
 
 # Initialize master node
 my $node_master = get_new_node('master');
@@ -28,22 +28,41 @@ recovery_min_apply_delay = '2s'
 ));
 $node_standby->start;
 
-# Make new content on master and check its presence in standby
-# depending on the delay of 2s applied above.
+# Make new content on master and check its presence in standby depending
+# on the delay of 2s applied above. Before doing the insertion, get the
+# current timestamp that will be used as a comparison base. Even on slow
+# machines, this allows to have a predictable behavior when comparing the
+# delay between data insertion moment on master and replay time on standby.
+my $standby_insert_time = time();
 $node_master->safe_psql('postgres',
 	"INSERT INTO tab_int VALUES (generate_series(11,20))");
-sleep 1;
 
-# Here we should have only 10 rows
-my $result = $node_standby->safe_psql('postgres', "SELECT count(*) FROM tab_int");
-is($result, qq(10), 'check content with delay of 1s');
-
-# Now wait for replay to complete on standby
+# Now wait for replay to complete on standby. We leave once check the
+# previously saved LSN of master has been applied, and then compare the
+# insertion timestamp with the current one.
 my $until_lsn =
   $node_master->safe_psql('postgres', "SELECT pg_current_xlog_location();");
-my $caughtup_query =
-  "SELECT '$until_lsn'::pg_lsn <= pg_last_xlog_replay_location()";
-$node_standby->poll_query_until('postgres', $caughtup_query)
-  or die "Timed out while waiting for standby to catch up";
-$result = $node_standby->safe_psql('postgres', "SELECT count(*) FROM tab_int");
-is($result, qq(20), 'check content with delay of 2s');
+
+my $max_attempts = 90;
+my $attempts = 0;
+while ($attempts < $max_attempts)
+{
+	my $replay_status =
+	  $node_standby->safe_psql('postgres',
+		"SELECT (pg_last_xlog_replay_location() - '$until_lsn'::pg_lsn) >= 0;");
+
+	# Leave now if standby has replayed at least up to the point of
+	# master. It's time to see if data has been applied after the wanted
+	# delay.
+	last if $replay_status eq 't';
+
+	sleep 1;
+	$attempts++;
+}
+
+die "Maximum number of attempts reached" if $attempts >= $max_attempts;
+
+# This test is valid only if LSN has been applied with at least
+# the minimum apply delay expected.
+ok(time() - $standby_insert_time >= 2,
+   'Check if WAL is applied on standby after delay of 2s');



Re: [HACKERS] memory leak in GIN

2016-03-13 Thread Tom Lane
Jeff Janes  writes:
> I bisected it down to:

> d88976cfa1302e8dccdcbfe55e9e29faee8c0cdf is the first bad commit
> commit d88976cfa1302e8dccdcbfe55e9e29faee8c0cdf
> Author: Heikki Linnakangas 
> Date:   Wed Feb 4 17:40:25 2015 +0200

> Use a separate memory context for GIN scan keys.

Yeah, I had come to the same conclusion.  The leak comes from removing
this bit from ginFreeScanKeys():

-   if (entry->list)
-   pfree(entry->list);

Heikki evidently supposed that entry->list would be allocated in
the so->keyCtx, but as things stand, it is not: it gets allocated
in the query-lifespan context, so this change causes a leak of
potentially several kB per ginrescan cycle.  And the test query
does about 12 of those.

I think it would likely be a good thing to fix things so that that
assumption actually holds, but I'm not familiar enough with this code
to decide what's the best way to do that.  (Basically, the question is
"how much of startScanEntry() ought to run with keyCtx as current memory
context?")  As a short-term fix I plan to reinstall the above-cited pfree.

regards, tom lane




Re: [HACKERS] amcheck (B-Tree integrity checking tool)

2016-03-13 Thread Peter Geoghegan
On Sat, Mar 12, 2016 at 7:46 PM, Matt Kelly  wrote:
> You can actually pretty easily produce a test case by setting up streaming
> replication between servers running two different versions of glibc.
>
> I actually wrote a tool that spins up a pair of VMs using vagrant and then
> sets them up as streaming replicas using ansible.  It provides a nice
> one-liner to get a streaming-replica test environment going, and it will
> easily provide the cross-glibc test case.  Technically, though, it belongs
> to Trip because I wrote it on company time.  Let me see if I can open
> source a version of it later this week so that you can use it for testing.

That could be interesting. The earlier prototypes of this tool are
known to have detected glibc collation incompatibilities in real
production systems.
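
For anyone who wants to try that by hand, a minimal sketch, assuming the
function names used by the proposed tool (some_text_index is a placeholder
for whatever index you want to verify; run this on both the primary and the
replica built against the other glibc, and compare):

CREATE EXTENSION amcheck;

-- verifies that the index keys are ordered consistently with the collation;
-- a replica whose glibc sorts differently can fail this check
SELECT bt_index_check('some_text_index'::regclass);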

-- 
Peter Geoghegan




Re: [HACKERS] Sanity checking for ./configure options?

2016-03-13 Thread Jim Nasby

On 2/26/16 9:29 PM, Peter Eisentraut wrote:

Your code and comments suggest that you can specify the port to
configure by setting PGPORT, but that is not the case.

test == is not portable (bashism).

Error messages should have consistent capitalization.

Indentation in configure is two spaces.


>As the comment states, it doesn't catch things like --with-pgport=1a in
>configure, but the compile error you get with that isn't too hard to
>figure out, so I think it's OK.

Passing a non-integer as argument will produce an error message like
(depending on shell)

./configure: line 3107: test: 11a: integer expression expected

but will not actually abort configure.

It would work more robustly if you did something like this

elif test "$default_port" -ge "1" -a "$default_port" -le "65535"; then
   :
else
   AC_MSG_ERROR([port must be between 1 and 65535])
fi

but that still leaks the shell's error message.

There is also the risk of someone specifying a number with a leading
zero, which C would interpret as octal but the shell would not.


All issues should now be addressed.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
diff --git a/configure b/configure
index b3f3abe..e7bddba 100755
--- a/configure
+++ b/configure
@@ -3099,6 +3099,16 @@ cat >>confdefs.h <<_ACEOF
 _ACEOF
 
 
+# It's worth testing for this because it creates a very confusing error
+if test "$default_port" = ""; then
+  as_fn_error $? "invalid empty string supplied with --with-pgport" "$LINENO" 5
+elif test ! `echo $default_port | sed -e 's/[0-9]//g'` = ''; then
+  as_fn_error $? "invalid port specification; must be a number" "$LINENO" 5
+elif test ! `echo $default_port | sed -e 's/^0//g'` = $default_port; then
+  as_fn_error $? "illegal leading 0 specified with --with-pgport" "$LINENO" 5
+elif test "$default_port" -lt "1" -o "$default_port" -gt "65535"; then
+  as_fn_error $? "port must be between 1 and 65535" "$LINENO" 5
+fi
 
 #
 # '-rpath'-like feature can be disabled
diff --git a/configure.in b/configure.in
index 0bd90d7..db6e2a0 100644
--- a/configure.in
+++ b/configure.in
@@ -164,6 +164,16 @@ but it's convenient if your clients have the right default 
compiled in.
 AC_DEFINE_UNQUOTED(DEF_PGPORT_STR, "${default_port}",
[Define to the default TCP port number as a string 
constant.])
 AC_SUBST(default_port)
+# It's worth testing for this because it creates a very confusing error
+if test "$default_port" = ""; then
+  AC_MSG_ERROR([invalid empty string supplied with --with-pgport])
+elif test ! `echo $default_port | sed -e 's/[[0-9]]//g'` = ''; then
+  AC_MSG_ERROR([invalid port specification; must be a number])
+elif test ! `echo $default_port | sed -e 's/^0//g'` = $default_port; then
+  AC_MSG_ERROR([illegal leading 0 specified with --with-pgport])
+elif test "$default_port" -lt "1" -o "$default_port" -gt "65535"; then
+  AC_MSG_ERROR([port must be between 1 and 65535])
+fi
 
 #
 # '-rpath'-like feature can be disabled



Re: [HACKERS] Improve error handling in pltcl

2016-03-13 Thread Jim Nasby

On 3/3/16 8:51 AM, Pavel Stehule wrote:

Hi

I am testing the behavior, and some results look strange


Thanks for the review!


postgres=# \sf foo
CREATE OR REPLACE FUNCTION public.foo()
  RETURNS void
  LANGUAGE plpgsql
AS $function$
begin
   raise exception sqlstate 'ZZ666' using message='hello, world',
detail='hello, my world', hint = 'dont afraid';
end
$function$

postgres=# select tcl_eval('spi_exec "select foo();"');
ERROR:  38000: hello, world
CONTEXT:  hello, world   ==???

the message appears in the context. Probably this is out of scope for this
patch, but it isn't consistent with the other PLs.


 while executing
"spi_exec "select foo();""
 ("eval" body line 1)
 invoked from within
"eval $1"
 (procedure "__PLTcl_proc_16864" line 3)
 invoked from within
"__PLTcl_proc_16864 {spi_exec "select foo();"}"
in PL/Tcl function "tcl_eval"
LOCATION:  throw_tcl_error, pltcl.c:1217
Time: 1.178 ms


Both problems actually exist in HEAD. The issue is this line in 
throw_tcl_error:


econtext = utf_u2e(Tcl_GetVar(interp, "errorInfo", TCL_GLOBAL_ONLY));

Offhand I don't see any great way to improve that behavior, and in any 
case it seems out of scope for this patch. As a workaround I'm just 
forcing psql error VERBOSITY to terse for now.



postgres=# select tcl_eval('join $::errorCode "\n"');
 tcl_eval
═
  POSTGRES   ↵
  message↵
  hello, world   ↵
  detail ↵
  hello, my world↵
  hint   ↵
  dont afraid↵
  domain ↵
  plpgsql-9.6↵
  context_domain ↵
  postgres-9.6   ↵
  context↵
  PL/pgSQL function foo() line 3 at RAISE↵
  SQL statement "select foo();"  ↵
  cursor_position↵
  0  ↵
  filename   ↵
  pl_exec.c  ↵
  lineno ↵
  3165   ↵
  funcname   ↵
  exec_stmt_raise
(1 row)

The SQLSTATE is missing.


Great catch. Fixed.


Why is a list object used instead of a dictionary? Tcl supports dictionaries:
https://www.tcl.tk/man/tcl8.5/tutorial/Tcl23a.html


Because errorCode unfortunately is an array and not a dict. It doesn't 
really seem worth messing with it in the eval since this is just a 
sanity check...


New patch attached. It also removes some other unstable output from the 
regression test.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
diff --git a/doc/src/sgml/pltcl.sgml b/doc/src/sgml/pltcl.sgml
index d2175d5..d5c576d 100644
--- a/doc/src/sgml/pltcl.sgml
+++ b/doc/src/sgml/pltcl.sgml
@@ -775,6 +775,127 @@ CREATE EVENT TRIGGER tcl_a_snitch ON ddl_command_start 
EXECUTE PROCEDURE tclsnit
 

 
+   
+Error Handling in PL/Tcl
+
+
+ error handling
+ in PL/Tcl
+
+
+
+ All Tcl errors that are allowed to propagate back to the top level of the
+ interpreter, that is, errors not caught within the stored procedure
+ using the Tcl catch command will raise a database
+ error.
+
+
+ Tcl code within or called from the stored procedure can choose to
+ raise a database error by invoking the elog
+ command provided by PL/Tcl or by generating an error using the Tcl
+ error command and not catching it with Tcl's
+ catch command.
+
+
+ Database errors that occur from the PL/Tcl stored procedure's
+ use of spi_exec, spi_prepare,
+ and spi_execp are also catchable by Tcl's
+ catch command.
+
+
+ Tcl provides an errorCode variable that can
+ represent additional information about the error in a form that
+ is easy for programs to interpret.  The contents are in Tcl list
+ format and the first word identifies the subsystem or
+ library responsible for the error and beyond that the contents are left
+ to the individual code or library.  For example if Tcl's
+ open command is asked to open a file that doesn't
+ exist, errorCode
+ might contain POSIX ENOENT {no such file or directory}
+ where the third element may vary by locale but the first and second
+ will not.
+
+
+ When spi_exec, spi_prepare
+ or spi_execp cause a database error to be raised,
+ that database error propagates back to Tcl as a Tcl error.
+ In this case errorCode is set to a list
+ where the first element is POSTGRES followed by a
+ copious decoding of the Postgres error structure.  Since fields in the
+ structure may or may not be present depending on t

[HACKERS] [PATCH] Obsolete wording in PL/Perl comment

2016-03-13 Thread Dagfinn Ilmari Mannsåker

The comment in hv_store_string() says that negative key length to
hv_store() for UTF-8 is not documented, and mentions that 5.6 doesn't
track UTF-8-ness of keys.  However, the negative length convention has
been documented since 5.16¹, and 5.6 is no longer supported.  The
attached patch updates the comment to reflect this.

[1]: http://perldoc.perl.org/perlapi.html#hv_store

-- 
"The surreality of the universe tends towards a maximum" -- Skud's Law
"Never formulate a law or axiom that you're not prepared to live with
 the consequences of."  -- Skud's Meta-Law

>From 9dfe4c076b5040557dec694dee91701438d658fa Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Sun, 13 Mar 2016 02:46:45 +
Subject: [PATCH] Fix obsolete wording in PL/Perl hv_store_string comment

Negative klen is documented since Perl 5.16, and 5.6 is no longer
supported.
---
 src/pl/plperl/plperl.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/src/pl/plperl/plperl.c b/src/pl/plperl/plperl.c
index 269f7f3..a659695 100644
--- a/src/pl/plperl/plperl.c
+++ b/src/pl/plperl/plperl.c
@@ -3911,10 +3911,8 @@ hv_store_string(HV *hv, const char *key, SV *val)
 	hkey = pg_server_to_any(key, strlen(key), PG_UTF8);
 
 	/*
-	 * This seems nowhere documented, but under Perl 5.8.0 and up, hv_store()
-	 * recognizes a negative klen parameter as meaning a UTF-8 encoded key. It
-	 * does not appear that hashes track UTF-8-ness of keys at all in Perl
-	 * 5.6.
+	 * hv_store() recognizes a negative klen parameter as meaning a UTF-8
+	 * encoded key
 	 */
 	hlen = -(int) strlen(hkey);
 	ret = hv_store(hv, hkey, hlen, val, 0);
-- 
2.7.0




Re: [HACKERS] memory leak in GIN

2016-03-13 Thread Jeff Janes
On Fri, Mar 11, 2016 at 11:40 PM, Jaime Casanova
 wrote:
> Hi,
>
> On the Spanish list, Felipe de Jesús Molina Bravo reported a few days
> back that a query that worked well in 9.4 consumes all memory in 9.5.
> With the self-contained test he provided, I reproduced the problem
> in 9.5 and 9.6dev.
>
> To test, execute:
>
> pba.sql -- to create the tables and populate
> query_crash.sql -- this will consume all your memory and crash your
> server eventually
>
> If you drop the GIN indexes, the problem disappear.
>
> I used valgrind to try to hunt down the memory leak; attached is the
> resulting log showing the backend that executed the query. From the little
> I could tell from the stack trace valgrind showed, the problem is around
> ginPostingListDecodeAllSegments(), but I don't see any modifications
> there.

I bisected it down to:

d88976cfa1302e8dccdcbfe55e9e29faee8c0cdf is the first bad commit
commit d88976cfa1302e8dccdcbfe55e9e29faee8c0cdf
Author: Heikki Linnakangas 
Date:   Wed Feb 4 17:40:25 2015 +0200

Use a separate memory context for GIN scan keys.

It was getting tedious to track and release all the different things that
form a scan key. We were leaking at least the queryCategories array, and
possibly more, on a rescan. That was visible if a GIN index was used in a
nested loop join. This also protects from leaks in extractQuery method.

No backpatching, given the lack of complaints from the field. Maybe later,
after this has received more field testing.

I won't have a chance to do any further analysis for a while.

Cheers,

Jeff




Re: [HACKERS] PoC: Partial sort

2016-03-13 Thread Alexander Korotkov
Hi!

Tom committed the upper planner pathification patch.
A rebased partial sort patch against master is attached.  It was quite a huge
rebase in the planner part of the patch, but I think the patch has become
better and much more logical as a result.
Something was probably missed during the rebase; I'm going to examine this
patch more carefully next week.
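
For readers new to the thread, a minimal sketch (hypothetical table) of the
case partial sort targets: a btree index already returns rows ordered by a
leading key, so only groups of rows with equal leading-key values need to be
sorted on the remaining keys:

CREATE TABLE t (a int, b int);
CREATE INDEX ON t (a);

-- the index supplies the ordering on a; only ties on a need sorting by b
SELECT * FROM t ORDER BY a, b LIMIT 10;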

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


partial-sort-basic-7.patch
Description: Binary data



Re: [HACKERS] improving GROUP BY estimation

2016-03-13 Thread Dean Rasheed
On 4 March 2016 at 13:10, Tomas Vondra  wrote:
> The risk associated with over-estimation is that we may switch from
> HashAggregate to GroupAggregate, and generally select paths better
> suited for large number of rows. Not great, because the plan may not be
> optimal and thus run much slower - but that usually only happens on
> small tables, because on large tables we would probably end up using
> those same paths anyway.
>
> With under-estimation, the main risks are much graver, as we may end up
> using HashAggregate only to get killed by OOM. When we're lucky not to
> hit that, in my experience this often leads to a cascade of NestedLoop
> nodes processing orders of magnitude more tuples than expected. Awful.
>
> So I think the risk is asymmetric, and that's why I like the new
> estimator more, because it tends to over-estimate, but in a very
> reasonable way.
>

Agreed. Under-estimating is worse than over-estimating.


-   reldistinct *= rel->rows / rel->tuples;
+   reldistinct *= (1 - powl((reldistinct - 1) / reldistinct, rel->rows))

Looking at this, I agree that this new formula from [1] is generally
better than the old one in most (but not all) cases.

One problem with it is that it no longer takes into account
rel->tuples, which means that when returning all the tuples from the
table, it no longer gives an estimate equal to the total number of
distinct values in the table. In fact it tends to underestimate when
returning nearly all the rows from the table.

The old formula is clearly not a good general-purpose estimate.
However, there is one case where it does return an accurate estimate
-- when all the values in the underlying table are distinct. In this
case the new formula consistently produces underestimates, while the
old formula works well. For example:

CREATE TABLE t AS SELECT (1 * random())::int a,
 i::int b
FROM generate_series(1,100) s(i);
ANALYSE t;

EXPLAIN ANALYSE
SELECT a FROM t GROUP BY a;

So there are clearly around 1 million distinct values for "a", but the
new formula gives an estimate of around 630,000. That's not a massive
underestimate, but it's sufficiently obvious and visible that it would
be good to do better if we can.


I think that a better formula to use would be

reldistinct *= (1 - powl(1 - rel->rows / rel->tuples, rel->tuples / reldistinct))

This comes from [2], which gives a general formula for selection
without replacement, and then gives the above as an approximation to
the uniform distribution case. It's not really made clear in that
paper, but that approximation is actually the "with replacement"
approximation, but starting from different initial assumptions to give
a formula that has better boundary conditions and special-case
behaviour, as described below.
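
Writing N for rel->tuples, n for reldistinct and p for rel->rows (the
notation used below), the two code expressions correspond to:

    [1]  D(p) = n * (1 - (1 - 1/n)^p)
    [2]  D(p) = n * (1 - (1 - p/N)^(N/n))

For the all-distinct example above (n, p and N all around a million), [1]
gives roughly n * (1 - 1/e), i.e. the ~630,000 estimate, while [2] correctly
collapses to n when p = N.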


For the record, here's a quick analysis of how the 2 formulae come about:

Assume there are:
  N rows in the table
  n distinct values (n <= N)
  p rows are chosen at random (p <= N)

so the problem is to estimate how many distinct values there will be
in the p rows chosen. Both approaches work by first estimating the
probability that some particular value x is *not* chosen.

[1] starts from the assumption that each row of the table has a
probability of 1/n of being equal to x. So the probability that x is
not the first value chosen is

  P(not x, 1) = 1 - 1/n

Then, if the value chosen first is replaced, the probability that x is
not the second value chosen is the same. The "with replacement"
approximation makes each choice an independent probability, so the
overall probability that x is not in any of the p rows chosen is just
the product of the p individual probabilities, which is just

  P(not x, p) = (1 - 1/n)^p

Then of course the probability that x *is* chosen at least once in the p rows is

  P(x, p) = 1 - (1 - 1/n)^p

Finally, since there are n possible values of x, and they're all
equally likely in the table, the expected number of distinct values is

  D(p) = n * (1 - (1 - 1/n)^p)

The flaw in this approach is that for large p the "with replacement"
approximation gets worse and worse, and the probabilities P(x, p)
systematically under-estimate the actual probability that x is chosen,
which increases as more values not equal to x are chosen. By the time
the last value is chosen P(x, p=N) should actually be 1.


[2] starts from a different initial assumption (uniform distribution
case) -- for any given value x, assume that the table contains N/n
instances of x (ignoring the fact that that might not be an integer).
Note that with this assumption there is guaranteed to be at least one
instance of x, which is not the case with the above approach.

Now consider the first instance of x in the table. If we choose p rows
from the table, the probability that that first instance of x is not
chosen is

  P(not x[1], p) = 1 - p / N

Making the same "with replacement" simplification, the probability
that the second instance of x is not chose

Re: [HACKERS] [GENERAL] OS X 10.11.3, psql, bus error 10, 9.5.1

2016-03-13 Thread Greg Stark
On 12 Mar 2016 10:58 pm, "Tom Lane"  wrote:
>
> I wrote:
> > That's confusing because it implies that -fno-common is the default,
> > which it evidently is not.  But anyway, my diagnosis is that you're
> > breaking something about the linker's behavior with that switch.

> I shall get rid of the const-ness, as well as the lame casting away
> of it, and I think I will also go make buildfarm member longfin use
> "-fno-common".  It is truly sad that we apparently have no test
> machine that enforces that const means const ...

It looks like this would also be useful for catching any mistakes where we
might have defined a variable in two different files or missed an "extern".
Though I'm not sure the failures will be very obvious.


Re: [HACKERS] auto_explain sample rate

2016-03-13 Thread Magnus Hagander
On Sat, Mar 12, 2016 at 3:20 PM, Julien Rouhaud 
wrote:

> On 11/03/2016 17:55, Robert Haas wrote:
> > On Fri, Mar 11, 2016 at 11:33 AM, Tomas Vondra
> >  wrote:
> >> A bit late, but I think we should rename the GUC variable to
> >> "sampling_rate" (instead of sample_ratio) as that's what pgbench uses
> >> for the same thing. That'd be more consistent.
> >
> > I like that idea.  It seems like slightly better terminology.
> >
>
> I like it too. I also just noticed that I duplicated the var type by
> mistake in the documentation :/
>
> Attached patch fixes both.
>
>
I also agree, and thanks for doing the work :) Applied!

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Refectoring of receivelog.c

2016-03-13 Thread Magnus Hagander
On Sun, Mar 13, 2016 at 12:15 AM, Tomas Vondra  wrote:

> On 03/11/2016 11:15 AM, Magnus Hagander wrote:
>
>>
>> ...
>
>>
>>
>> Pushed with updated comments and a named struct.
>>
>
> Pretty sure this memset call in pg_basebackup.c is incorrect, as it passes
> parameters in the wrong order:
>
> MemSet(&stream, sizeof(stream), 0);
>
> It seems benign, because we're setting all the fields explicitly, but gcc
> is nagging about it.
>

Indeed, that's backwards. Interestingly enough, I thought I did a c&p between
that and the one in pg_receivexlog.c, but clearly I did not. And my gcc did
not nag about it.

Fixed, thanks for the pointer!


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] Performance improvement for joins where outer side is unique

2016-03-13 Thread David Rowley
On 12 March 2016 at 11:43, Tom Lane  wrote:
> I wrote:
>> I wondered why, instead of inventing an extra semantics-modifying flag,
>> we couldn't just change the jointype to *be* JOIN_SEMI when we've
>> discovered that the inner side is unique.
>
> BTW, to clarify: I'm not imagining that we'd make this change in the
> query jointree, as for example prepjointree.c might do.  That would appear
> to add join order constraints, which we don't want.  But I'm thinking that
> at the instant where we form a join Path, we could change the Path's
> jointype to be JOIN_SEMI or JOIN_SEMI_OUTER if we're able to prove the
> inner side unique, rather than annotating the Path with a separate flag.
> Then that representation is what propagates forward.

I've attached a patch which implements this, although I did call the
new join type JOIN_LEFT_UNIQUE rather than JOIN_SEMI_OUTER.
I'm not all that confident that I've not added handling for
JOIN_LEFT_UNIQUE in places where it's not possible to get that join
type; I did leave out a good number of places where I know it's not
possible. I'm also not entirely certain that I've got all cases
correct, but the regression tests pass. The patch is intended more to
assist discussion than as a ready-to-commit patch.

> It seems like the major intellectual complexity here is to figure out
> how to detect inner-side-unique at reasonable cost.  I see that for
> LEFT joins you're caching that in the SpecialJoinInfos, which is probably
> fine.  But for INNER joins it looks like you're just doing it over again
> for every candidate join, and that seems mighty expensive.

I have a rough idea for this, but I need to think about it a bit more
to make sure it's bulletproof.
I also just noticed (since it's been a while since I wrote the patch)
that in add_paths_to_joinrel() the innerrel can naturally be a
joinrel too, and we can fail to find uniqueness in that joinrel. I
think it should be possible to analyse the joinrel as well and search
for a base rel which supports the distinctness, and then ensure none
of the other rels which make up the joinrel can cause tuple
duplication of that rel. But this only causes missed optimisation
opportunities.
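
For anyone skimming the thread, a minimal sketch (hypothetical tables) of the
class of join being targeted: the join clause matches a unique index on the
inner side, so each outer row can match at most one inner row, and the join
can be treated as semi-join-like once a match is found:

CREATE TABLE parent (id int PRIMARY KEY, name text);
CREATE TABLE child  (id int PRIMARY KEY, parent_id int);

-- parent.id is unique, so the planner can prove the inner side unique here
SELECT c.id, p.name
FROM child c
LEFT JOIN parent p ON p.id = c.parent_id;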

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


unique_joins_82d2a07_2016-03-14.patch
Description: Binary data
