Re: [PERFORM] Rowcount estimation changes based on from clause order

2017-10-12 Thread Ants Aasma
On Thu, Oct 12, 2017 at 11:50 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Ants Aasma <ants.aa...@eesti.ee> writes:
>> I stumbled upon a severe row count underestimation that confusingly
>> went away when two inner joins in the from clause were reordered.
>
> Hm, looks more like an overestimate in this example, but anyway ...
>
>> Does anybody have any idea what is going on here?
>
> set_joinrel_size_estimates says
>
>  * Since there is more than one way to make a joinrel for more than two
>  * base relations, the results we get here could depend on which component
>  * rel pair is provided.  In theory we should get the same answers no matter
>  * which pair is provided; in practice, since the selectivity estimation
>  * routines don't handle all cases equally well, we might not.  But there's
>  * not much to be done about it.
>
> In this example I think the core of the issue is actually not so much
> bad selectivity estimates as rowcount roundoff error.
>
> If we first consider joining "small" with "big", we get an estimate of
> 2000 rows (which is dead on for what would happen if we just joined
> those).  Then we estimate the final result size as the join of that to
> "lookup".  The selectivity number for that step is somewhat hogwash but
> happens to yield a result that's not awful (8 rows).
>
> In the other case we first estimate the size of the join of "small" with
> the "lookup" subquery, and we get a rounded-off estimate of one row,
> whereas without the roundoff it would have been probably about 0.01.
> When that's joined to "big", we are computing one row times 1 million rows
> times a selectivity estimate that's about right for the "small.id =
> big.small_id" clause; but because the roundoff already inflated the first
> join's size so much, you end up with an inflated final result.
>
> This suggests that there might be some value in considering the
> sub-relations from largest to smallest, so that roundoff error
> in the earlier estimates is less likely to contaminate the final
> answer.  Not sure how expensive it would be to do that or what
> sort of instability it might introduce into plan choices.
>
> Whether that's got anything directly to do with your original problem is
> hard to say.  Joins to subqueries, which we normally lack any stats for,
> tend to produce pretty bogus selectivity numbers in themselves; so the
> original problem might've been more of that nature.

Thanks for pointing me in the correct direction. The original issue
was that values from lookup joined to ref_id and the subset filter in
the small table were almost perfectly correlated, which caused the
underestimate. In the second case this was hidden by the intermediate
clamping to 1, accidentally resulting in a more correct estimate.
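
To spell out the arithmetic with the test case below (my rough numbers):
subset = 42 matches about 2 of the 1000 rows in small, and ref_id has
roughly 500 distinct values. Joining those 2 rows to the 2-row lookup is
therefore estimated at about 2 * 2 / 500 = 0.01 rows, which clamps to 1;
multiplying that by the 1 million rows of big and the ~1/1000 selectivity
of small.id = big.small_id gives the inflated 1000-row estimate. In the
other order, small joined to big gives 2 * 1000000 / 1000 = 2000 rows,
and applying the lookup selectivity to that yields the ~8 rows seen in
the first plan.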

I actually think that it might be better to consider relations from
smallest to largest. The reasoning being that a join cannot produce a
fraction of a row: it will produce either 0 rows or at least 1, and we
should probably plan for the case when it does return something.

Going even further (and I haven't looked at how feasible this is), I
have run into several cases lately where cardinality underestimates
clamping to 1 resulted in catastrophically bad plans. Bad as in a stack
of nested loops with unparameterized GroupAggregates and HashAggregates
as inner sides. It seems to me that row estimates should clamp to
something slightly larger than 1 unless it's provably going to be 1.

Regards,
Ants Aasma




[PERFORM] Rowcount estimation changes based on from clause order

2017-10-11 Thread Ants Aasma
I stumbled upon a severe row count underestimation that confusingly
went away when two inner joins in the from clause were reordered. I
whittled it down to a reproducible test case.

Schema:

CREATE TABLE small (id serial primary key, ref_id int not null, subset
int not null);
CREATE TABLE big (id serial primary key, small_id int not null);

INSERT INTO small (ref_id, subset) SELECT i/2+1, i/2+1 FROM
generate_series(1,1000) i;
INSERT INTO big (small_id) SELECT (i % 1000) + 1 FROM
generate_series(1,1000000) i;

CREATE INDEX ON small (ref_id);
CREATE INDEX ON big (small_id);

ANALYZE;

And the queries, differing in only the order of joins:

SELECT * FROM
  small
  INNER JOIN big ON small.id = big.small_id
  INNER JOIN (SELECT 1 UNION ALL SELECT 2) lookup(ref) ON
lookup.ref = small.ref_id
WHERE small.subset = 42;

SELECT * FROM
  small
  INNER JOIN (SELECT 1 UNION ALL SELECT 2) lookup(ref) ON
lookup.ref = small.ref_id
  INNER JOIN big ON small.id = big.small_id
WHERE small.subset = 42;

Resulting plan for the first case:
 Nested Loop  (cost=20.45..2272.13 rows=8 width=24)
   ->  Nested Loop  (cost=0.28..16.69 rows=1 width=16)
 ->  Append  (cost=0.00..0.04 rows=2 width=4)
   ->  Result  (cost=0.00..0.01 rows=1 width=4)
   ->  Result  (cost=0.00..0.01 rows=1 width=4)
 ->  Index Scan using small_ref_id_idx on small
(cost=0.28..8.32 rows=1 width=12)
   Index Cond: (ref_id = (1))
   Filter: (subset = 42)
   ->  Bitmap Heap Scan on big  (cost=20.18..2245.44 rows=1000 width=8)
 Recheck Cond: (small_id = small.id)
 ->  Bitmap Index Scan on big_small_id_idx  (cost=0.00..19.93
rows=1000 width=0)
   Index Cond: (small_id = small.id)

Second case plan is identical except row count of the topmost nest loop:
 Nested Loop  (cost=20.45..2272.13 rows=1000 width=24)

The union subselect was in reality somewhat more complicated, but for
the row count issue the simplification does not seem to matter. The
behavior is seen both on 9.4 and on master.

Does anybody have any idea what is going on here? In the real-world
case this is based on, the estimate was 5 rows instead of 200k, which
resulted in quite bad plan choices downstream.

Regards,
Ants Aasma




Re: [PERFORM] In progress INSERT wrecks plans on table

2013-07-13 Thread Ants Aasma
On Sat, Jul 13, 2013 at 1:47 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 But I also tried it with 4 pgbench clients, and ran into a collapse of
 the performance, TPS dropping down to ~8 TPS.   It is too variable to
 figure out how reliable the speed-up with this patch is, so far.
 Apparently they are all fighting over the spinlock on the
 ProcArrayLock.

 This is a single quad core, Intel(R) Xeon(R) CPU   X3210  @ 2.13GHz

 So I agree with (3) above, about not checking
 TransactionIdIsInProgress repeatedly.  Or could we change the order of
 operations so that TransactionIdIsInProgress is checked only after
 XidInMVCCSnapshot?

I haven't checked the patch in detail, but it sounds like my proposal
for CSN based snapshots[1] could help here. Using it,
TransactionIdIsInProgress can be done completely lock-free. It would
involve a direct dense array lookup, a read barrier and a check of the
dense/sparse horizon, and if necessary a binary search in the sparse
array plus another read barrier and a check of the sparse array version
counter.

I plan to start working on the patch next week. I hope to have a first
cut available for CF2.

[1] 
http://www.postgresql.org/message-id/CA+CSw_tEpJ=md1zgxPkjH6CWDnTDft4gBi=+p9snoc+wy3p...@mail.gmail.com

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de




Re: [PERFORM] Question about postmaster's CPU usage

2013-03-30 Thread Ants Aasma
On Mar 28, 2013 9:07 PM, kelphet xiong kelp...@gmail.com wrote:
 explain analyze select * from inventory;
                                                         QUERY PLAN
 --------------------------------------------------------------------------------------------------------------------------
  Seq Scan on inventory  (cost=0.00..180937.00 rows=11745000 width=16) (actual time=0.005..1030.403 rows=11745000 loops=1)
  Total runtime: 1750.889 ms
 (2 rows)

A large fraction of that time, if not most of it, is due to timing
overhead. You can try the same query without timing by using: explain
(analyze on, timing off) select * from inventory;

Regards,
Ants Aasma


Re: [PERFORM] dbt2 performance regresses from 9.1.6 to 9.2.1

2012-11-02 Thread Ants Aasma
On Thu, Nov 1, 2012 at 12:51 AM, Dong Ye y...@vmware.com wrote:
 The average notpm is 61384.24 with 9.1.6 and 57381.43 with 9.2.1.
 Plotting notps over time shows that the slowdown of 9.2.1 is evident across 
 the entire run period.
 Since we also observed sustained 80+% CPU utilization during both runs, we 
 suspected this is a CPU bottleneck issue.
 So we run oprofile hoping that the profiles may suggest one is using CPU less 
 productively than the other; but nothing jumped out to that explanation.
 The profiling results are posted on http://pgsql.privatepaste.com/3fa3ae0627 
 (9.1.6 run) and http://pgsql.privatepaste.com/930bb51374 (9.2.1 run).

You are using prepared statements, which makes me think that this
regression might be due to the support for parameter-specific plans for
prepared statements. [1] Can you run the test on both versions without
prepared statements and see if the regression remains?

I compared the profile results; I'll reproduce them here in case
they ring any other bells for someone. Here are the top 20 functions
that take more time under 9.2:

Function  Diff  v9.2%  v9.1%
 postgres.copyObject  3.48 1.2436 0.3569
  postgres.check_stack_depth  1.92 0.7244 0.3774
 postgres.eval_const_expressions_mutator  1.87 0.3473 0.1853
jbd./jbd  1.82 0.4127 0.2271
  libc-2.14.1.so._int_malloc  1.75 1.4938 0.8540
   libc-2.14.1.so.__strlen_sse42  1.72 0.7098 0.4124
vmlinux-2.6.32.59-0.7-default.copy_user_generic_string  1.70 0.5130 0.3017
postgres.MemoryContextCreate  1.68 0.3206 0.1914
  postgres.MemoryContextAllocZeroAligned  1.64 1.5443 0.9443
libc-2.14.1.so._int_free  1.60 0.7182 0.4476
 postgres.expression_tree_walker  1.60 0.8350 0.5235
 postgres.XLogInsert  1.58 2.7251 1.7210
  ext3./ext3  1.55 0.2065 0.1335
   libc-2.14.1.so.__strcpy_ssse3  1.50 0.3061 0.2046
postgres.expression_tree_mutator  1.41 0.3461 0.2447
  libc-2.14.1.so.__memcpy_ssse3_back  1.40 1.2379 0.8830
  postgres.AllocSetAlloc  1.39 4.6567 3.3467
postgres.LockAcquireExtended  1.39 0.2799 0.2015
 postgres.MemoryContextAlloc  1.38 1.0151 0.7373
 postgres.AllocSetDelete  1.33 0.2130 0.1600

And top 10 functions present under 9.2 but not present with 9.1:
  Function
  postgres._copyList.isra.15 0.341
   postgres._SPI_execute_plan.isra.4 0.224
   postgres.grouping_planner 0.220
  postgres.IndexOnlyNext 0.213
  postgres.GetCachedPlan 0.189
postgres.MemoryContextStrdup 0.171
  postgres.list_copy 0.165
  postgres.index_getnext_tid 0.155
 postgres.MemoryContextSetParent 0.128
  postgres.cost_qual_eval_walker 0.127

I have no idea why XLogInsert is taking so much longer on 9.2.

[1] http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=e6faf910

Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de




Re: [PERFORM] SELECT AND AGG huge tables

2012-10-19 Thread Ants Aasma
On Wed, Oct 17, 2012 at 2:24 PM, houmanb hou...@gmx.at wrote:
 Hi all,
 Thanks for your advice and the link about posting my question in an
 appropriate form.
 Here are the info. I thank all of you in advance.

Can you run the EXPLAIN once more with EXPLAIN (ANALYZE, BUFFERS,
TIMING OFF)? Given the number of rows processed by the query, the
detailed per-node timing overhead might be a considerable factor here.

What happened to the WHERE T.creation_date=$SOME_DATE part of the
query? These examples go through the whole table. The plans shown are
about as fast as it gets. Summarizing 5GB of data will never be fast.
If you need that information quickly, you'll need to actively maintain
the aggregate values via triggers.
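
For illustration, a minimal sketch of that approach (the summary table
name, the base table name t, and the trigger details are my assumptions,
and concurrent inserts of the same date would need locking or an upsert
that this sketch omits):

CREATE TABLE t_summary (creation_date date PRIMARY KEY, row_count bigint NOT NULL);

CREATE OR REPLACE FUNCTION t_summary_count() RETURNS trigger AS $$
BEGIN
    -- bump the per-day counter if it already exists
    UPDATE t_summary SET row_count = row_count + 1
     WHERE creation_date = NEW.creation_date;
    IF NOT FOUND THEN
        -- first row seen for this creation_date
        INSERT INTO t_summary (creation_date, row_count)
        VALUES (NEW.creation_date, 1);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER t_summary_trg AFTER INSERT ON t
    FOR EACH ROW EXECUTE PROCEDURE t_summary_count();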

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de




Re: [PERFORM] Guide to Posting Slow Query Questions

2012-10-08 Thread Ants Aasma
On Wed, Sep 26, 2012 at 11:11 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 On Wed, Sep 12, 2012 at 11:40 PM, Ants Aasma a...@cybertec.at wrote:
 I don't have any links for OS level monitoring, but with version 9.2
 track_io_timing would do the job.

 I don't know how to advice people on how to use this to obtain
 information on a specific query.  Would someone else like to take a
 stab at explaining that?

I added a line suggesting that 9.2 users turn it on via SET
track_io_timing TO on;
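
For example, in a 9.2 session (a sketch; the table name is a
placeholder):

SET track_io_timing TO on;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM some_table;
-- plan nodes then carry "I/O Timings: read=... write=..." lines
-- whenever blocks were actually read from or written to disk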

Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de




Re: [PERFORM] Same query doing slow then quick

2012-10-08 Thread Ants Aasma
Sorry for the late answer; I was going through my e-mail backlog and
noticed that this question hadn't been answered.

On Thu, Sep 27, 2012 at 11:33 AM, Undertaker Rude ffw_r...@hotmail.com wrote:
 Oh ok. But what is this command doing ? i'm gonna runn it today. I'll keep
 you posted. Here is some EXPLAIN ANALYZE from the querys :


 Nested Loop  (cost=0.00..353722.89 rows=124893 width=16) (actual time=261158.061..10304193.501 rows=99 loops=1)
   Join Filter: ((t2.X >= (t1.x_min)::double precision) AND (t2.X <= (t1.x_max)::double precision) AND (t2.Y >= (t1.y_min)::double precision) AND (t2.Y <= (t1.y_max)::double precision))
   ->  Seq Scan on gps_22 t1  (cost=0.00..3431.80 rows=177480 width=44) (actual time=0.036..1399.621 rows=177480 loops=1)
   ->  Materialize  (cost=0.00..20572.83 rows=57 width=20) (actual time=0.012..10.274 rows=2924 loops=177480)
         ->  Seq Scan on adresses_22 t2  (cost=0.00..20572.55 rows=57 width=20) (actual time=1570.240..1726.376 rows=2924 loops=1)
               Filter: ((id_maille_200m)::text = '0'::text)
 Total runtime: 10304211.648 ms

As you can see from the explain plan, PostgreSQL is not using any
indexes here. The reason is the type mismatch between the X and x_min
columns. Use matching types between tables to enable index use. The
same goes for the id_maille_200m column: if the column type is integer,
use a numeric literal 0, not a text literal '0'.
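
For example (a sketch only; I don't know the real column types, so
treat the target type as an assumption):

ALTER TABLE gps_22
    ALTER COLUMN x_min TYPE double precision,
    ALTER COLUMN x_max TYPE double precision,
    ALTER COLUMN y_min TYPE double precision,
    ALTER COLUMN y_max TYPE double precision;
-- and compare id_maille_200m against a literal of its own type, e.g.
-- WHERE id_maille_200m = 0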

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de




Re: [PERFORM] Guide to Posting Slow Query Questions

2012-09-14 Thread Ants Aasma
On Wed, Sep 12, 2012 at 7:00 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 Regarding the wiki page on reporting slow queries:
 We currently recommend EXPLAIN ANALYZE over just EXPLAIN.  Should we
 recommend EXPLAIN (ANALYZE, BUFFERS) instead?  I know I very often
 wish I could see that data.  I don't think turning buffer accounting
 on adds much cost over a mere ANALYZE.

Given the number of version 8 installs out there, the recommendation
should be qualified with version 9.0. Otherwise a strong +1.

 Also, an additional thing that would be nice for people to report is
 whether long running queries are CPU bound or IO bound.  Should we add
 that recommendation with links to how to do that in a couple OS, say,
 Linux and Windows.  If so, does anyone know of good links that explain
 it for those OS?

I don't have any links for OS level monitoring, but with version 9.2
track_io_timing would do the job.

Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de




Re: [PERFORM] very very slow inserts into very large table

2012-07-17 Thread Ants Aasma
On Tue, Jul 17, 2012 at 6:30 AM, Craig Ringer ring...@ringerc.id.au wrote:
 On 07/17/2012 01:56 AM, Jon Nelson wrote:
 To perform reasonably well, Pg would need to be able to defer index updates
 when bulk-loading data in a single statement (or even transaction), then
 apply them when the statement finished or transaction committed. Doing this
 at a transaction level would mean you'd need a way to mark indexes as
 'lazily updated' and have Pg avoid using them once they'd been dirtied
 within a transaction. No such support currently exists, and it'd be
 non-trivial to implement, especially since people loading huge amounts of
 data often want to do it with multiple concurrent sessions. You'd need some
 kind of 'DISABLE INDEX' and 'ENABLE INDEX' commands plus a transactional
 backing table of pending index updates.

It seems to me that if the insertion is done as a single statement it
wouldn't be a problem to collect up all btree insertions and apply
them before completing the statement. I'm not sure how much that would
help, though. If the new rows have a uniform distribution you end up
reading in the whole index anyway, and because indexes are not stored
in logical order you don't get to benefit from sequential I/O.

The lazy merging approach (the paper that Claudio linked) on the other
hand seems promising but a lot trickier to implement.

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: [PERFORM] how could select id=xx so slow?

2012-07-12 Thread Ants Aasma
On Thu, Jul 12, 2012 at 3:48 PM, Yan Chunlu springri...@gmail.com wrote:
 yes the system seems overloaded, I am dealing with a simple INSERT but not
 sure if it is normal that it took more time than the explain estimated:

The estimated cost is in arbitrary units; its purpose is to compare
different execution plans, not to estimate the time taken. So it's
completely normal that it doesn't match the actual time taken.

Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: [PERFORM] how could select id=xx so slow?

2012-07-11 Thread Ants Aasma
On Wed, Jul 11, 2012 at 9:24 AM, Yan Chunlu springri...@gmail.com wrote:
 I have logged one day data and found the checkpoint is rather
 frequently(detail: https://gist.github.com/3088338). Not sure if it is
 normal, but the average time of checkpoint is about 100sec~200sec, it seems
 related with my settings:

 574 checkpoint_segments = 64
 575 wal_keep_segments = 5000

 I set checkpoint_segments as a very large value which is because otherwise
 the slave server always can not follow the master, should I lower that
 value?

 or the slow query is about something else?  thanks!

Some things to notice from the checkpoints log:
* All checkpoints are triggered by checkpoint_timeout, using up only a
couple of log files
* Checkpoints write out around 40MB of buffers
* The write-out period is spread out nicely like it's supposed to, but
the sync phase is occasionally taking a very long time (more than 2
minutes)

This looks like something (not necessarily the checkpoint sync itself)
is overloading the IO system. You might want to monitor the IO load
with iostat and correlate it with the checkpoints and slow queries to
find the culprit. It's also possible that something else is causing
the issues.

If the cause is checkpoints, just making them less frequent might make
the problem worse. I'm assuming you have 16GB+ of RAM because you have
4GB of shared_buffers. Just making checkpoint_timeout longer will
accumulate a larger number of dirty buffers that will clog up the IO
queues even worse. If you are on Linux, lowering
dirty_expire_centisecs or dirty_background_bytes might help to spread
the load out but will make overall throughput worse.

On the other hand, if the I/O overload is from queries (more likely
because some checkpoints sync quickly), there are no easy tuning
answers. Making queries less IO intensive is probably the best you can
do. From the tuning side, newer Linux kernels handle I/O fairness a
lot better, and you could also try tweaking the I/O scheduler to
achieve better throughput, to avoid congestion or at least provide
better latency for trivial queries. And of course it's always possible
to throw more hardware at the problem and upgrade the I/O subsystem.

Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: [PERFORM] [pgsql-cluster-hackers][performance] fast reads on a busy server

2012-06-27 Thread Ants Aasma
On Jun 27, 2012 2:29 PM, Willy-Bas Loos willy...@gmail.com wrote:
 Should i use a larger shared_buffers for the other cluster(s) too, so
that i bypass the inefficient OS file-cache?

Once the in-memory cluster has filled its shared buffers, the pages go cold
for the OS cache and get replaced with pages of other clusters that are
actually referenced.

Ants Aasma


Re: [PERFORM] does the query planner consider work_mem?

2012-05-30 Thread Ants Aasma
On Wed, May 30, 2012 at 8:57 PM, Murat Tasan mmu...@gmail.com wrote:
 any insights here?

Have you tried running the slow option multiple times? According to
the explain output, all of the time was accounted to the bitmap heap
scan. For the second explain plan the same node was fast. It looks to
me as if the first explain on 8.4 was slow because the data was still
on disk. Raising work_mem doubled the speed of the sort from 800ms to
400ms.
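
To see the effect yourself, something like this works as a
session-level experiment (table and column names are placeholders):

SET work_mem = '1MB';
EXPLAIN ANALYZE SELECT * FROM some_table ORDER BY some_column;
-- look for "Sort Method: external merge  Disk: ..."
SET work_mem = '64MB';
EXPLAIN ANALYZE SELECT * FROM some_table ORDER BY some_column;
-- ideally now "Sort Method: quicksort  Memory: ..."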

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: [PERFORM] Query got slow from 9.0 to 9.1 upgrade

2012-05-03 Thread Ants Aasma
On Tue, May 1, 2012 at 12:17 AM, Josh Turmel jtur...@gmail.com wrote:
 We just upgraded from 9.0 to 9.1, we're using the same server configuration,
 that has been confirmed 3 or 4 times over. Any help would be appreciated. If
 I remove the ORDER BY it gets fast again because it goes back to using the
 user_id index, if I remove the LIMIT/OFFSET it gets fast again, obviously I
 need both of those, but that was just to test and see what would happen.

 Query: SELECT * FROM bookmark_groups WHERE user_id = 6708929 ORDER BY
 created DESC LIMIT 25 OFFSET 0;

Based on the explain numbers I'd say that 9.0 was fast by accident of
having inaccurate statistics. You can see that 9.0 estimated that 757
rows have this user_id, while actually it had 33868 rows. 9.1
estimated a more accurate 35980 rows, and because of that assumed that
reading the newest created rows would return 25 rows of this user
rather fast, faster than sorting the 35980 rows. This assumption seems
to be incorrect, probably because the rows with this user_id are all
rather old.

You could try tweaking cpu_index_tuple_cost to be higher so that large
index scans get penalized. But ultimately with the current PG version
there isn't a good general way to fix this kind of behavior. You can
rewrite the query to enforce filtering before sorting:

SELECT * FROM (
SELECT * FROM bookmark_groups WHERE user_id = 6708929
OFFSET 0 -- Prevents pushdown of ordering and limit
) AS sub ORDER BY created DESC LIMIT 25 OFFSET 0;

This is the same issue that Simon Riggs talks about in this e-mail:
http://archives.postgresql.org/message-id/ca+u5nmlbxfut9cwdhj3tpxjc3btwqizbkqtwdgzebcb5bag...@mail.gmail.com

The more general approach is to be more pessimistic about limited
filtered index scans, or to collect multi-dimensional stats to figure
out the correlation, i.e. that all rows for this user are likely to be old.

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: [PERFORM] bad planning with 75% effective_cache_size

2012-04-19 Thread Ants Aasma
On Thu, Apr 19, 2012 at 3:44 AM, Josh Berkus j...@agliodbs.com
 7500ms
 http://explain.depesz.com/s/
 This plan seems very odd -- doing individual index lookups on 2.8m rows
 is not standard planner behavior.  Can you confirm that all of your
 other query cost parameters are the defaults?

This is similar to the issue with LIMIT that Simon was complaining about
a few weeks ago [1]. A lot of the estimation in the planner is biased
to give overestimates for the number of rows returned in the face of
uncertainty. This works well for joins but interacts really badly with
limits. The two issues here are the join cardinality being
overestimated by a factor of 15x and then the unique being off by another
50x. The result is that the planner thinks that it needs to scan 0.25%
of the input, while actually it needs to scan the whole of it,
underestimating the real cost by a factor of 400.

I'm not sure what to do about unique node overestimation, but I think
it could be coaxed to be less optimistic about the limit by adding an
optimization barrier and some selectivity decreasing clauses between
the limit and the rest of the query:

select * from (
select distinct product_code from product p_
inner join product_parent par_ on p_.parent_id=par_.id
where par_.parent_name like 'aa%'
offset 0 -- optimization barrier
) as x
where product_code = product_code -- reduce selectivity estimate by 200x
limit 2;

[1] 
http://archives.postgresql.org/message-id/ca+u5nmlbxfut9cwdhj3tpxjc3btwqizbkqtwdgzebcb5bag...@mail.gmail.com

Cheers,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: [PERFORM] bad plan

2012-04-05 Thread Ants Aasma
On Thu, Apr 5, 2012 at 2:47 PM, Julien Cigar jci...@ulb.ac.be wrote:
 - http://www.pastie.org/3731956 : with default config
 - http://www.pastie.org/3731960 : this is with enable_seq_scan = off

It looks like the join selectivity of (context_to_context_links,
ancestors) is being overestimated by almost two orders of magnitude.
The optimizer thinks that there are 564 rows in the
context_to_context_links table for each taxon_id, while in fact for
this query the number is 9. To confirm this, you can force the
selectivity estimate to be 200x lower by adding a geo_id = geo_id
where clause to the subquery.

If it does help, then the next question would be why the estimate is
so much off. It could be either because the stats for
context_to_context_links.taxon_id are wrong or because
ancestors.taxon_id (subphylum_id = 18830) is a special case. To help
figure this out, you could run the following two queries and post
the results:

SELECT floor(log(2, num)) AS nmatch, COUNT(*) AS freq FROM (SELECT
COUNT(*) AS num FROM context_to_context_links GROUP BY taxon_id) AS
dist GROUP BY 1 ORDER BY 1;

SELECT floor(log(2, num)) AS nmatch, COUNT(*) AS freq FROM (SELECT
COUNT(*) AS num FROM context_to_context_links WHERE NOT geo_id IS NULL
AND taxon_id = ANY (SELECT taxon_id FROM rab.ancestors WHERE
ancestors.subphylum_id = 18830) GROUP BY taxon_id) AS dist GROUP BY
1 ORDER BY 1;

If the second distribution has a significantly different shape, then
cross-column statistics are necessary to get good plans. As it happens,
I'm working on adding this functionality to PostgreSQL and would love
to hear more details about your use case to understand if it would be
solved by this work.

Regards,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: [PERFORM] Distinct + Limit

2012-03-27 Thread Ants Aasma
On Tue, Mar 27, 2012 at 11:54 PM, Francois Deliege fdeli...@gmail.com wrote:
 select col1 from table1 group by col1 limit 1;
 select distinct on (col1) col1 from table1 limit 1;

 select col1 from table1 group by col1 limit 2;
 select distinct on (col1) col1 from table1 limit 2;

 Performing any of these following queries results in a full sequential scan, 
 followed by a hash aggregate, and then the limit.  An optimization could be 
 to stop the sequential scan as soon as the limit of results has been reached. 
  Am I missing something?

Yes, that would be an optimization. Unfortunately the aggregation
logic currently doesn't have special-case logic to start outputting
tuples immediately when no aggregate functions are in use. In
principle it's possible to teach it to do that; peeking at the code,
it seems that it wouldn't even be too hard to implement.

Currently your best options are to add indexes for the columns that you
select distinct values from, use a server-side cursor and do the
distinct operation on the client (you might need to adjust
cursor_tuple_fraction before doing the query to make the cost estimates
better), or use a stored procedure to do the cursor + manual distinct
trick.
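
A rough sketch of the cursor variant, where the client keeps a set of
values it has already seen and stops fetching once it has enough
distinct ones:

BEGIN;
SET cursor_tuple_fraction TO 0.01;  -- favor fast-start plans since we expect to stop early
DECLARE distinct_cur NO SCROLL CURSOR FOR SELECT col1 FROM table1;
FETCH 1000 FROM distinct_cur;       -- repeat, de-duplicating client-side
CLOSE distinct_cur;
COMMIT;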

 Similarly, the following query results in a sequential scan:

 select * from table1 where col1 <> col1;

The PostgreSQL query optimizer doesn't try to be a theorem prover and so
doesn't deduce the logical impossibility. For most queries, looking
for nonsensical clauses would be a complete waste of time. The optimizer
does notice impossibilities that crop up during constant propagation, so
WHERE false or WHERE 0 = 1 would work fine. It would be best to fix
Sequel to output a literal constant false for PostgreSQL. However, I
wonder if it's worth checking for this very specific case, because it
is a common idiom for Oracle users to implement constant false in
where predicates due to Oracle not allowing top-level literal booleans
for some arcane reason or another.

Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: [PERFORM] Obtaining resource usage statistics from execution? (v 9.1)

2012-03-16 Thread Ants Aasma
On Fri, Mar 16, 2012 at 4:31 PM, Karl Denninger k...@denninger.net wrote:
 What I'd
 like to be able to do is have the code track performance all the time and
 raise alerts when it sees outliers giving me a continually-improving set
 of targets for reduction of resource consumption (up until I reach the point
 where I don't seem to be able to make it any faster of course :-))

Sounds almost exactly like what the auto_explain contrib module is
designed to do:
http://www.postgresql.org/docs/9.1/static/auto-explain.html

It can run with reasonably low overhead if your system has fast
timing. You can check the timing performance of your system with the
tool attached here:
http://archives.postgresql.org/message-id/4F15B930.50108%402ndQuadrant.com

Anything under 200ns should be ok.
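
A minimal way to try it for one session (thresholds picked arbitrarily;
LOAD needs superuser):

LOAD 'auto_explain';
SET auto_explain.log_min_duration = '250ms';
SET auto_explain.log_analyze = on;
-- statements slower than 250ms now get their EXPLAIN ANALYZE output
-- written to the server log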

Cheers,
Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: [PERFORM] index choosing problem

2012-03-15 Thread Ants Aasma
2012/3/15 Rural Hunter ruralhun...@gmail.com:
 Now the query 3-5 using article_others_pkey are quite slow. The rows for cid
 74 are very old and seldom get updated. I think pg needs to scan quite a lot
 on article_others_pkey before it gets the rows for cid 74. The same query
 for other cids with new and majority of rows runs very fast.

This is because the PostgreSQL cost model doesn't know about the
correlation between aid and cid. In the absence of that information it
assumes that it will find a row with cid=74 about every 68 rows
(889520/13047).

One option to fix this case is to use OFFSET 0 as an optimization barrier:
SELECT max(aid) FROM
(SELECT aid FROM article_others WHERE cid=74 OFFSET 0) AS x;

That has the unfortunate effect of performing badly for cid's that are
extremely popular. That may or may not be acceptable in your case.

To fix this properly the query optimizer needs to know the relationship between
aid and cid and needs to know how to apply that to estimating the cost
of index scans. A prerequisite for implementing this is to have
multi-column statistics. To do the estimation, the current linear cost
model needs to be changed to something that can express a non-linear
relationship between tuples returned and cost, e.g. a piece-wise
linear model. The stats collection part is actually feasible, in fact
I'm currently working on a patch for that. As for the estimation
improvement, I have an idea how it might work, but I'm not really sure
yet if the performance hit for query planning would be acceptable.

 Another question, why the plan shows rows=13047 for cid=74 while actually it
 only has 4199 rows? There is almost no data changes for cid 74 and I just
 vacuum/analyzed the table this morning.

Might just be an artifact of random sampling. Try raising your stats
target and re-analyzing to confirm.
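
For example (a sketch):

ALTER TABLE article_others ALTER COLUMN cid SET STATISTICS 1000;
ANALYZE article_others;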

All the best,
Ants Aasma



Re: [PERFORM] Advice sought : new database server

2012-03-08 Thread Ants Aasma
On Wed, Mar 7, 2012 at 10:18 PM, Merlin Moncure mmonc...@gmail.com wrote:
 those numbers are stupendous for 8 drive sata.  how much shared
 buffers do you have?

Couple of things to notice:
1) The benchmark can run fully in memory, although not 100% in shared_buffers.
2) These are 100k transaction runs, meaning that probably no
checkpointing was going on.
3) Given the amount of memory in the server, with  dirty flush
settings the OS will do mostly sequential writes.

Just ran a quick test. With synchronous_commit=off to simulate a BBU I
have no trouble hitting 11k tps on a single SATA disk. Seems to be
mostly CPU bound on my workstation (Intel i5 2500K @ 3.9GHz, 16GB
memory), dirty writes stay in OS buffers, about 220tps/6MBps of
traffic to the xlog's, checkpoint dumps everything to OS cache which
is then flushed at about 170MB/s (which probably would do nasty things
to latency in real world cases). Unlogged tables give me about 12k
tps, which seems to confirm it's mostly CPU bound.

So regardless if the benchmark is a good representation of the target
workload or not, it definitely isn't benchmarking the IO system.
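
As a quick sanity check of whether a pgbench run even fits in memory,
comparing the database size to RAM is a start:

SELECT pg_size_pretty(pg_database_size(current_database()));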

Ants Aasma



Re: [PERFORM] Bad estimation for where field not in

2012-03-01 Thread Ants Aasma
On Thu, Mar 1, 2012 at 6:40 PM, Daniele Varrazzo
daniele.varra...@gmail.com wrote:
 Is this a known planner shortcoming or something unexpected, to be
 escalated to -bugs? Server version is 9.0.1.

The relevant code is in the scalararraysel() function. It makes the
assumption that element-wise comparisons are completely independent,
while the exact opposite is true. It has been this way since
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=290166f93404d8759f4bf60ef1732c8ba9a52785
introduced it in version 8.2.

At least for equality and inequality ops it would be good to rework
the logic to aggregate with
s1 = s1 + s2 and s1 = s1 + s2 - 1, respectively.
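
To put rough numbers on it (invented for illustration): if a NOT IN
list has 50 elements that together match 75% of the rows (about 1.5%
each), treating the comparisons as independent estimates the remaining
fraction as 0.985^50, roughly 0.47, while under the correct assumption
that a value can match at most one element the answer is 1 - 0.75 =
0.25. That kind of gap then compounds in any joins built on top of the
estimate.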

--
Ants Aasma



Re: [PERFORM] problems with set_config, work_mem, maintenance_work_mem, and sorting

2012-02-28 Thread Ants Aasma
On Feb 29, 2012 1:44 AM, Claudio Freire klaussfre...@gmail.com wrote:
 Another option, depending on your SQLA version, when connections are
 sent back to the pool, I seem to remember they were reset. That would
 also reset the work_mem, you'd still see the same pid on PG logs, but
 it's not the same session.

Except that any open transactions are rolled back, no other reset is done.
The correct way to handle this would be to set the options and commit the
transaction in the Pool connect or checkout events. The event choice
depends on whether application-scope or request-scope parameters are wanted.

--
Ants Aasma


Re: [PERFORM] Why so slow?

2012-02-17 Thread Ants Aasma
On Feb 17, 2012 8:35 PM, Alessandro Gagliardi alessan...@path.com wrote:
 Here is the EXPLAIN: http://explain.depesz.com/s/ley

 I'm using PostgreSQL 9.0.6 on i486-pc-linux-gnu, compiled by GCC
gcc-4.4.real (Ubuntu 4.4.3-4ubuntu5) 4.4.3, 32-bit

 My random_page_cost is 2 and yet it still insists on using Seq Scan on
blocks.

As can be inferred from the row counts, it's slow because it's joining and
then aggregating a quarter of the blocks table. The hash join with its
sequential scan is probably the correct choice for that type of join; it's
the join itself that should be optimized out. The optimizer doesn't figure
out that the join can be turned into a semi-join if the output is
aggregated with distinct and comes from only one of the tables (in this
case, because the output is the join key, it can be from either table).

To make the optimizer's job easier you can rewrite it as a semi-join
explicitly:
SELECT DISTINCT(user_id) FROM seen_its WHERE EXISTS (SELECT 1 FROM blocks
WHERE blocks.user_id = seen_its.user_id) AND seen_its.created BETWEEN
(now()::date - interval '8 days')::timestamp AND now()::date::timestamp

--
Ants Aasma


Re: [PERFORM] Bogus startup cost for WindowAgg

2010-10-14 Thread Ants Aasma
On Wed, Oct 13, 2010 at 10:35 PM, Mladen Gogala
mladen.gog...@vmsinfo.com wrote:
 You will see that for most of the columns, the length of the histogram array
 corresponds to the value of the default_statistics_target parameter. For
 those that are smaller, the size is the total number of values in the column
 in the sample taken by the analyze command. The longer histogram, the
 better plan. In this case, the size does matter.

The issue here isn't that the statistics are off. The issue is, as Tom
said, that the optimizer doesn't consider them for the cost model of
the window aggregate. The trivial case I put forward wouldn't be too
hard to cover - if there's no partitioning of the window and the frame
is over the full partition, the startup cost should be nearly the same
as the full cost. But outside of the trick I tried, I'm not sure if
the trivial case matters much. I can also see how the estimation gets
pretty hairy when partitioning, frames and real window functions come
into play.

One idea would be to cost three different cases. If the aggregate
needs to read ahead some most likely constant number of rows, i.e. is
not using an unbounded following frame, leave the startup cost as is.
If there is partitioning, estimate the number of groups produced by
the partitioning and add one n-th of the difference between startup
and total cost. Otherwise, if the frame is to the end of the partition
and there is no partitioning, set the startup cost equal to total
cost, or in terms of the previous case, n=1. I don't know how accurate
estimating the number of groups would be, or even if it is feasible to
do it. If those assumptions hold, then it seems to me that this method
should at least cover any large O(n) effects.
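
Put as a formula (my phrasing of the same idea): startup_cost' =
startup_cost + (total_cost - startup_cost) / n, where n is the estimated
number of partition groups, n = 1 when there is no partitioning and the
frame runs to the end of the partition, and the startup cost is left
unchanged when the frame only needs a bounded amount of look-ahead.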



[PERFORM] Bogus startup cost for WindowAgg

2010-10-13 Thread Ants Aasma
I hit an issue with window aggregate costing while experimenting with
providing a count of the full match along side a limited result set.
Seems that the window aggregate node doesn't take into account that it
has to consume the whole input before outputting the first row. When
this is combined with a limit, the resulting cost estimate is wildly
underestimated, leading to suboptimal plans.

Is this a known issue? I couldn't find anything referring to this on
the mailing list or todo.

Code to reproduce follows:

ants=# CREATE TABLE test (a int, b int);
CREATE TABLE
ants=# INSERT INTO test (a,b) SELECT random()*1e6, random()*1e6 FROM
generate_series(1,1000000);
INSERT 0 1000000
ants=# CREATE INDEX a_idx ON test (a);
CREATE INDEX
ants=# CREATE INDEX b_idx ON test (b);
CREATE INDEX
ants=# ANALYZE test;
ANALYZE
ants=# EXPLAIN ANALYZE SELECT *, COUNT(*) OVER () FROM test WHERE a < 2500 ORDER BY b LIMIT 10;
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..195.31 rows=10 width=8) (actual time=728.325..728.339 rows=10 loops=1)
   ->  WindowAgg  (cost=0.00..46209.93 rows=2366 width=8) (actual time=728.324..728.337 rows=10 loops=1)
         ->  Index Scan using b_idx on test  (cost=0.00..46180.36 rows=2366 width=8) (actual time=0.334..727.221 rows=2512 loops=1)
               Filter: (a < 2500)
 Total runtime: 728.401 ms
(5 rows)

ants=# SET enable_indexscan = off;
SET
ants=# EXPLAIN ANALYZE SELECT *, COUNT(*) OVER () FROM test WHERE a < 2500 ORDER BY b LIMIT 10;
                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3986.82..3986.85 rows=10 width=8) (actual time=7.186..7.189 rows=10 loops=1)
   ->  Sort  (cost=3986.82..3992.74 rows=2366 width=8) (actual time=7.185..7.187 rows=10 loops=1)
         Sort Key: b
         Sort Method:  top-N heapsort  Memory: 25kB
         ->  WindowAgg  (cost=46.70..3935.69 rows=2366 width=8) (actual time=4.181..6.508 rows=2512 loops=1)
               ->  Bitmap Heap Scan on test  (cost=46.70..3906.12 rows=2366 width=8) (actual time=0.933..3.555 rows=2512 loops=1)
                     Recheck Cond: (a < 2500)
                     ->  Bitmap Index Scan on a_idx  (cost=0.00..46.10 rows=2366 width=0) (actual time=0.512..0.512 rows=2512 loops=1)
                           Index Cond: (a < 2500)
 Total runtime: 7.228 ms
(10 rows)
