Re: [PERFORM] Calculating 95th percentiles

2011-03-05 Thread marcin mank
On Fri, Mar 4, 2011 at 4:18 PM, Landreville wrote:

>    create temporary table deltas on commit drop as
>        select * from get_delta_table(p_switchport_id, p_start_date, p_end_date);
>
>    select round(count(volume_id) * 0.95) into v_95th_row from deltas;
>    select in_rate into v_record.in_95th from deltas where in_rate_order = v_95th_row;
>    select out_rate into v_record.out_95th from deltas where out_rate_order = v_95th_row;
>    select sum(in_delta), sum(out_delta) into v_record.in_total, v_record.out_total from deltas;
>
> Unfortunately using a temporary table means that I cannot run this
> query on the read-only slave, but I can't see a way around using one.

Is this fast enough on a slave:


with deltas as (select * from get_delta_table(...)),
p95 as (select round(count(volume_id) * 0.95) as p95v from deltas)
select
    (select in_rate from deltas, p95 where in_rate_order = p95v),
    (select out_rate from deltas, p95 where out_rate_order = p95v)
etc..

?
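
Spelled out in full it might look roughly like this (untested, just a
sketch reusing the names from Your quoted code):

with deltas as (
    select * from get_delta_table(p_switchport_id, p_start_date, p_end_date)
),
p95 as (
    select round(count(volume_id) * 0.95) as p95v from deltas
)
select
    (select in_rate  from deltas, p95 where in_rate_order  = p95v) as in_95th,
    (select out_rate from deltas, p95 where out_rate_order = p95v) as out_95th,
    (select sum(in_delta)  from deltas) as in_total,
    (select sum(out_delta) from deltas) as out_total;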

Greetings
Marcin



Re: [PERFORM] high user cpu, massive SELECTs, no io waiting problem

2011-02-15 Thread marcin mank
On Tue, Feb 15, 2011 at 6:19 PM, Thomas Pöhler wrote:
> Hi list,
>
> See ganglia: http://dl.dropbox.com/u/183323/CPUloadprobsdb1.jpg
>

What is the bottom graph? queries/minute? Looks like Your database is
just getting hammered.
Maybe there is a really badly coded page somewhere (a query for each
user or something similar)?

Greetings
Marcin Mańk



Re: [PERFORM] New server to improve performance on our large and busy DB - advice?

2010-01-15 Thread marcin mank
On Thu, Jan 14, 2010 at 8:17 PM, Carlo Stonebanks wrote:
> . 48 GB RAM
> 2) Which Windows OS would you recommend? (currently 2008 x64 Server)

There is no 64-bit Windows build at the moment - You would be limited to
shared_buffers of about a gigabyte. Choose Linux.

Greetings
Marcin Mańk



Re: [PERFORM] Massive table (500M rows) update nightmare

2010-01-07 Thread marcin mank
> every update is a UPDATE ... WHERE id >= x AND id < x+10 and a commit is
> performed after every 1000 update statements, i.e. every 10,000 rows.

What is the rationale behind this? How about doing 10k rows in 1
update, and committing every time?

You could try making the condition on the ctid column, to not have to
use the index on ID, and process the rows in physical order. First
make sure that newly inserted production data has the correct value in
the new column, and add 'where new_column is null' to the conditions.
But I have never tried this, use at Your own risk.
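
Something along these lines, maybe (completely untested, table and column
names made up; the point is that each batch is located by a sequential
pass over not-yet-processed rows and then addressed directly by ctid, so
the index on id is not needed):

UPDATE big_table
SET    new_column = 0            -- whatever the backfill value is
WHERE  ctid = ANY (ARRAY(
           SELECT t.ctid
           FROM   big_table t
           WHERE  t.new_column IS NULL
           LIMIT  10000));
-- commit, then repeat until zero rows are updated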

Greetings
Marcin Mank



Re: [PERFORM] DELETE performance problem

2009-11-25 Thread marcin mank
On Tue, Nov 24, 2009 at 2:37 PM, Luca Tettamanti  wrote:
>         ->  HashAggregate  (cost=1031681.15..1033497.20 rows=181605 width=8)
>             (actual time=571807.575..610178.552 rows=26185953 loops=1)


This is Your problem. The system's estimate for the number of distinct
annotation_ids in t2 is wildly off.

The disk activity is almost certainly swapping (You can check it with
iostat on the Linux machine).

Can You try "analyze t2" just before the delete query? Maybe also try
raising the statistics target for the annotation_id column.

If all else fails, You may try "set enable_hashagg to false" just
before the query.
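
Something like this, perhaps (the statistics target value is just a guess,
adjust to taste):

ALTER TABLE t2 ALTER COLUMN annotation_id SET STATISTICS 1000;
ANALYZE t2;
-- and only if the plan is still bad, just for this session:
SET enable_hashagg TO false;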

Greetings
Marcin Mańk





Re: [PERFORM] query optimization

2009-11-23 Thread marcin mank
On Tue, Nov 24, 2009 at 12:49 AM, Faheem Mitha  wrote:
>
> Yes, sorry. I'm using Postgresql 8.4. I guess I should go through diag.pdf
> and make sure all the information is current. Thanks for pointing out my
> error.
>

Excellent report!

About the COPY problem: You seem to have created the primary key
before doing the COPY (at least that's what the dump before the copy
says). This is bad. Create it after the COPY.
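
In other words, something like this (a sketch with made-up table, column
and file names, not Your actual schema):

CREATE TABLE t (id integer, payload text);     -- no primary key yet
COPY t FROM '/tmp/data.csv' WITH CSV;          -- bulk load first
ALTER TABLE t ADD PRIMARY KEY (id);            -- build the index once, after the load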

Greetings
Marcin



Re: [PERFORM] Postgres query completion status?

2009-11-20 Thread marcin mank
>>> max_connections = 500                   # (change requires restart)
>>> work_mem = 256MB                                # min 64kB
>>
>> Not that it has to do with your current problem but this combination could
>> bog your server if enough clients run sorted queries simultaneously.
>> You probably should back on work_mem at least an order of magnitude.
>>
>
> What's the correct way to configure this?
>
> * We have one client which needs to run really big transactions (therefore
> needs the work memory).
>

You can set the work_mem for the specific user (like "set work_mem to
x") at the beginning of the session.
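
For example (the role name and the value here are made up):

-- once, so that every new session of that role gets it:
ALTER ROLE reporting_user SET work_mem = '512MB';
-- or just for the current session:
SET work_mem TO '512MB';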

Here are some things I noticed (it is more like shooting in the dark,
but still...)

the expensive part is this:

->  Sort  (cost=280201.66..281923.16 rows=688602 width=300)
          (actual time=177511.806..183486.593 rows=41317448 loops=1)
      Sort Key: du_report_sku.wid, du_report_sku.storeorderid, du_report_sku.genreorderid
      Sort Method:  external sort  Disk: 380768kB
      ->  HashAggregate  (cost=197936.75..206544.27 rows=688602 width=36)
                         (actual time=7396.426..11224.839 rows=6282564 loops=1)
            ->  Seq Scan on du_report_sku  (cost=0.00..111861.61 rows=6886011 width=36)
                                           (actual time=0.006..573.419 rows=6897682 loops=1)


(It is pretty confusing that the HashAggregate reports ~6M rows but the
sort outputs 41M rows; maybe I am misreading this.)
Anyway, I think that if You up the work_mem for this query to 512MB,
the sort will happen in memory and thus be plenty faster.

Also, You say You are experiencing unstable query plans, and this may
mean that geqo is kicking in (but Your query seems too simple for
that, even considering the views involved). A quick way to check that
would be to run EXPLAIN on the query a couple of tens of times and check
whether the plans change. If they do, try upping geqo_threshold.

You have seq_page_cost 4 times larger than random_page_cost. You say
You are on SSD, so there is no random access penalty. Try setting them
equal.
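
For example (just a sketch; tune the actual values to Your hardware):

SET seq_page_cost = 1.0;
SET random_page_cost = 1.0;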

Your plan is full of merge-joins, some indices may be in order. Merge
join is a kind of "last-chance" plan.

the query is:

SELECT ( core.demand.qty - viwcs.wave_end_demand.qty_remaining )
FROM core.demand,
     viwcs.previous_wave
     LEFT OUTER JOIN viwcs.wave_end_demand USING ( wid )
WHERE core.demand.id = viwcs.wave_end_demand.demand_id;

Isn't the left join equivalent to an inner join, since in the WHERE clause
You are comparing values from the nullable side of the join? If they come
out as NULLs, they will get discarded anyway...

I hope You find some of this useful.

Greetings
Marcin



Re: [PERFORM] Help Me Understand Why I'm Getting a Bad Query Plan

2009-03-24 Thread marcin mank
There is one thing I don't understand:

->  Nested Loop  (cost=0.00..180564.28 rows=1806 width=37)
                 (actual time=0.192..60.214 rows=3174 loops=1)
      ->  Index Scan using visitors_userid_index2 on visitors v
              (cost=0.00..2580.97 rows=1300 width=33)
              (actual time=0.052..2.342 rows=897 loops=1)
            Index Cond: (userid = 'fbe2537f21d94f519605612c0bf7c2c5'::bpchar)
      ->  Index Scan using itemexperiencelog__index__visitorid on itemexperiencelog l
              (cost=0.00..134.04 rows=230 width=70)
              (actual time=0.013..0.040 rows=4 loops=897)
            Index Cond: (l.visitorid = v.id)

If it expects 1300 visitors with that userid, and each of them to have 230
entries in itemexperiencelog (so naively about 1300 * 230 = 299000 joined
rows), how can it come up with 1806 returned rows (and be about right!)?

Greetings
Marcin



Re: [PERFORM] LIMIT confuses the planner

2009-03-22 Thread marcin mank
> I hit an interesting paper on n_distinct calculation:
>
> http://www.pittsburgh.intel-research.net/people/gibbons/papers/distinct-values-chapter.pdf
>
> the PCSA algorithm described there requires O(1) calculation per
> value. Page 22 describes what to do with update streams.
>
> This I think (disclaimer: I know little about PG internals) means that
> the n_distinct estimation can be done during vacuum time (it would
> play well with the visibility map addon).
>
> What do You think?

OK, if You think that calculating a hash function of every data field
for each insert or delete is prohibitive, just say so and don't bother
reading the paper :]

Greetings
Marcin



Re: [PERFORM] LIMIT confuses the planner

2009-03-22 Thread marcin mank
> So the bottom line here is just that the estimated n_distinct is too
> low.  We've seen before that the equation we use tends to do that more
> often than not.  I doubt that consistently erring on the high side would
> be better though :-(.  Estimating n_distinct from a limited sample of
> the population is known to be a statistically hard problem, so we'll
> probably not ever have perfect answers, but doing better is on the
> to-do list.
>

I hit an interesting paper on n_distinct calculation:

http://www.pittsburgh.intel-research.net/people/gibbons/papers/distinct-values-chapter.pdf

The PCSA algorithm described there requires O(1) calculation per
value. Page 22 describes what to do with update streams.

This I think (disclaimer: I know little about PG internals) means that
the n_distinct estimation can be done during vacuum time (it would
play well with the visibility map addon).

What do You think?

Greetings
Marcin



Re: [PERFORM] not in(subselect) in 8.4

2009-02-20 Thread marcin mank
> Just as a question to Tom and team,

maybe it's time for asktom.postgresql.org? Oracle has it :)



Re: [PERFORM] Help w/speeding up range queries?

2006-11-02 Thread Marcin Mank
> Ie:  select FeatureID from SIMPLE_TABLE where FeatureChromosomeName like
> 'chrX' and StartPosition > 1000500 and EndPosition < 200;

How about ( this assumes that StartPosition <= EndPosition ):

select FeatureID
from SIMPLE_TABLE
where FeatureChromosomeName like 'chrX'
and StartPosition > 1000500
and StartPosition < 200
and EndPosition > 1000500
and EndPosition < 200;


This at least should help the planner with estimating the number of rows.

Also think twice when You assume that a query with ILIKE will use an index.
Read about varchar_pattern_ops.
Make an index on (FeatureChromosomeName, StartPosition), and all should be
fine.
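
For example (a sketch; the index name is made up):

CREATE INDEX simple_table_chrom_start_idx
    ON SIMPLE_TABLE (FeatureChromosomeName, StartPosition);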

Greetings
Marcin




Re: [PERFORM] Problems with inconsistant query performance.

2006-09-28 Thread Marcin Mank
> So the question is why, on a relatively simple proc, am I getting a query
> performance delta between 3549ms and 7ms?

What version of PG is it?

I had such problems in a pseudo-realtime app I use here with Postgres, and
they went away when I moved to 8.1 (from 7.4). I guess it is the improved
shared buffer management code (don't You see a
big_query_searching_through_half_the_db just before You get this slow
insert?).

Greetings
Marcin




Re: [PERFORM] Performance With Joins on Large Tables

2006-09-13 Thread Marcin Mank
> Is there anything I'm missing that is preventing it from using the index?
> It just seems weird to me that other joins like this work fine and fast
> with indexes, but this one won't.


Did You consider clustering both tables on the dsiacctno index?

I just checked that for a 4M-row table, even with enable_seqscan=on and
the default *_page_cost settings, on PG 8.1.4 an index scan is chosen for
select * from table order by serial_pkey_field


This is essentially the question in Your case - sort it, or get it sorted
via the index at the expense of more random IO.

I think clustering should work for You, but I am no expert, check with
others.
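
A sketch (the table and index names here are made up; use whatever Your
dsiacctno indexes are actually called):

CLUSTER acct_dsiacctno_idx ON acct;
CLUSTER acct_history_dsiacctno_idx ON acct_history;
ANALYZE acct;
ANALYZE acct_history;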

Greetings
Marcin




Re: [PERFORM] Confirmation of bad query plan generated by 7.4 tree

2006-06-13 Thread Marcin Mank
> warehouse-# WHERE e.event_date > now() - interval '2 days'

Try explicitly querying:
WHERE e.event_date > '2006-06-11 20:15:00'

In my understanding 7.4 does not precalculate this timestamp value for the
purpose of choosing a plan.

Greetings
Marcin

