Re: [PERFORM] Calculating 95th percentiles
On Fri, Mar 4, 2011 at 4:18 PM, Landreville wrote:
> create temporary table deltas on commit drop as
>     select * from get_delta_table(p_switchport_id, p_start_date, p_end_date);
>
> select round(count(volume_id) * 0.95) into v_95th_row from deltas;
> select in_rate into v_record.in_95th from deltas
>     where in_rate_order = v_95th_row;
> select out_rate into v_record.out_95th from deltas
>     where out_rate_order = v_95th_row;
> select sum(in_delta), sum(out_delta) into v_record.in_total,
>     v_record.out_total from deltas;
>
> Unfortunately using a temporary table means that I cannot run this
> query on the read-only slave, but I can't see a way around using one.

Is this fast enough on a slave?

with deltas as (select * from get_delta_table(...)),
     p95 as (select round(count(volume_id) * 0.95) as p95v from deltas)
select
    (select in_rate from deltas, p95 where in_rate_order = p95v),
    (select out_rate from deltas, p95 where out_rate_order = p95v),
    ...

Greetings
Marcin

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
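Spelled out in full, the CTE approach might look like this as a single read-only statement, which a hot-standby slave can run. This is only a sketch: it assumes get_delta_table() returns the same columns the temp table had, and it is untested.

```sql
-- Sketch: one read-only statement replacing the temp table, so it can
-- run on a read-only slave.  Column names are taken from the quoted code.
WITH deltas AS (
    SELECT * FROM get_delta_table(p_switchport_id, p_start_date, p_end_date)
), p95 AS (
    SELECT round(count(volume_id) * 0.95) AS p95v FROM deltas
)
SELECT
    (SELECT in_rate  FROM deltas, p95 WHERE in_rate_order  = p95v) AS in_95th,
    (SELECT out_rate FROM deltas, p95 WHERE out_rate_order = p95v) AS out_95th,
    (SELECT sum(in_delta)  FROM deltas) AS in_total,
    (SELECT sum(out_delta) FROM deltas) AS out_total;
```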
Re: [PERFORM] high user cpu, massive SELECTs, no io waiting problem
On Tue, Feb 15, 2011 at 6:19 PM, Thomas Pöhler wrote:
> Hi list,
>
> See ganglia: http://dl.dropbox.com/u/183323/CPUloadprobsdb1.jpg

What is the bottom graph? Queries/minute? It looks like Your database is just getting hammered. Maybe there is a really badly coded page somewhere (a query for each user, or something similar)?

Greetings
Marcin Mańk
Re: [PERFORM] New server to improve performance on our large and busy DB - advice?
On Thu, Jan 14, 2010 at 8:17 PM, Carlo Stonebanks wrote:
> . 48 GB RAM
> 2) Which Windows OS would you recommend? (currently 2008 x64 Server)

There is no 64-bit Windows build at the moment, so You would be limited to a shared_buffers of about a gigabyte. Choose Linux.

Greetings
Marcin Mańk
Re: [PERFORM] Massive table (500M rows) update nightmare
> every update is a UPDATE ... WHERE id >= x AND id < x+10 and a commit is
> performed after every 1000 updates statement, i.e. every 1 rows.

What is the rationale behind this? How about doing 10k rows in one UPDATE, and committing after every statement?

You could try making the condition on the ctid column, to avoid using the index on id, and process the rows in physical order. First make sure that newly inserted production data has the correct value in the new column, and add "where new_column is null" to the conditions. But I have never tried this; use at Your own risk.

Greetings
Marcin Mank
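The batching suggested above might look like the following. This is a rough sketch only: the table and column names are placeholders, and the guard on new_column follows the advice about concurrent production inserts.

```sql
-- Sketch (untested): update in 10k-row batches, one statement per batch,
-- committing after each one.  "big_table" / "new_column" are placeholders.
UPDATE big_table
   SET new_column = compute_value(old_column)
 WHERE id >= 0 AND id < 10000
   AND new_column IS NULL;   -- skip rows production inserts already populated
COMMIT;
-- then repeat with the next range: id >= 10000 AND id < 20000, and so on
```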
Re: [PERFORM] DELETE performance problem
On Tue, Nov 24, 2009 at 2:37 PM, Luca Tettamanti wrote:
> ->  HashAggregate  (cost=1031681.15..1033497.20 rows=181605 width=8)
>     (actual time=571807.575..610178.552 rows=26185953 loops=1)

This is Your problem. The system's estimate for the number of distinct annotation_ids in t2 is wildly off. The disk activity is almost certainly swapping (You can check it with iostat on the Linux machine).

Can You try "analyze t2" just before the delete query? Maybe try raising the statistics target for the annotation_id column. If all else fails, You may try "set enable_hashagg to false" just before the query.

Greetings
Marcin Mańk
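Put together, the suggested session might look like this. A sketch only: the table and column names come from the thread, and the statistics target value is illustrative.

```sql
-- Refresh the stats behind the distinct-value estimate, with a larger
-- sample for the badly estimated column.
ALTER TABLE t2 ALTER COLUMN annotation_id SET STATISTICS 1000;
ANALYZE t2;

-- If the estimate is still off, steer the planner away from HashAggregate
-- for this session only:
SET enable_hashagg TO false;
-- DELETE ... ;   -- the original query goes here
RESET enable_hashagg;
```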
Re: [PERFORM] query optimization
On Tue, Nov 24, 2009 at 12:49 AM, Faheem Mitha wrote:
> Yes, sorry. I'm using Postgresql 8.4. I guess I should go through diag.pdf
> and make sure all the information is current. Thanks for pointing out my
> error.

Excellent report!

About the COPY problem: You seem to have created the primary key before doing the COPY (at least that's what the dump before the copy says). This is bad. Create it after the COPY.

Greetings
Marcin
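The load-then-index order can be sketched like this. The table, columns, and file path are placeholders, not taken from the report.

```sql
-- Sketch: load first, build the index afterwards in one bulk pass,
-- instead of maintaining the PK index row by row during COPY.
CREATE TABLE measurements (id bigint, payload text);   -- no PK yet
COPY measurements FROM '/tmp/measurements.csv' CSV;    -- bulk load
ALTER TABLE measurements ADD PRIMARY KEY (id);         -- one bulk index build
```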
Re: [PERFORM] Postgres query completion status?
>>> max_connections = 500  # (change requires restart)
>>> work_mem = 256MB       # min 64kB
>>
>> Not that it has to do with your current problem, but this combination could
>> bog your server down if enough clients run sorted queries simultaneously.
>> You probably should back off work_mem by at least an order of magnitude.
>
> What's the correct way to configure this?
>
> * We have one client which needs to run really big transactions (and
>   therefore needs the work memory).

You can set work_mem for the specific user (like "set work_mem to x") at the beginning of the session.

Here are some things I noticed (it is more like shooting in the dark, but still...)

The expensive part is this:

-> Sort (cost=280201.66..281923.16 rows=688602 width=300) (actual time=177511.806..183486.593 rows=41317448 loops=1)
     Sort Key: du_report_sku.wid, du_report_sku.storeorderid, du_report_sku.genreorderid
     Sort Method: external sort  Disk: 380768kB
     -> HashAggregate (cost=197936.75..206544.27 rows=688602 width=36) (actual time=7396.426..11224.839 rows=6282564 loops=1)
          -> Seq Scan on du_report_sku (cost=0.00..111861.61 rows=6886011 width=36) (actual time=0.006..573.419 rows=6897682 loops=1)

(It is pretty confusing that the HashAggregate reports ~6M rows, but the sort does 41M rows; maybe I can not read this.)

Anyway, I think that if You up the work_mem for this query to 512MB, the sort will fit in memory, and thus be plenty faster.

Also, You say You are experiencing unstable query plans, and this may mean that geqo is kicking in (but Your query seems too simple for that, even considering the views involved). A quick way to check would be to run EXPLAIN a couple of tens of times, and check if the plans change. If they do, try upping geqo_threshold.

You have seq_page_cost 4 times larger than random_page_cost. You say You are on SSD, so there is no random-access penalty. Try setting them equal.

Your plan is full of merge joins; some indices may be in order. Merge join is a kind of "last-chance" plan.
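The session-level knobs mentioned above could be tried like this. A sketch only; the values are illustrative, not tuned.

```sql
-- Sketch of the per-session settings discussed above.
SET work_mem = '512MB';       -- let the big sort fit in memory
SET geqo_threshold = 20;      -- only if plans really turn out to be unstable
SET seq_page_cost = 1.0;      -- on SSD there is no random-access penalty...
SET random_page_cost = 1.0;   -- ...so make the two costs equal
-- run the report query here, then RESET the settings or end the session
```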
The query is:

SELECT (core.demand.qty - viwcs.wave_end_demand.qty_remaining)
FROM core.demand,
     viwcs.previous_wave
     LEFT OUTER JOIN viwcs.wave_end_demand USING (wid)
WHERE core.demand.id = viwcs.wave_end_demand.demand_id;

Isn't the left join equivalent to an inner join, since in the WHERE clause You are comparing values from the outer side of the join? If they come out NULL, they will get discarded anyway...

I hope You find some of this useful.

Greetings
Marcin
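If that observation is right, the query can be written with plain joins, which gives the planner more freedom to reorder them. A sketch, untested against the schema in the thread:

```sql
-- Sketch: the WHERE clause already rejects NULLs from the outer side,
-- so the outer join can become an inner join.
SELECT core.demand.qty - viwcs.wave_end_demand.qty_remaining
FROM core.demand
JOIN viwcs.wave_end_demand
  ON core.demand.id = viwcs.wave_end_demand.demand_id
JOIN viwcs.previous_wave USING (wid);
```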
Re: [PERFORM] Help Me Understand Why I'm Getting a Bad Query Plan
There is one thing I don't understand:

-> Nested Loop (cost=0.00..180564.28 rows=1806 width=37) (actual time=0.192..60.214 rows=3174 loops=1)
     -> Index Scan using visitors_userid_index2 on visitors v (cost=0.00..2580.97 rows=1300 width=33) (actual time=0.052..2.342 rows=897 loops=1)
          Index Cond: (userid = 'fbe2537f21d94f519605612c0bf7c2c5'::bpchar)
     -> Index Scan using itemexperiencelog__index__visitorid on itemexperiencelog l (cost=0.00..134.04 rows=230 width=70) (actual time=0.013..0.040 rows=4 loops=897)
          Index Cond: (l.visitorid = v.id)

If it expects 1300 visitors with the userid, and each of them to have 230 entries in itemexperiencelog, how can it come up with 1806 returned rows (and be about right!)?

Greetings
Marcin
Re: [PERFORM] LIMIT confuses the planner
> I hit an interesting paper on n_distinct calculation:
>
> http://www.pittsburgh.intel-research.net/people/gibbons/papers/distinct-values-chapter.pdf
>
> The PCSA algorithm described there requires O(1) calculation per
> value. Page 22 describes what to do with update streams.
>
> This, I think (disclaimer: I know little about PG internals), means that
> the n_distinct estimation can be done at vacuum time (it would
> play well with the visibility map addon).
>
> What do You think?

OK, if You think that calculating a hash function of every data field for each insert or delete is prohibitive, just say so and don't bother reading the paper :]

Greetings
Marcin
Re: [PERFORM] LIMIT confuses the planner
> So the bottom line here is just that the estimated n_distinct is too
> low. We've seen before that the equation we use tends to do that more
> often than not. I doubt that consistently erring on the high side would
> be better though :-(. Estimating n_distinct from a limited sample of
> the population is known to be a statistically hard problem, so we'll
> probably not ever have perfect answers, but doing better is on the
> to-do list.

I hit an interesting paper on n_distinct calculation:

http://www.pittsburgh.intel-research.net/people/gibbons/papers/distinct-values-chapter.pdf

The PCSA algorithm described there requires O(1) calculation per value. Page 22 describes what to do with update streams.

This, I think (disclaimer: I know little about PG internals), means that the n_distinct estimation can be done at vacuum time (it would play well with the visibility map addon).

What do You think?

Greetings
Marcin
Re: [PERFORM] not in(subselect) in 8.4
Just as a question to Tom and team: maybe it's time for asktom.postgresql.org? Oracle has it :)
Re: [PERFORM] Help w/speeding up range queries?
> Ie: select FeatureID from SIMPLE_TABLE where FeatureChromosomeName like
> 'chrX' and StartPosition > 1000500 and EndPosition < 200;

How about this (it assumes that StartPosition <= EndPosition):

select FeatureID
from SIMPLE_TABLE
where FeatureChromosomeName like 'chrX'
  and StartPosition > 1000500 and StartPosition < 200
  and EndPosition > 1000500 and EndPosition < 200;

This at least should help the planner with estimating the number of rows.

Also, think twice when You assume that a query with ILIKE will use an index. Read about varchar_pattern_ops.

Make an index on (FeatureChromosomeName, StartPosition), and all should be fine.

Greetings
Marcin

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
       http://www.postgresql.org/docs/faq
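The suggested index could be created like this. A sketch: the index names are made up, and the varchar_pattern_ops index is only needed if left-anchored LIKE/ILIKE matching must use an index in a non-C locale.

```sql
-- Sketch: composite index for the range query above.
CREATE INDEX simple_table_chrom_start
    ON SIMPLE_TABLE (FeatureChromosomeName, StartPosition);

-- Optional: pattern-ops index so prefix LIKE can use an index
-- when the database locale is not C.
CREATE INDEX simple_table_chrom_pattern
    ON SIMPLE_TABLE (FeatureChromosomeName varchar_pattern_ops);
```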
Re: [PERFORM] Problems with inconsistant query performance.
> So the question is why on a relatively simple proc am I getting a query
> performance delta between 3549ms and 7ms?

What version of PG is it? I had such problems in a pseudo-realtime app I run here with Postgres, and they went away when I moved to 8.1 (from 7.4). I guess it is the better shared-buffer management code. (Don't You see a big_query_searching_through_half_the_db just before You get this slow insert?)

Greetings
Marcin
Re: [PERFORM] Performance With Joins on Large Tables
> Is there anything I'm missing that is preventing it from using the index? It
> just seems weird to me that other joins like this work fine and fast
> with indexes, but this one won't.

Did You consider clustering both tables on the dsiacctno index?

I just checked that for a 4M-row table, even with enable_seqscan=on and the default *_page_cost settings, on PG 8.1.4 an index scan is chosen for:

select * from table order by serial_pkey_field

This is essentially the question in Your case: sort it, or get it sorted via the index at the expense of more random IO. I think clustering should work for You, but I am no expert; check with others.

Greetings
Marcin
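Clustering would look roughly like this. A sketch: the table names are placeholders (only the dsiacctno index is from the thread), and note that CLUSTER takes an exclusive lock and rewrites the whole table, so it needs a maintenance window.

```sql
-- Sketch: rewrite both tables in dsiacctno order so the join reads them
-- with mostly sequential IO (pre-8.3 syntax: CLUSTER index ON table).
CLUSTER table_a_dsiacctno_idx ON table_a;
CLUSTER table_b_dsiacctno_idx ON table_b;
ANALYZE table_a;
ANALYZE table_b;
```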
Re: [PERFORM] Confirmation of bad query plan generated by 7.4 tree
> warehouse-# WHERE e.event_date > now() - interval '2 days'

Try explicitly querying:

WHERE e.event_date > '2006-06-11 20:15:00'

In my understanding, 7.4 does not precalculate this timestamp value for the purpose of choosing a plan.

Greetings
Marcin