Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL
On 22/01/10 19:06, Tory M Blue wrote:
> Here is the explain plan for the query. Actual rows that the query
> returns is 6369

Actually, it processes 19,799 rows (see the actual rows= below).

SLOW - Bitmap Heap Scan on userstats  (cost=797.69..118850.46 rows=13399 width=8)
       (actual time=281.604..31190.290 rows=19799 loops=1)
       Total runtime: 31219.536 ms

FAST - Bitmap Heap Scan on userstats a  (cost=802.66..118855.43 rows=33276 width=23)
       (actual time=55.400..3807.908 rows=2606 loops=1)
       Total runtime: 3813.626 ms

OK - so the first query processes 19,799 rows in 31,219 ms (about 1.5ms
per row). The second processes 2,606 rows in 3,813 ms (about 1.3ms per
row).

You are asking for DISTINCT user-ids, so it seems reasonable that it will
take slightly longer to check a larger set of user-ids. Otherwise, both
queries are the same. I'm still a little puzzled by the bitmap scan, but
the planner probably knows more about your data than I do.

The main time is spent in the bitmap heap scan, which is where it's
grabbing actual row data (and presumably building a hash over the uid
column). You can see how long in the "actual time" figures: the first
number (e.g. 281.604) is the time spent before it starts, and the second
is the total time at finish (31190.290). If loops was greater than 1 you
would multiply the times by the number of loops to get a total.

So - there's nothing wrong in the sense that the second query does the
same as the first. Let's take a step back. What you really want is for
your reports to be faster.

You mentioned you were running this query thousands of times with a
different makeid each time. Running it once for all possible values and
stashing the results in a temp table will probably be *much* faster. The
planner can just scan the whole table once and build up its results as it
goes.

--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
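[Editor's note: the single-pass approach Richard describes might be
sketched as below, using the table and column names given elsewhere in the
thread; the temp table name is illustrative, not from the thread.]

```sql
-- One scan of the whole table builds per-makeid counts; the temp table
-- name "report_counts" is made up for illustration.
CREATE TEMP TABLE report_counts AS
SELECT makeid,
       COUNT(DISTINCT uid) AS active_users
FROM   pixelpool.userstats
WHERE  tagged = true
GROUP  BY makeid;

-- The report then reads from the temp table instead of issuing ~6000
-- separate COUNT(DISTINCT ...) queries:
SELECT active_users FROM report_counts WHERE makeid = 'b1mw-ferman';
```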
Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL
On Mon, 25 Jan 2010, Richard Huxton wrote:
> OK - so the first query processes 19,799 rows in 31,219 ms (about 1.5ms
> per row). The second processes 2,606 rows in 3,813 ms (about 1.3ms per
> row).

Agreed. One query is faster than the other because it has to do an eighth
the amount of work.

Matthew

--
I wouldn't be so paranoid if you weren't all out to get me!!
Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL
On Mon, Jan 25, 2010 at 3:59 AM, Matthew Wakeling matt...@flymine.org wrote:
> Agreed. One query is faster than the other because it has to do an
> eighth the amount of work.

Thanks guys, ya this has dropped the time by half. The process is
manageable now. Thanks again. For some reason we thought this method would
make it take more time, vs less. So again appreciate the help :)

Tory
Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL
On Thu, Jan 21, 2010 at 7:46 PM, Craig Ringer cr...@postnewspapers.com.au wrote:
> What is the reporting query that takes 26 hours? You didn't seem to
> include it, or any query plan information for it (EXPLAIN or EXPLAIN
> ANALYZE results).

It's this query, run 6000 times with a diff makeid's:

  SELECT COUNT(DISTINCT uid) AS active_users
  FROM pixelpool.userstats
  WHERE makeid = 'bmw-ferman' AND tagged = true

Plan:

  Aggregate  (cost=49467.00..49467.01 rows=1 width=8)
    ->  Bitmap Heap Scan on userstats  (cost=363.49..49434.06 rows=13175 width=8)
          Recheck Cond: (makeid = 'b1mw-ferman'::text)
          Filter: tagged
          ->  Bitmap Index Scan on idx_retargetuserstats_makeidtag  (cost=0.00..360.20 rows=13175 width=0)
                Index Cond: ((makeid = 'b1mw-ferman'::text) AND (tagged = true))

> What sort of activity is happening on the db concurrently with your
> tests? What's your max connection limit?

50 max, and there is nothing - usually one person connected, if that.
Otherwise it's a cron job that bulk inserts, and then jobs later on run
that generate the reports off the static data. No deletes or updates
happening.

> What're your shared_buffers and effective_cache_size settings?

  shared_buffers = 1028MB   (had this set at 128 and 256 and just recently
                             bumped it higher; didn't buy me anything)
  maintenance_work_mem = 128MB
  fsync = on
  random_page_cost = 4.0
  effective_cache_size = 7GB
  default vac settings

> Could sorts be spilling to disk? Check work_mem size and enable logging
> of tempfiles (see the manual).

  work_mem = 100MB   # min 64kB

Will do, and I guess it's possible, but during the queries/reports I don't
see a ton of writes, mostly reads.

> Does an explicit ANALYZE of the problem table(s) help?

It didn't.

Thanks
Tory
Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL
On Fri, Jan 22, 2010 at 1:42 AM, Richard Huxton d...@archonet.com wrote:
> On 21/01/10 22:15, Tory M Blue wrote:
>> Data distribution = In the 98mill records, there are 7000 unique
>> makeid's, and 21mill unique UID's. About 41mill of the records have
>> tagged=true
>
> Two things:
>
> 1. You have got the combined index on (makeid, tagged) and not
> (tagged, makeid), haven't you? Just checking.

Yes we do.

> 2. If it's mostly tagged=true you are interested in you can always use
> a partial index:
>
>   CREATE INDEX ... (makeid) WHERE tagged
>
> This might be a win even if you need a second index with WHERE NOT
> tagged.

Partial index doesn't seem to fit here due to the fact that there are
35-40% marked true. Didn't think about creating a second index for false,
may give that a shot.

> Also, either I've not had enough coffee yet, or a bitmap scan is an odd
> choice for only ~13000 rows out of 100 million.
>
>   ->  Bitmap Index Scan on idx_retargetuserstats_makeidtag
>       (cost=0.00..360.20 rows=13175 width=0)
>         Index Cond: ((makeid = 'b1mw-ferman'::text) AND (tagged = true))
>
> Otherwise, see what Craig said. I'm assuming this isn't the query that
> is CPU bound for a long time. Unless your table is horribly bloated,
> there's no reason for that judging by this plan.

It is, but not always - only when there are 10K or more matches. And the
explain unfortunately is sometimes way high or way low, so the explain is
hit and miss. But the same sql that returns maybe 500 rows is pretty
fast; it's the return of 10K+ rows that seems to stall and is CPU bound.

Thanks
Tory
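[Editor's note: the pair of partial indexes Richard suggests might look
like the following sketch; the index names are made up for illustration
and are not from the thread.]

```sql
-- Each partial index only contains rows matching its WHERE predicate, so
-- a search on makeid with tagged=true scans a smaller index.
CREATE INDEX idx_userstats_makeid_tagged
    ON pixelpool.userstats (makeid) WHERE tagged;

-- Optional second index for the tagged=false case, as Richard mentions:
CREATE INDEX idx_userstats_makeid_nottagged
    ON pixelpool.userstats (makeid) WHERE NOT tagged;
```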
Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL
On 22/01/10 18:03, Tory M Blue wrote:
> Partial index doesn't seem to fit here due to the fact that there are
> 35-40% marked true. Didn't think about creating a second index for
> false, may give that a shot.

If you're mostly searching tagged=true, try the partial index - it'll
mean the planner is just scanning the index for the one term.

> It is, but not always, only when there are 10K+ matches. And the
> explain unfortunately is sometimes way high or way low, so the explain
> is hit and miss. But the same sql that returns maybe 500 rows is pretty
> fast, it's the return of 10K+ rows that seems to stall and is CPU
> bound.

Hmm - might be able to push that cross-over point up a bit by tweaking
various costs, but you've got to be careful you don't end up making all
your other queries worse. It'd be good to figure out what the problem is
first.

Looking at the query, there are four stages:
  1. Scan the index, build a bitmap of heap pages with matching rows
  2. Scan those pages, find the rows that match
  3. Run DISTINCT on the uids
  4. Count them

I wonder if it could be the DISTINCT. What happens with a count(*) or
count(uid) instead?

Also - you might find EXPLAIN ANALYZE more useful than straight EXPLAIN
here. That will show actual times for each stage.

On Craig's branch of this thread, you say you call it 6000 times with
different makeids. Any reason why you can't join to a temp table and just
do it in one query?

--
  Richard Huxton
  Archonet Ltd
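[Editor's note: Richard's two diagnostic suggestions could be tried as
below; the query constants are taken from the plans quoted in the thread.]

```sql
-- 1. Does dropping DISTINCT remove the CPU cost? Compare a plain count
--    against the original COUNT(DISTINCT uid).
EXPLAIN ANALYZE
SELECT COUNT(uid) AS tagged_rows
FROM   pixelpool.userstats
WHERE  makeid = 'b1mw-ferman' AND tagged = true;

-- 2. EXPLAIN ANALYZE on the original query reports actual per-node
--    timings, so it shows whether the bitmap heap scan or the
--    DISTINCT/aggregate stage is where the time goes.
EXPLAIN ANALYZE
SELECT COUNT(DISTINCT uid) AS active_users
FROM   pixelpool.userstats
WHERE  makeid = 'b1mw-ferman' AND tagged = true;
```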
Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL
On Fri, 22 Jan 2010, Tory M Blue wrote:
> But the same sql that returns maybe 500 rows is pretty fast, it's the
> return of 10K+ rows that seems to stall and is CPU bound.

Okay, so you have two differing cases. Show us the EXPLAIN ANALYSE for
both of them, and we will see what the difference is.

Matthew

--
The third years are wandering about all worried at the moment because
they have to hand in their final projects. Please be sympathetic to them,
say things like "ha-ha-ha", but in a sympathetic tone of voice
  -- Computer Science Lecturer
Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL
On Fri, Jan 22, 2010 at 10:59 AM, Tory M Blue tmb...@gmail.com wrote:
> It's this query, run 6000 times with a diff makeid's:
>
>   SELECT COUNT(DISTINCT uid) AS active_users
>   FROM pixelpool.userstats
>   WHERE makeid = 'bmw-ferman' AND tagged = true

Any chance of trying this instead:

  SELECT makeid, count(DISTINCT uid) AS active_users
  FROM pixelpool.userstats
  WHERE tagged = true
  GROUP BY makeid;

and seeing how long it takes? If you're limiting the total number of
makeids then you could add:

  AND makeid IN (biglistofmakeidsgoeshere)

Note that a partial index:

  CREATE INDEX xyz ON pixelpool.userstats (makeid) WHERE tagged;

might help both the original and this query.
Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL
On 23/01/2010 1:59 AM, Tory M Blue wrote:
> It's this query, run 6000 times with a diff makeid's:
>
>   SELECT COUNT(DISTINCT uid) AS active_users
>   FROM pixelpool.userstats
>   WHERE makeid = 'bmw-ferman' AND tagged = true
>
> Plan:
>   Aggregate  (cost=49467.00..49467.01 rows=1 width=8)
>     ->  Bitmap Heap Scan on userstats  (cost=363.49..49434.06 rows=13175 width=8)
>           Recheck Cond: (makeid = 'b1mw-ferman'::text)
>           Filter: tagged
>           ->  Bitmap Index Scan on idx_retargetuserstats_makeidtag  (cost=0.00..360.20 rows=13175 width=0)
>                 Index Cond: ((makeid = 'b1mw-ferman'::text) AND (tagged = true))

Try:

- Adding a partial index on makeid, eg:

    CREATE INDEX userstats_makeid_where_tagged_idx
        ON userstats (makeid) WHERE (tagged);

- Instead of repeating the query 6000 times in a loop, collect the data
  in one pass by joining against a temp table containing the makeids of
  interest:

    SELECT COUNT(DISTINCT u.uid) AS active_users
    FROM pixelpool.userstats u
    INNER JOIN temp_makeids m ON (u.makeid = m.makeid)
    WHERE u.tagged = true;

(If the 6000 repeats are really a correlated subquery part of a bigger
query you still haven't shown, then you might be able to avoid 6000
individual passes by adjusting your outer query instead.)

--
  Craig Ringer
[PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL
The issues we are seeing, besides just saying the reports take over 26
hours, is that the query seems to be CPU bound. Meaning that the query
consumes an entire CPU, and quite often it is sitting with 65%-90% WAIT.
Now this is not iowait - the disks are fine (5000-6000 tps, 700K reads
etc, with maybe 10-13% iowait). However, much of the time that I see the
CPU at 65%-90% wait, there is very little disk access, so it's not the
disk subsystem (in my opinion). I've also moved CPUs around and the sql
seems to stall regardless of what CPU the job has been provided with.

Memory: I pulled memory to test, and again, other than this data set
consuming 10 gigs of data, 700K free (will add more memory), but since
the disks are not a bottleneck and I don't appear to be swapping, I keep
coming back to the data or sql.

I'm providing the data that I think is requested when a performance issue
is observed. There is an autovac running, and the queries are run on
static data, so no INSERTS/UPDATES/DELETES are happening at the time. The
query seems to have gotten slower as the data set grew.

  Redhat
  Postgres 8.3.4
  8 cpu box
  10gig of ram

· Number of rows in the table = 100 million
· Size of table including indices = 21GB
· Time to create a combined index on 2 columns (tagged boolean, makeid
  text) = more than 1 hr 30 minutes
· Data distribution = In the 98mill records, there are 7000 unique
  makeid's, and 21mill unique UID's. About 41mill of the records have
  tagged=true
· Time to execute the following query with indices on makeid and tagged
  = 90-120 seconds. The planner uses the webid index and filters on
  tagged and then rechecks the webid index:

  SELECT COUNT(DISTINCT uid) AS active_users
  FROM pixelpool.userstats
  WHERE makeid = 'bmw-ferman' AND tagged = true

· Time to execute the same query with a combined index on makeid and
  tagged = 60-100 seconds. The planner uses the combined index and then
  filters tagged.
  SELECT COUNT(DISTINCT uid) AS active_users
  FROM pixelpool.userstats
  WHERE makeid = 'bmw-ferman' AND tagged = true

Plan:

  Aggregate  (cost=49467.00..49467.01 rows=1 width=8)
    ->  Bitmap Heap Scan on userstats  (cost=363.49..49434.06 rows=13175 width=8)
          Recheck Cond: (makeid = 'b1mw-ferman'::text)
          Filter: tagged
          ->  Bitmap Index Scan on idx_retargetuserstats_makeidtag  (cost=0.00..360.20 rows=13175 width=0)
                Index Cond: ((makeid = 'b1mw-ferman'::text) AND (tagged = true))

Any assistance would be appreciated, don't worry about slapping me around,
I need to figure this out. Otherwise I'm buying new hardware where it may
not be required.

Thanks
Tory
Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL
Tory M Blue wrote:
> Any assistance would be appreciated, don't worry about slapping me
> around I need to figure this out. Otherwise I'm buying new hardware
> where it may not be required.

What is the reporting query that takes 26 hours? You didn't seem to
include it, or any query plan information for it (EXPLAIN or EXPLAIN
ANALYZE results).

What sort of activity is happening on the db concurrently with your
tests? What's your max connection limit?

What're your shared_buffers and effective_cache_size settings?

Could sorts be spilling to disk? Check work_mem size and enable logging
of tempfiles (see the manual).

Does an explicit ANALYZE of the problem table(s) help?

--
  Craig Ringer
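[Editor's note: the tempfile logging Craig refers to is a postgresql.conf
setting; a minimal sketch, assuming the PostgreSQL 8.3.4 mentioned in the
thread (log_temp_files was added in 8.3).]

```
# postgresql.conf - log queries whose sorts/hashes spill to disk
log_temp_files = 0      # 0 = log every temp file created; a positive
                        # value logs only files larger than that many
                        # kilobytes
```

After a reload, any query that spills to disk writes a LOG line with the
temp file's size, which shows whether work_mem is actually being exceeded.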