Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

2010-01-25 Thread Richard Huxton

On 22/01/10 19:06, Tory M Blue wrote:

 Here is the explain plan for the query. Actual rows that the query
 returns is 6369

Actually, it processes 19,799 rows (see the "actual ... rows=" figure below).


SLOW



  ->  Bitmap Heap Scan on userstats  (cost=797.69..118850.46 rows=13399 width=8)
        (actual time=281.604..31190.290 rows=19799 loops=1)



Total runtime: 31219.536 ms



FAST



  ->  Bitmap Heap Scan on userstats a  (cost=802.66..118855.43 rows=33276 width=23)
        (actual time=55.400..3807.908 rows=2606 loops=1)



Total runtime: 3813.626 ms


OK - so the first query processes 19,799 rows in 31,219 ms (about 1.5ms 
per row)


The second processes 2,606 rows in 3,813 ms (about 1.3ms per row).

You are asking for DISTINCT user-ids, so it seems reasonable that it 
will take slightly longer to check a larger set of user-ids.


Otherwise, both queries are the same. I'm still a little puzzled by the 
bitmap scan, but the planner probably knows more about your data than I do.


The main time is spent in the bitmap heap scan, which is where it's 
grabbing actual row data (and presumably building a hash over the uid 
column). You can see how long in the "actual time" figures: the first 
number (e.g. 281.604) is the time spent before it starts returning rows, 
and the second is the total time at finish (31190.290). If "loops" were 
greater than 1, you would multiply the times by the number of loops to 
get a total.


So - there's nothing wrong in the sense that the second query does the 
same as the first. Let's take a step back. What you really want is your 
reports to be faster.


You mentioned you were running this query thousands of times with a 
different "makeid" each time. Running it once for all possible values 
and stashing the results in a temp table will probably be *much* faster. 
The planner can just scan the whole table once and build up its results 
as it goes.
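Something along these lines, perhaps (an untested sketch; "makeid_counts" 
is just an example name):

   CREATE TEMP TABLE makeid_counts AS
     SELECT makeid, count(DISTINCT uid) AS active_users
     FROM pixelpool.userstats
     WHERE tagged
     GROUP BY makeid;

Your report can then read the per-makeid counts straight out of 
makeid_counts instead of re-scanning userstats 6000 times.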


--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

2010-01-25 Thread Matthew Wakeling

On Mon, 25 Jan 2010, Richard Huxton wrote:
OK - so the first query processes 19,799 rows in 31,219 ms (about 1.5ms per 
row)


The second processes 2,606 rows in 3,813 ms (about 1.3ms per row).


Agreed. One query is faster than the other because it has to do an eighth 
the amount of work.


Matthew

--
I wouldn't be so paranoid if you weren't all out to get me!!



Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

2010-01-25 Thread Tory M Blue
On Mon, Jan 25, 2010 at 3:59 AM, Matthew Wakeling matt...@flymine.org wrote:
 On Mon, 25 Jan 2010, Richard Huxton wrote:

 OK - so the first query processes 19,799 rows in 31,219 ms (about 1.5ms
 per row)

 The second processes 2,606 rows in 3,813 ms (about 1.3ms per row).

 Agreed. One query is faster than the other because it has to do an eighth
 the amount of work.

 Matthew

Thanks guys, ya this has dropped the time by half. The process is
manageable now. Thanks again. For some reason we thought this method
would make it take more time, vs less. So again appreciate the help :)

Tory



Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

2010-01-22 Thread Tory M Blue
On Thu, Jan 21, 2010 at 7:46 PM, Craig Ringer
cr...@postnewspapers.com.au wrote:


  Any assistance would be appreciated, don't worry about slapping me
  around I need to figure this out. Otherwise I'm buying new hardware
  where it may not be required.

 What is the reporting query that takes 26 hours? You didn't seem to
 include it, or any query plan information for it (EXPLAIN or EXPLAIN
 ANALYZE results).


It's this query, run 6000 times with a different makeid each time:

SELECT COUNT(DISTINCT uid) AS active_users FROM pixelpool.userstats
WHERE makeid = 'bmw-ferman' AND tagged = true

Plan

Aggregate  (cost=49467.00..49467.01 rows=1 width=8)
  ->  Bitmap Heap Scan on userstats  (cost=363.49..49434.06 rows=13175 width=8)
        Recheck Cond: (makeid = 'b1mw-ferman'::text)
        Filter: tagged
        ->  Bitmap Index Scan on idx_retargetuserstats_makeidtag  (cost=0.00..360.20 rows=13175 width=0)
              Index Cond: ((makeid = 'b1mw-ferman'::text) AND (tagged = true))


 What sort of activity is happening on the db concurrently with your
 tests? What's your max connection limit?


50 max, and there is nothing; usually one person connected, if that. 
Otherwise it's a cron job that bulk inserts, and then jobs run later on 
that generate the reports off the static data. No deletes or updates 
happening.



 What're your shared_buffers and effective_cache_size settings?


shared_buffers = 1028MB  (Had this set at 128 and 256 and just recently
bumped it higher, didn't buy me anything)
maintenance_work_mem = 128MB
fsync=on
random_page_cost = 4.0
effective_cache_size = 7GB
default vac settings



 Could sorts be spilling to disk? Check work_mem size and enable logging
 of tempfiles (see the manual).


work_mem = 100MB    # min 64kB

Will do, and I guess it's possible, but during the queries/reports I 
don't see a ton of writes, mostly reads.


 Does an explicit ANALYZE of the problem table(s) help?


It didn't.

Thanks
Tory


Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

2010-01-22 Thread Tory M Blue
On Fri, Jan 22, 2010 at 1:42 AM, Richard Huxton d...@archonet.com wrote:

 On 21/01/10 22:15, Tory M Blue wrote:

 · Data distribution = In the 98mill records, there are 7000 unique

 makeid's, and 21mill unique UID's. About 41mill of the records have
 tagged=true

 · Time to execute the following query with indices on makeid and
 tagged = 90-120 seconds. The planner uses the webid index and filters on
 tagged and then rechecks the webid index

 SELECT COUNT(DISTINCT uid) AS active_users FROM
 pixelpool.userstats WHERE makeid = 'bmw-ferman' AND tagged = true

  · Time to execute the same query with a combined index on
 makeid
 and tagged = 60-100 seconds. The planner uses the combined index and then
 filters tagged.


 Two things:

 1. You have got the combined index on (makeid, tagged) and not (tagged,
 makeid) haven't you? Just checking.


Yes we do


 2. If it's mostly tagged=true you are interested in you can always use a
 partial index: CREATE INDEX ... (makeid) WHERE tagged
 This might be a win even if you need a second index with WHERE NOT tagged.


A partial index doesn't seem to fit here, because 35-40% of the rows are
marked true.

Didn't think about creating a second index for false, may give that a shot.



 Also, either I've not had enough coffee yet, or a bitmap scan is an odd
 choice for only ~13,000 rows out of 100 million.

   ->  Bitmap Index Scan on idx_retargetuserstats_makeidtag  (cost=0.00..360.20 rows=13175 width=0)
         Index Cond: ((makeid = 'b1mw-ferman'::text) AND (tagged = true))


 Otherwise, see what Craig said.

 I'm assuming this isn't the query that is CPU bound for a long time. Unless
 your table is horribly bloated, there's no reason for that judging by this
 plan.


It is, but not always; only when there are 10K or more matches. And the
explain estimates are unfortunately sometimes way high or way low, so the
explain is hit and miss.

But the same SQL that returns maybe 500 rows is pretty fast; it's the
return of 10K+ rows that seems to stall and is CPU bound.

Thanks

Tory


Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

2010-01-22 Thread Richard Huxton

On 22/01/10 18:03, Tory M Blue wrote:

On Fri, Jan 22, 2010 at 1:42 AM, Richard Huxton d...@archonet.com wrote:


On 21/01/10 22:15, Tory M Blue wrote:



2. If it's mostly tagged=true you are interested in you can always use a
partial index: CREATE INDEX ... (makeid) WHERE tagged
This might be a win even if you need a second index with WHERE NOT tagged.



Partial index doesn't seem to fit here due to the fact that there are 35-40%
Marked True.

Didn't think about creating a second index for false, may give that a shot.


If you're mostly searching tagged=true, try the partial index - it'll mean 
the planner is just scanning the index for the one term.
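For example (the index name here is invented):

   CREATE INDEX idx_userstats_makeid_tagged
     ON pixelpool.userstats (makeid)
     WHERE tagged;

The WHERE clause keeps the tagged=false rows out of the index entirely, so 
it's smaller, and the "tagged" test is implied whenever the planner uses it.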



Also, either I've not had enough coffee yet, or a bitmap scan is an odd
choice for only ~13,000 rows out of 100 million.

  ->  Bitmap Index Scan on idx_retargetuserstats_makeidtag  (cost=0.00..360.20 rows=13175 width=0)
        Index Cond: ((makeid = 'b1mw-ferman'::text) AND (tagged = true))



Otherwise, see what Craig said.

I'm assuming this isn't the query that is CPU bound for a long time. Unless
your table is horribly bloated, there's no reason for that judging by this
plan.


It is, but not always; only when there are 10K or more matches. And the
explain estimates are unfortunately sometimes way high or way low, so the
explain is hit and miss.

But the same sql that returns maybe 500 rows is pretty fast, it's the return
of 10K+ rows that seems to stall and is CPU Bound.


Hmm - might be able to push that cross-over point up a bit by tweaking 
various costs, but you've got to be careful you don't end up making all 
your other queries worse. It'd be good to figure out what the problem is 
first.


Looking at the query there are four stages:
 1. Scan the index, build a bitmap of heap pages with matching rows
 2. Scan those pages, find the rows that match
 3. Run DISTINCT on the uids
 4. Count them
I wonder if it could be the DISTINCT. What happens with a count(*) or 
count(uid) instead? Also - you might find EXPLAIN ANALYZE more useful 
than straight EXPLAIN here. That will show actual times for each stage.
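For instance, comparing something like these two (same WHERE clause as your 
original) should show whether the DISTINCT is the expensive step:

   EXPLAIN ANALYZE SELECT count(uid) FROM pixelpool.userstats
     WHERE makeid = 'bmw-ferman' AND tagged = true;

   EXPLAIN ANALYZE SELECT count(DISTINCT uid) FROM pixelpool.userstats
     WHERE makeid = 'bmw-ferman' AND tagged = true;

If the second is much slower than the first, the time is going into the 
DISTINCT rather than the scan itself.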


On Craig's branch of this thread, you say you call it 6000 times with 
different makeids. Any reason why you can't join to a temp table and 
just do it in one query?


--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

2010-01-22 Thread Matthew Wakeling

On Fri, 22 Jan 2010, Tory M Blue wrote:

But the same sql that returns maybe 500 rows is pretty fast, it's the return
of 10K+ rows that seems to stall and is CPU Bound.


Okay, so you have two differing cases. Show us the EXPLAIN ANALYSE for 
both of them, and we will see what the difference is.


Matthew

--
The third years are wandering about all worried at the moment because they
have to hand in their final projects. Please be sympathetic to them, say
things like "ha-ha-ha", but in a sympathetic tone of voice
   -- Computer Science Lecturer




Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

2010-01-22 Thread Scott Marlowe
On Fri, Jan 22, 2010 at 10:59 AM, Tory M Blue tmb...@gmail.com wrote:
 On Thu, Jan 21, 2010 at 7:46 PM, Craig Ringer cr...@postnewspapers.com.au
 wrote:
  Any assistance would be appreciated, don't worry about slapping me
  around I need to figure this out. Otherwise I'm buying new hardware
  where it may not be required.

 What is the reporting query that takes 26 hours? You didn't seem to
 include it, or any query plan information for it (EXPLAIN or EXPLAIN
 ANALYZE results).

 It's this query, run 6000 times with a different makeid each time:

 SELECT COUNT(DISTINCT uid )  AS active_users FROM pixelpool.userstats
 WHERE makeid ='bmw-ferman' AND tagged =true

Any chance of trying this instead:

select makeid, count(distinct uid) as active_users from
pixelpool.userstats where tagged=true group by makeid

And seeing how long it takes?  If you're limiting the total number of
makeids then you could add

and makeid in (biglistofmakeidsgoeshere)

Note that a partial index of

create index xyz on pixelpool.userstats (makeid) where tagged;

might help both the original and this query.



Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

2010-01-22 Thread Craig Ringer

On 23/01/2010 1:59 AM, Tory M Blue wrote:


It's this query, run 6000 times with a different makeid each time:

SELECT COUNT(DISTINCT uid) AS active_users FROM
pixelpool.userstats WHERE makeid = 'bmw-ferman' AND tagged = true

Plan

Aggregate  (cost=49467.00..49467.01 rows=1 width=8)
  ->  Bitmap Heap Scan on userstats  (cost=363.49..49434.06 rows=13175 width=8)
        Recheck Cond: (makeid = 'b1mw-ferman'::text)
        Filter: tagged
        ->  Bitmap Index Scan on idx_retargetuserstats_makeidtag  (cost=0.00..360.20 rows=13175 width=0)
              Index Cond: ((makeid = 'b1mw-ferman'::text) AND (tagged = true))


Try:

- Adding a partial index on makeid, eg:

   CREATE INDEX userstats_makeid_where_tagged_idx
   ON userstats (makeid) WHERE (tagged);

- Instead of repeating the query 6000 times in a loop, collect the data 
in one pass by joining against a temp table containing the makeids of 
interest.


SELECT COUNT(DISTINCT u.uid) AS active_users
FROM pixelpool.userstats u
INNER JOIN temp_makeids m ON (u.makeid = m.makeid)
WHERE u.tagged = true;

(If the 6000 repeats are really a correlated subquery part of a bigger 
query you still haven't shown, then you might be able to avoid 6000 
individual passes by adjusting your outer query instead).


--
Craig Ringer



[PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

2010-01-21 Thread Tory M Blue
The issue we are seeing, besides just saying the reports take over 26 hours,
is that the query seems to be CPU bound. Meaning that the query consumes an
entire CPU, and quite often it is sitting at 65%-90% wait. Now this is not
iowait; the disks are fine, 5000-6000 tps, 700K reads etc., with maybe
10-13% iowait.

However, much of the time that I see the CPU at 65%-90% wait there is very
little disk access, so it's not the disk subsystem (in my opinion). I've
also moved CPUs around, and the SQL seems to stall regardless of which CPU
the job has been given. As for memory, I pulled memory to test and, other
than this data set consuming 10 gigs, there is 700K free (will add more
memory); but since the disks are not a bottleneck and I don't appear to be
swapping, I keep coming back to the data or the SQL.

I'm providing the data that I think is requested when a performance issue is
observed.

There is an autovac running, and the queries are run on static data, so no
INSERTS/UPDATES/DELETES are happening while the reports run.


The query seems to have gotten slower as the data set grew.

Redhat
Postgres 8.3.4
8 cpu box
10gig of ram



Number of rows in the table= 100million

· Size of table including indices = 21GB

· Time to create a combined index on 2 columns (tagged boolean ,
makeid text) = more than 1 hr 30 minutes

· Data distribution = In the 98mill records, there are 7000 unique
makeid's, and 21mill unique UID's. About 41mill of the records have
tagged=true

· Time to execute the following query with indices on makeid and
tagged = 90-120 seconds. The planner uses the webid index and filters on
tagged and then rechecks the webid index

SELECT COUNT(DISTINCT uid) AS active_users FROM
pixelpool.userstats WHERE makeid = 'bmw-ferman' AND tagged = true

· Time to execute the same query with a combined index on makeid
and tagged = 60-100 seconds. The planner uses the combined index and then
filters tagged.

SELECT COUNT(DISTINCT uid) AS active_users FROM pixelpool.userstats
WHERE makeid = 'bmw-ferman' AND tagged = true

Plan

Aggregate  (cost=49467.00..49467.01 rows=1 width=8)
  ->  Bitmap Heap Scan on userstats  (cost=363.49..49434.06 rows=13175 width=8)
        Recheck Cond: (makeid = 'b1mw-ferman'::text)
        Filter: tagged
        ->  Bitmap Index Scan on idx_retargetuserstats_makeidtag  (cost=0.00..360.20 rows=13175 width=0)
              Index Cond: ((makeid = 'b1mw-ferman'::text) AND (tagged = true))


Any assistance would be appreciated, don't worry about slapping me around I
need to figure this out. Otherwise I'm buying new hardware where it may not
be required.

Thanks

Tory



Re: [PERFORM] Data Set Growth causing 26+hour runtime, on what we believe to be very simple SQL

2010-01-21 Thread Craig Ringer
Tory M Blue wrote:

 Any assistance would be appreciated, don't worry about slapping me
 around I need to figure this out. Otherwise I'm buying new hardware
 where it may not be required.

What is the reporting query that takes 26 hours? You didn't seem to
include it, or any query plan information for it (EXPLAIN or EXPLAIN
ANALYZE results).

What sort of activity is happening on the db concurrently with your
tests? What's your max connection limit?

What're your shared_buffers and effective_cache_size settings?

Could sorts be spilling to disk? Check work_mem size and enable logging
of tempfiles (see the manual).
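For example, something like this in postgresql.conf (log_temp_files is
available in 8.3; 0 logs every temp file created, along with its size):

   log_temp_files = 0

Then you can bump work_mem per session while testing, e.g.:

   SET work_mem = '200MB';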

Does an explicit ANALYZE of the problem table(s) help?

--
Craig Ringer
