Re: [ADMIN] [PERFORM] since when has pg_stat_user_indexes.idx_scan been counting?
Tomas Vondra wrote: Stats about access to the data (index/seq scans, cache hit ratio etc.) are stored in pg_stat_* and pg_statio_* catalogs, and are updated after running each query. AFAIK it's not a synchronous process, but when a backend finishes a query, it sends the stats to the postmaster (and postmaster updates the catalogs). Description in the docs goes over this in a little more detail http://www.postgresql.org/docs/current/static/monitoring-stats.html : "The statistics collector communicates with the backends needing information (including autovacuum) through temporary files. These files are stored in the pg_stat_tmp subdirectory...When using the statistics to monitor current activity, it is important to realize that the information does not update instantaneously. Each individual server process transmits new statistical counts to the collector just before going idle; so a query or transaction still in progress does not affect the displayed totals. Also, the collector itself emits a new report at most once per PGSTAT_STAT_INTERVAL milliseconds (500 unless altered while building the server). So the displayed information lags behind actual activity. However, current-query information collected by track_activities is always up-to-date." It's not synchronous at all. The clients create a temporary file for the statistics collector and move on. The actual statistics don't get updated until the statistics collector decides enough time has passed to bother, which defaults to at most every 500ms. -- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
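The reporting lag described above can be observed directly. A minimal sketch, assuming a table named my_table (the name is a placeholder):

```sql
-- Run a query against the table, then check the counters immediately:
SELECT seq_scan, idx_scan
  FROM pg_stat_user_tables
 WHERE relname = 'my_table';

-- The counters typically do not reflect the scan until the backend goes
-- idle and the collector emits a new report (PGSTAT_STAT_INTERVAL, 500 ms
-- by default), so wait briefly and check again:
SELECT pg_sleep(1);

SELECT seq_scan, idx_scan
  FROM pg_stat_user_tables
 WHERE relname = 'my_table';
```

If the second query is run from the same session that did the scan, the counters may still lag until that session goes idle.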
Re: [PERFORM] Checkpoint execution overrun impact?
drvillo wrote:
> -given the configuration attached (which is basically a vanilla one) and
> the number of buffers written at each execution, are these execution
> times normal or above average?

Given the configuration attached, most of them are normal. One problem may be that your vanilla configuration has checkpoint_segments set to 3. There is some logic in the checkpoint code to try to spread checkpoint writes out over a longer period of time. The intention is for a slower write spread to disrupt concurrent client activity less. It doesn't work all that well unless you give it some more segments to work with. Also, with the default setting for shared_buffers, you are doing a lot more redundant writes than you should be. The following postgresql.conf changes should improve things for you:

shared_buffers=256MB
checkpoint_segments=10
wal_buffers=16MB

You may have to adjust your kernel shared memory settings for that to work. See http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server for an intro to these and the other common parameters you should consider adjusting.

> -in the case of the execution that overruns past the timeout, what are
> the implications wrt the client application?

There really aren't any in the database. The server will immediately begin another checkpoint. Some additional disk space is used. So long as the server doesn't run out of disk space from that, clients shouldn't care.

> -AFAIU client connections are basically stalled during checkpoints. Is it
> reasonable to infer that the fact that the application blocking on a
> getConnection() might be related to checkpoints being executed?

It can be.
What I suspect is happening during the bad one:

2011-04-22 06:51:41 CEST LOG: checkpoint complete: wrote 108 buffers (2.6%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=409.007 s, sync=4.672 s, total=414.070 s
2011-04-22 06:55:42 CEST LOG: could not receive data from client: No connection could be made because the target machine actively refused it.

is that something is happening on the disks of the server that keeps the database from being able to write efficiently during this checkpoint. It then slows the checkpoint so much that clients are timing out. The tuning changes I suggested will lower the total amount of I/O the server does between checkpoints, which will mean there is less information in the OS cache to write out when the checkpoint comes. That may help, if the problem is really in the database.

> -considering some tuning on the PG side, should I try increasing
> checkpoint_timeout and raising checkpoint_completion_target to lessen the
> impact of IO on the client or should I shorten the period so there's less
> stuff to write? from the number of buffers written on average I'd assume
> the first option is the one to go for but I might miss some bit of
> reasoning here...

Your problems are likely because the operating system cache is getting filled with something that is slowing checkpoints down. Maybe it's the regular database writes during the five minutes between checkpoints; maybe it's something else running on the server. Whatever is happening, you're unlikely to make it better by adjusting how often they happen. Either get the database to write less between checkpoints (like the changes I suggested), or figure out what else is doing the writes. I suspect they are coming from outside the database, only because if you really had high write activity on this server you'd also be having checkpoints more frequently, too.

> I've read about RAID5 not being a wise setup for disks hosting PG, what
> about RAID1?
The problem with RAID5 is that it lowers the write performance of a larger number of disks, so it's potentially no better than a single drive. RAID1 is essentially a single drive, too. You may discover you're just running over what one drive can do. Something odd does seem to be going on though. Normally in your situation I would try to find some system downtime and test the read/write speed of the drives, looking for issues there. As Robert said already, you shouldn't be running this slowly unless there's something going wrong.

-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
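One way to watch how much checkpoint work the server is actually doing, before and after tuning, is the bgwriter statistics view. A sketch (pg_stat_bgwriter exists in 8.3 and later; the exact columns vary a little by version):

```sql
-- Counters accumulated since the stats were last reset: how many
-- checkpoints were triggered by checkpoint_timeout vs. by running out of
-- checkpoint_segments, and how many buffers each write path flushed.
SELECT checkpoints_timed,
       checkpoints_req,
       buffers_checkpoint,
       buffers_backend
  FROM pg_stat_bgwriter;
```

A high checkpoints_req count relative to checkpoints_timed suggests checkpoint_segments is too low for the write load.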
Re: [PERFORM] tuning on ec2
> Sounds like a reasonable starting point. You could certainly fiddle
> around a bit - especially with shared_buffers - to see if some other
> setting works better, but that should be in the ballpark.

I tend to set it a bit higher on EC2 to discourage the VM from overcommitting memory I need. So, I'd do 2.5GB for that one.

-- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [PERFORM] 'Interesting' prepared statement slowdown on large table join
On Thu, May 12, 2011 at 8:53 AM, Prodan, Andrei wrote:
> @Jeff: thank you for the clear plan interpretation - but I'm afraid I
> don't really understand the second bit:
> 1) I provided the GOOD plan, so we already know what postgres thinks,
> right? (Later edit: guess not. Doesn't work)
> 2) There's no full table scan in any of the plans - it scans indices,
> the problem seems to be that it scans them in the wrong order because it
> thinks there are very few WHERE matches in big_table - which is
> incorrect, as for that particular pair there is a huge amount of rows.

Hi Andrei,

"Explain analyze" only gives you the cost/rows for the plan components it actually executed; it doesn't give you costs for alternative rejected plans. Since the GOOD plan doesn't include the index scan in question, it doesn't give the estimated or actual rows for that scan under the stats/conditions that provoke the GOOD plan to be adopted. So to get that information, you have to design an experimental prepared query that will get executed using that particular scan; that way it will report the results I wanted to see.

My concern is that the experimental query I proposed you use might instead decide to use a full table scan rather than the desired index scan. Although come to think of it, I think the same code will be used to arrive at the predicted number of rows regardless of whether it does a FTS or the desired index scan.

Cheers, Jeff
[PERFORM] setting configuration values inside a stored proc
I've got a stored proc that constructs some aggregation queries as strings and then executes them. I'd like to be able to increase work_mem before running those queries. If I set a new value for work_mem within the stored proc prior to executing my query string, will that actually have an impact on the query, or is work_mem basically a constant once the outer statement that calls the stored proc has begun? I'd just test, but it will take hours for me to grab a copy of production data and import into a new db host for testing. I've already started that process, but I'm betting I'll have an answer by the time it completes. It's just the difference between modifying the application which calls the procs (and doing a full software release in order to do so, or else waiting a month to go in the next release) vs modifying the procs themselves, which requires only a db update. --sam
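For what it's worth, settings changed inside a PL/pgSQL function do take effect for statements the function executes afterwards in the same session, and passing true as the third argument to set_config() (the equivalent of SET LOCAL) reverts the change when the enclosing transaction ends. A minimal sketch; the function name and the executed query are hypothetical stand-ins for the real aggregation strings:

```sql
-- Hypothetical proc: raise work_mem, then run a dynamically built query.
CREATE OR REPLACE FUNCTION run_aggregates() RETURNS bigint AS $$
DECLARE
    result bigint;
BEGIN
    -- Takes effect immediately for subsequent statements in this session;
    -- the third argument (true) makes it transaction-local, like SET LOCAL.
    PERFORM set_config('work_mem', '256MB', true);

    -- Placeholder for the aggregation query string built elsewhere.
    EXECUTE 'SELECT count(*) FROM generate_series(1, 1000) g' INTO result;
    RETURN result;
END;
$$ LANGUAGE plpgsql;
```

Because the setting is transaction-local, the outer statement's original work_mem is restored automatically once the calling transaction commits or rolls back.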
Re: [PERFORM] Poor performance when joining against inherited tables
> It says the sequential scan has a cost that's way too high, and I'm
> presuming that's why it's choosing the extremely slow plan over the much
> faster plan.

Well, not exactly. It's giving you that cost because you disabled seqscan, which actually just bumps the cost really high:

postgres=# create temporary table foo as select generate_series(1,3);
SELECT
postgres=# explain analyze select * from foo;
                        QUERY PLAN
 Seq Scan on foo  (cost=0.00..34.00 rows=2400 width=4) (actual time=0.010..0.012 rows=3 loops=1)
 Total runtime: 2.591 ms
(2 rows)

postgres=# set enable_seqscan to false;
SET
postgres=# explain analyze select * from foo;
                        QUERY PLAN
 Seq Scan on foo  (cost=100.00..134.00 rows=2400 width=4) (actual time=0.004..0.007 rows=3 loops=1)
 Total runtime: 0.037 ms
(2 rows)

As far as I know, there is no way to truly disable any given plan option, since sometimes that may be the only choice. The (estimated) cost of the seq scan chosen here is *not* the same as the cost of the scan when the planner actually considers this plan (in fact, that will be the same as the one in the first plan). However, note the cost of the Index Scan nodes in the second plan: they are *higher* than their corresponding Seq Scan nodes (in the first plan), which is why you get the first plan when seqscan *is* enabled. Also, your plan output looks like plain EXPLAIN and not EXPLAIN ANALYZE (i.e., the "actual time" nodes are missing). Other than that, I think Shaun's comments apply.

--- Maciek Sakrejda | System Architect | Truviso 1065 E. Hillsdale Blvd., Suite 215 Foster City, CA 94404 (650) 242-3500 Main www.truviso.com
Re: [ADMIN] [PERFORM] since when has pg_stat_user_indexes.idx_scan been counting?
Dne 12.5.2011 22:03, Willy-Bas Loos napsal(a):
> Then, are the index scans counted in a memory variable and written at
> analyze time?

No, I believe raghu mixed two things - stats used by the planner and stats about access to the data (how many tuples were read using an index, etc.)

Stats for the planner are stored in pg_class/pg_statistic/pg_stats catalogs and are updated by ANALYZE (either manual or automatic). This is what raghu referred to, but these stats are completely useless when looking for unused indexes.

Stats about access to the data (index/seq scans, cache hit ratio etc.) are stored in pg_stat_* and pg_statio_* catalogs, and are updated after running each query. AFAIK it's not a synchronous process, but when a backend finishes a query, it sends the stats to the postmaster (and postmaster updates the catalogs).

Tomas
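The two kinds of statistics can be inspected side by side. A sketch, with my_table as a placeholder name:

```sql
-- Planner statistics, maintained by ANALYZE: per-column row estimates,
-- distinct-value counts, histograms. Useless for finding unused indexes.
SELECT attname, n_distinct
  FROM pg_stats
 WHERE tablename = 'my_table';

-- Access statistics, maintained as queries run: these are the counters
-- that reveal whether an index is ever actually scanned.
SELECT indexrelname, idx_scan
  FROM pg_stat_user_indexes
 WHERE relname = 'my_table';
```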
Re: [ADMIN] [PERFORM] since when has pg_stat_user_indexes.idx_scan been counting?
On Thu, May 12, 2011 at 9:09 PM, Willy-Bas Loos wrote: > Hi, > > We have some indexes that don't seem to be used at all. > I'd like to know since when they have not been used. > That is, the time when postgres started counting to reach the number that > is in pg_stat_user_indexes.idx_scan > > Is there a way to retrieve that from the database ? > "Analyze" activity will update the statistics of each catalog table. pg_postmaster_start_time --> Retrieves the Postmaster [ PostgreSQL Instance] start time postgres=# select pg_postmaster_start_time(); --Raghu Ram
Re: [PERFORM] Poor performance when joining against inherited tables
On 05/11/2011 09:38 AM, Robert Haas wrote:

However, if I disable seqscan (set enable_seqscan=false), I get the following plan:

QUERY PLAN
Hash Join  (cost=10001298843.53..290002337961.71 rows=8643757 width=506)
  Hash Cond: (f.id = objects.id)
  -> Append  (cost=100.00..29536334.43 rows=8643757 width=20)
     -> Seq Scan on item f  (cost=100.00..126.30 rows=1630 width=20)
     -> Index Scan using xxx_pkey on itemXX f  (cost=0.00..10.60 rows=90 width=20)
     -> Index Scan using yyy_pkey on itemYY f  (cost=0.00..25.24 rows=266 width=20)
     -> Index Scan using zzz_pkey on itemZZ f  (cost=0.00..9.28 rows=2 width=20)
     ...
  -> Hash  (cost=999347.17..999347.17 rows=3941949 width=490)
     -> Index Scan using objects_pkey on objects  (cost=0.00..999347.17 rows=3941949 width=490)

This seems like a much more sensible query plan.

I don't think so. Scanning the index to extract all the rows in a table is typically going to be a lot slower than a sequential scan. A more interesting question is why you're not getting a plan like this:

Nested Loop
  -> Seq Scan on objects
  -> Append
     -> Index Scan using xxx_pkey on itemXX
     -> Index Scan using yyy_pkey on itemYY
     -> Index Scan using zzz_pkey on itemZZ

Compared to the previous query plan (omitted in this e-mail, in which the planner was scanning all the item tables sequentially), the second query is much more desirable. It takes about 12 seconds to complete, versus the other query which I canceled after six hours. However, what you propose seems to make even more sense. But it seems to think doing a sequential scan on the *empty* item table is excessively expensive in this case. Aside from enable_seqscan=false, is there any way I can make the query planner not balk over doing a seqscan on an empty table?

Why would you care? A sequential scan of an empty table is very fast.
My issue is that it looks like it's avoiding the sequential scan:

Seq Scan on item f  (cost=100.00..126.30 rows=1630 width=20)

It says the sequential scan has a cost that's way too high, and I'm presuming that's why it's choosing the extremely slow plan over the much faster plan. I don't know very much about plans, but I'm assuming the planner chooses the plan with the lowest cost. I'd much prefer it *does* the sequential scan of the empty table and goes with the other parts of the plan.

Thanks, Lucas Madar
Re: [PERFORM] since when has pg_stat_user_indexes.idx_scan been counting?
Greg Smith writes:
> You're saying to watch out for (3); I think that's not usually the case,
> but that's a fair thing to warn about. Even in that case, though, it
> may still be worth dropping the index. Year-end processes are not
> usually very sensitive to whether they take a little or a long time to
> execute. But you will be paying to maintain the index every day while
> it is there.

Yeah. Another idea worth considering is to have the year-end processing build the index it wants, use it, drop it. It seems unlikely that it's worth maintaining an index year-round for such infrequent usage.

regards, tom lane
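That build-use-drop pattern, sketched with entirely hypothetical table, column, and index names:

```sql
-- Build the index only when the year-end job needs it...
CREATE INDEX yearend_idx ON ledger (fiscal_year, account_id);

-- ...run the year-end reports that rely on it...

-- ...then drop it, so nothing pays its maintenance cost the rest of the year.
DROP INDEX yearend_idx;
```

The trade-off is the one-time cost of the index build at year end versus the daily cost of maintaining the index on every insert, update, and delete.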
Re: [PERFORM] since when has pg_stat_user_indexes.idx_scan been counting?
Tomas Vondra wrote:
> BTW it's really really tricky to remove indexes once they're created.
> What if the index is created for a single batch process that runs once a
> year to close the fiscal year etc?

True in theory. Reports that are executing something big at the end of the year fall into three categories:

1) They touch a whole lot of the data for the year first. In this case, sequential scan is likely regardless.
2) They access data similarly to regular queries, using the same indexes.
3) They have some very specific data only they touch that is retrieved with an index.

You're saying to watch out for (3); I think that's not usually the case, but that's a fair thing to warn about. Even in that case, though, it may still be worth dropping the index. Year-end processes are not usually very sensitive to whether they take a little or a long time to execute. But you will be paying to maintain the index every day while it is there.

-- Greg Smith 2ndQuadrant USg...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us "PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Re: [ADMIN] [PERFORM] since when has pg_stat_user_indexes.idx_scan been counting?
Then, are the index scans counted in a memory variable and written at analyze time? On Thu, May 12, 2011 at 8:22 PM, raghu ram wrote: > > "Analyze" activity will update the statistics of each catalog table. > --Raghu Ram > > -- "Patriotism is the conviction that your country is superior to all others because you were born in it." -- George Bernard Shaw
Re: [PERFORM] 'Interesting' prepared statement slowdown on large table join
Thank you for all the leads. I've increased stats to 1200 on everything obvious (external_id, attr_name, attr_value, party_id), and ran ANALYZE, but it didn't help at all - any other ideas of what else could be going wrong? We'll disable preparation, but the thing is it works brilliantly 90% of the time and the other 10% should theoretically be fixable - because it's almost certainly a border scenario brought on by lack of maintenance on something somewhere. Is there any point in trying to rebuild the indexes involved in case Postgres decided they're too bloated or something like that?

@Shaun: I just finished trying to max out stats and sadly it doesn't help; thank you very much for trying anyway.

@Tom: The planner doesn't flip between the plans by itself - it will switch to the BAD plan at some point and never go back. The big_table has an extremely uneven distribution indeed. But it still plans right usually - and this apparently regardless of the statistics target.

@Jeff: thank you for the clear plan interpretation - but I'm afraid I don't really understand the second bit:
1) I provided the GOOD plan, so we already know what postgres thinks, right? (Later edit: guess not. Doesn't work)
2) There's no full table scan in any of the plans - it scans indices; the problem seems to be that it scans them in the wrong order because it thinks there are very few WHERE matches in big_table - which is incorrect, as for that particular pair there is a huge amount of rows.

Thank you, Andrei
Re: [PERFORM] Postgres refusing to use >1 core
On 05/12/2011 11:07 AM, Tom Lane wrote:
> I find it odd that replacing the subquery with a temp table helps,
> though, because (unless you stuck in an ANALYZE you didn't mention) it
> would have no stats at all about the number of groups in the temp table.

I did have an analyze initially for exactly that reason. But what I found odd is that in my rush to execute this for the end of day reports, I forgot that step, and it still ran fine. I've found that the planner tends to treat un-analyzed tables somewhat pessimistically, which is fine by me.

-- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 stho...@peak6.com __ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email
Re: [PERFORM] Postgres refusing to use >1 core
Shaun Thomas writes:
> On 05/12/2011 09:51 AM, Tom Lane wrote:
>> It does. I was a bit surprised that Shaun apparently got a plan that
>> didn't include a materialize step, because when I test a similar query
>> here, I get:
> Remember when I said "old version" that prevented us from using CTEs?
> We're still on 8.2 (basically, we're waiting for EnterpriseDB 9.0).
> It's basically calculating the group aggregation wrong, but is that
> enough to trigger it to go nuts?

Hmm. As you say, the mistake it's making is a drastic underestimate of the number of groups in the subquery, leading to a bad choice of join method. I find it odd that replacing the subquery with a temp table helps, though, because (unless you stuck in an ANALYZE you didn't mention) it would have no stats at all about the number of groups in the temp table. Maybe the default guess just happens to produce the more desirable plan.

regards, tom lane
Re: [PERFORM] Postgres refusing to use >1 core
> > This is a perfect example of a place where you could push some work out of > the application and into the database. You can consolidate your 1 to 101 > queries into a single query. If you use: > > WHERE rte_nm='SH71' AND rm >= 206 AND rm <= 306 ORDER BY abs(rm - 256), rm > - 256 DESC LIMIT 1 > > it will always return the same value as the first matching query from your > list, and will never have to make more than one trip to the database. Your > one trip might be slightly slower than any one of the single trips above, > but it will certainly be much faster in the case where you have to hit any > significant % of your 101 potential queries. > THANKS!! I've been obsessing so much about parallelism that I hadn't spent much time finding better queries. Aren
Re: [PERFORM] Postgres refusing to use >1 core
Everyone, Just wanted to say thanks for your help with my performance question. You have given me plenty of things to investigate. Further, I think the problem is almost certainly with my app, so I need to do more work there! I really like the idea of just loading everything in memory and then dumping it all out later. I have 6 GB RAM, so it should be plenty to handle this. Aren Cambre
Re: [PERFORM] since when has pg_stat_user_indexes.idx_scan been counting?
Dne 12.5.2011 17:39, Willy-Bas Loos napsal(a):
> Hi,
>
> We have some indexes that don't seem to be used at all.
> I'd like to know since when they have not been used.
> That is, the time when postgres started counting to reach the number
> that is in pg_stat_user_indexes.idx_scan
>
> Is there a way to retrieve that from the database ?

BTW it's really really tricky to remove indexes once they're created. What if the index is created for a single batch process that runs once a year to close the fiscal year etc? So be very careful about this.

Tomas
Re: [PERFORM] since when has pg_stat_user_indexes.idx_scan been counting?
Dne 12.5.2011 17:39, Willy-Bas Loos napsal(a):
> Hi,
>
> We have some indexes that don't seem to be used at all.
> I'd like to know since when they have not been used.
> That is, the time when postgres started counting to reach the number
> that is in pg_stat_user_indexes.idx_scan
>
> Is there a way to retrieve that from the database ?

Well, not really :-( You could call pg_postmaster_start_time() to get the start time, but that has two major drawbacks:

(1) The stats may actually be collected for much longer, because restart does not reset them.

(2) If someone called pg_stat_reset(), the stats are lost but the start time remains the same.

So there really is no reliable way to detect this. In 9.1 this is not true - there's a timestamp for each database (and global stats) to keep track of the last reset.

regards Tomas
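In 9.1 the reset time can be read directly from the stats catalogs; on earlier versions only the postmaster start time is available, with the caveats above. A sketch:

```sql
-- PostgreSQL 9.1+: when were the statistics for each database last reset?
SELECT datname, stats_reset
  FROM pg_stat_database;

-- Fallback on older versions: only the postmaster start time, which may
-- both overstate (stats survive restarts) and understate (pg_stat_reset()
-- leaves it unchanged) how long counting has been going on.
SELECT pg_postmaster_start_time();
```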
Re: [PERFORM] Postgres refusing to use >1 core
On Wed, May 11, 2011 at 9:17 PM, Aren Cambre wrote:
> *2. Not all TxDPS reference markers correspond to TxDOT reference markers.*
>
> Now, if I've matched a route, I have to find the reference marker.
>
> The TxDOT database is pretty good but not 100% complete, so some TxDPS
> tickets' reference markers may not exist in the TxDOT table. Plus, it's
> possible that some TxDPS tickets have the wrong marker.
>
> To compensate, I am looking for the closest reference marker along the
> route that is not more than 50 marker units away, either direction. I've
> again implemented that with multiple queries, where I don't stop until I
> find a match. Suppose I am searching for reference marker 256 on TX 71. The
> queries will be like this:
>
>    1. rte_nm = 'SH71' AND rm = '256' (base marker)
>    2. rte_nm = 'SH71' AND rm = '257' (+1)
>    3. rte_nm = 'SH71' AND rm = '255' (-1)
>    4. rte_nm = 'SH71' AND rm = '258' (+2)
>    5. rte_nm = 'SH71' AND rm = '254' (-2)
>    6. ...
>    7. rte_nm = 'SH71' AND rm = '306' (+50)
>    8. rte_nm = 'SH71' AND rm = '206' (-50)
>
> Assuming a matching route name was found in the prior step, the app will
> have 1 to 101 of these queries for each ticket.

This is a perfect example of a place where you could push some work out of the application and into the database. You can consolidate your 1 to 101 queries into a single query. If you use:

WHERE rte_nm='SH71' AND rm >= 206 AND rm <= 306 ORDER BY abs(rm - 256), rm - 256 DESC LIMIT 1

it will always return the same value as the first matching query from your list, and will never have to make more than one trip to the database. Your one trip might be slightly slower than any one of the single trips above, but it will certainly be much faster in the case where you have to hit any significant % of your 101 potential queries.

-Eric
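Spelled out as a complete statement, assuming the markers live in a table called reference_markers (that name is a guess; rte_nm and rm are the columns from the thread, and rm is assumed to be numeric so the arithmetic works):

```sql
SELECT *
  FROM reference_markers
 WHERE rte_nm = 'SH71'
   AND rm BETWEEN 206 AND 306        -- within 50 marker units of 256
 ORDER BY abs(rm - 256),             -- closest marker first
          rm - 256 DESC              -- on a distance tie, prefer the higher
                                     -- marker, matching the +1-before--1
                                     -- order of the original query list
 LIMIT 1;
```

With an index on (rte_nm, rm), this resolves each ticket in a single index range scan instead of up to 101 round trips.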
Re: [PERFORM] Postgres refusing to use >1 core
On 05/12/2011 09:51 AM, Tom Lane wrote:
> It does. I was a bit surprised that Shaun apparently got a plan that
> didn't include a materialize step, because when I test a similar query
> here, I get:

Remember when I said "old version" that prevented us from using CTEs? We're still on 8.2 (basically, we're waiting for EnterpriseDB 9.0). It's basically calculating the group aggregation wrong, but is that enough to trigger it to go nuts?

SELECT c.*
  FROM customer c
  JOIN (SELECT session_id, count(1) as counter
          FROM session
         WHERE date_created >= '2011-05-11 05:00'
           AND date_created < '2011-05-11 06:00'
           AND from_interface = 'UNKNOWN'
         GROUP BY 1) a ON (c.customer_id = a.session_id)
 WHERE c.process_date = CURRENT_DATE - interval '1 day'
   AND c.row_out IS NULL;

So sayeth the planner:

Nested Loop  (cost=167.49..2354.62 rows=6 width=237) (actual time=43.949..166858.604 rows=168 loops=1)
  -> GroupAggregate  (cost=167.49..176.97 rows=2 width=8) (actual time=1.042..2.827 rows=209 loops=1)
     -> Sort  (cost=167.49..170.64 rows=1260 width=8) (actual time=1.037..1.347 rows=230 loops=1)
        Sort Key: session.session_id
        -> Index Scan using idx_session_date_created on session  (cost=0.00..102.61 rows=1260 width=8) (actual time=0.044..0.690 rows=230 loops=1)
           Index Cond: ((date_created >= '11-MAY-11 05:00:00'::timestamp without time zone) AND (date_created < '11-MAY-11 06:00:00'::timestamp without time zone))
           Filter: ((from_interface)::text = 'UNKNOWN'::text)
  -> Index Scan using idx_customer_customer_id on customer c  (cost=0.00..1088.78 rows=3 width=237) (actual time=19.820..798.348 rows=1 loops=209)
     Index Cond: (c.customer_id = a.session_id)
     Filter: ((process_date = (('now'::text)::date - '@ 1 day'::interval)) AND (row_out IS NULL))
Total runtime: 166859.040 ms

That one hour extract is much, much slower than this:

SELECT 1
  FROM customer c
  JOIN (SELECT session_id, count(*) as counter
          FROM session
         WHERE date_created >= '2011-05-08'
         GROUP BY 1) a ON (c.customer_id = a.session_id)
 WHERE c.process_date = CURRENT_DATE
   AND c.row_out IS NULL;

Which gives this plan:

Merge Join  (cost=244565.52..246488.78 rows=377 width=0) (actual time=1958.781..2385.667 rows=22205 loops=1)
  Merge Cond: (a.session_id = c.customer_id)
  -> GroupAggregate  (cost=19176.22..20275.99 rows=271 width=8) (actual time=1142.179..1459.779 rows=26643 loops=1)
     -> Sort  (cost=19176.22..19541.68 rows=146184 width=8) (actual time=1142.152..1374.328 rows=179006 loops=1)
        Sort Key: session.session_id
        -> Index Scan using idx_session_date_created on session  (cost=0.00..6635.51 rows=146184 width=8) (actual time=0.020..160.339 rows=179267 loops=1)
           Index Cond: (date_created >= '08-MAY-11 00:00:00'::timestamp without time zone)
  -> Sort  (cost=225389.30..225797.47 rows=163267 width=8) (actual time=816.585..855.459 rows=155067 loops=1)
     Sort Key: c.customer_id
     -> Index Scan using idx_customer_rpt on customer c  (cost=0.00..211252.93 rows=163267 width=8) (actual time=0.037..90.337 rows=155067 loops=1)
        Index Cond: (process_date = '10-MAY-11 00:00:00'::timestamp without time zone)
        Filter: (row_out IS NULL)

But make the inner query slightly smaller, and...
Nested Loop  (cost=13755.53..223453.98 rows=276 width=0)
  -> GroupAggregate  (cost=13755.53..14558.26 rows=198 width=8)
     -> Sort  (cost=13755.53..14022.28 rows=106700 width=8)
        Sort Key: session.session_id
        -> Index Scan using idx_session_date_created on session  (cost=0.00..4844.37 rows=106700 width=8)
           Index Cond: (date_created >= '09-MAY-11 00:00:00'::timestamp without time zone)
  -> Index Scan using idx_customer_customer_id on customer c  (cost=0.00..1055.01 rows=1 width=8)
     Index Cond: (c.customer_id = a.session_id)
     Filter: ((process_date = '10-MAY-11 00:00:00'::timestamp without time zone) AND (row_out IS NULL))

I didn't want to wait two hours for that to finish. ;) But the stats are all pretty darn close, so far as I can tell. The only thing that's off is the group aggregate... by about two orders of magnitude. So I just chalked it up to 8.2 being relatively horrible, and punted to just using a temp table to trick the optimizer into doing it right.

But my greater point was that even doing it all in SQL doesn't always work, which we all know. Use of EXPLAIN abounds, but that doesn't necessarily mean a dev will know how to fix a bad plan.

-- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 stho...@peak6.com
[PERFORM] since when has pg_stat_user_indexes.idx_scan been counting?
Hi, We have some indexes that don't seem to be used at all. I'd like to know since when they have not been used. That is, the time when postgres started counting to reach the number that is in pg_stat_user_indexes.idx_scan Is there a way to retrieve that from the database ? Cheers, WBL -- "Patriotism is the conviction that your country is superior to all others because you were born in it." -- George Bernard Shaw
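For reference, the counters in question can be pulled with a query like this; it lists indexes that have never been scanned since counting began (the thread's caveat applies: the view does not record when counting began):

```sql
-- Indexes with no recorded scans. Note: an index with idx_scan = 0 may
-- still be enforcing a unique or primary-key constraint, so check
-- pg_index before dropping anything.
SELECT schemaname, relname, indexrelname, idx_scan
  FROM pg_stat_user_indexes
 WHERE idx_scan = 0
 ORDER BY schemaname, relname;
```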
Re: [PERFORM] Postgres refusing to use >1 core
Josh Berkus writes:
> On 5/11/11 3:04 PM, Shaun Thomas wrote:
>> The original query, with our very large tables, ran for over *two hours*
>> thanks to a nested loop iterating over the subquery. My replacement ran
>> in roughly 30 seconds. If we were using a newer version of PG, we could
>> have used a CTE. But do you get what I mean? Temp tables are a fairly
>> common technique, but how would a coder know about CTEs? They're pretty
>> new, even to *us*.

> For that matter, it would be even better if PostgreSQL realized that a
> materialize of the subquery was a better execution plan, and just did it
> for you.

It does. I was a bit surprised that Shaun apparently got a plan that didn't include a materialize step, because when I test a similar query here, I get:

1. a hash join, until I turn off enable_hashjoin; then
2. a merge join, until I turn off enable_mergejoin; then
3. a nestloop with materialize on the subquery scan.

In 9.0 and up I can get a nestloop without materialize by also turning off enable_material, but pre-9.0 there's no such option ...

regards, tom lane

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
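Tom's testing sequence can be reproduced in a psql session by disabling one join method at a time. The enable_* settings are standard planner GUCs; the query below is only a stand-in shaped like the one from this thread:

```sql
SET enable_hashjoin = off;   -- step 1: planner falls back to a merge join
SET enable_mergejoin = off;  -- step 2: nestloop with materialize on the subquery
SET enable_material = off;   -- step 3 (9.0+): plain nestloop, no materialize

EXPLAIN
SELECT 1
  FROM customer c
  JOIN (SELECT session_id, count(*) AS counter
          FROM session
         GROUP BY 1) a ON (c.customer_id = a.session_id);

-- Settings are session-local; put them back when done.
RESET enable_hashjoin;
RESET enable_mergejoin;
RESET enable_material;
```

This is a diagnostic technique, not a tuning one: the enable_* switches are blunt instruments for seeing which plans the planner considered, not something to leave set in production.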
Re: [PERFORM] Postgres refusing to use >1 core
On 05/12/2011 03:30 AM, Michael Graham wrote:
> Do you happen to produce slides for these lunch n learns or are they
> more informal than that? I guess you can work out where I'm going with
> this ;)

Oh of course. I use rst2s5 for my stuff, so I have the slideshow and also generate a PDF complete with several paragraphs of explanation I distribute after the presentation itself. I have two of them now, but I'll probably have a third in a couple months.

My next topic will probably be geared toward actual DBAs that might be intermediate level. Things like, what happens to an OLAP server that undergoes maintenance and experiences rapid (temporarily exponential) TPS increase. How that can affect the disk subsystem, how to recover, how to possibly bootstrap as a temporary fix, etc. Certainly things I would have liked to know before seeing them. I'm going to call it "Your Database Probably Hates You." ;)

I have a tendency to enjoy "stories from the field," and I've got more than a few where I've saved a database from certain death. Sometimes it's tweaking a few config settings, sometimes it's new hardware based on system monitoring or allocation tests. Little things Senior DBAs might know after experiencing them, or reading lists like this one.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
stho...@peak6.com
Re: [PERFORM] Postgres refusing to use >1 core
On Wed, 2011-05-11 at 17:04 -0500, Shaun Thomas wrote:
> We hold regular Lunch'n'Learns for our developers to teach them the
> good/bad of what they're doing, and that helps significantly. Even
> hours later, I see them using the techniques I showed them. The one
> I'm presenting soon is entitled '10 Ways to Ruin Performance' and
> they're all specific examples taken from day-to-day queries and jobs
> here, all from different categories of mistake. It's just a part of
> being a good DBA.

Do you happen to produce slides for these lunch n learns or are they more informal than that? I guess you can work out where I'm going with this ;)

--
Michael Graham
Re: [PERFORM] Postgres refusing to use >1 core
12.05.11 06:18, Aren Cambre wrote:
>>> Using one thread, the app can do about 111 rows per second, and it's
>>> only exercising 1.5 of 8 CPU cores while doing this. 12,000,000 rows /
>>> 111 rows per second ~= 30 hours.
>>
>> I don't know how I missed that. You ARE maxing out one cpu core, so
>> you're quite right that you need more threads unless you can make your
>> single worker more efficient.
>
> And the problem is my app already has between 20 and 30 threads.
> Something about C#'s PLINQ may not be working as intended...

Have you checked that you are really doing fetch and processing in parallel? Dunno about C#, but under Java you have to make specific settings (e.g. setFetchSize) or the driver will fetch all the data on query run. Check the time needed to fetch the first row from the query.

Best regards,
Vitalii Tymchyshyn
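The setFetchSize point boils down to cursor-backed fetching: when a fetch size is set (and, in JDBC, autocommit is off), the driver streams rows in batches instead of buffering the whole result set before returning. The server-side equivalent, as a sketch with a hypothetical table name:

```sql
BEGIN;

-- Stream the result set in batches instead of materializing it all at once.
DECLARE rows_cur CURSOR FOR
  SELECT * FROM some_big_table;

FETCH 1000 FROM rows_cur;  -- first batch is available immediately,
FETCH 1000 FROM rows_cur;  -- before the full result set is computed

CLOSE rows_cur;
COMMIT;
```

If the first row takes as long to arrive as the whole query, the client is buffering everything up front and the "parallel" consumers are just waiting on that single fetch.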
Re: [PERFORM] DBT-5 & Postgres 9.0.3
http://sourceforge.net/mailarchive/forum.php?forum_name=osdldbt-general&max_rows=25&style=nested&viewmonth=201104

- Sethu

On Thu, May 12, 2011 at 5:22 AM, Robert Haas wrote:
> On Tue, Apr 12, 2011 at 3:51 AM, Sethu Prasad wrote:
> > Anyone lucky to have dbt5 run for PostgreSQL 9.0.3?!
> >
> > I am trying on Novell SuSE Linux Enterprise Server 11 SP1 x86_64 with a
> > virtual machine and bit hard with no success run yet. If you can help me
> > with any docs will be more of a support.
>
> What's going wrong for you?
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company