Re: [PERFORM] Random Page Cost and Planner

2010-05-25 Thread David Jarvis
Hi, Rob.

I tried bumping the effective_cache_size. It made no difference.

My latest attempt at forcing PostgreSQL to use the indexes involved two
loops: one to loop over the stations, the other to extract the station data
from the measurement table. The outer loop executes in 1.5 seconds. The
inner loop does a full table scan for each record in the outer loop:

  FOR station IN
SELECT
  sc.station_id,
  sc.taken_start,
  sc.taken_end
FROM
  climate.city c,
  climate.station s,
  climate.station_category sc
WHERE
  c.id = city_id AND
  earth_distance(
ll_to_earth(c.latitude_decimal,c.longitude_decimal),
ll_to_earth(s.latitude_decimal,s.longitude_decimal)) / 1000 <=
radius AND
  s.elevation BETWEEN elevation1 AND elevation2 AND
  s.applicable AND
  sc.station_id = s.id AND
  sc.category_id = category_id AND
  extract(YEAR FROM sc.taken_start) >= year1 AND
  extract(YEAR FROM sc.taken_end) <= year2
ORDER BY
  sc.station_id
  LOOP
RAISE NOTICE 'B.1. % % %', station.station_id, station.taken_start,
station.taken_end;

FOR measure IN
  SELECT
extract(YEAR FROM m.taken) AS year,
avg(m.amount) AS amount
  FROM
climate.measurement m
  WHERE
m.station_id = station.station_id AND
m.taken BETWEEN station.taken_start AND station.taken_end AND
m.category_id = category_id
  GROUP BY
extract(YEAR FROM m.taken)
LOOP
  RAISE NOTICE '  B.2. % %', measure.year, measure.amount;
END LOOP;
  END LOOP;

I thought that the conditions in the inner WHERE clause (station_id, taken,
and category_id) would have triggered index use. The values used for the
inner query:

NOTICE:  B.1. 754 1980-08-01 2001-11-30

When I run the query manually, using constants, it executes in ~25
milliseconds:

SELECT
  extract(YEAR FROM m.taken) AS year,
  avg(m.amount) AS amount
FROM
  climate.measurement m
WHERE
  m.station_id = 754 AND
  m.taken BETWEEN '1980-08-01'::date AND '2001-11-30'::date AND
  m.category_id = 7
GROUP BY
  extract(YEAR FROM m.taken)

With 106 stations in the outer loop (at ~25 ms each), it should execute in
~2.65 seconds, which is better than the 5 seconds I get when everything is
cached and a tremendous improvement over the ~85 seconds from cold.

I do not understand why the below query uses a full table scan (executes in
~13 seconds):

SELECT
  extract(YEAR FROM m.taken) AS year,
  avg(m.amount) AS amount
FROM
  climate.measurement m
WHERE
  m.station_id = station.station_id AND
  m.taken BETWEEN station.taken_start AND station.taken_end AND
  m.category_id = category_id
GROUP BY
  extract(YEAR FROM m.taken)

Moreover, what can I do to solve the problem?
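
(One avenue to test, as a sketch only: run the inner query as dynamic SQL so
it is planned with the actual station values rather than a generic plan.
EXECUTE ... USING is PostgreSQL 8.4+ syntax; on older versions the values
would have to be concatenated in with quote_literal:)

  FOR measure IN EXECUTE
    'SELECT extract(YEAR FROM m.taken) AS year,
            avg(m.amount) AS amount
       FROM climate.measurement m
      WHERE m.station_id = $1
        AND m.taken BETWEEN $2 AND $3
        AND m.category_id = $4
      GROUP BY extract(YEAR FROM m.taken)'
    USING station.station_id, station.taken_start, station.taken_end,
          category_id
  LOOP
    RAISE NOTICE '  B.2. % %', measure.year, measure.amount;
  END LOOP;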

Thanks again!

Dave


Re: [PERFORM] Performance issues when the number of records are around 10 Million

2010-05-25 Thread Robert Haas
On Wed, May 12, 2010 at 1:45 AM, venu madhav  wrote:
> [Venu] Yes, autovacuum is running every hour. I can see this in the log
> messages. All the configuration settings for autovacuum are disabled except
> that it should run every hour. This application runs on an embedded box, so
> I can't change the parameters, as they affect the other applications running
> on it. Can you please explain what you mean by default parameters.
> autovacuum = on # enable autovacuum subprocess?
> autovacuum_naptime = 3600   # time between autovacuum runs, in secs

The default value for autovacuum_naptime is a minute.  Why would you
want to increase it by a factor of 60?  That seems likely to result in
I/O spikes, table bloat, and generally poor performance.
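
(For reference, the stock values; these are the 8.3/8.4 defaults:)

  autovacuum = on              # default since 8.3
  autovacuum_naptime = 1min    # default; 3600s delays cleanup 60x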

There are dramatic performance improvements in PostgreSQL 8.3 and 8.4.
 Upgrading would probably help, a lot.

The points already made about LIMIT  are also right on target.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Function scan/Index scan to nested loop

2010-05-25 Thread Robert Haas
On Tue, May 11, 2010 at 2:00 PM, Carlo Stonebanks
 wrote:
> I am concerned that there is such a lag between all the index and function
> scans' start/complete times and the nested loops starting. I have
> reformatted the SLOW PLAN results below to make them easier to read. Can you
> tell me if this makes any sense to you?

I think you want to run EXPLAIN ANALYZE on the queries that are being
executed BY mdx_core.zips_in_mile_range('75203', 15::numeric) rather
than the query that calls that function.  You should be able to see
the same caching effect there and looking at that plan might give you
a better idea what is really happening.

(Note that you might need to use PREPARE and EXPLAIN EXECUTE to get
the same plan the function is generating internally, rather than just
EXPLAIN.)
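
(For example; the query body here is only a hypothetical stand-in for
whatever the function runs internally:)

  PREPARE inner_q(text, numeric) AS
    SELECT z.zip
    FROM zips z
    WHERE zip_distance(z.zip, $1) <= $2;   -- hypothetical internal query

  EXPLAIN EXECUTE inner_q('75203', 15::numeric);
  DEALLOCATE inner_q;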

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Random Page Cost and Planner

2010-05-25 Thread David Jarvis
Hi, Tom.

Yes, that is what happened, making the tests rather meaningless and giving
me the false impression that the indexes were being used. They were, but
only because of cached results. When multiple users make different queries,
performance will return to ~80s per query.

I also tried Kevin's suggestion, which had no noticeable effect:
effective_cache_size = 512MB

That said, when using the following condition, the query is fast (1 second):

extract(YEAR FROM sc.taken_start) >= 1963 AND
extract(YEAR FROM sc.taken_end) <= 2009 AND

"->  Index Scan using measurement_013_stc_idx on
measurement_013 m  (cost=0.00..511.00 rows=511 width=15) (actual
time=0.018..3.601 rows=3356 loops=104)"
"  Index Cond: ((m.station_id = sc.station_id) AND
(m.taken >= sc.taken_start) AND (m.taken <= sc.taken_end) AND (m.category_id
= 7))"

This condition makes it slow (13 seconds on first run, 8 seconds
thereafter):

extract(YEAR FROM sc.taken_start) >= 1900 AND
extract(YEAR FROM sc.taken_end) <= 2009 AND

"  Filter: (category_id = 7)"
"->  Seq Scan on measurement_013 m
(cost=0.00..359704.80 rows=18118464 width=15) (actual time=0.008..4025.692
rows=18118395 loops=1)"

At this point, I'm tempted to write a stored procedure that iterates over
each station category for all the years of each station. My guess is that
the planner's estimate for the number of rows that will be returned by
extract(YEAR FROM sc.taken_start) >= 1900 is incorrect, and so it chooses
a full table scan for all rows. Even though the lower bound appears to be
a constant value of 1900, the average station started collecting data 44
years ago (1965) and collected for an average of 21.4 years.

The part I am having trouble with is convincing PG to use the index for the
station ID and the date range for when the station was active. Each station
has a unique ID; the data in the measurement table is ordered by measurement
date then by station.

Should I add a clustered index by station then by date?
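
(If so, that would look roughly like the following; the index name is
hypothetical, and CLUSTER ... USING is 8.3+ syntax that takes an exclusive
lock, rewrites the table, and is not maintained automatically afterward:)

  CREATE INDEX measurement_station_taken_idx
    ON climate.measurement (station_id, taken);
  CLUSTER climate.measurement USING measurement_station_taken_idx;
  ANALYZE climate.measurement;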

Any other suggestions are very much appreciated.

Dave


Re: [PERFORM] Random Page Cost and Planner

2010-05-25 Thread Rob Wultsch
On Tue, May 25, 2010 at 4:26 PM, David Jarvis  wrote:
> shared_buffers = 1GB
> temp_buffers = 32MB
> work_mem = 32MB
> maintenance_work_mem = 64MB
> effective_cache_size = 256MB

Shouldn't effective_cache_size be significantly larger?
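
(For a dedicated 4GB machine, a common rule of thumb, illustrative and not
from this thread, is to point it at roughly the size of the OS file cache:)

  effective_cache_size = 3GB   # ~75% of RAM on a dedicated 4GB box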

-- 
Rob Wultsch
wult...@gmail.com

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Random Page Cost and Planner

2010-05-25 Thread Tom Lane
David Jarvis  writes:
>> It sounds as though the active portion of your database is pretty
>> much cached in RAM.  True?

> I would not have thought so; there are seven tables, each with 39 to 43
> million rows as: [ perhaps 64 bytes per row ]
> The machine has 4GB of RAM, donated to PG as follows:

Well, the thing you need to be *really* wary of is setting the cost
parameters to make isolated tests look good.  When you repeat a
particular test case multiple times, all times after the first probably
are fully cached ... but if your DB doesn't actually fit in RAM, that
might not be too representative of what will happen under load.
So if you want to cut the xxx_page_cost settings some more, pay close
attention to what happens to average response time.
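
(The experiment can be done per-session before editing postgresql.conf;
values are illustrative, and the test query is just a stand-in, reusing the
constants from earlier in the thread, for the real workload:)

  SET random_page_cost = 1.1;
  SET seq_page_cost = 1.0;
  EXPLAIN ANALYZE
    SELECT count(*) FROM climate.measurement WHERE station_id = 754;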

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Random Page Cost and Planner

2010-05-25 Thread David Jarvis
Hi, Kevin.

Thanks for the response.

> It sounds as though the active portion of your database is pretty
> much cached in RAM.  True?

I would not have thought so; there are seven tables, each with 39 to 43
million rows as:

CREATE TABLE climate.measurement (
  id bigserial NOT NULL,
  taken date NOT NULL,
  station_id integer NOT NULL,
  amount numeric(8,2) NOT NULL,
  flag character varying(1) NOT NULL DEFAULT ' '::character varying,
  category_id smallint NOT NULL
);

The machine has 4GB of RAM, donated to PG as follows:

shared_buffers = 1GB
temp_buffers = 32MB
work_mem = 32MB
maintenance_work_mem = 64MB
effective_cache_size = 256MB

Everything else is at its default value. The kernel:

$ cat /proc/sys/kernel/shmmax
2147483648

Two postgres processes are enjoying the (virtual) space:

2619 postgres  20   0 1126m 524m 520m S  0 13.2   0:09.41 postgres
2668 postgres  20   0 1124m 302m 298m S  0  7.6   0:04.35 postgres

> you can make such plans look more attractive by cutting both
> random_page_cost and seq_page_cost.  Some highly cached loads
> perform well with these set to equal values on the order of 0.1 to
> 0.001.

I tried this: no improvement.

> It would tend to be better than random access to 43 million rows, at
> least if you need to go to disk for many of them.

I thought that the index would take care of this? The index has been set to
the unique key of:

station_id, taken, and category_id (the filter for child tables).
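
(Presumably something like the following, reconstructed from the
description above; the index name is a guess:)

  CREATE UNIQUE INDEX measurement_stc_idx
    ON climate.measurement (station_id, taken, category_id);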

Each time I scan for data, I always provide the station identifier and its
date range. The date range is obtained from another table (given the same
station_id).

I will be trying various other indexes. I've noticed now that sometimes the
results are very quick and sometimes very slow. For the query I posted, it
would be great to know what would be the best indexes to use. I have a
suspicion that that's going to require trial and many errors.

Dave


Re: [PERFORM] shared_buffers advice

2010-05-25 Thread Merlin Moncure
On Tue, May 25, 2010 at 5:58 AM, Konrad Garus  wrote:
> 2010/5/24 Merlin Moncure :
>
>> *) a page fault to disk is a much bigger deal than a fault to pg cache
>> vs os/ cache.
>
> That was my impression. That's why I did not touch our 2/16 GB setting
> right away. I guess that 2 more gigabytes in OS cache is better than 2
> more (duplicated) gigabytes in PG shared_buffers. In our case 2 GB
> shared_buffers appears to be enough to avoid thrashing between OS and
> PG.
>
>> *) shared_buffers is one of the _least_ important performance settings
>> in postgresql.conf
>>
>> Many settings, like work_mem, planner tweaks, commit settings,
>> autovacuum settings
>
> Can you recommend any sources on these parameters, especially commit
> settings and planner tweaks?
>
>
> Thank you so much for the whole answer! Not only does it address the
> immediate question, but also many of the unasked ones that I had in the
> back of my head. It's brief and gives a broad view of all the
> performance concerns. It should be part of the documentation or the first
> page of the performance wiki. Have you copied it from somewhere?

Thank you for your nice comments.  This was strictly a brain dump from
yours truly.  There is a fairly verbose guide on the wiki
(http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server).
There is a lot of good info there but it's missing a few things
(from_collapse_limit for example).

I would prefer to see the annotated performance-oriented .conf
settings written in terms of trade-offs (too low? X. Too high? Y. Set
it in order to get? Z).  For example, did you know that if you crank
max_locks_per_transaction you also increase the duration of every
query that hits pg_locks -- well, now you do :-).
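
(To make those knobs concrete; values are illustrative, not
recommendations:)

  SET from_collapse_limit = 12;   -- planner tweak, session level; default 8
  -- a commit-settings trade-off (postgresql.conf, 8.3+): faster commits,
  -- but a crash can lose the last few transactions (no corruption):
  -- synchronous_commit = off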

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] prepared query performs much worse than regular query

2010-05-25 Thread Joshua Tolley
On Tue, May 25, 2010 at 11:27:08AM -0700, Scott Carey wrote:
> On May 21, 2010, at 8:26 PM, Matthew Wakeling wrote:
> > On Fri, 21 May 2010, Richard Yen wrote:
> >> Any ideas why the query planner chooses a different query plan when using 
> >> prepared statements?
> > 
> > This is a FAQ. Preparing a statement makes Postgres create a plan, without 
> > knowing the values that you will plug in, so it will not be as optimal as 
> > if the values were available. The whole idea is to avoid the planning cost 
> > each time the query is executed, but if your data is unusual it can 
> > result in worse plans.
> > 
> Maybe the planner could note a prepared query parameter is on a high skew
> column and build a handful of plans to choose from, or just partially
> re-plan on the skewed column with each execution.  Or make it easier for a
> user to have a prepared statement that re-plans the query each time.  Even
> just a per connection parameter "SET prepared.query.cacheplan = FALSE"

There was talk in this year's developers' meeting of doing this replanning
you've suggested. ("Re(?)plan parameterized plans with actual parameter
values" on http://wiki.postgresql.org/wiki/PgCon_2010_Developer_Meeting,
specifically). This wouldn't show up until at least 9.1, but it's something
people are thinking about.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com




Re: [PERFORM] Random Page Cost and Planner

2010-05-25 Thread Kevin Grittner
David Jarvis  wrote:
 
> The value for *random_page_cost* was at 2.0; reducing it to 1.1
> had a massive performance improvement (nearly an order of
> magnitude). While the results now return in 5 seconds (down from
> ~85 seconds)
 
It sounds as though the active portion of your database is pretty
much cached in RAM.  True?
 
> problematic lines remain. Bumping the query's end date by a single
> year causes a full table scan
 
> How do I persuade PostgreSQL to use the indexes, regardless of
> number of years between the two dates?
 
I don't know about "regardless of the number of years" -- but you
can make such plans look more attractive by cutting both
random_page_cost and seq_page_cost.  Some highly cached loads
perform well with these set to equal values on the order of 0.1 to
0.001.
 
> (A full table scan against 43 million rows is probably not the
> best plan.)
 
It would tend to be better than random access to 43 million rows, at
least if you need to go to disk for many of them.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] prepared query performs much worse than regular query

2010-05-25 Thread Scott Carey

On May 21, 2010, at 8:26 PM, Matthew Wakeling wrote:

> On Fri, 21 May 2010, Richard Yen wrote:
>> Any ideas why the query planner chooses a different query plan when using 
>> prepared statements?
> 
> This is a FAQ. Preparing a statement makes Postgres create a plan, without 
> knowing the values that you will plug in, so it will not be as optimal as 
> if the values were available. The whole idea is to avoid the planning cost 
> each time the query is executed, but if your data is unusual it can 
> result in worse plans.
> 

Two things I disagree with.  
1. The "whole idea" is not just to avoid planning cost.  It is also to easily 
avoid SQL injection, reduce query parse time, and to make client code cleaner 
and more re-usable.
2. The data does not need to be "unusual".  It just needs to have a skewed 
distribution.  Skewed is not unusual (well, it would be for a primary key :P ).
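
(The effect is easy to reproduce with a hypothetical skewed table; the
prepared plan has to work for any parameter value, while the literal query
can exploit the value's actual frequency:)

  CREATE TABLE t (skewed_col integer);
  INSERT INTO t
    SELECT CASE WHEN g % 100 = 0 THEN g ELSE 1 END
    FROM generate_series(1, 100000) g;       -- the value 1 dominates
  ANALYZE t;

  PREPARE q(integer) AS
    SELECT count(*) FROM t WHERE skewed_col = $1;
  EXPLAIN EXECUTE q(1);    -- generic plan, built without the value
  EXPLAIN SELECT count(*) FROM t WHERE skewed_col = 1;  -- value-specific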

Maybe the planner could note a prepared query parameter is on a high skew 
column and build a handful of plans to choose from, or just partially re-plan 
on the skewed column with each execution.  
Or make it easier for a user to have a prepared statement that re-plans the 
query each time.  Even just a per connection parameter "SET 
prepared.query.cacheplan = FALSE"

> Matthew
> 
> -- 
> Existence is a convenient concept to designate all of the files that an
> executable program can potentially process.   -- Fortran77 standard
> 
> -- 
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] which hardware setup

2010-05-25 Thread Pedro Axelrud
Sorry Jesper, I thought I had mentioned it: our dataset is 18GB.


Pedro Axelrud
http://mailee.me
http://softa.com.br
http://flavors.me/pedroaxl


On Tue, May 25, 2010 at 03:21, Jesper Krogh  wrote:

> Option 2:
>> App Server and Postgres: Dual Xeon 5520 quad core with 12GB ram and 2x
>> 146GB 15k RPM SAS (RAID1) disks
>>
>>
> you didn't mention your dataset size, but I think the second option would be
> preferable in most situations, since it gives more of the OS memory for disk
> caching: 12 GB vs 4 GB for the host running pg
>


Re: [PERFORM] Query timing increased from 3s to 55s when used as a function instead of select

2010-05-25 Thread Jorge Montero
Have you read this?
 
http://blog.endpoint.com/2008/12/why-is-my-function-slow.html 
 
99% of the 'function is slow' problems are caused by this.

Have you checked the difference between explain and prepare + explain execute?

>>> Tyler Hildebrandt  05/25/10 4:59 AM >>>
We're using a function that, when run as a select statement outside of the
function, takes roughly 1.5s to complete, whereas running an identical
query within a function takes around 55s to complete.

We are lost as to why placing this query within a function as opposed to
substituting the variables in a select statement is so drastically different.

The timings posted here are from a 512MB memory virtual machine and are not of
major concern on their own but we are finding the same issue in our production
environment with far superior hardware.

The function can be found here:
http://campbell-lange.net/media/files/fn_medirota_get_staff_leave_summary.sql 

---

Timings for the individual components on their own is as follows:

select * from fn_medirota_validate_rota_master(6);
Time: 0.670 ms

select to_date(EXTRACT (YEAR FROM current_date)::text, 'YYYY');
Time: 0.749 ms

select * from fn_medirota_people_template_generator(2, 6, date'2009-01-01',
date'2009-12-31', TRUE) AS templates;
Time: 68.004 ms

select * from fn_medirota_people_template_generator(2, 6, date'2010-01-01',
date'2010-12-31', TRUE) AS templates;
Time: 1797.323 ms


Copying the exact same for loop select statement from the query above into
the psql query buffer and running them with variable substitution yields the
following:

Running FOR loop SELECT with variable substitution:
Time: 3150.585 ms


Whereas invoking the function yields:

select * from fn_medirota_get_staff_leave_summary(6);
Time: 57375.477 ms


We have tried using explain analyse to update the query optimiser, dropped and
recreated the function and have restarted both the machine and the postgres
server multiple times.

Any help or advice would be greatly appreciated.


Kindest regards,
Tyler Hildebrandt


Re: [PERFORM] Query timing increased from 3s to 55s when used as a function instead of select

2010-05-25 Thread Merlin Moncure
On Tue, May 25, 2010 at 10:55 AM, Merlin Moncure  wrote:
> On Tue, May 25, 2010 at 9:41 AM, Tyler Hildebrandt
>  wrote:
>>> I think your problem is here:
>>>
>>> SELECT INTO current_user * FROM
>>> fn_medirota_validate_rota_master(in_currentuser);
>>>
>>>
>>> The planner has no knowledge of how many rows this function returns
>>> if it doesn't know the actual parameter. Because of this, this query
>>> enforces a seq-scan. Try to rewrite it to something like:
>>>
>>> execute 'select * from fn_medirota_validate_rota_master(' ||
>>> in_currentuser || ')' into current_user
>>>
>>
>> Thanks for your response.  This doesn't seem to solve our issue, 
>> unfortunately.
>>
>> As an aside, we have the fn_medirota_validate_rota_master calls in a
>> large number of our other functions, and those are running very well.
>
> any chance of seeing the function source?

oops! I missed it :-).  looking at your function, what version of
postgres? have you experimented w/return query?

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query timing increased from 3s to 55s when used as a function instead of select

2010-05-25 Thread Merlin Moncure
On Tue, May 25, 2010 at 9:41 AM, Tyler Hildebrandt
 wrote:
>> I think your problem is here:
>>
>> SELECT INTO current_user * FROM
>> fn_medirota_validate_rota_master(in_currentuser);
>>
>>
>> The planner has no knowledge of how many rows this function returns
>> if it doesn't know the actual parameter. Because of this, this query
>> enforces a seq-scan. Try to rewrite it to something like:
>>
>> execute 'select * from fn_medirota_validate_rota_master(' ||
>> in_currentuser || ')' into current_user
>>
>
> Thanks for your response.  This doesn't seem to solve our issue, 
> unfortunately.
>
> As an aside, we have the fn_medirota_validate_rota_master calls in a
> large number of our other functions, and those are running very well.

any chance of seeing the function source?

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] tunning pgsql 7.3.7 over RHEL 4.0 32 x86 (2.6.9-5ELsmp)

2010-05-25 Thread Joshua Tolley
On Tue, May 25, 2010 at 02:04:07PM +, Juan Pablo Sandoval Rivera wrote:
> Please let me know whether this configuration is correct...

The subject line of this message said you're trying to run PostgreSQL 7.3.7. I
hope that's a typo, and you really mean 8.3.7, in which case this suggestion
boils down to "upgrade to 8.3.11". But if you're really trying to run a
version that's several years old, the best configuration advice you can
receive is to upgrade to something not totally prehistoric. There have been
major performance enhancements in each release since 7.3.7, and no amount of
hardware tuning will make such an old version perform comparatively well. Not
to mention the much greater risk you have that an unsupported version will eat
your data.

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com




[PERFORM] tunning pgsql 7.3.7 over RHEL 4.0 32 x86 (2.6.9-5ELsmp)

2010-05-25 Thread Juan Pablo Sandoval Rivera
Good day list 

I would appreciate some comments on the following:

I have a Dell PowerEdge SC1420 server with 2 GB of RAM, one 73 GB SCSI
Ultra320 hard drive, and two Xeon CPUs (4 cache), with PGSQL 7.3.7,

running GNU/Linux Red Hat Enterprise 4.0 for 32-bit (kernel 2.6.9-5ELsmp)
Nahant (ES),

and another server running the same system and database engine: an HP
Proliant ML150 G6 with a Smart Array P410, two Xeon CPUs, 2 GB of RAM, and
two 15,000 RPM SATA drives (250 GB) in RAID 1.

I'm validating the operation, and the execution-time difference of a
process between the two machines is really just a few seconds; if anything
the Dell machine is slightly quicker, which must come down to the hard
drive technology.

shmmax is set to 500,000,000; the pg_settings output from both machines is
attached.

Please let me know whether this configuration is correct, or whether
anything is missing or over-tuned. An average of 30 users use the system,
and disk usage is heavy: one table has 8+ million rows and another 13+
million, and the 8-million-row table is used daily. Unfortunately we
cannot yet migrate to 8.x -- that would take time for development,
adjustment, and testing -- so we must make do with the current
configuration I described.

Thank you.


Juan Pablo Sandoval Rivera
Tecnologo Prof. en Ing. de Sistemas

Linux User : 322765 
msn: juan_pab...@hotmail.com
yahoo : juan_pab...@rocketmail.com (juan_pablos.rm)
UIN : 276125187 (ICQ)
Jabber : juan_pab...@www.jabberes.org
Skype  : juan.pablo.sandoval.rivera

APOYA A ECOSEARCH.COM - Ayuda a salvar al Planeta.






-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query timing increased from 3s to 55s when used as a function instead of select

2010-05-25 Thread Tyler Hildebrandt
> I think your problem is here:
> 
> SELECT INTO current_user * FROM
> fn_medirota_validate_rota_master(in_currentuser);
> 
> 
> The planner has no knowledge of how many rows this function returns
> if it doesn't know the actual parameter. Because of this, this query
> enforces a seq-scan. Try to rewrite it to something like:
> 
> execute 'select * from fn_medirota_validate_rota_master(' ||
> in_currentuser || ')' into current_user
> 

Thanks for your response.  This doesn't seem to solve our issue, unfortunately.

As an aside, we have the fn_medirota_validate_rota_master calls in a
large number of our other functions, and those are running very well.

-- 
Tyler Hildebrandt
Software Developer
ty...@campbell-lange.net

Campbell-Lange Workshop
www.campbell-lange.net
020 7631 1555
3 Tottenham Street London W1T 2AF
Registered in England No. 04551928

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance of temporary vs. regular tables

2010-05-25 Thread Andres Freund
On Tuesday 25 May 2010 11:00:24 Joachim Worringen wrote:
> On 25.05.2010 10:49, Grzegorz Jaśkiewicz wrote:
> > Temporary tables are handled pretty much like regular tables. The
> > magic happens at the schema level: a new schema is set up for each
> > connection, so that it can access its own temporary tables.
> > Temporary tables are also not autovacuumed.
> > And those are pretty much all of the differences.
> 
> Thanks. So, the Write-Ahead-Logging (being used or not) does not matter?
It does matter quite significantly in my experience, both from an I/O and
a CPU overhead perspective.

Andres

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query timing increased from 3s to 55s when used as a function instead of select

2010-05-25 Thread A. Kretschmer
In response to Tyler Hildebrandt :
> We're using a function that, when run as a select statement outside of the
> function, takes roughly 1.5s to complete, whereas running an identical
> query within a function takes around 55s to complete.
> 
> select * from fn_medirota_get_staff_leave_summary(6);
> Time: 57375.477 ms

I think your problem is here:

SELECT INTO current_user * FROM
fn_medirota_validate_rota_master(in_currentuser);


The planner has no knowledge of how many rows this function returns
if it doesn't know the actual parameter. Because of this, this query
enforces a seq-scan. Try to rewrite it to something like:

execute 'select * from fn_medirota_validate_rota_master(' ||
in_currentuser || ')' into current_user


*untested*


HTH, Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance of temporary vs. regular tables

2010-05-25 Thread Joachim Worringen

On 25.05.2010 11:38, Grzegorz Jaśkiewicz wrote:

WAL does for the DB what journaling does for the FS.
Plus it allows point-in-time recovery (PITR).

As for the RAM, it will be in RAM as long as the OS decides to keep it in
the RAM cache, and/or it's in the shared buffers memory.


Or until I commit the transaction? I have not completely disabled 
sync-to-disk in my setup, as there are of course situations where new 
data comes into the database that needs to be stored in a safe manner.



Unless you have a lot of doubt about the two, I don't think it makes
too much sense to set up a ramdisk tablespace yourself. But try it, and
see for yourself.
Make sure that you have logic in place to set it up before postgresql
starts, in case you reboot, or something.


That's what I thought about when mentioning "increased setup 
complexity". Simply adding a keyword like "NONPERSISTENT" to the table 
creation statement would be preferred...


 Joachim


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Query timing increased from 3s to 55s when used as a function instead of select

2010-05-25 Thread Tyler Hildebrandt
We're using a function that, when run as a select statement outside of the
function, takes roughly 1.5s to complete, whereas running an identical
query within a function takes around 55s to complete.

We are lost as to why placing this query within a function as opposed to
substituting the variables in a select statement is so drastically different.

The timings posted here are from a 512MB memory virtual machine and are not of
major concern on their own but we are finding the same issue in our production
environment with far superior hardware.

The function can be found here:
http://campbell-lange.net/media/files/fn_medirota_get_staff_leave_summary.sql

---

Timings for the individual components on their own is as follows:

select * from fn_medirota_validate_rota_master(6);
Time: 0.670 ms

select to_date(EXTRACT (YEAR FROM current_date)::text, 'YYYY');
Time: 0.749 ms

select * from fn_medirota_people_template_generator(2, 6, date'2009-01-01',
date'2009-12-31', TRUE) AS templates;
Time: 68.004 ms

select * from fn_medirota_people_template_generator(2, 6, date'2010-01-01',
date'2010-12-31', TRUE) AS templates;
Time: 1797.323 ms


Copying the exact same for loop select statement from the query above into
the psql query buffer and running them with variable substitution yields the
following:

Running FOR loop SELECT with variable substitution:
Time: 3150.585 ms


Whereas invoking the function yields:

select * from fn_medirota_get_staff_leave_summary(6);
Time: 57375.477 ms


We have tried using explain analyse to update the query optimiser, dropped and
recreated the function and have restarted both the machine and the postgres
server multiple times.

Any help or advice would be greatly appreciated.


Kindest regards,
Tyler Hildebrandt

---

EXPLAIN ANALYSE VERBOSE SELECT * FROM fn_medirota_get_staff_leave_summary(6);

QUERY PLAN
-
{FUNCTIONSCAN
:startup_cost 0.00
:total_cost 260.00
:plan_rows 1000
:plan_width 85
:targetlist (
   {TARGETENTRY
   :expr
  {VAR
  :varno 1
  :varattno 1
  :vartype 23
  :vartypmod -1
  :varlevelsup 0
  :varnoold 1
  :varoattno 1
  }
   :resno 1
   :resname id
   :ressortgroupref 0
   :resorigtbl 0
   :resorigcol 0
   :resjunk false
   }
   {TARGETENTRY
   :expr
  {VAR
  :varno 1
  :varattno 2
  :vartype 1043
  :vartypmod -1
  :varlevelsup 0
  :varnoold 1
  :varoattno 2
  }
   :resno 2
   :resname t_full_name
   :ressortgroupref 0
   :resorigtbl 0
   :resorigcol 0
   :resjunk false
   }
   {TARGETENTRY
   :expr
  {VAR
  :varno 1
  :varattno 3
  :vartype 16
  :vartypmod -1
  :varlevelsup 0
  :varnoold 1
  :varoattno 3
  }
   :resno 3
   :resname b_enabled
   :ressortgroupref 0
   :resorigtbl 0
   :resorigcol 0
   :resjunk false
   }
   {TARGETENTRY
   :expr
  {VAR
  :varno 1
  :varattno 4
  :vartype 1043
  :vartypmod -1
  :varlevelsup 0
  :varnoold 1
  :varoattno 4
  }
   :resno 4
   :resname t_anniversary
   :ressortgroupref 0
   :resorigtbl 0
   :resorigcol 0
   :resjunk false
   }
   {TARGETENTRY
   :expr
  {VAR
  :varno 1
  :varattno 5
  :vartype 23
  :vartypmod -1
  :varlevelsup 0
  :varnoold 1
  :varoattno 5
  }
   :resno 5
   :resname n_last_year_annual
   :ressortgroupref 0
   :resorigtbl 0
   :resorigcol 0
   :resjunk false
   }
   {TARGETENTRY
   :expr
  {VAR
  :varno 1
  :varattno 6
  :vartype 23
  :vartypmod -1
  :varlevelsup 0
  :varnoold 1
  :varoattno 6
  }
   :resno 6
   :resname n_last_year_other
   :ressortgroupref 0
   :resorigtbl 0
   :resorigcol 0
   :resjunk false
   }
   {TARGETENTRY
   :expr
  {VAR
  :varno 1
  :varattno 7
  :vartype 23
  :vartypmod -1
  :varlevelsup 0
  :varnoold 1
  :varoattno 7
  }
   :resno 7
   :resname n_this_year_annual
   :ressortgroupref 0
   :resorigtbl 0
   :resorigcol 0
   :resjunk false
   }
   {TARGETENTRY
   :expr
  {VAR
  :varno 1
  :varattno 8
  :vartype 23
  :vartypmod -1
  :varlevelsup 0
  :varnoold 1
  :varoattno 8
  }
   :resno 8
   :resname n_this_year_other
   :ressortgroupref 0
   :resorigtbl 0
   :resorigcol 0
 

Re: [PERFORM] shared_buffers advice

2010-05-25 Thread Konrad Garus
2010/5/24 Merlin Moncure :

> *) a page fault to disk is a much bigger deal than a fault to pg cache
> vs os/ cache.

That was my impression. That's why I did not touch our 2/16 GB setting
right away. I guess that 2 more gigabytes in OS cache is better than 2
more (duplicated) gigabytes in PG shared_buffers. In our case 2 GB
shared_buffers appears to be enough to avoid thrashing between OS and
PG.

> *) shared_buffers is one of the _least_ important performance settings
> in postgresql.conf
>
> Many settings, like work_mem, planner tweaks, commit settings,
> autovacuum settings

Can you recommend any sources on these parameters, especially commit
settings and planner tweaks?


Thank you so much for the whole answer! Not only does it address the
immediate question, but also many of the unasked ones that I had in the
back of my head. It's brief and gives a broad view of all the
performance concerns. It should be part of the documentation or the first
page of the performance wiki. Have you copied it from somewhere?

-- 
Konrad Garus

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance of temporary vs. regular tables

2010-05-25 Thread Grzegorz Jaśkiewicz
WAL does for the DB what journaling does for the FS.
Plus it allows point-in-time recovery (PITR).

As for the RAM, it will be in RAM as long as the OS decides to keep it in
the RAM cache, and/or it's in the shared buffers memory.
Unless you have a lot of doubt about the two, I don't think it makes
too much sense to set up a ramdisk tablespace yourself. But try it, and
see for yourself.
Make sure that you have logic in place to set it up before postgresql
starts, in case you reboot, or something.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance of temporary vs. regular tables

2010-05-25 Thread Joachim Worringen

On 25.05.2010 11:15, Thom Brown wrote:

2010/5/25 Joachim Worringen:

And, is there anything like RAM-only tables? I really don't care whether the
staging data is lost in the rare event of a machine crash, or whether the
query crashes due to lack of memory (I make sure there's enough w/o paging)
- I only care about performance here.

  Joachim



I think you can create a tablespace on a ram disk, and create a table there.


True, but I think this makes the database server configuration more 
complex (which is acceptable), and may add dependencies between the 
server configuration and the SQL statements for the selection of 
tablespace name (which would be a problem)?


But I am a tablespace-novice and will look into this "workaround".

 thanks, Joachim


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance of temporary vs. regular tables

2010-05-25 Thread Thom Brown
2010/5/25 Joachim Worringen :
> Am 25.05.2010 10:49, schrieb Grzegorz Jaśkiewicz:
>>
>> Temporary tables are handled pretty much like regular tables. The
>> magic happens at the schema level: a new schema is set up for each
>> connection, so that it can access its own temporary tables.
>> Temporary tables are also not autovacuumed.
>> And those are pretty much all of the differences.
>
> Thanks. So, the Write-Ahead-Logging (being used or not) does not matter?
>
> And, is there anything like RAM-only tables? I really don't care whether the
> staging data is lost in the rare event of a machine crash, or whether the
> query crashes due to lack of memory (I make sure there's enough w/o paging)
> - I only care about performance here.
>
>  Joachim
>

I think you can create a tablespace on a ram disk, and create a table there.
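
(A minimal sketch of that idea; paths and names are illustrative, the
tmpfs mount must exist and be owned by the postgres user before CREATE
TABLESPACE, and it must be recreated before postgres starts after a
reboot:)

  -- beforehand, as root:
  --   mkdir /mnt/pg_ram
  --   mount -t tmpfs -o size=2g tmpfs /mnt/pg_ram
  --   chown postgres:postgres /mnt/pg_ram
  CREATE TABLESPACE ram_space LOCATION '/mnt/pg_ram';
  CREATE TABLE staging (id integer, payload text) TABLESPACE ram_space;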

Thom

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance of temporary vs. regular tables

2010-05-25 Thread Joachim Worringen

On 25.05.2010 10:49, Grzegorz Jaśkiewicz wrote:

Temporary tables are handled pretty much like regular tables. The
magic happens at the schema level: a new schema is set up for each
connection, so that it can access its own temporary tables.
Temporary tables are also not autovacuumed.
And those are pretty much all of the differences.


Thanks. So, the Write-Ahead-Logging (being used or not) does not matter?

And, is there anything like RAM-only tables? I really don't care whether 
the staging data is lost in the rare event of a machine crash, or
whether the query crashes due to lack of memory (I make sure there's 
enough w/o paging) - I only care about performance here.


 Joachim


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] performance of temporary vs. regular tables

2010-05-25 Thread Grzegorz Jaśkiewicz
Temporary tables are handled pretty much like regular tables. The
magic happens at the schema level: a new schema is set up for each
connection, so that it can access its own temporary tables.
Temporary tables are also not autovacuumed.
And those are pretty much all of the differences.
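
(One practical consequence of the no-autovacuum point: after a bulk load
into a temp table, refresh the statistics by hand. Names here are
hypothetical:)

  CREATE TEMP TABLE stage AS
    SELECT * FROM some_big_table;   -- hypothetical staging step
  ANALYZE stage;                    -- no autoanalyze for temp tables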

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] performance of temporary vs. regular tables

2010-05-25 Thread Joachim Worringen

Greetings,

in
http://archives.postgresql.org/message-id/1056648218.7041.11.ca...@jester,
it is stated that the performance of temporary tables is "the same as a
regular table but without WAL on the table contents".

I have a datamining-type application which makes heavy use of temporary
tables to stage (potentially large amounts of) data between different
operations. (WAL is write-ahead logging.)


To effectively multi-thread this application, I (think I) need to switch 
from temporary to regular tables, because
- the concurrent threads need to use different connections, not cursors, 
to effectively operate concurrently
- temporary tables are not visible across connections (as they are 
across cursors of the same connection)


Thus, I wonder how much this will affect performance. Access to the
temporary table means inserting (millions of) rows once in a single
transaction, potentially updating them all once within a single
transaction, then selecting on them once or more.


Of course, eventually losing the data in these tables is not a problem
at all. The threads are synchronized above the SQL level.


Thanks for any input on how to maximize performance for this application.

 Joachim


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance