Re: [PERFORM] Odd behavior with temp usage logging
Josh Berkus wrote:
> Folks,
>
> I turned on temp file logging for PostgreSQL to see if I needed to
> adjust work_mem. Oddly, it's logging file usage up to 33 times per
> executed query (and no, the query isn't large enough to need 33
> separate sorts).

Are you sure there's not a sort happening inside a loop?

It might help if you posted the actual log messages along with the
output of `EXPLAIN ANALYZE' on your query.

--
Craig Ringer
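For anyone reproducing Josh's setup, a minimal sketch of the logging
being discussed; log_temp_files is the standard GUC for this (added in
8.3), and the values shown are illustrative, not recommendations:

    -- Log every temp file a backend spills (0 = no size threshold;
    -- a positive value logs only files larger than that many kB).
    SET log_temp_files = 0;

    -- Sorts and hashes that exceed work_mem are what spill to temp
    -- files on disk.
    SET work_mem = '4MB';  -- illustrative value only

    -- The matching server-log output then looks roughly like:
    -- LOG:  temporary file: path "base/pgsql_tmp/pgsql_tmp12345.0", size 194080000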
Re: [PERFORM] [PERFORMANCE] Buying hardware
>> 4) Use software raid unless you have the money to buy a raid
>> controller, in which case here is the ranking of them
>
> Areca and 3ware/Escalade are the two best controllers for the money
> out right now. They tend to take turns being the absolute best as
> they release new cards. Newer Arecas (the 1680 series) use an
> ethernet port for traps and such, so there's no need for special
> software that might be kernel version dependent.
>
> Both cost about the same for their top of the line cards.
>
> Make sure you have battery backed cache.

While browsing the net I found a server with this raid controller:

HP Smart Array P400/512MB BBWC Controller

How does one find out what this actually is, and whether it is any
good? I guess they just stuck their "HP" label onto some other raid
controller? I could write to HP, but I doubt that would help much. I
could also look through the archives of this mailing list; when I find
the time, I'll do so and try to create a wiki page.

The problem with this kind of built-in hardware is that it might suck,
and then you can't plug any other hardware into the box.
Re: [PERFORM] [PERFORMANCE] Buying hardware
da...@lang.hm wrote:
> On Mon, 26 Jan 2009, David Rees wrote:
>
>> And yes, the more memory you can squeeze into the machine, the
>> better, though you'll find that after a certain point, price starts
>> going up steeply. Of course, if you only have a 15GB database, once
>> you reach 16GB of memory you've pretty much hit the point of
>> diminishing returns.
>
> actually, you need more memory than that. besides the data itself you
> would want memory for several other things, among them:
>
> 1. your OS
> 2. your indexes
> 3. your per-request memory allocations (for sorting, etc) -- this is
>    highly dependent on your workload (type and number of parallel
>    requests)
> 4. 'dead' tuples in your tables (that will be cleared by a vacuum,
>    but haven't been yet)
>
> and probably other things as well.
>
> I don't know how large a database will fit in 16G of ram, but I
> suspect it's closer to 8G than 15G.
>
> any experts want to throw out a rule-of-thumb here?
>
> David Lang

It depends on what else the server is doing. If you're running the
whole LAPP stack on a single box, for example, the PHP interpreter will
need space for intermediate data; Apache and the Linux kernel will use
less space. If PostgreSQL is the only thing running on the server,
though, assuming 64-bit Linux, most of the RAM in a box that large
should be in either memory you've deliberately set aside for internal
PostgreSQL data structures or the Linux page cache. Tools are starting
to be built that will show you how RAM is allocated, now that recent
kernels (2.6.25+) do a better job of accounting for RAM pages. So I
would expect the total memory dedicated to the database functionality
to be much closer to 15 GB than 8 GB.

Given large amounts of RAM and only PostgreSQL running on the server,
the interesting trade-offs become:

a. How little memory can you buy without putting your service level
   agreements at risk?

b. How do you allocate the PostgreSQL-specific memory buffers at the
   expense of the Linux page cache for optimum performance?

--
M. Edward (Ed) Borasky

I've never met a happy clam. In fact, most of them were pretty steamed.
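As a starting point for trade-off (b), the PostgreSQL side of the
allocation can be read straight out of pg_settings; a minimal sketch
(the parameter names are standard GUCs, the commentary is mine):

    -- The main memory-related knobs. shared_buffers is memory
    -- PostgreSQL manages itself; effective_cache_size is only a
    -- planner hint describing how much Linux page cache you expect
    -- to have available.
    SELECT name, setting, unit
    FROM pg_settings
    WHERE name IN ('shared_buffers', 'work_mem',
                   'maintenance_work_mem', 'effective_cache_size');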
Re: [PERFORM] [PERFORMANCE] Buying hardware
M. Edward (Ed) Borasky wrote:
> Given large amounts of RAM and only PostgreSQL running on the server,
> the interesting trade-offs become:
>
> a. How little memory can you buy without putting your service level
>    agreements at risk?
>
> b. How do you allocate the PostgreSQL-specific memory buffers at the
>    expense of the Linux page cache for optimum performance?

c. What do I do with the idle cores? :) (Or: how can I exploit them by
changing my database design or the PostgreSQL architecture?)

--
M. Edward (Ed) Borasky

I've never met a happy clam. In fact, most of them were pretty steamed.
Re: [PERFORM] [PERFORMANCE] Buying hardware
On Tue, 27 Jan 2009, M. Edward (Ed) Borasky wrote:

> da...@lang.hm wrote:
>> On Mon, 26 Jan 2009, David Rees wrote:
>>> And yes, the more memory you can squeeze into the machine, the
>>> better, though you'll find that after a certain point, price starts
>>> going up steeply. Of course, if you only have a 15GB database, once
>>> you reach 16GB of memory you've pretty much hit the point of
>>> diminishing returns.
>>
>> actually, you need more memory than that. besides the data itself
>> you would want memory for several other things, among them:
>>
>> 1. your OS
>> 2. your indexes
>> 3. your per-request memory allocations (for sorting, etc) -- this is
>>    highly dependent on your workload (type and number of parallel
>>    requests)
>> 4. 'dead' tuples in your tables (that will be cleared by a vacuum,
>>    but haven't been yet)
>>
>> and probably other things as well.
>>
>> I don't know how large a database will fit in 16G of ram, but I
>> suspect it's closer to 8G than 15G.
>>
>> any experts want to throw out a rule-of-thumb here?
>
> There are starting to be some tools built that will show you how RAM
> is allocated, now that recent kernels (2.6.25+) do a better job of
> accounting for RAM pages. So I would expect the total memory
> dedicated to the database functionality to be much closer to 15 GB
> than 8 GB.

that's not quite the opposite of the statement that I was trying to
make.

assuming that you are not running anything else on the system, how
much data can you put on the system and still run entirely out of ram?

the database has its overhead (sort buffers, indexes, per-request
buffers, 'dead tuples', etc), which means that if you have a database
whose uncompressed dump takes 8G, you need substantially more than 8G
of ram to avoid using the disks (other than to store changes).

how much more is the question. I know it is going to vary from
installation to installation, but are there any guidelines that people
can start with?

David Lang
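One concrete way to put numbers on the overhead listed above is to ask
the database itself; a minimal sketch using the standard size
functions (pg_total_relation_size includes indexes and TOAST; the
n_dead_tup column assumes a server new enough, 8.3+, to report it in
pg_stat_user_tables):

    -- Total on-disk footprint of the current database.
    SELECT pg_size_pretty(pg_database_size(current_database()));

    -- Per-table breakdown: heap alone vs. heap plus indexes and
    -- TOAST, plus the dead tuples awaiting vacuum.
    SELECT relname,
           pg_size_pretty(pg_relation_size(relid))       AS heap,
           pg_size_pretty(pg_total_relation_size(relid)) AS with_indexes,
           n_dead_tup
    FROM pg_stat_user_tables
    ORDER BY pg_total_relation_size(relid) DESC
    LIMIT 10;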
Re: [PERFORM] [PERFORMANCE] Buying hardware
da...@lang.hm wrote:
> that's not quite the opposite of the statement that I was trying to
> make.
>
> assuming that you are not running anything else on the system, how
> much data can you put on the system and still run entirely out of
> ram?
>
> the database has its overhead (sort buffers, indexes, per-request
> buffers, 'dead tuples', etc), which means that if you have a database
> whose uncompressed dump takes 8G, you need substantially more than 8G
> of ram to avoid using the disks (other than to store changes).
>
> how much more is the question. I know it is going to vary from
> installation to installation, but are there any guidelines that
> people can start with?

I'm not sure there are any rules of thumb / guidelines for that. My
experience has been that doing no disk I/O, except writing logs to
disk when creating and updating rows, is an unrealistic expectation,
even for "small" databases. The cost is prohibitive, for one thing.
And for capacity planning, what's probably more important is whether
the service level agreements are being met, not whether you're meeting
them purely in RAM or by re-reading data from disk sometimes.

I think it's "easy", however, to solve the inverse problem. Borrow a
huge-memory server from your vendor, put your small database up on it,
run benchmarks, and gradually reduce the amount of memory available
until the performance becomes unacceptable. The tools exist to measure
memory allocations while the benchmarks are running. If you get enough
data points (about five for the simplest models) you can build a model
that you could then "invert" to go the other way -- take a database
size and figure out how much RAM is needed to meet the SLAs.

You don't necessarily have to reboot to reduce available memory --
there are ways to tie up memory without consuming processor or disk
time to do so. But you would need to "poison" the caches between runs,
and restart PostgreSQL if you're modifying its memory allocations.

--
M. Edward (Ed) Borasky

I've never met a happy clam. In fact, most of them were pretty steamed.
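On the PostgreSQL side, the contrib module pg_buffercache is one of
the tools for measuring memory allocations while a benchmark runs; a
minimal sketch (assumes the module is installed, and covers only
PostgreSQL's own shared_buffers, not the Linux page cache):

    -- Top 10 relations by pages currently cached in shared_buffers.
    SELECT c.relname, count(*) AS buffers
    FROM pg_buffercache b
    JOIN pg_class c ON b.relfilenode = c.relfilenode
    JOIN pg_database d ON b.reldatabase = d.oid
                      AND d.datname = current_database()
    GROUP BY c.relname
    ORDER BY buffers DESC
    LIMIT 10;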
Re: [PERFORM] Odd behavior with temp usage logging
Craig Ringer wrote:
> Josh Berkus wrote:
>> Folks,
>>
>> I turned on temp file logging for PostgreSQL to see if I needed to
>> adjust work_mem. Oddly, it's logging file usage up to 33 times per
>> executed query (and no, the query isn't large enough to need 33
>> separate sorts).
>
> Are you sure there's not a sort happening inside a loop?
>
> It might help if you posted the actual log messages along with the
> output of `EXPLAIN ANALYZE' on your query.

Hmmm, it's possible. The sort sizes seem very large for the query in
question, though. I will have to check when I get back on the system.

That would make a good TODO for tempfile logging, though ... it would
be *far* more useful to collect summary stats per query than to log
each individual sort.

--Josh
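For reference, a sort running inside a loop shows up in EXPLAIN
ANALYZE through the loops= counter on the Sort node. A purely
hypothetical sketch of the pattern Craig is asking about (the table
and column names are invented, not from Josh's workload):

    -- Hypothetical: a correlated subquery re-runs its inner sort once
    -- per outer row, so EXPLAIN ANALYZE reports loops=33 on the Sort
    -- node, and temp file logging could fire once per iteration.
    EXPLAIN ANALYZE
    SELECT o.id,
           (SELECT i.val
            FROM inner_tab i
            WHERE i.ref = o.id
            ORDER BY i.val DESC
            LIMIT 1) AS top_val
    FROM outer_tab o;
    --   ...
    --   ->  Sort  (... rows=... loops=33)
    --         Sort Method:  external merge  Disk: ...kB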
[PERFORM] LIKE Query performance
Hi,

I am relatively new to PostgreSQL (8.1) and facing the following
problem.

We have indexes defined on timestamp and description:

create index description_idx on event
    using btree (description varchar_pattern_ops);

EXPLAIN ANALYZE SELECT event_id, category, current_session_number,
description, event_type_id, realm_name, root_session_number, severity,
source_name, target_key, target_name, timestamp, jdo_version
FROM event WHERE description like '%mismatch%'
ORDER BY timestamp desc;

                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Sort  (cost=36267.09..36272.73 rows=2256 width=314) (actual time=19255.075..20345.774 rows=647537 loops=1)
   Sort Key: "timestamp"
   Sort Method:  external merge  Disk: 194080kB
   ->  Seq Scan on event  (cost=0.00..36141.44 rows=2256 width=314) (actual time=0.080..1475.041 rows=647537 loops=1)
         Filter: ((description)::text ~~ '%mismatch%'::text)
 Total runtime: 22547.292 ms
(6 rows)

But a starts-with query does use the index:

EXPLAIN ANALYZE SELECT event_id, category, current_session_number,
description, event_type_id, realm_name, root_session_number, severity,
source_name, target_key, target_name, timestamp, jdo_version
FROM event WHERE description like 'mismatch%'
ORDER BY timestamp desc;

                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Sort  (cost=9.26..9.27 rows=1 width=314) (actual time=0.766..0.766 rows=0 loops=1)
   Sort Key: "timestamp"
   Sort Method:  quicksort  Memory: 17kB
   ->  Index Scan using description_idx on event  (cost=0.00..9.25 rows=1 width=314) (actual time=0.741..0.741 rows=0 loops=1)
         Index Cond: (((description)::text ~>=~ 'mismatch'::text) AND ((description)::text ~<~ 'mismatci'::text))
         Filter: ((description)::text ~~ 'mismatch%'::text)
 Total runtime: 0.919 ms
(7 rows)

Are there any tweaks to force pgsql to use the index on description?

Balaji

P.S. The event table has 700k records.
Re: [PERFORM] LIKE Query performance
Only wildspeed (http://www.sai.msu.su/~megera/wiki/wildspeed) has index
support for %text%. But it has limitations.

Oleg

On Tue, 27 Jan 2009, Hari, Balaji wrote:

> Hi,
>
> I am relatively new to PostgreSQL (8.1) and facing the following
> problem.
>
> [...]
>
> Are there any tweaks to force pgsql to use the index on description?
>
> Balaji
>
> P.S. The event table has 700k records.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
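A minimal sketch of what using wildspeed might look like, assuming its
GIN operator class is named wildcard_ops as the wiki page above
describes (unverified here; the module's README is authoritative for
the exact name and install steps):

    -- Assumption: wildspeed provides a GIN opclass named wildcard_ops
    -- for text; the cast is needed because description is varchar.
    CREATE INDEX description_wild_idx ON event
        USING gin ((description::text) wildcard_ops);

    -- With such an index in place, a leading-wildcard LIKE can use
    -- an index scan instead of a seq scan:
    SELECT event_id, description
    FROM event
    WHERE description LIKE '%mismatch%';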