Re: [PERFORM] splitting data into multiple tables

2010-01-26 Thread Matthew Wakeling

On Mon, 25 Jan 2010, Viji V Nair wrote:

I think this won't help that much if you have a single machine. Partition the
table and keep the data in different nodes. Have a look at tools like
pgpool-II.


So partitioning. You have three choices:

1. Use a single table.
2. Partition the table on the same server.
3. Partition the data across multiple servers.

This is in increasing order of complexity.

There will probably be no problem at all with option 1. The only problem 
arises if you run a query that performs a full sequential scan of the 
entire table, which would obviously take a while. If your queries are 
indexable, then option 1 is almost certainly the best option.
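
(To illustrate what "indexable" means here -- table and column names are
made up, not from your schema:

CREATE INDEX entry_subject_idx ON entry (subject);

SELECT * FROM entry WHERE subject = 'Some_Article';

A query like that touches only a handful of rows out of the whole table via
the index, so the total size of the table barely matters.)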


Option 2 adds complexity in the Postgres server. You will need to 
partition your tables in a logical manner - that is, there needs to be 
some difference between rows in table a compared to rows in table b. This 
means that the partitioning will in effect be a little like indexing. You 
do not want to have too many partitions. The advantage is that if a query 
requires a full sequential scan, then there is the possibility of skipping 
some of the partitions, although there is some complexity involved in 
getting this to work correctly. In a lot of cases, partitioning will make 
queries slower by confusing the planner.
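
(For reference, a rough sketch of what option 2 looks like with 8.4-style
inheritance partitioning -- table names and ranges are hypothetical, and a
real setup also needs a trigger or rule to route INSERTs to the right child:

CREATE TABLE entry (id bigint NOT NULL, subject text);

CREATE TABLE entry_p0 (CHECK (id >= 0        AND id < 10000000)) INHERITS (entry);
CREATE TABLE entry_p1 (CHECK (id >= 10000000 AND id < 20000000)) INHERITS (entry);

-- With constraint_exclusion = partition (the 8.4 default), a query such as
--   SELECT * FROM entry WHERE id = 12345678;
-- only scans entry_p1 -- that is the "skipping some of the partitions" above.

The flip side is that every query against the parent has to be planned across
all the children, which is where the planner confusion comes from.)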


Option 3 is only useful when you have a real performance problem with 
long-running queries (partitioning the data across servers) or with very 
large numbers of queries (duplicating the data across servers). It also 
adds much complexity. It is fairly simple to run "filter these results from
the table" queries across multiple servers, but if that was all you were
doing, you may as well use an index instead. It becomes impossible to 
perform proper cross-referencing queries without some very clever software 
(because not all the data is available on the server), which will probably 
be hard to manage and slow down the execution anyway.


My recommendation would be to stick with a single table unless you have a 
real need to partition.


Matthew

--
Note: some countries impose serious penalties for a conspiracy to overthrow
 the political system. THIS DOES NOT FIX THE VULNERABILITY.
  -- http://seclists.org/vulnwatch/2003/q2/0002.html



Re: [PERFORM] splitting data into multiple tables

2010-01-26 Thread Matthew Wakeling

On Mon, 25 Jan 2010, nair rajiv wrote:
I am working on a project that will take out structured content from 
wikipedia and put it in our database...

there is a table which will approximately have 5 crore entries after data
harvesting.


Have you asked the Wikimedia Foundation if they mind you consuming that 
much of their bandwidth, or even if there are copyright issues involved in 
grabbing that much of their data?


(The other problem with using the word "crore" is that although it may 
mean 10,000,000 in a few countries, it could also mean 500,000.)


Matthew

--
Of course it's your fault. Everything here's your fault - it says so in your
contract.  -- Quark



Re: [PERFORM] splitting data into multiple tables

2010-01-26 Thread nair rajiv
On Tue, Jan 26, 2010 at 5:15 PM, Matthew Wakeling matt...@flymine.org wrote:

 On Mon, 25 Jan 2010, nair rajiv wrote:

 I am working on a project that will take out structured content from
 wikipedia and put it in our database...

 there is a table which will approximately have 5 crore entries after data
 harvesting.


 Have you asked the Wikimedia Foundation if they mind you consuming that
 much of their bandwidth, or even if there are copyright issues involved in
 grabbing that much of their data?



We are downloading the nt and owl files kept for download at
http://wiki.dbpedia.org/Downloads34


 (The other problem with using the word "crore" is that although it may mean
 10,000,000 in a few countries, it could also mean 500,000.)

 Matthew

 --
 Of course it's your fault. Everything here's your fault - it says so in
 your
  contract.  -- Quark



[PERFORM] Poor query plan across OR operator

2010-01-26 Thread Mark Hills
One of our most-used queries performs poorly (taking over 2 seconds) and a 
tiny amount of refactoring shows it can be fast (less than 1ms) by 
transforming the OR case (which spans two tables) into a UNION.

I have created a simple test case (below) which shows the difference we 
are seeing in query plans before and after refactoring.

Is it beyond the ability of the query planner to optimise this query 
without refactoring? Or is the appropriate index missing, and if so, what 
would it be?

Perhaps the refactored query is, in fact, different and could produce 
different data in certain corner-cases; I can't see where this could be 
though.

Your suggestions are appreciated and I hope the information is useful. 
Many thanks.

Mark


-- The plans below are from PostgreSQL 8.5alpha3. Also tested with
-- similar results on PostgreSQL 8.4.2

-- Data structure where a container contains multiple items

CREATE TABLE container (
id integer PRIMARY KEY,
selected bool NOT NULL DEFAULT false
);

CREATE TABLE item (
container_id integer NOT NULL
REFERENCES container(id) ON DELETE CASCADE,
n integer NOT NULL,
selected bool NOT NULL DEFAULT false,
PRIMARY KEY (container_id, n)
);

-- Partial indexes to find selected containers or selected items

CREATE INDEX container_selected
ON container (selected)
WHERE selected IS true;

CREATE INDEX item_selected
ON item (selected)
WHERE selected IS true;

-- Populate the data; for a small minority of items and containers,
-- 'selected' is true

CREATE LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION populate()
RETURNS VOID
AS $$
DECLARE
i integer;
j integer;
BEGIN
FOR i IN 0..999 LOOP

INSERT INTO container (id, selected)
VALUES (i, RANDOM() < 0.01);

FOR j IN 0..999 LOOP
INSERT INTO item (container_id, n, selected)
VALUES (i, j, RANDOM() < 0.001);
END LOOP;

END LOOP;
END
$$ LANGUAGE plpgsql;

SELECT populate();
VACUUM ANALYZE;

SELECT COUNT(*) FROM container; -- 1000
SELECT COUNT(*) FROM container WHERE selected IS true; -- 9
SELECT COUNT(*) FROM item; -- 1000000
SELECT COUNT(*) FROM item WHERE selected IS true; -- 1004

-- A query to find all items where the item or container is selected

EXPLAIN ANALYZE
SELECT container_id, n
FROM item
INNER JOIN container ON item.container_id = container.id
WHERE item.selected IS true
OR container.selected IS true;

-- Resulting query plan
--
-- Hash Join  (cost=28.50..92591.11 rows=10016 width=8) (actual time=372.659..1269.207 rows=9996 loops=1)
--   Hash Cond: (item.container_id = container.id)
--   Join Filter: ((item.selected IS TRUE) OR (container.selected IS TRUE))
--   ->  Seq Scan on item  (cost=0.00..78778.68 rows=1002468 width=9) (actual time=370.590..663.764 rows=1000000 loops=1)
--   ->  Hash  (cost=16.00..16.00 rows=1000 width=5) (actual time=0.805..0.805 rows=1000 loops=1)
--         ->  Seq Scan on container  (cost=0.00..16.00 rows=1000 width=5) (actual time=0.007..0.296 rows=1000 loops=1)
-- Total runtime: 1271.676 ms
-- (7 rows)

-- The refactored SQL, which queries the same data but is fast

EXPLAIN ANALYZE
SELECT container_id, n
FROM item
INNER JOIN container ON item.container_id = container.id
WHERE item.selected IS true
UNION
SELECT container_id, n
FROM item
INNER JOIN container ON item.container_id = container.id
WHERE container.selected IS true;

-- Resulting query plan:
--
-- HashAggregate  (cost=18018.43..18120.33 rows=10190 width=8) (actual time=22.784..26.341 rows=9996 loops=1)
--   ->  Append  (cost=28.50..17967.48 rows=10190 width=8) (actual time=0.908..16.676 rows=10004 loops=1)
--         ->  Hash Join  (cost=28.50..90.05 rows=1002 width=8) (actual time=0.907..3.113 rows=1004 loops=1)
--               Hash Cond: (public.item.container_id = public.container.id)
--               ->  Index Scan using item_selected on item  (cost=0.00..47.77 rows=1002 width=8) (actual time=0.036..1.425 rows=1004 loops=1)
--                     Index Cond: (selected = true)
--               ->  Hash  (cost=16.00..16.00 rows=1000 width=4) (actual time=0.856..0.856 rows=1000 loops=1)
--                     ->  Seq Scan on container  (cost=0.00..16.00 rows=1000 width=4) (actual time=0.006..0.379 rows=1000 loops=1)
--         ->  Nested Loop  (cost=0.00..17775.53 rows=9188 width=8) (actual time=0.024..9.175 rows=9000 loops=1)
--               ->  Index Scan using container_selected on container  (cost=0.00..12.33 rows=9 width=4) (actual time=0.005..0.012 rows=9 loops=1)
--                     Index Cond: (selected = true)
--               ->  Index Scan using item_pkey on item  (cost=0.00..1960.93 rows=1021 width=8) (actual time=0.014..0.460 rows=1000 loops=9)
--                     Index Cond: (public.item.container_id = public.container.id)
-- Total runtime: 28.617 ms
-- (14 rows)


Re: [PERFORM] Poor query plan across OR operator

2010-01-26 Thread Grzegorz Jaśkiewicz
Just create an index on both columns:
CREATE INDEX foo_i ON foo(bar1, bar2);


HTH



Re: [PERFORM] Poor query plan across OR operator

2010-01-26 Thread Tom Lane
Mark Hills mark.hi...@framestore.com writes:
 One of our most-used queries performs poorly (taking over 2 seconds) and a 
 tiny amount of refactoring shows it can be fast (less than 1ms) by 
 transforming the OR case (which spans two tables) into a UNION.

I'd suggest going with the UNION.  We are unlikely to make the planner
look for such cases, because usually such a transformation would be a
net loss.  It seems like rather a corner case that it's a win even on
your example.

regards, tom lane



[PERFORM] Should the optimiser convert a CASE into a WHERE if it can?

2010-01-26 Thread Richard Neill

Dear All,

Just wondering whether there is scope for the query planner 
(on 8.4.2) to be cleverer than it currently is.


Specifically, I wonder whether the optimiser should know that by 
converting a CASE condition into a WHERE condition, it can use an index.


Have I found a possible enhancement, or is this simply too hard to do?

Best wishes,

Richard



Example:


In this example, tbl_tracker has 255751 rows, with a primary key id, 
whose values lie uniformly in the range 1...1255750.


If one is trying to count multiple conditions, the following query seems 
to be the most obvious way to do it:


SELECT
  SUM (case when id > 1200000 and id < 1210000 then 1 else 0 end) AS c1,
  SUM (case when id > 1210000 and id < 1220000 then 1 else 0 end) AS c2,
  SUM (case when id > 1220000 and id < 1230000 then 1 else 0 end) AS c3,
  SUM (case when id > 1230000 and id < 1240000 then 1 else 0 end) AS c4,
  SUM (case when id > 1240000 and id < 1250000 then 1 else 0 end) AS c5
FROM tbl_tracker;


  c1  |  c2  |  c3  |  c4  |  c5
------+------+------+------+------
 2009 | 2018 | 2099 | 2051 | 2030

Time: 361.666 ms



This can be manually optimised into a far uglier (but much much faster) 
query:


SELECT * FROM
 (SELECT COUNT (1) AS c1 FROM tbl_tracker
    WHERE id > 1200000 and id < 1210000) AS s1,
 (SELECT COUNT (1) AS c2 FROM tbl_tracker
    WHERE id > 1210000 and id < 1220000) AS s2,
 (SELECT COUNT (1) AS c3 FROM tbl_tracker
    WHERE id > 1220000 and id < 1230000) AS s3,
 (SELECT COUNT (1) AS c4 FROM tbl_tracker
    WHERE id > 1230000 and id < 1240000) AS s4,
 (SELECT COUNT (1) AS c5 FROM tbl_tracker
    WHERE id > 1240000 and id < 1250000) AS s5;

  c1  |  c2  |  c3  |  c4  |  c5
------+------+------+------+------
 2009 | 2018 | 2099 | 2051 | 2030
(1 row)

Time: 21.091 ms





Debugging
-

The simple queries are:

SELECT SUM (case when id > 1200000 and id < 1210000 then 1 else 0 end) 
from tbl_tracker;


Time: 174.804 ms

Explain shows that this does a sequential scan.



SELECT COUNT(1) from tbl_tracker WHERE id > 1200000 and id < 1210000;

Time: 4.153 ms

Explain shows that this uses the index, as expected.





Re: [PERFORM] Should the optimiser convert a CASE into a WHERE if it can?

2010-01-26 Thread Tom Lane
Richard Neill rn...@cam.ac.uk writes:
 SELECT
   SUM (case when id > 1200000 and id < 1210000 then 1 else 0 end) AS c1,
   SUM (case when id > 1210000 and id < 1220000 then 1 else 0 end) AS c2,
...
 FROM tbl_tracker;

 This can be manually optimised into a far uglier (but much much faster) 
 query:

 SELECT * FROM
   (SELECT COUNT (1) AS c1 FROM tbl_tracker
  WHERE id > 1200000 and id < 1210000) AS s1,
   (SELECT COUNT (1) AS c2 FROM tbl_tracker
  WHERE id > 1210000 and id < 1220000) AS s2,
   ...

We're unlikely to consider doing this, for a couple of reasons:
it's unlikely to come up often enough to justify the cycles the planner
would spend looking for the case *on every query*, and it requires very
special knowledge about the behavior of two specific aggregate functions,
which is something the planner tends to avoid using.

regards, tom lane



Re: [PERFORM] Should the optimiser convert a CASE into a WHERE if it can?

2010-01-26 Thread Matthew Wakeling

On Tue, 26 Jan 2010, Richard Neill wrote:
SELECT SUM (case when id > 1200000 and id < 1210000 then 1 else 0 end) from 
tbl_tracker;


Explain shows that this does a sequential scan.


I'd defer to Tom on this one, but really, for Postgres to work this out, 
it would have to peer deep into the mysterious SUM function, and realise 
that the number zero is a noop. I suppose it would be possible, but you'd 
have to define noops for each of the different possible functions, *and* 
make the planner clever enough to spot the noop-matching number in the 
else and convert the WHEN into a WHERE.


In my mind, this is quite a lot of work for the planner to do to solve 
this one. That translates into quite a lot of work for some poor 
programmer to do to achieve it. If you have the money, then hire someone 
to do it!


Matthew

--
I don't want the truth. I want something I can tell parliament!
 -- Rt. Hon. Jim Hacker MP



Re: [PERFORM] splitting data into multiple tables

2010-01-26 Thread Greg Smith

Viji V Nair wrote:
A 15k rpm SAS drive will give you a throughput of 12MB/s and 120 IOPS. 
Now you can calculate the number of disks, specifically spindles, for 
getting your desired throughput and IOPS.


I think you mean 120MB/s for that first part.  Regardless, presuming you 
can provision a database just based on IOPS rarely works.  It's nearly 
impossible to estimate what you really need anyway for a database app, 
given that much of real-world behavior depends on the cached in memory 
vs. uncached footprint of the data you're working with.  By the time you 
put a number of disks into an array, throw a controller card cache on 
top of it, then add the OS and PostgreSQL caches on top of those, you 
are so far disconnected from the underlying drive IOPS that speaking in 
those terms doesn't get you very far.  I struggle with this every time I 
talk with a SAN vendor.  Their fixation on IOPS without considering 
things like how sequential scans mixed into random I/O will get handled 
is really disconnected from how databases work in practice.  For 
example, I constantly end up needing to detune IOPS in favor of 
readahead to make "SELECT x,y,z FROM t" run at an acceptable speed on 
big tables.
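
(For illustration, on Linux the readahead being traded off there is 
typically the block-device setting -- the device name and value below are 
examples only:

# current readahead, in 512-byte sectors
blockdev --getra /dev/sdb

# raise it so sequential scans fetch bigger chunks per request
blockdev --setra 4096 /dev/sdb

Bigger readahead helps the sequential-scan style queries at some cost to 
purely random I/O, which is exactly the trade-off being described.)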


--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [PERFORM] Should the optimiser convert a CASE into a WHERE if it can?

2010-01-26 Thread Richard Neill

Thanks for your answers.


David Wilson wrote:

 Why not simply add the where clause to the original query?

 SELECT
 SUM (case when id > 1200000 and id < 1210000 then 1 else 0 end) AS c1,
 SUM (case when id > 1210000 and id < 1220000 then 1 else 0 end) AS c2,
 SUM (case when id > 1220000 and id < 1230000 then 1 else 0 end) AS c3,
 SUM (case when id > 1230000 and id < 1240000 then 1 else 0 end) AS c4,
 SUM (case when id > 1240000 and id < 1250000 then 1 else 0 end) AS c5
 FROM tbl_tracker WHERE (id > 1200000) AND (id < 1250000);

 I didn't populate any test tables, but I'd expect that to do just as
 well without being any uglier than the original query is.

You're absolutely right, but I'm afraid this won't help. I'd simplified 
the original example query, but in real life, I've got about 50 
different sub-ranges, which cover virtually all the id-space.


--

Tom Lane wrote:

Richard Neill rn...@cam.ac.uk writes:

SELECT
   SUM (case when id > 1200000 and id < 1210000 then 1 else 0 end) AS c1,
   SUM (case when id > 1210000 and id < 1220000 then 1 else 0 end) AS c2,
   ...
FROM tbl_tracker;


This can be manually optimised into a far uglier (but much much faster) 
query:



SELECT * FROM
  (SELECT COUNT (1) AS c1 FROM tbl_tracker
 WHERE id > 1200000 and id < 1210000) AS s1,
  (SELECT COUNT (1) AS c2 FROM tbl_tracker
 WHERE id > 1210000 and id < 1220000) AS s2,
  ...


We're unlikely to consider doing this, for a couple of reasons:
it's unlikely to come up often enough to justify the cycles the planner
would spend looking for the case *on every query*, and it requires very
special knowledge about the behavior of two specific aggregate functions,
which is something the planner tends to avoid using.



OK - that's all I was wondering. I thought I'd raise this in case it 
might be helpful.


I'll add a note to:
http://www.postgresql.org/docs/8.4/interactive/functions-conditional.html
to point out that this is something of a trap for the unwary.

Regards,

Richard



Re: [PERFORM] splitting data into multiple tables

2010-01-26 Thread Viji V Nair
On Tue, Jan 26, 2010 at 11:11 PM, Greg Smith g...@2ndquadrant.com wrote:

 Viji V Nair wrote:

 A 15k rpm SAS drive will give you a throughput of 12MB/s and 120 IOPS. Now
 you can calculate the number of disks, specifically spindles, for getting
 your desired throughput and IOPS.


 I think you mean 120MB/s for that first part.  Regardless, presuming you
 can provision a database just based on IOPS rarely works.  It's nearly
 impossible to estimate what you really need anyway for a database app, given
 that much of real-world behavior depends on the cached in memory vs.
 uncached footprint of the data you're working with.  By the time you put a
 number of disks into an array, throw a controller card cache on top of it,
 then add the OS and PostgreSQL caches on top of those, you are so far
 disconnected from the underlying drive IOPS that speaking in those terms
 doesn't get you very far.  I struggle with this every time I talk with a SAN
 vendor.  Their fixation on IOPS without considering things like how
 sequential scans mixed into random I/O will get handled is really
 disconnected from how databases work in practice.  For example, I constantly
 end up needing to detune IOPS in favor of readahead to make SELECT x,y,z
 FROM t run at an acceptable speed on big tables.


Yes, you are right.

There are catches in the SAN controllers also. SAN vendors won't give that
much information regarding their internal controller design. They will say
they have 4 external 4G ports; you should also check how many internal ports
they have and how the controllers are operating, in Active-Active or
Active-Standby mode.




 --
 Greg Smith    2ndQuadrant    Baltimore, MD
 PostgreSQL Training, Services and Support
 g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [PERFORM] Poor query plan across OR operator

2010-01-26 Thread Robert Haas
On Tue, Jan 26, 2010 at 11:41 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Mark Hills mark.hi...@framestore.com writes:
 One of our most-used queries performs poorly (taking over 2 seconds) and a
 tiny amount of refactoring shows it can be fast (less than 1ms) by
 transforming the OR case (which spans two tables) into a UNION.

 I'd suggest going with the UNION.  We are unlikely to make the planner
 look for such cases, because usually such a transformation would be a
 net loss.  It seems like rather a corner case that it's a win even on
 your example.

This has come up for me, too.  But even if we grant that it's
worthwhile, it seems like a tricky optimization to apply in practice,
because unless your row estimates are very accurate, you might easily
apply it when you would have been better off leaving it alone.  And it
seems like getting accurate estimates would be hard, since the
conditions might be highly correlated, or not, and they're on
different tables.

...Robert



Re: [PERFORM] splitting data into multiple tables

2010-01-26 Thread Greg Smith

Viji V Nair wrote:
There are catches in the SAN controllers also. SAN vendors won't give 
that much information regarding their internal controller design. They 
will say they have 4 external 4G ports; you should also check how many 
internal ports they have and how the controllers are operating, 
in Active-Active or Active-Standby mode.


Right, the SAN cache serves the same purpose as the controller cache on 
direct-attached storage.  I've never seen a Fiber Channel card that had 
its own local cache too; doubt that's even possible.  So I think of them 
as basically being the same type of cache, with the primary difference 
being that the transfers between the host and the cache have some latency 
on them with FC compared to direct storage.


You're right that people should question the internal design too of 
course.  Some days I wonder if I'm in the wrong business--the people who 
do SAN tuning seem to have no idea what they're doing and yet are still 
expensive to hire.  But this is off-topic for the question being asked here.


--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [PERFORM] Poor query plan across OR operator

2010-01-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jan 26, 2010 at 11:41 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'd suggest going with the UNION.  We are unlikely to make the planner
 look for such cases, because usually such a transformation would be a
 net loss.  It seems like rather a corner case that it's a win even on
 your example.

 This has come up for me, too.  But even if we grant that it's
 worthwhile, it seems like a tricky optimization to apply in practice,
 because unless your row estimates are very accurate, you might easily
 apply it when you would have been better off leaving it alone.  And it
 seems like getting accurate estimates would be hard, since the
 conditions might be highly correlated, or not, and they're on
 different tables.

Actually, in the type of case Mark is showing, the estimates might be
*more* accurate since the condition gets decomposed into separate
per-table conditions.  I'm still dubious about how often it's a win
though.

There's another problem, which is that transforming to UNION isn't
necessarily a safe transformation: it only works correctly if the
query output columns are guaranteed unique.  Otherwise it might fold
duplicates together that would have remained distinct in the original
query.  If your query output columns include a primary key then the
planner could be confident this was safe, but that reduces the scope
of the transformation even further ...
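
A minimal illustration of that hazard (made-up table, not from Mark's
schema):

CREATE TEMP TABLE t (a int, f1 bool, f2 bool);
INSERT INTO t VALUES (1, true, true), (1, true, true);

SELECT a FROM t WHERE f1 OR f2;   -- two rows: 1, 1

SELECT a FROM t WHERE f1
UNION
SELECT a FROM t WHERE f2;         -- one row: 1

UNION folds the duplicates, so the rewrite is only equivalent when the
output rows are already unique -- e.g. when they include a primary key,
as (container_id, n) does in Mark's example.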

regards, tom lane



Re: [PERFORM] Poor query plan across OR operator

2010-01-26 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 Actually, in the type of case Mark is showing, the estimates might
 be *more* accurate since the condition gets decomposed into
 separate per-table conditions.  I'm still dubious about how often
 it's a win though.
 
 There's another problem, which is that transforming to UNION isn't
 necessarily a safe transformation: it only works correctly if the
 query output columns are guaranteed unique.  Otherwise it might
 fold duplicates together that would have remained distinct in the
 original query.  If your query output columns include a primary
 key then the planner could be confident this was safe, but that
 reduces the scope of the transformation even further ...
 
FWIW, I've seen this optimization in other products.  I remember
being surprised sometimes that it wasn't used where I thought it
would be, and I had to explicitly transform the query to UNION to
get the performance benefit.  That was probably due to the sort of
constraints you mention on when it is truly equivalent.
 
Personally, I'd put this one in the "it would be nice if" category. 
Does it merit a TODO list entry, perhaps?
 
-Kevin



Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-26 Thread Scott Carey

On Jan 25, 2010, at 6:55 AM, fka...@googlemail.com wrote:

 Scott Carey:
 
 (2) The tests:
 
 Note: The standard speed was about 800MB/40s, so 20MB/s.
 
 
 a)
 What I changed: fsync=off
 Result: 35s, so 5s faster.
 
 
 b) like a) but:
 checkpoint_segments=128 (was 3)
 autovacuum=off
 
 Result: 35s (no change...?!)
 
 
 yes, more checkpoint_segments will help if your
 shared_buffers is larger, it won't do a whole lot
 otherwise.  Generally, I like to keep these roughly equal
 sized as a starting point for any small to medium sized
 configuration.  So if shared_buffers is 1GB, that takes 64
 checkpoint segments to hold for heavy write scenarios.
 
 (1)
 
 Ok, that's what I tested: 1024 MB shared_buffers, 64
 checkpoint segments.
 
 Unfortunatelly I could not run it on the same hardware
 anymore: The data is written to a single disk now, not raid
 anymore. So with the default shared_buffers of 8 MB (?) we
 should expect 45s for writing the 800 MB. With the large
 shared_buffers and checkpoints (mentioned above) I got this:
 
 1. run (right after postgres server (re-)start): 28s (!)
 2. run: 44s
 3. run: 42s
 
 So, roughly the same as with small buffers.
 
 
 (2)
 Then I switched again from 8.2.4 to 8.4.2:
 
 1. run (after server start): 25s.
 2. run: 38s
 3. run: 38s
 

If you expect to typically only run a batch of these large inserts 
occasionally, hopefully the 25s performance will be what you get.  

 So, 8.4 helped a bit over 8.2.
 
 
 (3) All in all
 
 By (1) + (2) the performance bottleneck has, however,
 changed a lot (as shown here by the performance monitor):
 
 Now, the test system is definitly disk bound. Roughly
 speaking, at the middle of the whole test, for about 40-50%
 of the time, the 'data' disk was at 100% (and the 'WAL' at
 20%), while before and after that the 'WAL' disk had a lot
 of peaks at 100% (and 'data' disk at 30%).
 
 The average MB/s of the 'data' disk was 40 MB/s (WAL:
 20MB/s) -- while the raw performance is 800MB/40s = 20MB/s,
 so still *half* what the disk does.
 
 So, this remains as the last open question to me: It seems
 the data is doubly written to the 'data' disk, although WAL
 is written to the separate 'WAL' disk.
 

It appears as though there is clear evidence that the system is writing data 
twice (excluding WAL).  This is where my Postgres knowledge ends and someone 
else will have to comment.  Why would it write the TOAST data twice?





Re: [PERFORM] Should the optimiser convert a CASE into a WHERE if it can?

2010-01-26 Thread Scott Carey

On Jan 26, 2010, at 9:41 AM, Richard Neill wrote:

 Thanks for your answers.
 
 
 David Wilson wrote:
 
 Why not simply add the where clause to the original query?
 
 SELECT
 SUM (case when id > 1200000 and id < 1210000 then 1 else 0 end) AS c1,
 SUM (case when id > 1210000 and id < 1220000 then 1 else 0 end) AS c2,
 SUM (case when id > 1220000 and id < 1230000 then 1 else 0 end) AS c3,
 SUM (case when id > 1230000 and id < 1240000 then 1 else 0 end) AS c4,
 SUM (case when id > 1240000 and id < 1250000 then 1 else 0 end) AS c5
 FROM tbl_tracker WHERE (id > 1200000) AND (id < 1250000);
 
 I didn't populate any test tables, but I'd expect that to do just as
 well without being any uglier than the original query is.
 
 You're absolutely right, but I'm afraid this won't help. I'd simplified 
 the original example query, but in real life, I've got about 50 
 different sub-ranges, which cover virtually all the id-space.
 

Well, it probably shouldn't use the index if it covers the vast majority of the 
table.  I wonder if it is actually faster to reformulate with WHERE or not at 
that point -- it might be slower.


Re: [PERFORM] New server to improve performance on our large and busy DB - advice?

2010-01-26 Thread Greg Smith

Carlo Stonebanks wrote:


effective_cache_size should be set to how much memory is leftover 
for disk caching after taking into account what's used by the 
operating system, dedicated PostgreSQL memory, and other applications. 
If it's set too low, indexes may not be used for executing queries the 
way you'd expect. Setting effective_cache_size to 1/2 of total memory 
would be a normal conservative setting. You might find a better 
estimate by looking at your operating system's statistics. On 
UNIX-like systems, add the free+cached numbers from "free" or "top". On 
Windows see the "System Cache" in the Windows Task Manager's 
Performance tab.


Are these values to look at BEFORE starting PG? If so, how do I relate 
the values returned to setting the effective_cache_size values?


After starting the database.  You can set effective_cache_size to a size 
in megabytes, so basically you'd look at the amount of free cache, maybe 
round down a bit, and set effective_cache_size to exactly that.  It's 
not super important to get the number right.  The point is that the 
default is going to be a tiny number way smaller than the RAM in your 
system, and even getting it within a factor of 2 or 3 of reality will 
radically change some types of query plans.
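
(A worked example -- the figures are made up, just to show the mechanics: 
if "free -m" on a mostly-dedicated database server shows around 6200 MB 
in the "cached" column, then in postgresql.conf

effective_cache_size = 6GB

followed by a reload is close enough; being off by a few hundred MB won't 
change any plans.)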


PS Loved your 1995 era pages. Being a musician, it was great to read 
your recommendations on how to buy these things called CD's. I 
Googled the term, and they appear to be some ancient precursor to MP3s 
which people actually PAID for. What kind of stone were they engraved 
on? ;-D


They're plastic, just like the iPod, iPhone, iToilet, or whatever other 
white plastic Apple products people listen to music on during this new 
era.  Since both my CD collection and the stereo I listen to them on are 
each individually worth more than my car, it's really tough to sell me 
on all the terrible-sounding MP3s I hear nowadays.  I'm the guy who can 
tell you how the LP, regular CD, gold CD, and SACD/DVD-A for albums I 
like all compare, so dropping below CD quality is right out.  If you 
ever find yourself going "hey, I wish I had six different versions of 
'Dark Side of the Moon' around so I could compare the subtle differences 
in the mastering and mix on each of them", I'm your guy.


--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com




Re: [PERFORM] Inserting 8MB bytea: just 25% of disk perf used?

2010-01-26 Thread Greg Smith

Scott Carey wrote:

#bgwriter_delay=200ms
#bgwriter_lru_percent=1.0
#bgwriter_lru_maxpages=5
#bgwriter_all_percent=0.333
#bgwriter_all_maxpages=5
#checkpoint_segments=3
#checkpoint_timeout=5min
#checkpoint_warning=30s



Check out this for info on these parameters
http://wiki.postgresql.org/wiki/User:Gsmith  (Is there a better link Greg?)
  


Nope.  I started working on that back when I had some hope that it was 
possible to improve the background writer in PostgreSQL 8.2 without 
completely gutting it and starting over.  The 8.3 development work 
proved that idea was mistaken, which meant historical trivia about how 
the ineffective 8.2 version worked wasn't worth cleaning up to 
presentation quality anymore.  Stuck it on my personal page on the wiki 
just so I didn't lose it and could point at it, never developed into a 
proper article.


Generally, my advice for people running 8.2 is to turn the background 
writer off altogether:


bgwriter_lru_maxpages=0
bgwriter_all_maxpages=0

Because what is there by default isn't enough to really work, and if you 
crank it up enough to do something useful it will waste a lot of 
resources.  It's possible with careful study to find a useful middle 
ground--I know Kevin Grittner accomplished that on their 8.2 install, 
and I did it once in a way that wasn't horrible--but you're unlikely to 
just get one easily.


--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com



[PERFORM] Re: [HACKERS] Re: Faster CREATE DATABASE by delaying fsync (was 8.4.1 ubuntu karmic slow createdb)

2010-01-26 Thread Greg Smith

Greg Stark wrote:

Actually before we get there could someone who demonstrated the
speedup verify that this patch still gets that same speedup?
  


Let's step back a second and get to the bottom of why some people are 
seeing this and others aren't.  The original report here suggested this 
was an ext4 issue.  As I pointed out recently on the performance list, 
the reason for that is likely that the working write-barrier support for 
ext4 means it's passing through the fsync to lying hard drives via a 
proper cache flush, which didn't happen on your typical ext3 install.  
Given that, I'd expect I could see the same issue with ext3 given a 
drive with its write cache turned off, so that's the theory I started 
trying to prove before seeing the patch operate.


What I did was create a little test program that created 5 databases and 
then dropped them:


\timing
create database a;
create database b;
create database c;
create database d;
create database e;
drop database a;
drop database b;
drop database c;
drop database d;
drop database e;

(All of the drop times were very close by the way; around 100ms, nothing 
particularly interesting there)


If I have my system's boot drive (attached to the motherboard, not on 
the caching controller) in its regular, lying mode with write cache on, 
the creates take the following times:


Time: 713.982 ms  Time: 659.890 ms  Time: 590.842 ms  Time: 675.506 ms  
Time: 645.521 ms


A second run gives similar results; seems quite repeatable for every 
test I ran so I'll just show one run of each.


If I then turn off the write-cache on the drive:

$ sudo hdparm -W 0 /dev/sdb

And repeat, these times show up instead:

Time: 6781.205 ms  Time: 6805.271 ms  Time: 6947.037 ms  Time: 6938.644 
ms  Time: 7346.838 ms


So there's the problem case reproduced, right on regular old ext3 and 
Ubuntu Jaunty:  around 7 seconds to create a database, not real impressive.


Applying the last patch you attached, with the cache on, I see this:

Time: 396.105 ms  Time: 389.984 ms  Time: 469.800 ms  Time: 386.043 ms  
Time: 441.269 ms


And if I then turn the write cache off, back to slow times, but much better:

Time: 2162.687 ms  Time: 2174.057 ms  Time: 2215.785 ms  Time: 2174.100 
ms  Time: 2190.811 ms


That makes the average times I'm seeing on my server:

HEAD     Cached:  657 ms   Uncached:  6964 ms
Patched  Cached:  417 ms   Uncached:  2183 ms

Modest speedup even with a caching drive, and a huge speedup in the case 
when you have one with slow fsync.  Looks to me like if you address 
Tom's concern about documentation and function naming, committing this 
patch will certainly deliver as promised on the performance side.  Maybe 
2 seconds is still too long for some people, but it's at least a whole 
lot better.


--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.com

