Re: [PERFORM] mysql to postgresql, performance questions

2010-03-22 Thread Merlin Moncure
On Sun, Mar 21, 2010 at 9:14 PM, Dave Crooke dcro...@gmail.com wrote:
 Note however that Oracle offers full transactionality and does in-place row
 updates. There is more than one way to do it.

There's no free lunch.  If you do MVCC you have to maintain multiple
versions of the same row.
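
A minimal illustration of that cost in SQL (table and values made up here): an UPDATE leaves the old row version behind as a dead tuple until VACUUM reclaims the space.

CREATE TABLE t (id int PRIMARY KEY, val text);
INSERT INTO t VALUES (1, 'a');
UPDATE t SET val = 'b' WHERE id = 1;   -- the old version of the row stays on the page as a dead tuple
VACUUM VERBOSE t;                      -- reports and removes the dead row version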

merlin



Re: [PERFORM] mysql to postgresql, performance questions

2010-03-22 Thread Pierre C
On Mon, 22 Mar 2010 12:14:51 +0100, Merlin Moncure mmonc...@gmail.com  
wrote:



On Sun, Mar 21, 2010 at 9:14 PM, Dave Crooke dcro...@gmail.com wrote:
Note however that Oracle offeres full transactionality and does in  
place row

updates. There is more than one way to do it.


There's no free lunch.


MVCC : VACUUM
Oracle : Rollback Segments
MyISAM : no concurrency/transactions

It's all about which compromise suits you ;)



Re: [PERFORM] too complex query plan for not exists query and multicolumn indexes

2010-03-22 Thread Matthew Wakeling

On Fri, 19 Mar 2010, Stephen Frost wrote:
...it has to go to an external on-disk sort (see later on, and how to 
fix that).


This was covered on this list a few months ago, in 
http://archives.postgresql.org/pgsql-performance/2009-08/msg00184.php and 
http://archives.postgresql.org/pgsql-performance/2009-08/msg00189.php


There seemed to be some consensus that allowing a materialise in front of 
an index scan might have been a good change. Was there any movement on 
this front?



Limit  (cost=66681.50..66681.50 rows=1 width=139) (actual
time=7413.489..7413.489 rows=1 loops=1)
  ->  Merge Anti Join  (cost=40520.17..66681.50 rows=367793 width=139)
(actual time=3705.078..7344.256 rows=101 loops=1)
        Merge Cond: ((f1.user_id = f2.ref_id) AND (f1.ref_id =
f2.user_id))
        ->  Index Scan using user_ref on friends f1
(cost=0.00..26097.86 rows=2818347 width=139) (actual
time=0.093..1222.592 rows=1917360 loops=1)
        ->  Materialize  (cost=40520.17..40555.40 rows=2818347 width=8)
(actual time=3704.977..5043.347 rows=1990148 loops=1)
              ->  Sort  (cost=40520.17..40527.21 rows=2818347 width=8)
(actual time=3704.970..4710.703 rows=1990148 loops=1)
                    Sort Key: f2.ref_id, f2.user_id
                    Sort Method:  external merge  Disk: 49576kB
                    ->  Seq Scan on friends f2  (cost=0.00..18143.18
rows=2818347 width=8) (actual time=0.015..508.797 rows=2818347 loops=1)
Total runtime: 7422.516 ms



If you had an index on ref_id,user_id (as well as the one on
user_id,ref_id), it'd probably be able to do in-order index traversals
on both and be really fast...  But then updates would be more expensive,
of course, since it'd have more indexes to maintain.


That isn't necessarily so, until the issue referred to in the above linked 
messages is resolved. It depends.
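
For reference, the second index being discussed would be something along these lines (the name is made up), alongside the existing user_ref index on (user_id, ref_id):

CREATE INDEX ref_user ON friends USING btree (ref_id, user_id);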


Matthew

--
I've run DOOM more in the last few days than I have the last few
months.  I just love debugging ;-)  -- Linus Torvalds



Re: [PERFORM] too complex query plan for not exists query and multicolumn indexes

2010-03-22 Thread Tom Lane
Matthew Wakeling matt...@flymine.org writes:
 On Fri, 19 Mar 2010, Stephen Frost wrote:
 ...it has to go to an external on-disk sort (see later on, and how to 
 fix that).

 This was covered on this list a few months ago, in 
 http://archives.postgresql.org/pgsql-performance/2009-08/msg00184.php and 
 http://archives.postgresql.org/pgsql-performance/2009-08/msg00189.php

 There seemed to be some consensus that allowing a materialise in front of 
 an index scan might have been a good change. Was there any movement on 
 this front?

Yes, 9.0 will consider plans like

 Merge Join  (cost=0.00..14328.70 rows=100 width=488)
   Merge Cond: (a.four = b.hundred)
   ->  Index Scan using fouri on tenk1 a  (cost=0.00..1635.62 rows=1 
width=244)
   ->  Materialize  (cost=0.00..1727.16 rows=1 width=244)
         ->  Index Scan using tenk1_hundred on tenk1 b  (cost=0.00..1702.16 rows
=1 width=244)

Some experimentation shows that it won't insert the materialize unless
quite a bit of re-fetching is predicted (ie neither side of the join is
unique).  We might need to tweak the cost parameters once we get some
field experience with it.
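
For anyone who wants to experiment on a 9.0 build, a sketch using the regression-test table from the plan above; enable_material is the new setting that controls this, and actual plan shapes will of course vary with the data:

EXPLAIN SELECT * FROM tenk1 a JOIN tenk1 b ON a.four = b.hundred;   -- may show the Materialize node
SET enable_material = off;
EXPLAIN SELECT * FROM tenk1 a JOIN tenk1 b ON a.four = b.hundred;   -- planner falls back to other strategies
RESET enable_material;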

regards, tom lane



Re: [PERFORM] GiST index performance

2010-03-22 Thread Matthew Wakeling

On Sat, 20 Mar 2010, Yeb Havinga wrote:

The gist virtual pages would then more closely match the original block sizes that
were used in Guttman's R-tree paper (first google result, then figure 4.5).
Since the nature/characteristics of the underlying datatypes and keys has not
changed, it might be that, with the disk pages getting larger, gist indexing
has become unexpectedly inefficient.


Yes, that is certainly a factor. For example, the page size for bioseg 
which we use here is 130 entries, which is very excessive, and doesn't 
allow very deep trees. On the other hand, it means that a single disc seek 
performs quite a lot of work.



But I am not really into the core gist code either, though I do have a motivation
to dive into it (more than 200% performance increase in Matthew's test case).
However, I'd like to check for community support before working on it.


I'd also love to dive into the core gist code, but am rather daunted by 
it. I believe that there is something there that is taking more time than 
I can account for. The indexing algorithm itself is good.


Matthew

--
The problem with defending the purity of the English language is that
English is about as pure as a cribhouse whore. We don't just borrow words;
on occasion, English has pursued other languages down alleyways to beat
them unconscious and rifle their pockets for new vocabulary. - James Nicoll



Re: [PERFORM] GiST index performance

2010-03-22 Thread Yeb Havinga

Matthew Wakeling wrote:

On Sat, 20 Mar 2010, Yeb Havinga wrote:
The gist virtual pages would then match more the original blocksizes 
that
were used in Guttman's R-tree paper (first google result, then figure 
4.5).
Since the nature/characteristics of the underlying datatypes and keys 
is not
changed, it might be that with the disk pages getting larger, gist 
indexing

has therefore become unexpectedly inefficient.


Yes, that is certainly a factor. For example, the page size for bioseg 
which we use here is 130 entries, which is very excessive, and doesn't 
allow very deep trees. On the other hand, it means that a single disc 
seek performs quite a lot of work.
Yeah, I only did in-memory fitting tests and wondered about increased 
I/Os. However, I bet that even for bigger-than-RAM databases, the benefit of 
fanning out to fewer pages still outweighs the over-general non-leaf 
nodes and might still result in fewer disk I/Os. I redid some earlier 
benchmarking with other datatypes with a 1kB block size, and also with 
multicolumn gist, and the multicolumn variant had an even greater benefit 
than the single-column indexes, for both equality and range scans 
(execution times down to 20% of the original). If gist is important to you, I 
really recommend doing a test with 1kB blocks.


regards,
Yeb Havinga



Re: [PERFORM] GiST index performance

2010-03-22 Thread Matthew Wakeling

On Mon, 22 Mar 2010, Yeb Havinga wrote:
Yes, that is certainly a factor. For example, the page size for bioseg 
which we use here is 130 entries, which is very excessive, and doesn't 
allow very deep trees. On the other hand, it means that a single disc seek 
performs quite a lot of work.


Yeah, I only did in-memory fitting tests and wondered about increased io's. 
However I bet that even for bigger than ram db's, the benefit of having to 
fan out to less pages still outweighs the over-general non leaf nodes and 
might still result in less disk io's. I redid some earlier benchmarking with 
other datatypes with a 1kB block size and also multicolumn gist and the 
multicolumn variant had an ever greater benefit than the single column 
indexes, both equality and range scans. (Like execution times down to 20% of 
original). If gist is important to you, I really recommend doing a test with 
1kB blocks.


Purely from a disc seek count point of view, assuming an infinite CPU 
speed and infinite disc transfer rate, the larger the index pages the 
better. The number of seeks per fetch will be equivalent to the depth of 
the tree.


If you take disc transfer rate into account, the break-even point is when 
you spend an equal time transferring as seeking, which places the page 
size around 500kB on a modern disc, assuming RAID stripe alignment doesn't 
make that into two seeks instead of one.
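
As a rough sanity check, assuming a drive of the era with about a 5 ms average seek and a 100 MB/s sequential transfer rate (figures assumed here, not taken from the post):

    break-even page size = seek time x transfer rate = 0.005 s x 100 MB/s = 0.5 MB, i.e. about 500 kB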


However, for efficient CPU usage, the ideal page size for a tree index is 
much smaller - between two and ten entries, depending on the type of the 
data.


There may be some mileage in reorganising indexes into a two-level system. 
That is, have an index format where the page size is 512kB or similar, but 
each page is internally a CPU-efficient tree itself.


However, this is beyond the scope of the problem of speeding up gist.

Matthew

--
If you let your happiness depend upon how somebody else feels about you,
now you have to control how somebody else feels about you. -- Abraham Hicks



Re: [PERFORM] mysql to postgresql, performance questions

2010-03-22 Thread Dave Crooke
Absolutely ...

- for fixed-size rows with a lot of small updates, Oracle wins. BTW, as of
Oracle 9 they're called UNDO tablespaces
- for lots of transactions and freely mixing transactions of all sizes, MVCC
tables (Postgres) wins
- if you just want a structured filesystem and don't have integrity
requirements or a lot of updates, MyISAM wins

For our app, Oracle would be the best, but it isn't strictly necessary so
Postgres wins on price ;-)

Cheers
Dave

On Mon, Mar 22, 2010 at 6:15 AM, Pierre C li...@peufeu.com wrote:

 On Mon, 22 Mar 2010 12:14:51 +0100, Merlin Moncure mmonc...@gmail.com
 wrote:

  On Sun, Mar 21, 2010 at 9:14 PM, Dave Crooke dcro...@gmail.com wrote:

 Note however that Oracle offeres full transactionality and does in place
 row
 updates. There is more than one way to do it.


 There's no free lunch.


 MVCC : VACUUM
 Oracle : Rollback Segments
 MyISAM : no concurrency/transactions

 It's all about which compromise suits you ;)



Re: [PERFORM] Building multiple indexes concurrently

2010-03-22 Thread Justin Pitts
It seems to me that a separate partition / tablespace would be a much simpler 
approach.
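A sketch of that approach in SQL (tablespace, path, table and column names are all hypothetical); each concurrently built index would get its own tablespace on its own partition:

CREATE TABLESPACE idx_space_1 LOCATION '/mnt/disk1/pg_idx';
CREATE INDEX CONCURRENTLY big_table_col_a_idx ON big_table (col_a) TABLESPACE idx_space_1;
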
On Mar 17, 2010, at 5:18 PM, Hannu Krosing wrote:

 On Wed, 2010-03-17 at 16:49 -0400, Greg Smith wrote:
 Alvaro Herrera wrote:
 Andres Freund escribió:
 
 
 I find it much easier to believe such issues exist on tables, in 
 contrast to indexes. The likelihood of getting sequential accesses on an 
 index is 
 small enough on a big table to make it unlikely to matter much.
 
 
 Vacuum walks indexes sequentially, for one.
 
 
 That and index-based range scans were the main two use-cases I was 
 concerned would be degraded by interleaving index builds, compared with 
 doing them in succession. 
 
 I guess that tweaking file systems to allocate in bigger chunks helps
 here? I know that xfs can be tuned in that regard, but how about other
 common file systems like ext3?
 
 - 
 Hannu Krosing   http://www.2ndQuadrant.com
 PostgreSQL Scalability and Availability 
   Services, Consulting and Training
 
 
 




Re: [PERFORM] Building multiple indexes concurrently

2010-03-22 Thread Hannu Krosing
On Thu, 2010-03-18 at 16:12 -0400, Justin Pitts wrote:
 It seems to me that a separate partition / tablespace would be a much simpler 
 approach.

Do you mean a separate partition/ tablespace for _each_ index built
concurrently ?

 On Mar 17, 2010, at 5:18 PM, Hannu Krosing wrote:
 
  On Wed, 2010-03-17 at 16:49 -0400, Greg Smith wrote:
  Alvaro Herrera wrote:
  Andres Freund escribió:
  
  
  I find it way much easier to believe such issues exist on a tables in 
  constrast to indexes. The likelihood to get sequential accesses on an 
  index is 
  small enough on a big table to make it unlikely to matter much.
  
  
  Vacuum walks indexes sequentially, for one.
  
  
  That and index-based range scans were the main two use-cases I was 
  concerned would be degraded by interleaving index builds, compared with 
  doing them in succession. 
  
  I guess that tweaking file systems to allocate in bigger chunks help
  here ? I know that xfs can be tuned in that regard, but how about other
  common file systems like ext3 ?
  
  - 
  Hannu Krosing   http://www.2ndQuadrant.com
  PostgreSQL Scalability and Availability 
Services, Consulting and Training
  
  
  
 


-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [PERFORM] Building multiple indexes concurrently

2010-03-22 Thread Justin Pitts
Yes.
On Mar 18, 2010, at 5:20 PM, Hannu Krosing wrote:

 On Thu, 2010-03-18 at 16:12 -0400, Justin Pitts wrote:
 It seems to me that a separate partition / tablespace would be a much 
 simpler approach.
 
 Do you mean a separate partition/ tablespace for _each_ index built
 concurrently ?




Re: [PERFORM] too complex query plan for not exists query and multicolumn indexes

2010-03-22 Thread Justin Graf
 Message from Corin wakath...@gmail.com at 03-19-2010 01:26:35 PM
--

***snip 
The intention of the query is to find rows with no partner row. The
offset and limit are just to ignore the time needed to send the result
to the client.
---
I don't understand the point of OFFSET; LIMIT will accomplish the same
thing. PG will still execute the query; the only difference is PG will skip
the step of counting through the first million rows before returning a record.

---

SELECT * FROM friends AS f1 WHERE NOT EXISTS (SELECT 1 FROM friends AS
f2 WHERE f1.user_id=f2.ref_id AND f1.ref_id=f2.user_id) OFFSET 100
LIMIT 1

Mysql uses this query plan:
id | select_type        | table | type  | possible_keys | key      | key_len | ref                  | rows    | Extra
 1 | PRIMARY            | f1    | index | NULL          | user_ref | 8       | NULL                 | 2818860 | Using where; Using index
 2 | DEPENDENT SUBQUERY | f2    | ref   | user_ref      | user_ref | 8       | f1.ref_id,f1.user_id | 1       | Using index
Time: 9.8s

---
If that's a query explain from MySQL, it's worthless. The above has no
information; it does not tell us how long each step takes, let alone what
the planner thought it would take to make the query work.
--

Postgres uses this query plan:
Limit  (cost=66681.50..66681.50 rows=1 width=139) (actual
time=7413.489..7413.489 rows=1 loops=1)
  ->  Merge Anti Join  (cost=40520.17..66681.50 rows=367793 width=139)
(actual time=3705.078..7344.256 rows=101 loops=1)
        Merge Cond: ((f1.user_id = f2.ref_id) AND (f1.ref_id =
f2.user_id))
        ->  Index Scan using user_ref on friends f1
(cost=0.00..26097.86 rows=2818347 width=139) (actual
time=0.093..1222.592 rows=1917360 loops=1)
        ->  Materialize  (cost=40520.17..40555.40 rows=2818347 width=8)
(actual time=3704.977..5043.347 rows=1990148 loops=1)
              ->  Sort  (cost=40520.17..40527.21 rows=2818347 width=8)
(actual time=3704.970..4710.703 rows=1990148 loops=1)
                    Sort Key: f2.ref_id, f2.user_id
                    Sort Method:  external merge  Disk: 49576kB
                    ->  Seq Scan on friends f2  (cost=0.00..18143.18
rows=2818347 width=8) (actual time=0.015..508.797 rows=2818347 loops=1)
Total runtime: 7422.516 ms

---
We can see each step PG takes and make informed decisions about which part of the
query is slow. We can see that sorting the rows takes most of the time.
---

It's already faster, which is great, but I wonder why the query plan is
that complex.


It's not complex; it is showing you all the steps, which MySQL is not showing you.


I read in the pgsql docs that using a multicolumn key is almost never
needed and only a waste of cpu/space. So I dropped the multicolumn key
and added two separate keys instead:


Where is that at???  I don't recall reading that.  PG will only use indexes
that exactly match the WHERE/JOIN conditions.



CREATE INDEX ref1 ON friends USING btree (ref_id);
CREATE INDEX user1 ON friends USING btree (user_id);

New query plan:
Limit  (cost=70345.04..70345.04 rows=1 width=139) (actual
time=43541.709..43541.709 rows=1 loops=1)
  ->  Merge Anti Join  (cost=40520.27..70345.04 rows=367793 width=139)
(actual time=3356.694..43467.818 rows=101 loops=1)
        Merge Cond: (f1.user_id = f2.ref_id)
        Join Filter: (f1.ref_id = f2.user_id)
---
Take note: the merge has changed. It now joins on f1.user_id = f2.ref_id and then
filters the results down using the AND condition. Put the index back.
---
        ->  Index Scan using user1 on friends f1  (cost=0.00..26059.79
rows=2818347 width=139) (actual time=0.031..1246.668 rows=1917365 loops=1)
        ->  Materialize  (cost=40520.17..40555.40 rows=2818347 width=8)
(actual time=3356.615..14941.405 rows=130503729 loops=1)
---
Take note of what happened here: the Materialize returns 130503729 rows because
the join is not limited as it was before.
Did you run this query against MySQL with the same kind of indexes?
---
              ->  Sort  (cost=40520.17..40527.21 rows=2818347 width=8)
(actual time=3356.611..4127.435 rows=1990160 loops=1)
                    Sort Key: f2.ref_id
                    Sort Method:  external merge  Disk: 49560kB
                    ->  Seq Scan on friends f2  (cost=0.00..18143.18
rows=2818347 width=8) (actual time=0.012..496.174 rows=2818347 loops=1)
Total runtime: 43550.187 ms

I also wonder why it makes a difference when adding a LIMIT clause to
the subselect in an EXISTS subselect. Shouldn't pgsql always stop after
finding a row? In MySQL it makes no difference in speed; pgsql even
gets slower when adding a LIMIT to the EXISTS subselect (I hoped it
would get faster?!).



LIMIT is applied last, after all the major work is done.



SELECT * FROM friends AS f1 WHERE NOT EXISTS (SELECT 1 FROM friends AS
f2 WHERE f1.user_id=f2.ref_id AND f1.ref_id=f2.user_id LIMIT 1) OFFSET
100 LIMIT 1

Limit  (cost=6389166.19..6389172.58 rows=1 width=139) (actual
time=54540.356..54540.356 rows=1 loops=1)
  ->  Seq Scan on friends f1  (cost=0.00..9003446.87 rows=1409174
width=139) (actual time=0.511..54460.006 rows=101 loops=1)
Filter: 

Re: [PERFORM] Block at a time ...

2010-03-22 Thread Scott Carey

On Mar 17, 2010, at 9:41 AM, Craig James wrote:

 On 3/17/10 2:52 AM, Greg Stark wrote:
 On Wed, Mar 17, 2010 at 7:32 AM, Pierre Cli...@peufeu.com  wrote:
 I was thinking in something like that, except that the factor I'd use
 would be something like 50% or 100% of current size, capped at (say) 1 GB.
 
 This turns out to be a bad idea. One of the first thing Oracle DBAs
 are told to do is change this default setting to allocate some
 reasonably large fixed size rather than scaling upwards.
 
 This might be mostly due to Oracle's extent-based space management but
 I'm not so sure. Recall that the filesystem is probably doing some
 rounding itself. If you allocate 120kB it's probably allocating 128kB
 itself anyways. Having two layers rounding up will result in odd
 behaviour.
 
 In any case I was planning on doing this a while back. Then I ran some
 experiments and couldn't actually demonstrate any problem. ext2 seems
 to do a perfectly reasonable job of avoiding this problem. All the
 files were mostly large contiguous blocks after running some tests --
 IIRC running pgbench.
 
 This is one of the more-or-less solved problems in Unix/Linux.  Ext* file 
 systems have a reserve usually of 10% of the disk space that nobody except 
 root can use.  It's not for root, it's because with 10% of the disk free, you 
 can almost always do a decent job of allocating contiguous blocks and get 
 good performance.  Unless Postgres has some weird problem that Linux has 
 never seen before (and that wouldn't be unprecedented...), there's probably 
 no need to fool with file-allocation strategies.
 
 Craig
 

It's fairly easy to break.  Just do a parallel import with, say, 16 concurrent 
tables being written to at once.  Result?  Fragmented tables.





[PERFORM] PostgreSQL upgraded to 8.2 but forcing index scan on query produces faster

2010-03-22 Thread Christian Brink
I previously posted 'forcing index scan on query produces 16x faster' 
and it seemed that the consensus was that 8.0.x series had an issue. I 
have upgraded to the highest practical version for our distro. But we 
seem to have the same issue.


If I force 'enable_seqscan' off, our actual time is 9ms, whereas if 
'enable_seqscan' is on the time is 2200ms (the good news is the 
Seq Scan query on 8.2 takes half the time of the 8.0 query).



The paste is below - I reloaded the table from scratch after the 8.2 
upgrade. Then I ran a 'REINDEX DATABASE' and a 'VACUUM ANALYZE' (then 
ran some queries and reran the vac analyze).




postream=> SELECT version();
 version
-
 PostgreSQL 8.2.11 on i386-redhat-linux-gnu, compiled by GCC gcc (GCC) 
4.1.2 20070925 (Red Hat 4.1.2-33)

(1 row)

postream=> SET enable_seqscan = false;
SET
postream=> EXPLAIN ANALYZE
postream-> SELECT si.group1_id as name, sum(si.qty) as count, 
sum(si.amt) as amt

postream->    FROM salesitems si, sales s, sysstrings
postream->   WHERE si.id = s.id
postream->     AND si.group1_id != ''
postream->     AND si.group1_id IS NOT NULL
postream->     AND NOT si.void
postream->     AND NOT s.void
postream->     AND NOT s.suspended
postream->     AND s.tranzdate >= (cast('2010-02-15' as date) + 
cast(sysstrings.data as time))
postream->     AND s.tranzdate < ((cast('2010-02-15' as date) + 1) + 
cast(sysstrings.data as time))

postream->     AND sysstrings.id='net/Console/Employee/Day End Time'
postream-> GROUP BY name;

QUERY PLAN
--
 HashAggregate  (cost=426973.65..426973.86 rows=14 width=35) (actual 
time=9.424..9.438 rows=12 loops=1)
   ->  Nested Loop  (cost=0.01..426245.31 rows=97113 width=35) (actual 
time=0.653..6.954 rows=894 loops=1)
         ->  Nested Loop  (cost=0.01..2416.59 rows=22477 width=4) 
(actual time=0.595..2.150 rows=225 loops=1)
               ->  Index Scan using sysstrings_pkey on sysstrings  
(cost=0.00..8.27 rows=1 width=182) (actual time=0.110..0.112 rows=1 loops=1)
                     Index Cond: (id = 'net/Console/Employee/Day End 
Time'::text)
               ->  Index Scan using sales_tranzdate_index on sales s  
(cost=0.01..1846.40 rows=22477 width=12) (actual time=0.454..1.687 
rows=225 loops=1)
                     Index Cond: ((s.tranzdate >= ('2010-02-15'::date + 
(sysstrings.data)::time without time zone)) AND (s.tranzdate < 
('2010-02-16'::date + (sysstrings.data)::time without time zone)))

                     Filter: ((NOT void) AND (NOT suspended))
         ->  Index Scan using salesitems_pkey on salesitems si  
(cost=0.00..18.54 rows=25 width=39) (actual time=0.007..0.013 rows=4 
loops=225)

               Index Cond: (si.id = s.id)
               Filter: (((group1_id)::text <> ''::text) AND (group1_id 
IS NOT NULL) AND (NOT void))

 Total runtime: 9.585 ms
(12 rows)

postream=> SET enable_seqscan = true;
SET
postream=> EXPLAIN ANALYZE
postream-> SELECT si.group1_id as name, sum(si.qty) as count, 
sum(si.amt) as amt

postream->    FROM salesitems si, sales s, sysstrings
postream->   WHERE si.id = s.id
postream->     AND si.group1_id != ''
postream->     AND si.group1_id IS NOT NULL
postream->     AND NOT si.void
postream->     AND NOT s.void
postream->     AND NOT s.suspended
postream->     AND s.tranzdate >= (cast('2010-02-15' as date) + 
cast(sysstrings.data as time))
postream->     AND s.tranzdate < ((cast('2010-02-15' as date) + 1) + 
cast(sysstrings.data as time))

postream->     AND sysstrings.id='net/Console/Employee/Day End Time'
postream-> GROUP BY name;

   QUERY PLAN

 HashAggregate  (cost=38315.09..38315.30 rows=14 width=35) (actual 
time=2206.531..2206.545 rows=12 loops=1)
   ->  Hash Join  (cost=2697.55..37586.74 rows=97113 width=35) (actual 
time=2128.070..2204.048 rows=894 loops=1)

         Hash Cond: (si.id = s.id)
         ->  Seq Scan on salesitems si  (cost=0.00..30578.15 
rows=890646 width=39) (actual time=0.047..1487.688 rows=901281 loops=1)
               Filter: (((group1_id)::text <> ''::text) AND (group1_id 
IS NOT NULL) AND (NOT void))
         ->  Hash  (cost=2416.59..2416.59 rows=22477 width=4) (actual 
time=1.823..1.823 rows=225 loops=1)
               ->  Nested Loop  (cost=0.01..2416.59 rows=22477 width=4) 
(actual time=0.477..1.592 rows=225 loops=1)
                     ->  Index Scan using 

Re: [PERFORM] PostgreSQL upgraded to 8.2 but forcing index scan on query produces faster

2010-03-22 Thread Tom Lane
Christian Brink cbr...@r-stream.com writes:
   ->  Nested Loop  (cost=0.01..2416.59 rows=22477 width=4) 
 (actual time=0.595..2.150 rows=225 loops=1)
       ->  Index Scan using sysstrings_pkey on sysstrings  
 (cost=0.00..8.27 rows=1 width=182) (actual time=0.110..0.112 rows=1 loops=1)
             Index Cond: (id = 'net/Console/Employee/Day End 
 Time'::text)
       ->  Index Scan using sales_tranzdate_index on sales s  
 (cost=0.01..1846.40 rows=22477 width=12) (actual time=0.454..1.687 
 rows=225 loops=1)
             Index Cond: ((s.tranzdate >= ('2010-02-15'::date + 
 (sysstrings.data)::time without time zone)) AND (s.tranzdate < 
 ('2010-02-16'::date + (sysstrings.data)::time without time zone)))
             Filter: ((NOT void) AND (NOT suspended))

The fundamental reason why you're getting a bad plan choice is the
factor-of-100 estimation error here.  I'm not sure you can do a whole
lot about that without rethinking the query --- in particular I would
suggest trying to get rid of the non-constant range bounds.  You're
apparently already plugging in an external variable for the date,
so maybe you could handle the time of day similarly instead of joining
to sysstrings for it.
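
A sketch of the kind of rewrite being suggested, assuming the application looks up the day-end time first and then substitutes it as a constant (the '04:00:00' value is made up):

SELECT data FROM sysstrings WHERE id = 'net/Console/Employee/Day End Time';
-- supposing that returns '04:00:00', the report query becomes:
SELECT si.group1_id AS name, sum(si.qty) AS count, sum(si.amt) AS amt
  FROM salesitems si, sales s
 WHERE si.id = s.id
   AND si.group1_id != '' AND si.group1_id IS NOT NULL
   AND NOT si.void AND NOT s.void AND NOT s.suspended
   AND s.tranzdate >= (date '2010-02-15' + time '04:00:00')
   AND s.tranzdate <  (date '2010-02-16' + time '04:00:00')
 GROUP BY si.group1_id;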

regards, tom lane



Re: [PERFORM] Block at a time ...

2010-03-22 Thread Pierre C


This is one of the more-or-less solved problems in Unix/Linux.  Ext*  
file systems have a reserve usually of 10% of the disk space that  
nobody except root can use.  It's not for root, it's because with 10%  
of the disk free, you can almost always do a decent job of allocating  
contiguous blocks and get good performance.  Unless Postgres has some  
weird problem that Linux has never seen before (and that wouldn't be  
unprecedented...), there's probably no need to fool with  
file-allocation strategies.


Craig


Its fairly easy to break.  Just do a parallel import with say, 16  
concurrent tables being written to at once.  Result?  Fragmented tables.


Delayed allocation (ext4, XFS) helps a lot for concurrent writing at a  
medium-high rate (a few megabytes per second and up) when lots of data can  
sit in the cache and be flushed/allocated as big contiguous chunks. I'm  
pretty sure ext4/XFS would pass your parallel import test.


However if you have files like tables (and indexes) or logs that grow  
slowly over time (something like a few megabytes per hour or less), after  
a few days/weeks/months, horrible fragmentation is an almost guaranteed  
result on many filesystems (NTFS being perhaps the absolute worst).





Re: [PERFORM] Block at a time ...

2010-03-22 Thread Dave Crooke
This is why pre-allocation is a good idea if you have the space 

Tom, what about a really simple command in a forthcoming release of PG that
would just preformat a 1GB file at a time? This is what I've always done
scripted with Oracle (ALTER TABLESPACE foo ADD DATAFILE ) rather than
relying on its autoextender when performance has been a concern.

Cheers
Dave

On Mon, Mar 22, 2010 at 3:55 PM, Pierre C li...@peufeu.com wrote:


  This is one of the more-or-less solved problems in Unix/Linux.  Ext* file
 systems have a reserve usually of 10% of the disk space that nobody except
 root can use.  It's not for root, it's because with 10% of the disk free,
 you can almost always do a decent job of allocating contiguous blocks and
 get good performance.  Unless Postgres has some weird problem that Linux has
 never seen before (and that wouldn't be unprecedented...), there's probably
 no need to fool with file-allocation strategies.

 Craig


 Its fairly easy to break.  Just do a parallel import with say, 16
 concurrent tables being written to at once.  Result?  Fragmented tables.


 Delayed allocation (ext4, XFS) helps a lot for concurrent writing at a
 medium-high rate (a few megabytes per second and up) when lots of data can
 sit in the cache and be flushed/allocated as big contiguous chunks. I'm
 pretty sure ext4/XFS would pass your parallel import test.

 However if you have files like tables (and indexes) or logs that grow
 slowly over time (something like a few megabytes per hour or less), after a
 few days/weeks/months, horrible fragmentation is an almost guaranteed result
 on many filesystems (NTFS being perhaps the absolute worst).






Re: [PERFORM] Block at a time ...

2010-03-22 Thread Greg Stark
On Mon, Mar 22, 2010 at 6:47 PM, Scott Carey sc...@richrelevance.com wrote:
 Its fairly easy to break.  Just do a parallel import with say, 16 concurrent 
 tables being written to at once.  Result?  Fragmented tables.


FWIW I did do some investigation about this at one point and could not
demonstrate any significant fragmentation. But that was on Linux --
different filesystem implementations would have different success
rates. And there could be other factors as well, such as how full the
filesystem is or how old it is.

-- 
greg



Re: [PERFORM] default_statistics_target

2010-03-22 Thread Carlo Stonebanks

HI Greg,


Thanks for the insight. How much more of a server's resources will be 
consumed by an ANALYZE with default_statistics_target = 100?


We have two environments hosting the same data. One is our live server, 
which serves the web site, and this hosts our published data, not more than 
200 - 300 tables.


PRODUCTION: The data warehouse consisting of our published data, as well as 
our input resources, which are transformed via ETL processes into our 
published data. It is these input resources which currently consist of 
about 8,000 tables and growing. They don't really require analysis, as they 
are typically read linearly during import, and rarely more than once. They 
are kept for auditing and rollbacks.


LIVE: Hosts just the published data, copied over from the production server. 
Because the data does not get written to very often, older stats from 
ANALYZE are likely to still be valid. Our concern is that with the older 
setting of default_statistics_target = 10 it has not gone deep enough into 
these tables (numbering in the millions of rows) to really represent the data 
distribution properly.


Given that it looks like you're running 8.3 from past messages I've seen 
from you, I'd also be concerned that you've overrun your max_fsm_pages, so 
that VACUUM is growing increasingly ineffective for you, and that's 
contributing to your headache.
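
One quick way to check that on 8.3 (not something from the original message): a database-wide VACUUM VERBOSE prints free-space-map usage in its final INFO lines and warns if max_fsm_pages or max_fsm_relations looks too low.

VACUUM VERBOSE;   -- the last few lines report how many FSM page slots and relations are needed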


Below are the config values of our production server (those not listed are 
those stubbed out) . Sadly, in an attempt to improve the server's 
performance, someone wiped out all of the changes I had made to date, along 
with comments indicating previous values, reason for the change, etc. What 
do they call that again? Oh, yeah. Documentation.


# CENTOS 5.4
# Linux mdx_octo 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 
x86_64 x86_64 GNU/Linux

# pgsql 8.3.10, 8 CPUs, 48GB RAM
# RAID 10, 4 Disks
autovacuum = on   # Enable autovacuum subprocess?  'on'
autovacuum_analyze_scale_factor = 0.05 # fraction of table size before 
analyze

autovacuum_analyze_threshold = 1000
autovacuum_naptime = 1min  # time between autovacuum runs
autovacuum_vacuum_cost_delay =  50 # default vacuum cost delay for
autovacuum_vacuum_scale_factor = 0.2 # fraction of table size before vacuum
autovacuum_vacuum_threshold = 1000
bgwriter_lru_maxpages = 100  # 0-1000 max buffers written/round
checkpoint_segments = 128 # in logfile segments, min 1, 16MB each
checkpoint_warning = 290s  # 0 is off
client_min_messages =  debug1 # values in order of decreasing detail:
datestyle = 'iso, mdy'
default_statistics_target = 250 # range 1-1000
default_text_search_config = 'pg_catalog.english'
lc_messages = 'C'   # locale for system error message
lc_monetary = 'C'   # locale for monetary formatting
lc_numeric = 'C'   # locale for number formatting
lc_time = 'C'# locale for time formatting
listen_addresses = '*'  # what IP address(es) to listen on;
log_destination = 'stderr'  # Valid values are combinations of
log_error_verbosity =  verbose  # terse, default, or verbose messages
log_line_prefix = '%t '   # special values:
log_min_error_statement =  debug1 # values in order of decreasing detail:
log_min_messages = debug1  # values in order of decreasing detail:
logging_collector = on  # Enable capturing of stderr and csvlog
maintenance_work_mem = 256MB
max_connections = 100   # (change requires restart)
max_fsm_relations = 1000  # min 100, ~70 bytes each
max_locks_per_transaction = 128  # min 10
port = 5432# (change requires restart)
shared_buffers = 4096MB
shared_preload_libraries = '$libdir/plugins/plugin_debugger.so'  # (change 
requires restart)

track_counts = on
vacuum_cost_delay = 5   # 0-1000 milliseconds
wal_buffers = 4MB
wal_sync_method = open_sync
work_mem = 64MB

Carlo


Greg Smith g...@2ndquadrant.com wrote in message 
news:4b9e33af.2020...@2ndquadrant.com...

Carlo Stonebanks wrote:
The whole topic of messing with stats makes my head spin but I am 
concerned about some horridly performing queries that have had bad rows 
estimates and others which always choose seq scans when indexes are 
available. Reading up on how to improve planner estimates, I have seen 
references to default_statistics_target being changed from the default of 
10 to 100.


Our DB is large, with thousands of tables


Stop right there for a second.  Are you sure autovacuum is working well 
here?  With thousands of tables, it wouldn't surprise me to discover your 
planner estimates are wrong because there hasn't been a recent enough 
ANALYZE on the relevant tables.  If you haven't already, take a look at 
pg_stat_user_tables and make sure that tables that have the bad estimates 
have actually been analyzed recently.  A look at the live/dead row counts 
there should be helpful as well.
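
A query for that check, using columns that exist in 8.3's pg_stat_user_tables:

SELECT relname, last_analyze, last_autoanalyze, n_live_tup, n_dead_tup
  FROM pg_stat_user_tables
 ORDER BY n_dead_tup DESC;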


If all that's recent, but you're still getting bad estimates, only then 
would I suggest trying an increase to default_statistics_target.  In the 
situation where 

[PERFORM] Got that new server, now it's time for config!

2010-03-22 Thread Carlo Stonebanks

Here we go again!

Based on recommendations made here, I got my client to migrate off of our 
Windows 2003 Server x64 box to a new Linux box.


# CENTOS 5.4
# Linux mdx_octo 2.6.18-164.el5 #1 SMP Thu Sep 3 03:28:30 EDT 2009 x86_64 
x86_64 x86_64 GNU/Linux

# pgsql 8.3.10, 8 CPUs, 48GB RAM
# RAID 10, 4 Disks

Below are the config values of this production server (those not listed are 
those stubbed out) . Sadly, in an attempt to improve the server's 
performance, someone wiped out all of the changes I had made to date, along 
with comments indicating previous values, reason for the change, etc.


This is a data warehouse production server, used for ETL. 500 GB database, 
approx 8000 tables and growing, although the vast majority of them are the 
original import resource tables and are rarely accessed. The actual core 
data is about 200 tables, consisting of millions of rows. Data importing and 
content management is done via a 15,000-line TCL import script and 
application base (as this is ETL with fuzzy logic, not just COPY... FROM...).


So, we have the hardware, we have the O/S - but I think our config leaves 
much to be desired. Typically, our planner makes bad decisions, picking seq 
scan over index scan, where index scan has a better result.


Can anyone see any obvious faults?

Carlo

autovacuum = on
autovacuum_analyze_scale_factor = 0.05
autovacuum_analyze_threshold = 1000
autovacuum_naptime = 1min
autovacuum_vacuum_cost_delay =  50
autovacuum_vacuum_scale_factor = 0.2
autovacuum_vacuum_threshold = 1000
bgwriter_lru_maxpages = 100
checkpoint_segments = 128
checkpoint_warning = 290s
client_min_messages =  debug1
datestyle = 'iso, mdy'
default_statistics_target = 250
default_text_search_config = 'pg_catalog.english'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
listen_addresses = '*'
log_destination = 'stderr'
log_error_verbosity =  verbose
log_line_prefix = '%t '
log_min_error_statement =  debug1
log_min_messages = debug1
logging_collector = on
maintenance_work_mem = 256MB
max_connections = 100
max_fsm_relations = 1000
max_locks_per_transaction = 128
port = 5432
shared_buffers = 4096MB
shared_preload_libraries = '$libdir/plugins/plugin_debugger.so'
track_counts = on
vacuum_cost_delay = 5
wal_buffers = 4MB
wal_sync_method = open_sync
work_mem = 64MB 





Re: [PERFORM] PostgreSQL upgraded to 8.2 but forcing index scan on query produces faster

2010-03-22 Thread Eger, Patrick
Not to beat a dead horse excessively, but I think the below is a pretty
good argument for index hints? I know the general optimizer wants to be
highest priority (I very much agree with this), but I think there are
fully legitimate cases like the below. Asking the user to rewrite the
query in an unnatural way (or to change optimizer params that may work
for 99% of queries) is, IMO not a good thing. Given that the postgres
optimizer can never be perfect (as it will never have the perfect
knowledge necessary for a perfect decision), I would request that index
hints be reconsidered (for 9.0?). I know many users (myself included)
are doing this in a very rudimentary way by disabling particular access
types on a per session basis set enable_seqscan=off; set
enable_hashjoin=off; QUERY set enable_seqscan=on; set
enable_hashjoin=on;... I'd hack up a patch if I had the time at least
=)
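
Spelled out, that per-session workaround looks something like this; using SET LOCAL keeps the change scoped to a single transaction, which is a little safer than plain SET:

BEGIN;
SET LOCAL enable_seqscan = off;
SET LOCAL enable_hashjoin = off;
-- run the query that needs the index plan here
COMMIT;   -- the SET LOCAL values revert automatically at transaction end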

Best regards, Patrick

-Original Message-
From: pgsql-performance-ow...@postgresql.org
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Tom Lane
Sent: Monday, March 22, 2010 12:22 PM
To: Christian Brink
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] PostgreSQL upgraded to 8.2 but forcing index scan
on query produces faster 

Christian Brink cbr...@r-stream.com writes:
   ->  Nested Loop  (cost=0.01..2416.59 rows=22477 width=4) 
 (actual time=0.595..2.150 rows=225 loops=1)
       ->  Index Scan using sysstrings_pkey on sysstrings  
 (cost=0.00..8.27 rows=1 width=182) (actual time=0.110..0.112 rows=1
loops=1)
             Index Cond: (id = 'net/Console/Employee/Day End 
 Time'::text)
       ->  Index Scan using sales_tranzdate_index on sales s

 (cost=0.01..1846.40 rows=22477 width=12) (actual time=0.454..1.687 
 rows=225 loops=1)
             Index Cond: ((s.tranzdate >= ('2010-02-15'::date
+ 
 (sysstrings.data)::time without time zone)) AND (s.tranzdate < 
 ('2010-02-16'::date + (sysstrings.data)::time without time zone)))
             Filter: ((NOT void) AND (NOT suspended))

The fundamental reason why you're getting a bad plan choice is the
factor-of-100 estimation error here.  I'm not sure you can do a whole
lot about that without rethinking the query --- in particular I would
suggest trying to get rid of the non-constant range bounds.  You're
apparently already plugging in an external variable for the date,
so maybe you could handle the time of day similarly instead of joining
to sysstrings for it.

regards, tom lane




Re: [PERFORM] Got that new server, now it's time for config!

2010-03-22 Thread Dan Harris

On 3/22/10 4:36 PM, Carlo Stonebanks wrote:

Here we go again!

Can anyone see any obvious faults?

Carlo

maintenance_work_mem = 256MB
I'm not sure how large your individual tables are, but you might want to 
bump this value up to get faster vacuums.

max_fsm_relations = 1000

I think this will definitely need to be increased

work_mem = 64MB
Most data warehousing loads I can think of will need more work_mem, but 
this depends on how large the data sets you are planning to sort are.





Re: [PERFORM] Block at a time ...

2010-03-22 Thread Craig James

On 3/22/10 11:47 AM, Scott Carey wrote:


On Mar 17, 2010, at 9:41 AM, Craig James wrote:


On 3/17/10 2:52 AM, Greg Stark wrote:

On Wed, Mar 17, 2010 at 7:32 AM, Pierre Cli...@peufeu.com   wrote:

I was thinking in something like that, except that the factor I'd use
would be something like 50% or 100% of current size, capped at (say) 1 GB.


This turns out to be a bad idea. One of the first thing Oracle DBAs
are told to do is change this default setting to allocate some
reasonably large fixed size rather than scaling upwards.

This might be mostly due to Oracle's extent-based space management but
I'm not so sure. Recall that the filesystem is probably doing some
rounding itself. If you allocate 120kB it's probably allocating 128kB
itself anyways. Having two layers rounding up will result in odd
behaviour.

In any case I was planning on doing this a while back. Then I ran some
experiments and couldn't actually demonstrate any problem. ext2 seems
to do a perfectly reasonable job of avoiding this problem. All the
files were mostly large contiguous blocks after running some tests --
IIRC running pgbench.


This is one of the more-or-less solved problems in Unix/Linux.  Ext* file systems have a 
reserve usually of 10% of the disk space that nobody except root can use.  
It's not for root, it's because with 10% of the disk free, you can almost always do a 
decent job of allocating contiguous blocks and get good performance.  Unless Postgres has 
some weird problem that Linux has never seen before (and that wouldn't be 
unprecedented...), there's probably no need to fool with file-allocation strategies.

Craig



Its fairly easy to break.  Just do a parallel import with say, 16 concurrent 
tables being written to at once.  Result?  Fragmented tables.


Is this from real-life experience?  With fragmentation, there's a point of diminishing 
return.  A couple head-seeks now and then hardly matter.  My recollection is that even 
when there are lots of concurrent processes running that are all making files larger and 
larger, the Linux file system still can do a pretty good job of allocating 
mostly-contiguous space.  It doesn't just dumbly allocate from some list, but rather 
tries to allocate in a way that results in pretty good contiguousness (if 
that's a word).

On the other hand, this is just from reading discussion groups like this one 
over the last few decades, I haven't tried it...

Craig



Re: [PERFORM] Got that new server, now it's time for config!

2010-03-22 Thread Greg Smith

Carlo Stonebanks wrote:
So, we have the hardware, we have the O/S - but I think our config 
leaves much to be desired. Typically, our planner makes bad decisions, 
picking seq scan over index scan, where index scan has a better result.




You're not setting effective_cache_size, so I wouldn't expect it to ever 
choose an index scan given the size of your data set.  The planner 
thinks that anything bigger than 128MB isn't likely to fit in RAM by 
default, which favors sequential scans.  That parameter should probably 
be 24GB on your server, so it's off by more than two orders of magnitude.



wal_sync_method = open_sync


This is a scary setting to be playing with on Linux when using ext3 
filesystems due to general kernel bugginess in this area.  See 
http://archives.postgresql.org/pgsql-hackers/2007-10/msg01310.php for an 
example.  I wouldn't change this from the default in your position if 
using that filesystem.


I'd drastically increase effective_cache_size, put wal_sync_method back 
to the default, and then see how things go for a bit before tweaking 
anything else.  Nothing else jumped out as bad in your configuration 
besides the extremely high logging levels, though I haven't looked at it 
that carefully yet.
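
A sketch of trying the main change per session before touching postgresql.conf (the 24GB figure is the estimate above; wal_sync_method itself can only be changed in the config file plus a reload):

SHOW effective_cache_size;            -- confirm the current (default) value
SET effective_cache_size = '24GB';    -- try the suggested value for this session only
-- re-run the problem queries under EXPLAIN to see whether index scans are now chosen;
-- if they are, make the change permanent in postgresql.conf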


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us

