Re: [PERFORM] sql-bench

2006-09-13 Thread yoav x
You can run the test with InnoDB by giving the --create-options=engine=innodb
option on the command line. Even with InnoDB, PostgreSQL looks very bad compared
to MySQL in some specific tests.

--- Tom Lane <[EMAIL PROTECTED]> wrote:

> yoav x <[EMAIL PROTECTED]> writes:
> > Are there any tuning parameters that can be changed to speed these
> > queries? Or are these queries especially tuned to show MySQL's
> > strengths?
> 
> The latter.  I've ranted about this before --- there are both obvious
> and subtle biases in that benchmark.  The last time I spent any time
> with it, I ended up testing with these nondefault settings:
> 
> shared_buffers = 1  
> work_mem = 10   
> maintenance_work_mem = 10   
> fsync = false   
> checkpoint_segments = 30
> max_locks_per_transaction = 128
> 
> (fsync = false is pretty bogus for production purposes, but if you're
> comparing to mysql using myisam tables, I think it's a reasonably fair
> basis for comparison, as myisam is certainly not crash-safe.  It'd be
> interesting to see what mysql's performance looks like on this test
> using innodb tables, which should be compared against fsync = true
> ... but I don't know how to change it to get all the tables to be
> innodb.)
> 
> Also, on some of the tests it makes a material difference whether you
> are using C locale or some other one --- C is faster.  And make sure you
> have a recent version of DBD::Pg --- a year or two back I recall seeing
> the perl test program eating more CPU than the backend in some of these
> tests, because of inefficiencies in DBD::Pg.
> 
> IIRC, with these settings PG 8.0 seemed to be about half the speed of
> mysql 5.0 w/myisam, which is probably somewhere in the ballpark of the
> truth for tests of this nature, ie, single query stream of fairly simple
> queries.  If you try concurrent-update scenarios or something that
> stresses planning ability you may arrive at different results though.
> I have not retested with more recent versions.
> 
>   regards, tom lane
> 


Re: [PERFORM] sql-bench

2006-09-13 Thread Merlin Moncure

On 9/14/06, Scott Marlowe <[EMAIL PROTECTED]> wrote:

On Wed, 2006-09-13 at 14:36, Merlin Moncure wrote:



> another small aside, I caught the sqlite people actually *detuning*
> postgresql for performance by turning stats_command_string=on in
> postgresql.conf.  The way it was portrayed it almost looked like
> cheating.  I busted them on it (go to
> http://www.sqlite.org/cvstrac/wiki?p=SpeedComparison and look for the
> remarks right below the results)

They're running autovacuum, which requires that, doesn't it?


Actually, you are right, it was row_level, not command_string (I got
it right on their wiki, just not in the email here)... my bad on that.
Still, they did not disclose it.

merlin



Re: [PERFORM] Performance With Joins on Large Tables

2006-09-13 Thread Tom Lane
"Joshua Marsh" <[EMAIL PROTECTED]> writes:
>>> On 9/13/06, Tom Lane <[EMAIL PROTECTED]> wrote:
 Are the tables perhaps nearly in order by the dsiacctno fields?
>> 
>>> My assumption would be they are in exact order.  The text file I used
>>> in the COPY statement had them in order, so if COPY preserves that in
>>> the database, then it is in order.
>> 
>> Ah.  So the question is why the planner isn't noticing that.  What do
>> you see in the pg_stats view for the two dsiacctno fields --- the
>> correlation field in particular?

> Here are the results:
> data=# select tablename, attname, n_distinct, avg_width, correlation
> from pg_stats where tablename in ('view_505', 'r3s169') and attname =
> 'dsiacctno';
>  tablename |  attname  | n_distinct | avg_width | correlation
> -----------+-----------+------------+-----------+-------------
>  view_505  | dsiacctno |         -1 |        13 |    -0.13912
>  r3s169    | dsiacctno |      44156 |        13 |   -0.126824
> (2 rows)

Wow, that correlation value is *way* away from order.  If they were
really in exact order by dsiacctno then I'd expect to see 1.0 in
that column.  Can you take another look at the tables and confirm
the ordering?  Does the correlation change if you do an ANALYZE on the
tables?  (Some small change is to be expected due to random sampling,
but this is way off.)

regards, tom lane
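
(For reference, re-checking after a fresh ANALYZE would look roughly like this --
a sketch using the table and column names from above:)

ANALYZE view_505;
ANALYZE r3s169;
SELECT tablename, attname, correlation
FROM pg_stats
WHERE tablename IN ('view_505', 'r3s169')
  AND attname = 'dsiacctno';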



[PERFORM] Unsubscribe

2006-09-13 Thread Jamal Ghaffour






Re: [PERFORM] Performance With Joins on Large Tables

2006-09-13 Thread Joshua Marsh

On 9/13/06, Tom Lane <[EMAIL PROTECTED]> wrote:

"Joshua Marsh" <[EMAIL PROTECTED]> writes:
>> Are the tables perhaps nearly in order by the dsiacctno fields?

> My assumption would be they are in exact order.  The text file I used
> in the COPY statement had them in order, so if COPY preserves that in
> the database, then it is in order.

Ah.  So the question is why the planner isn't noticing that.  What do
you see in the pg_stats view for the two dsiacctno fields --- the
correlation field in particular?



Here are the results:
data=# select tablename, attname, n_distinct, avg_width, correlation
from pg_stats where tablename in ('view_505', 'r3s169') and attname =
'dsiacctno';
 tablename |  attname  | n_distinct | avg_width | correlation
-----------+-----------+------------+-----------+-------------
 view_505  | dsiacctno |         -1 |        13 |    -0.13912
 r3s169    | dsiacctno |      44156 |        13 |   -0.126824
(2 rows)


Someone suggested CLUSTER to make sure they are in fact ordered; I can
try that too and let everyone know the results.


> The system has 8GB of ram and work_mem is set to 256MB.

Seems reasonable enough.  BTW, I don't think you've mentioned exactly
which PG version you're using?

   regards, tom lane



I am using 8.0.3.



Re: [PERFORM] sql-bench

2006-09-13 Thread Tom Lane
Scott Marlowe <[EMAIL PROTECTED]> writes:
> On Wed, 2006-09-13 at 14:36, Merlin Moncure wrote:
>> another small aside, I caught the sqlite people actually *detuning*
>> postgresql for performance by turning stats_command_string=on in
>> postgresql.conf.

> They're running autovacuum, which requires that, doesn't it?

No, you're thinking of stats_row_level.

regards, tom lane



Re: [PERFORM] Performance With Joins on Large Tables

2006-09-13 Thread Tom Lane
"Joshua Marsh" <[EMAIL PROTECTED]> writes:
>> Are the tables perhaps nearly in order by the dsiacctno fields?

> My assumption would be they are in exact order.  The text file I used
> in the COPY statement had them in order, so if COPY preserves that in
> the database, then it is in order.

Ah.  So the question is why the planner isn't noticing that.  What do
you see in the pg_stats view for the two dsiacctno fields --- the
correlation field in particular?

> The system has 8GB of ram and work_mem is set to 256MB.

Seems reasonable enough.  BTW, I don't think you've mentioned exactly
which PG version you're using?

regards, tom lane



Re: [PERFORM] Poor performance on seq scan

2006-09-13 Thread Ivan Voras
[EMAIL PROTECTED] wrote:

> This board has Intel chipset. I cannot remember the exact type but it
> was not in the low end category.
> dmesg says:
> 
> 
> kernel: ad4: 152626MB  at ata2-master SATA150
> kernel: ad4: 152627MB  at ata3-master SATA150

There have been reported problems with ICH7 on FreeBSD mailing lists,
though I can't find any that affect performance.

> Components: 2
> Balance: round-robin
> Slice: 4096

See if changing the balance algorithm to "split" and the slice size to 8192 or
more, while keeping vfs.read_max at 16 or more, helps your performance.

(e.g. gmirror configure -b split -s 8192 gm0)

Also, how is your file system mounted? (what does output from 'mount' say?)



Re: [PERFORM] Performance With Joins on Large Tables

2006-09-13 Thread Joshua Marsh

Are the tables perhaps nearly in order by the dsiacctno fields?
If that were the case, and the planner were missing it for some reason,
these results would be plausible.

BTW, what are you using for work_mem, and how does that compare to your
available RAM?

   regards, tom lane



My assumption would be they are in exact order.  The text file I used
in the COPY statement had them in order, so if COPY preserves that in
the database, then it is in order.

The system has 8GB of ram and work_mem is set to 256MB.

I'll see if I can't make time to run the sort-seqscan method so we can
have an exact time to work with.



Re: [PERFORM] sql-bench

2006-09-13 Thread Scott Marlowe
On Wed, 2006-09-13 at 14:36, Merlin Moncure wrote:
> On 9/13/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> > IIRC, with these settings PG 8.0 seemed to be about half the speed of
> > mysql 5.0 w/myisam, which is probably somewhere in the ballpark of the
> > truth for tests of this nature, ie, single query stream of fairly simple
> > queries.  If you try concurrent-update scenarios or something that
> > stresses planning ability you may arrive at different results though.
> > I have not retested with more recent versions.
> 
> if postgresql uses prepared statements for such queries, it will
> roughly tie mysql/myisam in raw query output on this type of load,
> which also happens to be very easy to prepare... afaik mysql gets zero
> performance benefit from preparing statements.  This is extremely
> trivial to test and confirm even in a shell script. [aside: will this
> still be the case if peter e's planner changes become reality?]
> 
> another cheater trick benchmarkers do to disparage postgresql is to
> not run analyze intentionally.  Basically all production postgresql
> systems of any size will run analyze on cron.
> 
> another small aside, I caught the sqlite people actually *detuning*
> postgresql for performance by turning stats_command_string=on in
> postgresql.conf.  The way it was portrayed it almost looked like
> cheating.  I busted them on it (go to
> http://www.sqlite.org/cvstrac/wiki?p=SpeedComparison and look for the
> remarks right below the results)

They're running autovacuum, which requires that, doesn't it?

I'd rather they be running autovacuum than not vacuuming / analyzing at
all.  And autovacuum is a pretty realistic setting for most databases (I
use it on my production machines).



Re: [PERFORM] Performance With Joins on Large Tables

2006-09-13 Thread Tom Lane
"Joshua Marsh" <[EMAIL PROTECTED]> writes:
> I have a suspicion that pgsql isn't tuned to properly deal with tables
> of this size.

Actually, it is.  Most of the planner complaints we get are from people
whose tables fit in memory and they find that the default planner
behavior doesn't apply real well to that case.  I find your
indexscan-is-faster-than-sort results pretty suspicious for large
tables.  Are the tables perhaps nearly in order by the dsiacctno fields?
If that were the case, and the planner were missing it for some reason,
these results would be plausible.

BTW, what are you using for work_mem, and how does that compare to your
available RAM?

regards, tom lane



Re: [PERFORM] Unsubscribe

2006-09-13 Thread Geoffrey

http://www.postgresql.org/community/lists/
http://www.postgresql.org/community/lists/subscribe

--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
 - Benjamin Franklin



Re: [PERFORM] sql-bench

2006-09-13 Thread Tom Lane
"Merlin Moncure" <[EMAIL PROTECTED]> writes:
> another small aside, I caught the sqlite people actually *detuning*
> postgresql for performance by turning stats_command_string=on in
> postgresql.conf.

Hm, well, that's not unreasonable if a comparable facility is enabled
in the other databases they're testing ... but it'll hardly matter in
8.2 anyway ;-)

regards, tom lane



Re: [PERFORM] Poor performance on seq scan

2006-09-13 Thread Luke Lonergan

Laszlo,

>> Thank you for your suggestions. Looks like I need to buy SCSI disks.
> 
> Well before you go do that try the areca SATA raid card

Yes, by all means spend $200 and buy the Areca or 3Ware RAID card - it's a
simple switch out of the cables and you should be golden.

Again - you should only expect a 4-6x increase in performance over what
you are getting now unless you increase the number of disks.

- Luke





Re: [PERFORM] Performance With Joins on Large Tables

2006-09-13 Thread Joshua Marsh

On 9/13/06, Terje Elde <[EMAIL PROTECTED]> wrote:

Jeff Davis wrote:
> Is it overestimating the cost of using indexes or underestimating the
> cost of a seq scan, or both? Maybe explain with the 0.1 setting will
> help?
>

If enable_seqscan is off, and cost is still set to 1, it could
be that it's quite simply forcibly underestimating the cost of a seqscan
in this case.

If enable_seqscan was off for the mentioned plan, it'd be interesting to
see if things would be saner with seqscans enabled, and a more
reasonable random page cost.  If more 'sane' values still produce the
desired plan, it might be better for other plans etc.

Terje




I turned enable_seqscan to off and got similar results.

random_page_cost at 4.0:
data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER
JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
 QUERY PLAN
---
Merge Join  (cost=293737539.01..301430139.34 rows=285230272 width=11)
  Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
  ->  Sort  (cost=127311593.00..127592577.62 rows=112393848 width=20)
Sort Key: v.dsiacctno
->  Seq Scan on view_505 v  (cost=1.00..104602114.48
rows=112393848 width=20)
  ->  Sort  (cost=166425946.01..167139021.69 rows=285230272 width=17)
Sort Key: s.dsiacctno
->  Seq Scan on r3s169 s  (cost=1.00..106873675.72
rows=285230272 width=17)
(8 rows)



random_page_cost at 3.0:
data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER
JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
 QUERY PLAN
---
Merge Join  (cost=288303269.01..295995869.34 rows=285230272 width=11)
  Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
  ->  Sort  (cost=125775957.00..126056941.62 rows=112393848 width=20)
Sort Key: v.dsiacctno
->  Seq Scan on view_505 v  (cost=1.00..104602114.48
rows=112393848 width=20)
  ->  Sort  (cost=162527312.01..163240387.69 rows=285230272 width=17)
Sort Key: s.dsiacctno
->  Seq Scan on r3s169 s  (cost=1.00..106873675.72
rows=285230272 width=17)
(8 rows)



random_page_cost at 2.0:
data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER
JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
 QUERY PLAN
---
Merge Join  (cost=282868999.01..290561599.34 rows=285230272 width=11)
  Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
  ->  Sort  (cost=124240321.00..124521305.62 rows=112393848 width=20)
Sort Key: v.dsiacctno
->  Seq Scan on view_505 v  (cost=1.00..104602114.48
rows=112393848 width=20)
  ->  Sort  (cost=158628678.01..159341753.69 rows=285230272 width=17)
Sort Key: s.dsiacctno
->  Seq Scan on r3s169 s  (cost=1.00..106873675.72
rows=285230272 width=17)
(8 rows)



random_page_cost at 1.0:
data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER
JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
QUERY PLAN

Merge Join  (cost=154730044.01..274040257.41 rows=285230272 width=11)
  Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
  ->  Index Scan using view_505_dsiacctno on view_505 v
(cost=0.00..111923570.63 rows=112393848 width=20)
  ->  Sort  (cost=154730044.01..155443119.69 rows=285230272 width=17)
Sort Key: s.dsiacctno
->  Seq Scan on r3s169 s  (cost=1.00..106873675.72
rows=285230272 width=17)
(6 rows)



random_page_cost at 0.1:
data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER
JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
   QUERY PLAN
---
Merge Join  (cost=0.00..51808909.26 rows=285230272 width=11)
  Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
  ->  Index Scan using view_505_dsiacctno on view_505 v
(cost=0.00..12755411.69 rows=112393848 width=20)
  ->  Index Scan using r3s169_dsiacctno on r3s169 s
(cost=0.00..32357747.90 rows=285230272 width=17)
(4 rows)

I have a suspicion that pgsql isn't tuned to properly deal with tables
of this size.  Are there other things I should look at when dealing
with a database of this size?

Re: [PERFORM] Performance With Joins on Large Tables

2006-09-13 Thread Terje Elde

Jeff Davis wrote:

Is it overestimating the cost of using indexes or underestimating the
cost of a seq scan, or both? Maybe explain with the 0.1 setting will
help?
  


If enable_seqscan is off, and cost is still set to 1, it could 
be that it's quite simply forcibly underestimating the cost of a seqscan 
in this case.


If enable_seqscan was off for the mentioned plan, it'd be interesting to
see if things would be saner with seqscans enabled, and a more 
reasonable random page cost.  If more 'sane' values still produce the 
desired plan, it might be better for other plans etc.


Terje




Re: [PERFORM] sql-bench

2006-09-13 Thread Merlin Moncure

On 9/13/06, Tom Lane <[EMAIL PROTECTED]> wrote:

IIRC, with these settings PG 8.0 seemed to be about half the speed of
mysql 5.0 w/myisam, which is probably somewhere in the ballpark of the
truth for tests of this nature, ie, single query stream of fairly simple
queries.  If you try concurrent-update scenarios or something that
stresses planning ability you may arrive at different results though.
I have not retested with more recent versions.


if postgresql uses prepared statements for such queries, it will
roughly tie mysql/myisam in raw query output on this type of load,
which also happens to be very easy to prepare... afaik mysql gets zero
performance benefit from preparing statements.  This is extremely
trivial to test and confirm even in a shell script. [aside: will this
still be the case if peter e's planner changes become reality?]

another cheater trick benchmarkers do to disparage postgresql is to
not run analyze intentionally.  Basically all production postgresql
systems of any size will run analyze on cron.

another small aside, I caught the sqlite people actually *detuning*
postgresql for performance by turning stats_command_string=on in
postgresql.conf.  The way it was portrayed it almost looked like
cheating.  I busted them on it (go to
http://www.sqlite.org/cvstrac/wiki?p=SpeedComparison and look for the
remarks right below the results)

merlin
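
(A trivial way to see the prepared-statement effect from psql -- a sketch only;
"bench1" is a stand-in name for whatever table the benchmark creates, not taken
from sql-bench itself:)

PREPARE q (int) AS
  SELECT * FROM bench1 WHERE id = $1;   -- planned once
EXECUTE q(42);                          -- re-executed without re-planning
EXECUTE q(43);
DEALLOCATE q;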



Re: [PERFORM] Performance With Joins on Large Tables

2006-09-13 Thread Joshua Marsh

Hmm... that sounds bad. I'm sure your system will always choose indexes
with that value.

Is it overestimating the cost of using indexes or underestimating the
cost of a seq scan, or both? Maybe explain with the 0.1 setting will
help?

Regards,
   Jeff Davis


data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER
JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
   QUERY PLAN
---
Merge Join  (cost=0.00..51808909.26 rows=285230272 width=11)
  Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
  ->  Index Scan using view_505_dsiacctno on view_505 v
(cost=0.00..12755411.69 rows=112393848 width=20)
  ->  Index Scan using r3s169_dsiacctno on r3s169 s
(cost=0.00..32357747.90 rows=285230272 width=17)
(4 rows)

This is what I wanted, two index scans.  Just to give you an idea of
the difference in time, this plan would allow me to process 100,000
records every few seconds, while the sequential scan would only
produce 100,000 every 10 minutes.



Re: [PERFORM] Query Progress (was: Performance With Joins on Large Tables)

2006-09-13 Thread Joshua Marsh

On 9/13/06, Bucky Jordan <[EMAIL PROTECTED]> wrote:


Setting to 0.1 finally gave me the result I was looking for. I know
that the index scan is faster though.  The seq scan never finished (i
killed it after 24+ hours) and I'm running the query now with indexes
and it's progressing nicely (will probably take 4 hours).


In regards to "progressing nicely (will probably take 4 hours)" - is
this just an estimate or is there some way to get progress status (or
something similar- e.g. on step 6 of 20 planned steps) on a query in pg?
I looked through Chap 24, Monitoring DB Activity, but most of that looks
like aggregate stats. Trying to relate these to a particular query
doesn't really seem feasible.

This would be useful in the case where you have a couple of long running
transactions or stored procedures doing analysis and you'd like to give
the user some feedback where you're at.

Thanks,

Bucky



I do it programmatically, not through postgresql.  I'm using a cursor,
so I can keep track of how many records I've handled.  I'm not aware
of a way to do this in Postgresql.
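
(Roughly what that looks like -- an illustrative sketch, not Joshua's actual
code; the query is abbreviated from the plans earlier in the thread:)

BEGIN;
DECLARE cur CURSOR FOR
  SELECT v.phonedirect, v.editdrop
  FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno = s.dsiacctno;
FETCH 100000 FROM cur;   -- repeat; counting the batches client-side gives a progress figure
CLOSE cur;
COMMIT;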



Re: [PERFORM] Performance With Joins on Large Tables

2006-09-13 Thread Marcin Mank
> Is there anything I'm missing that is preventing it from using the index?
> It just seems weird to me that other joins like this work fine and fast
> with indexes, but this one won't.


Did you consider clustering both tables on the dsiacctno index?

I just checked that for a 4M row table, even with enable_seqscan=on and
default *page_cost, on PG 8.1.4 an index scan is being chosen for
select * from table order by serial_pkey_field


This is essentially the question in your case - sort it, or get it sorted
via the index at the expense of more random IO.

I think clustering should work for you, but I am no expert, check with
others.

Greetings
Marcin
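
(In 8.0/8.1 syntax that would look roughly like this -- a sketch, assuming the
index names shown in the plans above:)

CLUSTER view_505_dsiacctno ON view_505;
CLUSTER r3s169_dsiacctno ON r3s169;
ANALYZE view_505;   -- re-gather stats so the planner sees the new correlation
ANALYZE r3s169;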




[PERFORM] Query Progress (was: Performance With Joins on Large Tables)

2006-09-13 Thread Bucky Jordan

Setting to 0.1 finally gave me the result I was looking for. I know
that the index scan is faster though.  The seq scan never finished (i
killed it after 24+ hours) and I'm running the query now with indexes
and it's progressing nicely (will probably take 4 hours).


In regards to "progressing nicely (will probably take 4 hours)" - is
this just an estimate or is there some way to get progress status (or
something similar- e.g. on step 6 of 20 planned steps) on a query in pg?
I looked through Chap 24, Monitoring DB Activity, but most of that looks
like aggregate stats. Trying to relate these to a particular query
doesn't really seem feasible.

This would be useful in the case where you have a couple of long running
transactions or stored procedures doing analysis and you'd like to give
the user some feedback where you're at. 

Thanks,

Bucky



[PERFORM] Unsubscribe

2006-09-13 Thread Christoph Nelles




Re: [PERFORM] Performance With Joins on Large Tables

2006-09-13 Thread Jeff Davis
On Wed, 2006-09-13 at 10:19 -0600, Joshua Marsh wrote:
> > Right, it's just used for planning. Avoid setting it too low, if it's
> > below about 2.0 you would most likely see some very strange plans.
> > Certainly it doesn't make sense at all to set it below 1.0, since that
> > is saying it's cheaper to get a random page than a sequential one.
> >
> > What was your original random_page_cost, and what is the new value you
> > set it to?
> >
> > Regards,
> >Jeff Davis
> >
> >
> >
> >
> 
> I tried it at several levels.  It was initially at 4 (the default).  I
> tried 3 and 2 with no changes.  When I set it to 1, it used an index
> on view_505 but not on r3s169:
> 
> data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
> v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH
> FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno =
> s.dsiacctno;
>  QUERY PLAN
> 
>  Merge Join  (cost=154730044.01..278318711.49 rows=285230272 width=11)
>Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
>->  Index Scan using view_505_dsiacctno on view_505 v
> (cost=0.00..111923570.63 rows=112393848 width=20)
>->  Sort  (cost=154730044.01..155443119.69 rows=285230272 width=17)
>  Sort Key: s.dsiacctno
>  ->  Seq Scan on r3s169 s  (cost=1.00..106873675.72
> rows=285230272 width=17)
> 
> 
> Setting to 0.1 finally gave me the result I was looking for. I know
> that the index scan is faster though.  The seq scan never finished (i
> killed it after 24+ hours) and I'm running the query now with indexes
> and it's progressing nicely (will probably take 4 hours).

Hmm... that sounds bad. I'm sure your system will always choose indexes
with that value.

Is it overestimating the cost of using indexes or underestimating the
cost of a seq scan, or both? Maybe explain with the 0.1 setting will
help?

Regards,
Jeff Davis






Re: [PERFORM] Performance With Joins on Large Tables

2006-09-13 Thread Joshua Marsh

On 9/13/06, Jeff Davis <[EMAIL PROTECTED]> wrote:

On Wed, 2006-09-13 at 08:49 -0600, Joshua Marsh wrote:
> That seems to have done it.  Are there any side effects to this
> change?  I read about random_page_cost in the documentation and it
> seems like this is strictly for planning.  All the tables on this
> database will be indexed and of a size similar to these two, so I
> don't see it  causing any other problems.  Though I would check though
> :)
>

Right, it's just used for planning. Avoid setting it too low, if it's
below about 2.0 you would most likely see some very strange plans.
Certainly it doesn't make sense at all to set it below 1.0, since that
is saying it's cheaper to get a random page than a sequential one.

What was your original random_page_cost, and what is the new value you
set it to?

Regards,
   Jeff Davis






I tried it at several levels.  It was initially at 4 (the default).  I
tried 3 and 2 with no changes.  When I set it to 1, it used an index
on view_505 but not on r3s169:

data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH
FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno =
s.dsiacctno;
QUERY PLAN

Merge Join  (cost=154730044.01..278318711.49 rows=285230272 width=11)
  Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
  ->  Index Scan using view_505_dsiacctno on view_505 v
(cost=0.00..111923570.63 rows=112393848 width=20)
  ->  Sort  (cost=154730044.01..155443119.69 rows=285230272 width=17)
Sort Key: s.dsiacctno
->  Seq Scan on r3s169 s  (cost=1.00..106873675.72
rows=285230272 width=17)


Setting to 0.1 finally gave me the result I was looking for. I know
that the index scan is faster though.  The seq scan never finished (i
killed it after 24+ hours) and I'm running the query now with indexes
and it's progressing nicely (will probably take 4 hours).



Re: [PERFORM] Performance With Joins on Large Tables

2006-09-13 Thread Jeff Davis
On Wed, 2006-09-13 at 08:49 -0600, Joshua Marsh wrote:
> That seems to have done it.  Are there any side effects to this
> change?  I read about random_page_cost in the documentation and it
> seems like this is strictly for planning.  All the tables on this
> database will be indexed and of a size similar to these two, so I
> don't see it  causing any other problems.  Though I would check though
> :)
> 

Right, it's just used for planning. Avoid setting it too low, if it's
below about 2.0 you would most likely see some very strange plans.
Certainly it doesn't make sense at all to set it below 1.0, since that
is saying it's cheaper to get a random page than a sequential one.

What was your original random_page_cost, and what is the new value you
set it to?

Regards,
Jeff Davis
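
(For experimenting, the value can be changed per session before touching
postgresql.conf -- a sketch, with the query abbreviated from the one discussed
in this thread:)

SET random_page_cost = 2.0;   -- planning-only; affects this session
EXPLAIN
SELECT v.phonedirect, v.editdrop
FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno = s.dsiacctno;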






Re: [PERFORM] sql-bench

2006-09-13 Thread Tom Lane
yoav x <[EMAIL PROTECTED]> writes:
> Are there any tuning parameters that can be changed to speed these
> queries? Or are these queries especially tuned to show MySQL's
> strengths?

The latter.  I've ranted about this before --- there are both obvious
and subtle biases in that benchmark.  The last time I spent any time
with it, I ended up testing with these nondefault settings:

shared_buffers = 1  
work_mem = 10   
maintenance_work_mem = 10   
fsync = false   
checkpoint_segments = 30
max_locks_per_transaction = 128

(fsync = false is pretty bogus for production purposes, but if you're
comparing to mysql using myisam tables, I think it's a reasonably fair
basis for comparison, as myisam is certainly not crash-safe.  It'd be
interesting to see what mysql's performance looks like on this test
using innodb tables, which should be compared against fsync = true
... but I don't know how to change it to get all the tables to be
innodb.)

Also, on some of the tests it makes a material difference whether you
are using C locale or some other one --- C is faster.  And make sure you
have a recent version of DBD::Pg --- a year or two back I recall seeing
the perl test program eating more CPU than the backend in some of these
tests, because of inefficiencies in DBD::Pg.
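
(The locale a database was initialized with can be checked from psql, which is
relevant to the C-locale point above:)

SHOW lc_collate;
SHOW lc_ctype;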

IIRC, with these settings PG 8.0 seemed to be about half the speed of
mysql 5.0 w/myisam, which is probably somewhere in the ballpark of the
truth for tests of this nature, ie, single query stream of fairly simple
queries.  If you try concurrent-update scenarios or something that
stresses planning ability you may arrive at different results though.
I have not retested with more recent versions.

regards, tom lane



Re: [PERFORM] Performance With Joins on Large Tables

2006-09-13 Thread Joshua Marsh

On 9/13/06, Jim C. Nasby <[EMAIL PROTECTED]> wrote:

On Tue, Sep 12, 2006 at 04:17:34PM -0600, Joshua Marsh wrote:
> data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
> v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH
> FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno =
> s.dsiacctno;
>  QUERY PLAN
> 
---
> Merge Join  (cost=293767607.69..305744319.52 rows=285392608 width=11)
>   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
>   ->  Sort  (cost=127304933.87..127585815.71 rows=112352736 width=20)
> Sort Key: v.dsiacctno
> ->  Seq Scan on view_505 v  (cost=1.00..104604059.36
> rows=112352736 width=20)
>   ->  Sort  (cost=166462673.82..167176155.34 rows=285392608 width=17)
> Sort Key: s.dsiacctno
> ->  Seq Scan on r3s169 s  (cost=1.00..106875334.08
> rows=285392608 width=17)
> (8 rows)
>
>
> Since enable_seqscan is off, my understanding is that in order for the query
> planner to use a sequential scan it must think there is no other
> alternative.
> Both sides are indexed and anaylzed, so that confuses me a little.
>
> I tried it on a smaller sample set of the data and it works fine:

Actually, enable_seqscan=off just adds a fixed overhead to the seqscan
cost estimate. That's why the cost for the seqscans in that plan starts
at 100000000. I've suggested changing that to a variable overhead based
on the expected rowcount, but the counter-argument was that anyone with
so much data that the fixed amount wouldn't work would most likely be
having bigger issues anyway.

Other things you can try to get the index scan back would be to reduce
random_page_cost and to analyze the join fields in those tables with a
higher statistics target (though I'm not 100% certain the join cost
estimator actually takes that into account). Or if you don't mind
patching your source code, it wouldn't be difficult to make
enable_seqscan use a bigger 'penalty value' than 100000000.
--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



Thanks for the tip. I lowered random_page_cost and got these results:

data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH
FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno =
s.dsiacctno;
   QUERY PLAN
--
Merge Join  (cost=0.00..20921221.49 rows=285230272 width=11)
  Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
  ->  Index Scan using view_505_dsiacctno on view_505 v
(cost=0.00..2838595.79 rows=112393848 width=20)
  ->  Index Scan using r3s169_dsiacctno on r3s169 s
(cost=0.00..7106203.68 rows=285230272 width=17)
(4 rows)

That seems to have done it.  Are there any side effects to this
change?  I read about random_page_cost in the documentation and it
seems like this is strictly for planning.  All the tables on this
database will be indexed and of a size similar to these two, so I
don't see it causing any other problems.  Thought I would check though
:)
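
(Jim's other suggestion -- raising the statistics target on the join columns --
would look roughly like this; a sketch only:)

ALTER TABLE view_505 ALTER COLUMN dsiacctno SET STATISTICS 1000;
ALTER TABLE r3s169 ALTER COLUMN dsiacctno SET STATISTICS 1000;
ANALYZE view_505;
ANALYZE r3s169;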



Re: [PERFORM] sql-bench

2006-09-13 Thread Mark Lewis
The last I checked (years ago), sql-bench was very synthetic (i.e.
reflecting no realistic use case).  It's the sort of test suite that's
useful for database developers when testing the effects of a particular
code change or optimization, but not so applicable to real-world uses.  

Historically the test was also bad for PG because it did nasty things
like 10,000 inserts each in separate transactions because the test was
written for MySQL which at the time didn't support transactions.  Not
sure if that's been fixed yet or not.
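
(On the PostgreSQL side the usual fix is simply to batch such a load in one
transaction -- an illustrative sketch with a made-up table name:)

BEGIN;
INSERT INTO bench1 VALUES (1, 'a');
INSERT INTO bench1 VALUES (2, 'b');
-- ... remaining rows ...
COMMIT;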

Can you provide details about the schema and the queries that are slow?

-- Mark

On Wed, 2006-09-13 at 05:24 -0700, yoav x wrote:
> Hi
> 
> I am trying to run sql-bench against PostgreSQL 8.1.4 on Linux.
> Some of the insert tests seem to be very slow
> 
> For example: select_join_in
> 
> Are there any tuning parameters that can be changed to speed these queries?
> Or are these queries
> especially tuned to show MySQL's strengths?
> 
> 
> 
> 



Re: [PERFORM] sql-bench

2006-09-13 Thread Dave Cramer
First of all you are going to have to show us what these queries are
exactly, what machine you are running on (CPU, memory, and disk), and
how you have tuned it.


Slow is a relative term... we need information to determine what
"slow" means.


Dave
On 13-Sep-06, at 8:50 AM, yoav x wrote:


So why are these queries so slow in PG?


--- Dave Cramer <[EMAIL PROTECTED]> wrote:


All of the tuning parameters would affect all queries

shared buffers, wal buffers, effective cache, to name a few

--dc--
On 13-Sep-06, at 8:24 AM, yoav x wrote:


Hi

I am trying to run sql-bench against PostgreSQL 8.1.4 on Linux.
Some of the insert tests seem to be very slow

For example: select_join_in

Are there any tuning parameters that can be changed to speed these
queries? Or are these queries
especially tuned to show MySQL's strengths?






Re: [PERFORM] sql-bench

2006-09-13 Thread yoav x
So why are these queries so slow in PG?


--- Dave Cramer <[EMAIL PROTECTED]> wrote:

> All of the tuning parameters would affect all queries
> 
> shared buffers, wal buffers, effective cache, to name a few
> 
> --dc--
> On 13-Sep-06, at 8:24 AM, yoav x wrote:
> 
> > Hi
> >
> > I am trying to run sql-bench against PostgreSQL 8.1.4 on Linux.
> > Some of the insert tests seem to be very slow
> >
> > For example: select_join_in
> >
> > Are there any tuning parameters that can be changed to speed these
> > queries? Or are these queries
> > especially tuned to show MySQL's strengths?
> >
> >
> >
> >


Re: [PERFORM] sql-bench

2006-09-13 Thread Dave Cramer

All of the tuning parameters would affect all queries

shared buffers, wal buffers, effective cache, to name a few

--dc--
On 13-Sep-06, at 8:24 AM, yoav x wrote:


Hi

I am trying to run sql-bench against PostgreSQL 8.1.4 on Linux.
Some of the insert tests seem to be very slow

For example: select_join_in

Are there any tuning parameters that can be changed to speed these
queries? Or are these queries
especially tuned to show MySQL's strengths?






[PERFORM] sql-bench

2006-09-13 Thread yoav x
Hi

I am trying to run sql-bench against PostgreSQL 8.1.4 on Linux.
Some of the insert tests seem to be very slow

For example: select_join_in

Are there any tuning parameters that can be changed to speed these queries? Or
are these queries
especially tuned to show MySQL's strengths?






Re: [PERFORM] Poor performance on seq scan

2006-09-13 Thread Dave Cramer


On 13-Sep-06, at 6:16 AM, Laszlo Nagy wrote:



I have had extremely bad performance historically with onboard  
SATA chipsets
on Linux.  The one exception has been with the Intel based  
chipsets (not the

CPU, the I/O chipset).

This board has Intel chipset. I cannot remember the exact type but  
it was not in the low end category.

dmesg says:


kernel: ad4: 152626MB  at ata2-master  
SATA150
kernel: ad4: 152627MB  at ata3-master  
SATA150


It is very likely that you are having problems with the driver for  
the

chipset.

Are you running RAID1 in hardware?  If so, turn it off and see  
what the

performance is.  The onboard hardware RAID is worse than useless, it
actually slows the I/O down.


I'm using software raid, namely gmirror:

GEOM_MIRROR: Device gm0 created (id=2574033628).
GEOM_MIRROR: Device gm0: provider ad4 detected.
GEOM_MIRROR: Device gm0: provider ad6 detected.
GEOM_MIRROR: Device gm0: provider ad4 activated.
GEOM_MIRROR: Device gm0: provider ad6 activated.

#gmirror list
Geom name: gm0
State: COMPLETE
Components: 2
Balance: round-robin
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 2574033628
Providers:
1. Name: mirror/gm0
  Mediasize: 160040803328 (149G)
  Sectorsize: 512
  Mode: r5w5e6
Consumers:
1. Name: ad4
  Mediasize: 160040803840 (149G)
  Sectorsize: 512
  Mode: r1w1e1
  State: ACTIVE
  Priority: 0
  Flags: DIRTY
  GenID: 0
  SyncID: 1
  ID: 1153981856
2. Name: ad6
  Mediasize: 160041885696 (149G)
  Sectorsize: 512
  Mode: r1w1e1
  State: ACTIVE
  Priority: 0
  Flags: DIRTY
  GenID: 0
  SyncID: 1
  ID: 3520427571


I tried to do:

#sysctl vfs.read_max=32
vfs.read_max: 6 -> 32

but I could not reach better disk read performance.

Thank you for your suggestions. Looks like I need to buy SCSI disks.


Well before you go do that try the areca SATA raid card


Regards,

  Laszlo




Re: [PERFORM] Poor performance on seq scan

2006-09-13 Thread Laszlo Nagy



I have had extremely bad performance historically with onboard SATA chipsets
on Linux.  The one exception has been with the Intel based chipsets (not the
CPU, the I/O chipset).
  
This board has Intel chipset. I cannot remember the exact type but it 
was not in the low end category.

dmesg says:


kernel: ad4: 152626MB  at ata2-master SATA150
kernel: ad4: 152627MB  at ata3-master SATA150


It is very likely that you are having problems with the driver for the
chipset.

Are you running RAID1 in hardware?  If so, turn it off and see what the
performance is.  The onboard hardware RAID is worse than useless, it
actually slows the I/O down.
  

I'm using software raid, namely gmirror:

GEOM_MIRROR: Device gm0 created (id=2574033628).
GEOM_MIRROR: Device gm0: provider ad4 detected.
GEOM_MIRROR: Device gm0: provider ad6 detected.
GEOM_MIRROR: Device gm0: provider ad4 activated.
GEOM_MIRROR: Device gm0: provider ad6 activated.

#gmirror list
Geom name: gm0
State: COMPLETE
Components: 2
Balance: round-robin
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 2574033628
Providers:
1. Name: mirror/gm0
  Mediasize: 160040803328 (149G)
  Sectorsize: 512
  Mode: r5w5e6
Consumers:
1. Name: ad4
  Mediasize: 160040803840 (149G)
  Sectorsize: 512
  Mode: r1w1e1
  State: ACTIVE
  Priority: 0
  Flags: DIRTY
  GenID: 0
  SyncID: 1
  ID: 1153981856
2. Name: ad6
  Mediasize: 160041885696 (149G)
  Sectorsize: 512
  Mode: r1w1e1
  State: ACTIVE
  Priority: 0
  Flags: DIRTY
  GenID: 0
  SyncID: 1
  ID: 3520427571


I tried to do:

#sysctl vfs.read_max=32
vfs.read_max: 6 -> 32

but I could not reach better disk read performance.

Thank you for your suggestions. Looks like I need to buy SCSI disks.

Regards,

  Laszlo

