Re: [PERFORM] sql-bench
You can use the test with InnoDB by giving the --create-options=engine=innodb option on the command line. Even with InnoDB, in some specific tests PG looks very bad compared to InnoDB.

--- Tom Lane <[EMAIL PROTECTED]> wrote:
> It'd be interesting to see what mysql's performance looks like on this
> test using innodb tables, which should be compared against fsync = true
> ... but I don't know how to change it to get all the tables to be
> innodb.
> [remainder of quoted message trimmed]
Re: [PERFORM] sql-bench
On 9/14/06, Scott Marlowe <[EMAIL PROTECTED]> wrote:
> On Wed, 2006-09-13 at 14:36, Merlin Moncure wrote:
> > another small aside, I caught the sqlite people actually *detuning*
> > postgresql for performance by turning stats_command_string=on in
> > postgresql.conf. The way it was portrayed it almost looked like
> > cheating. I busted them on it (go to
> > http://www.sqlite.org/cvstrac/wiki?p=SpeedComparison and look for the
> > remarks right below the results)
>
> They're running autovacuum, which requires that, doesn't it?

Actually, you are right: it was stats_row_level, not stats_command_string
(I got it right on their wiki, just not in the email here). My bad on
that. Still, they did not disclose it.

merlin
Re: [PERFORM] Performance With Joins on Large Tables
"Joshua Marsh" <[EMAIL PROTECTED]> writes:
>>>> Are the tables perhaps nearly in order by the dsiacctno fields?
>>> My assumption would be they are in exact order. The text file I used
>>> in the COPY statement had them in order, so if COPY preserves that in
>>> the database, then it is in order.
>> Ah. So the question is why the planner isn't noticing that. What do
>> you see in the pg_stats view for the two dsiacctno fields --- the
>> correlation field in particular?
> Here are the results:
> data=# select tablename, attname, n_distinct, avg_width, correlation
> from pg_stats where tablename in ('view_505', 'r3s169') and attname =
> 'dsiacctno';
>  tablename | attname   | n_distinct | avg_width | correlation
> -----------+-----------+------------+-----------+-------------
>  view_505  | dsiacctno |         -1 |        13 |    -0.13912
>  r3s169    | dsiacctno |      44156 |        13 |   -0.126824
> (2 rows)

Wow, that correlation value is *way* away from order. If they were
really in exact order by dsiacctno then I'd expect to see 1.0 in that
column. Can you take another look at the tables and confirm the
ordering? Does the correlation change if you do an ANALYZE on the
tables? (Some small change is to be expected due to random sampling,
but this is way off.)

			regards, tom lane
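Tom's point about the correlation column can be illustrated outside the database. The sketch below is plain Python and my own simplification (PostgreSQL computes the statistic from a random sample of rows, not the full table): it correlates physical row position with value order, so fully ordered data comes out at 1.0 while a random physical order lands near zero — which is why a reported -0.13 is hard to square with "exact order".

```python
import random

def pearson(xs, ys):
    # plain Pearson correlation coefficient
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def order_correlation(values):
    # correlation between a row's physical position and the sorted
    # rank of its value -- a rough stand-in for pg_stats.correlation
    rank = {v: i for i, v in enumerate(sorted(values))}
    return pearson(list(range(len(values))), [rank[v] for v in values])

ordered = list(range(1000))
random.seed(42)
shuffled = random.sample(ordered, len(ordered))

print(round(order_correlation(ordered), 3))   # perfectly ordered data: 1.0
print(round(order_correlation(shuffled), 3))  # random physical order: near 0
```

A value like -0.13 means the table's physical order carries essentially no information about dsiacctno order, so the planner prices an index scan as mostly random I/O.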
Re: [PERFORM] Performance With Joins on Large Tables
On 9/13/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> "Joshua Marsh" <[EMAIL PROTECTED]> writes:
> >> Are the tables perhaps nearly in order by the dsiacctno fields?
> > My assumption would be they are in exact order. The text file I used
> > in the COPY statement had them in order, so if COPY preserves that in
> > the database, then it is in order.
>
> Ah. So the question is why the planner isn't noticing that. What do
> you see in the pg_stats view for the two dsiacctno fields --- the
> correlation field in particular?

Here are the results:

data=# select tablename, attname, n_distinct, avg_width, correlation
from pg_stats where tablename in ('view_505', 'r3s169') and attname =
'dsiacctno';
 tablename | attname   | n_distinct | avg_width | correlation
-----------+-----------+------------+-----------+-------------
 view_505  | dsiacctno |         -1 |        13 |    -0.13912
 r3s169    | dsiacctno |      44156 |        13 |   -0.126824
(2 rows)

Someone suggested CLUSTER to make sure they are in fact ordered; I can
try that too and let everyone know the results.

> > The system has 8GB of ram and work_mem is set to 256MB.
>
> Seems reasonable enough. BTW, I don't think you've mentioned exactly
> which PG version you're using?

I am using 8.0.3.
Re: [PERFORM] sql-bench
Scott Marlowe <[EMAIL PROTECTED]> writes:
> On Wed, 2006-09-13 at 14:36, Merlin Moncure wrote:
>> another small aside, I caught the sqlite people actually *detuning*
>> postgresql for performance by turning stats_command_string=on in
>> postgresql.conf.

> They're running autovacuum, which requires that, doesn't it?

No, you're thinking of stats_row_level.

			regards, tom lane
Re: [PERFORM] Performance With Joins on Large Tables
"Joshua Marsh" <[EMAIL PROTECTED]> writes:
>> Are the tables perhaps nearly in order by the dsiacctno fields?
> My assumption would be they are in exact order. The text file I used
> in the COPY statement had them in order, so if COPY preserves that in
> the database, then it is in order.

Ah. So the question is why the planner isn't noticing that. What do
you see in the pg_stats view for the two dsiacctno fields --- the
correlation field in particular?

> The system has 8GB of ram and work_mem is set to 256MB.

Seems reasonable enough. BTW, I don't think you've mentioned exactly
which PG version you're using?

			regards, tom lane
Re: [PERFORM] Poor performance on seq scan
[EMAIL PROTECTED] wrote:
> This board has Intel chipset. I cannot remember the exact type but it
> was not in the low end category.
> dmesg says:
>
> kernel: ad4: 152626MB at ata2-master SATA150
> kernel: ad4: 152627MB at ata3-master SATA150

There have been reported problems with ICH7 on FreeBSD mailing lists,
though I can't find any that affect performance.

> Components: 2
> Balance: round-robin
> Slice: 4096

See if changing the balance algorithm to "split" and the slice size to
8192 or more, while keeping vfs.read_max at 16 or more, helps your
performance (e.g. gmirror configure -b split -s 8192 gm0).

Also, how is your file system mounted? (What does the output from
'mount' say?)
Re: [PERFORM] Performance With Joins on Large Tables
> Are the tables perhaps nearly in order by the dsiacctno fields? If
> that were the case, and the planner were missing it for some reason,
> these results would be plausible.
>
> BTW, what are you using for work_mem, and how does that compare to
> your available RAM?
>
> 			regards, tom lane

My assumption would be they are in exact order. The text file I used
in the COPY statement had them in order, so if COPY preserves that in
the database, then it is in order.

The system has 8GB of ram and work_mem is set to 256MB.

I'll see if I can't make time to run the sort-seqscan method so we can
have an exact time to work with.
Re: [PERFORM] sql-bench
On Wed, 2006-09-13 at 14:36, Merlin Moncure wrote:
> On 9/13/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> > IIRC, with these settings PG 8.0 seemed to be about half the speed of
> > mysql 5.0 w/myisam, which is probably somewhere in the ballpark of the
> > truth for tests of this nature, ie, single query stream of fairly
> > simple queries. If you try concurrent-update scenarios or something
> > that stresses planning ability you may arrive at different results
> > though. I have not retested with more recent versions.
>
> if postgresql uses prepared statements for such queries, it will
> roughly tie mysql/myisam in raw query output on this type of load,
> which also happens to be very easy to prepare... afaik mysql gets zero
> performance benefit from preparing statements. This is extremely
> trivial to test & confirm even in a shell script. [aside: will this
> still be the case if peter e's planner changes become reality?]
>
> another cheater trick benchmarkers do to disparage postgresql is to
> intentionally not run analyze. Basically all production postgresql
> systems of any size will run analyze on cron.
>
> another small aside, I caught the sqlite people actually *detuning*
> postgresql for performance by turning stats_command_string=on in
> postgresql.conf. The way it was portrayed it almost looked like
> cheating. I busted them on it (go to
> http://www.sqlite.org/cvstrac/wiki?p=SpeedComparison and look for the
> remarks right below the results)

They're running autovacuum, which requires that, doesn't it?

I'd rather have them running autovacuum than not vacuuming / analyzing
at all. And autovacuum is a pretty realistic setting for most
databases (I use it on my production machines.)
Re: [PERFORM] Performance With Joins on Large Tables
"Joshua Marsh" <[EMAIL PROTECTED]> writes:
> I have a suspicion that pgsql isn't tuned to properly deal with
> tables of this size.

Actually, it is. Most of the planner complaints we get are from people
whose tables fit in memory and who find that the default planner
behavior doesn't apply real well to that case.

I find your indexscan-is-faster-than-sort results pretty suspicious
for large tables. Are the tables perhaps nearly in order by the
dsiacctno fields? If that were the case, and the planner were missing
it for some reason, these results would be plausible.

BTW, what are you using for work_mem, and how does that compare to
your available RAM?

			regards, tom lane
Re: [PERFORM] Unsubscribe
http://www.postgresql.org/community/lists/
http://www.postgresql.org/community/lists/subscribe

--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
 - Benjamin Franklin
Re: [PERFORM] sql-bench
"Merlin Moncure" <[EMAIL PROTECTED]> writes:
> another small aside, I caught the sqlite people actually *detuning*
> postgresql for performance by turning stats_command_string=on in
> postgresql.conf.

Hm, well, that's not unreasonable if a comparable facility is enabled
in the other databases they're testing ... but it'll hardly matter in
8.2 anyway ;-)

			regards, tom lane
Re: [PERFORM] Poor performance on seq scan
Lazlo,

>> Thank you for your suggestions. Looks like I need to buy SCSI disks.
>
> Well before you go do that try the areca SATA raid card

Yes, by all means spend $200 and buy the Areca or 3Ware RAID card -
it's a simple switch-out of the cables and you should be golden.

Again, you should only expect a 4-6x increase in performance over what
you are getting now unless you increase the number of disks.

- Luke
Re: [PERFORM] Performance With Joins on Large Tables
On 9/13/06, Terje Elde <[EMAIL PROTECTED]> wrote:
> Jeff Davis wrote:
> > Is it overestimating the cost of using indexes or underestimating the
> > cost of a seq scan, or both? Maybe explain with the 0.1 setting will
> > help?
>
> If enable_seqscan is off, and cost is still set to 1, it could be that
> it's quite simply forcibly underestimating the cost of a seqscan in
> this case.
>
> If enable_seqscan was off for the mentioned plan, it'd be interesting
> to see if things would be saner with seqscans enabled, and a more
> reasonable random page cost. If more 'sane' values still produce the
> desired plan, it might be better for other plans etc.
>
> Terje

I turned enable_seqscan off and got similar results.

random_page_cost at 4.0:

data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER
JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Merge Join  (cost=293737539.01..301430139.34 rows=285230272 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Sort  (cost=127311593.00..127592577.62 rows=112393848 width=20)
         Sort Key: v.dsiacctno
         ->  Seq Scan on view_505 v  (cost=1.00..104602114.48 rows=112393848 width=20)
   ->  Sort  (cost=166425946.01..167139021.69 rows=285230272 width=17)
         Sort Key: s.dsiacctno
         ->  Seq Scan on r3s169 s  (cost=1.00..106873675.72 rows=285230272 width=17)
(8 rows)

random_page_cost at 3.0:

data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER
JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Merge Join  (cost=288303269.01..295995869.34 rows=285230272 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Sort  (cost=125775957.00..126056941.62 rows=112393848 width=20)
         Sort Key: v.dsiacctno
         ->  Seq Scan on view_505 v  (cost=1.00..104602114.48 rows=112393848 width=20)
   ->  Sort  (cost=162527312.01..163240387.69 rows=285230272 width=17)
         Sort Key: s.dsiacctno
         ->  Seq Scan on r3s169 s  (cost=1.00..106873675.72 rows=285230272 width=17)
(8 rows)

random_page_cost at 2.0:

data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER
JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Merge Join  (cost=282868999.01..290561599.34 rows=285230272 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Sort  (cost=124240321.00..124521305.62 rows=112393848 width=20)
         Sort Key: v.dsiacctno
         ->  Seq Scan on view_505 v  (cost=1.00..104602114.48 rows=112393848 width=20)
   ->  Sort  (cost=158628678.01..159341753.69 rows=285230272 width=17)
         Sort Key: s.dsiacctno
         ->  Seq Scan on r3s169 s  (cost=1.00..106873675.72 rows=285230272 width=17)
(8 rows)

random_page_cost at 1.0:

data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER
JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Merge Join  (cost=154730044.01..274040257.41 rows=285230272 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Index Scan using view_505_dsiacctno on view_505 v  (cost=0.00..111923570.63 rows=112393848 width=20)
   ->  Sort  (cost=154730044.01..155443119.69 rows=285230272 width=17)
         Sort Key: s.dsiacctno
         ->  Seq Scan on r3s169 s  (cost=1.00..106873675.72 rows=285230272 width=17)
(6 rows)

random_page_cost at 0.1:

data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER
JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Merge Join  (cost=0.00..51808909.26 rows=285230272 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Index Scan using view_505_dsiacctno on view_505 v  (cost=0.00..12755411.69 rows=112393848 width=20)
   ->  Index Scan using r3s169_dsiacctno on r3s169 s  (cost=0.00..32357747.90 rows=285230272 width=17)
(4 rows)

I have a suspicion that pgsql isn't tuned to properly deal with tables
of this size. Are there other things I should look at when dealing
with a database of this size?
Re: [PERFORM] Performance With Joins on Large Tables
Jeff Davis wrote:
> Is it overestimating the cost of using indexes or underestimating the
> cost of a seq scan, or both? Maybe explain with the 0.1 setting will
> help?

If enable_seqscan is off, and cost is still set to 1, it could be that
it's quite simply forcibly underestimating the cost of a seqscan in
this case.

If enable_seqscan was off for the mentioned plan, it'd be interesting
to see if things would be saner with seqscans enabled, and a more
reasonable random page cost. If more 'sane' values still produce the
desired plan, it might be better for other plans etc.

Terje
Re: [PERFORM] sql-bench
On 9/13/06, Tom Lane <[EMAIL PROTECTED]> wrote:
> IIRC, with these settings PG 8.0 seemed to be about half the speed of
> mysql 5.0 w/myisam, which is probably somewhere in the ballpark of the
> truth for tests of this nature, ie, single query stream of fairly
> simple queries. If you try concurrent-update scenarios or something
> that stresses planning ability you may arrive at different results
> though. I have not retested with more recent versions.

if postgresql uses prepared statements for such queries, it will
roughly tie mysql/myisam in raw query output on this type of load,
which also happens to be very easy to prepare... afaik mysql gets zero
performance benefit from preparing statements. This is extremely
trivial to test & confirm even in a shell script. [aside: will this
still be the case if peter e's planner changes become reality?]

another cheater trick benchmarkers do to disparage postgresql is to
intentionally not run analyze. Basically all production postgresql
systems of any size will run analyze on cron.

another small aside, I caught the sqlite people actually *detuning*
postgresql for performance by turning stats_command_string=on in
postgresql.conf. The way it was portrayed it almost looked like
cheating. I busted them on it (go to
http://www.sqlite.org/cvstrac/wiki?p=SpeedComparison and look for the
remarks right below the results).

merlin
Re: [PERFORM] Performance With Joins on Large Tables
> Hmm... that sounds bad. I'm sure your system will always choose
> indexes with that value. Is it overestimating the cost of using
> indexes or underestimating the cost of a seq scan, or both? Maybe
> explain with the 0.1 setting will help?
>
> Regards,
> 	Jeff Davis

data=# explain SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (EPOCH FROM s.custdate) FROM view_505 v INNER
JOIN r3s169 s on v.dsiacctno = s.dsiacctno;
                                   QUERY PLAN
---------------------------------------------------------------------------------
 Merge Join  (cost=0.00..51808909.26 rows=285230272 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Index Scan using view_505_dsiacctno on view_505 v  (cost=0.00..12755411.69 rows=112393848 width=20)
   ->  Index Scan using r3s169_dsiacctno on r3s169 s  (cost=0.00..32357747.90 rows=285230272 width=17)
(4 rows)

This is what I wanted, two index scans. Just to give you an idea of
the difference in time, this plan allows me to process 100,000 records
every few seconds, while the sequential scan only produced 100,000
every 10 minutes.
Re: [PERFORM] Query Progress (was: Performance With Joins on Large Tables)
On 9/13/06, Bucky Jordan <[EMAIL PROTECTED]> wrote:
> > Setting to 0.1 finally gave me the result I was looking for. I know
> > that the index scan is faster though. The seq scan never finished (I
> > killed it after 24+ hours) and I'm running the query now with indexes
> > and it's progressing nicely (will probably take 4 hours).
>
> In regards to "progressing nicely (will probably take 4 hours)" - is
> this just an estimate, or is there some way to get progress status (or
> something similar, e.g. on step 6 of 20 planned steps) on a query in
> pg? I looked through Chap 24, Monitoring DB Activity, but most of that
> looks like aggregate stats. Trying to relate these to a particular
> query doesn't really seem feasible.
>
> This would be useful in the case where you have a couple of long
> running transactions or stored procedures doing analysis and you'd
> like to give the user some feedback on where you're at.
>
> Thanks,
>
> Bucky

I do it programmatically, not through PostgreSQL. I'm using a cursor,
so I can keep track of how many records I've handled. I'm not aware of
a way to do this in PostgreSQL itself.
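The cursor-based bookkeeping described above can be sketched as follows. This is a hypothetical illustration, not anyone's actual code: a plain Python generator stands in for a real database cursor, and the function names are my own.

```python
def fetch_in_batches(cursor, batch_size):
    """Yield successive batches of rows from any cursor-like iterable."""
    batch = []
    for row in cursor:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final partial batch
        yield batch

def process_with_progress(cursor, total_rows, batch_size=100000):
    """Process rows in batches, reporting progress after each batch."""
    done = 0
    for batch in fetch_in_batches(cursor, batch_size):
        # ... application work on the rows would go here ...
        done += len(batch)
        print("processed %d/%d rows (%.1f%%)"
              % (done, total_rows, 100.0 * done / total_rows))
    return done

# Simulated result set standing in for a real DB cursor:
process_with_progress(iter(range(250000)), 250000)
```

With a real driver you would open a named cursor (so the whole result set isn't materialized client-side) and fetch batch_size rows at a time; the progress arithmetic is identical.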
Re: [PERFORM] Performance With Joins on Large Tables
> Is there anything I'm missing that is preventing it from using the
> index? It just seems weird to me that other joins like this work fine
> and fast with indexes, but this one won't.

Did you consider clustering both tables on the dsiacctno index?

I just checked that for a 4M-row table, even with enable_seqscan=on
and the default *_page_cost settings on PG 8.1.4, an index scan is
chosen for:

select * from table order by serial_pkey_field

This is essentially the question in your case - sort it, or get it
sorted via the index at the expense of more random IO. I think
clustering should work for you, but I am no expert; check with others.

Greetings
Marcin
[PERFORM] Query Progress (was: Performance With Joins on Large Tables)
> Setting to 0.1 finally gave me the result I was looking for. I know
> that the index scan is faster though. The seq scan never finished (I
> killed it after 24+ hours) and I'm running the query now with indexes
> and it's progressing nicely (will probably take 4 hours).

In regards to "progressing nicely (will probably take 4 hours)" - is
this just an estimate, or is there some way to get progress status (or
something similar, e.g. on step 6 of 20 planned steps) on a query in
pg? I looked through Chap 24, Monitoring DB Activity, but most of that
looks like aggregate stats. Trying to relate these to a particular
query doesn't really seem feasible.

This would be useful in the case where you have a couple of long
running transactions or stored procedures doing analysis and you'd
like to give the user some feedback on where you're at.

Thanks,

Bucky
Re: [PERFORM] Performance With Joins on Large Tables
On Wed, 2006-09-13 at 10:19 -0600, Joshua Marsh wrote:
> > Right, it's just used for planning. Avoid setting it too low, if it's
> > below about 2.0 you would most likely see some very strange plans.
> > Certainly it doesn't make sense at all to set it below 1.0, since that
> > is saying it's cheaper to get a random page than a sequential one.
> >
> > What was your original random_page_cost, and what is the new value you
> > set it to?
> >
> > Regards,
> > 	Jeff Davis
>
> I tried it at several levels. It was initially at 4 (the default). I
> tried 3 and 2 with no changes. When I set it to 1, it used an index
> on view_505 but not on r3s169:
>
> data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
> v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH
> FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno =
> s.dsiacctno;
>                                QUERY PLAN
> -----------------------------------------------------------------------
>  Merge Join  (cost=154730044.01..278318711.49 rows=285230272 width=11)
>    Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
>    ->  Index Scan using view_505_dsiacctno on view_505 v
>          (cost=0.00..111923570.63 rows=112393848 width=20)
>    ->  Sort  (cost=154730044.01..155443119.69 rows=285230272 width=17)
>          Sort Key: s.dsiacctno
>          ->  Seq Scan on r3s169 s  (cost=1.00..106873675.72
>                rows=285230272 width=17)
>
> Setting to 0.1 finally gave me the result I was looking for. I know
> that the index scan is faster though. The seq scan never finished (I
> killed it after 24+ hours) and I'm running the query now with indexes
> and it's progressing nicely (will probably take 4 hours).

Hmm... that sounds bad. I'm sure your system will always choose
indexes with that value. Is it overestimating the cost of using
indexes or underestimating the cost of a seq scan, or both? Maybe
explain with the 0.1 setting will help?

Regards,
	Jeff Davis
Re: [PERFORM] Performance With Joins on Large Tables
On 9/13/06, Jeff Davis <[EMAIL PROTECTED]> wrote:
> On Wed, 2006-09-13 at 08:49 -0600, Joshua Marsh wrote:
> > That seems to have done it. Are there any side effects to this
> > change? I read about random_page_cost in the documentation and it
> > seems like this is strictly for planning. All the tables on this
> > database will be indexed and of a size similar to these two, so I
> > don't see it causing any other problems.
>
> Right, it's just used for planning. Avoid setting it too low, if it's
> below about 2.0 you would most likely see some very strange plans.
> Certainly it doesn't make sense at all to set it below 1.0, since that
> is saying it's cheaper to get a random page than a sequential one.
>
> What was your original random_page_cost, and what is the new value you
> set it to?
>
> Regards,
> 	Jeff Davis

I tried it at several levels. It was initially at 4 (the default). I
tried 3 and 2 with no changes. When I set it to 1, it used an index
on view_505 but not on r3s169:

data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM
v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH
FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno =
s.dsiacctno;
                               QUERY PLAN
----------------------------------------------------------------------
 Merge Join  (cost=154730044.01..278318711.49 rows=285230272 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Index Scan using view_505_dsiacctno on view_505 v
         (cost=0.00..111923570.63 rows=112393848 width=20)
   ->  Sort  (cost=154730044.01..155443119.69 rows=285230272 width=17)
         Sort Key: s.dsiacctno
         ->  Seq Scan on r3s169 s  (cost=1.00..106873675.72
               rows=285230272 width=17)

Setting to 0.1 finally gave me the result I was looking for. I know
that the index scan is faster though. The seq scan never finished (I
killed it after 24+ hours) and I'm running the query now with indexes
and it's progressing nicely (will probably take 4 hours).
Re: [PERFORM] Performance With Joins on Large Tables
On Wed, 2006-09-13 at 08:49 -0600, Joshua Marsh wrote:
> That seems to have done it. Are there any side effects to this
> change? I read about random_page_cost in the documentation and it
> seems like this is strictly for planning. All the tables on this
> database will be indexed and of a size similar to these two, so I
> don't see it causing any other problems.

Right, it's just used for planning. Avoid setting it too low, if it's
below about 2.0 you would most likely see some very strange plans.
Certainly it doesn't make sense at all to set it below 1.0, since that
is saying it's cheaper to get a random page than a sequential one.

What was your original random_page_cost, and what is the new value you
set it to?

Regards,
	Jeff Davis
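Jeff's point about values below 1.0 can be seen with a toy cost model. This is my own simplification for illustration, not the planner's actual formulas (which also account for caching, correlation, and startup costs): sequential pages are priced at 1.0, random pages at random_page_cost, so once random_page_cost drops to or below 1.0 the index plan "wins" on paper regardless of the data.

```python
def seqscan_cost(pages, rows, seq_page_cost=1.0, cpu_tuple_cost=0.01):
    # every page read sequentially, every row examined
    return pages * seq_page_cost + rows * cpu_tuple_cost

def indexscan_cost(pages, rows, random_page_cost, cpu_index_tuple_cost=0.005):
    # heap pages fetched in index order are priced as random I/O
    # (toy model: assumes one heap fetch per page, no cache effects)
    return pages * random_page_cost + rows * cpu_index_tuple_cost

pages, rows = 100000, 1000000
for rpc in (4.0, 2.0, 1.0, 0.1):
    seq = seqscan_cost(pages, rows)
    idx = indexscan_cost(pages, rows, rpc)
    winner = "index scan" if idx < seq else "seq scan"
    print("random_page_cost=%.1f: seq=%.0f idx=%.0f -> %s" % (rpc, seq, idx, winner))
```

At realistic settings (2.0-4.0) the sequential scan is estimated cheaper; pushing random_page_cost toward 0.1 simply tells the planner that random I/O is nearly free, which is why it then picks index scans everywhere, good or bad.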
Re: [PERFORM] sql-bench
yoav x <[EMAIL PROTECTED]> writes:
> Are there any tuning parameters that can be changed to speed these
> queries? Or are these queries especially tuned to show MySQL's
> strengths?

The latter. I've ranted about this before --- there are both obvious
and subtle biases in that benchmark. The last time I spent any time
with it, I ended up testing with these nondefault settings:

	shared_buffers = 1
	work_mem = 10
	maintenance_work_mem = 10
	fsync = false
	checkpoint_segments = 30
	max_locks_per_transaction = 128

(fsync = false is pretty bogus for production purposes, but if you're
comparing to mysql using myisam tables, I think it's a reasonably fair
basis for comparison, as myisam is certainly not crash-safe. It'd be
interesting to see what mysql's performance looks like on this test
using innodb tables, which should be compared against fsync = true
... but I don't know how to change it to get all the tables to be
innodb.)

Also, on some of the tests it makes a material difference whether you
are using C locale or some other one --- C is faster. And make sure
you have a recent version of DBD::Pg --- a year or two back I recall
seeing the perl test program eating more CPU than the backend in some
of these tests, because of inefficiencies in DBD::Pg.

IIRC, with these settings PG 8.0 seemed to be about half the speed of
mysql 5.0 w/myisam, which is probably somewhere in the ballpark of the
truth for tests of this nature, ie, single query stream of fairly
simple queries. If you try concurrent-update scenarios or something
that stresses planning ability you may arrive at different results
though. I have not retested with more recent versions.

			regards, tom lane
Re: [PERFORM] Performance With Joins on Large Tables
On 9/13/06, Jim C. Nasby <[EMAIL PROTECTED]> wrote: On Tue, Sep 12, 2006 at 04:17:34PM -0600, Joshua Marsh wrote: > data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM > v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH > FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno = > s.dsiacctno; > QUERY PLAN > --- > Merge Join (cost=293767607.69..305744319.52 rows=285392608 width=11) > Merge Cond: ("outer".dsiacctno = "inner".dsiacctno) > -> Sort (cost=127304933.87..127585815.71 rows=112352736 width=20) > Sort Key: v.dsiacctno > -> Seq Scan on view_505 v (cost=1.00..104604059.36 > rows=112352736 width=20) > -> Sort (cost=166462673.82..167176155.34 rows=285392608 width=17) > Sort Key: s.dsiacctno > -> Seq Scan on r3s169 s (cost=1.00..106875334.08 > rows=285392608 width=17) > (8 rows) > > > Since enable_seqscan is off, my understanding is that in order for the query > planner to user a sequential scan it must think there is no other > alternative. > Both sides are indexed and anaylzed, so that confuses me a little. > > I tried it on a smaller sample set of the data and it works fine: Actually, enable_seqscan=off just adds a fixed overhead to the seqscan cost estimate. That's why the cost for the seqscans in that plan starts at 1. I've suggested changing that to a variable overhead based on the expected rowcount, but the counter-argument was that anyone with so much data that the fixed amount wouldn't work would most likely be having bigger issues anyway. Other things you can try to get the index scan back would be to reduce random_page_cost and to analyze the join fields in those tables with a higher statistics target (though I'm not 100% certain the join cost estimator actually takes that into account). Or if you don't mind patching your source code, it wouldn't be difficult to make enable_seqscan use a bigger 'penalty value' than 1000. 
-- Jim Nasby, EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

Thanks for the tip. I lowered random_page_cost and got these results:

data=# EXPLAIN SELECT v.phonedirect, v.editdrop, EXTRACT (EPOCH FROM v.custdate), EXTRACT (YEAR FROM s.custdate) || '-' || EXTRACT (MONTH FROM s.custdate) FROM view_505 v INNER JOIN r3s169 s ON v.dsiacctno = s.dsiacctno;
                              QUERY PLAN
--
 Merge Join  (cost=0.00..20921221.49 rows=285230272 width=11)
   Merge Cond: ("outer".dsiacctno = "inner".dsiacctno)
   ->  Index Scan using view_505_dsiacctno on view_505 v  (cost=0.00..2838595.79 rows=112393848 width=20)
   ->  Index Scan using r3s169_dsiacctno on r3s169 s  (cost=0.00..7106203.68 rows=285230272 width=17)
(4 rows)

That seems to have done it. Are there any side effects to this change? I read about random_page_cost in the documentation, and it seems it is used strictly for planning. All the tables on this database will be indexed and of a size similar to these two, so I don't see it causing any other problems. I thought I would check, though. :)
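For reference, the two knobs Jim suggested look like this; the values are illustrative, not tuned recommendations (in 8.1, random_page_cost defaults to 4 and the per-column statistics target to 10):

```sql
-- Make random page fetches look cheaper relative to sequential ones;
-- values closer to 1 favor index scans. 2 is an illustrative value.
SET random_page_cost = 2;

-- Collect more detailed statistics on the join columns, then re-analyze
-- so the planner's selectivity estimates improve.
ALTER TABLE view_505 ALTER COLUMN dsiacctno SET STATISTICS 100;
ALTER TABLE r3s169 ALTER COLUMN dsiacctno SET STATISTICS 100;
ANALYZE view_505;
ANALYZE r3s169;
```

Note that SET only affects the current session; to make the change permanent it would go in postgresql.conf.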
Re: [PERFORM] sql-bench
The last I checked (years ago), sql-bench was very synthetic (i.e. reflecting no realistic use case). It's the sort of test suite that's useful for database developers when testing the effects of a particular code change or optimization, but not so applicable to real-world uses.

Historically the test was also bad for PG because it did nasty things like 10,000 inserts each in separate transactions, because the test was written for MySQL, which at the time didn't support transactions. Not sure if that's been fixed yet or not.

Can you provide details about the schema and the queries that are slow?

-- Mark

On Wed, 2006-09-13 at 05:24 -0700, yoav x wrote:
> Hi
>
> I am trying to run sql-bench against PostgreSQL 8.1.4 on Linux.
> Some of the insert tests seem to be very slow.
>
> For example: select_join_in
>
> Are there any tuning parameters that can be changed to speed these queries? Or are these queries
> especially tuned to show MySQL's strengths?
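The one-transaction-per-insert problem Mark describes looks like this (table name hypothetical); batching the inserts into a single transaction means one commit and one WAL flush instead of one per row:

```sql
-- Slow under PostgreSQL: each INSERT is implicitly its own transaction,
-- forcing a WAL flush (fsync) for every single row.
INSERT INTO bench1 VALUES (1, 'a');
INSERT INTO bench1 VALUES (2, 'b');

-- Much faster: one explicit transaction, one commit, one WAL flush.
BEGIN;
INSERT INTO bench1 VALUES (1, 'a');
INSERT INTO bench1 VALUES (2, 'b');
COMMIT;
```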
Re: [PERFORM] sql-bench
First of all, you are going to have to show us what these queries are exactly, what the machine you are running on is (CPU, memory, and disk), and how you have tuned it. "Slow" is a relative term; we need information to determine what "slow" means.

Dave

On 13-Sep-06, at 8:50 AM, yoav x wrote:
> So why are these queries so slow in PG?
Re: [PERFORM] sql-bench
So why are these queries so slow in PG?

--- Dave Cramer <[EMAIL PROTECTED]> wrote:
> All of the tuning parameters would affect all queries:
> shared buffers, WAL buffers, effective cache, to name a few.
>
> --dc--
>
> On 13-Sep-06, at 8:24 AM, yoav x wrote:
> > Hi
> >
> > I am trying to run sql-bench against PostgreSQL 8.1.4 on Linux.
> > Some of the insert tests seem to be very slow.
> >
> > For example: select_join_in
> >
> > Are there any tuning parameters that can be changed to speed these
> > queries? Or are these queries especially tuned to show MySQL's
> > strengths?
Re: [PERFORM] sql-bench
All of the tuning parameters would affect all queries: shared buffers, WAL buffers, effective cache, to name a few.

--dc--

On 13-Sep-06, at 8:24 AM, yoav x wrote:
> Hi
>
> I am trying to run sql-bench against PostgreSQL 8.1.4 on Linux.
> Some of the insert tests seem to be very slow.
>
> For example: select_join_in
>
> Are there any tuning parameters that can be changed to speed these queries? Or are these queries especially tuned to show MySQL's strengths?
[PERFORM] sql-bench
Hi

I am trying to run sql-bench against PostgreSQL 8.1.4 on Linux. Some of the insert tests seem to be very slow.

For example: select_join_in

Are there any tuning parameters that can be changed to speed these queries? Or are these queries especially tuned to show MySQL's strengths?
Re: [PERFORM] Poor performance on seq scan
On 13-Sep-06, at 6:16 AM, Laszlo Nagy wrote:

>> I have had extremely bad performance historically with onboard SATA chipsets on Linux. The one exception has been with the Intel based chipsets (not the CPU, the I/O chipset).
>
> This board has an Intel chipset. I cannot remember the exact type, but it was not in the low-end category. dmesg says:
>
> kernel: ad4: 152626MB at ata2-master SATA150
> kernel: ad6: 152627MB at ata3-master SATA150
>
>> It is very likely that you are having problems with the driver for the chipset. Are you running RAID1 in hardware? If so, turn it off and see what the performance is. The onboard hardware RAID is worse than useless; it actually slows the I/O down.
>
> I'm using software RAID, namely gmirror:
>
> GEOM_MIRROR: Device gm0 created (id=2574033628).
> GEOM_MIRROR: Device gm0: provider ad4 detected.
> GEOM_MIRROR: Device gm0: provider ad6 detected.
> GEOM_MIRROR: Device gm0: provider ad4 activated.
> GEOM_MIRROR: Device gm0: provider ad6 activated.
>
> #gmirror list
> Geom name: gm0
> State: COMPLETE
> Components: 2
> Balance: round-robin
> Slice: 4096
> Flags: NONE
> GenID: 0
> SyncID: 1
> ID: 2574033628
> Providers:
> 1. Name: mirror/gm0
>    Mediasize: 160040803328 (149G)
>    Sectorsize: 512
>    Mode: r5w5e6
> Consumers:
> 1. Name: ad4
>    Mediasize: 160040803840 (149G)
>    Sectorsize: 512
>    Mode: r1w1e1
>    State: ACTIVE
>    Priority: 0
>    Flags: DIRTY
>    GenID: 0
>    SyncID: 1
>    ID: 1153981856
> 2. Name: ad6
>    Mediasize: 160041885696 (149G)
>    Sectorsize: 512
>    Mode: r1w1e1
>    State: ACTIVE
>    Priority: 0
>    Flags: DIRTY
>    GenID: 0
>    SyncID: 1
>    ID: 3520427571
>
> I tried to do:
>
> #sysctl vfs.read_max=32
> vfs.read_max: 6 -> 32
>
> but I could not achieve better disk read performance. Thank you for your suggestions. Looks like I need to buy SCSI disks.

Well, before you go do that, try the Areca SATA RAID card.

> Regards, Laszlo
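A rough way to check whether the mirror itself (rather than PostgreSQL) is the bottleneck is a raw sequential read with dd. This sketch uses a temporary file so it is safe to run anywhere; reading the raw device (e.g. /dev/mirror/gm0) gives a truer number but needs root. GNU dd syntax is shown; on FreeBSD use bs=1m.

```shell
# Write a 64 MB test file, then time reading it back sequentially.
# dd reports throughput on stderr; the OS page cache may serve part of
# the read back, so treat the figure as an upper bound.
dd if=/dev/zero of=/tmp/ddtest.img bs=1M count=64 2>/dev/null
dd if=/tmp/ddtest.img of=/dev/null bs=1M 2>&1 | tail -n 1
rm -f /tmp/ddtest.img
```

If this shows throughput far below what the drives are rated for, the problem is in the driver or RAID layer, not the database.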
Re: [PERFORM] Poor performance on seq scan
> I have had extremely bad performance historically with onboard SATA chipsets on Linux. The one exception has been with the Intel based chipsets (not the CPU, the I/O chipset).

This board has an Intel chipset. I cannot remember the exact type, but it was not in the low-end category. dmesg says:

kernel: ad4: 152626MB at ata2-master SATA150
kernel: ad6: 152627MB at ata3-master SATA150

> It is very likely that you are having problems with the driver for the chipset. Are you running RAID1 in hardware? If so, turn it off and see what the performance is. The onboard hardware RAID is worse than useless; it actually slows the I/O down.

I'm using software RAID, namely gmirror:

GEOM_MIRROR: Device gm0 created (id=2574033628).
GEOM_MIRROR: Device gm0: provider ad4 detected.
GEOM_MIRROR: Device gm0: provider ad6 detected.
GEOM_MIRROR: Device gm0: provider ad4 activated.
GEOM_MIRROR: Device gm0: provider ad6 activated.

#gmirror list
Geom name: gm0
State: COMPLETE
Components: 2
Balance: round-robin
Slice: 4096
Flags: NONE
GenID: 0
SyncID: 1
ID: 2574033628
Providers:
1. Name: mirror/gm0
   Mediasize: 160040803328 (149G)
   Sectorsize: 512
   Mode: r5w5e6
Consumers:
1. Name: ad4
   Mediasize: 160040803840 (149G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 1153981856
2. Name: ad6
   Mediasize: 160041885696 (149G)
   Sectorsize: 512
   Mode: r1w1e1
   State: ACTIVE
   Priority: 0
   Flags: DIRTY
   GenID: 0
   SyncID: 1
   ID: 3520427571

I tried to do:

#sysctl vfs.read_max=32
vfs.read_max: 6 -> 32

but I could not achieve better disk read performance. Thank you for your suggestions. Looks like I need to buy SCSI disks.

Regards, Laszlo