Re: [PERFORM] pg_reset_stats + cache I/O %
On Wed, Mar 08, 2006 at 01:35:35PM -0500, mcelroy, tim wrote:
> I actually need this info as I was tasked by management to provide it. Not
> sure if they understand that or not; I do, but management does like to see
> how well the system and its components are performing. Also, I would
> utilize these results to test any cache tuning changes I may make.

What I feared. While monitoring cache hit % over time isn't a bad idea, it's
less than half the picture, which makes fertile ground for optimizing toward
some mythical target instead of actual system performance. If the "conclusion"
from these numbers is that shared_buffers needs to be set larger than
min(5, 10% of memory), I'd very seriously reconsider how performance tuning is
being done. But hopefully I'm just being paranoid and you guys are doing a
great job of monitoring things and keeping on the ball. :)
--
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf     cell: 512-569-9461
[PERFORM] embedded postgres and threading
Sorry if this is the wrong list ... I'm in the process of developing an
application based on GTK & Postgres for both Windows & Linux. Short, simple
and to the point: I'm using embedded SQL; is there anything I should know
about using Postgres in multiple threads, under Linux OR Windows? I've not
been able to find anything in the FAQ or documentation regarding this.
Re: [PERFORM] Postgres and Ingres R3 / SAN
Adding -performance back; you should do a reply-all if you want to reply to
list messages.

> From: Jeremy Haile [mailto:[EMAIL PROTECTED]
>
> > Can you point us at more info about this? I can't even find a website
> > for Ingres...
>
> Ingres is based off of the same original codebase that PostgreSQL was
> based upon (a long time ago). It is owned by Computer Associates and was
> open sourced last year. It supports clustering and replication, and
> I've seen an Ingres install set up as a cluster backed by a SAN before.
> I just haven't talked to anyone (at least unbiased) who has used this
> type of setup in production, and I'm not fully aware of the
> advantages/disadvantages of this type of setup with Ingres. Since this
> group seems pretty knowledgeable about performance advantages (and we are
> currently running PostgreSQL), I wanted to see if there were any
> experiences or opinions.
>
> Here is a link to their website:
> http://opensource.ca.com/projects/ingres
>
> > Perhaps if you posted your performance requirements someone could help
> > point you to a solution that would meet them.
>
> This is honestly more of a curiosity question at the moment, so I don't
> have any specific numbers. We definitely have a requirement for
> failover in the case of a machine failure, so we at least need
> Master->Slave replication. However, I wanted to solicit information on
> clustering alternatives as well, since scalability will likely be a
> future problem for our database.

Ahh, ok... that's likely a much different requirement than true clustering.
What a lot of folks do right now is segregate their application into a
read-only stream and the more interactive read-write streams, and then use
Slony to replicate data to a number of machines for the read-only work. This
way anyone who's hitting the site read-only (and can handle some possible
delay) will just hit one of the slave machines. People who are doing
interactive work (updating data) will hit the master. Since most applications
do far more reading than they do writing, this is a pretty good way to
load-balance.
--
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf     cell: 512-569-9461
Re: [PERFORM] Is an array of 365 elements in a cell of a table a good idea, in order to perform searches?
If you need to compare stuff on a day-by-day basis, I think you'll be much
better off just expanding stuff into a table of:

    item_id          int NOT NULL
    , day            date NOT NULL
    , capacity       ...
    , price_per_day  ...
    , price_per_week ...
    , PRIMARY KEY( item_id, day )

(Note that camel case and databases don't mix well...)

Sure, you're de-normalizing here, but the key is that you're putting the data
into a format where you can easily do things like:

    SELECT sum(capacity) FROM ... WHERE day = '2006-12-18';

Trying to do that with arrays would be noticeably more complex. And if you
wanted to do a whole month or something? Yeck...

BTW, another option is to roll price_per_15_days and price_per_month into a
different table, since you'd only need 24 rows per item. Might be worth the
trade-off in complexity depending on the specifics of the application (a rough
sketch follows at the end of this message).

On Wed, Mar 08, 2006 at 03:28:36PM +0100, Ruben Rubio Rey wrote:
> Hi,
>
> I have a question about a new implementation on a web site. The objective is
> to create a search that is as fast as possible. I have thought of two
> possibilities to do that:
>
> I have several items. Each item has 1 or more capacities. Each capacity has
> several date ranges (from 1 January to 10 April, for example). The dates
> cover 366 days, the current year, and they are indeterminate ranges. For
> each date range, it has a price per day, per week, per 15 days and per
> month.
>
> I have designed two possibilities:
>
> First:
>
>   IdItem  StartDate  EndDate    Capacity  PricePerDay  PricePerWeek*  PricePer15days*  PricePerMonth*
>   1       1-1-2005   10-1-2005  2         100          90             85               80
>   1       11-1-2005  20-1-2005  2         105          94             83               82
>   1       21-1-2005  5-2-2005   4         405          394            283              182
>   2       ...
>
> Right now around 30,000 rows; in one year it is expected to have 60,000 rows.
>
> * In order to compare correctly, all prices will be translated to days.
>   For example, PricePerWeek will hold the week price / 7, and so on.
>
> Second:
>
>   IdItem  Capacity  Days                                        Week         15Days       Month        Year
>   1       2         [Array of 365 values, one per day of year]  [ .Array. ]  [ .Array. ]  [ .Array. ]  [ .Array. ]
>                      ^__ Each item of the array is a price
>
> Right now around 2,500 rows; in one year it is expected to have 5,000 rows.
>
> I have to compare prices, or prices and dates, or prices and dates and
> capacity, or capacity and prices.
>
> I have no experience working with arrays in a table. Is it fast?
> Which one do you think will have better performance?
> Any good idea?
>
> I hope this is enough information.
> Thanks in advance,
> Ruben Rubio Rey

--
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf     cell: 512-569-9461
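A rough sketch of the two-table split suggested in the reply above, purely for
illustration; the table and column names are invented and the price column
types are guesses:

    -- Hypothetical sketch: daily data stays in the wide, easy-to-query table;
    -- the longer-period prices move to a much smaller companion table
    -- (roughly 24 half-month rows per item per year).
    CREATE TABLE item_day (
        item_id        int     NOT NULL,
        day            date    NOT NULL,
        capacity       int     NOT NULL,
        price_per_day  numeric NOT NULL,
        price_per_week numeric,
        PRIMARY KEY (item_id, day)
    );

    CREATE TABLE item_period_price (
        item_id           int     NOT NULL,
        period_start      date    NOT NULL,   -- e.g. the 1st or 16th of each month
        price_per_15_days numeric,
        price_per_month   numeric,
        PRIMARY KEY (item_id, period_start)
    );

    -- Day-level questions remain simple indexed lookups, e.g.:
    SELECT sum(capacity) FROM item_day WHERE day = '2006-12-18';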
Re: [PERFORM] pg_reset_stats + cache I/O %
I actually need this info as I was tasked by management to provide it. Not
sure if they understand that or not; I do, but management does like to see
how well the system and its components are performing. Also, I would
utilize these results to test any cache tuning changes I may make.

Tim

-----Original Message-----
From: Jim C. Nasby [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, March 08, 2006 1:28 PM
To: mcelroy, tim
Cc: 'Tom Lane'; 'pgsql-performance@postgresql.org'
Subject: Re: [PERFORM] pg_reset_stats + cache I/O %

Out of curiosity, why do you want this info? More important, do the folks who
are looking at this understand that a key part of PostgreSQL's tuning strategy
is to let the OS handle the bulk of the caching?

On Wed, Mar 08, 2006 at 08:59:51AM -0500, mcelroy, tim wrote:
> Thanks Tom, sorry I neglected to copy the list on my previous email.
>
> Does this query make sense and is it valid for an accurate cache % hit ratio
> for the entire DB? I would assume I could use the same logic with other
> views such as pg_stat_user_tables to get a per-table ratio?
>
> SELECT 100 - round((blks_hit::numeric / (blks_hit::numeric +
>        blks_read::numeric)) * 100, 2)
>        AS "Cache % Hit"
> FROM pg_stat_database
> WHERE datname = 'Fix1';
>
>  Cache % Hit
> -------------
>        98.06
> (1 row)
>
> Thank you,
> Tim
>
> -----Original Message-----
> From: Tom Lane [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, March 07, 2006 2:37 PM
> To: mcelroy, tim
> Cc: 'pgsql-performance@postgresql.org'
> Subject: Re: [PERFORM] pg_reset_stats + cache I/O %
>
> "mcelroy, tim" <[EMAIL PROTECTED]> writes:
> > ERROR:  function round(double precision, integer) does not exist
>
> Try coercing to numeric instead of float. Also, it'd be a good idea to
> put that coercion outside the sum()'s instead of inside --- summing
> bigints is probably noticeably faster than summing numerics.
>
> 			regards, tom lane

--
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf     cell: 512-569-9461
Re: [PERFORM] pg_reset_stats + cache I/O %
Out of curiosity, why do you want this info? More important, do the folks who
are looking at this understand that a key part of PostgreSQL's tuning strategy
is to let the OS handle the bulk of the caching?

On Wed, Mar 08, 2006 at 08:59:51AM -0500, mcelroy, tim wrote:
> Thanks Tom, sorry I neglected to copy the list on my previous email.
>
> Does this query make sense and is it valid for an accurate cache % hit ratio
> for the entire DB? I would assume I could use the same logic with other
> views such as pg_stat_user_tables to get a per-table ratio?
>
> SELECT 100 - round((blks_hit::numeric / (blks_hit::numeric +
>        blks_read::numeric)) * 100, 2)
>        AS "Cache % Hit"
> FROM pg_stat_database
> WHERE datname = 'Fix1';
>
>  Cache % Hit
> -------------
>        98.06
> (1 row)
>
> Thank you,
> Tim
>
> -----Original Message-----
> From: Tom Lane [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, March 07, 2006 2:37 PM
> To: mcelroy, tim
> Cc: 'pgsql-performance@postgresql.org'
> Subject: Re: [PERFORM] pg_reset_stats + cache I/O %
>
> "mcelroy, tim" <[EMAIL PROTECTED]> writes:
> > ERROR:  function round(double precision, integer) does not exist
>
> Try coercing to numeric instead of float. Also, it'd be a good idea to
> put that coercion outside the sum()'s instead of inside --- summing
> bigints is probably noticeably faster than summing numerics.
>
> 			regards, tom lane

--
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf     cell: 512-569-9461
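As an aside to the per-table question quoted above: a minimal sketch (not from
the thread) of one way to get a per-table ratio, assuming the per-table block
counters exposed in pg_statio_user_tables rather than pg_stat_user_tables:

    -- Per-table buffer-cache hit ratios for heap blocks; index blocks are
    -- tracked separately as idx_blks_read / idx_blks_hit in the same view.
    SELECT relname,
           round(heap_blks_hit::numeric
                 / NULLIF(heap_blks_hit + heap_blks_read, 0)::numeric * 100, 2)
             AS "Heap Cache % Hit"
      FROM pg_statio_user_tables
     ORDER BY heap_blks_read DESC;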
Re: [PERFORM] Bad row estimates
Alex Adriaanse <[EMAIL PROTECTED]> writes:

> Its row estimates are still way off. As a matter of fact, it almost seems as
> if the index doesn't affect row estimates at all.

Indexes normally don't affect estimates. Expression indexes do effectively
create a new column to generate stats for, but that doesn't really help here
because there aren't any estimation functions for the geometric gist indexes.

> -> BitmapAnd (cost=8.99..8.99 rows=1 width=0) (actual time=0.485..0.485 rows=0 loops=135)
>    -> Bitmap Index Scan on test_table_2_s_id (cost=0.00..2.17 rows=48 width=0) (actual time=0.015..0.015 rows=1 loops=135)
>       Index Cond: (s_id = 13300613::numeric)
>    -> Bitmap Index Scan on test_table_2_n_id (cost=0.00..6.57 rows=735 width=0) (actual time=0.467..0.467 rows=815 loops=135)
>       Index Cond: ("outer".id = test_table_2.n_id)

If this query is representative then it seems you might be better off without
the test_table_2_n_id index. Of course this could be a problem if you need
that index for other purposes.

I'm puzzled that test_table_2_s_id's estimate isn't more precise. Are there
some values of s_id that are quite common and others that are unique? You
might try raising the statistics target on s_id (see the sketch after this
message).

Incidentally, 70ms is pretty good. I'm usually happy if all my mundane queries
are under 100ms and the more complex queries are in the vicinity of 300ms.
Trying to optimize below 100ms is hard because you'll find a lot of
variability in the performance. Any extraneous disk I/O from checkpoints,
vacuums, even other services, will throw off your expectations.

--
greg
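A minimal sketch of the statistics-target suggestion above, using the table
and column names from the plans quoted in this thread; 100 is just an example
value (the default statistics target in the 8.x releases is 10):

    -- Collect more detailed statistics for s_id so the planner's estimate of
    -- "s_id = <constant>" selectivity improves, then re-analyze the table.
    ALTER TABLE test_table_2 ALTER COLUMN s_id SET STATISTICS 100;
    ANALYZE test_table_2;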
Re: [PERFORM] Bad row estimates
Thank you all for your valuable input. I have tried creating a partial index,
a GIST index, and a GIST + partial index, as suggested, but it does not seem
to make a significant difference. For instance:

CREATE INDEX test_table_1_interval_idx ON test_table_1 USING GIST
  (box(point(start_ts::abstime::integer, start_ts::abstime::integer),
       point(end_ts::abstime::integer, end_ts::abstime::integer)))
  WHERE id = g_id;

ANALYZE test_table_1;

EXPLAIN ANALYZE SELECT count(*) FROM test_table_1
  INNER JOIN test_table_2
    ON (test_table_2.s_id = 13300613 AND test_table_1.id = test_table_2.n_id)
  WHERE box(point(start_ts::abstime::integer, start_ts::abstime::integer),
            point(end_ts::abstime::integer, end_ts::abstime::integer))
        ~ box(point(now()::abstime::integer, now()::abstime::integer),
              point(now()::abstime::integer, now()::abstime::integer))
    AND test_table_1.id = test_table_1.g_id;

                                 QUERY PLAN
---------------------------------------------------------------------------
 Aggregate  (cost=15.09..15.10 rows=1 width=0) (actual time=69.771..69.772 rows=1 loops=1)
   ->  Nested Loop  (cost=9.06..15.08 rows=1 width=0) (actual time=69.752..69.752 rows=0 loops=1)
         ->  Index Scan using test_table_1_interval_idx on test_table_1  (cost=0.07..4.07 rows=1 width=22) (actual time=2.930..3.607 rows=135 loops=1)
               Index Cond: (box(point((((start_ts)::abstime)::integer)::double precision, (((start_ts)::abstime)::integer)::double precision), point((((end_ts)::abstime)::integer)::double precision, (((end_ts)::abstime)::integer)::double precision)) ~ box(point((((now())::abstime)::integer)::double precision, (((now())::abstime)::integer)::double precision), point((((now())::abstime)::integer)::double precision, (((now())::abstime)::integer)::double precision)))
         ->  Bitmap Heap Scan on test_table_2  (cost=8.99..11.00 rows=1 width=12) (actual time=0.486..0.486 rows=0 loops=135)
               Recheck Cond: ((test_table_2.s_id = 13300613::numeric) AND ("outer".id = test_table_2.n_id))
               ->  BitmapAnd  (cost=8.99..8.99 rows=1 width=0) (actual time=0.485..0.485 rows=0 loops=135)
                     ->  Bitmap Index Scan on test_table_2_s_id  (cost=0.00..2.17 rows=48 width=0) (actual time=0.015..0.015 rows=1 loops=135)
                           Index Cond: (s_id = 13300613::numeric)
                     ->  Bitmap Index Scan on test_table_2_n_id  (cost=0.00..6.57 rows=735 width=0) (actual time=0.467..0.467 rows=815 loops=135)
                           Index Cond: ("outer".id = test_table_2.n_id)
 Total runtime: 69.961 ms

(Note: without the GIST index the query currently runs in about 65ms)

Its row estimates are still way off. As a matter of fact, it almost seems as
if the index doesn't affect row estimates at all.

What would you guys suggest?

Thanks,

Alex

Greg Stark wrote:

You could actually take short cuts using expression indexes to do this. If it
works out well then you might want to implement a real data type to avoid the
overhead of the SQL conversion functions.

Here's an example. If I were to do this for real I would look for a better
datatype than the box datatype and I would wrap the whole conversion in an SQL
function.
But this will serve to demonstrate:

stark=> create table interval_test (start_ts timestamp with time zone, end_ts timestamp with time zone);
CREATE TABLE

stark=> create index interval_idx on interval_test using gist (box(point(start_ts::abstime::integer, end_ts::abstime::integer), point(start_ts::abstime::integer, end_ts::abstime::integer)));
CREATE INDEX

stark=> explain select * from interval_test where box(point(now()::abstime::integer, now()::abstime::integer), point(now()::abstime::integer, now()::abstime::integer)) ~ box(point(start_ts::abstime::integer, end_ts::abstime::integer), point(start_ts::abstime::integer, end_ts::abstime::integer));
                                 QUERY PLAN
---
[PERFORM] Is an array of 365 elements in a cell of a table a good idea, in order to perform searches?
Hi,

I have a question about a new implementation on a web site. The objective is
to create a search that is as fast as possible. I have thought of two
possibilities to do that:

I have several items. Each item has 1 or more capacities. Each capacity has
several date ranges (from 1 January to 10 April, for example). The dates cover
366 days, the current year, and they are indeterminate ranges. For each date
range, it has a price per day, per week, per 15 days and per month.

I have designed two possibilities:

First:

  IdItem  StartDate  EndDate    Capacity  PricePerDay  PricePerWeek*  PricePer15days*  PricePerMonth*
  1       1-1-2005   10-1-2005  2         100          90             85               80
  1       11-1-2005  20-1-2005  2         105          94             83               82
  1       21-1-2005  5-2-2005   4         405          394            283              182
  2       ...

Right now around 30,000 rows; in one year it is expected to have 60,000 rows.

* In order to compare correctly, all prices will be translated to days.
  For example, PricePerWeek will hold the week price / 7, and so on.

Second:

  IdItem  Capacity  Days                                        Week         15Days       Month        Year
  1       2         [Array of 365 values, one per day of year]  [ .Array. ]  [ .Array. ]  [ .Array. ]  [ .Array. ]
                     ^__ Each item of the array is a price

Right now around 2,500 rows; in one year it is expected to have 5,000 rows.

I have to compare prices, or prices and dates, or prices and dates and
capacity, or capacity and prices.

I have no experience working with arrays in a table. Is it fast?
Which one do you think will have better performance?
Any good idea?

I hope this is enough information.
Thanks in advance,
Ruben Rubio Rey
Re: [PERFORM] pg_reset_stats + cache I/O %
Thanks Tom, sorry I neglected to copy the list on my previous email.

Does this query make sense and is it valid for an accurate cache % hit ratio
for the entire DB? I would assume I could use the same logic with other views
such as pg_stat_user_tables to get a per-table ratio?

SELECT 100 - round((blks_hit::numeric / (blks_hit::numeric +
       blks_read::numeric)) * 100, 2)
       AS "Cache % Hit"
FROM pg_stat_database
WHERE datname = 'Fix1';

 Cache % Hit
-------------
       98.06
(1 row)

Thank you,
Tim

-----Original Message-----
From: Tom Lane [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, March 07, 2006 2:37 PM
To: mcelroy, tim
Cc: 'pgsql-performance@postgresql.org'
Subject: Re: [PERFORM] pg_reset_stats + cache I/O %

"mcelroy, tim" <[EMAIL PROTECTED]> writes:
> ERROR:  function round(double precision, integer) does not exist

Try coercing to numeric instead of float. Also, it'd be a good idea to
put that coercion outside the sum()'s instead of inside --- summing
bigints is probably noticeably faster than summing numerics.

			regards, tom lane
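To make Tom's last point concrete, a minimal sketch (not from the thread) of
the two placements of the numeric coercion, using the pg_stat_database counters
already shown above; the NULLIF guard and the cluster-wide sum are illustrative
only. (Incidentally, as written the query above subtracts the hit ratio from
100, so the figure it labels "Cache % Hit" is really the miss percentage.)

    -- Slower form: each bigint counter is coerced to numeric before summing.
    SELECT round(sum(blks_hit::numeric)
                 / NULLIF(sum(blks_hit::numeric) + sum(blks_read::numeric), 0) * 100, 2)
           AS "Cache % Hit (all DBs)"
      FROM pg_stat_database;

    -- Per the advice above: sum the bigint counters first, then coerce the totals.
    SELECT round(sum(blks_hit)::numeric
                 / NULLIF(sum(blks_hit) + sum(blks_read), 0)::numeric * 100, 2)
           AS "Cache % Hit (all DBs)"
      FROM pg_stat_database;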