[PERFORM] Feature suggestion : FAST CLUSTER
Well, CLUSTER is so slow (and it doesn't cluster the TOAST tables associated with the table being clustered). However, when people use CLUSTER they use it to speed up their queries, and for that the table does not need to be perfectly in order.

So, here is a new idea for CLUSTER:

- choose a chunk size (about 50% of your RAM)
- set up disk sorts for all indexes
- seq scan the table:
    - take a chunk of chunk_size
    - sort it (in memory)
    - write it into the new table file
    - while we have the data on hand, also feed the indexed columns into the corresponding disk sorts
- finish the index disk sorts and rebuild the indexes

This does not run a complete sort on the table, so it would run at about your seq scan disk throughput. Obviously the end result is not as good as a real CLUSTER, since the table will be made up of several ordered chunks: a range lookup on the clustered columns would need at most N seeks, versus 1 for a fully clustered table. But it only scans the table once and writes it once, even counting the index rebuild.

I would think that if people can CLUSTER a large table in 5 minutes instead of hours with this approach, they will use it instead of not using it. So even if the resulting table is not as optimal as a fully clustered one, it will still be much better than the non-clustered case.
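Just to make the idea concrete, here is roughly what an equivalent manual approximation looks like today, rewriting a table in sorted chunks instead of one global sort; this is not part of the proposal, and the table name, column names and chunk boundaries are all invented:

    -- Hypothetical hand-rolled chunk-wise rewrite.
    CREATE TABLE big_table_chunked (LIKE big_table INCLUDING DEFAULTS);

    -- Copy one chunk at a time, each chunk sorted on the cluster column.
    INSERT INTO big_table_chunked
        SELECT * FROM big_table
         WHERE id BETWEEN 1 AND 1000000
         ORDER BY cluster_col;
    -- ...repeat for the remaining id ranges, then recreate the indexes and swap the tables.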
Re: [PERFORM] Postgres Benchmark Results
"Alvaro Herrera" <[EMAIL PROTECTED]> writes:

> Scott Marlowe wrote:
>>
>> I thought you were limited to 250 or so COMMITs to disk per second, and
>> since >1 client can be committed at once, you could do greater than 250
>> tps, as long as you had >1 client providing input. Or was I wrong?
>
> My impression is that you are correct in theory -- this is the "commit
> delay" feature. But it seems that the feature does not work as well as
> one would like; and furthermore, it is disabled by default.

Even without commit delay, a client will commit any pending WAL records when it syncs the WAL. The clients waiting to commit their records will find them already synced when they get woken up.

However, as mentioned a while back, in practice it doesn't work quite right and you should expect to get about 1/2 the expected performance. So even with 10 clients you should expect to see 5*120 tps on a 7200 rpm drive and 5*250 tps on a 15k rpm drive.

Heikki posted a patch that experimented with fixing this. Hopefully it'll be fixed for 8.4.

--
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Re: [PERFORM] Postgres Benchmark Results
What's interesting here is that on a couple of metrics the green curve is actually *better* until it takes that nosedive at 500 MB. Obviously it's not better on average hits/s, the most obvious metric. But on deviation and worst-case hits/s it's actually doing better.

Note that while the average hits/s between 100 and 500 is over 600 tps for Postgres, there is a consistent smattering of plot points spread all the way down to 200 tps, well below the 400-500 tps that MySQL is getting.

Some of those are undoubtedly caused by things like checkpoints and vacuum runs. Hopefully the improvements that are already in the pipeline will reduce them.

I mention this only to try to move some of the focus from the average performance to removing the pitfalls that affect 1-10% of transactions and screw the worst-case performance. In practical terms it's the worst case that governs perceptions, not the average case.

--
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
[PERFORM] Key/Value reference table generation: INSERT/UPDATE performance
I found several posts about INSERT/UPDATE performance in this group, but actually none of them was really what I am searching an answer for...

I have a simple reference table WORD_COUNTS that contains the count of words that appear in a word array storage in another table.

    CREATE TABLE WORD_COUNTS
    (
      word text NOT NULL,
      count integer,
      CONSTRAINT PK_WORD_COUNTS PRIMARY KEY (word)
    ) WITHOUT OIDS;

I have some PL/pgSQL code in a stored procedure like

    FOR r IN select id, array_of_words from word_storage LOOP
      begin
        -- insert the missing words
        insert into WORD_COUNTS
               ( word, count )
               ( select word, 0
                   from ( select distinct (r.array_of_words)[s.index] as d_word
                            from generate_series(1, array_upper( r.array_of_words, 1 ) ) as s(index)
                        ) as distinct_words
                  where word not in ( select d_word from WORD_COUNTS ) );
        -- update the counts
        update WORD_COUNTS
           set count = COALESCE( count, 0 ) + 1
         where word in ( select distinct (r.array_of_words)[s.index] as word
                           from generate_series(1, array_upper( r.array_of_words, 1) ) as s(index) );
      exception when others then
        error_count := error_count + 1;
      end;
      record_count := record_count + 1;
    END LOOP;

This code runs extremely slowly. It takes about 10 minutes to process 1 records, and the word storage has more than 2 million records to be processed.

Does anybody have know-how about populating such reference tables, and what can be optimized in this situation? Maybe the generate_series() call used to unnest the array is where I lose the performance? Are set-based updates/inserts more efficient than single inserts/updates run in smaller loops?

Thanks for your help,

Valentine Gogichashvili
Re: [PERFORM] Key/Value reference table generation: INSERT/UPDATE performance
On 22 May 2007 01:23:03 -0700, valgog <[EMAIL PROTECTED]> wrote:

> I found several posts about INSERT/UPDATE performance in this group,
> but actually none of them was really what I am searching an answer for...
>
> I have a simple reference table WORD_COUNTS that contains the count of
> words that appear in a word array storage in another table.
>
>     CREATE TABLE WORD_COUNTS
>     (
>       word text NOT NULL,
>       count integer,
>       CONSTRAINT PK_WORD_COUNTS PRIMARY KEY (word)
>     ) WITHOUT OIDS;

Is there any reason why count is not NOT NULL? (That should simplify your code by removing the COALESCE.)

INSERT is more efficient than UPDATE, because an UPDATE is always a delete followed by an insert.

Oh, and GROUP BY is nearly always quicker than DISTINCT and can (always?) be rewritten as such. I'm not 100% sure why it's different, but it is.

Peter.
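As a toy illustration of the DISTINCT-to-GROUP BY rewrite Peter mentions (the array literal here is invented; in the real code the expression comes from r.array_of_words):

    -- DISTINCT form:
    select distinct (ARRAY['a','b','a'])[s.index] as d_word
      from generate_series(1, array_upper(ARRAY['a','b','a'], 1)) as s(index);

    -- Equivalent GROUP BY form, which typically ends up as a hash aggregate:
    select (ARRAY['a','b','a'])[s.index] as d_word
      from generate_series(1, array_upper(ARRAY['a','b','a'], 1)) as s(index)
     group by 1;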
Re: [PERFORM] Key/Value reference table generation: INSERT/UPDATE performance
valgog wrote:
> I found several posts about INSERT/UPDATE performance in this group,
> but actually none of them was really what I am searching an answer for...
>
> I have a simple reference table WORD_COUNTS that contains the count of
> words that appear in a word array storage in another table.

I think this is the root of your problem, I'm afraid. You're trying to count individual words when you're storing an array of words. I don't think any of the GiST/GIN indexes will help you with this either.

However, since "you don't want to start from here" isn't very useful here and now:

1. See what the performance (explain analyse) of the "select distinct...generate_series()" statement is. I think you're right and it's going to be slow.
2. You're looping through each row of word_storage and counting separately. Write it as one query if possible.
3. As Peter says, don't insert then update; start with an empty table and just insert totals for the lot (see #2).

I'd probably write the query in plperl/python or something else that supports hash/dictionary structures. Then just process the whole of word_storage into the hash - assuming you only have a few thousand distinct words, that shouldn't take up too much memory.

--
Richard Huxton
Archonet Ltd
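Points 2 and 3 together might look roughly like the single statement below - a sketch only, reusing the table and column names from the original post and assuming WORD_COUNTS starts out empty; note it counts every occurrence, whereas the original code counted each word at most once per row:

    INSERT INTO word_counts (word, count)
    SELECT word, count(*)
      FROM (SELECT (array_of_words)[generate_series(1, array_upper(array_of_words, 1))] AS word
              FROM word_storage) AS all_words
     GROUP BY word;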
[PERFORM] is file size relevant in choosing index or table scan?
Hi,

I have a table with a file size of 400 MB and an index of 100 MB. Does PostgreSQL take the file sizes of both the table and the index into account when determining whether it should do a table scan or an index scan?

TIA

Joost
Re: [PERFORM] is file size relevant in choosing index or table scan?
Joost Kraaijeveld wrote:
> Hi,
>
> I have a table with a file size of 400 MB and an index of 100 MB. Does
> PostgreSQL take the file sizes of both the table and the index into
> account when determining whether it should do a table scan or an index scan?

In effect yes, although it will think in terms of row sizes and disk blocks. It also considers how many rows it thinks it will fetch and whether the rows it wants are together or spread amongst many blocks. It also tries to estimate what the chances are of those blocks being cached in RAM vs. still on disk.

So:
- 1 row from a 4 million row table, accessed by primary key => index.
- 20 rows from a 200 row table => seq scan (probably).
- In between => depends on your postgresql.conf.

--
Richard Huxton
Archonet Ltd
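A quick way to see which choice the planner makes for a particular query is EXPLAIN; the table and column names below are placeholders, not from the question:

    EXPLAIN SELECT * FROM orders WHERE order_id = 42;               -- narrow lookup: expect an index scan
    EXPLAIN SELECT * FROM orders WHERE order_date > '2007-01-01';   -- broad range: a seq scan may win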
Re: [PERFORM] Key/Value reference table generation: INSERT/UPDATE performance
I have rewritten the code like

    existing_words_array := ARRAY( select word
                                     from WORD_COUNTS
                                    where word = ANY ( array_of_words ) );
    not_existing_words_array := ARRAY( select distinct_word
                                         from ( select distinct (array_of_words)[s.index] as distinct_word
                                                  from generate_series(1, array_upper( array_of_words, 1 ) ) as s(index)
                                              ) as distinct_words
                                        where distinct_word <> ALL ( existing_words_array ) );
    -- insert the missing words
    if not_existing_words_array is not null then
      insert into WORD_COUNTS
             ( word, count )
             ( select word, 1
                 from ( select not_existing_words_array[s.index] as word
                          from generate_series( 1, array_upper( not_existing_words_array, 1 ) ) as s(index)
                      ) as distinct_words
             );
    end if;
    -- update the counts
    if existing_words_array is not null then
      update WORD_COUNTS
         set count = COALESCE( count, 0 ) + 1
       where sw_word = ANY ( existing_words_array );
    end if;

Now it processes a million records in 14 seconds... so the problem was probably that looking up NOT IN WORD_COUNTS was way too expensive.
Re: [PERFORM] Postgres Benchmark Results
> Note that while the average hits/s between 100 and 500 is over 600 tps
> for Postgres there is a consistent smattering of plot points spread all
> the way down to 200 tps, well below the 400-500 tps that MySQL is getting.

Yes, these are due to checkpointing, mostly.

Also, note that a real forum would not insert 100 posts/s, so it would not feel this effect. But in order to finish the benchmark in a reasonable amount of time, we have to push on the inserts.

> Some of those are undoubtedly caused by things like checkpoints and
> vacuum runs. Hopefully the improvements that are already in the pipeline
> will reduce them.

I am re-running it with other tuning, notably cost-based vacuum delay and less frequent checkpoints, and it is a *lot* smoother. These take a full night to run, so I'll post more results when I have useful stuff to show.

This has proven to be a very interesting trip to benchmarkland...
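The two knobs mentioned above are real configuration settings and can be checked from any session as shown below; the comments are only a rough description, suitable values are workload-dependent, and checkpoint_segments is the 8.x-era name (later releases replaced it with max_wal_size):

    SHOW vacuum_cost_delay;    -- > 0 enables cost-based vacuum delay (time to sleep when the cost limit is hit)
    SHOW checkpoint_segments;  -- raising this in postgresql.conf spaces checkpoints further apart
    SHOW checkpoint_timeout;   -- maximum time between automatic checkpoints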
Re: [PERFORM] Key/Value reference table generation: INSERT/UPDATE performance
On Tue, 22 May 2007 10:23:03 +0200, valgog <[EMAIL PROTECTED]> wrote:

> I found several posts about INSERT/UPDATE performance in this group,
> but actually none of them was really what I am searching an answer for...
>
> I have a simple reference table WORD_COUNTS that contains the count of
> words that appear in a word array storage in another table.

Mmm. If I were you, I would:

- Create a procedure that flattens all the arrays and returns all the words:

    PROCEDURE flatten_arrays RETURNS SETOF TEXT
    FOR word_array IN SELECT word_array FROM your_table LOOP
        FOR i IN 1...array_upper( word_array ) LOOP
            RETURN NEXT tolower( word_array[ i ] )

So, SELECT * FROM flatten_arrays() returns all the words in all the arrays. To get the counts quickly I'd do this:

    SELECT word, count(*) FROM flatten_arrays() AS word GROUP BY word

You can then populate your counts table very easily and quickly, since it's just a seq scan and a hash aggregate. One second for 10,000 rows would be slow.
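A fully runnable PL/pgSQL version of that sketch might look like this; the table and column names (word_storage, array_of_words) are taken from the original post, lower() stands in for the pseudocode's tolower(), and the COALESCE guards against NULL or empty arrays:

    CREATE OR REPLACE FUNCTION flatten_arrays() RETURNS SETOF text AS $$
    DECLARE
        r record;
        i integer;
    BEGIN
        FOR r IN SELECT array_of_words FROM word_storage LOOP
            FOR i IN 1 .. coalesce(array_upper(r.array_of_words, 1), 0) LOOP
                RETURN NEXT lower(r.array_of_words[i]);
            END LOOP;
        END LOOP;
        RETURN;
    END;
    $$ LANGUAGE plpgsql;

    -- Then, as above:
    SELECT word, count(*) FROM flatten_arrays() AS word GROUP BY word;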
Re: [PERFORM] Key/Value reference table generation: INSERT/UPDATE performance
On May 22, 12:14 pm, [EMAIL PROTECTED] (PFC) wrote:
> If I were you, I would:
> - Create a procedure that flattens all the arrays and returns all the words
> [...]
> You can then populate your counts table very easily and quickly, since
> it's just a seq scan and a hash aggregate.

good idea indeed! will try this approach.
Re: [PERFORM] Key/Value reference table generation: INSERT/UPDATE performance
On May 22, 12:00 pm, valgog <[EMAIL PROTECTED]> wrote:
> I have rewritten the code like
> [...]
> Now it processes a million records in 14 seconds... so the problem was
> probably that looking up NOT IN WORD_COUNTS was way too expensive.

Sorry... this code did not update anything at all, as I forgot about the NULL values... I had to COALESCE practically everything and use array_upper()... I do not have the performance numbers for the inserts and updates yet...
[PERFORM] Performance comparison of indexes over timestamp fields
Hi all,

I have some tables where all the queries that will be executed are timestamp-driven, so it'd be nice to have an index over those fields.

On older versions of PostgreSQL, at least in my experience, queries on timestamp fields were performing quite badly even when indexes existed; mainly sequential scans were performed.

Now I have a newer version of PostgreSQL and I've done some tests comparing the performance of an index over a timestamp field with a numeric field. To do so, I have the following table:

             Table "public.payment_transactions"
         Column     |            Type             |   Modifiers
    ----------------+-----------------------------+----------------
     transaction_id | character varying(32)       | not null
     timestamp_in   | timestamp without time zone | default now()
     credits        | integer                     |
     epoch_in       | bigint                      |
     epoch_in2      | double precision            |
    Indexes:
        "pk_paytrans_transid" PRIMARY KEY, btree (transaction_id)
        "idx_paytrans_epochin" btree (epoch_in)
        "idx_paytrans_epochin2" btree (epoch_in2)
        "idx_paytrans_timestamp" btree (timestamp_in)

timestamp_in is the timestamp; epoch_in and epoch_in2 are the epoch equivalents of the timestamp, used to test how the indexes perform. So we have three different indexes (for testing purposes): one over a timestamp field, one over an int8 field and one over a double precision field.

While doing the tests this table has about 100,000 entries.

To test the different indexes I have executed the following:

Index over timestamp_in (timestamp):

    # explain analyze select * from payment_transactions
      where timestamp_in between '2007-02-13'::timestamp and '2007-02-15'::timestamp;

     Index Scan using idx_paytrans_timestamp on payment_transactions
       (cost=0.00..1480.24 rows=1698 width=138) (actual time=11.693..310.402 rows=1587 loops=1)
       Index Cond: ((timestamp_in >= '2007-02-13 00:00:00'::timestamp without time zone) AND
                    (timestamp_in <= '2007-02-15 00:00:00'::timestamp without time zone))
     Total runtime: 318.328 ms
    (3 rows)

Index over epoch_in (int8):

    # explain analyze select * from payment_transactions
      where epoch_in between extract( epoch from '2007-02-13'::date )::int8
                         and extract( epoch from '2007-02-15'::date )::int8;

     Index Scan using idx_paytrans_epochin on payment_transactions
       (cost=0.00..1483.24 rows=1698 width=138) (actual time=34.369..114.943 rows=1587 loops=1)
       Index Cond: ((epoch_in >= 1171321200::bigint) AND (epoch_in <= 1171494000::bigint))
     Total runtime: 120.804 ms
    (3 rows)

Index over epoch_in2 (double precision):

    # explain analyze select * from payment_transactions
      where epoch_in2 between extract( epoch from '2007-02-13'::date )
                          and extract( epoch from '2007-02-15'::date );

     Index Scan using idx_paytrans_epochin2 on payment_transactions
       (cost=0.00..1479.24 rows=1698 width=138) (actual time=26.115..51.357 rows=1587 loops=1)
       Index Cond: ((epoch_in2 >= 1171321200::double precision) AND (epoch_in2 <= 1171494000::double precision))
     Total runtime: 57.065 ms
    (3 rows)

As you can see the time differences are very big:

    Timestamp:         318.328 ms
    int8 index:        120.804 ms
    double precision:   57.065 ms

Is this normal? Am I doing anything wrong? As a rule of thumb, is it better to store epochs than timestamps?

Thank you very much
--
Arnau
Re: [PERFORM] Key/Value reference table generation: INSERT/UPDATE performance
On Tuesday 22 May 2007, Richard Huxton wrote:
> [...]
> I'd probably write the query in plperl/python or something else that
> supports hash/dictionary structures. Then just process the whole of
> word_storage into the hash - assuming you only have a few thousand
> distinct words, that shouldn't take up too much memory.

+1. I made something very similar, and using PL/pgSQL is very slow, whereas using Perl is very quick.

I have also used partitioning because of the cost of updates (copy the last partition to a new one, adding the new counts, so there are only inserts, and drop the old table if you want).
Re: [PERFORM] Performance comparison of indexes over timestamp fields
On 5/22/07, Arnau <[EMAIL PROTECTED]> wrote:
> On older versions of PostgreSQL, at least in my experience, queries on
> timestamp fields were performing quite badly even when indexes existed;
> mainly sequential scans were performed.

PostgreSQL uses B-tree indexes for scalar values. For an expression such as "t between a and b", I believe it's going to match both sides of the table independently (i.e., t >= a and t <= b) and intersect these subsets. This is inefficient.

You should get better performance by mapping timestamps to a one-dimensional plane and indexing them using GiST. GiST implements an R-tree-like structure that supports bounding-box searches. This involves setting up a functional index:

    create index ... on payment_transactions using gist (
      box(point(extract(epoch from time), 0),
          point(extract(epoch from time), 0)) box_ops)

I'm using box() here because GiST doesn't have a concept of points. Then insert as usual, and then query with something like:

    select ... from payment_transactions
    where box(
        point(extract(epoch from '2006-04-01'::date), 0),
        point(extract(epoch from '2006-08-01'::date), 0))
      && box(
        point(extract(epoch from time), 0),
        point(extract(epoch from time), 0));

PostgreSQL should be able to exploit the GiST index by recognizing that the box() expression operand is already computed in the index. This is much less convenient and portable -- I would love for PostgreSQL to provide syntactic sugar and special-casing to make this transparent -- but worth it if you are dealing with a lot of range searches.

> Now I have a newer version of PostgreSQL and I've done some tests
> comparing the performance of an index over a timestamp field with a
> numeric field. To do so, I have the following table:
>
>              Table "public.payment_transactions"
>          Column     |            Type             |   Modifiers
>     ----------------+-----------------------------+----------------
>      transaction_id | character varying(32)       | not null
>      timestamp_in   | timestamp without time zone | default now()
>      credits        | integer                     |
>      epoch_in       | bigint                      |
>      epoch_in2      | double precision            |
[snip]

A timestamp is stored internally as an 8-byte double-precision float. Therefore, timestamp_in and epoch_in2 should behave identically.

> While doing the tests this table has about 100,000 entries.

Make sure PostgreSQL is able to keep the entire table in memory by setting shared_buffers; you don't want to be hitting the disk. Make sure you run "analyze" on the table before you execute the test.

> To test the different indexes I have executed the following:

Your query plans are roughly identical. The difference in the timings implies that you only ran the queries once. I suggest you run each query at least 10 times, and report the individual numbers (the "total runtime" parts of the output) you get. Arithmetic means are not that interesting.

Alexander.
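A minimal way to follow that advice before timing anything (the table name and query are taken from the thread; run the EXPLAIN ANALYZE several times and compare the "Total runtime" of each run):

    ANALYZE payment_transactions;

    EXPLAIN ANALYZE
    SELECT * FROM payment_transactions
     WHERE timestamp_in BETWEEN '2007-02-13'::timestamp AND '2007-02-15'::timestamp;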
Re: [PERFORM] Performance comparison of indexes over timestamp fields
On Tue, May 22, 2007 at 02:39:33PM +0200, Alexander Staubo wrote:
> PostgreSQL uses B-tree indexes for scalar values. For an expression
> such as "t between a and b", I believe it's going to match both sides
> of the table independently (i.e., t >= a and t <= b) and intersect
> these subsets. This is inefficient.

A B-tree index can satisfy range queries such as this.

> You should get better performance by mapping timestamps to a
> one-dimensional plane and indexing them using GiST. GiST implements an
> R-tree-like structure that supports bounding-box searches.

You may be thinking of interval overlaps?

/* Steinar */
--
Homepage: http://www.sesse.net/
Re: [PERFORM] Performance comparison of indexes over timestamp fields
On 5/22/07, Steinar H. Gunderson <[EMAIL PROTECTED]> wrote:
> On Tue, May 22, 2007 at 02:39:33PM +0200, Alexander Staubo wrote:
>> PostgreSQL uses B-tree indexes for scalar values. For an expression
>> such as "t between a and b", I believe it's going to match both sides
>> of the table independently (i.e., t >= a and t <= b) and intersect
>> these subsets. This is inefficient.
>
> A B-tree index can satisfy range queries such as this.

You're right, and I'm wrong -- my head is not in the right place today. B-trees are inefficient for intervals, but quite satisfactory for range searches.

Alexander.
[PERFORM] Tips & Tricks for validating hardware/os
Hi,

Out of curiosity, can anyone share his tips & tricks to validate a machine before labelling it as 'ready to use postgres - you probably won't trash my data today'?

I'm looking for a way to stress test components, especially kernel/disk, to have confidence > 0 that I can use postgres on top of it.

Any secret trick is welcome (besides the memtest one :)

Thanks!

--
stephane
Re: [PERFORM] Tips & Tricks for validating hardware/os
On 5/22/07, Stephane Bailliez <[EMAIL PROTECTED]> wrote:
> Out of curiosity, can anyone share his tips & tricks to validate a machine
> before labelling it as 'ready to use postgres - you probably won't trash my
> data today'? I'm looking for a way to stress test components especially
> kernel/disk to have confidence > 0 that I can use postgres on top of it.
>
> Any secret trick is welcome (besides the memtest one :)

Compile the Linux kernel -- it's a pretty decent stress test.

You could run pgbench, which comes with PostgreSQL (as part of the contrib package). Give it a database size that's larger than the amount of physical memory in the box.

Alexander.
Re: [PERFORM] Tips & Tricks for validating hardware/os
> Out of curiosity, can anyone share his tips & tricks to validate a machine
> before labelling it as 'ready to use postgres - you probably won't trash my
> data today'? I'm looking for a way to stress test components especially
> kernel/disk to have confidence > 0 that I can use postgres on top of it.

That would be running a filesystem benchmark, pulling the plug, then counting the dead.

http://sr5tech.com/write_back_cache_experiments.htm
Re: [PERFORM] Performance comparison of indexes over timestamp fields
Arnau <[EMAIL PROTECTED]> writes:
> As you can see the time differences are very big:
>    Timestamp:         318.328 ms
>    int8 index:        120.804 ms
>    double precision:   57.065 ms

As already suggested elsewhere, you probably weren't sufficiently careful in taking your measurements. A look at the code says that int8 comparison ought to be the fastest of these. If timestamps are implemented as floats (which you didn't say), their comparison speed ought to be *exactly* the same as for floats, because the comparison functions are line-for-line the same. If timestamps are implemented as int8 then they should be similar to int8 comparisons, maybe a tad slower due to an extra level of function call. But in any case it seems likely that the actual comparison function calls would be just a small fraction of the runtime.

regards, tom lane
[PERFORM] Domains versus Check Constraints
Are there any performance improvements that come from using a domain over a check constraint (aside from the ease-of-management component)?

thanks

--
Chander Ganesan
Open Technology Group, Inc.
One Copley Parkway, Suite 210
Morrisville, NC 27560
Phone: 877-258-8987/919-463-0999
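For concreteness, the two alternatives being compared look like the following; the domain name, table names and the 0-100 range are invented examples, not taken from the question:

    -- Domain carrying the constraint:
    CREATE DOMAIN percentage AS integer CHECK (VALUE BETWEEN 0 AND 100);
    CREATE TABLE results_domain (score percentage);

    -- Equivalent column-level check constraint:
    CREATE TABLE results_check (score integer CHECK (score BETWEEN 0 AND 100));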
[PERFORM] Drop table vs Delete record
My application has two threads: one inserts thousands of records per second into a table (t1), and the other periodically deletes expired records (also in thousands) from the same table. So we have one thread adding rows while the other thread is trying to delete rows.

In a short time the overall performance of any SQL statement on that instance degrades (e.g. "select count(*) from t1" takes more than a few seconds with less than 10K rows).

My question is: would any SQL statement perform better if I periodically renamed the table to t1_%indx, created a new table t1 (for new inserts), and just dropped the tables with expired records, rather than deleting records? (t1 is a simple table with many rows and no constraints.)

(I know I could run vacuum analyze.)

Thanks,

Orhan A.
Re: [PERFORM] Drop table vs Delete record
Consider table partitioning (it's described in the manual).

Andreas
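A minimal sketch of the inheritance-style partitioning the manual describes, applied to this use case; the column names, the per-day split and the child-table naming are assumptions:

    -- Parent table that the application queries.
    CREATE TABLE t1 (
        id      bigint,
        created timestamp NOT NULL,
        payload text
    );

    -- One child table per day; new inserts go into the current child.
    CREATE TABLE t1_20070522 (
        CHECK (created >= DATE '2007-05-22' AND created < DATE '2007-05-23')
    ) INHERITS (t1);

    -- Expiring old data then becomes a cheap metadata operation instead of a mass DELETE:
    DROP TABLE t1_20070515;

With constraint_exclusion enabled, queries that filter on created can also skip the child tables whose CHECK constraints rule them out.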
Re: [PERFORM] Drop table vs Delete record
Checking it out right now. Thanks for the fast response.

-----Original Message-----
From: Andreas Kostyrka [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 22, 2007 11:49 AM
To: Orhan Aglagul
Subject: AW: [PERFORM] Drop table vs Delete record

Consider table partitioning (it's described in the manual).

Andreas
Re: [PERFORM] Postgres Benchmark Results
On Tue, 22 May 2007, Gregory Stark wrote:

> However as mentioned a while back in practice it doesn't work quite right
> and you should expect to get 1/2 the expected performance. So even with
> 10 clients you should expect to see 5*120 tps on a 7200 rpm drive and
> 5*250 tps on a 15k rpm drive.

I would agree that's the approximate size of the upper bound. There are so many factors that go into the effectiveness of commit_delay that I wouldn't word it so strongly as to say you can "expect" that much benefit. The exact delay amount (which can be hard to set if your client load varies greatly), the size of the transactions, the balance of seek-bound reads vs. memory-based ones in the transactions, serialization in the transaction stream, and many other things can shrink the effective benefit. Also, there are generally other performance issues in the types of systems you would think would get the most benefit from this parameter that end up slowing things down anyway.

I've been seeing a best case of closer to 2x single-client tps rather than 5x on my single-drive systems with no write caching, but I'll admit I haven't done an exhaustive look at it yet (too busy with the real systems that have good controllers). One of these days...

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
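For reference, the group-commit behaviour being discussed is controlled by two settings; a version-safe way to look at them is below (the comments describe the parameters in general terms, and suitable values, if any, depend entirely on the workload):

    SHOW commit_delay;     -- microseconds to wait for additional commits before flushing WAL; 0 disables it
    SHOW commit_siblings;  -- minimum number of concurrently open transactions before the delay is applied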
Re: [PERFORM] Tips & Tricks for validating hardware/os
You forgot pulling some RAID drives at random times to see how the hardware deals with it - and how it deals with the rebuild afterwards. (Many RAID solutions leave you with the worst of both worlds, taking longer to rebuild than a restore from backup would take, while at the same time providing disk I/O performance that is SO bad that the server becomes useless during the rebuild.)

Andreas
Re: [PERFORM] Tips & Tricks for validating hardware/os
On Tue, 22 May 2007, Stephane Bailliez wrote:

> Out of curiosity, can anyone share his tips & tricks to validate a machine
> before labelling it as 'ready to use postgres - you probably won't trash my
> data today'?

Write a little script that runs pgbench in a loop forever. Set your shared_buffers cache to use at least 50% of the memory in the machine, and adjust the database size and concurrent clients so it's generating a substantial amount of disk I/O and using a fair amount of the CPU. Install the script so that it executes on system startup, for example by adding it to rc.local.

Put the machine close to your desk. Every time you walk by it, kill the power and then start it back up. This will give you a mix of long overnight runs with no interruption to stress the overall system, with a nice dose of recovery trauma. Skim the Postgres and OS log files every day.

Do that for a week; if it's still running, your data should be safe under real conditions.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD