[PERFORM] best db schema for time series data?
Hi,

I have to collect lots of prices from web sites and keep track of their
changes. What is the best option?

1) one 'price' row per price change:

    create table price (
        id_price serial primary key,
        id_product integer references product,
        price integer
    );

2) a single 'price' row containing all the changes:

    create table price (
        id_price serial primary key,
        id_product integer references product,
        price integer[] -- prices are 'pushed' on this array as they change
    );

Which is bound to give the best performance, knowing I will often need to
access the latest and next-to-latest prices?

Thanks,
Re: [PERFORM] best db schema for time series data?
Hello,

my opinion:

@1 can be faster for access to the last items, with an index.
@2 can be more effective about data files length allocation.

@1 or @2 - it depends on the number of prices per product. For a small
number (less than 100) I am strongly for @2 (if speed is important).
Personally I prefer @2.

Pavel

2010/11/16 Louis-David Mitterrand vindex+lists-pgsql-performa...@apartia.org:
> Hi,
>
> I have to collect lots of prices from web sites and keep track of their
> changes. What is the best option?
>
> 1) one 'price' row per price change
>
> 2) a single 'price' row containing all the changes in a price integer[]
>    column (prices are 'pushed' on this array as they change)
>
> Which is bound to give the best performance, knowing I will often need
> to access the latest and next-to-latest prices?
>
> Thanks,
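For concreteness, fetching the latest and next-to-latest price under the
two layouts might look roughly like this (a sketch using the column names
from the original post; the product id and the choice of ordering column
are only illustrative):

    -- @1: one row per change, assuming the serial id (or a timestamp
    -- column) reflects insertion order
    SELECT price FROM price
    WHERE id_product = 17
    ORDER BY id_price DESC LIMIT 2;

    -- @2: all changes kept in one array column
    SELECT price[array_length(price, 1)]     AS latest,
           price[array_length(price, 1) - 1] AS next_to_latest
    FROM price WHERE id_product = 17;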
Re: [PERFORM] best db schema for time series data?
On Tue, Nov 16, 2010 at 12:03:29PM +0100, Pavel Stehule wrote:
> Hello,
>
> my opinion:
>
> @1 can be faster for access to the last items, with an index.
> @2 can be more effective about data files length allocation.

Hi Pavel,

What is "data files length allocation"?
Re: [PERFORM] best db schema for time series data?
2010/11/16 Louis-David Mitterrand vindex+lists-pgsql-performa...@apartia.org:
> On Tue, Nov 16, 2010 at 12:03:29PM +0100, Pavel Stehule wrote:
>> @1 can be faster for access to the last items, with an index.
>> @2 can be more effective about data files length allocation.
>
> Hi Pavel,
>
> What is "data files length allocation"?

The size of the data files on disc :) PostgreSQL needs some bytes of
header for every row, so if you use an array you share that overhead
across all the values in it. Also, varlena types (like arrays) can be
compressed.

Pavel
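The per-row overhead and on-disk sizes Pavel mentions can be inspected
directly; a small sketch (the array values are made up, and the table
name assumes the schema from the original post):

    -- size of one array datum vs. one plain integer
    SELECT pg_column_size(ARRAY[100, 110, 120, 130]);  -- one varlena value
    SELECT pg_column_size(100);                         -- a single int

    -- total on-disk size of a table, row headers and indexes included
    SELECT pg_size_pretty(pg_total_relation_size('price'));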
Re: [PERFORM] best db schema for time series data?
On 16-11-2010 11:50, Louis-David Mitterrand wrote:
> I have to collect lots of prices from web sites and keep track of their
> changes. What is the best option?
>
> 1) one 'price' row per price change
>
> 2) a single 'price' row containing all the changes in an integer[] column
>
> Which is bound to give the best performance, knowing I will often need
> to access the latest and next-to-latest prices?

If you mostly need the last few prices, I'd definitely go with the first
approach; it's much cleaner. Besides, you can store a date/time per
price, so you know when it changed. With the array approach that's a bit
harder to do.

If you're concerned with performance, introduce some form of a
materialized view for the most recent price of a product. Or reverse the
entire process and make a current-price table and a price-history table.

Best regards,

Arjen
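A minimal sketch of the "last few prices" lookup Arjen describes,
assuming the row-per-change schema is extended with a timestamp column
(the changed_at column name is illustrative; window functions need 8.4
or later):

    -- latest and next-to-latest price for every product
    SELECT id_product, price, changed_at
    FROM (
        SELECT id_product, price, changed_at,
               row_number() OVER (PARTITION BY id_product
                                  ORDER BY changed_at DESC) AS rn
        FROM price
    ) p
    WHERE rn <= 2;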
Re: [PERFORM] best db schema for time series data?
On Tue, Nov 16, 2010 at 12:18:35PM +0100, Arjen van der Meijden wrote:
> If you mostly need the last few prices, I'd definitely go with the first
> approach; it's much cleaner. Besides, you can store a date/time per
> price, so you know when it changed.
>
> If you're concerned with performance, introduce some form of a
> materialized view for the most recent price of a product. Or reverse the
> entire process and make a current-price table and a price-history table.

That's exactly my current 'modus operandi'. So it's nice to have
confirmation that I'm not using the worst schema out there :)
Re: [PERFORM] best db schema for time series data?
Hi,

> If you mostly need the last few prices, I'd definitely go with the first
> approach; it's much cleaner. Besides, you can store a date/time per
> price, so you know when it changed.

We too were using such an approach for 'soft deletes'. Soon we realized
that using a one-char VALID flag to mark the latest records was better.
It was easier to filter on that, and an index on the modified-date column
was not being used consistently for some reason or the other. The VALID
records form a small portion of the big table, and an index on that
column helps fetch the data pretty fast. Of course, you could also
partition on the flag (we did not have to). The penalty is a slight
processing overhead of updating the VALID flag column. This was an
Oracle database.

Regards,

Jayadevan
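Translated into PostgreSQL terms, the flag approach might look like this
sketch (the column, index name, and literal values are made up; a partial
index keeps the "current rows" index small):

    ALTER TABLE price ADD COLUMN valid char(1) NOT NULL DEFAULT 'Y';

    -- index only the current rows
    CREATE INDEX price_current_idx ON price (id_product) WHERE valid = 'Y';

    -- on a price change: retire the old row, insert the new one
    UPDATE price SET valid = 'N' WHERE id_product = 17 AND valid = 'Y';
    INSERT INTO price (id_product, price) VALUES (17, 999);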
Re: [PERFORM] autovacuum blocks the operations of other manual vacuum
Excerpts from kuopo's message of vie nov 12 05:01:24 -0300 2010:

> I have a question about the behavior of autovacuum. When I have a big
> table A which is being processed by autovacuum, I also manually use
> (full) vacuum to clean another table B. Then I found that I always got
> something like “found 0 removable, 14283 nonremovable row”. However, if
> I stop the autovacuum functionality and use vacuum on that big table A
> manually, I can clean table B (ex. found 22615 removable, 2049
> nonremovable row). Is this correct? Why do vacuum and autovacuum have
> different actions?

Vacuum full does not assume that it can clean up tuples while other
transactions are running, and that includes the (non-full, or lazy)
vacuum that autovacuum is running. Autovacuum only runs lazy vacuum, and
that one is aware that other concurrent vacuums can be ignored.

Just don't use vacuum full unless strictly necessary. It has other
drawbacks.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
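In practical terms, a sketch of the distinction (the table name is
illustrative):

    -- lazy vacuum: reclaims dead tuples and can run alongside autovacuum
    VACUUM VERBOSE table_b;

    -- VACUUM FULL takes an ACCESS EXCLUSIVE lock on the table and, as
    -- Alvaro notes, treats rows as nonremovable while other transactions
    -- (such as a long autovacuum run on another table) are still open;
    -- use it only when the table really needs to be compacted
    VACUUM FULL VERBOSE table_b;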
Re: [PERFORM] best db schema for time series data?
In article 4ce2688b.2050...@tweakers.net,
Arjen van der Meijden acmmail...@tweakers.net writes:

> If you mostly need the last few prices, I'd definitely go with the first
> approach; it's much cleaner. Besides, you can store a date/time per
> price, so you know when it changed. With the array approach that's a bit
> harder to do.

I'd probably use a variant of this:

    CREATE TABLE prices (
        pid int NOT NULL REFERENCES products,
        validTil timestamp(0) NULL,
        price int NOT NULL,
        UNIQUE (pid, validTil)
    );

The current price of a product is always the row with validTil IS NULL.
The lookup should be pretty fast because it can use the index of the
UNIQUE constraint.
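With that layout, the lookup and the price-change handling might look
like this (a sketch; the literal values are illustrative, and note that
because NULLs compare as distinct, the UNIQUE constraint by itself does
not prevent two "open" rows for the same product):

    -- current price of product 17
    SELECT price FROM prices WHERE pid = 17 AND validTil IS NULL;

    -- on a price change: close the current row, then insert the new one
    UPDATE prices SET validTil = now() WHERE pid = 17 AND validTil IS NULL;
    INSERT INTO prices (pid, validTil, price) VALUES (17, NULL, 42);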
Re: [PERFORM] best db schema for time series data?
vindex+lists-pgsql-performa...@apartia.org (Louis-David Mitterrand) writes:
> I have to collect lots of prices from web sites and keep track of their
> changes. What is the best option?
>
> Which is bound to give the best performance, knowing I will often need
> to access the latest and next-to-latest prices?

I'd definitely bias towards #1, but with a bit of a change:

    create table product (
        id_product serial primary key
    );

    create table price (
        id_product integer references product,
        as_at timestamptz default now(),
        primary key (id_product, as_at),
        price integer
    );

The query to get the last 5 prices for a product should be splendidly
efficient:

    select price, as_at from price
     where id_product = 17
     order by as_at desc limit 5;

(That'll use the PK index perfectly nicely.)

If you needed higher performance for the latest price, then I'd add a
secondary table, and use triggers to copy the latest price into place:

    create table latest_prices (
        id_product integer primary key references product,
        price integer
    );

    create or replace function capture_latest_price() returns trigger as $$
    declare
    begin
        delete from latest_prices where id_product = NEW.id_product;
        insert into latest_prices (id_product, price)
            values (NEW.id_product, NEW.price);
        return NEW;
    end
    $$ language plpgsql;

    create trigger price_capture
        after insert on price
        for each row
        execute procedure capture_latest_price();

This captures *just* the latest price for each product. (There's a bit
of a race condition - if there are two concurrent price updates, one will
fail, which wouldn't happen without this trigger in place.)
-- 
... Turns out that JPG was in fact using his brain... and I am inclined
to encourage him to continue the practice even if it isn't exactly what I
would have done myself.
-- Alan Bawden (way out of context)
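Since the original question was specifically about the latest and
next-to-latest prices, those are just narrower versions of the query
above (the product id is illustrative):

    -- latest price
    select price from price
     where id_product = 17 order by as_at desc limit 1;

    -- next-to-latest price
    select price from price
     where id_product = 17 order by as_at desc limit 1 offset 1;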
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Time for a deeper look at what's going on here... I installed RHEL6 Beta
2 yesterday, on the presumption that since the release version just came
out this week it was likely the same version Marti tested against. Also,
it was the one I already had a DVD to install for. This was on a laptop
with a 7200 RPM hard drive, already containing an Ubuntu installation for
comparison's sake.

Initial testing was done with the PostgreSQL test_fsync utility, just to
get a gross idea of which situations the drives involved were likely
flushing data to disk correctly during, and for which it was impossible
for that to be true. 7200 RPM = 120 rotations/second, which puts an upper
limit of 120 true fsync executions per second. The test_fsync released
with PostgreSQL 9.0 now reports its value on the right scale, so you can
directly compare against that (earlier versions reported seconds/commit,
not commits/second).

First I built test_fsync from inside of an existing PostgreSQL 9.1 HEAD
checkout:

$ cd [PostgreSQL source code tree]
$ cd src/tools/fsync/
$ make

And I started with looking at the Ubuntu system running ext3, which
represents the status quo we've been seeing the past few years.
Initially the drive write cache was turned on:

Linux meddle 2.6.28-19-generic #61-Ubuntu SMP Wed May 26 23:35:15 UTC 2010 i686 GNU/Linux

$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=9.04
DISTRIB_CODENAME=jaunty
DISTRIB_DESCRIPTION="Ubuntu 9.04"

/dev/sda5 on / type ext3 (rw,relatime,errors=remount-ro)

$ ./test_fsync
Loops = 1

Simple write:
        8k write                          88476.784/second

Compare file sync methods using one write:
        (unavailable: open_datasync)
        open_sync 8k write                 1192.135/second
        8k write, fdatasync                1222.158/second
        8k write, fsync                    1097.980/second

Compare file sync methods using two writes:
        (unavailable: open_datasync)
        2 open_sync 8k writes               527.361/second
        8k write, 8k write, fdatasync      1105.204/second
        8k write, 8k write, fsync          1084.050/second

Compare open_sync with different sizes:
        open_sync 16k write                 966.047/second
        2 open_sync 8k writes               529.565/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different
descriptor.)
        8k write, fsync, close             1064.177/second
        8k write, close, fsync             1042.337/second

Two notable things here. One, there is no open_datasync defined in this
older kernel. Two, all methods of commit give equally inflated commit
rates, far faster than the drive is capable of. This proves this setup
isn't flushing the drive's write cache after commit.

You can get safe behavior out of the old kernel by disabling its write
cache:

$ sudo /sbin/hdparm -W0 /dev/sda

/dev/sda:
 setting drive write-caching to 0 (off)
 write-caching =  0 (off)

Loops = 1

Simple write:
        8k write                          89023.413/second

Compare file sync methods using one write:
        (unavailable: open_datasync)
        open_sync 8k write                  106.968/second
        8k write, fdatasync                 108.106/second
        8k write, fsync                     104.238/second

Compare file sync methods using two writes:
        (unavailable: open_datasync)
        2 open_sync 8k writes                51.637/second
        8k write, 8k write, fdatasync       109.256/second
        8k write, 8k write, fsync           103.952/second

Compare open_sync with different sizes:
        open_sync 16k write                 109.562/second
        2 open_sync 8k writes                52.752/second

Test if fsync on non-write file descriptor is honored:
(If the times are similar, fsync() can sync data written on a different
descriptor.)
        8k write, fsync, close              107.179/second
        8k write, close, fsync              106.923/second

And now results are as expected: just under 120/second.

Onto RHEL6.
Setup for this initial test was:

$ uname -a
Linux meddle 2.6.32-44.1.el6.x86_64 #1 SMP Wed Jul 14 18:51:29 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.0 Beta (Santiago)

$ mount
/dev/sda7 on / type ext4 (rw)

And I started with the write cache off to see a straight comparison
against the above:

$ sudo hdparm -W0 /dev/sda

/dev/sda:
 setting drive write-caching to 0 (off)
 write-caching =  0 (off)

$ ./test_fsync
Loops = 1

Simple write:
        8k write                         104194.886/second

Compare file sync methods using one write:
        open_datasync 8k write               97.828/second
        open_sync 8k write                  109.158/second
        8k write, fdatasync                 109.838/second
        8k write, fsync                      20.872/second

Compare file sync methods using two writes:
        2 open_datasync 8k writes            53.902/second
        2 open_sync 8k writes                53.721/second
        8k write, 8k write, fdatasync       109.731/second
        8k write, 8k write, fsync            20.918/second

Compare open_sync with different sizes:
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Tue, Nov 16, 2010 at 3:39 PM, Greg Smith g...@2ndquadrant.com wrote:
> I want to next go through and replicate some of the actual database level
> tests before giving a full opinion on whether this data proves it's worth
> changing the wal_sync_method detection. So far I'm torn between whether
> that's the right approach, or if we should just increase the default
> value for wal_buffers to something more reasonable.

How about both? open_datasync seems problematic for a number of reasons -
you get an immediate write-through whether you need it or not, including,
as you point out, the case where you want to write several blocks at once
and then force them all out together. And 64kB for a ring buffer just
seems awfully small.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On 11/16/10 12:39 PM, Greg Smith wrote:
> I want to next go through and replicate some of the actual database level
> tests before giving a full opinion on whether this data proves it's worth
> changing the wal_sync_method detection. So far I'm torn between whether
> that's the right approach, or if we should just increase the default
> value for wal_buffers to something more reasonable.

We'd love to, but wal_buffers uses SysV shmem.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Josh Berkus j...@agliodbs.com writes:
> On 11/16/10 12:39 PM, Greg Smith wrote:
>> So far I'm torn between whether that's the right approach, or if we
>> should just increase the default value for wal_buffers to something
>> more reasonable.
>
> We'd love to, but wal_buffers uses SysV shmem.

Well, we're not going to increase the default to gigabytes, but we could
very probably increase it by a factor of 10 or so without anyone
squawking. It's been awhile since I heard of anyone trying to run PG in
4MB shmmax. How much would a change of that size help?

			regards, tom lane
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Wed, Nov 17, 2010 at 01:31, Tom Lane t...@sss.pgh.pa.us wrote:
> Well, we're not going to increase the default to gigabytes, but we could
> very probably increase it by a factor of 10 or so without anyone
> squawking. It's been awhile since I heard of anyone trying to run PG in
> 4MB shmmax. How much would a change of that size help?

In my testing, when running a large bulk insert query with fdatasync on
ext4, changing wal_buffers has very little effect:

http://ompldr.org/vNjNiNQ/wal_sync_method1.png

(More details at
http://archives.postgresql.org/pgsql-performance/2010-11/msg00094.php )

It would take some more testing to say this conclusively, but looking at
the raw data, there only seems to be an effect when moving from 8 to
16MB. Could be different on other file systems though.

Regards,
Marti
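For context, the kind of WAL-heavy load Marti describes can be reproduced
with a simple bulk insert; this is only an illustrative sketch (table
name, row count, and payload size are made up), not the exact benchmark
behind the graph above:

    CREATE TABLE bulk_test (id integer, payload text);

    -- generates a large volume of WAL in a single statement
    INSERT INTO bulk_test
    SELECT i, repeat('x', 100)
    FROM generate_series(1, 10000000) AS i;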
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Josh Berkus wrote:
> We'd love to, but wal_buffers uses SysV shmem.

Speaking of the SysV shmem, is it possible to use huge pages?

-- 
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Wednesday 17 November 2010 00:31:34 Tom Lane wrote:
> Josh Berkus j...@agliodbs.com writes:
>> We'd love to, but wal_buffers uses SysV shmem.
>
> Well, we're not going to increase the default to gigabytes

Especially not as I don't think it will have any effect above
wal_segment_size, as that will force a write-out anyway. Or am I
misremembering the implementation?

Andres
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Andres Freund and...@anarazel.de writes:
> On Wednesday 17 November 2010 00:31:34 Tom Lane wrote:
>> Well, we're not going to increase the default to gigabytes
>
> Especially not as I don't think it will have any effect above
> wal_segment_size, as that will force a write-out anyway. Or am I
> misremembering the implementation?

Well, there's a forced fsync after writing the last page of an xlog
file, but I don't believe that proves that more than 16MB of xlog buffers
is useless. Other processes could still be busy filling the buffers.

			regards, tom lane
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Wednesday 17 November 2010 01:51:28 Tom Lane wrote:
> Well, there's a forced fsync after writing the last page of an xlog
> file, but I don't believe that proves that more than 16MB of xlog
> buffers is useless. Other processes could still be busy filling the
> buffers.

Maybe I am missing something, but I think the relevant
AdvanceXLInsertBuffer() is currently called with WALInsertLock held?

Andres
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Andres Freund and...@anarazel.de writes:
> Maybe I am missing something, but I think the relevant
> AdvanceXLInsertBuffer() is currently called with WALInsertLock held?

The fsync is associated with the write, which is not done with insert
lock held. We're not quite that dumb.

			regards, tom lane
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Wednesday 17 November 2010 02:04:28 Tom Lane wrote:
> The fsync is associated with the write, which is not done with insert
> lock held. We're not quite that dumb.

Ah, I see. The XLogWrite() in AdvanceXLInsertBuffer() only happens if the
head of the buffer catches up to the tail - which is more likely if the
wal buffers are small...

Andres
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
> Well, we're not going to increase the default to gigabytes, but we could
> very probably increase it by a factor of 10 or so without anyone
> squawking. It's been awhile since I heard of anyone trying to run PG in
> 4MB shmmax. How much would a change of that size help?

Last I checked, though, this comes out of the allocation available to
shared_buffers. And there definitely are several OSes (several Linuxes,
OSX) still limited to 32MB by default.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
Josh Berkus j...@agliodbs.com writes:
>> Well, we're not going to increase the default to gigabytes, but we could
>> very probably increase it by a factor of 10 or so without anyone
>> squawking. It's been awhile since I heard of anyone trying to run PG in
>> 4MB shmmax. How much would a change of that size help?
>
> Last I checked, though, this comes out of the allocation available to
> shared_buffers. And there definitely are several OSes (several Linuxes,
> OSX) still limited to 32MB by default.

Sure, but the current default is a measly 64kB. We could increase that
10x for a relatively small percentage hit in the size of shared_buffers,
if you suppose that there's 32MB available. The current default is set
to still work if you've got only a couple of MB in SHMMAX. What we'd
want is for initdb to adjust the setting as part of its probing to see
what SHMMAX is set to.

			regards, tom lane
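In postgresql.conf terms, the change being discussed would look roughly
like this (the values are purely illustrative, not proposed defaults; the
kernel limit itself can be checked with sysctl kernel.shmmax):

    # postgresql.conf - sketch of a 10x bump over the 64kB default of
    # this era
    wal_buffers = 640kB
    shared_buffers = 24MB     # together with wal_buffers this must still
                              # fit inside the kernel's SHMMAX allowance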
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Tue, Nov 16, 2010 at 6:25 PM, Josh Berkus j...@agliodbs.com wrote:
> On 11/16/10 12:39 PM, Greg Smith wrote:
>> So far I'm torn between whether that's the right approach, or if we
>> should just increase the default value for wal_buffers to something
>> more reasonable.
>
> We'd love to, but wal_buffers uses SysV shmem.

<places tongue firmly in cheek> Gee, too bad there's not some other
shared-memory implementation we could use...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] Query Performance SQL Server vs. Postgresql
2010/11/17 Humair Mohammed huma...@hotmail.com:
> There are no indexes on the tables either in SQL Server or Postgresql -
> I am comparing apples to apples here. I ran ANALYZE on the postgresql
> tables; after that, query performance times are still high: 42 seconds
> with COALESCE and 35 seconds with IS DISTINCT FROM.
>
> Here is the execution plan from Postgresql for the query:
>
>   select pb.id from pivotbad pb
>   inner join pivotgood pg
>     on pb.id = pg.id
>    and pb.question = pg.question
>    and coalesce(pb.response,'MISSING') <> coalesce(pg.response,'MISSING')
>
> Execution Time: 42 seconds
>
> Hash Join  (cost=16212.30..48854.24 rows=93477 width=17)
>   Hash Cond: (((pb.id)::text = (pg.id)::text) AND ((pb.question)::text = (pg.question)::text))
>   Join Filter: ((COALESCE(pb.response, 'MISSING'::character varying))::text <> (COALESCE(pg.response, 'MISSING'::character varying))::text)
>   ->  Seq Scan on pivotbad pb  (cost=0.00..2804.96 rows=93496 width=134)
>   ->  Hash  (cost=7537.12..7537.12 rows=251212 width=134)
>         ->  Seq Scan on pivotgood pg  (cost=0.00..7537.12 rows=251212 width=134)

This is a little bit strange - did you ANALYZE and VACUUM? Please send
the result of EXPLAIN ANALYZE.

Pavel

> And here is the execution plan from SQL Server for the query:
>
>   select pb.id from pivotbad pb
>   inner join pivotgood pg
>     on pb.id = pg.id
>    and pb.question = pg.question
>    and isnull(pb.response,'ISNULL') <> isnull(pg.response,'ISNULL')
>
> Execution Time: 1 second
>
>   Cost: 1%  |--Parallelism(Gather Streams)
>   Cost: 31% |--Hash Match(Inner Join, HASH:([pb].[ID], [pb].[Question])=([pg].[ID], [pg].[Question]), RESIDUAL:([master].[dbo].[pivotbad].[ID] as [pb].[ID]=[master].[dbo].[pivotgood].[ID] as [pg].[ID] AND [master].[dbo].[pivotbad].[Question] as [pb].[Question]=[master].[dbo].[pivotgood].[Question] as [pg].[Question] AND [Expr1006]<>[Expr1007]))
>   Cost: 0%  |--Bitmap(HASH:([pb].[ID], [pb].[Question]), DEFINE:([Bitmap1008]))
>   Cost: 0%  |--Compute Scalar(DEFINE:([Expr1006]=isnull([master].[dbo].[pivotbad].[Response] as [pb].[Response],'ISNULL')))
>   Cost: 6%  |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([pb].[ID], [pb].[Question]))
>   Cost: 12% |--Table Scan(OBJECT:([master].[dbo].[pivotbad] AS [pb]))
>   Cost: 0%  |--Compute Scalar(DEFINE:([Expr1007]=isnull([master].[dbo].[pivotgood].[Response] as [pg].[Response],'ISNULL')))
>   Cost: 17% |--Parallelism(Repartition Streams, Hash Partitioning, PARTITION COLUMNS:([pg].[ID], [pg].[Question]))
>   Cost: 33% |--Table Scan(OBJECT:([master].[dbo].[pivotgood] AS [pg]), WHERE:(PROBE([Bitmap1008],[master].[dbo].[pivotgood].[ID] as [pg].[ID],[master].[dbo].[pivotgood].[Question] as [pg].[Question])))
>
> From: pavel.steh...@gmail.com
> Date: Tue, 16 Nov 2010 08:12:03 +0100
> Subject: Re: [PERFORM]
> To: huma...@hotmail.com
> CC: pgsql-performance@postgresql.org
>
> 2010/11/15 Humair Mohammed huma...@hotmail.com:
> I have 2 tables with 200,000 rows of data: 3 character/string columns
> ID, Question and Response. The query below compares the data between
> the 2 tables based on ID and Question, and if the Response does not
> match between the left table and the right table it identifies the IDs
> where there is a mismatch. Running the query in SQL Server 2008 using
> the ISNULL function takes a few milliseconds. Running the same query in
> Postgresql takes over 70 seconds.
> The 2 queries are below:
>
> SQL Server 2008 R2 Query:
>
>   select t1.id from table1 t1
>   inner join table2 t2
>     on t1.id = t2.id
>    and t1.question = t2.question
>    and isnull(t1.response,'ISNULL') <> isnull(t2.response,'ISNULL')
>
> Postgres 9.1 Query:
>
>   select t1.id from table1 t1
>   inner join table2 t2
>     on t1.id = t2.id
>    and t1.question = t2.question
>    and coalesce(t1.response,'ISNULL') <> coalesce(t2.response,'ISNULL')
>
> What gives?
>
> I think the problem may be in the ugly predicate
>
>   coalesce(t1.response,'ISNULL') <> coalesce(t2.response,'ISNULL')
>
> Try using the IS DISTINCT FROM operator:
>
>   ... AND t1.response IS DISTINCT FROM t2.response
>
> Regards
> Pavel Stehule
>
> p.s. don't use a coalesce in the WHERE clause if it is possible to
> avoid it.
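What Pavel is asking for, as a sketch (table names are taken from the
thread; this simply combines his two suggestions - refresh the statistics,
then time the IS DISTINCT FROM form of the query):

    VACUUM ANALYZE pivotbad;
    VACUUM ANALYZE pivotgood;

    EXPLAIN ANALYZE
    SELECT pb.id
    FROM pivotbad pb
    JOIN pivotgood pg
      ON pb.id = pg.id
     AND pb.question = pg.question
    WHERE pb.response IS DISTINCT FROM pg.response;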