Re: [PERFORM] planner with index scan cost way off actual cost, advices to tweak cost constants?
Guillaume, thanks for your answer.

On 17 Mar 2006 11:09:50 +0100, Guillaume Cottenceau wrote:

> > Reading the documentation and postgresql list archives, I have run
> > ANALYZE right before my tests, and I have increased the statistics
> > target to 50 for the considered table; my problem is that the index
> > scan cost reported by EXPLAIN seems to be around 12.7 times higher
> > than it should be, a figure I suppose incompatible (too large) with
> > just random_page_cost and effective_cache_size tweaks.
>
> It's not surprising you have a high cost for an index scan which is
> planned to return, and does return, so many rows. I really don't think
> the planner does something wrong on this one.

My point is that the planner's cost estimate is way above the actual cost of the query, so the planner doesn't use the best plan. Even if the index returns so many rows, the actual cost of the query is such that the index scan (worst case, all disk cache flushed) is still better than the seq scan, yet the planner uses the seq scan.

> AFAIK, increasing the statistics target won't do anything to reduce the
> cost, as the planner's estimate of the number of returned rows is
> already really accurate and probably can't be better.

Ok, thanks.

> > Of course real queries use smaller date ranges.
>
> What about providing us the respective plans for your real queries? And
> in a real case.

The original query is more complicated and sometimes involves restricting the result set with another constraint. I am not sure it is very interesting to show it; I know that best performance would be achieved with an index on the date column for the shown query, and an index on the date column and the other column when doing a query on both.

> It's a bad idea to compare index scan and seqscan when your data have
> to be loaded in RAM.

What do you mean? That I should not flush the disk cache before timing? I did so to find the worst case. I am not sure it is the best solution; maybe the half-worst case would be? But this depends a lot on whether the index pages would stay in the disk cache before the next query, which cannot be told for sure unless a full, serious timing of the real application is done (and my application can be used in quite different scenarios, which means such a test is not entirely possible/meaningful).

> Before doing so, create an index on the date column to have the most
> effective index possible.

Yes, as I said, I know that doing this would improve the queries a lot. My point was to understand why the cost of the index scan is so inaccurate compared to the actual cost. Adding an index on the date column enlarges the data by 100-150M, so I'd rather avoid that if possible.

> > - I then tried to tweak random_page_cost and effective_cache_size
> >   following advice from the documentation:
> >   SET random_page_cost = 2;
>
> random_page_cost is the way to go for this sort of thing, but I don't
> think it's a good idea to have it too low globally...

Thanks, I suspected so.

> ...and I'm still thinking the problem is that your test case is not
> accurate.

Ok.

> -- 
> Guillaume Cottenceau

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
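To make the trade-off in this thread concrete, here is a rough numeric sketch of how random_page_cost enters the planner's decision. All row and page counts below are invented for illustration, and the formulas are heavily simplified versions of what PostgreSQL's costsize.c actually computes; this is not the thread's real query.

```python
# Simplified planner cost model (illustrative only; the real formulas in
# PostgreSQL's costsize.c are considerably more involved).
SEQ_PAGE_COST = 1.0    # planner cost unit: one sequential page fetch
CPU_TUPLE_COST = 0.01  # per-row CPU processing cost

def seq_scan_cost(table_pages, table_rows):
    # A seq scan reads every page sequentially and processes every row.
    return table_pages * SEQ_PAGE_COST + table_rows * CPU_TUPLE_COST

def index_scan_cost(index_pages, heap_pages, matching_rows, random_page_cost):
    # Worst case: every index page and heap page touched is a random fetch.
    return (index_pages + heap_pages) * random_page_cost \
        + matching_rows * CPU_TUPLE_COST

# Hypothetical table: 200k pages, 2M rows; the predicate matches 150k rows
# scattered over 150k heap pages, reached through a 1k-page index.
seq = seq_scan_cost(200_000, 2_000_000)                      # ~220000
idx_default = index_scan_cost(1_000, 150_000, 150_000, 4.0)  # seq scan "wins"
idx_lowered = index_scan_cost(1_000, 150_000, 150_000, 1.2)  # index scan "wins"
```

With the default random_page_cost of 4 the estimated index scan cost dwarfs the seq scan; lowering it flips the choice, which is exactly the kind of tuning the thread is debating.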
[PERFORM] data doesnt get saved in the database / idle in transaction
Hi, I have a strange problem with my Postgres application. The data entered in the application never reaches the database, although the record id (serial) is generated, and the record can be retrieved again and modified. Multiple records can be added and modified. But when I check the data with psql, the record is not there. The application uses a persistent database connection, and when I check the status of the connection, it shows: idle in transaction. I am pretty sure that every insert is being committed with an explicit commit(). It always worked before. Weird. Thanks for any hints, Ksenia.
Re: [PERFORM] data doesnt get saved in the database / idle in transaction
Ksenia Marasanova [EMAIL PROTECTED] wrote:

> The application uses a persistent database connection, and when I check
> the status of the connection, it shows: idle in transaction. I am
> pretty sure that every insert is being committed with an explicit
> commit(). It always worked before. Weird.

Try the following command to see what commands reach the server: set log_statement = all;

Regards, Qingqing
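Qingqing's suggestion, spelled out as a session sketch. This is an assumption-laden illustration: setting log_statement per-session requires superuser, the pg_stat_activity column names are the 8.1-era ones (procpid/current_query), and current_query is only populated when stats_command_string is on.

```sql
-- Log every statement the server receives (can also be set in
-- postgresql.conf and reloaded, to cover all sessions):
SET log_statement = 'all';

-- Look for the backend stuck "idle in transaction" (8.1-era columns):
SELECT procpid, current_query FROM pg_stat_activity;
```

If the log shows BEGIN arriving with no matching COMMIT, the driver or connection pool is opening transactions that the commit() call never closes.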
Re: [PERFORM] Best OS Configuration for Dual Xeon w/4GB
On Fri, Mar 17, 2006 at 05:00:34PM -0600, Scott Marlowe wrote:

> last pid: 5788;  load averages: 0.32, 0.31, 0.28  up 127+15:16:08  13:59:24
> 169 processes: 1 running, 168 sleeping
> CPU states:  5.4% user, 0.0% nice, 9.9% system, 0.0% interrupt, 84.7% idle
> Mem: 181M Active, 2632M Inact, 329M Wired, 179M Cache, 199M Buf, 81M Free
> Swap: 4096M Total, 216K Used, 4096M Free
>
>   PID USERNAME PRI NICE  SIZE     RES    STATE   C  TIME   WCPU   CPU  COMMAND
> 14501 pgsql      2    0  254M    242M   select  2  76:26  1.95% 1.95% postgre
>  5720 root      28    0  2164K  1360K   CPU0    0   0:00  1.84% 0.88% top
>  5785 pgsql      2    0  255M   29296K  sbwait  0   0:00  3.00% 0.15% postgre
>  5782 pgsql      2    0  255M   11900K  sbwait  0   0:00  3.00% 0.15% postgre
>  5772 pgsql      2    0  255M   11708K  sbwait  2   0:00  1.54% 0.15% postgre
>
> That doesn't look good. Is this machine freshly rebooted, or has it
> been running postgres for a while? 179M cache and 199M buffer with 2.6
> gig inactive is horrible for a machine running a 10 gig database.

No, this is perfectly fine. Inactive memory in FreeBSD isn't the same as Free. It's the same as 'active' memory except that it's pages that haven't been accessed in X amount of time (between 100 and 200 ms, I think). When free memory starts getting low, FBSD will start moving pages from the inactive queue to the free queue (possibly resulting in writes to disk along the way). IIRC, Cache is the directory cache, and Buf is disk buffers, which is somewhat akin to shared_buffers in PostgreSQL.

-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.com  work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf  cell: 512-569-9461
[PERFORM] Migration study, step 1: bulk write performance optimization
Ok, here's the deal: I am responsible for an exciting project of evaluating migration of a medium/large application for a well-known Swedish car/truck manufacturer from a proprietary DB to Postgres. The size of the database is currently about 50Gb, annual growth depending on sales, but probably in the 30-50Gb range.

Migrating the schema was easily done, mostly involving a search/replace of some vendor-specific datatypes. The next step is to migrate the data itself, and for this we have written a Java app relying on JDBC metadata to map the tables in the source schema to the target schema. The goal right now is to find the set of parameters that gives as short a bulk insert time as possible, minimizing downtime while the data itself is migrated.

The machine used for the study is a Dell PE2850, 6GB memory, 1xXEON 3.0GHz/2MB cache, internal SCSI 0+1 raid (currently 4x36GB 10Krpm striped+mirrored; two more 146GB 15000rpm disks will arrive later). Not sure about the brand/model of the raid controller, so I'll leave that for now. The file system is ext3 (I know, maybe not the optimal choice, but this is how it was when I got it) with an 8k block size. The OS currently installed is CentOS4.

Until the new disks arrive, the OS itself, pg_xlog and the data all reside on the same disks. When they arrive, I will probably move the data to the new disks (need two more to get raid 0+1, though) and leave the OS + pg_xlog on the 10Krpm disks. Mounting the 15000rpm data disks with the noatime option (this is safe, right?) and using a 16kb block size (for read performance) will probably be considered as well.
NOTE: this machine/configuration is NOT what we will be using in production if the study turns out OK; it's just supposed to work as a development machine in the first phase, whose purpose more or less is to get familiar with configuring Postgres and see if we can get the application up and running (we will probably use a 64-bit platform and either an FC SAN or internal raid with a battery-backed cache for production use, if all goes well).

The first thing I did when I got the machine was a raw dd write test:

# time bash -c "(dd if=/dev/zero of=/opt/bigfile count=1310720 bs=8k; sync)"
1310720+0 records in
1310720+0 records out

real    2m21.438s
user    0m0.998s
sys     0m51.347s

(10*1024)Mb / ~141s = ~75.5Mb/s

As a simple benchmark, I created a simple table without PK/indexes with 1k wide rows:

create table iotest.one_kb_rows
(
    the_col char(1024) not null
);

To fill the table, I use this simple function:

create or replace function iotest.writestress(megs integer) returns void as $$
declare
    char_str char(1024) := repeat('x', 1024);
begin
    for i in 1..megs loop
        for j in 1..1024 loop
            insert into one_kb_rows(the_col) values (char_str);
        end loop;
    end loop;
end;
$$ language plpgsql;

Then, I tested how long it takes to write 10Gb of data to this table:

iotest=# \timing
Timing is on.
iotest=# select writestress((10*1024));
 writestress
-------------

(1 row)

Time: 379971.252 ms

This gives that 10Gb takes ~380s = ~27Mb/s (with fsync=off), compared to the raw dd result (~75.5Mb/s). I assume this difference is due to:

- simultaneous WAL write activity (assumed: for each byte written to the table, at least one byte is also written to WAL; in effect, 10Gb of data inserted in the table equals 20Gb written to disk)
- lousy test method (it is done using a function => the transaction size is 10Gb, and 10Gb will *not* fit in wal_buffers :) )
- poor config
- something else?
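A quick sanity check of the arithmetic in the post. The factor-of-two WAL assumption below is the poster's own, repeated here, not an established figure:

```python
# Back-of-the-envelope check of the INSERT throughput figures above.
table_mb = 10 * 1024     # 10 GB written through INSERTs
elapsed_s = 380.0        # ~380 s as reported by \timing
dd_mb_s = 75.5           # raw sequential dd rate measured earlier

insert_mb_s = table_mb / elapsed_s       # rate seen by the client, ~27 MB/s

# Under the poster's assumption that every table byte is also written to
# the WAL, the disks actually absorb about twice the client-visible rate:
effective_disk_mb_s = 2 * insert_mb_s    # ~54 MB/s of real disk traffic
```

Even after doubling for WAL, the effective rate is still well below the raw dd rate, which is consistent with the poster's suspicion that per-row overhead and the test method account for part of the gap.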
I have tried to read up as much as possible on Postgres configuration (disk layout, buffer management, WAL sync methods, etc) and found this post regarding bgwriter tweaking: http://archives.postgresql.org/pgsql-performance/2006-03/msg00218.php - which explains the bgwriter config below.

All params in postgresql.conf that are not commented out:

max_connections = 100
superuser_reserved_connections = 2
shared_buffers = 16000
bgwriter_lru_percent = 20
bgwriter_lru_maxpages = 160
bgwriter_all_percent = 10
bgwriter_all_maxpages = 320
fsync = off
wal_sync_method = open_sync
wal_buffers = 128
checkpoint_segments = 3
log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'
log_rotation_age = 1440
log_line_prefix = '%m: ([EMAIL PROTECTED]) '
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'

fsync can safely be kept off during data migration, as we are able to restart the procedure without losing data if something goes wrong. Increasing checkpoint_segments to 8/16/32 only increased the insert time, so I kept it at the default. I will increase shared_buffers and effective_cache_size as
[PERFORM] Query Feromance
Hello! How can I increase the performance of query execution? What tool can I use to analyze a running query, so that I can create an index to optimize it? Thanks, Marco Furetto Berri
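The usual starting point for questions like this is EXPLAIN ANALYZE, which runs the query and reports the planner's estimated rows and costs next to the actual rows and timings. A hypothetical sketch; the table and column names here are invented for illustration:

```sql
-- Run the query and compare planner estimates with actual rows/times:
EXPLAIN ANALYZE
SELECT * FROM orders WHERE order_date >= date '2006-01-01';

-- If the output shows a slow sequential scan on a selective predicate,
-- an index on that column may help:
CREATE INDEX orders_order_date_idx ON orders (order_date);
ANALYZE orders;
```

Re-running the EXPLAIN ANALYZE after creating the index shows whether the planner actually switched plans and how much time it saved.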
Re: [PERFORM] Migration study, step 1: bulk write performance optimization
Mikael Carneholm wrote:

> I am responsible for an exciting project of evaluating migration of a
> medium/large application for a well-known Swedish car/truck
> manufacturer ... The goal right now is to find the set of parameters
> that gives as short a bulk insert time as possible, minimizing downtime
> while the data itself is migrated.

If you haven't explored the COPY command yet, check it out. It is stunningly fast compared to normal INSERT commands. http://www.postgresql.org/docs/8.1/static/sql-copy.html

pg_dump and pg_restore make use of the COPY command. Since you're coming from a different vendor, you'd have to dump the data into a COPY-compatible set of files yourself. But it will be worth the effort.

Craig
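A minimal sketch of the dump step Craig describes: writing rows out in COPY's default text format (tab-separated columns, backslash-escaped specials, \N for NULL). The rows, file name, and target table here are invented; in the migration scenario the rows would come from the source database via JDBC.

```python
import os
import tempfile

# Hypothetical rows pulled from the source database.
rows = [
    (1, "first row", None),
    (2, "second\trow", "note"),
]

def copy_field(value):
    # NULL is spelled \N in COPY's text format.
    if value is None:
        return r"\N"
    s = str(value)
    # Escape the characters that are special in the text format.
    return (s.replace("\\", "\\\\")
             .replace("\t", "\\t")
             .replace("\n", "\\n")
             .replace("\r", "\\r"))

def write_copy_file(path, rows):
    # One line per row, fields joined with tabs.
    with open(path, "w") as f:
        for row in rows:
            f.write("\t".join(copy_field(v) for v in row) + "\n")

copy_path = os.path.join(tempfile.gettempdir(), "items.copy")
write_copy_file(copy_path, rows)
# Then load in psql with:  \copy items FROM '<copy_path>'
```

Bulk-loading such files with COPY avoids the per-statement overhead that makes row-by-row INSERTs so much slower.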
Re: [PERFORM] Migration study, step 1: bulk write performance
Mikael,

I've just recently been through such an experience, i.e. migrating from another vendor to Postgres a DB of about the same size category as yours. I think you got it right with fsync turned off during migration (just don't forget to turn it back on after finishing ;-), and using tables without indexes/foreign keys. In our case, recreating all the indexes/foreign keys/other constraints actually took longer than the raw data transfer itself... but it's possible that the process was not tuned 100%; we are still learning how to tune Postgres...

What I can add from our experience: ext3 turned out lousy for our application, and converting to XFS made quite a big improvement for our DB load. I don't have hard figures, but I think it was some 30% improvement in overall speed, and it was a huge improvement for heavy-load times... what I mean is that with ext3 we had multiple parallel big tasks executing in more time than if we had executed them sequentially, and with XFS that was gone; load scales linearly. In any case you should test the performance of your application on different file systems and different settings, as this could make a huge difference.

And another thing: we're still fighting performance problems due to the fact that our application was designed to perform well with the other DB product... I think you'll have more work to do in this regard than just some search/replace ;-)

Cheers, Csaba.
Re: [PERFORM] Best OS Configuration for Dual Xeon w/4GB
On Mon, 2006-03-20 at 08:45, Jim C. Nasby wrote:

> On Fri, Mar 17, 2006 at 05:00:34PM -0600, Scott Marlowe wrote:
> > Mem: 181M Active, 2632M Inact, 329M Wired, 179M Cache, 199M Buf, 81M Free
> > Swap: 4096M Total, 216K Used, 4096M Free
> > [...]
> > That doesn't look good. Is this machine freshly rebooted, or has it
> > been running postgres for a while? 179M cache and 199M buffer with
> > 2.6 gig inactive is horrible for a machine running a 10 gig database.
>
> No, this is perfectly fine. Inactive memory in FreeBSD isn't the same
> as Free. It's the same as 'active' memory except that it's pages that
> haven't been accessed in X amount of time (between 100 and 200 ms, I
> think). When free memory starts getting low, FBSD will start moving
> pages from the inactive queue to the free queue (possibly resulting in
> writes to disk along the way). IIRC, Cache is the directory cache, and
> Buf is disk buffers, which is somewhat akin to shared_buffers in
> PostgreSQL.

So, then, is the inact pretty much the same as kernel buffers in Linux?
Re: [PERFORM] Migration study, step 1: bulk write performance optimization
> using a 16kb block size (for read performance) will probably be
> considered as well.

Hm, this means that when postgres wants to write just one 8k page, the OS will have to read 16k, replace half of it with the new block, and write 16k again... I guess it would be better to stick with the usual block size. Also, it will have to read 16k every time it really wants to read one page... which happens quite often, except for seq scans.

> NOTE: this machine/configuration is NOT what we will be using in
> production if the study turns out OK, it's just supposed to work as a
> development machine in the first phase whose purpose more or less is to
> get familiar with configuring Postgres and see if we can get the
> application up and running (we will probably use a 64bit platform and

Opteron xDDD

Use XFS or Reiser... ext3 isn't well suited for this. Use noatime AND nodiratime. It's safe to turn off fsync while importing your data. For optimum speed, put the WAL on another physical disk. Look in the docs to see which of maintenance_work_mem, work_mem or sort_mem is used for index creation, and set it to a very large value, to speed up that index creation. Create your indexes with fsync=off also.
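On the index-creation advice: in the 8.x releases it is maintenance_work_mem that governs the sort memory used by CREATE INDEX (work_mem/sort_mem apply to ordinary queries). A hedged sketch; the value is only an example, and in 8.1 the setting is an integer in kilobytes (the 'MB' unit syntax came later):

```sql
-- Give index builds plenty of sort memory (value in KB; ~512MB here):
SET maintenance_work_mem = 524288;

-- Then build the indexes after the bulk load, still running with
-- fsync = off, e.g. on the test table from earlier in the thread:
CREATE INDEX one_kb_rows_idx ON iotest.one_kb_rows (the_col);
```

Building indexes once, after the data is loaded, is far cheaper than maintaining them row by row during the load.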
Re: [PERFORM] Query Feromance
Marco, could you give us the query whose performance you would like to improve?

----- Original Message ----- From: Marco Furetto [EMAIL PROTECTED] To: pgsql-performance@postgresql.org Sent: Monday, March 20, 2006 11:59 AM Subject: [PERFORM] Query Feromance

> Hello! How can I increase the performance of query execution? What
> tool can I use to analyze a running query, so that I can create an
> index to optimize it? Thanks, Marco Furetto Berri
Re: [PERFORM] Migration study, step 1: bulk write performance optimization
Others are reporting better performance on 8.1.x with very large shared buffers. You may want to try tweaking that, possibly as high as 20% of available memory.

Dave

On 20-Mar-06, at 9:59 AM, Mikael Carneholm wrote:

> Ok, here's the deal: I am responsible for an exciting project of
> evaluating migration of a medium/large application for a well-known
> Swedish car/truck manufacturer from a proprietary DB to Postgres.
> [... full message quoted above ...]
Re: [PERFORM] 1 TB of memory
On Mar 17, 2006, at 8:55 AM, Merlin Moncure wrote:

> I like their approach... ddr ram + raid sanity backup + super reliable
> power system. Their prices are on jupiter (and i dont mean jupiter, fl)
> but hopefully there will be some competition and the inevitable

Nothing unique to them. I have a 4 year old SSD from a now out-of-business company, Imperial Technology. Initially we bought it for about $20k with 1GB of RAM. Subsequently upgraded to 5GB for another $20k. The speed is wicked fast even with just ultra2 SCSI (4 channels). The unit has the same battery backup to disk stuff (although it only does the backup at power fail). At one time they quoted me about $80k to upgrade it to the full 32GB that the unit supports. I passed.

For my use it was worth the price. However, given the speed increase of other components since then, I don't think I'd buy one today. Parallelism (if you can do it like Luke suggested) is the way to go.

And no, I have not run a database on one of these... though I am tempted to...
Re: [PERFORM] 1 TB of memory
On Mar 17, 2006, at 5:07 PM, Scott Marlowe wrote:

> Open Source SSD via iSCSI with commodity hardware... hmmm. sounds like
> a useful project.

sh! don't give away our top secret plans!
[PERFORM] Auto performance tuning?
I have to say I've been really impressed with the quality and diversity of tools here to increase performance for PostgreSQL. But I keep seeing a lot of the same basic things repeated again and again.

Has anyone looked into a smart or auto-adjusting resource manager for Postgres? Consider, for instance, that you set it to aggressively use system resources; it would then do things like notice that it needs more work_mem after profiling a few thousand queries and add it for you, or that a specific index or table should be moved to a different spindle and do it in the background, or that query plans keep screwing up on a particular table, so it knows to raise the amount of statistics it keeps for that table.

Is this a crazy idea or something someone's already working on?

Orion
Re: [PERFORM] Best OS Configuration for Dual Xeon w/4GB Adaptec RAID 2200S
On Mar 17, 2006, at 5:11 PM, Kenji Morishige wrote:

> In summary, my questions:
>
> 1. Would running PG on FreeBSD 5.x or 6.x or Linux improve performance?

FreeBSD 6.x will definitely get you improvements. Many speedup improvements have been made to both the generic disk layer and the specific drivers. However, the current best-of-breed RAID controller is the LSI 320-x (I use the 320-2X). I have one box into which this card will not fit (thanks Sun, for making a box with only low-profile slots!) so I use an Adaptec 2230SLP card in it. Testing shows it is about 80% the speed of an LSI 320-2X on a sequential workload (load DB, run some queries, rebuild indexes, etc.) If you do put on FreeBSD 6, I'd love to see the output of diskinfo -v -t on your RAID volume(s).

> 2. Should I change SCSI controller config to use RAID 10 instead of 5?

I use RAID10.

> 3. Why isn't postgres using all 4GB of ram for at least caching table
> for reads?

I think FreeBSD has a hard upper limit on the total ram it will use for disk cache. I haven't been able to get reliable, irrefutable answers about it, though.

> 4. Are there any other settings in the conf file I could try to tweak?

I like to bump up the checkpoint segments to 256.
Re: [PERFORM] 1 TB of memory
> > I like their approach... ddr ram + raid sanity backup + super
> > reliable power system. Their prices are on jupiter (and i dont mean
> > jupiter, fl) but hopefully there will be some competition and the
> > inevitable
>
> Nothing unique to them. I have a 4 year old SSD from a now
> out-of-business company, Imperial Technology. Initially we bought it
> for about $20k with 1GB of RAM. Subsequently upgraded to 5GB for
> another $20k. The speed is wicked fast even with just ultra2 SCSI (4
> channels). The unit has the same battery backup to disk stuff
> (although it only does the backup at power fail).

You may or may not be interested to know they are back in business :).

> For my use it was worth the price. However, given the speed increase
> of other components since then, I don't think I'd buy one today.
> Parallelism (if you can do it like Luke suggested) is the way to go.

That's an interesting statement. My personal opinion is that SSD will ultimately take over the database storage market, as well as most consumer-level devices for primary storage, except perhaps for very large databases (>1TB). Hard disk drives will displace tapes for backup storage.

merlin
Re: [PERFORM] Best OS Configuration for Dual Xeon w/4GB
On Mon, 20 Mar 2006, Jim C. Nasby wrote:

> No, this is perfectly fine. Inactive memory in FreeBSD isn't the same
> as Free. It's the same as 'active' memory except that it's pages that
> haven't been accessed in X amount of time (between 100 and 200 ms, I
> think). When free memory starts getting low, FBSD will start moving
> pages from the inactive queue to the free queue (possibly resulting in
> writes to disk along the way). IIRC, Cache is the directory cache, and
> Buf is disk buffers, which is somewhat akin to shared_buffers in
> PostgreSQL.

I don't believe that's true. I'm not an expert in FreeBSD's VM internals, but this is how I believe it works:

Active pages are pages currently mapped into a process's address space.

Inactive pages are pages which are marked dirty (must be written to backing store before they can be freed) and which are not mapped into a process's address space. They're still associated with a VM object of some kind - like part of a process's virtual address space, or part of the cache for a file on disk. If a page is still part of a process's virtual address space and is accessed, a fault is generated. The page is then put back into the address mappings.

Cached pages are like inactive pages but aren't dirty. They can be either re-mapped or freed immediately.

Free pages are properly free. Wired pages are unswappable.

Buf I'm not sure about. It doesn't represent the amount of memory used to cache files on disk, I'm sure of that. The sysctl -d description is 'KVA memory used for bufs', so I suspect that it's the amount of kernel virtual address space mapped to pages in the 'active', 'inactive' and 'cache' queues.

-- 
Alex Hayward
Seatbooker
Re: [PERFORM] Best OS Configuration for Dual Xeon w/4GB Adaptec
Vivek Khera wrote:

> If you do put on FreeBSD 6, I'd love to see the output of
> diskinfo -v -t on your RAID volume(s).
> [...]

Not directly related... i have an HP dl380 g3 with a SmartArray 5i controller (RAID 1+0); these are my results:

shiva2# /usr/sbin/diskinfo -v -t /dev/da2s1d
/dev/da2s1d
        512             # sectorsize
        218513555456    # mediasize in bytes (204G)
        426784288       # mediasize in sectors
        52301           # Cylinders according to firmware.
        255             # Heads according to firmware.
        32              # Sectors according to firmware.

Seek times:
        Full stroke:      250 iter in  1.138232 sec =  4.553 msec
        Half stroke:      250 iter in  1.084474 sec =  4.338 msec
        Quarter stroke:   500 iter in  1.690313 sec =  3.381 msec
        Short forward:    400 iter in  0.752646 sec =  1.882 msec
        Short backward:   400 iter in  1.306270 sec =  3.266 msec
        Seq outer:       2048 iter in  0.766676 sec =  0.374 msec
        Seq inner:       2048 iter in  0.803759 sec =  0.392 msec
Transfer rates:
        outside:       102400 kbytes in 2.075984 sec = 49326 kbytes/sec
        middle:        102400 kbytes in 2.100510 sec = 48750 kbytes/sec
        inside:        102400 kbytes in 2.042313 sec = 50139 kbytes/sec

is this good enough?
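As a cross-check, the transfer-rate figures diskinfo prints are simply kilobytes divided by elapsed seconds:

```python
# Recompute diskinfo's kbytes/sec figures from the raw numbers above.
measurements = {
    "outside": (102400, 2.075984),  # kbytes transferred, seconds elapsed
    "middle":  (102400, 2.100510),
    "inside":  (102400, 2.042313),
}
rates = {zone: int(kbytes / secs)
         for zone, (kbytes, secs) in measurements.items()}
# All three zones come out around 48-50 MB/s.
```

The near-identical inner and outer rates are themselves a hint that the RAID controller, not the platters, is the bottleneck, since bare drives are noticeably faster on outer tracks.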
Re: [PERFORM] Best OS Configuration for Dual Xeon w/4GB
Miguel,

On 3/20/06 12:52 PM, Miguel [EMAIL PROTECTED] wrote:

> i have a HP dl380 g3 with a SmartArray 5i controller (1+0), these are
> my results

Another known bad RAID controller. The SmartArray 5i is horrible on Linux - this is the first BSD result I've seen.

> Seek times:
>         Full stroke:    250 iter in 1.138232 sec = 4.553 msec
>         Half stroke:    250 iter in 1.084474 sec = 4.338 msec

These seem OK - are they access times or are they actually seek times? Seems like with RAID 10, you should get better by maybe double.

> Transfer rates:
>         outside: 102400 kbytes in 2.075984 sec = 49326 kbytes/sec
>         middle:  102400 kbytes in 2.100510 sec = 48750 kbytes/sec
>         inside:  102400 kbytes in 2.042313 sec = 50139 kbytes/sec
>
> is this good enough?

It's pretty slow. How many disk drives do you have?

- Luke
Re: [PERFORM] Best OS Configuration for Dual Xeon w/4GB Adaptec
Luke Lonergan wrote: Miguel, On 3/20/06 12:52 PM, Miguel [EMAIL PROTECTED] wrote: i have a HP dl380 g3 with a Smart Array 5i controller (1+0), these are my results. Another known bad RAID controller. The Smart Array 5i is horrible on Linux - this is the first BSD result I've seen. Seek times: Full stroke: 250 iter in 1.138232 sec = 4.553 msec Half stroke: 250 iter in 1.084474 sec = 4.338 msec These seem OK - are they access times or are they actually seek times? I don't know - how can I check? Transfer rates: outside: 102400 kbytes in 2.075984 sec = 49326 kbytes/sec middle: 102400 kbytes in 2.100510 sec = 48750 kbytes/sec inside: 102400 kbytes in 2.042313 sec = 50139 kbytes/sec is this good enough? It's pretty slow. How many disk drives do you have? I have 6 Ultra320 72G 10K disks. --- Miguel ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Best OS Configuration for Dual Xeon w/4GB
Miguel, On 3/20/06 1:12 PM, Miguel [EMAIL PROTECTED] wrote: I don't know - how can I check? No matter - it's the benchmark that would tell you; it's probably access time that's being measured even though the text says seek time. The difference is that seek time represents only the head motion, whereas access time is the whole access, including the seek. Access times of 4.5 ms are typical of a single 10K RPM SCSI disk drive like the Seagate Barracuda. Transfer rates: outside: 102400 kbytes in 2.075984 sec = 49326 kbytes/sec middle: 102400 kbytes in 2.100510 sec = 48750 kbytes/sec inside: 102400 kbytes in 2.042313 sec = 50139 kbytes/sec I have 6 Ultra320 72G 10K disks. Yah - ouch. With 6 drives in a RAID10, you should expect 3 drives' worth of sequential scan performance, or anywhere from 100MB/s to 180MB/s. You're getting from half to a third of the performance you'd get with a decent RAID controller. If you add a simple SCSI adapter like the common LSI U320 adapter to your DL380G3 and then run software RAID, you will get more than 150MB/s with less CPU consumption. I'd also expect you'd get down to about 2ms access times. This might not be easy for you to do, and you might prefer hardware RAID adapters, but I don't have a recommendation for you there. I'd stay away from the HP line. - Luke ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
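Luke's rule of thumb (6 drives in RAID10 gives roughly 3 drives' worth of sequential bandwidth, since each mirror pair contributes one spindle's worth) can be sketched as a back-of-the-envelope calculation. The per-drive throughput figures here are assumptions for illustration, not measurements:

```python
# Back-of-the-envelope sequential throughput for RAID10: a striped set of
# mirror pairs contributes one drive's worth of sequential bandwidth per pair.

def raid10_seq_mb_s(n_drives, per_drive_mb_s):
    """Expected sequential throughput (MB/s) for a RAID10 array."""
    return (n_drives // 2) * per_drive_mb_s

# 6 x 10K RPM SCSI drives at an assumed 35-60 MB/s each gives the
# 100-180 MB/s range quoted in the thread:
low = raid10_seq_mb_s(6, 35)    # 105 MB/s
high = raid10_seq_mb_s(6, 60)   # 180 MB/s
print(low, high)
```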
Re: [PERFORM] Best OS Configuration for Dual Xeon w/4GB Adaptec
Luke Lonergan wrote: Transfer rates: outside: 102400 kbytes in 2.075984 sec = 49326 kbytes/sec middle: 102400 kbytes in 2.100510 sec = 48750 kbytes/sec inside: 102400 kbytes in 2.042313 sec = 50139 kbytes/sec I have 6 Ultra320 72G 10K disks. Yah - ouch. With 6 drives in a RAID10, you should expect 3 drives' worth of sequential scan performance, or anywhere from 100MB/s to 180MB/s. You're getting from half to a third of the performance you'd get with a decent RAID controller. If you add a simple SCSI adapter like the common LSI U320 adapter to your DL380G3 and then run software RAID, you will get more than 150MB/s with less CPU consumption. I'd also expect you'd get down to about 2ms access times. This might not be easy for you to do, and you might prefer hardware RAID adapters, but I don't have a recommendation for you there. I'd stay away from the HP line. This is my new PostgreSQL 8.1.3 server, so I have many options (in fact, any option) to choose from; I want maximum performance. If I understood you well, do you mean using something like vinum? I forgot to mention that the 6 disks are in an MSA500 G2 external storage unit; additionally I have two 36G U320 10K disks in RAID 10 for the OS, installed in the server slots. --- Miguel ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Best OS Configuration for Dual Xeon w/4GB
Miguel, On 3/20/06 1:51 PM, Miguel [EMAIL PROTECTED] wrote: i forgot to mention that the 6 disks are in an MSA500 G2 external storage unit; additionally i have two 36G U320 10K disks in RAID 10 for the OS, installed in the server slots. I just checked online and I think the MSA500 G2 has its own SCSI RAID controllers, so you are actually just using the 5i as a SCSI attach, which it's not good at (no reordering/command queueing, etc). So just using a simple SCSI adapter to connect to the MSA might be a big win. - Luke ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Best OS Configuration for Dual Xeon w/4GB
Luke Lonergan wrote: Miguel, On 3/20/06 1:51 PM, Miguel [EMAIL PROTECTED] wrote: i forgot to mention that the 6 disks are in an MSA500 G2 external storage unit; additionally i have two 36G U320 10K disks in RAID 10 for the OS, installed in the server slots. I just checked online and I think the MSA500 G2 has its own SCSI RAID controllers, Yes, it has its own redundant controller, so you are actually just using the 5i as a SCSI attach, which it's not good at (no reordering/command queueing, etc). So just using a simple SCSI adapter to connect to the MSA might be a big win. I will try an LSI U320 and will let you know if I get any performance gain; thanks for your advice. --- Miguel ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Best OS Configuration for Dual Xeon w/4GB Adaptec RAID 2200S
If you do put FreeBSD 6 on it, I'd love to see the output of "diskinfo -v -t" on your RAID volume(s). Not directly related ... i have a HP dl380 g3 with a Smart Array 5i controller (1+0), these are my results [...] is this good enough?

Is that on a loaded box or a mostly quiet box? Those numbers seem rather low for my tastes. For comparison, here are numbers from a Dell 1850 with a built-in PERC 4e/Si RAID in a two-disk mirror. All numbers below are on mostly or totally quiet disk systems.

amrd0
        512          # sectorsize
        73274490880  # mediasize in bytes (68G)
        143114240    # mediasize in sectors
        8908         # Cylinders according to firmware.
        255          # Heads according to firmware.
        63           # Sectors according to firmware.

Seek times:
        Full stroke:     250 iter in 0.756718 sec = 3.027 msec
        Half stroke:     250 iter in 0.717824 sec = 2.871 msec
        Quarter stroke:  500 iter in 1.972368 sec = 3.945 msec
        Short forward:   400 iter in 1.193179 sec = 2.983 msec
        Short backward:  400 iter in 1.322440 sec = 3.306 msec
        Seq outer:      2048 iter in 0.271402 sec = 0.133 msec
        Seq inner:      2048 iter in 0.271151 sec = 0.132 msec
Transfer rates:
        outside: 102400 kbytes in 1.080339 sec = 94785 kbytes/sec
        middle:  102400 kbytes in 1.166021 sec = 87820 kbytes/sec
        inside:  102400 kbytes in 1.461498 sec = 70065 kbytes/sec

And for the *real* disks: in the following two cases, I used a Dell 1425SC with 1GB RAM and connected the controllers to the same Dell PowerVault 14-disk U320 array (one controller at a time, obviously). For each controller, each half of each mirror pair was on the opposite channel of the controller for optimal speed. Disk 0 is a RAID1 of two drives, and disk 1 is a RAID10 of the remaining 12 drives. All running FreeBSD 6.0-RELEASE. First I tested the Adaptec 2230SLP and got these:

aacd0
        512          # sectorsize
        36385456128  # mediasize in bytes (34G)
        71065344     # mediasize in sectors
        4423         # Cylinders according to firmware.
        255          # Heads according to firmware.
        63           # Sectors according to firmware.
Seek times:
        Full stroke:     250 iter in 2.288389 sec = 9.154 msec
        Half stroke:     250 iter in 1.657302 sec = 6.629 msec
        Quarter stroke:  500 iter in 2.756597 sec = 5.513 msec
        Short forward:   400 iter in 1.205275 sec = 3.013 msec
        Short backward:  400 iter in 1.249310 sec = 3.123 msec
        Seq outer:      2048 iter in 0.412770 sec = 0.202 msec
        Seq inner:      2048 iter in 0.428585 sec = 0.209 msec
Transfer rates:
        outside: 102400 kbytes in 1.204412 sec = 85021 kbytes/sec
        middle:  102400 kbytes in 1.347325 sec = 76002 kbytes/sec
        inside:  102400 kbytes in 2.036832 sec = 50274 kbytes/sec

aacd1
        512           # sectorsize
        218307231744  # mediasize in bytes (203G)
        426381312     # mediasize in sectors
        26541         # Cylinders according to firmware.
        255           # Heads according to firmware.
        63            # Sectors according to firmware.

Seek times:
        Full stroke:     250 iter in 0.856699 sec = 3.427 msec
        Half stroke:     250 iter in 1.475651 sec = 5.903 msec
        Quarter stroke:  500 iter in 2.693270 sec = 5.387 msec
        Short forward:   400 iter in 1.127831 sec = 2.820 msec
        Short backward:  400 iter in 1.216876 sec = 3.042 msec
        Seq outer:      2048 iter in 0.416340 sec = 0.203 msec
        Seq inner:      2048 iter in 0.436471 sec = 0.213 msec
Transfer rates:
        outside: 102400 kbytes in 1.245798 sec = 82196 kbytes/sec
        middle:  102400 kbytes in 1.169033 sec = 87594 kbytes/sec
        inside:  102400 kbytes in 1.390840 sec = 73625 kbytes/sec

And the LSI 320-2X card:

amrd0
        512          # sectorsize
        35999711232  # mediasize in bytes (34G)
        70311936     # mediasize in sectors
        4376         # Cylinders according to firmware.
        255          # Heads according to firmware.
        63           # Sectors according to firmware.

Seek times:
        Full stroke:     250 iter in 0.737130 sec = 2.949 msec
        Half stroke:     250 iter in 0.694498 sec = 2.778 msec
        Quarter stroke:  500 iter in 2.040667 sec = 4.081 msec
        Short forward:   400 iter in 1.418592 sec = 3.546 msec
        Short backward:  400 iter in 0.896076 sec = 2.240 msec
        Seq outer:      2048 iter in 0.292390 sec = 0.143 msec
        Seq inner:      2048
Re: [PERFORM] update == delete + insert?
go with design 1; update = delete + insert. -- Original Message --- From: Craig A. James [EMAIL PROTECTED] To: pgsql-performance@postgresql.org Sent: Mon, 20 Mar 2006 14:49:43 -0800 Subject: [PERFORM] update == delete + insert? I've seen it said here several times that update == delete + insert. On the other hand, I've noticed that alter table [add|drop] column ... is remarkably fast, even for very large tables, which leads me to wonder whether each column's contents are in a file specifically for that column. My question: suppose I have a very wide set of data, say 100 columns, and one of those columns will be updated often, but the others are fairly static. I have two choices:

Design 1:
create table a (
    id integer,
    frequently_updated integer);
create table b (
    id integer,
    infrequently_updated_1 integer,
    infrequently_updated_2 integer,
    infrequently_updated_3 integer,
    ... etc.
    infrequently_updated_99 integer);

Design 2:
create table c (
    id integer,
    frequently_updated integer,
    infrequently_updated_1 integer,
    infrequently_updated_2 integer,
    infrequently_updated_3 integer,
    ... etc.
    infrequently_updated_99 integer);

If update == delete + insert is strictly true, then Design 2 would be poor, since 99 columns would be moved around with each update. But if columns are actually stored in separate files, then Designs 1 and 2 would be essentially equivalent when it comes to vacuuming. Thanks, Craig ---(end of broadcast)--- TIP 6: explain analyze is your friend --- End of Original Message --- ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Best OS Configuration for Dual Xeon w/4GB Adaptec
Vivek Khera wrote: If you do put FreeBSD 6 on it, I'd love to see the output of "diskinfo -v -t" on your RAID volume(s). Not directly related ... i have a HP dl380 g3 with a Smart Array 5i controller (1+0), these are my results [...] is this good enough? Is that on a loaded box or a mostly quiet box? Those numbers seem rather low for my tastes. For comparison, here are numbers from a Dell 1850 with a built-in PERC 4e/Si RAID in a two-disk mirror. All numbers below are on mostly or totally quiet disk systems. My numbers are on a totally quiet box; I've just installed it. amrd0 512 # sectorsize 73274490880 # mediasize in bytes (68G) 143114240 # mediasize in sectors 8908 # Cylinders according to firmware. 255 # Heads according to firmware. 63 # Sectors according to firmware. Seek times: Full stroke: 250 iter in 0.756718 sec = 3.027 msec Half stroke: 250 iter in 0.717824 sec = 2.871 msec Quarter stroke: 500 iter in 1.972368 sec = 3.945 msec Short forward: 400 iter in 1.193179 sec = 2.983 msec Short backward: 400 iter in 1.322440 sec = 3.306 msec Seq outer: 2048 iter in 0.271402 sec = 0.133 msec Seq inner: 2048 iter in 0.271151 sec = 0.132 msec Transfer rates: outside: 102400 kbytes in 1.080339 sec = 94785 kbytes/sec middle: 102400 kbytes in 1.166021 sec = 87820 kbytes/sec inside: 102400 kbytes in 1.461498 sec = 70065 kbytes/sec Umm, in my box I see better seek times but worse transfer rates; does that make sense? I think I have something wrong; the question I can't answer is: what tuning am I missing? --- Miguel ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Best OS Configuration for Dual Xeon w/4GB Adaptec RAID 2200S
On Mar 20, 2006, at 6:04 PM, Miguel wrote: Umm, in my box I see better seek times but worse transfer rates; does that make sense? I think I have something wrong; the question I can't answer is: what tuning am I missing? Well, I forgot to mention I have 15K RPM disks, so the transfers should be faster. I did no tuning to the disk configurations. I think your controller is either just not supported well in FreeBSD, or is bad in general... I *really* wish LSI would make a low-profile card that would fit in a Sun X4100... as it stands, the only choice for dual-channel cards is the Adaptec 2230SLP... ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Best OS Configuration for Dual Xeon w/4GB Adaptec
Vivek Khera wrote: On Mar 20, 2006, at 6:04 PM, Miguel wrote: Umm, in my box I see better seek times but worse transfer rates; does that make sense? I think I have something wrong; the question I can't answer is: what tuning am I missing? Well, I forgot to mention I have 15K RPM disks, so the transfers should be faster. I did no tuning to the disk configurations. I think your controller is either just not supported well in FreeBSD, or is bad in general... :-( I guess you are right; I made a really bad choice. I'd better look at Dell next time. Thanks. --- Miguel ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] update == delete + insert?
Craig A. James [EMAIL PROTECTED] writes: I've seen it said here several times that update == delete + insert. On the other hand, I've noticed that alter table [add|drop] column ... is remarkably fast, even for very large tables, which leads me to wonder whether each column's contents are in a file specifically for that column. No. The reason drop column is fast is that we make no attempt to remove the data from existing rows; we only mark the column's entry in the system catalogs as deleted. add column is only fast if you are adding a column with no default (a/k/a default NULL). In that case likewise we don't have to modify existing rows; the desired behavior falls out from the fact that the tuple access routines return NULL if asked to fetch a column beyond those existing in a particular tuple. You can read about the storage layout in http://developer.postgresql.org/docs/postgres/storage.html regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Migration study, step 1: bulk write performance optimization
Craig A. James [EMAIL PROTECTED] writes: If you haven't explored the COPY command yet, check it out. It is stunningly fast compared to normal INSERT commands. Note also that his benchmark is testing multiple INSERTs issued within a loop in a plpgsql function, which has got nearly nothing to do with the performance that will be obtained from INSERTs issued by a client (especially if said INSERTs aren't prepared and/or aren't batched into transactions). regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
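To make the COPY point concrete: instead of issuing one INSERT per row, a client typically streams tab-separated rows through a single COPY command. A minimal sketch below builds the standard COPY text-format payload in memory; the commented-out `copy_from` call at the end assumes the psycopg2 client library, and `mytable` is a hypothetical table name:

```python
import io

# Build a COPY-format payload in memory: one tab-separated line per row.
rows = [(1, 'alice'), (2, 'bob'), (3, 'carol')]
buf = io.StringIO()
for row in rows:
    buf.write('\t'.join(str(col) for col in row) + '\n')
buf.seek(0)

# With psycopg2 (an assumption; any client exposing COPY works the same way),
# the whole buffer goes over in one COPY instead of len(rows) INSERTs:
#   cur.copy_from(buf, 'mytable', columns=('id', 'name'))
print(buf.getvalue())
```

Batching many rows per COPY (or at least many INSERTs per transaction) avoids paying per-statement round-trip and per-transaction commit costs for every row.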
Re: [PERFORM] update == delete + insert?
On 3/20/06, Craig A. James [EMAIL PROTECTED] wrote: I've seen it said here several times that update == delete + insert. On the other hand, I've noticed that alter table [add|drop] column ... is remarkably fast, even for very large tables, which leads me to wonder whether each column's contents are in a file specifically for that column. My question: suppose I have a very wide set of data, say 100 columns, and one of those columns will be updated often, but the others are fairly static. I have two choices:

Design 1:
create table a (
    id integer,
    frequently_updated integer);
create table b (
    id integer,
    infrequently_updated_1 integer,
    infrequently_updated_2 integer,
    infrequently_updated_3 integer,
    ... etc.
    infrequently_updated_99 integer);

Design 2:
create table c (
    id integer,
    frequently_updated integer,
    infrequently_updated_1 integer,
    infrequently_updated_2 integer,
    infrequently_updated_3 integer,
    ... etc.
    infrequently_updated_99 integer);

If update == delete + insert is strictly true, then Design 2 would be poor, since 99 columns would be moved around with each update. But if columns are actually stored in separate files, then Designs 1 and 2 would be essentially equivalent when it comes to vacuuming. Thanks, Craig design 1 is normalized and better; design 2 is denormalized and a bad approach no matter the RDBMS. update does delete + insert, and vacuum is the way to recover the space. -- Atentamente, Jaime Casanova "What they (MySQL) lose in usability, they gain back in benchmarks, and that's all that matters: getting the wrong answer really fast." Randal L. Schwartz ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
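Since an UPDATE writes out a complete new row version, the write amplification of the wide-table design is easy to estimate. A toy model (the 4-byte column width is an assumption just to illustrate the ratio; it ignores tuple headers and padding):

```python
# Toy model of MVCC write amplification: every UPDATE writes a whole new
# row version, so the wide table rewrites all of its columns each time.

INT_BYTES = 4  # assumed on-disk width of each integer column

def bytes_per_update(n_columns):
    """Bytes of column data rewritten by one UPDATE (headers ignored)."""
    return n_columns * INT_BYTES

design1 = bytes_per_update(2)    # table a: id + frequently_updated
design2 = bytes_per_update(101)  # table c: id + frequently_updated + 99 others
print(design1, design2, design2 // design1)  # 8 404 50
```

The exact constants don't matter; the point is that the dead-tuple volume vacuum must reclaim scales with the full row width, which is why splitting the hot column out (Design 1) wins.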
Re: [PERFORM] Best OS Configuration for Dual Xeon w/4GB
Mark Kirkwood wrote: I think FreeBSD 'Inactive' corresponds pretty closely to Linux's 'Inactive Dirty'|'Inactive Laundered'|'Inactive Free'. Hmmm - on second thoughts I think I've got that wrong :-(, since in Linux all the file buffer pages appear in 'Cached', don't they... (I also notice that 'Inactive Laundered' does not seem to be mentioned in vanilla - read non-Redhat - 2.6 kernels.) So I think it's more correct to say FreeBSD 'Inactive' is similar to Linux 'Inactive' + some|most of Linux 'Cached'. A good discussion of how the FreeBSD VM works is here: http://www.freebsd.org/doc/en_US.ISO8859-1/books/arch-handbook/vm.html In particular: FreeBSD reserves a limited amount of KVM to hold mappings from struct bufs, but it should be made clear that this KVM is used solely to hold mappings and does not limit the ability to cache data. Cheers Mark ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[PERFORM] Performance test figures
Hi All, I want to compare the performance of a PostgreSQL database with some other database. Somebody must have done some performance testing. Can you please share that data (performance figures) with me? And if possible, please also share the procedure, i.e. how you ran the tests. Thanks in advance, -Amit