Re: [PERFORM] index structure for 114-dimension vector
On 21-4-2007 1:42 Mark Kirkwood wrote:
> I don't think that will work for the vector norm, i.e.:
> |x - y| = sqrt(sum over j ((x[j] - y[j])^2))

I don't know if this is useful here, but I was able to rewrite that algorithm for a set of very sparse vectors (i.e. they had very few overlapping factors) to something like:

|x - y|^2 = sum over j (x[j]^2) + sum over j (y[j]^2)
            + for each j where x[j] and y[j] are both non-zero:
                (x[j] - y[j])^2 - (x[j]^2 + y[j]^2)

The first two partial sums can be calculated only once, so if you have very little overlap this is much more efficient (if there is no overlap at all you end up with just sum x[j]^2 + sum y[j]^2 anyway). Besides, this rewritten calculation allows you to store the X and Y vectors using a trivial table layout vector(x, j, value) which is filled only with the non-zeros and which you can trivially self-join to find the closest matches. With this rewrite you don't care about the j's where either the x or the y value is zero.

I can compare over 1000 y's of, on average, 100 elements each against two x's of over 1000 elements on just a single 1.8GHz AMD processor. (I use it for a bi-kmeans algorithm, so there are only two buckets to compare to.) So it might be possible to rewrite your algorithm to be less calculation-intensive. Obviously this isn't going to work with a dense matrix, but there may be other ways to circumvent parts of the algorithm or to cache large parts of it.

It might also help to extract only the 6 relevant columns into a separate temporary table, which will have much smaller records and thus can fit more records per page.

Best regards,

Arjen

---(end of broadcast)--- TIP 6: explain analyze is your friend
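[A minimal SQL sketch of the layout Arjen describes; the table and column names are illustrative, not from his post:]

    -- One row per non-zero component of each sparse vector.
    CREATE TABLE vector (
        vec_id integer NOT NULL,   -- which vector
        j      integer NOT NULL,   -- component index
        value  float8  NOT NULL,   -- non-zero component value
        PRIMARY KEY (vec_id, j)
    );

    -- Precompute sum(x[j]^2) once per vector.
    CREATE TABLE vector_sqnorm AS
        SELECT vec_id, sum(value * value) AS sqnorm
        FROM vector GROUP BY vec_id;

    -- Squared distance between vectors 1 and 2, touching only the
    -- components where both vectors are non-zero:
    SELECT (SELECT sqnorm FROM vector_sqnorm WHERE vec_id = 1)
         + (SELECT sqnorm FROM vector_sqnorm WHERE vec_id = 2)
         + coalesce(sum((x.value - y.value)^2 - x.value^2 - y.value^2), 0)
           AS sq_dist
    FROM vector x JOIN vector y ON x.j = y.j
    WHERE x.vec_id = 1 AND y.vec_id = 2;

[Taking the square root at the end is unnecessary if you only rank candidates, since sqrt is monotonic.]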
Re: Filesystem fragmentation (Re: [PERFORM] Fragmentation of WAL files)
* Bill Moran:
> To clarify my viewpoint: To my knowledge, there is no Unix filesystem that _suffers_ from fragmentation. Specifically, all filesystems accumulate some degree of fragmentation, but every Unix filesystem that I am aware of has built-in mechanisms to mitigate this and prevent it from becoming a performance issue.

One database engine tends to create a huge number of fragments because the files are written with holes in them. There is a significant impact on sequential reads, but that doesn't matter much because the engine doesn't implement fast, out-of-order B-tree scans anyway. 8-/ I still think that preallocating in reasonably sized chunks is beneficial.

-- Florian Weimer [EMAIL PROTECTED] BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstraße 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Usage up to 50% CPU
On Fri, Apr 27, 2007 at 04:43:06AM +0000, Andres Retzlaff wrote:
> Hi, I have pg 8.1.4 running in Windows XP Pro with a Pentium D, and I notice that I can not use more than 50% of the cpus (the Pentium D has 2 cpus). How can I change the settings to use 100% of it?

A single query will only use one CPU. If you have multiple parallel clients, they will use the different CPUs.

//Magnus

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Usage up to 50% CPU
Hi Magnus, in this case each CPU goes up to 50%, giving me 50% total usage. I was expecting, as you say, 1 query = 100% CPU. Any ideas?

Andrew

> On Fri, Apr 27, 2007 at 04:43:06AM +0000, Andres Retzlaff wrote:
>> Hi, I have pg 8.1.4 running in Windows XP Pro with a Pentium D, and I notice that I can not use more than 50% of the cpus (the Pentium D has 2 cpus). How can I change the settings to use 100% of it?
>
> A single query will only use one CPU. If you have multiple parallel clients, they will use the different CPUs.
>
> //Magnus

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Usage up to 50% CPU
On Fri, Apr 27, 2007 at 08:10:48AM +0000, Andres Retzlaff wrote:
> Hi Magnus, in this case each CPU goes up to 50%, giving me 50% total usage. I was expecting, as you say, 1 query = 100% CPU. Any ideas?

No. 1 query will only use 100% of *one* CPU, which means 50% total usage. You need at least one query per CPU to reach full 100% usage of the whole system. (Actually, you can get slightly above 50%, since the query runs on one CPU while the OS, autovacuum and the bgwriter can run on the other. But it's marginal.)

//Magnus

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
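[A quick way to see this is to drive the server with more than one client, e.g. with the pgbench tool shipped in contrib; a sketch — the database name and transaction counts are arbitrary:]

    pgbench -i bench              # create and populate pgbench's test tables
    pgbench -c 1 -t 5000 bench    # 1 client: tops out near 50% on a dual-CPU box
    pgbench -c 4 -t 5000 bench    # 4 clients: can keep both CPUs busy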
Re: [PERFORM] Usage up to 50% CPU
Magnus Hagander wrote:
> On Fri, Apr 27, 2007 at 08:10:48AM +0000, Andres Retzlaff wrote:
>> Hi Magnus, in this case each CPU goes up to 50%, giving me 50% total usage. I was expecting, as you say, 1 query = 100% CPU. Any ideas?
>
> No. 1 query will only use 100% of *one* CPU, which means 50% total usage. You need at least one query per CPU to reach full 100% usage of the whole system. (Actually, you can get slightly above 50%, since the query runs on one CPU while the OS, autovacuum and the bgwriter can run on the other. But it's marginal.)
>
> //Magnus

I would think that since you are sitting and watching the CPU usage, your query must be taking a while to run, leading me to wonder if you are getting a full table scan that is causing pg to wait for disk response. Or are you running a long list of steps that take a while?

-- Shane Ambler [EMAIL PROTECTED] Get Sheeky @ http://Sheeky.Biz

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Usage up to 50% CPU
On Fri, Apr 27, 2007 at 07:23:41PM +0930, Shane Ambler wrote:
> I would think that since you are sitting and watching the CPU usage, your query must be taking a while to run, leading me to wonder if you are getting a full table scan that is causing pg to wait for disk response.

If so, you probably wouldn't be seeing that much cpu usage... (if the cpu is waiting for disk, it's idling)

Mike Stone

---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning
Tom Lane wrote:
> Carlos Moreno [EMAIL PROTECTED] writes:
>> ... But, wouldn't it make sense that the configure script determines the amount of physical memory and perhaps even do a HD speed estimate to set up defaults that are closer to a performance-optimized configuration?
>
> No. Most copies of Postgres these days are executed on machines very far away from where the code was built. It's a little bit safer to try to tune things at initdb time ... as indeed we already do.

D'oh! Yes, that makes more sense, of course.

> But the fundamental problem remains that we don't know that much about how the installation will be used.

Notice that the second part of my suggestion covers this --- have additional switches to initdb so that the user can tell it about estimates of how the DB will be used: estimated size of the DB, estimated percentage of activity that will involve writing, estimated percentage of activity that will be transactions, percentage that will use indexes, percentage of queries that will be complex, etc. Wouldn't initdb be able to do a better job of coming up with sensible defaults if it can count on this information? Of course, all these parameters would have their own defaults --- the user won't necessarily know or have an accurate estimate for each and every one of them.

> Also, there is an extremely good reason why Postgres will never be set up to try to take over the whole machine by default: most of the developers run multiple postmasters on their machines.

Wouldn't this be covered by the above suggestion? One of the switches for initdb could allow the user to specify how many instances will be run (I assume you're talking about having different instances listening on different ports for increased concurrency-related benefits?). Does my suggestion make more sense now? Or is it still too unrealistic to make it work properly/safely?

Carlos

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
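[To make the second half of the proposal concrete, this is roughly the sort of invocation Carlos appears to have in mind. Every --est-* switch below is hypothetical; initdb has no such options:]

    # Hypothetical only -- none of these estimate switches exist.
    initdb -D /usr/local/pgsql/data \
        --est-db-size=20GB \
        --est-write-pct=30 \
        --est-indexed-pct=80 \
        --est-complex-query-pct=10 \
        --est-instances=2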
Re: [PERFORM] What`s wrong with JFS configuration?
Adding -performance back in so others can learn.

On Apr 26, 2007, at 9:40 AM, Paweł Gruszczyński wrote:
> Jim Nasby wrote:
>> On Apr 25, 2007, at 8:51 AM, Paweł Gruszczyński wrote:
>>> where u6 stores the Fedora Core 6 operating system, and u0 stores 3 partitions with ext2, ext3 and jfs filesystems.
>>
>> Keep in mind that drives have a faster data transfer rate at the outer edge than they do at the inner edge, so if you've got all 3 filesystems sitting on that array at the same time it's not a fair test. I heard numbers on the impact of this a *long* time ago and I think it was in the 10% range, but I could be remembering wrong. You'll need to drop each filesystem and create the next one to get a fair comparison.
>
> I thought about it, but my situation is not so clear, because my hard drive for postgresql data is rather logical, being a RAID array in mode 1+0. My RAID array is divided like this:
>
>    Device Boot      Start        End     Blocks  Id  System
>    /dev/sda1            1     159850  163686384  83  Linux
>    /dev/sda2       159851     319431  163410944  83  Linux
>    /dev/sda3       319432     478742  163134464  83  Linux
>
> and the partitions are:
>
>    /dev/sda1  ext2  161117780  5781744  147151720  4%  /fs/ext2
>    /dev/sda2  ext3  160846452  2147848  150528060  2%  /fs/ext3
>    /dev/sda3  jfs   163096512  3913252  159183260  3%  /fs/jfs
>
> so if RAID 1+0 does not change anything, the JFS filesystem is on the third partition, which is at the end of the hard drive.

Yes, which means that JFS is going to be at a disadvantage to ext3, which will be at a disadvantage to ext2. You should really re-perform the tests with each filesystem in the same location.

> What about an HDD with two magnetic platters? Then the speed depending on the physical location of the partition is more difficult to calculate ;) Probably the first is slow, the second is fast in its first half and slow in its second half, and the third is the fastest one. In both cases my JFS partition should be at the end of the magnetic disk. Am I wrong?

I'm not a HDD expert, but as far as I know the number of platters doesn't change anything. When you have multiple platters, the drive essentially splits bytes across all the platters; it doesn't start writing one platter, then switch to another platter.

-- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning
On Fri, Apr 27, 2007 at 09:27:49AM -0400, Carlos Moreno wrote:
> Notice that the second part of my suggestion covers this --- have additional switches to initdb so that the user can tell it about estimates on how the DB will be used: estimated size of the DB, estimated percentage of activity that will involve writing, estimated percentage of activity that will be transactions, percentage that will use indexes, percentage of queries that will be complex, etc. etc.

If the person knows all that, why wouldn't they know to just change the config parameters?

Mike Stone

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning
Carlos Moreno [EMAIL PROTECTED] writes:
> Tom Lane wrote:
>> But the fundamental problem remains that we don't know that much about how the installation will be used.
>
> Notice that the second part of my suggestion covers this --- have additional switches to initdb

That's been proposed and rejected before, too; the main problem being that initdb is frequently a layer or two down from the user (eg, executed by initscripts that can't pass extra arguments through, even assuming they're being invoked by hand in the first place).

regards, tom lane

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning
Maybe he's looking for a switch for initdb that would make it interactive and quiz you about your expected usage -- sort of a magic auto-configurator wizard doohicky? I could see that sort of thing being nice for the casual user or newbie who otherwise would have a horribly mis-tuned database. They could instead have only a marginally mis-tuned database :)

On Fri, 2007-04-27 at 10:30 -0400, Michael Stone wrote:
> On Fri, Apr 27, 2007 at 09:27:49AM -0400, Carlos Moreno wrote:
>> Notice that the second part of my suggestion covers this --- have additional switches to initdb so that the user can tell it about estimates on how the DB will be used: estimated size of the DB, estimated percentage of activity that will involve writing, estimated percentage of activity that will be transactions, percentage that will use indexes, percentage of queries that will be complex, etc. etc.
>
> If the person knows all that, why wouldn't they know to just change the config parameters?
>
> Mike Stone

---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning
Hello. Just my 2 cents, and not looking at the technical aspects: setting up PostgreSQL is the weakest point of PostgreSQL, as we have experienced ourselves; once it is running it is great. I can imagine that a lot of people stop after their first trials, once they have experienced the trouble and performance of a standard setup. Such a user is of course lost forever. So anything that could be done to get an easier and BETTER setup would strongly enhance PostgreSQL. My 2 cents.

Henk Sanders

-----Original message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane Sent: Friday, April 27, 2007 16:37 To: Carlos Moreno CC: PostgreSQL Performance Subject: Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning

> Carlos Moreno [EMAIL PROTECTED] writes:
>> Tom Lane wrote: But the fundamental problem remains that we don't know that much about how the installation will be used.
>> Notice that the second part of my suggestion covers this --- have additional switches to initdb
>
> That's been proposed and rejected before, too; the main problem being that initdb is frequently a layer or two down from the user (eg, executed by initscripts that can't pass extra arguments through, even assuming they're being invoked by hand in the first place).
>
> regards, tom lane
>
> ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning
On Fri, Apr 27, 2007 at 07:36:52AM -0700, Mark Lewis wrote:
> Maybe he's looking for a switch for initdb that would make it interactive and quiz you about your expected usage -- sort of a magic auto-configurator wizard doohicky? I could see that sort of thing being nice for the casual user or newbie who otherwise would have a horribly mis-tuned database. They could instead have only a marginally mis-tuned database :)

However you implement it, anyone who can answer all of those questions is probably capable of reading and understanding the performance section in the manual. It's probably more practical to have a separate script that looks at the running system (ram, disks, pg config, db size, indices, stats, etc.) and makes suggestions -- if someone wants to write such a thing.

Mike Stone

---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
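[A sketch of the kind of check such a script might run against the standard statistics views; the thresholds are made up for illustration:]

    -- Tables with many sequential scans relative to index scans,
    -- a common hint of a missing index:
    SELECT relname, seq_scan, idx_scan
    FROM pg_stat_user_tables
    WHERE seq_scan > 10 * coalesce(idx_scan, 0)
    ORDER BY seq_scan DESC;

    -- Settings a tuning script would want to inspect:
    SELECT name, setting FROM pg_settings
    WHERE name IN ('shared_buffers', 'work_mem', 'max_fsm_pages');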
[PERFORM] How can fixed and variable width columns perform similarly?
Hi! I read the link below and am puzzled by or curious about something. http://www.postgresql.org/docs/8.1/interactive/datatype-character.html This Tip is intriguing:

"Tip: There are no performance differences between these three types, apart from the increased storage size when using the blank-padded type. While character(n) has performance advantages in some other database systems, it has no such advantages in PostgreSQL. In most situations text or character varying should be used instead."

How can a field that doesn't have a limit, like text, perform similarly to character varying(128), for example? At some point, we need to write data to disk. The more data that needs to be written, the longer the disk write will take, especially when it requires finding free sectors to write to.

Another interesting quote from the same page is the following:

"Long values are also stored in background tables so they do not interfere with rapid access to the shorter column values."

If the long values are stored in a separate table, on a different part of the disk, doesn't this imply an extra disk seek? Won't it therefore take longer?

Sid

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning
On Apr 27, 2007, at 3:30 PM, Michael Stone wrote:
> On Fri, Apr 27, 2007 at 09:27:49AM -0400, Carlos Moreno wrote:
>> Notice that the second part of my suggestion covers this --- have additional switches to initdb so that the user can tell it about estimates on how the DB will be used: estimated size of the DB, estimated percentage of activity that will involve writing, estimated percentage of activity that will be transactions, percentage that will use indexes, percentage of queries that will be complex, etc. etc.
>
> If the person knows all that, why wouldn't they know to just change the config parameters?

Because knowing your expected workload is a lot easier for many people than knowing what every GUC does. Personally, I think it would be a tremendous start if we just provided a few sample configs like MySQL does. Or if someone wanted to get fancy they could stick a web page somewhere that would produce a postgresql.conf based simply on how much available RAM you had, since that's one of the biggest performance-hampering issues we run into (ie: shared_buffers left at the default of 32MB).

-- Jim Nasby [EMAIL PROTECTED] EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

---(end of broadcast)--- TIP 6: explain analyze is your friend
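[As a sketch, such a RAM-driven generator might emit something like this for a dedicated 2 GB box. The numbers are rough rules of thumb of the era, not official recommendations; 8.1 expresses these settings in 8 kB pages or kB:]

    # postgresql.conf starting point for a dedicated 2 GB machine
    shared_buffers = 50000           # ~400 MB (8 kB pages), ~20% of RAM
    work_mem = 8192                  # 8 MB (kB) per sort/hash; scale by concurrency
    maintenance_work_mem = 131072    # 128 MB (kB) for VACUUM and index builds
    effective_cache_size = 131072    # ~1 GB (8 kB pages) of expected OS cache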
Re: [PERFORM] How can fixed and variable width columns perform similarly?
Siddharth Anand [EMAIL PROTECTED] writes:
> How can a field that doesn't have a limit like text perform similarly to char varying(128), for example? At some point, we need to write data to disk. The more data that needs to be written, the longer the disk write will take, especially when it requires finding free sectors to write to.

What's your point? If you're not going to put more than 128 characters in the field, there's no difference in the amount of data involved.

regards, tom lane

---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] [Fwd: ] How
Siddharth Anand wrote:
> Hi! I read the link below and am puzzled by or curious about something. http://www.postgresql.org/docs/8.1/interactive/datatype-character.html This Tip is intriguing: "Tip: There are no performance differences between these three types, apart from the increased storage size when using the blank-padded type. While character(n) has performance advantages in some other database systems, it has no such advantages in PostgreSQL. In most situations text or character varying should be used instead." How can a field that doesn't have a limit, like text, perform similarly to character varying(128), for example? At some point, we need to write data to disk. The more data that needs to be written, the longer the disk write will take, especially when it requires finding free sectors to write to.

There's no difference *for the same amount of data*. So char(128), varchar(128) with 128 characters, and text with 128 characters in it are all the same. This isn't always the case with other systems.

> Another interesting quote from the same page is the following: "Long values are also stored in background tables so they do not interfere with rapid access to the shorter column values." If the long values are stored in a separate table, on a different part of the disk, doesn't this imply an extra disk seek? Won't it therefore take longer?

Yes. But you gain every time you read from the table and aren't interested in that column. Typically large text columns contain descriptive text and aren't used in joins, so it pays for itself quite easily.

-- Richard Huxton Archonet Ltd

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] How can fixed and variable width columns perform similarly?
I think the manual is implying that if you store a value like "Sid" in a field of either type varchar(128) or type text, there is no performance difference. The manual is not saying that you get the same performance storing a 500k text field as when you store the value "Sid".

Dave

-----Original Message----- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Siddharth Anand Sent: Friday, April 27, 2007 10:32 AM To: pgsql-performance@postgresql.org Subject: [PERFORM] How can fixed and variable width columns perform similarly?

> Hi! I read the link below and am puzzled by or curious about something. http://www.postgresql.org/docs/8.1/interactive/datatype-character.html This Tip is intriguing: "Tip: There are no performance differences between these three types, apart from the increased storage size when using the blank-padded type. While character(n) has performance advantages in some other database systems, it has no such advantages in PostgreSQL. In most situations text or character varying should be used instead." How can a field that doesn't have a limit, like text, perform similarly to character varying(128), for example? At some point, we need to write data to disk. The more data that needs to be written, the longer the disk write will take, especially when it requires finding free sectors to write to. Another interesting quote from the same page is the following: "Long values are also stored in background tables so they do not interfere with rapid access to the shorter column values." If the long values are stored in a separate table, on a different part of the disk, doesn't this imply an extra disk seek? Won't it therefore take longer? Sid
>
> ---(end of broadcast)--- TIP 6: explain analyze is your friend

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] How can fixed and variable width columns perform similarly?
Hi Tom, My question wasn't phrased clearly. Oracle exhibits a performance degradation for very large fields (CLOB types, which I equate to Postgres' text type) when compared with field types like varchar, which in Oracle handle a maximum of a few thousand bytes. It sounds like Postgres doesn't exhibit this same difference. I wanted to understand how this could be and whether there was a trade-off.

Cheers! Sid

> Siddharth Anand [EMAIL PROTECTED] writes:
>> How can a field that doesn't have a limit like text perform similarly to char varying(128), for example? At some point, we need to write data to disk. The more data that needs to be written, the longer the disk write will take, especially when it requires finding free sectors to write to.
>
> What's your point? If you're not going to put more than 128 characters in the field, there's no difference in the amount of data involved.
>
> regards, tom lane
>
> ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster

---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] How can fixed and variable width columns perform similarly?
Siddharth Anand [EMAIL PROTECTED] writes:
> My question wasn't phrased clearly. Oracle exhibits a performance degradation for very large fields (CLOB types, which I equate to Postgres' text type) when compared with field types like varchar, which in Oracle handle a maximum of a few thousand bytes. It sounds like Postgres doesn't exhibit this same difference. I wanted to understand how this could be and whether there was a trade-off.

Ah. Well, the answer is that we change behavior dynamically depending on the size of the particular field value, instead of hard-wiring it to the declared column type. It sounds like Oracle's CLOB might be doing about the same thing as an out-of-line toasted field value in Postgres. In PG, text and varchar behave identically except that varchar(N) adds an insert-time check on the length of the field value --- but this is just a constraint check and doesn't have any direct influence on how the value is stored.

regards, tom lane

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
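[A quick illustration of that point, using a throwaway example table:]

    -- The two columns are stored identically; varchar(128) only adds
    -- an insert-time length check.
    CREATE TABLE t (
        a varchar(128),   -- rejects values longer than 128 characters
        b text            -- no declared limit
    );
    INSERT INTO t VALUES (repeat('x', 128), repeat('x', 100000));
    -- Short values stay inline in the heap; the 100 kB value in "b" is
    -- compressed and/or moved out-of-line (TOAST) because of its size,
    -- not because of its declared type.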
[PERFORM] Very specific server situation
Hi, We're facing some performance problems with the database for a web site with very specific needs. First of all, we're using version 8.1 on a server with 1GB of RAM. I know memory normally should be more, but as our tables are not so big (as a matter of fact, they are small) I think the solution would not be adding more RAM.

What we basically have is a site where each user has a box with links to other randomly selected users. Whenever a box from a user is shown, an SP is executed: a credit is added to that user and a credit is subtracted from the accounts of the shown links. Accounts with no credits do not have to be listed. So, we have lots (LOTS) of users querying and updating the same table, sometimes with big peaks.

Our first attempt was to split that table in two: one for the actual credits and another one for the users. So, only the credits table gets updated on every request, but it has a trigger that updates a flag field in the users table saying whether the user has credits. This had a good impact, but I guess it's not enough. For now, we only have 23,000 users, but it's going to grow.

Do you have any advice? Is this possible with postgres, or do you recommend just trying a volatile in-memory approach for the credits? We're using pgpool, and the output from free shows only 350M of RAM being used. Some relevant parts of the .conf: max_connections = 160 shared_buffers = 4 work_mem = 3096 maintenance_work_mem = 131072 max_fsm_pages = 7 fsync = false autovacuum = on

Any help would be really appreciated. Thanks in advance, Mauro.

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
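[For reference, a minimal sketch of the split Mauro describes; all names are illustrative, not from his schema:]

    CREATE TABLE users (
        user_id     integer PRIMARY KEY,
        has_credits boolean NOT NULL DEFAULT true   -- the flag field
    );

    CREATE TABLE credits (
        user_id integer PRIMARY KEY REFERENCES users,
        balance integer NOT NULL
    );

    -- Keep the flag in sync, but only touch users when the flag
    -- actually flips, to avoid generating dead tuples on every hit.
    CREATE OR REPLACE FUNCTION sync_credit_flag() RETURNS trigger AS $$
    BEGIN
        UPDATE users SET has_credits = (NEW.balance > 0)
         WHERE user_id = NEW.user_id
           AND has_credits <> (NEW.balance > 0);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER credits_flag_sync AFTER UPDATE ON credits
        FOR EACH ROW EXECUTE PROCEDURE sync_credit_flag();

[The conditional WHERE clause matters under this workload: an unconditional update would create a dead row in users on every box view, which is exactly the churn the split was meant to avoid.]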
Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning
Michael Stone wrote:
> On Fri, Apr 27, 2007 at 09:27:49AM -0400, Carlos Moreno wrote: Notice that the second part of my suggestion covers this --- have additional switches to initdb snip
>
> If the person knows all that, why wouldn't they know to just change the config parameters?

Exactly. What I think would be much more productive is to use the great amount of information that PG tracks internally and auto-tune the parameters based on it.

For instance: why does the user need to manually track max_fsm_pages and max_fsm_relations? I bet there are many users who have never taken the time to understand what these mean and wonder why performance still stinks after vacuuming their database (spoken from my own experience). How about work_mem? shared_buffers? column statistics sizes? random_page_cost? Couldn't some fairly simple regression tests, akin to a VACUUM process, spot potential problems? "Hey, it looks like you need more fsm_relations; I bumped that up automatically for you." Or "these indexes look bloated, shall I automatically reindex them for you?"

I'm sure there are many more examples that, with some creative thinking, could be auto-adjusted to match the usage patterns of the database. PG does an excellent job of exposing the variables to the users, but mostly avoids telling the user what to do or doing it for them. Instead, it is up to the user to know where to look, what to look for, and how to react to things to improve performance. This is not all bad, but it assumes that all users are hackers (which used to be true), and it certainly doesn't help when the average SQL Server admin tries out Postgres and is surprised at the things they are now responsible for managing. PG is certainly *not* the only database to suffer from this syndrome, I know.

I like to think of my systems as good employees. I don't want to have to micromanage everything they do. I want to tell them "here's what I want done", and, assuming I made a good hiring choice, they will do it and take some liberty to adjust parameters where needed to achieve the spirit of the goal, rather than blindly do something inefficiently because I failed to explain to them the absolute most efficient way to accomplish the task.

Granted, there are some people who don't like the developers making any assumptions about their workload. But this doesn't have to be an either/or proposition. I don't think any control needs to be abandoned. But self-adjusting defaults seem like an achievable goal (I know, I know, "show us the patch"). I just don't know if this feeling has resonated well between new users and long-term developers. I know it must be grating to have to answer the same questions over and over and over: "have you analyzed? Did you leave postgresql.conf at the defaults?" Seems like a win-win for both sides, IMHO.

In closing, I am not bashing PG! I love it and swear by it. These comments are purely from an advocacy perspective. I'd love to see the PG user base continue to grow.

My .02

-Dan

---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning
Dan,

> Exactly. What I think would be much more productive is to use the great amount of information that PG tracks internally and auto-tune the parameters based on it. For instance:

*Everyone* wants this. The problem is that it's very hard code to write, given the number of variables. I'm working on it, but progress is slow due to my travel schedule.

-- --Josh Josh Berkus PostgreSQL @ Sun San Francisco

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning
In response to Dan Harris [EMAIL PROTECTED]:

> Michael Stone wrote:
>> On Fri, Apr 27, 2007 at 09:27:49AM -0400, Carlos Moreno wrote: Notice that the second part of my suggestion covers this --- have additional switches to initdb snip If the person knows all that, why wouldn't they know to just change the config parameters?
>
> Exactly. What I think would be much more productive is to use the great amount of information that PG tracks internally and auto-tune the parameters based on it. For instance: why does the user need to manually track max_fsm_pages and max_fsm_relations? I bet there are many users who have never taken the time to understand what these mean and wonder why performance still stinks after vacuuming their database (spoken from my own experience).

But there are two distinct routes that can be taken if there's not enough fsm space: add fsm space or vacuum more frequently. I don't want the system to eat up a bunch of memory for fsm entries if my workload indicates that I can easily vacuum more frequently.

> How about work_mem? shared_buffers? column statistics sizes? random_page_cost?

The only one that seems practical (to me) is random_page_cost. The others are all configuration options that I (as a DBA) want to be able to decide for myself. For example, I have some dedicated PG servers where I pretty much max those values out, to let PG know that it can use everything on the system -- but I also have some shared-use machines with PG, where I carefully constrain those values so that PG doesn't muscle other daemons out of their share of the RAM (work_mem is probably the best example).

It would be nice to have some kind of utility that could tell me what random_page_cost should be, as I've never felt comfortable tweaking it. Like some utility to run that would say "based on the seek tests I just ran, you should set random_page_cost to x". Of course, if such a thing existed, it could just fill in the value for you. But I haven't figured out how to pick a good value for that setting, so I have no idea how to suggest to have it automatically set.

> Couldn't some fairly simple regression tests, akin to a VACUUM process, spot potential problems? "Hey, it looks like you need more fsm_relations; I bumped that up automatically for you." Or "these indexes look bloated, shall I automatically reindex them for you?"

A lot of that stuff does happen. A vacuum verbose will tell you what it thinks you should do, but I don't _want_ it to do it automatically. What if I create huge temporary tables once a week for some sort of analysis that overloads the fsm space? And if I'm dropping those tables when the analysis is done, do I want the fsm space constantly adjusting? Plus, some of it is just impossible: shared_buffers requires a restart. Do you want your DB server spontaneously restarting because it thought more buffers might be nice?

> I'm sure there are many more examples that, with some creative thinking, could be auto-adjusted to match the usage patterns of the database. PG does an excellent job of exposing the variables to the users, but mostly avoids telling the user what to do or doing it for them. Instead, it is up to the user to know where to look, what to look for, and how to react to things to improve performance. This is not all bad, but it assumes that all users are hackers (which used to be true), and it certainly doesn't help when the average SQL Server admin tries out Postgres and is surprised at the things they are now responsible for managing.
> PG is certainly *not* the only database to suffer from this syndrome, I know.

I expect the suffering is a result of the fact that databases are non-trivial pieces of software, and there's no universally simple way to set them up and make them run well.

> I like to think of my systems as good employees. I don't want to have to micromanage everything they do. I want to tell them "here's what I want done", and, assuming I made a good hiring choice, they will do it and take some liberty to adjust parameters where needed to achieve the spirit of the goal, rather than blindly do something inefficiently because I failed to explain to them the absolute most efficient way to accomplish the task.

That's silly. No software does that. You're asking software to behave like humans. If that were the case, this would be Isaac Asimov's world, not the real one.

> Granted, there are some people who don't like the developers making any assumptions about their workload. But this doesn't have to be an either/or proposition. I don't think any control needs to be abandoned. But self-adjusting defaults seem like an achievable goal (I know, I know, "show us the patch"). I just don't know if this feeling has resonated well between new users and long-term developers. I know it must be grating to have to answer the same questions over and over and over ...
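[For what it's worth, the fsm check Bill alludes to is easy to run by hand in psql; the tail of the VACUUM VERBOSE output reports how many page slots the free space map actually needs, which can then be compared against the configured limits:]

    VACUUM VERBOSE;
    SHOW max_fsm_pages;
    SHOW max_fsm_relations;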
Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning
At 10:36a -0400 on 27 Apr 2007, Tom Lane wrote:
> That's been proposed and rejected before, too; the main problem being that initdb is frequently a layer or two down from the user (eg, executed by initscripts that can't pass extra arguments through, even assuming they're being invoked by hand in the first place).

And following after Dan Harris' response . . . so what's the problem with having some sort of cronjob contrib module that utilizes the actual and current statistics to make recommendations? I don't think it'd be right to simply change the configuration options as it sees fit (especially as it was pointed out that many run multiple postmasters or have other uses for the machines in question), but perhaps it could send a message (email?) along the lines of "Hey, I'm currently doing this many of X transactions, against this much of Y data, and working under these constraints. You might get better performance (in this area ...) if you altered the configuration options like so: ..." Certainly not for the masters, but perhaps for standard-installation sorts of deals, sort of like bringing up the rear . . . just a thought.

Kevin

---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning
On Fri, Apr 27, 2007 at 02:40:07PM -0400, Kevin Hunter wrote:
> out that many run multiple postmasters or have other uses for the machines in question), but perhaps it could send a message (email?) along the lines of "Hey, I'm currently doing this many of X transactions, against this much of Y data, and working under these constraints. You might get better performance (in this area ...) if you altered the configuration options like so: ..."

Or store the values in the db for later trending analysis; witness Oracle's Statspack.

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning
Bill,

> The only one that seems practical (to me) is random_page_cost. The others are all configuration options that I (as a DBA) want to be able to decide for myself.

Actually, random_page_cost *should* be a constant 4.0 or 3.5, which represents the approximate ratio of seek-to-scan speed that has been relatively constant across 6 years of HDD technology. The only reason we make it a configuration variable is that there are defects in our cost model which cause users to want to tinker with it. Mind you, that's gotten better in recent versions as well. Lately I mostly tinker with effective_cache_size and the various cpu_* settings rather than modifying random_page_cost.

-- --Josh Josh Berkus PostgreSQL @ Sun San Francisco

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
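[For reference, the planner knobs Josh mentions, shown with their long-standing defaults; worth double-checking against your version's documentation:]

    # postgresql.conf planner cost settings
    random_page_cost = 4.0         # cost of a random page fetch vs. 1.0 sequential
    effective_cache_size = 1000    # 8 kB pages; raise this first on big-RAM boxes
    cpu_tuple_cost = 0.01
    cpu_index_tuple_cost = 0.001
    cpu_operator_cost = 0.0025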
Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning
Bill Moran wrote:
> In response to Dan Harris [EMAIL PROTECTED]: snip
>> Why does the user need to manually track max_fsm_pages and max_fsm_relations? I bet there are many users who have never taken the time to understand what these mean and wonder why performance still stinks after vacuuming their database (spoken from my own experience).
> But there are two distinct routes that can be taken if there's not enough fsm space: add fsm space or vacuum more frequently. I don't want the system to eat up a bunch of memory for fsm entries if my workload indicates that I can easily vacuum more frequently.

There's no magic bullet here, but heuristics should be able to tell us "you can easily vacuum more frequently". And again, I said these things would be *optional*. Like an item in postgresql.conf: i_have_read_the_manual_and_know_what_this_all_means = false # default false. If you change it to true, you have all the control you're used to and nothing will get in your way.

>> How about work_mem? shared_buffers? column statistics sizes? random_page_cost?
> The only one that seems practical (to me) is random_page_cost. The others are all configuration options that I (as a DBA) want to be able to decide for myself. For example, I have some dedicated PG servers where I pretty much max those values out, to let PG know that it can use everything on the system -- but I also have some shared-use machines with PG, where I carefully constrain those values so that PG doesn't muscle other daemons out of their share of the RAM (work_mem is probably the best example).

Just because you carefully constrain it does not preclude the ability for program logic to maintain statistics to do what I suggested.

> It would be nice to have some kind of utility that could tell me what random_page_cost should be, as I've never felt comfortable tweaking it. Like some utility to run that would say "based on the seek tests I just ran, you should set random_page_cost to x". Of course, if such a thing existed, it could just fill in the value for you. But I haven't figured out how to pick a good value for that setting, so I have no idea how to suggest to have it automatically set.

Me either, but I thought that if there's a reason it's user-settable, there must be some demonstrable method for deciding what is best.

>> Couldn't some fairly simple regression tests, akin to a VACUUM process, spot potential problems? "Hey, it looks like you need more fsm_relations; I bumped that up automatically for you." Or "these indexes look bloated, shall I automatically reindex them for you?"
> A lot of that stuff does happen. A vacuum verbose will tell you what it thinks you should do, but I don't _want_ it to do it automatically. What if I create huge temporary tables once a week for some sort of analysis that overloads the fsm space? And if I'm dropping those tables when the analysis is done, do I want the fsm space constantly adjusting?

I understand *you* don't want it done automatically. But my suspicion is that there are a lot more newbie pg admins who would rather let the system do something sensible as a default. Again, you sound defensive that somehow my ideas would take power away from you. I'm not sure why that is, but certainly I'm not suggesting that. An auto-pilot mode is not a bad idea just because a few pilots don't want to use it.

> Plus, some of it is just impossible: shared_buffers requires a restart. Do you want your DB server spontaneously restarting because it thought more buffers might be nice?
Well, maybe look at the bigger picture and see if it can be fixed to *not* require a program restart? Or take effect on the next pid that gets created? This is a current limitation, but it doesn't need to be one for eternity, does it?

>> I'm sure there are many more examples that, with some creative thinking, could be auto-adjusted to match the usage patterns of the database. PG does an excellent job of exposing the variables to the users, but mostly avoids telling the user what to do or doing it for them. Instead, it is up to the user to know where to look, what to look for, and how to react to things to improve performance. This is not all bad, but it assumes that all users are hackers (which used to be true), and it certainly doesn't help when the average SQL Server admin tries out Postgres and is surprised at the things they are now responsible for managing. PG is certainly *not* the only database to suffer from this syndrome, I know.
> I expect the suffering is a result of the fact that databases are non-trivial pieces of software, and there's no universally simple way to set them up and make them run well.

Speaking as a former SQL Server admin (from day 1 of the Sybase fork up to version 2000), I can say there *is* a way to make them simple. It's certainly not a perfect piece of software, but the learning curve speaks for itself. It can auto-shrink your databases (without locking
Re: [PERFORM] Very specific server situation
Mauro N. Infantino [EMAIL PROTECTED] writes:
> What we basically have is a site where each user has a box with links to other randomly selected users. Whenever a box from a user is shown, an SP is executed: a credit is added to that user and a credit is subtracted from the accounts of the shown links. Accounts with no credits do not have to be listed. So, we have lots (LOTS) of users querying and updating the same table.

Have you checked to make sure the query plans are reasonable? Have you checked that autovacuum is running often enough? (You might want to try contrib/pgstattuple to see how much dead space there is in your heavily-updated tables.) Also, on a high-update workload it is absolutely critical to boost checkpoint_segments far enough that you are not doing checkpoints oftener than maybe once every five minutes. If the performance problems seem bursty then you may also need to look at adjusting bgwriter and/or vacuum cost delay parameters to smooth out the I/O load.

regards, tom lane

---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
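[A sketch of the checks and settings Tom suggests. pgstattuple ships in contrib and must be installed into the database first; the table name is taken from Mauro's description and the sizes are illustrative:]

    -- How much dead space has accumulated in the hot table?
    SELECT * FROM pgstattuple('credits');

    # postgresql.conf: spread checkpoints out under a heavy write load
    checkpoint_segments = 32    # default of 3 forces very frequent checkpoints
    checkpoint_timeout = 300    # seconds; the 5-minute default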
Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning
Dan,

> Yes, this is the classic problem. I'm not demanding anyone pick up the ball and jump on this today, tomorrow, etc.. I just think it would be good for those who *could* make a difference to keep those goals in mind when they continue. If you have the right mindset, this problem will fix itself over time.

Don't I wish. Autotuning is *hard*. It took Oracle 6 years. It took Microsoft 3-4 years, and theirs still has major issues last I checked. And both of those DBs support fewer OSes than we do. I think it's going to take more than the *right mindset* and my spare time.

> I appreciate your efforts in this regard. Do you have a formal project plan for this? If you can share it with me, I'll take a look and see if there is anything I can do to help out.

Nope, just some noodling around on the configurator: www.pgfoundry.org/projects/configurator

> I am on the verge of starting a Java UI that will query a bunch of the pg_* tables and give the user information about wasted table space, index usage, table scans, slow-running queries and spoon-feed it in a nice attractive interface that can be a real-time system monitor tool. This could be a cooperative project or might have some redundancy with what you're up to.

I'd be *very* interested in collaborating with you on this. Further, we could feed DTrace (systemtap?) into the interface to get data that PostgreSQL doesn't currently produce.

-- --Josh Josh Berkus PostgreSQL @ Sun San Francisco

---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning
On Fri, 27 Apr 2007, Josh Berkus wrote:
> Dan,
>> Yes, this is the classic problem. I'm not demanding anyone pick up the ball and jump on this today, tomorrow, etc.. I just think it would be good for those who *could* make a difference to keep those goals in mind when they continue. If you have the right mindset, this problem will fix itself over time.
> Don't I wish. Autotuning is *hard*. It took Oracle 6 years. It took Microsoft 3-4 years, and theirs still has major issues last I checked. And both of those DBs support fewer OSes than we do. I think it's going to take more than the *right mindset* and my spare time.

I think there are a couple of different things here:

1. Full autotuning. As you say, this is very hard and needs a lot of info about your particular database usage.

2. Getting defaults that are closer to right than the current ones. This is much easier. Nobody expects such values to be exactly right; we're just begging for some tool that gets us within a couple of orders of magnitude of what's correct.

The current defaults are appropriate for a single CPU with tens of MB of RAM and a single drive. Nowadays you have people trying to run quick-and-dirty tests on some spare hardware they have lying around (waiting for another project) that's got 4-8 CPUs, tens of GB of RAM, and a couple dozen drives. These people don't know about database tuning. They can learn, but first they want to see if Postgres is even in the ballpark. If the results are close to acceptable they will ask questions and research the tuning, but if the results are orders of magnitude lower than they need to be, they'll just say that Postgres is too slow and try another database.

An auto-default script written on the assumption that Postgres has the box to itself would be a wonderful start. I think the next step would be to be able to tell the script "only plan on using 1/2 of this box", and beyond that would be the steps you are thinking of, where the usage pattern is considered. But right now every performance question is answered with "did you change the defaults? They are way too low for modern hardware; raise them by 2 orders of magnitude and then we'll start investigating."

David Lang

---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq