Re: [PERFORM] Very long deletion time on a 200 GB database
Hi, Shaun. You wrote:

> > In the end, it was agreed that we could execute the deletes over
> > time, deleting items in the background, or in parallel with the
> > application's work. After all, if the disk is filling up at the rate
> > of 2 GB/day, then so long as we delete 4 GB/day (which is pretty
> > easy to do), we should be fine.
>
> Please tell me you understand deleting rows from a PostgreSQL database
> doesn't work like this. :) The MVCC storage system means you'll
> basically just be marking all those deleted rows as reusable, so your
> database will stop growing, but you'll eventually want to purge all
> the accumulated dead rows.

Oh, I understand that all right. I've had many, *many* conversations with this company explaining MVCC. It doesn't seem to work; when they refer to "vacuuming the database," I remind them that we have autovacuum working, to which they respond, "Oh, we mean VACUUM FULL." At which point I remind them that VACUUM FULL is almost certainly not what they want to do, and they say, "Yes, we know, but we still like to do it every so often."

From what I understand, the issue isn't one of current disk space, but rather of how quickly the disk space is being used up. They would like to reclaim disk space, but it's more important to stop the rate at which it is being consumed. If we were to delete all of the existing rows and let vacuum mark them as dead and available for reuse, that would probably be just fine. I wouldn't be surprised if we end up doing a CLUSTER at some point, though. The problem is basically that this machine is in 24/7 operation at high-speed manufacturing plants, and the best-case scenario is a 4-hour maintenance window.

I've suggested that we might be able to help the situation somewhat by attaching a portable USB-based hard disk and adding a new tablespace, which would let us keep running while we divide up the work that the disk is doing. But they've made it clear that the current hardware configuration cannot and will not change. Period.

So for now, we'll just try to DELETE faster than we INSERT, and combined with autovacuum, I'm hoping that this crisis will be averted. That said, the current state of affairs with these machines is pretty fragile, and I think we should head off such problems in the future, rather than be surprised by them.

Reuven
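For what it's worth, whether dead rows are being reused or are piling up can be watched directly: 8.3's statistics views track dead tuples per table. A minimal check, assuming the stats collector is enabled (it is by default):

    SELECT relname, n_live_tup, n_dead_tup
      FROM pg_stat_user_tables
     ORDER BY n_dead_tup DESC
     LIMIT 10;

If n_dead_tup stays roughly flat while the background deletes run, autovacuum is keeping up and the freed space is being reused rather than leaked.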
Re: [PERFORM] Very long deletion time on a 200 GB database
Hi, everyone. I wanted to thank you again for your help on the huge delete problem that I was experiencing.

After a lot of trial and error, we finally came to the conclusion that deleting this much data in the time frame that they need, on underpowered hardware that is shared with an application, with each test iteration taking 5-9 hours to run (but needing to run in 2-3), is just not going to happen. We tried many of the options that people helpfully suggested here, but none of them gave us the performance that we needed.

(One of the developers kept asking me how it can possibly take so long to delete 200 GB, when he can delete files of that size in much less time. I had to explain to him that deleting rows from a database is a far more complicated task, and can't really be compared to deleting a few files.)

In the end, it was agreed that we could execute the deletes over time, deleting items in the background, or in parallel with the application's work. After all, if the disk is filling up at the rate of 2 GB/day, then so long as we delete 4 GB/day (which is pretty easy to do), we should be fine. Adding RAM or another disk is simply out of the question, which is really a shame for a database of this size.

I should add that it was interesting/amusing to see the difference between the Unix and Windows philosophies. Each time I would update my pl/pgsql functions, the Windows guys would wrap them in a string inside a .NET program, which then needed to be compiled, installed, and run, adding enormous overhead to our already long testing procedure. I finally managed to show them that we could get equivalent functionality, with far less overhead, by just running psql -f FILENAME. This version doesn't have fancy GUI output, but it works just fine...

I always tell people that PostgreSQL is not just a great database, but a fantastic, helpful community. Thanks to everyone for their suggestions and advice.

Reuven
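The invocation itself is nothing fancy; a sketch with invented file and database names:

    psql -U postgres -d blackbox -f delete_old_records.sql

RAISE NOTICE output from the functions appears right in the console, which can serve as crude progress reporting during long runs.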
Re: [PERFORM] Very long deletion time on a 200 GB database
Hi again, everyone. Wow, I can't get over how helpful everyone has been. Shaun wrote:

> The main problem you're going to run into is that your table is larger
> than the memory in that server. 4GB is really pretty small for a
> server hosting a 200+GB database. That they didn't mean it to get that
> big doesn't really help you clean it up.

Yep! And as you pointed out later in your note, PostgreSQL isn't the only thing running on this computer. There's also a full-fledged Windows application normally running on it. And the nature of the manufacturing, black-box context means that maintenance is supposed to be rare, and that anything which gets us off of a 24/7 work schedule is enormously expensive. This has been a fun problem to fix, for sure... We're not there yet, but I feel like we're really close.

I'm currently trying a hybrid approach, based on several suggestions that were posted to this list:

Given that nothing else should be running during this maintenance operation, I'm going to bump up shared_buffers. Even after we run our maintenance, the fact that shared_buffers was so ridiculously low can't be helping anything, so I'll push it up.

I finally remembered why I had such a tortured set of subselects in my original query: if you're going to do a query with LIMIT in it, you had better be sure that you know what you're doing, because without an ORDER BY clause, you might be in for surprises. And sure enough, in our testing, I realized that when we asked the database for up to 5 rows, we were getting the same rows again and again, so the process stopped after deleting only a few bunches of rows.

So I changed tactics, and it appears to be working much, much faster: I first create a table (not a temp table, simply because my functions are invoked by the .NET application in a new connection each time, and I obviously don't want the table to go away) with the IDs of the rows in the R table that are older than n days. This table has about 200,000 rows in it, but each row is just a single int, so it's pretty small. I then have a separate function that takes a parameter, the chunk size. I loop through the table created in the first step (old_report_ids), deleting all of the records in the B table that reference the R table. I then remove the row from the old_report_ids table, and loop again, until I've reached the chunk size. (A sketch follows below.) There are undoubtedly more elegant ways to do this, but we just gotta get it working at this point. :-)

We're about to test this, but from my small tests on my computer, it ran much, much faster than the other options. We'll see what happens when we try it now on the 200 GB monster...

Reuven
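A minimal sketch of that two-step approach; the table and column names (r, b, r_id, old_report_ids) follow the thread's shorthand, and the details are illustrative rather than the actual production code:

    -- Step 1: snapshot the IDs of reports older than 30 days into a
    -- regular (non-temp) table that survives across connections.
    CREATE TABLE old_report_ids AS
        SELECT id FROM r
         WHERE end_date < now() - interval '30 days';

    -- Step 2: delete the B rows for up to chunk_size reports, removing
    -- each processed ID from the work table as we go.
    CREATE OR REPLACE FUNCTION delete_old_chunk(chunk_size integer)
    RETURNS integer AS $$
    DECLARE
        rid     integer;
        deleted integer := 0;
    BEGIN
        FOR rid IN SELECT id FROM old_report_ids LIMIT chunk_size LOOP
            DELETE FROM b WHERE r_id = rid;
            DELETE FROM old_report_ids WHERE id = rid;
            deleted := deleted + 1;
        END LOOP;
        RETURN deleted;
    END;
    $$ LANGUAGE plpgsql;

Because processed IDs are removed from old_report_ids, the unordered LIMIT is safe here: each invocation picks up whatever work remains.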
Re: [PERFORM] Very long deletion time on a 200 GB database
Hi, everyone. Jeff wrote:

> Is this 9 hours run time for deleting one day worth of data, or for
> deleting the entire accumulation of cruft that filled up the hard
> drive in the first place (which would be 170 days, if you have 200GB
> that accumulated at 1GB per day and you only need 30 days)?

Unfortunately, it took 9 hours to delete all of the rows associated with the older-than-30-days records.

Reuven
Re: [PERFORM] Very long deletion time on a 200 GB database
Hi, everyone. Shaun said:

> On 02/24/2012 12:39 AM, Reuven M. Lerner wrote:
> > - CPU is virtually idle when running the deletes, using about 1% of CPU
>
> I think you found your problem. See if you can get the Windows admins
> to give you some info on how busy the disks are (percent utilization,
> IOPS, something) the next time you try this. Increasing your memory
> settings may help, but a 1% CPU usage usually suggests it's waiting
> for disk blocks to be read before it can actually do something.

I asked them for disk readings, but I'm not sure how to contextualize the numbers I got:

    I/O writes: process1: 820,000; process2: 1 million; process3: 33,000

Any suggestions for what I can do to improve performance with such a slow disk, and a lack of additional RAM?

Reuven
Re: [PERFORM] Very long deletion time on a 200 GB database
Hi, everyone. Samuel wrote:

> What is work_mem set to? If all the other values were set so low, I'd
> expect work_mem to also be small, which could be causing all kind of
> disk activity when steps don't fit into a work_mem segment.

I just checked, and work_mem is set to 30 MB. That seems a bit low to me, given the size of the database and the fact that we're doing so much sorting and subselecting. Am I right that we should push that up some more?

Reuven
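Since work_mem can be set per session, one low-risk experiment is to raise it only for the maintenance connection rather than in postgresql.conf; the figure below is an illustrative guess for a 3.5 GB machine, not a tested recommendation:

    -- inside the maintenance session only
    SET work_mem = '256MB';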
Re: [PERFORM] Very long deletion time on a 200 GB database
Hi, everyone. So it turns out that we're not using 25 GB of virtual memory. (That's what I had been shown yesterday, and it was a bit surprising, to say the least...) A few statistics that I managed to get from the Windows developers/system administrators:

- The machine has a total of 3.5 GB of RAM
- shared_buffers was set to 256 MB (yes, MB!)
- Virtual memory usage by our process is 3 MB (yes, MB)
- CPU is virtually idle when running the deletes, using about 1% of CPU
- No other processes are accessing the database when we're running the maintenance; there are a total of three server processes, but two are idle.

(I was a bit surprised, to say the least, by the low number for shared_buffers, given that I believe it's one of the first things I told them to increase about 18 months ago.)

As for Tom's point about rules, I know that rules are bad, and I'm not sure why the system is using a rule rather than a trigger. I'll see if I can change that to a trigger, but I have very indirect control over the machines, and every change requires (believe it or not) writing a .NET program that runs my changes, rather than just a textual script that deploys them.

The only foreign keys are from the B table (i.e., the table whose records I want to remove) to other tables. There are no REFERENCES pointing to the B table. That said, I hadn't realized that primary keys and indexes can also delay the DELETE.

For the latest round of testing, I quadrupled shared_buffers to 1 GB, turned off hash joins (as someone suggested), and also simplified the query (based on everyone's suggestions). In tests on my own computer (with a somewhat random 1 GB snapshot of the 200 GB database), the simplified query was indeed much faster, so I'm optimistic.

Several people suggested that chunking the deletes might indeed help, which makes me feel a bit better. Unfortunately, given the time that it takes to run the queries, it's hard to figure out the right chunk size. Whoever suggested doing it in time slices had an interesting idea, but I'm not sure whether it'll be implementable given our constraints.

Thanks again to everyone for your help. I'll let you know what happens...

Reuven
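A rule-to-trigger conversion for the large-object cleanup might look roughly like this; the table name b is the thread's shorthand, and the column name file_oid is invented for illustration:

    CREATE OR REPLACE FUNCTION unlink_b_large_object() RETURNS trigger AS $$
    BEGIN
        -- Remove the large object that this row points to; lo_unlink
        -- errors out if the object is already gone.
        PERFORM lo_unlink(OLD.file_oid);
        RETURN OLD;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER b_unlink_lo
        BEFORE DELETE ON b
        FOR EACH ROW
        EXECUTE PROCEDURE unlink_b_large_object();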
Re: [PERFORM] Very long deletion time on a 200 GB database
Hi, everyone. Thanks for all of the help and suggestions so far; I'll try to respond to some of them soon. Andrew wrote:

> How about:
>
>     DELETE FROM B
>      WHERE r_id IN (SELECT DISTINCT R.id
>                       FROM R
>                      WHERE r.end_date < (NOW() - (interval '1 day' * 30)));
>
> Or possibly without the DISTINCT. But I agree that the original query
> shouldn't have B in the subquery - that alone could well make it crawl.

I put B in the subquery so as to reduce the number of rows that would be returned, but maybe that was indeed backfiring on me. Now that I think about it, B is a huge table, and R is a less-huge one, so including B in the subselect was probably a mistake.

> What is the distribution of end_dates? It might be worth running this
> in several steps, deleting records older than, say, 90 days, 60 days,
> 30 days.

I've suggested something similar, but was told that we have limited time to execute the DELETE, and that doing it in stages might not be possible.

Reuven
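If the time constraint ever loosens, Andrew's staged idea might look like the following sketch (using the thread's table names; the intervals are arbitrary), with a VACUUM of b between passes so the freed space gets reused:

    DELETE FROM b
     WHERE r_id IN (SELECT id FROM r
                     WHERE end_date < NOW() - interval '90 days');

    DELETE FROM b
     WHERE r_id IN (SELECT id FROM r
                     WHERE end_date < NOW() - interval '60 days');

    DELETE FROM b
     WHERE r_id IN (SELECT id FROM r
                     WHERE end_date < NOW() - interval '30 days');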
[PERFORM] Very long deletion time on a 200 GB database
Hi, everyone. I'm maintaining an application that exists as a "black box" in manufacturing plants. The system is based on Windows, .NET, and PostgreSQL 8.3. I'm a Unix kind of guy myself, but the application layer and system administration are being handled by other people; I'm just the PostgreSQL guy.

Because of the nature of the application, we don't have direct control over what happens. And it turns out that at one installation, we're quickly running out of disk space. The database is already taking up about 200 GB of space, and is growing by 1 GB or so a day. Switching disks, either to a larger/faster traditional drive or to an SSD, is not an option. (And yes, I know that SSDs have their own risks, but I'm just throwing that out as one option.)

Right now, the best solution to the space problem is to delete information associated with old records, where "old" means from at least 30 days ago. The old records are spread across a few tables, including many large objects. (The application was written by people who were new to PostgreSQL, and didn't realize that they could use BYTEA.) Basically, given a foreign key B.r_id that points to table R, I want to DELETE all rows in B whose associated R row's creation date is at least 30 days ago.

Unfortunately, when we implemented this simple delete, it executed slower than molasses, taking about 9 hours to do its thing. Not only does this seem like a really, really long time for such a delete, but we have only a 4-hour window in which to run this maintenance activity, before the factory starts to use our black box again.

I've tried a few approaches so far, none of which have been hugely successful. The fact that it takes several hours to test each theory is obviously a bit of a pain, and so I'm curious to hear suggestions from people here.

I should note that my primary concern is available RAM. The database, as I wrote, is about 200 GB in size, and PostgreSQL is reporting (according to Windows) use of about 5 GB of RAM, plus another 25 GB of virtual memory. I've told the Windows folks on this project that virtual memory kills a database, and that it shouldn't surprise us to have horrible performance if the database and operating system are both transferring massive amounts of data back and forth. But there doesn't seem to be a good way to handle this.

This is basically what I'm trying to execute:

    DELETE FROM B
     WHERE r_id IN (SELECT R.id
                      FROM R, B
                     WHERE r.end_date < (NOW() - (interval '1 day' * 30))
                       AND r.id = b.r_id);

(1) I tried to write this as a join, rather than a subselect. But B has an oid column that points to large objects, and on which we have a rule that removes the associated large object when a row in B is removed. Doing the delete as a join resulted in "no such large object with an oid of xxx" errors. (I'm not sure why, although it might have to do with the rule.)

(2) I tried to grab the rows that *do* interest me, put them into a temporary table, TRUNCATE the existing table, and then copy the rows back. I only tested that with a 1 GB subset of the data, but it took longer than the other options.

(3) There are some foreign-key constraints on the B table. I thought that perhaps doing a mass DELETE was queueing up all of those constraints, and possibly using up lots of memory and/or taking a long time to execute. I thus rewrote my queries such that they first removed the constraints, then executed the DELETE, and then restored the constraints. That didn't seem to improve things much either, and took a long time (30 minutes) just to remove the constraints. I expected re-adding the constraints to take a while, but shouldn't removing them be relatively quick?

(4) I tried "chunking" the deletes, such that instead of trying to delete all of the records from the B table, I would instead delete just those associated with 100 or 200 rows from the R table. On a 1 GB subset of the data, this seemed to work just fine. But on the actual database, it was still far too slow.

I've been surprised by the time it takes to delete the records in question. I keep trying to tell the others on this project that PostgreSQL isn't inherently slow, but that a 200 GB database running on a non-dedicated machine, with an old version (8.3), and while it's swapping RAM, will be slow regardless of the database software we're using. But even so, 9 hours to delete 100 GB of data strikes me as a very long process.

Again, I continue to believe that given our hard time deadlines, and the fact that we're using a large amount of virtual memory, there isn't really a solution that will work quickly and easily. But I'd be delighted to be wrong, and welcome any and all comments and suggestions for how to deal with this situation.

Reuven
Re: [PERFORM] Triggering autovacuum
Hi, Greg. Thanks for the quick and useful answer, even if it means that my hopes for a quick fix have been dashed. I guess I'll need to do some actual monitoring, then...

Reuven
[PERFORM] Triggering autovacuum
Hi, everyone. Some people with whom I'm working, and who have an 8.3 system running under Windows, asked me to look into their performance issues. They have a 1.5 GB database with a few dozen tables, and 500,000 records at most. They said that their system has been running for a few days, doing lots of INSERTs and SELECTs, and that the performance has gotten worse and worse over time. (I don't have numbers to share.) It's true that the computer is being used by other processes as part of a black-box manufacturing system, but those are pretty constant in their CPU, disk, and memory needs, so I wouldn't expect to see degradation over time as a result of that work.

I looked at the system, and found that we need to change effective_cache_size, such that it'll match the "system cache" number in the Windows performance monitor. So yes, we'll take care of that, and I expect to see some improvement.

But the really surprising thing to me was that autovacuum hadn't run at all in the last three days. I checked, and the "autovacuum" parameter was set in postgresql.conf, and "show" in psql confirms that it is set. But when I looked at pg_stat_user_tables, there was no indication of autovacuum *ever* having run. We also fail to see any autovacuum processes in the Windows process listing.

Could this be because we're only doing INSERTs and SELECTs? In such a case, we would never reach the threshold of modified tuples that autovacuum looks for, and thus it would never run. That would, by my reasoning, mean that we'll never tag dead tuples (which isn't a big deal if we're never deleting or updating rows), but also that we'll never run ANALYZE as part of autovacuum. Which would mean that we'd be running with out-of-date statistics.

I ran a manual "vacuum analyze", by the way, and it's taking a really long time (1.5 hours, as of this writing) to run, but it's clearly doing something. Moreover, when we went to check on our vacuum process after about an hour, we saw that autovacuum had kicked in, and was now running. Could it be that our manual invocation of vacuum led to autovacuum running?

I have a feeling that our solution is going to have to involve a cron type of job, running vacuum at regular intervals (like in the bad old days), because autovacuum won't get triggered. But hey, if anyone has any pointers to offer on this topic, I'd certainly appreciate it.

Reuven
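A query along these lines is one way to see whether autovacuum has ever touched a table: the last_autovacuum and last_autoanalyze columns stay NULL for tables it has never visited.

    SELECT relname, last_vacuum, last_autovacuum,
           last_analyze, last_autoanalyze
      FROM pg_stat_user_tables
     ORDER BY relname;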
Re: [PERFORM] Speeding up loops in pl/pgsql function
Hi, Merlin. You wrote:

> select string_agg(v, '')
>   from (select ascii(regexp_split_to_table('abc', $$\s*$$))::text as v) q;

Wow. I've been programming in pl/pgsql for a good number of years, but only now do I see the amazing usefulness of regexp_split_to_table and string_agg, neither of which I had really used until now. Thanks both for the solution and for opening my eyes.

> (what about 3 digit ascii codes?)

I have to put the number into a text field anyway, so I've been converting the resulting number to text, and then using lpad to add leading zeroes as necessary.

Thanks again,

Reuven
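Putting the two together, the lpad step might look like this; note that string_agg arrived in PostgreSQL 9.0, so on the 8.3 systems discussed elsewhere in these threads it would need a custom aggregate or a loop instead:

    SELECT string_agg(lpad(ascii(v)::text, 3, '0'), '')
      FROM regexp_split_to_table('abc', $$\s*$$) AS v;
    -- yields '097098099'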
Re: [PERFORM] Speeding up loops in pl/pgsql function
Hi, everyone. First of all, thanks for all of your help several days ago. The improvements to our program were rather dramatic (in a positive sense).

Based on the help that everyone gave, I'm now working on something similar, trying to use regexp_replace to transform a string into the result of invoking a function on each character. For example, I'd like to do the following:

    regexp_replace('abc', '(.)', ascii(E'\\1')::text, 'g');

Unfortunately, the above invokes ascii() on the literal string E'\\1', rather than on the value of the backreference, which isn't nearly as useful. I'd like to get '979899' back as a string. And of course, once I can get back the value of ascii(), I figure that it should work for any function that I define.

Thanks again for any suggestions everyone might have. (And if this should go to pgsql-general, then I'll understand. If it helps, my alternative to regexp_replace is a super-slow function, akin to the one that I showed here last week.)

Reuven
Re: [PERFORM] Speeding up loops in pl/pgsql function
Wow. Color me impressed and grateful. I've been working on a different project today, but I'll test these tonight. I'll never underestimate the regexp functionality in PostgreSQL again!

Reuven
Re: [PERFORM] Speeding up loops in pl/pgsql function
Hi, everyone. Merlin wrote:

> let's see the source. I bet we can get this figured out.

Here you go... it looked nicer before I started to make optimizations. I've gotten it to run about 2x as fast as the previous version, but now I'm sorta stuck, looking for further optimizations, including possible use of builtin functions. Thanks for any suggestions you can offer.

    CREATE OR REPLACE FUNCTION translate_octals_into_decimals(bytea_string BYTEA)
    RETURNS BYTEA AS $$
    DECLARE
        bytea_string_length INTEGER := length(bytea_string);
        current_substring TEXT := '';
        translated_string_array BYTEA[];
        output_number INTEGER := 0;
        output_number_text TEXT := '';
        current_digit TEXT := '';
    BEGIN
        RAISE NOTICE '[translate_octals_into_decimals] start at %, string of length %',
            clock_timestamp(), pg_size_pretty(length(bytea_string));

        -- Walk the string three octal digits at a time.
        FOR i IN 1..length(bytea_string) BY 3 LOOP
            current_substring := substring(bytea_string from i for 3);

            -- Convert the three octal digits to a decimal number.
            output_number := 0;
            FOR j IN 0..(length(current_substring) - 1) LOOP
                current_digit := substring(current_substring
                                           from (length(current_substring) - j)
                                           for 1);
                output_number := output_number + current_digit::integer * (8 ^ j);
            END LOOP;

            output_number_text := lpad(output_number::text, 3, '0');

            IF output_number_text::int = 92 THEN      -- backslash
                translated_string_array := array_append(translated_string_array,
                                                        E'\\\\'::bytea);
            ELSIF output_number_text::int = 0 THEN    -- NUL
                translated_string_array := array_append(translated_string_array,
                                                        E'\\000'::bytea);
            ELSE
                translated_string_array := array_append(
                    translated_string_array,
                    chr(output_number_text::integer)::bytea);
            END IF;
        END LOOP;

        RETURN array_to_string(translated_string_array, '');
    END;
    $$ LANGUAGE plpgsql;

Reuven
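For comparison, here is a hedged, set-based sketch of the same conversion. It operates on a text rendering of the octal string and skips the NUL/backslash special cases the function above handles, so it is illustrative rather than a drop-in replacement, but it avoids the per-character pl/pgsql loop entirely:

    SELECT array_to_string(ARRAY(
        SELECT chr(substr(s, i, 1)::int * 64
                 + substr(s, i + 1, 1)::int * 8
                 + substr(s, i + 2, 1)::int)
          FROM generate_series(1, length(s) - 2, 3) AS i
         ORDER BY i), '')
      FROM (SELECT '141142143'::text AS s) AS q;
    -- yields 'abc'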
Re: [PERFORM] Speeding up loops in pl/pgsql function
Hi, Alex. You wrote:

> Have you tried something like:
> SELECT encode(regexp_replace('141142143', '(\d{3})', '\\\1', 'g')::bytea, 'escape');

Hmm, I forgot about regexp_replace. It might do the trick, but without a full-blown eval that I can run on the replacement side, it'll be a bit more challenging. But that's a good direction to consider, for sure.

> I think select E'\XXX' is what you are looking for (per the fine
> manual: http://www.postgresql.org/docs/current/static/datatype-binary.html)

I didn't think that I could (easily) build a string like that from digits in a variable or a column, but I'll poke around and see if it can work.

Thanks,

Reuven
[PERFORM] Speeding up loops in pl/pgsql function
Hi, everyone. I'm working on a project that's using PostgreSQL 8.3, and that requires me to translate strings of octal digits into strings of characters -- so '141142143' should become 'abc' -- although the database column containing this data (both before and after) is a BYTEA.

While the function I've written is accurate, it turns out that it's also ridiculously slow. I've managed to speed it up a fair amount, to twice what it was previously doing, by folding a helper function into the main one, and by appending to an array (which I then join into a string at the end of the function) instead of concatenating a string onto itself time after time.

I realize that pl/pgsql is not a good choice for this sort of task, and that another language -- say, one with direct support for octal digits, or with built-in, speedy array functions such as pop() and push() -- would be a better choice. But that's not an option at this point.

I should also note that I'm not manipulating a huge amount of data here. We're talking about 300 or so rows, each of which contains about 250 KB of data. (Hmm, could the problem be that I'm constantly forcing the system to compress and uncompress the data in TOAST? I hadn't thought of that until just now...)

I thus have two basic questions:

(1) Are there any good guidelines for which operations in pl/pgsql are optimized for which data structures? For example, it turns out that a great deal of time is being spent in the substring() function, which surprised me. I thought that switching to an array might be faster, but that wasn't the case, at least in my tests. Having a sense of what I should and shouldn't be trying, and which built-in functions are particularly fast or slow, would be useful to know.

(2) Is there any configuration setting that would (perhaps) speed things up a bit? I thought that maybe work_mem would help, but the documentation didn't indicate this at all, and sure enough, nothing really changed when I increased it.

Of course, any suggestions for how to deal with octal digits in PostgreSQL 8.3, such as an octal equivalent to the x'ff' syntax, would be more than welcome.

Thanks in advance,

Reuven
Re: [PERFORM] SQL functions vs. PL/PgSQL functions
Wow. Thanks so much to all of you for the thoughtful and helpful responses!

Reuven
[PERFORM] SQL functions vs. PL/PgSQL functions
Hi, everyone. I'm working with a client to try to optimize their use of PostgreSQL. They're running 8.3 on a Windows platform, packaged as part of a physical product that is delivered to customers. We're planning to upgrade to 9.0 at some point in the coming months, but this question is relevant for 8.3 (and perhaps beyond).

All of the database-related logic for this application is in server-side functions, written in PL/PgSQL. That is, the application never issues a SELECT or INSERT; rather, it invokes a function with parameters, and the function handles the query. It's not unusual for a function to invoke one or more other PL/PgSQL functions as part of its execution.

Since many of these PL/PgSQL functions are just acting as wrappers around queries, I thought that it would be a cheap speedup for us to change some of them to SQL functions, rather than PL/PgSQL. After all, PL/PgSQL is (I thought) interpreted, whereas SQL functions can be inlined and handled directly by the optimizer and such.

We made the change to one or two functions, and were rather surprised to see the performance drop by quite a bit. My question is whether this is somehow to be expected. Under what conditions will SQL functions be slower than PL/PgSQL functions? Is there a heuristic that I can/should use to know this in advance? Does it matter whether the SELECT being executed operates against a table, or against another PL/PgSQL function?

Thanks in advance for any insights everyone can offer.

Reuven
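One heuristic worth noting: the planner only inlines fairly simple SQL functions -- roughly, a single SELECT that is just an expression (no FROM clause), marked STABLE or IMMUTABLE, and not SECURITY DEFINER. A hedged sketch of the inlineable pattern, with invented names:

    -- A simple expression wrapper; the planner can expand this inline
    -- into the calling query instead of invoking it per row:
    CREATE FUNCTION days_ago(n integer) RETURNS timestamptz AS $$
        SELECT now() - (interval '1 day' * $1);
    $$ LANGUAGE sql STABLE;

    -- Used as: SELECT ... FROM r WHERE end_date < days_ago(30);

A SQL function that doesn't meet the inlining conditions is re-parsed and re-planned on each call, whereas PL/PgSQL caches its plans after the first call -- one plausible reason the converted functions got slower.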
[PERFORM] Backup strategies
I'm running a medium-traffic Web site that has been running for a few years, and which uses about four PostgreSQL databases on a regular basis. I'm currently running 8.2, although I'm planning to upgrade to 8.3 in the coming week or two, in part because of the problems that I'm having. The databases consume a combined total of 35 GB.

Like a good boy, I've been backing the system up overnight, when we have less traffic, since the site began to run. I use pg_dump to back up, saving both schemas and data for a full restore in case of failure. pg_dump typically executes from another machine on a local network; if it would help to run pg_dump locally, then I'm certainly open to doing that.

Over the last month or two, database performance has become increasingly problematic during the hours when I run pg_dump. Moreover, the size of the database has gotten to the point where it takes a good number of hours to dump everything to disk. This ends up interfering with our users on the East Coast of the United States when they access our site early in the morning.

One possible solution is for me to back up our main database more regularly, and our development database less regularly. But given the growth in our traffic (about double what it was 12 months ago), I have to assume that this isn't a long-term solution.

I'm also considering taking our oldest data and sticking it into a separate database (sort of a data warehouse), so that the production database becomes smaller, and thus easier to back up.

But before I do any of these things, I want to hear what others have discovered in terms of high-performance backups. Is there a way to stop pg_dump from locking up the database so much? Is there a knob that I can turn to do a low-priority backup while the live site is running? Is there a superior backup strategy than pg_dump every 24 hours?

Thanks in advance for any advice you can offer!

Reuven
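For reference, a dump invocation along these lines is one place to start; the host, user, and database names are illustrative, not taken from the site's actual setup. The custom format (-Fc) compresses as it writes, which at least shrinks the bytes pushed across the network, and it allows selective restores later:

    pg_dump -h db.example.com -U webuser -Fc production > production.dump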