Re: [PERFORM] Very long deletion time on a 200 GB database

2012-02-27 Thread Reuven M. Lerner
Hi, everyone.  I wanted to thank you again for your help on the huge 
delete problem that I was experiencing.


After a lot of trial and error, we finally came to the conclusion that 
deleting this much data in the time frame that they need, on 
underpowered hardware that is shared with an application, with each test 
iteration taking 5-9 hours to run (but needing to run in 2-3), is just 
not going to happen.  We tried many of the options that people helpfully 
suggested here, but none of them gave us the performance that we needed.


(One of the developers kept asking me how it can possibly take so long 
to delete 200 GB, when he can delete files of that size in much less 
time.  I had to explain to him that deleting rows from a database is a
far more complicated task, and can't really be compared to deleting a
few files.)


In the end, it was agreed that we could execute the deletes over time, 
deleting items in the background, or in parallel with the application's 
work.  After all, if the disk is filling up at the rate of 2 GB/day, 
then so long as we delete 4 GB/day (which is pretty easy to do), we 
should be fine.  Adding RAM or another disk are simply out of the 
question, which is really a shame for a database of this size.


I should add that it was interesting/amusing to see the difference 
between the Unix and Windows philosophies.  Each time I would update my
pl/pgsql functions, the Windows guys would wrap them in a string inside
a .NET program, which then needed to be compiled, installed, and run
(adding enormous overhead to our already long testing procedure).  I
finally managed to show them that we could get equivalent functionality, 
with way less overhead, by just running psql -f FILENAME.  This version 
doesn't have fancy GUI output, but it works just fine...


I always tell people that PostgreSQL is not just a great database, but a 
fantastic, helpful community.  Thanks to everyone for their suggestions 
and advice.


Reuven



Re: [PERFORM] Very long deletion time on a 200 GB database

2012-02-27 Thread Reuven M. Lerner

Hi, Shaun.  You wrote:



>> In the end, it was agreed that we could execute the deletes over
>> time, deleting items in the background, or in parallel with the
>> application's work. After all, if the disk is filling up at the rate
>> of 2 GB/day, then so long as we delete 4 GB/day (which is pretty easy
>> to do), we should be fine.

> Please tell me you understand deleting rows from a PostgreSQL database
> doesn't work like this. :) The MVCC storage system means you'll
> basically just be marking all those deleted rows as reusable, so your
> database will stop growing, but you'll eventually want to purge all the
> accumulated dead rows.


Oh, I understand that all right.  I've had many, *many* conversations 
with this company explaining MVCC.  It doesn't seem to work; when they
refer to vacuuming the database, I remind them that we have autovacuum
working, to which they respond, "Oh, we mean VACUUM FULL."  At which
point I remind them that VACUUM FULL is almost certainly not what they
want to do, and then they say, "Yes, we know, but we still like to do it
every so often."


From what I understand, the issue isn't one of current disk space, but 
rather of how quickly the disk space is being used up.  Maybe they want 
to reclaim disk space, but it's more crucial to stop the rate at which 
disk space is being taken.  If we were to delete all of the existing 
rows, and let vacuum mark them as dead and available for reuse, then 
that would probably be just fine.


I wouldn't be surprised if we end up doing a CLUSTER at some point.  The 
problem is basically that this machine is in 24/7 operation at 
high-speed manufacturing plants, and the best-case scenario is for a 
4-hour maintenance window.  I've suggested that we might be able to help 
the situation somewhat by attaching a portable USB-based hard disk, and 
adding a new tablespace that'll let us keep running while we divide up 
the work that the disk is doing, but they've made it clear that the 
current hardware configuration cannot and will not change.  Period.


So for now, we'll just try to DELETE faster than we INSERT, and combined 
with autovacuum, I'm hoping that this crisis will be averted.  That 
said, the current state of affairs with these machines is pretty 
fragile, and I think that we might want to head off such problems in the 
future, rather than be surprised by them.


Reuven

--
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner



Re: [PERFORM] Very long deletion time on a 200 GB database

2012-02-26 Thread Reuven M. Lerner

Hi again, everyone.

Wow, I can't get over how helpful everyone has been.

Shaun wrote:


> The main problem you're going to run into is that your table is larger than the
> memory in that server. 4GB is really pretty small for a server hosting a 200+GB
> database. That they didn't mean it to get that big doesn't really help you
> clean it up.


Yep!  And as you pointed out later in your note, PostgreSQL isn't the
only thing running on this computer.  There's also a full-fledged
Windows application normally running on it.  And the nature of the
black-box manufacturing context means that maintenance is supposed to
be rare, and that anything which gets us off of a 24/7 work schedule is
enormously expensive.


This has been a fun problem to fix, for sure...  We're not there yet, 
but I feel like we're really close.


I'm currently trying a hybrid approach, based on several suggestions 
that were posted to this list:


Given that during this maintenance operation nothing else should be
running, I'm going to bump up shared_buffers.  Even after we run our
maintenance, the fact that shared_buffers was so ridiculously low
couldn't have been helping anything, so I'll push it up.


I finally remembered why I had such a tortured set of subselects in my 
original query: If you're going to do a query with LIMIT in it, you had 
better be sure that you know what you're doing, because without an ORDER 
BY clause, you might be in for surprises.  And sure enough, in our 
testing, I realized that when we asked the database for up to 5 rows, we 
were getting the same rows again and again, thus stopping after it 
deleted a few bunches of rows.


So I changed tactics somewhat, and it appears to be working much, much 
faster: I first created a table (not a temp table, simply because my 
functions are getting invoked by the .NET application in a new 
connection each time, and I obviously don't want my table to go away) 
with the IDs of the R-table rows that are older than n days.  This
table has about 200,000 rows in it, but each column is an int, so it's 
pretty small.
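
Roughly speaking, that first step looks like the following (using the id and
end_date columns from my earlier query; the 30-day cutoff stands in for the
"n days" parameter):

CREATE TABLE old_report_ids AS
    SELECT id
      FROM r
     WHERE end_date < (NOW() - interval '30 days');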


I then have a separate function that takes a parameter, the chunk size. 
 I loop through the table created in the first function 
(old_report_ids), deleting all of the records in the B table that
reference that row in R.  I then remove the row from the old_report_ids
table, and then loop again, until I've reached the chunk size.  There 
are undoubtedly more elegant ways to do this, but we just gotta get it 
working at this point. :-)
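
In case it's useful to anyone, here is a minimal sketch of the sort of thing
that second function does (the function name and local variables are
placeholders; b, r_id, and old_report_ids are as described above):

CREATE OR REPLACE FUNCTION delete_old_data_chunk(chunk_size INTEGER)
RETURNS INTEGER AS $$
DECLARE
  current_id INTEGER;
  rows_done  INTEGER := 0;
BEGIN
  -- No ORDER BY needed: each processed id is removed from old_report_ids,
  -- so repeated calls always pick up ids that haven't been handled yet.
  FOR current_id IN SELECT id FROM old_report_ids LIMIT chunk_size LOOP
    -- Delete the dependent rows in B (the rule takes care of their large objects)
    DELETE FROM b WHERE r_id = current_id;
    -- ... and then cross this report off the to-do list
    DELETE FROM old_report_ids WHERE id = current_id;
    rows_done := rows_done + 1;
  END LOOP;
  RETURN rows_done;
END;
$$ LANGUAGE plpgsql;

We then just invoke it repeatedly (SELECT delete_old_data_chunk(100); or
whatever chunk size we settle on) from psql -f until old_report_ids is empty.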


We're about to test this, but from my small tests on my computer, it ran 
much, much faster than other options.  We'll see what happens when we 
try it now on the 200 GB monster...


Reuven



Re: [PERFORM] Very long deletion time on a 200 GB database

2012-02-25 Thread Reuven M. Lerner

Hi, everyone.  Jeff wrote:


> Is this 9 hours run time for deleting one day worth of data, or for
> deleting the entire accumulation of cruft that filled up the hard
> drive in the first place (which would be 170 days, if you have 200GB
> that accumulated at 1GB per day and you only need 30 days) ?


Unfortunately, it took 9 hours to delete all of the rows associated with 
the older-than-30-days records.


Reuven
--
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner



Re: [PERFORM] Very long deletion time on a 200 GB database

2012-02-24 Thread Reuven M. Lerner

Hi, everyone.  Samuel wrote:


> What is work_mem set to?  If all the other values were set so low, I'd
> expect work_mem to also be small, which could be causing all kind of
> disk activity when steps don't fit into a work_mem segment.


I just checked, and work_mem is set to 30 MB.  That seems a bit low to 
me, given the size of the database and the fact that we're doing so much 
sorting and subselecting.  Am I right that we should push that up some more?
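
For what it's worth, since this runs in a dedicated maintenance window, my
inclination is to raise it just for the session doing the deletes rather than
globally, along the lines of:

SET work_mem = '256MB';

(The exact value is a guess on my part, given the 3.5 GB of RAM on the box.)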


Reuven
--
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner



Re: [PERFORM] Very long deletion time on a 200 GB database

2012-02-24 Thread Reuven M. Lerner

Hi, everyone.  Shaun said:

> On 02/24/2012 12:39 AM, Reuven M. Lerner wrote:
>> - CPU is virtually idle when running the deletes, using about 1% of
>> CPU

> I think you found your problem.

> See if you can get the Windows admins to give you some info on how busy
> the disks are (percent utilization, IOPS, something) the next time you
> try this. Increasing your memory settings may help, but a 1% CPU usage
> usually suggests it's waiting for disk blocks to be read before it can
> actually do something.


I asked them for disk readings, but I'm not sure how to contextualize 
the numbers I got:


I/O writes: process 1: 820,000; process 2: 1 million; process 3: 33,000

Any suggestions for what I can do to improve performance with such a 
slow disk, and a lack of additional RAM?


Reuven

--
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner



[PERFORM] Very long deletion time on a 200 GB database

2012-02-23 Thread Reuven M. Lerner
Hi, everyone.  I'm maintaining an application that exists as a black 
box in manufacturing plants.  The system is based on Windows, .NET, and 
PostgreSQL 8.3.  I'm a Unix kind of guy myself, but the application 
layer and system administration are being handled by other people; I'm 
just the PostgreSQL guy.


Because of the nature of the application, we don't have direct control 
over what happens.  And it turns out that at one installation, we're 
quickly running out of disk space.  The database is already taking up 
about 200 GB of space, and is growing by 1 GB or so a day.  Switching 
disks, either to a larger/faster traditional drive, or even to a SSD, is 
not an option.  (And yes, I know that SSDs have their own risks, but I'm 
just throwing that out as one option.)


Right now, the best solution to the space problem is to delete 
information associated with old records, where old is from at least 30 
days ago.  The old records are spread across a few tables, including 
many large objects.  (The application was written by people who were new 
to PostgreSQL, and didn't realize that they could use BYTEA.) 
Basically, given a foreign key B.a_id that points to table A, I want to 
DELETE all in B where A's creation date is at least 30 days ago.


Unfortunately, when we implemented this simple delete, it executed 
slower than molasses, taking about 9 hours to do its thing.   Not only 
does this seem like a really, really long time to do such deleting, but 
we have only a 4-hour window in which to run this maintenance activity, 
before the factory starts to use our black box again.


I've tried a few approaches so far, none of which have been hugely 
successful.  The fact that it takes several hours to test each theory is 
obviously a bit of a pain, and so I'm curious to hear suggestions from 
people here.


I should note that my primary concern is available RAM.  The database, 
as I wrote, is about 200 GB in size, and PostgreSQL is reporting 
(according to Windows) use of about 5 GB RAM, plus another 25 GB of 
virtual memory.  I've told the Windows folks on this project that 
virtual memory kills a database, and that it shouldn't surprise us to 
have horrible performance if the database and operating system are both 
transferring massive amounts of data back and forth.  But there doesn't
seem to be a good way to handle this.


This is basically what I'm trying to execute:

DELETE FROM B
 WHERE r_id IN (SELECT R.id
                  FROM R, B
                 WHERE r.end_date < (NOW() - (interval '1 day' * 30))
                   AND r.id = b.r_id);

(1) I tried to write this as a join, rather than a subselect.  But B has 
an oid column that points to large objects, and on which we have a rule 
that removes the associated large object when a row in B is removed. 
Doing the delete as a join resulted in "no such large object with an oid
of xxx" errors.  (I'm not sure why, although it might have to do with
the rule.)


(2) I tried to grab the rows that *do* interest me, put them into a 
temporary table, TRUNCATE the existing table, and then copy the rows 
back.   I only tested that with a 1 GB subset of the data, but that took 
longer than other options.


(3) There are some foreign-key constraints on the B table.  I thought 
that perhaps doing a mass DELETE was queueing up all of those 
constraints, and possibly using up lots of memory and/or taking a long 
time to execute.  I thus rewrote my queries such that they first removed 
the constraints, then executed the DELETE, and then restored the 
constraints.  That didn't seem to improve things much either, and took a 
long time (30 minutes) just to remove the constraints.  I expected 
re-adding the constraints to take a while, but shouldn't removing them 
be relatively quick?


(4) I tried chunking the deletes, such that instead of trying to 
delete all of the records from the B table, I would instead delete just 
those associated with 100 or 200 rows from the R table.  On a 1 GB 
subset of the data, this seemed to work just fine.  But on the actual 
database, it was still far too slow.


I've been surprised by the time it takes to delete the records in 
question.  I keep trying to tell the others on this project that 
PostgreSQL isn't inherently slow, but that a 200 GB database running on 
a non-dedicated machine, with an old version (8.3), and while it's 
swapping RAM, will be slow regardless of the database software we're 
using.  But even so, 9 hours to delete 100 GB of data strikes me as a 
very long process.


Again, I continue to believe that, given our hard time deadlines and the
fact that we're using a large amount of virtual memory, there isn't
really a solution that will work quickly and easily.  But I'd be
delighted to be wrong, and welcome any and all comments and suggestions 
for how to deal with this.


Reuven

--
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner


Re: [PERFORM] Very long deletion time on a 200 GB database

2012-02-23 Thread Reuven M. Lerner
Hi, everyone.  Thanks for all of the help and suggestions so far; I'll 
try to respond to some of them soon.  Andrew wrote:



> How about:
>
> DELETE FROM B
> WHERE r_id IN (SELECT DISTINCT R.id
>                  FROM R
>                 WHERE r.end_date < (NOW() - (interval '1 day' * 30)))
>
> ?
>
> Or possibly without the DISTINCT. But I agree that the original query
> shouldn't have B in the subquery - that alone could well make it crawl.


I put B in the subquery so as to reduce the number of rows that would be 
returned, but maybe that was indeed backfiring on me.  Now that I think 
about it, B is a huge table, and R is a less-huge one, so including B in 
the subselect was probably a mistake.




> What is the distribution of end_dates? It might be worth running this in
> several steps, deleting records older than, say, 90 days, 60 days, 30 days.


I've suggested something similar, but was told that we have limited time 
to execute the DELETE, and that doing it in stages might not be possible.
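
If we do get a window for it, I imagine the staged version would be as simple
as something like the following (same table and column names as in my original
query), with a pause between rounds to let autovacuum catch up:

DELETE FROM b WHERE r_id IN
    (SELECT id FROM r WHERE end_date < NOW() - interval '90 days');
DELETE FROM b WHERE r_id IN
    (SELECT id FROM r WHERE end_date < NOW() - interval '60 days');
DELETE FROM b WHERE r_id IN
    (SELECT id FROM r WHERE end_date < NOW() - interval '30 days');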


Reuven


--
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner



Re: [PERFORM] Very long deletion time on a 200 GB database

2012-02-23 Thread Reuven M. Lerner

Hi, everyone.

So it turns out that we're not using 25 GB of virtual memory.  (That's 
what I had been shown yesterday, and it was a bit surprising, to say the 
least...)


A few statistics that I managed to get from the Windows 
developers/system administrators:


- The machine has a total of 3.5 GB of RAM
- shared_buffers was set to 256 MB (yes, MB!)
- Virtual memory usage by our process is 3 MB (yes, MB)
- CPU is virtually idle when running the deletes, using about 1% of CPU
- No other processes are accessing the database when we're running the 
maintenance; there are a total of three server processes, but two are idle.


(I was a bit surprised, to say the least, by the low number on 
shared_buffers, given that I believe it's one of the first things I told 
them to increase about 18 months ago.)


As for Tom's point about rules, I know that rules are bad, and I'm not 
sure why the system is using a rule rather than a trigger.  I'll see
if I can change that to a trigger, but I have very indirect control over
the machines, and every change requires (believe it or not) writing a 
.NET program that runs my changes, rather than just a textual script 
that deploys them.


The only foreign keys are from the B table (i.e., the table whose 
records I want to remove) to other tables.  There are no REFERENCES 
pointing to the B table.  That said, I hadn't realized that primary keys 
and indexes can also delay the DELETE.


For the latest round of testing, I quadrupled shared_buffers to 1 GB, 
turned off hash joins (as suggested by someone), and also simplified the 
query (based on everyone's suggestions).  In the tests on my own 
computer (with a somewhat random 1 GB snapshot of the 200 GB database), 
the simplified query was indeed much faster, so I'm optimistic.
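
Concretely, the changes for this round of testing amount to something like the
following (shared_buffers requires a server restart; the hash-join switch I'm
only setting in the maintenance session):

# postgresql.conf
shared_buffers = 1GB

-- in the session that runs the maintenance function
SET enable_hashjoin = off;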


Several people suggested that chunking the deletes might indeed help, 
which makes me feel a bit better.  Unfortunately, given the time that it 
takes to run the queries, it's hard to figure out the right chunk size. 
 Whoever suggested doing it in time slices had an interesting idea, but 
I'm not sure if it'll be implementable given our constraints.


Thanks again to everyone for your help.  I'll let you know what happens...

Reuven



Re: [PERFORM] Triggering autovacuum

2011-06-11 Thread Reuven M. Lerner
Hi, Greg.  Thanks for the quick and useful answer, even if it means that 
my hopes for a quick fix have been dashed.  I guess I'll need to do some 
actual monitoring, then...


Reuven




[PERFORM] Triggering autovacuum

2011-06-09 Thread Reuven M. Lerner
Hi, everyone.  Some people with whom I'm working, and who have an 8.3 
system running under Windows, asked me to look into their performance 
issues.  They have a 1.5 GB database with a few dozen tables, and 
500,000 records at most.  They said that their system has been running 
for a few days, doing lots of INSERTs and SELECTs, and that the 
performance has gotten worse and worse over time.  (I don't have numbers 
to share.)  It's true that the computer is being used by other processes 
as part of a black-box manufacturing system, but those are pretty 
constant in CPU, disk, and memory needs, so I don't think that we would 
expect to see degradation over time as a result of that work.


I looked at the system, and found that we need to change 
effective_cache_size, such that it'll match the system cache number in 
the Windows performance monitor.   So yes, we'll take care of that, and 
I expect to see some improvement.
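
That's a one-line change in postgresql.conf, something like this, where the
1GB is just a placeholder until we read the actual number off the performance
monitor:

effective_cache_size = 1GB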


But the really surprising thing to me was that autovacuum hadn't run at 
all in the last three days.  I checked, and the autovacuum parameter
was set in postgresql.conf, and using "show" in psql confirms that it
was set.  But when I looked at pg_stat_user_tables, there was no
indication of autovacuum *ever* having run.  We also failed to see any
autovacuum processes in the Windows process listing.


Could this be because we're only doing INSERTs and SELECTs?  In such a 
case, then we would never reach the threshold of modified tuples that 
autovacuum looks for, and thus it would never run.  That would, by my 
reasoning, mean that we'll never tag dead tuples (which isn't a big deal 
if we're never deleting or updating rows), but also that we'll never run 
ANALYZE as part of autovacuum.  Which would mean that we'd be running 
with out-of-date statistics.
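
One thing I plan to look at, for what it's worth, is the per-table counters
and timestamps, along these lines:

SELECT relname, last_autovacuum, last_autoanalyze, n_live_tup, n_dead_tup
  FROM pg_stat_user_tables
 ORDER BY relname;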


I ran a manual vacuum analyze, by the way, and it's taking a really 
long time (1.5 hours, as of this writing) to run, but it's clearly doing 
something.  Moreover, when we went to check on our vacuum process after 
about an hour, we saw that autovacuum had kicked in, and was now 
running.  Could it be that our manual invocation of vacuum led to 
autovacuum running?


I have a feeling that our solution is going to have to involve a cron 
type of job, running vacuum at regular intervals (like in the bad old 
days), because autovacuum won't get triggered.  But hey, if anyone has 
any pointers to offer on this topic, I'd certainly appreciate it.
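
On these Windows machines that would presumably mean the Task Scheduler
rather than cron proper, running something as simple as:

vacuumdb --all --analyze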


Reuven

--
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner




Re: [PERFORM] Speeding up loops in pl/pgsql function

2011-06-02 Thread Reuven M. Lerner

Hi, Merlin.  You wrote:


> select string_agg(v, '') from (select
> ascii(regexp_split_to_table('abc', $$\s*$$))::text as v) q;

Wow.  I've been programming with pl/pgsql for a good number of years,
but only now do I see the amazing usefulness of regexp_split_to_table
and string_agg, neither of which I really used until now.  Thanks for
both the solution and for opening my eyes.

> (what about 3 digit ascii codes?)

I have to put the number into a text field anyway, so I've been
converting the resulting number to text, and then using lpad to add
leading zeroes as necessary.
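
In other words, roughly this, adapting your snippet (and noting that
string_agg itself is a 9.0-ism, so on this 8.3 system I'd need an equivalent
aggregate, but the padding idea is the same):

select string_agg(lpad(v, 3, '0'), '')
  from (select ascii(regexp_split_to_table('abc', $$\s*$$))::text as v) q;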


Thanks again,

Reuven

--
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner




Re: [PERFORM] Speeding up loops in pl/pgsql function

2011-05-26 Thread Reuven M. Lerner

Wow.

Color me impressed and grateful.  I've been working on a different 
project today, but I'll test these tonight.


I'll never underestimate the regexp functionality in PostgreSQL again!

Reuven



[PERFORM] Speeding up loops in pl/pgsql function

2011-05-25 Thread Reuven M. Lerner
Hi, everyone.  I'm working on a project that's using PostgreSQL 8.3, 
that requires me to translate strings of octal digits into strings of 
characters -- so '141142143' should become 'abc', although the database 
column containing this data (both before and after) is a bytea.
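
(To spell out one character of that example: the octal triplet 141 is
1*64 + 4*8 + 1 = 97 in decimal, which is ASCII 'a'.)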



While the function I've written is accurate, it turns out that it's also 
ridiculously slow.  I've managed to speed it up a fair amount, to twice 
what it was previously doing, by folding a helper function into a main 
one, and appending to an array (which I then join into a string at the 
end of the function) instead of concatenating a string onto itself time 
after time.



I realize that pl/pgsql is not a good choice for doing this sort of 
task, and that another language -- say, one with direct support for 
octal digits, or with built-in, speedy array functions such as pop() and 
push()  -- would be a better choice.  But that's not an option at this 
point.



I should also note that I'm not manipulating a huge amount of data 
here.  We're talking about 300 or so rows, each of which contains about 
250 KB of data.  (Hmm, could the problem be that I'm constantly forcing 
the system to compress and uncompress the data in TOAST?  I hadn't 
thought of that until just now...)



I thus have two basic questions:


(1) Are there any good guidelines for what operations in pl/pgsql are 
optimized for which data structures?  For example, it turns out that a 
great deal of time is being spent in the substring() function, which 
surprised me.  I thought that by switching to an array, it might be 
faster, but that wasn't the case, at least in my tests.  Having a sense 
of what I should and shouldn't be trying, and which built-in functions 
are particularly fast or slow, would be useful to know.



(2) Is there any configuration setting that would (perhaps) speed things 
up a bit?  I thought that maybe work_mem would help, but the 
documentation didn't indicate this at all, and sure enough, nothing 
really changed when I increased it.



Of course, any suggestions for how to deal with octal digits in 
PostgreSQL 8.3, such as an octal equivalent to the x'ff' syntax, would 
be more than welcome.



Thanks in advance,


Reuven

--
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner




Re: [PERFORM] Speeding up loops in pl/pgsql function

2011-05-25 Thread Reuven M. Lerner

Hi, Alex.  You wrote:

> Have you tried something like:
> SELECT encode(regexp_replace('141142143', '(\d{3})', '\\\1',
> 'g')::bytea, 'escape');

Hmm, forgot about regexp_replace.  It might do the trick, but without a
full-blown eval that I can run on the replacement side, it'll be a bit
more challenging.  But that's a good direction to consider, for sure.

> I think select E'\XXX' is what you are looking for (per the fine
> manual: http://www.postgresql.org/docs/current/static/datatype-binary.html)

I didn't think that I could (easily) build a string like that from
digits in a variable or a column, but I'll poke around and see if it can
work.


Thanks,

Reuven


--
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner




Re: [PERFORM] Speeding up loops in pl/pgsql function

2011-05-25 Thread Reuven M. Lerner

Hi, everyone.  Merlin wrote:


> let's see the source.  I bet we can get this figured out.


Here you go... it looked nicer before I started to make optimizations; 
I've gotten it to run about 2x as fast as the previous version, but now 
I'm sorta stuck, looking for further optimizations, including possible 
use of builtin functions.


Thanks for any suggestions you can offer.

CREATE OR REPLACE FUNCTION translate_octals_into_decimals(bytea_string 
BYTEA) RETURNS BYTEA AS $$

DECLARE
  bytea_string_length INTEGER := length(bytea_string);
  current_substring TEXT := '';
  translated_string_array BYTEA[];

  output_number INTEGER := 0;
  output_number_text TEXT := '';
  current_digit TEXT := '';
BEGIN
  RAISE NOTICE '[translate_octals_into_decimals] start at %, string of 
length %', clock_timestamp(), pg_size_pretty(length(bytea_string));


  -- Process the input three characters (one octal triplet) at a time
  FOR i IN 1..length(bytea_string) BY 3 LOOP
    current_substring := substring(bytea_string from i for 3);

    output_number := 0;

    -- Convert the octal digits into a decimal value, least significant digit first
    FOR j IN 0..(length(current_substring) - 1) LOOP
      current_digit := substring(current_substring from
(length(current_substring) - j) for 1);

      output_number := output_number + current_digit::integer * (8 ^ j);
    END LOOP;

    output_number_text := lpad(output_number::text, 3, '0');

    IF output_number_text::int = 92 THEN
      -- ASCII 92 is a backslash; append a single backslash byte
      translated_string_array := array_append(translated_string_array,
E'\\\\'::bytea);

ELSIF output_number_text::int = 0 THEN
  translated_string_array := array_append(translated_string_array, 
E'\\000'::bytea);

ELSE
  translated_string_array := array_append( translated_string_array, 
chr(output_number_text::integer)::bytea );

END IF;

  END LOOP;

  RETURN array_to_string(translated_string_array, '');
END;
$$ LANGUAGE 'plpgsql';

Reuven

--
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner




[PERFORM] SQL functions vs. PL/PgSQL functions

2010-10-13 Thread Reuven M. Lerner
Hi, everyone.  I'm working with a client to try to optimize their use of
PostgreSQL.  They're running 8.3 on a Windows platform, packaged as part
of a physical product that is delivered to customers.

We're planning to upgrade to 9.0 at some point in the coming months, but
this question is relevant for 8.3 (and perhaps beyond).

All of the database-related logic for this application is in server-side
functions, written in PL/PgSQL.  That is, the application never issues a
SELECT or INSERT; rather, it invokes a function with parameters, and the
function handles the query.   It's not unusual for a function to invoke
one or more other PL/PgSQL functions as part of its execution.

Since many of these PL/PgSQL functions are just acting as wrappers around
queries, I thought that it would be a cheap speedup for us to change some
of them to SQL functions, rather than PL/PgSQL.  After all, PL/PgSQL is (I
thought) interpreted, whereas SQL functions can be inlined and handled
directly by the optimizer and such.
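
To make it concrete, the wrappers I'm talking about are trivial; here's an
invented example (the items table and both function names are made up, not
from the real schema):

-- PL/PgSQL wrapper
CREATE FUNCTION get_item_name(p_id integer) RETURNS text AS $$
BEGIN
    RETURN (SELECT name FROM items WHERE id = p_id);
END;
$$ LANGUAGE plpgsql STABLE;

-- The SQL-function rewrite that I expected to be inlined, and thus cheaper
-- (note the $1: on 8.3, SQL functions can't refer to parameters by name)
CREATE FUNCTION get_item_name_sql(p_id integer) RETURNS text AS $$
    SELECT name FROM items WHERE id = $1;
$$ LANGUAGE sql STABLE;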

We made the change to one or two functions, and were rather surprised to
see the performance drop by quite a bit.

My question is whether this is somehow to be expected.  Under what
conditions will SQL functions be slower than PL/PgSQL functions?  Is there
a heuristic that I can/should use to know this in advance?  Does it matter
if the SELECT being executed operates against a table, or a PL/PgSQL
function?

Thanks in advance for any insights everyone can offer.

Reuven

-- 
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner






Re: [PERFORM] SQL functions vs. PL/PgSQL functions

2010-10-13 Thread Reuven M. Lerner
 Wow.  Thanks so much to all of you for the thoughtful and helpful 
responses!


Reuven



[PERFORM] Backup strategies

2008-10-14 Thread Reuven M. Lerner
I'm running a medium-traffic Web site that has been running for a few 
years, and which uses about four PostgreSQL databases on a regular 
basis.  I'm currently running 8.2, although I'm planning to upgrade to 
8.3 in the coming week or two, in part because of the problems that I'm 
having.  The databases consume a combined total of 35 GB.  Like a good 
boy, I've been backing the system up overnight, when we have less 
traffic, since the site began to run.  I use pg_dump to back up, saving 
both schemas and data for a full restore in case of  failure.  pg_dump 
typically executes from another machine on a local network; if it would 
help to run pg_dump locally, then I'm certainly open to doing that.


Over the last month or two, database performance has become increasingly 
problematic during the hours that I run pg_dump.  Moreover, the size of 
the database has gotten to the point where it takes a good number of 
hours to dump everything to disk.  This ends up interfering with our 
users on the East Coast of the United States, when they access our site 
early in the morning.


One possible solution is for me to backup our main database more 
regularly, and our development database less regularly.  But given the 
growth in our traffic (about double what it was 12 months ago), I have 
to assume that this isn't a long-term solution. 

I'm also considering taking our oldest data and sticking into a separate 
database (sort of a data warehouse), so that the production database 
becomes smaller, and thus easier to back up.


But before I do any of these things, I want to hear what others have 
discovered in terms of high-performance backups.  Is there a way to stop 
pg_dump from locking up the database so much?  Is there a knob that I 
can turn to do a low-priority backup while the live site is running?  Is 
there a superior backup strategy than pg_dump every 24 hours?
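
For reference, the nightly job is essentially a pg_dump of each database over
the network.  One thing I'm now considering is switching to the compressed
custom format, something like this (host, user, and database names are
placeholders):

pg_dump -Fc -Z 6 -h dbhost -U backup_user mydb > mydb.dump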


Thanks in advance for any advice you can offer!

Reuven

--
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner

